HTTP Streaming: Benefits and Drawbacks
An exploration of HTTP streaming, how it works in modern frameworks like Next.js, and when you should (or should not) use it.
TL;DR
- Use HTTP streaming when your page has multiple independent data sources with varying response times, and you want progressive rendering for better perceived performance.
- Keep SEO-critical content (titles, descriptions, main copy, structured data) in the shell; stream supplemental content (reviews, recommendations).
- Prefer data-layer caching (per-source TTLs and cache tags) over response-level caching for streamed pages.
- Avoid streaming for fast pages, single data-source pages, static/ISR-friendly content, or where strict status codes/redirects are essential.
Introduction
Have you ever wondered how modern web applications manage to load content progressively, providing a smoother user experience? Or how chatbots and live feeds update in real-time without refreshing the entire page? The answer often lies in a technique called HTTP streaming.
HTTP streaming has become increasingly popular in modern web development, especially with frameworks like Next.js embracing it as a core feature. But what exactly is HTTP streaming, and when should you use it? In this post, we'll explore the benefits and drawbacks of this approach to help you make informed decisions for your projects.
What is HTTP Streaming?
Traditional HTTP responses work by waiting for the entire response to be ready before sending it to the client. HTTP streaming, on the other hand, allows the server to send chunks of data as they become available, rather than waiting for the complete response.
In the context of React and Next.js, this means you can start sending HTML to the browser while your server is still fetching data or rendering components. The browser can begin parsing and displaying content immediately, rather than waiting on the full response while staring at a blank page.
```jsx
// Next.js example with Suspense for streaming
import { Suspense } from 'react'

export default function Page() {
  return (
    <div>
      <h1>Welcome to my Great Blog Post</h1>
      <main>
        <p>Whatever great content</p>
      </main>
      <Suspense fallback={<p>Loading comments...</p>}>
        <Comments />
      </Suspense>
    </div>
  )
}

async function Comments() {
  const comments = await fetchComments() // This might take a while
  return (
    <ul>
      {comments.map((comment) => (
        <li key={comment.id}>{comment.text}</li>
      ))}
    </ul>
  )
}
```
In this example, the heading and loading fallback are sent to the browser immediately, while the comments are streamed in once the data fetch completes.
How it Works Under the Hood
To understand HTTP streaming, we need to look at both the HTTP protocol level and how React leverages it.
HTTP/1.1: Chunked Transfer Encoding
Traditional HTTP responses include a Content-Length header that tells the browser exactly how many bytes to expect. The browser waits until all bytes arrive before processing the response. In HTTP/1.1, streaming uses a different approach called chunked transfer encoding.
With chunked transfer encoding, the server sends the Transfer-Encoding: chunked header instead of Content-Length. This tells the browser: "I'm going to send you data in pieces, and I'll let you know when I'm done."
```http
HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: text/html

19
<html><head></head><body>
32
<h1>Welcome</h1><div id="loading">Loading...</div>

... more chunks arrive later ...

49
<script>replaceContent('loading', '<ul><li>Comment 1</li></ul>')</script>
0
```
Each chunk is prefixed with its size in hexadecimal, and a chunk of size 0 signals the end of the response. The browser can start parsing and rendering HTML as soon as the first chunk arrives.
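To make this concrete, here's a minimal sketch of a server producing such a response - not the React machinery, just plain Node. Node's built-in http module switches to chunked encoding on its own once you write the body in pieces without a Content-Length:

```ts
// Minimal sketch: Node applies chunked transfer encoding automatically
// when you call res.write() repeatedly without setting a Content-Length.
import { createServer } from 'node:http'

createServer((req, res) => {
  res.writeHead(200, { 'Content-Type': 'text/html' })
  res.write('<html><head></head><body>')
  res.write('<h1>Welcome</h1><div id="loading">Loading...</div>')

  // Simulate a slow data source, then flush the final chunk
  setTimeout(() => {
    res.write(
      `<script>document.getElementById('loading').innerHTML = '<ul><li>Comment 1</li></ul>'</script>`
    )
    res.end('</body></html>') // also sends the terminating zero-length chunk
  }, 2000)
}).listen(3000)
```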
HTTP/2 and HTTP/3: Native Streaming
HTTP/2 disallows the Transfer-Encoding header entirely. Instead, streaming is built directly into the protocol through its binary framing layer. All communication in HTTP/2 is split into binary frames, and DATA frames can be sent incrementally without declaring the total size upfront. When the server is done sending data, it sets an END_STREAM flag on the final frame. No special headers are needed - streaming is simply how the protocol works.
This approach is actually more efficient than HTTP/1.1 chunked encoding. There's no per-chunk overhead like size prefixes or CRLF terminators, and the binary format is more compact than text-based HTTP/1.1. HTTP/2 also supports multiplexing, which means multiple streams can share a single TCP connection, so streaming one response doesn't block others.
HTTP/3 works similarly but uses QUIC instead of TCP, providing even better performance for streaming scenarios, especially on unstable connections where packet loss would otherwise stall the entire TCP connection.
React's Streaming SSR
React's renderToPipeableStream API (used internally by Next.js) takes advantage of these streaming capabilities to implement streaming server-side rendering. Here's a simplified view of what happens:
```jsx
// Extremely simplified example of what Next.js does internally
import express from 'express'
import { renderToPipeableStream } from 'react-dom/server'

const app = express()

app.get('/', (req, res) => {
  const { pipe } = renderToPipeableStream(<App />, {
    onShellReady() {
      // The shell (everything outside Suspense boundaries) is ready
      res.setHeader('Content-Type', 'text/html')
      res.setHeader('Transfer-Encoding', 'chunked')
      pipe(res) // Start streaming to the response
    },
    onShellError() {
      // The shell itself failed, so no bytes have been sent yet -
      // we can still respond with a proper error status
      res.statusCode = 500
      res.end('<h1>Something went wrong</h1>')
    },
  })
})
```
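renderToPipeableStream also exposes an onAllReady callback that fires only after every Suspense boundary has resolved. A pattern suggested in React's docs is to wait for it when the request comes from a crawler, so bots receive the complete HTML in one piece (isCrawler below is a hypothetical user-agent check):

```tsx
// Sketch: stream progressively for browsers, buffer fully for crawlers
const { pipe } = renderToPipeableStream(<App />, {
  onShellReady() {
    if (!isCrawler(req)) pipe(res) // users get chunks as they're ready
  },
  onAllReady() {
    if (isCrawler(req)) pipe(res) // bots get the finished document at once
  },
})
```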
The Streaming Process Step by Step
Let's walk through what happens when a user requests a page with streaming enabled:
Step 1: Initial Request
The browser sends a request to your server. The server begins rendering your React component tree.
Step 2: Shell Rendering
React renders the "shell" of your application: the initial HTML that can be produced without waiting for async data (everything outside Suspense boundaries). The shell should contain your critical UI and SEO content; streamed sections fill in progressively later. When React encounters a <Suspense> boundary with a pending promise, it renders the fallback instead.
```html
<!-- First chunk sent to browser -->
<!DOCTYPE html>
<html>
  <head>...</head>
  <body>
    <h1>Welcome to my Great Blog Post</h1>
    <main><p>Whatever great content</p></main>
    <!--$?--><template id="B:0"></template><p>Loading comments...</p><!--/$-->
    <!-- body and html stay open - the closing tags arrive in the final chunk -->
```
Notice the special comment markers (<!--$?-->) and the <template> tag. These are placeholders that React uses to know where to inject content later.
Step 3: Async Data Resolution
Meanwhile, your Comments component is fetching data. The server keeps the connection open and continues working.
Step 4: Streaming the Resolved Content
Once the data fetch completes, React renders the actual Comments component and streams it to the browser as a new chunk:
```html
<!-- Later chunk with resolved content -->
<div hidden id="S:0">
  <ul>
    <li>Great post!</li>
    <li>Thanks for sharing</li>
  </ul>
</div>
<script>
  // React's internal function to swap content
  $RC("B:0", "S:0")
</script>
```
This chunk contains:
- The actual rendered content, hidden initially
- A small inline script that tells React to swap the fallback with the real content
Step 5: Client-Side Hydration
The $RC function (React's internal "replace content" function) finds the template placeholder B:0, removes the fallback, and inserts the content from S:0. This happens instantly without any network requests - the content is already in the DOM. Full hydration (attaching event handlers) happens separately once the client-side JavaScript loads.
Why This Is Different from Client-Side Fetching
You might wonder: "How is this different from just showing a loading spinner and fetching data on the client?" Key differences:
- Single round trip: The initial HTML (shell) is sent immediately, and data fetching happens on the server. Client-side fetching requires an additional round trip after JavaScript loads.
- Real HTML upfront: If JavaScript fails to load, users still see content (though it won't be interactive). Client-side fetching may show nothing until JS executes.
- Server proximity and caching: Server-side data fetching can be faster (closer to databases/internal services) and benefits more from server-side caching.
```text
Traditional SSR:      [====Server Render====]----->[Browser Render]
                      (waits for all data)

Client-Side Fetching: [Server]-->[Browser Render]-->[Fetch]-->[Re-render]
                      (shows spinner)

Streaming SSR:        [=Shell=]-->[Browser Render]
                      [===Data Fetch===]-->[Stream]-->[Swap]
                      (parallel, progressive)
```
Benefits of HTTP Streaming
Having seen how HTTP streaming works under the hood, you can probably already guess some of its advantages. Here are the key benefits:
1. Improved Time to First Byte (TTFB)
With streaming, the browser receives the first bytes of your response much sooner. This is particularly noticeable on pages with slow data sources.
```jsx
// Without streaming: User waits 3 seconds for anything to appear
// With streaming: User sees the shell immediately, content streams in after 3 seconds
export default function ProductPage() {
  return (
    <main>
      <Header /> {/* Sent immediately */}
      <Suspense fallback={<ProductSkeleton />}>
        <ProductDetails /> {/* Streams in when ready */}
      </Suspense>
      <Footer /> {/* Sent immediately */}
    </main>
  )
}
```
This is especially beneficial for users on slow network connections, when your backend services are slow to respond, or when generating the data is compute-intensive, e.g. a response from an LLM. Imagine having to wait for a large language model to generate its entire response before displaying anything - with streaming, users can see the response as it is generated and sent down to the client.
2. Better Perceived Performance
Even if the total load time remains the same, users perceive the page as faster because they see content appearing progressively. A page that shows something immediately feels more responsive than one that shows nothing for several seconds. On top of that, users can start interacting with the parts of the page that have loaded while waiting for the rest. In the blog post example above, users can already read and interact with the post while the comments stream in later.
3. Parallel Data Fetching
Streaming naturally encourages parallel data fetching. Different parts of your page can fetch their own data independently, and each section streams in as soon as it's ready.
```jsx
export default function Dashboard() {
  return (
    <div className="grid">
      <Suspense fallback={<ChartSkeleton />}>
        <SalesChart /> {/* Fetches sales data */}
      </Suspense>
      <Suspense fallback={<ListSkeleton />}>
        <RecentOrders /> {/* Fetches orders data */}
      </Suspense>
      <Suspense fallback={<StatsSkeleton />}>
        <UserStats /> {/* Fetches user stats */}
      </Suspense>
    </div>
  )
}
```
All three components fetch data in parallel, and each one appears as soon as its data is ready.
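The flip side is worth keeping in mind: awaiting several sources inside a single component serializes the fetches and blocks that whole section until the slowest one finishes. A sketch of the anti-pattern, with hypothetical fetch helpers:

```tsx
// Anti-pattern: sequential awaits create a waterfall - nothing renders
// until the last fetch in the chain completes.
async function Dashboard() {
  const sales = await fetchSales()   // 1s
  const orders = await fetchOrders() // starts only after sales: +1s
  const stats = await fetchStats()   // +1s - roughly 3s before anything streams

  return <DashboardView sales={sales} orders={orders} stats={stats} />
}
```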
4. Better Core Web Vitals
Streaming can significantly improve your Core Web Vitals scores, which directly impacts your search engine rankings and user experience metrics.
- LCP (Largest Contentful Paint): Content appears faster since it doesn't wait for slow data. If your largest contentful element (like a hero image or main heading) is part of the shell, it renders immediately rather than waiting for all data fetches to complete.
- FCP (First Contentful Paint): The initial shell renders immediately, giving users visual feedback that the page is loading. This is crucial for user perception - studies show users start to feel a page is slow after just 1 second of waiting.
- INP (Interaction to Next Paint): Because the shell is sent immediately and hydration can begin sooner, the page becomes interactive faster. Users can click buttons, fill forms, and navigate without waiting for slower components to load.
- TTFB (Time to First Byte): Since the server starts sending the response immediately rather than waiting for all data, TTFB is dramatically reduced. This is especially impactful for users on high-latency connections.
```jsx
// Example: A page with a 3-second database query
// Without streaming: TTFB = 3+ seconds, FCP = 3+ seconds
// With streaming: TTFB = ~50ms, FCP = ~200ms, full content = 3 seconds
export default function Page() {
  return (
    <>
      <Header /> {/* FCP happens here */}
      <Suspense fallback={<Skeleton />}>
        <SlowDatabaseContent /> {/* Streams in after 3 seconds */}
      </Suspense>
    </>
  )
}
```
5. Graceful Degradation
With proper Suspense boundaries and fallbacks, your page remains usable even when some data sources are slow or fail. This creates a more resilient user experience.
Consider a dashboard that pulls data from multiple microservices. Without streaming, if one service is slow or down, the entire page fails to load. With streaming, the rest of the dashboard loads normally while the problematic section shows a fallback:
```jsx
export default function Dashboard() {
  return (
    <div>
      <Suspense fallback={<StatsSkeleton />}>
        <StatsFromServiceA /> {/* Loads in 100ms */}
      </Suspense>
      <Suspense fallback={<ChartSkeleton />}>
        <ChartFromServiceB /> {/* Loads in 500ms */}
      </Suspense>
      <ErrorBoundary fallback={<p>Orders temporarily unavailable</p>}>
        <Suspense fallback={<OrdersSkeleton />}>
          <OrdersFromServiceC /> {/* Service is down - shows error */}
        </Suspense>
      </ErrorBoundary>
    </div>
  )
}
```
Users can still view stats and charts while the orders section gracefully shows an error message. The page remains functional even when parts of your infrastructure are struggling.
6. Reduced Server Memory Pressure
Traditional SSR requires the server to build the entire HTML document in memory before sending it. For large pages with lots of data, this can consume significant memory, especially under high traffic.
With streaming, the server can flush chunks to the client as they're ready, freeing up memory incrementally. This can lead to better server resource utilization and the ability to handle more concurrent requests:
```js
// Traditional SSR: Server holds the entire response in memory
//   ~500KB of HTML built up before sending

// Streaming SSR: Server sends chunks as they're ready
//   Memory is freed as each chunk is flushed to the network
```
This is particularly beneficial for pages that render large lists or data tables.
Drawbacks of HTTP Streaming
As with any technology, HTTP streaming comes with its own set of challenges and trade-offs. Here are some drawbacks to consider:
1. Complexity in Error Handling
When an error occurs in a streamed component, part of your page has already been sent to the client. You can't simply return an error page or redirect - the response has already started with a 200 status code.
Every Suspense Boundary Needs Error Handling
You need to wrap each streamed section with its own Error Boundary. Without this, an error in one component can break the entire page:
```jsx
// Add an Error Boundary for each streamed section
export default function Page() {
  return (
    <div>
      <ErrorBoundary fallback={<p>Failed to load comments</p>}>
        <Suspense fallback={<p>Loading...</p>}>
          <Comments />
        </Suspense>
      </ErrorBoundary>
    </div>
  )
}
```
Error Recovery is Limited
With traditional SSR, if something fails, you can return a proper error page. With streaming, your options are limited:
```jsx
// Traditional SSR (Pages Router): Full control over the error response
export async function getServerSideProps() {
  try {
    const data = await fetchData()
    return { props: { data } }
  } catch (error) {
    return { redirect: { destination: '/error', permanent: false } }
  }
}

// Streaming: The error happens after the response has started.
// You can only show inline error UI, not redirect or change the status code.
```
Nested Error Boundaries Add Complexity
For complex pages, you may end up with many nested Error Boundaries, each requiring its own fallback UI:
```jsx
export default function Dashboard() {
  return (
    <div>
      <ErrorBoundary fallback={<StatsError />}>
        <Suspense fallback={<StatsSkeleton />}>
          <Stats />
        </Suspense>
      </ErrorBoundary>
      <ErrorBoundary fallback={<ChartError />}>
        <Suspense fallback={<ChartSkeleton />}>
          <Chart />
        </Suspense>
      </ErrorBoundary>
      <ErrorBoundary fallback={<TableError />}>
        <Suspense fallback={<TableSkeleton />}>
          <DataTable />
        </Suspense>
      </ErrorBoundary>
    </div>
  )
}
```
This means designing and maintaining multiple error states, which adds to your UI/UX workload.
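If many sections follow the same Error Boundary + Suspense pattern, one way to tame the boilerplate is a small wrapper component. A sketch, assuming the ErrorBoundary component from the react-error-boundary package (any error boundary that accepts a fallback prop works the same way):

```tsx
import { Suspense, type ReactNode } from 'react'
import { ErrorBoundary } from 'react-error-boundary'

// Combines the two wrappers every streamed section needs
function StreamedSection({
  skeleton,
  errorFallback,
  children,
}: {
  skeleton: ReactNode
  errorFallback: ReactNode
  children: ReactNode
}) {
  return (
    <ErrorBoundary fallback={errorFallback}>
      <Suspense fallback={skeleton}>{children}</Suspense>
    </ErrorBoundary>
  )
}

// Usage:
// <StreamedSection skeleton={<StatsSkeleton />} errorFallback={<StatsError />}>
//   <Stats />
// </StreamedSection>
```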
2. SEO Considerations
While modern search engines like Google can handle JavaScript and streaming content, there are important SEO implications to consider when using HTTP streaming.
Crawler Behavior Varies
Not all search engine crawlers handle streaming the same way. Googlebot renders pages with an up-to-date headless Chromium, so it can execute JavaScript and wait for streamed content, but even Google has limits on how long it will wait. Other search engines such as Bing, and social media crawlers (Facebook, Twitter/X, LinkedIn), may not wait for streamed content at all.
```html
<!-- The crawler might only see this: -->
<h1>Product Title</h1>
<p>Loading reviews...</p> <!-- Fallback, not actual content -->

<!-- Instead of: -->
<h1>Product Title</h1>
<ul>
  <li>Great product! 5 stars</li>
  <li>Would buy again</li>
</ul>
```
This means your fallback content might be what gets indexed, not your actual content.
Critical Content Should Be in the Shell
Rule of thumb: Keep SEO-critical elements (page title, primary description/copy, structured data) in the shell; stream non-critical or supplemental content (reviews, recommendations, related items).
For SEO-critical content, you should ensure it's part of the initial shell rather than streamed in later. Think carefully about what content matters for search rankings:
```jsx
export default async function ProductPage({ params }) {
  // Critical data is awaited up front so it's part of the shell
  const product = await getProduct(params.id)
  const structuredData = JSON.stringify({
    '@context': 'https://schema.org',
    '@type': 'Product',
    name: product.name,
  })

  return (
    <main>
      {/* SEO-critical: rendered immediately in the shell */}
      <h1>{product.name}</h1>
      <p>{product.description}</p>
      <script type="application/ld+json">{structuredData}</script>

      {/* Not SEO-critical: can be streamed */}
      <Suspense fallback={<ReviewsSkeleton />}>
        <CustomerReviews productId={product.id} />
      </Suspense>
    </main>
  )
}
```
HTTP Status Code Limitations
One significant challenge is that HTTP status codes are sent with the first chunk of the response. Once you start streaming, you cannot change the status code. This creates problems for:
- 404 pages: If your data fetch fails or returns no results, you've already sent a 200 status code with the shell. You can't retroactively change it to 404.
- Redirects: You cannot redirect after streaming has begun since redirect headers must be sent before any body content.
- Error pages: Server errors that occur during streaming can't return a proper 500 status code.
```jsx
// This is problematic with streaming:
async function ProductPage({ params }) {
  // Shell already sent with 200 status...
  const product = await getProduct(params.id)
  if (!product) {
    // Too late! The 404 UI may still render, but the status stays 200
    notFound()
  }
  return <ProductDetails product={product} />
}
```
To handle this, you may need to fetch critical data before the shell renders, or avoid streaming for pages where correct status codes are essential.
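A sketch of the first workaround in the App Router: await the critical fetch before returning any JSX, so notFound() runs while the status code can still be set (getProduct, ReviewsSkeleton, and Reviews are stand-ins):

```tsx
import { Suspense } from 'react'
import { notFound } from 'next/navigation'

export default async function ProductPage({ params }: { params: { id: string } }) {
  // Awaited before anything is streamed, so a real 404 is still possible
  const product = await getProduct(params.id)
  if (!product) notFound() // safe: no bytes have been sent yet

  return (
    <main>
      <h1>{product.name}</h1>
      {/* Supplemental content can still stream afterwards */}
      <Suspense fallback={<ReviewsSkeleton />}>
        <Reviews productId={product.id} />
      </Suspense>
    </main>
  )
}
```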
3. Caching Challenges
Traditional CDN and HTTP caching works by storing complete responses. When a request comes in, the cache checks if it has a stored response for that URL, and if so, returns it immediately. This works great when you have a full HTML document with a known Content-Length.
With streaming, the response is incomplete by design - the server starts sending chunks before it knows what the final content will be. This creates several problems:
CDNs Need the Full Response to Cache
Most CDNs (like Cloudflare, Fastly, or CloudFront) buffer the entire response before they can cache it. They need to know the full content to generate a cache key, store the complete response for future requests, and verify the response completed successfully.
With streaming, the CDN has to wait until the stream finishes before it can cache anything. This means the first user always gets the slow, streamed experience; only subsequent users benefit from the cache. And if your stream takes 5 seconds to complete, the CDN waits those 5 seconds before it can store the response.
Cache Invalidation Becomes Complex
With a traditional response, you cache one thing and invalidate one thing. With streaming, different parts of your page might have different freshness requirements. Your header/footer might be valid for days, your product details might be valid for hours, and your comments might need to be fresh every minute.
But the CDN sees it as one response - you can't tell it "cache the shell for 24 hours but refresh the comments part every minute."
Personalized Content Breaks Caching
Streaming is often used for pages with personalized content (e.g., user-specific recommendations). But caches work on URL keys - if two users request /dashboard, the cache doesn't know to serve different content.
You end up having to either disable caching entirely for streamed pages, use complex cache key strategies (like including user segments in the key), or only cache the non-personalized shell.
The Solution: Data-Level Caching
Instead of caching the HTTP response, you cache at the data layer:
```ts
// `cache` is a stand-in for your cache client (e.g. Redis)
async function getProduct(id: string) {
  const cached = await cache.get(`product:${id}`)
  if (cached) return cached

  const product = await db.product.findUnique({ where: { id } })
  await cache.set(`product:${id}`, product, { ttl: 3600 })
  return product
}
```
This way, each data source can have its own TTL, personalized and shared data can be cached separately, and streaming still works - the data fetches are simply fast because they hit the cache.
Next.js also has built-in support for this with unstable_cache or the fetch API's caching options:
```ts
// Next.js automatically caches this fetch for 1 hour
const data = await fetch('https://api.example.com/product', {
  next: { revalidate: 3600 },
})
```
Another powerful strategy is using cache tags, which allow you to invalidate specific cached data on-demand rather than relying solely on time-based expiration:
```ts
// Tag your cached data
const product = await fetch(`https://api.example.com/product/${id}`, {
  next: { tags: [`product-${id}`, 'products'] },
})
```

```ts
// Later, invalidate by tag (e.g., in a webhook or server action)
import { revalidateTag } from 'next/cache'

export async function updateProduct(id: string) {
  await db.product.update({ ... })
  revalidateTag(`product-${id}`) // Invalidate just this product
  // Or revalidateTag('products') to invalidate all products
}
```
This approach gives you fine-grained control over cache invalidation. When your CMS updates a product, you can instantly invalidate just that product's cache without affecting other cached data. This works particularly well with streaming, as each streamed component can have its own cache tags.
4. Layout Shifts
If your fallback UI doesn't match the dimensions of the actual content, you can cause Cumulative Layout Shift (CLS) issues. This hurts your Core Web Vitals scores and creates a jarring user experience as content jumps around the page.
```jsx
// Bad: Different sizes cause layout shift
<Suspense fallback={<p>Loading...</p>}>
  <LargeDataTable /> {/* When this loads, everything below shifts down */}
</Suspense>

// Good: Skeleton matches final dimensions
<Suspense fallback={<TableSkeleton rows={10} columns={5} />}>
  <LargeDataTable />
</Suspense>
```
Designing good skeleton states requires extra effort:
- Know your content dimensions: You need to anticipate the size of the final content, which isn't always possible for dynamic data.
- Maintain skeleton components: Every component that can be streamed needs a corresponding skeleton that matches its dimensions.
- Handle variable content: Lists with unknown lengths are particularly tricky - do you show 5 skeleton items or 10?
```jsx
// Tricky: How many skeleton items should we show?
<Suspense fallback={<CommentsSkeleton count={???} />}>
  <Comments postId={id} /> {/* Could be 0 comments or 100 */}
</Suspense>

// One solution: Use a container with a fixed/min height
<div className="min-h-[400px]">
  <Suspense fallback={<CommentsSkeleton />}>
    <Comments postId={id} />
  </Suspense>
</div>
```
5. Debugging Difficulties
Debugging streaming applications can be significantly more challenging than traditional request/response debugging.
Network Waterfall Complexity
The Chrome DevTools network tab shows streaming responses as a single long-running request. It's hard to see when individual chunks arrived and what triggered them:
- You can't easily see the timing of each streamed chunk
- The response preview shows the final HTML, not the progressive chunks
- It's unclear which Suspense boundary resolved when
Server-Side Logging Challenges
Traditional request logging shows a request come in and a response go out. With streaming, the response is ongoing for potentially several seconds:
```js
// Traditional: Clean request/response logs
// [09:00:00] GET /page - 200 - 150ms

// Streaming: Response time is ambiguous
// [09:00:00] GET /page - 200 - ???ms (shell sent at 50ms, final chunk at 3000ms)
```
Harder to Reproduce Issues
Issues that only occur during specific streaming sequences can be hard to reproduce. A bug might only appear when Component A resolves before Component B, which depends on network timing.
Recommended Debugging Strategies
- Add detailed server-side logging for each Suspense boundary resolution (a sketch follows this list)
- Use React DevTools Profiler to understand component render timing
- Consider adding custom performance marks for each streamed section
- Test with artificial delays to simulate various network conditions
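As a minimal sketch of the first suggestion, each async component can time its own data fetch on the server and log when its boundary resolves (fetchComments is the hypothetical fetch from earlier):

```tsx
// Sketch: log how long a streamed section took to resolve on the server
async function Comments({ postId }: { postId: string }) {
  const start = performance.now()
  const comments = await fetchComments(postId)

  // Appears in server logs, e.g. "[stream] Comments resolved in 2980ms"
  console.log(`[stream] Comments resolved in ${Math.round(performance.now() - start)}ms`)

  return (
    <ul>
      {comments.map((c) => (
        <li key={c.id}>{c.text}</li>
      ))}
    </ul>
  )
}
```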
6. Not Suitable for All Use Cases
Streaming adds overhead and complexity that isn't justified for every page. Before implementing streaming, consider whether your use case actually benefits from it.
When Streaming Adds Unnecessary Complexity
- Fast pages: If your page already renders in under 200ms, streaming won't provide noticeable benefits but adds code complexity.
- Single data source: If all your data comes from one fast query, there's nothing to parallelize.
- Small pages: Simple pages with minimal data don't benefit from progressive loading.
- Static content: For pages that can be statically generated at build time, ISR (Incremental Static Regeneration) is often a better choice.
```jsx
// Unnecessary streaming - this page is already fast
export default function AboutPage() {
  return (
    <main>
      <h1>About Us</h1>
      <p>We are a company that does things.</p>
      {/* No slow data fetches, no need for Suspense */}
    </main>
  )
}
```
The Complexity Cost
Every Suspense boundary you add requires:
- A fallback component to design and maintain
- Error boundary consideration
- Testing for both loading and loaded states
- Documentation for other developers
For teams new to streaming, there's also a learning curve. Make sure the performance benefits justify these costs for your specific use case.
7. Connection and Timeout Considerations
Streaming keeps the HTTP connection open until all content is sent. This has implications you should be aware of:
Long-Running Connections
If a data fetch takes 30 seconds, the connection stays open for 30 seconds. This can:
- Hit proxy or load balancer timeouts (many default to 30-60 seconds)
- Consume server connection pool resources
- Cause issues with serverless platforms that bill by execution time
```jsx
// This could keep a connection open for a very long time
<Suspense fallback={<Loading />}>
  <VerySlowComponent /> {/* 45-second database query */}
</Suspense>
```
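One mitigation is to cap how long any single section may hold the connection open: race the slow fetch against a timeout and render degraded content if it loses. A sketch, where fetchReport and ReportView are stand-ins:

```tsx
// Sketch: bound a streamed section's fetch so the connection can't hang forever
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T | null> {
  const timeout = new Promise<null>((resolve) => setTimeout(() => resolve(null), ms))
  return Promise.race([promise, timeout])
}

async function Report() {
  const report = await withTimeout(fetchReport(), 10_000) // give up after 10s
  if (!report) {
    return <p>The report is taking too long. Please try again in a moment.</p>
  }
  return <ReportView data={report} />
}
```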
Mobile and Unstable Connections
Users on mobile networks may experience connection drops. If the connection breaks mid-stream:
- The user sees a partially loaded page
- There's no automatic retry mechanism for the remaining content
- The user must refresh to try again
Consider implementing client-side fallbacks for critical streamed content that can fetch data if the stream fails.
When to Use HTTP Streaming
Streaming is particularly beneficial when:
- Your page has multiple independent data sources with varying response times
- You have a clear visual hierarchy where some content is more important than others
- You want to improve perceived performance on data-heavy pages
- Your users are on slower connections where progressive loading matters
Streaming might not be the best choice when:
- Your page renders quickly without streaming
- All your data comes from a single, fast source
- You need complete control over caching at the response level
- Your content is critical for SEO and you're unsure about crawler support
Conclusion
HTTP streaming is a powerful technique that can significantly improve the user experience of your web applications. However, like any technology, it comes with trade-offs. The key is understanding when streaming adds value and implementing it thoughtfully with proper error boundaries, well-designed loading states, and appropriate fallback content.
Next.js and React make it relatively easy to adopt streaming with Suspense boundaries, but the real skill lies in knowing where to place those boundaries and how to handle the complexities that come with progressive rendering.
Further Reading
If you want to dive deeper into HTTP streaming and its implementation in React and Next.js, here are some excellent resources:
- MDN: Transfer-Encoding - The official documentation on chunked transfer encoding and how it works at the HTTP protocol level.
- React Docs: Suspense - React's official documentation on Suspense, including how it enables streaming.
- React Docs: renderToPipeableStream - Deep dive into React's streaming SSR API that powers this functionality.
- Next.js Docs: Loading UI and Streaming - Next.js documentation on implementing streaming with the App Router.
- Vercel Blog: Understanding React Server Components - A comprehensive overview of RSC and how streaming fits into the picture.
- web.dev: Streams API - Google's guide to the Streams API, which is the underlying browser API that makes streaming possible.