SSE Streaming on Fly.io: Two Bugs and a Fix

How to get Server-Sent Events working on Fly.io with Bun and Hono. Two non-obvious bugs, the debugging story, and the complete configuration.

The Setup

I built a real-time alerting system called Red Alert. Alerts need to hit the browser the moment they fire. Server-Sent Events (SSE) is the simplest way to do that: one persistent HTTP connection, server pushes data, browser reconnects automatically.

The architecture:

Browser (EventSource) --> Fly Proxy --> Bun HTTP Server --> Hono streamSSE
                           60s idle     255s idle          10s keepalive

The client opens an EventSource to /api/v1/alerts/stream. The server holds the connection open and pushes alert JSON whenever one fires. Simple.

Getting it to actually work on Fly.io took debugging two non-obvious bugs.

WebSocket Doesn’t Work on Fly

We tried WebSocket first. It didn’t work for our setup. Fly’s HTTP proxy strips Upgrade and Connection hop-by-hop headers on the standard request path, so the handshake never completed. The Fly community confirmed SSE with heartbeats is the simpler, more reliable approach for server-push use cases.

Bug 1: Don’t Close Streams Server-Side

The first SSE implementation had a 5-minute maxLifetime that closed the stream with setTimeout. Seemed reasonable. Prevent zombie connections, right?

Wrong. When the server closes mid-stream, Fly’s proxy reports:

[PU05] could not finish reading HTTP body from instance

PU05 means the proxy expected more data but the upstream closed. Every 5 minutes, every connected client got disconnected.

Fix: Remove maxLifetime. SSE connections live until the client disconnects or a deploy happens. That’s fine. That’s what they’re for.

Bug 2: Bun’s idleTimeout (The Real One)

After fixing Bug 1, PU05s continued. Our database was falling over, and several things were going wrong at once. Then we noticed a pattern in the logs: connections dying after roughly 10 or 19 seconds.

This one was subtle. Bun’s HTTP server has a default idleTimeout of 10 seconds. If no data flows on a socket for 10s, Bun closes it at the kernel level. Our SSE keepalive also fires every 10s via setInterval, but JavaScript timers have event loop jitter. The heartbeat might fire at 10.003s, but the socket already closed at exactly 10.000s. Two timers, same interval, different clocks:

  • If Bun’s idle timer wins: connection dies at ~10s
  • If keepalive wins (resets the idle clock): connection survives to ~20s, then loses the next race

The structured logs confirmed it: zero sse.keepalive_failed events. The server thought writes were succeeding because Bun tears down the socket beneath the JS runtime, so writes land in a buffer that is already gone. Our application logs couldn't catch it; the Fly proxy logs had to tell us. PU05 timestamps matched SSE disconnect timestamps exactly, and connection durations clustered at ~9.75s and ~18.7s: roughly multiples of 10s.
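
The race is easy to model. This toy simulation is my own sketch, not Bun's actual timer code: it just checks whether each heartbeat, delayed by a few milliseconds of event-loop jitter, lands before the runtime's idle deadline.

```typescript
// Toy model of the race (not Bun internals): the connection dies when the
// runtime's idle deadline passes before the next keepalive write lands.
function survivesMs(
  idleTimeoutMs: number, // runtime idle timeout (Bun's default is 10_000)
  keepaliveMs: number,   // application heartbeat interval
  jitterMs: number,      // event-loop delay added to each heartbeat
  horizonMs: number,     // how long to simulate
): number {
  let lastWrite = 0
  for (let k = 1; k * keepaliveMs <= horizonMs; k++) {
    const writeAt = k * keepaliveMs + jitterMs // heartbeat lands a bit late
    if (writeAt > lastWrite + idleTimeoutMs) {
      return lastWrite + idleTimeoutMs // idle timer won: socket closed
    }
    lastWrite = writeAt // heartbeat won: idle clock reset
  }
  return horizonMs // survived the whole window
}

// Even 3ms of jitter loses the race when both intervals are 10s:
console.log(survivesMs(10_000, 10_000, 3, 60_000))  // 10000 (dies at ~10s)
// With idleTimeout raised to 255s, the heartbeat always wins:
console.log(survivesMs(255_000, 10_000, 3, 60_000)) // 60000 (survives)
```

Any positive jitter at all is enough; the only stable fix is making the two intervals wildly different.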

Fix: Set idleTimeout: 255 (Bun’s maximum) on the server export:

// packages/api/src/index.ts
export default {
  port,
  fetch: app.fetch,
  idleTimeout: 255,
}

Fly’s proxy handles real idle management at 60s. Our 10s keepalive keeps that alive. Bun just needs to get out of the way.

idleTimeout: 0 disables the timeout entirely, but 255 is safer. A broken client that never disconnects would leak sockets forever with 0. At 255s, Bun is still a backstop. See Bun HTTP docs.

The Complete Configuration

Here’s every setting that matters, across all three layers:

Layer                     Setting               Value                    Why
Bun server                idleTimeout           255 (max)                Prevent Bun from killing idle SSE sockets
SSE handler               keepalive interval    10s                      Keep Fly proxy alive (60s idle timeout)
SSE handler               Content-Encoding      none                     Prevent proxy response buffering
SSE handler               Cache-Control         no-cache, no-transform   Prevent proxy caching
SSE handler               X-Accel-Buffering     no                       Nginx-style proxy buffer disable
Hono timeout middleware   /alerts/stream        exempt                   SSE is long-lived by design
fly.toml                  kill_timeout          30                       Grace period for SSE drain during deploys

Server-Side: The SSE Endpoint

The full endpoint using Hono’s streamSSE:

alertRoutes.get('/stream', sessionMiddleware, async (c) => {
  const user = c.get('user')

  // Content-Encoding: none — tells proxies not to gzip (would buffer the stream)
  c.header('Content-Encoding', 'none')
  // Cache-Control: no-cache, no-transform — prevents proxy caching and rewriting
  c.header('Cache-Control', 'no-cache, no-transform')
  // X-Accel-Buffering: no — disables response buffering in nginx-style proxies
  c.header('X-Accel-Buffering', 'no')

  return streamSSE(c, async (stream) => {
    await stream.writeSSE({ data: JSON.stringify({ type: 'connected' }) })

    const cleanup = addClient(user.id, {
      write: (data: string) => { void stream.writeSSE({ data }) }, // fire-and-forget write
      close: () => { stream.close() },
    })

    // 10s heartbeat keeps Fly's 60s proxy alive
    // Sends empty data: field — client filters these out
    // Alternative: stream.write(': keepalive\n\n') uses SSE comments (silently ignored by EventSource)
    const keepalive = setInterval(() => {
      stream.writeSSE({ data: '' }).catch(() => {
        clearInterval(keepalive)
      })
    }, 10_000)

    stream.onAbort(() => {
      clearInterval(keepalive)
      cleanup()
    })

    // Hold the connection open indefinitely
    await new Promise(() => {})
  })
})
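
The addClient registry isn't shown above. A minimal sketch of what it could look like (hypothetical shape, including an assumed publishAlert fan-out helper; the real Red Alert implementation may differ):

```typescript
// Hypothetical client registry backing addClient() in the handler above.
type SseClient = { write: (data: string) => void; close: () => void }

const clients = new Map<string, Set<SseClient>>()

// Register a client for a user; returns the cleanup fn passed to stream.onAbort.
function addClient(userId: string, client: SseClient): () => void {
  const set = clients.get(userId) ?? new Set<SseClient>()
  set.add(client)
  clients.set(userId, set)
  return () => {
    set.delete(client)
    if (set.size === 0) clients.delete(userId)
  }
}

// Push an alert to every stream a user has open (multiple tabs, etc.).
function publishAlert(userId: string, alert: unknown): void {
  for (const client of clients.get(userId) ?? []) {
    client.write(JSON.stringify(alert))
  }
}
```

A Set per user handles multiple tabs cleanly, and returning the cleanup closure keeps registration and deregistration in one place.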

The timeout middleware also needs to exempt the SSE route:

app.use('*', async (c, next) => {
  if (c.req.path.endsWith('/alerts/stream')) return next()
  return timeout(30_000)(c, next)
})

Client-Side: Reconnection That Works

The browser side uses EventSource with a controlled reconnect pattern (modeled after ioredis):

function connect() {
  if (disposed) return
  const es = new EventSource('/api/v1/alerts/stream', { withCredentials: true })

  es.onmessage = (event) => {
    if (!event.data) return
    const parsed = JSON.parse(event.data)
    if (parsed.type === 'connected' || !parsed.id) return
    // Handle the alert...
  }

  es.onerror = () => {
    es.close() // Defeat EventSource auto-retry
    if (!disposed) {
      if (reconnectTimer != null) clearTimeout(reconnectTimer)
      reconnectTimer = setTimeout(connect, 5_000)
    }
  }
}

Three things to note:

  1. disposed flag distinguishes intentional unmount from unexpected disconnect. Without it, you get reconnect attempts after the component unmounts.
  2. es.close() on error defeats EventSource’s built-in auto-retry. You want controlled 5s reconnects, not the browser hammering the server immediately.
  3. Clear before reschedule. Always clearTimeout before setting a new reconnect timer. Prevents stale timer races where multiple reconnects fire simultaneously.
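
The matching teardown ties the three notes together. A sketch, assuming the disposed and reconnectTimer variables that connect() closes over live at module scope, plus a hypothetical source variable that connect() would assign its live EventSource to:

```typescript
// Module-scope state shared with connect() (assumed declarations).
let disposed = false
let reconnectTimer: ReturnType<typeof setTimeout> | null = null
let source: { close(): void } | null = null // the live EventSource, if any

// Intentional shutdown (e.g. component unmount): no reconnect should follow.
function disconnect(): void {
  disposed = true // connect() and onerror check this before rescheduling
  if (reconnectTimer != null) {
    clearTimeout(reconnectTimer)
    reconnectTimer = null
  }
  // Closing from our side does not fire onerror, so no reconnect is scheduled.
  source?.close()
  source = null
}
```

Setting disposed before closing matters: if the order were reversed, a reconnect timer racing with unmount could still fire.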

Deploys: The kill_timeout

One last thing. Fly’s default kill_timeout is 5 seconds. During a blue-green deploy, active SSE connections get 5 seconds to close. That’s not enough if you want graceful drain.

# fly.toml
kill_timeout = 30

[deploy]
  strategy = "bluegreen"

30 seconds gives the old instance time to drain. Here’s how deploys actually play out: the old instance gets SIGTERM and stops accepting connections. Active SSE streams die. The client’s onerror fires, waits 5 seconds, and reconnects. Fly’s load balancer routes the new connection to the fresh instance. Without the explicit reconnect logic on the client side, browsers would just hang on the dead connection.

What Healthy Looks Like

Once configured correctly:

  • Zero PU05 proxy kills (except during deploys)
  • SSE connects match disconnects over time
  • Reconnect rate ~0.2/min per client (deploy-only)
  • Zero sse.keepalive_failed events

If you see rapid connect/disconnect churn (> 2/min per client), check idleTimeout. If you see PU05 spikes outside deploys, something is closing streams server-side.
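
The checks above can be folded into a rough triage helper. The metric names here are hypothetical placeholders for whatever your monitoring exposes; the thresholds come from the list above.

```typescript
// Hypothetical metric shape; adapt field names to your monitoring stack.
interface SseMetrics {
  pu05OutsideDeploys: number       // PU05 proxy kills not during a deploy
  reconnectsPerMinPerClient: number
  keepaliveFailedEvents: number    // sse.keepalive_failed count
}

function diagnose(m: SseMetrics): string {
  if (m.reconnectsPerMinPerClient > 2) return "churn: check Bun idleTimeout"
  if (m.pu05OutsideDeploys > 0) return "server-side close: check for stream timeouts"
  if (m.keepaliveFailedEvents > 0) return "keepalive writes failing: check stream lifecycle"
  return "healthy"
}
```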

The takeaway isn’t specific to SSE or Fly or Bun. Two timers with the same interval on different clocks will always race. When your application timer and your runtime timer are both set to 10 seconds, one of them is going to lose.