# SSE Streaming on Fly.io: Two Bugs and a Fix
How to get Server-Sent Events working on Fly.io with Bun and Hono. Two non-obvious bugs, the debugging story, and the complete configuration.
## The Setup
I built a real-time alerting system called Red Alert. Alerts need to hit the browser the moment they fire. Server-Sent Events (SSE) is the simplest way to do that: one persistent HTTP connection, server pushes data, browser reconnects automatically.
The architecture:
```
Browser (EventSource) --> Fly Proxy --> Bun HTTP Server --> Hono streamSSE
                          (60s idle)    (255s idle)         (10s keepalive)
```
The client opens an EventSource to /api/v1/alerts/stream. The server holds the connection open and pushes alert JSON whenever one fires. Simple.
Getting it to actually work on Fly.io took debugging two non-obvious bugs.
## WebSocket Doesn’t Work on Fly
We tried WebSocket first. It didn’t work for our setup. Fly’s HTTP proxy strips Upgrade and Connection hop-by-hop headers on the standard request path, so the handshake never completed. The Fly community confirmed SSE with heartbeats is the simpler, more reliable approach for server-push use cases.
## Bug 1: Don’t Close Streams Server-Side
The first SSE implementation had a 5-minute maxLifetime that closed the stream with setTimeout. Seemed reasonable. Prevent zombie connections, right?
Wrong. When the server closes mid-stream, Fly’s proxy reports:
[PU05] could not finish reading HTTP body from instance
PU05 means the proxy expected more data but the upstream closed. Every 5 minutes, every connected client got disconnected.
Fix: Remove maxLifetime. SSE connections live until the client disconnects or a deploy happens. That’s fine. That’s what they’re for.
## Bug 2: Bun’s idleTimeout (The Real One)
After fixing Bug 1, the PU05s continued. Our database was falling over at the same time, so several things were going wrong at once. Then we noticed a pattern in the logs: connections dying every ~10 seconds or ~19 seconds.
This one was subtle. Bun’s HTTP server has a default idleTimeout of 10 seconds. If no data flows on a socket for 10s, Bun closes it at the kernel level. Our SSE keepalive also fires every 10s via setInterval, but JavaScript timers have event loop jitter. The heartbeat might fire at 10.003s, but the socket already closed at exactly 10.000s. Two timers, same interval, different clocks:
- If Bun’s idle timer wins: connection dies at ~10s
- If keepalive wins (resets the idle clock): connection survives to ~20s, then loses the next race
The evidence from structured logs confirmed it. Zero sse.keepalive_failed events. The server thought writes were succeeding because Bun tears down the socket underneath the JS runtime. Writes land in a buffer that’s already gone. Our application logs couldn’t catch it. The Fly proxy logs had to tell us. PU05 timestamps matched SSE disconnect timestamps exactly. Connection durations clustered at ~9.75s and ~18.7s. Just under multiples of 10s.
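The race is easy to model. Here’s a toy simulation (my reconstruction for illustration, not Bun’s actual scheduler; the jitter values are invented) showing why connections die just under 10s or 20s:

```typescript
// Toy model of the race: the idle timer closes the socket when no write
// has landed for IDLE ms; the keepalive is scheduled every IDLE ms but
// actually fires with event-loop jitter. Returns the time the connection
// dies, or -1 if it survives every round.
const IDLE = 10_000

function simulateRace(jitterMs: number[]): number {
  let lastWrite = 0 // time of the last successful write
  let scheduled = 0 // nominal keepalive schedule
  for (const jitter of jitterMs) {
    scheduled += IDLE
    const fireAt = scheduled + jitter
    if (fireAt - lastWrite >= IDLE) return fireAt // idle timer won first
    lastWrite = fireAt                            // write reset the idle clock
  }
  return -1 // keepalive won every round
}

console.log(simulateRace([3]))     // dies just after 10s: keepalive 3ms late
console.log(simulateRace([-1, 3])) // survives round one, dies just after 20s
```

With jitter of a few milliseconds either way, the model reproduces the observed clusters: deaths just under or at the 10s and 20s marks, never in between.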
Fix: Set idleTimeout: 255 (Bun’s maximum) on the server export:
```typescript
// packages/api/src/index.ts
export default {
  port,
  fetch: app.fetch,
  idleTimeout: 255,
}
```
Fly’s proxy handles real idle management at 60s. Our 10s keepalive keeps that alive. Bun just needs to get out of the way.
idleTimeout: 0 disables the timeout entirely, but 255 is safer. A broken client that never disconnects would leak sockets forever with 0. At 255s, Bun is still a backstop. See Bun HTTP docs.
## The Complete Configuration
Here’s every setting that matters, across all three layers:
| Layer | Setting | Value | Why |
|---|---|---|---|
| Bun server | idleTimeout | 255 (max) | Prevent Bun from killing idle SSE sockets |
| SSE handler | keepalive interval | 10s | Keep Fly proxy alive (60s idle timeout) |
| SSE handler | Content-Encoding | none | Prevent proxy response buffering |
| SSE handler | Cache-Control | no-cache, no-transform | Prevent proxy caching |
| SSE handler | X-Accel-Buffering | no | Nginx-style proxy buffer disable |
| Hono timeout middleware | /alerts/stream | exempt | SSE is long-lived by design |
| fly.toml | kill_timeout | 30 | Grace period for SSE drain during deploys |
## Server-Side: The SSE Endpoint
The full endpoint using Hono’s streamSSE:
```typescript
alertRoutes.get('/stream', sessionMiddleware, async (c) => {
  const user = c.get('user')

  // Content-Encoding: none — tells proxies not to gzip (would buffer the stream)
  c.header('Content-Encoding', 'none')
  // Cache-Control: no-cache, no-transform — prevents proxy caching and rewriting
  c.header('Cache-Control', 'no-cache, no-transform')
  // X-Accel-Buffering: no — disables response buffering in nginx-style proxies
  c.header('X-Accel-Buffering', 'no')

  return streamSSE(c, async (stream) => {
    await stream.writeSSE({ data: JSON.stringify({ type: 'connected' }) })

    const cleanup = addClient(user.id, {
      write: (data: string) => { stream.writeSSE({ data }) },
      close: () => { stream.close() },
    })

    // 10s heartbeat keeps Fly's 60s proxy alive
    // Sends empty data: field — client filters these out
    // Alternative: stream.write(': keepalive\n\n') uses SSE comments (silently ignored by EventSource)
    const keepalive = setInterval(() => {
      stream.writeSSE({ data: '' }).catch(() => {
        clearInterval(keepalive)
      })
    }, 10_000)

    stream.onAbort(() => {
      clearInterval(keepalive)
      cleanup()
    })

    // Hold the connection open indefinitely
    await new Promise(() => {})
  })
})
```
The timeout middleware also needs to exempt the SSE route:
```typescript
app.use('*', async (c, next) => {
  if (c.req.path.endsWith('/alerts/stream')) return next()
  return timeout(30_000)(c, next)
})
```
## Client-Side: Reconnection That Works
The browser side uses EventSource with a controlled reconnect pattern (modeled after ioredis):
```typescript
function connect() {
  if (disposed) return

  const es = new EventSource('/api/v1/alerts/stream', { withCredentials: true })

  es.onmessage = (event) => {
    if (!event.data) return
    const parsed = JSON.parse(event.data)
    if (parsed.type === 'connected' || !parsed.id) return
    // Handle the alert...
  }

  es.onerror = () => {
    es.close() // Defeat EventSource auto-retry
    if (!disposed) {
      if (reconnectTimer != null) clearTimeout(reconnectTimer)
      reconnectTimer = setTimeout(connect, 5_000)
    }
  }
}
```
Three things to note:
- The disposed flag distinguishes intentional unmount from unexpected disconnect. Without it, you get reconnect attempts after the component unmounts.
- es.close() on error defeats EventSource’s built-in auto-retry. You want controlled 5s reconnects, not the browser hammering the server immediately.
- Clear before reschedule. Always clearTimeout before setting a new reconnect timer. This prevents stale-timer races where multiple reconnects fire simultaneously.
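Wiring those three rules together looks roughly like this. It’s a sketch, not the actual Red Alert code: the names createAlertStream, makeSource, and SourceLike are mine, and the EventSource factory is injected so the pattern can be exercised outside a browser.

```typescript
// Minimal shape of what the sketch needs from EventSource.
type SourceLike = {
  close(): void
  onmessage: ((e: { data: string }) => void) | null
  onerror: (() => void) | null
}

function createAlertStream(
  makeSource: () => SourceLike,
  onAlert: (alert: { id: string }) => void,
  reconnectDelayMs = 5_000,
) {
  let disposed = false
  let reconnectTimer: ReturnType<typeof setTimeout> | null = null
  let es: SourceLike | null = null

  function connect() {
    if (disposed) return
    es = makeSource()
    es.onmessage = (event) => {
      if (!event.data) return // keepalive frames carry an empty data field
      const parsed = JSON.parse(event.data)
      if (parsed.type === 'connected' || !parsed.id) return
      onAlert(parsed)
    }
    es.onerror = () => {
      es?.close() // defeat EventSource auto-retry
      if (disposed) return
      if (reconnectTimer != null) clearTimeout(reconnectTimer) // clear before reschedule
      reconnectTimer = setTimeout(connect, reconnectDelayMs)
    }
  }

  connect()

  // Dispose: mark intentional shutdown, cancel any pending reconnect, close.
  return () => {
    disposed = true
    if (reconnectTimer != null) clearTimeout(reconnectTimer)
    es?.close()
  }
}
```

In the browser, makeSource would be `() => new EventSource('/api/v1/alerts/stream', { withCredentials: true })`, and the returned dispose function is what the component calls on unmount.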
## Deploys: The kill_timeout
One last thing. Fly’s default kill_timeout is 5 seconds. During a blue-green deploy, active SSE connections get 5 seconds to close. That’s not enough if you want graceful drain.
```toml
# fly.toml
kill_timeout = 30

[deploy]
  strategy = "bluegreen"
```
30 seconds gives the old instance time to drain. Here’s how deploys actually play out: the old instance gets SIGTERM and stops accepting connections. Active SSE streams die. The client’s onerror fires, waits 5 seconds, and reconnects. Fly’s load balancer routes the new connection to the fresh instance. Without the explicit reconnect logic on the client side, browsers would just hang on the dead connection.
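The drain can be made proactive instead of waiting out the kill_timeout: on SIGTERM, close every live SSE stream so browsers hit onerror and reconnect to the new instance right away. A hedged sketch; the registry shape is illustrative and simpler than the article’s addClient, which also takes a user id:

```typescript
// Registry of live SSE streams, drained on SIGTERM during a deploy.
type SSEClient = { close(): void }

const clients = new Set<SSEClient>()

function addClient(client: SSEClient): () => void {
  clients.add(client)
  // Returned cleanup is what stream.onAbort would call.
  return () => { clients.delete(client) }
}

function drain() {
  for (const client of clients) client.close()
  clients.clear()
}

// Old instance gets SIGTERM first; close streams before the kill_timeout expires.
process.once('SIGTERM', drain)
```

Closing the streams makes each client’s onerror fire immediately, so the 5-second reconnect clock starts at SIGTERM rather than when the socket is finally killed.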
## What Healthy Looks Like
Once configured correctly:
- Zero PU05 proxy kills (except during deploys)
- SSE connects match disconnects over time
- Reconnect rate ~0.2/min per client (deploy-only)
- Zero sse.keepalive_failed events
If you see rapid connect/disconnect churn (> 2/min per client), check idleTimeout. If you see PU05 spikes outside deploys, something is closing streams server-side.
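The churn threshold is easy to check mechanically from client reconnect timestamps. A small sketch; the function names are mine, and the 2/min cutoff mirrors the number above:

```typescript
// Reconnects per minute over an observation window.
function reconnectsPerMinute(timestampsMs: number[], windowMs: number): number {
  return timestampsMs.length / (windowMs / 60_000)
}

// Healthy is deploy-only reconnects (~0.2/min); > 2/min means churn.
function isChurning(timestampsMs: number[], windowMs: number): boolean {
  return reconnectsPerMinute(timestampsMs, windowMs) > 2
}
```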
The takeaway isn’t specific to SSE or Fly or Bun. Two timers with the same interval on different clocks will always race. When your application timer and your runtime timer are both set to 10 seconds, one of them is going to lose.