Finding, fixing, and verifying a relay I never chose
A follow-up to “The follow-up file,” where I argued that shipping a fix isn’t done until you’ve verified it moved a number. Here’s a case where I did.
The symptom
One Sentry error: negotiation-failed, one user, Safari/Mac, about 19 events in 25 minutes, then it recovered on its own. The kind of thing you write off as a bad network. I checked instead.
The numbers
My app (maguskartya.app) is a peer-to-peer card game over WebRTC, built on PeerJS – a small library that opens the browser-to-browser connections. I queried 90 days of production traffic – 20 players, 64 game sessions:
- 58% of sessions hit a connection error, a blocking reconnect overlay, or repeated join attempts.
- 41% showed the overlay; 75% of players were affected.
- May looked the same. Not new, not rare.
The cause
I never set iceServers, so PeerJS used its default config. That default already includes a TURN relay – a free, shared, rate-limited public one (turn:eu-0.turn.peerjs.com:3478, credentials peerjs/peerjsp, no TLS). Whenever a direct connection failed, players were relayed through that shared box, and anyone behind a firewall that blocks port 3478 had no fallback at all.
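The no-fallback claim can be checked mechanically. This is a minimal sketch, assuming ICE servers shaped like the browser's `RTCIceServer` (`urls` is a string or an array). `relayPortsOf` and `hasRelayIfPortBlocked` are hypothetical helpers, the URL parsing is simplified, and the check only looks at ports, not at whether a firewall also filters by protocol. `turn.example.com` is a placeholder host.

```typescript
// Minimal shape of an ICE server entry, mirroring the browser's RTCIceServer.
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

// Ports used by every TURN/TURNS URL in the list. STUN entries are skipped:
// STUN only discovers addresses, it cannot relay traffic.
// Simplified parsing; ports default to 3478 (turn:) and 5349 (turns:) per RFC 7065.
function relayPortsOf(servers: IceServer[]): number[] {
  const ports: number[] = [];
  for (const server of servers) {
    const urls = Array.isArray(server.urls) ? server.urls : [server.urls];
    for (const url of urls) {
      const match = /^(turns?):[^?]+?(?::(\d+))?(?:\?.*)?$/.exec(url);
      if (!match) continue; // not a TURN URL (e.g. stun:)
      const [, scheme, port] = match;
      ports.push(port ? Number(port) : scheme === "turns" ? 5349 : 3478);
    }
  }
  return ports;
}

// True if some relay URL would still be reachable with `blockedPort` closed.
function hasRelayIfPortBlocked(servers: IceServer[], blockedPort: number): boolean {
  return relayPortsOf(servers).some((port) => port !== blockedPort);
}

// The public PeerJS relay entry described above: TURN on 3478 only, no TLS.
const peerJsRelay: IceServer[] = [
  { urls: "turn:eu-0.turn.peerjs.com:3478", username: "peerjs", credential: "peerjsp" },
];

// A hypothetical relay that also accepts TURN over TLS on 443.
const withTlsFallback: IceServer[] = [
  { urls: ["turn:turn.example.com:3478", "turns:turn.example.com:443?transport=tcp"] },
];

console.log(hasRelayIfPortBlocked(peerJsRelay, 3478)); // false
console.log(hasRelayIfPortBlocked(withTlsFallback, 3478)); // true
```

A relay that also listens on 443 with TLS looks like ordinary HTTPS traffic to most firewalls, which is why that second URL is the usual escape hatch.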
So I wasn’t missing a TURN server. I was unknowingly depending on a bad one.
The fix
A dedicated relay – Cloudflare Realtime TURN. A small Vercel function (turn-credentials.ts) verifies the Clerk session, checks an allowlist claim, and mints a 24h credential. The client fetches it, caches it for 23h, and uses it for every peer connection it opens.
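A minimal sketch of the credential function's core, with the Clerk and Vercel plumbing left out. It assumes the session token has already been verified and its claims decoded into a plain object; `allowlisted` is a hypothetical claim name, and the upstream sender is injected (in production it would be `fetch`). The Cloudflare endpoint (a POST to `.../v1/turn/keys/{keyId}/credentials/generate` with a bearer token and a `ttl` in seconds) follows my reading of Cloudflare's Realtime TURN docs and should be checked against the current documentation.

```typescript
// Decoded session claims. Clerk verification is assumed to have happened already;
// `allowlisted` is a hypothetical custom claim name.
interface SessionClaims {
  sub: string;
  allowlisted?: boolean;
}

// Request shape for the upstream call; a subset of what fetch() accepts.
interface TurnRequestInit {
  method: string;
  headers: Record<string, string>;
  body: string;
}

interface UpstreamResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

// 24h credential lifetime, in seconds.
const CREDENTIAL_TTL_SECONDS = 24 * 60 * 60;

// Build the Cloudflare request. Endpoint shape is an assumption based on
// Cloudflare's Realtime TURN docs; verify before relying on it.
function buildCloudflareTurnRequest(keyId: string, apiToken: string): { url: string; init: TurnRequestInit } {
  return {
    url: `https://rtc.live.cloudflare.com/v1/turn/keys/${encodeURIComponent(keyId)}/credentials/generate`,
    init: {
      method: "POST",
      headers: { Authorization: `Bearer ${apiToken}`, "Content-Type": "application/json" },
      body: JSON.stringify({ ttl: CREDENTIAL_TTL_SECONDS }),
    },
  };
}

// Core of the endpoint: authorize first, then mint via the injected sender.
async function mintTurnCredential(
  claims: SessionClaims | null,
  keyId: string,
  apiToken: string,
  send: (url: string, init: TurnRequestInit) => Promise<UpstreamResponse>,
): Promise<{ status: number; body: unknown }> {
  if (claims === null) return { status: 401, body: { error: "no session" } };
  if (claims.allowlisted !== true) return { status: 403, body: { error: "not on allowlist" } };
  const { url, init } = buildCloudflareTurnRequest(keyId, apiToken);
  const res = await send(url, init);
  if (!res.ok) return { status: 502, body: { error: `TURN provider returned ${res.status}` } };
  return { status: 200, body: await res.json() };
}
```

Checking the allowlist before calling the provider means an unauthorized caller never costs a credential, and keeping the sender injectable lets the logic be tested without network access.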
The one trap worth knowing: PeerJS’s config replaces the defaults rather than merging with them. Pass your own iceServers and you silently lose the default STUN server, so re-add it:
// The credential response may carry one ICE server object or an array; normalize to an array.
const turn = Array.isArray(data.iceServers) ? data.iceServers : [data.iceServers];
// peerjs's config REPLACES the defaults - re-add STUN or you lose it.
const servers = [{ urls: 'stun:stun.l.google.com:19302' }, ...turn];
There is also a deliberate fallback: if my credential endpoint is ever down, the credential fetch fails and, instead of erroring out, the client creates the peer with no custom config, so PeerJS uses its old default, the public relay. Players stay connected on a worse relay rather than not connecting at all.
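The client side (the 23h cache, the STUN re-add, and the fallback) might look like this minimal sketch. `/api/turn-credentials` is a hypothetical route, `getIceServers` is a hypothetical helper, the fetcher and clock are injected so the logic runs without a browser, and the response is assumed to carry an `iceServers` field that is either one object or an array, as the snippet above handles.

```typescript
// Minimal shape of an ICE server entry, mirroring the browser's RTCIceServer.
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

// Assumed shape of the credential endpoint's JSON: one server object or an array.
interface CredentialResponse {
  iceServers: IceServer | IceServer[];
}

// Must throw on network errors AND on non-2xx responses (fetch itself does not).
type FetchJson = (url: string) => Promise<CredentialResponse>;

const STUN: IceServer = { urls: "stun:stun.l.google.com:19302" };
// Refresh after 23h, leaving an hour of margin before the 24h credential expires.
const CACHE_MS = 23 * 60 * 60 * 1000;

let cached: { servers: IceServer[]; fetchedAt: number } | null = null;

// Resolves to the full iceServers list, or undefined when the endpoint is unavailable,
// in which case the caller should create the peer without a custom config.
async function getIceServers(
  fetchJson: FetchJson,
  now: () => number = Date.now,
): Promise<IceServer[] | undefined> {
  if (cached && now() - cached.fetchedAt < CACHE_MS) return cached.servers;
  try {
    const data = await fetchJson("/api/turn-credentials"); // hypothetical route
    const turn = Array.isArray(data.iceServers) ? data.iceServers : [data.iceServers];
    // PeerJS's config replaces its defaults, so STUN has to be re-added here.
    const servers = [STUN, ...turn];
    cached = { servers, fetchedAt: now() };
    return servers;
  } catch {
    return undefined; // fall back to PeerJS's default config (the public relay)
  }
}
```

In the browser, `fetchJson` would wrap `fetch` and throw when `!res.ok`; the peer is then created with `new Peer({ config: { iceServers } })` when servers came back, or with a plain `new Peer()` otherwise, which keeps PeerJS's defaults.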
Did it work
Two matched 23-day windows, before and after the deploy, normalized per game so a drop means each game got smoother, not just that fewer people played:
- Connection errors per game: 1.38 → 0.36 (-74%).
- Reconnect overlays per game: 0.72 → 0.39 (-47%).
- Games actually went up over the same period (130 → 161).
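The per-game normalization is plain arithmetic, sketched here with hypothetical helper names: divide each window's event count by its game count, then take the change relative to the before rate. Plugging in the rounded error rates reproduces the -74%; working from rounded rates rather than raw counts can move a result by a point.

```typescript
// Events per game in one window: dividing by games played keeps a busier
// window from looking worse just because more games happened in it.
function perGame(events: number, games: number): number {
  return events / games;
}

// Relative change from the before rate to the after rate, in percent.
function pctChange(before: number, after: number): number {
  return ((after - before) / before) * 100;
}

// Connection errors per game, using the rounded rates reported above.
console.log(Math.round(pctChange(1.38, 0.36))); // -74
```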
Direct check: 24% of established connections now relay through my own TURN – a quarter of connections genuinely need a relay, and now it’s mine, not the public one. Credential fetches succeed 100% of the time (202 fetches, 32 people).
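One way to get a relay share like this is to ask each established connection which candidate pair it selected and whether the local side of that pair is a relay candidate. This minimal sketch works on plain objects shaped like WebRTC stats entries (`type`, `id`, `selectedCandidatePairId`, `localCandidateId`, `candidateType`). In a browser you would pass it `(await pc.getStats()).values()`, where `pc` is the connection's `RTCPeerConnection` (PeerJS connections expose it as `peerConnection`). `isRelayed` is a hypothetical helper, not necessarily how the 24% was measured.

```typescript
// Minimal subset of WebRTC stats fields this check reads.
interface StatsEntry {
  id: string;
  type: string; // "transport", "candidate-pair", "local-candidate", ...
  selectedCandidatePairId?: string; // on "transport" entries
  localCandidateId?: string; // on "candidate-pair" entries
  state?: string; // on "candidate-pair" entries, e.g. "succeeded"
  nominated?: boolean; // on "candidate-pair" entries
  candidateType?: string; // on candidate entries: "host" | "srflx" | "prflx" | "relay"
}

// true = relayed through TURN, false = direct, null = no selected pair found yet.
function isRelayed(entries: Iterable<StatsEntry>): boolean | null {
  const list = Array.from(entries);
  const byId = new Map(list.map((e) => [e.id, e] as const));
  // Prefer the transport's selected pair; fall back to a nominated, succeeded pair.
  const transport = list.find((e) => e.type === "transport" && e.selectedCandidatePairId);
  const pair =
    (transport && byId.get(transport.selectedCandidatePairId!)) ??
    list.find((e) => e.type === "candidate-pair" && e.nominated && e.state === "succeeded");
  if (!pair?.localCandidateId) return null;
  const local = byId.get(pair.localCandidateId);
  return local ? local.candidateType === "relay" : null;
}
```

Counting the `true` results over all non-null ones, across every established connection, gives the relayed share.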
Lessons
- No iceServers in PeerJS doesn’t mean no relay – it means PeerJS’s shared public one. Find out how much of your traffic depends on it.
- “Mostly works” isn’t a number. P2P failures self-heal, so a common problem looks like an edge case until you count it.
- A meaningful fraction of users can’t connect directly and need a relay. If you didn’t set one up, you’re borrowing someone else’s.