12 Mar 2026 · 2 min read
AI/UX/Streaming

Streaming AI without lying about progress

People can wait longer than you think if the interface stays honest about what is happening.

Most AI surfaces lose users in the silence between intent and answer. The model is busy, the wire is humming, but the interface is doing nothing. So the user does what users do — they doubt, refresh, or leave.

We started instrumenting the parts of our system that already emit signals — retrieval started, retrieval completed, planning, drafting, finalizing — and surfaced them as a first-class UI primitive. Not a spinner. Not a percentage. Honest, plain-language updates that mirror what the system is actually doing.
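A minimal sketch of that instrumentation, assuming a hypothetical pipeline where each stage is a plain function. The stage names mirror the ones above; `StageEvent`, `run_pipeline`, and the `emit` callback are illustrative, not our actual API:

```python
import time
from dataclasses import dataclass, field

@dataclass
class StageEvent:
    stage: str   # e.g. "retrieval", "planning", "drafting", "finalizing"
    status: str  # "started" or "completed"
    at: float = field(default_factory=time.monotonic)

def run_pipeline(stages, emit):
    """Run each (name, fn) stage in order, emitting an honest event
    immediately before and after the real work — never ahead of it."""
    result = None
    for name, fn in stages:
        emit(StageEvent(name, "started"))
        result = fn(result)
        emit(StageEvent(name, "completed"))
    return result
```

The point is that events are emitted by the same code path doing the work, so the UI can only ever report what actually happened.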

Three rules for a thinking-state layer

First, never lie. If retrieval failed and you're retrying, say so. The instinct to hide errors makes the surface feel less trustworthy, not more. Second, stay honest about latency. If a step normally takes eight seconds, do not animate optimism over it. Third, the layer is a contract — every backend stage that emits an event must do so reliably. Skipped events read as bugs.
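The third rule can be enforced mechanically. A sketch of a contract check, assuming events arrive as `(stage, status)` pairs and an expected stage list (both hypothetical names, not our real schema):

```python
# Assumed stage order for illustration; a real pipeline would supply its own.
EXPECTED_STAGES = ["retrieval", "planning", "drafting", "finalizing"]

def missing_events(events, expected=EXPECTED_STAGES):
    """Return the (stage, status) pairs the stream skipped.
    An empty list means every stage honored the contract."""
    seen = set(events)
    return [(stage, status)
            for stage in expected
            for status in ("started", "completed")
            if (stage, status) not in seen]
```

Running a check like this in CI, against recorded event streams, turns "skipped events read as bugs" from a review comment into a failing test.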

Latency is mostly perception. Streaming the right intermediate signal is worth more than shaving 200ms off the wire.

We measured. Real wall-clock latency stayed exactly the same. Perceived latency improved meaningfully: abandonment dropped and self-reported satisfaction rose. The model didn't get faster — the silence did.
