12 Mar 2026 · 2 min read
AI/UX/Streaming

Streaming AI without lying about progress

People can wait longer than you think if the interface stays honest about what is happening.

Most AI surfaces lose users in the silence between intent and answer. The model is busy, the wire is humming, but the interface is doing nothing. So the user does what users do — they doubt, refresh, or leave.

We started instrumenting the parts of our system that already emit signals — retrieval started, retrieval completed, planning, drafting, finalizing — and surfaced them as a first-class UI primitive. Not a spinner. Not a percentage. Honest, plain-language updates that mirror what the system is actually doing.
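A minimal sketch of that instrumentation, assuming a hypothetical pipeline where each stage is a plain function. The stage names mirror the ones above; `StageEvent`, `run_pipeline`, and the `emit` callback are illustrative, not our actual API:

```python
import time
from dataclasses import dataclass, field

@dataclass
class StageEvent:
    stage: str   # e.g. "retrieval", "planning", "drafting", "finalizing"
    status: str  # "started" or "completed"
    at: float = field(default_factory=time.monotonic)

def run_pipeline(stages, emit):
    """Run each (name, fn) stage in order, emitting an honest event
    immediately before and after the real work — never ahead of it."""
    result = None
    for name, fn in stages:
        emit(StageEvent(name, "started"))
        result = fn(result)
        emit(StageEvent(name, "completed"))
    return result
```

The point is that events are emitted by the same code path doing the work, so the UI can only ever report what actually happened.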

Three rules for a thinking-state layer

First, never lie. If retrieval failed and you're retrying, say so. The instinct to hide errors makes the surface feel less trustworthy, not more. Second, stay honest about latency. If a step normally takes eight seconds, do not animate optimism over it. Third, the layer is a contract — every backend stage that emits an event must do so reliably. Skipped events read as bugs.
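The third rule can be enforced mechanically. A sketch of a contract check, assuming events arrive as `(stage, status)` pairs and an expected stage list (both hypothetical names, not our real schema):

```python
# Assumed stage order for illustration; a real pipeline would supply its own.
EXPECTED_STAGES = ["retrieval", "planning", "drafting", "finalizing"]

def missing_events(events, expected=EXPECTED_STAGES):
    """Return the (stage, status) pairs the stream skipped.
    An empty list means every stage honored the contract."""
    seen = set(events)
    return [(stage, status)
            for stage in expected
            for status in ("started", "completed")
            if (stage, status) not in seen]
```

Running a check like this in CI, against recorded event streams, turns "skipped events read as bugs" from a review comment into a failing test.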

Latency is mostly perception. Streaming the right intermediate signal is worth more than shaving 200ms off the wire.

We measured. Real wall-clock latency stayed exactly the same. Perceived latency improved meaningfully: abandonment dropped and self-reported satisfaction rose. The model didn't get faster — the silence did.
