When a system responds in under 400 milliseconds, users stay engaged and productive. Above that threshold, attention begins to drift, and every additional second makes disengagement more likely.
In 1982, IBM researchers Walter Doherty and Ahrvind Thadani published a study in the IBM Systems Journal with a precise and counterintuitive finding: computer response time needed to be under 400 milliseconds to keep users in a productive, engaged state. Above that threshold, users began to lose focus, their thought process was interrupted, and the interaction shifted from a flow state into a wait state, with measurable consequences for both productivity and satisfaction.
The threshold wasn't arbitrary. It corresponded roughly to the limit of human short-term auditory memory, the window during which the brain maintains active attention on a single interaction. When a system response arrives within that window, the user experiences the interaction as continuous. When it doesn't, the brain involuntarily moves on, attention wanders, and the user must reconstruct their context when the response finally arrives.
"Users are most productive when response times are less than 400ms, the threshold at which the human nervous system begins to perceive a pause."
Walter Doherty & Ahrvind Thadani, IBM, 1982
Search is the interaction most sensitive to the Doherty Threshold. When someone types a query and hits enter, their question is active in working memory and they are primed to process the answer. A two-second spinner breaks that state. By the time results arrive, the user has already begun wondering whether the search worked, whether they should retype, or whether the site is broken.
A skeleton screen (grey placeholder blocks in the shape of the results that are about to appear) does something simple and powerful: it tells the user that the system has understood their request and is working on it, while giving their eye somewhere to land. The content area doesn't feel empty. It feels imminent.
Spinner: nothing to read. Attention drifts before results arrive.
Skeleton screen: content shape appears instantly. Users see structure forming, and the eye has a place to land.
The skeleton screen works because it provides a structural preview of what's coming, enough for the brain to start preparing to process the content. By the time the real data arrives, the user's eye is already positioned correctly. The transition from skeleton to content feels like resolution rather than arrival, because the destination was visible from the start.
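One way to picture the skeleton pattern in code is as a view state that switches to placeholders the instant the query is submitted, before any response exists. The TypeScript sketch below is illustrative only: `SearchView`, `onSubmit`, `onResponse`, and `render` are hypothetical names, and rendering to strings stands in for real DOM or component output.

```typescript
// Hypothetical view-model for a search results area; all names are illustrative.
type SearchResult = { title: string; url: string };

type SearchView =
  | { kind: "idle" }
  | { kind: "skeleton"; placeholderCount: number } // shown as soon as the query is sent
  | { kind: "results"; items: SearchResult[] };

// Called synchronously on submit: feedback does not wait for the network.
function onSubmit(expectedCount: number = 5): SearchView {
  return { kind: "skeleton", placeholderCount: expectedCount };
}

// Called when the response arrives: placeholders are swapped for real rows.
function onResponse(items: SearchResult[]): SearchView {
  return { kind: "results", items };
}

// Renders each state as lines of text so the sketch runs without a DOM.
function render(view: SearchView): string[] {
  switch (view.kind) {
    case "idle":
      return [];
    case "skeleton":
      return Array.from({ length: view.placeholderCount }, () => "[grey placeholder row]");
    case "results":
      return view.items.map((item) => `${item.title} (${item.url})`);
  }
}
```

Because `onSubmit` does no waiting, the placeholders can be painted in the same frame as the submit event while the actual request runs in parallel.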
Before streaming became standard in AI chat interfaces, the interaction looked like this: user sends a message, system shows a loading indicator, four seconds pass, the complete response appears all at once. The wait was dead time: nothing to read, nothing to process, nothing to engage with. Users who opened AI tools in one tab while working on something else in another were not impatient; they were rationally responding to a four-second dead zone.
Streaming changes this completely. The first token arrives in under a second. Users begin reading before the response is complete. By the time the model finishes generating, the user has already processed the first half of the answer and is ready to act. The total time to a complete response is often identical, but the experience is fundamentally different because feedback starts at or near the Doherty Threshold instead of seconds past it.
Pulsing dots: nothing to read yet. The response is being generated but the screen is empty. Attention drifts immediately.
Already streaming: reading has started. Words arrive before the model is done. The user spends generation time reading, not waiting.
Streaming is now standard across major AI chat interfaces for exactly this reason. The model's first token can arrive in well under a second, ideally within the Doherty Threshold, and users spend the remaining generation time reading rather than waiting. The interaction stays continuous, attention stays engaged, and the perceived quality of the response is higher because users have processed it actively rather than passively receiving it in a single dump.
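A rough way to reason about this in code is to track when the first token lands relative to when the message was sent. The sketch below assumes tokens are delivered one at a time by some streaming transport (not shown). `StreamState`, `appendToken`, and the `DOHERTY_BUDGET_MS` constant are illustrative names, not part of any particular chat API, and timestamps are passed in explicitly so the logic stays easy to test.

```typescript
// Feedback budget taken from the article's 400 ms figure.
const DOHERTY_BUDGET_MS = 400;

// Hypothetical state for one streamed reply.
type StreamState = {
  sentAtMs: number;               // when the user sent the message
  firstTokenAtMs: number | null;  // when the first token arrived, if any
  text: string;                   // everything shown to the user so far
};

function startStream(sentAtMs: number): StreamState {
  return { sentAtMs, firstTokenAtMs: null, text: "" };
}

// Appends a token as soon as it arrives, so the user can start reading
// while the rest of the response is still being generated.
function appendToken(state: StreamState, token: string, nowMs: number): StreamState {
  return {
    sentAtMs: state.sentAtMs,
    firstTokenAtMs: state.firstTokenAtMs === null ? nowMs : state.firstTokenAtMs,
    text: state.text + token,
  };
}

// Time the user stared at an empty screen, or null if nothing has arrived yet.
function timeToFirstTokenMs(state: StreamState): number | null {
  return state.firstTokenAtMs === null ? null : state.firstTokenAtMs - state.sentAtMs;
}

// True only when the first visible token landed inside the budget.
function firstFeedbackWithinBudget(state: StreamState): boolean {
  const elapsed = timeToFirstTokenMs(state);
  return elapsed !== null && elapsed <= DOHERTY_BUDGET_MS;
}
```

In a real interface, `appendToken` would be called from whatever loop reads the stream, and the interesting metric is `timeToFirstTokenMs` rather than total generation time.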
Form validation is one of the most common and most overlooked applications of the Doherty Threshold. When validation happens at submit time (the user fills in all fields, taps submit, waits for the server round-trip, and then receives a list of everything wrong), the delay and the accumulation of errors both create cognitive load that has nothing to do with the user's actual task.
Inline validation responds to each field individually and immediately. The user types an email address: within 300 milliseconds of stopping typing, a green checkmark appears. They start typing a password: a strength indicator updates in real time. Each field gives near-instant feedback, well under the Doherty Threshold, and the user knows whether their input is valid before moving to the next field.
Submit-time validation: all errors appear at once after submit. The user completed the entire form before learning anything was wrong.
Inline validation: each field validates as the user types. Errors appear within 300ms of stopping. Users always know their status before moving to the next field.
The submit button on the good form is intentionally disabled until all fields are valid. This is not punitive; it is informative. Users always know exactly which fields still need attention, they never need to submit to find out, and the act of completing each field correctly produces a small, immediate confirmation that the task is progressing. Each green checkmark is a micro-resolution, delivered within the Doherty Threshold, that keeps the user in flow through the entire form.
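The inline pattern described above can be sketched as a per-field check, a debounce that waits for a pause in typing, and a rule that gates the submit button. Everything here is an assumption made for illustration: the function names, the deliberately loose email pattern, and the eight-character password rule are placeholders, and `VALIDATION_DELAY_MS` simply mirrors the 300 ms figure in the text.

```typescript
// Delay after the last keystroke before a field is validated.
const VALIDATION_DELAY_MS = 300;

type FieldStatus = "empty" | "valid" | "invalid";

// Deliberately loose email check for the sketch, not a complete address validator.
function checkEmail(value: string): FieldStatus {
  if (value.length === 0) return "empty";
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(value) ? "valid" : "invalid";
}

// Placeholder password rule: at least eight characters.
function checkPassword(value: string): FieldStatus {
  if (value.length === 0) return "empty";
  return value.length >= 8 ? "valid" : "invalid";
}

// The submit button is enabled only when every field is valid.
function canSubmit(statuses: FieldStatus[]): boolean {
  return statuses.length > 0 && statuses.every((status) => status === "valid");
}

// Runs `fn` only after `delayMs` have passed without another call,
// so validation fires when the user pauses rather than on every keystroke.
function debounce<T>(fn: (value: T) => void, delayMs: number): (value: T) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (value: T) => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => fn(value), delayMs);
  };
}
```

A field's input handler would then be something like `debounce((value: string) => showStatus(checkEmail(value)), VALIDATION_DELAY_MS)`, where `showStatus` is whatever hypothetical function draws the checkmark or error.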
The Doherty Threshold gives designers a concrete, measurable target: every system action should produce visible feedback within 400 milliseconds. Not necessarily the completed result, just evidence that the system has received the action and is working on it. A button changing state, a skeleton appearing, the first streaming token: any of these maintains the Doherty contract even when the full response takes longer.
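In code, the contract amounts to acknowledging the action before starting to wait on it. The following is a minimal sketch under that assumption; `runWithFeedback` and the `FeedbackUi` callbacks are hypothetical names standing in for whatever updates a button, skeleton, or message area in a given framework.

```typescript
// Hypothetical callbacks that update the interface.
type FeedbackUi<T> = {
  showPending: () => void;          // e.g. disable the button, draw a skeleton
  showResult: (value: T) => void;   // swap in the completed result
};

// Acknowledges the action synchronously, then waits for the slow work.
async function runWithFeedback<T>(work: () => Promise<T>, ui: FeedbackUi<T>): Promise<T> {
  ui.showPending();            // runs before any await, so it cannot be delayed by `work`
  const value = await work();  // the full response may take as long as it needs
  ui.showResult(value);
  return value;
}
```

The design choice is the ordering: because `showPending` is called before the first `await`, the acknowledgement is issued in the same tick as the user's action, regardless of how long `work` takes.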
The modern relevance of this 1982 finding is, if anything, higher than it was then. AI responses, API calls, search indexing, and complex renders all take time. The question is whether that time is dead time (a blank spinner) or engaged time, where the user is already processing something related to what they asked for. Skeleton screens, streaming, optimistic updates, and inline validation are all answers to the same question: what can we show the user while we figure out the full answer?
Doherty, W. J., & Thadani, A. J. (1982). The economic value of rapid response time. IBM Systems Journal, 21(2). · Miller, R. B. (1968). Response time in man-computer conversational transactions. AFIPS Fall Joint Computer Conference.