⚡

Doherty Threshold

When a system responds in under 400 milliseconds, users stay engaged and productive. Above that threshold, attention begins to drift — and every additional second makes disengagement more likely.

5 min read · UX · Product · AI

In 1982, IBM researchers Walter Doherty and Ahrvind Thadani published a study in the IBM Systems Journal with a precise and counterintuitive finding: computer response time needed to be under 400 milliseconds to keep users in a productive, engaged state. Above that threshold, users began to lose focus, their thought process was interrupted, and the interaction shifted from a flow state into a wait state — with measurable consequences for both productivity and satisfaction.

The threshold wasn't arbitrary. It is usually explained in terms of short-term memory: the brief window during which the brain keeps active attention on a single interaction. When a system response arrives within that window, the user experiences the interaction as continuous. When it doesn't, the brain involuntarily moves on, attention wanders, and the user must reconstruct their context when the response finally arrives.

✦ Three things to know

✓ 400ms is the threshold, not the target. Under 400ms, the interaction feels instantaneous, but a faster response is always better than one that arrives at 399ms. The goal is to respond as quickly as the system can, with 400ms as the hard ceiling above which the user experience measurably degrades.

✓ Perceived speed matters as much as actual speed. Skeleton screens, streaming output, optimistic UI updates, and progress indicators don't make the system faster; they make the wait feel shorter by giving users something to process while the response loads. The brain continues to engage even when the data isn't fully there yet.

✓ AI streaming is the most visible modern application of this law. A language model that streams tokens as they are generated keeps users reading and engaged from the first word. The same model delivering the same response after a 4-second wait interrupts flow, invites context-switching, and feels slower, even though the time to the complete response may be identical.

“Users are most productive when response times are less than 400ms — the threshold at which the human nervous system begins to perceive a pause.”
— Walter Doherty & Ahrvind Thadani, IBM, 1982

Search — spinner vs skeleton vs instant

Search is one of the interactions most sensitive to the Doherty Threshold. When someone types a query and hits Enter, their question is active in working memory and they are primed to process the answer. A two-second spinner breaks that state. By the time results arrive, the user may already be wondering whether the search worked, whether they should retype, or whether the site is broken.

A skeleton screen — grey placeholder blocks in the shape of the results that are about to appear — does something simple and powerful: it tells the user that the system has understood their request and is working on it, while giving their eye somewhere to land. The content area doesn't feel empty. It feels imminent.

Before (spinner, flow broken): a search for “design systems” shows only “Searching...”. There is nothing to read, and attention drifts before results arrive.

After (skeleton screen): the shape of the content appears instantly. Users see structure forming, and the eye has a place to land.

The skeleton screen works because it provides a structural preview of what's coming — enough for the brain to start preparing to process the content. By the time the real data arrives, the user's eye is already positioned correctly. The transition from skeleton to content feels like resolution rather than arrival, because the destination was visible from the start.
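A minimal sketch of the skeleton pattern, assuming results are rendered as HTML strings and that CSS (not shown) draws the empty `.skeleton` fields as grey blocks. The `SearchResult` shape, the class names, and the function names are hypothetical. The key design choice is that placeholders and real results share one row template, so the real content lands exactly where the eye is already positioned.

```typescript
// Hypothetical shape of one search result.
interface SearchResult {
  title: string;
  url: string;
  snippet: string;
}

const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Escape user-visible text before inserting it into markup.
function escapeHtml(s: string): string {
  return s.replace(/[&<>"']/g, (c) => HTML_ESCAPES[c] ?? c);
}

// One row template for both states. A null result renders the same structure
// with empty fields, which CSS styles as grey placeholder blocks.
function rowHtml(result: SearchResult | null): string {
  const field = (name: keyof SearchResult): string =>
    `<div class="${name}">${result ? escapeHtml(result[name]) : ""}</div>`;
  const cls = result ? "result" : "result skeleton";
  return `<li class="${cls}">${field("title")}${field("url")}${field("snippet")}</li>`;
}

// Call with null the moment the query is submitted, then again with the data.
function resultsHtml(results: SearchResult[] | null, placeholderRows = 5): string {
  const rows =
    results === null
      ? Array.from({ length: placeholderRows }, () => rowHtml(null))
      : results.map((r) => rowHtml(r));
  return `<ul class="results">${rows.join("")}</ul>`;
}
```

In a page, this would mean setting the list's `innerHTML` to `resultsHtml(null)` on submit and to `resultsHtml(data)` when the response arrives; because both use the same row structure, nothing jumps when the placeholders are replaced.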

AI streaming — the most visible application of this law today

Before streaming became standard in AI chat interfaces, the interaction looked like this: user sends a message, system shows a loading indicator, four seconds pass, the complete response appears all at once. The wait was dead time — nothing to read, nothing to process, nothing to engage with. Users who opened AI tools in one tab while working on something else in another were not impatient — they were rationally responding to a 4-second dead zone.

Streaming changes this completely. The first token arrives in under a second, and users begin reading before the response is complete. By the time the model finishes generating, the user has already processed the first half of the answer and is ready to act. The total time to the complete response is often identical, but the experience is fundamentally different: the wait before there is something to read shrinks from several seconds to a fraction of one, close to the Doherty Threshold rather than far beyond it.

Before (wait, then full response): the user asks the AI Assistant “What's the difference between margin and padding in CSS?” and sees pulsing dots for 4.2 seconds. The response is being generated, but the screen is empty; with nothing to read yet, attention drifts immediately.

After (streaming, first words in under 1s): at 0.8 seconds the user is already reading the answer, which explains that margin is space outside an element's border that pushes neighbouring elements away, while padding is space inside the border that creates breathing room between the content and the edge. Follow-up suggestions (“Show me an example”, “When to use which?”, “Box model”) sit below. Words arrive before the model is done, so the user spends generation time reading, not waiting.

Streaming is now standard across major AI chat interfaces for exactly this reason. The first token typically arrives in under a second, close to the Doherty Threshold rather than far past it, and users spend the remaining generation time reading rather than waiting. The interaction stays continuous, attention stays engaged, and the response can feel higher in quality because users have processed it actively rather than receiving it all at once.
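The difference between the two chat mockups comes down to when rendering happens. A minimal sketch, assuming the model client exposes its output as an async iterable of text chunks; `fakeModelStream` is a hypothetical stand-in for a real streaming API, and `renderStreaming` is an illustrative name. Each chunk is appended to the view as it arrives, and the helper records time to first chunk (the number that matters for the Doherty Threshold) separately from total time.

```typescript
// Hypothetical stand-in for a model API: yields text chunks with a delay,
// the way a streaming endpoint delivers tokens as they are generated.
async function* fakeModelStream(chunks: string[], delayMs: number): AsyncGenerator<string> {
  for (const chunk of chunks) {
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    yield chunk;
  }
}

interface StreamTiming {
  text: string;
  firstChunkMs: number | null; // time until the user had something to read
  totalMs: number;             // time until the full response was rendered
}

// Render each chunk as soon as it arrives instead of awaiting the full text.
async function renderStreaming(
  stream: AsyncIterable<string>,
  onUpdate: (textSoFar: string) => void,
  now: () => number = () => Date.now(),
): Promise<StreamTiming> {
  const start = now();
  let text = "";
  let firstChunkMs: number | null = null;
  for await (const chunk of stream) {
    if (firstChunkMs === null) firstChunkMs = now() - start;
    text += chunk;
    onUpdate(text); // e.g. update the message bubble's textContent
  }
  return { text, firstChunkMs, totalMs: now() - start };
}
```

Calling `onUpdate` only once, after the loop ends, would reproduce the "Before" mockup: the same total time, but nothing to read until the very end.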

Form validation — inline feedback vs submit-time errors

Form validation is one of the most common and most overlooked applications of the Doherty Threshold. When validation happens at submit time — the user fills in all fields, taps submit, waits for the server round-trip, and then receives a list of everything wrong — the delay and the accumulation of errors both create cognitive load that has nothing to do with the user's actual task.

Inline validation responds to each field individually and immediately. The user types an email address, and within 300 milliseconds of pausing, a green checkmark appears. They start typing a password, and a strength indicator updates in real time. Each field gives feedback well under the Doherty Threshold, so the user knows whether an entry is correct before moving to the next field.
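A minimal sketch of debounced inline validation. The rules mirror the example form's messages and are assumptions, not a recommended policy; `debounce`, the validator names, and the 250ms pause (chosen to keep feedback inside the 300ms target) are illustrative rather than taken from any particular library.

```typescript
interface FieldStatus {
  valid: boolean;
  message: string; // hint shown next to the field; empty when valid
}

const VALID: FieldStatus = { valid: true, message: "" };

// Deliberately loose email shape check: something@something.tld
const EMAIL_SHAPE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

function validateEmail(value: string): FieldStatus {
  return EMAIL_SHAPE.test(value)
    ? VALID
    : { valid: false, message: "Enter a complete email (e.g. you@example.com)" };
}

function validatePassword(value: string): FieldStatus {
  return value.length >= 8
    ? VALID
    : { valid: false, message: "Password must be at least 8 characters" };
}

function validateConfirm(password: string, confirm: string): FieldStatus {
  return confirm.length > 0 && confirm === password
    ? VALID
    : { valid: false, message: "Passwords do not match" };
}

// Run fn only once input has paused for waitMs; every keystroke restarts the timer.
function debounce<A extends unknown[]>(fn: (...args: A) => void, waitMs: number): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => fn(...args), waitMs);
  };
}

// Hypothetical wiring: validate the email 250ms after the user stops typing.
const PAUSE_MS = 250;
const onEmailInput = debounce((value: string) => {
  const status = validateEmail(value);
  console.log(status.valid ? "✓" : status.message); // stand-in for updating the field UI
}, PAUSE_MS);
```

In a browser, `onEmailInput` would be called from the field's `input` event with the current value; the debounce keeps the check from firing on every keystroke while still answering within the pause.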

Before (errors shown only on submit): the user fills in a “Create account” form (email “john@”, a three-character password, a four-character confirmation, full name “John Doe”) and taps submit. Only then does a banner appear reading “Please fix 3 errors”: email address is invalid, password must be at least 8 characters, passwords do not match. All errors appear at once, after the user has completed the entire form.

After (instant inline feedback per field): the email field is flagged “Invalid” with the hint “Enter a complete email (e.g. you@example.com)”, the password is marked “Weak” with “Add uppercase letters and symbols”, and the confirmation field shows “Passwords match”. Each field validates as you type, errors appear within 300ms of stopping, and users always know their status before moving to the next field.

The submit button on the good form is intentionally disabled until all fields are valid. This is not punitive — it is informative. Users always know exactly which fields still need attention, they never need to submit to find out, and the act of completing each field correctly produces a small, immediate confirmation that the task is progressing. Each green checkmark is a micro-resolution, delivered within the Doherty Threshold, that keeps the user in flow through the entire form.
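The disabled-submit rule can be derived from the per-field statuses instead of being tracked separately. A small sketch with hypothetical field labels and names; returning the list of fields that still need attention is what lets the UI stay informative rather than merely disabled.

```typescript
interface FieldState {
  label: string;  // text shown to the user, e.g. "Email address"
  valid: boolean; // latest inline-validation result for this field
}

interface SubmitState {
  enabled: boolean;
  stillNeeded: string[]; // labels of fields that still need attention
}

// Submit is enabled only when no field is pending.
function submitState(fields: FieldState[]): SubmitState {
  const stillNeeded = fields.filter((f) => !f.valid).map((f) => f.label);
  return { enabled: stillNeeded.length === 0, stillNeeded };
}
```

The `stillNeeded` labels can be shown next to the disabled button, so users never have to tap it to find out what is missing.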

Applying this to your work

The Doherty Threshold gives designers a concrete, measurable target: every system action should produce visible feedback within 400 milliseconds. Not necessarily the completed result — just evidence that the system has received the action and is working on it. A button changing state, a skeleton appearing, the first streaming token — any of these maintains the Doherty contract even when the full response takes longer.
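Because the contract concerns the first visible feedback rather than the finished result, it helps to measure those two moments separately. A sketch, assuming your instrumentation records timestamps for the user's input, the first visible change, and completion; the field and function names are hypothetical.

```typescript
// Timestamps (ms) recorded for one user interaction. Field names are hypothetical.
interface InteractionTiming {
  inputAt: number;         // click, tap, or Enter
  firstFeedbackAt: number; // first visible change: button state, skeleton, first token
  completeAt: number;      // full result rendered
}

const DOHERTY_THRESHOLD_MS = 400;

// The contract concerns the first visible feedback, not the complete result.
function meetsDohertyContract(t: InteractionTiming, thresholdMs = DOHERTY_THRESHOLD_MS): boolean {
  return t.firstFeedbackAt - t.inputAt <= thresholdMs;
}

// Share (0 to 1) of recorded interactions whose first feedback arrived in time.
function complianceRate(timings: InteractionTiming[], thresholdMs = DOHERTY_THRESHOLD_MS): number {
  if (timings.length === 0) return 1; // nothing recorded, nothing violated
  const passing = timings.filter((t) => meetsDohertyContract(t, thresholdMs)).length;
  return passing / timings.length;
}

// Also worth tracking: total duration, to catch systems that are simply slow.
function totalDurationMs(t: InteractionTiming): number {
  return t.completeAt - t.inputAt;
}
```

An AI reply that shows a typing indicator at 120ms and finishes at 4.2s passes this check; the same reply rendered all at once at 4.2s fails it, even though `totalDurationMs` is identical for both.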

If anything, this 1982 finding matters more now than it did then. AI responses, API calls, search queries, and complex renders all take time. The question is whether that time is dead time (a blank spinner) or engaged time, in which the user is already processing something related to what they asked for. Skeleton screens, streaming, optimistic updates, and inline validation all answer the same question: what can we show the user while we work out the full answer?

✓ Apply it like this

→Replace loading spinners with skeleton screens — show the structural shape of the content that's coming, giving users' eyes a place to land while data loads.

→Stream AI responses token by token: the first words should appear as close to the 400ms threshold as possible, regardless of how long the full response takes to generate.

→Validate form fields inline as users type, with feedback appearing within 300ms of a pause in typing, not at submit time after all fields are filled.

→Use optimistic UI updates — show the result of an action immediately and revert only if the server call fails, rather than waiting for confirmation before updating the UI.
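A minimal sketch of an optimistic update with rollback, assuming state lives in a simple get/set store and `commit` is whatever server call persists the change (both hypothetical here).

```typescript
// Apply a change locally at once, then persist it; restore the snapshot on failure.
async function optimisticUpdate<S>(
  getState: () => S,
  setState: (next: S) => void,
  apply: (current: S) => S,
  commit: () => Promise<void>, // hypothetical server call
): Promise<boolean> {
  const previous = getState();
  setState(apply(previous)); // UI updates immediately, before any network round-trip
  try {
    await commit();
    return true;
  } catch {
    setState(previous); // revert only if the server rejects the change
    return false;
  }
}
```

Restoring a saved snapshot is the simplest rollback. If other changes can land while the request is in flight, reverting to the snapshot would discard them, so in that case an inverse change (for example, decrementing a like count) is the safer choice.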

✗ Common mistakes

→Showing a spinner on a blank page — the user's eye has nowhere to land, attention drifts immediately, and the wait feels longer than it is.

→Waiting for a complete AI response before displaying anything: users who wait 4 seconds for text to appear are less engaged with it than users who spend those 4 seconds reading as it streams.

→Validating only on form submit — users discover all their errors simultaneously, after completing work they now have to redo, in a state of frustration rather than flow.

→Adding perceived speed tricks to genuinely slow systems instead of fixing the underlying performance — skeleton screens over a 10-second load help, but they don't replace actually being fast.

Doherty, W. J., & Thadani, A. J. (1982). The economic value of rapid response time. IBM Systems Journal, 21(2).
Miller, R. B. (1968). Response time in man-computer conversational transactions. AFIPS Fall Joint Computer Conference.
