Made with 🧠 and 🫀 by Youssef Bouksim

🎰

Variable Rewards

Variable ratio reinforcement creates the strongest engagement loops.

5 min read · Product · UX · Ethics

In the 1950s, psychologist B.F. Skinner put rats in boxes with levers. When a rat pressed the lever and always got a food pellet, it pressed the lever when it was hungry and stopped when it wasn't. Predictable reward produced predictable behaviour. Then Skinner changed the box: sometimes pressing the lever produced a pellet, sometimes it didn't -- on a random schedule the rat couldn't predict. The rat started pressing the lever compulsively. It pressed it more often than the always-rewarded rats. And when the food stopped coming entirely, it kept pressing far longer, as if each press might still be the one that produces the pellet.

This is variable reinforcement -- the most powerful schedule of reward known to behavioural psychology. Unpredictability doesn't reduce the appeal of a reward. It amplifies it. The brain responds to uncertain rewards with a stronger dopamine response than to certain ones -- the anticipation of a possible reward activates the reward system more intensely than the certainty of receiving it.
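One common simplification of this idea, in the spirit of Schultz (1998), treats the reward signal as a prediction error: what arrived minus what was expected. The toy sketch below is an illustration under that assumption, not a model of real neurons. The function name and the 50% reward probability are hypothetical choices made for the example.

```python
def prediction_error(reward: float, expected: float) -> float:
    """Reward prediction error: what arrived minus what was expected."""
    return reward - expected


# Certain reward: a pellet arrives on every press, so the expectation is 1.0.
certain = prediction_error(reward=1.0, expected=1.0)

# Uncertain reward: a pellet arrives on half the presses (assumed p = 0.5),
# so the long-run expectation is 0.5.
p = 0.5
surprise_on_win = prediction_error(reward=1.0, expected=p)
surprise_on_miss = prediction_error(reward=0.0, expected=p)

print(certain, surprise_on_win, surprise_on_miss)  # 0.0 0.5 -0.5
```

Under this simplification, a fully predicted reward carries no surprise at all, while the same reward on an unpredictable schedule always carries some.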

Skinner's rats were pressing levers in the 1950s. Today, billions of people pull down on their phone screens to refresh their social media feeds. The gesture is almost identical. The mechanism is exactly the same. Whether the feed will contain something interesting, funny, moving, or rewarding is unpredictable -- and that unpredictability is not a design flaw. It's the design.

✦ Three things to know
✓
Unpredictability is the mechanism, not a side effect. Variable reward systems are deliberately engineered. The ratio of rewards to actions, the timing of positive feedback, the visual and audio design of the reward moment -- these are tuned carefully. Social media platforms A/B test notification timing, notification content, and the order of content in feeds specifically to maximise the frequency and intensity of the variable reward experience. This is not accidental. It is the product.
✓
The reward must be genuinely valuable sometimes. Variable rewards only work if the reward is occasionally real. A feed that never has anything interesting produces extinction -- the behaviour stops. A feed that's always interesting produces boredom -- the variability disappears. The sweet spot is an unpredictable mixture of good content and neutral content, where the user can never be sure which they'll get next. This ratio is what makes the pull-to-refresh gesture so powerful: sometimes the new content is something you're glad you saw. You can't know in advance when that will be.
✓
The mechanism doesn't know the difference between good and bad habits. Variable rewards produce compulsive checking behaviours regardless of whether the underlying activity is beneficial to the user. Checking email, pulling social feeds, opening apps -- these are all compulsive behaviours produced by the same mechanism, and how compulsive they become doesn't depend on how valuable the activity is. A product that uses variable rewards to get users to check something useful is using the same mechanism as one that uses it to drive mindless scrolling. The ethics differ. The psychology doesn't.
“The variable ratio schedule produces the highest rate of responding and the greatest resistance to extinction.”
-- B.F. Skinner, 1938

Feel the difference -- click both buttons repeatedly

The fastest way to understand variable rewards is to feel them. Below are two buttons. One gives you a predictable outcome every single time. One gives you an unpredictable one. Click each button several times in a row and notice which one you want to keep clicking -- and which one you feel done with after the first click.

Predictable reward -- you always know what you get
Inbox: 3 spam
Button: "Clean up inbox" (remove spam messages)
Like a vending machine -- you always know exactly what you'll get.

Variable reward -- you never know what you will find
Feed
Button: "Check feed" (pull to see what's new)
Like a slot machine -- you never know what you'll find until you look.

The pull-to-refresh gesture on social media works exactly like the variable button. You pull down and release -- not knowing if what appears will be something worth seeing. Sometimes it is. Often it isn't. But the possibility that it might be is enough to keep the behaviour going. The predictable button stops feeling compelling the moment you know what it does. The variable one doesn't -- because you never quite know.
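The contrast between the two buttons can be sketched in a few lines. This is a minimal illustration, not how any real feed is implemented. The function names, the result strings, and the 30% hit rate are all assumptions made for the example.

```python
import random


def predictable_button() -> str:
    """Always returns the same outcome, like a vending machine."""
    return "3 spam messages removed"


def variable_button(rng: random.Random, hit_rate: float = 0.3) -> str:
    """Returns something rewarding only some of the time.

    hit_rate is a hypothetical probability chosen for illustration.
    """
    if rng.random() < hit_rate:
        return "something worth seeing"
    return "nothing new"


rng = random.Random(42)  # seeded so repeated runs give the same sequence
predictable = [predictable_button() for _ in range(10)]
variable = [variable_button(rng) for _ in range(10)]

print(set(predictable))  # one possible outcome, known after the first click
print(variable)          # a mix the caller cannot predict click by click
```

After one click, the predictable button has nothing left to reveal. Each click on the variable button still resolves a small uncertainty, which is the property the pull-to-refresh gesture relies on.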

The pull-to-refresh gesture itself is neutral. It's the variability of what the gesture reveals that determines whether it produces compulsive use or deliberate use. A feed that updates unpredictably with content of variable quality teaches the brain to keep pulling. A reading list that updates on a fixed schedule with content you deliberately saved doesn't activate the same loop -- because there's no uncertainty to resolve.


How platforms engineer the checking behaviour

You got 6 likes on a post today. How those 6 likes are delivered to you -- all at once or one at a time -- determines how many times you open the app. Platforms know this. Most choose drip delivery. Below is the same 6 likes, two ways.

Drip delivery -- 6 app opens
Notifications, today:
Sara Kim liked your post (9:02 AM)
Marco Reyes liked your post (9:47 AM)
Design Weekly liked your post (11:13 AM)
UX Thoughts liked your post (1:05 PM)
Lila Park liked your post (2:31 PM)
Ana Torres liked your post (4:18 PM)
6 likes = 6 app opens. Each one interrupts you separately. Each notification is a potential reward. You open each time because you don't know what it is.

Batched delivery -- 1 app open
Notifications, Evening Digest:
6 people liked your post (today, 6:00 PM digest): Sara Kim, Marco Reyes, and 4 others liked your post about design systems.
Earlier this week: 3 new replies (yesterday's digest)
6 likes = 1 app open. Same information, delivered once. You check when you want to, not every time the platform wants you to.

Notification dripping is one of the most documented dark patterns in social platform design. Multiple internal studies at major platforms have shown that releasing notifications individually rather than batching them increases the number of times users open the app -- sometimes by 2--3x for the same underlying engagement events. The engineering is straightforward. The ethical question is whether a product that exists to serve users should optimise for the number of times it interrupts them, or for the quality of value delivered per interruption.
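The engineering difference between the two delivery modes is small, which the sketch below tries to show using the six likes from the example. It is a simplified illustration with hypothetical function names, not any platform's actual notification pipeline.

```python
# The six likes from the example above, as (name, time) pairs.
likes = [
    ("Sara Kim", "9:02 AM"),
    ("Marco Reyes", "9:47 AM"),
    ("Design Weekly", "11:13 AM"),
    ("UX Thoughts", "1:05 PM"),
    ("Lila Park", "2:31 PM"),
    ("Ana Torres", "4:18 PM"),
]


def drip(events: list[tuple[str, str]]) -> list[str]:
    """Drip delivery: one push notification per event."""
    return [f"{name} liked your post" for name, _ in events]


def batch(events: list[tuple[str, str]], label: str = "Evening digest") -> list[str]:
    """Batched delivery: one digest notification covering every event."""
    if not events:
        return []
    names = [name for name, _ in events]
    if len(names) <= 2:
        who = " and ".join(names)
    else:
        who = f"{names[0]}, {names[1]}, and {len(names) - 2} others"
    return [f"{label}: {who} liked your post"]


print(len(drip(likes)), len(batch(likes)))  # 6 1
print(batch(likes)[0])
# Evening digest: Sara Kim, Marco Reyes, and 4 others liked your post
```

Both functions carry the same information. The only thing that changes is how many separate interruptions it takes to deliver it.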


Honest vs exploitative use

Variable rewards aren't inherently manipulative. A well-written newsletter is a variable reward -- sometimes it's exceptional, sometimes it's fine, and that variability is part of why you keep subscribing. A conversation with a smart person is a variable reward. Discovery in any form -- browsing a bookshop, exploring a city, trying a new restaurant -- involves variable rewards. The mechanism doesn't determine the ethics. The intent does.

The ethical question is: is the variability honest? Does the product vary because the underlying content genuinely varies in quality and relevance? Or is variability being engineered -- content held back and released in patterns designed to maximise compulsive checking -- independent of the content's value? The first is a feature of engaging products. The second is an exploitation of a neurological mechanism that bypasses the user's own preferences about how much time they want to spend.

✓ Apply it like this
→ Let content vary naturally -- genuinely different content at different times produces honest variable rewards without engineering the unpredictability artificially.
→ Batch low-priority notifications -- group social engagement updates into scheduled digests rather than dripping them to maximise re-opens.
→ Provide natural stopping points -- "you're all caught up" states, session time limits, and content endings give users permission to stop without FOMO.
→ Be transparent about the mechanism -- some apps now show "you've been here for 30 minutes" prompts. This respects the user's autonomy to decide how much variability-seeking is serving them.
✗ Common mistakes
→ Notification dripping -- releasing engagement events one by one rather than batched, specifically to maximise app-opens per event.
→ Infinite scroll with no end state -- removing natural stopping points so users can never feel "done" and must actively decide to stop rather than naturally reaching an end.
→ Algorithmic content ordering for maximum uncertainty -- showing content in sequences engineered to maximise session time rather than relevance or value to the user.
→ Variable reward mechanics in products for children -- the ethical weight of these mechanisms is highest where the target users have the least ability to recognise and resist them.
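A natural stopping point, as opposed to infinite scroll with no end state, can be as simple as a feed that reports when nothing newer is left instead of silently backfilling. The sketch below is one possible shape for that, with hypothetical names (`FeedPage`, `next_page`, `caught_up`) and a made-up page size. It requires Python 3.9 or later.

```python
from dataclasses import dataclass


@dataclass
class FeedPage:
    items: list[str]
    cursor: int      # position to request the next page from
    caught_up: bool  # True when nothing newer is left to show


def next_page(posts: list[str], cursor: int, page_size: int = 3) -> FeedPage:
    """Return the next slice of posts and flag the end of the feed.

    When caught_up is True, the UI can show a "you're all caught up"
    state rather than loading more content to keep the session going.
    """
    items = posts[cursor:cursor + page_size]
    new_cursor = cursor + len(items)
    return FeedPage(items=items, cursor=new_cursor, caught_up=new_cursor >= len(posts))


posts = ["post a", "post b", "post c", "post d"]
first = next_page(posts, cursor=0)
second = next_page(posts, cursor=first.cursor)

print(first.items, first.caught_up)    # ['post a', 'post b', 'post c'] False
print(second.items, second.caught_up)  # ['post d'] True
```

The design choice here is that the end of the feed is an explicit value the interface has to handle, so reaching it becomes a visible moment for the user rather than something the product hides.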

Skinner, B. F. (1938). The Behavior of Organisms. Appleton-Century-Crofts.
Schultz, W. (1998). Predictive reward signal of dopamine neurons. Journal of Neurophysiology, 80(1), 1-27.
Alter, A. (2017). Irresistible: The Rise of Addictive Technology and the Business of Keeping Us Hooked. Penguin Press.
Eyal, N. (2014). Hooked. Portfolio/Penguin.