A/B testing

What is A/B testing?

A/B testing shows two versions of something to two random groups and compares a metric - clicks, sign-ups, sales - to see which performs better. Done right, it settles design arguments with evidence; done hastily, it manufactures false winners from noise.

Also known as: A/B test, split testing

The demo

Two versions of a sign-up button, tested head to head. Add visitors a batch at a time and watch the rates - then try declaring a winner early, and see whether the lead was ever real.

What this demo shows (text version)

A simulated A/B test comparing two sign-up buttons whose true conversion rates are actually identical. Adding visitors in batches of 100 (split between the two) updates each variant's conversion rate. With only a small number of visitors, random chance usually makes one variant look clearly ahead.
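The behaviour described above can be reproduced with a minimal sketch. The function name `simulate_ab`, the 10% conversion rate and the seed are assumptions made for illustration; the text only says that the two true rates are identical and that visitors arrive in batches of 100.

```python
import random


def simulate_ab(true_rate_a, true_rate_b, batches, batch_size=100, seed=0):
    """Return a list of (visitors_per_variant, rate_a, rate_b), one entry per batch."""
    rng = random.Random(seed)
    per_variant = batch_size // 2  # each batch is split evenly between A and B
    visitors = conv_a = conv_b = 0
    history = []
    for _ in range(batches):
        # Each visitor converts independently with the variant's true rate.
        conv_a += sum(rng.random() < true_rate_a for _ in range(per_variant))
        conv_b += sum(rng.random() < true_rate_b for _ in range(per_variant))
        visitors += per_variant
        history.append((visitors, conv_a / visitors, conv_b / visitors))
    return history


if __name__ == "__main__":
    # Both variants share the same (assumed) 10% true rate, as in the demo.
    for visitors, rate_a, rate_b in simulate_ab(0.10, 0.10, batches=1000):
        if visitors in (50, 500, 5000, 50000):
            gap = abs(rate_a - rate_b)
            print(f"{visitors:>6} per variant: A={rate_a:.3f} B={rate_b:.3f} gap={gap:.3f}")
```

Running it with different seeds should show the same pattern: a visible gap in the first few batches that shrinks as the visitor count grows, even though nothing distinguishes the two variants.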

As more visitors are added, the two rates converge, because they were never really different. A "declare a winner" button reports whether the current difference is statistically significant - which, with equal true rates, it rarely is: at a conventional 5% threshold, a single check flags a false winner only about one time in twenty. The lesson is that early leads in an A/B test are usually noise, so you must fix the sample size in advance and avoid stopping the moment the numbers look good.
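One common way to check whether a difference in conversion rates is statistically significant is a pooled two-proportion z-test. The sketch below is an assumption about how such a check could work, not a description of what the demo's button actually computes; the function name `two_proportion_z_test` is hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two_sided_p_value) for the difference in conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Under the null hypothesis both variants share one rate, so pool the data.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # No conversions at all (or all conversions): nothing to distinguish.
        return 0.0, 1.0
    z = (rate_b - rate_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    z, p = two_proportion_z_test(conv_a=12, n_a=50, conv_b=6, n_b=50)
    print(f"z = {z:.2f}, p = {p:.3f}")
```

In the usage example, 12 conversions out of 50 against 6 out of 50 looks like one variant doubling the other, yet the p-value comes out at about 0.12, above the conventional 0.05 threshold. The normal approximation is rough at counts this small, which is one more reason not to read much into early numbers.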

Early on, one version looked like a clear winner - and if you'd called it then, you'd have shipped a difference that was pure chance. Pour in more visitors and the gap melts into the noise, because the two were really the same all along. That gap between an early lead and a real result is the whole discipline of A/B testing.
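How often calling it early ships pure chance can be estimated by simulation. The sketch below runs many tests in which both variants have the same true rate and compares two policies: declaring a winner at the first look that seems significant, versus checking only once at the end. All names and parameter values (10% rate, 20 looks, 50 visitors per variant per look, 5% threshold) are assumptions chosen for illustration.

```python
import math
import random


def p_value(conv_a, conv_b, n):
    """Two-sided pooled z-test p-value, with n visitors in each variant."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    if se == 0:
        return 1.0
    z = abs(conv_a - conv_b) / n / se
    return math.erfc(z / math.sqrt(2))


def false_winner_rates(rate=0.10, looks=20, per_look=50, trials=400,
                       alpha=0.05, seed=0):
    """Return (share of tests with a 'winner' at any look, share at the final look)."""
    rng = random.Random(seed)
    any_look_hits = final_look_hits = 0
    for _ in range(trials):
        conv_a = conv_b = n = 0
        significant_at_some_look = False
        significant_now = False
        for _ in range(looks):
            # Both variants convert at the same true rate: any "win" is false.
            conv_a += sum(rng.random() < rate for _ in range(per_look))
            conv_b += sum(rng.random() < rate for _ in range(per_look))
            n += per_look
            significant_now = p_value(conv_a, conv_b, n) < alpha
            significant_at_some_look = significant_at_some_look or significant_now
        any_look_hits += significant_at_some_look
        final_look_hits += significant_now  # only the last look counts here
    return any_look_hits / trials, final_look_hits / trials


if __name__ == "__main__":
    any_look, final_look = false_winner_rates()
    print(f"false winners when stopping at any look: {any_look:.1%}")
    print(f"false winners when checking once at the end: {final_look:.1%}")
```

The end-only figure should sit near the chosen threshold, while the any-look figure is typically several times higher, because every extra look is another chance for noise to cross the line.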

The cardinal sin is peeking and stopping early. Small samples swing wildly, so a test checked after fifty visitors will often show a dramatic "winner" that vanishes by five thousand. Decide your sample size and duration before you start, and don't stop the moment the numbers look good - that's how noise gets promoted to a decision.
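Deciding the sample size in advance usually means a power calculation. The sketch below uses the standard normal-approximation formula for comparing two proportions; the function name `visitors_per_variant` and the defaults (5% significance, 80% power) are conventional assumptions, not values taken from this entry.

```python
import math
from statistics import NormalDist


def visitors_per_variant(baseline_rate, target_rate, alpha=0.05, power=0.80):
    """Approximate visitors needed in EACH variant to detect the given lift."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = NormalDist().inv_cdf(power)
    # Sum of the two binomial variances, one per variant.
    variance = (baseline_rate * (1 - baseline_rate)
                + target_rate * (1 - target_rate))
    effect = target_rate - baseline_rate
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


if __name__ == "__main__":
    # Hypothetical goal: detect a lift from 10% to 12% conversion.
    print(visitors_per_variant(0.10, 0.12))
```

Under these assumptions, detecting a lift from 10% to 12% takes a little under 4,000 visitors per variant, and halving the lift to one percentage point roughly quadruples that. Dividing the total by expected daily traffic gives a duration to commit to before the test starts.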

Test things that matter, and only one change at a time. A/B testing is superb at optimising within an idea (which button wording, which layout) and useless at telling you whether the idea was right - for that you need the qualitative work, like [usability testing](/entries/usability-testing/). And a "win" on a metric that doesn't reflect real value is just a tidier way to be wrong.