A/B testing is one of the tried-and-true methods of answering the question, “Which one is better?” Its use is critical in software, marketing, and various other facets of business. In an article for Harvard Business Review, Amy Gallo provides a comprehensive refresher on how to use A/B testing for the best possible results, informed by Columbia University’s Kaiser Fung.
Test Your Assumptions
Gallo explains that the theory behind A/B testing has existed since the 1920s, long before it was being used to build websites. Yet websites are a typical domain of A/B testing now, so Fung uses websites as his example for describing it. A/B testing is about taking two different versions of the same thing and exposing two separate, randomized groups of people to one or the other version. In Fung’s example, the size of a website’s “subscribe” button is what differs between the two versions, and data is collected to see which version of the subscribe button gets clicked more often.
Randomization is the critical aspect of A/B testing because it balances out confounding factors across the two groups, keeping the results fair even when conditions are skewed. For instance, Gallo explains that desktop versus mobile can affect whether a person chooses to click a subscribe button, but randomizing which users are in each group should mean a comparable mix of desktop and mobile users appears in each group. It is controlled chaos in that way.
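A quick simulation makes this concrete. The sketch below assumes a made-up user pool that is 70% desktop and 30% mobile (the article gives no numbers), randomly splits it into two groups, and checks the mobile share in each. Randomization alone pushes both groups toward the same device mix:

```python
import random

random.seed(0)

# Hypothetical user pool: 70% desktop, 30% mobile.
# These proportions are illustrative, not from the article.
users = ["desktop"] * 7000 + ["mobile"] * 3000
random.shuffle(users)

# Random assignment: after shuffling, alternate users into A and B.
group_a = users[0::2]
group_b = users[1::2]

share_a = group_a.count("mobile") / len(group_a)
share_b = group_b.count("mobile") / len(group_b)
print(f"Mobile share in A: {share_a:.1%}, in B: {share_b:.1%}")
```

Both shares land close to the underlying 30%, so any difference in click-through between the groups is unlikely to be driven by device mix.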
Additionally, Fung acknowledges that most A/B tests are more complicated than testing just the size of a button. Many would choose to do sequential tests to measure several related elements of the same thing individually (button size, then button color, then font choice, etc.), but Fung says statisticians have “debunked” the effectiveness of this method. In other words, this seemingly simple sequential technique is not a sound way to conduct complex tests.
As for how to actually interpret the results of your test, well, the math is admittedly tricky:
Fung says that most software programs report two conversion rates for A/B testing: one for users who saw the control version, and the other for users who saw the test version. “The conversion rate may measure clicks, or other actions taken by users,” he says. The report might look like this: “Control: 15% (+/- 2.1%) Variation 18% (+/- 2.3%).” This means that 18% of your users clicked through on the new variation … with a margin of error of 2.3%. You might be tempted to interpret this as the actual conversion rate falling between 15.7% and 20.3%, but that wouldn’t be technically correct. “The real interpretation is that if you ran your A/B test multiple times, 95% of the ranges will capture the true conversion rate — in other words, the conversion rate falls outside the margin of error 5% of the time (or whatever level of statistical significance you’ve set),” Fung explains.
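The numbers in that report can be reproduced with the standard normal-approximation formula for a proportion's margin of error, and Fung's "95% of the ranges" interpretation can be checked by simulation. In the sketch below, the sample size of roughly 1,100 users is back-solved to match the article's "18% (+/- 2.3%)" figure and is my assumption, not a number from the article:

```python
import math
import random

random.seed(1)

def margin_of_error(p, n, z=1.96):
    """95% normal-approximation margin of error for a conversion rate."""
    return z * math.sqrt(p * (1 - p) / n)

# Assumed sample size, chosen so 18% yields roughly a 2.3% margin.
n = 1100
print(f"Variation: 18% +/- {margin_of_error(0.18, n):.1%}")

# Fung's interpretation: across many repeated experiments, about 95%
# of the computed intervals should capture the true conversion rate.
true_rate = 0.18
trials = 2000
captured = 0
for _ in range(trials):
    clicks = sum(random.random() < true_rate for _ in range(n))
    p_hat = clicks / n
    moe = margin_of_error(p_hat, n)
    if p_hat - moe <= true_rate <= p_hat + moe:
        captured += 1
print(f"Intervals capturing the true rate: {captured / trials:.1%}")
```

The capture fraction comes out near 95%, which is exactly the claim: the guarantee is about the long-run behavior of the procedure, not about any single interval containing the truth.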
Gallo goes on to share three mistakes that people often make with A/B testing:
- Managers spend too little time watching the tests play out and select a winner too quickly.
- Managers track too many metrics and arrive at frivolous correlations.
- Companies do not do enough retesting.
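The second mistake, tracking too many metrics, is a multiple-comparisons problem: at a 5% significance level, testing twenty metrics means roughly one spurious "significant" result is expected even when nothing differs. The sketch below simulates that, using a hypothetical group size and baseline rate of my own choosing and a two-proportion z-test:

```python
import math
import random

random.seed(2)

n = 2000        # users per group (hypothetical)
metrics = 20    # number of metrics tracked (hypothetical)
base_rate = 0.10  # identical true rate in both groups: no real difference

false_positives = 0
for _ in range(metrics):
    a = sum(random.random() < base_rate for _ in range(n))
    b = sum(random.random() < base_rate for _ in range(n))
    p_a, p_b = a / n, b / n
    # Two-proportion z-test with a pooled rate under the null hypothesis.
    pooled = (a + b) / (2 * n)
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    z = abs(p_a - p_b) / se
    if z > 1.96:  # "significant" at the 5% level
        false_positives += 1
print(f"Spurious significant metrics: {false_positives} of {metrics}")
```

Even though the two groups behave identically on every metric, chance alone tends to flag a metric or so as "significant," which is how frivolous correlations get mistaken for findings.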
Interestingly, all three of those problems seem to stem from forgetting to heed another simple question: “Could we be wrong?”
For additional elaboration on these ideas, you can view the original article here: https://hbr.org/2017/06/a-refresher-on-ab-testing