How many A/B tests should you run? What are other key learnings?

This is the final part of my four-part series on A/B testing. Part 1, Part 2, and Part 3 are linked here.

Should A/B testing be limited to two different versions of an idea? Is there an optimal number?

A/B testing doesn’t need to be limited to two versions at all. You can certainly test multiple variants of a specific idea. If you have a few versions of a similar test and can easily run them against one another, that’s fine. The key is not to change too many variables at once. To take a very simple example: if you test button color and button copy at the same time, recognize that those are two distinct variables, and you’ll need to control for each of them when you run the test.
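One common way to control for two variables at once is a full-factorial design, a standard experimental-design technique rather than something described in this series. Here is a minimal Python sketch, assuming a 2×2 version of the button example: each user is deterministically bucketed into one of four color/copy cells, and each factor’s effect is estimated by averaging over the levels of the other factor. The color and copy values, the `assign_cell` and `main_effects` helpers, and the SHA-256 bucketing scheme are hypothetical choices for illustration only.

```python
import hashlib
from itertools import product

# Hypothetical factor levels for the button example; substitute your own.
COLORS = ("blue", "green")
COPY = ("Sign up", "Get started")
# All four combinations: each user sees exactly one color/copy pair.
CELLS = list(product(COLORS, COPY))


def assign_cell(user_id: str) -> tuple:
    """Deterministically bucket a user into one of the four cells.

    Hashing the ID means a returning user always sees the same variant.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return CELLS[int.from_bytes(digest[:8], "big") % len(CELLS)]


def main_effects(rates: dict) -> dict:
    """Estimate effects from observed conversion rates per (color, copy) cell.

    Each main effect is the change from the first level to the second,
    averaged over both levels of the other factor. The interaction is half
    the difference of differences (a common 2x2 convention).
    """
    c0, c1 = COLORS
    p0, p1 = COPY
    color_effect = ((rates[(c1, p0)] - rates[(c0, p0)])
                    + (rates[(c1, p1)] - rates[(c0, p1)])) / 2
    copy_effect = ((rates[(c0, p1)] - rates[(c0, p0)])
                   + (rates[(c1, p1)] - rates[(c1, p0)])) / 2
    interaction = ((rates[(c1, p1)] - rates[(c0, p1)])
                   - (rates[(c1, p0)] - rates[(c0, p0)])) / 2
    return {"color": color_effect, "copy": copy_effect, "interaction": interaction}


if __name__ == "__main__":
    # Made-up cell conversion rates, purely for illustration.
    example_rates = {
        ("blue", "Sign up"): 0.10,
        ("green", "Sign up"): 0.12,
        ("blue", "Get started"): 0.11,
        ("green", "Get started"): 0.13,
    }
    print(assign_cell("user-42"))
    print(main_effects(example_rates))
```

With those made-up rates, color accounts for roughly a two-point lift, copy for roughly one point, and the interaction is about zero, so the two changes can be credited separately. The trade-off is traffic: splitting users four ways leaves fewer in each cell, so a factorial test generally needs more total users than a simple two-variant test to reach the same confidence.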

The same applies to tests of the core user experience. If you change many variables across the two variants you’re running, you might not be able to pinpoint what caused one to outperform the other. So it’s important to think about those elements when you design a test. Sometimes that really matters. Other times, if you’re just trying to establish a new baseline, such as a new core user experience or a new feature, you may be comfortable taking the risk of testing two wildly different variants. It comes down to how much uncertainty you’re willing to accept about what’s causing the effect, and how well you already understand the customer.

What’s the most surprising thing you’ve ever found out from an A/B test?

I’d actually say that A/B tests produce a lot of surprising findings. Sometimes it’s that something you’ve run multiple times, and that has always worked, no longer works. For example, I used to assume that fewer fields were always better on sign-up forms. Then, over a succession of different companies, I realized that it really depends on what data you’re trying to collect and what those fields indicate about the customer’s long-term retention.

I also used to assume that a single-page form was better than a multi-screen form, but we’ve actually been seeing the effect of escalation of commitment: when users enter a little more data on each successive page, conversion rates can be higher. You see this with survey products like Typeform and with other sign-up forms that collect one piece of data at a time, which can result in a higher completion rate.

The most important thing to remember is to go into tests expecting to be surprised, not expecting any particular variant to win. Go in with a learning mindset and be ready to adapt what you’ve done before, because it may no longer work. Assuming that what worked previously will work again is a reliable way to fail at A/B testing, or at optimizing and improving your product more generally. Listen to what customers are telling you, both in person and through the data. It will differ from product to product, especially as you encounter different types of customers.

What’s one last piece of feedback you’d give on A/B testing or testing in general?

When we look to get data about people, and in general when I’ve run A/B testing teams, the first thing you’re trying to do is identify the core metric you want to move. That is key to starting and running any A/B test. If you don’t know which metric you’re trying to move and how it ladders up to your overall KPIs, OKRs, or whatever goal-setting system you use, you really need to ask yourself why you’re running the test. At companies like Showmax and One Kings Lane, we spent a lot of time running A/B tests aimed at optimizing conversion rate, email open rate, and retention rate. We did that because those were core metrics for the business, so our testing stayed focused on what the company actually cared about.

That’s what matters: finding and moving a metric that is meaningful in the grand scheme of the business. If you’re just moving vanity metrics that don’t matter, that’s a problem. One other key thing I would add is to track those metrics all the way down to value. You’ll often improve email opens or clicks, and you might even increase visits, but you don’t see that translate into revenue: no more total orders or purchases come from it.

When you run these tests, it’s really important not to just move the top-of-funnel metrics, but also, where you can, to track the bottom-line metrics: revenue, user acquisition, and conversion rates. If you’re not moving those and you’re just pushing more users one more step down the funnel, you need to go back and think: I’ve moved people one step, but now I have to keep moving them through the funnel. How do I do that? Or should I start to question whether the right users are coming to the product at all?
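To check whether a lift at the top of the funnel carries through to value, it can help to report every funnel step and revenue per user for each variant side by side. The following is a minimal Python sketch under stated assumptions: each variant is described by an ordered list of hypothetical `(step_name, users_reaching_step)` pairs plus total revenue, both variants share the same steps in the same order, and `funnel_report` and `compare_variants` are illustrative helpers, not part of any particular analytics tool.

```python
def funnel_report(steps, revenue):
    """Summarize one variant's funnel.

    steps: ordered list of (step_name, users_reaching_step); the first entry
    is the population that entered the test.
    revenue: total revenue attributed to this variant.
    Returns per-step (name, step_rate, cumulative_rate) and revenue per user.
    """
    entered = steps[0][1]
    rows = []
    previous = entered
    for name, count in steps:
        # Share of the prior step's users who reached this step.
        step_rate = count / previous if previous else 0.0
        # Share of everyone who entered the test who reached this step.
        cumulative = count / entered if entered else 0.0
        rows.append((name, step_rate, cumulative))
        previous = count
    revenue_per_user = revenue / entered if entered else 0.0
    return rows, revenue_per_user


def compare_variants(control, treatment):
    """Change in cumulative conversion at each step and in revenue per user.

    control and treatment are (steps, revenue) tuples with matching step names.
    """
    c_rows, c_rpu = funnel_report(*control)
    t_rows, t_rpu = funnel_report(*treatment)
    deltas = {
        name: t_cum - c_cum
        for (name, _, c_cum), (_, _, t_cum) in zip(c_rows, t_rows)
    }
    return deltas, t_rpu - c_rpu


if __name__ == "__main__":
    # Made-up numbers: the treatment lifts opens and visits but not orders.
    control = ([("sent", 1000), ("opened", 200), ("visited", 50), ("ordered", 10)], 500.0)
    treatment = ([("sent", 1000), ("opened", 260), ("visited", 60), ("ordered", 10)], 500.0)
    deltas, rpu_change = compare_variants(control, treatment)
    print(deltas, rpu_change)
```

In this made-up example, opens and visits rise while orders and revenue per user stay flat, which is the pattern of a test that moved people one step without moving the bottom line.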