Non-Even AB Testing Splits with Vanity
A question recently resurfaced on the vanity-talk mailing list about setting up tests with non-even splits.
Wondering what people think about / how feasible it'd be to do the following:
Say I have a fairly experimental feature for my site that I'd like to quickly implement and test. But, it's really not ready for prime-time, even 50% of my users would be too much. Maybe I want it to be closer to 5% of my users, for various reasons (e.g. the software isn't ready for scale or I expect it to perform _worse_ but I want to ensure that's the case). I think it'd be great to have the definition of an alternative be able to take a percentage of users who should see each option and/or allow that to be changed via the dashboard.
As an added bonus, this could be a nice way to roll out new features to your site quietly, testing for production/scale bugs more gradually.
This isn’t a feature of Vanity, but it can be achieved by overriding the alternative_for method in your experiment definition like so:
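A minimal sketch of the bucketing logic such an override could use. The helper name `weighted_alternative_index` and its parameters are hypothetical and not part of Vanity. It assumes two alternatives, with index 0 as control and index 1 as treatment. Inside an experiment definition, the body of an overridden alternative_for(identity) would do the equivalent; whether that method should return an index or an alternative object may depend on the Vanity version you run, so check its source before relying on this.

```ruby
require "digest/md5"

# Hypothetical helper (not Vanity API): deterministically maps an identity
# to an alternative index so that roughly `treatment_percent` percent of
# identities get 1 (treatment) and everyone else gets 0 (control).
def weighted_alternative_index(experiment_name, identity, treatment_percent)
  # Hash the experiment name together with the identity so that each
  # experiment buckets users independently of every other experiment.
  digest = Digest::MD5.hexdigest("#{experiment_name}/#{identity}")

  # Reduce the hash to a stable bucket in 0..99.
  bucket = digest.to_i(16) % 100

  bucket < treatment_percent ? 1 : 0
end

# Example: put roughly 5% of users into the treatment group.
index = weighted_alternative_index("new_feature", "user-42", 5)
puts "user-42 sees alternative #{index}"
```

Because the bucket is derived from a hash rather than a random draw, the same identity always lands in the same alternative, which is what keeps a returning user's experience consistent.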
Running an A/B test with differently sized treatment and control groups isn’t necessarily wrong, but I would urge caution. If you’re dealing with relatively low conversion rates (for instance, purchase rates in e-commerce funnels), you might be tempted to change the ratio while the test is running to get the smaller group up to a significant size. Do not do this. It will most likely invalidate your test in one of two ways:
- If there is any seasonality in your conversion (for instance, higher conversion on weekends vs. weekdays), changing the split changes the mix of high- and low-converting periods in each group, and you will no longer be able to compare treatment vs. control.
- If the test has already been running for an extended period of time, you may also invalidate it because of the difference in the relative age of customers in the treatment vs. control groups. Vanity doesn’t have a time limit on conversion. That is, if your customer becomes part of the experiment today and converts a week later, they still count as a conversion. People who entered the experiment earlier have had more time to convert, so by changing your split you may flood one of the groups with newer users and make it impossible to directly compare the two groups.
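The seasonality problem can be illustrated with made-up numbers. In the sketch below, both groups convert at exactly the same rate within each period (2% on weekdays, 4% on the weekend), but the split is changed from 95/5 to 50/50 before the weekend. All traffic figures and rates are invented for illustration.

```ruby
# Invented traffic: identical conversion rates for both groups in each
# period, but the split changes from 95/5 to 50/50 before the weekend.
PERIODS = [
  { name: "weekdays", rate: 0.02, control: 9_500, treatment: 500 },
  { name: "weekend",  rate: 0.04, control: 5_000, treatment: 5_000 }
].freeze

# Pooled conversion rate for one group across all periods.
def pooled_rate(periods, group)
  users = periods.sum { |p| p[group] }
  conversions = periods.sum { |p| p[group] * p[:rate] }
  conversions / users
end

control_rate = pooled_rate(PERIODS, :control)
treatment_rate = pooled_rate(PERIODS, :treatment)

puts format("control: %.2f%%, treatment: %.2f%%",
            control_rate * 100, treatment_rate * 100)
# prints "control: 2.69%, treatment: 3.82%"
```

The treatment group appears to convert noticeably better even though the feature has no effect at all, purely because most of its users arrived during the higher-converting weekend.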
I would suggest giving each alternative an equal split, but only for a subset of your users. You could accomplish this by segmenting your user population before testing, like this:
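A minimal sketch of that segmentation, assuming vanity_subgroup is a helper you define yourself that hashes an identity into one of ten equal subgroups. Its signature, the `groups:` and `subgroup:` parameters, and the `in_experiment?` helper are assumptions made for illustration, not Vanity API.

```ruby
require "digest/md5"

# Assumed shape of a vanity_subgroup helper: deterministically places an
# identity into one of `groups` roughly equal subgroups, numbered from 0.
def vanity_subgroup(identity, groups: 10)
  Digest::MD5.hexdigest("subgroup/#{identity}").to_i(16) % groups
end

# Hypothetical guard: only identities in the chosen subgroup enter the
# experiment. With 10 subgroups, one subgroup is about 10% of users.
def in_experiment?(identity, subgroup: 0, groups: 10)
  vanity_subgroup(identity, groups: groups) == subgroup
end

# Users outside the subgroup keep the existing experience and are never
# counted as participants. Users inside it would be handed to a normal
# 50/50 experiment, so about 5% of all users end up in the treatment.
%w[user-1 user-2 user-3].each do |id|
  puts "#{id}: #{in_experiment?(id) ? 'enters the test' : 'sees the default'}"
end
```

In a controller or view, the call to Vanity's ab_test helper would then be wrapped in a check like `in_experiment?(identity)`, with everyone else falling through to the default experience.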
This also puts only 5% of the population in the treatment group. However, it gives you roughly equal-sized control and treatment groups, which I’ve found are much easier to understand and explain to non-technical stakeholders.
The vanity_subgroup method also allows you to run multiple simultaneous tests while keeping each test confined to a different subset of your user population, which prevents one test from affecting the results of another.