January 3, 2013

Non-Even AB Testing Splits with Vanity

A question recently resurfaced on the vanity-talk mailing list about setting up tests with non-even splits.

Hey guys,

Wondering what people think about / how feasible it'd be to do the following:

Say I have a fairly experimental feature for my site that I'd like to quickly implement and test. But, it's really not ready for prime-time, even 50% of my users would be too much. Maybe I want it to be closer to 5% of my users, for various reasons (e.g. the software isn't ready for scale or I expect it to perform _worse_ but I want to ensure that's the case). I think it'd be great to have the definition of an alternative be able to take a percentage of users who should see each option and/or allow that to be changed via the dashboard.

As an added bonus, this could be a nice way to roll out new features to your site quietly, testing for production/scale bugs more gradually.

Thoughts?

This isn’t a built-in feature of Vanity, but you can achieve it by overriding the alternative_for method in your experiment definition like so:

ab_test "your_ab_test" do
  description "Your AB test"

  metrics :conversions
  alternatives "control", "treatment"

  # Returns an index into the list of alternatives.
  #
  # The original alternative_for picks between alternatives evenly using
  #
  #   Digest::MD5.hexdigest("#{name}/#{identity}").to_i(17) % @alternatives.size
  #
  # This uses the same deterministic algorithm to put 95% of identities in the
  # control group (index 0) and 5% in the treatment group (index 1): only hashes
  # divisible by 20 go to treatment. Adjust the conditional to get different splits.
  def alternative_for(identity)
    Digest::MD5.hexdigest("#{name}/#{identity}").to_i(17) % 20 == 0 ? 1 : 0
  end
end
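If you want splits other than one-in-N, one option is to hash into 100 buckets and hand the buckets out by percentage. The sketch below is plain Ruby and runs without Vanity. weighted_index is a hypothetical helper, not part of Vanity's API, and it reads the hex digest with to_i(16). (The to_i(17) above, taken from Vanity's own code, also parses, since every hex digit is a valid base-17 digit.)

```ruby
require "digest/md5"

# Hypothetical helper (not part of Vanity): deterministically maps an identity
# to an alternative index using integer percentage weights that sum to 100,
# e.g. [95, 5] for 95% control / 5% treatment.
def weighted_index(experiment_name, identity, weights)
  raise ArgumentError, "weights must sum to 100" unless weights.sum == 100

  # Hash the experiment name together with the identity, as Vanity does, so a
  # visitor's bucket in one experiment says nothing about another experiment.
  bucket = Digest::MD5.hexdigest("#{experiment_name}/#{identity}").to_i(16) % 100

  # Walk the cumulative weights until the bucket falls inside one of them.
  cumulative = 0
  weights.each_with_index do |weight, index|
    cumulative += weight
    return index if bucket < cumulative
  end
end

counts = Hash.new(0)
10_000.times { |id| counts[weighted_index("your_ab_test", id, [95, 5])] += 1 }
puts counts.sort.to_h.inspect # expect roughly 95% at index 0 and 5% at index 1
```

Inside an experiment definition you could then have alternative_for return weighted_index(name, identity, [95, 5]), assuming the helper is loaded somewhere the experiment can see it. The caution below about changing the weights of a running test still applies.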

Caution

Running an AB test with different-sized treatment and control groups isn’t necessarily wrong, but I would urge caution. If you’re dealing with relatively low conversion rates (for instance, purchase rates in e-commerce funnels), you might be tempted to change the ratio while the test is running to get the smaller group up to a statistically significant size. Do not do this. It will most likely invalidate your test in one of two ways:

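If there is any seasonality in your conversion rate (for instance, higher conversion on weekends than on weekdays), changing the split partway through changes the mix of users in each group: the group you enlarge ends up dominated by users who entered during a different period than most of the other group, so you can no longer compare treatment against control.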
If the test has already been running for an extended period, you may also invalidate it because treatment and control will differ in the relative age of their customers. Vanity doesn’t put a time limit on conversion: if a customer enters the experiment today and converts a week later, that still counts as a conversion. People who entered the experiment earlier have had more time to convert, so by changing your split you may flood one of the groups with newer users and make it impossible to compare the two groups directly.
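To make the second point concrete, here is a small deterministic simulation in plain Ruby (no Vanity involved). Every number in it is made up for illustration: 100 new visitors a day for 14 days, and a visitor's chance of having converted grows linearly to 10% over their first week in the experiment. Both groups behave identically; the only change is that the split moves from 95/5 to 50/50 on day 7.

```ruby
# Made-up inputs for illustration only, not real data.
VISITORS_PER_DAY = 100
TEST_DAYS = 14
SPLIT_CHANGE_DAY = 7

# Probability a visitor has converted after age_in_days in the experiment:
# rises linearly to 10% over 7 days, then stays flat. Same for both groups.
def converted_by(age_in_days)
  0.10 * [age_in_days, 7].min / 7.0
end

# Fraction of each day's new visitors sent to treatment.
def treatment_share(day)
  day < SPLIT_CHANGE_DAY ? 0.05 : 0.50
end

# [visitors, expected conversions] per group, measured at the end of the test.
totals = { control: [0.0, 0.0], treatment: [0.0, 0.0] }

TEST_DAYS.times do |day|
  age = TEST_DAYS - day # days this cohort has had to convert when we measure
  share = treatment_share(day)
  { treatment: share, control: 1 - share }.each do |group, fraction|
    visitors = VISITORS_PER_DAY * fraction
    totals[group][0] += visitors
    totals[group][1] += visitors * converted_by(age)
  end
end

totals.each do |group, (visitors, conversions)|
  printf("%-9s %6.1f visitors  %5.2f%% conversion\n", group, visitors, 100 * conversions / visitors)
end
```

With these inputs treatment comes out at about 6.1% and control at about 8.5%, even though the two groups convert identically. The whole gap comes from treatment holding mostly newer visitors who haven't had a full week to convert.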

Instead, I would suggest giving each alternative an equal split, but only for a subset of your users. You can do this by segmenting your user population before testing, like this:

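# In your controller (or a helper), where Vanity's vanity_identity is available.
# Hashes the identity into one of 10 subgroups (0-9), roughly 10% of users each.
def vanity_subgroup
  Digest::MD5.hexdigest(vanity_identity.to_s).to_i(16) % 10
end

# Because && short-circuits, only subgroup 0 ever calls ab_test, so only ~10%
# of users enter the experiment; half of those (~5% overall) see the treatment.
def in_your_ab_test?
  vanity_subgroup == 0 && ab_test(:your_ab_test) == "treatment"
end

# In your experiment definition (e.g. experiments/your_ab_test.rb).
ab_test "your_ab_test" do
  description "Your AB test"

  metrics :conversions
  alternatives "control", "treatment"
end

# Wherever the feature is rendered.
if in_your_ab_test?
  # treatment
else
  # control
end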

This also puts only 5% of the population in the treatment group. Unlike the first approach, however, it gives you roughly equal-sized control and treatment groups (about 5% each), which I’ve found are much easier to understand and explain to non-technical stakeholders.
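To sanity-check those percentages, here is a plain-Ruby simulation of the subgroup approach that runs without Vanity. subgroup_for mirrors vanity_subgroup, and even_alternative_for is a hypothetical stand-in for Vanity's even two-way split: it mimics the hash quoted in the alternative_for comment earlier but reads the digest with to_i(16).

```ruby
require "digest/md5"

# Mirrors vanity_subgroup: 10 buckets keyed on the identity alone.
def subgroup_for(identity)
  Digest::MD5.hexdigest(identity.to_s).to_i(16) % 10
end

# Hypothetical stand-in for Vanity's even split between two alternatives.
def even_alternative_for(experiment_name, identity)
  index = Digest::MD5.hexdigest("#{experiment_name}/#{identity}").to_i(16) % 2
  %w[control treatment][index]
end

population = 20_000
counts = Hash.new(0)

population.times do |identity|
  # Only subgroup 0 is ever assigned an alternative, like in_your_ab_test?.
  if subgroup_for(identity) == 0
    counts[even_alternative_for("your_ab_test", identity)] += 1
  else
    counts["not in test"] += 1
  end
end

counts.each do |bucket, n|
  puts format("%-12s %5d (%4.1f%%)", bucket, n, 100.0 * n / population)
end
```

Expect roughly 90% of identities outside the test and about 5% in each of control and treatment.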

The vanity_subgroup method also lets you run multiple tests simultaneously, restricting each test to a different subgroup (0 for one test, 1 for the next, and so on) so that one test can’t affect the results of another.
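One way to picture that is a fixed table from subgroup to experiment. This is a minimal plain-Ruby sketch; EXPERIMENT_FOR_SUBGROUP, experiment_for and :another_ab_test are hypothetical names, not part of Vanity. Because each identity hashes to exactly one subgroup, no visitor can land in two of these experiments.

```ruby
require "digest/md5"

# Hypothetical assignment of subgroups to experiments; subgroups 2-9 stay
# out of every test.
EXPERIMENT_FOR_SUBGROUP = {
  0 => :your_ab_test,
  1 => :another_ab_test # hypothetical second experiment
}.freeze

def subgroup_for(identity)
  Digest::MD5.hexdigest(identity.to_s).to_i(16) % 10
end

# Returns the single experiment this identity may join, or nil.
def experiment_for(identity)
  EXPERIMENT_FOR_SUBGROUP[subgroup_for(identity)]
end

participants = Hash.new { |hash, key| hash[key] = [] }
1_000.times do |identity|
  experiment = experiment_for(identity)
  participants[experiment] << identity if experiment
end

participants.each { |experiment, ids| puts "#{experiment}: #{ids.size} visitors" }
```

In the app, a check like experiment_for(vanity_identity) == :your_ab_test && ab_test(:your_ab_test) == "treatment" would then take the place of vanity_subgroup == 0, again assuming these hypothetical helpers live in your controller.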

pragmatist
Patrick Joyce
