pragmatist

Patrick Joyce's Website

Agile With a Lowercase ‘A’

A few weeks ago I was listening to Episode 26 of the excellent podcast Debug where the guest was Flipboard co-founder Evan Doll. The whole conversation was interesting and is worth listening to. Around minute 56, the conversation turned to development methodologies and Evan said this:

I like to say that we’re sort of “Agile with a lowercase a” and I think that may be even overselling the amount of process that we have.

I immediately thought “he stole my line!”

“Agile with a lowercase ‘a’” is how I explained our process when I was recruiting engineers at LivingSocial, and it is how I described my values when I started interviewing for my next job.

I think it is an important concept.

The agile software movement began with a manifesto. The manifesto is a statement of values, not a prescription for how to build software.

The people inspired by that manifesto developed a host of tools and techniques to help build software better. Things like Test Driven Development, Pair Programming, Daily Standups, Story Points, and Burndown Charts.

Those tools and techniques were then organized into methodologies like XP, Scrum, and Kanban.

I use and love many of those tools, but each of them is just that: a tool. They have their place, but there are also plenty of times when they are not appropriate.

And there isn’t anything wrong with those methodologies per se. Unfortunately, many people have forgotten the values that started the agile movement and instead placed blind faith in particular tools and methodologies.

“Agile” is an adjective. It is not a noun. It isn’t something you do, it is something you are.

Rigidly adhering to any set of rules is not agile. And that is why I try to keep the “a” lowercase.

Executable Comments: `say_with_time`

I have a rather well-documented distaste for comments.

There are times where you want explanatory information in code. One of my favorite ways to add that information while lessening the risk of it becoming stale is to replace a comment with executable code.

I was reminded of this recently (ed: I actually sketched this draft out 7 months ago) while working on a Rails migration. The code creates a join table and then adds some default data. My first pass looked something like:

def self.up
  create_table :post_category_mappings do |t|
    t.integer :post_id
    t.integer :category_id
  end

  # Populating initial post => category relationships
  post_category_mappings = YAML.load_file("#{RAILS_ROOT}/db/initial-post_category-mapping.yml")
  post_category_mappings.each do |category_id, post_ids|
    Category.find(category_id).post_ids += post_ids
  end
end

That isn’t the best code I’ve ever written, but it’s reasonable. The comment functions as a section heading. It tells you that we’re done with the normal work of a migration (schema changes), that something different is happening now, and that you should pay attention.

Migrations also don’t typically change after they’re checked in, so it’s really unlikely that the comment will fall out of sync with the code.

I don’t hate that comment.

However, Rails provides a nice mechanism for converting that comment into code—and getting some bonus functionality: say_with_time

def self.up
  create_table :post_category_mappings do |t|
    t.integer :post_id
    t.integer :category_id
  end

  say_with_time "Populating initial post => category relationships" do
    post_category_mappings = YAML.load_file("#{RAILS_ROOT}/db/initial-post_category-mapping.yml")
    post_category_mappings.each do |category_id, post_ids|
      Category.find(category_id).post_ids += post_ids
    end
  end
end

We preserve the section-header benefit of calling attention to a logical block of code; plus we now also get pretty output and timing information when the migration runs.
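Under the hood the mechanism is simple. Here is a minimal, standalone sketch of what say_with_time does (the real method lives on ActiveRecord::Migration and formats its output a bit differently; this is just the idea):

```ruby
# Simplified sketch of Rails' say_with_time: print a heading,
# run the block, report how long it took, return the block's value.
def say_with_time(message)
  puts "-- #{message}"
  start = Time.now
  result = yield
  puts "   -> #{(Time.now - start).round(4)}s"
  result
end

say_with_time "Populating initial post => category relationships" do
  # data population would go here
  :done
end
```

Because the method returns the block's value, you can use it inline anywhere you'd like a labeled, timed section of code.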

This is an improvement.

This is just one minor way in which you can replace comments with code, but every time I come across say_with_time it makes me smile.

Update (2013-01-31)

I should note that mixing data population into your migrations is generally A Bad Idea™.

If the population portion of the migration fails—and it can, due to differences between dev / QA and production—your DB will be left in an inconsistent state. You won’t be able to run rake db:rollback because the migration never completed, so the “current” version will still be the previous version. You also won’t be able to run the migration again, because the table will have already been created and the up migration will fail.

You’ll be forced to mess around with the DB directly to restore things. We use migrations to avoid this.

You could make a fairly compelling argument for wrapping all migrations in transactions… but that introduces a whole host of other complexities.

say_with_time is still awesome, but you should probably put the data population code in its own migration—or leave it out of migrations entirely.

Installing Erlang R16B on Mac OS X 10.8.4 Using Homebrew - Undefined Symbols for Architecture X86_64: ‘___sync_val_compare_and_swap_1’

Last week, I upgraded my personal laptop from a 2007 15” MacBook Pro to a brand spanking new 13” Air.

I’ve been playing around with Elixir recently, so I set about installing Erlang and Elixir with Homebrew.

Unfortunately, compilation of Erlang failed. In the end the solution was simple, but it took a little more googling for the error than I expected so I’m centralizing the solution here.

This is the error I was getting when running brew install -v erlang-r16.

Undefined symbols for architecture x86_64:
  "___sync_val_compare_and_swap_1", referenced from:
      _ethr_dw_atomic_cmpxchg in libethread.a(ethr_atomics.o)
      _ethr_dw_atomic_cmpxchg_ddrb in libethread.a(ethr_atomics.o)
      _ethr_dw_atomic_cmpxchg_rb in libethread.a(ethr_atomics.o)
      _ethr_dw_atomic_cmpxchg_wb in libethread.a(ethr_atomics.o)
      _ethr_dw_atomic_cmpxchg_acqb in libethread.a(ethr_atomics.o)
      _ethr_dw_atomic_cmpxchg_relb in libethread.a(ethr_atomics.o)
      _ethr_dw_atomic_cmpxchg_mb in libethread.a(ethr_atomics.o)
      ...
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[3]: *** [/private/tmp/erlang-h2cL/otp-OTP_R15B03-1/bin/i386-apple-darwin13.0.0/beam.smp] Error 1
make[2]: *** [opt] Error 2
make[1]: *** [smp] Error 2
make: *** [emulator] Error 2

Solution: Compile using GCC

I eventually found the solution on the Elixir mailing list where none other than Dave Thomas was having the same problem.

Erlang compiles without issue if you use GCC instead of Clang. But first you need to make sure you have GCC installed.

Here are the specific commands I ran to install GCC and Erlang:

brew tap homebrew/dupes
brew install apple-gcc42
brew tap homebrew/versions
brew install -v --use-gcc erlang-r16

That took care of installing Erlang for me, and then I could proceed to brew install elixir without any issue.

Great New Software: Calca

Calca is a hybrid Markdown editor and calculator. It’s awesome.

Markdown

First I should explain a few things about Markdown. I love Markdown. It is an “easy-to-read, easy-to-write plain text format” that converts to HTML.

Any writing I do longer than a paragraph is generally done in Markdown. This post was written in iA Writer. I compose notes and emails in Markdown in TextMate 2.

Recently, I wrote a fairly formal 20-page report. I wrote it in Markdown, converted it to LaTeX using Pandoc, and then generated a PDF by applying the open source tufte-latex class. It was a much better writing experience than using a word processor and I was very pleased with the quality of the output.

My friend Dave Copeland wrote his entire book in Markdown.

Markdown is great because it lets you organize and lightly format your thoughts, but by providing only very simple formatting it forces you to focus on what you’re writing. There is no fiddling with fonts when writing something in Markdown. And because Markdown is plain text, it plays nicely with version control and you can use whatever text editor you prefer.

Markdown is a simple idea done well.

Markdown + Math = Calca

Calca takes Markdown and adds math.

An engineer named Frank Krueger made Calca to solve his own problem. As a fellow engineer I have similar problems. I imagine most people who would read my blog do too. This interview with Jason Brennan is an interesting look into Frank’s goals in building Calca.

Why Calca is Great

Here’s where Calca becomes useful: while writing I often want to do some basic math. I may be writing a story for an engineer to implement a new feature and need to justify why I’m asking them to do this work. I might write something like this:

For instance, if we were able to reduce cart abandonments by a small number we'll see a meaningful revenue lift

number of customers who abandoned cart last week = 1,395
average order value = $35
weekly revenue lift(conversion improvement) = number of customers who abandoned cart last week * conversion improvement * average order value 

weekly revenue lift(1%) => $488.25
weekly revenue lift(5%) => $2,441.25
weekly revenue lift(10%) => $4,882.5

* All numbers made up

In the past, I would write Markdown in a text editor and either do the calculation in my head (which is embarrassingly error prone), open up irb (more specifically a rails console from one of the applications I’m working on), or use the calculator function in Alfred.

Now I just write in Calca and the calculations happen as I type.
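For comparison, here is the same back-of-the-envelope calculation in plain Ruby (using the same made-up numbers as the example above):

```ruby
# Same made-up numbers as the Calca example above.
ABANDONED_CARTS_LAST_WEEK = 1395
AVERAGE_ORDER_VALUE       = 35 # dollars

# conversion_improvement_pct is a percentage: 1 means 1%
def weekly_revenue_lift(abandoned, aov, conversion_improvement_pct)
  abandoned * aov * conversion_improvement_pct / 100.0
end

[1, 5, 10].each do |pct|
  lift = weekly_revenue_lift(ABANDONED_CARTS_LAST_WEEK, AVERAGE_ORDER_VALUE, pct)
  puts "weekly revenue lift(#{pct}%) => $#{lift}"
end
# weekly revenue lift(1%) => $488.25
# weekly revenue lift(5%) => $2441.25
# weekly revenue lift(10%) => $4882.5
```

The point of Calca, of course, is that you never have to leave your prose to write this.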

Alternatives

For the last couple of months I’ve been using Soulver for this type of calculation (based on Marco Arment’s near constant recommendations for it). Soulver was a big improvement over my previous “system”, but I still found myself fighting with it. The biggest problem was that Soulver meant yet another file to edit. If I wanted to paste the calculations into a Markdown file—which I almost always did—I’d lose the dynamic benefits of Soulver. So while I liked Soulver, I still found myself slipping back to my old ways of writing Markdown in a text editor and doing calculations in another program.

Excel is another common choice for this type of ad hoc calculation. Excel is an extremely powerful piece of software. One of the guys on the mobile team at LivingSocial was in finance before joining HungryAcademy and it is amazing to watch him use Excel. But Excel has never felt natural to me.

Calca feels natural.

Calca is (another) simple idea done well.

That is the highest praise I can give software.

Go buy it

Calca is $4.99 on the Mac App Store. There also is an iOS version available for $2.99 (universal iPhone and iPad).

That is an insanely low price for something so incredibly useful. Try it out.

Build It Twice

Good programmers hate duplication. They’re intelligently lazy. So as a general rule they try to avoid building the same thing twice.

However, this very reasonable desire often leads otherwise pragmatic developers to build over-abstracted, hard to maintain software.

Let’s take a real example. You’re an engineer working on an e-commerce site that currently accepts credit cards and you’re asked to add support for PayPal. You suspect you may have to integrate with similar systems in the future, so you start to think through how you would build a general solution for PayPal-like services.

You should avoid thinking about that now.

My advice when you’re building something new is to focus exclusively on the problem at hand. Don’t worry about how you’re going to need to build something similar in the future. Don’t try to build a general framework for solving that class of problem. Just build the best PayPal integration you can.

Most reasonable engineers can manage to suppress their urge to build a framework the first time they solve a problem. It’s when they see a similar problem that they get in trouble.

The trap

You built a great PayPal integration; customers are happy, conversion rates are up. A few months later, your VP of Product comes to you and asks you to add support for Amazon Payments. Surely now is the time to build that common base class or library?

I don’t think so.

When a similar project or requirement comes along it is very tempting to extract the parts of your first solution that apply to this one.

“This isn’t a science project, I’m extracting from working code!” you’ll think.

“But this will make maintenance so much easier” your programmer brain will scream.

“Think of how much less code I’ll have to write.”

I know this is how I start thinking when I see a problem that is similar to one I’ve already solved.

Stop.

You’re not ready to build a general version yet.

The trouble is that while you have solved the problem once, you still don’t know which parts of your program are general to the class of problem, and which were specific to that first project.

Build it again

If you try to extract a framework or library at this point you run the risk of creating something that is too specific to the first problem.

You’ll end up with a library that is so tailored to PayPal that it only awkwardly maps to other systems.

So solve the second problem the same way you did the first: by building the best solution to that specific problem that you can.

By all means, apply what you learned the first time, but don’t let that dictate what you build. If there is something different about this problem then solve it differently.

Third time’s the charm

A few more months go by. Both PayPal and Amazon Payments are working well. Your VP of Product comes back again, this time asking you to add Google Checkout.

Now you know enough to break out common code into a library or service. At this point you’ve solved similar problems twice and you’re thinking through the third case. This should give you a pretty good sense of what things are common to this type of problem and what is problem-specific.

You’ll also have solved two similar problems in similar—but somewhat different—ways. This will help you build a good solution to those common problems.
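When you do get there, the extraction often shrinks down to a small shared contract plus provider-specific implementations. A hypothetical sketch (all class and method names here are made up for illustration, not taken from any real library):

```ruby
# Hypothetical shared contract, extracted only after building
# several concrete integrations and seeing what is truly common.
class PaymentGateway
  def charge(amount_cents, token)
    raise NotImplementedError, "#{self.class} must implement #charge"
  end
end

class PaypalGateway < PaymentGateway
  def charge(amount_cents, token)
    # real code would call PayPal's API here
    { provider: :paypal, amount: amount_cents, status: :charged }
  end
end

class AmazonPaymentsGateway < PaymentGateway
  def charge(amount_cents, token)
    # real code would call Amazon Payments' API here
    { provider: :amazon, amount: amount_cents, status: :charged }
  end
end

# Checkout code depends only on the shared contract:
def complete_order(gateway, amount_cents, token)
  gateway.charge(amount_cents, token)
end
```

The checkout code stays ignorant of which provider it is talking to, which is exactly the kind of boundary that is hard to draw correctly until you have seen two or three concrete cases.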

The rewards of patience

By waiting to build a framework you get to design from a position of experience: both with regards to the problem and how to solve it. This means you should end up with a better common solution than if you had created a framework at the beginning when you were just learning about the problem.

There’s another benefit, too. There’s a good chance that you’ll never be asked to solve those second and third similar problems. If those additional problems never come along you’re left with a solution perfectly tailored to your problem without the additional complexity of a generalized framework.

When you feel the itch to create a framework, wait. You and your code will be better off for it.

The Senior Software Engineer

Yesterday, my former colleague Dave Copeland published his second book: “The Senior Software Engineer”. Dave gave me an advance copy to review, so I’m fortunate to have already read it. I’m biased from working with Dave and seeing the quality and consistency of his work. That disclaimer aside, I think the book is great. Go buy it.

The premise of the book is that software isn’t valuable in and of itself but creates value through what it does. The responsibility of a senior engineer is to figure out how best to consistently create value.

The book is a distillation of the skills and techniques you need to progress from a job where you implement pre-defined specifications to someone who takes vague problem statements and creates demonstrable value.

You can guess which of those job descriptions I find more fun, interesting, and lucrative.

Who the book is for

Dave’s target audience is junior engineers looking to advance to positions of greater freedom and responsibility. The book is invaluable for that group. I wish someone had given me a copy of this book ten years ago. The relentless focus on results over “progress” is something it took me some time to learn.

That said, I think the book will also prove immensely valuable to senior developers, particularly those who are leading teams. I found several sections of the book clearly stated some of my strongly held but poorly articulated beliefs. In the future, I will definitely steal some of Dave’s examples when I’m coaching junior developers.

A word of caution

I am mildly concerned that certain sections of the book will come across as more dogmatic than I think Dave intended. Having worked with Dave I know that he is anything but dogmatic.

In one chapter Dave describes his process for implementing a minor feature. When written out, it may come across as somewhat ceremonious and rigid. However, I think that is just the result of committing a simple, lightweight process to paper. When written out in detailed form even making a peanut butter sandwich can look complicated.

He proposes you should internalize a simple process for fixing bugs or creating features that ensures you understand the problem, fix it quickly, and leave the code in a maintainable state. What he describes is a formalization of the steps most successful developers I know already follow implicitly.

As you read through the book, keep the central value of the book in mind: producing results is what matters. The internal processes Dave documents are tools for accomplishing that, not religious doctrine to be followed blindly.

Self-publishing

Dave’s previous book (Building Awesome Command Line Applications in Ruby) was published via the Pragmatic Programmers. This time he’s self-publishing.

There is a buzzword from the bubble in the 90’s that I still find interesting: disintermediation. This is the process where the internet removes middlemen. So instead of going to a travel agent to book your flights you can go directly to Southwest.com.

The same thing has been happening in publishing. To their credit, the Pragmatic Programmers are a radically different type of publisher. By all accounts they have better tooling for writing a book than other technical publishers, they pay fair royalties, and they have helped many engineers become authors.

Still, you’re now seeing people like Dave and Jesse Storimer from Shopify cutting out even that lighter-weight middleman. In a world where you can build a medium-sized niche audience via blogging and Twitter, I’m not sure publishers are necessary.

A publisher does provide help with editing, typesetting, and distribution. But—particularly after going through the process once—you can hire editors, learn how to reasonably typeset a book, put up a website, and take advantage of lightweight distribution mechanisms like digitaldeliveryapp.

Given the larger, built-in audience of the Pragmatic Programmers, I imagine Dave would have sold some more copies of his book. But would he have sold twice as many copies? I’m not sure, and that is what he would need to do to cover PragProg’s very reasonable 50% commission (other technical publishers supposedly take 80-90%).

I’m intrigued to see how the self-publishing experiment works out for Dave. My completely uneducated guess is that self-publishing will prove more profitable. Although, this is still a technical book, so I don’t think Stitch Fix is going to be in danger of losing Dave to a full-time writing career any time soon.

Go buy the book.

How Do You End Up With a Great Product a Year From Now?

Nail the next two weeks. 26 times in a row.

At the beginning of a startup there generally isn’t a lot of ceremony around planning. The founders agree on some problem that needs solving and the engineers and designers get to work.

The problem could be anything from “let’s build a thermostat that doesn’t suck” to “let’s help small distillers sell booze on the internet.”

But the many, many decisions and tasks that go into actually solving that problem aren’t mapped out.

I think this is because in the early stages everyone is fully aware of how little they know. You’re still learning about the problem, so why bother pretending you know what you need to do in a month, let alone a year?

What’s the plan?

This will change if you manage to build a product that gets some traction. The company will grow. Your investors will want more detail about where the product is going. Employees who aren’t engineers or designers will join and ask to know what’s coming. And the pressure to create detailed product development roadmaps will grow.

Resist this pressure.

There isn’t anything wrong with planning, per se. I don’t advocate thinking only of the short term. You should be able to articulate medium- to long-term goals for what you’re building. However, those goals will by definition be fuzzy.

In my experience teams work best—and produce the most successful products—when they understand the problem they’re solving but have a maniacal focus on constant, incremental improvement.

Easier said than done

You need to define what success looks like, how you’re going to measure it, and make sure everyone working on the product knows what it is.

When we first started working on Daily Deals at LivingSocial there were 4 things we focused on:

  1. Were we able to sign up merchants to offer deals?
  2. How many subscribers were we acquiring?
  3. How many of those subscribers were becoming purchasers?
  4. Did those customers come back and buy again?

If we failed at any one of those things then we wouldn’t have a business. And if we nailed all four of those things then it looked like we’d have something.

Find what those four things are for your product. You may have only two or three key questions, but you shouldn’t have more than four. If you come up with six things then I don’t think you’ve honestly prioritized.

Prioritization and planning become much easier once you have those four agreed upon goals. Every proposed feature can be evaluated with “which of these four things will it help?” and if the answer is “none of them” then you get to stop talking about it.

To be clear, this doesn’t mean that every individual feature or test you run has to directly move one of those four metrics. Those four metrics are intentionally high level. It is going to be hard to move them. But there should be a plausible story connecting everything you build to one of those four goals.

As an example, you may want to work on improving the design of your email templates. There is a reasonable hope that better email templates will drive people to buy more. Looking at the four things we cared about in the early days of LivingSocial, this work is tied to question #3 (new customers) and question #4 (repeat customers).

However, measuring a direct effect on purchasing from an email template change is hard. Because the base conversion rate is so low, you need a massive sample. So for a first test it may make sense to measure click rates. It is reasonable to assume that if you increase the number of people clicking to your site you’ll get more purchases.1

So don’t worry so much about roadmaps, or what you’re going to build in Q2 of next year. Instead, make sure you know what you need to do to be a successful business.

Then spend the next 2 weeks shipping things that will help solve those problems.

Then do it another 25 times.


  1. This isn’t always true, and you need to be careful that you don’t cheat. If you pick your top level problems correctly it’s hard to cheat on them. Revenue is revenue, and it’s hard to juke the stats when the stats consist of someone paying you. But the intermediate metrics can sometimes be manipulated, often unintentionally. For example, in the email example you could remove prices from your email. Now, to find out how much something costs, people have to click. This will probably increase clicks, but it no longer follows that the purchase rate will increase, because you’re generating less qualified clicks that may convert at a massively lower rate.

The Genius of Uber

Photo from Rob Nguyen on Flickr

The on-demand car-service company Uber has inspired much deserved admiration. Many new companies are trying to be the “Uber for X.” Most of them will fail, because most new companies fail. But I think many of them don’t even stand a chance because they don’t understand what makes Uber special.

Previously-Covered Misunderstandings

Jeff Morris, Jr. explored some of the reasons these companies are doomed a few weeks ago in a post on Medium.

His first point was that people starting “Uber for X” see the success Uber has had with a mobile-first strategy and think they must focus on an app. App-first works for Uber because they’re an inherently mobile offering; if you need a ride you’re often not sitting at your desk or on your couch—and even if you are—you still need to tell the driver where to pick you up.

What’s so special about car-services?

Uber has great execution. The app is simple to understand and the service is a pleasure to use.

However, Uber has been successful not only because of their great execution. They also picked a great market. The car service business was ideally suited for a company like Uber to upend.

Let’s look at the nature of the car service business pre-Uber:

1. Many, Independent Service Providers

Most car services are small firms with just a few cars. Many are single-car owner/driver operations. This means that Uber can’t be blocked by a single powerful gatekeeper.

It’s the difference between selling consumer software and selling enterprise software. Because there are many small service providers, no one car service company has the power to dictate terms to Uber.

2. Anonymous, Commoditized Transactions

With many other service providers there is a bit of a learning curve. With a cleaning service you’ll need to explain where everything goes and what you want done. After that initial setup, you’ll want to minimize future transaction costs by having (ideally) the same crew or (at least) the same company come back again.

In contrast, when I call for a car (or taxi) I don’t really care who the driver is. As long as they get me where I need to go quickly and comfortably I’ll be happy. Any car that meets Uber’s standards will do.

This anonymity is desirable to the drivers as well. At any given moment, the driver wants to find a fare that is close to where they are; not someone they happen to have picked up before.

This keeps incentives aligned between Uber and the drivers. Drivers won’t try to “steal” Uber customers by asking them to call directly in the future. On the other hand a cleaning service that is referred by an “Uber for Cleaning Services” will have an incentive to try to convince me to book directly with them in the future.

Companies like Cherry and Exec went so far as to hire staff to combat this, but growing and managing a workforce is a much harder (and less lucrative) problem than connecting buyers and sellers.

3. Idle Resources with Low Marginal Cost

Previously, car services were only booked in advance and by people willing to pay a significant premium. This means many car services had very low utilization rates. They would be booked for a few hours most nights, but some nights wouldn’t have any business.

Uber helps drivers find business for those times that they otherwise would be idle. Drivers love this as they can make more money without throwing away their established business.

4. Regulatory Inefficiencies

Car services (and taxis—which are Uber’s real competition) are a regulated industry. The number of taxis in New York City is determined—not by the free market—but by a commission. Whether taxis in DC will accept credit cards is decided by the government.

In pretty much every market Uber enters they are violating those regulations.

And Uber. Does. Not. Give. A. Fuck.

Uber is betting that before the regulators can shut them down they’ll be able to build enough goodwill with their customers to generate public pressure to keep them open. And so far they’ve been right.

What we should learn from Uber

Be a marketplace, not a service provider. Hiring and training people to wash cars or clean houses doesn’t scale particularly well.

Choose your market carefully. Pick a market where your incentives align with your service providers.

Choose a product strategy that fits your market. If your chosen market is on-demand and location-centric, then building an app first makes sense. On the other hand, if your market requires research and scheduling appointments, a web-first approach might be a better idea.

Be willing to make some enemies. Powerful entrenched players or government regulation can be a significant deterrent to entering a market. The good news is that if you’re willing to piss off the existing players or the government, you’ll be rewarded with a valuable market with less competition than you would naturally expect.

Hey, ignoring regulations for as long as possible worked for PayPal.

The Slow Android Upgrade Curve Is a Real Problem

Tim Bray wrote a post on Tuesday arguing that Google Play Services means that the slow upgrade curve for Android “matters less and less for developers”. He concludes:

Yeah, if what you care about is new smoother glass and slicker chips and faster broadband, you’re still on the dessert schedule. But if what matters is what apps can do, you can pretty well ignore that Versions dashboard.

I can’t imagine how frustrating it must be to have worked on Android and see 45% of people stuck using a version that is more than two years old (particularly when more than 80% of iOS users are on the five-month-old iOS 6). But to argue with a straight face that dealing with out-of-date Android versions isn’t a real problem for people who care about “what apps can do” is complete and utter bullshit.

The differences between 2.3 and 4.0 with regards to HTML and CSS support are very real. The performance differences for HTML rendering are very real (as Google brags about when talking about ICS).

I recently overheard a developer say “Android 2.3 is the new IE6” after fighting with a bug that only manifested in 2.3. Let’s put it this way: If you’re being compared to IE6 something has gone horribly wrong.

Google Play Services seem to be a clever way to get around carriers and handset manufacturers refusing to issue updates. But if people in the Android world can’t see that it is a major problem for the most common version of Android to be over two years and two major releases out of date then the problem will never get solved.

Calculating Sample Sizes for AB Tests With Vanity (and R)

You want to run an AB test. How many participants do you need in your test?

As always, the answer is “it depends”. In this case, it depends on:

  1. What your base conversion rate is.
  2. How large of a difference you want to be able to detect.
  3. How concerned you are about Type I and Type II errors (false positives and false negatives).

There is no generic rule of thumb. Don’t trust any advice like “you want about 3,000 people in the test to be confident.” The correct sample size always depends on these 3 parameters for your specific test.

Basically:

  • The lower the base conversion rate the more participants you’re going to need
  • To detect smaller differences you’re going to need more participants
  • If you want to increase your confidence in your result, you guessed it, you’re going to need more participants.

How to calculate necessary sample size

If you know your base conversion rate and what size difference you wish to detect it is easy to calculate the necessary sample size using R.

> power.prop.test(p1=0.25, p2=0.275, power=0.8, alternative='two.sided', sig.level=0.05)

     Two-sample comparison of proportions power calculation 

              n = 4861.202
             p1 = 0.25
             p2 = 0.275
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

 NOTE: n is number in *each* group
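If you don’t have R handy, the formula behind power.prop.test (the classic pooled-variance normal approximation for a two-sample test of proportions) is easy to port. Here is a sketch in Ruby, with the two z-quantiles hard-coded for sig.level = 0.05 (two-sided) and power = 0.80, since Ruby’s standard library has no inverse normal CDF:

```ruby
# Sample size per group for a two-sided, two-sample test of proportions,
# using the pooled-variance normal approximation.
Z_ALPHA = 1.9599640 # qnorm(0.975): two-sided significance level of 0.05
Z_BETA  = 0.8416212 # qnorm(0.80):  statistical power of 0.80

def sample_size_per_group(p1, p2)
  pbar  = (p1 + p2) / 2.0       # pooled proportion under the null
  delta = (p1 - p2).abs         # difference we want to detect
  ((Z_ALPHA * Math.sqrt(2 * pbar * (1 - pbar)) +
    Z_BETA  * Math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))**2 / delta**2).ceil
end

sample_size_per_group(0.25, 0.275) # => 4862, R's n = 4861.202 rounded up
```

To change the significance level or power you would swap in different quantiles, which is where R’s parameterized version earns its keep.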

So, in an ideal world you would run all tests as follows:

  1. Track your base conversion rate. For example, 25% of people who reach a registration page successfully register.
  2. Agree on the size of the difference you want to detect. You may only care about detecting relative differences of 10% or more (27.5% or better conversion using the example above).
  3. Decide on the desired significance level. This is the chance of a false positive. It is common to use 0.05 (which represents a 5% chance of a false positive).
  4. Decide on the desired statistical power. This is the chance of a false negative. It is common to use 0.80 (which means that if there is a difference there is a 20% chance we’ll miss it).
  5. Calculate the necessary sample size as described above. Using these examples we would need 4862 people in each group.
  6. Run the test until you have enough participants in both your control and treatment. Don’t look at the results while the test is running.
  7. End the test
  8. Analyze the test

Unfortunately, that isn’t how it normally goes in the real world:

  • We often don’t know what the baseline conversion is. Often, conversion rates for the control aren’t clearly tracked until you start the test. Sometimes you’re unable to effectively baseline a conversion rate because it varies wildly. I have a little bit of experience optimizing e-commerce sites where inventory is only available for a limited time. The quality of inventory can have a large effect on the conversion rate, so it is very difficult to compare conversion rates across time.
  • Most AB Testing software provides real time results which make it easy to fall victim to repeated significance testing errors.

To combat these pitfalls we can use the control as an approximation of the true base conversion rate. Then we can use that as the base conversion rate to figure out how much longer we will need to run a test to detect a difference of the size demonstrated.

Further Reading

I am not a statistician. If you want to learn more go read Noah Lorang’s post about calculating sample sizes and Evan Miller’s explanation of repeated significance testing errors.