pragmatist
Patrick Joyce

February 9, 2025

The Time We Ran a Super Bowl Ad

I’ve got a story to tell.

It’s been 14 years, so I think we’re past the statute of limitations on this one. I’m also pretty sure nothing here qualifies as proprietary anymore.

One disclaimer: I’m working from 14-year-old memories, so I might get some of the details wrong, but I’m pretty sure I’ve got most of the story right.

The Setup

It’s late January 2011. I’m at LivingSocial. We’ve got a 12-person engineering team, $175 million freshly raised from Amazon, and are in the midst of the daily deals arms race with Groupon. They’re bigger than us, but we’re growing faster. We’ve just pulled off the largest one-day flash sale in history, selling 1.3 million Amazon gift cards in 24 hours. It took some heroic efforts to keep the site up, but we did it.

Then one day, our CTO grabs a couple of us and says:

“We’re going to run a Super Bowl ad.”

Cool.

“I’ve talked to people who’ve done this before. Here’s the pattern: there’s an insane burst of traffic for 30 to 60 seconds, then a gradual taper to a manageable level. That first minute, though? We’re talking 10–100x what we saw during the Amazon deal. We need to make sure we stay up.”

Less cool.

The Problem

We needed to:

  1. Handle the traffic surge without taking the whole site down.
  2. Make sure people could sign up for our email list and buy deals.

And we had some serious constraints:

  1. Our hardware capacity was fixed. This was 2011. We were hosted on actual physical servers at Rackspace. No AWS, no GCP. Scaling meant calling someone to order hardware and waiting for it to be racked. We’d already pushed Rackspace’s capacity about as far as it could go.
  2. Our network capacity was limited. We were legitimately worried about saturating Rackspace’s network ingress, so we needed to keep as much traffic as possible at the CDN.
  3. We had about 10 days.

The Plan

We started kicking ideas around. I remember being inspired by how The New York Times handled election night traffic: a Rails app generated static HTML pages for the live updates and pushed them all to their CDN with a short TTL.

Step 1: Make the Site Static

We decided to pre-render the entire site and serve it as static HTML through Akamai with a 5-minute TTL. One of my coworkers built a Rake task to generate static snapshots of every key page.
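
The idea, as a rough sketch. The page list, port, and output directory here are illustrative guesses, not the real task:

    # lib/tasks/snapshot.rake (hypothetical): render key pages to static
    # HTML files that can be pushed to the CDN and served with a short TTL.
    require "net/http"
    require "fileutils"

    STATIC_ROOT = "public/static"
    PAGES = ["/", "/deals/washington-dc", "/deals/new-york"]  # illustrative subset

    desc "Render key pages to static HTML for the CDN"
    task :snapshot do
      PAGES.each do |path|
        html = Net::HTTP.get(URI("http://localhost:3000#{path}"))
        name = path == "/" ? "index" : path.sub(%r{^/}, "")
        dest = File.join(STATIC_ROOT, "#{name}.html")
        FileUtils.mkdir_p(File.dirname(dest))
        File.open(dest, "w") { |f| f.write(html) }
      end
    end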

That covered browsing, but we still had two critical features that ultimately required a persistent write:

  1. Email subscriptions – We needed to capture email addresses and cities.
  2. Purchases – We needed to process orders and credit cards.

Normally, our architecture looked something like this:

Akamai → F5 → nginx → unicorn workers (Rails) → Resque/MySQL

But even with the static site, we knew our unicorn workers couldn’t handle another 10x of our peak from the Amazon deal—even if all they were doing was dropping requests into Redis for background processing.

Step 2: Lean on Nginx and Log Everything

Then someone (I wish I remembered who) had an idea:

“Nginx is a beast. Writes to disk are fast. What if we just log all POST params directly in nginx logs, then process them asynchronously later?”
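
Roughly, the trick looks like this. This is a sketch, not our production config: the paths, port, and tiny internal backend are illustrative. The key pieces are $request_body in the log format and a cheap proxy hop, because nginx only populates $request_body once something like proxy_pass forces it to read the body.

    # Hypothetical sketch of the nginx side.
    http {
        # One line per request: timestamp, URI, URL-encoded POST body.
        log_format capture '$time_iso8601 $uri $request_body';

        # Trivial internal backend whose only job is to answer 200 fast.
        server {
            listen 127.0.0.1:8081;
            location / { return 200 "ok"; }
        }

        server {
            listen 80;

            # Signups and purchases: log the body, answer immediately,
            # never touch Rails.
            location ~ ^/(subscriptions|purchases)$ {
                access_log /var/log/nginx/capture.log capture;
                proxy_pass http://127.0.0.1:8081;
            }
        }
    }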

I built a parser (sketched below) to read the nginx logs and queue background jobs for:

  • Vaulting credit cards with Braintree
  • Creating accounts
  • Logging purchases
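
Here is roughly what that looked like. The line layout follows the capture log format sketched above; the job class names and the mapping from endpoint to jobs are guesses, not the real code:

    # replay_capture_log.rb (hypothetical): parse the nginx capture log
    # and enqueue Resque jobs to be worked off asynchronously.
    require "cgi"
    require "resque"

    # Placeholder Resque job classes; the real implementations are omitted.
    class VaultCardJob;     @queue = :cards;     def self.perform(params); end; end
    class CreateAccountJob; @queue = :accounts;  def self.perform(params); end; end
    class LogPurchaseJob;   @queue = :purchases; def self.perform(params); end; end

    # Each line: "<iso8601 time> <uri> <url-encoded body>"
    File.foreach("/var/log/nginx/capture.log") do |line|
      _time, uri, body = line.chomp.split(" ", 3)
      next if body.nil? || body.empty? || body == "-"

      params = CGI.parse(body)  # e.g. {"email" => ["a@b.com"], "city" => ["dc"]}

      case uri
      when "/subscriptions"
        Resque.enqueue(CreateAccountJob, params)
      when "/purchases"
        Resque.enqueue(VaultCardJob, params)
        Resque.enqueue(LogPurchaseJob, params)
      end
    end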

We tested the hell out of it on staging. Then we did a small production run during off-peak hours. Everything worked.

Game Day

On Super Bowl Sunday, we were all in the office, tired but full of adrenaline.

20 minutes before the ad aired, we regenerated the static site and tested it on a subdomain. Everything looked good.

2 minutes before the ad aired, we flipped the switch. The live site became a static site. Right before doing this, we cleared the nginx logs so they wouldn’t contain test data.

The ad ran.

Traffic spiked. No 500s. Everything looked good at the load balancer.

I was SSH’d into our production application servers and checking the logs.

Nothing.

The log files were empty.

My stomach dropped.

The Mistake

Nginx buffers log writes in memory, so there is often a slight delay before entries actually hit the disk. But we should have been seeing data by then.

We weren’t.

I raised the alarm. We started digging. Then I heard a string of curses.

When we “cleared” the logs before the ad ran, we deleted the log files instead of truncating them.

Under normal circumstances, this wouldn’t be a huge deal: you just restart nginx and it creates new log files. But we’d forgotten to restart nginx. So nginx kept an open file descriptor to the deleted file and kept writing to a file that no longer existed, meaning everything was effectively going to /dev/null.
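
If you haven’t been bitten by this before, the difference looks something like this (paths illustrative):

    # The mistake: unlinking the file. nginx's open file descriptor still
    # points at the old, unlinked inode, so new entries never show up at
    # that path.
    rm /var/log/nginx/capture.log

    # What would have worked: truncate in place. The open descriptor is
    # untouched and keeps appending to the now-empty file.
    : > /var/log/nginx/capture.log    # or: truncate -s 0 /var/log/nginx/capture.log

    # Or, after deleting, tell nginx to reopen its log files.
    rm /var/log/nginx/capture.log
    nginx -s reopen                   # or: kill -USR1 "$(cat /var/run/nginx.pid)"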

We’d just lost all the traffic data from those critical minutes.

The Aftermath

Within five minutes, traffic had dropped to a manageable level, so we switched back to the normal site.

But the logs were lost to the ether.

There was nothing to be done.

I poured myself a very large bourbon. My coworker who built the static site generator poured himself a very large tequila.

By halftime, I’d left the office to meet my wife at a Super Bowl party.

The (Sort of) Silver Lining

The traffic spike wasn’t quite as big as we’d feared or hoped. It came close to, but didn’t exceed, our peak during the Amazon deal. Ironically, we could have just run on our normal system.

Losing that five-minute window cost us a few thousand purchases and subscriptions, which wasn’t great, but wasn’t catastrophic. Super Bowl ads aren’t really direct response plays—they’re brand plays. The purchases would have been a bonus.

The ad itself? Not great. I saw it for the first time the night before. It doesn’t age well, and even at the time I didn’t feel great about making a trans person the punchline. Still, it managed to get less backlash than Groupon’s, and we did get a ton of earned media, mostly just about the fact that we were big enough to run a Super Bowl ad.

Would I call it a good return on ad spend? No. But screwing up five minutes of logging didn’t materially change the outcome.

The Takeaway

Even after all these years, I’m still incredibly proud of what we built. Five people, 10 days, and a system that could have handled 100x our normal traffic.

It should have worked.

It would have worked.

If not for you meddling kids failing to properly rotate nginx logs.

And it would have worked, too
