8. 2. Have Data, Choose Metrics
To test, you need:
• People using your product
• (Approximate) agreement on the metrics
that matter
@mike_greenfield
9. Not Many Users? Don’t A/B test!
• Laserlike has ~60 users and has never
run an A/B test
• We will run many, many tests when we
have enough users
• A test should have at least a few hundred
instances (and a lot more if effect sizes
are likely to be small)
• Test iff you can have “business
significance”
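
As a rough check on the "few hundred instances" rule of thumb, here is a minimal sketch of the standard normal-approximation sample size formula for comparing two conversion rates. The function name, the 5% significance level, and the 80% power default are assumptions chosen for illustration; they are not from the talk.

```python
import math
from statistics import NormalDist


def users_per_arm(base_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate users needed in EACH arm of a two-sided two-proportion test.

    base_rate:     conversion rate of the control arm (e.g. 0.10)
    relative_lift: smallest relative improvement worth detecting (e.g. 0.05)
    alpha, power:  illustrative defaults, not values from the slides
    """
    p1 = base_rate
    p2 = base_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


if __name__ == "__main__":
    print(users_per_arm(0.10, 0.50))  # large effect: 10% -> 15%
    print(users_per_arm(0.10, 0.05))  # small effect: 10% -> 10.5%
```

Under these assumptions, a large effect needs several hundred users per arm, while a small one needs tens of thousands. With ~60 users, neither is reachable.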
10. Know What You Want to Optimize
• If it’s important, you should be running
tests to improve it
• If it’s not important, spend time on other
things
• Most tests should be aimed at improving
1-2 specific variables
11. 3. Have Clear Process, Tech for Testing
12. A/B Testing Process
• New feature: if possible, roll out to a small
test subset first (10s or 100s of thousands)
• Version change: always test things that
could (cumulatively) have business impact
• Everyone on the product team should be
running and resolving tests
13. A/B Testing Tech
• Using a third-party testing service is akin to
building your site on WordPress: great at
some scales/competency levels
• No matter how you’re testing, a new test
should be at most a few lines of code
• It should be easy to see how each side of a
test compares across many variables
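
One way to keep a new test to a few lines of code is to assign buckets with a deterministic hash, so no assignment table is needed. This is a sketch under assumptions: the `bucket` helper, the test name, and the variant labels are hypothetical and do not describe any particular company's system.

```python
import hashlib


def bucket(user_id, test_name, variants=("control", "treatment")):
    """Deterministically assign a user to a variant of a named test.

    Hashing test_name together with user_id keeps a user's assignment
    stable within one test but independent across different tests.
    """
    key = f"{test_name}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return variants[int(digest, 16) % len(variants)]


# Call site for a new test (hypothetical test name):
#
#     if bucket(user_id, "short_question_warning") == "treatment":
#         show_warning()
```

Logging the returned variant alongside each user's metrics is what would make it possible to compare the two sides across many variables later.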
15. Process: Same vs. New Tweak
• What’s the probability your tweak will have
a positive effect?
• What kind of effect might that have, and
how might that effect change the
company’s prospects?
• Will you be able to measure the change?
• Optimize on one variable, but look at
others
16. Process: Same vs. Big Change
• What’s the probability that your change will
have a negative impact?
• How big an impact might there be?
• Will you be able to measure the change?
• Holistic approach
17. A/B Test for Quality
• Circle of Moms: tested “warning” users when their
questions seemed short or low quality
• Resulting questions were graded for quality,
without grader knowing test bucket
• End result: warning yielded ~5% fewer questions,
but much higher quality
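
The blind grading step can be sketched as follows: strip the bucket label, shuffle, collect grades, then rejoin them with the hidden key. The record layout, function names, and grading scale are all hypothetical; the slides do not describe how Circle of Moms implemented this.

```python
import random


def blind_for_grading(questions, seed=None):
    """Return (sheet, key): a shuffled grading sheet with no bucket labels,
    and a private id -> bucket mapping kept away from the grader."""
    rng = random.Random(seed)
    sheet = [{"id": q["id"], "text": q["text"]} for q in questions]
    rng.shuffle(sheet)  # ordering must not leak the bucket either
    key = {q["id"]: q["bucket"] for q in questions}
    return sheet, key


def mean_grade_by_bucket(grades, key):
    """Rejoin grades (id -> score) with the hidden key and average per bucket."""
    by_bucket = {}
    for question_id, grade in grades.items():
        by_bucket.setdefault(key[question_id], []).append(grade)
    return {b: sum(g) / len(g) for b, g in by_bucket.items()}
```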
19. Resolving Too Soon vs. Resolving Too Late
• How big is the potential audience for this
test?
• Example 1: end of year “most popular
baby names” email that will never be sent
again
• Example 2: Facebook signup flow
20. Longitudinal Tests vs. Immediate Tests
• Longitudinal: change home page, email
frequency, product framing
• Need to examine effect over a long period
• Immediate: change button color, email
subject
• Likely that long-term effects will be
minimal
21. Automatically Resolve Tests?
• Longitudinal tests should not be
automatically resolved
• Example: new home page design
• Immediate tests can be automatically
resolved when speed is important and
there is one clear objective function
• Example: Circle of Moms email subject
optimization
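
For an immediate test with one clear objective function, such as email open rate, automatic resolution might look like the sketch below. It uses a pooled two-proportion z-test. The function name, the 1% threshold, and the minimum sample size are assumptions for illustration, not the rule Circle of Moms used.

```python
import math
from statistics import NormalDist


def resolve(sent_a, opens_a, sent_b, opens_b, alpha=0.01, min_per_arm=1000):
    """Return "A" or "B" if one subject line clearly wins, else None.

    alpha and min_per_arm are illustrative defaults. Checking repeatedly
    as data arrives inflates false positives, so a stricter alpha (or a
    sequential testing method) is advisable for frequent checks.
    """
    if min(sent_a, sent_b) < min_per_arm:
        return None  # too little data to resolve
    rate_a, rate_b = opens_a / sent_a, opens_b / sent_b
    pooled = (opens_a + opens_b) / (sent_a + sent_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / sent_a + 1 / sent_b))
    if se == 0:
        return None  # no variation at all (e.g. zero opens on both sides)
    z = (rate_b - rate_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    if p_value >= alpha:
        return None  # keep the test running
    return "B" if z > 0 else "A"
```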
22. Choose Robust Statistics
• Bad: # of page views
• Good: % of users viewing at least [5, 25,
100] pages
• Potentially bad: # of sales (when small)
• Potentially good: # of people getting
through the second step of a sales funnel
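
A small sketch of why the thresholded metric is more robust than raw page views: one extreme user can dominate a mean but barely moves a "share of users at or above N pages" statistic. The data below is made up for illustration, and `share_at_least` is a hypothetical helper.

```python
from statistics import mean


def share_at_least(page_views, threshold):
    """Fraction of users who viewed at least `threshold` pages."""
    return sum(1 for views in page_views if views >= threshold) / len(page_views)


# Made-up per-user page view counts. The two buckets are identical except
# that one treatment user is an extreme outlier (e.g. a crawler).
control = [3, 4, 6, 2, 7, 5, 1, 8, 4, 6]
treatment = [3, 4, 6, 2, 7, 5, 1, 8, 4, 5000]

if __name__ == "__main__":
    print(mean(control), mean(treatment))
    print(share_at_least(control, 5), share_at_least(treatment, 5))
```

Here the mean page views differ by more than a hundredfold between buckets, while the share of users viewing at least 5 pages is the same in both.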