Since the initial boom of A/B testing’s popularity in the early 2000s, marketers have learned to apply actual science to marketing, taking much of the guesswork out of how to get more conversions or purchases. However, after running your first A/B test, you will most likely find yourself facing questions such as: what counts as a conclusive result, and what sample size is required?
2. Participation in this meetup is purely on a personal basis and does not represent any firm in any form or manner. The talk is based on learnings from work across industries and firms.
3. Agenda
o About A/B testing
o Pitfalls and Solutions
o Testing Ideas
o Experiment Template
4. What is an A/B test?
An A/B test is an experiment where two or more variants are shown to users at random, and statistical analysis is used to determine which variation performs better. It allows us to understand the causal impact of a change.
[Diagram: users are randomly split into a control group, measured on test metric p1, and a treatment group, measured on test metric p2. Is (p1 − p2) statistically significant?]
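The question in the diagram can be answered with a two-proportion z-test. Below is a minimal sketch using only the Python standard library; the function name and the conversion counts are hypothetical, chosen for illustration.

```python
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p1 = conv_a / n_a                            # control rate
    p2 = conv_b / n_b                            # treatment rate
    p_pool = (conv_a + conv_b) / (n_a + n_b)     # pooled rate under H0: p1 == p2
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p2 - p1) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
    return z, p_value

# Hypothetical example: 1,000 users per group,
# 100 conversions in control vs. 130 in treatment.
z, p = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")  # p < 0.05, significant at the 95% level
```

If the p-value falls below your chosen significance level (commonly 0.05), the difference (p1 − p2) is declared statistically significant.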
5. What is an A/B test?
Example: the Clinton Bush Haiti Fund donation page. The treatment produced an 11% increase in $ per pageview over the control.
Source: Siroker and Koomen 2013
6. When to do A/B testing

Method             | Description                     | Inference                        | Stage
-------------------|---------------------------------|----------------------------------|--------------------------
Prototyping        | Developing prototypes           | Guide a direction of the product | Ideation
User Testing       | In-depth interviews             | Understanding of why             | Ideation – Development
Surveys / Feedback | Surveys / pop-up questionnaires | Large numbers, thorough analysis | Development – Post-launch
A/B Testing        | Measuring experiments           | Statistical methods              | Pre-launch
10. Pitfall #1 Ignoring statistical significance
• Correlation is not causation.
• Do your improvements actually affect user behavior or are
the changes due to chance?
• At the end of the experiment do we just pick a variation
that has better metrics?
11. Pitfall #1 Ignoring statistical significance
o Statistical significance is the probability that an observed change is not due to chance alone
Source: Annie Ward, Mildred Murray-Ward 1999
12. Solution to Pitfall #1
Simple formula for the difference in proportions (Altman 1991):

n = 2 p̄(1 − p̄)(z_β + z_{α/2})² / (p1 − p2)²

where:
o n — the sample size in each group
o z_β — the desired power (e.g. 0.84 for 80% power)
o z_{α/2} — the desired level of statistical significance (e.g. 1.96 for 95%)
o p̄(1 − p̄) — a measure of variability, with p̄ = (p1 + p2)/2
o (p1 − p2) — the effect size (the difference in proportions)

Source: Altman 1991
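Altman’s formula is easy to compute directly. A minimal sketch in Python, with hypothetical baseline and target rates chosen for illustration:

```python
import math

def sample_size_per_group(p1, p2, z_alpha=1.96, z_beta=0.84):
    """Altman (1991): n = 2 * pbar * (1 - pbar) * (z_beta + z_alpha)^2 / (p1 - p2)^2.

    Defaults correspond to 95% significance (z_alpha = 1.96)
    and 80% power (z_beta = 0.84).
    """
    p_bar = (p1 + p2) / 2  # average of the two proportions
    n = 2 * p_bar * (1 - p_bar) * (z_beta + z_alpha) ** 2 / (p1 - p2) ** 2
    return math.ceil(n)   # round up: you cannot sample a fraction of a user

# e.g. to detect a lift from a 10% to a 12% conversion rate
print(sample_size_per_group(0.10, 0.12))  # -> 3838 users per group
```

Note how the required sample size grows quickly as the effect size (p1 − p2) shrinks, which is why small expected uplifts demand a lot of traffic.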
14. Pitfall #2 Not having a workflow for testing
o Choosing non-business-related metrics as proxies
o Doing too little analysis to understand current user behaviour
o Not formulating a hypothesis before testing
o Testing whether green buttons increase conversion rates
o Spending precious time and traffic on random ideas
15. Solution to Pitfall #2
[Diagram: a repeatable testing workflow — What is success? → Plan a hypothesis → Diagnose the funnel → Test → Measure results]
16. Pitfall #3 Not prioritizing the experimentation roadmap
o Taking risks that are too small (getting stuck at a local maximum)
o Impacting too few users
o Running experiments that don’t produce strategic value
o Not estimating designers’ or developers’ workload
o Time spent on coordinating experiments
17. Solution to Pitfall #3
[Diagram: an impact/effort matrix — high impact + low effort: do it!; low impact + high effort: forget it; high impact + high effort: do it if resources are available. Impact covers reach, uplift, and strategic value; effort covers coordination, tech, and creative work.]
18. Solution to Pitfall #3
Potential Importance Ease (PIE) Framework by Chris Goward
https://widerfunnel.com/pie-framework/
Time, Impact, Resources framework by Bryan Eisenberg
https://www.bryaneisenberg.com/3-steps-to-better-prioritization-and-faster-execution/
Impact Confidence Ease (ICE) Framework by Sean Ellis
https://tech.trello.com/ice-scoring/
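Any of these frameworks can be applied with a simple scoring sheet. Below is a minimal sketch of ICE scoring; the averaging variant and the backlog items are assumptions for illustration (some teams multiply the three ratings instead).

```python
def ice_score(impact, confidence, ease):
    """ICE score: average of three 1-10 ratings (one common variant)."""
    return (impact + confidence + ease) / 3

# Hypothetical experiment backlog, rated by the team
backlog = {
    "Shorten signup funnel": ice_score(9, 7, 4),
    "Green CTA button": ice_score(2, 5, 9),
    "New pricing page": ice_score(8, 6, 5),
}

# Run the highest-scoring ideas first
for idea, score in sorted(backlog.items(), key=lambda kv: -kv[1]):
    print(f"{score:.1f}  {idea}")
```

The same structure works for PIE (potential, importance, ease); only the rating dimensions change.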
19. Testing ideas to start
Website content. Do users prefer to scroll down the page or click through to
another page to learn more?
Headline copy. Do users prefer headlines that are straightforward, abstract,
goofy, or creative?
Media. Do users prefer to see auto-play or click-to-play video?
Funnels. Do users prefer to have more information on one page or the
information spread across multiple pages?
Social. What social proof do users need: brands you work with, testimonials from
other users, or influencers?
Pricing. Do users prefer a free trial or a money-back guarantee?
20. Experiment template
o What is success? — e.g. conversion rate
o Audience segment — e.g. free-trial users
o Qualitative — what user research insights support the decision to create an experiment
o Quantitative — what analytics data supports the decision to create an experiment
o Hypothesis — If ___ then ___ due to ___
o Proposed change — e.g. show FAQ page after registering
o Sample size and duration
o Results — What happened? Another experiment? Need to clean up?
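The experiment template lends itself to a structured record, so every test is documented the same way. A minimal sketch as a Python dataclass; all field names and the example values are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Experiment:
    success_metric: str        # What is success? e.g. conversion rate
    audience_segment: str      # e.g. free-trial users
    qualitative_evidence: str  # user research insights behind the experiment
    quantitative_evidence: str # analytics data behind the experiment
    hypothesis: str            # "If ___ then ___ due to ___"
    proposed_change: str
    sample_size: int           # per group, from a power calculation
    duration_days: int
    results: str = ""          # filled in after the experiment ends

# Hypothetical filled-in template
faq_test = Experiment(
    success_metric="conversion rate",
    audience_segment="free-trial users",
    qualitative_evidence="interviews show confusion right after registering",
    quantitative_evidence="large drop-off on the post-registration step",
    hypothesis="If we show the FAQ page after registering, "
               "then conversion will rise due to reduced confusion",
    proposed_change="Show FAQ page after registering",
    sample_size=3838,
    duration_days=14,
)
print(faq_test.hypothesis)
```

Keeping one such record per experiment makes the results section auditable and answers the "another experiment?" and "need to clean up?" questions from the template.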