A/B testing
Shlomo Lahav
The problem

Measuring the effect of multiple alternatives
on the performance over a given population.

Performance

A list of objective measurements

Possible solutions

• A model that describes the results and evaluates the marginal effect of the alternatives
• Test the alternatives side by side while everything else is equal

Example

• The problem: testing two different layouts of a web page (A and B)
• Population: visitors/visits
• Performance: conversion rate
• Alternatives: two different layouts
• Objective: to find the better layout and assess the performance difference

What does "all the rest being equal" mean?

• Fairness: every member of the population has the same probability of being allocated to A.
• For each member, any other decision is independent of the test allocation (A/B).
• Observations are independent.
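One common way to get a fair allocation is to hash a stable visitor identifier. The sketch below is illustrative only: the `allocate` function, the `experiment` salt, and the identifier format are assumptions, not something the slides specify.

```python
import hashlib

def allocate(visitor_id: str, experiment: str = "layout-test", share_a: float = 0.5) -> str:
    """Return "A" or "B" for a visitor, deterministically (hypothetical helper)."""
    # Salting with the experiment name keeps the allocation unrelated to
    # other decisions that might hash the same visitor identifier.
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    u = int.from_bytes(digest[:8], "big") / 2**64
    return "A" if u < share_a else "B"

# Every visitor has the same probability of landing in A, so the split
# should come out close to 50/50 over many visitors.
counts = {"A": 0, "B": 0}
for i in range(10_000):
    counts[allocate(f"visitor-{i}")] += 1
print(counts)
```

Because the result depends only on the identifier and the salt, the same visitor sees the same alternative on every visit.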

Population: Visitor vs. visit

Population | Measurement                      | Issues
Visitor    | Visit conversion rate            | Independence is violated
Visitor    | Lifetime conversions per visitor |
Visit      | Visit conversion rate            | A visitor may be exposed to both A and B (in different visits)

Errors

When we compare a test alternative to the control alternative:
• False positive – declaring the test the winner by mistake
• False negative – declaring the control the winner by mistake

When do we end the test?

• After a predefined period or number of observations.
• When the difference is significant
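For the first option, significance is checked once, at the predefined sample size. A minimal sketch of such a check is a pooled two-proportion z-test; the function name and the visitor counts below are made up for illustration and do not come from the slides.

```python
import math

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference CR(B) - CR(A)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 1,000 visitors per layout, 100 vs. 130 conversions.
z, p = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p-value = {p:.3f}")
```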

What does "all the rest being equal" mean?

• Fairness: every member of the population has the same probability of being allocated to A.
• For each member, any other decision is independent of the test allocation (A/B).
• Observations are independent.

Example

• We want to test two alternatives and
select the better one.
• The results are: CR(A) = 9.21%, CR(B) = 11.93%. The win of B is statistically significant (p-value < 5%).
• We need to estimate the gain of B vs. A.
• Is our estimate of 2.72 percentage points a fair estimate?

Results

        | p-value | Rate  | A      | B      | Gain B over A
Actual  |         |       | 10.00% | 11.00% | 1.00%
B wins  | 5%      | 92.5% | 9.21%  | 11.93% | 2.72%
A wins  | 5%      | 7.5%  | 13.71% | 7.61%  | -6.10%
B wins  | 1%      | 98.5% | 9.59%  | 11.43% | 1.84%
A wins  | 1%      | 1.5%  | 14.94% | 7.05%  | -7.89%
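The effect in the table can be reproduced with a small simulation: run many tests with the actual rates (10% and 11%), keep only the significant ones, and look at the gain they report. This is a rough sketch. The sample size per arm is not given on the slide, so `N_PER_ARM` is an assumption and the printed numbers will not match the table exactly.

```python
import math
import random

TRUE_A, TRUE_B = 0.10, 0.11   # "Actual" row of the table
N_PER_ARM = 500               # assumption: not stated on the slide
N_TESTS = 2000                # number of simulated A/B tests
ALPHA = 0.05

def simulate_arm(rng, rate, n):
    """Observed conversion rate of n independent visitors."""
    return sum(rng.random() < rate for _ in range(n)) / n

def p_value(cr_a, cr_b, n):
    """Two-sided pooled z-test p-value for equal-sized arms."""
    pooled = (cr_a + cr_b) / 2
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    if se == 0:
        return 1.0
    return math.erfc(abs(cr_b - cr_a) / se / math.sqrt(2))

def run(seed=1):
    rng = random.Random(seed)
    b_wins, a_wins = [], []
    for _ in range(N_TESTS):
        cr_a = simulate_arm(rng, TRUE_A, N_PER_ARM)
        cr_b = simulate_arm(rng, TRUE_B, N_PER_ARM)
        # Only significant tests are kept, as in the table.
        if p_value(cr_a, cr_b, N_PER_ARM) < ALPHA:
            (b_wins if cr_b > cr_a else a_wins).append(cr_b - cr_a)
    return b_wins, a_wins

b_wins, a_wins = run()
mean_gain_b = sum(b_wins) / len(b_wins)
print(f"significant tests: {len(b_wins) + len(a_wins)} of {N_TESTS}")
print(f"B wins in {len(b_wins)} of them, mean observed gain {mean_gain_b:.2%}")
print(f"true gain: {TRUE_B - TRUE_A:.2%}")
```

Conditioning on significance filters out the small observed differences, so the gain reported by a "winning" test overstates the true gain.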

Selection bias

• An A/B test is conducted between A1, A2, …, An
• After the test is completed, we select Ak.
• Should we expect Ak to perform as it did during the test?
• Does the test outcome (the rank of k) affect our expectation?
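A small simulation can illustrate why the rank matters. In this sketch all alternatives are assumed to share the same true conversion rate, so any "winner" is pure noise; the number of alternatives, the sample size, and the function names are illustrative assumptions.

```python
import random

def observed_rate(rng, true_rate, n):
    """Observed conversion rate of n independent visitors."""
    return sum(rng.random() < true_rate for _ in range(n)) / n

def winner_vs_retest(n_alternatives=5, true_rate=0.10, n=1000, repeats=300, seed=7):
    """Average rate of the selected alternative during the test and on fresh data."""
    rng = random.Random(seed)
    test_rates, retest_rates = [], []
    for _ in range(repeats):
        rates = [observed_rate(rng, true_rate, n) for _ in range(n_alternatives)]
        # Select the best-looking alternative Ak and record its test result.
        test_rates.append(max(rates))
        # Measure Ak again on new visitors (same true rate as before).
        retest_rates.append(observed_rate(rng, true_rate, n))
    return sum(test_rates) / repeats, sum(retest_rates) / repeats

during, after = winner_vs_retest()
print(f"winner during the test: {during:.2%}, same alternative afterwards: {after:.2%}")
```

The alternative ranked first tends to have been lucky during the test, so its later performance is usually lower than what the test showed.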

What else can go wrong?

• Independence is not maintained (traffic, changes, etc.)
• Fairness is handled by random allocation, which can be unbalanced due to chance
• The effective significance level is usually higher than planned (continuous evaluation), which results in a higher false-positive rate.
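The last point can be checked with an A/A simulation, where both arms have the same true rate and every "significant" result is a false positive. The sketch below compares stopping at the first significant look with testing once at the end; batch size, number of looks, and all names are assumptions chosen for illustration.

```python
import math
import random

def p_value(conv_a, conv_b, n):
    """Two-sided pooled z-test p-value for two arms of n visitors each."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    if se == 0:
        return 1.0
    return math.erfc(abs(conv_b - conv_a) / n / se / math.sqrt(2))

def false_positive_rates(rate=0.10, batch=200, looks=10, tests=500, alpha=0.05, seed=3):
    """Return (rate with continuous evaluation, rate with one final look)."""
    rng = random.Random(seed)
    peeking = fixed = 0
    for _ in range(tests):
        conv_a = conv_b = 0
        significant_at_any_look = False
        for look in range(1, looks + 1):
            # Both arms share the same true rate: there is no real difference.
            conv_a += sum(rng.random() < rate for _ in range(batch))
            conv_b += sum(rng.random() < rate for _ in range(batch))
            p = p_value(conv_a, conv_b, look * batch)
            if p < alpha:
                significant_at_any_look = True
        peeking += significant_at_any_look
        fixed += (p < alpha)  # only the final, predefined look counts here
    return peeking / tests, fixed / tests

peek_rate, fixed_rate = false_positive_rates()
print(f"false positives with continuous evaluation: {peek_rate:.1%}")
print(f"false positives with a single final look:   {fixed_rate:.1%}")
```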

How to control the traffic split?

• By percentage or round robin?
• Can we change the split?

Another example

• We need to test two design layouts in multiple locations, where each location has a different conversion rate.
• Different populations – use lifts and accumulate the lifts.
• How do we calculate the lift: A over B or B over A?

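Lifts

        | A   | B   | Lift B over A | Lift A over B
        | 8%  | 10% | 25%           | -20%
        | 10% | 8%  | -20%          | 25%
Average |     |     | 2.5%          | 2.5%

Both directions average to +2.5%, so each alternative appears to beat the other.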
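Computing the lifts directly from the two pairs of conversion rates shows the asymmetry: a relative lift of +25% is not undone by a lift of -20%, so the arithmetic mean comes out positive in both directions. The log-ratio at the end is one possible symmetric alternative; it is an illustration added here, not something the slides propose.

```python
import math

def lift(new: float, base: float) -> float:
    """Relative lift of `new` over `base`."""
    return new / base - 1

# (CR(A), CR(B)) for the two locations in the table.
locations = [(0.08, 0.10), (0.10, 0.08)]

b_over_a = [lift(b, a) for a, b in locations]   # about +25% and -20%
a_over_b = [lift(a, b) for a, b in locations]   # about -20% and +25%

mean_b_over_a = sum(b_over_a) / len(b_over_a)
mean_a_over_b = sum(a_over_b) / len(a_over_b)
print(f"average lift B over A: {mean_b_over_a:+.1%}")
print(f"average lift A over B: {mean_a_over_b:+.1%}")

# Log-ratios are symmetric: log(B/A) = -log(A/B), so the two directions
# can never both look like a win.
mean_log_b_over_a = sum(math.log(b / a) for a, b in locations) / len(locations)
print(f"average log-ratio B over A: {mean_log_b_over_a:+.4f}")
```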

Change in split - Simpson's paradox

          | CR(A) | CR(B)
New       | 6%    | 5%
Returning | 15%   | 14%

        | New | Returning | Traffic to A | Traffic to B | CR(A)  | CR(B)
Weekday | 80% | 20%       | 90%          | 10%          | 7.80%  | 6.80%
Weekend | 10% | 90%       | 50%          | 50%          | 14.10% | 13.10%
Total   |     |           |              |              | 10.05% | 12.05%
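The totals can be recomputed from the segment rates, the visitor mix, and the traffic split. The sketch below assumes equal overall traffic on weekdays and weekends; that assumption is not stated on the slide, but it reproduces the 10.05% and 12.05% totals. The data layout and function names are illustrative.

```python
# Conversion rate per alternative and visitor segment.
CR = {
    "A": {"new": 0.06, "returning": 0.15},
    "B": {"new": 0.05, "returning": 0.14},
}

# Visitor mix and A/B traffic split per period.
PERIODS = {
    "weekday": {"mix": {"new": 0.80, "returning": 0.20}, "split": {"A": 0.90, "B": 0.10}},
    "weekend": {"mix": {"new": 0.10, "returning": 0.90}, "split": {"A": 0.50, "B": 0.50}},
}

# Assumption: both periods carry the same total traffic.
VOLUME = {"weekday": 1.0, "weekend": 1.0}

def period_cr(alt: str, period: str) -> float:
    """Conversion rate of one alternative within one period."""
    mix = PERIODS[period]["mix"]
    return sum(mix[seg] * CR[alt][seg] for seg in mix)

def total_cr(alt: str) -> float:
    """Overall conversion rate, weighted by the traffic the alternative received."""
    visits = {p: VOLUME[p] * PERIODS[p]["split"][alt] for p in PERIODS}
    conversions = sum(visits[p] * period_cr(alt, p) for p in PERIODS)
    return conversions / sum(visits.values())

for alt in ("A", "B"):
    per_period = ", ".join(f"{p} {period_cr(alt, p):.2%}" for p in PERIODS)
    print(f"{alt}: {per_period}, total {total_cr(alt):.2%}")
```

A is better in every segment and in every period, yet B has the higher total, because the changed split sent most of A's traffic to the low-converting weekday mix.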

Can we remove alternatives

• Start with 3 alternatives (equal split)
• Remove one

[Figure: allocation charts labelled "start" and "modify"; only axis ticks survive]

Multiple tests

• Is it valid to run multiple AB tests
simultaneously?

How can A/B testing go wrong?
