Lean Experimentation
 How to leverage online experiments in research and practice


                         Thomas Høgenhaven
                        Twitter: @thogenhaven



                      Cornell IS Breakfast Talk
                         April 4th, 2012
Friday, April 6, 12
Agenda




                      1. Conducting Online Experiments
                      2. Experimentation Literature
                      3. Experimentation in SMEs and Government Today

                      4. Lean Experimentation




1                  Conducting Online Experiments




The Why Bother Question



                      “While some social scientists engage in small-scale
                      controlled experimentation with dozens of users or
                      groups, the capacity to perform large-scale interventions
                      with thousands of users opens up new opportunities for
                      research."


                                         (Preece and Shneiderman 2009: 25).




What I Mean By Online Experiments




                      In online experiments, we are interested in examining
                      online behavior, not just in using the internet as a means
                      to examine offline behavior.




What I Mean By Online Experiments


     Users

     Independent            Variation   Variation    Variation
     variable                  A            B            n

     Dependent               Online      Online       Online
     variable               Behavior    Behavior     Behavior

     Statistical                         Difference
     test

The High-Level Experimental Process




                                    Thomke 1998: 745.


Example: Experimentation At Microsoft



                      Guess which one performs better, in each of these 8 pairs.




                  Anyone who gets 6/8 right
                  wins a t-shirt




Experimenting At Microsoft



                      [Figure: eight numbered pairs (1-8) of design variants, A and B]

                      Which one is significantly better?
                      [] A
                      [] B
                      [] None of them


                                Kohavi et al. (2009): Online Experimentation at Microsoft


Experimenting At Microsoft
                      0 of 200 Microsoft employees
                      got more than 5 of 8 answers right

                      [Figure: the eight A/B pairs, numbered 1-8]

                              Kohavi et al. (2009): Online Experimentation at Microsoft
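
As a rough baseline for the quiz result above, the following sketch computes how often pure guessing would yield at least 6 of 8 correct answers. It assumes (this is not stated in the talk) that a guesser picks uniformly among the three options A, B, and "None of them", and that the 8 pairs are independent. The helper name prob_at_least is hypothetical.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """Probability of at least k successes in n independent trials (binomial tail)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Assumption: three equally likely answers per pair, so p = 1/3 per pair.
    p_guess = prob_at_least(6, 8, 1 / 3)
    print(f"P(at least 6 of 8 by guessing) = {p_guess:.4f}")
    print(f"Expected such guessers among 200 people = {200 * p_guess:.1f}")
```

Under these assumptions the tail probability is 129/6561, roughly 2%, so a high score is rare even for random guessers. The sketch says nothing about how the employees actually answered.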

What Is The Effect Of Experiments?


                       Improvement       33%
                       No Effect         33%
                       Disimprovement    33%


                                 Kohavi et al. (2009): Online Experimentation at Microsoft

Is That Just Microsoft Being Microsoft?



                      No. Estimating the effects of changes is incredibly hard.




                                           Netflix considers 90% of what they try to
                                           be wrong.




It’s Actually Hard To Predict




                                https://whichtestwon.com/past-tests


2                  Experimental Literature




Current Experimental Framework in HCI


                                      Psychology &
                                    Social Psychology

                            Experimental methodology
                                   literature




                                      HCI


Offline And Online Experiments




                      • Psychology literature sometimes uses the internet to study
                      human behavior

                      • But it does not use the internet to study the internet




For example...




                                     No mentions of experimentation
                                     in online environments




                      2010

Offline And Online Experiments




                                  Laboratory   Field

                      Offline                             <- Psychology covers this

                      Online                              <- But not this




The Existing Research Is Not Systematic




        "To the extent of our knowledge, no research has so far been
        reported on treating online test design and implementation in a
        systematic manner"


                                              (Cámara and Kobsa 2009: 18).




Online Experiments In Academia




            CHI and CSCW use experiments all the time, but more could be
            invested in the methodology literature.


            This would help explore the possibilities and limitations of online
            experimentation.




3                  Experimentation In SMEs And
                       Government Agencies Today




State Of The Art In Industry Today




                      • Experimentation is increasing

                      • At least 25 different software vendors
                        • $0 - $320,000 a year*




                                             *Source: whichmvt.com

Practice Has Its Own Literature




Website Experiments


                      Several ways to conduct experiments
                      1. Server-side / Client-side
                      2. A/B Test / Multivariate Test




Not Overly Expensive Software


               Google Website Optimizer   (free)
               Visual Website Optimizer   ($600 - $3000 / year)

               (Just 2 out of 25+ vendors)



A/B/n Experiment


     Users                    Javascript

     Independent       Webpage    Webpage      Webpage
     variable            A          B            n

     Dependent         Behavior   Behavior     Behavior
     variable

     Statistical test              Difference
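
The split of users into webpage variations A, B, ... n shown above can be sketched as follows. This is a minimal illustration, assuming a stable user identifier is available (for example from a cookie). The function name assign_variant, the experiment name, and the hashing scheme are assumptions for illustration, not how Google Website Optimizer or any other vendor assigns visitors.

```python
import hashlib
from typing import Sequence


def assign_variant(user_id: str, experiment: str, variants: Sequence[str]) -> str:
    """Map a user to one variant so that repeat visits see the same page."""
    # Hashing experiment + user id gives a stable, roughly uniform bucket.
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


if __name__ == "__main__":
    variants = ["A", "B", "C"]
    counts = {name: 0 for name in variants}
    for i in range(9000):
        counts[assign_variant(f"user-{i}", "signup-button", variants)] += 1
    print(counts)  # group sizes should be roughly equal
```

Hashing rather than drawing a fresh random number on each visit keeps a returning user in the same group, which matters for a between-subjects comparison.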


Google Website Optimizer




Limitations Of Mainstream Experimental Software




                      1. Limited to between-subjects design

                      2. Lack of data export

                      3. No control over statistical test

                      4. Expensive coding necessary




Limitation 1: Limited To Between-Subjects Design




                         • Cannot control for individual differences (No such data
                         is collected / made available)

                         • Requires more experimental subjects

                         • No pre-experimental data is collected




Limitation 2: Lack of Data Export




Google Website Optimizer: Data Export




Visual Website Optimizer




Visual Website Optimizer: Data Export




Software Limitations: Data Export



                      • Some software is better than others

                      • No data on individual users

                      • No segmentation on background variables
                        • This might be the biggest problem, as this is where
                        many significant results lie.




Limitation 3: No Choice Between Statistical Tests




                                     Okay?




Statistical Test = Chance To Beat Original

                      “The chance to beat original ... displays the
                      probability that a combination will be more
                      successful than the original version.


                      When numbers in this column are high, perhaps
                      around 95%, that means a given combination is
                      probably a good candidate to replace your
                      original content.


                      Low numbers in this column mean that the
                      corresponding combination is a poor candidate
                      for replacement.”



                      http://support.google.com/websiteoptimizer/bin/answer.py?hl=en&answer=55944

Visual Website Optimizer Is More Transparent




                        “Visual Website Optimizer uses z-tests for both A/B
                        tests and multivariate tests”


                        Standard Error (SE) = Square root of (p * (1-p) / n)




   http://visualwebsiteoptimizer.com/split-testing-blog/what-you-really-need-to-know-about-mathematics-of-ab-split-testing/
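
The standard error formula quoted above can be turned into a small z-test for comparing two conversion rates. This is a sketch under stated assumptions: the two per-variation standard errors are combined without pooling, and a one-sided p-value is reported. Whether Visual Website Optimizer computes its statistic in exactly this way is not confirmed by the slide, and the function names are hypothetical.

```python
import math


def standard_error(p: float, n: int) -> float:
    """SE of a conversion rate, as on the slide: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1.0 - p) / n)


def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, one-sided p-value) for 'B converts better than A'."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Assumption: unpooled standard errors combined in quadrature.
    se_diff = math.sqrt(standard_error(p_a, n_a) ** 2 + standard_error(p_b, n_b) ** 2)
    if se_diff == 0.0:
        raise ValueError("standard error is zero; the z-statistic is undefined")
    z = (p_b - p_a) / se_diff
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: 100 of 1000 visitors convert on A, 130 of 1000 on B.
    z, p_value = two_proportion_z(100, 1000, 130, 1000)
    print(f"z = {z:.2f}, one-sided p = {p_value:.4f}")
```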
z-tests


                                                                 We don’t know if the
                                                                  data fits this




                      • Focus on a single parameter
                      • Assumes parametric assumptions are met




Limitation 4: Coding Required

     Users                      Javascript                  <- has to be coded

     Independent         Webpage    Webpage      Webpage
     variable              A          B            n

     Dependent           Behavior   Behavior     Behavior
     variable

     Statistical test                Difference


Software Limitations: Expensive Coding




                              We already coded it, so we
                              might as well keep it. I hate
                               working for no reason.




Software Limitations: Expensive Coding




                       I knew this wouldn’t work!
                          We should never have
                          spent resources on it...




The Challenge




                      1. Overcome methodological limitations of experimental
                      software
                      2. Reduce development costs
                      3. Explore possibilities and limitations of online experimentation




4                  Lean Experimentation




Test Environment


     Users

     Independent         Proxy         Proxy         Proxy
     variable              A             B             n

     Dependent         Behavior on   Behavior on   Behavior on
     variable            website       website       website

     Statistical test                 Difference


Proxies For Experimentation

                      Website                       Email




                      Survey                         Ads




Comparative Advantages And Disadvantages




Lean Experimentation Principles




                      1. Test assumptions, ideas, and theories

                      2. Test before coding, not after

                      3. Test in the field




1. Test Assumptions, Ideas, And Theories




2. Test Before Coding, Not After



                      Ideas                                  Bad Idea

                                                            Good Idea



                      Experimentation




                      Implementation



3. Test In The Field




                      • Identical design patterns have different effects in different
                      contexts
                        • E.g. social comparison information in competitive
                        versus cooperative communities

                      • Cocktail effects are largely unknown




Requirements Of Lean Experimentation




                      1. Independent groups

                      2. Random assignment

                      3. Allows tracking
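
The three requirements above can be sketched for an email proxy as follows. This is a minimal illustration under assumptions: recipients are shuffled once and dealt into non-overlapping groups, and each group gets its own tagged link so later behavior on the website can be attributed to it. The function names, the example addresses, and the use of utm_campaign / utm_content query parameters are assumptions, not part of the talk.

```python
import random
from typing import Dict, List, Optional, Sequence
from urllib.parse import urlencode


def randomize_groups(recipients: Sequence[str], group_names: Sequence[str],
                     seed: Optional[int] = None) -> Dict[str, List[str]]:
    """Randomly split recipients into independent groups of near-equal size."""
    rng = random.Random(seed)
    shuffled = list(recipients)
    rng.shuffle(shuffled)  # random assignment
    groups: Dict[str, List[str]] = {name: [] for name in group_names}
    for index, recipient in enumerate(shuffled):
        # Each recipient lands in exactly one group (independent groups).
        groups[group_names[index % len(group_names)]].append(recipient)
    return groups


def tracking_url(base_url: str, experiment: str, group: str) -> str:
    """Build a link that identifies the experiment group (allows tracking)."""
    query = urlencode({"utm_campaign": experiment, "utm_content": group})
    return f"{base_url}?{query}"


if __name__ == "__main__":
    people = [f"user{i}@example.com" for i in range(10)]
    for name, members in randomize_groups(people, ["A", "B"], seed=1).items():
        print(name, len(members), tracking_url("https://example.com/landing", "exp1", name))
```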




Why Use Proxies For Experimentation?




Test Environment




                      • Manipulates the independent variable through a proxy
                      • Examines the dependent variable in its natural field environment




Test Subjects




                      • Existing users (when using website, email, and survey)
                      • Potential users (when using advertisements)




Proposed Usage And Limitations




                Good for                 Less suited for
                • Ideas                  • Small changes
                • Theories               • Graphical changes
                • Hypotheses
                • Features
                                         (Can be useful if testing assumptions)




Data Output


                      • Mixed sources that need to be combined
                        • Open / CTR rates from proxy
                        • Web analytics
                        • SQL databases
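
Combining the mixed sources listed above could look like the following sketch. It assumes that every source can be keyed on a shared user identifier, which may not hold in practice, and it uses plain Python containers as stand-ins for the proxy log, the web analytics export, and the database query result. All names are hypothetical.

```python
from typing import Dict, Set


def combine_sources(assignments: Dict[str, str], clicked: Set[str],
                    pageviews: Dict[str, int], converted: Set[str]) -> Dict[str, Dict[str, int]]:
    """Aggregate per-group totals from three separately collected sources.

    assignments: user id -> experiment group
    clicked:     user ids that clicked in the proxy (e.g. the email)
    pageviews:   user id -> page views from web analytics
    converted:   user ids found in the database query (e.g. sign-ups)
    """
    summary: Dict[str, Dict[str, int]] = {}
    for user_id, group in assignments.items():
        row = summary.setdefault(
            group, {"users": 0, "clicked": 0, "pageviews": 0, "converted": 0}
        )
        row["users"] += 1
        row["clicked"] += int(user_id in clicked)
        row["pageviews"] += pageviews.get(user_id, 0)
        row["converted"] += int(user_id in converted)
    return summary


if __name__ == "__main__":
    result = combine_sources(
        assignments={"u1": "A", "u2": "A", "u3": "B"},
        clicked={"u1", "u3"},
        pageviews={"u1": 3, "u3": 5},
        converted={"u3"},
    )
    for group, row in sorted(result.items()):
        print(group, row)
```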




Durability Of Proxy Experiments Is Short

[Chart: email experiment, Control vs. Experimentation groups, weekly from Wk0 to Wk3; y-axis from 0 to 16]



Buy In Needed

                      Hard to sell




                                      1. Making changes on websites

                                      2. Sending Emails

                                      3. Conducting Surveys

                                      4. Running Ads




                       Easy to sell

Feedback Quality

               Critical feedback




                                1. Wireframes / early stage development



                                2. Finished / Nearly finished stages




                  Not so critical
                    feedback

Influence On Decisions




                      Increased likelihood of impact when getting
                      experimental effect data early



