Experimental Design
Sergei Vassilvitskii
Columbia University
Computational Social Science
April 5, 2013
Thursday, April 25, 13
Measurement
“Half the money I spend on advertising is wasted; the trouble is, I don’t know which half.”
- John Wanamaker, 1875
Helping John:
Idea 1: Measure the final effect:
– Track total store sales and compare them to the advertising budget
Findings:
– Total sales are typically higher after intense advertising
Problems:
– Stores advertise when people already tend to spend:
• Christmas shopping periods
• Travel during the summer
• Ski gear in winter, etc.
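A toy calculation shows how seasonality alone can make ad spend and sales look tightly linked. The weekly numbers are hypothetical. Ads have zero causal effect on sales by construction, yet the correlation across the whole year is close to 1. Holding the season fixed by looking only at off-peak weeks drives it to zero.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical year of weekly data; weeks 49-52 are the holiday peak.
weeks = list(range(1, 53))
peak = [1 if w >= 49 else 0 for w in weeks]

# The store spends far more on ads in the peak, plus some week-to-week variation.
ad_budget = [10_000 + 40_000 * p + 2_000 * (w % 4) for w, p in zip(weeks, peak)]
# Sales depend ONLY on the season: by construction, ads have zero causal effect.
sales = [200_000 + 150_000 * p + 1_000 * (w % 3) for w, p in zip(weeks, peak)]

overall = pearson(ad_budget, sales)

# Hold the season fixed by looking at off-peak weeks only.
off_peak = [i for i, p in enumerate(peak) if p == 0]
within_season = pearson([ad_budget[i] for i in off_peak], [sales[i] for i in off_peak])

print(f"all weeks:     r = {overall:.2f}")
print(f"off-peak only: r = {within_season:.2f}")
```

Comparing "high-ad" and "low-ad" periods mostly compares peak and off-peak weeks. This is exactly the problem listed on the slide.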
Correlation vs. Causation
Idea 1
Within-subject pre-test, post-test design.
Idea 2
“Measuring the online sales impact of an online ad or a paid-search campaign -- in which a company pays to have its link appear at the top of a page of search results -- is straightforward: We determine who has viewed the ad, then compare online purchases made by those who have and those who have not seen it.”
– Magid Abraham, CEO, President & Co-Founder of ComScore, in an HBR article (2008)
Idea 2
Measure the difference between people who see ads and people who don’t.
Findings:
– People who see the ads are more likely to react to them
Problems:
– Ads are finely targeted: the people who see them are exactly the people who are likely to click!
– Don’t advertise cars in fashion magazines.
– Online it is even more extreme: which ads are shown depends on the user’s propensity to click on the ad.
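A small simulation makes the targeting problem concrete. It is a sketch under invented assumptions:

- Each hypothetical user has a latent purchase propensity.
- The ad has zero true effect.
- A hypothetical targeting rule shows the ad only to users with propensity above 0.7.

Comparing exposed and unexposed users, as Idea 2 proposes, still reports a large "lift".

```python
import random

rng = random.Random(0)

TRUE_AD_EFFECT = 0.0  # by construction, the ad does nothing


def buys(propensity: float, saw_ad: bool) -> bool:
    """Purchase probability depends on the user's propensity (plus any ad effect)."""
    p = 0.2 * propensity + (TRUE_AD_EFFECT if saw_ad else 0.0)
    return rng.random() < p


# Hypothetical population: each user has a latent propensity in [0, 1).
users = [rng.random() for _ in range(100_000)]

# Targeting: the ad is shown only to the users most likely to buy anyway.
shown = [u > 0.7 for u in users]
outcomes = [buys(u, s) for u, s in zip(users, shown)]


def purchase_rate(exposed: bool) -> float:
    group = [o for o, s in zip(outcomes, shown) if s == exposed]
    return sum(group) / len(group)


naive_lift = purchase_rate(True) - purchase_rate(False)
print(f"exposed:      {purchase_rate(True):.3f}")
print(f"unexposed:    {purchase_rate(False):.3f}")
print(f"naive 'lift': {naive_lift:.3f}  (true effect: {TRUE_AD_EFFECT})")
```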
Idea 3
Matching:
– Compare people who saw an ad with people who did not see it but are otherwise “the same.”
Problems:
– Hard to define “the same.” Beware of lurking variables.
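Exact matching can be sketched as a stratified comparison. The records below are invented for illustration. A single covariate, `segment` (heavy vs. light site visitors), drives both ad exposure and purchasing, and the ad has no effect. The naive comparison suggests a lift of about 10 percentage points, while comparing within segments recovers zero. This only works if the matched covariate captures every lurking variable, which is the catch the slide warns about.

```python
from collections import defaultdict


def rows(segment, saw_ad, purchased, count):
    return [(segment, saw_ad, purchased)] * count


# Hypothetical data. Heavy visitors buy more AND see the ad more often;
# within each segment the ad has no effect (same purchase rate either way).
records = (
    rows("heavy", 1, 1, 60) + rows("heavy", 1, 0, 240)     # 20% of 300 exposed
    + rows("heavy", 0, 1, 20) + rows("heavy", 0, 0, 80)    # 20% of 100 unexposed
    + rows("light", 1, 1, 5) + rows("light", 1, 0, 95)     # 5% of 100 exposed
    + rows("light", 0, 1, 75) + rows("light", 0, 0, 1425)  # 5% of 1500 unexposed
)


def mean(xs):
    return sum(xs) / len(xs)


def naive_difference(data):
    """Exposed purchase rate minus unexposed purchase rate, ignoring segment."""
    return mean([y for _, t, y in data if t]) - mean([y for _, t, y in data if not t])


def matched_difference(data):
    """Exact matching on `segment`: compare exposed vs. unexposed within each
    segment, then weight each segment's difference by its number of exposed users."""
    strata = defaultdict(lambda: {0: [], 1: []})
    for segment, t, y in data:
        strata[segment][t].append(y)
    usable = [g for g in strata.values() if g[0] and g[1]]  # need users on both sides
    weights = [len(g[1]) for g in usable]
    diffs = [mean(g[1]) - mean(g[0]) for g in usable]
    return sum(w * d for w, d in zip(weights, diffs)) / sum(weights)


print(f"naive difference:   {naive_difference(records):.3f}")
print(f"matched difference: {matched_difference(records):.3f}")
```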
Case Study: Ad Wear-out
What is the optimal number of times to show an ad?
Few:
– Don’t want the user to be annoyed
– No need to waste money if the ad is ineffective
Many:
– Make sure the user sees it
– Reinforce the message
Observational Study
Look through the data:
– Find the users who saw the ad once
– Find the users who saw the ad many times
Measure revenue for the two sets of users.
Conclusion: Limit the number of impressions
Correlations
Why did some users see the ad only once?
– They must use the web differently
– Users who saw it once: sign on once a week to check email
– Users who saw it many times: are always online
Correct conclusion:
– People who visit the homepage often are unlikely to click on ads
– We have not measured the effect of wear-out
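The wear-out trap can be reproduced with a few lines of arithmetic. The numbers are hypothetical:

- There are two user types, and each has a constant per-impression click probability, so there is no wear-out at all.
- The always-online users see the ad ten times and click less.

Pooling all users, the CTR of the first impression is much higher than that of later impressions. That looks exactly like wear-out.

```python
# Hypothetical user types. Each has a CONSTANT per-impression click probability,
# so by construction there is no wear-out at all.
USER_TYPES = {
    # name: (number of users, impressions each, click probability per impression)
    "checks email weekly": (9_000, 1, 0.020),
    "always online": (1_000, 10, 0.005),
}


def ctr_of_impression(j: int) -> float:
    """Expected CTR of the j-th impression, pooled over all users who received one."""
    clicks = impressions = 0.0
    for n_users, k, p in USER_TYPES.values():
        if k >= j:  # only users who saw the ad at least j times contribute
            impressions += n_users
            clicks += n_users * p
    return clicks / impressions


curve = [ctr_of_impression(j) for j in range(1, 11)]
for j, ctr in enumerate(curve, start=1):
    print(f"impression {j:2d}: CTR = {ctr:.4f}")
```

The pooled curve drops from 1.85% to 0.5% after the first impression even though no individual user's behavior changes. The later impressions simply come from a different kind of user.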
Simpson’s Paradox
Kidney stones [real data].
You have kidney stones. There are two treatments, A and B.
– Empirically, treatment A is effective 78% of the time
– Empirically, treatment B is effective 83% of the time
– Which one do you choose?
Digging into the data you see:
If the stones are large:
– Treatment A is effective 73% of the time
– Treatment B is effective 69% of the time
If the stones are small:
– Treatment A is effective 93% of the time
– Treatment B is effective 87% of the time
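Simpson’s Paradox Summary Stats
            Treatment A      Treatment B
Small       81/87 (93%)      234/270 (87%)
Large       192/263 (73%)    55/80 (69%)
Combined    273/350 (78%)    289/350 (83%)
Treatment A is better for both small and large stones, yet worse overall. A was given mostly for large stones (263 of its 350 cases), which are harder to treat with either method, while B was given mostly for small ones (270 of 350). Stone size is the lurking variable.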
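The reversal in the summary table can be checked directly with exact fractions. This recomputes the success rates from the counts in the table; nothing beyond those counts is assumed.

```python
from fractions import Fraction

# (successes, patients) for each treatment and stone size, from the summary table.
DATA = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def success_rate(treatment: str, size: str) -> Fraction:
    successes, patients = DATA[treatment][size]
    return Fraction(successes, patients)


def combined_rate(treatment: str) -> Fraction:
    successes = sum(s for s, _ in DATA[treatment].values())
    patients = sum(p for _, p in DATA[treatment].values())
    return Fraction(successes, patients)


for size in ("small", "large"):
    a, b = success_rate("A", size), success_rate("B", size)
    print(f"{size:>8}: A {float(a):.0%}  B {float(b):.0%}  A better? {a > b}")

a, b = combined_rate("A"), combined_rate("B")
print(f"combined: A {float(a):.0%}  B {float(b):.0%}  A better? {a > b}")
```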
Idea 3
Matching:
– Compare people who saw an ad with people who did not see it but are otherwise “the same.”
Problems:
– Hard to define “the same.” Beware of lurking variables.
– Simpson’s Paradox
Getting at Causation
Randomized, controlled experiments:
– Select a target population
– Randomly decide whom to show the ad
– Subjects cannot influence whether they are in the treatment or control group
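Randomization can be illustrated with the same kind of toy population. The assumptions are invented: each user has a latent propensity, the true ad effect is 1 percentage point, and there are 200,000 users. Because the coin flip ignores propensity, the simple difference in means becomes an unbiased estimate of the true effect.

```python
import random

rng = random.Random(1)

TRUE_AD_EFFECT = 0.01  # hypothetical: the ad adds 1 percentage point to purchase probability


def buys(propensity: float, saw_ad: bool) -> bool:
    p = 0.2 * propensity + (TRUE_AD_EFFECT if saw_ad else 0.0)
    return rng.random() < p


# Hypothetical population with a latent propensity to buy.
users = [rng.random() for _ in range(200_000)]

# Random assignment: a fair coin that ignores the user's propensity entirely.
treated = [rng.random() < 0.5 for _ in users]
outcomes = [buys(u, t) for u, t in zip(users, treated)]


def mean(xs):
    return sum(xs) / len(xs)


estimate = mean([o for o, t in zip(outcomes, treated) if t]) - mean(
    [o for o, t in zip(outcomes, treated) if not t]
)
print(f"estimated effect: {estimate:.4f}  (true effect: {TRUE_AD_EFFECT})")
```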
Measuring Wear Out
Parallel Universe: [figure: Control vs. Treatment]
Creating Parallel Universes
When a user first arrives:
– Check the browser cookie and assign the user to the control or treatment group
– Control group: shown a PSA (public service announcement)
– Treatment group: shown the ad
– Treatment is the same on repeated visits
Advertising effects:
– Positive!
– But smaller than reported by observational studies
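One common way to make the assignment stick across visits is to derive it from a hash of the cookie rather than flipping a fresh coin. This is a sketch, not necessarily how the experiment described here was implemented. The experiment name `ad-wearout`, the 50/50 split, and the helper names are hypothetical.

```python
import hashlib


def assign(cookie_id: str, experiment: str = "ad-wearout", treatment_share: float = 0.5) -> str:
    """Deterministically map a browser cookie to 'control' or 'treatment'.

    Hashing (experiment, cookie) instead of flipping a new coin on every visit
    means a returning user always lands in the same group.
    """
    digest = hashlib.sha256(f"{experiment}:{cookie_id}".encode("utf-8")).digest()
    u = int.from_bytes(digest[:8], "big") / 2**64  # approximately uniform between 0 and 1
    return "treatment" if u < treatment_share else "control"


def creative_for(cookie_id: str) -> str:
    """Control users see a public service announcement, treatment users see the ad."""
    return "AD" if assign(cookie_id) == "treatment" else "PSA"


# The same cookie gets the same creative on every visit.
for visit in range(3):
    print(visit, creative_for("cookie-123"))

# Across many cookies, the split is close to 50/50.
arms = [assign(f"user-{i}") for i in range(100_000)]
print("treatment share:", arms.count("treatment") / len(arms))
```

Including the experiment name in the hash keeps assignments independent across experiments that share the same cookies.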
Online Experiments
Advantages:
– Can reach tens of millions of people!
• Can estimate very small effects. Lewis et al., "Here, There, and Everywhere: Correlated Online Behaviors Can Lead to Overestimates of the Effects of Advertising." (WWW 2011) estimate effects of 0.01%!
– Can be relatively cheap (Mechanical Turk)
– Can recruit diverse subjects
• “20 students in a large Midwestern university.” Try to avoid subjects from WEIRD societies (Western, Educated, Industrialized, Rich, and Democratic).
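A rough power calculation shows why detecting effects this small takes millions of users. The sketch uses the standard normal-approximation formula for comparing two proportions, n per arm ≈ (z_{1-alpha/2} + z_{1-beta})^2 * 2p(1-p) / delta^2. The inputs are assumptions for illustration, not figures from Lewis et al.: a baseline rate p = 0.1%, and 0.01% read as an absolute lift delta = 0.0001.

```python
from math import ceil
from statistics import NormalDist


def users_per_arm(baseline: float, lift: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users needed per arm for a two-sided test of a difference in
    proportions (normal approximation, baseline variance p(1 - p) in both arms)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    variance = 2 * baseline * (1 - baseline)
    return ceil((z_alpha + z_beta) ** 2 * variance / lift**2)


# Hypothetical inputs: 0.1% baseline conversion rate, absolute lift of 0.01 points.
n = users_per_arm(baseline=0.001, lift=0.0001)
print(f"{n:,} users per arm")
```

With these inputs the result is roughly 1.6 million users per arm. Because n scales with 1/delta^2, halving the detectable lift quadruples the required sample.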
WEIRD People
Which line is longer? [figure]
– Henrich, Joseph; Heine, Steven J.; Norenzayan, Ara (2010): The weirdest people in the world?, Working Paper Series des Rates für Sozial- und Wirtschaftsdaten
Online Experiments
Advantages (continued):
– Access: subjects in other countries, geographically diverse
– Can be quick
Challenges:
– Limited choice in range of treatments (no MRI studies)
– Do people behave differently offline?
External Validity
A major challenge in all lab experiments:
– Virtual and physical labs
– Do findings hold outside the lab?
Enter:
– Natural experiments
Natural Experiments
The experimental condition:
– Is not decided by the experimenter
– But is exogenous (subjects cannot influence which condition they end up in)
Case Study: Ad Wear-out
Back to ad wear-out.
Natural experiment:
– When there were two competing campaigns, the Yahoo! ad server decided which campaign to show at random!
– This was by engineering design: both campaigns got an equal share of pageviews (less complex and easier to distribute than a round-robin system).
Experiments:
– Compare the behavior of people who saw the same total number of ads, but a different number of each campaign.
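The natural experiment can be simulated to show why it works. Everything here is a hypothetical model:

- Heavy users get 20 pageviews and a low base CTR; light users get 5 and a high one.
- The server flips a fair coin between campaigns A and B on every pageview.
- Campaign A's CTR decays by a factor of 0.6 with each prior exposure.

Restricting to users with the same total number of ads (20) removes the heavy/light confound. The random split between campaigns then isolates wear-out.

```python
import random
from collections import defaultdict

rng = random.Random(7)
DECAY = 0.6  # hypothetical wear-out: each prior exposure to A multiplies its CTR by 0.6


def simulate_user():
    """One user under the random ad server. Heavy users visit more and click less
    (the confounder from the observational study)."""
    heavy = rng.random() < 0.5
    pageviews = 20 if heavy else 5
    base_ctr = 0.05 if heavy else 0.20
    seen_a = clicks_a = 0
    for _ in range(pageviews):
        if rng.random() < 0.5:  # the server picks campaign A or B by coin flip
            clicks_a += rng.random() < base_ctr * DECAY ** seen_a
            seen_a += 1
    return pageviews, seen_a, clicks_a


users = [simulate_user() for _ in range(80_000)]

# Hold the TOTAL number of ads fixed (20 pageviews) and compare users who,
# purely by chance, saw campaign A fewer or more times.
impressions = defaultdict(int)
clicks = defaultdict(int)
for pageviews, seen_a, clicks_a in users:
    if pageviews == 20 and seen_a > 0:
        bucket = "few A (<=8)" if seen_a <= 8 else "many A (>=12)" if seen_a >= 12 else "middle"
        impressions[bucket] += seen_a
        clicks[bucket] += clicks_a

ctr = {b: clicks[b] / impressions[b] for b in impressions}
for b in ("few A (<=8)", "middle", "many A (>=12)"):
    print(f"{b:>14}: CTR per A impression = {ctr[b]:.4f}")
```

As a sanity check, setting `DECAY = 1.0` (no wear-out) should make the three buckets' CTRs roughly equal.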
Case Study: Ad Wear-out
Is there wear-out?
Yes:
– Some advertisements see a 5x drop in click-through rate after the first exposure
– These typically have very high click-through rates
No:
– Others see no decrease in click-through rate even after ten exposures
– These have lower, but steady, click-through rates
Case Study 2: Yelp
Does a higher Yelp rating lead to higher revenue?
How to do the experiment?
– Observational: no causality.
– Controlled: would require deception.
– Natural?
Natural experiment:
– Yelp rounds ratings to the nearest half star.
– 4.24 becomes 4 stars; 4.26 becomes 4.5 stars
Data:
– Raw ratings from Yelp
– Restaurant revenue (from tax records)
– Finding: a one-star increase leads to a 5-9% increase in revenue.
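The rounding rule creates a regression discontinuity at 4.25. The sketch below assumes ties round up. The synthetic restaurants are invented: raw ratings run from 4.15 to 4.35, and revenue depends only on the displayed stars, with a hypothetical 3.5% jump per half star. The estimator is a plain difference of means within a bandwidth. Real analyses typically fit a local linear regression on each side of the cutoff so that trends in the raw rating do not bias the jump.

```python
import math


def displayed_stars(raw: float) -> float:
    """Round a raw average rating to the nearest half star, rounding ties up.
    (Python's round() rounds ties to even, so floor(x + 0.5) is used instead.)"""
    return math.floor(raw * 2 + 0.5) / 2


def rd_estimate(restaurants, cutoff: float, bandwidth: float) -> float:
    """Naive regression-discontinuity estimate: mean revenue just above the
    cutoff minus mean revenue just below it, within +/- bandwidth."""
    above = [rev for raw, rev in restaurants if cutoff <= raw < cutoff + bandwidth]
    below = [rev for raw, rev in restaurants if cutoff - bandwidth <= raw < cutoff]
    return sum(above) / len(above) - sum(below) / len(below)


# Hypothetical restaurants with raw ratings 4.15, 4.16, ..., 4.35. Revenue depends
# only on the DISPLAYED rating: +3.5% per half star on a base of 1,000.
raw_ratings = [(415 + i) / 100 for i in range(21)]
restaurants = [(r, 1000 + 70 * (displayed_stars(r) - 4.0)) for r in raw_ratings]

print(displayed_stars(4.24), displayed_stars(4.26))  # 4.0 4.5
print(f"jump at 4.25: {rd_estimate(restaurants, cutoff=4.25, bandwidth=0.105):.1f}")
```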
Case Study 3: Badges
How do badges influence user behavior?
Specifically:
– The “epic” badge on Stack Overflow.
– Awarded after hitting the maximum number of points (through posts, responses, etc.) on 50 distinct days.
Experimental design:
– Within-subject pre-post test (again)
– Look at user behavior before and after receiving the badge
– Average over different users and different timings, so that (hopefully) all other factors average out.
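The within-subject pre/post design amounts to two steps: re-index each user's activity by days relative to their own badge date, then average across users. The data structures, user names, dates, and activity levels below are hypothetical. The numbers are invented and say nothing about the actual findings.

```python
from datetime import date, timedelta


def event_study(activity, badge_day, window=7):
    """Average daily actions by event time (days relative to each user's badge).

    activity:  {user: {date: number of actions}}; missing days count as 0
    badge_day: {user: date the badge was awarded}
    """
    curve = {}
    for offset in range(-window, window + 1):
        values = [
            activity.get(user, {}).get(day0 + timedelta(days=offset), 0)
            for user, day0 in badge_day.items()
        ]
        curve[offset] = sum(values) / len(values)
    return curve


def pre_post_change(curve):
    """Mean activity after the badge day minus mean activity before it."""
    pre = [v for k, v in curve.items() if k < 0]
    post = [v for k, v in curve.items() if k > 0]
    return sum(post) / len(post) - sum(pre) / len(pre)


# Hypothetical users who earned the badge on different dates.
badge_day = {"alice": date(2012, 3, 10), "bob": date(2012, 7, 1)}
levels = {"alice": (10, 6), "bob": (8, 4)}  # (actions/day before, after)
activity = {
    user: {
        badge_day[user] + timedelta(days=k): (before if k < 0 else after)
        for k in range(-7, 8)
    }
    for user, (before, after) in levels.items()
}

curve = event_study(activity, badge_day)
print(f"change in daily actions after the badge: {pre_post_change(curve):+.1f}")
```

Aligning on each user's own badge date is what lets calendar effects (holidays, site changes) average out across users with different timings.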
Case Study 3: Badges
Results: [figure]
Overall
Experimental design is hard!
– Be extra skeptical in your analyses: there are lots of spurious correlations
Experiments:
– Natural and controlled experiments are the best way to measure effects
Observational data:
– Sometimes the best you can do
– Can lead to interesting descriptive insights
– But beware of correlations!

 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 

Computational Social Science, Lecture 10: Online Experiments

  • 1. Experimental Design Sergei Vassilvitskii Columbia University Computational Social Science April 5, 2013 Thursday, April 25, 13
  • 2. Measurement: “Half the money I spend on advertising is wasted; the trouble is, I don’t know which half.” - John Wanamaker
  • 3. Measurement: “Half the money I spend on advertising is wasted; the trouble is, I don’t know which half.” - John Wanamaker, 1875
  • 5. Helping John: Idea 1: Measure the final effect: – Track total store sales and compare them to the advertising budget
  • 6. Idea 1: Measure the final effect: – Track total store sales and compare them to the advertising budget Findings: – Total sales are typically higher after intense advertising
  • 7. Idea 1: Measure the final effect: – Track total store sales and compare them to the advertising budget Findings: – Total sales are typically higher after intense advertising Problems: – Stores advertise when people already tend to spend – Christmas shopping periods – Travel during the summer – Ski gear in winter, etc.
  • 8. Correlation vs. Causation
  • 9. Idea 1: A within-subject pre-test/post-test design: the same store's sales are measured before and after advertising.
  • 10. Idea 2: “Measuring the online sales impact of an online ad or a paid-search campaign -- in which a company pays to have its link appear at the top of a page of search results -- is straightforward: We determine who has viewed the ad, then compare online purchases made by those who have and those who have not seen it.”
  • 11. Idea 2: “Measuring the online sales impact of an online ad or a paid-search campaign -- in which a company pays to have its link appear at the top of a page of search results -- is straightforward: We determine who has viewed the ad, then compare online purchases made by those who have and those who have not seen it.” – Magid Abraham, CEO, President & Co-Founder of ComScore, in an HBR article (2008)
  • 12. Idea 2: Measure the difference between people who see ads and people who don’t.
  • 13. Idea 2: Measure the difference between people who see ads and people who don’t. Findings: – People who see the ads are more likely to react to them
  • 14. Idea 2: Measure the difference between people who see ads and people who don’t. Findings: – People who see the ads are more likely to react to them Problems: – Ads are finely targeted: the people who see them are exactly the people who are likely to click! – Don’t advertise cars in fashion magazines. – Even more extreme online: which ads are shown depends on the user’s propensity to click on them.
  • 15. Idea 3: Matching: – Compare people who saw an ad with similar people who didn’t see it but are otherwise “the same.”
  • 16. Idea 3: Matching: – Compare people who saw an ad with similar people who didn’t see it but are otherwise “the same.” Problems: – It is hard to define “the same.” Beware of lurking variables.
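The matching idea can be made concrete with exact matching: group users by observed covariates and compare exposed and unexposed users only within the same group. This is a minimal sketch, assuming a hypothetical per-user schema (`saw_ad`, `purchased`, plus covariate fields such as `age_group`); the function name `matched_difference` is illustrative. It balances only the covariates you pass in, so any lurking variable left out of `keys` can still bias the result.

```python
from collections import defaultdict
from statistics import mean


def matched_difference(users, keys):
    """Exact matching: average effect of seeing the ad on exposed users,
    estimated only from covariate strata that contain both exposed and
    unexposed users.

    users: iterable of dicts with the fields named in `keys`, a boolean
           'saw_ad', and a numeric 'purchased' (hypothetical schema).
    Returns None if no stratum contains both groups.
    """
    strata = defaultdict(lambda: {True: [], False: []})
    for u in users:
        key = tuple(u[k] for k in keys)
        strata[key][bool(u["saw_ad"])].append(u["purchased"])

    weighted_sum, n_exposed = 0.0, 0
    for groups in strata.values():
        exposed, unexposed = groups[True], groups[False]
        if exposed and unexposed:  # keep only strata with a match on both sides
            weighted_sum += (mean(exposed) - mean(unexposed)) * len(exposed)
            n_exposed += len(exposed)
    return weighted_sum / n_exposed if n_exposed else None


# Toy example (made-up records): the "55+" user has no unexposed match and is dropped.
example_users = [
    {"age_group": "18-34", "saw_ad": True, "purchased": 1},
    {"age_group": "18-34", "saw_ad": False, "purchased": 0},
    {"age_group": "35-54", "saw_ad": True, "purchased": 0},
    {"age_group": "35-54", "saw_ad": False, "purchased": 0},
    {"age_group": "55+", "saw_ad": True, "purchased": 1},
]
print(matched_difference(example_users, ["age_group"]))  # prints 0.5
```

Weighting each stratum by its number of exposed users gives an estimate of the effect on those who saw the ad; matching on more covariates shrinks the pool of usable strata.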
  • 17. Ad Wear-out: What is the optimal number of times to show an ad?
  • 18. Case Study: Ad Wear-out. What is the optimal number of times to show an ad? Few: – Don’t want the user to be annoyed – No need to waste money if the ad is ineffective Many: – Make sure the user sees it – Reinforce the message
  • 19. Observational Study: Look through the data: – Find the users who saw the ad once – Find the users who saw the ad many times
  • 20. Observational Study: Look through the data: – Find the users who saw the ad once – Find the users who saw the ad many times Measure revenue for the two sets of users: – Conclusion: Limit the number of impressions
  • 21. Correlations: Why did some users see the ad only once? – They must use the web differently – Users who saw it once: sign on once a week to check email – Users who saw it many times: are always online
  • 22. Correlations: Why did some users see the ad only once? – They must use the web differently – Users who saw it once: sign on once a week to check email – Users who saw it many times: are always online Correct conclusion: – People who visit the homepage often are unlikely to click on ads – We have not measured the effect of wear-out
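The wear-out trap on slides 19-22 can be reproduced with a small simulation. The sketch below assumes, purely for illustration, that half the users are light visitors (one impression, 2% click probability) and half are always online (ten impressions, 0.5% click probability), and it builds in no wear-out at all: every impression has the same click probability for a given user. The naive comparison still shows users who saw the ad many times clicking far less per impression. The parameters and the function name `simulate` are made up, not taken from the lecture.

```python
import random


def simulate(n_users=100_000, seed=0):
    """Simulate impressions and clicks with NO wear-out, where exposure
    depends on how often a user visits (illustrative parameters only).

    Returns the click-through rate per impression for users who saw the ad
    once and for users who saw it many times.
    """
    rng = random.Random(seed)
    totals = {"once": [0, 0], "many": [0, 0]}  # [clicks, impressions]
    for _ in range(n_users):
        always_online = rng.random() < 0.5          # half the users are heavy visitors
        impressions = 10 if always_online else 1     # exposure driven by visit frequency
        p_click = 0.005 if always_online else 0.02   # heavy visitors rarely click ads
        # Same click probability on every impression: no wear-out by construction.
        clicks = sum(rng.random() < p_click for _ in range(impressions))
        group = "many" if always_online else "once"
        totals[group][0] += clicks
        totals[group][1] += impressions
    return {group: clicks / imps for group, (clicks, imps) in totals.items()}


rates = simulate()
print(rates)  # "many" has a far lower CTR, although no user's CTR ever declines
```

The gap comes entirely from who ends up in each group, which is why this comparison cannot measure wear-out.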
  • 23. Idea 3: Matching: – Compare people who saw an ad with similar people who didn’t see it but are otherwise “the same.” Problems: – It is hard to define “the same.” Beware of lurking variables.
  • 24. Simpson’s Paradox: Kidney stones [real data]. You have kidney stones. There are two treatments, A and B. – Empirically, treatment A is effective 78% of the time – Empirically, treatment B is effective 83% of the time – Which one do you choose?
  • 25. Simpson’s Paradox: Kidney stones [real data]. You have kidney stones. There are two treatments, A and B. Digging into the data, you see: If the stones are large: – Treatment A is effective 73% of the time – Treatment B is effective 69% of the time If they are small: – Treatment A is effective 93% of the time – Treatment B is effective 87% of the time
  • 26. Simpson’s Paradox: If the stones are large: – Treatment A is effective 73% of the time – Treatment B is effective 69% of the time If they are small: – Treatment A is effective 93% of the time – Treatment B is effective 87% of the time Overall: – Treatment A is effective 78% of the time – Treatment B is effective 83% of the time
  • 27. Simpson’s Paradox: Summary stats (successful treatments / patients): – Small stones: A 81/87 (93%), B 234/270 (87%) – Large stones: A 192/263 (73%), B 55/80 (69%) – Combined: A 273/350 (78%), B 289/350 (83%)
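The reversal in the summary table can be checked directly. This sketch recomputes per-stratum and pooled success rates from the counts on slide 27; the names `DATA` and `success_rates` are illustrative. The pooled rates favor B because B was given mostly to small stones (270 of its 350 patients), which succeed more often under either treatment, while A was given mostly to large stones (263 of its 350).

```python
# (successes, patients) for each treatment and stone size, from slide 27
DATA = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def success_rates(data):
    """Return per-stratum and pooled success rates for each treatment."""
    out = {}
    for treatment, strata in data.items():
        rates = {size: s / n for size, (s, n) in strata.items()}
        total_s = sum(s for s, _ in strata.values())
        total_n = sum(n for _, n in strata.values())
        rates["combined"] = total_s / total_n
        out[treatment] = rates
    return out


rates = success_rates(DATA)
for group in ("small", "large", "combined"):
    print(f"{group:>8}: A={rates['A'][group]:.0%}  B={rates['B'][group]:.0%}")
# A is better within each stone size, yet B looks better when pooled,
# because stone size (a lurking variable) also drove which treatment was given.
```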
  • 28. Idea 3: Matching: – Compare people who saw an ad with similar people who didn’t see it but are otherwise “the same.” Problems: – It is hard to define “the same.” Beware of lurking variables. – Simpson’s Paradox
  • 29. Getting at Causation: Randomized, controlled experiments: – Select a target population – Randomly decide whom to show the ad – Subjects cannot influence whether they are in the treatment or control group
  • 30. Measuring Wear-out: Parallel Universe
  • 31. Measuring Wear-out: Parallel Universe [figure: control and treatment groups]
  • 32. Measuring Wear-out: Parallel Universe [figure: control and treatment groups]
  • 33. Creating Parallel Universes: When a user first arrives: – Check the browser cookie and assign the user to the control or treatment group – Control group: shown a PSA (public service announcement) – Treatment group: shown the ad – The treatment stays the same on repeated visits
  • 34. Creating Parallel Universes: When a user first arrives: – Check the browser cookie and assign the user to the control or treatment group – Control group: shown a PSA (public service announcement) – Treatment group: shown the ad – The treatment stays the same on repeated visits Advertising effects: – Positive! – But smaller than reported by observational studies
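One common way to implement the cookie-based assignment on slides 33-34, so that a user's condition stays the same on repeated visits, is to hash the cookie ID. This is a sketch under assumptions, not the ad server's actual logic: the cookie ID is a string, the split is 50/50 by default, and a hypothetical experiment name (`wearout-test`) is mixed into the hash so that different experiments get independent splits.

```python
import hashlib


def assign_group(cookie_id: str, experiment: str = "wearout-test",
                 treatment_share: float = 0.5) -> str:
    """Deterministically map a browser cookie to 'control' or 'treatment'.

    The same cookie always lands in the same group, so the user sees the same
    condition on every visit, and the user has no way to choose it.
    """
    digest = hashlib.sha256(f"{experiment}:{cookie_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # roughly uniform in 0..9999
    return "treatment" if bucket < treatment_share * 10_000 else "control"


def creative_for(cookie_id: str) -> str:
    """Control users see a public service announcement; treatment users see the ad."""
    return "ad" if assign_group(cookie_id) == "treatment" else "PSA"


print(creative_for("cookie-123"))
```

Hashing instead of storing a random draw means any server can recompute the group from the cookie alone, with no shared assignment table.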
  • 36. Online Experiments: Advantages: – Can reach tens of millions of people! • Can estimate very small effects. Lewis et al., "Here, There, and Everywhere: Correlated Online Behaviors Can Lead to Overestimates of the Effects of Advertising." (WWW 2011) estimate effects of 0.01%!
  • 37. Online Experiments: Advantages: – Can reach tens of millions of people! • Can estimate very small effects. Lewis et al., "Here, There, and Everywhere: Correlated Online Behaviors Can Lead to Overestimates of the Effects of Advertising." (WWW 2011) estimate effects of 0.01%! – Can be relatively cheap (Mechanical Turk)
  • 38. Online Experiments: Advantages: – Can reach tens of millions of people! • Can estimate very small effects. Lewis et al., "Here, There, and Everywhere: Correlated Online Behaviors Can Lead to Overestimates of the Effects of Advertising." (WWW 2011) estimate effects of 0.01%! – Can be relatively cheap – Can recruit diverse subjects • “20 students in a large Midwestern university.” Try to avoid subjects from WEIRD societies (Western, Educated, Industrialized, Rich, and Democratic).
  • 39. WEIRD People: Which line is longer? – Henrich, Joseph; Heine, Steven J.; Norenzayan, Ara (2010): The weirdest people in the world?, Working Paper Series des Rates für Sozial- und Wirtschaftsdaten
  • 41. Online Experiments: Advantages: – Can reach tens of millions of people! • Can estimate very small effects. – Can be relatively cheap – Can recruit diverse subjects • “20 students in a large Midwestern university.” Try to avoid subjects from WEIRD societies (Western, Educated, Industrialized, Rich, and Democratic). – Access: subjects in other countries, geographically diverse – Can be quick
  • 42. Online Experiments: Advantages: – Can reach tens of millions of people! • Can estimate very small effects. – Can be relatively cheap – Can recruit diverse subjects • “20 students in a large Midwestern university.” Try to avoid subjects from WEIRD societies (Western, Educated, Industrialized, Rich, and Democratic). – Access: subjects in other countries, geographically diverse – Can be quick Challenges: – Limited choice in the range of treatments (no MRI studies) – Do people behave differently offline?
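Why does reaching tens of millions of people matter for very small effects? A standard normal-approximation formula for a two-sided comparison of two proportions gives the users needed in each group: n ≈ (z_(1-α/2) + z_(power))² · [p1(1 - p1) + p2(1 - p2)] / (p2 - p1)². The sketch below computes it with Python's `statistics.NormalDist`; the function name is illustrative, and the 1% baseline and 0.01-percentage-point lift in the example are assumptions, not figures from Lewis et al.

```python
from math import ceil
from statistics import NormalDist


def users_per_group(p_control: float, p_treatment: float,
                    alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users needed in EACH group to detect the difference
    p_treatment - p_control with a two-sided test at level alpha
    (normal approximation for two independent proportions)."""
    z = NormalDist().inv_cdf
    z_alpha, z_power = z(1 - alpha / 2), z(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    delta = p_treatment - p_control
    return ceil((z_alpha + z_power) ** 2 * variance / delta ** 2)


# Illustrative: 1% baseline conversion, absolute lift of 0.01 percentage points.
print(users_per_group(0.01, 0.0101))
```

Under these assumptions the answer is roughly 15 to 16 million users per group; because the required sample grows with 1 / (p2 - p1)², halving the effect size quadruples it.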
  • 43. External Validity: A major challenge in all lab experiments: – Virtual and physical labs – Do findings hold outside the lab? Enter: – Natural experiments
  • 44. Natural Experiments: The experimental condition: – Is not decided by the experimenter – But is exogenous (subjects have no influence on which condition they receive)
  • 45. Case Study: Ad Wear-out. Back to ad wear-out. Natural experiment: – When there were two competing campaigns, the Yahoo! ad server decided which campaign to show at random! – This was by engineering design: both campaigns got an equal share of pageviews (simpler and easier to distribute than a round-robin system). (Few: – Don’t want the user to be annoyed – No need to waste money if the ad is ineffective Many: – Make sure the user sees it – Reinforce the message)
  • 46. Case Study: Ad Wear-out. Natural experiment: – When there were two competing campaigns, the Yahoo! ad server decided which campaign to show at random! – This was by engineering design: both campaigns got an equal share of pageviews (simpler and easier to distribute than a round-robin system). Experiments: – Compare the behavior of people who saw the same total number of ads but a different number from each campaign.
  • 47. Case Study: Ad Wear-out. Does wear-out happen? Yes: – Some advertisements see a 5x drop in click-through rate after the first exposure – These typically have very high click-through rates No: – Others see no decrease in click-through rate even after ten exposures – These have lower but steady click-through rates
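The comparison described on slide 46 can be sketched as follows. Among users with the same total number of ad views, how many of those views went to campaign A was decided by the ad server's random choice, so within each total, campaign A's click-through rate per impression can be compared across different numbers of A views; a rate that falls as A views grow suggests wear-out. The record layout `(total_views, views_a, clicks_a)` and the function name are hypothetical, not Yahoo!'s actual data format.

```python
from collections import defaultdict


def ctr_by_exposure(records):
    """Campaign A's click-through rate per impression, broken down by total
    ad views and by how many of those views were campaign A.

    records: iterable of (total_views, views_a, clicks_a) per user
             (hypothetical layout). Because the ad server chose between the
             campaigns at random, views_a varies by chance within a fixed
             total_views, which holds each user's overall exposure fixed.
    Returns {total_views: {views_a: clicks_per_impression}}.
    """
    cells = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # [clicks, impressions]
    for total_views, views_a, clicks_a in records:
        if views_a == 0:
            continue  # never saw campaign A, so no A impressions to count
        cell = cells[total_views][views_a]
        cell[0] += clicks_a
        cell[1] += views_a
    return {
        total: {k: clicks / imps for k, (clicks, imps) in sorted(by_a.items())}
        for total, by_a in cells.items()
    }


# Made-up records: within total_views == 4, CTR falls from 0.5 (one A view)
# to about 0.17 (three A views), which would suggest wear-out for campaign A.
example = [(4, 1, 1), (4, 1, 0), (4, 3, 1), (4, 3, 0), (4, 0, 0)]
print(ctr_by_exposure(example))
```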
  • 48. Case Study 2: Yelp. Does a higher Yelp rating lead to higher revenue? How do we run the experiment?
  • 49. Case Study 2: Yelp. Does a higher Yelp rating lead to higher revenue? How do we run the experiment? – Observational: cannot establish causality. – Controlled: would require deception. – Natural?
  • 50. Case Study 2: Yelp. Does a higher Yelp rating lead to higher revenue? Natural experiment: – Yelp rounds ratings to the nearest half star. – 4.24 becomes 4 stars; 4.26 becomes 4.5 stars
  • 51. Case Study 2: Yelp. Natural experiment: – Yelp rounds ratings to the nearest half star. – 4.24 becomes 4 stars; 4.26 becomes 4.5 stars Data: – Raw ratings from Yelp – Restaurant revenue (from tax records)
  • 52. Case Study 2: Yelp. Natural experiment: – Yelp rounds ratings to the nearest half star. – 4.24 becomes 4 stars; 4.26 becomes 4.5 stars Data: – Raw ratings from Yelp – Restaurant revenue (from tax records) – Finding: a one-star increase leads to a 5-9% increase in revenue.
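The rounding rule is what makes this a natural experiment: restaurants with raw averages just below and just above a rounding threshold such as 4.25 are nearly identical in quality but display different star ratings. The sketch below implements round-to-nearest-half-star (rounding exact ties up, an assumption since the slide does not say) and a naive comparison of mean revenue in a narrow window around one threshold. The `(raw_rating, revenue)` input format, the function names, and the `bandwidth` default are hypothetical; a careful regression-discontinuity analysis would also account for trends in the raw rating on each side.

```python
import math
from statistics import mean


def displayed_stars(raw: float) -> float:
    """Round a raw average rating to the nearest half star.
    Exact ties (e.g. 4.25) round up here; that tie rule is an assumption."""
    return math.floor(raw * 2 + 0.5) / 2


def jump_at_threshold(restaurants, threshold=4.25, bandwidth=0.05):
    """Naive discontinuity estimate: mean revenue of restaurants just above
    the rounding threshold minus mean revenue of those just below it.

    restaurants: iterable of (raw_rating, revenue) pairs (hypothetical format).
    Returns None if either side of the threshold has no restaurants.
    """
    restaurants = list(restaurants)  # allow generators: we scan the data twice
    below = [rev for raw, rev in restaurants if threshold - bandwidth <= raw < threshold]
    above = [rev for raw, rev in restaurants if threshold <= raw < threshold + bandwidth]
    if not below or not above:
        return None
    return mean(above) - mean(below)


print(displayed_stars(4.24), displayed_stars(4.26))  # 4.0 4.5
toy = [(4.22, 100.0), (4.24, 104.0), (4.26, 110.0), (4.29, 112.0), (3.90, 80.0)]
print(jump_at_threshold(toy))  # made-up revenue figures, for illustration only
```

Python's built-in `round` uses round-half-to-even, which is why the sketch uses `math.floor(raw * 2 + 0.5)` instead.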
  • 53. Case Study 3: Badges. How do badges influence user behavior? Specifically: – The “epic” badge on Stack Overflow – Awarded after hitting the daily maximum number of points (through posts, responses, etc.) on 50 distinct days.
  • 54. Case Study 3: Badges. How do badges influence user behavior? Specifically: – The “epic” badge on Stack Overflow – Awarded after hitting the daily maximum number of points (through posts, responses, etc.) on 50 distinct days. Experimental design: – Within-subject pre-post test (again) – Look at user behavior before and after receiving the badge – Average over different users and different timings to (hopefully) average out all other factors.
  • 55. Case Study 3: Badges. Results:
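The within-subject pre/post design on slide 54 can be sketched as follows: for each user, average their daily activity in a window before and after the day they received the badge, take the difference, and average those differences across users. The input layout (a per-user list of daily activity counts plus the award-day index), the function name, and the 7-day default window are assumptions for illustration; as the slide notes, this design only controls for factors that average out across users and timings.

```python
from statistics import mean


def pre_post_effect(users, window=7):
    """Average within-user change in activity around a badge award.

    users: iterable of (daily_activity, badge_day) pairs, where daily_activity
           is a list of per-day counts and badge_day indexes the award day
           (hypothetical format). The award day itself is excluded, and users
           without a full window on both sides of the award are skipped.
    Returns the mean of (after - before) across users, or None.
    """
    diffs = []
    for activity, day in users:
        if day - window < 0 or day + 1 + window > len(activity):
            continue  # not enough history on one side of the award
        before = mean(activity[day - window:day])
        after = mean(activity[day + 1:day + 1 + window])
        diffs.append(after - before)
    return mean(diffs) if diffs else None


# Made-up data with a 2-day window: the third user lacks pre-award history.
example = [([5, 5, 9, 3, 3], 2), ([1, 1, 0, 2, 2], 2), ([1, 2], 0)]
print(pre_post_effect(example, window=2))  # prints -0.5
```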
  • 56. Overall: Experimental design is hard! – Be extra skeptical in your analyses: there are lots of spurious correlations. Experiments: – Natural and controlled experiments are the best way to measure effects. Observational data: – Sometimes the best you can do – Can lead to interesting descriptive insights – But beware of correlations!