4. Testing
A/B Tests vs. Multivariate Tests

[Diagram: test cells built from three elements – message (It's Time, Expedia as Hero, Numbers, Offer only), offer (Up to 50% off, Book Together and Save, Start Saving), and call to action (Go, Search Now, Book Now) – with a legend distinguishing test cells executed during the test from test cells evaluated but not executed.]
5. A/B Test Example:
Videos continue to score higher in NSAT and PSAT, but underperform in conversions.

NSAT Results – FYQ2 2011 (with video, results were statistically significantly higher*)

                     Unique    NSAT  PSAT  FPP Upgrade  Avg. Revenue
                     Visitors              Conv. Rate   (All SKUs)
Compare w/o Video    186,330   108   134   0.35%        $1.13
Compare with Video   187,185   124   136   0.30%        $0.95
Lift                           16    2     (0.05%)      ($0.18)

*PSAT lift only has a statistical significance of 91%; all others are 99%+.

FYQ2 results are in line with September's findings, which showed that adding a video to the compare page has had a positive impact on visitors' NSAT and PSAT scores.

A possible downside to adding more videos is that they may serve as a distraction, causing visitors to miss the Buy Now button and lowering conversion rates.

Numbers have been doctored to hide client-sensitive data.
6. Multivariate Tests
Full factorial – test every possible combination. For example, if you are testing three elements, with four variations of one element and three variations of each of the other two, you are looking at 4 x 3 x 3 = 36 combinations.
Partial factorial – partial factorial tests can be set up in a way that lets you infer results from a subset of the combinations; the Taguchi method is probably the most commonly used approach.
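As a sketch, the 36 full-factorial cells from the example above can be enumerated with a Cartesian product. The element names and variations are taken from the test-cell diagram in this deck; the dictionary layout is just one way to organize them:

```python
from itertools import product

# Variations per element, taken from the test-cell diagram in this deck.
elements = {
    "message": ["It's Time", "Expedia as Hero", "Numbers", "Offer only"],
    "offer": ["Up to 50% off", "Book Together and Save", "Start Saving"],
    "cta": ["Go", "Search Now", "Book Now"],
}

# Full factorial: one test cell per combination (4 x 3 x 3 = 36).
cells = [dict(zip(elements, combo)) for combo in product(*elements.values())]
```

A partial factorial design would run only a structured subset of these 36 cells and infer the rest.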
[Diagram repeats the test-cell grid from slide 4: message, offer, and call-to-action variations, with cells executed during the test distinguished from cells evaluated but not executed.]
7. Multivariate Example:
Shopping rate per million cookies, broken out by each tested element:
• Call to Action: Book Now 165, Search Now 191, Go 180
• Image: Tanning 157, Beach Cruise 200, Bay Bridge 179
• Message: Expedia as Hero 176, It's Time 159, Offer Only 201, Numbers 179
• Offer: Book Together and Save 174, 50% off Hotels 183, Generic 180
8. Pros and Cons
A/B Test
Pros:
• Setup is relatively easy
• Analysis is easier
• Don't need much of a stats background to interpret results
Cons:
• Easy to get sucked into testing too many things at once
• A and B need to be different enough to get results
• Time-consuming to test one element at a time

Multivariate Test
Pros:
• Less political pushback; everyone gets to test their idea
• Get all of the analysis done in one shot
Cons:
• Easy to mess up
• Tools are a black box, or do it yourself plus your best PhD stats buddy
• Need a lot of volume or time
9. Testing Recommendations
Start with high-impact tests:
• Test home/landing pages
• Test conversion points, e.g. sign-up forms, cart/purchase pages, etc.
• Test ad design
• Price tests (a hard one politically to pull off)
Other great things to test:
• Test landing pages/deep linking
• Page heroes
10. Testing Best Practices
• Start with a hypothesis – don't just start testing random stuff like colors unless you have a good reason.
• Set goals – e.g., looking to improve conversion rate by x%.
• Decide what is significant – we're not testing drugs and no one's life is on the line, so 99.9% statistical significance is probably overkill. But what about, say, 60%?
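As a rough sketch of the "what is significant" question, a two-proportion z-test can turn raw A/B counts into a confidence level you can weigh against your own threshold. The function name and the sample numbers below are illustrative, not from the deck:

```python
from math import erf, sqrt

def ab_significance(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in two conversion rates.
    Returns (z, confidence), where confidence = 1 - p_value."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)            # pooled conversion rate
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))             # standard normal CDF at |z|
    p_value = 2 * (1 - cdf)
    return z, 1 - p_value

# Illustrative counts: 0.35% vs. 0.41% conversion on 100k visitors each.
z, confidence = ab_significance(350, 100_000, 410, 100_000)
```

With these made-up numbers the lift is significant at roughly the 97% level: enough for most site tests, even if it would never pass a drug trial.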
11. More Testing Tips
• Get help – setting up the test and searching through your old stats notes can be a challenge. Don't be afraid to ask for help.
• Make it fun/interesting – it takes a lot to pull off a good test: UX, creative team, site dev, analysts, and maybe more. Plus someone's budget. Everyone has an opinion and/or theory; you can use that to get momentum for a testing project.
At Getty we held a company-wide contest to see who could pick the winner of a multivariate test. There were 300+ possible combinations and everyone got to vote on which one they thought would be the winner.
16. Combining attitudinal & behavioral data
The End Action scorecard was originally designed to value experiences based on shifts in attitudes. For Q4 2010, we added Microsoft Store purchase behavior as well.
17. End Action conversion rates
Using End Action cookie data we can report a more accurate conversion rate.
If we assume most site visits don't last longer than 30 minutes, we can conclude that less than half of Store buyers (43%) make a purchase during their first site visit. The remaining purchasers return later (sometimes days later) to complete their purchase. Using End Action cookie data, site visitors who read a product review, leave the Shop page, and return later to finally make a purchase will still be counted when reporting on site visitors who read a product review and then made a purchase.
Numbers have been doctored to hide client sensitive data
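A minimal sketch of why cookie-level counting matters, using hypothetical cookie IDs: a buyer who returns days later still intersects with the read-a-review audience, whereas session-based reporting would drop them.

```python
# Hypothetical cookie IDs, for illustration only.
read_review = {"c1", "c2", "c3", "c4", "c5"}   # cookies that read a product review
purchased = {"c2", "c5", "c9"}                 # c5 purchased on a later visit

# Session-based reporting would credit only same-visit buyers like c2;
# cookie-based End Action reporting also keeps the returning buyer c5.
converters = read_review & purchased
conversion_rate = len(converters) / len(read_review)
```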
20. EA measures correlations, not causation
Example: People who watch 7 Second demos have 10% higher Win.com NSAT than people who don't watch demos.
However, EA quantifies correlation, not causation:
• We cannot immediately say that watching videos makes people 10% more satisfied.
• That claim requires additional information, such as specific testing, observation, and insight.
21. Survey timing can create respondent biases
Site visitors are invited to take the EA survey as soon as they leave the Windows domain. So, as site visitors move further down the funnel, survey respondents start to look more like visitors who are abandoning their carts than like purchasers. This can be seen in the illustration below.
In this example, Visitor A takes the survey and is included in the NSAT results for the Visit Shop EA, but does not purchase. Visitor B completes the purchase process but, by doing so, never receives a survey invite.
23. Compare Page Video
Videos continue to score higher in NSAT and PSAT, but underperform in conversions.

NSAT Results – FYQ2 2011 (with video, results were statistically significantly higher*)

                     Unique    NSAT  PSAT  FPP Upgrade  Avg. Revenue
                     Visitors              Conv. Rate   (All SKUs)
Compare w/o Video    186,330   108   134   0.35%        $1.13
Compare with Video   187,185   124   136   0.30%        $0.95
Lift                           16    2     (0.05%)      ($0.18)

*PSAT lift only has a statistical significance of 91%; all others are 99%+.

FYQ2 results are in line with September's findings, which showed that adding a video to the compare page has had a positive impact on visitors' NSAT and PSAT scores.

A possible downside to adding more videos is that they may serve as a distraction, causing visitors to miss the Buy Now button and lowering conversion rates.

Numbers have been doctored to hide client-sensitive data.
24. Attitudes Influence Buying Behavior
As site visitors move deeper into the site and further down the purchase funnel, we start to see an increase in both site satisfaction (NSAT) and the Windows 7 Upgrade conversion rate. Based on the EA survey data, we know that we have some levers for improving site satisfaction: using video or interactive experiences, providing value-added downloads, etc. From this data, we can see that by first improving NSAT, we can push more people into a transactional mode on the site.
(1) NSAT % ∆ from FYQ4 EAA Scorecard
(2) FPP Upgrade # ∆ from FYQ4 Sales Trans Scorecard
Note: Purchase NSAT & Video Conv. Rate were not statistically significant by +/-5%
*Conversion rate = purchasers who took the end action / count of unique cookies who took the end action.
Numbers have been doctored to hide client sensitive data
25. Target Content = Higher Scores
Windows 7 visitors – visitors who visited the Compare pages, Anytime Upgrade, and Features pages had higher NSAT scores, while less relevant pages like the Upgrade Advisor and the Get Win7 default page scored lower.
Vista visitors – Vista users who visited the Compare pages and Upgrade Advisor-related pages had higher NSAT scores. The less relevant Anytime Upgrade pages scored lower.
XP visitors – similar to the Vista users, the Compare pages and Upgrade Advisor-related pages had higher NSAT scores, while the less relevant Anytime Upgrade pages scored lower.
Numbers have been doctored to hide client sensitive data
27. Ad Conversions
When a user clicks on an ad, they are redirected through Atlas to the destination page: Atlas records the click and redirects the user to the destination page (served from the ad server, image server, or CDN). If the site has an Atlas 1x1 action tag on the landing page, the visit can now be tied directly back to the ad.
Atlas can then tie each ad impression and click back to the action tag, per cookie. This data is then used to optimize the ad campaign.
29. Advanced Attribution: Details
Problem: Ad-server rules are heavily biased in favor of click-based and "last-touch" exposures (i.e. branded search) and undervalue a person's history of exposure to display media.
Objective: Correct this bias by reallocating credit for conversions in proportion to the relative contribution of past exposures.
Approach: Model cookie-exposure history to estimate relative contribution. Use the model estimates to "score" the individual placements, awarding each placement some, all, or no credit for a cookie's conversion.
Action: Media planners may optimize the online media budget, either during or after a campaign, toward those publishers and engagements that drive the greatest ROI.
30. There are several approaches
Method I: Even Distribution
Score = 1/n (n is total exposure frequency); every exposure in the path to conversion gets the same credit.
Simple approach, but flawed in that it's really a "welfare state" for media that does not address relative efficacy.

Method II: Recency-weighted Attribution
Score is assigned according to each exposure's time distance to conversion; special weight might be given to the first and last touch points.
More nuanced approach that differentiates by recency, but does not account for relative performance differences of different formats.

Method III: Probabilistic Attribution
Weight is given according to the change in conversion probability (ΔP) from exposure to the ads; the probability is calculated from predictive models on ad frequency and attributes.
More complex, performance-based approach that uses the change in historical conversion probability per exposure to allocate credit.
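The three methods above can be sketched as credit-allocation functions. The exponential recency weighting and the example ΔP values are assumptions for illustration; a real Method III would take ΔP from the predictive models described above:

```python
def even_credit(n):
    """Method I: every one of the n exposures gets 1/n of the credit."""
    return [1 / n] * n

def recency_credit(ages_days, half_life=7.0):
    """Method II sketch: weight decays exponentially with each exposure's age
    at conversion time. The exponential form and 7-day half-life are assumed."""
    weights = [0.5 ** (age / half_life) for age in ages_days]
    total = sum(weights)
    return [w / total for w in weights]

def probabilistic_credit(delta_p):
    """Method III sketch: credit proportional to each exposure's modeled change
    in conversion probability (delta_p would come from the predictive models)."""
    total = sum(delta_p)
    return [dp / total for dp in delta_p]

even = even_credit(4)
recent = recency_credit([21.0, 14.0, 7.0, 0.0])          # oldest -> newest exposure
modeled = probabilistic_credit([0.01, 0.03, 0.02, 0.04])  # illustrative ΔP values
```

Under Method I every touch earns 0.25; under Method II the most recent touch earns the largest share; under Method III credit follows the modeled probability lift.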
31. Outcome Example
Using the conversion rates under the attribution model, certain placements and networks look better or worse; this directly affects how and where the media team purchases ad placements.
32. Incremental revenue from attribution
Incremental revenue increase is calculated by comparing attribution-based media optimization against last-touch media optimization. The increase varies with the degree of optimization shift from least to most efficient media:
• 5% optimization: +$15 million (+2.67%) incremental revenue increase.
• 10% optimization: +$29 million (+5.09%) incremental revenue increase.
• 15% optimization: +$45 million (+7.95%) incremental revenue increase.
Revenue with optimization ($ millions):

                                 Base   Lowest 5%   Lowest 10%   Lowest 15%
Standard Last Touch              $564   $576        $579         $582
Razorfish Advanced Attribution   $564   $591        $608         $628
Incremental                             +$15 MM     +$29 MM      +$45 MM
33. Case Study
[Chart: Paid search click-through rate – Control 0.69% vs. Test 1.02%]
48% lift in paid search click-through rate due to banner ad exposure.
• The test group was exposed to client media when encountering campaign placements.
• The control group was exposed to PSA media when encountering campaign placements.
• Across clients and advertisers, banner exposure
consistently drives incremental search clicks and
conversions
• Clearly, some portion of credit for search
conversion belongs to prior display (and other
media exposure)
• Attribution quantifies the relative contributions of
each touch point and allocates credit accordingly
The example is from an apparel retailer. We ran a "true lift test": we held out a random control group from all display media for a period and evaluated performance differences between the control and exposed groups. These results are consistent with other similar tests run for other clients.
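The reported 48% lift follows directly from the two click-through rates, as a quick arithmetic check:

```python
control_ctr = 0.0069  # paid-search CTR for the PSA control group
test_ctr = 0.0102     # paid-search CTR for the banner-exposed test group

lift = (test_ctr - control_ctr) / control_ctr  # relative lift of test over control
```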
35. Media Mix Models
[Diagram: conversions driven by TV, Radio, Display, Mobile, Cinema, and Print]
Problem: When multi-channel marketing efforts occur simultaneously, it can be hard to identify which of these channels is responsible for conversions. Answers are difficult to come by when direct measurement of individual-level exposure is not feasible (e.g. OOH, TV, etc.).
Objective: Create a model that accurately reflects how well each channel operates within a general business/marketing environment.
Approach: Use daily (or weekly) tracking data to specify the relationship between channel activity and conversion volume. Incorporate into the models channel-specific accumulation and decay effects as well as relevant macroeconomic indicators and historical events.
Action: Using the results to estimate each channel's point of diminishing returns, the optimal spend per channel is appraised for future campaigns.
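A minimal sketch of the modeling step, assuming a plain least-squares fit of conversions on channel spend (real media mix models add the adstock, saturation, and macro controls described above). The weekly spend and conversion numbers are fabricated so the true coefficients are known:

```python
import numpy as np

# Fabricated weekly tracking data: spend per channel, and conversions
# generated exactly as baseline 100 + 3 per TV dollar + 5 per display dollar.
tv = np.array([10.0, 20.0, 15.0, 30.0, 25.0, 5.0])
display = np.array([4.0, 2.0, 8.0, 6.0, 10.0, 12.0])
conversions = 100 + 3 * tv + 5 * display

# Design matrix with an intercept column for baseline conversions.
X = np.column_stack([np.ones_like(tv), tv, display])
coef, *_ = np.linalg.lstsq(X, conversions, rcond=None)
# coef recovers [baseline, tv effect, display effect] = [100, 3, 5]
```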
36. Factors and media effects
The most important aim of the attribution analysis is to get at the relationship between media spend and the KPI that we are optimizing for. In order to get there, we need to understand how each media type impacts the KPIs and the other media types.
37. Ad Stocking effects
Adding the ad stocking (adstock) effect of media to the model helps account for the diminishing effect of an ad over time. The chart below shows the approximate half-life of each media type modeled. Note that some media types have a longer half-life than others; the effect of a TV ad tends to last longer than that of a banner ad, for example.
[Optimizer inputs: Ad Stocking Effects, Effectiveness Curves, Media Cost Curves, Total Budget]
[Chart: Typical Half-Lives by Media Type]
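A sketch of geometric adstock, one common formulation: each period keeps a decayed carryover of past exposure, with the decay rate derived from the half-life. The half-life values below are illustrative, not the deck's modeled values:

```python
def adstock(spend, half_life):
    """Geometric adstock: adstock[t] = spend[t] + r * adstock[t-1],
    with decay rate r = 0.5 ** (1 / half_life), so the carryover
    halves every `half_life` periods."""
    r = 0.5 ** (1 / half_life)
    out, carry = [], 0.0
    for x in spend:
        carry = x + r * carry
        out.append(carry)
    return out

pulse = [100.0, 0.0, 0.0, 0.0, 0.0]            # one burst of spend, then nothing
tv_effect = adstock(pulse, half_life=2.0)      # longer-lived, TV-like decay
banner_effect = adstock(pulse, half_life=0.5)  # short-lived, banner-like decay
```

Two periods after the burst, the TV-like series still carries half its initial effect, while the banner-like series has nearly died out.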
38. Media effectiveness curves
The effectiveness of media diminishes as the volume of exposure increases. Eventually, an incremental change in media will have little to no effect on the reached audience: this is the saturation point. Each media type reaches its saturation point at a different level of exposure (GRPs).
[Chart: Diminishing Returns by Media Type]
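One common way to model this saturation is a negative-exponential response curve; the functional form and the saturation parameter below are assumptions for illustration:

```python
from math import exp

def response(grps, saturation=1000.0):
    """Assumed negative-exponential effectiveness curve: response rises with
    exposure (GRPs) but flattens as it approaches the saturation point."""
    return 1 - exp(-grps / saturation)

first_100 = response(100) - response(0)   # lift from the first 100 GRPs
next_100 = response(200) - response(100)  # lift from the next 100 GRPs
```

Each additional block of GRPs buys less lift than the one before it, which is exactly the diminishing-returns behavior the chart depicts.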
39. Media cost effects
Media Reach Curves:
• Inventory constraints for each media type.
• Planner judgment on maximum feasible investment levels.
Media Cost Curves:
• These reflect how media costs scale as spend scales.
• They need to capture realities such as increasing cost per reach point, seasonality, etc., in order to pragmatically reflect the media landscape.
40. Budget effects
Because the saturation point and the level of effectiveness change at different rates for each media type, the optimal mix across channels will change with the overall media budget. The example below shows how the optimal mix in spend shifts from one media type to another depending on the level of spend.
[Chart: Diminishing Returns by Media Spend, shown at Budget A and Budget B]
41. Optimization
The optimizer takes into account all of these factors (ad stocking, diminishing returns, cost, and inventory constraints) and, through an iterative process, chooses the optimal media channel for each incremental dollar spent.
[Charts: Diminishing Returns by Media Spend; Final Optimized Results]
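The dollar-by-dollar idea can be sketched as a greedy loop: give each incremental dollar to whichever channel currently offers the highest marginal return under an assumed saturating response curve. The channel names and parameters are hypothetical:

```python
from math import exp

# Hypothetical channels: (initial effectiveness, saturation scale in dollars).
channels = {"tv": (1.0, 500.0), "display": (1.5, 200.0), "print": (0.6, 300.0)}
spend = {name: 0.0 for name in channels}

def marginal_return(name):
    """Conversions bought by the next dollar in a channel, under an
    assumed exponentially saturating response curve."""
    effectiveness, saturation = channels[name]
    return effectiveness * exp(-spend[name] / saturation)

budget = 600
for _ in range(budget):
    # Each incremental dollar goes to the channel with the best marginal return.
    best = max(channels, key=marginal_return)
    spend[best] += 1.0
```

The high-effectiveness but quickly saturating channel wins the early dollars, then spend shifts toward slower-saturating channels, mirroring how the optimal mix changes with budget.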