Google Analytics and AdWords optimisation with GNU R
Hinnerk Gnutzmann & Piotr Śpiewanowski
flexponsive UG
Booster Conference, 9th March 2016
About flexponsive UG
• e-commerce consulting
• Big Data focus
• Qualitative user testing
• Academic (PhD in economics) and programming background
Contact
• mailto: spiewanowski@flexponsive.net
• web: https://www.flexponsive.net/
• t: @flexponsive
Topic of the day
• Marketing outcomes
• difficult to define
• even more difficult to measure
• Before Big Data: “Half the money I spend on advertising is wasted;
the trouble is I don’t know which half.” (John Wanamaker, 1838 -
1922)
• With Big Data: “AdWords brand keyword ads have no measurable
short-term benefits” (Blake et al., 2015) - 100% wasted?
• Open Questions:
• Incrementality debate: Do AdWords campaigns cannibalise organic
traffic?
• Quality: Are bought visitors good or bad customers?
• Heterogeneity: Do campaign effects differ between customers?
Agenda
1. Case study, brand keywords: the secret of the vanishing AdWords ROI
2. What can we do?
• attribution models
• controlled experiments
• GNU R & Analytics: A Dream Team
3. How to do that?
• Google Core Reporting API & GNU R
• GA Query Explorer
• Configuring an experiment in AdWords
4. Analysis with GNU R
• Data wrangling, sampling, etc.
• Replicating GA metrics
• Regression Analysis
5. Case Study II: adClicks and rain in Bergen
Example - Skandiabanken
What happened?
• The AdWord is highly relevant to the search
• Navigational query: the visitor wants to visit Skandiabanken
• The customer knows the bank and may even have a specific service in mind
• Result: probably the best keyword in the account
• Excellent CTR
• Very good on-site conversion
• CPC perhaps not so high
• Any questions?
• The organic result is the same!
• What would you click on if there were no ad?
Example - Skandiabanken (without AdWords)
ROI
Problem: what SEM expenditure appears to earn depends not only on the campaign
but also on the behavior and intent of the consumer
The eBay study
• Blake et al. (2015), “Consumer Heterogeneity and Paid Search
Effectiveness: A Large Scale Field Experiment”
• Field Experiment: Does AdWords work for eBay?
• Very controversial results:
1. Conventional methods used to measure the causal (incremental)
impact of SEM vastly overstate its effect.
2. True effectiveness of SEM is small for a well-known company like eBay
3. Click substitution: when the brand keyword ad disappeared,
almost all users clicked on the organic result instead
4. Informative advertising: AdWords work when the ad gives the visitor
additional information. AdWords had almost no effect on revenues
from existing customers, who found their own way to eBay!
What can be done? Attribution modelling
But how can we know each channel’s true impact?
Attribution modelling
• a way to divide the “credit” for a sale between different marketing
channels
• if you don’t know which attribution model you are using, it is most
likely “last click”: you assume the sale depends only on the last ad
the customer saw before purchasing
• that is probably not true: perhaps the customer had been following
the company blog for a long time, heard friends talk about the
product offline, or saw many banner ads on different sites before
making a purchase
• problem: there is no good way to decide how to “attribute” credit
between marketing channels
• results depend heavily on assumptions that cannot be tested
• similar problem: if you advertise your brick-and-mortar store on TV
and on radio, what drives the customer to your store?
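To make the dependence on assumptions concrete, here is a minimal sketch in base R that splits the revenue of a single conversion path under two simple rules: last click and linear (equal credit per touchpoint). The path, the channel names and the revenue of 100 are hypothetical; real attribution models (position-based, time-decay, data-driven) split credit in yet other ways.

```r
# Hypothetical conversion path: the channels one customer touched, in order
path <- c("blog", "display", "display", "paid_search")
revenue <- 100

# Last click: the final touchpoint receives all of the credit
last_click <- function(path, revenue) {
  channels <- unique(path)
  credit <- setNames(numeric(length(channels)), channels)
  credit[path[length(path)]] <- revenue
  credit
}

# Linear: each touchpoint receives an equal share, summed per channel
linear <- function(path, revenue) {
  channels <- unique(path)
  share <- revenue / length(path)
  vapply(channels, function(ch) share * sum(path == ch), numeric(1))
}

last_click(path, revenue)  # blog 0, display 0, paid_search 100
linear(path, revenue)      # blog 25, display 50, paid_search 25
```

Both rules distribute the same 100, yet paid search gets either all of it or a quarter; nothing in the data itself says which rule is right.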
What can be done? Controlled experiments
• Randomly assign units to a treatment and a control group, for example:
• Per user: A/B testing
• By geographical region
• Assumption: without the experiment, both groups would behave similarly
• Evaluation: difference in differences
• difference in the control group: noise
• difference in the treatment group: effect + noise
• Metric: $\Delta_{\text{treated}} - \Delta_{\text{untreated}}$
• Advantages of a geographical experiment:
• no multi-device tracking necessary
• easy integration with external data
• Caveat: Geographical groups really need to be comparable
(e.g. commuters)
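The difference-in-differences metric can be computed from just four group means. A minimal sketch in base R; the average daily revenues below are made-up numbers chosen only to show the arithmetic.

```r
# Made-up average daily revenues per group and period
m <- data.frame(
  group   = c("treated", "treated", "untreated", "untreated"),
  period  = c("before", "after", "before", "after"),
  revenue = c(110, 55, 112, 62)
)
mean_of <- function(g, p) m$revenue[m$group == g & m$period == p]

delta_treated   <- mean_of("treated", "after") - mean_of("treated", "before")      # effect + noise
delta_untreated <- mean_of("untreated", "after") - mean_of("untreated", "before")  # noise (common trend)
did <- delta_treated - delta_untreated
did  # -5
```

A naive before/after comparison in the treated group would report -55; subtracting the change in the untreated group removes the common trend and leaves -5.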
Difference in Differences
GNU R and Google Analytics: Dream Team
1. Selection of the treated and control group
• Install R, generate a sample with GNU R
• Export: Copy & paste to AdWords
2. Data collection
• Google Analytics already configured
3. Aggregation and query
• In the cloud: Google Analytics Query Explorer
• Integration with RGoogleAnalytics
4. Evaluation: Estimation and Visualization
• All necessary functions available as packages in R
About R
• Programming language and software environment for statistical
computing and graphics, a dialect of S
• Quite lean; functionality is divided into modular packages
• Graphics better than in most stat packages.
• Useful for interactive work, but contains a powerful programming
language for developing new tools (user -> programmer)
• Very active and vibrant user community; R-help and R-devel mailing
lists and Stack Overflow
• Markdown packages for reproducible research and automated
reporting
• It’s free!
Install R
• Open Source for Windows / Mac / Linux etc.
• GNU R: https://www.r-project.org/
• RStudio IDE: http://www.rstudio.com
• Cheat Sheets to help!
• R Reference Card
• RStudio cheatsheets
• Package management via CRAN
# Install the packages used in this talk (any CRAN mirror works)
install.packages(c('RGoogleAnalytics', 'plm', 'ggplot2', 'stargazer', 'knitr'),
                 repos = "https://cloud.r-project.org")
Selecting Treatment Group
# Download the AdWords geographical targets list
download.file('https://goo.gl/qVgiYp', destfile = 'geoid.csv')
# Fylke (county) level selection; Kommune (municipality) level also possible
regions <- read.csv('geoid.csv', stringsAsFactors = FALSE)
norway <- regions[which(regions$Country.Code == 'NO'
                        & regions$Target.Type == 'County'
                        & regions$Status == 'Active'), ]
# Random assignment: an independent coin flip per region (1 = treatment)
set.seed(1)
norway$isTreatment <- sample(c(0, 1), nrow(norway), replace = TRUE)
write.csv(norway, file = 'norway.csv', row.names = FALSE)
# Treatment regions, one per line, to paste into AdWords
writeLines(norway$Canonical.Name[norway$isTreatment == 1],
           con = 'treatment.csv')
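Calling `sample()` with replacement draws an independent 0/1 per region, so with only 19 fylke the two groups can end up quite unequal in size. A common alternative is complete randomization, which fixes the number of treated units in advance. A minimal sketch; the `regions` vector is a hypothetical stand-in for `norway$Canonical.Name`, and `assign_complete` is a hypothetical helper, not part of any package.

```r
# Hypothetical stand-in for norway$Canonical.Name
regions <- sprintf("Region %02d", 1:19)

# Complete randomization: exactly n_treated units get isTreatment = 1.
# Note: set.seed() here resets the global random number generator.
assign_complete <- function(units, n_treated = floor(length(units) / 2), seed = 1) {
  set.seed(seed)
  flags <- rep(c(1, 0), times = c(n_treated, length(units) - n_treated))
  data.frame(unit = units, isTreatment = sample(flags))
}

design <- assign_complete(regions, n_treated = 10)  # 10 treated, as in the case study below
table(design$isTreatment)
```

Fixing the seed keeps the assignment reproducible, which matters when the treated list is pasted into AdWords by hand.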
Configuring Google AdWords (slides I to V: screenshots)
Done!
Wait . . . for the results
Google Analytics Core Reporting API & R
1. Create an “app”
• Google Developers page
• Enable Google Analytics API
• Create Credentials: OAuth client ID, Application type: Other
• Result: Client ID and Client Secret
2. Find your GA Profile ID
Setting up GNU R
client.id <- 'xxxxxxxxxxxxxxx.apps.googleusercontent.com'
client.secret <- 'xxxxxxxxxxxxxxx'
analyticsProfileId <- '111111111'
library(RGoogleAnalytics)
# Opens the Google sign-in; paste the authorization code back into R
token <- Auth(client.id, client.secret)
# Save the token object for later sessions
saveRDS(token, file = 'gatoken.rds')
# Next time: restore the token, then check it (refreshed if expired)
token <- readRDS('gatoken.rds')
ValidateToken(token)
Create a query
query.list <- Init(start.date = "2015-10-01",
                   end.date = "2016-02-29",
                   dimensions = "ga:region,ga:date,ga:medium",
                   metrics = "ga:sessions,ga:transactionRevenue",
                   filters = "ga:country==Norway",
                   max.results = 50000,
                   sort = "-ga:date,ga:region",
                   table.id = paste0("ga:", analyticsProfileId))
ga.query <- QueryBuilder(query.list)
ga.data <- GetReportData(ga.query, token)
Real Data Example - www.flexponsive.net
knitr::kable(head(ga.data))
region                date      medium    country    sessions  transactionRevenue
Brussels              20160229  referral  Belgium    1
State of Parana       20160229  referral  Brazil     1
Baden-Wurttemberg     20160229  organic   Germany    1
Baden-Wurttemberg     20160229  referral  Germany    1
Rhineland-Palatinate  20160229  referral  Germany    1
(not set)             20160229  (none)    Hong Kong  5
Tip 1: Query Explorer
Tip 2: Dimensions & Metrics Explorer
Tip 3: Avoiding sampling
> ga.data <- GetReportData(ga.query, token)
Status of Query:
The API returned 1393 results
The query response contains sampled data. It is based on
XX.XX % of your visits. You can split the query day-wise
in order to reduce the effect of sampling.
Set split_daywise = T in the GetReportData function
Note that split_daywise = T will automatically ....
• “Sampling occurs automatically when more than 500,000 sessions
(25M for Premium) are collected for a report, allowing Google
Analytics to generate reports more quickly for those large data sets.”
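RGoogleAnalytics handles this with `split_daywise = T`. The underlying idea is to issue one query per day, each covering far fewer sessions and therefore less likely to cross the sampling threshold, and then stack the results. A minimal base-R sketch of that idea; `fetch_daywise` and `fetch_day` are hypothetical names, and `fetch_day` stands in for whatever runs a single-day query (for example, an `Init()` query for that date passed to `GetReportData()`).

```r
# Run one query per day and stack the results.
# fetch_day is a hypothetical callback: it takes a "YYYY-MM-DD" string
# and returns a data.frame for that single day.
fetch_daywise <- function(start.date, end.date, fetch_day) {
  days <- seq(as.Date(start.date), as.Date(end.date), by = "day")
  chunks <- lapply(format(days, "%Y-%m-%d"), fetch_day)
  do.call(rbind, chunks)
}

# Usage with a fake fetcher that returns one row per day
fake_fetch <- function(d) data.frame(date = d, sessions = 1, stringsAsFactors = FALSE)
daily <- fetch_daywise("2016-02-27", "2016-02-29", fake_fetch)
daily$date  # "2016-02-27" "2016-02-28" "2016-02-29" (2016 is a leap year)
```

A single day can still exceed the threshold on a very busy property, so the sampling message should be checked for every chunk.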
Data Integration
• Wide format: one row per region and date
• Long format: one row per region, date and dimension value (entity-attribute-value, EAV)
# Long -> wide with base R's reshape(): one row per region and date,
# one column per metric and medium (e.g. sessions.cpc, sessions.organic)
w <- reshape(ga.data, timevar = 'medium',
             idvar = c('region', 'date'), direction = 'wide')
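To see what `reshape(..., direction = 'wide')` produces, here is a toy long-format data set in the shape of the GA export (region, date, medium and two metrics). The values are made up; each metric becomes one column per medium, named `metric.medium`.

```r
# Toy long-format data: one row per region, date and medium (values are made up)
long <- data.frame(
  region   = c("Hordaland", "Hordaland", "Sor-Trondelag", "Sor-Trondelag"),
  date     = "20160229",
  medium   = c("cpc", "organic", "cpc", "organic"),
  sessions = c(5, 12, 3, 9),
  transactionRevenue = c(100, 250, 0, 180),
  stringsAsFactors = FALSE
)

# Long -> wide: one row per region and date, one column per metric x medium
wide <- reshape(long, timevar = "medium", idvar = c("region", "date"),
                direction = "wide")
wide
```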
Data Integration: Almost finished
• Merge: who is in which group?
ds <- merge(w, norway[, c('Name', 'isTreatment')],
            by.x = 'region', by.y = 'Name', all.x = TRUE)
• The data set is ready!
• R offers a comfortable DSL for data manipulation
• Packages keep the code short
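With `all.x` set to true, `merge()` performs a left join: every GA row is kept, and regions without a match in `norway` get `NA` in `isTreatment`. A minimal sketch with made-up rows showing why it is worth listing such unmatched regions (for example `(not set)` or a differently spelled name) before estimating anything.

```r
# Made-up wide GA data and group assignment
w <- data.frame(region = c("Hordaland", "Sor-Trondelag", "(not set)"),
                date = "20160229", sessions.cpc = c(5, 3, 1),
                stringsAsFactors = FALSE)
groups <- data.frame(Name = c("Hordaland", "Sor-Trondelag"), isTreatment = c(0, 1),
                     stringsAsFactors = FALSE)

ds <- merge(w, groups, by.x = "region", by.y = "Name", all.x = TRUE)

# Regions that could not be assigned to a group
unmatched <- unique(as.character(ds$region[is.na(ds$isTreatment)]))
unmatched  # "(not set)"
```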
Case Study: Wanderlust
• an app “developed” for this presentation
• a booking engine for mystery weekend getaways and short holidays
• supports inventory management for hotels and airlines
• seasonal demand fluctuations
Evaluation
• Simulated data for illustration: 3 summer months
• 1st August: the experiment starts in 10 random provinces (fylke);
AdWords are stopped there
• 1st August: school starts; search volume falls everywhere by 50%
• Scenario: 100% of visitors click the organic result when the ad is not shown
• The randomization assigned:
• Sor-Trondelag (Trondheim): treatment group, no AdWords from 1st August
• Hordaland (Bergen): control group, AdWords continue
Revenues in Sor-Trondelag (treatment) [figure: transactionRevenue.total (axis 60 to 120) against date, 1 June to 1 September]
Revenues in Hordaland (control) [figure: same axes]
Revenues in both Fylke [figure: same axes, one line per region: Hordaland, Sor-Trondelag]
ROI Calculation - standard regression
library(stargazer)
out <- lm(transactionRevenue.total ~ isTreatment.cpc,
          data = sd.w)
stargazer(out, header = FALSE, type = 'latex')

Table 2
                   Dependent variable:
                   transactionRevenue.total
isTreatment.cpc    −48.358***
                   (1.350)
Constant           111.350***
                   (0.569)
Observations       1,748
R²                 0.424
ROI Calculation - standard regression
• A standard OLS regression on a binary variable is equivalent to comparing means
• But not the right ones. In this case:
$\text{Revenues} = \beta_0 + \beta_1 \cdot \text{treatment}$
• treatment takes the value 1 for the treatment group after AdWords were
stopped (e.g. Sor-Trondelag in August), and 0 otherwise
• As a result, $\beta_1$ is the difference between the average revenue of the
treated fylke in August and the average revenue of all other observations
(every fylke in June and July, plus the control fylke such as Hordaland in August)
• That’s clearly not what we are looking for!
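The claim that OLS on a single binary regressor only compares means can be checked directly: the slope equals the mean of the observations with the dummy equal to 1 minus the mean of all the others, and the intercept equals the latter mean. A minimal sketch with made-up numbers that mimic the design (treated fylke in August versus everything else).

```r
# Made-up daily revenues
y <- c(100, 104, 98, 102,   # treated fylke, June/July
       101, 99, 103, 97,    # control fylke, June/July
       50, 52,              # control fylke, August (seasonal drop)
       49, 51)              # treated fylke, August (AdWords off)
d <- c(rep(0, 10), rep(1, 2))  # dummy = 1 only for the treated fylke in August

fit <- lm(y ~ d)
coef(fit)                           # intercept 90.6, slope -40.6
mean(y[d == 1]) - mean(y[d == 0])   # -40.6, identical to the slope
```

The slope suggests that switching off AdWords cost about 40 per day, although the control fylke in August (51 on average) did almost exactly as well as the treated one (50): the seasonal drop is mistaken for an ad effect.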
Difference in Differences
ROI Calculation - Differences in Differences
library(plm)
# model = "between" regresses the time averages of each region (one observation per fylke)
out <- plm(transactionRevenue.total ~ isTreatment.cpc,
           data = sd.w, index = c("region", "date"), model = "between")
stargazer(out, header = FALSE, type = 'latex')

Table 3
                   Dependent variable:
                   transactionRevenue.total
isTreatment.cpc    0.189
                   (0.254)
Constant           102.741***
                   (0.062)
Observations       19
R²                 0.032
ROI Calculation - Difference in Differences
• A difference-in-differences estimator, using a fixed-effects model with
binary variables, allows us to estimate the true effect of the treatment
• Econometrically, we estimate this equation:
$\text{Revenues} = \beta_0 + \beta_1 \cdot \text{treatment} + \beta_2 \cdot \text{before} + \gamma \cdot \text{fylke}$
• fylke is a matrix of binary variables, one for each district
• before is a binary variable that takes the value 0 in the period in which AdWords
were running in all districts and 1 in the period after the experiment
started in some regions
• treatment takes the value 1 for the treatment group in the period after
the experiment started, i.e. after AdWords were stopped in Sor-Trondelag,
and 0 otherwise
• The estimation result reveals the true impact of AdWords on
revenues in this simulated data set (zero by construction, since all visitors switch to the organic result)
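The fixed-effects equation above can be estimated with base R's `lm()` by adding the `before` indicator and one dummy per fylke via `factor()`. A minimal sketch on freshly simulated data under stated assumptions: 19 hypothetical fylke labels, 1 June to 31 August, 10 randomly treated fylke, fylke-specific revenue levels, a common drop of 55 from 1 August, normal noise, and 100% click substitution, so the true AdWords effect is zero. For comparison it also fits the naive single-dummy regression. In plm, a comparable specification would be a within (fixed-effects) model; the `plm()` call on the slide uses `model = "between"`, which averages each region over time and is consistent with the 19 observations in Table 3.

```r
set.seed(7)
fylke <- sprintf("Fylke %02d", 1:19)                          # hypothetical labels
dates <- seq(as.Date("2015-06-01"), as.Date("2015-08-31"), by = "day")
treated <- sample(fylke, 10)                                   # 10 random treated fylke

panel <- expand.grid(fylke = fylke, date = dates, stringsAsFactors = FALSE)
# before = 1 in the experiment period (from 1 August), as defined above
panel$before <- as.integer(panel$date >= as.Date("2015-08-01"))
panel$treatment <- as.integer(panel$fylke %in% treated & panel$before == 1)

# Fylke-specific level, a common drop of 55 from 1 August, and no AdWords effect
level <- 100 + 2 * match(panel$fylke, fylke)
panel$revenue <- level - 55 * panel$before + rnorm(nrow(panel), sd = 3)

naive <- lm(revenue ~ treatment, data = panel)
did   <- lm(revenue ~ treatment + before + factor(fylke), data = panel)
round(c(naive = coef(naive)[["treatment"]], did = coef(did)[["treatment"]]), 2)
```

The naive coefficient is strongly negative because the seasonal drop coincides with the treatment period, while the fixed-effects estimate stays close to the true value of zero. The sketch assumes parallel trends hold exactly; with real data that assumption has to be argued, not taken for granted.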
Discussion
• The missing counterfactual: we do not know what else might be
happening at the same time; remedy: a controlled experiment
• Challenge: Big Data without Big Code; Google Analytics and GNU R
form a very rich toolbox
• Result: difference in differences can work, but mind its assumptions
Table of Contents
Intro
Brand Keywords
The eBay Study
Calculating the true ROI
Brand keywords with R
Configuring Experiment
Using Google Analytics API
AdWords experiment: an example
Regression Results

More Related Content

Viewers also liked

Practical Predictive Analytics Models and Methods
Practical Predictive Analytics Models and MethodsPractical Predictive Analytics Models and Methods
Practical Predictive Analytics Models and MethodsZhipeng Liang
 
8 landing page tips for non profits and small businesses
8 landing page tips for non profits and small businesses8 landing page tips for non profits and small businesses
8 landing page tips for non profits and small businessesBecky Livingston
 
Webinar: Maximize Keyword Profits & Conversions with Data Science
Webinar: Maximize Keyword Profits & Conversions with Data ScienceWebinar: Maximize Keyword Profits & Conversions with Data Science
Webinar: Maximize Keyword Profits & Conversions with Data ScienceQuanticMind
 
Do You Want to Be Rolling Stones or Vanilla Ice? by Steve Sloan, Chief Produc...
Do You Want to Be Rolling Stones or Vanilla Ice? by Steve Sloan, Chief Produc...Do You Want to Be Rolling Stones or Vanilla Ice? by Steve Sloan, Chief Produc...
Do You Want to Be Rolling Stones or Vanilla Ice? by Steve Sloan, Chief Produc...Traction Conf
 
Boosting conversion rates on ecommerce using deep learning algorithms
Boosting conversion rates on ecommerce using deep learning algorithmsBoosting conversion rates on ecommerce using deep learning algorithms
Boosting conversion rates on ecommerce using deep learning algorithmsArmando Vieira
 
How Data Science can increase Ecommerce profits
How Data Science can increase Ecommerce profitsHow Data Science can increase Ecommerce profits
How Data Science can increase Ecommerce profitsRomexsoft
 
Interactively querying Google Analytics reports from R using ganalytics
Interactively querying Google Analytics reports from R using ganalyticsInteractively querying Google Analytics reports from R using ganalytics
Interactively querying Google Analytics reports from R using ganalyticsJohann de Boer
 
Web data from R
Web data from RWeb data from R
Web data from Rschamber
 
Building a Data Driven Growth Organization by Heather Zynczak, CMO, Domo
Building a Data Driven Growth Organization by Heather Zynczak, CMO, Domo Building a Data Driven Growth Organization by Heather Zynczak, CMO, Domo
Building a Data Driven Growth Organization by Heather Zynczak, CMO, Domo Traction Conf
 
Media Mix Optimization - The Starting Point for Customer-Centric Communications
Media Mix Optimization - The Starting Point for Customer-Centric CommunicationsMedia Mix Optimization - The Starting Point for Customer-Centric Communications
Media Mix Optimization - The Starting Point for Customer-Centric CommunicationsAcxiom Corporation
 
Tapping the Data Deluge with R
Tapping the Data Deluge with RTapping the Data Deluge with R
Tapping the Data Deluge with RJeffrey Breen
 

Viewers also liked (14)

Practical Predictive Analytics Models and Methods
Practical Predictive Analytics Models and MethodsPractical Predictive Analytics Models and Methods
Practical Predictive Analytics Models and Methods
 
8 landing page tips for non profits and small businesses
8 landing page tips for non profits and small businesses8 landing page tips for non profits and small businesses
8 landing page tips for non profits and small businesses
 
Webinar: Maximize Keyword Profits & Conversions with Data Science
Webinar: Maximize Keyword Profits & Conversions with Data ScienceWebinar: Maximize Keyword Profits & Conversions with Data Science
Webinar: Maximize Keyword Profits & Conversions with Data Science
 
El afiche
El aficheEl afiche
El afiche
 
Do You Want to Be Rolling Stones or Vanilla Ice? by Steve Sloan, Chief Produc...
Do You Want to Be Rolling Stones or Vanilla Ice? by Steve Sloan, Chief Produc...Do You Want to Be Rolling Stones or Vanilla Ice? by Steve Sloan, Chief Produc...
Do You Want to Be Rolling Stones or Vanilla Ice? by Steve Sloan, Chief Produc...
 
Boosting conversion rates on ecommerce using deep learning algorithms
Boosting conversion rates on ecommerce using deep learning algorithmsBoosting conversion rates on ecommerce using deep learning algorithms
Boosting conversion rates on ecommerce using deep learning algorithms
 
How Data Science can increase Ecommerce profits
How Data Science can increase Ecommerce profitsHow Data Science can increase Ecommerce profits
How Data Science can increase Ecommerce profits
 
Interactively querying Google Analytics reports from R using ganalytics
Interactively querying Google Analytics reports from R using ganalyticsInteractively querying Google Analytics reports from R using ganalytics
Interactively querying Google Analytics reports from R using ganalytics
 
Marketers, make procurement your friend
Marketers, make procurement your friendMarketers, make procurement your friend
Marketers, make procurement your friend
 
Web data from R
Web data from RWeb data from R
Web data from R
 
Building a Data Driven Growth Organization by Heather Zynczak, CMO, Domo
Building a Data Driven Growth Organization by Heather Zynczak, CMO, Domo Building a Data Driven Growth Organization by Heather Zynczak, CMO, Domo
Building a Data Driven Growth Organization by Heather Zynczak, CMO, Domo
 
Using R with Hadoop
Using R with HadoopUsing R with Hadoop
Using R with Hadoop
 
Media Mix Optimization - The Starting Point for Customer-Centric Communications
Media Mix Optimization - The Starting Point for Customer-Centric CommunicationsMedia Mix Optimization - The Starting Point for Customer-Centric Communications
Media Mix Optimization - The Starting Point for Customer-Centric Communications
 
Tapping the Data Deluge with R
Tapping the Data Deluge with RTapping the Data Deluge with R
Tapping the Data Deluge with R
 

Recently uploaded

BDSM⚡Call Girls in Sector 150 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 150 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 150 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 150 Noida Escorts >༒8448380779 Escort ServiceDelhi Call girls
 
Marketing Management Presentation Final.pptx
Marketing Management Presentation Final.pptxMarketing Management Presentation Final.pptx
Marketing Management Presentation Final.pptxabhishekshetti14
 
Kraft Mac and Cheese campaign presentation
Kraft Mac and Cheese campaign presentationKraft Mac and Cheese campaign presentation
Kraft Mac and Cheese campaign presentationtbatkhuu1
 
What is Google Search Console and What is it provide?
What is Google Search Console and What is it provide?What is Google Search Console and What is it provide?
What is Google Search Console and What is it provide?riteshhsociall
 
Five Essential Tools for International SEO - Natalia Witczyk - SearchNorwich 15
Five Essential Tools for International SEO - Natalia Witczyk - SearchNorwich 15Five Essential Tools for International SEO - Natalia Witczyk - SearchNorwich 15
Five Essential Tools for International SEO - Natalia Witczyk - SearchNorwich 15SearchNorwich
 
Avoid the 2025 web accessibility rush: do not fear WCAG compliance
Avoid the 2025 web accessibility rush: do not fear WCAG complianceAvoid the 2025 web accessibility rush: do not fear WCAG compliance
Avoid the 2025 web accessibility rush: do not fear WCAG complianceDamien ROBERT
 
Local SEO Domination: Put your business at the forefront of local searches!
Local SEO Domination:  Put your business at the forefront of local searches!Local SEO Domination:  Put your business at the forefront of local searches!
Local SEO Domination: Put your business at the forefront of local searches!dstvtechnician
 
How to utilize calculated properties in your HubSpot setups
How to utilize calculated properties in your HubSpot setupsHow to utilize calculated properties in your HubSpot setups
How to utilize calculated properties in your HubSpot setupsssuser4571da
 
Defining Marketing for the 21st Century,kotler
Defining Marketing for the 21st Century,kotlerDefining Marketing for the 21st Century,kotler
Defining Marketing for the 21st Century,kotlerAmirNasiruog
 
Brand experience Dream Center Peoria Presentation.pdf
Brand experience Dream Center Peoria Presentation.pdfBrand experience Dream Center Peoria Presentation.pdf
Brand experience Dream Center Peoria Presentation.pdftbatkhuu1
 
Uncover Insightful User Journey Secrets Using GA4 Reports
Uncover Insightful User Journey Secrets Using GA4 ReportsUncover Insightful User Journey Secrets Using GA4 Reports
Uncover Insightful User Journey Secrets Using GA4 ReportsVWO
 
Moving beyond multi-touch attribution - DigiMarCon CanWest 2024
Moving beyond multi-touch attribution - DigiMarCon CanWest 2024Moving beyond multi-touch attribution - DigiMarCon CanWest 2024
Moving beyond multi-touch attribution - DigiMarCon CanWest 2024Richard Ingilby
 
Call Us ➥9654467111▻Call Girls In Delhi NCR
Call Us ➥9654467111▻Call Girls In Delhi NCRCall Us ➥9654467111▻Call Girls In Delhi NCR
Call Us ➥9654467111▻Call Girls In Delhi NCRSapana Sha
 
Brighton SEO April 2024 - The Good, the Bad & the Ugly of SEO Success
Brighton SEO April 2024 - The Good, the Bad & the Ugly of SEO SuccessBrighton SEO April 2024 - The Good, the Bad & the Ugly of SEO Success
Brighton SEO April 2024 - The Good, the Bad & the Ugly of SEO SuccessVarn
 
Social Samosa Guidebook for SAMMIES 2024.pdf
Social Samosa Guidebook for SAMMIES 2024.pdfSocial Samosa Guidebook for SAMMIES 2024.pdf
Social Samosa Guidebook for SAMMIES 2024.pdfSocial Samosa
 
Unraveling the Mystery of Roanoke Colony: What Really Happened?
Unraveling the Mystery of Roanoke Colony: What Really Happened?Unraveling the Mystery of Roanoke Colony: What Really Happened?
Unraveling the Mystery of Roanoke Colony: What Really Happened?elizabethella096
 
April 2024 - VBOUT Partners Meeting Group
April 2024 - VBOUT Partners Meeting GroupApril 2024 - VBOUT Partners Meeting Group
April 2024 - VBOUT Partners Meeting GroupVbout.com
 

Recently uploaded (20)

BDSM⚡Call Girls in Sector 150 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 150 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 150 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 150 Noida Escorts >༒8448380779 Escort Service
 
Marketing Management Presentation Final.pptx
Marketing Management Presentation Final.pptxMarketing Management Presentation Final.pptx
Marketing Management Presentation Final.pptx
 
Kraft Mac and Cheese campaign presentation
Kraft Mac and Cheese campaign presentationKraft Mac and Cheese campaign presentation
Kraft Mac and Cheese campaign presentation
 
What is Google Search Console and What is it provide?
What is Google Search Console and What is it provide?What is Google Search Console and What is it provide?
What is Google Search Console and What is it provide?
 
Foundation First - Why Your Website and Content Matters - David Pisarek
Foundation First - Why Your Website and Content Matters - David PisarekFoundation First - Why Your Website and Content Matters - David Pisarek
Foundation First - Why Your Website and Content Matters - David Pisarek
 
Five Essential Tools for International SEO - Natalia Witczyk - SearchNorwich 15
Five Essential Tools for International SEO - Natalia Witczyk - SearchNorwich 15Five Essential Tools for International SEO - Natalia Witczyk - SearchNorwich 15
Five Essential Tools for International SEO - Natalia Witczyk - SearchNorwich 15
 
Avoid the 2025 web accessibility rush: do not fear WCAG compliance
Avoid the 2025 web accessibility rush: do not fear WCAG complianceAvoid the 2025 web accessibility rush: do not fear WCAG compliance
Avoid the 2025 web accessibility rush: do not fear WCAG compliance
 
Local SEO Domination: Put your business at the forefront of local searches!
Local SEO Domination:  Put your business at the forefront of local searches!Local SEO Domination:  Put your business at the forefront of local searches!
Local SEO Domination: Put your business at the forefront of local searches!
 
How to utilize calculated properties in your HubSpot setups
How to utilize calculated properties in your HubSpot setupsHow to utilize calculated properties in your HubSpot setups
How to utilize calculated properties in your HubSpot setups
 
Defining Marketing for the 21st Century,kotler
Defining Marketing for the 21st Century,kotlerDefining Marketing for the 21st Century,kotler
Defining Marketing for the 21st Century,kotler
 
Brand experience Dream Center Peoria Presentation.pdf
Brand experience Dream Center Peoria Presentation.pdfBrand experience Dream Center Peoria Presentation.pdf
Brand experience Dream Center Peoria Presentation.pdf
 
Uncover Insightful User Journey Secrets Using GA4 Reports
Uncover Insightful User Journey Secrets Using GA4 ReportsUncover Insightful User Journey Secrets Using GA4 Reports
Uncover Insightful User Journey Secrets Using GA4 Reports
 
Moving beyond multi-touch attribution - DigiMarCon CanWest 2024
Moving beyond multi-touch attribution - DigiMarCon CanWest 2024Moving beyond multi-touch attribution - DigiMarCon CanWest 2024
Moving beyond multi-touch attribution - DigiMarCon CanWest 2024
 
Call Us ➥9654467111▻Call Girls In Delhi NCR
Call Us ➥9654467111▻Call Girls In Delhi NCRCall Us ➥9654467111▻Call Girls In Delhi NCR
Call Us ➥9654467111▻Call Girls In Delhi NCR
 
Brighton SEO April 2024 - The Good, the Bad & the Ugly of SEO Success
Brighton SEO April 2024 - The Good, the Bad & the Ugly of SEO SuccessBrighton SEO April 2024 - The Good, the Bad & the Ugly of SEO Success
Brighton SEO April 2024 - The Good, the Bad & the Ugly of SEO Success
 
Social Samosa Guidebook for SAMMIES 2024.pdf
Social Samosa Guidebook for SAMMIES 2024.pdfSocial Samosa Guidebook for SAMMIES 2024.pdf
Social Samosa Guidebook for SAMMIES 2024.pdf
 
Unraveling the Mystery of Roanoke Colony: What Really Happened?
Unraveling the Mystery of Roanoke Colony: What Really Happened?Unraveling the Mystery of Roanoke Colony: What Really Happened?
Unraveling the Mystery of Roanoke Colony: What Really Happened?
 
Top 5 Breakthrough AI Innovations Elevating Content Creation and Personalizat...
Top 5 Breakthrough AI Innovations Elevating Content Creation and Personalizat...Top 5 Breakthrough AI Innovations Elevating Content Creation and Personalizat...
Top 5 Breakthrough AI Innovations Elevating Content Creation and Personalizat...
 
April 2024 - VBOUT Partners Meeting Group
April 2024 - VBOUT Partners Meeting GroupApril 2024 - VBOUT Partners Meeting Group
April 2024 - VBOUT Partners Meeting Group
 
Turn Digital Reputation Threats into Offense Tactics - Daniel Lemin
Turn Digital Reputation Threats into Offense Tactics - Daniel LeminTurn Digital Reputation Threats into Offense Tactics - Daniel Lemin
Turn Digital Reputation Threats into Offense Tactics - Daniel Lemin
 

Google Analytics and AdWords optimisation with GNU R

  • 1. Google Analytics and AdWords optimisation with GNU R Hinnerk Gnutzmann & Piotr Śpiewanowski flexponsive UG Booster Conference, 9th March 2016
  • 2. About flexponsive UG • e-commerce consulting • Big Data focus • Qualitative user testing • Academic (PhD in economics) and programming background Contact • mailto: spiewanowski@flexponsive.net • web: https://www.flexponsive.net/ • t: @flexponsive
  • 3. Topic of the day • Marketing outcomes • difficult to define • even more difficult to measure • Before Big Data: “Half the money I spend on advertising is wasted; the trouble is I don’t know which half.” (John Wanamaker, 1838 - 1922) • With Big Data: “AdWords brand keyword ads have no measurable short-term benefis” (Blake et al., 2015) - 100% wasted? • Open Questions: • Incrementality Debate: Do AdWords campaings cannibalise organic traffic? • Quality: Are bought visitors good or bad customers? • Heterogenity: Campaign effects differ between customers?
  • 4. Agenda 1. Case study Brand Keyword: The Secret of vanishing AdWords ROI 2. What can we do? • attribution models • controlled experiments • GNU R & Analytics: A Dream Team 3. How to do that? • Google Core Reporting API & GNU R • GA Query Explorer • Configuring an experiment in AdWords 4. Analysis with GNU R • Data wrangling, sampling, etc. • GA replicate metrics • Regression Analysis 5. Case Study II: adClicks and rain in Bergen
  • 7. What happened? • the AdWord is highly relevant to the search • Navigational Query: The visitor wants to visit Skandiabanken. • Customer knows the bank and maybe even has a service in mind • Result: Probably the best keyword in the account • Excellent CTR • Very good conversion on-site • CPC perhaps not so high • Any questions? • Organic result is the same! • What would you click if there was no ad?
  • 8. What happened? • the AdWord is highly relevant to the search • Navigational Query: The visitor wants to visit Skandiabanken. • Customer knows the bank and maybe even has a service in mind • Result: Probably the best keyword in the account • Excellent CTR • Very good conversion on-site • CPC perhaps not so high • Any questions? • Organic result is the same! • What would you click if there was no ad?
  • 9. What happened? • the AdWord is highly relevant to the search • Navigational Query: The visitor wants to visit Skandiabanken. • Customer knows the bank and maybe even has a service in mind • Result: Probably the best keyword in the account • Excellent CTR • Very good conversion on-site • CPC perhaps not so high • Any questions? • Organic result is the same! • What would you click if there was no ad?
  • 10. Example - Skandiabanken (without AdWords)
  • 11. ROI Problem: SEM expenditure a function not only of the campaign, but also of the behavior and intent of consumer
  • 12. The eBay study • Blake et al. (2015), “Consumer Heterogeneity and Paid Search Effectiveness: A Large Scale Field Experiment” • Field Experiment: Does AdWords work for eBay? • Very controversial results: 1. Conventional methods used to measure the causal (incremental) impact of SEM vastly overstate its effect. 2. True effectiveness of SEM is small for a well-known company like eBay 3. Click substition: When the brand keyword AdWord disappeared, almost all the users click on the organic result 4. Informative Advertising: AdWords work if a visitor gains additional information through advertisement - AdWords had almost no effect on revenues from existing customers - They found their own way to eBay!
  • 13. What can be done? Attribution modelling But how to know the true channel’s impact?
  • 14. Attribution modelling • a way to divide the “credit” for a sale between different marketing channels • if you don’t know what attribution model you are using, it’s “last click” => you believe the sale only depends on the last ad the customer saw before purchasing • probably that’s not true: perhaps the customer had been following the company blog for a long time, heard friends talk off-line about the product, or saw many banner ads on different sides before making a purchase • problem: no good way to decide how to “attribute” between different marketing channels • results depend a lot on assumptions, which you cannot test • similar problem: if you advertise your brick-and-mortar store on TV and on radio, what drives the customer to your store?
  • 15. What can be done? Controlled experiments
    • Randomly select a treatment and a control group, for example:
      • per user: A/B testing
      • by geographical region
    • Assumption: without the experiment, both groups would behave similarly
    • Evaluation: difference in differences
      • difference in the control group: noise
      • difference in the treatment group: effect + noise
      • Metric: $\Delta_{\text{TREATED}} - \Delta_{\text{UNTREATED}}$
    • Advantages of a geographical experiment:
      • no multi-device tracking necessary
      • easy integration with external data
    • Caveat: geographical groups really need to be comparable (e.g. commuters)
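The evaluation metric boils down to simple arithmetic on four averages. A minimal sketch in R, with made-up numbers in which both groups lose about half their revenue over the period:

```r
# Hypothetical average daily revenue per group and period (made-up numbers)
treated_before <- 110
treated_after  <- 55
control_before <- 112
control_after  <- 56

delta_treated   <- treated_after - treated_before  # effect + noise
delta_untreated <- control_after - control_before  # noise (e.g. a common seasonal trend)
did <- delta_treated - delta_untreated             # difference in differences

cat("Delta treated:  ", delta_treated, "\n")
cat("Delta untreated:", delta_untreated, "\n")
cat("Difference in differences:", did, "\n")
```

Here the treatment group falls by 55 and the control group by 56, so the difference in differences is only 1: almost all of the drop in the treatment group is noise shared with the control group, not an effect of the treatment.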
  • 17. GNU R and Google Analytics: Dream Team
    1. Selection of the treated and control group
       • Install R, generate a sample with GNU R
       • Export: copy & paste to AdWords
    2. Data collection
       • Google Analytics already configured
    3. Aggregation and query
       • In the cloud: Google Analytics Query Explorer
       • Integration with RGoogleAnalytics
    4. Evaluation: estimation and visualization
       • All necessary functions available as packages in R
  • 18. About R
    • Programming language and software environment for statistical computing and graphics, a dialect of S
    • Quite lean; functionality is divided into modular packages
    • Graphics better than in most stat packages
    • Useful for interactive work, but contains a powerful programming language for developing new tools (user -> programmer)
    • Very active and vibrant user community: R-help and R-devel mailing lists, Stack Overflow
    • Markdown packages for reproducible research and automated reporting
    • It's free!
  • 19. Install R
    • Open source, for Windows / Mac / Linux etc.
    • GNU R: https://www.r-project.org/
    • RStudio IDE: http://www.rstudio.com
    • Cheat sheets to help!
      • R Reference Card
      • RStudio cheatsheets
    • Package management via CRAN (stargazer and knitr are used later for the regression tables and kable()):
      install.packages(c('RGoogleAnalytics', 'plm', 'ggplot2', 'stargazer', 'knitr'),
                       repos = "http://cran.no.r-project.org")
  • 20. Selecting Treatment Group
      # AdWords geotargets list
      download.file('https://goo.gl/qVgiYp', destfile = 'geoid.csv')
      regions <- read.csv('geoid.csv', stringsAsFactors = FALSE)
      # the file also allows kommune (municipality) level selection; here we select fylke (county) level
      norway <- regions[which(regions$Country.Code == 'NO' &
                              regions$Target.Type == 'County' &
                              regions$Status == 'Active'), ]
      # random 0/1 assignment per region; fixed seed for reproducibility
      set.seed(1)
      norway$isTreatment <- sample(c(0, 1), nrow(norway), replace = TRUE)
      write.csv(norway, file = 'norway.csv')
      # treatment regions, one per line, to paste into AdWords
      writeLines(norway[which(norway$isTreatment == 1), ]$Canonical.Name, 'treatment.csv')
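Note that `sample(c(0, 1), nrow(norway), replace = TRUE)` draws each region independently, so the size of the treatment group is itself random. If you want a fixed number of treated regions (the evaluation slides use 10), one option is to draw region indices without replacement. A sketch, using a hypothetical data frame of 19 region names in place of `norway`:

```r
# Hypothetical stand-in for the `norway` data frame: 19 made-up region names
regions <- data.frame(name = paste("Region", 1:19), stringsAsFactors = FALSE)
n_treated <- 10  # assumed target size of the treatment group

set.seed(1)
# Start with everyone in the control group, then pick exactly n_treated regions
regions$isTreatment <- 0L
regions$isTreatment[sample(nrow(regions), n_treated)] <- 1L

print(table(regions$isTreatment))
```

The group sizes are now fixed by design; only which regions end up treated is left to chance.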
  • 27. Wait . . . for the results
  • 28. Google Analytics Core Reporting API & R
    1. Create an “app”
       • Google Developers page
       • Enable Google Analytics API
       • Create credentials: OAuth client ID, application type: Other
       • Result: client ID and client secret
    2. Find your GA profile ID
  • 29. Setting up GNU R
      library(RGoogleAnalytics)
      client.id <- 'xxxxxxxxxxxxxxx.apps.googleusercontent.com'
      client.secret <- 'xxxxxxxxxxxxxxx'
      analyticsProfileId <- '111111111'
      # opens a Google sign-in page; paste the returned code into the R console
      token <- Auth(client.id, client.secret)
      # save the token for later sessions
      save(token, file = 'gatoken.txt')
      # next time: load() restores the object `token` itself (it only returns the object's name)
      load('./gatoken.txt')
      ValidateToken(token)
  • 30. Create a query
      query.list <- Init(start.date = "2015-10-01",
                         end.date = "2016-02-29",
                         dimensions = "ga:region,ga:date,ga:medium",
                         metrics = "ga:sessions,ga:transactionRevenue",
                         filters = "ga:country==Norway",
                         max.results = 50000,
                         sort = "-ga:date,ga:region",
                         table.id = paste0("ga:", analyticsProfileId))
      ga.query <- QueryBuilder(query.list)
      ga.data <- GetReportData(ga.query, token)
  • 31. Real Data Example - www.flexponsive.net
      kable(head(ga.data))

      | region               | date     | medium   | country   | sessions |
      |----------------------|----------|----------|-----------|----------|
      | Brussels             | 20160229 | referral | Belgium   | 1        |
      | State of Parana      | 20160229 | referral | Brazil    | 1        |
      | Baden-Wurttemberg    | 20160229 | organic  | Germany   | 1        |
      | Baden-Wurttemberg    | 20160229 | referral | Germany   | 1        |
      | Rhineland-Palatinate | 20160229 | referral | Germany   | 1        |
      | (not set)            | 20160229 | (none)   | Hong Kong | 5        |
  • 33. Tip 2: Dimensions & Metrics Explorer
  • 34. Tip 3: Avoiding sampling
      > ga.data <- GetReportData(ga.query, token)
      Status of Query:
      The API returned 1393 results
      The query response contains sampled data. It is based on XX.XX % of your visits.
      You can split the query day-wise in order to reduce the effect of sampling.
      Set split_daywise = T in the GetReportData function
      Note that split_daywise = T will automatically ....
    • “Sampling occurs automatically when more than 500,000 sessions (25M for Premium) are collected for a report, allowing Google Analytics to generate reports more quickly for those large data sets.”
  • 35. Data Integration
    • Wide format: one row for each region and time
    • Long format: one row per region / time / dimension (EAV)
      # reshape() is part of base R (stats); reshape2 is only needed for melt()/dcast()
      w <- reshape(ga.data, timevar = 'medium', idvar = c('region', 'date'), direction = 'wide')
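To see what `direction = 'wide'` produces, here is a small sketch on a hypothetical long-format extract shaped like `ga.data` (the region names, date and session counts are made up):

```r
# Hypothetical long-format extract: one row per region / date / medium
ga.long <- data.frame(
  region   = c("Hordaland", "Hordaland", "Sor-Trondelag", "Sor-Trondelag"),
  date     = c("20150801", "20150801", "20150801", "20150801"),
  medium   = c("cpc", "organic", "cpc", "organic"),
  sessions = c(12, 30, 0, 41),
  stringsAsFactors = FALSE
)

# One row per region/date; each metric is spread into one column per medium
w <- reshape(ga.long, timevar = "medium", idvar = c("region", "date"),
             direction = "wide")
print(w)
```

The metric columns are named `<metric>.<medium>` (here `sessions.cpc` and `sessions.organic`); a region/date without a row for some medium gets `NA` in that column.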
  • 36. Data Integration: Almost finished
    • Merge: who is in which group?
      ds <- merge(w, norway[, c('Name', 'isTreatment')], by.x = 'region', by.y = 'Name', all.x = TRUE)
    • The data set is ready!
    • R offers a comfortable DSL for data manipulation
    • Use packages to minimize code
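Because `all.x = TRUE` keeps every GA row, a region whose name in Google Analytics does not match any `Name` in the AdWords list silently ends up with `isTreatment = NA`. A quick check after the merge catches such mismatches. A sketch with hypothetical data frames standing in for `w` and `norway`:

```r
# Hypothetical stand-ins for the wide GA data and the group assignment
w <- data.frame(region = c("Hordaland", "Sor-Trondelag", "Oslo"),
                sessions.cpc = c(12, 0, 50), stringsAsFactors = FALSE)
groups <- data.frame(Name = c("Hordaland", "Sor-Trondelag"),
                     isTreatment = c(0, 1), stringsAsFactors = FALSE)

# Left join: keep every GA row, attach the group where the names match
ds <- merge(w, groups, by.x = "region", by.y = "Name", all.x = TRUE)

# GA regions without a matching AdWords name end up with NA
unmatched <- ds$region[is.na(ds$isTreatment)]
print(unmatched)
```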
  • 38. Case Study: Wanderlust
    • an app “developed” for this presentation
    • mysterious weekend getaway and short holidays booking engine
    • supports inventory management of hotels and airlines
    • seasonal demand fluctuations
  • 39. Evaluation
    • Simulated data for illustration: 3 summer months
    • 1st August: experiment starts in 10 random provinces (fylke); AdWords stopped there
    • 1st August: start of school; search volume falls everywhere by 50%
    • Scenario: 100% of visitors click the organic result when the AdWord is not shown
    • Randomization has decided:
      • Sor-Trondelag (Trondheim): in the treatment group, no AdWords from 1st August
      • Hordaland (Bergen): in the control group, AdWords continue
  • 40. Revenues in Sor-Trondelag (treatment)
    [Figure: transactionRevenue.total by date, June to August]
  • 41. Revenues in Hordaland (control)
    [Figure: transactionRevenue.total by date, June to August]
  • 42. Revenues in both Fylke
    [Figure: transactionRevenue.total by date, June to August, by region (Hordaland, Sor-Trondelag)]
  • 43. ROI Calculation - standard regression
      require(stargazer)
      out <- lm(transactionRevenue.total ~ isTreatment.cpc, data = sd.w)
      stargazer(out, header = FALSE, type = 'latex')

      Table 2
      Dependent variable: transactionRevenue.total
      isTreatment.cpc    −48.358*** (1.350)
      Constant           111.350*** (0.569)
      Observations       1,748
      R²                 0.424
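  • 44. ROI Calculation - standard regression
    • A standard OLS regression on a single binary variable is equivalent to comparing means
    • But not the right means. In this case: $\text{Revenues} = \beta_0 + \beta_1 \cdot \text{treatment}$
    • treatment takes the value 1 for the treatment group after the AdWords were stopped (e.g. in Sor-Trondelag), otherwise 0
    • As a result, $\beta_1$ is the difference between average revenues in the treated regions (e.g. Sor-Trondelag) in August and average revenues on all other region-days: the treated regions in June and July plus the control regions (e.g. Hordaland) over the whole period
    • Since search volume fell everywhere in August, this difference mixes the seasonal drop with the effect of stopping AdWords
    • That's clearly not what we are looking for!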
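The equivalence in the first bullet is easy to verify. A minimal sketch in R with simulated (hypothetical) daily revenues:

```r
set.seed(42)
# Hypothetical daily revenues; treat = 1 marks treated-region days after the stop
treat   <- rep(c(0, 1), times = c(60, 30))
revenue <- rnorm(length(treat), mean = 100, sd = 10)

fit <- lm(revenue ~ treat)

# Slope on a single binary regressor = difference between the two group means
slope     <- unname(coef(fit)["treat"])
mean_diff <- mean(revenue[treat == 1]) - mean(revenue[treat == 0])
print(all.equal(slope, mean_diff))

# Intercept = mean of the treat == 0 group
print(all.equal(unname(coef(fit)["(Intercept)"]), mean(revenue[treat == 0])))
```

So the regression is only as good as the two groups it compares; which days land in the `treat == 0` group is what decides whether the estimate means anything.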
  • 46. ROI Calculation - Differences in Differences
      require(plm)
      out <- plm(transactionRevenue.total ~ isTreatment.cpc, data = sd.w,
                 index = c("region", "date"), model = "between")
      stargazer(out, header = FALSE, type = 'latex')

      Table 3
      Dependent variable: transactionRevenue.total
      isTreatment.cpc    0.189 (0.254)
      Constant           102.741*** (0.062)
      Observations       19
      R²                 0.032
  • 47. ROI Calculation - Difference in Differences
    • A difference-in-differences estimator, estimated as a fixed effects model with binary variables, lets us calculate the true effect of the treatment
    • Econometrically, we estimate this equation:
      $\text{Revenues} = \beta_0 + \beta_1 \cdot \text{treatment} + \beta_2 \cdot \text{before} + \gamma \cdot \text{fylke}$
    • fylke is a matrix of binary variables, one for each district
    • before is a binary variable that takes the value 0 in the period in which AdWords were running in all districts and 1 in the period after the experiment started in some regions
    • treatment takes the value 1 for the treatment group in the period after the experiment started, i.e. after the AdWords were stopped (e.g. in Sor-Trondelag), otherwise 0
    • The estimate of $\beta_1$ reveals the true impact of AdWords on revenues in this data set
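The equation can also be estimated directly with `lm()` and region dummies. The sketch below simulates a panel loosely modelled on the case study and compares the naive regression of slide 44 with the difference-in-differences specification. All set-up values are assumptions: 19 hypothetical regions, 1 June to 31 August 2015, 10 treated regions, a true AdWords effect of zero, and the 50% August drop modelled as a fixed drop of 50 revenue units so that treated and control regions follow exactly parallel trends.

```r
set.seed(1)
# Assumed set-up: 19 hypothetical regions x 92 days, 10 treated regions,
# AdWords stopped from 1 August, common drop of 50 units, true effect = 0
regions <- paste0("fylke_", 1:19)
dates   <- seq(as.Date("2015-06-01"), as.Date("2015-08-31"), by = "day")
panel   <- data.frame(region = rep(regions, times = length(dates)),
                      date   = rep(dates, each = length(regions)),
                      stringsAsFactors = FALSE)

treated_regions <- sample(regions, 10)
# "post" plays the role of the slide's "before" dummy: 1 from 1 August, else 0
panel$post      <- as.integer(panel$date >= as.Date("2015-08-01"))
panel$treatment <- as.integer(panel$region %in% treated_regions & panel$post == 1)

# Region-specific level + common seasonal drop + noise; no treatment effect
level <- setNames(runif(length(regions), 100, 120), regions)
panel$revenue <- level[panel$region] - 50 * panel$post + rnorm(nrow(panel), sd = 3)

# Naive OLS: treated days after 1 August vs. all other region-days
naive <- lm(revenue ~ treatment, data = panel)
# DiD: treatment (beta_1) + period dummy (beta_2) + region dummies (gamma)
did   <- lm(revenue ~ treatment + post + factor(region), data = panel)

round(c(naive = unname(coef(naive)["treatment"]),
        did   = unname(coef(did)["treatment"])), 2)
```

The naive coefficient comes out strongly negative because it picks up the August drop that hits every region, while the difference-in-differences coefficient stays close to the true value of zero.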
  • 48. Discussion
    • The missing counterfactual: we do not know what else could be happening. Help: an experiment
    • Challenge: Big Data without Big Code. Google Analytics & GNU R: a very rich toolbox
    • Result: difference in differences can work, but note its assumptions
  • 49. Table of Contents
    • Intro
    • Brand Keywords
    • The eBay Study
    • Calculating the true ROI
    • Brand keywords with R
    • Configuring Experiment
    • Using Google Analytics API
    • AdWords experiment: an example
    • Regression Results