Copyright © 2015 Criteo
The Criteo Experience
Olivier Koch
Engineering Program Manager, Criteo
TektosData Meetup “Data Meets Business”
May 31, 2016
Outline
• What does Criteo do?
• Deep dive into our technical stack
• Delivery at scale
• A few lessons learned
Banners… what else?
Advertiser Publisher
Online advertising at scale
3B displays / day
40 PB of data
15,000 servers
worldwide
• Deep dive into Criteo
Bidding
• Should we bid?
• At which price?
Recommendation
• Which products should we display?
Look & Feel
• Big image vs small image
• Background color, ...
Prediction
• Generic prediction engine
• Specific models trained on TBs of data
We bid the expected value of the display for the client
• Because we sell performance, Criteo's and the client's interests are aligned, so the engine aims to maximize the value we generate for our clients
• Because the cost of a display is lower than the bid and independent of it (2nd price auction or floor), we should always bid the maximum value that the client is willing to pay for a display
[Figure: a display with Value = 1€ compared against CPMs of 0.60€, 0.70€ and 0.75€ (below the value) and 1.10€, 1.20€ and 1.30€ (above the value)]
This bidding strategy is optimal: we are sure to buy all profitable displays and only them
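The optimality claim can be checked with a small simulation. This is a minimal sketch, assuming a sealed-bid second-price auction in which the winner pays the highest competing price (or the floor). The function names are hypothetical, and the competing prices reuse the CPM values from the example.

```python
def auction_profit(value, bid, competing_price):
    """Profit from one second-price auction.

    We win only if our bid exceeds the competing price (highest other bid
    or floor), and we then pay that competing price, not our own bid.
    """
    if bid > competing_price:
        return value - competing_price
    return 0.0


def total_profit(value, bid, competing_prices):
    """Sum the profit over a series of independent auctions."""
    return sum(auction_profit(value, bid, p) for p in competing_prices)


# Illustrative numbers: a display worth 1.00 EUR to the client and six
# auctions whose competing prices match the CPM values in the example.
VALUE = 1.0
PRICES = [0.6, 0.7, 0.75, 1.1, 1.2, 1.3]

truthful = total_profit(VALUE, VALUE, PRICES)  # bid exactly the value
underbid = total_profit(VALUE, 0.72, PRICES)   # misses the 0.75 display
overbid = total_profit(VALUE, 1.25, PRICES)    # also buys the 1.1 and 1.2 displays
```

Bidding below the value loses a profitable display, and bidding above it buys displays that cost more than they are worth, so in this toy setting the truthful bid gives the highest total profit.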
Bid = CPC x pClick x pSale x AOV
This value depends on the predicted performance and the client’s objective
• Maximum share that the client is willing to pay
• Revenue that the display will generate for the client
• 2012 - Ensures constant value allocation between Criteo and its clients
• 2013 - CRO: “Conversion Rate Optimizer”
• 2014 - COS Optimizer
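As a worked instance, the sketch below reads the bid as "maximum share the client is willing to pay" times "revenue the display will generate for the client". The function names, the `max_share` parameter and all numbers are hypothetical; they are not Criteo's figures.

```python
def expected_revenue(p_click, p_sale, aov):
    """Expected client revenue from one display: pClick x pSale x AOV."""
    return p_click * p_sale * aov


def bid(max_share, p_click, p_sale, aov):
    """Bid = maximum share the client is willing to pay x expected revenue."""
    return max_share * expected_revenue(p_click, p_sale, aov)


# Hypothetical inputs: 0.5% click rate, 2% of clicks convert,
# 80 EUR average order value, client pays up to 10% of revenue.
example_bid = bid(max_share=0.10, p_click=0.005, p_sale=0.02, aov=80.0)
example_cpm = 1000 * example_bid  # the same bid expressed per thousand displays
```

Keeping the predicted quantities (pClick, pSale, AOV) separate from the client's objective (the share) means the same predictions can serve clients with different objectives.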
We train our prediction models on our historical displays
Variables
• Level of engagement of the user
• Quality of inventory
• User fatigue
• For travel: time to check-in and number of nights
Our ability to predict relies greatly on the relevance of the variables we consider
[Figure: historical displays fed to machine learning algorithms; markers show clicked displays and converted displays (marker size = order value)]
Recommend products for a user
• What we want: reco(user) = products
• 1B users x 3B products!
• But we need to scale and keep it fresh
We use collaborative filtering to select candidate products
• Historical: User X saw orange shoes
• Similar: products also seen by users who saw these same shoes
• Best-of: most viewed products on the client’s site
Candidate products for user X are drawn from these three sources
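The three candidate sources can be sketched with simple co-view counting, a basic form of item-based collaborative filtering. This is an illustration only, not Criteo's production algorithm. The function name, the `views` structure and the toy data are all assumptions.

```python
from collections import Counter


def candidate_products(user, views, top_n=3):
    """Return candidates from three sources: historical, similar, best-of.

    `views` maps each user id to the list of product ids they viewed.
    """
    seen = views.get(user, [])
    historical = list(dict.fromkeys(seen))  # de-duplicated, order kept

    # Similar: products co-viewed by other users who share a product with us.
    co_views = Counter()
    for other, products in views.items():
        if other != user and set(products) & set(seen):
            co_views.update(p for p in set(products) if p not in seen)
    similar = [p for p, _ in co_views.most_common(top_n)]

    # Best-of: most viewed products overall on the client's site.
    popularity = Counter(p for products in views.values() for p in products)
    best_of = [p for p, _ in popularity.most_common(top_n)]

    return {"historical": historical, "similar": similar, "best_of": best_of}


# Toy browsing log (hypothetical data).
VIEWS = {
    "user_x": ["orange_shoes"],
    "user_a": ["orange_shoes", "red_shoes", "socks"],
    "user_b": ["orange_shoes", "red_shoes"],
    "user_c": ["hat", "socks"],
}
candidates = candidate_products("user_x", VIEWS)
```

Counting co-views per product keeps the work proportional to the browsing log rather than to the full user x product matrix, which is what makes this kind of candidate selection tractable at scale.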
Products delivering the best performance are displayed
Variables
• Products seen by the user
• Time since product event
• Level of similarity
• Product features
Products are selected based on their pClick x pSale x AOV
[Figure: historical displays fed to machine learning algorithms; markers show clicked products and converted products (marker size = order value)]
We train our prediction models on our historical displays
Variables
Some of which we control:
• How user interacts with banner
• Organization of information
• Colorset
Some of which we don’t:
• Zone format
• Publisher
Look and feel will be selected based on its pClick x pSale x AOV
[Figure: historical displays (color = look & feel) fed to machine learning algorithms; markers show clicked displays and converted displays (marker size = order value); sample banner "My company: BUY! BUY! BUY!"]
Optimizing for sales amount
• Predict: 𝔼[SalesAmount] = ℙ(Click) · ℙ(Sale | Click) · 𝔼[SalesAmount | Sale]
  with ℙ(Click) logistic, ℙ(Sale | Click) logistic, 𝔼[SalesAmount | Sale] log-normal (all regularized!)
• Each model is trained independently & refreshed as often as possible
• Three sources of features: user, ad, page (mostly categorical).
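The factorization can be sketched as three small models whose outputs are multiplied. This is a minimal illustration, assuming linear scores over a shared feature vector. The weights, the `sigma` parameter and the function names are hypothetical, and regularization only matters at training time, so it does not appear here.

```python
import math


def sigmoid(z):
    """Logistic function, mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def dot(weights, features):
    """Linear score of a feature vector."""
    return sum(w * x for w, x in zip(weights, features))


def expected_sales_amount(features, w_click, w_sale, w_amount, sigma):
    """E[SalesAmount] = P(Click) * P(Sale | Click) * E[SalesAmount | Sale]."""
    p_click = sigmoid(dot(w_click, features))  # logistic model
    p_sale = sigmoid(dot(w_sale, features))    # logistic model
    # Log-normal model: log(amount) ~ Normal(mu, sigma^2),
    # whose mean is exp(mu + sigma^2 / 2).
    mu = dot(w_amount, features)
    mean_amount = math.exp(mu + sigma ** 2 / 2.0)
    return p_click * p_sale * mean_amount


# Hypothetical one-hot style features (bias, user segment, ad format).
FEATURES = [1.0, 0.0, 1.0]
prediction = expected_sales_amount(
    FEATURES,
    w_click=[-5.0, 0.4, 0.3],
    w_sale=[-4.0, 0.2, 0.1],
    w_amount=[4.0, 0.1, 0.2],
    sigma=0.5,
)
```

Because the three factors are separate models, each one can be retrained and refreshed on its own schedule.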
Learn on huge volumes of data
10,000 displays lead to 50 clicks, which lead to 1 sale
In-house Machine Learning library: IRMA
• We have our own large-scale distributed machine learning library on top of Hadoop, used for all models.
• From an ML perspective we rely, in most cases, on an L-BFGS solver initialized with SGD (see, e.g., A. Agarwal et al., A Reliable Effective Terascale Linear Learning System).
Learning duration: trading time and volume
Longer ⇒ Volume ↑ VS Shorter ⇒ Reactivity ↑
[Figure: sales amount (€) per day from 11/01/2014 to 20/02/2014, with Valentine’s Day eve annotated; precision as a function of learning duration, one curve per day from 12/02/2014 to 18/02/2014 plus one for all days]
Wait a minute: how do you handle TBs of training data?
• Each model is trained on several TB of data and contains millions of features
• We learn several hundred models, refreshed many times per day
• How about large-scale distributed machine learning?
Distribution of L-BFGS & SGD
• Hadoop AllReduce
• L-BFGS, being a batch algorithm, is easy to distribute (by distributing the computation of the gradient). SGD is more difficult to distribute; we use parameter averaging for it, which needs some tuning (learning rate, number of epochs, …). Within SGD, we use Hogwild! for multi-threading.
• ZooKeeper to ensure fault tolerance.
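The two distribution schemes can be contrasted on toy data. This is a single-process sketch: `all_reduce_sum` merely stands in for a real AllReduce, the shards play the role of workers, and every name and number is hypothetical. It shows only the gradient a batch solver such as L-BFGS would consume, not the solver itself.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logloss_gradient(weights, rows):
    """Sum of logistic-loss gradients over `rows` of (features, label)."""
    grad = [0.0] * len(weights)
    for features, label in rows:
        error = sigmoid(sum(w * x for w, x in zip(weights, features))) - label
        for i, x in enumerate(features):
            grad[i] += error * x
    return grad


def all_reduce_sum(vectors):
    """Stand-in for AllReduce: element-wise sum shared by every worker."""
    return [sum(column) for column in zip(*vectors)]


def sgd(weights, rows, learning_rate, epochs):
    """Plain single-worker SGD on one shard."""
    weights = list(weights)
    for _ in range(epochs):
        for row in rows:
            grad = logloss_gradient(weights, [row])
            weights = [w - learning_rate * g for w, g in zip(weights, grad)]
    return weights


def average_parameters(models):
    """Parameter averaging: element-wise mean of the per-worker weights."""
    return [sum(column) / len(models) for column in zip(*models)]


# Toy data split into two shards, one per hypothetical worker.
DATA = [([1.0, 0.0], 1), ([1.0, 1.0], 0), ([1.0, 2.0], 0), ([1.0, 0.5], 1)]
SHARDS = [DATA[:2], DATA[2:]]
W0 = [0.0, 0.0]

# Batch gradient: per-shard gradients summed equal the full-data gradient.
distributed_grad = all_reduce_sum([logloss_gradient(W0, s) for s in SHARDS])
single_node_grad = logloss_gradient(W0, DATA)

# SGD: each worker trains on its own shard, then the models are averaged.
averaged = average_parameters([sgd(W0, s, 0.1, 5) for s in SHARDS])
```

The summed gradient is exact, which is why the batch case distributes cleanly. The averaged SGD model is only an approximation, and its quality depends on the learning rate and number of epochs chosen.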
A word on advanced techniques
• IRMA is not only about vanilla logistic regression with L2 regularization; it contains more advanced techniques: transfer learning, factorization machines, learning to rank, …
• For example, we use cost-sensitive learning for bidding.
Offline & online evaluation
Two steps:
• Offline testing is fast, cheap, and efficient for wide exploration
• Online testing is expensive but has the final word
• The more data you have, the faster you can make decisions
Physical infrastructure
7 in-house data centers on 3 continents
~15,000 servers, largest Hadoop cluster in Europe
Big Data
More than 35 PB of storage
Traffic
800k HTTP requests / sec (peak activity)
29,000 impressions / sec (peak activity)
<10 ms to process a bidding request
<100 ms to process a reco request
Academic research @ Criteo
• Our 1st public dataset is online: http://bit.ly/1vgw2XC
• New 1TB dataset released last year
• Recent publications:
Offline evaluation of response prediction in online advertising auctions, O. Chapelle, WWW’15.
Sources of variability in large-scale machine learning systems, D. Lefortier, A. Truchet, and M. de Rijke, NIPS workshop on ML systems, 2015.
Cost-sensitive learning for bidding in online advertising auctions, F. Vasile and D. Lefortier, NIPS workshop on ML for e-commerce, 2015.
New areas of research
• Counterfactual evaluation (offline A/B tests)
• Product embeddings for recommendation
• Policy learning
• Delivery at scale
The early days of Criteo
Single C# repository
Build in 90 minutes
Weekly merges
What could go wrong?
Delivery at scale at Criteo
Trunk-based development (TBD)
Fast commits
Code reviews with Gerrit
The MOAB
Deploy with scp / bittorrent
Automatic metrics checks
=> 200+ happy engineers!
The Criteo MOAB
• A few lessons learned
Start small
• If you can't build it with a few machines, it's likely you won't be able to do it with many
First Google computer
Start small
• Keep fancy algorithms for later
The PageRank algorithm
Iterate fast
• Easy access to data (20PB vs 4GB of clean, carefully selected data)
• Convenient technologies (e.g. Python & notebooks, scikit-learn)
• Make IT a non-problem
• Keep projects small (typical project size 3-9 months)
Talent magnet
Keep teams small
3 members: 3 channels
4 members: 6 channels
5 members: 10 channels
10 members: 45 channels
…
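These figures match the number of distinct pairs in a team of n people, n(n-1)/2. A short sketch, with a hypothetical function name, reproduces them:

```python
def communication_channels(members):
    """Number of pairwise channels in a team: n * (n - 1) / 2."""
    return members * (members - 1) // 2


for size in (3, 4, 5, 10):
    print(size, "members:", communication_channels(size), "channels")
```

The count grows quadratically, so doubling a team from 5 to 10 members multiplies the channels by 4.5.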
Build the right team
• Variety of skills
  • Software/ML engineers, ops/devops
  • Analysts/BI
  • Product
  • Designers
  • Managers
Make the team agile
• Use a flat, distributed hierarchy model and make people sit next to each other
EPM
ENG LEAD
PM
MGR
Make the team agile
• Use the right tools
  • Slack
  • Jira
  • Confluence
  • Git
  • Gerrit
  • OKR
Build the culture
• Let ideas emerge bottom-up
• Hackathons (for real)
• 10% projects
• Transparency: make info available to all
• Use mature technologies
• You will fail. That’s OK!
Take-aways
• Start small
• Iterate fast
• Build the team
• Make the team agile
• Build the culture
• Thanks! Questions?

KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 

Criteo TektosData Meetup

  • 1. Copyright © 2015 Criteo The Criteo Experience Olivier Koch Engineering Program Manager, Criteo TektosData Meetup “Data Meets Business” May 31, 2016
  • 2. Outline • What does Criteo do? • Deep dive into our technical stack • Delivery at scale • A few lessons learned
  • 3. Banners… what else? Advertiser, Publisher
  • 4. Online advertising at scale • 3B displays / day • 40 PB of data • 15,000 servers worldwide
  • 5. Deep dive into Criteo
  • 6. Bidding • Should we bid? • At which price? Recommendation • Which products should we display? Look & Feel • Big image vs small image • Background color, ... Prediction • Generic prediction engine • Specific models trained on TBs of data
  • 7. Bidding • Should we bid? • At which price? Recommendation • Which products should we display? Look & Feel • Big image vs small image • Background color, ... Prediction • Generic prediction engine • Specific models trained on TBs of data
  • 8. We bid the expected value of the display for the client • As we sell performance, Criteo’s and the client’s interests are aligned, so the engine aims at maximizing the value we generate for our clients • As the cost of a display is lower than and independent of the bid (2nd-price auction or floor), we should always bid the maximum value that the client is willing to pay for a display • Example: Value = 1€; CPM = 0.6€, 0.7€, 0.75€, 1.1€, 1.2€, 1.3€ • This bidding strategy is optimal: we are sure to buy all profitable displays and only them
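
The optimality claim on this slide can be checked with a small simulation. The sketch below is illustrative only: it assumes each CPM figure on the slide is the clearing price of a separate second-price auction, and the function names are hypothetical, not Criteo's.

```python
def second_price_outcome(bid, competing_bids, floor=0.0):
    """Return (won, price) for a sealed-bid second-price auction.

    The winner pays the highest competing bid (or the floor), never its own bid.
    """
    price = max([floor] + list(competing_bids))
    won = bid > price
    return won, (price if won else 0.0)


def profit(bid, value, competing_bids, floor=0.0):
    """Value gained minus price paid; zero when the auction is lost."""
    won, price = second_price_outcome(bid, competing_bids, floor)
    return value - price if won else 0.0


# Figures taken from the slide: the display is worth 1 EUR to the client,
# and we assume one auction per listed price.
VALUE = 1.0
CLEARING_PRICES = [0.6, 0.7, 0.75, 1.1, 1.2, 1.3]


def total_profit(bid):
    """Total profit over all auctions when always submitting the same bid."""
    return sum(profit(bid, VALUE, [p]) for p in CLEARING_PRICES)


truthful = total_profit(1.0)   # buys only the displays priced below the value
underbid = total_profit(0.72)  # misses a profitable display (0.75)
overbid = total_profit(1.25)   # also buys unprofitable displays (1.1 and 1.2)

print(truthful, underbid, overbid)
```

Bidding below the value gives up profitable displays and bidding above it buys loss-making ones, which is why bidding the expected value is the best of the three strategies here.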
  • 9. Bid = CPC × pClick × pSale × AOV • 2012: ensures constant value allocation between Criteo and its clients • 2013: CRO (“Conversion Rate Optimizer”) • 2014: COS Optimizer • This value depends on the predicted performance and the client’s objective • Callouts: “Revenue that the display will generate for the client”; “Maximum share that the client is willing to pay”
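
As a rough sketch of how the two callouts on this slide combine, the bid can be read as "maximum share" times "expected revenue". This reading is an assumption: it treats the share as a fraction of revenue the client agrees to pay, and the function name and the 10% share and 100 € order value are hypothetical. The click and sale rates are the ones quoted later in the deck (50 clicks per 10 000 displays, 1 sale per 50 clicks).

```python
def expected_value_bid(max_share, p_click, p_sale_given_click, avg_order_value):
    """Bid for one display = share of revenue the client accepts to pay
    multiplied by the revenue the display is expected to generate."""
    expected_revenue = p_click * p_sale_given_click * avg_order_value
    return max_share * expected_revenue


# Hypothetical inputs: 10% share, 0.5% click rate, 2% sale rate, 100 EUR order.
bid_per_display = expected_value_bid(
    max_share=0.10,
    p_click=50 / 10_000,
    p_sale_given_click=1 / 50,
    avg_order_value=100.0,
)
bid_cpm = bid_per_display * 1000  # price per thousand displays

print(bid_per_display, bid_cpm)
```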
  • 10. We train our prediction models on our historical displays • Historical displays (legend: clicked displays; converted displays, size = order value) • Variables: level of engagement of the user; quality of inventory; user fatigue; for travel: time to check-in and number of nights • Our ability to predict relies greatly on the relevance of the variables we consider • Machine Learning Algorithms
  • 11. Bidding • Should we bid? • At which price? Recommendation • Which products should we display? Look & Feel • Big image vs small image • Background color, ... Prediction • Generic prediction engine • Specific models trained on TBs of data
  • 12. Recommend products for a user • What we want: reco(user) = products • 1B users x 3B products! • But we need to scale and keep it fresh
  • 13. We use collaborative filtering to select candidate products • Candidate products for user X are: • Historical: user X saw orange shoes • Similar: products also seen by users who saw these same shoes • Best-of: the most viewed products on the client’s site
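
The three candidate sources on this slide can be sketched with simple co-view counts. This is a toy illustration, not Criteo's recommender: the data, the function names and the plain co-occurrence counting are all assumptions.

```python
from collections import Counter, defaultdict


def build_co_views(histories):
    """For each product, count how often other products share a user history."""
    co_views = defaultdict(Counter)
    for products in histories.values():
        unique = set(products)
        for p in unique:
            for q in unique:
                if p != q:
                    co_views[p][q] += 1
    return co_views


def candidate_products(user, histories, co_views, top_k=1):
    """Return the historical, similar and best-of candidates for one user."""
    # Historical: what the user already saw (deduplicated, order preserved).
    historical = list(dict.fromkeys(histories.get(user, [])))

    # Similar: products co-viewed with the user's products by other users.
    similar_counts = Counter()
    for p in historical:
        similar_counts.update(co_views.get(p, {}))
    for p in historical:
        similar_counts.pop(p, None)
    similar = [p for p, _ in similar_counts.most_common(top_k)]

    # Best-of: most viewed products across all users of the site.
    views = Counter(p for products in histories.values() for p in products)
    best_of = [p for p, _ in views.most_common(top_k)]

    return {"historical": historical, "similar": similar, "best_of": best_of}


# Hypothetical browsing histories for one client site.
histories = {
    "x": ["orange_shoes"],
    "a": ["orange_shoes", "red_shoes"],
    "b": ["orange_shoes", "red_shoes", "socks"],
}
result = candidate_products("x", histories, build_co_views(histories))
print(result)
```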
  • 14. Products delivering the best performance are displayed • Variables: products seen by the user; time since product event; level of similarity; product features • Historical displays (legend: clicked products; converted products, size = order value) • Products are selected based on their pClick × pSale × AOV • Machine Learning Algorithms
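
Selecting products by pClick × pSale × AOV amounts to sorting candidates by expected revenue. The sketch below assumes the three predictions are already available for each candidate; the product names and numbers are made up for illustration.

```python
def expected_revenue(candidate):
    """Score used for ranking: pClick x pSale x AOV."""
    return candidate["p_click"] * candidate["p_sale"] * candidate["aov"]


def select_products(candidates, n):
    """Keep the n candidates with the highest expected revenue."""
    return sorted(candidates, key=expected_revenue, reverse=True)[:n]


# Hypothetical predictions for three candidate products.
candidates = [
    {"name": "shoes", "p_click": 0.01, "p_sale": 0.05, "aov": 80.0},
    {"name": "socks", "p_click": 0.02, "p_sale": 0.05, "aov": 10.0},
    {"name": "coat", "p_click": 0.005, "p_sale": 0.04, "aov": 300.0},
]
selected = [c["name"] for c in select_products(candidates, n=2)]
print(selected)
```

Note that the most clickable product (socks) is not selected: a low order value outweighs its higher click probability.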
  • 15. Bidding • Should we bid? • At which price? Recommendation • Which products should we display? Look & Feel • Big image vs small image • Background color, ... Prediction • Generic prediction engine • Specific models trained on TBs of data
  • 16. We train our prediction models on our historical displays • Historical displays (color = look & feel; legend: clicked displays; converted displays, size = order value) • Variables, some of which we control: how the user interacts with the banner; organization of information; colorset • Variables we don’t control: zone format; publisher • Look and feel will be selected based on its pClick × pSale × AOV • Machine Learning Algorithms
  • 17. Bidding • Should we bid? • At which price? Recommendation • Which products should we display? Look & Feel • Big image vs small image • Background color, ... Prediction • Generic prediction engine • Specific models trained on TBs of data
  • 18. Optimizing for sales amount • Predict: 𝔼[SalesAmount] = ℙ(Click) × ℙ(Sale | Click) × 𝔼[SalesAmount | Sale], modeled respectively as logistic, logistic and log-normal (all regularized!) • Each model is trained independently & refreshed as often as possible • Three sources of features: user, ad, page (mostly categorical)
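
The decomposition on this slide can be sketched as the product of three model outputs. The code below is a minimal illustration under stated assumptions: the two logistic models are reduced to a raw score passed through a sigmoid, the log-normal model is summarized by hypothetical parameters `mu` and `sigma`, and its mean uses the standard identity exp(mu + sigma²/2).

```python
import math


def sigmoid(z):
    """Logistic link: maps a real-valued score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def lognormal_mean(mu, sigma):
    """Mean of a log-normal variable whose logarithm is N(mu, sigma^2)."""
    return math.exp(mu + sigma ** 2 / 2.0)


def expected_sales_amount(click_score, sale_score, mu, sigma):
    """E[SalesAmount] = P(Click) * P(Sale | Click) * E[SalesAmount | Sale]."""
    p_click = sigmoid(click_score)
    p_sale_given_click = sigmoid(sale_score)
    return p_click * p_sale_given_click * lognormal_mean(mu, sigma)


# Hypothetical scores: roughly 0.5% click rate, 2% sale rate, ~100 EUR basket.
estimate = expected_sales_amount(
    click_score=math.log(0.005 / 0.995),
    sale_score=math.log(0.02 / 0.98),
    mu=math.log(100.0),
    sigma=0.0,
)
print(estimate)
```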
  • 19. Learn on huge volumes of data • 10 000 displays
  • 20. Learn on huge volumes of data • 10 000 displays lead to 50 clicks
  • 21. Learn on huge volumes of data • 10 000 displays lead to 50 clicks, which lead to 1 sale
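
The funnel on these three slides explains why so much data is needed: sales are rare events. A short calculation, using only the figures on the slide (the helper name is hypothetical):

```python
DISPLAYS, CLICKS, SALES = 10_000, 50, 1

click_rate = CLICKS / DISPLAYS        # share of displays that are clicked
sale_rate = SALES / CLICKS            # share of clicks that convert
sales_per_display = SALES / DISPLAYS  # share of displays that end in a sale


def displays_needed(target_sales):
    """Displays required to observe a given number of sales at these rates."""
    return target_sales * DISPLAYS // SALES


print(click_rate, sale_rate, sales_per_display)
print(displays_needed(1000))
```

At these rates, a model that needs a thousand observed sales to learn from requires ten million displays.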
  • 22. In-house Machine Learning library: IRMA • We have our own large-scale distributed machine learning library on top of Hadoop, used for all models • From an ML perspective we rely, in most cases, on an L-BFGS solver initialized with SGD (see, e.g., A. Agarwal et al., A Reliable Effective Terascale Linear Learning System)
  • 23. Learning duration: trading time and volume • Longer ⇒ Volume ↑ vs. Shorter ⇒ Reactivity ↑ • [Figure: sales amount (€) by date, 11/01/2014 to 20/02/2014, with Valentine’s Day eve marked] • [Figure: precision by learning duration, from 12/02/2014 to 18/02/2014, and “All”]
  • 24. Wait a minute: how do you handle TBs of training data? • Each model is trained on several TB of data and contains millions of features • We learn several hundred models, refreshed many times per day • How about large-scale distributed machine learning?
  • 25. Distribution of L-BFGS & SGD • Hadoop AllReduce • L-BFGS, being a batch algorithm, is easy to distribute (by distributing the computation of the gradient), while SGD is more difficult to distribute; for SGD we do parameter averaging, which needs some tweaking (learning rate, number of epochs, …). In SGD, we use Hogwild! to multi-thread • Zookeeper to ensure fault-tolerance
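
The two distribution strategies on this slide can be sketched on a single machine. The code below is a toy illustration, not IRMA: shards stand in for workers, `all_reduce_sum` is a plain Python stand-in for Hadoop AllReduce, only the gradient computation that a batch solver such as L-BFGS would consume is shown (not L-BFGS itself), and Hogwild! multi-threading is left out. All names and data are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def shard_gradient(w, shard):
    """Sum of logistic-loss gradients over one shard of (features, label)."""
    grad = [0.0] * len(w)
    for x, y in shard:
        err = predict(w, x) - y
        for j, xj in enumerate(x):
            grad[j] += err * xj
    return grad


def all_reduce_sum(vectors):
    """Stand-in for AllReduce: element-wise sum shared with every worker."""
    return [sum(column) for column in zip(*vectors)]


def sgd_epoch(w, shard, lr):
    """One sequential SGD pass over a single shard, starting from w."""
    w = list(w)
    for x, y in shard:
        err = predict(w, x) - y
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def average_parameters(models):
    """Parameter averaging: element-wise mean of the per-worker weights."""
    return [sum(column) / len(models) for column in zip(*models)]


# Toy data: bias term plus one feature; the label is 1 when the feature > 0.
random.seed(0)
data = []
for _ in range(200):
    feature = random.uniform(-1.0, 1.0)
    data.append(([1.0, feature], 1 if feature > 0 else 0))
shards = [data[i::4] for i in range(4)]  # four simulated workers

w0 = [0.0, 0.0]

# Batch case: per-shard gradients summed by AllReduce equal the full gradient.
distributed_grad = all_reduce_sum([shard_gradient(w0, s) for s in shards])
full_grad = shard_gradient(w0, data)

# SGD case: each worker trains locally, then the weights are averaged.
w_avg = average_parameters([sgd_epoch(w0, s, lr=0.1) for s in shards])

print(distributed_grad, full_grad, w_avg)
```

The batch gradient is an exact sum over shards, which is why it distributes cleanly; averaged SGD weights are only an approximation of a single sequential run, hence the tuning of learning rate and number of epochs mentioned on the slide.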
  • 26. A word on advanced techniques • IRMA is not only about vanilla logistic regression with L2 regularization; it contains more advanced techniques: transfer learning, factorization machines, learning to rank, … • For example, we use cost-sensitive learning for bidding
  • 27. Offline & online evaluation • Two steps: • Offline testing is fast, cheap, and efficient for wide exploration • Online testing is expensive but has the final word • The more data you have, the faster you can make decisions
  • 28. Physical infrastructure • 7 in-house data centers on 3 continents • ~15,000 servers, largest Hadoop cluster in Europe • More than 35 PB of storage • Big Data Traffic • 800k HTTP requests / sec (peak activity) • 29,000 impressions / sec (peak activity) • <10 ms to process a bidding request • <100 ms to process a reco request
  • 29. Academic research @ Criteo • Our 1st public dataset is online: http://bit.ly/1vgw2XC • New 1TB dataset released last year • Recent publications: Offline evaluation of response prediction in online advertising auctions, O. Chapelle, WWW’15; Sources of variability in large-scale machine learning systems, D. Lefortier, A. Truchet, and M. de Rijke, NIPS workshop on ML systems, 2015; Cost-sensitive learning for bidding in online advertising auctions, F. Vasile and D. Lefortier, NIPS workshop on ML for e-commerce, 2015
  • 30. New areas of research • Counterfactual evaluation (offline A/B tests) • Product embeddings for recommendation • Policy learning
  • 31. Delivery at scale
  • 32. The early days of Criteo • Single C# repository • Build in 90 minutes • Weekly merges
  • 33. What could go wrong?
  • 34. [Image-only slide]
  • 35. Delivery at scale at Criteo • Trunk-based development (TBD) • Fast commits • Code reviews with Gerrit • The MOAB • Deploy with scp / bittorrent • Automatic metrics checks => 200+ happy engineers!
  • 36. The Criteo MOAB
  • 37. Delivery at scale at Criteo
  • 38. A few lessons learned
  • 39. Start small • If you can't build it with a few machines, it's likely you won't be able to do it with many • [Image: first Google computer]
  • 40. Start small • Keep fancy algorithms for later • [Image: the PageRank algorithm]
  • 41. Iterate fast • Easy access to data (20PB vs 4GB of clean, carefully selected data) • Convenient technologies (e.g. Python & notebooks, scikit-learn) • Make IT a non-problem • Keep projects small (typical project size 3-9 months)
  • 42. Iterate fast • Easy access to data (20PB vs 4GB of clean, carefully selected data) • Convenient technologies (e.g. Python & notebooks, scikit-learn) • Make IT a non-problem • Keep projects small (typical project size 3-9 months) • Talent magnet
  • 43. Keep teams small • 3 members: 3 channels • 4 members: 6 channels • 5 members: 10 channels • 10 members: 45 channels • …
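
The member and channel counts on this slide follow the number of distinct pairs in a group, n(n − 1)/2. A short check (the function name is hypothetical):

```python
def communication_channels(members):
    """Number of distinct pairs of people in a team of the given size."""
    return members * (members - 1) // 2


for size in (3, 4, 5, 10):
    print(size, "members:", communication_channels(size), "channels")
```

The count grows quadratically, so doubling a team from 5 to 10 members more than quadruples the channels to maintain.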
  • 44. Build the right team • Variety of skills • Software/ML engineers, ops/devops • Analysts/BI • Product • Designers • Managers
  • 45. Make the team agile • Use a flat, distributed hierarchy model and make people sit next to each other • [Diagram labels: EPM, ENG LEAD, PM, MGR]
  • 46. Make the team agile • Use the right tools • Slack • Jira • Confluence • Git • Gerrit • OKR
  • 47. Build the culture • Let ideas emerge bottom-up • Hackathons (for real) • 10% projects • Transparency: make info available to all • Use mature technologies • You will fail. That’s OK!
  • 48. Take-aways • Start small • Iterate fast • Build the team • Make the team agile • Build the culture
  • 49. Thanks! Questions?