Anne-Marie Tousch
Senior Research Scientist, Criteo AI Lab
July 2nd, 2019
Large-scale recommendation
a random point of view
@amy8492
2 •
Who am I?
4 •
Retargeting
User browses for products
Criteo buys the ad placement
Client pays for clicks
8 •
Large-scale recommendation?
9 •
Large-scale recommendation: cross-advertiser case
10 •
Large scale recommendation in a nutshell
11 •
Classical Recommendation Systems
17 •
Randomized SVD
Halko et al. "Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions." SIAM Review, 2011.
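As a rough sketch of the two-stage scheme described by Halko et al., a rank-$k$ approximation of an $m \times n$ matrix $A$ can be written as follows. The symbols $A$, $\Omega$, $k$ and the oversampling parameter $p$ are notation assumed here for illustration; they do not come from the slides.

```latex
\begin{align*}
\Omega &\in \mathbb{R}^{n \times (k+p)}, \quad \Omega_{ij} \sim \mathcal{N}(0, 1)
  && \text{(random test matrix)} \\
Y &= A \Omega
  && \text{(sample the range of } A \text{)} \\
Y &= Q R
  && \text{(orthonormal basis } Q \in \mathbb{R}^{m \times (k+p)} \text{)} \\
B &= Q^{\top} A
  && \text{(small } (k+p) \times n \text{ matrix)} \\
B &= \tilde{U} \Sigma V^{\top}
  && \text{(exact SVD of the small matrix)} \\
A &\approx (Q \tilde{U}) \, \Sigma V^{\top}
\end{align*}
```

Only the products $A \Omega$ and $Q^{\top} A$ touch the full matrix, which is what makes the approach attractive when both dimensions are large.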
22 •
Did you mean…
The origin
24 •
The Johnson-Lindenstrauss Lemma (1984)
25 •
The Johnson-Lindenstrauss Lemma (1984)
Target dimension: $O(\varepsilon^{-2} \log n)$
26 • Source: https://scikit-learn.org/stable/auto_examples/plot_johnson_lindenstrauss_bound.html
Johnson-Lindenstrauss
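To make the bound concrete, here is a minimal sketch that evaluates one commonly quoted sufficient condition, $k \ge 4 \ln(n) / (\varepsilon^2/2 - \varepsilon^3/3)$, which is the form discussed in the scikit-learn example cited on the slide. The function name `jl_min_dim` and the choice to round up are assumptions made for this illustration.

```python
import math


def jl_min_dim(n_samples, eps):
    """Smallest integer k with k >= 4 ln(n) / (eps^2 / 2 - eps^3 / 3)."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    denominator = eps ** 2 / 2 - eps ** 3 / 3
    return math.ceil(4 * math.log(n_samples) / denominator)


# The bound depends on the number of points and the distortion only,
# never on the original dimension d.
for eps in (0.5, 0.2, 0.1):
    print(eps, jl_min_dim(1_000_000, eps))
```

Note how quickly the required dimension grows as the tolerated distortion shrinks: the dependence on `eps` is quadratic, while the dependence on the number of points is only logarithmic.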
27 •
JL embeddings
Source: https://scikit-learn.org/stable/auto_examples/plot_johnson_lindenstrauss_bound.html
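A JL embedding can be sketched with a dense Gaussian matrix, assuming entries drawn i.i.d. from $N(0, 1/k)$. The sizes `d`, `k`, `n` and the helper names below are hypothetical choices for a small, pure-Python illustration, not values from the talk.

```python
import math
import random


def gaussian_projection(d, k, rng):
    """k x d matrix with i.i.d. N(0, 1/k) entries."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(matrix, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


rng = random.Random(0)
d, k, n = 300, 200, 10  # illustrative sizes only
points = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
P = gaussian_projection(d, k, rng)
embedded = [project(P, x) for x in points]

# Ratio of embedded to original squared distance for every pair of points.
ratios = [
    sq_dist(embedded[i], embedded[j]) / sq_dist(points[i], points[j])
    for i in range(n)
    for j in range(i + 1, n)
]
print(f"squared-distance ratios: min={min(ratios):.2f}, max={max(ratios):.2f}")
```

The ratios concentrate around 1, and the spread narrows as `k` grows.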
29 •
Many applications
• Dimensionality reduction
• Sketching
• Approximate nearest neighbors
• Random projection trees
• Kernel approximations
• Newton sketches for optimization
• Linear programming
• …
Googling for sketching was the best idea :-)
Fast JL transforms
31 •
Sparse random projections
…
Dasgupta et al. "A sparse Johnson-Lindenstrauss transform." STOC, 2010.
Achlioptas. "Database-friendly random projections: Johnson-Lindenstrauss with binary coins." JCSS, 2003.
Li et al. "Very Sparse Random Projections." KDD, 2006.
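A minimal sketch of a sparse random projection in the style of Achlioptas: each entry is $\sqrt{s/k}$ times $+1$, $0$ or $-1$ with probabilities $1/(2s)$, $1 - 1/s$, $1/(2s)$. With $s = 3$, two thirds of the entries are zero; Li et al. push $s$ much higher. The function names and sizes are assumptions for illustration, and the matrix is stored densely here only to keep the example short.

```python
import math
import random


def sparse_projection(d, k, rng, s=3):
    """k x d matrix with entries sqrt(s/k) * {+1, 0, -1}."""
    scale = math.sqrt(s / k)
    rows = []
    for _ in range(k):
        row = []
        for _ in range(d):
            u = rng.random()
            if u < 1 / (2 * s):
                row.append(scale)
            elif u < 1 / s:
                row.append(-scale)
            else:
                row.append(0.0)
        rows.append(row)
    return rows


def project(matrix, x):
    # Zero entries contribute nothing; a real sparse format would never store them.
    return [sum(m * v for m, v in zip(row, x) if m != 0.0) for row in matrix]


rng = random.Random(1)
d, k = 128, 256  # illustrative sizes only
R = sparse_projection(d, k, rng)
x = [rng.gauss(0.0, 1.0) for _ in range(d)]
y = project(R, x)

norm_ratio = sum(v * v for v in y) / sum(v * v for v in x)
zero_fraction = sum(m == 0.0 for row in R for m in row) / (d * k)
print(f"norm ratio={norm_ratio:.2f}, zero fraction={zero_fraction:.2f}")
```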
32 •
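Hadamard matrices
Normalized Hadamard matrices:
$H_2 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$
$H_{2n} = \frac{1}{\sqrt{2}} \begin{pmatrix} H_n & H_n \\ H_n & -H_n \end{pmatrix}$
$HX$ = Fast Hadamard Transform of $X$.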
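The recursive block structure is what allows the product $HX$ to be computed in $O(n \log n)$ operations without ever forming $H$. The following is a minimal sketch of the normalized fast Walsh-Hadamard transform; the function name `fwht` is an assumption.

```python
import math


def fwht(x):
    """Normalized fast Walsh-Hadamard transform of a power-of-two-length list."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(x)
    h = 1
    while h < n:
        # One butterfly stage: combine blocks of size h into blocks of size 2h.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in out]


print(fwht([1.0, 0.0, 0.0, 0.0]))
```

Because the normalized Hadamard matrix is symmetric and orthogonal, applying `fwht` twice returns the original vector.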
33 •
The Fast-JLT transform
FJLT $= P \times H \times D$
Ailon and Chazelle. "Approximate nearest neighbors and the fast Johnson-Lindenstrauss transform." STOC, 2006.
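A minimal sketch of a transform of this shape, under stated assumptions: $D$ is a diagonal of random signs, $H$ is the normalized Hadamard matrix, and $P$ is taken to be plain coordinate sampling without replacement (the simplified variant, not the sparse Gaussian $P$ of the original paper). The names `make_fast_jl` and `fwht` are hypothetical.

```python
import math
import random


def fwht(x):
    """Normalized fast Walsh-Hadamard transform (length must be a power of two)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(x)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return [v / math.sqrt(n) for v in out]


def make_fast_jl(d, k, rng):
    """Return x -> sqrt(d/k) * P H D x, with the same P and D for every input."""
    signs = [rng.choice((-1.0, 1.0)) for _ in range(d)]  # D: random sign flips
    kept = rng.sample(range(d), k)                       # P: k sampled coordinates
    scale = math.sqrt(d / k)

    def apply(x):
        mixed = fwht([s * v for s, v in zip(signs, x)])  # H D x spreads the mass
        return [scale * mixed[i] for i in kept]

    return apply


transform = make_fast_jl(d=64, k=16, rng=random.Random(0))
one_hot = [0.0] * 64
one_hot[5] = 1.0  # a maximally sparse input
y = transform(one_hot)
print(sum(v * v for v in y))
```

The one-hot input shows why the sign flip and Hadamard steps matter: after mixing, every coordinate carries the same energy, so sampling any 16 of them preserves the norm.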
34 •
Orthogonal random features
Orthogonalize: $G = QR$
Rescale: $G_{ORF} = \frac{1}{\sigma} S Q$, with $S \sim \mathrm{diag}(\chi_d)$
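The two steps can be sketched in pure Python for a small dimension, assuming a Gram-Schmidt pass stands in for a proper QR routine and that the chi-distributed diagonal is sampled as norms of Gaussian vectors. All function names here are hypothetical.

```python
import math
import random


def gram_schmidt(vectors):
    """Orthonormalize linearly independent vectors (modified Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            coeff = sum(a * b for a, b in zip(q, w))
            w = [a - coeff * b for a, b in zip(w, q)]
        norm = math.sqrt(sum(a * a for a in w))
        basis.append([a / norm for a in w])
    return basis


def orthogonal_random_features(d, sigma, rng):
    """Return the d x d matrix (1 / sigma) * S * Q as a list of rows."""
    # Columns of G are Gaussian vectors; orthonormalizing them gives the Q of G = QR.
    g_columns = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(d)]
    q_columns = gram_schmidt(g_columns)
    # S: each diagonal entry is chi-distributed with d degrees of freedom,
    # sampled as the norm of a d-dimensional standard Gaussian vector.
    s = [math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d))) for _ in range(d)]
    # Row i of S Q is s_i times row i of Q.
    return [[s[i] / sigma * q_columns[j][i] for j in range(d)] for i in range(d)]


W = orthogonal_random_features(d=8, sigma=1.0, rng=random.Random(2))
```

The rows of `W` stay mutually orthogonal, while their lengths follow the same distribution as the rows of an unstructured Gaussian matrix.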
35 •
Structured orthogonal random features
First used for practical spherical LSH
$G_{SORF} = \frac{\sqrt{d}}{\sigma} H D_1 H D_2 H D_3$
Andoni et al. "Practical and optimal LSH for angular distance." NeurIPS, 2015.
Choromanski et al. "The unreasonable effectiveness of structured random orthogonal embeddings." NeurIPS, 2017.
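A minimal sketch of the structured product, assuming each $D_i$ is a diagonal of independent random signs and $H$ is the normalized Hadamard matrix applied through a fast transform. The names `make_sorf` and `fwht` are hypothetical.

```python
import math
import random


def fwht(x):
    """Normalized fast Walsh-Hadamard transform (length must be a power of two)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(x)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return [v / math.sqrt(n) for v in out]


def make_sorf(d, sigma, rng):
    """Return x -> (sqrt(d) / sigma) * H D1 H D2 H D3 x."""
    diagonals = [[rng.choice((-1.0, 1.0)) for _ in range(d)] for _ in range(3)]

    def apply(x):
        out = list(x)
        # The rightmost factor acts first: D3, H, then D2, H, then D1, H.
        for signs in reversed(diagonals):
            out = fwht([s * v for s, v in zip(signs, out)])
        return [math.sqrt(d) / sigma * v for v in out]

    return apply


rng = random.Random(4)
sorf = make_sorf(d=16, sigma=2.0, rng=rng)
x = [rng.gauss(0.0, 1.0) for _ in range(16)]
gx = sorf(x)
norm_x = math.sqrt(sum(v * v for v in x))
norm_gx = math.sqrt(sum(v * v for v in gx))
```

Each $H D_i$ block is an exact rotation, so the product only rescales norms by $\sqrt{d} / \sigma$, and it needs $O(d)$ random bits and $O(d \log d)$ time instead of a dense $d \times d$ matrix.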
36 •
LSH: Locality Sensitive Hashing
Define a family of hash functions $F$ such that:
• Define a hash function $h$ from $h_1, \dots, h_k \in H^k$, e.g. $h(x) = \operatorname{sgn}(\langle a, x \rangle)$ with $a_i \sim N(0, 1)$
• Use $L$ hash tables
Indyk and Motwani. "Approximate nearest neighbors: towards removing the curse of dimensionality." STOC, 1998.
Charikar. "Similarity estimation techniques from rounding algorithms." STOC, 2002.
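The construction can be sketched as a small index that concatenates $k$ sign bits per table and keeps $L$ independent tables. The class name `SignLSH`, its methods and the toy vectors are assumptions made for this illustration.

```python
import random
from collections import defaultdict


class SignLSH:
    """L hash tables, each keyed by k bits sgn(<a, x>) with Gaussian a."""

    def __init__(self, dim, k, num_tables, seed=0):
        rng = random.Random(seed)
        # One set of k random hyperplanes per table.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]

    def _key(self, table_index, x):
        return tuple(
            sum(a * v for a, v in zip(plane, x)) >= 0.0
            for plane in self.planes[table_index]
        )

    def add(self, item_id, x):
        for t, table in enumerate(self.tables):
            table[self._key(t, x)].append(item_id)

    def candidates(self, x):
        """Ids that share a bucket with x in at least one table."""
        found = set()
        for t, table in enumerate(self.tables):
            found.update(table.get(self._key(t, x), []))
        return found


index = SignLSH(dim=4, k=6, num_tables=3)
index.add("a", [1.0, 0.5, -0.2, 0.3])
index.add("b", [-1.0, -0.5, 0.2, -0.3])  # points in the opposite direction
query = [2.0, 1.0, -0.4, 0.6]            # same direction as "a"
print(index.candidates(query))
```

Increasing `k` makes buckets more selective, while increasing the number of tables raises the chance that a true neighbor collides with the query at least once.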
37 •
Random Kitchen Sinks
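A minimal sketch of the random Fourier feature map behind Random Kitchen Sinks, assuming a Gaussian kernel $\exp(-\lVert x - y \rVert^2 / (2\sigma^2))$: draw frequencies from $N(0, \sigma^{-2} I)$ and phases uniformly in $[0, 2\pi]$. The function name, feature count and test vectors are hypothetical.

```python
import math
import random


def make_random_fourier_features(dim, num_features, sigma, rng):
    """Feature map z with z(x) . z(y) ~= exp(-||x - y||^2 / (2 sigma^2))."""
    weights = [
        [rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
        for _ in range(num_features)
    ]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def z(x):
        return [
            scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(weights, offsets)
        ]

    return z


rng = random.Random(3)
z = make_random_fourier_features(dim=5, num_features=2000, sigma=1.0, rng=rng)
x = [0.1, -0.3, 0.5, 0.0, 0.2]
y = [0.4, -0.1, 0.3, 0.2, -0.1]

approx = sum(a * b for a, b in zip(z(x), z(y)))
exact = math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / 2.0)
print(f"approx={approx:.3f}, exact={exact:.3f}")
```

A linear model trained on `z(x)` then behaves approximately like a kernel machine, at a cost that is linear in the number of samples.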
Large scale neural networks
39 •
A dense layer
Image from http://cs231n.github.io/neural-networks-1/
40 •
Adaptive Fastfood – Deep Fried Convnets
Source: Yang et al, ICCV 2015
Randomizing is good for you
“The only principle that does not inhibit
progress is: anything goes.”
Paul Feyerabend, Against Method
Next steps
Thanks.
Questions?
@amy8492

Editor's Notes

  1-3. Retargeting: the user browses an e-commerce website, then moves on to a publisher website. Criteo buys the ad placement. Criteo is paid if the ad is clicked.
  4. For each user and each client, compute offline recommendations with different algorithms. Combine all these "sources" with the user's last historical products, and you have a short list of products to score online, where the probability of a click can be estimated with logistic regression.
  5. Item/user nearest neighbors; collaborative filtering; neural networks.
  6. One technique is to embed products in a vector space, so that nearest neighbors between products can be computed.
  7. The classical way to compute vectors is to factorize the interaction matrix, usually through a singular value decomposition (SVD). This is called collaborative filtering.
  8. Both dimensions of the interaction matrix may grow without bound.
  9. Image: https://blogs.ethz.ch/kowalski/2008/09/25/buffons-needle/
  10. (1) Intuition: there exists a low-distortion transform into O(log(n)) dimensions, independent of d. (2) Get drawings and formulas from https://scikit-learn.org/stable/auto_examples/plot_johnson_lindenstrauss_bound.html (3-4 slides).
  11. The ideal setting is when n is large, and of course d > log(n)
  12. First idea: use a sparse projection matrix with ±1 entries. Successive improvements followed, but issues remain with sparse inputs. References: Achlioptas, "Database-friendly random projections: Johnson-Lindenstrauss with binary coins", JCSS, 2003; Li et al., "Very Sparse Random Projections", KDD, 2006; Dasgupta et al., "A sparse Johnson-Lindenstrauss transform", STOC, 2010.
  13. Fourier transform idea: by the uncertainty principle, the data and its spectrum cannot both be sparse, so work on the spectrum. Randomize the selection of Hadamard rows. Re-randomize to ensure non-sparsity. P can be a sparse Gaussian matrix as in the original PHD, but results have been improved since, and a simple coordinate sampling matrix is enough.
  14. Ailon and Chazelle, "Approximate nearest neighbors and the fast Johnson-Lindenstrauss transform", STOC, 2006. A step further is taken by Clarkson & Woodruff, "Low rank approximation and regression in input sparsity time", STOC, 2013, where they actually use a CountSketch matrix to sample the matrix. However, it no longer has the JL properties.
  15. These are not faster, but have better properties. The original JL proof used orthogonal features, which were dropped by Indyk et al. for LSH. They were recently shown to yield lower-variance kernel estimators with RFF (see the later section on RFF).
  16. While we are currently not aware how to prove rigorously that such pseudo-random rotations perform as well as the fully random ones, empirical evaluations show that three applications of HDi are exactly equivalent to applying a true random rotation (when d tends to infinity). We note that only two applications of HDi are not sufficient.
  17. A well-known application to the approximate nearest neighbors problem which you might find useful in real life.
  18. This is a very nice application to approximating kernels. https://www.youtube.com/watch?v=Qi1Yry33TQE
  19. More modern applications
  20. 3 more examples coming
  21. SGD everywhere.
  22. Bandit algorithm & how children learn
  23. The exploration-exploitation trade-off exists everywhere, e.g. in scientific research.