Learning to Personalize

1
Learning to Personalize
Justin Basilico
Page Algorithms Engineering
September 19, 2014
@JustinBasilico
ATL 2014

2
• Interested in high-quality recommendations
• Proxy question:
  • Accuracy in predicted rating
  • Measured by root mean squared error (RMSE)
  • Improve by 10% = $1 million!
• Data size:
  • 100M ratings (back then “almost massive”)
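
As a minimal sketch of the proxy metric on this slide, the snippet below computes RMSE and the relative improvement between two sets of predictions. The ratings and the helper name `rmse` are hypothetical, chosen only for illustration.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical ratings on a 1-5 star scale (not real data).
actual = [4.0, 3.0, 5.0, 1.0]
baseline = [3.5, 3.5, 3.5, 3.5]   # e.g. always predict one constant value
improved = [3.8, 3.1, 4.4, 1.9]   # e.g. output of a better model

baseline_rmse = rmse(baseline, actual)
improved_rmse = rmse(improved, actual)
# Relative improvement, the quantity the 10% target refers to.
relative_gain = (baseline_rmse - improved_rmse) / baseline_rmse
print(f"baseline={baseline_rmse:.3f} improved={improved_rmse:.3f} gain={relative_gain:.1%}")
```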

4
Netflix Scale
• > 50M members
• > 40 countries
• > 1000 device types
• Hours: > 1B/month
• Plays: > 30M/day
• Log 100B events/day
• 34.2% of peak US downstream traffic

5
Goal
Help members find content to watch and enjoy
to maximize member satisfaction and retention

6
“Emmy Winning”
Approach to Recommendation

7
Everything is a Recommendation
Rows
Ranking
Over 75% of what people watch comes from our recommendations
Recommendations are driven by Machine Learning

8
Top Picks
Personalization awareness
Diversity

9
Personalized genres
• Genres focused on user interest
  • Derived from tag combinations
  • Provide context and evidence
• How are they generated?
  • Implicit: Based on recent plays, ratings & other interactions
  • Explicit: Taste preferences

10
Similarity
• Recommend videos similar to one you’ve liked
• “Because you watched” rows
• Pivots
• Video information page
• In response to user actions (search, list add, …)

11
Support for Recommendations
Behavioral Support
Social Support

13
Machine Learning Approach
Problem
Data
Metrics
Model Algorithm

14
Data
• Plays
  • Duration, bookmark, time, device, …
• Ratings
• Metadata
  • Tags, synopsis, cast, …
• Impressions
• Interactions
  • Search, list add, scroll, …
• Social

15
Models & Algorithms
• Regression (Linear, logistic, elastic net)
• SVD and other Matrix Factorizations
• Factorization Machines
• Restricted Boltzmann Machines
• Deep Neural Networks
• Markov Models and Graph Algorithms
• Clustering
• Latent Dirichlet Allocation
• Gradient Boosted Decision Trees/Random Forests
• Gaussian Processes
• …

16
Rating Prediction
• Based on first year progress prize
• Top 2 algorithms
  • Matrix Factorization (SVD++)
  • Restricted Boltzmann Machines (RBM)
• Ensemble: Linear blend
[Figure: R (Users × Videos, 99% sparse) ≈ U (Users × d) × V (d × Videos)]
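
The factorization R ≈ U × V on this slide can be illustrated with a small sketch. This is plain regularized matrix factorization trained by stochastic gradient descent on observed entries only; it is not SVD++ or an RBM, and the ratings, hyperparameters, and function names are assumptions made for the example.

```python
import math
import random

def predict(U, V, u, v):
    """Predicted rating: dot product of user factors and video factors."""
    return sum(a * b for a, b in zip(U[u], V[v]))

def factorize(ratings, n_users, n_videos, d=2, lr=0.05, reg=0.02, epochs=500, seed=0):
    """Fit R ≈ U V using only the observed (user, video, rating) triples.

    V is stored as one length-d vector per video, i.e. the transpose of
    the d × Videos matrix drawn on the slide.
    """
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(n_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(n_videos)]
    for _ in range(epochs):
        for u, v, r in ratings:
            err = r - predict(U, V, u, v)
            for k in range(d):
                uk, vk = U[u][k], V[v][k]
                # Gradient step on squared error with L2 regularization.
                U[u][k] += lr * (err * vk - reg * uk)
                V[v][k] += lr * (err * uk - reg * vk)
    return U, V

# Hypothetical sparse ratings: (user index, video index, stars).
observed = [(0, 0, 5.0), (0, 1, 4.0), (1, 0, 4.0),
            (1, 2, 2.0), (2, 1, 1.0), (2, 2, 5.0)]
U, V = factorize(observed, n_users=3, n_videos=3)

train_rmse = math.sqrt(
    sum((r - predict(U, V, u, v)) ** 2 for u, v, r in observed) / len(observed)
)
print(f"training RMSE: {train_rmse:.3f}")
# An unobserved cell can now be filled in from the learned factors.
print(f"predicted rating for user 0, video 2: {predict(U, V, 0, 2):.2f}")
```

Only the observed cells contribute to the loss, which is what makes the approach workable when the rating matrix is 99% sparse.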

17
Ranking by ratings
4.7 4.6 4.5 4.5 4.5 4.5 4.5 4.5 4.5 4.5
Niche titles
High average ratings… by those who would watch it

19
Learning to Rank
• Approaches:
  • Point-wise: Loss over items (Classification, ordinal regression, MF, …)
  • Pair-wise: Loss over preferences (RankSVM, RankNet, BPR, …)
  • List-wise: (Smoothed) loss over ranking (LambdaMART, DirectRank, GAPfm, …)
• Ranking quality measures:
  • Recall@N, Precision@N, NDCG, MRR, ERR, MAP, FCP, …
[Figure: Importance (0 to 1) vs. Rank (0 to 100) for NDCG, MRR, and FCP]
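
A few of the ranking quality measures named on this slide can be sketched as follows. The input is a list of relevance labels in ranked order; the function names are illustrative, and the DCG here uses a linear gain (some definitions use 2^rel − 1 instead).

```python
import math

def precision_at_n(relevances, n):
    """Fraction of the top-n positions holding a relevant item."""
    return sum(1 for rel in relevances[:n] if rel > 0) / n

def recall_at_n(relevances, n):
    """Fraction of all relevant items that appear in the top n."""
    total = sum(1 for rel in relevances if rel > 0)
    return sum(1 for rel in relevances[:n] if rel > 0) / total if total else 0.0

def dcg_at_n(relevances, n):
    """Discounted cumulative gain: relevance discounted by log2 of position."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:n], start=1))

def ndcg_at_n(relevances, n):
    """DCG normalized by the DCG of the ideal (sorted) ordering."""
    ideal = dcg_at_n(sorted(relevances, reverse=True), n)
    return dcg_at_n(relevances, n) / ideal if ideal > 0 else 0.0

def mean_reciprocal_rank(rankings):
    """Average over users of 1 / (position of the first relevant item)."""
    def reciprocal_rank(relevances):
        for rank, rel in enumerate(relevances, start=1):
            if rel > 0:
                return 1.0 / rank
        return 0.0
    return sum(reciprocal_rank(r) for r in rankings) / len(rankings)

# Hypothetical ranked list for one user: 1 = played, 0 = not played.
ranked = [0, 1, 0, 1]
print(precision_at_n(ranked, 2), recall_at_n(ranked, 2))
print(round(ndcg_at_n(ranked, 4), 4))
print(mean_reciprocal_rank([ranked, [1, 0, 0, 0]]))
```

All of these weight the top of the list more heavily than the bottom, differing mainly in how quickly importance decays with rank.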

20
Example: Two features, linear model
[Figure: Final Ranking of videos plotted by Popularity and Predicted Rating; tick labels 1 to 5]
Linear Model: f_rank(u,v) = w1 p(v) + w2 r(u,v)
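
The two-feature linear model can be sketched directly. The video names, feature values, and weights below are hypothetical; in practice the weights w1 and w2 would be learned from data rather than set by hand.

```python
def f_rank(popularity, predicted_rating, w1, w2):
    """Linear ranking score: w1 * p(v) + w2 * r(u, v)."""
    return w1 * popularity + w2 * predicted_rating

# Hypothetical candidates for one user: video -> (p(v), r(u, v)).
videos = {
    "A": (0.9, 3.2),  # very popular, mediocre predicted rating
    "B": (0.2, 4.7),  # niche title, high predicted rating
    "C": (0.6, 4.1),  # in between
}

def rank(w1, w2):
    """Order videos by descending score under the given weights."""
    return sorted(videos, key=lambda v: f_rank(*videos[v], w1, w2), reverse=True)

print(rank(0.0, 1.0))  # rating only
print(rank(1.0, 0.0))  # popularity only
print(rank(1.0, 0.5))  # a blend of both
```

Ranking by predicted rating alone puts the niche title first, while the blended weights trade it off against popularity.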

22
“Learning to Row”
Putting a page together

23
Page-level algorithmic challenge
10,000s of possible rows …
10-40 rows
Variable number of possible videos per row (up to thousands)
1 personalized page per device

24
Balancing a Personalized Page
Accurate vs. Diverse
Discovery vs. Continuation
Depth vs. Coverage
Freshness vs. Stability
Recommendations vs. Tasks

25
2D Navigational Modeling
More likely to see
Less likely

26
Building a page algorithmically
• Approaches
  • Template: Non-personalized layout
  • Row-independent: Greedy rank rows by f(r | u, c)
  • Stage-wise: Pick next rows by f(r | u, c, p1:n)
  • Page-wise: Total page fitness f(p | u, c)
• Obey constraints
  • Certain rows may be required (Continue Watching and My List)
  • Filter, de-duplicate
  • Format for device
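
To make the difference between the row-independent and stage-wise approaches concrete, here is a minimal sketch. The rows, their scores, and the genre-repeat penalty are all hypothetical; the user u and context c are assumed to be folded into the scoring functions.

```python
# Hypothetical candidate rows: name -> (relevance for this user, genre).
ROWS = {
    "Continue Watching": (0.50, "continuation"),
    "Action Thrillers":  (0.90, "action"),
    "Action Comedies":   (0.85, "action"),
    "Quirky Dramas":     (0.70, "drama"),
    "Documentaries":     (0.60, "documentary"),
}

def score(row):
    """Row-independent score f(r | u, c): ignores the rest of the page."""
    return ROWS[row][0]

def score_given_page(row, page, penalty=0.3):
    """Stage-wise score f(r | u, c, p1:n): penalize genres already on the page."""
    relevance, genre = ROWS[row]
    repeated = any(ROWS[chosen][1] == genre for chosen in page)
    return relevance - (penalty if repeated else 0.0)

def row_independent(candidates, n_rows):
    """Score every row once and keep the top n."""
    return sorted(candidates, key=score, reverse=True)[:n_rows]

def stage_wise(candidates, n_rows, required=()):
    """Greedily add the best next row given the rows already chosen."""
    page = list(required)  # required rows are placed first
    remaining = [r for r in candidates if r not in page]
    while remaining and len(page) < n_rows:
        best = max(remaining, key=lambda r: score_given_page(r, page))
        page.append(best)
        remaining.remove(best)
    return page

print(row_independent(list(ROWS), 3))
print(stage_wise(list(ROWS), 4, required=["Continue Watching"]))
```

The row-independent page stacks two action rows at the top, whereas the stage-wise page spreads across genres because each pick sees what is already on the page.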

27
Row Features
• Quality of items
• Features of items
• Quality of evidence
• User-row interactions
• Item/row metadata
• Recency
• Item-row affinity
• Row length
• Position on page
• Title
• Diversity
• Similarity
• Freshness
• …

28
Page-level Metrics
• How do you measure the quality of the homepage?
  • Ease of discovery, Diversity, Novelty, …
• Challenges:
  • Position effects
  • Row-video generalization
• 2D versions of ranking quality metrics
  • Example: Recall @ row-by-column
[Figure: Recall vs. Row (0 to 30)]
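
One possible reading of Recall @ row-by-column is the fraction of relevant videos that fall inside the top-left block of the page. The sketch below assumes that interpretation; the page layout and relevant set are hypothetical.

```python
def recall_at_row_by_column(page, relevant, n_rows, n_cols):
    """Share of relevant videos within the first n_rows rows and n_cols columns.

    `page` is a list of rows, each a list of video ids in display order.
    """
    relevant = set(relevant)
    if not relevant:
        return 0.0
    visible = {video for row in page[:n_rows] for video in row[:n_cols]}
    return len(visible & relevant) / len(relevant)

# Hypothetical 3x3 page and the videos the member actually played.
page = [
    ["a", "b", "c"],
    ["d", "e", "f"],
    ["g", "h", "i"],
]
played = {"b", "f", "g"}

for rows, cols in [(1, 2), (2, 3), (3, 3)]:
    print(rows, cols, round(recall_at_row_by_column(page, played, rows, cols), 3))
```

Sweeping the number of rows at a fixed column count gives a recall-by-row curve of the kind plotted on this slide.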

30
Three levels of Learning Distribution/Parallelization
1. For each subset of the population (e.g. region)
  • Want independently trained and tuned models
2. For each combination of hyperparameters
  • Simple: Grid search
  • Better: Bayesian optimization using Gaussian Processes
3. For each subset of the training data
  • Distribute over machines (e.g. ADMM)
  • Multi-core parallelism (e.g. HogWild)
  • Or… use GPUs
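
The first two levels can be sketched as an outer loop over population subsets with an independent grid search inside each. The `train_and_evaluate` function is a hypothetical stand-in that returns a made-up validation loss; a thread pool stands in for whatever cluster scheduler would run the trials in a real setup.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor

def train_and_evaluate(region, learning_rate, regularization):
    """Hypothetical stand-in: pretend to train a model, return validation loss."""
    # Assumed per-region optimum, purely so the toy loss has a minimum.
    target_lr = {"us": 0.1, "eu": 0.01}[region]
    return (learning_rate - target_lr) ** 2 + (regularization - 0.1) ** 2

REGIONS = ["us", "eu"]
GRID = {
    "learning_rate": [0.001, 0.01, 0.1],
    "regularization": [0.01, 0.1, 1.0],
}

def grid_search(region):
    """Level 2: evaluate every hyperparameter combination for one region."""
    combos = [dict(zip(GRID, values)) for values in itertools.product(*GRID.values())]
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Each trial is independent, so trials can run in parallel.
        losses = list(pool.map(lambda p: train_and_evaluate(region, **p), combos))
    best_loss, best_params = min(zip(losses, combos), key=lambda pair: pair[0])
    return best_params

# Level 1: each region gets its own independently tuned model.
best = {region: grid_search(region) for region in REGIONS}
print(best)
```

Grid search is the simple option from the slide; a Bayesian optimizer would replace the exhaustive loop with a model that proposes the next combination to try.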

31
Example: Training Neural Networks
• Level 1: Machines in different AWS regions
• Level 2: Machines in same AWS region
  • Spearmint or MOE for parameter optimization
  • Condor, StarCluster, Mesos, etc. for coordination
• Level 3: Highly optimized, parallel CUDA code on GPUs

33
Evolution of Recommendation Approach
[Figure: three stages labeled Rating (shown as “4.7”), Ranking, and Page Generation]

34
Research Directions
Context awareness
Full-page optimization
Presentation effects
Scaling
Personalized learning to rank
Cold start

35
Thank You
Justin Basilico
jbasilico@netflix.com
@JustinBasilico
We’re hiring
