(Presented at the Deep Learning Re-Work SF Summit on 01/25/2018)
In this talk, we go through the traditional recommendation systems set-up, and show that deep learning approaches in that set-up don't bring a lot of extra value. We then focus on different ways to leverage these techniques, most of which relying on breaking away from that traditional set-up; through providing additional data to your recommendation algorithm, modeling different facets of user/item interactions, and most importantly re-framing the recommendation problem itself. In particular we show a few results obtained by casting the problem as a contextual sequence prediction task, and using it to model time (a very important dimension in most recommendation systems).
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
Deep Learning for Recommender Systems
1. Deep Learning for
Recommender Systems
Yves Raimond & Justin Basilico
January 25, 2017
Re·Work Deep Learning Summit San Francisco
@moustaki @JustinBasilico
2. The value of recommendations
● A few seconds to find something
great to watch…
● Can only show a few titles
● Enjoyment directly impacts
customer satisfaction
● Generates over $1B per year of
Netflix revenue
● How? Personalize everything
8. A quick & dirty experiment
● MovieLens-20M
○ Binarized ratings
○ Two weeks validation, two weeks test
● Comparing two models
○ ‘Standard’ MF, with hyperparameters:
■ L2 regularization
■ Rank
○ Feed-forward net, with hyperparameters:
■ L2 regularization (for all layers + embeddings)
■ Embeddings dimensionality
■ Number of hidden layers
■ Hidden layer dimensionalities
■ Activations
● After hyperparameter search for both models, what do we get?
9.
10.
11. What’s going on?
● Very similar models: representation learning
through embeddings, MSE loss, gradient-based
optimization
● Main difference is that we can learn a different
embedding combination than a dot product
● … but embeddings are arbitrary representations
● … and capturing pairwise interactions through a
feed-forward net requires a very large model
12. Conclusion?
● Not much benefit in the ‘traditional’
recommendation setup of a deep versus a
properly tuned model
● … Is this talk over?
13. Breaking the ‘traditional’ recsys setup
● Adding extra data / inputs
● Modeling different facets of users and items
● Alternative framings of the problem
15. Content-based side information
● VBPR: helping cold-start by augmenting item
factors with visual factors from CNNs [He et. al.,
2015]
● Content2Vec [Nedelec et. al., 2017]
● Learning to approximate MF item embeddings
from content [Dieleman, 2014]
16. Metadata-based side information
● Factorization Machines [Rendle, 2010] with
side-information
○ Extending the factorization framework to an arbitrary
number of inputs
● Meta-Prod2Vec [Vasile et. al., 2016]
○ Regularize item embeddings using side-information
● DCF [Li et al., 2016]
● Using associated textual information for
recommendations [Bansal et. al., 2016]
17. YouTube Recommendations
● Two stage ranker:
candidate generation
(shrinking set of items to
rank) and ranking
(classifying actual
impressions)
● Two feed-forward, fully
connected, networks with
hundreds of features
[Covington et. al., 2016]
19. Restricted Boltzmann Machines
● RBMs for Collaborative
Filtering [Salakhutdinov,
Minh & Hinton, 2007]
● Part of the ensemble that
won the $1M Netflix Prize
● Used in our rating
prediction system for
several years
20. Auto-encoders
● RBMs are hard to train
● CF-NADE [Zheng et al., 2016]
○ Define (random) orderings over conditionals
and model with a neural network
● Denoising auto-encoders: CDL [Wang et
al., 2015], CDAE [Wu et al., 2016]
● Variational auto-encoders [Liang et al.,
2017]
21. (*)2Vec
● Prod2Vec [Grbovic et al., 2015],
Item2Vec [Barkan & Koenigstein,
2016], Pin2Vec [Ma, 2017]
● Item-item co-occurrence
factorization (instead of user-item
factorization)
● The two approaches can be
blended [Liang et al., 2016]
prod2vec
(Skip-gram)
user2vec
(Continuous Bag of Words)
22. Wide + Deep models
● Wide model:
memorize
sparse, specific
rules
● Deep model:
generalize to
similar items via
embeddings
[Cheng et. al., 2016]
Deep Wide
(many parameters due
to cross product)
24. Sequence prediction
● Treat recommendations as a
sequence classification problem
○ Input: sequence of user actions
○ Output: next action
● E.g. Gru4Rec [Hidasi et. al., 2016]
○ Input: sequence of items in a
sessions
○ Output: next item in the session
● Also co-evolution: [Wu et al., 2017],
[Dai et al., 2017]
25. Contextual sequence prediction
● Input: sequence of contextual user actions, plus
current context
● Output: probability of next action
● E.g. “Given all the actions a user has taken so far,
what’s the most likely video they’re going to play right
now?”
● e.g. [Smirnova & Vasile, 2017], [Beutel et. al., 2018]
26. Contextual sequence data
2017-12-10 15:40:22
2017-12-23 19:32:10
2017-12-24 12:05:53
2017-12-27 22:40:22
2017-12-29 19:39:36
2017-12-30 20:42:13
Context ActionSequence
per user
?
Time
27. Time-sensitive sequence prediction
● Recommendations are actions at a moment in time
○ Proper modeling of time and system dynamics is critical
● Experiment on a Netflix internal dataset
○ Context:
■ Discrete time
● Day-of-week: Sunday, Monday, …
● Hour-of-day
■ Continuous time (Timestamp)
○ Predict next play (temporal split data)
29. Other framings
● Causality in recommendations
○ Explicitly modeling the consequence of a recommender systems’ intervention
[Schnabel et al., 2016]
● Recommendation as question answering
○ E.g. “I loved Billy Madison, My Neighbor Totoro, Blades of Glory, Bio-Dome,
Clue, and Happy Gilmore. I’m looking for a Music movie.” [Dodge et al., 2016]
● Deep Reinforcement Learning for
recommendations [Zhao et al, 2017]
31. Takeaways
● Deep Learning can work well for Recommendations... when you go
beyond the classic problem definition
● Similarities between DL and MF are a good thing: Lots of MF work
can be translated to DL
● Lots of open areas to improve recommendations using deep
learning
● Think beyond solving existing problems with new tools and instead
what new problems they can solve
32. More Resources
● RecSys 2017 tutorial by Karatzoglou and Hidasi
● RecSys Summer School slides by Hidasi
● DLRS Workshop 2016, 2017
● Recommenders Shallow/Deep by Sudeep Das
● Survey paper by Zhang, Yao & Sun
● GitHub repo of papers by Nandi