Recommender Systems

Francesco Casalegno
Recommender Systems
→ Content-Based Filtering
→ Collaborative Filtering
→ Explicit and Implicit Feedback
→ Machine & Deep Learning Models

Francesco Casalegno – Recommender Systems 2
● Users cannot evaluate overwhelming numbers of alternatives
○ YouTube: 5 B videos (watched every day)
2 B users
○ Amazon: 3 B products (across 11 marketplaces)
200 M users (per month)
Problem Statement
Francesco Casalegno
● Recommender Systems to the rescue!
○ Predict rating or preference a user would give to an item
○ Provide users with suggestions for items to be of use to them
● Many challenges are involved
○ Accuracy of recommendations
○ Scalability (> 2 B users on YouTube)
○ Serendipity (surprising + fortunate discoveries)
○ Explainability
○ Cold start
○ Interests evolves over time

Francesco Casalegno – Recommender Systems
Explicit Feedback: Ratings
3
● In many cases, users give explicit feedback to some items they viewed/purchased
● We can then define the rating matrix by rui
= rating given by user u to item i
○ The matrix rui
is usually large and sparse, as users view only few items and rate even fewer
○ We denote by K the set of pairs (u, i) s.t. rui
is known, i.e. user u gave a rating to item i
➝ Problem Statement: Predict rui
for unobserved items

Implicit Feedback: Confidence
● Explicit feedback (rate 1 to 5, like/dislike, …) is not always available, at least not in large quantities
○ But we can use implicit feedback, indirectly reflecting opinions through behavior
○ Examples: purchase history, browsing history, search patterns, mouse movements, …
● Implicit feedback is much more abundant, but also more difficult to use.
○ No negative feedback. User did not watch a movie: she dislikes it / does not even know it exists?
○ Noise. User searches a product: he may be buying a gift; he may be disappointed; …
○ Appreciation vs Confidence. Unlike the explicit case, rui
here measures confidence
→if you watch/search something many times/for long durations, probably you liked it
■ For how much time did the user watch the show?
■ How many times did the user search an item?
○ Evaluation metric: choice is not straightforward
● Example. TV shows, rui
= how many times u fully watched show i
○ rui
= 0.5 → user (got bored?) stopped watching at half show
○ rui
= 2 → user (loved it? fell asleep and played in loop?) watched the show twice
4

Motivation
6
Francesco Casalegno

Motivation
7
Francesco Casalegno

Motivation
8
Francesco Casalegno

Motivation
9
Francesco Casalegno

Motivation
10
Francesco CasalegnoJean Dupont

Classes of Recommender Systems
11
Recommender Systems
Content-Based Filtering Collaborative Filtering
User-Based
Model-BasedMemory-Based
Item-Based

12
Recommender Systems
User-Based
Item-Based
Item-Based k-NN,
Slope One
User-Based k-NN
Co-Clustering,
SVD,
Neural Networks
Information Retrieval Methods

13
Recommender Systems
User-Based
Item-Based
Item-Based k-NN,
Slope One
User-Based k-NN
Co-Clustering,
SVD,
Neural Networks

● Recommend items by assuming that users who
agreed in the past will agree in future
● Tracks and compare user activity
○ explicit: like/dislike, star ratings
○ implicit: viewing times, purchased items
● Examples:
➕ Works without needing any knowledge of items
➕ More variety in recommendations
➖ Cold start: need much data to get accurate
➖ Shilling attacks
● Recommend items having features similar to
those of the items liked by the user in the past
→ extract features + use information retrieval
● Similarity of items is based on discrete features
○ text: word counts / tf-idf (see
○ movies: “comedy”, “horror”, … tags
○ songs: Music Genome Project attributes
(e.g. “aggressive drummings”)
● Examples:
➕ Needs little info on user to start
➕ Leverages items info (e.g. genre) if available
➖ Proposes items too similar to those liked by user
➖ Requires to describe features for each item
Content-Based VS Collaborative Filtering
1414
… obviously the two approaches can be combined (hybrid methods)

Content-Based VS Collaborative Filtering
1515
similar features
(taste, ingredients, …)
liked
by user 1
recommended
to user 1
user 1
user 1 user 2
liked
by user 1
liked
by user 2
liked
by user 2
recommended
to user 1

16
Recommender Systems
User-Based
Item-Based
Item-Based k-NN,
Slope One
User-Based k-NN
Co-Clustering,
SVD

● Use given ratings as training set to fit a model
that predicts users' rating of unrated items
● Typically uses
○ Embedding / dim. reduction / matrix factoriz.
○ Machine Learning models to train & predict
● Examples
○ Co-Clustering
○ SVD
➕ Scales well with sparse data
➕ ML models can capture more complex relations
➕ Fast prediction
➕ Usually better predictions than memory-based
➖ Learning/Training phase required
● Uses users’ ratings to compute the similarity
between users or items
● Typically uses
○ Similarity (cosine dist., Pearson correlation)
○ Predict a weighted average of ratings
● Examples
○ k-Nearest Neighbors
○ Slope One
➕ Easy to implement
➕ Explainable results
➖ Scalability issues for sparse data
➖ Slow predictions (has to find similar items/users)
Memory-Based VS Model-Based
1717
Memory-Based Model-Based

Memory-Based
18
Recommender Systems
User-Based
Item-Based

19
Recommender Systems
User-Based
Item-Based
Item-Based k-NN,
Slope One
User-Based k-NN
Co-Clustering,
SVD

User-Based VS Item-Based
● Memory-Based models predict the rating rui
given user u to item i in different ways.
○ User-Based models looks at users v∊V that are similar to u.
■ rui
prediction is based on ratings rvi
given by similar users to same item
○ Item-Based models look at items j∊J that are similar to i.
■ rui
prediction is a based on ratings ruj
given by same user to similar item
20
ruj
= 6.0
user u
ruj
= 8.0 ruj
= 3.0
predict
item i rui
= 6.5
similar items
similar users
rvi
= 5.0
user u
rvi
= 9.5 rvi
= 8.5
predict
item i rui
= 8.0

● Simplest class of methods, based on looking at ratings of most similar (neighbors) users/items.
● First, we represent users and items by simply considering rows and cols of rating matrix:
○ user u is represented by the vector [ru1
, ru2
, ru3
, ...]
○ item i is represented by the vector [r1i
, r2i
, r3i
, ...]
● Then, we compute similarity between vectors
○ sim(u, v) / sim(i, j) can be: cosine similarity, Pearson’s correlation, …
○ But our vectors have unknown entries! → considering only indexes where ratings are known:
● Finally, we pick a number of nearest neighbors k and we predict the rating rui
as
→ The first formula corresponds to the user-based k-NN, the second to the item-based k-NN.
k-Nearest Neighbors Method
21
sim([?, ?, 4, 5, ?, 2], [?, ?, ?, 3, 4, 1]) ⟶ sim([5, 2], [3, 1])
or(1) (2)
Nk
i
(u) = k items most similar to i that are rated by user u

● Simple, yet powerful item-based method with good scalability and less prone to overfit.
● Idea: we could fit a linear model y = ax + b for any x = ruj
and y = rui
■ For 1,000 items, that means 2 M coefficients to learn!
■ Prone to overfit
● So, instead we use simplified (slope-one) linear regression of the form y = x + b
■ More robust to overfit
■ Coefficients can be computed very easily, and we get:
mu
= avg rating of user u
Ri
(u)= items j rated by u also having at least one common user with i
dev(i,j) = average items deviation =
Uij
= users having rated both i and j
Slope One Method
22

Model-Based
23
Recommender Systems
User-Based
Item-Based

● Cluster = subset of rows (columns) with similar behavior across the set of all columns (rows)
● Co-cluster = subset of rows + subset of cols, where rows have similar behavior across cols, and vice-versa
● We can then base our model on these clusters and predict ratings as
Co-clustering Method
24
rating matrix rui
item clustering user clustering
rating matrix rui
co-clustering
rating matrix rui
Cu
= avg rating of cluster user u belongs to
Ci
= avg rating of cluster item i belongs to
Cui
= avg rating of cluster user u and item i belong to
mu
= avg rating of user u
mi
= avg rating of item i

● A popular set of methods is based on matrix factorization of the rating matrix X = {rui
} ∊ Rn x m
○ Remark: A ∊ Rn x m
has rank k ↔ Y = QT
P for some Q ∊ Rk x n
and P ∊ Rk x m
○ Remark: A ∊ Rn x m
has SVD decomposition A = UΣVT
and truncated SVD Ak
= ∑i=1..k
σi
ui
vi
T
● In particular, recommender systems focus on these two different low-rank approximations:
○ SVD coincides with the solution to the problem (ǁxǁF
= Frobenius norm)
This result is known as Eckart-Young-Mirsky Theorem.
○ NMF (non-negative matrix factorization) is defined as the solution to the problem
s.t. Q, P have all coefficients ≥ 0
● Idea: factorize matrix {rui
} (with SVD or NMF) , then predict rui
as qi
T
pu
… but {rui
} has unknown entries!
○ We could fill {rui
} with 0 when unknown entries → old approach, not really meaningful…
○ … or instead solve minimization problem only on known entries → much better!
Matrix Factorization Methods
25

SVD Method
● One of the most popular methods, equivalent to Probabilistic Matrix Factorization..
● Idea: if we had a representation xu
∊ Rf
for user u, then we solve a linear regression problem:
○ ...so we try to learn representations qi
for items and pu
for users:
○ notice that if λ = 0 and if all ratings rui
are in K (i.e. are known) this is exactly SVD decomposition!
● How to minimize this loss function?
○ GD (gradient descent) → scalability issues + loss is non convex!
○ ALS (alternating least squares) → 2-step iterative method, solves 2 convex problems:
1. fix pu
, solve optimization problem for qi
2. fix qi
, solve optimization problem for pu
● Much of the variation in ratings is due to effects, called biases, associated with users or items.
So, most SVD-based methods modify the model to include item biases bi
and user biases bu
:
26
where m is the overall average rating.

SVD Method for Implicit Feedback
● For implicit feedback rui
(e.g. how many times u watched show i) is a measure of a confidence value
○ Define binary preferences bui
= 1 if rui
>0, and 0 otherwise
○ Define confidence variables as cui
= 1 + α rui
(typically α = 40)
● The problem is then formulated in terms of trying to predict preferences as
○ The minimization of the loss function can be efficiently done using ALS
● SVD methods presented here (both explicit and implicit) are very scalable
○ Spark ML uses these two methods to implement recommender systems
27

Neural Net for Explicit Feedback
● Idea: start writing SVD Method as a Neural Net
● How can we improve this network?
Learn a generic learnable function (with fully-connected layers) instead of simple dot-product
○ Include users metadata (age, sex, ...) and items metadata (cost, class, ...) as inputs to the network
29

Triplet Loss and Siamese Networks
30
● Idea: have a Neural Net learning how close a sample is to an anchor
○ Output of NN is a learned distance between anchor and sample: dNN
(a, x)
○ Train using triplet loss and siamese network:
■ we want dNN
(a, x+
) > dNN
(a, x–
) for a positive and a negative sample
■ equivalent to dNN
(a, x+
) - dNN
(a, x–
) + α ≥ 0
● Applications
○ Face Recognition:
○ Learn to Rate: say x is preferred by a over y if dNN
(a, x) > dNN
(a, y)

Neural Net for Implicit Feedback
31
● Idea: use siamese network to learn user’s preferences
○ Train with triplet loss: users prefer shows they watched over shows they have not watched (yet)
■ i+ (positive samples) = shows watched by u
■ i- (negative samples) = shows watched by u
○ Then, sort all unseen shows using the predicted distance user-item dNN
(u, i)

Neural Net with Hybrid Approach
● Finally, here is a more complex hybrid approach for YouTube recommendations.
32

1. Understand which kind of data you have
○ explicit feedback (users ratings) = easy to use, available in limited amount
○ implicit feedback (users activities) = difficult to use, available in greater quantity
2. Decide which approach works best in your case
○ content-based = good if you can extract features (e.g. tf-idf), ignores the other users
○ collaborative filtering = leverages from all users ratings, but cold start and shilling attacks
3. Choose the method considering several factors
○ scalability = working with 1,000 items or with 1 B items is not the same
○ easy to update = add users and items to the system does not require rebuild from scratch
○ cold start = new user or item, no interaction relative to them are available yet
○ accuracy = make relevant recommendations
○ interpretability = “why I am seeing this ad?”
Take-Home Messages
34

● Covington, Paul, Jay Adams, and Emre Sargin. "Deep neural networks for youtube recommendations." Proceedings of the
10th ACM Conference on Recommender Systems. ACM, 2016. [link]
● George, Thomas, and Srujana Merugu. "A scalable collaborative filtering framework based on co-clustering." Data Mining,
Fifth IEEE international conference on. IEEE, 2005. [link]
● Grisel, Olivier, presentation at dotAI Conference, Paris, 2017 [link]
● Hu, Yifan, Yehuda Koren, and Chris Volinsky. "Collaborative filtering for implicit feedback datasets." Data Mining, 2008.
ICDM'08. Eighth IEEE International Conference on. Ieee, 2008. [link]
● Hug, Nicolas. "Surprise, a Python library for recommender systems." (2017). [link]
● Ricci, F. Rokah, L. Sharpira, and B. Kantor. "Recommender Systems Handbook." (2010).
● Zhou, Yunhong, et al. "Large-scale parallel collaborative filtering for the netflix prize." International Conference on
Algorithmic Applications in Management. Springer, Berlin, Heidelberg, 2008. [link]
References
35

Recommender Systems

Recomendados

Recomendados

Mais conteúdo relacionado

Mais procurados

Mais procurados (20)

Semelhante a Recommender Systems

Semelhante a Recommender Systems (20)

Mais de Francesco Casalegno

Mais de Francesco Casalegno (8)

Último

Último (20)

Recommender Systems