3. User-based Collaborative Filtering
Predicts a target user's rating for an item
based on the rating tendencies of similar users:

$$\mathrm{pred}(u, i) = \bar{r}_u + \frac{\sum_{v \in N} \mathrm{sim}(u, v) \cdot (r_{v,i} - \bar{r}_v)}{\sum_{v \in N} \mathrm{sim}(u, v)}$$

where N is the set of users most similar to u (the neighbors), r_{v,i} is user v's rating of item i, and \bar{r}_v is v's average rating.
|       | Item5 | sim  | Average rating |
|-------|-------|------|----------------|
| Alice | ?     | 1    | 4              |
| User1 | 3     | 0.85 | 2.4            |
| User2 | 5     | 0.71 | 3.8            |

(User1 and User2 are the similar users.)
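As a worked check of the prediction formula, here is the Alice/Item5 computation from the table above, sketched in Python:

```python
# User-based CF prediction of Alice's unknown Item5 score,
# using the similarity and average-rating values from the table above.
neighbors = [
    # (sim to Alice, rating of Item5, neighbor's average rating)
    (0.85, 3, 2.4),  # User1
    (0.71, 5, 3.8),  # User2
]
alice_avg = 4  # Alice's average rating

# pred(Alice, Item5) = Alice's average + weighted average of neighbors' deviations
num = sum(sim * (r - avg) for sim, r, avg in neighbors)
den = sum(sim for sim, _, _ in neighbors)
pred = alice_avg + num / den
print(round(pred, 2))  # 4.87
```

Note that each neighbor contributes its deviation from its own average rating, which compensates for users who rate systematically high or low.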
4. Idea about Item-based Collaborative Filtering
|       | Item1 | Item2 | Item3 | Item4 | Item5 |
|-------|-------|-------|-------|-------|-------|
| Alice | 5     | 3     | 4     | 4     | ?     |
| User1 | 3     | 1     | 2     | 3     | 3     |
| User2 | 4     | 3     | 4     | 3     | 5     |
| User3 | 3     | 3     | 1     | 5     | 4     |
| User4 | 1     | 5     | 5     | 2     | 1     |

Predicts unknown scores
based on the rating tendencies of similar items
(item columns that resemble the Item5 column).
$$\mathrm{pred}(u, i) = \frac{\sum_{j \in S} \mathrm{sim}(i, j) \cdot r_{u,j}}{\sum_{j \in S} \mathrm{sim}(i, j)}$$

where S is the set of items similar to i that user u has rated, and r_{u,j} is u's rating of item j.
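The item-based formula can be sketched the same way. The similarity values below are hypothetical stand-ins; in practice sim(Item5, j) would be computed from the item columns of the rating matrix (e.g. with adjusted cosine similarity):

```python
# Item-based CF prediction of Alice's unknown Item5 score.
# The similarity values are hypothetical illustration values, not computed ones.
alice_ratings = {"Item1": 5, "Item4": 4}       # Alice's ratings for items similar to Item5
sims_to_item5 = {"Item1": 0.9, "Item4": 0.6}   # hypothetical sim(Item5, j)

# Weighted average of Alice's own ratings for the similar items
num = sum(sims_to_item5[j] * alice_ratings[j] for j in sims_to_item5)
den = sum(sims_to_item5.values())
pred = num / den
print(round(pred, 2))  # (0.9*5 + 0.6*4) / 1.5 = 4.6
```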
5. Problems on CF approaches (1/3)
Image reference: https://rafalab.github.io/dsbook/recommendation-systems.html
Real data is quite sparse!
Even on large e-commerce sites, there is little overlap
between user vectors (and item vectors).
6. Problems on CF approaches (2/3)
The curse of dimensionality
• In high-dimensional spaces, similarity measures become hard to handle
• Item/user vectors usually have very high dimensionality
(because the rating matrix is very large)
7. Problems on CF approaches (3/3)
High computational cost
• The rating matrix is processed directly every time the system finds
similar users/items and makes predictions
• CF approaches do not scale to most real-world scenarios
[Diagram: for user 1, the full rating matrix is processed once to compute the similar users, and again to compute the suggested items; the same two computations are then repeated for user 2, and so on for every user.]
8. Recent approaches for recommender systems
Model-based approach
- Based on offline pre-processing
- At run-time, only the learned model is used for rating prediction
- Learned models can be updated
[Diagram: the rating matrix is processed offline to build a learned model; at run time, only the model is used (online computation) to produce suggested items for a user.]
9. Memory-based approach vs. model-based approach
Memory-based approach:
- User-based CF
- Item-based CF

Model-based approach:
- Matrix factorization
- Association rule mining
- Probabilistic models
- Other ML techniques
11. Basic idea 1 (1/2)

Assumes that latent factors exist in users/items

| User  | Godfather | Terminator | Money game | Titanic | Back to the future | … | X-men |
|-------|-----------|------------|------------|---------|--------------------|---|-------|
| Alice | 5         | 1          | 4          | 4       | 3                  | … | 2     |

"I don't like horror… Sci-Fis often move me. I love humane and dramatic movies!"

Latent factors (which cannot be observed):

| User  | Horror | Sci-Fi | Humanity | Drama | … |
|-------|--------|--------|----------|-------|---|
| Alice | 0.3    | 2.1    | 6.1      | 4.7   | … |
12. Basic idea 1 (2/2)

Assumes that latent factors exist in users/items
(Image from Amazon.com)

| User  | Godfather | Terminator | Money game | Titanic | Back to the future | … | X-men |
|-------|-----------|------------|------------|---------|--------------------|---|-------|
| Alice | 5         | 1          | 4          | 4       | 3                  | … | 2     |

Latent factors (which cannot be observed):

| Item      | Horror | Sci-Fi | Humanity | Drama | … |
|-----------|--------|--------|----------|-------|---|
| Godfather | 2.3    | 0.4    | 4.9      | 5.7   | … |
13. Basic idea 2

Assumes that rating scores derive from
the latent factors of users and items

| User  | Godfather | Terminator | Money game | Titanic | Back to the future | … | X-men |
|-------|-----------|------------|------------|---------|--------------------|---|-------|
| Alice | 5         | 1          | 4          | 4       | 3                  | … | 2     |

| User  | Horror | Sci-Fi | Humanity | Drama | … |
|-------|--------|--------|----------|-------|---|
| Alice | 0.3    | 2.1    | 6.1      | 4.7   | … |

×

| Item      | Horror | Sci-Fi | Humanity | Drama | … |
|-----------|--------|--------|----------|-------|---|
| Godfather | 2.3    | 0.4    | 4.9      | 5.7   | … |

(e.g., Alice's score for Godfather is modeled from the product of her latent vector and Godfather's latent vector.)
14. Summary of matrix factorization
R ≈ P × Q^T

- R: rating matrix (m users × n items; □ = unknown scores)
- P: latent user matrix (m users × k latent factors)
- Q^T: latent item matrix (k latent factors × n items)

• The rating matrix can be decomposed into latent factors
of users and items
• The dimension of the latent factor vectors is much smaller
than the number of users and items (k ≪ m, n)
15. Prediction using matrix factorization
Predicted rating matrix R* = P × Q^T

- P: latent user matrix
- Q^T: latent item matrix
- R*: the □ (unknown) cells of R are now filled with predicted scores (e.g., 2, 5, 4)

If we obtain the latent user/item matrices, we can predict
unknown scores by multiplying the two latent matrices.
How can we obtain the latent user/item matrices?
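With small, made-up latent matrices (the numbers here are illustrative, not learned), the multiplication step looks like this:

```python
import numpy as np

# Hypothetical latent matrices with k = 2 latent factors.
P = np.array([[1.3, 0.7],    # user 0's latent vector
              [0.2, 4.9]])   # user 1's latent vector
Q = np.array([[0.2, 3.1],    # item 0's latent vector
              [1.1, 0.7],    # item 1's latent vector
              [0.9, 1.0]])   # item 2's latent vector

# Predicted rating matrix R* = P × Q^T (m users × n items)
R_star = P @ Q.T
print(R_star.shape)   # (2, 3)
print(R_star[0, 2])   # user 0's predicted score for item 2: 1.3*0.9 + 0.7*1.0 = 1.87
```

Every cell of R*, including the previously unknown ones, is simply the inner product of one user's latent vector and one item's latent vector.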
16. SVD for recommender systems
SVD: singular value decomposition
- A famous linear algebra technique for matrix decomposition
- It is often used for dimensionality reduction
- SVD delivers essentially the same result as PCA does
X = U × Σ × V^T

- X: m × n matrix
- U: m × m unitary matrix
- V: n × n unitary matrix
- Σ: rectangular diagonal matrix (its diagonal values are called singular values)
18. Important features of SVD (1/4)
We can approximate a given matrix by keeping
the largest singular values and the columns of U
and V that correspond to those singular values.

U =
[ -0.369  -0.325   0.282   0.343   0.749 ]
[ -0.459   0.448  -0.339  -0.584   0.364 ]
[ -0.468  -0.359   0.544  -0.459  -0.381 ]
[ -0.563   0.491   0.07    0.562  -0.348 ]
[ -0.341  -0.569  -0.71    0.118  -0.202 ]

Σ =
[ 13.368  0      0      0      0     ]
[ 0       4.708  0      0      0     ]
[ 0       0      2.792  0      0     ]
[ 0       0      0      1.586  0     ]
[ 0       0      0      0      0.904 ]

V =
[ -0.325   0.341  -0.101   0.682  -0.55  ]
[ -0.562   0.345   0.273   0.153   0.684 ]
[ -0.468  -0.642  -0.577   0.125   0.141 ]
[ -0.504  -0.327   0.601  -0.324  -0.416 ]
[ -0.324   0.496  -0.47   -0.626  -0.189 ]

The largest singular values are 13.368 and 4.708.
19. Important features of SVD (2/4)
(Same U, Σ, and V as on the previous slide.)

Ignore the unimportant values: keep only the two largest
singular values (13.368 and 4.708) and the corresponding
columns of U and V.
20. Important features of SVD (3/4)

Keeping only the two largest singular values gives:

Σ2 =
[ 13.368  0     ]
[ 0       4.708 ]

U2 = (the first two columns of U)
[ -0.369  -0.325 ]
[ -0.459   0.448 ]
[ -0.468  -0.359 ]
[ -0.563   0.491 ]
[ -0.341  -0.569 ]

V2 = (the first two columns of V)
[ -0.325   0.341 ]
[ -0.562   0.345 ]
[ -0.468  -0.642 ]
[ -0.504  -0.327 ]
[ -0.324   0.496 ]
21. Important features of SVD (4/4)

U2 × Σ2 × V2^T =
[ 1.08  2.24  3.29  2.98  0.84 ]
[ 2.72  4.17  1.52  2.41  3.04 ]
[ 1.46  2.93  4.02  3.71  1.19 ]
[ 3.24  5.03  2.04  3.04  3.59 ]
[ 0.57  1.64  3.86  3.18  0.15 ]

≈ X =
[ 1  3  3  3  0 ]
[ 2  4  2  2  4 ]
[ 1  3  3  5  1 ]
[ 4  5  2  3  3 ]
[ 1  1  5  2  1 ]

We can approximate a given matrix by keeping the largest
singular values and the corresponding columns of U and V.
23. Apply SVD for recommender systems (2/2)
Step 1: Run SVD: X = U × Σ × V^T
Step 2: Focus on the important features: keep U2, Σ2, and V2^T
Step 3: Multiply the three matrices: U2 × Σ2 × V2^T =
[ 1.08  2.24  3.29  2.98  0.84 ]
[ 2.72  4.17  1.52  2.41  3.04 ]
[ 1.46  2.93  4.02  3.71  1.19 ]
[ 3.24  5.03  2.04  3.04  3.59 ]
[ 0.57  1.64  3.86  3.18  0.15 ]
Step 4: Check the values that were zero before running SVD: these are the predicted scores
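The four steps can be reproduced with NumPy's SVD on the rating matrix from the earlier slides (zeros standing in for unknown scores):

```python
import numpy as np

# Rating matrix from the slides; unknown scores were replaced by 0.
X = np.array([[1, 3, 3, 3, 0],
              [2, 4, 2, 2, 4],
              [1, 3, 3, 5, 1],
              [4, 5, 2, 3, 3],
              [1, 1, 5, 2, 1]], dtype=float)

# Step 1: run SVD (X = U Σ V^T)
U, s, Vt = np.linalg.svd(X)

# Step 2: focus on the important features (the k largest singular values)
k = 2
U2, s2, Vt2 = U[:, :k], s[:k], Vt[:k, :]

# Step 3: multiply the three matrices (rank-k approximation of X)
X2 = U2 @ np.diag(s2) @ Vt2

# Step 4: inspect the entries that were zero before running SVD
print(np.round(X2, 2))
```

The entries that were zero in X come back as nonzero values in X2; those are read off as the predicted scores.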
24. Problems on SVD
Predicted values are often negative
- SVD does not take the rating score range into account

Zero replacement decreases prediction quality
- SVD analyzes the relations between all entries in the matrix
- The meaning of "zero" is different from that of "unknown"
26. Netflix Prize (2006-2010)
Image ref: http://blogs.itmedia.co.jp/saito/2009/09/httpjournalmyco.html
Netflix held an open competition to advance collaborative
filtering algorithms and to find the best one.
27. Simon Funk’s Matrix Factorization (2006)
Instead of running SVD (with zero replacement),
Simon Funk's method learns the matrices P and Q
from only the observed values in R.
Target optimization function:

$$\min_{P, Q} \sum_{(u, i) \in K} \left( r_{u,i} - \boldsymbol{p}_u \cdot \boldsymbol{q}_i \right)^2 + \lambda \left( \|\boldsymbol{p}_u\|^2 + \|\boldsymbol{q}_i\|^2 \right)$$

where K is the set of (user, item) pairs whose ratings are observed;
u: user; i: item; p_u: u's latent vector; q_i: i's latent vector.

Rating matrix R (m users × n items) ≈ P × Q.
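A minimal stochastic-gradient sketch of this objective (the toy data and hyperparameters below are assumptions for illustration; Funk's original trained one factor at a time on the Netflix data):

```python
import numpy as np

# Observed (user, item, rating) triples only; unknown entries are simply absent.
ratings = [(0, 0, 5), (0, 1, 3), (0, 2, 4),
           (1, 0, 3), (1, 1, 1), (1, 2, 2),
           (2, 0, 4), (2, 2, 5)]
m, n, k = 3, 3, 2                   # users, items, latent factors
lr, lam, epochs = 0.05, 0.02, 500   # learning rate, regularization, passes

rng = np.random.default_rng(0)
P = rng.normal(scale=0.1, size=(m, k))   # latent user matrix
Q = rng.normal(scale=0.1, size=(n, k))   # latent item matrix

for _ in range(epochs):
    for u, i, r in ratings:
        e = r - P[u] @ Q[i]                    # error on one observed rating
        P[u] += lr * (e * Q[i] - lam * P[u])   # gradient step on the user factor
        Q[i] += lr * (e * P[u] - lam * Q[i])   # gradient step on the item factor

# Predict a score that was never observed, e.g. user 2 / item 1
print(P[2] @ Q[1])
```

One design note: the update to Q[i] reuses the freshly updated P[u], a common simplification of taking both gradient steps simultaneously.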
28. Various approaches have been developed …
Ref: https://www.slideshare.net/databricks/deep-learning-for-recommender-systems-with-nick-pentreath
2003: Scalable models (Amazon's item-based CF)
2006-2009: Netflix Prize (the rise of matrix factorization, like Simon Funk's method)
2010: Factorization machines (generalized matrix factorization for dealing with various factors)
2013: Deep learning (Deep Factorization Machine; Content2Vec to get content embeddings)
30. Remaining challenges
Cold start problem
- How to recommend new items? What to recommend to new users?

Serendipity
- Recommending only favorable items cannot expand user interests
- How to recommend unexpected and surprising items?

Providing explanations
- Recommendation algorithms just decide what to recommend
- How can systems persuade users to purchase recommended items?

Reviewers' trust
- How to remove spam reviewers? How to find high-quality reviewers?
32. Cold start problem
|                 | Item1 | Item2 | Item3 | Item4 | Item5 (new item) |
|-----------------|-------|-------|-------|-------|------------------|
| Kate (new user) |       |       |       |       |                  |
| User1           | 3     | 1     | 2     | 3     |                  |
| User2           | 4     | 3     | 4     | 3     |                  |
| User3           | 3     | 3     | 1     | 5     |                  |
| User4           | 1     | 5     | 5     | 2     |                  |

CF approaches don't work for new items/users:
new items/users give no clues for predicting unknown scores because
CF cannot find neighboring users/items.
34. Reviewer Trust
T. Wang, D. Wang (2014) "Why Amazon's Ratings Might Mislead You: The Story of Herding Effects", Journal of Big Data, Vol.2, No.4, pp.196-204.
[Fig: average rating scores on Amazon.com (Books, Electronics, Movies & TV, Music): cumulative distribution of p-values of the Augmented Dickey-Fuller test, and mean rating vs. sequence number of rating (Wang & Wang, 2014).]
How to find trustworthy reviewers (rating experts)?
- People tend to give good scores to items
- Some reviewers intentionally give excessively high or low scores
(spammers)