Factorization Meets the Item Embedding: Regularizing Matrix Factorization with Item Co-occurrence
1. Factorization Meets the Item Embedding: Regularizing Matrix Factorization with Item Co-occurrence
Dawen Liang (Columbia University/Netflix), Jaan Altosaar, Laurent Charlin, David Blei
2. A simple trick to boost the performance of your recommender system without using any additional data
Dawen Liang (Columbia University/Netflix), Jaan Altosaar, Laurent Charlin, David Blei
3. Motivation
• User-item interaction is commonly encoded in a user-by-item matrix, in the form of (user, item, preference) triplets
• Matrix factorization is the standard method to infer latent user preferences
[Figure: a users-by-items matrix with some entries unknown (?)]
4. Motivation
• Alternatively, we can model item co-occurrence across users
• Analogy: modeling a set of documents (users) as bags of co-occurring words (items), e.g., “Pluto” and “planet”
[Figure: each user represented as a set of co-occurring items]
6. [Figure: the user-by-item click matrix Y (#users × #items) is approximated (≈) by the product of user latent factors θ (#users × K) and item latent factors β (K × #items)]
“Collaborative filtering for implicit feedback datasets”, Y. Hu, Y. Koren, C. Volinsky, ICDM 08.
L_mf = Σ_{u,i} c_ui (y_ui − θ_uᵀ β_i)²
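For concreteness, here is a minimal NumPy sketch of this weighted MF objective solved by alternating least squares (the closed-form updates of Hu et al.); the dense matrices, dimensions, and regularization value are illustrative assumptions, not the paper's exact setup.

    import numpy as np

    def wmf_als(Y, C, K=20, reg=0.01, n_iters=10):
        # Minimize sum_{u,i} c_ui (y_ui - theta_u^T beta_i)^2 + L2 regularization.
        n_users, n_items = Y.shape
        rng = np.random.default_rng(0)
        theta = 0.01 * rng.standard_normal((n_users, K))  # user latent factors
        beta = 0.01 * rng.standard_normal((n_items, K))   # item latent factors
        I = reg * np.eye(K)
        for _ in range(n_iters):
            for u in range(n_users):  # update each user with items fixed
                Cu = C[u]  # confidence weights c_ui for user u
                A = (beta * Cu[:, None]).T @ beta + I
                b = beta.T @ (Cu * Y[u])
                theta[u] = np.linalg.solve(A, b)  # weighted ridge regression
            for i in range(n_items):  # update each item with users fixed
                Ci = C[:, i]
                A = (theta * Ci[:, None]).T @ theta + I
                b = theta.T @ (Ci * Y[:, i])
                beta[i] = np.linalg.solve(A, b)
        return theta, beta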
7. Word embedding
• Skip-gram word2vec
• Learn a low-dimensional word embedding in a continuous space
• Predict context words given the current word
8. Item embedding
• Skip-gram word2vec
• Learn a low-dimensional word embedding in a continuous space
• Predict context words given the current word
We can embed item sequences in the same fashion
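As a small illustration of that idea, here is a sketch using gensim's skip-gram word2vec on per-user item sequences; gensim and the toy histories are our own choices, not part of the original work.

    from gensim.models import Word2Vec

    # Each "sentence" is one user's click history, with item IDs as tokens.
    user_histories = [
        ["item_1", "item_7", "item_3"],
        ["item_7", "item_3", "item_9", "item_1"],
    ]
    model = Word2Vec(
        sentences=user_histories,
        vector_size=64,   # embedding dimension
        window=5,         # context window size
        sg=1,             # skip-gram (predict context given current item)
        negative=5,       # negative sampling value k
        min_count=1,
    )
    item_vec = model.wv["item_3"]  # learned item embedding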
9. Levy & Goldberg show that skip-gram word2vec is implicitly factorizing (some variation of) the pointwise mutual information (PMI) matrix
“Neural Word Embedding as Implicit Matrix Factorization”, Levy & Goldberg, NIPS 14.
Levy and Goldberg [10] show that word2vec with a negative sampling value of k can be interpreted as implicitly factorizing the pointwise mutual information (PMI) matrix shifted by log k. PMI between a word i and its context word j is defined as:

PMI(i, j) = log( P(i, j) / (P(i) P(j)) )

Empirically, it is estimated as:

PMI(i, j) = log( #(i, j) · D / (#(i) · #(j)) )

Here #(i, j) is the number of times word j appears in the context of word i, D is the total number of word-context pairs, #(i) = Σ_j #(i, j), and #(j) = Σ_i #(i, j).

After making the connection between word2vec and matrix factorization, Levy and Goldberg [10] further proposed to perform word embedding by spectral dimensionality reduction (e.g., singular value decomposition) on the shifted positive PMI (SPPMI) matrix:

SPPMI(i, j) = max{PMI(i, j) − log k, 0}

This is attractive since it does not require tuning a learning rate.
[Figure: the co-occurrence matrix, indexed by current word/item (rows) and context word/item (columns)]
• PMI(“Pluto”, “planet”) > PMI(“Pluto”, “RecSys”)
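To make the construction concrete, here is a small NumPy sketch that builds the empirical PMI and SPPMI matrices from co-occurrence counts and embeds items via truncated SVD, following the Levy & Goldberg recipe; the toy counts, shift value k, and embedding rank are illustrative.

    import numpy as np

    def sppmi(counts, k=1):
        # PMI(i, j) = log( #(i, j) * D / (#(i) * #(j)) )
        # SPPMI(i, j) = max(PMI(i, j) - log k, 0)
        counts = counts.astype(float)
        D = counts.sum()                            # total number of pairs
        marg_i = counts.sum(axis=1, keepdims=True)  # #(i)
        marg_j = counts.sum(axis=0, keepdims=True)  # #(j)
        with np.errstate(divide="ignore", invalid="ignore"):
            pmi = np.log(counts * D / (marg_i * marg_j))
        pmi[~np.isfinite(pmi)] = -np.inf  # zero counts never survive the max below
        return np.maximum(pmi - np.log(k), 0.0)

    # Toy co-occurrence counts; rows index items, columns index context items.
    rng = np.random.default_rng(0)
    counts = rng.poisson(1.0, size=(100, 100))

    # Spectral item embedding: truncated SVD of the SPPMI matrix.
    M = sppmi(counts, k=5)
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    item_embedding = U[:, :32] * np.sqrt(S[:32])  # rank-32 embedding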
10. CoFactor
Jointly factorize both the click matrix and the co-occurrence PMI matrix with a shared item representation/embedding
11. • The item representation must account for both user-item interactions and item-item co-occurrence
• Alternative interpretation: regularizing the traditional MF objective with item embeddings learned by factorizing the item co-occurrence matrix

L_co = Σ_{u,i} c_ui (y_ui − θ_uᵀ β_i)²  +  Σ_{m_ij ≠ 0} (m_ij − β_iᵀ γ_j − w_i − c_j)²
       (matrix factorization)               (item embedding)

The item representation/embedding β_i is shared between the two terms; γ_j are context embeddings, and w_i, c_j are item and context bias terms.
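A minimal NumPy sketch of evaluating this joint objective; theta, beta, gamma, w, c follow the slide's notation, while the alternating closed-form updates used to optimize it are omitted for brevity.

    import numpy as np

    def cofactor_loss(Y, C, M, theta, beta, gamma, w, c):
        # Y: user-item clicks, C: confidence weights (both #users x #items)
        # M: SPPMI matrix (#items x #items); only nonzero entries contribute
        # theta: user factors, beta: shared item factors, gamma: context factors
        # w, c: item and context bias vectors
        mf_term = np.sum(C * (Y - theta @ beta.T) ** 2)
        pred = beta @ gamma.T + w[:, None] + c[None, :]
        mask = M != 0
        embed_term = np.sum((M[mask] - pred[mask]) ** 2)
        return mf_term + embed_term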
12. How to define “co-occur”
Problem/application-specific; here:
• Define the context as the entire user click history
• #(i, j) is the number of users who clicked on both item i and item j (see the sketch below)
• No additional information is required beyond what the standard MF model already uses
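Under this definition, the co-occurrence counts are just the item-item Gram matrix of the binarized click matrix; a small SciPy sketch, with the binarization and zeroed diagonal as our own choices:

    import numpy as np
    import scipy.sparse as sp

    def item_cooccurrence(Y):
        # #(i, j) = number of users who clicked on both item i and item j.
        # Y: sparse user-item click matrix (#users x #items).
        B = (Y > 0).astype(np.int64)   # binarize clicks
        counts = (B.T @ B).tolil()     # item-item co-click counts
        counts.setdiag(0)              # an item co-occurring with itself is uninformative
        return counts.tocsr()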
13. Empirical study
• Data preparation: 70/20/10 train/test/validation split
• Make sure train/validation do not overlap in time with test
• Metrics: Recall@20, Recall@50, NDCG@100, MAP@100
                   ArXiv     ML-20M    TasteProfile
# of users         25,057    111,148   221,830
# of items         63,003    11,711    22,781
# interactions     1.8M      8.2M      14.0M
% interactions     0.12%     0.63%     0.29%
with timestamps    yes       yes       no
Table 1: Attributes of datasets after preprocessing. Interactions are non-zero entries (listening counts, watches, and clicks). % interactions refers to the density of the user-item interaction matrix (Y). For datasets with timestamps, we ensure there is no overlap in time between the training and test sets.
• We explore the model fits to understand why jointly factorizing both the user click matrix and the item co-occurrence matrix boosts performance
• We also demonstrate the importance of joint learning
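As a reference for the ranking metrics above, here is a small NumPy sketch of Recall@M for a single user, following the standard definition (normalizing by the minimum of M and the number of held-out items); the predicted preference θ_uᵀβ_i ranks the user's unobserved items, and the variable names are ours.

    import numpy as np

    def recall_at_m(scores, heldout, train_mask, M=20):
        # scores: predicted preference theta_u^T beta_i for every item
        # heldout: boolean array, True for the user's held-out (test) items
        # train_mask: boolean array, True for items seen in training
        # Assumes the user has at least one held-out item.
        scores = scores.copy()
        scores[train_mask] = -np.inf      # rank only unobserved items
        top_m = np.argsort(-scores)[:M]   # indices of the top-M ranked items
        hits = heldout[top_m].sum()
        return hits / min(M, heldout.sum())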
14. Quantitative results

                ArXiv              ML-20M             TasteProfile
                WMF    CoFactor    WMF    CoFactor    WMF    CoFactor
Recall@20       0.063  0.067       0.133  0.145       0.198  0.208
Recall@50       0.108  0.110       0.165  0.177       0.286  0.300
NDCG@100        0.076  0.079       0.160  0.172       0.257  0.268
MAP@100         0.019  0.021       0.047  0.055       0.103  0.111

Table 2: Comparison between the widely-used weighted matrix factorization (WMF) model [8] and our CoFactor model. CoFactor significantly outperforms WMF on all the datasets across all metrics. The improvement is most pronounced on the movie watching (ML-20M) and music listening (TasteProfile) datasets.
…parameter indicates that the model benefits from accounting for co-occurrence patterns in the observed user behavior data. We also grid search over the negative sampling values k ∈ {1, 2, 5, 10, 50}, which effectively modulate how much to shift the empirically estimated PMI matrix. Table 2 summarizes the quantitative results; each metric is averaged across all users in the test set.
• We get better results by simply re-using the data
• Item co-occurrence is in principle available to the MF model, but MF (a bi-linear model) has limited modeling capacity to make use of it
15. [Figure: average NDCG@100 for CoFactor vs. WMF, with users grouped by the number of songs each user has listened to: <50; ≥50, <100; ≥100, <150; ≥150, <500; ≥500 (user activity from low to high)]
We observe a similar trend for the other datasets as well.
16. Qualitative example (numbers in parentheses: number of users who watched the movie in the training set)

User’s watch history:
• Toy Story (24659)
• Fight Club (18728)
• Kill Bill: Vol. 1 (8728)
• Mouchette (32)
• Army of Shadows (L'armée des ombres) (96)

Top recommendations by CoFactor:
• The Silence of the Lambs (37217)
• Pulp Fiction (37445)
• Finding Nemo (9290)
• L’Atalante (90)
• Diary of a Country Priest (Journal d'un curé de campagne) (68)

Top recommendations by WMF:
• Rain Man (11862)
• Pulp Fiction (37445)
• Finding Nemo (9290)
• The Godfather: Part II (15325)
• That Obscure Object of Desire (Cet obscur objet du désir) (300)
17. How important is joint learning?

              WMF     CoFactor    word2vec + reg
Recall@20     0.063   0.067       0.052
Recall@50     0.108   0.110       0.095
NDCG@100      0.076   0.079       0.065
MAP@100       0.019   0.021       0.016

Table 3: Comparison between joint learning (CoFactor) and learning from a separate two-stage (word2vec + reg) process on ArXiv. Even though they make similar modeling assumptions, CoFactor provides superior performance.
The two-stage baseline fixes the item embeddings learned by word2vec as the latent factors β_i in the MF model and only learns the user latent factors θ_u. Learning θ_u this way amounts to a single WMF update of the user factors with the item factors held fixed (a sketch follows).
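A minimal NumPy sketch of that two-stage baseline's user step, assuming dense Y and C and our own variable names; beta here would come from a separately trained word2vec model.

    import numpy as np

    def user_factors_fixed_items(Y, C, beta, reg=0.01):
        # Hold the word2vec item embeddings beta fixed and solve the weighted
        # MF objective for the user factors theta only (per-user ridge regression).
        n_users, K = Y.shape[0], beta.shape[1]
        theta = np.zeros((n_users, K))
        I = reg * np.eye(K)
        for u in range(n_users):
            Cu = C[u]  # confidence weights for user u
            A = (beta * Cu[:, None]).T @ beta + I
            b = beta.T @ (Cu * Y[u])
            theta[u] = np.linalg.solve(A, b)
        return theta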
18. Extensions
• User-user co-occurrence
• Higher-order co-occurrence patterns
• Add the same type of item-item co-occurrence regularization to other collaborative filtering methods, e.g., BPR, factorization machines, or SLIM
19. Conclusion
• We present the CoFactor model: jointly factorize both the user-item click matrix and the item-item co-occurrence matrix
• Motivated by the recent success of word embedding models (e.g., word2vec)
• We explore the results both quantitatively and qualitatively to investigate the pros/cons
Source code available: https://github.com/dawenl/cofactor
20. Thank you
Source code available: https://github.com/dawenl/cofactor