Traditional recommender systems assume the availability of explicit item ratings from users. In many applications, however, only binary, positive-only feedback is available: likes on Facebook, items bought on Amazon, videos watched on Netflix, ads or links clicked on Google, tags assigned to photos on Flickr, and so on. The number of publications on recommender systems that handle binary, positive-only feedback is growing rapidly. In this tutorial we discuss why collaborative filtering with binary, positive-only feedback is fundamentally different from collaborative filtering with rating data. We give an overview of the algorithms suitable for this task, with an emphasis on surprising commonalities and key differences. We show, for example, that even the nearest-neighbors method can be elegantly described in the matrix factorization framework.
4. Binary, Positive-Only Data
5. Collaborative Filtering
6. Movies
7. Music
8. Social Networks
9. Tagging / Annotation
Example tags: Paris, New York, Porto, Statue of Liberty, Eiffel Tower.
10. Also Explicit Feedback
11. Matrix Representation
(Matrix R for positive-only feedback: every observed user-item entry is a 1.)
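As a concrete illustration (with hypothetical data, not from the tutorial), such a matrix can be built from a list of (user, item) events; every observed pair becomes a 1 in R:

```python
import numpy as np

# Hypothetical positive-only feedback events: (user, item) pairs,
# e.g. purchases or likes. There are no ratings, only occurrences.
events = [("ann", "item1"), ("ann", "item3"), ("bob", "item2"), ("cat", "item1")]

users = sorted({u for u, _ in events})
items = sorted({i for _, i in events})

# R holds a 1 for every observed pair; nothing else is recorded.
R = np.zeros((len(users), len(items)), dtype=int)
for u, i in events:
    R[users.index(u), items.index(i)] = 1
```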
12. Unknown = 0 no negative information
(The same matrix with unknowns filled in as 0: observed preferences are 1, every other entry is 0, although a 0 carries no negative information.)
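A minimal sketch (with hypothetical index pairs) of why the zeros are not negative feedback: storing only the observed positives keeps "unknown" distinct from "dislike":

```python
# Hypothetical observed positives as (user, item) index pairs.
positives = {(0, 0), (0, 2), (1, 0), (2, 1)}

def feedback(u, i):
    """Return 1 for an observed preference, None for 'no information'.

    A 0 in the filled-in matrix R corresponds to None here: it may be
    an unseen item the user would love, not a dislike.
    """
    return 1 if (u, i) in positives else None
```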
18. pLSA probabilistic Latent Semantic Analysis
19. pLSA latent interests
1:30 K. Verstrepen et al.

7.1. offline
— Convince the reader that ranking is more important than RMSE or MSE.
— Data splits (leave-one-out, 5-fold, ...).
— Pradel et al.: Ranking with non-random missing ratings: influence of popularity and positivity on evaluation metrics.
— Marlin et al.: Collaborative prediction and ranking with non-random missing data.
— Marlin et al.: Collaborative filtering and the missing-at-random assumption.
— Steck: Training and testing of recommender systems on data missing not at random.
— We should emphasise how choosing hyperparameters is often done in a way that causes leakage.
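The leave-one-out split mentioned above can be sketched as follows (hypothetical item IDs); note that tuning hyperparameters against the held-out test item would be exactly the leakage warned about:

```python
import random

random.seed(0)

# Hypothetical positive-only history of one user.
history = ["i1", "i2", "i3", "i4"]

# Leave-one-out: hide one observed item as the test target and train
# on the remaining positives. Hyperparameters must be chosen on a
# separate validation item, never on the hidden test item.
test_item = random.choice(history)
train_items = [i for i in history if i != test_item]
```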
7.2. online
— Who: Kanishka?
— Convince the reader this is much better than offline, how to do it, etc.

8. EXPERIMENTAL EVALUATION
— Who: ?
— THE offline comparison of OCCF algorithms: many datasets, many algorithms, many evaluation measures, multiple data-split methods, sufficiently randomized.
— Also empirically evaluate the explanations extracted.

9. SYMBOLS FOR PRESENTATION
x, U, I, R, D, d = 1, ..., d = D
REFERENCES
F. Aiolli. 2013. Efficient Top-N Recommendation for Very Large Scale Binary Rated Datasets. In RecSys. 273–280.
Fabio Aiolli. 2014. Convex AUC optimization for top-N recommendation with implicit feedback. In RecSys. 293–296.
S.S. Anand and B. Mobasher. 2006. Contextual Recommendation. In WebMine. 142–160.
C.M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer, New York, NY.
Evangelia Christakopoulou and George Karypis. 2014. Hoslim: Higher-order sparse linear method for top-n recommender systems. In Advances in Knowledge Discovery and Data Mining. Springer, 38–49.
Paolo Cremonesi, Yehuda Koren, and Roberto Turrin. 2010. Performance of recommender algorithms on top-n recommendation tasks. In Proceedings of the fourth ACM conference on Recommender systems. 39–46.
M. Deshpande and G. Karypis. 2004. Item-Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems 22, 1, 143–177.
C. Desrosiers and G. Karypis. 2011. A Comprehensive Survey of Neighborhood-based Recommendation Methods. In Recommender Systems Handbook, F. Ricci, L. Rokach, B. Shapira, and P.B. Kantor (Eds.). Springer, Boston, MA.
Jerome Friedman, Trevor Hastie, and Rob Tibshirani. 2010. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software 33, 1.
20. pLSA generative model
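The pLSA generative story can be sketched with hypothetical parameters: for a user u, first draw a latent interest d from p(d | u), then draw an item i from p(i | d):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical parameters for one user: D = 2 latent interests, 3 items.
p_d_u = np.array([0.7, 0.3])                 # p(d | u)
p_i_d = np.array([[0.5, 0.4, 0.1],           # p(i | d = 0)
                  [0.1, 0.2, 0.7]])          # p(i | d = 1)

# Generative story: interest first, then item given the interest.
d = rng.choice(2, p=p_d_u)
i = rng.choice(3, p=p_i_d[d])
```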
— also empirically evaluate
9. SYMBOLS FOR PRESENTA
U
I
R
REFERENCES
F. Aiolli. 2013. Efficient Top-N R
273–280.
Fabio Aiolli. 2014. Convex AUC o
293–296.
9. SYMBOLS FO
U
I
R
REFERENCES
F. Aiolli. 2013. Effi
273–280.
Fabio Aiolli. 2014. C
293–296.
U
I
R
D
REFERENCES
F. Aiolli. 2013. Efficient
273–280.
Fabio Aiolli. 2014. Conve
293–296.
S.S. Anand and B. Mobas
— We should emphasise how choosing hyperparameters is oft
causes leakage.
7.2. online
— Who: Kanishka?
— Convince the reader this is much better than offline, how to d
8. EXPERIMENTAL EVALUATION
— Who: ?
— THE offline comparison of OCCF algorithms. Many datasets, m
evaluation measures, multiple data split methods, sufficiently
— also empirically evaluate the explanations extracted.
9. SYMBOLS FOR PRESENTATION
x
U
I
R
D
d = 1
d = D
REFERENCES
F. Aiolli. 2013. Efficient Top-N Recommendation for Very Large Scale Binary
273–280.
Fabio Aiolli. 2014. Convex AUC optimization for top-N recommendation with
293–296.
S.S. Anand and B. Mobasher. 2006. Contextual Recommendation. In WebMine
C.M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer, New
Evangelia Christakopoulou and George Karypis. 2014. Hoslim: Higher-order s
recommender systems. In Advances in Knowledge Discovery and Data Min
Paolo Cremonesi, Yehuda Koren, and Roberto Turrin. 2010. Performance of
top-n recommendation tasks. In Proceedings of the fourth ACM conferen
39–46.
M. Deshpande and G. Karypis. 2004. Item-Based Top-N Recommendation Al
143–177.
C. Desrosiers and G. Karypis. 2011. A Comprehensive Survey of Neighborh
Methods. In Recommender Systems Handbook, F. Ricci, L. Rokach, B. Sha
Springer, Boston, MA.
Jerome Friedman, Trevor Hastie, and Rob Tibshirani. 2010. Regularization
1:30 K. Verstrepen e
— Convince the reader ranking is more important than RMSE or MSE.
— data splits (leave-one-out, 5 fold, ...)
— Pradel et al. :ranking with non-random missing ratings: influence of popularity a
positivity on evaluation metrics
— Marlin et al. :Collaaborative prediction and ranking with non-random missing da
— Marlin et al. :collaborative filtering and the missing at random assumption
— Steck: Training and testing of recommender systems on data missing not at rand
— We should emphasise how choosing hyperparameters is often done in a way t
causes leakage.
7.2. online
— Who: Kanishka?
— Convince the reader this is much better than offline, how to do it etc.
8. EXPERIMENTAL EVALUATION
— Who: ?
— THE offline comparison of OCCF algorithms. Many datasets, many algorithms, ma
evaluation measures, multiple data split methods, sufficiently randomized.
— also empirically evaluate the explanations extracted.
9. SYMBOLS FOR PRESENTATION
x
U
I
R
D
d = 1
d = D
REFERENCES
F. Aiolli. 2013. Efficient Top-N Recommendation for Very Large Scale Binary Rated Datasets. In Rec
273–280.
Fabio Aiolli. 2014. Convex AUC optimization for top-N recommendation with implicit feedback. In Rec
293–296.
S.S. Anand and B. Mobasher. 2006. Contextual Recommendation. In WebMine. 142–160.
C.M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer, New York, NY.
Evangelia Christakopoulou and George Karypis. 2014. Hoslim: Higher-order sparse linear method for t
recommender systems. In Advances in Knowledge Discovery and Data Mining. Springer, 38–49.
Paolo Cremonesi, Yehuda Koren, and Roberto Turrin. 2010. Performance of recommender algorithm
top-n recommendation tasks. In Proceedings of the fourth ACM conference on Recommender syst
— Who: ?
— THE offline comparison of OCCF algorithms
evaluation measures, multiple data split me
— also empirically evaluate the explanations e
9. SYMBOLS FOR PRESENTATION
x
U
I
R
D
d = 1
d = D
...
REFERENCES
F. Aiolli. 2013. Efficient Top-N Recommendation for Ve
273–280.
Fabio Aiolli. 2014. Convex AUC optimization for top-N r
293–296.
— THE offline comparison of OCCF
evaluation measures, multiple d
— also empirically evaluate the exp
9. SYMBOLS FOR PRESENTATION
x
U
I
R
D
d = 1
d = D
...
REFERENCES
F. Aiolli. 2013. Efficient Top-N Recommen
273–280.
Fabio Aiolli. 2014. Convex AUC optimizati
293–296.
S.S. Anand and B. Mobasher. 2006. Contex
— THE offline compariso
evaluation measures,
— also empirically evalu
9. SYMBOLS FOR PRESE
x
U
I
R
D
d = 1
d = D
...
REFERENCES
F. Aiolli. 2013. Efficient Top-
273–280.
Fabio Aiolli. 2014. Convex AU
293–296.
21. pLSA probabilistic weights
Each user u ∈ U has a distribution over the D latent interests d = 1, ..., D, and each interest has a distribution over the items i ∈ I:

p(d | u) ≥ 0,  p(i | d) ≥ 0,
Σ_{d=1}^{D} p(d | u) = 1,  Σ_{i ∈ I} p(i | d) = 1.
22. pLSA compute like-probability
p(i | u) = Σ_{d=1}^{D} p(i | d) · p(d | u)
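Stacking the p(d | u) and p(i | d) values into matrices, the sum over interests is a single matrix product (hypothetical numbers):

```python
import numpy as np

# Hypothetical model: 2 users, D = 2 latent interests, 3 items.
p_d_u = np.array([[0.7, 0.3],
                  [0.2, 0.8]])          # rows: p(d | u), each sums to 1
p_i_d = np.array([[0.5, 0.4, 0.1],
                  [0.1, 0.2, 0.7]])     # rows: p(i | d), each sums to 1

# p(i | u) = sum_d p(i | d) * p(d | u)  -- one matrix product.
p_i_u = p_d_u @ p_i_d
# p_i_u[0] is approximately [0.38, 0.34, 0.28]; each row sums to 1.
```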
23. pLSA computing the weights
The weights are computed with (tempered) Expectation-Maximization (EM), maximizing the log-likelihood of the observed feedback:

max Σ_{(u,i): R_ui = 1} log p(i | u)
25. pLSA recap
[Figure: pLSA as a graphical model: user u is linked to item i through latent interest dimensions d = 1, …, D, with edges labeled p(d | u) and p(i | d).]

p(i | u) = Σ_{d=1}^{D} p(i | d) · p(d | u)

ACM Computing Surveys, Vol. 1, No. 1, Article 1, Publication date: January 2015.
26. pLSA recap
Binary, Positive-Only Collaborative Filtering: A Theoretical and Experimental Comparison of the State Of The Art
K. Verstrepen et al.

[Figure: the same pLSA model with the sum written out per dimension: user-side probabilities p(d_1 | u), p(d_2 | u), …, p(d_D | u) and item-side probabilities p(i | d_1), p(i | d_2), …, p(i | d_D).]

p(i | u) = Σ_{d=1}^{D} p(i | d) · p(d | u)

[Paper page preview: deviation functions and their gradients: ∇D(S, R) = ∇ Σ_{u∈U} Σ_{i∈I} D_ui(S, R) = Σ_{u∈U} Σ_{i∈I} ∇D_ui(S, R); pairwise: ∇D(S, R) = Σ_{u∈U} Σ_{i∈I: R_ui=1} Σ_{j∈I} ∇D_uij(S, R); variational: D(S, R) = D_KL(Q(S) ‖ p(S | R)).]
27. pLSA recap
[Figure: pLSA graphical model with per-dimension probabilities p(d_1 | u), …, p(d_D | u) and p(i | d_1), …, p(i | d_D).]

p(i | u) = Σ_{d=1}^{D} p(i | d) · p(d | u)
28. pLSA matrix factorization notation
[Figure: R (|U| × |I|) factorized into a |U| × D matrix of user-dimension probabilities and a D × |I| matrix of dimension-item probabilities.]

p(d | u) ≥ 0,  p(i | d) ≥ 0

Σ_{d=1}^{D} p(d | u) = 1,  Σ_{i∈I} p(i | d) = 1

p(i | u) = Σ_{d=1}^{D} p(i | d) · p(d | u)

max Σ_{R_ui = 1} log p(i | u)

Dimensions: |U| × |I| = (|U| × D) · (D × |I|)
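The sum over latent dimensions is exactly a matrix product of a |U| × D row-stochastic matrix with a D × |I| row-stochastic matrix. A minimal sketch, with made-up probability values:

```python
# pLSA scores as a matrix product: (|U| x D) . (D x |I|) = |U| x |I|.
U_n, I_n, D = 2, 3, 2
P_du = [[0.7, 0.3],          # p(d | u) for user 0 ...
        [0.2, 0.8]]          # ... and user 1 (rows sum to 1)
P_id = [[0.5, 0.4, 0.1],     # p(i | d) for dimension 1 ...
        [0.1, 0.3, 0.6]]     # ... and dimension 2 (rows sum to 1)

# S_ui = sum over d of p(d | u) * p(i | d)
S = [[sum(P_du[u][d] * P_id[d][i] for d in range(D)) for i in range(I_n)]
     for u in range(U_n)]
# e.g. S[0][0] = 0.7 * 0.5 + 0.3 * 0.1 = 0.38
```

Since both factors are row-stochastic, each row of S sums to 1: the score matrix is a distribution over items per user.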
29. pLSA matrix factorization notation
With S^(1) the |U| × D matrix of p(d | u) and S^(2) the D × |I| matrix of p(i | d):

S_ui = S^(1)_{u*} · S^(2)_{*i}

S = S^(1) S^(2)
30. Scores = Matrix Factorization
S_ui = S^(1)_{u*} · S^(2)_{*i}

S = S^(1) S^(2)
31. Deviation Function
S = (S^(1,1) · · · S^(1,F_1)) + · · · + (S^(T,1) · · · S^(T,F_T))

max Σ_{R_ui = 1} log p(i | u)
⇔ max Σ_{R_ui = 1} log S_ui
⇔ min −Σ_{R_ui = 1} log S_ui
⇔ min D(S, R),  with D(S, R) = −Σ_{R_ui = 1} log S_ui
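The deviation function above can be computed directly from the score matrix and the feedback matrix. A small sketch with illustrative values:

```python
import math

# Deviation D(S, R) = - sum over observed pairs (R_ui = 1) of log S_ui.
# Lower deviation means the model assigns higher probability to the
# observed feedback. R and S below are illustrative.
R = [[1, 0, 1],
     [0, 1, 0]]
S = [[0.5, 0.3, 0.2],
     [0.1, 0.8, 0.1]]

def deviation(S, R):
    return -sum(math.log(S[u][i])
                for u in range(len(R)) for i in range(len(R[0]))
                if R[u][i] == 1)

D = deviation(S, R)  # -(log 0.5 + log 0.2 + log 0.8)
```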
32. Summary: 2 Basic Building Blocks
Factorization Model
Deviation Function
35. pLSA soft clustering interpretation
user-item scores = user-cluster affinity × item-cluster affinity (mixed clusters)

[Hofmann 2004] [Hu et al. 2008] [Pan et al. 2008] [Sindhwani et al. 2010] [Yao et al. 2014] [Pan and Scholz 2009] [Rendle et al. 2009] [Shi et al. 2012] [Takàcs and Tikk 2012]
36. pLSA soft clustering interpretation
[Figure: worked example with D = 4 latent dimensions. For one user-item pair, the per-dimension products p(i | d) · p(d | u) are 0.04, 0.01, 0.20 and 0.03, so the user-item score is S_ui = 0.04 + 0.01 + 0.20 + 0.03 = 0.28.]

user-item scores (|U| × |I|) = user-cluster affinity (|U| × D) · item-cluster affinity (D × |I|)
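The worked example on this slide can be reproduced as follows. The particular p(d | u) and p(i | d) vectors below are an assumption chosen to yield the per-dimension products shown on the slide (0.04, 0.01, 0.20, 0.03); only the products and the final score come from the slide itself.

```python
# Soft clustering worked example with D = 4.
p_d_u = [0.1, 0.1, 0.5, 0.3]   # user-cluster affinity (assumed values)
p_i_d = [0.4, 0.1, 0.4, 0.1]   # item-cluster affinity for one item (assumed)

products = [a * b for a, b in zip(p_d_u, p_i_d)]   # 0.04, 0.01, 0.20, 0.03
S_ui = sum(products)                               # inner product score: 0.28
```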
43. Factored Item Similarity symmetrical
user-item scores = original rating matrix × item clusters: every item has a single profile (identical whether it appears as a known preference or as a candidate item), and item-item similarity is the dot product of item-cluster affinities.

[Weston et al. 2013b]
44. Factored Item Similarity asymmetrical + bias
user-item scores = (row-normalized original rating matrix) × (item profiles as known preference) × (item profiles as candidate item) + user biases + item biases

[Kabbur et al. 2013]
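A hedged sketch of such an asymmetric factored item similarity score in the spirit of [Kabbur et al. 2013]: the candidate item's profile is matched against the profiles of the user's known items, the known-preference side is row normalized, and user and item biases are added. All values, the number of factors, and the exact normalization below are illustrative assumptions, not the paper's exact model.

```python
# Asymmetric factored item similarity with biases (illustrative sketch).
known_items = [0, 2]                         # items user 0 already preferred
P = [[0.5, 0.1], [0.0, 0.9], [0.3, 0.3]]     # item profile if known preference
Q = [[0.4, 0.2], [0.1, 0.8], [0.2, 0.5]]     # item profile if candidate item
b_user, b_item = [0.1], [0.0, 0.2, 0.1]      # biases (assumed)

def score(u, i):
    norm = 1.0 / len(known_items)            # row normalization of the rating matrix
    dot = sum(norm * sum(P[j][f] * Q[i][f] for f in range(2))
              for j in known_items)
    return dot + b_user[u] + b_item[i]

s = score(0, 1)   # score candidate item 1 for user 0
```

Using two different profile matrices P and Q is what makes the similarity asymmetric: an item can behave differently as evidence than as a candidate.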
45. Higher Order Item Similarity inner product
user-item scores = extended rating matrix (with selected higher order itemsets) × itemset-item similarity

[Christakopoulou and Karypis 2014] [Deshpande and Karypis 2004] [Menezes et al. 2010] [van Leeuwen and Puspitaningrum 2012] [Lin et al. 2002]
46. Higher Order Item Similarity max product
[Figure: the same worked example, now with the max product: over the per-candidate products 0.04, 0.01, 0.20 and 0.03, the score is S_ui = max(0.04, 0.01, 0.20, 0.03) = 0.20.]

[Sarwar et al. 2001] [Mobasher et al. 2001]
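The contrast between the two aggregation rules is a one-liner. Reusing the illustrative products from the inner product example:

```python
# Inner product vs. max product aggregation over the same per-factor products.
products = [0.04, 0.01, 0.20, 0.03]
S_inner = sum(products)   # inner product score: 0.28
S_max = max(products)     # max product score: 0.20
```

The max product keeps only the single strongest evidence path, which makes the score robust to many weak, noisy contributions.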
47. Higher Order User Similarity inner product
user-item scores = user-userset similarity × extended rating matrix (with selected higher order usersets)

[Lin et al. 2002]
48. Best of few user models: non-linearity by max
[Weston et al. 2013a]
~ 3 models/user
49. Best of all user models: efficient max out of 2^|u| models/user
[Verstrepen and Goethals 2015]
Binary, Positive-Only Collaborative Filtering: A Theoretical and Experimental Compariso
max for every (u, i):

\max \log p(S \mid R)

\max \log \prod_{u \in U} \prod_{i \in I} S_{ui}^{\alpha R_{ui}} (1 - S_{ui})

\log \prod_{u \in U} \prod_{i \in I} S_{ui}^{\alpha R_{ui}} (1 - S_{ui}) \propto \sum_{u \in U} \sum_{i \in I} \left( \alpha R_{ui} \log S_{ui} + \log(1 - S_{ui}) \right) - \lambda \left( \|S^{(1)}\|_F^2 + \|S^{(2)}\|_F^2 \right)

2^{|u|} models/user
2 models/user instead of 2^{|u|}
50. Combination: item vectors can be shared
[Kabbur and Karypis 2014]
51. Sigmoid link function for probabilistic frameworks
[Johnson 2014]
52. Pdf over parameters i.s.o. point estimation

\bar{S} = \int S(\theta) \cdot p(\theta \mid R) \, d\theta

[Koenigstein et al. 2012]
[Paquet and Koenigstein 2013]
53. Summary: 2 Basic Building Blocks
Factorization Model
Deviation Function
54. Summary: 2 Basic Building Blocks
Factorization Model
Deviation Function
a.k.a. What do we minimize in order to find the parameters in the factor matrices?
57. Local Minima depending on initialisation

Minimizing D(S, R) over the parameters (S^{(1,1)}, \ldots, S^{(T,F)}) can end in different local minima, depending on the initialisation.
58. Max Likelihood: high scores for known preferences

\max \sum_{R_{ui}=1} \log S_{ui} \iff \min -\sum_{R_{ui}=1} \log S_{ui}

\min D(S, R) = -\sum_{R_{ui}=1} \log S_{ui}
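The maximum-likelihood deviation only involves the known preferences: minimizing -\sum_{R_{ui}=1} \log S_{ui} is the same as maximizing the likelihood of the observed ones. A toy computation (the concrete matrices are illustrative; S is assumed to hold probabilities in (0, 1)):

```python
import numpy as np

R = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
S = np.array([[0.8, 0.1, 0.6],
              [0.3, 0.9, 0.2]])   # model scores, assumed in (0, 1)

# D(S, R) = - sum over (u, i) with R_ui = 1 of log S_ui
D = float(-np.log(S[R == 1]).sum())
```

Only the entries S_{ui} with R_{ui} = 1 (here 0.8, 0.6 and 0.9) contribute; missing feedback is ignored entirely.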
(Figure: pLSA-style factorizations S = S^{(1)} S^{(2)} for hidden dimensions d = 2, 3, 4, \ldots, D = |I|, with nonnegativity constraints on the factor matrices and rows summing to 1)

[Hofmann 1999]
[Hofmann 2004]
59. Reconstruction

w(j) = \sum_{u \in U} R_{uj}

D = \sum_{u \in U} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \left( (R_{ui} - R_{uj}) - (S_{ui} - S_{uj}) \right)^2 + \sum_{t=1}^{T} \sum_{f=1}^{F} \lambda_{tf} \|S^{(t,f)}\|_F^2

D = \sum_{u \in U} \sum_{i \in I} (R_{ui} - S_{ui})^2 + \sum_{t=1}^{T} \sum_{f=1}^{F} \lambda_{tf} \|S^{(t,f)}\|_F^2

D = \sum_{u \in U} \sum_{i \in I} (R_{ui} - S_{ui})^2 + \sum_{t=1}^{T} \sum_{f=1}^{F} \left( \lambda_{tf} \|S^{(t,f)}\|_F^2 + \mu_{tf} \|S^{(t,f)}\|_1 \right)
60. Reconstruction: `Ridge' regularization, i.e. the squared Frobenius-norm terms \lambda_{tf} \|S^{(t,f)}\|_F^2 in the deviation functions above
[Kabbur et al. 2013]
[Kabbur and Karypis 2014]
61. Reconstruction: elastic net regularization

D = \sum_{u \in U} \sum_{i \in I} (R_{ui} - S_{ui})^2 + \sum_{t=1}^{T} \sum_{f=1}^{F} \left( \lambda_{tf} \|S^{(t,f)}\|_F^2 + \mu_{tf} \|S^{(t,f)}\|_1 \right)
`Ridge' regularization
[Ning and Karypis 2011]
[Christakopoulou and Karypis 2014]
[Kabbur et al. 2013]
[Kabbur and Karypis 2014]
62. Reconstruction between AMAU and AMAN

Making the AMAN assumption, all missing values are interpreted as an absence of preference and the deviation function becomes

D(S, R) = \sum_{u \in U} \sum_{i \in I} (R_{ui} - S_{ui})^2 .

However, the AMAU assumption is too careful, because the vast majority of the missing values are negatives. On the other hand, the AMAN assumption is too crude, because we are actually searching for the preferences among the unknowns. Hu et al. [Hu et al. 2008] and Pan et al. [Pan et al. 2008] simultaneously proposed a middle way between AMAU and AMAN:

D(S, R) = \sum_{u \in U} \sum_{i \in I} W_{ui} (R_{ui} - S_{ui})^2 .

AMAN
63. Reconstruction between AMAU and AMAN

Ungar and Foster [Ungar and Foster 1998] proposed a similar method, but remain vague about the details of their method.

Reconstruction Based Deviation Functions. Next, there is a group of algorithms rooted in SGD-based matrix factorization algorithms for rating prediction problems [Koren and Bell 2011]. They start from the 2-factor factorization that describes the model (Eq. 3) but strip the parameters of all their statistical meaning. S is postulated to be an approximate, factorized reconstruction of R: find S^{(1)} and S^{(2)} such that they minimize the squared error between S and R. A deviation function that reflects this line of reasoning is

D(S, R) = \sum_{u \in U} \sum_{i \in I} R_{ui} (R_{ui} - S_{ui})^2 ,

which clearly makes the AMAU assumption. Making the AMAN assumption instead, all missing values are interpreted as an absence of preference and the deviation function becomes

D(S, R) = \sum_{u \in U} \sum_{i \in I} (R_{ui} - S_{ui})^2 .

Hu et al. [Hu et al. 2008] and Pan et al. [Pan et al. 2008] simultaneously proposed a middle way between AMAU and AMAN:

D(S, R) = \sum_{u \in U} \sum_{i \in I} W_{ui} (R_{ui} - S_{ui})^2 .

AMAU
AMAN
64. Reconstruction between AMAU and AMAN

A deviation function that reflects the reconstruction idea is

D(S, R) = \sum_{u \in U} \sum_{i \in I} R_{ui} (R_{ui} - S_{ui})^2 ,

which clearly makes the AMAU assumption. Making the AMAN assumption instead, all missing values are interpreted as an absence of preference:

D(S, R) = \sum_{u \in U} \sum_{i \in I} (R_{ui} - S_{ui})^2 .

However, the AMAU assumption is too careful, because the vast majority of the missing values are negatives; the AMAN assumption is too crude, because we are actually searching for the preferences among the unknowns. Hu et al. [Hu et al. 2008] and Pan et al. [Pan et al. 2008] simultaneously proposed a middle way between AMAU and AMAN:

D(S, R) = \sum_{u \in U} \sum_{i \in I} W_{ui} (R_{ui} - S_{ui})^2 ,

with W \in \mathbb{R}^{n \times m} a matrix that assigns a weight to every value in R. The higher W_{ui}, the higher the confidence about R_{ui}. There is a high confidence about the ones being preferences, but a low confidence about the zeros being dislikes. To formalize this, they give potential definitions of W_{ui}.

AMAU
AMAN
Middle Way
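The middle way weights every squared residual with a confidence W_{ui}. A small sketch of evaluating this deviation function, using the weighting scheme shown on the next slide (W_{ui} = \alpha for known preferences, 1 for missing values); all concrete numbers are illustrative:

```python
import numpy as np

alpha = 10.0                       # confidence hyperparameter
R = np.array([[1.0, 0.0],
              [0.0, 1.0]])
S = np.array([[0.9, 0.2],
              [0.1, 0.7]])

W = np.where(R == 1, alpha, 1.0)   # W_ui = alpha if R_ui = 1, else 1

# D(S, R) = sum_{u,i} W_ui (R_ui - S_ui)^2
D = float((W * (R - S) ** 2).sum())
```

Known preferences (residuals 0.1 and 0.3) are weighted ten times as heavily as the zeros, so they dominate the deviation.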
65. Reconstruction: choosing W

D(S, R) = \sum_{u \in U} \sum_{i \in I} W_{ui} (R_{ui} - S_{ui})^2 ,

with W assigning a weight to every value in R. One potential definition of W_{ui} [Hu et al. 2008; Pan et al. 2008]:

W_{ui} = 1 if R_{ui} = 0
W_{ui} = \alpha if R_{ui} = 1

Middle Way

ACM Computing Surveys, Vol. 1, No. 1, Article 1, Publication date: January 2015.
66. Reconstruction: regularization

Stripping the matrix factorization of its statistical meaning, also the constraints on the parameters disappear. Simply minimizing Equation 11, however, results in models that are overfitted on the training data. Therefore both Hu et al. and Pan et al. minimize a regularized version

D(S, R) = \sum_{u \in U} \sum_{i \in I} W_{ui} (R_{ui} - S_{ui})^2 + \lambda \left( \|S^{(1)}\|_F + \|S^{(2)}\|_F \right),

with \lambda a regularization hyperparameter and \|\cdot\|_F the Frobenius norm; its best value depends on the data, which can make it hard to find a good value. Additionally, Pan and Scholz [Pan and Scholz 2009] propose an alternative regularization:

D(S, R) = \sum_{u \in U} \sum_{i \in I} W_{ui} \left( (R_{ui} - S_{ui})^2 + \lambda \left( \|S^{(1)}_{u*}\|_F + \|S^{(2)}_{*i}\|_F \right) \right).

Squared reconstruction error term · Regularization term · Regularization hyperparameter
[Hu et al. 2008]
[Pan et al. 2008]
[Pan and Scholz 2009]
67. Reconstruction: more complex

D(S, R) = \sum_{u \in U} \sum_{i \in I} W_{ui} (R_{ui} - S_{ui})^2 + \lambda \left( \|S^{(1)}\|_F + \|S^{(2)}\|_F \right)

Since this deviation function is defined over all user-item pairs, a direct application of stochastic gradient descent (SGD), which is frequently used for computing factorizations in rating prediction problems, seems unfeasible.
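Because SGD over all user-item pairs is impractical here, a common alternative is alternating least squares: with the item factors fixed, each user's factor row has a closed-form minimizer. A sketch of one such user update under the weighted squared loss (all variable names and numbers are mine, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
n_items, k, lam = 6, 2, 0.1
S2 = rng.normal(size=(n_items, k))          # item factors, held fixed
r_u = np.array([1.0, 0, 0, 1.0, 0, 0])      # user u's row of R
w_u = 1.0 + 10.0 * r_u                      # confidence weights for user u

# Closed-form minimizer of sum_i w_ui (r_ui - s1_u @ S2[i])^2 + lam ||s1_u||^2:
# solve (S2^T diag(w_u) S2 + lam I) s1_u = S2^T diag(w_u) r_u
A = S2.T @ (w_u[:, None] * S2) + lam * np.eye(k)
b = S2.T @ (w_u * r_u)
s1_u = np.linalg.solve(A, b)

def obj(x):
    # the regularized weighted squared error the update minimizes
    return float((w_u * (r_u - S2 @ x) ** 2).sum() + lam * x @ x)
```

Since s1_u is the exact minimizer of `obj`, it is never worse than any other candidate, e.g. the all-zeros row.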
68. Reconstruction rewritten

\sum_{u \in U} \sum_{i \in I} R_{ui} W_{ui} (1 - S_{ui})^2 + \sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) W_{ui} (0 - S_{ui})^2 + \lambda \left( \|S^{(1)}\|_F + \|S^{(2)}\|_F \right)
70. Reconstruction: guess unknown = 0

\sum_{u \in U} \sum_{i \in I} R_{ui} W_{ui} (1 - S_{ui})^2 + \sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) W_{ui} (0 - S_{ui})^2 + \lambda \left( \|S^{(1)}\|_F + \|S^{(2)}\|_F \right)
71. Reconstruction: unknown can also be 1
[Yao et al. 2014]

\sum_{u \in U} \sum_{i \in I} R_{ui} W_{ui} (1 - S_{ui})^2 + \sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) W_{ui} \left( p (1 - S_{ui})^2 + (1 - p) (0 - S_{ui})^2 \right) + \lambda \left( \|S^{(1)}\|_F + \|S^{(2)}\|_F \right)

\sum_{u \in U} \sum_{i \in I} R_{ui} W_{ui} (1 - S_{ui})^2 + \sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) W_{ui} \left( P_{ui} (1 - S_{ui})^2 + (1 - P_{ui}) (0 - S_{ui})^2 \right)
72. Reconstruction: less assumptions, more parameters

\sum_{u \in U} \sum_{i \in I} R_{ui} W_{ui} (1 - S_{ui})^2 + \sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) W_{ui} \left( P_{ui} (1 - S_{ui})^2 + (1 - P_{ui}) (0 - S_{ui})^2 \right) + \lambda \left( \|S^{(1)}\|_F + \|S^{(2)}\|_F \right)
73. Reconstruction: more regularization

\sum_{u \in U} \sum_{i \in I} R_{ui} W_{ui} (1 - S_{ui})^2 + \sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) W_{ui} \left( P_{ui} (1 - S_{ui})^2 + (1 - P_{ui}) (0 - S_{ui})^2 \right) + \lambda \left( \|S^{(1)}\|_F + \|S^{(2)}\|_F \right) - \alpha \sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) H(P_{ui})
74. Reconstruction: more (flexible) parameters
[Sindhwani et al. 2010]

\sum_{u \in U} \sum_{i \in I} R_{ui} W_{ui} (1 - S_{ui})^2 + \sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) W_{ui} \left( P_{ui} (1 - S_{ui})^2 + (1 - P_{ui}) (0 - S_{ui})^2 \right) + \lambda \left( \|S^{(1)}\|_F + \|S^{(2)}\|_F \right) - \alpha \sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) H(P_{ui})
75. Reconstruction: conceptual flaw

Notice that R is a binary matrix and S a real-valued matrix. Therefore, the interpretation of S as an approximate, factorized reconstruction of R is fundamentally flawed. This flaw has important practical consequences: if R_{ui} = 1, the square loss is the same for S_{ui} = 0 and S_{ui} = 2,

(1 - 0)^2 = 1 = (1 - 2)^2 ,

although S_{ui} = 2 is a much better prediction than S_{ui} = 0. Put differently, the reconstruction based deviation functions (implicitly) assume that all preferences are equally strong, which is an important simplification of reality.
76. Log likelihood: similar idea
[C. Johnson 2014]

\max \log p(S \mid R)

\max \log \prod_{u \in U} \prod_{i \in I} S_{ui}^{\alpha R_{ui}} (1 - S_{ui})

\log \prod_{u \in U} \prod_{i \in I} S_{ui}^{\alpha R_{ui}} (1 - S_{ui}) \propto \sum_{u \in U} \sum_{i \in I} \left( \alpha R_{ui} \log S_{ui} + \log(1 - S_{ui}) \right) - \lambda \left( \|S^{(1)}\|_F^2 + \|S^{(2)}\|_F^2 \right)
77. Log likelihood: similar idea
[C. Johnson 2014]

\sum_{u \in U} \sum_{i \in I} \left( \alpha R_{ui} \log S_{ui} + \log(1 - S_{ui}) \right) - \lambda \left( \|S^{(1)}\|_F^2 + \|S^{(2)}\|_F^2 \right)

Zero-mean, spherical Gaussian priors on the factor matrices yield the regularization term.
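A toy evaluation of this regularized log-likelihood, with the sigmoid link S_{ui} = \sigma\big( (S^{(1)} S^{(2)})_{ui} \big) from the earlier slide keeping every S_{ui} in (0, 1); the values of \alpha and \lambda below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, lam = 5.0, 0.01
R = (rng.random((3, 4)) < 0.5).astype(float)
S1 = rng.normal(scale=0.1, size=(3, 2))
S2 = rng.normal(scale=0.1, size=(2, 4))

S = 1.0 / (1.0 + np.exp(-(S1 @ S2)))   # sigmoid link keeps S_ui in (0, 1)

# sum_{u,i} alpha R_ui log S_ui + log(1 - S_ui), minus the Gaussian-prior
# (L2) regularization lam (||S1||_F^2 + ||S2||_F^2)
loglik = float((alpha * R * np.log(S) + np.log(1.0 - S)).sum()
               - lam * ((S1 ** 2).sum() + (S2 ** 2).sum()))
```

Every log term is negative, so the objective is always below zero; training would ascend this quantity in S^{(1)} and S^{(2)}.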
78. Maximum Margin: not all preferences equally preferred
[Pan and Scholz 2009]

They construct the matrix \tilde{R} as

\tilde{R}_{ui} = 1 if R_{ui} = 1
\tilde{R}_{ui} = -1 if R_{ui} = 0,

and define the deviation function as

D(S, \tilde{R}) = \sum_{u \in U} \sum_{i \in I} W_{ui} \, h(\tilde{R}_{ui} \cdot S_{ui}) + \lambda \|S\|_\Sigma ,

with \|\cdot\|_\Sigma the trace norm, \lambda a regularization hyperparameter, h(\tilde{R}_{ui} \cdot S_{ui}) the smooth hinge loss given by Figure 3 of [Rennie and Srebro 2005], and W given by Equations 14-16. The deviation function incorporates the confidence about the training data by means of W and the missing knowledge about the degree of preference by means of the hinge loss. Since the degree of preference is considered unknown, a score S_{ui} > 1 is not penalized.
[Hofmann 1999]. Ungar and Foster [Ungar and Foster 1998] proposed a similar method, but remain vague about the details of their method.

Reconstruction Based Deviation Functions. Next, there is a group of algorithms rooted in SGD-based matrix factorization algorithms for rating prediction problems [Koren and Bell 2011]. They start from the 2-factor factorization that describes the model (Eq. 3) but strip the parameters of all their statistical meaning. Instead, S is postulated to be an approximate, factorized reconstruction of R. A straightforward approach is to find S^{(1)} and S^{(2)} such that they minimize the squared reconstruction error between S and R. A deviation function that reflects this line of reasoning is

D(S, R) = \sum_{u \in U} \sum_{i \in I} R_{ui} (R_{ui} - S_{ui})^2 ,

which clearly makes the AMAU assumption. Making the AMAN assumption on the other hand, all missing values are interpreted as an absence of preference and the deviation function becomes

D(S, R) = \sum_{u \in U} \sum_{i \in I} (R_{ui} - S_{ui})^2 .

However, the AMAU assumption is too careful, because the vast majority of the missing values are negatives.

Notice that R is a binary matrix and S a real-valued matrix. Therefore, the interpretation of S as a reconstruction of R is fundamentally flawed. This fundamental flaw has important practical consequences: if R_{ui} = 1, the square loss is 1 for both S_{ui} = 0 and S_{ui} = 2. However, S_{ui} = 2 is a much better prediction than S_{ui} = 0. Put differently, the reconstruction based deviation functions (implicitly) assume that all preferences are equally strong, which is an important simplification of reality.

A deviation function that does not suffer from this flaw was proposed by Pan and Scholz [Pan and Scholz 2009], who applied the idea of Maximum Margin Matrix Factorization (MMMF) by Srebro et al. [Srebro et al. 2004] to binary, positive-only collaborative filtering. They construct the matrix \tilde{R} as

\tilde{R}_{ui} = 1 if R_{ui} = 1
\tilde{R}_{ui} = -1 if R_{ui} = 0,

and define the deviation function as

D(S, \tilde{R}) = \sum_{u \in U} \sum_{i \in I} W_{ui} \, h(\tilde{R}_{ui} \cdot S_{ui}) + \lambda \|S\|_\Sigma ,

with \|\cdot\|_\Sigma the trace norm, \lambda a regularization hyperparameter, h the smooth hinge loss given by Figure 3 of [Rennie and Srebro 2005], and W given by Equations 14-16. The deviation function incorporates the confidence about the training data by means of W and the missing knowledge about the degree of preference by means of the hinge loss h(\tilde{R}_{ui} \cdot S_{ui}). Since the degree of preference is considered unknown, a score S_{ui} > 1 is not penalized.

(Figure: smooth hinge loss; pLSA mixture p(i|d_1), p(i|d_2), \ldots, p(i|d_D))
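The smooth hinge of Rennie and Srebro is flat at zero for arguments at or above 1, which is exactly why a score S_{ui} > 1 is not penalized for a known preference (\tilde{R}_{ui} = 1). A sketch, using the standard piecewise definition of the smooth hinge; the matrices and the all-ones W are illustrative, and the trace-norm term is omitted:

```python
import numpy as np

def smooth_hinge(z):
    # h(z) = 1/2 - z for z <= 0;  (1 - z)^2 / 2 for 0 < z < 1;  0 for z >= 1
    z = np.asarray(z, dtype=float)
    return np.where(z <= 0, 0.5 - z, np.where(z < 1, 0.5 * (1 - z) ** 2, 0.0))

R = np.array([[1.0, 0.0],
              [0.0, 1.0]])
Rt = np.where(R == 1, 1.0, -1.0)       # R~: +1 for preferences, -1 for missing
S = np.array([[2.0, -0.5],
              [0.3, 0.8]])
W = np.ones_like(R)

# D(S, R~) = sum_{u,i} W_ui h(R~_ui * S_ui)   (trace-norm term omitted here)
D = float((W * smooth_hinge(Rt * S)).sum())
```

Note that the entry with R_{ui} = 1 and S_{ui} = 2 contributes zero loss, unlike under the square loss, where it would cost as much as S_{ui} = 0.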
81. (Excerpt from [Rennie and Srebro 2005]: the objective J(U, V, \theta) is fairly simple; ignoring for the moment the non-differentiability of h(z) = (1 - z)_+ at one, the gradient of J(U, V, \theta) is easy to compute, including the partial derivative with respect to each element of U.)
AUC: directly optimize the ranking
[Rendle et al. 2009]

To the best of our knowledge, they have not yet been used for one-class collaborative filtering.

Ranking Based Deviation Functions. The scores computed by recommender systems are used to personally rank all items for every user. Therefore, Rendle et al. [Rendle et al. 2009] argued that it is natural to directly optimize the ranking. More specifically, they aim to maximize the area under the ROC curve (AUC), which is given by

AUC = \frac{1}{|U|} \sum_{u \in U} \frac{1}{|u| \cdot (|I| - |u|)} \sum_{R_{ui} > 0} \sum_{R_{uj} = 0} \mathbb{1}(S_{ui} > S_{uj}),

with \mathbb{1}(\text{true}) = 1 and \mathbb{1}(\text{false}) = 0. If the AUC is higher, the pairwise rankings induced by the model S are more in line with the observed data R. However, because \mathbb{1}(\cdot) is non-differentiable, their deviation function is a differentiable approximation of the negative AUC from which constant factors have been removed and to which a regularization term has been added:

D(S, \tilde{R}) = \sum_{u \in U} \sum_{R_{ui} > 0} \sum_{R_{uj} = 0} -\log \sigma(S_{ui} - S_{uj}) + \lambda_1 \|S^{(1)}\|_F^2 + \lambda_2 \|S^{(2)}\|_F^2 ,

with \sigma the sigmoid function and \lambda_1, \lambda_2 regularization constants, which are hyperparameters of the method.
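The AUC above counts, per user, the fraction of (known i, missing j) pairs that the model ranks correctly. A direct toy computation (|u| is the number of known preferences of user u; the matrices are illustrative):

```python
import numpy as np

R = np.array([[1.0, 0.0, 0.0, 1.0],
              [0.0, 1.0, 0.0, 0.0]])
S = np.array([[0.9, 0.2, 0.8, 0.1],
              [0.3, 0.7, 0.4, 0.6]])

def auc(S, R):
    total = 0.0
    for u in range(R.shape[0]):
        pos = S[u, R[u] == 1]          # scores of known preferences
        neg = S[u, R[u] == 0]          # scores of missing feedback
        # fraction of pairs (i, j) with S_ui > S_uj
        total += (pos[:, None] > neg[None, :]).mean()
    return total / R.shape[0]

score = auc(S, R)
```

User 0 ranks half of its pairs correctly and user 1 all of them, so the averaged AUC is 0.75.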
82. AUC: directly optimize the ranking
[Rendle et al. 2009]

AUC = \frac{1}{|U|} \sum_{u \in U} \frac{1}{|u| \cdot (|I| - |u|)} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \mathbb{1}(S_{ui} > S_{uj})

Related objectives:

\max \frac{1}{|u| \cdot (|I| - |u|)} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \frac{S_{ui} - S_{uj}}{2}

\max \min_{\alpha_{u*}} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \alpha_{ui} \alpha_{uj} (S_{ui} - S_{uj})

\sum_{u \in U} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \mathbb{1}(S_{ui} > S_{uj})

\sum_{u \in U} \sum_{R_{ui}=1} \sum_{R_{uj}=0} (S_{uj} + 1 - S_{ui})

r_>(S_{uj} \mid \{S_{uk} \mid R_{uk} = 0\})
83. AUC: non-differentiable
[Rendle et al. 2009]

AUC = \frac{1}{|U|} \sum_{u \in U} \frac{1}{|u| \cdot (|I| - |u|)} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \mathbb{1}(S_{ui} > S_{uj})

The indicator \mathbb{1}(\cdot) is non-differentiable.
84. AUC: smooth approximation
[Rendle et al. 2009]

Because \mathbb{1}(\cdot) is non-differentiable, their deviation function is a differentiable approximation of the negative AUC from which constant factors have been removed and to which a regularization term has been added:

D(S, \tilde{R}) = \sum_{u \in U} \sum_{R_{ui} > 0} \sum_{R_{uj} = 0} -\log \sigma(S_{ui} - S_{uj}) + \lambda_1 \|S^{(1)}\|_F^2 + \lambda_2 \|S^{(2)}\|_F^2 ,

with \sigma the sigmoid function and \lambda_1, \lambda_2 regularization constants, which are hyperparameters of the method. Notice that this deviation function considers all missing values as equally negative, i.e. it corresponds to the AMAN assumption.
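The smooth surrogate replaces the indicator with \log \sigma(S_{ui} - S_{uj}). A toy evaluation of this deviation function (the \lambda values and factor matrices are arbitrary; the factors only feed the regularization term and the scores S = S^{(1)} S^{(2)}):

```python
import numpy as np

lam1 = lam2 = 0.01
R = np.array([[1.0, 0.0, 0.0]])
S1 = np.array([[0.5, -0.2]])
S2 = np.array([[1.0, 0.1, -0.3],
               [0.2, 0.4, 0.5]])
S = S1 @ S2

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# D = sum over (u, i, j) with R_ui = 1, R_uj = 0 of -log sigmoid(S_ui - S_uj)
#     + lam1 ||S1||_F^2 + lam2 ||S2||_F^2
pair_loss = 0.0
for u in range(R.shape[0]):
    for i in np.flatnonzero(R[u] == 1):
        for j in np.flatnonzero(R[u] == 0):
            pair_loss += -np.log(sigmoid(S[u, i] - S[u, j]))
D = float(pair_loss + lam1 * (S1 ** 2).sum() + lam2 * (S2 ** 2).sum())
```

Each pair term is strictly positive and shrinks as the known preference is scored further above the missing item, so minimizing D pushes the pairwise ranking in the right direction.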
85. Pairwise Ranking 2: similar to AUC
[Kabbur et al. 2013]

w(j) = \sum_{u \in U} R_{uj}

D(S, \tilde{R}) = \sum_{u \in U} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \left( (R_{ui} - R_{uj}) - (S_{ui} - S_{uj}) \right)^2 + \sum_{t=1}^{T} \sum_{f=1}^{F} \lambda_{tf} \|S^{(t,f)}\|_F^2
86. Pairwise Ranking 3: no regularization, also 1 to 1
[Takács and Tikk 2012]

...with \lambda a regularization constant and \sigma(\cdot) the sigmoid function. Notice that this deviation function de facto ignores all missing feedback, i.e. it corresponds to the AMAU assumption.

Another ranking based deviation function was proposed by Takács and Tikk [Takács and Tikk 2012]:

D(S, \tilde{R}) = \sum_{u \in U} \sum_{i \in I} R_{ui} \sum_{j \in I} w(j) \left( (S_{ui} - S_{uj}) - (R_{ui} - R_{uj}) \right)^2 ,

with w(j) a user-defined item weighting function. The simplest choice is w(j) = 1. An alternative proposed by Takács and Tikk is w(j) = \sum_{u \in U} R_{uj}. This deviation function bears some resemblance with the one in Equation 4.1.4. However, a squared loss is used instead of the log-loss of the sigmoid. Furthermore, this deviation function also penalizes the score-difference between all known preferences, which is not the case in Equation 4.1.4. Finally, it is remarkable that Takács and Tikk explicitly do not use a regularization term, whereas most other authors find that the regularization term is crucial for their model's performance.
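A direct computation of the Takács–Tikk deviation with the simplest weighting w(j) = 1; the matrices are illustrative toy data:

```python
import numpy as np

R = np.array([[1.0, 0.0],
              [1.0, 1.0]])
S = np.array([[0.8, 0.1],
              [0.6, 0.5]])
w = np.ones(R.shape[1])   # simplest choice: w(j) = 1

# D = sum_u sum_i R_ui sum_j w(j) ((S_ui - S_uj) - (R_ui - R_uj))^2
D = 0.0
for u in range(R.shape[0]):
    for i in range(R.shape[1]):
        if R[u, i] == 1:
            for j in range(R.shape[1]):
                D += w[j] * ((S[u, i] - S[u, j]) - (R[u, i] - R[u, j])) ** 2
```

Note that user 1 has two known preferences, and the terms with R_{ui} = R_{uj} = 1 still contribute: the score-difference between known preferences is penalized too.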
87. MRR: focus on top of the ranking
[Shi et al. 2012]

In practice, very often only the N highest ranked items are shown to users. Therefore, Shi et al. [Shi et al. 2012] propose to optimize the mean reciprocal rank (MRR). The MRR is defined as

MRR = \frac{1}{|U|} \sum_{u \in U} r_>\left( \max_{R_{ui}=1} S_{ui} \;\middle|\; S_{u*} \right)^{-1} ,

with r_>(a \mid A) the rank of a among the set of scores A, sorted in descending order.
88. MRR: non-differentiable
[Shi et al. 2012]

MRR = \frac{1}{|U|} \sum_{u \in U} r_>\left( \max_{R_{ui}=1} S_{ui} \;\middle|\; S_{u*} \right)^{-1}

Both the rank r_>(\cdot) and the max are non-differentiable.
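The MRR looks only at the best-scored known preference of each user and at its rank among all of that user's scores. A toy computation, taking r_> as the 1-based rank in descending score order (matrices are illustrative):

```python
import numpy as np

R = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
S = np.array([[0.2, 0.9, 0.5],
              [0.1, 0.3, 0.8]])

def mrr(S, R):
    total = 0.0
    for u in range(R.shape[0]):
        best = S[u, R[u] == 1].max()        # best-scored known preference
        rank = 1 + (S[u] > best).sum()      # 1-based rank among all items
        total += 1.0 / rank
    return total / R.shape[0]

score = mrr(S, R)
```

User 0's preference is ranked third (reciprocal 1/3) and user 1's first (reciprocal 1), illustrating why MRR rewards hitting the very top of the ranking.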