Comparing State-of-the-Art Collaborative Filtering Systems

Laurent Candillier, Frank Meyer, Marc Boullé

France Telecom R&D Lannion
MLDM 2007

1 Introduction

2 Collaborative approaches

3 Experiments

4 Conclusions

Recommender systems

Help users find items they should appreciate from huge
catalogues [Adomavicius and Tuzhilin, 2005]

⇒ Collaborative filtering: based on the user-to-item rating matrix

Example: 7 users (u1 to u7), 5 items (i1 to i5); each row lists the
ratings a user has given, and "?" marks the rating to predict

u1 : 4 4 1
u2 : 4 3
u3 : 5 2 1
u4 : 4 5
u5 : 5 4
u6 : 5 3
u7 : 4 ? 1

User-based approaches

Recommend items appreciated by users whose tastes are similar
to the ones of the given user [Resnick et al., 1994]

⇒ need a similarity measure between users
e.g. Pearson similarity: cosine of the deviations from the mean

$$ w(a,u) = \frac{\sum_{i \in S_a \cap S_u} (v_{ai} - \bar{v}_a)(v_{ui} - \bar{v}_u)}{\sqrt{\sum_{i \in S_a \cap S_u} (v_{ai} - \bar{v}_a)^2 \; \sum_{i \in S_a \cap S_u} (v_{ui} - \bar{v}_u)^2}} $$

$v_{ui}$ : rating of user u on item i
$S_u$ : set of items rated by user u
$\bar{v}_u$ : mean rating of user u

$$ \bar{v}_u = \frac{\sum_{i \in S_u} v_{ui}}{|S_u|} $$
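
The Pearson similarity defined on this slide could be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the nested-dictionary layout (`ratings[user][item]`), the function names, the toy data and the choice to return 0 when the similarity is undefined are all assumptions made for the example.

```python
from math import sqrt

def mean_rating(user_ratings):
    """Mean of all ratings given by one user (the mean over S_u)."""
    return sum(user_ratings.values()) / len(user_ratings)

def pearson(ratings, a, u):
    """Pearson similarity w(a, u), summed over the items both users rated."""
    common = ratings[a].keys() & ratings[u].keys()
    mean_a = mean_rating(ratings[a])
    mean_u = mean_rating(ratings[u])
    num = sum((ratings[a][i] - mean_a) * (ratings[u][i] - mean_u) for i in common)
    den = sqrt(sum((ratings[a][i] - mean_a) ** 2 for i in common)
               * sum((ratings[u][i] - mean_u) ** 2 for i in common))
    # Assumption: use 0 when the measure is undefined (no overlap or no variance).
    return num / den if den else 0.0

# Hypothetical toy data, not taken from the slides.
toy = {
    "a": {"i1": 5, "i2": 3, "i3": 1},
    "u": {"i1": 4, "i2": 3, "i3": 2},  # same taste as a
    "v": {"i1": 1, "i2": 3, "i3": 5},  # opposite taste
}
print(pearson(toy, "a", "u"))
print(pearson(toy, "a", "v"))
```

With this toy data, users with the same ordering of preferences get a similarity close to 1 and users with opposite preferences a similarity close to -1.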

User-based approaches

Which rating for user a (active) on item i ?

Prediction using weighted sum

$$ p_{ai} = \frac{\sum_{\{u \mid i \in S_u\}} w(a,u) \times v_{ui}}{\sum_{\{u \mid i \in S_u\}} |w(a,u)|} $$

Prediction using weighted sum of deviations from the mean

$$ p_{ai} = \bar{v}_a + \frac{\sum_{\{u \mid i \in S_u\}} w(a,u) \times (v_{ui} - \bar{v}_u)}{\sum_{\{u \mid i \in S_u\}} |w(a,u)|} $$

How many neighbors should be considered?
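
Both prediction schemes can be sketched as below, with the similarities w(a, u) passed in as a precomputed dictionary. The function names, the data layout, the toy values and the rule used to pick the K neighbors (the K users with the highest similarity among those who rated the item) are assumptions for illustration; the slides do not specify how neighbors are selected.

```python
def mean_rating(user_ratings):
    """Mean of all ratings given by one user."""
    return sum(user_ratings.values()) / len(user_ratings)

def neighbors(ratings, weights, item, k=None):
    """Users who rated the item, most similar first, truncated to k (assumed rule)."""
    users = [u for u in weights if item in ratings[u]]
    users.sort(key=lambda u: weights[u], reverse=True)
    return users if k is None else users[:k]

def predict_weighted_sum(ratings, weights, item, k=None):
    """Weighted sum of the neighbors' ratings on the item."""
    users = neighbors(ratings, weights, item, k)
    norm = sum(abs(weights[u]) for u in users)
    if norm == 0:
        return None  # no usable neighbor
    return sum(weights[u] * ratings[u][item] for u in users) / norm

def predict_deviation(ratings, weights, active, item, k=None):
    """Active user's mean plus the weighted sum of the neighbors' deviations."""
    base = mean_rating(ratings[active])
    users = neighbors(ratings, weights, item, k)
    norm = sum(abs(weights[u]) for u in users)
    if norm == 0:
        return base  # assumption: fall back to the active user's mean
    dev = sum(weights[u] * (ratings[u][item] - mean_rating(ratings[u])) for u in users)
    return base + dev / norm

# Hypothetical ratings and similarities, not taken from the slides.
ratings = {
    "a": {"i1": 4, "i2": 2},
    "u1": {"i1": 5, "i2": 3, "i3": 4},
    "u2": {"i1": 2, "i2": 4, "i3": 1},
}
weights = {"u1": 1.0, "u2": -0.5}
print(predict_weighted_sum(ratings, weights, "i3"))
print(predict_deviation(ratings, weights, "a", "i3"))
print(predict_deviation(ratings, weights, "a", "i3", k=1))
```

The `k` parameter is where the neighborhood-size question from the slide enters: a small K keeps only the most similar users, a large K averages over many weakly related ones.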

Cluster-based approaches

Recommend items appreciated by users that belong to the
same group as the given user [Breese et al., 1998]

⇒ need
a clustering method, e.g. K-means
a distance measure, e.g. Euclidean distance

Then the rating of a user on an item is the mean rating given
by the users that belong to the same cluster

How many clusters should be considered?
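
A cluster-based predictor along these lines might look like the sketch below. It is only an illustration under stated assumptions: missing ratings are filled with the user's own mean before clustering, the initial centroids are chosen by hand through a hypothetical `seeds` argument, and the toy data is invented. None of these choices come from the slides.

```python
from math import dist

def kmeans(points, seeds, iterations=20):
    """Plain K-means with Euclidean distance; seeds are indices of initial centroids."""
    centroids = [list(points[s]) for s in seeds]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def predict_cluster(ratings, items, seeds, active, item):
    """Mean rating on the item among users in the active user's cluster."""
    users = list(ratings)
    dense = []
    for u in users:
        # Assumption: a missing rating is replaced by the user's own mean.
        mean_u = sum(ratings[u].values()) / len(ratings[u])
        dense.append([ratings[u].get(i, mean_u) for i in items])
    labels = kmeans(dense, seeds)
    cluster = labels[users.index(active)]
    votes = [ratings[u][item] for u, label in zip(users, labels)
             if label == cluster and item in ratings[u]]
    return sum(votes) / len(votes) if votes else None

# Hypothetical toy data: two groups of users with opposite tastes.
ratings = {
    "u1": {"i1": 5, "i2": 4, "i3": 1},
    "u2": {"i1": 4, "i2": 5, "i3": 2},
    "u3": {"i1": 1, "i2": 2, "i3": 5},
    "u4": {"i1": 2, "i2": 1, "i3": 4},
    "u5": {"i1": 5, "i2": 4},  # rating on i3 to predict
}
print(predict_cluster(ratings, ["i1", "i2", "i3"], seeds=[0, 2], active="u5", item="i3"))
```

Here u5 ends up in the same cluster as u1 and u2, so the prediction for i3 is the mean of their two ratings on that item.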

Item-based approaches

Recommend items similar to those appreciated by the given
user [Karypis, 2001]

⇒ dual of the user-based approach

$$ p_{ai} = \bar{v}_i + \frac{\sum_{\{j \in S_a \mid j \neq i\}} \mathrm{sim}(i,j) \times (v_{aj} - \bar{v}_j)}{\sum_{\{j \in S_a \mid j \neq i\}} |\mathrm{sim}(i,j)|} $$

$\mathrm{sim}(i,j)$ : similarity measure between items i and j
$S_a$ : set of items rated by user a
$\bar{v}_i$ : mean rating on item i

How many neighbors should be considered?
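
The item-based prediction with deviations from the item means can be sketched as follows. The item similarities are supplied as a dictionary keyed by item pairs; that layout, the function names, the toy values and the rule for keeping the K most similar items are assumptions for the example rather than details given in the slides.

```python
def item_mean(ratings, item):
    """Mean rating received by an item over all users who rated it."""
    values = [r[item] for r in ratings.values() if item in r]
    return sum(values) / len(values)

def predict_item_based(ratings, sim, active, item, k=None):
    """Item mean plus the similarity-weighted deviations of the active user's ratings."""
    base = item_mean(ratings, item)
    # Items rated by the active user, other than the target, with a known similarity.
    rated = [j for j in ratings[active] if j != item and (item, j) in sim]
    rated.sort(key=lambda j: sim[(item, j)], reverse=True)  # assumed neighbor rule
    if k is not None:
        rated = rated[:k]
    norm = sum(abs(sim[(item, j)]) for j in rated)
    if norm == 0:
        return base  # assumption: fall back to the item mean
    dev = sum(sim[(item, j)] * (ratings[active][j] - item_mean(ratings, j)) for j in rated)
    return base + dev / norm

# Hypothetical ratings and item similarities, not taken from the slides.
ratings = {
    "a": {"i1": 5, "i2": 2},
    "u1": {"i1": 4, "i2": 3, "i3": 5},
    "u2": {"i1": 3, "i2": 4, "i3": 1},
}
sim = {("i3", "i1"): 0.8, ("i3", "i2"): 0.2}
print(predict_item_based(ratings, sim, "a", "i3"))
print(predict_item_based(ratings, sim, "a", "i3", k=1))
```

The structure mirrors the user-based deviation scheme with the roles of users and items swapped, which is what "dual" refers to on the slide.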

Experiments

For user- and item-based approaches, choose
similarity measure
prediction scheme
neighborhood size K

For cluster-based approaches, choose
distance measure
prediction scheme
number of clusters

Evaluation protocol [Herlocker et al., 2004]
movie rating dataset: MovieLens (6040 users × 3706 items)
10-fold cross-validation (10 runs, each learning on 9/10 of the data)
Mean Absolute Error (MAE) on test set T = {(u, i, r)}

$$ MAE = \frac{1}{|T|} \sum_{(u,i,r) \in T} |p_{ui} - r| $$
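
The evaluation protocol can be sketched as below: ratings are held as (user, item, rating) triples, split into folds, and a predictor is scored by MAE on each held-out fold. The function names, the shuffling seed and the way folds are formed are assumptions for illustration; the slides only state that 10-fold cross-validation was used.

```python
import random

def mean_absolute_error(predict, test_set):
    """MAE of predict(u, i) against the true ratings r in (u, i, r) triples."""
    return sum(abs(predict(u, i) - r) for u, i, r in test_set) / len(test_set)

def k_fold_splits(triples, n_folds=10, seed=0):
    """Yield (train, test) pairs; each fold is held out once for testing."""
    shuffled = list(triples)
    random.Random(seed).shuffle(shuffled)  # assumed: random assignment to folds
    for fold in range(n_folds):
        test = shuffled[fold::n_folds]
        train = [t for index, t in enumerate(shuffled) if index % n_folds != fold]
        yield train, test

# Hypothetical triples and a trivial predictor that always answers 3.
data = [("u%d" % n, "i1", 1 + n % 5) for n in range(20)]
scores = [mean_absolute_error(lambda u, i: 3, test) for _, test in k_fold_splits(data)]
print(sum(scores) / len(scores))
```

In a real run the predictor would be rebuilt from `train` inside the loop; the constant predictor is used here only to keep the sketch short.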

User-based approaches, similarity measures

[Figure: MAE (y-axis, 0.68 to 0.8) as a function of the neighborhood size K (x-axis, 0 to 2500) for the Pearson, Constraint, Cosine, Adjusted and Proba similarity measures]

User-based approaches, prediction schemes

[Figure: MAE (y-axis, 0.68 to 0.8) as a function of the neighborhood size K (x-axis, 0 to 2500) for PearsonWeighted, PearsonDeviation, ProbaWeighted and ProbaDeviation]

Item-based approaches, similarity measures

[Figure: MAE (y-axis, 0.64 to 0.76) as a function of the neighborhood size K (x-axis, 0 to 1400) for the Pearson, Constraint, Cosine, Adjusted and Proba similarity measures]

Summary of experiments

                                    BestDefault  BestUser  BestItem  BestCluster
model construction time (in sec.)   1            730       170       254
prediction time (in sec.)           1            31        3         1
MAE                                 0.6829       0.6688    0.6382    0.6736

BestDefault: Bayes minimizing MAE
BestUser: Pearson similarity, 1500 neighbors, prediction
using deviation from the mean
BestItem: probabilistic similarity, 400 neighbors,
prediction using deviation from the mean
BestCluster: K-means, Euclidean distance, 4 clusters,
prediction using Bayes minimizing MAE

Conclusions

All approaches, and all their possible options, are tested
under exactly the same conditions

Bayes is a good compromise: low error rate, low
execution time, incremental
Deviation from the mean: better results, new for
item-based approaches
Similarity measures: Pearson for user-based, probabilistic
for item-based

Conclusions

The item-based approach
gets the best performance in the experiments
seems to need fewer neighbors than the user-based approach
is also appropriate for navigating item catalogues even
with no user information
may naturally use content data about items to improve its
results (likewise for the user-based approach with demographic
data)
Do the results depend on the number of items compared to the
number of users?

Next

Need to scale well even when faced with huge datasets
e.g. Netflix Prize: 100,480,507 ratings from 480,189 users on
17,770 movies

select most relevant users [Yu et al., 2002]
reduce dimensionality with PCA or SVD
[Goldberg et al., 2001, Vozalis and Margaritis, 2005]
create a set of super-users [Rashid et al., 2006]
sampling? stochastic? bagging?

Combine approaches ⇒ ensemble methods [Polikar, 2006]

P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom and J. Riedl (1994)
GroupLens: an open architecture for collaborative filtering of netnews
In Conference on Computer Supported Cooperative Work, pages 175–186. ACM

J. Breese, D. Heckerman and C. Kadie (1998)
Empirical analysis of predictive algorithms for collaborative filtering
In 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52. Morgan Kaufmann

G. Karypis (2001)
Evaluation of item-based top-N recommendation algorithms
In 10th International Conference on Information and Knowledge Management, pages 247–254

K. Goldberg, T. Roeder, D. Gupta and C. Perkins (2001)
Eigentaste: a constant time collaborative filtering algorithm
Information Retrieval, 4(2):133–151

K. Yu, X. Xu, J. Tao, M. Ester and H. Kriegel (2002)
Instance selection techniques for memory-based collaborative filtering
In SIAM Data Mining

J. Herlocker, J. Konstan, L. Terveen and J. Riedl (2004)
Evaluating collaborative filtering recommender systems
ACM Transactions on Information Systems, 22(1):5–53

G. Adomavicius and A. Tuzhilin (2005)
Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions
IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749

M. Vozalis and K. Margaritis (2005)
Applying SVD on item-based filtering
In 5th International Conference on Intelligent Systems Design and Applications, pages 464–469

A.M. Rashid, S.K. Lam, G. Karypis and J. Riedl (2006)
ClustKNN: a highly scalable hybrid model- & memory-based CF algorithm
In KDD Workshop on Web Mining and Web Usage Analysis

R. Polikar (2006)
Ensemble systems in decision making
IEEE Circuits & Systems Magazine, 6(3):21–45
