Comparing State-of-the-Art Collaborative Filtering Systems
1. Comparing State-of-the-Art Collaborative Filtering Systems
Laurent Candillier, Frank Meyer, Marc Boullé
France Telecom R&D Lannion
MLDM 2007
1 Introduction
2 Collaborative approaches
3 Experiments
4 Conclusions
2. Recommender systems
Help users find items they should appreciate from huge
catalogues [Adomavicius and Tuzhilin, 2005]
⇒ Collaborative filtering: based on the user-item rating matrix
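Example rating matrix, items i1 to i5 (ratings listed in item order, unrated items left out; ? is the rating to predict)
u1: 4 4 1
u2: 4 3
u3: 5 2 1
u4: 4 5
u5: 5 4
u6: 5 3
u7: 4 ? 1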
3. User-based approaches
Recommend items appreciated by users whose tastes are similar
to the ones of the given user [Resnick et al., 1994]
⇒ need a similarity measure between users
Example: Pearson similarity, the cosine of the deviations from the mean
$$w(a,u) = \frac{\sum_{i \in S_a \cap S_u} (v_{ai} - \bar{v}_a)(v_{ui} - \bar{v}_u)}{\sqrt{\sum_{i \in S_a \cap S_u} (v_{ai} - \bar{v}_a)^2 \; \sum_{i \in S_a \cap S_u} (v_{ui} - \bar{v}_u)^2}}$$
$v_{ui}$ : rating of user $u$ on item $i$
$S_u$ : set of items rated by user $u$
$\bar{v}_u$ : mean rating of user $u$
$$\bar{v}_u = \frac{\sum_{i \in S_u} v_{ui}}{|S_u|}$$
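The Pearson weight above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: ratings are stored as a nested dictionary `ratings[user][item]`, the toy data and the function names `mean_rating` and `pearson` are hypothetical, and returning 0 when the denominator vanishes is a choice made here, not something the slides specify.

```python
from math import sqrt

def mean_rating(ratings, u):
    """Mean rating of user u over all items in S_u."""
    values = ratings[u].values()
    return sum(values) / len(values)

def pearson(ratings, a, u):
    """Pearson weight w(a, u), summed over the items rated by both users."""
    common = ratings[a].keys() & ratings[u].keys()   # S_a ∩ S_u
    va, vu = mean_rating(ratings, a), mean_rating(ratings, u)
    num = sum((ratings[a][i] - va) * (ratings[u][i] - vu) for i in common)
    den = sqrt(sum((ratings[a][i] - va) ** 2 for i in common)
               * sum((ratings[u][i] - vu) ** 2 for i in common))
    # Assumption: no common items or no variance gives a weight of 0.
    return num / den if den else 0.0

# Hypothetical toy data, not taken from the slides.
ratings = {
    "a": {"i1": 4, "i2": 2, "i3": 5},
    "b": {"i1": 5, "i2": 1, "i3": 4},
    "c": {"i1": 1, "i2": 5},
}
print(pearson(ratings, "a", "b"), pearson(ratings, "a", "c"))
```

Users with similar tastes get a weight close to 1, users with opposite tastes a weight close to -1.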
4. User-based approaches
Which rating for user a (active) on item i ?
Prediction using weighted sum
$$p_{ai} = \frac{\sum_{\{u \mid i \in S_u\}} w(a,u) \times v_{ui}}{\sum_{\{u \mid i \in S_u\}} |w(a,u)|}$$
Prediction using weighted sum of deviations from the mean
$$p_{ai} = \bar{v}_a + \frac{\sum_{\{u \mid i \in S_u\}} w(a,u) \times (v_{ui} - \bar{v}_u)}{\sum_{\{u \mid i \in S_u\}} |w(a,u)|}$$
How many neighbors should be considered?
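Both prediction schemes can be sketched as one Python function. The nested-dictionary layout `r[user][item]`, the function names, the optional neighborhood size `k` (keeping the neighbors with the largest absolute weight) and the fallback to the active user's mean when no neighbor is available are all assumptions made for this illustration; the slides do not fix these details.

```python
from math import sqrt

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def pearson(r, a, u):
    """Pearson weight over the items rated by both users (0 if undefined)."""
    common = r[a].keys() & r[u].keys()
    va, vu = mean(r[a].values()), mean(r[u].values())
    num = sum((r[a][i] - va) * (r[u][i] - vu) for i in common)
    den = sqrt(sum((r[a][i] - va) ** 2 for i in common)
               * sum((r[u][i] - vu) ** 2 for i in common))
    return num / den if den else 0.0

def predict_user_based(r, a, i, k=None, deviations=True):
    """Predict the rating of active user a on item i from users who rated i."""
    neighbors = [(pearson(r, a, u), u) for u in r if u != a and i in r[u]]
    neighbors.sort(key=lambda pair: abs(pair[0]), reverse=True)
    if k is not None:                      # assumption: top-k by |w(a, u)|
        neighbors = neighbors[:k]
    norm = sum(abs(w) for w, _ in neighbors)
    va = mean(r[a].values())
    if norm == 0:                          # assumption: fall back to the user mean
        return va
    if deviations:
        return va + sum(w * (r[u][i] - mean(r[u].values()))
                        for w, u in neighbors) / norm
    return sum(w * r[u][i] for w, u in neighbors) / norm

# Hypothetical toy data, not taken from the slides.
r = {
    "a": {"i1": 4, "i2": 2},
    "b": {"i1": 5, "i2": 1, "i3": 4},
    "c": {"i1": 1, "i2": 5, "i3": 2},
}
print(predict_user_based(r, "a", "i3"))
print(predict_user_based(r, "a", "i3", deviations=False))
```

On this toy data the plain weighted sum is pulled far from the rating scale's centre by the negatively correlated user, while the deviation-based scheme stays close to the active user's own mean, which illustrates why the two schemes can behave differently.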
5. Cluster-based approaches
Recommend items appreciated by users that belong to the same group as the given user [Breese et al., 1998]
⇒ need
a clustering method, e.g. K-means
a distance measure, e.g. Euclidean distance
The predicted rating of a user on an item is then the mean rating given to that item by the users that belong to the same cluster
How many clusters should be considered?
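The prediction step can be sketched as follows, assuming the cluster assignment has already been computed (for instance by K-means with Euclidean distance). The dictionary layout, the `clusters` mapping, the function name and the choice to return `None` when nobody in the cluster rated the item are hypothetical choices for this illustration.

```python
def predict_cluster_based(ratings, clusters, a, i):
    """Mean rating on item i among the users in the same cluster as user a."""
    same_cluster = [u for u in ratings
                    if clusters[u] == clusters[a] and i in ratings[u]]
    if not same_cluster:
        return None   # assumption: no prediction when the cluster has no rating on i
    return sum(ratings[u][i] for u in same_cluster) / len(same_cluster)

# Hypothetical toy data and cluster labels, not taken from the slides.
ratings = {
    "a": {"i1": 4, "i2": 2},
    "b": {"i1": 5, "i2": 1, "i3": 4},
    "c": {"i1": 1, "i2": 5, "i3": 1},
    "d": {"i3": 2},
}
clusters = {"a": 0, "b": 0, "c": 1, "d": 0}
print(predict_cluster_based(ratings, clusters, "a", "i3"))
```

Once the clusters are built, a prediction only needs the ratings of one cluster, which is why prediction time can stay low for this family of methods.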
6. Item-based approaches
Recommend items similar to those appreciated by the given
user [Karypis, 2001]
⇒ dual of user-based approach
$$p_{ai} = \bar{v}_i + \frac{\sum_{\{j \in S_a \mid j \neq i\}} sim(i,j) \times (v_{aj} - \bar{v}_j)}{\sum_{\{j \in S_a \mid j \neq i\}} |sim(i,j)|}$$
$sim(i,j)$ : similarity measure between items $i$ and $j$
$S_a$ : set of items rated by user $a$
$\bar{v}_i$ : mean rating on item $i$
How many neighbors should be considered?
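A minimal Python sketch of the item-based prediction with deviations from the item means follows. The similarity is passed in as a function `sim(i, j)`; the Pearson-style item similarity used in the example is only a stand-in chosen for illustration, and the data layout, function names, optional neighborhood size `k` and the fallback to the item mean are assumptions.

```python
from math import sqrt

def item_mean(r, i):
    """Mean rating on item i over the users who rated it."""
    values = [r[u][i] for u in r if i in r[u]]
    return sum(values) / len(values)

def item_pearson(r, i, j):
    """Stand-in similarity: Pearson between items over users who rated both."""
    users = [u for u in r if i in r[u] and j in r[u]]
    vi, vj = item_mean(r, i), item_mean(r, j)
    num = sum((r[u][i] - vi) * (r[u][j] - vj) for u in users)
    den = sqrt(sum((r[u][i] - vi) ** 2 for u in users)
               * sum((r[u][j] - vj) ** 2 for u in users))
    return num / den if den else 0.0

def predict_item_based(r, a, i, sim, k=None):
    """Predict user a's rating on item i from the other items a has rated."""
    neighbors = [(sim(i, j), j) for j in r[a] if j != i]
    neighbors.sort(key=lambda pair: abs(pair[0]), reverse=True)
    if k is not None:                      # assumption: top-k by |sim(i, j)|
        neighbors = neighbors[:k]
    norm = sum(abs(s) for s, _ in neighbors)
    vi = item_mean(r, i)
    if norm == 0:                          # assumption: fall back to the item mean
        return vi
    return vi + sum(s * (r[a][j] - item_mean(r, j)) for s, j in neighbors) / norm

# Hypothetical toy data, not taken from the slides.
r = {
    "a": {"i1": 4, "i2": 2},
    "b": {"i1": 5, "i2": 1, "i3": 4},
    "c": {"i1": 1, "i2": 5, "i3": 2},
}
print(predict_item_based(r, "a", "i3", lambda i, j: item_pearson(r, i, j)))
```

Because the similarity is a parameter, another measure can be plugged in without touching the prediction code.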
7. Experiments
For user- and item-based approaches, choose
similarity measure
prediction scheme
neighborhood size K
For cluster-based approaches, choose
distance measure
prediction scheme
number of clusters
Evaluation protocol [Herlocker et al., 2004]
movie rating dataset: MovieLens (6040 users × 3706 movies)
10-fold cross-validation (10 runs, each learning on 9/10 of the ratings and testing on the remaining 1/10)
Mean Absolute Error (MAE) on test set $T = \{(u, i, r)\}$
$$\mathrm{MAE} = \frac{1}{|T|} \sum_{(u,i,r) \in T} |p_{ui} - r|$$
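The protocol can be sketched in Python as below. Ratings are assumed to be a list of `(user, item, rating)` triples; the function names, the random shuffling with a fixed seed and the global-mean predictor used as a placeholder model are hypothetical and only serve to make the sketch runnable.

```python
import random

def mae(test, predict):
    """Mean absolute error of predict(u, i) over test triples (u, i, r)."""
    return sum(abs(predict(u, i) - r) for u, i, r in test) / len(test)

def k_fold(triples, k=10, seed=0):
    """Yield (train, test) splits; each rating is used for testing exactly once."""
    shuffled = list(triples)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[f::k] for f in range(k)]
    for f in range(k):
        train = [t for g in range(k) if g != f for t in folds[g]]
        yield train, folds[f]

def cross_validated_mae(triples, fit, k=10):
    """Average MAE over k folds; fit(train) must return a predict(u, i) function."""
    scores = [mae(test, fit(train)) for train, test in k_fold(triples, k)]
    return sum(scores) / len(scores)

def fit_global_mean(train):
    """Placeholder model (assumption): always predict the mean training rating."""
    m = sum(r for _, _, r in train) / len(train)
    return lambda u, i: m

# Hypothetical toy data, not taken from the slides.
triples = [(f"u{n}", f"i{n % 4}", 1 + n % 5) for n in range(20)]
print(cross_validated_mae(triples, fit_global_mean))
```

Any of the collaborative approaches can be evaluated the same way by supplying a different `fit` function, which keeps the test conditions identical across methods.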
11. Summary of experiments
                                    BestDefault    BestUser    BestItem BestCluster
model construction time (in sec.)             1         730         170         254
prediction time (in sec.)                     1          31           3           1
MAE                                      0.6829      0.6688      0.6382      0.6736
BestDefault: Bayes minimizing MAE
BestUser: Pearson similarity, 1500 neighbors, prediction using deviation from the mean
BestItem: probabilistic similarity, 400 neighbors, prediction using deviation from the mean
BestCluster: K-means, Euclidean distance, 4 clusters, prediction using Bayes minimizing MAE
12. Conclusions
All approaches, and all their possible options, are tested under exactly the same conditions
Bayes is a good compromise: low error rate, low execution time, incremental
Deviation from the mean: better results, new for item-based approaches
Similarity measures: Pearson for user-based, probabilistic for item-based
13. Conclusions
The item-based approach
gets the best performance in the experiments
seems to need fewer neighbors than the user-based approach
is also appropriate for navigating item catalogues even with no user information
may naturally use content data about items to improve its results (likewise for the user-based approach with demographic data)
do its results depend on the number of items relative to the number of users?
14. Next
Need to scale well even when faced with huge datasets
e.g. Netflix Prize: 100,480,507 ratings from 480,189 users on 17,770 movies
select most relevant users [Yu et al., 2002]
reduce dimensionality with PCA or SVD [Goldberg et al., 2001, Vozalis and Margaritis, 2005]
create a set of super-users [Rashid et al., 2006]
sampling? stochastic? bagging?
Combine approaches ⇒ ensemble methods [Polikar, 2006]
15. References
P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom and J. Riedl (1994)
GroupLens: an open architecture for collaborative filtering of netnews
In Conference on Computer Supported Cooperative Work, pages 175–186. ACM
J. Breese, D. Heckerman and C. Kadie (1998)
Empirical analysis of predictive algorithms for collaborative filtering
In 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52. Morgan Kaufmann
G. Karypis (2001)
Evaluation of item-based top-N recommendation algorithms
In 10th International Conference on Information and Knowledge Management, pages 247–254
K. Goldberg, T. Roeder, D. Gupta and C. Perkins (2001)
Eigentaste: a constant time collaborative filtering algorithm
Information Retrieval, 4(2):133–151
K. Yu, X. Xu, J. Tao, M. Ester and H. Kriegel (2002)
Instance selection techniques for memory-based collaborative filtering
In SIAM Data Mining
J. Herlocker, J. Konstan, L. Terveen and J. Riedl (2004)
Evaluating collaborative filtering recommender systems
ACM Transactions on Information Systems, 22(1):5–53
G. Adomavicius and A. Tuzhilin (2005)
Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions
IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749
M. Vozalis and K. Margaritis (2005)
Applying SVD on item-based filtering
In 5th International Conference on Intelligent Systems Design and Applications, pages 464–469
A.M. Rashid, S.K. Lam, G. Karypis and J. Riedl (2006)
ClustKNN: a highly scalable hybrid model- & memory-based CF algorithm
In KDD Workshop on Web Mining and Web Usage Analysis
R. Polikar (2006)
Ensemble systems in decision making
IEEE Circuits & Systems Magazine, 6(3):21–45