Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap
1. IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
Improving Memory-Based
Collaborative Filtering
by Neighbour Selection based on
User Preference Overlap
Alejandro Bellogín*, Pablo Castells, Iván Cantador
Information Retrieval Group – Universidad Autónoma de Madrid
*Information Access Group – Centrum Wiskunde & Informatica
2. 2
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
Recommender Systems
A recommender system aims to find and suggest items of
likely interest based on the users’ preferences
Examples:
• Amazon – products
• Netflix – tv shows and movies
• LinkedIn – jobs and colleagues
• Last.fm – music artists and tracks
3. 3
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
Collaborative Filtering
“You may like classical music if you like heavy metal”
4. 4
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
Collaborative Filtering (in real world)
5. 5
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
Neighbours are selected according to how similar they are
Neighbour selection
6. 6
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
Neighbour-based collaborative filtering
Depend on user similarity metrics
• Pearson’s correlation
• Spearman’s correlation
• Cosine
Advantages
• Simplicity
• Intuitive
• Efficiency
Disadvantages
• Lower accuracy than other methods
• Limited coverage
7. 7
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
Research question
Can we find a good surrogate of
user similarity for neighbour selection?
We need an alternative of similarity which provides equivalent
(or better!) results
We focus on user preference overlap
8. 8
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
Relation between similarity and overlap
I u I v
P earso n 's ,co r u v
2 2
, ,
sim ,
, ,
i
i i
r u i r u r v i r v
u v
r u i r u r v i r v
9. 9
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
Relation between similarity and overlap
The larger the overlap, the more likely the users would agree on
their ratings
10. 10
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
Relation between similarity and overlap
The larger the overlap, the more likely the users would agree on
their ratings
Low (negative) similarity values obtained with small overlap sizes
11. 11
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
Relation between similarity and overlap
The larger the overlap, the more likely the users would agree on
their ratings
Low (negative) similarity values obtained with small overlap sizes
Highest similarity values obtained by users with tiny overlap
• This is coincidental, such similarities have low reliability
12. 12
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
User overlap for neighbour selection
User-based collaborative filtering:
Select top-k neighbours V according to
• Similarity
• Intersection (overlap)
• Herlocker’s weighting
• McLaughlin’s weighting
r , sim , r ,
v V
u i u v v i
m in ,
H ,
m ax ,
M L ,
I u I v n
u v
n
I u I v n
u v
n
13. 13
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
User overlap for neighbour selection
Performance comparison
• RMSE: error-based evaluation (the lower, the better)
• P@10: precision-based evaluation (the higher the better)
2
,
1
R M S E , ,
u i T
r u i r u i
T
R el T opN
P @ N
N
14. 14
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
User overlap for neighbour selection
Performance comparison
• RMSE: error-based evaluation (the lower, the better)
• P@10: precision-based evaluation (the higher the better)
↑Performance as good (or better) than the baseline
↓Lower coverage for some methods
0.00
0.10
0.20
0.30
0.40
0.50
0.60
10 20 50 100 200 1000 2000
P@10
Neighbourhood size
0.0000
1.0000
10 20 50 100 200 1000 2000
P@
10
Neighbourhood size
AR
Similarity filtering Intersection filtering
Herlocker filtering McLaughlin filtering
0.55
0.70
0.85
1.00
1.15
1.30
1.45
10 20 50 100 200 1000 2000
RMSE
Neighbourhood size
15. 15
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
User overlap for neighbour selection
Performance comparison: different values for n
0.0000
1.0000
10 20 50 100 200 1000 2000
P@
10
Neighbourhood size
ARSimilarity filtering Intersection filtering Herlocker filtering McLaughlin filtering
0.00
0.05
0.10
0.15
0.20
0.25
10 20 50 100 200 1000 2000
P@10
Neighbourhood size
0.55
0.70
0.85
1.00
1.15
1.30
1.45
10 20 50 100 200 1000 2000
RMSE
Neighbourhood size
0.00001.0000
10 20 50 100 200 1000 2000
P
@
1
0
Neighbourhood sizeAR
Similarity filtering Trust filtering Intersection filtering
Herlocker filtering McLaughlin filtering
n = 10
m in ,
H ,
m ax ,
M L ,
I u I v n
u v
n
I u I v n
u v
n
0.00
0.10
0.20
0.30
0.40
0.50
0.60
10 20 50 100 200 1000 2000
P@10
Neighbourhood size
0.40
0.55
0.70
0.85
1.00
1.15
1.30
1.45
10 20 50 100 200 1000 2000
RMSE
Neighbourhood size
0.00001.0000
10 20 50 100 200 1000 2000
P
@
1
0
Neighbourhood sizeAR
Similarity filtering Trust filtering Intersection filtering
Herlocker filtering McLaughlin filtering
n = 100
16. 16
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
User overlap for neighbour selection
Performance comparison: different values for n
0.0000
1.0000
10 20 50 100 200 1000 2000
P@
10
Neighbourhood size
ARSimilarity filtering Intersection filtering Herlocker filtering McLaughlin filtering
0.00
0.05
0.10
0.15
0.20
0.25
10 20 50 100 200 1000 2000
P@10
Neighbourhood size
0.55
0.70
0.85
1.00
1.15
1.30
1.45
10 20 50 100 200 1000 2000
RMSE
Neighbourhood size
0.00001.0000
10 20 50 100 200 1000 2000
P
@
1
0
Neighbourhood sizeAR
Similarity filtering Trust filtering Intersection filtering
Herlocker filtering McLaughlin filtering
n = 10
m in ,
H ,
m ax ,
M L ,
I u I v n
u v
n
I u I v n
u v
n
0.00
0.10
0.20
0.30
0.40
0.50
0.60
10 20 50 100 200 1000 2000
P@10
Neighbourhood size
0.40
0.55
0.70
0.85
1.00
1.15
1.30
1.45
10 20 50 100 200 1000 2000
RMSE
Neighbourhood size
0.00001.0000
10 20 50 100 200 1000 2000
P
@
1
0
Neighbourhood sizeAR
Similarity filtering Trust filtering Intersection filtering
Herlocker filtering McLaughlin filtering
n = 100
0.00
0.05
0.10
0.15
0.20
0.25
10 20 50 100 200 1000 2000
P@10
Neighbourhood size
0.55
0.70
0.85
1.00
1.15
1.30
1.45
10 20 50 100 200 1000 2000
RMSE
Neighbourhood size
0.00001.0000
10 20 50 100 200 1000 2000
P
@
1
0
Neighbourhood sizeAR
Similarity filtering Trust filtering Intersection filtering
Herlocker filtering McLaughlin filtering
n = 20
0.00
0.10
0.20
0.30
0.40
0.50
0.60
10 20 50 100 200 1000 2000
P@10
Neighbourhood size
0.55
0.70
0.85
1.00
1.15
1.30
1.45
10 20 50 100 200 1000 2000
RMSE
Neighbourhood size
0.00001.0000
10 20 50 100 200 1000 2000
P
@
1
0
Neighbourhood sizeAR
Similarity filtering Trust filtering Intersection filtering
Herlocker filtering McLaughlin filtering
n = 80
17. 17
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
User overlap for neighbour selection
Performance comparison: different values for n
0.0000
1.0000
10 20 50 100 200 1000 2000
P@
10
Neighbourhood size
ARSimilarity filtering Intersection filtering Herlocker filtering McLaughlin filtering
0.00
0.05
0.10
0.15
0.20
0.25
10 20 50 100 200 1000 2000
P@10
Neighbourhood size
0.55
0.70
0.85
1.00
1.15
1.30
1.45
10 20 50 100 200 1000 2000
RMSE
Neighbourhood size
0.00001.0000
10 20 50 100 200 1000 2000
P
@
1
0
Neighbourhood sizeAR
Similarity filtering Trust filtering Intersection filtering
Herlocker filtering McLaughlin filtering
n = 10
m in ,
H ,
m ax ,
M L ,
I u I v n
u v
n
I u I v n
u v
n
0.00
0.10
0.20
0.30
0.40
0.50
0.60
10 20 50 100 200 1000 2000
P@10
Neighbourhood size
0.40
0.55
0.70
0.85
1.00
1.15
1.30
1.45
10 20 50 100 200 1000 2000
RMSE
Neighbourhood size
0.00001.0000
10 20 50 100 200 1000 2000
P
@
1
0
Neighbourhood sizeAR
Similarity filtering Trust filtering Intersection filtering
Herlocker filtering McLaughlin filtering
n = 100
0.00
0.05
0.10
0.15
0.20
0.25
10 20 50 100 200 1000 2000
P@10
Neighbourhood size
0.55
0.70
0.85
1.00
1.15
1.30
1.45
10 20 50 100 200 1000 2000
RMSE
Neighbourhood size
0.00001.0000
10 20 50 100 200 1000 2000
P
@
1
0
Neighbourhood sizeAR
Similarity filtering Trust filtering Intersection filtering
Herlocker filtering McLaughlin filtering
n = 20
0.00
0.10
0.20
0.30
0.40
0.50
0.60
10 20 50 100 200 1000 2000
P@10
Neighbourhood size
0.55
0.70
0.85
1.00
1.15
1.30
1.45
10 20 50 100 200 1000 2000
RMSE
Neighbourhood size
0.00001.0000
10 20 50 100 200 1000 2000
P
@
1
0
Neighbourhood sizeAR
Similarity filtering Trust filtering Intersection filtering
Herlocker filtering McLaughlin filtering
n = 80
0.00
0.10
0.20
0.30
0.40
0.50
0.60
10 20 50 100 200 1000 2000
P@10
Neighbourhood size
0.55
0.70
0.85
1.00
1.15
1.30
1.45
10 20 50 100 200 1000 2000
RMSE
Neighbourhood size
0.00001.0000
10 20 50 100 200 1000 2000
P
@
1
0
Neighbourhood sizeAR
Similarity filtering Trust filtering Intersection filtering
Herlocker filtering McLaughlin filtering
n = 40
0.00
0.10
0.20
0.30
0.40
0.50
0.60
10 20 50 100 200 1000 2000
P@10
Neighbourhood size
0.55
0.70
0.85
1.00
1.15
1.30
1.45
10 20 50 100 200 1000 2000
RMSE
Neighbourhood size
0.00001.0000
10 20 50 100 200 1000 2000
P
@
1
0
Neighbourhood sizeAR
Similarity filtering Trust filtering Intersection filtering
Herlocker filtering McLaughlin filtering
n = 60
18. 18
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
Conclusions
We compare three approaches based on user preference overlap
• Intersection
• Herlocker’s weighting
• McLaughlin’s weighting
Overlap-based approaches improve accuracy (RMSE) and precision
(P@10)
• Especially, for small neighbourhoods
• Stable with respect to the threshold parameter n
19. 19
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
Future work
Extend our analysis to other similarity functions
• Cosine
• Spearman
Exploit other recommendation input dimensions
• User logs
• Social networks
Understand why the overlap is special to improve other
recommendation algorithms
20. 20
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
Thank you!
Improving Memory-Based Collaborative Filtering
by Neighbour Selection Based on
User Preference Overlap
Alejandro Bellogín, Pablo Castells, Iván Cantador
@abellogin
alejandro.bellogin@cwi.nl
22. 22
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
Experiments
Dataset: MovieLens 1M
• Users: 6,040
• Items: 3,600
• Number of ratings: 1 million
• Available in http://www.grouplens.org/node/73
23. 23
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
Neighbour selection in collaborative filtering
Two main alternatives
• Top-k most similar users
• Thresholding: those users whose similarity is above a threshold
In [Herlocker et al, 2002]: thresholding obtains worse error accuracy
Compatible with
• Clustering of users: neighbourhood ~ users in the same cluster
[Xue et al, 2005; Bellogin & Parapar, 2013]
• Probabilistic relevance models: p(v | Ru)
[Submitted to RecSys ‘13]
• Rating normalisation
[Desrosiers & Karypis, 2012]
24. 24
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
References
[Bellogin & Parapar, 2013] Using Graph Partitioning Techniques for
Neighbour Selection in User-Based Collaborative Filtering. RecSys
[Desrosiers & Karypis, 2012] A Comprehensive Survey of
Neighborhood-based Recommendation Methods. Recommender
Systems Handbook
[Herlocker et al, 2002] An Empirical Analysis of Design Choices in
Neighborhood-Based Collaborative Filtering Algorithms.
Information Retrieval Journal
[Xue et al, 2005] Scalable Collaborative Filtering Using Cluster-Based
Smoothing. SIGIR