SlideShare uma empresa Scribd logo
1 de 24
Baixar para ler offline
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
Improving Memory-Based
Collaborative Filtering
by Neighbour Selection based on
User Preference Overlap
Alejandro Bellogín*, Pablo Castells, Iván Cantador
Information Retrieval Group – Universidad Autónoma de Madrid
*Information Access Group – Centrum Wiskunde & Informatica
2
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
Recommender Systems
 A recommender system aims to find and suggest items of
likely interest based on the users’ preferences
 Examples:
• Amazon – products
• Netflix – tv shows and movies
• LinkedIn – jobs and colleagues
• Last.fm – music artists and tracks
3
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
Collaborative Filtering
“You may like classical music if you like heavy metal”
4
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
Collaborative Filtering (in real world)
5
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
Neighbours are selected according to how similar they are
Neighbour selection
6
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
Neighbour-based collaborative filtering
 Depend on user similarity metrics
• Pearson’s correlation
• Spearman’s correlation
• Cosine
 Advantages
• Simplicity
• Intuitive
• Efficiency
 Disadvantages
• Lower accuracy than other methods
• Limited coverage
7
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
Research question
Can we find a good surrogate of
user similarity for neighbour selection?
 We need an alternative of similarity which provides equivalent
(or better!) results
 We focus on user preference overlap
8
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
Relation between similarity and overlap
   I u I v
 P earso n 's ,co r u v 
         
         
2 2
, ,
sim ,
, ,
i
i i
r u i r u r v i r v
u v
r u i r u r v i r v
 

 

 
9
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
Relation between similarity and overlap
 The larger the overlap, the more likely the users would agree on
their ratings
10
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
Relation between similarity and overlap
 The larger the overlap, the more likely the users would agree on
their ratings
 Low (negative) similarity values obtained with small overlap sizes
11
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
Relation between similarity and overlap
 The larger the overlap, the more likely the users would agree on
their ratings
 Low (negative) similarity values obtained with small overlap sizes
 Highest similarity values obtained by users with tiny overlap
• This is coincidental, such similarities have low reliability
12
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
User overlap for neighbour selection
 User-based collaborative filtering:
 Select top-k neighbours V according to
• Similarity
• Intersection (overlap)
• Herlocker’s weighting
• McLaughlin’s weighting
     r , sim , r ,
v V
u i u v v i

 
 
    
 
    
m in ,
H ,
m ax ,
M L ,
I u I v n
u v
n
I u I v n
u v
n


13
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
User overlap for neighbour selection
 Performance comparison
• RMSE: error-based evaluation (the lower, the better)
• P@10: precision-based evaluation (the higher the better)
    
 
2
,
1
R M S E , ,
u i T
r u i r u i
T 
 
R el T opN
P @ N
N

14
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
User overlap for neighbour selection
 Performance comparison
• RMSE: error-based evaluation (the lower, the better)
• P@10: precision-based evaluation (the higher the better)
↑Performance as good (or better) than the baseline
↓Lower coverage for some methods
0.00
0.10
0.20
0.30
0.40
0.50
0.60
10 20 50 100 200 1000 2000
P@10
Neighbourhood size
0.0000
1.0000
10 20 50 100 200 1000 2000
P@
10
Neighbourhood size
AR
Similarity filtering Intersection filtering
Herlocker filtering McLaughlin filtering
0.55
0.70
0.85
1.00
1.15
1.30
1.45
10 20 50 100 200 1000 2000
RMSE
Neighbourhood size
15
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
User overlap for neighbour selection
 Performance comparison: different values for n
0.0000
1.0000
10 20 50 100 200 1000 2000
P@
10
Neighbourhood size
ARSimilarity filtering Intersection filtering Herlocker filtering McLaughlin filtering
0.00
0.05
0.10
0.15
0.20
0.25
10 20 50 100 200 1000 2000
P@10
Neighbourhood size
0.55
0.70
0.85
1.00
1.15
1.30
1.45
10 20 50 100 200 1000 2000
RMSE
Neighbourhood size
0.00001.0000
10 20 50 100 200 1000 2000
P
@
1
0
Neighbourhood sizeAR
Similarity filtering Trust filtering Intersection filtering
Herlocker filtering McLaughlin filtering
n = 10
 
    
 
    
m in ,
H ,
m ax ,
M L ,
I u I v n
u v
n
I u I v n
u v
n


0.00
0.10
0.20
0.30
0.40
0.50
0.60
10 20 50 100 200 1000 2000
P@10
Neighbourhood size
0.40
0.55
0.70
0.85
1.00
1.15
1.30
1.45
10 20 50 100 200 1000 2000
RMSE
Neighbourhood size
0.00001.0000
10 20 50 100 200 1000 2000
P
@
1
0
Neighbourhood sizeAR
Similarity filtering Trust filtering Intersection filtering
Herlocker filtering McLaughlin filtering
n = 100
16
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
User overlap for neighbour selection
 Performance comparison: different values for n
0.0000
1.0000
10 20 50 100 200 1000 2000
P@
10
Neighbourhood size
ARSimilarity filtering Intersection filtering Herlocker filtering McLaughlin filtering
0.00
0.05
0.10
0.15
0.20
0.25
10 20 50 100 200 1000 2000
P@10
Neighbourhood size
0.55
0.70
0.85
1.00
1.15
1.30
1.45
10 20 50 100 200 1000 2000
RMSE
Neighbourhood size
0.00001.0000
10 20 50 100 200 1000 2000
P
@
1
0
Neighbourhood sizeAR
Similarity filtering Trust filtering Intersection filtering
Herlocker filtering McLaughlin filtering
n = 10
 
    
 
    
m in ,
H ,
m ax ,
M L ,
I u I v n
u v
n
I u I v n
u v
n


0.00
0.10
0.20
0.30
0.40
0.50
0.60
10 20 50 100 200 1000 2000
P@10
Neighbourhood size
0.40
0.55
0.70
0.85
1.00
1.15
1.30
1.45
10 20 50 100 200 1000 2000
RMSE
Neighbourhood size
0.00001.0000
10 20 50 100 200 1000 2000
P
@
1
0
Neighbourhood sizeAR
Similarity filtering Trust filtering Intersection filtering
Herlocker filtering McLaughlin filtering
n = 100
0.00
0.05
0.10
0.15
0.20
0.25
10 20 50 100 200 1000 2000
P@10
Neighbourhood size
0.55
0.70
0.85
1.00
1.15
1.30
1.45
10 20 50 100 200 1000 2000
RMSE
Neighbourhood size
0.00001.0000
10 20 50 100 200 1000 2000
P
@
1
0
Neighbourhood sizeAR
Similarity filtering Trust filtering Intersection filtering
Herlocker filtering McLaughlin filtering
n = 20
0.00
0.10
0.20
0.30
0.40
0.50
0.60
10 20 50 100 200 1000 2000
P@10
Neighbourhood size
0.55
0.70
0.85
1.00
1.15
1.30
1.45
10 20 50 100 200 1000 2000
RMSE
Neighbourhood size
0.00001.0000
10 20 50 100 200 1000 2000
P
@
1
0
Neighbourhood sizeAR
Similarity filtering Trust filtering Intersection filtering
Herlocker filtering McLaughlin filtering
n = 80
17
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
User overlap for neighbour selection
 Performance comparison: different values for n
0.0000
1.0000
10 20 50 100 200 1000 2000
P@
10
Neighbourhood size
ARSimilarity filtering Intersection filtering Herlocker filtering McLaughlin filtering
0.00
0.05
0.10
0.15
0.20
0.25
10 20 50 100 200 1000 2000
P@10
Neighbourhood size
0.55
0.70
0.85
1.00
1.15
1.30
1.45
10 20 50 100 200 1000 2000
RMSE
Neighbourhood size
0.00001.0000
10 20 50 100 200 1000 2000
P
@
1
0
Neighbourhood sizeAR
Similarity filtering Trust filtering Intersection filtering
Herlocker filtering McLaughlin filtering
n = 10
 
    
 
    
m in ,
H ,
m ax ,
M L ,
I u I v n
u v
n
I u I v n
u v
n


0.00
0.10
0.20
0.30
0.40
0.50
0.60
10 20 50 100 200 1000 2000
P@10
Neighbourhood size
0.40
0.55
0.70
0.85
1.00
1.15
1.30
1.45
10 20 50 100 200 1000 2000
RMSE
Neighbourhood size
0.00001.0000
10 20 50 100 200 1000 2000
P
@
1
0
Neighbourhood sizeAR
Similarity filtering Trust filtering Intersection filtering
Herlocker filtering McLaughlin filtering
n = 100
0.00
0.05
0.10
0.15
0.20
0.25
10 20 50 100 200 1000 2000
P@10
Neighbourhood size
0.55
0.70
0.85
1.00
1.15
1.30
1.45
10 20 50 100 200 1000 2000
RMSE
Neighbourhood size
0.00001.0000
10 20 50 100 200 1000 2000
P
@
1
0
Neighbourhood sizeAR
Similarity filtering Trust filtering Intersection filtering
Herlocker filtering McLaughlin filtering
n = 20
0.00
0.10
0.20
0.30
0.40
0.50
0.60
10 20 50 100 200 1000 2000
P@10
Neighbourhood size
0.55
0.70
0.85
1.00
1.15
1.30
1.45
10 20 50 100 200 1000 2000
RMSE
Neighbourhood size
0.00001.0000
10 20 50 100 200 1000 2000
P
@
1
0
Neighbourhood sizeAR
Similarity filtering Trust filtering Intersection filtering
Herlocker filtering McLaughlin filtering
n = 80
0.00
0.10
0.20
0.30
0.40
0.50
0.60
10 20 50 100 200 1000 2000
P@10
Neighbourhood size
0.55
0.70
0.85
1.00
1.15
1.30
1.45
10 20 50 100 200 1000 2000
RMSE
Neighbourhood size
0.00001.0000
10 20 50 100 200 1000 2000
P
@
1
0
Neighbourhood sizeAR
Similarity filtering Trust filtering Intersection filtering
Herlocker filtering McLaughlin filtering
n = 40
0.00
0.10
0.20
0.30
0.40
0.50
0.60
10 20 50 100 200 1000 2000
P@10
Neighbourhood size
0.55
0.70
0.85
1.00
1.15
1.30
1.45
10 20 50 100 200 1000 2000
RMSE
Neighbourhood size
0.00001.0000
10 20 50 100 200 1000 2000
P
@
1
0
Neighbourhood sizeAR
Similarity filtering Trust filtering Intersection filtering
Herlocker filtering McLaughlin filtering
n = 60
18
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
Conclusions
 We compare three approaches based on user preference overlap
• Intersection
• Herlocker’s weighting
• McLaughlin’s weighting
 Overlap-based approaches improve accuracy (RMSE) and precision
(P@10)
• Especially, for small neighbourhoods
• Stable with respect to the threshold parameter n
19
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
Future work
 Extend our analysis to other similarity functions
• Cosine
• Spearman
 Exploit other recommendation input dimensions
• User logs
• Social networks
 Understand why the overlap is special to improve other
recommendation algorithms
20
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
Thank you!
Improving Memory-Based Collaborative Filtering
by Neighbour Selection Based on
User Preference Overlap
Alejandro Bellogín, Pablo Castells, Iván Cantador
@abellogin
alejandro.bellogin@cwi.nl
21
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
User overlap for neighbour selection
 Performance comparison: different values for n
0.0000
1.0000
10 20 50 100 200 1000 2000
P@
10
Neighbourhood size
ARSimilarity filtering Intersection filtering Herlocker filtering McLaughlin filtering0.00
0.10
0.20
0.30
0.40
0.50
0.60
10 20 50 100 200 1000 2000
P@10
Neighbourhood size
0.40
0.55
0.70
0.85
1.00
1.15
1.30
1.45
10 20 50 100 200 1000 2000
RMSE
Neighbourhood size
0.00001.0000
10 20 50 100 200 1000 2000
P
@
1
0
Neighbourhood sizeAR
Similarity filtering Trust filtering Intersection filtering
Herlocker filtering McLaughlin filtering
n = 100
0.00
0.10
0.20
0.30
0.40
0.50
0.60
10 20 50 100 200 1000 2000
P@10
Neighbourhood size
0.55
0.70
0.85
1.00
1.15
1.30
1.45
10 20 50 100 200 1000 2000
RMSE
Neighbourhood size
0.00001.0000
10 20 50 100 200 1000 2000
P
@
1
0
Neighbourhood sizeAR
Similarity filtering Trust filtering Intersection filtering
Herlocker filtering McLaughlin filtering
n = 80
0.00
0.10
0.20
0.30
0.40
0.50
0.60
10 20 50 100 200 1000 2000
P@10
Neighbourhood size
0.55
0.70
0.85
1.00
1.15
1.30
1.45
10 20 50 100 200 1000 2000
RMSE
Neighbourhood size
0.00001.0000
10 20 50 100 200 1000 2000
P
@
1
0
Neighbourhood sizeAR
Similarity filtering Trust filtering Intersection filtering
Herlocker filtering McLaughlin filtering
n = 60
0.00
0.10
0.20
0.30
0.40
0.50
0.60
10 20 50 100 200 1000 2000
P@10
Neighbourhood size
0.55
0.70
0.85
1.00
1.15
1.30
1.45
10 20 50 100 200 1000 2000
RMSE
Neighbourhood size
0.00001.0000
10 20 50 100 200 1000 2000
P
@
1
0
Neighbourhood sizeAR
Similarity filtering Trust filtering Intersection filtering
Herlocker filtering McLaughlin filtering
n = 40
0.00
0.05
0.10
0.15
0.20
0.25
10 20 50 100 200 1000 2000
P@10
Neighbourhood size
0.55
0.70
0.85
1.00
1.15
1.30
1.45
10 20 50 100 200 1000 2000
RMSE
Neighbourhood size
0.00001.0000
10 20 50 100 200 1000 2000
P
@
1
0
Neighbourhood sizeAR
Similarity filtering Trust filtering Intersection filtering
Herlocker filtering McLaughlin filtering
n = 20
0.00
0.05
0.10
0.15
0.20
0.25
10 20 50 100 200 1000 2000
P@10
Neighbourhood size
0.55
0.70
0.85
1.00
1.15
1.30
1.45
10 20 50 100 200 1000 2000
RMSE
Neighbourhood size
0.00001.0000
10 20 50 100 200 1000 2000
P
@
1
0
Neighbourhood sizeAR
Similarity filtering Trust filtering Intersection filtering
Herlocker filtering McLaughlin filtering
n = 10
22
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
Experiments
 Dataset: MovieLens 1M
• Users: 6,040
• Items: 3,600
• Number of ratings: 1 million
• Available in http://www.grouplens.org/node/73
23
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
Neighbour selection in collaborative filtering
 Two main alternatives
• Top-k most similar users
• Thresholding: those users whose similarity is above a threshold
In [Herlocker et al, 2002]: thresholding obtains worse error accuracy
 Compatible with
• Clustering of users: neighbourhood ~ users in the same cluster
[Xue et al, 2005; Bellogin & Parapar, 2013]
• Probabilistic relevance models: p(v | Ru)
[Submitted to RecSys ‘13]
• Rating normalisation
[Desrosiers & Karypis, 2012]
24
IRGIR Group @ UAM
Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín
Open Research Areas in Information Retrieval (OAIR 2013)
References
 [Bellogin & Parapar, 2013] Using Graph Partitioning Techniques for
Neighbour Selection in User-Based Collaborative Filtering. RecSys
 [Desrosiers & Karypis, 2012] A Comprehensive Survey of
Neighborhood-based Recommendation Methods. Recommender
Systems Handbook
 [Herlocker et al, 2002] An Empirical Analysis of Design Choices in
Neighborhood-Based Collaborative Filtering Algorithms.
Information Retrieval Journal
 [Xue et al, 2005] Scalable Collaborative Filtering Using Cluster-Based
Smoothing. SIGIR

Mais conteúdo relacionado

Semelhante a Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap

Digital Trails Dave King 1 5 10 Part 2 D3
Digital Trails   Dave King   1 5 10   Part 2   D3Digital Trails   Dave King   1 5 10   Part 2   D3
Digital Trails Dave King 1 5 10 Part 2 D3
Dave King
 

Semelhante a Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap (20)

Information Retrieval Models for Recommender Systems - PhD slides
Information Retrieval Models for Recommender Systems - PhD slidesInformation Retrieval Models for Recommender Systems - PhD slides
Information Retrieval Models for Recommender Systems - PhD slides
 
Understanding Similarity Metrics in Neighbour-based Recommender Systems
Understanding Similarity Metrics in Neighbour-based Recommender SystemsUnderstanding Similarity Metrics in Neighbour-based Recommender Systems
Understanding Similarity Metrics in Neighbour-based Recommender Systems
 
Language Models for Collaborative Filtering Neighbourhoods [ECIR '16 Slides]
Language Models for Collaborative Filtering Neighbourhoods [ECIR '16 Slides]Language Models for Collaborative Filtering Neighbourhoods [ECIR '16 Slides]
Language Models for Collaborative Filtering Neighbourhoods [ECIR '16 Slides]
 
Computing Neighbourhoods with Language Models in a Collaborative Filtering Sc...
Computing Neighbourhoods with Language Models in a Collaborative Filtering Sc...Computing Neighbourhoods with Language Models in a Collaborative Filtering Sc...
Computing Neighbourhoods with Language Models in a Collaborative Filtering Sc...
 
typicality-based collaborative filtering recommendation
typicality-based collaborative filtering recommendationtypicality-based collaborative filtering recommendation
typicality-based collaborative filtering recommendation
 
Yelper Helper Concept
Yelper Helper ConceptYelper Helper Concept
Yelper Helper Concept
 
Efficient Pseudo-Relevance Feedback Methods for Collaborative Filtering Recom...
Efficient Pseudo-Relevance Feedback Methods for Collaborative Filtering Recom...Efficient Pseudo-Relevance Feedback Methods for Collaborative Filtering Recom...
Efficient Pseudo-Relevance Feedback Methods for Collaborative Filtering Recom...
 
Collaborative Filtering Recommendation System
Collaborative Filtering Recommendation SystemCollaborative Filtering Recommendation System
Collaborative Filtering Recommendation System
 
Overview of recommender system
Overview of recommender systemOverview of recommender system
Overview of recommender system
 
[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...
[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...
[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...
 
Recommendation Systems Roadtrip
Recommendation Systems RoadtripRecommendation Systems Roadtrip
Recommendation Systems Roadtrip
 
Telefonica Lunch Seminar
Telefonica Lunch SeminarTelefonica Lunch Seminar
Telefonica Lunch Seminar
 
Icdec2020_presentation_slides_13
Icdec2020_presentation_slides_13Icdec2020_presentation_slides_13
Icdec2020_presentation_slides_13
 
Social Recommender Systems
Social Recommender SystemsSocial Recommender Systems
Social Recommender Systems
 
Digital Trails Dave King 1 5 10 Part 2 D3
Digital Trails   Dave King   1 5 10   Part 2   D3Digital Trails   Dave King   1 5 10   Part 2   D3
Digital Trails Dave King 1 5 10 Part 2 D3
 
[RIIT 2017] Identifying Grey Sheep Users By The Distribution of User Similari...
[RIIT 2017] Identifying Grey Sheep Users By The Distribution of User Similari...[RIIT 2017] Identifying Grey Sheep Users By The Distribution of User Similari...
[RIIT 2017] Identifying Grey Sheep Users By The Distribution of User Similari...
 
Collaborative filtering hyoungtae cho
Collaborative filtering hyoungtae choCollaborative filtering hyoungtae cho
Collaborative filtering hyoungtae cho
 
The Wisdom of the Few @SIGIR09
The Wisdom of the Few @SIGIR09The Wisdom of the Few @SIGIR09
The Wisdom of the Few @SIGIR09
 
Recommendation Systems with R
Recommendation Systems with RRecommendation Systems with R
Recommendation Systems with R
 
Probabilistic Collaborative Filtering with Negative Cross Entropy
Probabilistic Collaborative Filtering with Negative Cross EntropyProbabilistic Collaborative Filtering with Negative Cross Entropy
Probabilistic Collaborative Filtering with Negative Cross Entropy
 

Mais de Alejandro Bellogin

Performance prediction and evaluation in Recommender Systems: an Information ...
Performance prediction and evaluation in Recommender Systems: an Information ...Performance prediction and evaluation in Recommender Systems: an Information ...
Performance prediction and evaluation in Recommender Systems: an Information ...
Alejandro Bellogin
 
Using Graph Partitioning Techniques for Neighbour Selection in User-Based Col...
Using Graph Partitioning Techniques for Neighbour Selection in User-Based Col...Using Graph Partitioning Techniques for Neighbour Selection in User-Based Col...
Using Graph Partitioning Techniques for Neighbour Selection in User-Based Col...
Alejandro Bellogin
 
Using Graph Partitioning Techniques for Neighbour Selection in User-Based Col...
Using Graph Partitioning Techniques for Neighbour Selection in User-Based Col...Using Graph Partitioning Techniques for Neighbour Selection in User-Based Col...
Using Graph Partitioning Techniques for Neighbour Selection in User-Based Col...
Alejandro Bellogin
 

Mais de Alejandro Bellogin (18)

Recommender Systems and Misinformation: The Problem or the Solution?
Recommender Systems and Misinformation: The Problem or the Solution?Recommender Systems and Misinformation: The Problem or the Solution?
Recommender Systems and Misinformation: The Problem or the Solution?
 
Revisiting neighborhood-based recommenders for temporal scenarios
Revisiting neighborhood-based recommenders for temporal scenariosRevisiting neighborhood-based recommenders for temporal scenarios
Revisiting neighborhood-based recommenders for temporal scenarios
 
Evaluating decision-aware recommender systems
Evaluating decision-aware recommender systemsEvaluating decision-aware recommender systems
Evaluating decision-aware recommender systems
 
Replicable Evaluation of Recommender Systems
Replicable Evaluation of Recommender SystemsReplicable Evaluation of Recommender Systems
Replicable Evaluation of Recommender Systems
 
Implicit vs Explicit trust in Social Matrix Factorization
Implicit vs Explicit trust in Social Matrix FactorizationImplicit vs Explicit trust in Social Matrix Factorization
Implicit vs Explicit trust in Social Matrix Factorization
 
RiVal - A toolkit to foster reproducibility in Recommender System evaluation
RiVal - A toolkit to foster reproducibility in Recommender System evaluationRiVal - A toolkit to foster reproducibility in Recommender System evaluation
RiVal - A toolkit to foster reproducibility in Recommender System evaluation
 
HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...
HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...
HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...
 
CWI @ Contextual Suggestion track - TREC 2013
CWI @ Contextual Suggestion track - TREC 2013CWI @ Contextual Suggestion track - TREC 2013
CWI @ Contextual Suggestion track - TREC 2013
 
CWI @ Federated Web Track - TREC 2013
CWI @ Federated Web Track - TREC 2013CWI @ Federated Web Track - TREC 2013
CWI @ Federated Web Track - TREC 2013
 
Artist popularity: do web and social music services agree?
Artist popularity: do web and social music services agree?Artist popularity: do web and social music services agree?
Artist popularity: do web and social music services agree?
 
Performance prediction and evaluation in Recommender Systems: an Information ...
Performance prediction and evaluation in Recommender Systems: an Information ...Performance prediction and evaluation in Recommender Systems: an Information ...
Performance prediction and evaluation in Recommender Systems: an Information ...
 
Using Graph Partitioning Techniques for Neighbour Selection in User-Based Col...
Using Graph Partitioning Techniques for Neighbour Selection in User-Based Col...Using Graph Partitioning Techniques for Neighbour Selection in User-Based Col...
Using Graph Partitioning Techniques for Neighbour Selection in User-Based Col...
 
Using Graph Partitioning Techniques for Neighbour Selection in User-Based Col...
Using Graph Partitioning Techniques for Neighbour Selection in User-Based Col...Using Graph Partitioning Techniques for Neighbour Selection in User-Based Col...
Using Graph Partitioning Techniques for Neighbour Selection in User-Based Col...
 
Precision-oriented Evaluation of Recommender Systems: An Algorithmic Comparis...
Precision-oriented Evaluation of Recommender Systems: An Algorithmic Comparis...Precision-oriented Evaluation of Recommender Systems: An Algorithmic Comparis...
Precision-oriented Evaluation of Recommender Systems: An Algorithmic Comparis...
 
Predicting performance in Recommender Systems - Slides
Predicting performance in Recommender Systems - SlidesPredicting performance in Recommender Systems - Slides
Predicting performance in Recommender Systems - Slides
 
Predicting performance in Recommender Systems - Poster slam
Predicting performance in Recommender Systems - Poster slamPredicting performance in Recommender Systems - Poster slam
Predicting performance in Recommender Systems - Poster slam
 
Predicting performance in Recommender Systems - Poster
Predicting performance in Recommender Systems - PosterPredicting performance in Recommender Systems - Poster
Predicting performance in Recommender Systems - Poster
 
Precision-oriented Evaluation of Recommender Systems: An Algorithmic Comparis...
Precision-oriented Evaluation of Recommender Systems: An Algorithmic Comparis...Precision-oriented Evaluation of Recommender Systems: An Algorithmic Comparis...
Precision-oriented Evaluation of Recommender Systems: An Algorithmic Comparis...
 

Último

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 

Último (20)

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 

Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap

  • 1. IRGIR Group @ UAM Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín Open Research Areas in Information Retrieval (OAIR 2013) Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap Alejandro Bellogín*, Pablo Castells, Iván Cantador Information Retrieval Group – Universidad Autónoma de Madrid *Information Access Group – Centrum Wiskunde & Informatica
  • 2. 2 IRGIR Group @ UAM Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín Open Research Areas in Information Retrieval (OAIR 2013) Recommender Systems  A recommender system aims to find and suggest items of likely interest based on the users’ preferences  Examples: • Amazon – products • Netflix – tv shows and movies • LinkedIn – jobs and colleagues • Last.fm – music artists and tracks
  • 3. 3 IRGIR Group @ UAM Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín Open Research Areas in Information Retrieval (OAIR 2013) Collaborative Filtering “You may like classical music if you like heavy metal”
  • 4. 4 IRGIR Group @ UAM Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín Open Research Areas in Information Retrieval (OAIR 2013) Collaborative Filtering (in real world)
  • 5. 5 IRGIR Group @ UAM Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín Open Research Areas in Information Retrieval (OAIR 2013) Neighbours are selected according to how similar they are Neighbour selection
  • 6. 6 IRGIR Group @ UAM Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín Open Research Areas in Information Retrieval (OAIR 2013) Neighbour-based collaborative filtering  Depend on user similarity metrics • Pearson’s correlation • Spearman’s correlation • Cosine  Advantages • Simplicity • Intuitive • Efficiency  Disadvantages • Lower accuracy than other methods • Limited coverage
  • 7. 7 IRGIR Group @ UAM Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín Open Research Areas in Information Retrieval (OAIR 2013) Research question Can we find a good surrogate of user similarity for neighbour selection?  We need an alternative of similarity which provides equivalent (or better!) results  We focus on user preference overlap
  • 8. 8 IRGIR Group @ UAM Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín Open Research Areas in Information Retrieval (OAIR 2013) Relation between similarity and overlap    I u I v  P earso n 's ,co r u v                      2 2 , , sim , , , i i i r u i r u r v i r v u v r u i r u r v i r v        
  • 9. 9 IRGIR Group @ UAM Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín Open Research Areas in Information Retrieval (OAIR 2013) Relation between similarity and overlap  The larger the overlap, the more likely the users would agree on their ratings
  • 10. 10 IRGIR Group @ UAM Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín Open Research Areas in Information Retrieval (OAIR 2013) Relation between similarity and overlap  The larger the overlap, the more likely the users would agree on their ratings  Low (negative) similarity values obtained with small overlap sizes
  • 11. 11 IRGIR Group @ UAM Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín Open Research Areas in Information Retrieval (OAIR 2013) Relation between similarity and overlap  The larger the overlap, the more likely the users would agree on their ratings  Low (negative) similarity values obtained with small overlap sizes  Highest similarity values obtained by users with tiny overlap • This is coincidental, such similarities have low reliability
  • 12. 12 IRGIR Group @ UAM Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín Open Research Areas in Information Retrieval (OAIR 2013) User overlap for neighbour selection  User-based collaborative filtering:  Select top-k neighbours V according to • Similarity • Intersection (overlap) • Herlocker’s weighting • McLaughlin’s weighting      r , sim , r , v V u i u v v i                  m in , H , m ax , M L , I u I v n u v n I u I v n u v n  
  • 13. 13 IRGIR Group @ UAM Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín Open Research Areas in Information Retrieval (OAIR 2013) User overlap for neighbour selection  Performance comparison • RMSE: error-based evaluation (the lower, the better) • P@10: precision-based evaluation (the higher the better)        2 , 1 R M S E , , u i T r u i r u i T    R el T opN P @ N N 
  • 14. 14 IRGIR Group @ UAM Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín Open Research Areas in Information Retrieval (OAIR 2013) User overlap for neighbour selection  Performance comparison • RMSE: error-based evaluation (the lower, the better) • P@10: precision-based evaluation (the higher the better) ↑Performance as good (or better) than the baseline ↓Lower coverage for some methods 0.00 0.10 0.20 0.30 0.40 0.50 0.60 10 20 50 100 200 1000 2000 P@10 Neighbourhood size 0.0000 1.0000 10 20 50 100 200 1000 2000 P@ 10 Neighbourhood size AR Similarity filtering Intersection filtering Herlocker filtering McLaughlin filtering 0.55 0.70 0.85 1.00 1.15 1.30 1.45 10 20 50 100 200 1000 2000 RMSE Neighbourhood size
  • 15. 15 IRGIR Group @ UAM Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín Open Research Areas in Information Retrieval (OAIR 2013) User overlap for neighbour selection  Performance comparison: different values for n 0.0000 1.0000 10 20 50 100 200 1000 2000 P@ 10 Neighbourhood size ARSimilarity filtering Intersection filtering Herlocker filtering McLaughlin filtering 0.00 0.05 0.10 0.15 0.20 0.25 10 20 50 100 200 1000 2000 P@10 Neighbourhood size 0.55 0.70 0.85 1.00 1.15 1.30 1.45 10 20 50 100 200 1000 2000 RMSE Neighbourhood size 0.00001.0000 10 20 50 100 200 1000 2000 P @ 1 0 Neighbourhood sizeAR Similarity filtering Trust filtering Intersection filtering Herlocker filtering McLaughlin filtering n = 10               m in , H , m ax , M L , I u I v n u v n I u I v n u v n   0.00 0.10 0.20 0.30 0.40 0.50 0.60 10 20 50 100 200 1000 2000 P@10 Neighbourhood size 0.40 0.55 0.70 0.85 1.00 1.15 1.30 1.45 10 20 50 100 200 1000 2000 RMSE Neighbourhood size 0.00001.0000 10 20 50 100 200 1000 2000 P @ 1 0 Neighbourhood sizeAR Similarity filtering Trust filtering Intersection filtering Herlocker filtering McLaughlin filtering n = 100
  • 16. 16 IRGIR Group @ UAM Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín Open Research Areas in Information Retrieval (OAIR 2013) User overlap for neighbour selection  Performance comparison: different values for n 0.0000 1.0000 10 20 50 100 200 1000 2000 P@ 10 Neighbourhood size ARSimilarity filtering Intersection filtering Herlocker filtering McLaughlin filtering 0.00 0.05 0.10 0.15 0.20 0.25 10 20 50 100 200 1000 2000 P@10 Neighbourhood size 0.55 0.70 0.85 1.00 1.15 1.30 1.45 10 20 50 100 200 1000 2000 RMSE Neighbourhood size 0.00001.0000 10 20 50 100 200 1000 2000 P @ 1 0 Neighbourhood sizeAR Similarity filtering Trust filtering Intersection filtering Herlocker filtering McLaughlin filtering n = 10               m in , H , m ax , M L , I u I v n u v n I u I v n u v n   0.00 0.10 0.20 0.30 0.40 0.50 0.60 10 20 50 100 200 1000 2000 P@10 Neighbourhood size 0.40 0.55 0.70 0.85 1.00 1.15 1.30 1.45 10 20 50 100 200 1000 2000 RMSE Neighbourhood size 0.00001.0000 10 20 50 100 200 1000 2000 P @ 1 0 Neighbourhood sizeAR Similarity filtering Trust filtering Intersection filtering Herlocker filtering McLaughlin filtering n = 100 0.00 0.05 0.10 0.15 0.20 0.25 10 20 50 100 200 1000 2000 P@10 Neighbourhood size 0.55 0.70 0.85 1.00 1.15 1.30 1.45 10 20 50 100 200 1000 2000 RMSE Neighbourhood size 0.00001.0000 10 20 50 100 200 1000 2000 P @ 1 0 Neighbourhood sizeAR Similarity filtering Trust filtering Intersection filtering Herlocker filtering McLaughlin filtering n = 20 0.00 0.10 0.20 0.30 0.40 0.50 0.60 10 20 50 100 200 1000 2000 P@10 Neighbourhood size 0.55 0.70 0.85 1.00 1.15 1.30 1.45 10 20 50 100 200 1000 2000 RMSE Neighbourhood size 0.00001.0000 10 20 50 100 200 1000 2000 P @ 1 0 Neighbourhood sizeAR Similarity filtering Trust filtering Intersection filtering Herlocker filtering McLaughlin filtering n = 80
  • 17. 17 IRGIR Group @ UAM Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín Open Research Areas in Information Retrieval (OAIR 2013) User overlap for neighbour selection  Performance comparison: different values for n 0.0000 1.0000 10 20 50 100 200 1000 2000 P@ 10 Neighbourhood size ARSimilarity filtering Intersection filtering Herlocker filtering McLaughlin filtering 0.00 0.05 0.10 0.15 0.20 0.25 10 20 50 100 200 1000 2000 P@10 Neighbourhood size 0.55 0.70 0.85 1.00 1.15 1.30 1.45 10 20 50 100 200 1000 2000 RMSE Neighbourhood size 0.00001.0000 10 20 50 100 200 1000 2000 P @ 1 0 Neighbourhood sizeAR Similarity filtering Trust filtering Intersection filtering Herlocker filtering McLaughlin filtering n = 10               m in , H , m ax , M L , I u I v n u v n I u I v n u v n   0.00 0.10 0.20 0.30 0.40 0.50 0.60 10 20 50 100 200 1000 2000 P@10 Neighbourhood size 0.40 0.55 0.70 0.85 1.00 1.15 1.30 1.45 10 20 50 100 200 1000 2000 RMSE Neighbourhood size 0.00001.0000 10 20 50 100 200 1000 2000 P @ 1 0 Neighbourhood sizeAR Similarity filtering Trust filtering Intersection filtering Herlocker filtering McLaughlin filtering n = 100 0.00 0.05 0.10 0.15 0.20 0.25 10 20 50 100 200 1000 2000 P@10 Neighbourhood size 0.55 0.70 0.85 1.00 1.15 1.30 1.45 10 20 50 100 200 1000 2000 RMSE Neighbourhood size 0.00001.0000 10 20 50 100 200 1000 2000 P @ 1 0 Neighbourhood sizeAR Similarity filtering Trust filtering Intersection filtering Herlocker filtering McLaughlin filtering n = 20 0.00 0.10 0.20 0.30 0.40 0.50 0.60 10 20 50 100 200 1000 2000 P@10 Neighbourhood size 0.55 0.70 0.85 1.00 1.15 1.30 1.45 10 20 50 100 200 1000 2000 RMSE Neighbourhood size 0.00001.0000 10 20 50 100 200 1000 2000 P @ 1 0 Neighbourhood sizeAR Similarity filtering Trust filtering Intersection filtering Herlocker filtering McLaughlin filtering n = 80 0.00 0.10 0.20 0.30 0.40 0.50 0.60 10 20 50 100 200 1000 2000 P@10 Neighbourhood size 0.55 0.70 0.85 1.00 1.15 1.30 1.45 10 20 50 100 200 1000 2000 RMSE Neighbourhood size 0.00001.0000 10 20 50 100 200 1000 2000 P @ 1 0 Neighbourhood sizeAR Similarity filtering Trust filtering Intersection filtering Herlocker filtering McLaughlin filtering n = 40 0.00 0.10 0.20 0.30 0.40 0.50 0.60 10 20 50 100 200 1000 2000 P@10 Neighbourhood size 0.55 0.70 0.85 1.00 1.15 1.30 1.45 10 20 50 100 200 1000 2000 RMSE Neighbourhood size 0.00001.0000 10 20 50 100 200 1000 2000 P @ 1 0 Neighbourhood sizeAR Similarity filtering Trust filtering Intersection filtering Herlocker filtering McLaughlin filtering n = 60
  • 18. 18 IRGIR Group @ UAM Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín Open Research Areas in Information Retrieval (OAIR 2013) Conclusions  We compare three approaches based on user preference overlap • Intersection • Herlocker’s weighting • McLaughlin’s weighting  Overlap-based approaches improve accuracy (RMSE) and precision (P@10) • Especially, for small neighbourhoods • Stable with respect to the threshold parameter n
  • 19. 19 IRGIR Group @ UAM Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín Open Research Areas in Information Retrieval (OAIR 2013) Future work  Extend our analysis to other similarity functions • Cosine • Spearman  Exploit other recommendation input dimensions • User logs • Social networks  Understand why the overlap is special to improve other recommendation algorithms
  • 20. 20 IRGIR Group @ UAM Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín Open Research Areas in Information Retrieval (OAIR 2013) Thank you! Improving Memory-Based Collaborative Filtering by Neighbour Selection Based on User Preference Overlap Alejandro Bellogín, Pablo Castells, Iván Cantador @abellogin alejandro.bellogin@cwi.nl
  • 21. 21 IRGIR Group @ UAM Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín Open Research Areas in Information Retrieval (OAIR 2013) User overlap for neighbour selection  Performance comparison: different values for n 0.0000 1.0000 10 20 50 100 200 1000 2000 P@ 10 Neighbourhood size ARSimilarity filtering Intersection filtering Herlocker filtering McLaughlin filtering0.00 0.10 0.20 0.30 0.40 0.50 0.60 10 20 50 100 200 1000 2000 P@10 Neighbourhood size 0.40 0.55 0.70 0.85 1.00 1.15 1.30 1.45 10 20 50 100 200 1000 2000 RMSE Neighbourhood size 0.00001.0000 10 20 50 100 200 1000 2000 P @ 1 0 Neighbourhood sizeAR Similarity filtering Trust filtering Intersection filtering Herlocker filtering McLaughlin filtering n = 100 0.00 0.10 0.20 0.30 0.40 0.50 0.60 10 20 50 100 200 1000 2000 P@10 Neighbourhood size 0.55 0.70 0.85 1.00 1.15 1.30 1.45 10 20 50 100 200 1000 2000 RMSE Neighbourhood size 0.00001.0000 10 20 50 100 200 1000 2000 P @ 1 0 Neighbourhood sizeAR Similarity filtering Trust filtering Intersection filtering Herlocker filtering McLaughlin filtering n = 80 0.00 0.10 0.20 0.30 0.40 0.50 0.60 10 20 50 100 200 1000 2000 P@10 Neighbourhood size 0.55 0.70 0.85 1.00 1.15 1.30 1.45 10 20 50 100 200 1000 2000 RMSE Neighbourhood size 0.00001.0000 10 20 50 100 200 1000 2000 P @ 1 0 Neighbourhood sizeAR Similarity filtering Trust filtering Intersection filtering Herlocker filtering McLaughlin filtering n = 60 0.00 0.10 0.20 0.30 0.40 0.50 0.60 10 20 50 100 200 1000 2000 P@10 Neighbourhood size 0.55 0.70 0.85 1.00 1.15 1.30 1.45 10 20 50 100 200 1000 2000 RMSE Neighbourhood size 0.00001.0000 10 20 50 100 200 1000 2000 P @ 1 0 Neighbourhood sizeAR Similarity filtering Trust filtering Intersection filtering Herlocker filtering McLaughlin filtering n = 40 0.00 0.05 0.10 0.15 0.20 0.25 10 20 50 100 200 1000 2000 P@10 Neighbourhood size 0.55 0.70 0.85 1.00 1.15 1.30 1.45 10 20 50 100 200 1000 2000 RMSE Neighbourhood size 0.00001.0000 10 20 50 100 200 1000 2000 P @ 1 0 Neighbourhood sizeAR Similarity filtering Trust filtering Intersection filtering Herlocker filtering McLaughlin filtering n = 20 0.00 0.05 0.10 0.15 0.20 0.25 10 20 50 100 200 1000 2000 P@10 Neighbourhood size 0.55 0.70 0.85 1.00 1.15 1.30 1.45 10 20 50 100 200 1000 2000 RMSE Neighbourhood size 0.00001.0000 10 20 50 100 200 1000 2000 P @ 1 0 Neighbourhood sizeAR Similarity filtering Trust filtering Intersection filtering Herlocker filtering McLaughlin filtering n = 10
  • 22. 22 IRGIR Group @ UAM Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín Open Research Areas in Information Retrieval (OAIR 2013) Experiments  Dataset: MovieLens 1M • Users: 6,040 • Items: 3,600 • Number of ratings: 1 million • Available in http://www.grouplens.org/node/73
  • 23. 23 IRGIR Group @ UAM Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín Open Research Areas in Information Retrieval (OAIR 2013) Neighbour selection in collaborative filtering  Two main alternatives • Top-k most similar users • Thresholding: those users whose similarity is above a threshold In [Herlocker et al, 2002]: thresholding obtains worse error accuracy  Compatible with • Clustering of users: neighbourhood ~ users in the same cluster [Xue et al, 2005; Bellogin & Parapar, 2013] • Probabilistic relevance models: p(v | Ru) [Submitted to RecSys ‘13] • Rating normalisation [Desrosiers & Karypis, 2012]
  • 24. 24 IRGIR Group @ UAM Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap – Alejandro Bellogín Open Research Areas in Information Retrieval (OAIR 2013) References  [Bellogin & Parapar, 2013] Using Graph Partitioning Techniques for Neighbour Selection in User-Based Collaborative Filtering. RecSys  [Desrosiers & Karypis, 2012] A Comprehensive Survey of Neighborhood-based Recommendation Methods. Recommender Systems Handbook  [Herlocker et al, 2002] An Empirical Analysis of Design Choices in Neighborhood-Based Collaborative Filtering Algorithms. Information Retrieval Journal  [Xue et al, 2005] Scalable Collaborative Filtering Using Cluster-Based Smoothing. SIGIR