Collaborative Filtering
Kira Radinsky
Slides based on material from: William W. Cohen (CMU)
Rate it?
The Dark Star's crew is on a 20-year mission ..but unlike Star Trek... the
nerves of this crew are ... frayed to the point of psychosis. Their captain
has been killed by a radiation leak that also destroyed their toilet paper.
"Don't give me any of that 'Intelligent Life' stuff," says Commander
Doolittle when presented with the possibility of alien life. "Find me
something I can blow up."...
Examples of Collaborative Filtering
• Bestseller lists
• Top 40 music lists
• The “recent returns” shelf at the library
• Unmarked but well-used paths thru the woods
• The printer room at work
• “Read any good books lately?”
• Common insight: personal tastes are correlated:
– If Alice and Bob both like X and Alice likes Y then Bob is
more likely to like Y
– Does it matter if Bob knows Alice?
• Two schools of thought:
– item/item methods
– user/user (neighborhood) methods.
Algorithms for Collaborative Filtering 1:
Memory-Based Algorithms (Breese et al, UAI98)
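• $v_{i,j}$ = vote of user $i$ on item $j$
• $I_i$ = items for which user $i$ has voted
• Mean vote for $i$ is $\bar{v}_i = \frac{1}{|I_i|} \sum_{j \in I_i} v_{i,j}$
• Predicted vote for “active user” $a$ is a weighted sum:
$p_{a,j} = \bar{v}_a + \kappa \sum_{i=1}^{n} w(a,i)\,(v_{i,j} - \bar{v}_i)$
where $w(a,i)$ are the weights of the $n$ similar users and $\kappa$ is a normalizer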
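The weighted-sum prediction can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function names and toy votes are made up, the normalizer is taken to be one over the sum of absolute weights, and only users who actually voted on the item contribute (both are assumptions).

```python
def mean_vote(votes):
    """Mean of one user's votes; `votes` maps item -> vote (its keys are I_i)."""
    return sum(votes.values()) / len(votes)


def predict_vote(active, others, item, weight):
    """Active user's mean plus a normalized, weighted sum of the other
    users' deviations from their own mean votes."""
    total, norm = 0.0, 0.0
    for votes in others:
        if item not in votes:
            continue  # assumption: skip users with no vote on this item
        w = weight(active, votes)
        total += w * (votes[item] - mean_vote(votes))
        norm += abs(w)
    if norm == 0.0:
        return mean_vote(active)  # no usable neighbours: fall back to the mean
    return mean_vote(active) + total / norm  # kappa = 1 / sum of |w|


# Hypothetical toy data; every weight is set to 1 for simplicity.
alice = {"X": 5, "Y": 3}
others = [{"X": 4, "Y": 2, "Z": 5}, {"X": 1, "Y": 3, "Z": 2}]
prediction = predict_vote(alice, others, "Z", lambda a, b: 1.0)
print(round(prediction, 3))
```

Any of the weighting schemes (k-NN, Pearson, cosine) can be passed in as `weight`.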
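• K-nearest neighbor: $w(a,i) = 1$ if $i \in \mathrm{neighbors}(a)$, else $0$
• Pearson correlation coefficient (Resnick ’94, Grouplens), summing over items $j$ that both users voted on:
$w(a,i) = \frac{\sum_j (v_{a,j} - \bar{v}_a)(v_{i,j} - \bar{v}_i)}{\sqrt{\sum_j (v_{a,j} - \bar{v}_a)^2 \, \sum_j (v_{i,j} - \bar{v}_i)^2}}$
• Cosine distance (from IR):
$w(a,i) = \sum_j \frac{v_{a,j}}{\sqrt{\sum_{k \in I_a} v_{a,k}^2}} \cdot \frac{v_{i,j}}{\sqrt{\sum_{k \in I_i} v_{i,k}^2}}$
• Cosine with “inverse user frequency” $f_j = \log(n/n_j)$, where $n$ is the
number of users and $n_j$ is the number of users voting for item $j$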
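A minimal sketch of these weighting schemes, assuming each user is a dict from item to vote. The function names and toy data are hypothetical. The inverse-user-frequency variant is assumed to multiply each vote by the item's factor before taking the cosine.

```python
import math


def pearson(a, b):
    """Pearson correlation, summed over items both users voted on."""
    common = set(a) & set(b)
    mean_a = sum(a.values()) / len(a)
    mean_b = sum(b.values()) / len(b)
    num = sum((a[j] - mean_a) * (b[j] - mean_b) for j in common)
    den = math.sqrt(sum((a[j] - mean_a) ** 2 for j in common)
                    * sum((b[j] - mean_b) ** 2 for j in common))
    return num / den if den else 0.0


def cosine(a, b, item_weight=None):
    """Cosine of the two vote vectors; an item with no vote counts as 0.
    `item_weight` optionally rescales votes per item (e.g. by f_j)."""
    f = item_weight or {}
    dot = sum(f.get(j, 1.0) ** 2 * a[j] * b[j] for j in set(a) & set(b))
    norm_a = math.sqrt(sum((f.get(j, 1.0) * v) ** 2 for j, v in a.items()))
    norm_b = math.sqrt(sum((f.get(j, 1.0) * v) ** 2 for j, v in b.items()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def inverse_user_frequency(users):
    """f_j = log(n / n_j): items that everybody votes on get weight 0."""
    n = len(users)
    counts = {}
    for votes in users:
        for j in votes:
            counts[j] = counts.get(j, 0) + 1
    return {j: math.log(n / n_j) for j, n_j in counts.items()}


# Hypothetical toy votes.
u1 = {"X": 1, "Y": 2, "Z": 3}
u2 = {"X": 2, "Y": 4, "Z": 6}
u3 = {"X": 3, "Y": 2, "Z": 1}
f = inverse_user_frequency([u1, u2, {"X": 1}])
print(pearson(u1, u2), pearson(u1, u3), cosine(u1, u2, f))
```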
• Evaluation:
– split users into train/test sets
– for each user a in the test set:
• split a’s votes into observed (I) and to-predict (P)
• measure average absolute deviation between
predicted and actual votes in P
• predict votes in P, and form a ranked list
• assume (a) utility of k-th item in list is $\max(v_{a,j} - d, 0)$,
where $d$ is a “default vote” (b) probability of reaching
rank k drops exponentially in k. Score a list by its
expected utility $R_a$
– average $R_a$ over all test users
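Both evaluation measures can be sketched as follows. The exponential drop is assumed here to take a half-life form, where `half_life` is the rank at which the chance of being viewed falls to one half. That parameter, the function names, and the toy votes are assumptions for illustration.

```python
def mean_absolute_deviation(predicted, actual):
    """Average |predicted - actual| over the to-predict set P."""
    return sum(abs(predicted[j] - actual[j]) for j in actual) / len(actual)


def expected_utility(ranked_items, actual, default_vote, half_life):
    """R_a: utility max(v - d, 0) of each ranked item, discounted by
    2 ** ((k - 1) / (half_life - 1)) at rank k (half_life must exceed 1)."""
    score = 0.0
    for k, item in enumerate(ranked_items, start=1):
        utility = max(actual[item] - default_vote, 0)
        score += utility / 2 ** ((k - 1) / (half_life - 1))
    return score


# Hypothetical predictions and held-out votes for one test user.
predicted = {"A": 4.0, "B": 2.0, "C": 3.5}
actual = {"A": 5, "B": 2, "C": 4}
ranked = sorted(predicted, key=predicted.get, reverse=True)  # A, C, B
print(mean_absolute_deviation(predicted, actual))
print(expected_utility(ranked, actual, default_vote=3, half_life=2))
```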
[Results table not reproduced; its columns were annotated "soccer score" and "golf score".]
Why are these numbers worse?
Algorithms for Collaborative Filtering 2:
Collaborative + Content Filtering As Classification
(Basu, Hirsh, Cohen, AAAI98)
User    Profile     Airplane   Matrix     Room with a View   ...   Hidalgo
                    (comedy)   (action)   (romance)          ...   (action)
Joe     27,M,70k    1          1          0                        1
Carol   53,F,20k    1                     1                        0
...
Kumar   25,M,22k    1          0          0                        1
Ua      48,M,81k    0          1          ?                  ?     ?
Classification task: map (user,movie) pair into {likes,dislikes}
Training data: known likes/dislikes
Test data: active users
Features: any properties of user/movie pair (U,M)
Examples: genre(U,M), age(U,M), income(U,M),...
• genre(Carol,Matrix) = action
• income(Kumar,Hidalgo) = 22k/year
Examples: usersWhoLikedMovie(U,M):
• usersWhoLikedMovie(Carol,Hidalgo) = {Joe,...,Kumar}
• usersWhoLikedMovie(Ua, Matrix) = {Joe,...}
Examples: moviesLikedByUser(M,U):
• moviesLikedByUser(*,Joe) = {Airplane,Matrix,...,Hidalgo}
• actionMoviesLikedByUser(*,Joe)={Matrix,Hidalgo}
User    Profile     Airplane   Matrix     Room with a View   ...   Hidalgo
                    (comedy)   (action)   (romance)          ...   (action)
Joe     27,M,70k    1          1          0                        1
Carol   53,F,20k    1                     1                        0
...
Kumar   25,M,22k    1          0          0                        1
Ua      48,M,81k    1          1          ?                  ?     ?
genre={romance}, age=48, sex=male, income=81k,
usersWhoLikedMovie={Carol}, moviesLikedByUser={Matrix,Airplane}, ...
genre={action}, age=48, sex=male, income=81k, usersWhoLikedMovie =
{Joe,Kumar}, moviesLikedByUser={Matrix,Airplane},...
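The two feature vectors above can be derived mechanically from the likes table. The sketch below uses only the rows visible in the table (the elided "..." rows are left out), assumes Carol's blank cell is Matrix, and uses hypothetical function names.

```python
# Likes table from the slides (1 = likes, 0 = dislikes); a missing key is unknown.
# Assumption: Carol's missing entry is Matrix.
likes = {
    "Joe":   {"Airplane": 1, "Matrix": 1, "Room with a View": 0, "Hidalgo": 1},
    "Carol": {"Airplane": 1, "Room with a View": 1, "Hidalgo": 0},
    "Kumar": {"Airplane": 1, "Matrix": 0, "Room with a View": 0, "Hidalgo": 1},
    "Ua":    {"Airplane": 1, "Matrix": 1},
}
genre = {"Airplane": "comedy", "Matrix": "action",
         "Room with a View": "romance", "Hidalgo": "action"}
profile = {"Ua": {"age": 48, "sex": "male", "income": "81k"}}


def users_who_liked_movie(user, movie):
    """Collaborative feature: other users who liked this movie."""
    return {u for u, row in likes.items() if u != user and row.get(movie) == 1}


def movies_liked_by_user(user):
    """Collaborative feature: movies this user is known to like."""
    return {m for m, liked in likes[user].items() if liked == 1}


def features(user, movie):
    """Set-valued and scalar features for one (user, movie) pair."""
    return {
        "genre": {genre[movie]},
        **profile[user],
        "usersWhoLikedMovie": users_who_liked_movie(user, movie),
        "moviesLikedByUser": movies_liked_by_user(user),
    }


print(features("Ua", "Room with a View"))
print(features("Ua", "Hidalgo"))
```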
• Classification learning algorithm: rule learning (RIPPER)
• If NakedGun33 1/3 ∈ moviesLikedByUser and Joe ∈ usersWhoLikedMovie
and genre=comedy then predict likes(U,M)
• If age>12 and age<17 and HolyGrail ∈ moviesLikedByUser and
director=MelBrooks then predict likes(U,M)
• If Ishtar ∈ moviesLikedByUser then predict likes(U,M)
• Important difference from memory-based approaches:
• Ripper builds an explicit model of how a user's tastes relate to items,
and to the tastes of other users
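To show what such an explicit model looks like once learned, the three example rules can be written as predicates over a feature dictionary. This is a hand-written sketch: it does not run RIPPER, the feature dictionaries are hypothetical, and `genre` and `director` are simplified to plain strings.

```python
# Each rule is a conjunction of tests on the (user, movie) features.
RULES = [
    lambda f: ("NakedGun33 1/3" in f.get("moviesLikedByUser", set())
               and "Joe" in f.get("usersWhoLikedMovie", set())
               and f.get("genre") == "comedy"),
    lambda f: (12 < f.get("age", 0) < 17
               and "HolyGrail" in f.get("moviesLikedByUser", set())
               and f.get("director") == "MelBrooks"),
    lambda f: "Ishtar" in f.get("moviesLikedByUser", set()),
]


def predict_likes(f):
    """Predict likes(U,M) if any rule fires; otherwise predict dislikes."""
    return any(rule(f) for rule in RULES)


# Hypothetical feature vectors for two (user, movie) pairs.
pair_1 = {"genre": "comedy", "age": 48,
          "moviesLikedByUser": {"NakedGun33 1/3", "Matrix"},
          "usersWhoLikedMovie": {"Joe"}}
pair_2 = {"genre": "action", "age": 48,
          "moviesLikedByUser": {"Matrix"},
          "usersWhoLikedMovie": {"Kumar"}}
print(predict_likes(pair_1), predict_likes(pair_2))
```

Unlike the memory-based methods, nothing here consults the vote table at prediction time; the rules are the model.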



• Evaluation:
– Predict liked(U,M)=“M in top quartile of U’s ranking”
from features, evaluate recall and precision
– Features:
• Collaborative: UsersWhoLikedMovie, UsersWhoDislikedMovie,
MoviesLikedByUser
• Content: Actors, Directors, Genre, MPAA rating, ...
• Hybrid: ComediesLikedByUser, DramasLikedByUser,
UsersWhoLikedFewDramas, ...
• Results: at same level of recall (about 33%)
– Ripper with collaborative features only is worse than
the original MovieRecommender (by about 5 pts
precision – 73 vs 78)
– Ripper with hybrid features is better than
MovieRecommender (by about 5 pts precision)
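The evaluation target and the two measures can be sketched as below. How the top quartile handles ties and short lists is not stated on the slide, so the choice here (top `len // 4` items, at least one, ties broken by name) is an assumption, as are the function names and data.

```python
def top_quartile(ratings):
    """Movies in the top quartile of one user's ranking."""
    ranked = sorted(ratings, key=lambda m: (-ratings[m], m))
    return set(ranked[:max(1, len(ranked) // 4)])


def precision_recall(predicted, actual):
    """Precision and recall of a predicted 'liked' set against the true one."""
    hits = len(predicted & actual)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical ratings for one user and a hypothetical classifier output.
ratings = {"a": 9, "b": 8, "c": 7, "d": 6, "e": 5, "f": 4, "g": 3, "h": 2}
liked = top_quartile(ratings)
precision, recall = precision_recall({"a", "c", "d"}, liked)
print(sorted(liked), precision, recall)
```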
Algorithms for Collaborative Filtering 3: CF
as density estimation (Breese et al, UAI98)
R_ij     Airplane   Matrix   Room with a View   ...   Hidalgo
Joe      9          7        2                  ...   7
Carol    8          ?        9                  ...   ?
...      ...        ...      ...                ...   ...
Kumar    9          3        ?                  ...   6
• Estimate $\Pr(R_{ij}=k)$ for each user i, movie j, and rating k
• Use all available data to build model for this estimator
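• A simple example:
$\forall$ movies $j$, ratings $k$: $\Pr(R_{ij} = k) = \frac{\#(\text{users } i' : R_{i'j} = k)}{\#(\text{users rating } j)}$
Leads to this expected value for unknown $R_{ij}$:
$E[R_{ij}] = \sum_k k \cdot \Pr(R_{ij} = k) = \text{average rating of movie } j$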
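The count-based estimator can be checked on the visible rows of the rating table. A small sketch with hypothetical function names, in which unknown ratings are simply absent from each user's dict:

```python
from collections import Counter

# Visible rows of the rating table; "?" entries are left out.
ratings = {
    "Joe":   {"Airplane": 9, "Matrix": 7, "Room with a View": 2, "Hidalgo": 7},
    "Carol": {"Airplane": 8, "Room with a View": 9},
    "Kumar": {"Airplane": 9, "Matrix": 3, "Hidalgo": 6},
}


def rating_distribution(ratings, movie):
    """Pr(R_ij = k): share of the users rating `movie` who gave rating k."""
    counts = Counter(row[movie] for row in ratings.values() if movie in row)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}


def expected_rating(ratings, movie):
    """E[R_ij] = sum_k k * Pr(R_ij = k), i.e. the movie's average rating."""
    return sum(k * p for k, p in rating_distribution(ratings, movie).items())


print(rating_distribution(ratings, "Airplane"))
print(expected_rating(ratings, "Matrix"))
```

The estimate is the same for every user, which is what the clustered model refines.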




• More complex example:
• Group users into M "clusters": c(1), ..., c(M)
• For movie j,
$\Pr(R_{ij} = k \mid i) = \sum_m \Pr(R_{ij} = k \mid i \in c(m)) \, \Pr(i \in c(m))$
$E[R_{ij}] = \sum_m \Pr(i \in c(m)) \cdot (\text{average rating of } j \text{ in } c(m))$
(estimate by counts)
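A sketch of the mixture computation for one user and one movie, assuming the cluster memberships and per-cluster rating distributions are already available. The numbers and function names are made up for illustration.

```python
def rating_probability(membership, cluster_dists, k):
    """Pr(R_ij = k | i) = sum_m Pr(R_ij = k | i in c(m)) * Pr(i in c(m))."""
    return sum(p * dist.get(k, 0.0) for p, dist in zip(membership, cluster_dists))


def expected_rating(membership, cluster_dists):
    """E[R_ij]: membership-weighted average of the clusters' mean ratings."""
    values = {k for dist in cluster_dists for k in dist}
    return sum(k * rating_probability(membership, cluster_dists, k) for k in values)


# Hypothetical: user i is 75% in cluster 1 and 25% in cluster 2.
membership = [0.75, 0.25]
# Per-cluster distribution over ratings of movie j (as estimated by counts).
cluster_dists = [{5: 0.5, 4: 0.5}, {1: 1.0}]
print(rating_probability(membership, cluster_dists, 5))
print(expected_rating(membership, cluster_dists))
```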
CF as density estimation: BC
(Breese et al, UAI98)
• Group users into clusters using Expectation-Maximization:
• Randomly initialize Pr(Rm,j=k) for each m
(i.e., initialize the clusters differently somehow)
• E-Step: Estimate Pr(user i in cluster m) for each i,m
• M-Step: Find maximum likelihood (ML) estimator for Rij within each
cluster m
• Use ratio of #(users i in cluster m with rating Rij=k) to #(user i in
cluster m ), weighted by Pr(i in m) from E-step
• Repeat E-step, M-step until convergence
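The EM loop above can be sketched in plain Python. Several details are assumptions rather than part of the slide: a cluster mixing weight `prior`, a small smoothing constant to avoid zero probabilities, a fixed number of iterations in place of a convergence test, and counting only users who rated a movie in the M-step denominator. All names are hypothetical.

```python
import math
import random


def em_clusters(ratings, movies, values, n_clusters, n_iter=50, seed=0):
    rng = random.Random(seed)
    users = list(ratings)
    eps = 1e-3  # smoothing constant (assumption)
    # Randomly initialize Pr(R_{m,j} = k) for each cluster m and movie j.
    theta = []
    for _ in range(n_clusters):
        cluster = {}
        for j in movies:
            raw = [rng.random() + 0.1 for _ in values]
            cluster[j] = {k: x / sum(raw) for k, x in zip(values, raw)}
        theta.append(cluster)
    prior = [1.0 / n_clusters] * n_clusters
    resp = {}
    for _ in range(n_iter):
        # E-step: Pr(user i in cluster m), from the ratings user i gave.
        for i in users:
            logp = [
                math.log(prior[m])
                + sum(math.log(theta[m][j][k]) for j, k in ratings[i].items())
                for m in range(n_clusters)
            ]
            top = max(logp)
            w = [math.exp(x - top) for x in logp]
            resp[i] = [x / sum(w) for x in w]
        # M-step: ratio of weighted counts within each cluster.
        for m in range(n_clusters):
            mass = sum(resp[i][m] for i in users)
            prior[m] = (mass + eps) / (len(users) + eps * n_clusters)
            for j in movies:
                den = sum(resp[i][m] for i in users if j in ratings[i])
                for k in values:
                    num = sum(resp[i][m] for i in users if ratings[i].get(j) == k)
                    theta[m][j][k] = (num + eps) / (den + eps * len(values))
    return resp, theta


# Hypothetical toy data: two pairs of users with opposite tastes.
ratings = {"A": {"x": 1, "y": 0}, "B": {"x": 1, "y": 0},
           "C": {"x": 0, "y": 1}, "D": {"x": 0, "y": 1}}
resp, theta = em_clusters(ratings, ["x", "y"], [0, 1], n_clusters=2)
best = {u: max(range(2), key=lambda m: resp[u][m]) for u in resp}
print(best)
```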
CF as density estimation: BN
(Breese et al, UAI98)
• BC assumes movie ratings within a cluster are independent.
• Bayes Network approach allows dependencies between ratings, but does not
cluster. (Networks are constructed using greedy search.)
[Figure: example Bayes network; node labels ID4, MIB, Juno, Jumper, 10kBC]
Algorithms for Collaborative Filtering 3: Results
(Breese et al, UAI98)
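[Results tables not reproduced; columns were annotated "soccer score" and "golf score".]
Datasets are different...
• fewer items to recommend
• fewer votes/user
Results on MS Web & Nielsen's
[Results tables not reproduced; both were annotated "soccer score".]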
Algorithms for Collaborative Filtering 4:
Personality Diagnosis (Pennock et al, UAI 2000)
• Collaborative Filtering by Personality Diagnosis: A Hybrid Memory- and
Model-Based Approach, Pennock, Horvitz, Lawrence & Giles, UAI 2000
• Basic ideas:
– assume Gaussian noise applied to all ratings
– treat each user as a separate cluster m
– Pr(user a in cluster i) = w(a,i), with
$w(a,i) \propto \prod_j \Pr(R_{aj} \mid R_{ij}), \qquad \Pr(R_{aj} \mid R_{ij}) = \frac{1}{Z} e^{-(R_{aj} - R_{ij})^2 / 2\sigma^2}$
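A sketch of the personality-diagnosis idea: each other user is a one-member cluster, weighted by a Gaussian likelihood of the active user's ratings, and a rating is predicted as the most probable value under the resulting mixture. Restricting the product to co-rated items, the default `sigma`, and all names and data are assumptions.

```python
import math


def pd_weights(active, others, sigma=1.0):
    """w(a,i): Gaussian likelihood of a's ratings given user i's ratings,
    over co-rated items, normalized to sum to 1 across users."""
    raw = {}
    for name, votes in others.items():
        sq = sum((active[j] - votes[j]) ** 2 for j in active if j in votes)
        raw[name] = math.exp(-sq / (2 * sigma ** 2))
    z = sum(raw.values())
    return {name: w / z for name, w in raw.items()}


def pd_predict(active, others, item, values, sigma=1.0):
    """Most probable rating x, scoring each x by
    sum_i w(a,i) * exp(-(x - R_ij)^2 / (2 sigma^2))."""
    weights = pd_weights(active, others, sigma)

    def score(x):
        return sum(weights[name] * math.exp(-(x - votes[item]) ** 2 / (2 * sigma ** 2))
                   for name, votes in others.items() if item in votes)

    return max(values, key=score)


# Hypothetical users on a 1-5 rating scale.
active = {"X": 5, "Y": 1}
others = {"same": {"X": 5, "Y": 1, "Z": 4},
          "opposite": {"X": 1, "Y": 5, "Z": 1}}
weights = pd_weights(active, others)
print(weights)
print(pd_predict(active, others, "Z", values=[1, 2, 3, 4, 5]))
```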
• Evaluation (EachMovie, following Breese et al): [results table not reproduced]
Summary
[Figure: systems placed along two axes, collaborative/social vs. content-based and memory-based vs. model-based. Systems shown: LIBRA, LIBRA-NR, MovieRecommender, VSIM, CR, BN, BC, PD, RIPPER, RIPPER + hybrid features, k-NN paper rec. as matching, RankBoost (many rounds), RankBoost (k rounds), music rec. with web pages (k-NN), music rec. with web pages (XDB).]
Other issues, not addressed here
(see lecture)
• We present basic algorithms (10+ years ago)
• Over the last decade there have been many advances,
including algebraic methods (e.g. Matrix Factorization).
• Efficiency issues—how to handle a large community?
• What do we measure when we evaluate CF?
– Predicting actual rating may be useless!
– Example: music recommendations:
• Beatles, Eric Clapton, Stones, Elton John, Led Zep, the Who, ...
– What's useful and new? For this we need a model of the user's prior
knowledge, not just their tastes.
• Subjectively better recs result from “poor” distance metrics

Tutorial 14 (collaborative filtering)

  • 1. Collaborative Filtering Kira Radinsky Slides based on material from: William W. Cohen (CMU)
  • 2. Rate it? The Dark Star's crew is on a 20-year mission ..but unlike Star Trek... the nerves of this crew are ... frayed to the point of psychosis. Their captain has been killed by a radiation leak that also destroyed their toilet paper. "Don't give me any of that 'Intelligent Life' stuff," says Commander Doolittle when presented with the possibility of alien life. "Find me something I can blow up."...
  • 3. Examples of Collaborative Filtering • Bestseller lists • Top 40 music lists • The “recent returns” shelf at the library • Unmarked but well-used paths thru the woods • The printer room at work • “Read any good books lately?” • Common insight: personal tastes are correlated: – If Alice and Bob both like X and Alice likes Y then Bob is more likely to like Y – Does it matter if Bob knows Alice? • Two schools of thought: – item/item methods – user/user (neighborhood) methods.
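  • 4. Algorithms for Collaborative Filtering 1: Memory-Based Algorithms (Breese et al, UAI98) • $v_{i,j}$ = vote of user i on item j • $I_i$ = items for which user i has voted • Mean vote for i is $\bar{v}_i = \frac{1}{|I_i|} \sum_{j \in I_i} v_{i,j}$ • Predicted vote for “active user” a is a weighted sum: $p_{a,j} = \bar{v}_a + \kappa \sum_{i=1}^{n} w(a,i)\,(v_{i,j} - \bar{v}_i)$, where the $w(a,i)$ are the weights of the n similar users and $\kappa$ is a normalizer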
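The weighted-sum prediction on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the toy `votes` and `weights` are made up, and the normalizer is assumed to be one over the sum of absolute weights (one common choice).

```python
def mean_vote(votes, user):
    """Mean of all votes cast by `user` (the slide's mean vote for i)."""
    ratings = votes[user]
    return sum(ratings.values()) / len(ratings)


def predict_vote(votes, weights, active, item):
    """Mean vote of the active user plus a normalized, weighted sum of
    the other users' deviations from their own mean votes."""
    neighbours = [u for u in weights if u != active and item in votes[u]]
    norm = sum(abs(weights[u]) for u in neighbours)
    if norm == 0:
        # No usable neighbour: fall back to the active user's mean vote.
        return mean_vote(votes, active)
    kappa = 1.0 / norm  # assumed normalizer: absolute weights sum to one
    offset = sum(
        weights[u] * (votes[u][item] - mean_vote(votes, u)) for u in neighbours
    )
    return mean_vote(votes, active) + kappa * offset


# Hypothetical toy data: user -> {item: vote}.
votes = {
    "alice": {"x": 4, "y": 2},
    "bob": {"x": 5, "y": 3, "z": 4},
    "carol": {"x": 1, "y": 5, "z": 2},
}
# Hypothetical weights w(alice, i); in practice they come from a similarity measure.
weights = {"bob": 1.0, "carol": -1.0}
prediction = predict_vote(votes, weights, "alice", "z")
print(prediction)
```

A negative weight turns a dissimilar user's below-average vote into a positive contribution, which is why carol's low vote on `z` raises the prediction for alice.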
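  • 5. Algorithms for Collaborative Filtering 1: Memory-Based Algorithms (Breese et al, UAI98) • K-nearest neighbor: $w(a,i) = 1$ if $i \in \mathrm{neighbors}(a)$, else 0 • Pearson correlation coefficient (Resnick ’94, Grouplens): $w(a,i) = \frac{\sum_j (v_{a,j} - \bar{v}_a)(v_{i,j} - \bar{v}_i)}{\sqrt{\sum_j (v_{a,j} - \bar{v}_a)^2 \, \sum_j (v_{i,j} - \bar{v}_i)^2}}$, summing over the items j voted on by both a and i • Cosine distance (from IR): $w(a,i) = \sum_j \frac{v_{a,j}}{\sqrt{\sum_{k \in I_a} v_{a,k}^2}} \cdot \frac{v_{i,j}}{\sqrt{\sum_{k \in I_i} v_{i,k}^2}}$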
  • 6. Algorithms for Collaborative Filtering 1: Memory-Based Algorithms (Breese et al, UAI98) • Cosine with “inverse user frequency” $f_j = \log(n/n_j)$, where n is the number of users and $n_j$ is the number of users voting for item j
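The similarity weights named on the last two slides can be sketched as follows. This is an illustrative sketch under assumptions: the toy `votes` are invented, the Pearson means are taken over all of a user's votes, and inverse user frequency is applied by multiplying each vote by the frequency factor of its item before taking the cosine.

```python
import math


def pearson_weight(votes, a, i):
    """Pearson correlation between users a and i over co-voted items."""
    common = set(votes[a]) & set(votes[i])
    if not common:
        return 0.0
    mean_a = sum(votes[a].values()) / len(votes[a])
    mean_i = sum(votes[i].values()) / len(votes[i])
    num = sum((votes[a][j] - mean_a) * (votes[i][j] - mean_i) for j in common)
    den = math.sqrt(
        sum((votes[a][j] - mean_a) ** 2 for j in common)
        * sum((votes[i][j] - mean_i) ** 2 for j in common)
    )
    return num / den if den else 0.0


def inverse_user_frequency(votes):
    """log(n / n_j) per item: n users in total, n_j of them voted on item j."""
    n = len(votes)
    counts = {}
    for ratings in votes.values():
        for j in ratings:
            counts[j] = counts.get(j, 0) + 1
    return {j: math.log(n / n_j) for j, n_j in counts.items()}


def cosine_weight(votes, a, i, freq=None):
    """Cosine of the two vote vectors, optionally scaled by item frequency."""
    def scaled(u):
        return {j: v * (freq[j] if freq else 1.0) for j, v in votes[u].items()}

    va, vi = scaled(a), scaled(i)
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_i = math.sqrt(sum(v * v for v in vi.values()))
    if norm_a == 0 or norm_i == 0:
        return 0.0
    return sum(va[j] * vi[j] for j in set(va) & set(vi)) / (norm_a * norm_i)


# Hypothetical toy data: user -> {item: vote}.
votes = {
    "alice": {"x": 4, "y": 2},
    "bob": {"x": 5, "y": 3, "z": 4},
    "carol": {"x": 1, "y": 5, "z": 2},
}
freq = inverse_user_frequency(votes)
print(pearson_weight(votes, "alice", "bob"))
print(cosine_weight(votes, "alice", "bob"))
print(cosine_weight(votes, "alice", "bob", freq))
```

In this toy data every user voted on `x` and `y`, so both items get frequency factor zero and the frequency-weighted cosine for alice collapses to zero: items that everyone rates carry no discriminating information.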
  • 7. Algorithms for Collaborative Filtering 1: Memory-Based Algorithms (Breese et al, UAI98) • Evaluation: – split users into train/test sets – for each user a in the test set: • split a’s votes into observed (I) and to-predict (P) • measure average absolute deviation between predicted and actual votes in P • predict votes in P, and form a ranked list • assume (a) utility of k-th item in list is $\max(v_{a,j} - d, 0)$, where d is a “default vote”, and (b) probability of reaching rank k drops exponentially in k. Score a list by its expected utility $R_a$ – average $R_a$ over all test users
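Both evaluation measures on this slide are easy to sketch. The code below is an illustration under assumptions: the slide only says that the probability of reaching rank k drops exponentially, so the `half_life` parameterisation (the rank at which that probability has halved) and all toy numbers are assumptions made here.

```python
def mean_absolute_deviation(predicted, actual):
    """Average |predicted - actual| over the to-predict votes P."""
    return sum(abs(predicted[j] - actual[j]) for j in actual) / len(actual)


def expected_utility(ranked_items, actual, default_vote, half_life):
    """Score a ranked list: utility max(v - d, 0) of the k-th item,
    discounted by an exponentially decaying chance of reaching rank k.
    `half_life` must be greater than 1."""
    total = 0.0
    for k, item in enumerate(ranked_items, start=1):
        utility = max(actual.get(item, default_vote) - default_vote, 0)
        total += utility / 2 ** ((k - 1) / (half_life - 1))
    return total


# Hypothetical held-out votes for one test user and the model's predictions.
actual = {"x": 5, "y": 1, "z": 4}
predicted = {"x": 4.5, "y": 2.0, "z": 4.0}

mad = mean_absolute_deviation(predicted, actual)
ranking = sorted(predicted, key=predicted.get, reverse=True)
score = expected_utility(ranking, actual, default_vote=3, half_life=2)
print(mad, ranking, score)
```

The first measure is an error, so lower is better; the second is a utility, so higher is better.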
  • 8. Algorithms for Collaborative Filtering 1: Memory-Based Algorithms (Breese et al, UAI98) [Results table not recovered; annotations: “soccer score”, “golf score”] Why are these numbers worse?
  • 9. Algorithms for Collaborative Filtering 2: Collaborative + Content Filtering As Classification (Basu, Hirsh, Cohen, AAAI98) • Ratings table, columns: Airplane (comedy), Matrix (action), Room with a View (romance), ..., Hidalgo (action) • Rows (user; age, sex, income; ratings as extracted): Joe; 27, M, 70k; 1 1 0 1 • Carol; 53, F, 20k; 1 1 0 ... • Kumar; 25, M, 22k; 1 0 0 1 • Ua; 48, M, 81k; 0 1 ? ? ? • Classification task: map (user,movie) pair into {likes,dislikes} • Training data: known likes/dislikes • Test data: active users • Features: any properties of user/movie pair
  • 10. Algorithms for Collaborative Filtering 2: Collaborative + Content Filtering As Classification (Basu, Hirsh, Cohen, AAAI98) [ratings table repeated from slide 9] • Features: any properties of user/movie pair (U,M) • Examples: genre(U,M), age(U,M), income(U,M),... • genre(Carol,Matrix) = action • income(Kumar,Hidalgo) = 22k/year
  • 11. Algorithms for Collaborative Filtering 2: Collaborative + Content Filtering As Classification (Basu, Hirsh, Cohen, AAAI98) [ratings table repeated from slide 9] • Features: any properties of user/movie pair (U,M) • Examples: usersWhoLikedMovie(U,M): • usersWhoLikedMovie(Carol,Hidalgo) = {Joe,...,Kumar} • usersWhoLikedMovie(Ua, Matrix) = {Joe,...}
  • 12. Algorithms for Collaborative Filtering 2: Collaborative + Content Filtering As Classification (Basu, Hirsh, Cohen, AAAI98) [ratings table repeated from slide 9] • Features: any properties of user/movie pair (U,M) • Examples: moviesLikedByUser(M,U): • moviesLikedByUser(*,Joe) = {Airplane,Matrix,...,Hidalgo} • actionMoviesLikedByUser(*,Joe) = {Matrix,Hidalgo}
  • 13. Algorithms for Collaborative Filtering 2: Collaborative + Content Filtering As Classification (Basu, Hirsh, Cohen, AAAI98) [ratings table repeated from slide 9, except that Ua's row now reads 1 1 ? ? ?] • Features: any properties of user/movie pair (U,M) • genre={romance}, age=48, sex=male, income=81k, usersWhoLikedMovie={Carol}, moviesLikedByUser={Matrix,Airplane}, ...
  • 14. Algorithms for Collaborative Filtering 2: Collaborative + Content Filtering As Classification (Basu, Hirsh, Cohen, AAAI98) [ratings table repeated from slide 9, except that Ua's row now reads 1 1 ? ? ?] • genre={romance}, age=48, sex=male, income=81k, usersWhoLikedMovie={Carol}, moviesLikedByUser={Matrix,Airplane}, ... • genre={action}, age=48, sex=male, income=81k, usersWhoLikedMovie={Joe,Kumar}, moviesLikedByUser={Matrix,Airplane}, ...
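The set-valued features for a (user, movie) pair can be built directly from a likes table. The sketch below is only an illustration: the function names mirror the slide's feature names, but the `likes` data is a hypothetical table loosely modelled on the slides (Carol's row in particular is a guess, because the extracted table is ambiguous).

```python
def users_who_liked_movie(likes, movie, exclude=None):
    """Set of users (other than `exclude`) who liked `movie`."""
    return {u for u, seen in likes.items() if seen.get(movie) == 1 and u != exclude}


def movies_liked_by_user(likes, user):
    """Set of movies that `user` is known to like."""
    return {m for m, liked in likes[user].items() if liked == 1}


def features(likes, profiles, genres, user, movie):
    """Feature dictionary for the pair (user, movie)."""
    age, sex, income = profiles[user]
    return {
        "genre": {genres[movie]},
        "age": age,
        "sex": sex,
        "income": income,
        "usersWhoLikedMovie": users_who_liked_movie(likes, movie, exclude=user),
        "moviesLikedByUser": movies_liked_by_user(likes, user),
    }


# Hypothetical data, not a transcription of the slide's table.
genres = {"Airplane": "comedy", "Matrix": "action",
          "Room with a View": "romance", "Hidalgo": "action"}
profiles = {"Joe": (27, "M", "70k"), "Carol": (53, "F", "20k"),
            "Kumar": (25, "M", "22k"), "Ua": (48, "M", "81k")}
likes = {
    "Joe": {"Airplane": 1, "Matrix": 1, "Room with a View": 0, "Hidalgo": 1},
    "Carol": {"Airplane": 1, "Room with a View": 1},
    "Kumar": {"Airplane": 1, "Matrix": 0, "Room with a View": 0, "Hidalgo": 1},
    "Ua": {"Airplane": 1, "Matrix": 1},
}
example = features(likes, profiles, genres, "Ua", "Hidalgo")
print(example)
```

Movies missing from a user's dictionary play the role of the "?" entries: they are the pairs a classifier would be asked to label.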
  • 15. Algorithms for Collaborative Filtering 2: Collaborative + Content Filtering As Classification (Basu, Hirsh, Cohen, AAAI98) • genre={romance}, age=48, sex=male, income=81k, usersWhoLikedMovie={Carol}, moviesLikedByUser={Matrix,Airplane}, ... • genre={action}, age=48, sex=male, income=81k, usersWhoLikedMovie={Joe,Kumar}, moviesLikedByUser={Matrix,Airplane}, ... • Classification learning algorithm: rule learning (RIPPER) • If NakedGun33/13 ∈ moviesLikedByUser and Joe ∈ usersWhoLikedMovie and genre=comedy then predict likes(U,M) • If age>12 and age<17 and HolyGrail ∈ moviesLikedByUser and director=MelBrooks then predict likes(U,M) • If Ishtar ∈ moviesLikedByUser then predict likes(U,M)
  • 16. Algorithms for Collaborative Filtering 2: Collaborative + Content Filtering As Classification (Basu, Hirsh, Cohen, AAAI98) • Classification learning algorithm: rule learning (RIPPER) • If NakedGun33/13 ∈ moviesLikedByUser and Joe ∈ usersWhoLikedMovie and genre=comedy then predict likes(U,M) • If age>12 and age<17 and HolyGrail ∈ moviesLikedByUser and director=MelBrooks then predict likes(U,M) • If Ishtar ∈ moviesLikedByUser then predict likes(U,M) • Important difference from memory-based approaches: • again, Ripper builds an explicit model of how a user’s tastes relate to items, and to the tastes of other users
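Once rules of this form have been learned, applying them is just a matter of testing set membership on the feature dictionary. The sketch below hard-codes the three example rules from the slide as Python predicates; it does not implement RIPPER's rule learning, and the two feature dictionaries are hypothetical.

```python
# Each rule is a predicate over a feature dictionary; a pair is predicted
# "likes" if any rule fires, which is how a learned rule set is applied.
RULES = [
    lambda f: ("NakedGun33/13" in f["moviesLikedByUser"]
               and "Joe" in f["usersWhoLikedMovie"]
               and "comedy" in f["genre"]),
    lambda f: (12 < f["age"] < 17
               and "HolyGrail" in f["moviesLikedByUser"]
               and f.get("director") == "MelBrooks"),
    lambda f: "Ishtar" in f["moviesLikedByUser"],
]


def predict_likes(feature_dict):
    """True if at least one rule covers the (user, movie) pair."""
    return any(rule(feature_dict) for rule in RULES)


# Hypothetical feature dictionaries for two (user, movie) pairs.
covered = {
    "genre": {"comedy"}, "age": 48,
    "usersWhoLikedMovie": {"Joe"},
    "moviesLikedByUser": {"NakedGun33/13"},
}
not_covered = {
    "genre": {"romance"}, "age": 48,
    "usersWhoLikedMovie": {"Carol"},
    "moviesLikedByUser": {"Matrix", "Airplane"},
}
print(predict_likes(covered), predict_likes(not_covered))
```

Unlike the memory-based methods, nothing here consults the ratings table at prediction time: the rules are the model.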
  • 17. Algorithms for Collaborative Filtering 2: Collaborative + Content Filtering As Classification (Basu, Hirsh, Cohen, AAAI98) • Evaluation: – Predict liked(U,M)=“M in top quartile of U’s ranking” from features, evaluate recall and precision – Features: • Collaborative: UsersWhoLikedMovie, UsersWhoDislikedMovie, MoviesLikedByUser • Content: Actors, Directors, Genre, MPAA rating, ... • Hybrid: ComediesLikedByUser, DramasLikedByUser, UsersWhoLikedFewDramas, ... • Results: at same level of recall (about 33%) – Ripper with collaborative features only is worse than the original MovieRecommender (by about 5 pts precision: 73 vs 78) – Ripper with hybrid features is better than MovieRecommender (by about 5 pts precision)
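The evaluation target and the two measures can be sketched as below. This is a rough illustration: the slide does not say how ties or lists shorter than four items are handled, so the choices here (rank by rating, keep at least one movie) are assumptions, as are the toy ratings.

```python
def top_quartile(ratings):
    """Movies in the top quarter of a user's ranking (at least one movie)."""
    ranked = sorted(ratings, key=ratings.get, reverse=True)
    return set(ranked[: max(1, len(ranked) // 4)])


def precision_recall(predicted, actual):
    """Precision and recall of a predicted 'liked' set against the true one."""
    hits = len(predicted & actual)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical ratings of one user, and a classifier's predicted 'liked' set.
ratings = {"a": 9, "b": 8, "c": 7, "d": 6, "e": 5, "f": 4, "g": 3, "h": 2}
liked = top_quartile(ratings)
precision, recall = precision_recall({"a", "c"}, liked)
print(liked, precision, recall)
```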
  • 18. Algorithms for Collaborative Filtering 3: CF as density estimation (Breese et al, UAI98) • Ratings $R_{ij}$, columns: Airplane, Matrix, Room with a View, ..., Hidalgo • Joe: 9 7 2 ... 7 • Carol: 8 ? 9 ... ? • ... • Kumar: 9 3 ? ... 6 • Estimate $\Pr(R_{ij}=k)$ for each user i, movie j, and rating k • Use all available data to build model for this estimator
  • 19. Algorithms for Collaborative Filtering 3: CF as density estimation (Breese et al, UAI98) • Estimate $\Pr(R_{ij}=k)$ for each user i, movie j, and rating k • Use all available data to build model for this estimator • A simple example: for all movies j, $\Pr(R_{ij}=k) = \frac{\#(\text{users } i' : R_{i'j}=k)}{\#(\text{users rating } j)}$ • Leads to this expected value for unknown $R_{ij}$: $E[R_{ij}] = \sum_k k \cdot \Pr(R_{ij}=k) = \text{average rating of movie } j$
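The simple count-based estimator is short enough to write out. In this sketch the ratings are the ones legible in the table on the previous slide (unknown entries are simply omitted); the function names are made up for illustration.

```python
from collections import Counter


def rating_distribution(ratings, movie):
    """Pr(R_ij = k): fraction of the users rating `movie` who gave rating k."""
    observed = [r[movie] for r in ratings.values() if movie in r]
    if not observed:
        return {}
    counts = Counter(observed)
    return {k: c / len(observed) for k, c in counts.items()}


def expected_rating(ratings, movie):
    """E[R_ij] = sum_k k * Pr(R_ij = k), i.e. the movie's average rating."""
    return sum(k * p for k, p in rating_distribution(ratings, movie).items())


# Known ratings from the example table; '?' entries are left out.
ratings = {
    "Joe": {"Airplane": 9, "Matrix": 7, "Room with a View": 2, "Hidalgo": 7},
    "Carol": {"Airplane": 8, "Room with a View": 9},
    "Kumar": {"Airplane": 9, "Matrix": 3, "Hidalgo": 6},
}
print(rating_distribution(ratings, "Matrix"))
print(expected_rating(ratings, "Matrix"))
```

Because the estimate ignores who the user is, every user gets the same prediction for a given movie, which is what the cluster-based model on the next slide improves on.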
  • 20. Algorithms for Collaborative Filtering 3: CF as density estimation (Breese et al, UAI98) • Estimate $\Pr(R_{ij}=k)$ for each user i, movie j, and rating k • Use all available data to build model for this estimator • More complex example: • Group users into M “clusters”: c(1), ..., c(M) • For movie j, $\Pr(R_{ij}=k \mid i) = \sum_m \Pr(R_{ij}=k \mid i \in c(m)) \, \Pr(i \in c(m))$, with the cluster-conditional term estimated by counts • $E[R_{ij}] = \sum_m \Pr(i \in c(m)) \cdot (\text{average rating of } j \text{ in } c(m))$
  • 21. CF as density estimation: BC (Breese et al, UAI98) • Group users into clusters using Expectation-Maximization: • Randomly initialize $\Pr(R_{m,j}=k)$ for each m (i.e., initialize the clusters differently somehow) • E-Step: Estimate Pr(user i in cluster m) for each i,m • M-Step: Find maximum likelihood (ML) estimator for $R_{ij}$ within each cluster m • Use ratio of #(users i in cluster m with rating $R_{ij}=k$) to #(users i in cluster m), weighted by Pr(i in m) from E-step • Repeat E-step, M-step until convergence
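The E-step/M-step loop above, together with the cluster-weighted expectation from the previous slide, can be sketched in plain Python. This is a minimal sketch under assumptions, not the procedure of Breese et al.: it runs a fixed number of iterations instead of testing convergence, adds a small smoothing count so no probability becomes zero, also re-estimates the cluster priors, and uses invented toy ratings.

```python
import random


def e_step(ratings, prior, theta):
    """Pr(user i in cluster m) for every user, given current parameters."""
    posteriors = []
    for user in ratings:
        scores = []
        for m, p in enumerate(prior):
            for movie, k in user.items():
                p *= theta[m][movie][k]  # ratings independent within a cluster
            scores.append(p)
        total = sum(scores)
        posteriors.append([s / total for s in scores])
    return posteriors


def m_step(ratings, posteriors, movies, values, n_clusters, smooth=0.01):
    """Re-estimate priors and Pr(R_mj = k) from posterior-weighted counts."""
    prior = [sum(post[m] for post in posteriors) / len(ratings)
             for m in range(n_clusters)]
    theta = []
    for m in range(n_clusters):
        table = {}
        for movie in movies:
            counts = {k: smooth for k in values}
            for user, post in zip(ratings, posteriors):
                if movie in user:
                    counts[user[movie]] += post[m]
            total = sum(counts.values())
            table[movie] = {k: c / total for k, c in counts.items()}
        theta.append(table)
    return prior, theta


def fit(ratings, movies, values, n_clusters, iterations=50, seed=0):
    """Random initialization, then alternate E-step and M-step."""
    rng = random.Random(seed)
    prior = [1.0 / n_clusters] * n_clusters
    theta = []
    for _ in range(n_clusters):
        table = {}
        for movie in movies:
            raw = {k: rng.random() + 0.1 for k in values}
            total = sum(raw.values())
            table[movie] = {k: r / total for k, r in raw.items()}
        theta.append(table)
    for _ in range(iterations):
        posteriors = e_step(ratings, prior, theta)
        prior, theta = m_step(ratings, posteriors, movies, values, n_clusters)
    return prior, theta, e_step(ratings, prior, theta)


def expected_rating(posterior, theta, movie):
    """Sum over clusters of Pr(i in c(m)) times the cluster's mean rating."""
    return sum(p * sum(k * q for k, q in theta[m][movie].items())
               for m, p in enumerate(posterior))


# Hypothetical like/dislike data; the last user has not rated movie "B".
ratings = [{"A": 1, "B": 0}, {"A": 1, "B": 0},
           {"A": 0, "B": 1}, {"A": 0, "B": 1}, {"A": 1}]
prior, theta, posteriors = fit(ratings, ["A", "B"], [0, 1], n_clusters=2)
predicted = expected_rating(posteriors[-1], theta, "B")
print(prior, predicted)
```

Multiplying raw probabilities is fine for a toy example; with many ratings per user one would sum log-probabilities instead to avoid underflow.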
  • 22. CF as density estimation: BC (Breese et al, UAI98)
  • 23. CF as density estimation: BN (Breese et al, UAI98) • BC assumes movie ratings within a cluster are independent. • Bayes Network approach allows dependencies between ratings, but does not cluster. (Networks are constructed using greedy search.) [Figure not recovered; node labels: ID4, MIB, Juno, Jumper, 10kBC]
  • 24. Algorithms for Collaborative Filtering 3: Results (Breese et al, UAI98) [Results table not recovered; annotations: “soccer score”, “golf score”]
  • 25. Datasets are different... • fewer items to recommend • fewer votes/user
  • 26. Results on MS Web & Nielsen’s [Results tables not recovered; both annotated “soccer score”]
  • 27. Algorithms for Collaborative Filtering 4: Personality Diagnosis (Pennock et al, UAI 2000) • Collaborative Filtering by Personality Diagnosis: A Hybrid Memory- and Model-Based Approach, Pennock, Horvitz, Lawrence & Giles, UAI 2000 • Basic ideas: – assume Gaussian noise applied to all ratings – treat each user as a separate cluster m – Pr(user a in cluster i) = w(a,i), with $w(a,i) = \prod_j \Pr(R_{aj} \mid R_{mj}) = \prod_j \frac{1}{Z} e^{-(R_{aj} - R_{ij})^2 / 2\sigma^2}$, where cluster m consists of the single user i
  • 28. Algorithms for Collaborative Filtering 4: Personality Diagnosis (Pennock et al, UAI 2000) • Evaluation (EachMovie, following Breese et al): [results not recovered]
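A small sketch of the personality-diagnosis idea follows. It is an illustration under assumptions rather than the authors' implementation: every other user is given equal prior weight, the noise width `sigma` is set arbitrarily to 1, the constant 1/Z is dropped because it cancels on normalization, and the ratings are the legible entries of the example table from the density-estimation slides.

```python
import math


def pd_weight(ratings, a, i, sigma=1.0):
    """Unnormalized w(a, i): product over co-rated movies of Gaussian
    likelihoods of a's ratings given that i's ratings are the 'true' ones."""
    common = set(ratings[a]) & set(ratings[i])
    squared = sum((ratings[a][j] - ratings[i][j]) ** 2 for j in common)
    return math.exp(-squared / (2 * sigma ** 2))


def pd_predict(ratings, a, movie, values, sigma=1.0):
    """Distribution over user a's rating of `movie`."""
    scores = {k: 0.0 for k in values}
    for i in ratings:
        if i == a or movie not in ratings[i]:
            continue
        w = pd_weight(ratings, a, i, sigma)
        for k in values:
            # Chance that a reports k if a shares user i's 'personality'.
            scores[k] += w * math.exp(-(k - ratings[i][movie]) ** 2 / (2 * sigma ** 2))
    total = sum(scores.values())
    return {k: s / total for k, s in scores.items()} if total else scores


ratings = {
    "Joe": {"Airplane": 9, "Matrix": 7, "Room with a View": 2, "Hidalgo": 7},
    "Carol": {"Airplane": 8, "Room with a View": 9},
    "Kumar": {"Airplane": 9, "Matrix": 3, "Hidalgo": 6},
}
distribution = pd_predict(ratings, "Carol", "Matrix", values=range(1, 11))
best = max(distribution, key=distribution.get)
print(best)
```

Carol disagrees strongly with Joe on Room with a View, so Joe's weight is negligible and the prediction for Matrix follows Kumar's rating.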
  • 29. Summary [Chart not recovered; axes: collaborative/social vs. content-based, and memory-based vs. model-based; labels as extracted: LIBRA, LIBRA-NR, MovieRecommender, VSIM, CR, BN, BC, PD, RIPPER + hybrid features, k-NN paper rec. as matching, RankBoost (many rounds), RIPPER, RankBoost (k rounds), music rec. with web pages (k-NN), music rec. with web pages (XDB)]
  • 30. Other issues, not addressed here (see lecture) • We present basic algorithms (from 10+ years ago) • Over the last decade there have been many advances, including algebraic methods (e.g. Matrix Factorization). • Efficiency issues: how to handle a large community? • What do we measure when we evaluate CF? – Predicting actual rating may be useless! – Example: music recommendations: • Beatles, Eric Clapton, Stones, Elton John, Led Zep, the Who, ... – What’s useful and new? For this we need a model of the user’s prior knowledge, not just their tastes. • Subjectively better recs result from “poor” distance metrics