1 
Recommender Systems 
Collaborative Filtering & 
Content-Based Recommending
2 
Recommender Systems 
• Systems for recommending items (e.g. books, 
movies, CDs, web pages, newsgroup messages) 
to users based on examples of their preferences. 
• Many on-line stores provide recommendations 
(e.g. Amazon, CDNow). 
• Recommenders have been shown to substantially 
increase sales at on-line stores. 
• There are two basic approaches to recommending: 
– Collaborative Filtering (a.k.a. social filtering) 
– Content-based
3 
Book Recommender 
[Figure: book titles (Red Mars, Foundation, Jurassic Park, Lost World, 2001, 2010, Difference Engine, Neuromancer) connected through Machine Learning to a User Profile.]
4 
Personalization 
• Recommenders are instances of personalization 
software. 
• Personalization concerns adapting to the individual 
needs, interests, and preferences of each user. 
• Includes: 
– Recommending 
– Filtering 
– Predicting (e.g. form or calendar appt. completion) 
• From a business perspective, it is viewed as part of 
Customer Relationship Management (CRM).
5 
Machine Learning and Personalization 
• Machine Learning can allow learning a user 
model or profile of a particular user based 
on: 
– Sample interaction 
– Rated examples 
• This model or profile can then be used to: 
– Recommend items 
– Filter information 
– Predict behavior
6 
Collaborative Filtering 
• Maintain a database of many users’ ratings of a 
variety of items. 
• For a given user, find other similar users whose 
ratings strongly correlate with the current user. 
• Recommend items rated highly by these similar 
users, but not rated by the current user. 
• Almost all existing commercial recommenders use 
this approach (e.g. Amazon).
7 
Collaborative Filtering 
[Figure: the active user's ratings of items A–Z (A 9, B 3, Z 5; C unrated) are correlated against the rating vectors in the user database. The matching user (A 10, B 4, C 8, Z 1) is used to extract a recommendation: item C.]
8 
Collaborative Filtering Method 
• Weight all users with respect to similarity 
with the active user. 
• Select a subset of the users (neighbors) to 
use as predictors. 
• Normalize ratings and compute a prediction 
from a weighted combination of the 
selected neighbors’ ratings. 
• Present items with highest predicted ratings 
as recommendations.
9 
Similarity Weighting 
• Typically use Pearson correlation coefficient between 
ratings for active user, a, and another user, u. 
$$c_{a,u} = \frac{\mathrm{covar}(r_a, r_u)}{\sigma_{r_a}\,\sigma_{r_u}}$$
$r_a$ and $r_u$ are the ratings vectors for the m items rated by 
both a and u 
$r_{i,j}$ is user i's rating for item j
10 
Covariance and Standard Deviation 
• Covariance: 
$$\mathrm{covar}(r_a, r_u) = \frac{\sum_{i=1}^{m} (r_{a,i} - \bar{r}_a)(r_{u,i} - \bar{r}_u)}{m}, \qquad \bar{r}_x = \frac{\sum_{i=1}^{m} r_{x,i}}{m}$$
• Standard Deviation: 
$$\sigma_{r_x} = \sqrt{\frac{\sum_{i=1}^{m} (r_{x,i} - \bar{r}_x)^2}{m}}$$
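The similarity weighting defined on the last two slides can be sketched in a few lines of Python. This is a minimal illustration, not code from the original system: the dict-based ratings layout and the function name `pearson` are assumptions, and returning 0 when a standard deviation is zero is a convenience choice, since the correlation is undefined in that case.

```python
from math import sqrt


def pearson(ratings_a, ratings_u):
    """Pearson correlation over the items rated by both users.

    ratings_a, ratings_u: dicts mapping item -> rating (hypothetical layout).
    Returns (correlation, m), where m is the number of co-rated items.
    """
    common = sorted(set(ratings_a) & set(ratings_u))
    m = len(common)
    if m == 0:
        return 0.0, 0
    ra = [ratings_a[i] for i in common]
    ru = [ratings_u[i] for i in common]
    mean_a = sum(ra) / m
    mean_u = sum(ru) / m
    # Covariance and standard deviations divide by m, as on the slide.
    covar = sum((x - mean_a) * (y - mean_u) for x, y in zip(ra, ru)) / m
    sd_a = sqrt(sum((x - mean_a) ** 2 for x in ra) / m)
    sd_u = sqrt(sum((y - mean_u) ** 2 for y in ru) / m)
    if sd_a == 0 or sd_u == 0:
        return 0.0, m  # assumption: treat an undefined correlation as 0
    return covar / (sd_a * sd_u), m
```

Only co-rated items enter the sums, so an item rated by just one of the two users has no effect on the result.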
11 
Significance Weighting 
• Important not to trust correlations based on 
very few co-rated items. 
• Include significance weights, $s_{a,u}$, based on 
number of co-rated items, m. 
$$w_{a,u} = s_{a,u}\, c_{a,u}$$
$$s_{a,u} = \begin{cases} 1 & \text{if } m > 50 \\ \dfrac{m}{50} & \text{if } m \le 50 \end{cases}$$
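As a rough sketch, the significance weight and the combined similarity weight translate directly into code. The function names and the `cutoff` parameter are illustrative assumptions; the value 50 comes from the slide.

```python
def significance_weight(m, cutoff=50):
    """s_{a,u}: 1 when more than `cutoff` items are co-rated, else m / cutoff."""
    return 1.0 if m > cutoff else m / cutoff


def similarity_weight(correlation, m, cutoff=50):
    """w_{a,u} = s_{a,u} * c_{a,u}."""
    return significance_weight(m, cutoff) * correlation
```

For example, a correlation of 0.8 computed from only 25 co-rated items is scaled down to 0.4.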
12 
Neighbor Selection 
• For a given active user, a, select correlated 
users to serve as source of predictions. 
• Standard approach is to use the most similar 
n users, u, based on similarity weights, $w_{a,u}$. 
• Alternate approach is to include all users 
whose similarity weight is above a given 
threshold.
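Both selection strategies are short to express. The sketch below assumes the similarity weights are already available as a dict from user id to $w_{a,u}$; that layout and the function names are hypothetical.

```python
def top_n_neighbors(weights, n):
    """Return the n users with the largest similarity weights."""
    return sorted(weights, key=weights.get, reverse=True)[:n]


def threshold_neighbors(weights, threshold):
    """Return every user whose similarity weight is above the threshold."""
    return [u for u, w in weights.items() if w > threshold]
```

The top-n variant always returns a fixed-size neighborhood (when enough users exist), while the threshold variant may return very few or very many users depending on the data.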
13 
Rating Prediction 
• Predict a rating, $p_{a,i}$, for each item i, for active user, a, by 
using the n selected neighbor users, u ∈ {1, 2, …, n}. 
• To account for users' different rating levels, base 
predictions on differences from a user's average rating. 
• Weight users’ ratings contribution by their similarity to 
the active user. 
$$p_{a,i} = \bar{r}_a + \frac{\sum_{u=1}^{n} w_{a,u}\,(r_{u,i} - \bar{r}_u)}{\sum_{u=1}^{n} w_{a,u}}$$
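A minimal sketch of the prediction formula follows. The tuple layout for neighbors and the fallback to the active user's mean when the weights sum to zero are assumptions made for the illustration.

```python
def predict_rating(mean_a, neighbors):
    """Predict p_{a,i} for one item.

    mean_a: the active user's average rating.
    neighbors: list of (w_au, r_ui, mean_u) tuples, one per selected
    neighbor who rated item i (hypothetical layout).
    """
    denom = sum(w for w, _, _ in neighbors)
    if denom == 0:
        return mean_a  # assumption: fall back to the user's own average
    num = sum(w * (r - mean_u) for w, r, mean_u in neighbors)
    return mean_a + num / denom
```

With an active-user mean of 6 and two neighbors, `(0.8, 9, 7)` and `(0.4, 4, 5)`, the weighted deviation is (1.6 − 0.4) / 1.2 = 1, so the predicted rating is 7.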
14 
Problems with Collaborative Filtering 
• Cold Start: There needs to be enough other users 
already in the system to find a match. 
• Sparsity: If there are many items to be 
recommended, even if there are many users, the 
user/ratings matrix is sparse, and it is hard to find 
users that have rated the same items. 
• First Rater: Cannot recommend an item that has 
not been previously rated. 
– New items 
– Esoteric items 
• Popularity Bias: Cannot recommend items to 
someone with unique tastes. 
– Tends to recommend popular items.
15 
Content-Based Recommending 
• Recommendations are based on information on the 
content of items rather than on other users’ 
opinions. 
• Uses a machine learning algorithm to induce a 
profile of the user's preferences from examples 
based on a featural description of content. 
• Some previous applications: 
– Newsweeder (Lang, 1995) 
– Syskill and Webert (Pazzani et al., 1996)
16 
Advantages of Content-Based Approach 
• No need for data on other users. 
– No cold-start or sparsity problems. 
• Able to recommend to users with unique tastes. 
• Able to recommend new and unpopular items 
– No first-rater problem. 
• Can provide explanations of recommended 
items by listing content-features that caused an 
item to be recommended. 
17 
Disadvantages of Content-Based Method 
• Requires content that can be encoded as 
meaningful features. 
• Users’ tastes must be represented as a 
learnable function of these content features. 
• Unable to exploit quality judgments of 
other users. 
– Unless these are somehow included in the 
content features.
18 
LIBRA 
Learning Intelligent Book Recommending Agent 
• Content-based recommender for books using 
information about titles extracted from Amazon. 
• Uses information extraction from the web to 
organize text into fields: 
– Author 
– Title 
– Editorial Reviews 
– Customer Comments 
– Subject terms 
– Related authors 
– Related titles
19 
LIBRA System 
Amazon Pages 
Rated 
Examples 
Information 
Extraction 
Machine Learning 
Learner 
User Profile 
LIBRA 
Database 
Recommendations 
1.~~~~~~ 
2.~~~~~~~ 
3.~~~~~ 
::: 
Predictor
20 
Sample Amazon Page 
Age of Spiritual Machines
21 
Sample Extracted Information 
Title: <The Age of Spiritual Machines: When Computers Exceed Human Intelligence> 
Author: <Ray Kurzweil> 
Price: <11.96> 
Publication Date: <January 2000> 
ISBN: <0140282025> 
Related Titles: <Title: <Robot: Mere Machine or Transcendent Mind> 
Author: <Hans Moravec> > 
… 
Reviews: <Author: <Amazon.com Reviews> Text: <How much do we humans…> > 
… 
Comments: <Stars: <4> Author: <Stephen A. Haines> Text:<Kurzweil has …> > 
… 
Related Authors: <Hans P. Moravec> <K. Eric Drexler>… 
Subjects: <Science/Mathematics> <Computers> <Artificial Intelligence> …
22 
Libra Content Information 
• Libra uses this extracted information to 
form “bags of words” for the following 
slots: 
– Author 
– Title 
– Description (reviews and comments) 
– Subjects 
– Related Titles 
– Related Authors
23 
Libra Overview 
• User rates selected titles on a 1 to 10 scale. 
• Libra uses a naïve Bayesian text-categorization 
algorithm to learn a profile from these rated 
examples. 
– Rating 6–10: Positive 
– Rating 1–5: Negative 
• The learned profile is used to rank all other books as 
recommendations based on the computed posterior 
probability that they are positive. 
• User can also provide explicit positive/negative 
keywords, which are used as priors to bias the role 
of these features in categorization.
24 
Bayesian Categorization in LIBRA 
• Model is generalized to generate a vector of bags 
of words (one bag for each slot). 
– Instances of the same word in different slots are treated 
as separate features: 
• “Crichton” in author vs. “Crichton” in description 
• Training examples are treated as weighted positive 
or negative examples when estimating conditional 
probability parameters: 
– An example with rating 1 ≤ r ≤ 10 is given: 
positive probability: (r – 1)/9 
negative probability: (10 – r)/9
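The rating-to-weight mapping can be written as a small helper. The function name `class_weights` is an assumption; the two formulas are the ones given on the slide.

```python
def class_weights(r):
    """Map a 1-10 rating to (positive weight, negative weight)."""
    if not 1 <= r <= 10:
        raise ValueError("rating must be between 1 and 10")
    return (r - 1) / 9, (10 - r) / 9
```

The two weights always sum to 1, so a rating of 10 counts as a fully positive example, a rating of 1 as fully negative, and a rating of 4 as one-third positive and two-thirds negative.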
25 
Implementation 
• Stopwords removed from all bags. 
• A book’s title and author are added to its own 
related title and related author slots. 
• All probabilities are smoothed using Laplace 
estimation to account for small sample size. 
• Lisp implementation is quite efficient: 
– Training: 20 exs in 0.4 secs, 840 exs in 11.5 secs 
– Test: 200 books per second
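To make the parameter estimation concrete, here is a small Python sketch of weighted, per-slot word probabilities with Laplace smoothing. It is not the Lisp implementation the slide mentions: the example layout, the function names, and the specific add-one smoothing over each slot's vocabulary are assumptions chosen for illustration.

```python
from collections import defaultdict


def train(examples):
    """Estimate P(word | class, slot) from rated examples.

    examples: list of (rating, {slot: [words]}) pairs (hypothetical layout).
    Returns prob(word, cls, slot), where cls is "positive" or "negative"
    and slot must have appeared in the training examples.
    """
    counts = defaultdict(float)  # (cls, slot, word) -> weighted count
    totals = defaultdict(float)  # (cls, slot) -> weighted number of words
    vocab = defaultdict(set)     # slot -> distinct words seen in that slot
    for rating, slots in examples:
        # Each example counts partly as positive and partly as negative.
        weights = {"positive": (rating - 1) / 9, "negative": (10 - rating) / 9}
        for slot, words in slots.items():
            for word in words:
                vocab[slot].add(word)
                for cls, w in weights.items():
                    counts[(cls, slot, word)] += w
                    totals[(cls, slot)] += w

    def prob(word, cls, slot):
        # Assumed smoothing: add one to every count in the slot's vocabulary.
        size = len(vocab[slot])
        return (counts.get((cls, slot, word), 0.0) + 1) / (
            totals.get((cls, slot), 0.0) + size
        )

    return prob
```

Because the same word is keyed by slot, "crichton" in the author slot and "crichton" in the description slot get separate estimates.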
26 
Explanations of Profiles and 
Recommendations 
• Feature strength of word $w_k$ appearing in a 
slot $s_j$: 
$$\mathrm{strength}(w_k, s_j) = \log \frac{P(w_k \mid \text{positive}, s_j)}{P(w_k \mid \text{negative}, s_j)}$$
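The feature strength is a log-odds ratio, so ranking features for an explanation takes only a few lines. In this sketch the probabilities are supplied directly, the natural logarithm is an assumption (the slide does not state the base), and the function names are hypothetical.

```python
from math import log


def strength(p_pos, p_neg):
    """log( P(w | positive, s) / P(w | negative, s) )."""
    return log(p_pos / p_neg)


def strongest_features(probs, k=3):
    """Return the k (word, slot) features with the highest strength.

    probs: dict mapping (word, slot) -> (p_pos, p_neg), both nonzero.
    """
    scored = {feature: strength(*p) for feature, p in probs.items()}
    return sorted(scored, key=scored.get, reverse=True)[:k]
```

A strength of 0 means the word is equally likely in positive and negative examples; large positive values mark the words that most favor a recommendation.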
27 
Experimental Data 
• Amazon searches were used to find books 
in various genres. 
• Titles that have at least one review or 
comment were kept. 
• Data sets: 
– Literature fiction: 3,061 titles 
– Mystery: 7,285 titles 
– Science: 3,813 titles 
– Science Fiction: 3,813 titles
28 
Rated Data 
• 4 users rated random examples within a 
genre by reviewing the Amazon pages 
about the title: 
– LIT1 936 titles 
– LIT2 935 titles 
– MYST 500 titles 
– SCI 500 titles 
– SF 500 titles
29 
Experimental Method 
• 10-fold cross-validation to generate learning curves. 
• Measured several metrics on independent test data: 
– Precision at top 3: % of the top 3 that are positive 
– Rating of top 3: Average rating assigned to top 3 
– Rank Correlation: Spearman’s, rs, between system’s and 
user’s complete rankings. 
• Test ablation of related author and related title slots 
(LIBRA-NR). 
– Test influence of information generated by Amazon’s 
collaborative approach.
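The three metrics can be sketched as follows. The input layouts and function names are assumptions, the positive cutoff of 6 follows the 6–10 "positive" range used elsewhere in the deck, and the Spearman formula shown is the simple form that holds only when neither ranking contains ties.

```python
def precision_at_top(ranked_ratings, k=3, positive_min=6):
    """Fraction of the top-k items that the user rated as positive.

    ranked_ratings: the user's actual ratings of the test items, ordered
    by the system's ranking, best first (hypothetical layout, non-empty).
    """
    top = ranked_ratings[:k]
    return sum(r >= positive_min for r in top) / len(top)


def rating_of_top(ranked_ratings, k=3):
    """Average user rating of the system's top-k items."""
    top = ranked_ratings[:k]
    return sum(top) / len(top)


def spearman(system_ranks, user_ranks):
    """Spearman's r_s for two tie-free rankings of the same n >= 2 items."""
    n = len(system_ranks)
    d2 = sum((s - u) ** 2 for s, u in zip(system_ranks, user_ranks))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```

With ratings on a 1 to 10 scale ties are common in practice, so a real evaluation would need a tie-aware rank correlation rather than this shortcut.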
30 
Experimental Result Summary 
• Precision at top 3 is fairly consistently in the 90% range 
after only 20 examples. 
• Rating of top 3 is fairly consistently above 8 after 
only 20 examples. 
• All results are always significantly better than 
random chance after only 5 examples. 
• Rank correlation is generally above 0.3 (moderate) 
after only 10 examples. 
• Rank correlation is generally above 0.6 (high) 
after 40 examples.
31 
Precision at Top 3 for Science
32 
Rating of Top 3 for Science
33 
Rank Correlation for Science
34 
User Studies 
• Subjects asked to use Libra and get 
recommendations. 
• Encouraged several rounds of feedback. 
• Rated all books in final list of 
recommendations. 
• Selected two books for purchase. 
• Returned reviews after reading selections. 
• Completed questionnaire about the system.
35 
Combining Content and Collaboration 
• Content-based and collaborative methods have 
complementary strengths and weaknesses. 
• Combine methods to obtain the best of both. 
• Various hybrid approaches: 
– Apply both methods and combine recommendations. 
– Use collaborative data as content. 
– Use content-based predictor as another collaborator. 
– Use content-based predictor to complete 
collaborative data.
36 
Movie Domain 
• EachMovie Dataset [Compaq Research Labs] 
– Contains user ratings for movies on a 0–5 scale. 
– 72,916 users (avg. 39 ratings each). 
– 1,628 movies. 
– Sparse user-ratings matrix – (2.6% full). 
• Crawled Internet Movie Database (IMDb) 
– Extracted content for titles in EachMovie. 
• Basic movie information: 
– Title, Director, Cast, Genre, etc. 
• Popular opinions: 
– User comments, Newspaper and Newsgroup reviews, etc.
37 
Content-Boosted Collaborative Filtering 
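[Figure: a web crawler extracts IMDb content for EachMovie titles into a movie content database. A content-based predictor uses this content to turn the sparse user ratings matrix into a full user ratings matrix; collaborative filtering over the full matrix and the active user's ratings produces the recommendations.]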
38 
Content-Boosted CF - I 
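[Figure: a user-ratings vector is split into user-rated items and unrated items. The user-rated items serve as training examples for the content-based predictor, which supplies predicted ratings for the unrated items; together these form the pseudo user-ratings vector.]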
39 
Content-Boosted CF - II 
[Figure: user ratings matrix, content-based predictor, pseudo user ratings matrix.]
• Compute pseudo user ratings matrix 
– Full matrix – approximates actual full user ratings matrix 
• Perform CF 
– Using Pearson corr. between pseudo user-rating vectors
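A minimal sketch of the pseudo-ratings step: actual ratings are kept and a content-based predictor fills in everything else. The data layout and function names are assumptions, and the predictor is passed in as an arbitrary callable, so nothing here depends on how the content-based model is trained.

```python
def pseudo_ratings(user_ratings, all_items, predict):
    """Build one user's pseudo user-ratings vector.

    user_ratings: dict item -> actual rating.
    predict: callable(item) -> predicted rating, standing in for that
    user's content-based predictor (hypothetical interface).
    """
    return {
        item: user_ratings[item] if item in user_ratings else predict(item)
        for item in all_items
    }


def pseudo_matrix(ratings, all_items, predictors):
    """Apply pseudo_ratings to every user; predictors maps user -> callable."""
    return {
        user: pseudo_ratings(user_ratings, all_items, predictors[user])
        for user, user_ratings in ratings.items()
    }
```

Because every user now has a value for every item, Pearson correlation between two users can be computed over all items rather than only the few they both rated.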
40 
Experimental Method 
• Used subset of EachMovie (7,893 users; 299,997 
ratings) 
• Test set: 10% of the users selected at random. 
– Test users had each rated at least 40 movies. 
– Train on the remaining data. 
• Hold-out set: 25% of the items for each test user. 
– Predict rating of each item in the hold-out set. 
• Compared CBCF to other prediction approaches: 
– Pure CF 
– Pure Content-based 
– Naïve hybrid (averages CF and content-based 
predictions)
41 
Metrics 
• Mean Absolute Error (MAE) 
– Compares numerical predictions with user ratings 
• ROC sensitivity [Herlocker 99] 
– How well predictions help users select high-quality 
items 
– Ratings ≥ 4 considered “good”; < 4 considered “bad” 
• Paired t-test for statistical significance
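The two metrics can be sketched as below. MAE follows directly from its name. For ROC sensitivity the slide gives only a citation, so this sketch assumes it means the area under the ROC curve with "good" defined as a rating of at least 4, computed by the pairwise (rank-based) formulation; the function names are hypothetical.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def roc_sensitivity(predicted, actual, good_min=4):
    """Area under the ROC curve for separating good items from bad ones.

    Computed as the fraction of (good, bad) item pairs in which the good
    item received the higher prediction; ties count as half.
    """
    good = [p for p, a in zip(predicted, actual) if a >= good_min]
    bad = [p for p, a in zip(predicted, actual) if a < good_min]
    if not good or not bad:
        raise ValueError("need at least one good and one bad item")
    wins = sum(
        1.0 if g > b else 0.5 if g == b else 0.0 for g in good for b in bad
    )
    return wins / (len(good) * len(bad))
```

A value of 0.5 corresponds to predictions that order good and bad items no better than chance, and 1.0 to a perfect separation.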
42 
Results - I 
[Figure: bar chart of MAE by algorithm (CF, Content, Naïve, CBCF); MAE axis from 0.9 to 1.06.]
CBCF is significantly better (4% over CF) at p < 0.001
43 
Results - II 
[Figure: bar chart of ROC sensitivity (ROC-4) by algorithm (CF, Content, Naïve, CBCF); axis from 0.58 to 0.68.]
CBCF outperforms the rest (5% improvement over CF)
44 
Active Learning 
(Sample Selection, Learning with Queries) 
• Used to reduce the number of training 
examples required. 
• System requests ratings for specific items 
from which it would learn the most. 
• Several existing methods: 
– Uncertainty sampling 
– Committee-based sampling
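Uncertainty sampling, the first method listed, can be illustrated in a few lines: ask the user to rate the items whose predicted class is least certain. The sketch assumes a two-class setting with a predicted probability of "positive" per item; the dict layout and function name are hypothetical.

```python
def most_uncertain(prob_positive, k=1):
    """Return the k items whose P(positive) is closest to 0.5.

    prob_positive: dict item -> predicted probability that the user
    would rate the item as positive.
    """
    return sorted(prob_positive, key=lambda item: abs(prob_positive[item] - 0.5))[:k]
```

Items the model is already confident about (probabilities near 0 or 1) are skipped, since a rating for them is expected to change the profile the least.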
45 
Semi-Supervised Learning 
(Weakly Supervised, Bootstrapping) 
• Use wealth of unlabeled examples to aid 
learning from a small amount of labeled data. 
• Several recent methods developed: 
– Semi-supervised EM (Expectation Maximization) 
– Co-training 
– Transductive SVMs
46 
Conclusions 
• Recommending and personalization are 
important approaches to combating 
information overload. 
• Machine Learning is an important part of 
systems for these tasks. 
• Collaborative filtering has problems. 
• Content-based methods address these 
problems (but have problems of their own). 
• Integrating both is best.

Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 

Recommender systems

  • 1. 1 Recommender Systems Collaborative Filtering & Content-Based Recommending
  • 2. 2 Recommender Systems • Systems for recommending items (e.g. books, movies, CD’s, web pages, newsgroup messages) to users based on examples of their preferences. • Many on-line stores provide recommendations (e.g. Amazon, CDNow). • Recommenders have been shown to substantially increase sales at on-line stores. • There are two basic approaches to recommending: – Collaborative Filtering (a.k.a. social filtering) – Content-based
  • 3. 3 Book Recommender [Figure labels: Red Mars, Foundation, Jurassic Park, Lost World, 2001, Difference Engine, Neuromancer, 2010; Machine Learning; User Profile]
  • 4. 4 Personalization • Recommenders are instances of personalization software. • Personalization concerns adapting to the individual needs, interests, and preferences of each user. • Includes: – Recommending – Filtering – Predicting (e.g. form or calendar appt. completion) • From a business perspective, it is viewed as part of Customer Relationship Management (CRM).
  • 5. 5 Machine Learning and Personalization • Machine Learning can allow learning a user model or profile of a particular user based on: – Sample interaction – Rated examples • This model or profile can then be used to: – Recommend items – Filter information – Predict behavior
  • 6. 6 Collaborative Filtering • Maintain a database of many users’ ratings of a variety of items. • For a given user, find other similar users whose ratings strongly correlate with the current user. • Recommend items rated highly by these similar users, but not rated by the current user. • Almost all existing commercial recommenders use this approach (e.g. Amazon).
  • 7. 7 Collaborative Filtering [Figure: a User Database of per-user ratings for items A to Z (many entries blank) and the Active User's ratings (A 9, B 3, C unrated, Z 5). A Correlation Match step selects a similar user (A 10, B 4, C 8, Z 1), and an Extract Recommendations step outputs item C.]
  • 8. 8 Collaborative Filtering Method • Weight all users with respect to similarity with the active user. • Select a subset of the users (neighbors) to use as predictors. • Normalize ratings and compute a prediction from a weighted combination of the selected neighbors’ ratings. • Present items with highest predicted ratings as recommendations.
  • 9. 9 Similarity Weighting • Typically use Pearson correlation coefficient between ratings for active user, a, and another user, u: $c_{a,u} = \frac{\mathrm{covar}(r_a, r_u)}{\sigma_{r_a} \sigma_{r_u}}$ • $r_a$ and $r_u$ are the ratings vectors for the m items rated by both a and u • $r_{i,j}$ is user i's rating for item j
  • 10. 10 Covariance and Standard Deviation • Covariance: $\mathrm{covar}(r_a, r_u) = \frac{\sum_{i=1}^{m} (r_{a,i} - \bar{r}_a)(r_{u,i} - \bar{r}_u)}{m}$, where $\bar{r}_x = \frac{\sum_{i=1}^{m} r_{x,i}}{m}$ • Standard Deviation: $\sigma_{r_x} = \sqrt{\frac{\sum_{i=1}^{m} (r_{x,i} - \bar{r}_x)^2}{m}}$
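The correlation defined on the two slides above can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: the dict-based rating layout, the function name `pearson`, and the choice to return 0 when a standard deviation is zero are all assumptions.

```python
from math import sqrt

def pearson(ratings_a, ratings_u):
    """Pearson correlation c_{a,u} over the m items rated by both users.

    ratings_a, ratings_u: dicts mapping item -> rating (hypothetical layout).
    Returns (correlation, m).
    """
    common = sorted(set(ratings_a) & set(ratings_u))
    m = len(common)
    if m == 0:
        return 0.0, 0
    # Means are taken over the co-rated items only.
    mean_a = sum(ratings_a[i] for i in common) / m
    mean_u = sum(ratings_u[i] for i in common) / m
    covar = sum((ratings_a[i] - mean_a) * (ratings_u[i] - mean_u)
                for i in common) / m
    sd_a = sqrt(sum((ratings_a[i] - mean_a) ** 2 for i in common) / m)
    sd_u = sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common) / m)
    if sd_a == 0 or sd_u == 0:
        return 0.0, m  # assumption: treat an undefined correlation as 0
    return covar / (sd_a * sd_u), m
```

Returning m alongside the correlation makes it easy to discount correlations that rest on very few co-rated items.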
  • 11. 11 Significance Weighting • Important not to trust correlations based on very few co-rated items. • Include significance weights, $s_{a,u}$, based on number of co-rated items, m: $w_{a,u} = s_{a,u}\, c_{a,u}$, where $s_{a,u} = 1$ if $m > 50$ and $s_{a,u} = m/50$ if $m \le 50$
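As a small sketch of the significance weighting on this slide (function and parameter names are hypothetical; the cutoff of 50 is the value given on the slide):

```python
def significance_weight(m, cutoff=50):
    """s_{a,u}: 1 when more than `cutoff` items are co-rated, else m / cutoff."""
    return 1.0 if m > cutoff else m / cutoff

def similarity_weight(correlation, m, cutoff=50):
    """w_{a,u} = s_{a,u} * c_{a,u}."""
    return significance_weight(m, cutoff) * correlation
```

For example, a correlation of 0.8 computed from only 10 co-rated items is scaled down to roughly 0.16.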
  • 12. 12 Neighbor Selection • For a given active user, a, select correlated users to serve as source of predictions. • Standard approach is to use the most similar n users, u, based on similarity weights, $w_{a,u}$. • Alternate approach is to include all users whose similarity weight is above a given threshold.
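Both selection strategies on this slide can be sketched as follows, assuming the similarity weights are held in a dict from user id to weight (a hypothetical layout; the function names are also illustrative):

```python
def top_n_neighbors(weights, n):
    """Return the n users with the highest similarity weight."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return [user for user, _ in ranked[:n]]

def neighbors_above(weights, threshold):
    """Return every user whose similarity weight exceeds the threshold."""
    return [user for user, w in weights.items() if w > threshold]
```

The top-n variant always yields a fixed number of predictors, while the threshold variant may return very few (or no) neighbors for a user with unusual tastes.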
  • 13. 13 Rating Prediction • Predict a rating, $p_{a,i}$, for each item i, for active user, a, by using the n selected neighbor users, $u \in \{1, 2, \ldots, n\}$. • To account for users' different rating levels, base predictions on differences from a user's average rating. • Weight users' ratings contribution by their similarity to the active user: $p_{a,i} = \bar{r}_a + \frac{\sum_{u=1}^{n} w_{a,u} (r_{u,i} - \bar{r}_u)}{\sum_{u=1}^{n} w_{a,u}}$
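The prediction rule on this slide can be sketched as below. The tuple layout for neighbors and the function name are hypothetical. Two choices are assumptions that the slide does not spell out: neighbors who have not rated the item are skipped, and the active user's mean is returned when no weight remains.

```python
def predict_rating(mean_a, neighbors, item):
    """Predict p_{a,i} from the weighted deviations of neighbors' ratings.

    mean_a: the active user's average rating.
    neighbors: list of (weight, neighbor_mean, ratings_dict) tuples.
    """
    numerator = 0.0
    denominator = 0.0
    for weight, mean_u, ratings in neighbors:
        if item not in ratings:
            continue  # assumption: ignore neighbors without a rating for item
        numerator += weight * (ratings[item] - mean_u)
        denominator += weight
    if denominator == 0:
        return mean_a  # assumption: fall back to the active user's mean
    return mean_a + numerator / denominator
```

With an active-user mean of 5 and two neighbors, one with weight 1.0 rating the item 4 points above their own mean and one with weight 0.5 rating it exactly at their mean, the prediction is 5 + 4/1.5.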
  • 14. 14 Problems with Collaborative Filtering • Cold Start: There needs to be enough other users already in the system to find a match. • Sparsity: If there are many items to be recommended, even if there are many users, the user/ratings matrix is sparse, and it is hard to find users that have rated the same items. • First Rater: Cannot recommend an item that has not been previously rated. – New items – Esoteric items • Popularity Bias: Cannot recommend items to someone with unique tastes. – Tends to recommend popular items.
  • 15. 15 Content-Based Recommending • Recommendations are based on information on the content of items rather than on other users’ opinions. • Uses a machine learning algorithm to induce a profile of the user’s preferences from examples based on a featural description of content. • Some previous applications: – Newsweeder (Lang, 1995) – Syskill and Webert (Pazzani et al., 1996)
  • 16. Advantages of Content-Based Approach • No need for data on other users. – No cold-start or sparsity problems. • Able to recommend to users with unique tastes. • Able to recommend new and unpopular items – No first-rater problem. • Can provide explanations of recommended items by listing content-features that caused an item to be recommended.
  • 17. 17 Disadvantages of Content-Based Method • Requires content that can be encoded as meaningful features. • Users’ tastes must be represented as a learnable function of these content features. • Unable to exploit quality judgments of other users. – Unless these are somehow included in the content features.
  • 18. 18 LIBRA Learning Intelligent Book Recommending Agent • Content-based recommender for books using information about titles extracted from Amazon. • Uses information extraction from the web to organize text into fields: – Author – Title – Editorial Reviews – Customer Comments – Subject terms – Related authors – Related titles
  • 19. 19 LIBRA System [Figure labels: Amazon Pages; Information Extraction; LIBRA Database; Rated Examples; Machine Learning (Learner); User Profile; Predictor; Recommendations (ranked list)]
  • 20. 20 Sample Amazon Page Age of Spiritual Machines
  • 21. 21 Sample Extracted Information Title: <The Age of Spiritual Machines: When Computers Exceed Human Intelligence> Author: <Ray Kurzweil> Price: <11.96> Publication Date: <January 2000> ISBN: <0140282025> Related Titles: <Title: <Robot: Mere Machine or Transcendent Mind> Author: <Hans Moravec> > … Reviews: <Author: <Amazon.com Reviews> Text: <How much do we humans…> > … Comments: <Stars: <4> Author: <Stephen A. Haines> Text:<Kurzweil has …> > … Related Authors: <Hans P. Moravec> <K. Eric Drexler>… Subjects: <Science/Mathematics> <Computers> <Artificial Intelligence> …
  • 22. 22 Libra Content Information • Libra uses this extracted information to form “bags of words” for the following slots: – Author – Title – Description (reviews and comments) – Subjects – Related Titles – Related Authors
  • 23. 23 Libra Overview • User rates selected titles on a 1 to 10 scale. • Libra uses a naïve Bayesian text-categorization algorithm to learn a profile from these rated examples. – Rating 6–10: Positive – Rating 1–5: Negative • The learned profile is used to rank all other books as recommendations based on the computed posterior probability that they are positive. • User can also provide explicit positive/negative keywords, which are used as priors to bias the role of these features in categorization.
  • 24. 24 Bayesian Categorization in LIBRA • Model is generalized to generate a vector of bags of words (one bag for each slot). – Instances of the same word in different slots are treated as separate features: • “Crichton” in author vs. “Crichton” in description • Training examples are treated as weighted positive or negative examples when estimating conditional probability parameters: – An example with rating 1 ≤ r ≤ 10 is given: positive probability: (r – 1)/9; negative probability: (10 – r)/9
  • 25. 25 Implementation • Stopwords removed from all bags. • A book’s title and author are added to its own related title and related author slots. • All probabilities are smoothed using Laplace estimation to account for small sample size. • Lisp implementation is quite efficient: – Training: 20 exs in 0.4 secs, 840 exs in 11.5 secs – Test: 200 books per second
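The weighted-example scheme and the Laplace smoothing described on the two slides above can be sketched for a single slot as follows. This is an illustration only, not the Lisp implementation: it assumes a multinomial word model with add-one smoothing over a fixed vocabulary, and all names are hypothetical.

```python
from collections import defaultdict

def rating_weights(r):
    """Split a 1-10 rating into (positive, negative) example weights."""
    return (r - 1) / 9, (10 - r) / 9

def train_slot(examples, vocabulary):
    """Estimate P(word | class) for one slot from weighted examples.

    examples: list of (rating, list_of_words) pairs for this slot.
    vocabulary: collection of all words seen in this slot.
    Assumption: multinomial model with add-one (Laplace) smoothing.
    """
    counts = {"pos": defaultdict(float), "neg": defaultdict(float)}
    totals = {"pos": 0.0, "neg": 0.0}
    for rating, words in examples:
        w_pos, w_neg = rating_weights(rating)
        for word in words:
            # Each word occurrence counts fractionally toward both classes.
            counts["pos"][word] += w_pos
            counts["neg"][word] += w_neg
            totals["pos"] += w_pos
            totals["neg"] += w_neg
    v = len(vocabulary)
    return {
        cls: {w: (counts[cls][w] + 1) / (totals[cls] + v) for w in vocabulary}
        for cls in ("pos", "neg")
    }
```

Because every count starts at one, a word never seen with a class still receives a small non-zero probability, which matters when only a handful of books have been rated.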
  • 26. 26 Explanations of Profiles and Recommendations • Feature strength of word $w_k$ appearing in a slot $s_j$: $\mathrm{strength}(w_k, s_j) = \log \frac{P(w_k \mid \mathrm{positive}, s_j)}{P(w_k \mid \mathrm{negative}, s_j)}$
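A brief sketch of how the feature strength on this slide could be used to list the words that most support a recommendation. The function names and the dict inputs (word to conditional probability, for one slot) are hypothetical, and the probabilities are assumed to be smoothed so that none is zero.

```python
from math import log

def strength(p_word_given_pos, p_word_given_neg):
    """Log-odds strength of a word in a slot; positive values favor 'positive'."""
    return log(p_word_given_pos / p_word_given_neg)

def strongest_features(pos_probs, neg_probs, k=3):
    """Return the k words with the highest strength for one slot."""
    scores = {w: strength(pos_probs[w], neg_probs[w]) for w in pos_probs}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

A strength of zero means the word is equally likely under both classes and so contributes nothing to an explanation.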
  • 27. 27 Experimental Data • Amazon searches were used to find books in various genres. • Titles that have at least one review or comment were kept. • Data sets: – Literature fiction: 3,061 titles – Mystery: 7,285 titles – Science: 3,813 titles – Science Fiction: 3,813 titles
  • 28. 28 Rated Data • 4 users rated random examples within a genre by reviewing the Amazon pages about the title: – LIT1 936 titles – LIT2 935 titles – MYST 500 titles – SCI 500 titles – SF 500 titles
  • 29. 29 Experimental Method • 10-fold cross-validation to generate learning curves. • Measured several metrics on independent test data: – Precision at top 3: % of the top 3 that are positive – Rating of top 3: Average rating assigned to top 3 – Rank Correlation: Spearman’s $r_s$ between system’s and user’s complete rankings. • Test ablation of related author and related title slots (LIBRA-NR). – Test influence of information generated by Amazon’s collaborative approach.
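The three metrics on this slide can be sketched in plain Python. The names and data layouts are hypothetical. The positive threshold of 6 follows the 6–10 "positive" range given in the Libra overview, and averaging the ranks of tied values is an assumption about how ties are handled.

```python
from math import sqrt

def precision_at_top(ranked_items, user_ratings, k=3, positive_min=6):
    """Fraction of the top-k recommended items that the user rated positive."""
    top = ranked_items[:k]
    return sum(user_ratings[i] >= positive_min for i in top) / len(top)

def rating_of_top(ranked_items, user_ratings, k=3):
    """Average user rating of the top-k recommended items."""
    top = ranked_items[:k]
    return sum(user_ratings[i] for i in top) / len(top)

def average_ranks(scores):
    """Rank scores from 1 (highest); tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for j in range(pos, end + 1):
            ranks[order[j]] = shared
        pos = end + 1
    return ranks

def spearman(system_scores, user_scores):
    """Spearman's r_s: Pearson correlation computed on the two rank vectors."""
    rs, ru = average_ranks(system_scores), average_ranks(user_scores)
    n = len(rs)
    mean_s, mean_u = sum(rs) / n, sum(ru) / n
    cov = sum((a - mean_s) * (b - mean_u) for a, b in zip(rs, ru))
    var_s = sum((a - mean_s) ** 2 for a in rs)
    var_u = sum((b - mean_u) ** 2 for b in ru)
    if var_s == 0 or var_u == 0:
        return 0.0  # assumption: undefined correlation reported as 0
    return cov / sqrt(var_s * var_u)
```

The first two metrics look only at the head of the ranking, which is what a user actually sees, while the rank correlation scores the entire ordering.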
  • 30. 30 Experimental Result Summary • Precision at top 3 is fairly consistently in the 90% range after only 20 examples. • Rating of top 3 is fairly consistently above 8 after only 20 examples. • All results are always significantly better than random chance after only 5 examples. • Rank correlation is generally above 0.3 (moderate) after only 10 examples. • Rank correlation is generally above 0.6 (high) after 40 examples.
  • 31. 31 Precision at Top 3 for Science
  • 32. 32 Rating of Top 3 for Science
  • 33. 33 Rank Correlation for Science
  • 34. 34 User Studies • Subjects asked to use Libra and get recommendations. • Encouraged several rounds of feedback. • Rated all books in final list of recommendations. • Selected two books for purchase. • Returned reviews after reading selections. • Completed questionnaire about the system.
  • 35. 35 Combining Content and Collaboration • Content-based and collaborative methods have complementary strengths and weaknesses. • Combine methods to obtain the best of both. • Various hybrid approaches: – Apply both methods and combine recommendations. – Use collaborative data as content. – Use content-based predictor as another collaborator. – Use content-based predictor to complete collaborative data.
  • 36. 36 Movie Domain • EachMovie Dataset [Compaq Research Labs] – Contains user ratings for movies on a 0–5 scale. – 72,916 users (avg. 39 ratings each). – 1,628 movies. – Sparse user-ratings matrix – (2.6% full). • Crawled Internet Movie Database (IMDb) – Extracted content for titles in EachMovie. • Basic movie information: – Title, Director, Cast, Genre, etc. • Popular opinions: – User comments, Newspaper and Newsgroup reviews, etc.
  • 37. 37 Content-Boosted Collaborative Filtering [Figure labels: EachMovie; Web Crawler; IMDb; Movie Content Database; User Ratings Matrix (Sparse); Content-based Predictor; Full User Ratings Matrix; Active User Ratings; Collaborative Filtering; Recommendations]
  • 38. 38 Content-Boosted CF - I [Figure labels: User-ratings Vector (User-rated Items, Unrated Items); Training Examples; Content-Based Predictor; Items with Predicted Ratings; Pseudo User-ratings Vector]
  • 39. 39 Content-Boosted CF - II [Figure labels: User Ratings Matrix; Content-Based Predictor; Pseudo User Ratings Matrix] • Compute pseudo user ratings matrix – Full matrix – approximates actual full user ratings matrix • Perform CF – Using Pearson corr. between pseudo user-rating vectors
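A minimal sketch of the pseudo-ratings step described above: actual ratings are kept, and unrated items are filled in with content-based predictions. The per-user predictor callables and all names are hypothetical stand-ins for whatever content-based model is trained on each user's rated items.

```python
def pseudo_ratings_vector(user_ratings, items, content_predict):
    """Keep actual ratings; fill every unrated item with a predicted rating.

    user_ratings: dict item -> actual rating for one user.
    content_predict: callable item -> predicted rating (hypothetical model).
    """
    return {
        item: user_ratings[item] if item in user_ratings else content_predict(item)
        for item in items
    }

def pseudo_ratings_matrix(ratings_by_user, items, predictors):
    """Build the full (dense) pseudo ratings matrix, one vector per user."""
    return {
        user: pseudo_ratings_vector(ratings, items, predictors[user])
        for user, ratings in ratings_by_user.items()
    }
```

Because every pseudo vector covers every item, any two users share all items, so the Pearson correlation in the CF step is no longer limited by sparsity.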
  • 40. 40 Experimental Method • Used subset of EachMovie (7,893 users; 299,997 ratings) • Test set: 10% of the users selected at random. – Test users that rated at least 40 movies. – Train on the remainder sets. • Hold-out set: 25% items for each test user. – Predict rating of each item in the hold-out set. • Compared CBCF to other prediction approaches: – Pure CF – Pure Content-based – Naïve hybrid (averages CF and content-based predictions)
  • 41. 41 Metrics • Mean Absolute Error (MAE) – Compares numerical predictions with user ratings • ROC sensitivity [Herlocker 99] – How well predictions help users select high-quality items – Ratings ≥ 4 considered “good”; < 4 considered “bad” • Paired t-test for statistical significance
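As a small illustration of the first metric and of the good/bad split used for the second (function names are hypothetical; the ROC computation itself is not sketched here):

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def is_good(rating, threshold=4):
    """Label a rating 'good' (True) when it is at least the threshold."""
    return rating >= threshold
```

Lower MAE is better; predictions of 4, 3, 5 against actual ratings of 5, 3, 3 give an MAE of 1.0.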
  • 42. 42 Results - I [Bar chart: MAE by algorithm (CF, Content, Naïve, CBCF); y-axis from 0.9 to 1.06] CBCF is significantly better (4% over CF) at p < 0.001.
  • 43. 43 Results - II [Bar chart: ROC-4 sensitivity by algorithm (CF, Content, Naïve, CBCF); y-axis from 0.58 to 0.68] CBCF outperforms the rest (5% improvement over CF).
  • 44. 44 Active Learning (Sample Selection, Learning with Queries) • Used to reduce the number of training examples required. • System requests ratings for specific items from which it would learn the most. • Several existing methods: – Uncertainty sampling – Committee-based sampling
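Uncertainty sampling, the first method listed, can be sketched for a binary positive/negative recommender as follows. The `prob_positive` callable is a hypothetical stand-in for the current model's posterior, and measuring uncertainty as distance from 0.5 is one common choice, not necessarily the one the slides have in mind.

```python
def most_uncertain(unrated_items, prob_positive, k=1):
    """Pick the k unrated items whose predicted P(positive) is closest to 0.5.

    prob_positive: callable item -> probability under the current model
    (hypothetical interface).
    """
    return sorted(unrated_items, key=lambda item: abs(prob_positive(item) - 0.5))[:k]
```

The system would then ask the user to rate the returned items, retrain, and repeat until the profile is good enough.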
  • 45. 45 Semi-Supervised Learning (Weakly Supervised, Bootstrapping) • Use wealth of unlabeled examples to aid learning from a small amount of labeled data. • Several recent methods developed: – Semi-supervised EM (Expectation Maximization) – Co-training – Transductive SVM’s
  • 46. 46 Conclusions • Recommending and personalization are important approaches to combating information overload. • Machine Learning is an important part of systems for these tasks. • Collaborative filtering has problems. • Content-based methods address these problems (but have problems of their own). • Integrating both is best.