Collaborative filtering intro - Full

WE KNOW YOU WILL LIKE THIS
Introduction to Recommendation Engines

Monday, January 14, 13

ML

Unsupervised (X only): Clustering, Hierarchical Clustering
Supervised (X + Y): Regression, Classification

Regression: Y = Turnout (numeric), e.g. 30, 12, 25
Classification: Y = Class (categorical), e.g. Spam, Not Spam, Spam

           Maraboo  Karnaf  Ima Adama  Liv
Idan       5        ?       3          ?
Shahar     4        3       ?          2
Gadi       ?        1       ?          5

Content/Model-Based
(Agnostic, Behavioural)
(predicting the rating)

Recommendation

Preference Problem (Ads)

Rating Problem (Movies)


Related problem: Ranking


           Maraboo  Karnaf  Ima Adama  Liv
Idan       1        ?       1          ?
Shahar     1        1       ?          1
Gadi       ?        1       ?          1

           Maraboo  Karnaf  Ima Adama  Liv
Idan       5        ?       3          ?
Shahar     4        3       ?          2
Gadi       ?        1       ?          5




User-based Collaborative Filtering


Jaccard Distance: “We share 5 preferences out of 7!”

Euclidean Distance

Cosine Similarity

Pearson’s Correlation Distance (1 - correlation): “Our preferences go in the same direction!” (but only 2 such preferences do...)

Log-Likelihood Ratio: Measure of “Surprise” at correlation
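
As a rough illustration of the measures listed above, here is a minimal Python sketch applied to the deck's toy data for Shahar and Gadi. The function names are my own (not from any library), unknown entries ("?") are simply left out, and the log-likelihood ratio is not covered.

```python
import math

def jaccard_similarity(items_a, items_b):
    """Shared preferences divided by all preferences either user expressed."""
    a, b = set(items_a), set(items_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def euclidean_distance(x, y):
    """Straight-line distance between two equally long rating vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))

def cosine_similarity(x, y):
    """Cosine of the angle between two rating vectors."""
    dot = sum(p * q for p, q in zip(x, y))
    norms = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return dot / norms if norms else 0.0

def pearson_correlation(x, y):
    """Correlation of two rating vectors after centering each on its mean."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    spread = math.sqrt(sum((p - mean_x) ** 2 for p in x)
                       * sum((q - mean_y) ** 2 for q in y))
    return cov / spread if spread else 0.0

# Binary preferences: which items each user expressed a preference for.
shahar_items = {"Maraboo", "Karnaf", "Liv"}
gadi_items = {"Karnaf", "Liv"}
jaccard = jaccard_similarity(shahar_items, gadi_items)
jaccard_distance = 1 - jaccard

# Ratings on the two items both users rated (Karnaf, Liv).
shahar_ratings = [3, 2]
gadi_ratings = [1, 5]
euclid = euclidean_distance(shahar_ratings, gadi_ratings)
cosine = cosine_similarity(shahar_ratings, gadi_ratings)
pearson = pearson_correlation(shahar_ratings, gadi_ratings)
pearson_distance = 1 - pearson
```

Shahar and Gadi co-rated only two items, so the Pearson correlation can only come out as +1 or -1 (or undefined). That is the caveat the slide hints at with "(but only 2 such preferences do...)".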


Item-Based Collaborative Filtering

Usually bounded


Case study: Amazon
100,000,000 users

2,000,000 items

Each user expresses preference for 10 items

Each item has 500 reviews
User-Based CF:

100,000,000 x 100,000,000 similarity matrix

2,000,000 x 500 sum terms

Item-Based CF:

2,000,000 x 2,000,000 similarity matrix

2,000,000 x 10 sum terms
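
The arithmetic behind this comparison can be checked with a few lines of Python. This is only a back-of-the-envelope sketch using the figures from the slide; the variable names are illustrative.

```python
users = 100_000_000
items = 2_000_000
prefs_per_user = 10
reviews_per_item = 500

# Both ways of counting give the same total number of preferences.
total_from_users = users * prefs_per_user
total_from_items = items * reviews_per_item

# Number of entries in each (dense) similarity matrix.
user_similarity_entries = users * users
item_similarity_entries = items * items
matrix_size_ratio = user_similarity_entries // item_similarity_entries

# "Sum terms" products as written on the slide.
user_based_sum_terms = items * reviews_per_item
item_based_sum_terms = items * prefs_per_user
sum_terms_ratio = user_based_sum_terms // item_based_sum_terms
```

With these numbers the user-user matrix has 2,500 times as many entries as the item-item matrix, and the user-based sum has 50 times as many terms.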


Interpretability

“People who go to La Colombe Torrefaction & FourSquare HQ tend to go here”

“Coffee Shop connoisseurs tend to come here”


Evaluation
Rating Problem: Predictive accuracy (regression) metrics

RMSE, MAE, etc.

Preference (Binary) Problem: Classification accuracy (IR) metrics

Accuracy, Precision, Recall, F-1, ROC, etc.

Benchmark vs. ‘random’ and ‘popular’

Ranking accuracy metrics: Similarity of permutations

Pearson’s correlation, Spearman’s rho, Kendall’s tau
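
A minimal sketch of the regression-style and IR-style metrics named above, in plain Python. The held-out ratings, predictions, and recommendation lists are made-up illustration data, not results from the talk, and the function names are my own.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between actual and predicted ratings."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error between actual and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def precision_recall_f1(recommended, relevant):
    """Precision, recall and F-1 of a recommended set against the relevant set."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

# Rating problem: hypothetical held-out ratings vs. model predictions.
held_out = [5, 3, 4, 2]
predicted = [4, 3, 5, 1]
rating_rmse = rmse(held_out, predicted)
rating_mae = mae(held_out, predicted)

# Preference problem: hypothetical recommendations vs. items the user liked.
recommended = ["Maraboo", "Karnaf", "Liv"]
relevant = ["Karnaf", "Ima Adama"]
precision, recall, f1 = precision_recall_f1(recommended, relevant)
```

The same functions can be run on a 'random' or 'popular' recommender to obtain the benchmark numbers to compare against.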


Challenges

Cold-start problems (new item, new user)

“Black” and “Grey” sheep

Exploration-exploitation and reinforcement learning

Scale


Advanced Topics

Dimensionality Reduction

Map-Reducible calculations

Content-based (feature-based)

Multiple models


MapReduce Similarity Calculation

“User-based”

A * u_i = A u_i (user similarity vector), with u_i = Gadi

A          Maraboo  Karnaf  Ima Adama  Liv
Idan       1        ?       1          ?
Shahar     1        1       ?          1
Gadi       ?        1       ?          1

*

u_i        Gadi
Maraboo    ?
Karnaf     1
Ima Adama  ?
Liv        1

=

A u_i      Gadi
Idan       0
Shahar     2
Gadi       2

A^T * (A u_i) = A^T (A u_i)

A^T        Idan  Shahar  Gadi
Maraboo    1     1       ?
Karnaf     ?     1       1
Ima Adama  1     ?       ?
Liv        ?     1       1

*

A u_i      Gadi
Idan       0
Shahar     2
Gadi       2

=

A^T (A u_i)  Gadi
Maraboo      2
Karnaf       4
Ima Adama    0
Liv          4

“Item-Based”

A^T * A = A^T A (item similarity matrix)

A^T        Idan  Shahar  Gadi
Maraboo    1     1       ?
Karnaf     ?     1       1
Ima Adama  1     ?       ?
Liv        ?     1       1

*

A          Maraboo  Karnaf  Ima Adama  Liv
Idan       1        ?       1          ?
Shahar     1        1       ?          1
Gadi       ?        1       ?          1

=

A^T A      Maraboo  Karnaf  Ima Adama  Liv
Maraboo    2        1       1          1
Karnaf     1        2       0          2
Ima Adama  1        0       1          0
Liv        1        2       0          2

(A^T A) * u_i = (A^T A) u_i, with u_i = Gadi

A^T A      Maraboo  Karnaf  Ima Adama  Liv
Maraboo    2        1       1          1
Karnaf     1        2       0          2
Ima Adama  1        0       1          0
Liv        1        2       0          2

*

u_i        Gadi
Maraboo    ?
Karnaf     1
Ima Adama  ?
Liv        1

=

(A^T A) u_i  Gadi
Maraboo      2
Karnaf       4
Ima Adama    0
Liv          4

Similarity of item x to item y is <ix,iy>
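
A small Python sketch of the item-based route: build the item similarity matrix A^T A first, then multiply it by one user's vector. As before, "?" is encoded as 0 (an assumption), and the helper names are mine.

```python
ITEMS = ["Maraboo", "Karnaf", "Ima Adama", "Liv"]

# Rows are users (Idan, Shahar, Gadi), columns are items; "?" encoded as 0.
A = [
    [1, 0, 1, 0],  # Idan
    [1, 1, 0, 1],  # Shahar
    [0, 1, 0, 1],  # Gadi
]

def transpose(matrix):
    """Swap rows and columns."""
    return [list(column) for column in zip(*matrix)]

def mat_mul(x, y):
    """Multiply two matrices given as lists of rows."""
    y_columns = list(zip(*y))
    return [[sum(a * b for a, b in zip(row, column)) for column in y_columns]
            for row in x]

def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

# Entry (x, y) is the inner product <i_x, i_y> of two item columns of A.
item_similarity = mat_mul(transpose(A), A)

# Score every item for Gadi from the items Gadi already prefers.
u_gadi = A[2]
item_scores = mat_vec(item_similarity, u_gadi)
```

The scores are the same as in the user-based calculation because (A^T A) u_i = A^T (A u_i); only the order of multiplication, and therefore what can be precomputed, differs.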


Recall row outer-product matrix multiplication:

A^T A      Maraboo  Karnaf  Ima Adama  Liv
Maraboo    2        1       1          1
Karnaf     1        2       0          2
Ima Adama  1        0       1          0
Liv        1        2       0          2

=

u_Idan u_Idan^T      Maraboo  Karnaf  Ima Adama  Liv
Maraboo              1        0       1          0
Karnaf               0        0       0          0
Ima Adama            1        0       1          0
Liv                  0        0       0          0

+

u_Shahar u_Shahar^T  Maraboo  Karnaf  Ima Adama  Liv
Maraboo              1        1       0          1
Karnaf               1        1       0          1
Ima Adama            0        0       0          0
Liv                  1        1       0          1

+

u_Gadi u_Gadi^T      Maraboo  Karnaf  Ima Adama  Liv
Maraboo              0        0       0          0
Karnaf               0        1       0          1
Ima Adama            0        0       0          0
Liv                  0        1       0          1

Only one user’s list of items is used every time!
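
Because each outer product needs only one user's list of items, the sum maps naturally onto a map step per user and a reduce step per item pair. The following is a single-process Python sketch of that idea, not an actual Hadoop job; `map_user` and `reduce_counts` are hypothetical names.

```python
from collections import Counter
from itertools import product

# Each user's list of preferred items (binary preferences from the toy example).
user_items = {
    "Idan": ["Maraboo", "Ima Adama"],
    "Shahar": ["Maraboo", "Karnaf", "Liv"],
    "Gadi": ["Karnaf", "Liv"],
}

def map_user(items):
    """Map step: from ONE user's item list, emit ((item_x, item_y), 1)
    for every ordered pair, i.e. the non-zero entries of u u^T."""
    for pair in product(items, repeat=2):
        yield pair, 1

def reduce_counts(emitted):
    """Reduce step: sum the emitted values per (item_x, item_y) key."""
    totals = Counter()
    for key, value in emitted:
        totals[key] += value
    return totals

emitted = (kv for items in user_items.values() for kv in map_user(items))
cooccurrence = reduce_counts(emitted)
```

Looking up `cooccurrence[("Karnaf", "Liv")]` gives 2 and `cooccurrence[("Karnaf", "Ima Adama")]` gives 0, the same entries as in the A^T A matrix. Pairs that never co-occur are simply never emitted, which keeps the result sparse.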



All of the classic similarity functions are made up of 3 stages:

Preprocess (uses only one ELEMENT)

Norm (can be done in reduce on one VECTOR)

Similarity utilizes the A^T A matrix joined with norm entries
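
One possible reading of the three stages, using cosine similarity between items as the example. This is a sketch under my own assumptions: the preprocess step here just binarizes each rating, and all names are illustrative rather than taken from any library.

```python
import math

# Toy ratings from the deck; missing ("?") entries are simply absent.
ratings = {
    "Idan": {"Maraboo": 5, "Ima Adama": 3},
    "Shahar": {"Maraboo": 4, "Karnaf": 3, "Liv": 2},
    "Gadi": {"Karnaf": 1, "Liv": 5},
}

def preprocess(rating):
    """Stage 1: looks at a single ELEMENT; here any known rating becomes 1."""
    return 1 if rating > 0 else 0

# Build item vectors (item -> {user: preprocessed value}).
item_vectors = {}
for user, user_ratings in ratings.items():
    for item, rating in user_ratings.items():
        item_vectors.setdefault(item, {})[user] = preprocess(rating)

# Stage 2: one norm per item VECTOR (L2 norm for cosine).
norms = {item: math.sqrt(sum(v * v for v in vector.values()))
         for item, vector in item_vectors.items()}

def dot(item_x, item_y):
    """Entry (x, y) of A^T A: inner product of two item vectors."""
    vx, vy = item_vectors[item_x], item_vectors[item_y]
    return sum(value * vy[user] for user, value in vx.items() if user in vy)

def cosine(item_x, item_y):
    """Stage 3: join the A^T A entry with the two norms."""
    return dot(item_x, item_y) / (norms[item_x] * norms[item_y])

karnaf_liv = cosine("Karnaf", "Liv")
maraboo_karnaf = cosine("Maraboo", "Karnaf")
maraboo_ima = cosine("Maraboo", "Ima Adama")
```

Karnaf and Liv were chosen by exactly the same users, so their cosine similarity is 1; Maraboo and Karnaf share one of two users each, giving 0.5.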


Bibliography
Google News Personalization: Scalable Online Collaborative Filtering - Das, Datar, Garg, Rajaram, WWW2007

Logistic Regression and Collaborative Filtering for Sponsored Search Term Recommendation - Bartz, Murthi, Sebastian, EC2006

Evaluating Collaborative Filtering Recommender Systems - Herlocker, Konstan, Terveen, Riedl, ACM TIS2004

A Survey of Collaborative Filtering Techniques - Su, Khoshgoftaar, AAI2009

An Introduction to Information Retrieval - Manning, Raghavan, Schutze, Cambridge Press

Mahout in Action - Friedman, Dunning, Anil, Owen, Manning Publications

Lessons from the Netflix Prize Challenge - Bell, Koren, KDD2009

Factorization meets the Neighbourhood: a Multifaceted Collaborative Filtering Model - Koren, KDD2008

Accurate Methods for the Statistics of Surprise and Coincidence - Dunning, ACL1993

Item-Based Collaborative Filtering Recommendation Algorithms - Sarwar, Konstan, Karypis, Riedl, WWW2001

Matrix Factorization Techniques for Recommender Systems - Koren, Bell, Volinsky, IEEE2009

recommenderlab: A Framework for Developing and Testing Recommendation Algorithms - Hahsler, 2011

Scalable Similarity-Based Neighbourhood Methods with MapReduce - Schelter, Boden, Markl, RecSys2012


Thanks!

Nimrod Priell
nimrod.priell@gmail.com
@nimrodpriell
http://www.educated-guess.com

