SlideShare uma empresa Scribd logo
1 de 35
Baixar para ler offline
Adam Warski

Recommendation
systems

5/11/2013 Warszawa-JUG

Introduction using Mahout

@adamwarski
Agenda

❖

What is a recommender?!

❖

Types of recommenders, input data!

❖

Recommendations with Mahout: single node!

❖

Recommendations with Mahout: multiple nodes

5/11/2013 Warszawa-JUG

@adamwarski
Who Am I?
❖

Day: coding @ SoftwareMill!

❖

Afternoon: playgrounds, Duplos, etc.!

❖

Evenings: blogging, open-source!
❖

Original author of Hibernate Envers!

❖

ElasticMQ, Veripacks, MacWire!

❖

http://www.warski.org

5/11/2013 Warszawa-JUG

@adamwarski
Me + recommenders
❖

Yap.TV!

❖

Recommends shows to users
basing on their favorites!
❖
❖

❖

from Yap!
from Facebook!

Some business rules

5/11/2013 Warszawa-JUG

@adamwarski
Some background

5/11/2013 Warszawa-JUG

@adamwarski
Information …

❖

Retrieval!
❖

❖

given a query, select items!

Filtering!
❖

given a user, filter out irrelevant items

5/11/2013 Warszawa-JUG

@adamwarski
What can be recommended?

❖

Items to users!

❖

Items to items!

❖

Users to users!

❖

Users to items

5/11/2013 Warszawa-JUG

@adamwarski
Input data

❖

(user, item, rating) tuples!
❖

❖

rating: e.g. 1-5 stars!

(user, item) tuples!
❖

unary

5/11/2013 Warszawa-JUG

@adamwarski
Input data
❖

Implicit!
❖
❖

reads!

❖
❖

clicks!

watching a video!

Explicits!
❖

explicit rating!

❖

favorites

5/11/2013 Warszawa-JUG

@adamwarski
Prediction vs recommendation
❖

Recommendations!
❖
❖

❖

suggestions!
top-N!

Predictions!
❖

ratings

5/11/2013 Warszawa-JUG

@adamwarski
Collaborative Filtering

❖

Something more than search keywords!
❖
❖

❖

tastes!
quality!

Collaborative: data from many users

5/11/2013 Warszawa-JUG

@adamwarski
Content-based recommenders

❖

Define features and feature values!

❖

Describe each item as a vector of features

5/11/2013 Warszawa-JUG

@adamwarski
Content-based recommenders
❖

User vector!
❖
❖

❖

e.g. counting user tags!
sum of item vectors!

Prediction: cosine between two vectors!
❖

profile!

❖

item

5/11/2013 Warszawa-JUG

@adamwarski
User-User CF

❖

Measure similarity between users!

❖

Prediction: weighted combination of existing ratings

5/11/2013 Warszawa-JUG

@adamwarski
User-User CF

❖

Domain must be scoped so that there’s agreement!

❖

Individual preferences should be stable

5/11/2013 Warszawa-JUG

@adamwarski
User similarity

❖

Many different metrics!
!

❖

Pearson correlation:

5/11/2013 Warszawa-JUG

@adamwarski
User neighbourhood
❖

Threshold on similarity!

❖

Top-N neighbours!
❖

❖

25-100: good starting point!

Variations:!
❖

trust networks, e.g. friends in a social network

5/11/2013 Warszawa-JUG

@adamwarski
Predictions in UU-CF

5/11/2013 Warszawa-JUG

@adamwarski
Item-Item CF

❖

Similar to UU-CF, only with items!

❖

Item similarity!
❖

Pearson correlation on item ratings!

❖

co-occurences

5/11/2013 Warszawa-JUG

@adamwarski
Evaluating recommenders
❖

How to tell if a recommender is good?!
❖
❖

❖

compare implementations!
are the recommendations ok?!

Business metrics!
❖

what leads to increased sales

5/11/2013 Warszawa-JUG

@adamwarski
Leave one out

❖

Remove one preference!

❖

Rebuild the model!

❖

See if it is recommended!

❖

Or, see what is the predicted rating

5/11/2013 Warszawa-JUG

@adamwarski
Some metrics

❖

Mean absolute error:!

❖

Mean squared error

5/11/2013 Warszawa-JUG

@adamwarski
Precision/recall
❖

Precision: % of recommended items that are relevant!

❖

Recall: % of relevant items that are recommended!

❖

Precision@N, recall@N
Recall

Ideal recs

Precision
Recs
5/11/2013 Warszawa-JUG

All items
@adamwarski
Problems
❖

Novelty: hard to measure!
❖

❖

leave one out: punishes recommenders which
recommend new items!

Test on humans!
❖

A/B testing

5/11/2013 Warszawa-JUG

@adamwarski
Diversity/serendipity
❖

Increase diversity:!
❖
❖

❖

top-n!
as items come in, remove the ones that are too similar
to prior items!

Increase serendipity:!
❖

downgrade popular items

5/11/2013 Warszawa-JUG

@adamwarski
Netflix challenge

❖

$1m for the best recommender!

❖

Used mean error!

❖

Winning algorithm won on predicting low ratings

5/11/2013 Warszawa-JUG

@adamwarski
Mahout single-node

5/11/2013 Warszawa-JUG

@adamwarski
5/11/2013 Warszawa-JUG

@adamwarski
5/11/2013 Warszawa-JUG

@adamwarski
Mahout single-node
❖

User-User CF!

❖

Item-Item CF!

❖

Various metrics!
❖
❖

Pearson!

❖
❖

Euclidean!

Log-likelihood!

Support for evaluation

5/11/2013 Warszawa-JUG

@adamwarski
Let’s code

5/11/2013 Warszawa-JUG

@adamwarski
Mahout multi-node
❖

Based on Hadoop!

❖

Pre-computed recommendations!

❖

Item-based recommenders!
❖

❖

Co-occurrence matrix!

Matrix decomposition!
❖

Alternating Least Squares

5/11/2013 Warszawa-JUG

@adamwarski
Running Mahout w/ Hadoop
hadoop jar mahout-core-0.8-job.jar !
! org.apache.mahout.cf.taste.hadoop.item.RecommenderJob !
! --booleanData !
! --similarityClassname SIMILARITY_LOGLIKELIHOOD !
! --output output !
! --input input/data.dat

5/11/2013 Warszawa-JUG

@adamwarski
Links
❖

http://www.warski.org!

❖

http://mahout.apache.org/!

❖

http://grouplens.org/!

❖

http://graphlab.org/graphchi/!

❖

https://class.coursera.org/recsys-001/class!

❖

https://github.com/adamw/mahout-pres

5/11/2013 Warszawa-JUG

@adamwarski
Thanks!

❖

Questions?!

❖

Stickers ->!

❖

adam@warski.org

5/11/2013 Warszawa-JUG

@adamwarski

Mais conteúdo relacionado

Mais de Adam Warski

The ideal module system and the harsh reality
The ideal module system and the harsh realityThe ideal module system and the harsh reality
The ideal module system and the harsh reality
Adam Warski
 

Mais de Adam Warski (9)

What have the annotations done to us?
What have the annotations done to us?What have the annotations done to us?
What have the annotations done to us?
 
Hejdyk maleńka część mazur
Hejdyk maleńka część mazurHejdyk maleńka część mazur
Hejdyk maleńka część mazur
 
Slick eventsourcing
Slick eventsourcingSlick eventsourcing
Slick eventsourcing
 
Evaluating persistent, replicated message queues
Evaluating persistent, replicated message queuesEvaluating persistent, replicated message queues
Evaluating persistent, replicated message queues
 
The no-framework Scala Dependency Injection Framework
The no-framework Scala Dependency Injection FrameworkThe no-framework Scala Dependency Injection Framework
The no-framework Scala Dependency Injection Framework
 
ElasticMQ: a fully asynchronous, Akka-based SQS server
ElasticMQ: a fully asynchronous, Akka-based SQS serverElasticMQ: a fully asynchronous, Akka-based SQS server
ElasticMQ: a fully asynchronous, Akka-based SQS server
 
The ideal module system and the harsh reality
The ideal module system and the harsh realityThe ideal module system and the harsh reality
The ideal module system and the harsh reality
 
Scala Macros
Scala MacrosScala Macros
Scala Macros
 
CDI Portable Extensions
CDI Portable ExtensionsCDI Portable Extensions
CDI Portable Extensions
 

Último

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 

Último (20)

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 

Recommendation systems with Mahout: introduction