MovieLens Recommendation Engine

Outline:

Task & Dataset

Techniques

Results

Scalability

Conclusion

Ambarish Hazarnis
Vibhor Mathur
Task
Predict the rating a user will give to a movie they have not yet seen.
Recommend the movies with the highest predicted ratings.
Dataset
MovieLens 100k
• 100,000 ratings by 943 users on 1,682 items. Each user has rated at least 20 movies.
• Movies can be in several genres at once.
• Demographic information about the users (age, gender, occupation).
Evaluation

Root Mean Squared Error
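
As a minimal sketch of the evaluation metric, RMSE can be computed from paired predicted and actual ratings. The function name `rmse` and the sample values are illustrative assumptions, not taken from the slides.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical predictions against held-out ratings.
print(rmse([4.0, 2.0], [3.0, 4.0]))
```

A lower value means the predicted ratings sit closer to the ratings users actually gave.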
Techniques

• Collaborative
  • User Based
  • Item Based
  • Slope One
• Content Based
  • User Based: Age, Occupation, Gender
  • Item Based: Genre
• Ensemble
  • Committee
  • Weighted
• Distributed
Results

Recommender          RMSE
User Based           1.227
Item Based           0.664
Slope One            0.587
User Content Based   0.649
Item Content Based   0.639
Ensemble

Committee
Recommender               RMSE
Collaborative Based       0.595
Content Based             0.612
Collaborative + Content   0.594

Weighted
Recommender               RMSE
Collaborative Based       0.747
Content Based             0.612
Collaborative + Content   0.663
Slope One

Principle:
The preference for a new item is based on the average difference in preference value between the new item and the other items the user has rated.

For two items I1 and I2, the rating of user1 for I2 is estimated from user1's rating of I1 plus the average difference between I2 and I1 across users who rated both.

Count weighting: weight more heavily those differences that are based on more data.

Standard deviation: a low standard deviation of the differences translates to a higher weight.
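
The principle above can be sketched as a count-weighted Slope One predictor. This is a minimal illustration, not the implementation used in the slides: the function name `slope_one_predict`, the nested-dict data layout, and the toy ratings are assumptions, and the standard-deviation weighting is not included.

```python
def slope_one_predict(ratings, user, target):
    """Count-weighted Slope One prediction.

    ratings: dict mapping user -> dict mapping item -> rating (assumed layout).
    Returns None when no co-rated item pairs exist.
    """
    numerator = 0.0
    denominator = 0
    for item, user_rating in ratings[user].items():
        if item == target:
            continue
        # Differences (target - item) over every user who rated both items.
        diffs = [r[target] - r[item]
                 for r in ratings.values()
                 if target in r and item in r]
        if not diffs:
            continue
        count = len(diffs)
        avg_diff = sum(diffs) / count
        # Count weighting: pairs backed by more data contribute more.
        numerator += (user_rating + avg_diff) * count
        denominator += count
    return numerator / denominator if denominator else None

# Toy data, purely hypothetical.
ratings = {
    "john": {"A": 5.0, "B": 3.0, "C": 2.0},
    "mark": {"A": 3.0, "B": 4.0},
    "lucy": {"B": 2.0, "C": 5.0},
}
print(slope_one_predict(ratings, "lucy", "A"))
```

Here the B-to-A difference is seen for two users and the C-to-A difference for one, so the estimate through B counts twice as much as the estimate through C.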
User Content Based
User: Gender, Occupation, Age
Principle: two users of similar gender, occupation, or age group share similar taste.
Similarity:
• Takes advantage of user-specific knowledge.
• A custom similarity metric computes user similarity.
• Gender, occupation, and age similarities are given different weights and combined into this custom similarity.
• The custom similarity metric can be paired with a standard GenericUserBasedRecommender.
• All rating-related information is discarded from the metric computation.
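
One way such a weighted demographic similarity could look is sketched below. The weights, the 10-year age span, and the function and field names are all hypothetical; the slides do not state the values actually used.

```python
def user_similarity(u, v, w_gender=0.25, w_occupation=0.25, w_age=0.5, age_span=10):
    """Weighted similarity from demographics only; no ratings are used.

    All weights and the age span are illustrative assumptions.
    """
    gender_sim = 1.0 if u["gender"] == v["gender"] else 0.0
    occupation_sim = 1.0 if u["occupation"] == v["occupation"] else 0.0
    # Age similarity decays linearly to 0 once ages differ by age_span years.
    age_sim = max(0.0, 1.0 - abs(u["age"] - v["age"]) / age_span)
    return w_gender * gender_sim + w_occupation * occupation_sim + w_age * age_sim

# Hypothetical users.
alice = {"gender": "F", "occupation": "student", "age": 24}
bob = {"gender": "M", "occupation": "student", "age": 29}
print(user_similarity(alice, bob))
```

Because the weights sum to 1, the result stays in the range 0 to 1, which is convenient for a metric that is plugged into a neighbourhood-based recommender.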
Item Content based
Item: Multiple genre
Principle: two movies that share several genres are considered similar.
Similarity:
• Takes advantage of item-specific knowledge.
• A custom similarity metric computes movie similarity.
• Similarity is derived from the degree of overlap between the movies' genres.
• The custom movie similarity metric can be paired with a standard GenericItemBasedRecommender.
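
A simple way to measure the degree of genre overlap is the Jaccard index, sketched below. Using Jaccard is an assumption made for illustration; the slides do not say which overlap measure was used, and the function name and sample genres are hypothetical.

```python
def genre_similarity(genres_a, genres_b):
    """Jaccard index of two genre collections: shared genres / all genres."""
    a, b = set(genres_a), set(genres_b)
    union = a | b
    if not union:
        return 0.0  # neither movie has any genre listed
    return len(a & b) / len(union)

# Hypothetical genre assignments.
print(genre_similarity({"Action", "Sci-Fi"}, {"Action", "Thriller"}))
```

Two movies with identical genre sets score 1.0 and movies with no genre in common score 0.0.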
Ensemble

Uses the 'wisdom of crowds' phenomenon.

Committee
Unweighted average of the predicted ratings of all recommenders.

Weighted
Higher weights for better recommenders.

If Ei is the error of recommender i, let Ai and Wi denote its accuracy and weight respectively.
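
The two combination schemes can be sketched as follows. The weighting rule here (weight proportional to the inverse of each recommender's error) is an assumption chosen for illustration, since the slides' own formula for Ai and Wi did not survive; the function names and sample numbers are hypothetical.

```python
def committee(predictions):
    """Unweighted average of the recommenders' predicted ratings."""
    return sum(predictions) / len(predictions)

def weighted(predictions, errors):
    """Weighted average; lower-error recommenders get higher weight.

    Assumption: weight_i = 1 / error_i, normalised to sum to 1.
    Errors must be strictly positive.
    """
    raw = [1.0 / e for e in errors]
    total = sum(raw)
    return sum(w * p for w, p in zip(raw, predictions)) / total

# Hypothetical predictions for one (user, movie) pair and per-recommender RMSEs.
print(committee([4.0, 3.0, 5.0]))
print(weighted([4.0, 2.0], [0.5, 1.0]))
```

In the second call the recommender with half the error receives twice the weight, so the result is pulled toward its prediction.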
Scalability-1

Case Study: Item-based recommender using co-occurrence as similarity.
4(2.0) + 3(0.0) + 4(0.0) + 3(4.0) + 1(4.5) + 2(0.0) + 0(5.0) = 24.5
Distributed computation helps by breaking up a problem that's too big for one server into pieces that several smaller servers can handle.
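
The arithmetic in this case study is a dot product between one row of the co-occurrence matrix and the user's preference vector. A minimal sketch reproducing the numbers from the slide (the variable names are assumptions):

```python
# One row of the co-occurrence matrix: how often the candidate item
# co-occurs with each of the seven items (values from the slide).
cooccurrence_row = [4, 3, 4, 3, 1, 2, 0]

# The user's preference for each of those seven items; 0.0 means unrated.
user_preferences = [2.0, 0.0, 0.0, 4.0, 4.5, 0.0, 5.0]

# Score for the candidate item: sum of count * preference.
score = sum(c * p for c, p in zip(cooccurrence_row, user_preferences))
print(score)  # 24.5
```

Repeating this for every row yields one score per item, and the highest-scoring items the user has not rated become the recommendations.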
Scalability-2

The recommender sums the products of co-occurrence counts and preference values.

Why is it suitable for distributed computation?
Computing the resulting recommendation vector only requires loading one row or column of the matrix at a time.
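
The row-or-column-at-a-time property can be sketched on a single machine as below. This is an in-memory illustration only, not Mahout's MapReduce implementation; the function names, data layout, and toy data are assumptions.

```python
from collections import defaultdict
from itertools import permutations

def cooccurrence(user_items):
    """Count, for each ordered item pair, how many users have both items."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in user_items.values():
        for a, b in permutations(sorted(items), 2):
            counts[a][b] += 1
    return counts

def recommend_scores(counts, prefs):
    """Accumulate scores while touching only one matrix column per rated item."""
    scores = defaultdict(float)
    for item, pref in prefs.items():
        # Only the column for `item` is needed in this step.
        for other, count in counts[item].items():
            scores[other] += count * pref
    # Drop items the user has already rated.
    return {i: s for i, s in scores.items() if i not in prefs}

# Hypothetical data: which items each user has rated.
user_items = {"u1": {"A", "B"}, "u2": {"A", "B", "C"}, "u3": {"B", "C"}}
counts = cooccurrence(user_items)
print(recommend_scores(counts, {"A": 4.0, "B": 2.0}))
```

Each loop iteration depends on a single column and a single preference value, so the partial sums can be produced independently and added together afterwards, which is what makes the computation easy to split across machines.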
[Figure: user's ratings and co-occurrence matrix feed the item-based recommender, which outputs the top-N recommendations]
Apache Mahout: provides scalable machine learning libraries.
Class: org.apache.mahout.cf.taste.hadoop.item.RecommenderJob (5 MapReduce jobs)
Recommendations for User 122:
[ 9 : 5.0, 546 : 5.0, 568 : 5.0, 527 : 5.0, 515 : 5.0, 514 : 5.0, 511 : 5.0, 498 : 5.0]
Conclusion

The Slope One recommender worked best (RMSE 0.587 among the individual recommenders), but it is also computationally very expensive.

The content-based approaches gave better results than the plain user-based and item-based collaborative approaches. However, the former are domain-specific.

An ensemble of simple learners gave comparable results.

More learners in an ensemble result in better predictions.
Thank You

Recommendation Engine using Apache Mahout
