SlideShare uma empresa Scribd logo
1 de 16
MovieLens Movie Recommendation System
BIA 678
GAURAV SAWANT
DHAVAL SAWLANI
GUIDED BY:
PROF. DAVID BELANGER
Overview
Goal:
● To study different Similarity techniques and collaborative filtering
algorithms and see how they scale.
Data Set:
● We are using the Movie Lens data sets found on the Group Lens website
(grouplens https://grouplens.org/)
● Ratings Data: UserId, MovieId, Ratings(1-5), TimeStamp
● Movie Data: Movie_ID, Title, Genres
Movie ratings distribution
Movie reviews distribution
Cosine Similarities
● Cosine similarity is the measure of the angle between the vectors of an non-zero space in
mathematical terms.
● In movie recommendations, it is the measure of the similarity between the movies taking
the ratings of the users into account.
● Used RDDs and key, value pairs to obtain the cosine similarity of the movies.
● Cosine similarity between any two vectors is calculated as follows
● Range of 0-1
● Closer to 1 means stronger similarity index and vice-versa
Computing Cosine Similarities
Steps followed:
1. Created a cartesian join of the data set. (Took maximum time on the cluster and on local)
2. Filtered duplicates
3. Group by movieID to obtain all the reviews of that movie in a single (key,value) pair
4. Computed Cosine similarity.
5. Filtered Movies with more than 50 reviews and greater than 0.97 score
The results on the cluster where as shown below:
Cosine Similarities Results
Pearson correlation coefficient (r)
● Pearson correlation coefficient also called as bivariate correlation is referred to as
Pearson’s r.
● It is the measure of linear correlation between two vectors A & B and has a range of -1 to
1, where 1 means total +ve correlation, 0 means no correlation and -1 means total -ve
correlation.
● Pearson correlation coefficient usually normalizes your similarity index and thus punishes
the instances where cosine similarity would fail.
● Limitations
1. Does not explain the relation between dependent and independent variables. This does
not matter in our case, because if movie A has an r of 0.8 with movie B and vice-versa
exists, we can anyways say movie A and B are similar.
Computing Pearson correlation
Steps followed:
1. Created a cartesian join of the data set. (Took maximum time on the cluster and on local)
2. Filtered duplicates
3. Group by movieID to obtain all the reviews of that movie in a single (key,value) pair
4. Computed Pearson correlation.
5. Filtered Movies with more than 50 reviews and greater than 0.5 score
The results on the cluster where as shown below:
Pearson Correlation Results
Collaborative Filtering
• Collaborative Filtering is based on the idea that people get best
recommendations from people with similar tastes
• Memory Based CF: Uses user rating data to compute similarity between
users/items to make recommendations
• Model Based CF: Models are developed using different machine learning
algorithms to predict users’ rating of unrated items
Alternating Least Squares(ALS):
Matrix Factorization:
ALS is a popular matrix factorization recommendation system. Matrix Factorization can be shortly explained
as follows-
• Each user is described by k features or attributes. A feature might be a number how much each user like
movies of a particular genre
• Each movie can be described as a set of k features. Feature 1 for a movie might be a number how close
a movie is to that genre
• Finally, multiply the feature of user with the feature of item and add to get and an approximate rating
• User u takes form of k-dimensional vector x_u and the item/movie i takes form of k-dimensional vector
y_i. The k attributes are called latent factors.
• We produce a loss function by minimizing the square of the difference between predictions and ratings
Alternating Least Squares(ALS)
● For ALS we hold one set of latent factors constant say item vectors
● We then solve for the non-constant vectors
● We alternate back and forth until before converging
Steps Followed:
● Construct RDDs as per the requirement
● Use spark’s mllib library with pyspark to run the ALS algorithm
● Compare the RMSE and time results to check scalability
Alternating Least Squares Results
Data Size vs RMSE (Cluster) Data Size vs Time (Cluster)
Alternating Least Squares Results (Scalability)
Conclusion & Future Scope
● Cosine Similarities and Pearson Coefficient Correlation overall produce similar results.
Pearson Coefficient is only slightly better
● In ALS, as expected the time taken increases with the size of the data and the RMSE
decreases with increase in the size of the data
● In all the methods studied, time taken on cluster is considerably less than the time
taken on the local machine
Future Scope:
● Studying more collaborative filtering techniques and their scalability.
● Decrease the RMSE
● Impment Hybrid CF techniques and run on larger data sets.

Mais conteúdo relacionado

Mais procurados

Movie Recommender System Using Artificial Intelligence
Movie Recommender System Using Artificial Intelligence Movie Recommender System Using Artificial Intelligence
Movie Recommender System Using Artificial Intelligence Shrutika Oswal
 
Recommender system introduction
Recommender system   introductionRecommender system   introduction
Recommender system introductionLiang Xiang
 
Collaborative Filtering Recommendation System
Collaborative Filtering Recommendation SystemCollaborative Filtering Recommendation System
Collaborative Filtering Recommendation SystemMilind Gokhale
 
Movie recommendation Engine using Artificial Intelligence
Movie recommendation Engine using Artificial IntelligenceMovie recommendation Engine using Artificial Intelligence
Movie recommendation Engine using Artificial IntelligenceHarivamshi D
 
Recommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringRecommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringViet-Trung TRAN
 
Movie Recommender system
Movie Recommender systemMovie Recommender system
Movie Recommender systemPalakNath
 
Recommender systems using collaborative filtering
Recommender systems using collaborative filteringRecommender systems using collaborative filtering
Recommender systems using collaborative filteringD Yogendra Rao
 
Movie recommendation system using collaborative filtering system
Movie recommendation system using collaborative filtering system Movie recommendation system using collaborative filtering system
Movie recommendation system using collaborative filtering system Mauryasuraj98
 
Item Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation AlgorithmsItem Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation Algorithmsnextlib
 
Recommendation Systems Basics
Recommendation Systems BasicsRecommendation Systems Basics
Recommendation Systems BasicsJarin Tasnim Khan
 
Collaborative Filtering using KNN
Collaborative Filtering using KNNCollaborative Filtering using KNN
Collaborative Filtering using KNNŞeyda Hatipoğlu
 
Movie recommendation project
Movie recommendation projectMovie recommendation project
Movie recommendation projectAbhishek Jaisingh
 
Recommender system algorithm and architecture
Recommender system algorithm and architectureRecommender system algorithm and architecture
Recommender system algorithm and architectureLiang Xiang
 
Recommendation engines
Recommendation enginesRecommendation engines
Recommendation enginesGeorgian Micsa
 

Mais procurados (20)

Movie Recommender System Using Artificial Intelligence
Movie Recommender System Using Artificial Intelligence Movie Recommender System Using Artificial Intelligence
Movie Recommender System Using Artificial Intelligence
 
Recommender system introduction
Recommender system   introductionRecommender system   introduction
Recommender system introduction
 
Collaborative Filtering Recommendation System
Collaborative Filtering Recommendation SystemCollaborative Filtering Recommendation System
Collaborative Filtering Recommendation System
 
Movie recommendation Engine using Artificial Intelligence
Movie recommendation Engine using Artificial IntelligenceMovie recommendation Engine using Artificial Intelligence
Movie recommendation Engine using Artificial Intelligence
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Recommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringRecommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filtering
 
Movie Recommender system
Movie Recommender systemMovie Recommender system
Movie Recommender system
 
Recommender systems using collaborative filtering
Recommender systems using collaborative filteringRecommender systems using collaborative filtering
Recommender systems using collaborative filtering
 
Movie recommendation system using collaborative filtering system
Movie recommendation system using collaborative filtering system Movie recommendation system using collaborative filtering system
Movie recommendation system using collaborative filtering system
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Item Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation AlgorithmsItem Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation Algorithms
 
Developing Movie Recommendation System
Developing Movie Recommendation SystemDeveloping Movie Recommendation System
Developing Movie Recommendation System
 
Project presentation
Project presentationProject presentation
Project presentation
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Collaborative filtering
Collaborative filteringCollaborative filtering
Collaborative filtering
 
Recommendation Systems Basics
Recommendation Systems BasicsRecommendation Systems Basics
Recommendation Systems Basics
 
Collaborative Filtering using KNN
Collaborative Filtering using KNNCollaborative Filtering using KNN
Collaborative Filtering using KNN
 
Movie recommendation project
Movie recommendation projectMovie recommendation project
Movie recommendation project
 
Recommender system algorithm and architecture
Recommender system algorithm and architectureRecommender system algorithm and architecture
Recommender system algorithm and architecture
 
Recommendation engines
Recommendation enginesRecommendation engines
Recommendation engines
 

Semelhante a Movie lens movie recommendation system

(Gaurav sawant & dhaval sawlani)bia 678 final project report
(Gaurav sawant & dhaval sawlani)bia 678 final project report(Gaurav sawant & dhaval sawlani)bia 678 final project report
(Gaurav sawant & dhaval sawlani)bia 678 final project reportGaurav Sawant
 
movie recommender system using vectorization and SVD tech
movie recommender system using vectorization and SVD techmovie recommender system using vectorization and SVD tech
movie recommender system using vectorization and SVD techUddeshBhagat
 
movierecommendationproject-171223181147.pptx
movierecommendationproject-171223181147.pptxmovierecommendationproject-171223181147.pptx
movierecommendationproject-171223181147.pptxAryanVyawahare
 
Recommendation Systems
Recommendation SystemsRecommendation Systems
Recommendation SystemsRobin Reni
 
[UMAP2013] Recommendation with Differential Context Weighting
[UMAP2013] Recommendation with Differential Context Weighting[UMAP2013] Recommendation with Differential Context Weighting
[UMAP2013] Recommendation with Differential Context WeightingYONG ZHENG
 
Domainspecificsubgraph extraction ieee-bigdata2016
Domainspecificsubgraph extraction ieee-bigdata2016Domainspecificsubgraph extraction ieee-bigdata2016
Domainspecificsubgraph extraction ieee-bigdata2016Sarasi Sarangi
 
Recommendation Engine using Apache Mahout
Recommendation Engine using Apache MahoutRecommendation Engine using Apache Mahout
Recommendation Engine using Apache MahoutAmbarish Hazarnis
 
Aaa ped-19-Recommender Systems: Neighborhood-based Filtering
Aaa ped-19-Recommender Systems: Neighborhood-based FilteringAaa ped-19-Recommender Systems: Neighborhood-based Filtering
Aaa ped-19-Recommender Systems: Neighborhood-based FilteringAminaRepo
 
16 recommender systems
16 recommender systems16 recommender systems
16 recommender systemsTanmayVijay1
 
The Wisdom of the Few @SIGIR09
The Wisdom of the Few @SIGIR09The Wisdom of the Few @SIGIR09
The Wisdom of the Few @SIGIR09Xavier Amatriain
 
MOVIE RECOMMENDATION SYSTEM.pptx
MOVIE RECOMMENDATION SYSTEM.pptxMOVIE RECOMMENDATION SYSTEM.pptx
MOVIE RECOMMENDATION SYSTEM.pptxAyushkumar417871
 
Overview of recommender system
Overview of recommender systemOverview of recommender system
Overview of recommender systemStanley Wang
 
Aaa ped-21-Recommender Systems: Content-based Filtering
Aaa ped-21-Recommender Systems: Content-based FilteringAaa ped-21-Recommender Systems: Content-based Filtering
Aaa ped-21-Recommender Systems: Content-based FilteringAminaRepo
 
[SOCRS2013]Differential Context Modeling in Collaborative Filtering
[SOCRS2013]Differential Context Modeling in Collaborative Filtering[SOCRS2013]Differential Context Modeling in Collaborative Filtering
[SOCRS2013]Differential Context Modeling in Collaborative FilteringYONG ZHENG
 
Recommender Systems from A to Z – The Right Dataset
Recommender Systems from A to Z – The Right DatasetRecommender Systems from A to Z – The Right Dataset
Recommender Systems from A to Z – The Right DatasetCrossing Minds
 
ENTERTAINMENT CONTENT RECOMMENDATION SYSTEM USING MACHINE LEARNING
ENTERTAINMENT CONTENT RECOMMENDATION SYSTEM USING MACHINE LEARNINGENTERTAINMENT CONTENT RECOMMENDATION SYSTEM USING MACHINE LEARNING
ENTERTAINMENT CONTENT RECOMMENDATION SYSTEM USING MACHINE LEARNINGIRJET Journal
 
Recommender Systems! @ASAI 2011
Recommender Systems! @ASAI 2011Recommender Systems! @ASAI 2011
Recommender Systems! @ASAI 2011Ernesto Mislej
 
Download
DownloadDownload
Downloadbutest
 
Download
DownloadDownload
Downloadbutest
 

Semelhante a Movie lens movie recommendation system (20)

(Gaurav sawant & dhaval sawlani)bia 678 final project report
(Gaurav sawant & dhaval sawlani)bia 678 final project report(Gaurav sawant & dhaval sawlani)bia 678 final project report
(Gaurav sawant & dhaval sawlani)bia 678 final project report
 
movie recommender system using vectorization and SVD tech
movie recommender system using vectorization and SVD techmovie recommender system using vectorization and SVD tech
movie recommender system using vectorization and SVD tech
 
movierecommendationproject-171223181147.pptx
movierecommendationproject-171223181147.pptxmovierecommendationproject-171223181147.pptx
movierecommendationproject-171223181147.pptx
 
Recommendation Systems
Recommendation SystemsRecommendation Systems
Recommendation Systems
 
[UMAP2013] Recommendation with Differential Context Weighting
[UMAP2013] Recommendation with Differential Context Weighting[UMAP2013] Recommendation with Differential Context Weighting
[UMAP2013] Recommendation with Differential Context Weighting
 
Domainspecificsubgraph extraction ieee-bigdata2016
Domainspecificsubgraph extraction ieee-bigdata2016Domainspecificsubgraph extraction ieee-bigdata2016
Domainspecificsubgraph extraction ieee-bigdata2016
 
Domainspecificsubgraph extraction ieee-bigdata2016
Domainspecificsubgraph extraction ieee-bigdata2016Domainspecificsubgraph extraction ieee-bigdata2016
Domainspecificsubgraph extraction ieee-bigdata2016
 
Recommendation Engine using Apache Mahout
Recommendation Engine using Apache MahoutRecommendation Engine using Apache Mahout
Recommendation Engine using Apache Mahout
 
Aaa ped-19-Recommender Systems: Neighborhood-based Filtering
Aaa ped-19-Recommender Systems: Neighborhood-based FilteringAaa ped-19-Recommender Systems: Neighborhood-based Filtering
Aaa ped-19-Recommender Systems: Neighborhood-based Filtering
 
16 recommender systems
16 recommender systems16 recommender systems
16 recommender systems
 
The Wisdom of the Few @SIGIR09
The Wisdom of the Few @SIGIR09The Wisdom of the Few @SIGIR09
The Wisdom of the Few @SIGIR09
 
MOVIE RECOMMENDATION SYSTEM.pptx
MOVIE RECOMMENDATION SYSTEM.pptxMOVIE RECOMMENDATION SYSTEM.pptx
MOVIE RECOMMENDATION SYSTEM.pptx
 
Overview of recommender system
Overview of recommender systemOverview of recommender system
Overview of recommender system
 
Aaa ped-21-Recommender Systems: Content-based Filtering
Aaa ped-21-Recommender Systems: Content-based FilteringAaa ped-21-Recommender Systems: Content-based Filtering
Aaa ped-21-Recommender Systems: Content-based Filtering
 
[SOCRS2013]Differential Context Modeling in Collaborative Filtering
[SOCRS2013]Differential Context Modeling in Collaborative Filtering[SOCRS2013]Differential Context Modeling in Collaborative Filtering
[SOCRS2013]Differential Context Modeling in Collaborative Filtering
 
Recommender Systems from A to Z – The Right Dataset
Recommender Systems from A to Z – The Right DatasetRecommender Systems from A to Z – The Right Dataset
Recommender Systems from A to Z – The Right Dataset
 
ENTERTAINMENT CONTENT RECOMMENDATION SYSTEM USING MACHINE LEARNING
ENTERTAINMENT CONTENT RECOMMENDATION SYSTEM USING MACHINE LEARNINGENTERTAINMENT CONTENT RECOMMENDATION SYSTEM USING MACHINE LEARNING
ENTERTAINMENT CONTENT RECOMMENDATION SYSTEM USING MACHINE LEARNING
 
Recommender Systems! @ASAI 2011
Recommender Systems! @ASAI 2011Recommender Systems! @ASAI 2011
Recommender Systems! @ASAI 2011
 
Download
DownloadDownload
Download
 
Download
DownloadDownload
Download
 

Último

Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...Bertram Ludäscher
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制vexqp
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubaikojalkojal131
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...Health
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schscnajjemba
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制vexqp
 
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制vexqp
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.pptibrahimabdi22
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1ranjankumarbehera14
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...gajnagarg
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...gajnagarg
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格q6pzkpark
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...gajnagarg
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Klinik kandungan
 

Último (20)

Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schs
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
 
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 

Movie lens movie recommendation system

  • 1. MovieLens Movie Recommendation System BIA 678 GAURAV SAWANT DHAVAL SAWLANI GUIDED BY: PROF. DAVID BELANGER
  • 2. Overview Goal: ● To study different Similarity techniques and collaborative filtering algorithms and see how they scale. Data Set: ● We are using the Movie Lens data sets found on the Group Lens website (grouplens https://grouplens.org/) ● Ratings Data: UserId, MovieId, Ratings(1-5), TimeStamp ● Movie Data: Movie_ID, Title, Genres
  • 5. Cosine Similarities ● Cosine similarity is the measure of the angle between the vectors of an non-zero space in mathematical terms. ● In movie recommendations, it is the measure of the similarity between the movies taking the ratings of the users into account. ● Used RDDs and key, value pairs to obtain the cosine similarity of the movies. ● Cosine similarity between any two vectors is calculated as follows ● Range of 0-1 ● Closer to 1 means stronger similarity index and vice-versa
  • 6. Computing Cosine Similarities Steps followed: 1. Created a cartesian join of the data set. (Took maximum time on the cluster and on local) 2. Filtered duplicates 3. Group by movieID to obtain all the reviews of that movie in a single (key,value) pair 4. Computed Cosine similarity. 5. Filtered Movies with more than 50 reviews and greater than 0.97 score The results on the cluster where as shown below:
  • 8. Pearson correlation coefficient (r) ● Pearson correlation coefficient also called as bivariate correlation is referred to as Pearson’s r. ● It is the measure of linear correlation between two vectors A & B and has a range of -1 to 1, where 1 means total +ve correlation, 0 means no correlation and -1 means total -ve correlation. ● Pearson correlation coefficient usually normalizes your similarity index and thus punishes the instances where cosine similarity would fail. ● Limitations 1. Does not explain the relation between dependent and independent variables. This does not matter in our case, because if movie A has an r of 0.8 with movie B and vice-versa exists, we can anyways say movie A and B are similar.
  • 9. Computing Pearson correlation Steps followed: 1. Created a cartesian join of the data set. (Took maximum time on the cluster and on local) 2. Filtered duplicates 3. Group by movieID to obtain all the reviews of that movie in a single (key,value) pair 4. Computed Pearson correlation. 5. Filtered Movies with more than 50 reviews and greater than 0.5 score The results on the cluster where as shown below:
  • 11. Collaborative Filtering • Collaborative Filtering is based on the idea that people get best recommendations from people with similar tastes • Memory Based CF: Uses user rating data to compute similarity between users/items to make recommendations • Model Based CF: Models are developed using different machine learning algorithms to predict users’ rating of unrated items
  • 12. Alternating Least Squares(ALS): Matrix Factorization: ALS is a popular matrix factorization recommendation system. Matrix Factorization can be shortly explained as follows- • Each user is described by k features or attributes. A feature might be a number how much each user like movies of a particular genre • Each movie can be described as a set of k features. Feature 1 for a movie might be a number how close a movie is to that genre • Finally, multiply the feature of user with the feature of item and add to get and an approximate rating • User u takes form of k-dimensional vector x_u and the item/movie i takes form of k-dimensional vector y_i. The k attributes are called latent factors. • We produce a loss function by minimizing the square of the difference between predictions and ratings
  • 13. Alternating Least Squares(ALS) ● For ALS we hold one set of latent factors constant say item vectors ● We then solve for the non-constant vectors ● We alternate back and forth until before converging Steps Followed: ● Construct RDDs as per the requirement ● Use spark’s mllib library with pyspark to run the ALS algorithm ● Compare the RMSE and time results to check scalability
  • 14. Alternating Least Squares Results Data Size vs RMSE (Cluster) Data Size vs Time (Cluster)
  • 15. Alternating Least Squares Results (Scalability)
  • 16. Conclusion & Future Scope ● Cosine Similarities and Pearson Coefficient Correlation overall produce similar results. Pearson Coefficient is only slightly better ● In ALS, as expected the time taken increases with the size of the data and the RMSE decreases with increase in the size of the data ● In all the methods studied, time taken on cluster is considerably less than the time taken on the local machine Future Scope: ● Studying more collaborative filtering techniques and their scalability. ● Decrease the RMSE ● Impment Hybrid CF techniques and run on larger data sets.