1
2
Collaborative filtering algorithms recommend items (this is the filtering part) based on preference information from many users (this is the collaborative part). The collaborative filtering approach is based on similarity: the basic idea is that people who liked similar items in the past will like similar items in the future.
In the example shown, Ted likes movies A, B, and C. Carol likes movies B and C. Bob likes movie B. To recommend a movie to Bob, we observe that users who liked B also liked C, so C is a possible recommendation for Bob. Of course, this is a tiny example. In real situations, we would have much more data to work with.
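To make the Ted/Carol/Bob example concrete, here is a minimal co-occurrence sketch in plain Python. It is not the ALS approach used later in this deck. The `recommend` function and its scoring rule are hypothetical and chosen only for illustration: every other user who shares at least one liked movie with the target user "votes" for the movies the target user has not seen.

```python
from collections import Counter

# Toy preference data from the example: the set of movies each user likes.
likes = {
    "Ted": {"A", "B", "C"},
    "Carol": {"B", "C"},
    "Bob": {"B"},
}

def recommend(user, likes):
    """Rank movies the user has not seen by co-occurrence with movies they liked."""
    seen = likes[user]
    scores = Counter()
    for other, movies in likes.items():
        overlap = movies & seen
        if other == user or not overlap:
            continue  # ignore the user themself and anyone with no shared likes
        for movie in movies - seen:
            scores[movie] += len(overlap)  # weight each vote by how many likes are shared
    return [movie for movie, _ in scores.most_common()]

print(recommend("Bob", likes))  # ['C', 'A']
```

C ranks first for Bob because both users who liked B (Ted and Carol) also liked C. Only Ted liked A.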
3
The goal of a collaborative filtering algorithm is to take preference data from users and create a model that can be used for recommendations or predictions.
Ted likes movies A, B, and C. Carol likes movies B and C. We take this data and run it through an algorithm to build a model. Then, when we have new data, such as Bob liking movie B, we use the model to predict that C is a possible recommendation for Bob.
4
ALS (alternating least squares) approximates the sparse user-item rating matrix, of dimension U×I, as the product of two dense matrices: the User and Item factor matrices, of size U×K and I×K (see picture below), where K is the number of latent features. The factor matrices are also called latent feature models. They represent hidden features that the algorithm tries to discover: one matrix tries to describe the latent or hidden features of each user, and the other tries to describe the latent properties of each movie.
ALS is an iterative algorithm. In each iteration, the algorithm alternately fixes one factor matrix and solves for the other, and this process continues until it converges. This alternation between which matrix to optimize is where the "alternating" in the name comes from.
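The alternating idea can be sketched on a single machine in plain Python. Treat this as a simplified illustration, not Spark MLlib's implementation. It minimizes the squared error on known ratings plus a plain L2 penalty `lam` on every factor vector. MLlib distributes the work and weights its regularization differently. The ratings, `k`, `lam`, `iterations`, and the random initialization are all made-up choices. Each half-step solves a small ridge least-squares problem per user (or per item) exactly, so the objective never increases from one iteration to the next.

```python
import random

def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def update(fixed, rated_by_row, k, lam):
    """With the other side's factors fixed, solve a ridge least-squares problem per row."""
    new = {}
    for row, rated in rated_by_row.items():
        a = [[lam if i == j else 0.0 for j in range(k)] for i in range(k)]  # lam * I
        b = [0.0] * k
        for col, r in rated:
            y = fixed[col]
            for i in range(k):
                b[i] += r * y[i]            # accumulate Y^T r
                for j in range(k):
                    a[i][j] += y[i] * y[j]  # accumulate Y^T Y
        new[row] = solve(a, b)
    return new

def als(ratings, k=2, lam=0.1, iterations=10, seed=0):
    """ratings: list of (user, item, rating). Returns (user_factors, item_factors)."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for u, i, r in ratings:
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    items = {i: [rng.random() for _ in range(k)] for i in by_item}
    users = {}
    for _ in range(iterations):
        users = update(items, by_user, k, lam)  # fix item factors, solve for user factors
        items = update(users, by_item, k, lam)  # fix user factors, solve for item factors
    return users, items

def objective(ratings, users, items, lam):
    """Squared error on known ratings plus the L2 penalty on all factor vectors."""
    err = sum((r - sum(x * y for x, y in zip(users[u], items[i]))) ** 2 for u, i, r in ratings)
    penalty = sum(v * v for vec in list(users.values()) + list(items.values()) for v in vec)
    return err + lam * penalty

# Made-up ratings: (user, movie, rating).
ratings = [(1, "A", 5.0), (1, "B", 4.0), (2, "B", 4.0), (2, "C", 5.0), (3, "A", 1.0), (3, "C", 2.0)]
users, items = als(ratings)
```

After training, a missing rating such as user 3's score for movie B is estimated as the dot product of `users[3]` and `items["B"]`.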
5
The slide shows a typical machine learning workflow. We will perform the following steps:
Load the sample data.
Parse the data into the input format for the ALS algorithm.
Split the data into two parts, one for building the model and one 
for testing the model.
Run the ALS algorithm to build/train a user product matrix 
model.
Make predictions with the training data and observe the results.
Test the model with the test data.
6
Spark is especially useful for parallel processing of distributed data with iterative algorithms. Spark tries to keep data in memory, whereas MapReduce involves more reading from and writing to disk. As shown in the image below, for each MapReduce job, data is read from an HDFS file for a mapper, written to and read from a SequenceFile in between, and then written to an output file from a reducer. When a chain of multiple jobs is needed, Spark can execute much faster by keeping intermediate data in memory between steps instead of writing it to disk.
7
Spark’s primary abstraction is a distributed collection of items 
called a Resilient Distributed Dataset (RDD). RDDs can be 
created from Hadoop InputFormats (such as HDFS files) or by 
transforming other RDDs.
8
An RDD is simply a distributed collection of elements. You can think of it like an array or list in your single-machine program, except that it is spread out across multiple nodes in the cluster.
In Spark, all work is expressed as either creating new RDDs, transforming existing RDDs, or calling operations on RDDs to compute a result. Under the hood, Spark automatically distributes the data contained in RDDs across your cluster and parallelizes the operations you perform on them.
So Spark gives you APIs and functions that let you operate on the whole collection in parallel, using all the nodes.
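A toy, single-machine model may help picture "a list spread across nodes". `ToyRDD` is a hypothetical class, not part of Spark. It splits a list into partitions, and worker threads stand in for cluster nodes. Unlike Spark, it evaluates transformations eagerly rather than lazily. Because of Python's GIL, it also shows the structure of partitioned processing rather than any real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

class ToyRDD:
    """Hypothetical single-machine stand-in for an RDD: elements split into partitions."""

    def __init__(self, partitions):
        self.partitions = partitions

    @classmethod
    def parallelize(cls, data, num_partitions=3):
        size = max(1, -(-len(data) // num_partitions))  # ceiling division, at least 1
        return cls([data[i:i + size] for i in range(0, len(data), size)])

    def _per_partition(self, fn):
        # Each partition is handed to a worker thread, standing in for a cluster node.
        with ThreadPoolExecutor() as pool:
            return ToyRDD(list(pool.map(fn, self.partitions)))

    def map(self, f):
        # Returns a new ToyRDD; the original is left unchanged.
        return self._per_partition(lambda part: [f(x) for x in part])

    def filter(self, pred):
        return self._per_partition(lambda part: [x for x in part if pred(x)])

    def collect(self):
        # Gather every partition back into one local list.
        return [x for part in self.partitions for x in part]

rdd = ToyRDD.parallelize(list(range(10)))
print(rdd.map(lambda x: x * x).filter(lambda x: x % 2 == 0).collect())  # [0, 4, 16, 36, 64]
```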
9
10
We use 
the org.apache.spark.mllib.recommendation.Rating class for 
parsing the ratings.dat file. Later we will use the Rating class as 
input for the ALS run method.
Then we use the map transformation on ratingText, which will 
apply the parseRating function to each element in ratingText 
and return a new RDD of Rating objects. We cache the ratings 
data, since we will use this data to build the matrix model.
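The parsing step can be sketched in plain Python. The `Rating` named tuple mirrors the (user, product, rating) fields of MLlib's Rating class, but it is only a stand-in. The deck does not show the file layout, so this assumes the MovieLens ratings.dat format `UserID::MovieID::Rating::Timestamp`. The sample lines are made up. Caching has no equivalent in this local sketch.

```python
from typing import NamedTuple

class Rating(NamedTuple):
    """Plain-Python stand-in for MLlib's Rating(user, product, rating)."""
    user: int
    product: int
    rating: float

def parse_rating(line: str) -> Rating:
    # Assumed MovieLens-style layout: UserID::MovieID::Rating::Timestamp
    user, product, rating, _timestamp = line.split("::")
    return Rating(int(user), int(product), float(rating))

# Made-up sample lines in that layout, standing in for ratingText.
rating_text = ["1::101::5::978300760", "1::102::3::978302109", "2::101::4::978298709"]
ratings = list(map(parse_rating, rating_text))  # analogous to ratingText.map(parseRating)
print(ratings[0])  # Rating(user=1, product=101, rating=5.0)
```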
11
Next, we split the data into two parts, one for building the model and one for testing the model.
Then we run the ALS algorithm to build (train) a user-product matrix model.
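A random train/test split can be sketched as follows. Spark RDDs provide `randomSplit(weights, seed)` for this step. The deck does not state the proportions, so the 80/20 fraction and the seed below are assumptions. Each element is independently sent to one side, so the sizes are only approximately 80/20.

```python
import random

def random_split(data, train_fraction=0.8, seed=0):
    """Send each element to train or test independently, with probability train_fraction."""
    rng = random.Random(seed)  # a fixed seed keeps the split reproducible
    train, test = [], []
    for item in data:
        (train if rng.random() < train_fraction else test).append(item)
    return train, test

ratings = [(u, p, 3.0) for u in range(1, 11) for p in range(1, 11)]  # 100 made-up ratings
train, test = random_split(ratings)
print(len(train), len(test))
```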
12
13
Next, we get predicted movie ratings for the test data by calling model.predict with the test (user ID, movie ID) pairs as input.
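The prediction step is sketched below in plain Python. The ratings are dropped from the test data, and each remaining (user, movie) pair is scored as the dot product of the user's and the movie's latent factor vectors. The factor values are hypothetical, not output from a real trained model. This sketch simply skips pairs whose user or movie has no factors.

```python
# Hypothetical trained factors (K = 2) for a tiny model.
user_factors = {1: [1.0, 2.0], 2: [0.5, 0.0]}
movie_factors = {10: [2.0, 1.0], 20: [0.0, 1.5]}

# Test data as (user, movie, rating); the rating is dropped before predicting.
test = [(1, 10, 4.0), (2, 20, 1.0), (3, 10, 5.0)]
user_movie_pairs = [(u, m) for u, m, _ in test]

def predict(pairs):
    """Predicted rating = dot product of the user's and the movie's factor vectors."""
    return [
        (u, m, sum(a * b for a, b in zip(user_factors[u], movie_factors[m])))
        for u, m in pairs
        if u in user_factors and m in movie_factors  # no factors, no prediction
    ]

print(predict(user_movie_pairs))  # [(1, 10, 4.0), (2, 20, 0.0)]
```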
14
Next, we compare the test (user ID, movie ID, rating) values to the predicted (user ID, movie ID, rating) values.
15
Here we create ((user ID, movie ID), rating) key-value pairs for both the test ratings and the predicted ratings, so that they can be joined on the (user ID, movie ID) key and compared.
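The keying and joining step can be sketched with plain Python dicts. In Spark this would be a join on pair RDDs. The helper names here are hypothetical, and the ratings are made up. A dict assumes each (user, movie) key appears once, whereas a Spark pair RDD may hold duplicate keys.

```python
# Made-up test ratings and predictions as (user, movie, rating) triples.
test = [(1, 10, 4.0), (2, 20, 1.0), (3, 30, 2.0)]
predictions = [(1, 10, 3.6), (2, 20, 4.2)]

def key_by_user_movie(triples):
    """Build ((user, movie), rating) key-value pairs, held here in a dict."""
    return {(u, m): r for u, m, r in triples}

def join(left, right):
    """Inner join on the (user, movie) key: keep only keys present on both sides."""
    return {k: (left[k], right[k]) for k in left if k in right}

joined = join(key_by_user_movie(test), key_by_user_movie(predictions))
print(joined)  # {(1, 10): (4.0, 3.6), (2, 20): (1.0, 4.2)}
```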
16
17
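Here we compare the test ratings and predicted ratings by filtering for pairs where the test rating is <= 1 and the predicted rating is >= 4. These are false positives: movies the user actually disliked but the model predicted they would like.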
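The false-positive filter can be sketched in plain Python over joined (test rating, predicted rating) values. The data below is made up for illustration.

```python
# Made-up joined data: (user, movie) -> (test rating, predicted rating).
joined = {(1, 10): (4.0, 3.6), (2, 20): (1.0, 4.2), (3, 30): (1.0, 4.8), (4, 40): (1.0, 2.0)}

false_positives = {
    key: (actual, predicted)
    for key, (actual, predicted) in joined.items()
    if actual <= 1 and predicted >= 4  # disliked in reality, but predicted to be liked
}
print(sorted(false_positives))  # [(2, 20), (3, 30)]
```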
18
We register the DataFrame as a table. Registering it as a table allows us to use it in subsequent SQL statements.
Now we can inspect the data.
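The idea of "register as a table, then query with SQL" can be shown locally with Python's built-in sqlite3 module. SQLite is swapped in here only as an analogy. In Spark the equivalent is registering the DataFrame as a temporary table (`registerTempTable`, or `createOrReplaceTempView` in Spark 2.x) and querying it through the SQL context. The table name, columns, and rows below are made up.

```python
import sqlite3

# Made-up (user_id, movie_id, rating) rows standing in for the ratings DataFrame.
rows = [(1, 10, 4.0), (2, 10, 2.0), (1, 20, 5.0)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (user_id INTEGER, movie_id INTEGER, rating REAL)")
conn.executemany("INSERT INTO ratings VALUES (?, ?, ?)", rows)  # "register" the data as a table

# Once the data is a table, it can be inspected with ordinary SQL.
summary = conn.execute(
    "SELECT movie_id, COUNT(*), AVG(rating) FROM ratings GROUP BY movie_id ORDER BY movie_id"
).fetchall()
print(summary)  # [(10, 2, 3.0), (20, 1, 5.0)]
```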
19
20
https://www.mapr.com/blog/parallel-and-iterative-processing-machine-learning-recommendations-spark
21
22
23
24

More from Carol McDonald

Introduction to machine learning with GPUs
Streaming healthcare Data pipeline using Apache APIs: Kafka and Spark with Ma...
Analyzing Flight Delays with Apache Spark, DataFrames, GraphFrames, and MapR-DB
Analysis of Popular Uber Locations using Apache APIs: Spark Machine Learning...
Predicting Flight Delays with Spark Machine Learning
Structured Streaming Data Pipeline Using Kafka, Spark, and MapR-DB
Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...
How Big Data is Reducing Costs and Improving Outcomes in Health Care
Demystifying AI, Machine Learning and Deep Learning
Spark graphx
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Streaming patterns revolutionary architectures
Spark machine learning predicting customer churn
Fast Cars, Big Data How Streaming can help Formula 1
Applying Machine Learning to Live Patient Data
Streaming Patterns Revolutionary Architectures with the Kafka API
Apache Spark Machine Learning Decision Trees
Advanced Threat Detection on Streaming Data
 


Machine Learning Recommendations with Spark