November 14, 2015
Building a Music Recommender from Scratch
Vidhya Murali
@vid052
Who Am I?
Vidhya Murali
•Areas of Interest: Data & Machine Learning
•Data Science Engineer @Spotify
•Master's Student from the University of Wisconsin Madison, aka Happy Badger for life!
“Torture the data, and it will confess!”
– Ronald Coase, Nobel Prize Laureate
Music Recommendations at Spotify
Features:
Discover
Discover Weekly
Moments
Radio
Related Artists
30 million tracks…
What to recommend?
Approaches
•Manual Curation by Experts
•Editorial Tagging
•Metadata (e.g. label-provided data, NLP over news and blogs)
•Audio Signals
•Collaborative Filtering Model
Definition of CF
User A: Hey, I like tracks P, Q, R, S!
User B: Well, I like tracks Q, R, S, T!
User A: Then you should check out track P!
User B: Nice! Btw try track T!
(Legacy slide of Erik Bernhardsson)
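The exchange above can be sketched in a few lines. This is only a toy illustration of the idea, not Spotify's implementation; the user names `user_a` and `user_b` and the helper `recommend` are hypothetical.

```python
# Hypothetical listening histories taken from the dialogue above.
likes = {
    "user_a": {"P", "Q", "R", "S"},
    "user_b": {"Q", "R", "S", "T"},
}


def recommend(target, other, likes):
    """Suggest tracks that `other` likes and `target` has not heard yet,
    but only when the two users share at least one liked track."""
    shared = likes[target] & likes[other]
    if not shared:
        return set()
    return likes[other] - likes[target]


print(recommend("user_b", "user_a", likes))  # {'P'}
print(recommend("user_a", "user_b", likes))  # {'T'}
```

The overlap check stands in for the "similar taste" assumption: recommendations flow only between users whose histories intersect.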
Collaborative Filtering Model
•Find patterns in users’ past behavior to generate recommendations
•Domain independent
•Scalable
•Accuracy (Collaborative Model) >= Accuracy (Content Based Model)
Construct Big Matrix!
[Figure: matrix with Users (m) as rows and Artists (n) as columns; example user: Vidhya, example artist: Ellie Goulding. Order of Millions!]
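As a rough sketch of how such a matrix might be assembled, the snippet below counts plays per (user, artist) pair. The stream log, the user "erik" and the artist "Daft Punk" are made-up examples, and a dense NumPy array is used only because the toy data is tiny; at millions of users and artists a sparse representation would be needed.

```python
import numpy as np

# Hypothetical stream log: one (user, artist) pair per play.
streams = [
    ("vidhya", "Ellie Goulding"),
    ("vidhya", "Ellie Goulding"),
    ("vidhya", "Daft Punk"),
    ("erik", "Daft Punk"),
]

# Map each user to a row index and each artist to a column index.
users = sorted({u for u, _ in streams})
artists = sorted({a for _, a in streams})
user_index = {u: i for i, u in enumerate(users)}
artist_index = {a: j for j, a in enumerate(artists)}

# Dense m x n play-count matrix (toy scale only).
play_counts = np.zeros((len(users), len(artists)), dtype=np.int64)
for user, artist in streams:
    play_counts[user_index[user], artist_index[artist]] += 1

print(play_counts)
```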
Latent Factor Models
•Use a “small” representation for each user and item (artist): f-dimensional vectors
[Figure: User Artist Matrix (m x n), with user Vidhya and artist Ellie marked, shown alongside the User Vector Matrix X: (m x f) and the Artist Vector Matrix Y: (n x f); here, f = 2]
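A minimal sketch of the shapes involved, assuming toy sizes and random vectors (the values of `m`, `n` and the random initialization are illustrative, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, f = 5, 4, 2              # users, artists, latent factors (toy sizes)

X = rng.normal(size=(m, f))    # one f-dimensional vector per user
Y = rng.normal(size=(n, f))    # one f-dimensional vector per artist

approx = X @ Y.T               # reconstructed m x n user-artist matrix
score = X[0] @ Y[1]            # a single entry: user 0 against artist 1

print(approx.shape)            # (5, 4)
```

Storing X and Y takes (m + n) * f numbers instead of m * n, which is where the "small" representation pays off once m and n are in the millions.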
Why Vectors?
•Vectors encode higher order dependencies
•Users and Items in the same vector space!
•Use vector similarity to compute:
•Item-Item similarities
•User-Item recommendations
•Linear complexity: each similarity computation is on the order of the number of latent factors
•Easy to scale up
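The two uses of vector similarity listed above can be sketched as follows. The vectors are hypothetical 2-dimensional examples, and `cosine` and `top_k` are helper names introduced here for illustration.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity; the cost is linear in the number of latent factors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def top_k(query, candidates, k):
    """Indices of the k candidate rows most similar to `query`."""
    sims = [cosine(query, row) for row in candidates]
    return sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]


# Hypothetical vectors living in one shared 2-dimensional space.
user = np.array([1.0, 0.0])
artists = np.array([[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])

print(top_k(user, artists, 2))         # user-item recommendations
print(top_k(artists[0], artists, 2))   # item-item; the item itself ranks first
```

Because users and items share a space, the same function serves both queries; only the query vector changes.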
Explicit Matrix Factorization
•User explicitly rates a subset of the music catalog
•Goal: Predict how users will rate new music
•How: Approximate the ratings matrix by the inner product of 2 smaller matrices, X (users) and Y (artists), by minimizing the RMSE (root mean squared error)
•β_u = bias for user
•β_i = bias for item
•λ = regularization parameter
•r_ui = user rating for item
•x_u = user latent factor vector
•y_i = item latent factor vector
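One common way to write the objective described by the terms above is the regularized squared error below. The symbol names (r_ui for the rating, x_u and y_i for the latent vectors, β_u and β_i for the biases, λ for regularization) are chosen here for illustration, and the sum is assumed to run over observed ratings only.

```latex
\min_{X,\,Y,\,\beta}\;
\sum_{(u,i)\ \text{observed}}
\left( r_{ui} - x_u^{\top} y_i - \beta_u - \beta_i \right)^2
\;+\;
\lambda \left( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \right)
```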
Matrix Factorization using Implicit Feedback
•User Artist Play Count Matrix
•User Artist Preference Matrix: binary label (1 => played, 0 => not played)
•Weights Matrix: weights based on play count and smoothing
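A small sketch of deriving the preference and weights matrices from play counts. The slide does not spell out the weighting function, so the logarithmic form and the `alpha` value below are assumptions made purely for illustration.

```python
import numpy as np

# Hypothetical play counts: rows are users, columns are artists.
play_counts = np.array([[3, 0, 12],
                        [0, 1, 0]])

# Binary label: 1 => played, 0 => not played.
preference = (play_counts > 0).astype(np.float64)

# Assumed weighting: every cell starts at weight 1, and played cells gain
# extra weight that grows with the log of the play count (the log acts as
# the smoothing). `alpha` is an illustrative value, not from the talk.
alpha = 40.0
weights = 1.0 + alpha * np.log1p(play_counts)

print(preference)
print(weights)
```

Unplayed cells keep a small non-zero weight, so the model still learns from "not played" but trusts it less than repeated plays.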
Equation(s) Alert!
Implicit Matrix Factorization
Example binary preference matrix (rows: users, columns: artists):
1 0 0 0 1 0 0 1
0 0 1 0 0 1 0 0
1 0 1 0 0 0 1 1
0 1 0 0 0 1 0 0
0 0 1 0 0 1 0 0
1 0 0 0 1 0 0 1
•Aggregate all (user, artist) streams into a large matrix
•Goal: Approximate the binary preference matrix by the inner product of 2 smaller matrices, X (users) and Y (artists), by minimizing the weighted RMSE (root mean squared error), using a function of total plays as the weight
•Why?: Once learned, the top recommendations for a user are the top inner products between their latent factor vector in X and the artist latent factor vectors in Y.
•β_u = bias for user
•β_i = bias for item
•λ = regularization parameter
•p_ui = 1 if user streamed artist else 0
•c_ui = weight, a function of the user's total plays of the artist
•x_u = user latent factor vector
•y_i = item latent factor vector
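The weighted objective described above is commonly written as follows. The symbol names are chosen here for illustration: p_ui is the binary preference, c_ui the weight derived from total plays, x_u and y_i the latent vectors, β_u and β_i the biases, and λ the regularization parameter. Unlike the explicit case, the sum is assumed to run over every (user, artist) pair, played or not.

```latex
\min_{X,\,Y,\,\beta}\;
\sum_{u,i}
c_{ui} \left( p_{ui} - x_u^{\top} y_i - \beta_u - \beta_i \right)^2
\;+\;
\lambda \left( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \right)
```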
Alternating Least Squares
•Fix artists, solve for users
•Fix users, solve for artists
•Repeat until convergence…
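The alternation can be sketched in NumPy as below. This is a simplified illustration under stated assumptions, not the production algorithm: the bias terms are omitted, the weights `C = 1 + 10 * P` are a toy choice, a fixed number of iterations replaces a convergence check, and the function names are hypothetical.

```python
import numpy as np


def solve_side(fixed, P, C, lam):
    """Solve for one side's vectors while the other side (`fixed`) is held
    constant. Row u minimizes
    sum_i C[u, i] * (P[u, i] - x . fixed[i])**2 + lam * |x|**2."""
    f = fixed.shape[1]
    out = np.empty((P.shape[0], f))
    for u in range(P.shape[0]):
        weighted = fixed * C[u][:, None]            # scale each row by its weight
        A = fixed.T @ weighted + lam * np.eye(f)    # f x f normal equations
        b = fixed.T @ (C[u] * P[u])
        out[u] = np.linalg.solve(A, b)
    return out


def weighted_loss(X, Y, P, C, lam):
    """Weighted squared error plus L2 regularization."""
    err = P - X @ Y.T
    return float((C * err ** 2).sum() + lam * ((X ** 2).sum() + (Y ** 2).sum()))


def als(P, C, f=2, lam=0.1, iters=10, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.normal(scale=0.1, size=(P.shape[0], f))
    Y = rng.normal(scale=0.1, size=(P.shape[1], f))
    for _ in range(iters):
        X = solve_side(Y, P, C, lam)        # fix artists, solve for users
        Y = solve_side(X, P.T, C.T, lam)    # fix users, solve for artists
    return X, Y


# Binary preference matrix from the slide (6 users x 8 artists).
P = np.array([
    [1, 0, 0, 0, 1, 0, 0, 1],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 0, 1, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 1, 0, 0, 1],
], dtype=np.float64)
C = 1.0 + 10.0 * P          # assumed toy weights

X, Y = als(P, C)
scores = X @ Y.T            # higher inner product => stronger recommendation
print(weighted_loss(X, Y, P, C, 0.1))
```

With one side fixed, each row's problem is an ordinary regularized least-squares solve, which is why every half-step has a closed form and the loss never increases.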
Vectors
•“Compact” representation for users and items (artists) in the same space
Recommendations via Cosine Similarity
Annoy
•70 million users, at least 4 million tracks for candidates per user
•Brute Force Approach:
•O(70M x 4M x 10) ~= O(3 peta-operations)!
•Approximate Nearest Neighbor, Oh Yeah!
•Uses Locality Sensitive Hashing
•Clone: https://github.com/spotify/annoy
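The brute-force estimate can be checked with plain arithmetic. The slide does not say what the factor of 10 stands for; it is treated below as a per-pair cost factor, which is an assumption.

```python
users = 70_000_000                # 70 million users
candidates_per_user = 4_000_000   # at least 4 million candidate tracks each
ops_per_pair = 10                 # the slide's factor of 10 (assumed per-pair cost)

total_ops = users * candidates_per_user * ops_per_pair
peta_ops = total_ops / 1e15

print(f"{peta_ops:.1f} peta-operations")  # 2.8 peta-operations
```

That is roughly the 3 peta-operations quoted above, which motivates an approximate nearest neighbor index instead of scoring every pair.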
Thank You!
You can reach me @
Email: vidhya@spotify.com
Twitter: @vid052

DataEngConf: Building a Music Recommender System from Scratch with Spotify Data Team

  • 1. November 14, 2015 Building a Music Recommender from Scratch Vidhya Murali @vid052
  • 2. Vidhya Murali Who Am I? 2 •Areas of Interest: Data & Machine Learning •Data Science Engineer @Spotify •Masters Student from the University of Wisconsin Madison aka Happy Badger for life!
  • 3. “Torture the data, and it will confess!” 3 – Ronald Coase, Nobel Prize Laureate
  • 4. Music Recommendations at Spotify Features: Discover Discover Weekly Moments Radio Related Artists 4
  • 6. 6 •Manual Curation by Experts •Editorial Tagging •Metadata (e.g. Label provided data, NLP over News, Blogs) •Audio Signals •Collaborative Filtering Model Approaches
  • 7. 6 •Manual Curation by Experts •Editorial Tagging •Metadata (e.g. Label provided data, NLP over News, Blogs) •Audio Signals •Collaborative Filtering Model Approaches
  • 8. Definition of CF 7 Hey, I like tracks P, Q, R, S! Well, I like tracks Q, R, S, T! Then you should check out track P! Nice! Btw try track T! Legacy Slide of Erik Bernhardsson
  • 9. Collaborative Filtering Model 8 •Find patterns from user’s past behavior to generate recommendations •Domain independent •Scalable •Accuracy (Collaborative Model) >= Accuracy (Content Based Model)
  • 12-16. Latent Factor Models •Use a “small” representation for each user and item (artist): f-dimensional vectors (here, f = 2) •User Artist Matrix: (m x n) •User Vector Matrix: X: (m x f) •Artist Vector Matrix: Y: (n x f) [Figure: user-artist matrix with example row “Vidhya” and example column “Ellie”, factored into X and Y]
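To make the shapes concrete, here is a minimal NumPy sketch. The sizes and the random values are made up for illustration; only the shapes (m x f), (n x f) and (m x n) come from the slide.

```python
import numpy as np

m, n, f = 4, 5, 2                     # toy sizes; real m and n are in the millions
rng = np.random.default_rng(0)

X = rng.normal(size=(m, f))           # one f-dimensional vector per user
Y = rng.normal(size=(n, f))           # one f-dimensional vector per artist

approx = X @ Y.T                      # (m x n) approximation of the user-artist matrix
print(approx.shape)

# Entry (u, i) is the inner product of user u's and artist i's vectors.
print(np.isclose(approx[1, 3], X[1] @ Y[3]))

# Storage: m * n matrix entries versus (m + n) * f vector components.
print(m * n, "entries vs", (m + n) * f, "vector components")
```

At toy sizes the saving is small, but it grows with m and n because f stays fixed.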
  • 17. Why Vectors? •Vectors encode higher order dependencies •Users and items in the same vector space! •Use vector similarity to compute: item-item similarities and user-item recommendations •Linear complexity: order of number of latent factors •Easy to scale up
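A small sketch of both uses of vector similarity, with hand-picked 2-dimensional vectors. The vectors, the variable names, and the choice of cosine similarity for item-item comparison are assumptions for the example; the slide does not name a specific similarity measure.

```python
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity; cost is linear in the number of latent factors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


artist_a = np.array([1.0, 0.0])
artist_b = np.array([2.0, 0.0])   # same direction as artist_a
artist_c = np.array([0.0, 3.0])   # orthogonal to artist_a
user = np.array([0.5, 0.1])

# Item-item similarity
print(cosine(artist_a, artist_b), cosine(artist_a, artist_c))

# User-item score: inner product in the shared vector space
print(float(user @ artist_b), float(user @ artist_c))
```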
  • 18. Explicit Matrix Factorization •User explicitly rates a subset of the music catalog •Goal: Predict how users will rate new music •How: Approximate the ratings matrix by the inner product of 2 smaller matrices, X (users) and Y (artists), by minimizing the RMSE (root mean squared error) over the observed ratings: $\min_{x, y, \beta} \sum_{u,i} (r_{ui} - x_u^T y_i - \beta_u - \beta_i)^2 + \lambda (\sum_u \|x_u\|^2 + \sum_i \|y_i\|^2)$ •$\beta_u$ = bias for user •$\beta_i$ = bias for item •$\lambda$ = regularization parameter •$r_{ui}$ = user rating for item •$x_u$ = user latent factor vector •$y_i$ = item latent factor vector
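The explicit objective can be evaluated directly. The sketch below assumes the usual regularized squared-error form with user and item biases, summed over observed ratings only; the function name `explicit_loss`, the boolean `observed` mask, and the toy data are all hypothetical.

```python
import numpy as np


def explicit_loss(R, observed, X, Y, bias_user, bias_item, lam):
    """Squared error on observed ratings plus L2 penalty on the factor vectors."""
    pred = X @ Y.T + bias_user[:, None] + bias_item[None, :]
    err = np.where(observed, R - pred, 0.0)   # ignore unrated (user, item) pairs
    return float((err ** 2).sum() + lam * ((X ** 2).sum() + (Y ** 2).sum()))


R = np.array([[5.0, 0.0], [0.0, 3.0]])
observed = np.array([[True, False], [False, True]])
X = np.zeros((2, 1))
Y = np.zeros((2, 1))

# With zero factors, biases alone can fit these two ratings exactly.
print(explicit_loss(R, observed, X, Y, np.array([5.0, 3.0]), np.zeros(2), lam=0.1))
# With no biases either, the loss is the sum of squared ratings.
print(explicit_loss(R, observed, X, Y, np.zeros(2), np.zeros(2), lam=0.1))
```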
  • 19-22. Matrix Factorization using Implicit Feedback •User Artist Play Count Matrix •User Artist Preference Matrix: binary label, 1 => played, 0 => not played •Weights Matrix: weights based on play count and smoothing
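The three matrices can be derived from one another as sketched below. The slide only says the weights are "based on play count and smoothing"; the log-smoothed form `1 + alpha * log(1 + plays)` and the value of `alpha` are assumptions chosen for illustration, not the formula from the talk.

```python
import numpy as np

# Toy play-count matrix: rows are users, columns are artists.
plays = np.array([[0, 12, 3],
                  [1, 0, 0]])

# Preference matrix: 1 if the user played the artist, else 0.
P = (plays > 0).astype(float)

# Weights matrix (assumed form): unplayed pairs get weight 1,
# played pairs get more weight as the play count grows.
alpha = 0.5  # hypothetical smoothing constant
C = 1.0 + alpha * np.log1p(plays)

print(P)
print(C.round(2))
```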
  • 24. Implicit Matrix Factorization •Aggregate all (user, artist) streams into a large matrix •Goal: Approximate the binary preference matrix by the inner product of 2 smaller matrices, X (users) and Y (artists), by minimizing the weighted RMSE (root mean squared error), using a function of total plays as the weight: $\min_{x, y, \beta} \sum_{u,i} c_{ui} (p_{ui} - x_u^T y_i - \beta_u - \beta_i)^2 + \lambda (\sum_u \|x_u\|^2 + \sum_i \|y_i\|^2)$ •Why?: Once learned, the top recommendations for a user are the top inner products between their latent factor vector in X and the artist latent factor vectors in Y. •$\beta_u$ = bias for user •$\beta_i$ = bias for item •$\lambda$ = regularization parameter •$p_{ui}$ = 1 if user streamed artist else 0 •$c_{ui}$ = weight, a function of total plays •$x_u$ = user latent factor vector •$y_i$ = item latent factor vector [Figure: example binary preference matrix]
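The "Why?" bullet translates into a short ranking routine. This is a sketch with made-up vectors; the function name `recommend` and the step that excludes already-played artists are assumptions, since the slide only describes ranking by inner product.

```python
import numpy as np


def recommend(user_vec, Y, already_played, k):
    """Indices of the k artists with the largest inner product with the user vector."""
    scores = Y @ user_vec
    for j in already_played:        # hypothetical filter: skip known artists
        scores[j] = -np.inf
    return np.argsort(-scores)[:k].tolist()


Y = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5],
              [2.0, 0.0]])
user_vec = np.array([1.0, 0.2])

print(recommend(user_vec, Y, already_played={3}, k=2))
```

Scoring every artist this way is a brute-force scan, which is what the Annoy slide later in the deck sets out to avoid.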
  • 25-30. Alternating Least Squares •Same setup, objective, and legend as slide 24 (binary preference matrix approximated by X and Y, weighted by a function of total plays) •Step 1: Fix artists, solve for users •Step 2: Fix users, solve for artists •Repeat until convergence…
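The alternation can be sketched in NumPy as follows. This is a minimal dense version under stated assumptions: the bias terms are omitted for brevity, the weights are taken as `1 + plays`, and the names `solve_side`, `loss`, and `als` are hypothetical. With one side fixed, each vector on the other side has the closed-form weighted ridge solution $(F^T C_r F + \lambda I)^{-1} F^T C_r p_r$, where $F$ holds the fixed vectors.

```python
import numpy as np


def solve_side(fixed, P, C, lam):
    """Solve for one side's vectors while the other side's are held fixed."""
    f = fixed.shape[1]
    out = np.zeros((P.shape[0], f))
    for r in range(P.shape[0]):
        weighted = C[r][:, None] * fixed              # each fixed vector scaled by its weight
        A = fixed.T @ weighted + lam * np.eye(f)      # F^T C_r F + lambda * I
        b = weighted.T @ P[r]                         # F^T C_r p_r
        out[r] = np.linalg.solve(A, b)
    return out


def loss(P, C, X, Y, lam):
    err = P - X @ Y.T
    return float((C * err ** 2).sum() + lam * ((X ** 2).sum() + (Y ** 2).sum()))


def als(P, C, f=2, lam=0.1, iters=10, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.normal(scale=0.1, size=(P.shape[0], f))
    Y = rng.normal(scale=0.1, size=(P.shape[1], f))
    history = []
    for _ in range(iters):
        X = solve_side(Y, P, C, lam)                  # fix artists, solve for users
        Y = solve_side(X, P.T, C.T, lam)              # fix users, solve for artists
        history.append(loss(P, C, X, Y, lam))
    return X, Y, history


plays = np.array([[5, 0, 2, 0],
                  [0, 3, 0, 1],
                  [4, 0, 1, 0]], dtype=float)
P = (plays > 0).astype(float)
C = 1.0 + plays                                       # assumed weighting
X, Y, history = als(P, C)
print([round(h, 4) for h in history])
```

Each half-step minimizes the objective exactly for the side being solved, so the recorded loss never increases from one iteration to the next. Each row's solve is independent of the others, which is what makes the method easy to parallelize.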
  • 31. Vectors •“Compact” representation for users and items (artists) in the same space
  • 34. Annoy •70 million users, at least 4 million candidate tracks per user •Brute Force Approach: O(70M x 4M x 10) ~= O(3 peta-operations)! •Approximate Nearest Neighbor, oh yeah! •Uses Locality-Sensitive Hashing •Clone: https://github.com/spotify/annoy
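The brute-force estimate and the hashing idea can be sketched together. This is a generic random-hyperplane hashing example in NumPy, not Annoy's implementation or API; the function name `hash_codes`, the number of planes, and the toy data are assumptions.

```python
import numpy as np

# Brute-force cost from the slide: users x candidate tracks x 10.
users, tracks, dims = 70_000_000, 4_000_000, 10
print(f"{users * tracks * dims:.1e} multiply-adds")


def hash_codes(vectors, planes):
    """One bit per random hyperplane: which side of the plane each vector falls on."""
    bits = (vectors @ planes.T) > 0
    return bits.astype(int) @ (1 << np.arange(planes.shape[0]))


rng = np.random.default_rng(0)
items = rng.normal(size=(1000, dims))
planes = rng.normal(size=(4, dims))          # 4 planes give at most 16 buckets
codes = hash_codes(items, planes)

query = items[0] * 3.0                       # same direction as item 0
q_code = hash_codes(query[None, :], planes)[0]

# Only score the items that landed in the query's bucket.
candidates = np.flatnonzero(codes == q_code)
cos = (items[candidates] @ query) / np.linalg.norm(items[candidates], axis=1)
best = int(candidates[np.argmax(cos)])
print(len(candidates), "candidates scored instead of", len(items), "-> best:", best)
```

The search is approximate: a true nearest neighbor that falls on the other side of any one plane is missed, which is the trade-off for scoring far fewer candidates.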
  • 36. Thank You! You can reach me @ Email: vidhya@spotify.com Twitter: @vid052