Algorithmic Music
Discovery at Spotify
Chris Johnson
@MrChrisJohnson
January 13, 2014

Monday, January 13, 14
Who am I??
•Chris Johnson

– Machine Learning guy from NYC
– Focused on music recommendations
– Formerly a graduate student at UT Austin

What is Spotify?

• On-demand music streaming service
• “iTunes in the cloud”

Data at Spotify...
• 20 million songs
• 24 million active users
• 6 million paying users
• 8 million daily active users
• 1 TB of compressed data generated from users per day
• 700-node Hadoop cluster
• 1 million years' worth of music streamed
• 1 billion user-generated playlists

Challenge: 20 Million songs... how do we
recommend music to users?

Recommendation Features
• Discover (personalized recommendations)
• Radio
• Related Artists
• Now Playing


How can we find good
recommendations?
• Manual Curation

• Manually Tag Attributes

• Audio Content,
Metadata, Text Analysis

• Collaborative Filtering

Collaborative Filtering - “The Netflix Prize”

Collaborative Filtering

“Hey, I like tracks P, Q, R, S!”
“Well, I like tracks Q, R, S, T!”
“Then you should check out track P!”
“Nice! Btw try track T!”

Image via Erik Bernhardsson
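
The exchange above can be sketched in a few lines of Python. This is a toy illustration of the idea only, not how Spotify computes recommendations; the listener names `alice` and `bob` and the helper `recommend_from_neighbor` are hypothetical.

```python
def recommend_from_neighbor(my_tracks, neighbor_tracks):
    """Suggest tracks a neighbor likes that I have not listed, if our tastes overlap."""
    if not my_tracks & neighbor_tracks:
        return set()  # no shared tracks, so no evidence of similar taste
    return neighbor_tracks - my_tracks

alice = {"P", "Q", "R", "S"}  # first listener in the dialogue
bob = {"Q", "R", "S", "T"}    # second listener in the dialogue

for_bob = recommend_from_neighbor(bob, alice)    # what the first listener suggests
for_alice = recommend_from_neighbor(alice, bob)  # what the second listener suggests
```

The two listeners share Q, R and S, so each receives the one track only the other has mentioned.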
Difference between movie and music recs

• Scale of catalog: 60,000 movies vs. 20,000,000 songs
• Repeated consumption
• Music is more niche

“The Netflix Problem” vs. “The Spotify Problem”

• Netflix: users explicitly “rate” movies
• Spotify: feedback is implicit through streaming behavior

Explicit Matrix Factorization

•Users explicitly rate a subset of the movie catalog
•Goal: predict how users will rate new movies
[Figure: users × movies ratings matrix, highlighting the user "Chris" and the movie "Inception"]

Explicit Matrix Factorization

• Approximate the ratings matrix by the product of low-dimensional user and movie matrices
• Minimize RMSE (root mean squared error)

[Figure: sparse ratings matrix (entries 1 to 5, "?" where unrated) approximated by the product of a user matrix X and an item matrix Y; the row for "Chris" and the column for "Inception" are highlighted]

• r_ui = user u's rating for movie i
• x_u = user u's latent factor vector
• y_i = item i's latent factor vector
• β_u = bias for user u
• β_i = bias for item i
• λ = regularization parameter
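
The legend on this slide matches the usual regularized, bias-aware factorization objective. The slide's own formula did not survive extraction, so the form below is a reconstruction under that assumption, not a quotation. The symbols are chosen to line up with the legend: $r_{ui}$ is user $u$'s rating for movie $i$, $x_u$ and $y_i$ are the user and item latent factor vectors, $\beta_u$ and $\beta_i$ are the biases, and $\lambda$ is the regularization parameter. The sum is assumed to run over observed ratings only.

```latex
\min_{x,\, y,\, \beta} \sum_{(u,i)\ \text{observed}}
  \left( r_{ui} - x_u^{\top} y_i - \beta_u - \beta_i \right)^2
  + \lambda \left( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \right)
```
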
Implicit Matrix Factorization

• Replace stream counts with binary labels
  – 1 = streamed, 0 = never streamed
• Minimize weighted RMSE (root mean squared error) using a function of stream counts as weights

[Figure: binary user × track matrix of 0s and 1s approximated by the product of a user matrix X and an item matrix Y]

• p_ui = 1 if user u streamed track i, else 0
• x_u = user u's latent factor vector
• y_i = item i's latent factor vector
• β_u = bias for user u
• β_i = bias for item i
• λ = regularization parameter
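
A weighted objective consistent with this legend is sketched below. It is a reconstruction, since the slide's formula is not in the transcript. Here $p_{ui}$ is the binary label, $x_u$ and $y_i$ are the latent factor vectors, $\beta_u$ and $\beta_i$ are the biases, and $\lambda$ is the regularization parameter. The weight $c_{ui}$ stands for "a function of stream counts"; the specific choice $c_{ui} = 1 + \alpha r_{ui}$ (with $r_{ui}$ the stream count and $\alpha$ a tunable constant) is an assumption borrowed from the implicit-feedback formulation of Hu, Koren and Volinsky. Unlike the explicit case, the sum runs over every user-track pair, including tracks the user never streamed.

```latex
\min_{x,\, y,\, \beta} \sum_{u,i}
  c_{ui} \left( p_{ui} - x_u^{\top} y_i - \beta_u - \beta_i \right)^2
  + \lambda \left( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \right),
\qquad c_{ui} = 1 + \alpha\, r_{ui}
```
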
Alternating Least Squares

• Initialize user and item vectors to random noise
• Fix item vectors and solve for optimal user vectors
  – Take the derivative of the loss function with respect to the user’s vector, set it equal to 0, and solve
  – Results in a system of linear equations with a closed-form solution!
• Fix user vectors and solve for optimal item vectors
• Repeat until convergence
code: https://github.com/MrChrisJohnson/implicitMF
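
The alternating steps above can be sketched in dependency-free Python. This is a minimal illustration, not the code in the linked repository. It leaves out the bias terms, and it assumes the weight `1 + alpha * count`; the function names and default parameter values are hypothetical.

```python
import random

def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [A[r][:] + [b[r]] for r in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[r][n] / M[r][r] for r in range(n)]

def solve_side(fixed, counts, alpha, lam):
    """Hold `fixed` vectors constant and apply the closed-form update for every row of `counts`."""
    f = len(fixed[0])
    solved = []
    for row in counts:
        A = [[lam if a == b else 0.0 for b in range(f)] for a in range(f)]  # starts as lam * I
        rhs = [0.0] * f
        for count, y in zip(row, fixed):
            weight = 1.0 + alpha * count       # assumed weighting function
            label = 1.0 if count > 0 else 0.0  # binary label: streamed or not
            for a in range(f):
                rhs[a] += weight * label * y[a]
                for b in range(f):
                    A[a][b] += weight * y[a] * y[b]
        solved.append(solve(A, rhs))  # setting the gradient to zero gives A x = rhs
    return solved

def als(counts, factors=2, alpha=10.0, lam=0.1, iterations=15, seed=0):
    """Alternate between solving for user vectors X and item vectors Y."""
    rng = random.Random(seed)
    counts_t = [list(col) for col in zip(*counts)]  # item-major view of the same counts
    X = [[rng.gauss(0, 0.1) for _ in range(factors)] for _ in counts]
    Y = [[rng.gauss(0, 0.1) for _ in range(factors)] for _ in counts_t]
    for _ in range(iterations):
        X = solve_side(Y, counts, alpha, lam)    # fix items, solve users
        Y = solve_side(X, counts_t, alpha, lam)  # fix users, solve items
    return X, Y

# Toy stream counts: rows are users, columns are tracks (made-up data).
counts = [[5, 3, 0, 0], [4, 0, 0, 0], [0, 0, 2, 6], [0, 0, 3, 1]]
X, Y = als(counts)
```

A fixed iteration count stands in for "repeat until convergence" here; a fuller version would track the loss and stop when it plateaus.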
Alternating Least Squares

• Note that: [equation not preserved in this transcript]
• Then, we can pre-compute [term not preserved] once per iteration
  – [two terms not preserved] only contain non-zero elements for tracks that the user streamed
  – Using sparse matrix operations we can then compute each user’s vector efficiently in [bound not preserved] time, where [symbol not preserved] is the number of tracks the user streamed
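
The formulas on this slide were images and are missing from the transcript. The surviving text is consistent with the standard speed-up for implicit-feedback ALS described by Hu, Koren and Volinsky, sketched below without the bias terms. Treat the notation as an assumption: $Y$ stacks the item vectors as rows, $C^u$ is the diagonal matrix of user $u$'s weights $c_{ui}$ (equal to 1 for tracks the user never streamed), $p(u)$ is the vector of that user's binary labels, $f$ is the number of latent factors, and $n_u$ is the number of tracks the user streamed.

```latex
x_u = \left( Y^{\top} C^u Y + \lambda I \right)^{-1} Y^{\top} C^u\, p(u),
\qquad
Y^{\top} C^u Y = Y^{\top} Y + Y^{\top} \left( C^u - I \right) Y
```

Under these assumptions $Y^{\top} Y$ does not depend on the user and can be computed once per iteration, while $C^u - I$ and $p(u)$ are non-zero only for streamed tracks. That gives a per-user cost of $O(f^2 n_u + f^3)$.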

How do we use the learned vectors?

• User-item score is the dot product of the user vector and the item vector

• Item-item similarity is the cosine similarity of the two item vectors

• Both operations are cheap: their cost is linear in the number of latent factors
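
Both operations can be sketched in plain Python. The vectors below are made-up values for illustration, not learned factors, and the helper names are hypothetical.

```python
import math

def dot(a, b):
    """User-item score: dot product of two latent factor vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Item-item similarity: cosine of the angle between two vectors."""
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0

user_vector = [0.5, 1.0, 0.0]
track_a = [1.0, 2.0, 0.0]
track_b = [2.0, 4.0, 0.0]  # same direction as track_a
track_c = [0.0, 0.0, 3.0]  # orthogonal to track_a

score = dot(user_vector, track_a)
similar = cosine_similarity(track_a, track_b)
unrelated = cosine_similarity(track_a, track_c)
```

Each call makes a constant number of passes over the vector, so the cost grows linearly with the number of latent factors.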

Latent Factor Vectors in 2 dimensions

Scaling up Implicit Matrix Factorization
with Hadoop

Hadoop at Spotify 2009

Hadoop at Spotify 2014
700 Nodes in our London data center

Implicit Matrix Factorization with Hadoop
[Figure: map and reduce steps. All log entries are split into a K × L grid of blocks keyed by (u % K, i % L), for example the block with u % K = 1 and i % L = 1. User vectors are grouped by u % K and item vectors by i % L.]

Figure via Erik Bernhardsson
Implicit Matrix Factorization with Hadoop

One map task:
• Map input: tuples (u, i, count) where u % K = x and i % L = y
• Distributed cache: all user vectors where u % K = x
• Distributed cache: all item vectors where i % L = y
• Mapper: emit contributions
• Reducer: new vector!

Figure via Erik Bernhardsson
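
The blocking scheme can be sketched in plain Python, with ordinary functions standing in for Hadoop map and reduce tasks. This is a simplified illustration under stated assumptions. Each "contribution" here is only the weighted item vector that feeds the right-hand side of a user's linear system, and the weight `1 + alpha * count` is assumed. A real job would also emit the matrix term and solve the system in the reducer. All names and values are hypothetical.

```python
from collections import defaultdict

K, L = 2, 3  # number of user and item buckets (arbitrary values for the sketch)

def partition(log_entries):
    """Group (u, i, count) tuples into blocks keyed by (u % K, i % L)."""
    blocks = defaultdict(list)
    for u, i, count in log_entries:
        blocks[(u % K, i % L)].append((u, i, count))
    return blocks

def map_block(entries, item_vectors, alpha=1.0):
    """One map task: emit (user, partial contribution) for each log entry in the block."""
    for u, i, count in entries:
        weight = 1.0 + alpha * count  # assumed weighting function
        yield u, [weight * v for v in item_vectors[i]]

def reduce_user(contributions):
    """Reducer: sum all contributions emitted for one user."""
    return [sum(vals) for vals in zip(*contributions)]

log = [(0, 0, 2), (0, 4, 1), (1, 3, 5), (2, 1, 1)]  # made-up (u, i, count) tuples
item_vectors = {0: [1.0, 0.0], 1: [0.0, 1.0], 3: [1.0, 1.0], 4: [0.5, 0.5]}

blocks = partition(log)
per_user = defaultdict(list)
for (x, y), entries in blocks.items():
    # Stand-in for the distributed cache: only item vectors with i % L == y are loaded.
    cache = {i: v for i, v in item_vectors.items() if i % L == y}
    for u, contribution in map_block(entries, cache):
        per_user[u].append(contribution)
totals = {u: reduce_user(c) for u, c in per_user.items()}
```

Because every entry in a block shares the same `i % L`, each map task only needs the matching slice of the item vectors in memory, which is what keeps the per-task footprint bounded.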
Implicit Matrix Factorization with Spark

Spark vs. Hadoop

http://www.slideshare.net/Hadoop_Summit/spark-and-shark
Approximate Nearest Neighbors

code: https://github.com/Spotify/annoy
Ensemble of Latent Factor Models


Figure via Erik Bernhardsson
A/B Testing Recommendations

Open Problems

• How to go from predictive model to related artists? (learning to rank?)
• How do you learn from user feedback?
• How do you deal with observation bias in the user feedback? (active learning?)
• How to factor in temporal information?
• How much value in content-based recommendations?
• How to best evaluate model performance?
• How to best train an ensemble?


Thank You!

AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf
20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf
20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf
 
Babel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptxBabel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptx
 

Algorithmic Music Recommendations at Spotify

  • 1. Algorithmic Music Discovery at Spotify Chris Johnson @MrChrisJohnson January 13, 2014 Monday, January 13, 14
  • 2. Who am I?? • Chris Johnson – Machine Learning guy from NYC – Focused on music recommendations – Formerly a graduate student at UT Austin
  • 3. What is Spotify? • On-demand music streaming service • “iTunes in the cloud”
  • 5. Data at Spotify • 20 million songs • 24 million active users • 6 million paying users • 8 million daily active users • 1 TB of compressed data generated from users per day • 700-node Hadoop cluster • 1 million years' worth of music streamed • 1 billion user-generated playlists
  • 6. Challenge: 20 million songs... how do we recommend music to users?
  • 7. Recommendation Features • Discover (personalized recommendations) • Radio • Related Artists • Now Playing
  • 8. How can we find good recommendations? • Manual Curation • Manually Tag Attributes • Audio Content, Metadata, Text Analysis • Collaborative Filtering
  • 9. Collaborative Filtering - “The Netflix Prize”
  • 10. Collaborative Filtering • “Hey, I like tracks P, Q, R, S!” • “Well, I like tracks Q, R, S, T!” • “Then you should check out track P!” • “Nice! Btw try track T!” (Image via Erik Bernhardsson)
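The exchange on slide 10 can be sketched as a tiny neighborhood-based recommender. This is only an illustration of the idea, not the method Spotify uses: the names `likes`, `jaccard`, `recommend`, the Jaccard overlap measure and the `min_similarity` threshold are all assumptions introduced for the example.

```python
# Toy data from the slide: two listeners and the tracks each one likes.
likes = {
    "listener_a": {"P", "Q", "R", "S"},
    "listener_b": {"Q", "R", "S", "T"},
}


def jaccard(a, b):
    """Overlap between two sets of liked tracks, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, likes, min_similarity=0.5):
    """Suggest tracks liked by similar users that `user` has not liked yet."""
    own = likes[user]
    suggestions = set()
    for other, tracks in likes.items():
        # min_similarity is a hypothetical cut-off chosen for this toy data.
        if other != user and jaccard(own, tracks) >= min_similarity:
            suggestions |= tracks - own
    return sorted(suggestions)
```

With the slide's data, the two listeners share three of five distinct tracks, so each one is recommended the single track only the other has liked.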
  • 12. Difference between movie and music recs • Scale of catalog: 60,000 movies vs. 20,000,000 songs
  • 13. Difference between movie and music recs • Repeated consumption
  • 14. Difference between movie and music recs • Music is more niche
  • 15. “The Netflix Problem” vs. “The Spotify Problem” • Netflix: Users explicitly “rate” movies • Spotify: Feedback is implicit through streaming behavior
  • 17. Explicit Matrix Factorization • Users explicitly rate a subset of the movie catalog • Goal: predict how users will rate new movies (Figure: users × movies ratings matrix, highlighting the entry for user Chris and the movie Inception)
  • 18. Explicit Matrix Factorization • Approximate the ratings matrix by the product of low-dimensional user and movie matrices • Minimize RMSE (root mean squared error) (Figure: sparse ratings matrix with unknown entries marked “?”, approximated by the product of user matrix X and item matrix Y; example entry: Chris, Inception) • Legend (symbols lost in extraction): user rating for movie; user latent factor vector; item latent factor vector; bias for user; bias for item; regularization parameter
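The objective on slide 18 did not survive extraction; only its legend did. A common form of the biased, regularized matrix factorization objective that matches that legend is sketched below. The symbols $r_{ui}$ (rating), $x_u$ and $y_i$ (user and item latent factor vectors), $\beta_u$ and $\beta_i$ (biases) and $\lambda$ (regularization parameter) are assumed names rather than recovered ones, and the sum is assumed to run over observed ratings only.

```latex
\min_{x,\, y,\, \beta}
  \sum_{(u,i)\ \text{observed}}
    \left( r_{ui} - x_u^{\top} y_i - \beta_u - \beta_i \right)^2
  + \lambda \left( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \right)
```

Minimizing this sum of squared errors is equivalent to minimizing RMSE on the observed ratings, since the square root and the division by the number of ratings are both monotone.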
  • 19. Implicit Matrix Factorization • Replace stream counts with binary labels – 1 = streamed, 0 = never streamed • Minimize weighted RMSE (root mean squared error) using a function of stream counts as weights (Figure: binary user × track matrix approximated by the product of user matrix X and item matrix Y) • Legend (symbols lost in extraction): label equal to 1 if the user streamed the track, else 0; user latent factor vector; item latent factor vector; bias for user; bias for item; regularization parameter
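A minimal sketch of the weighted loss described on slide 19, in plain Python. The slide only says the weights are “a function of stream counts”; the linear form `1 + alpha * count` used here is an assumption borrowed from the common implicit-feedback formulation, as are the names `preference`, `confidence`, `weighted_loss` and the default `alpha`. Bias terms are left out to keep the example short.

```python
def preference(count):
    """Binary label: 1 if the user streamed the track at least once, else 0."""
    return 1.0 if count > 0 else 0.0


def confidence(count, alpha=40.0):
    """Assumed weight: grows linearly with the stream count, 1 for no streams."""
    return 1.0 + alpha * count


def weighted_loss(counts, X, Y, lam=0.1, alpha=40.0):
    """Weighted squared error over ALL user-track pairs plus L2 regularization.

    counts[u][i] is the stream count, X[u] and Y[i] are latent factor vectors.
    """
    loss = 0.0
    for u, row in enumerate(counts):
        for i, count in enumerate(row):
            prediction = sum(a * b for a, b in zip(X[u], Y[i]))
            loss += confidence(count, alpha) * (preference(count) - prediction) ** 2
    regularization = sum(v * v for vector in X + Y for v in vector)
    return loss + lam * regularization
```

Unlike the explicit case, every user-track pair contributes to the loss: unstreamed pairs act as weak evidence for a 0 label, while heavily streamed pairs act as strong evidence for a 1 label.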
  • 20. Alternating Least Squares • Initialize user and item vectors to random noise • Fix item vectors and solve for optimal user vectors – Take the derivative of the loss function with respect to the user’s vector, set it equal to 0, and solve – Results in a system of linear equations with a closed-form solution! • Fix user vectors and solve for optimal item vectors • Repeat until convergence • Code: https://github.com/MrChrisJohnson/implicitMF
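The alternating steps above can be sketched in dependency-free Python. This is an illustrative toy and not the code in the linked repository: the helper names `solve`, `update` and `als`, the weight `1 + alpha * count`, the fixed iteration count (instead of a convergence check) and the omission of bias terms are all assumptions made for brevity.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def update(fixed, weights, labels, lam):
    """Closed-form solution for one vector while the other side is fixed.

    Solves (F^T C F + lam * I) x = F^T C p, where F stacks the fixed vectors,
    C is the diagonal matrix of weights and p holds the binary labels.
    """
    f = len(fixed[0])
    A = [[lam if r == c else 0.0 for c in range(f)] for r in range(f)]
    b = [0.0] * f
    for vec, w, p in zip(fixed, weights, labels):
        for r in range(f):
            b[r] += w * p * vec[r]
            for c in range(f):
                A[r][c] += w * vec[r] * vec[c]
    return solve(A, b)


def als(counts, factors=2, lam=0.1, alpha=40.0, iterations=10, seed=0):
    """Alternate between user and item updates for a fixed number of rounds."""
    rng = random.Random(seed)
    n_users, n_items = len(counts), len(counts[0])
    X = [[rng.gauss(0, 0.1) for _ in range(factors)] for _ in range(n_users)]
    Y = [[rng.gauss(0, 0.1) for _ in range(factors)] for _ in range(n_items)]
    for _ in range(iterations):
        for u in range(n_users):  # item vectors fixed, solve each user
            row = counts[u]
            X[u] = update(Y, [1 + alpha * c for c in row],
                          [float(c > 0) for c in row], lam)
        for i in range(n_items):  # user vectors fixed, solve each item
            col = [counts[u][i] for u in range(n_users)]
            Y[i] = update(X, [1 + alpha * c for c in col],
                          [float(c > 0) for c in col], lam)
    return X, Y
```

Each `update` call is an independent small linear system of size `factors`, which is what makes the per-user and per-item solves easy to parallelize.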
  • 21. Alternating Least Squares • Note that: [equation lost in extraction] • Then, we can pre-compute [term lost in extraction] once per iteration – [terms lost in extraction] only contain non-zero elements for tracks that the user streamed – Using sparse matrix operations we can then compute each user’s vector efficiently in [bound lost in extraction] time, where [symbol lost in extraction] is the number of tracks the user streamed • Code: https://github.com/MrChrisJohnson/implicitMF
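The formulas on slide 21 are missing from the transcript. The standard derivation for implicit-feedback ALS fits the surviving wording and is sketched here as an assumed reconstruction, not a recovered one. The symbols are assumptions: $Y$ stacks the item vectors as rows, $C_u$ is the diagonal matrix of user $u$'s weights, $p_u$ is the user's vector of binary labels, $f$ is the number of latent factors and $n_u$ is the number of tracks the user streamed.

```latex
Y^{\top} C_u Y = Y^{\top} Y + Y^{\top} (C_u - I)\, Y

x_u = \left( Y^{\top} Y + Y^{\top} (C_u - I)\, Y + \lambda I \right)^{-1} Y^{\top} C_u\, p_u
```

Under these assumptions $Y^{\top} Y$ is shared by all users and can be computed once per iteration, while $C_u - I$ and $p_u$ are non-zero only for streamed tracks, which gives a per-user cost of $O(f^2 n_u + f^3)$.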
  • 22. Alternating Least Squares • Code: https://github.com/MrChrisJohnson/implicitMF
  • 23. How do we use the learned vectors? • User-Item score is the dot product • Item-Item similarity is the cosine similarity • Both operations are cheap: their cost is linear in the number of latent factors
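Both operations on slide 23 fit in a few lines of plain Python. The function names `dot` and `cosine` are chosen for this sketch; each makes a single pass over the vectors, so the cost grows linearly with the number of latent factors.

```python
import math


def dot(a, b):
    """User-item score: dot product of a user vector and an item vector."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Item-item similarity: cosine of the angle between two item vectors."""
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    # Returning 0.0 for a zero vector is a convention chosen for this sketch.
    return dot(a, b) / norm if norm else 0.0
```

Cosine similarity ignores vector length, so two tracks count as similar when their vectors point the same way even if one of them is streamed far more often.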
  • 24. Latent Factor Vectors in 2 dimensions
  • 26. Scaling up Implicit Matrix Factorization with Hadoop
  • 27. Hadoop at Spotify 2009
  • 28. Hadoop at Spotify 2014 • 700 nodes in our London data center
  • 29. Implicit Matrix Factorization with Hadoop (Figure via Erik Bernhardsson: in the map step, all log entries are split into a K × L grid of blocks, where block (x, y) holds the entries with u % K = x and i % L = y; user vectors are partitioned by u % K and item vectors by i % L; a reduce step follows)
  • 30. Implicit Matrix Factorization with Hadoop • One map task • Map input: tuples (u, i, count) where u % K = x and i % L = y • Distributed cache: all user vectors where u % K = x • Distributed cache: all item vectors where i % L = y • Mapper: emit contributions • Reducer: new vector! (Figure via Erik Bernhardsson)
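The blocking scheme on slides 29 and 30 can be imitated in a single process to show the data flow. This is a hedged sketch, not Spotify's Hadoop job: `partition`, `map_block` and `reduce_user` are hypothetical names, the weight `1 + alpha * count` is an assumption, and a dictionary stands in for the distributed cache. Each mapper emits, per log entry, a partial matrix and a partial right-hand side for the user's linear system. A real reducer would add the shared Gram matrix of all item vectors and the regularizer to the summed result and then solve for the new user vector, a step left out here.

```python
from collections import defaultdict


def partition(log, K, L):
    """Group (user, item, count) tuples into K x L blocks by u % K and i % L."""
    blocks = defaultdict(list)
    for u, i, count in log:
        blocks[(u % K, i % L)].append((u, i, count))
    return blocks


def map_block(entries, item_vectors, alpha=40.0):
    """One map task: emit (user, (matrix_part, rhs_part)) per log entry.

    `item_vectors` stands in for the distributed cache and only needs the
    vectors of the items that fall into this block.
    """
    for u, i, count in entries:
        y = item_vectors[i]
        w = 1.0 + alpha * count  # assumed confidence weight
        matrix_part = [[(w - 1.0) * a * b for b in y] for a in y]
        rhs_part = [w * a for a in y]  # label is 1 for every streamed track
        yield u, (matrix_part, rhs_part)


def reduce_user(contributions):
    """Reducer: sum all contributions emitted for one user."""
    contributions = list(contributions)
    f = len(contributions[0][1])
    matrix = [[0.0] * f for _ in range(f)]
    rhs = [0.0] * f
    for matrix_part, rhs_part in contributions:
        for a in range(f):
            rhs[a] += rhs_part[a]
            for b in range(f):
                matrix[a][b] += matrix_part[a][b]
    return matrix, rhs
```

Because a block only ever touches users with u % K = x and items with i % L = y, each map task needs just 1/K of the user vectors and 1/L of the item vectors in memory.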
  • 31. Implicit Matrix Factorization with Spark • Spark vs. Hadoop: http://www.slideshare.net/Hadoop_Summit/spark-and-shark
  • 33. Approximate Nearest Neighbors • Code: https://github.com/Spotify/annoy
  • 34. Ensemble of Latent Factor Models (Figure via Erik Bernhardsson)
  • 36. Open Problems • How to go from predictive model to related artists? (learning to rank?) • How do you learn from user feedback? • How do you deal with observation bias in the user feedback? (active learning?) • How to factor in temporal information? • How much value in content-based recommendations? • How to best evaluate model performance? • How to best train an ensemble?