40 million songs, albums and artists available: how nice! Streaming gives you access to some of the biggest music collections in the world. The only catch is that you would need centuries to listen to all of it.
Getting access doesn’t mean knowing what to do with it. How are we making music discovery more and more efficient at Deezer?
Deezer - Big data as a streaming service
1. Big Data as a Streaming Service
Julie Knibbe
Product Manager – Deezer
@julieknibbe
Manuel Moussalam
R&D – Deezer
2. Big Data as a Streaming Service
Product Manager
Defines features that meet users' needs
Based on:
• Market research
• Product Data Analytics
• User feedback
• Competitive Analysis
• Creativity
3. Big Data as a Streaming Service
The Leanback Experience Team at Deezer
• Product Manager
• Project Manager
• R&D Developers
• Big Data developers
• Web developers (front/back)
• Mobile developers
• QA
4. Big Data as a Streaming Service
Deezer
Active users 30M
Countries 180+
Tracks in catalog 35M
Artists in catalog 1M
Music providers 1K+
5. Big Data as a Streaming Service
The recommendation problem
No one wants to hear music they don’t like
6. Big Data as a Streaming Service
The recommendation problem
No one wants to hear the same 200 tracks over and over again
7. Big Data as a Streaming Service
The recommendation problem
You need to hear a song 1 to 7 times before you like it
8. Big Data as a Streaming Service
The recommendation problem
Parameters and variables:
• Mood
• Tastes
• Habits
• Openness
• Sociological profile
• …
Dimensions:
• 35M tracks
• 1M artists
• 30M users
9. Big Data as a Streaming Service
Building a user profile
Onboarding users
Monitoring user actions
10. Big Data as a Streaming Service
Deezer – User qualification
12. Big Data as a Streaming Service
User Profile – Implicit / Explicit feedback
Adaptation
Add new information
Forget old interests
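The slide does not say how the profile adds new information and forgets old interests. One common approach, assumed here rather than taken from Deezer, is exponential time decay: every stored weight shrinks with a fixed half-life, and each new implicit signal (a stream, a skip) or explicit signal (a like) is added on top. The signal weights and the 30-day half-life are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical weights: explicit feedback (like) counts more than implicit (stream).
SIGNAL_WEIGHTS = {"stream": 1.0, "like": 3.0, "skip": -0.5}
HALF_LIFE_DAYS = 30.0  # assumption: an interest loses half its weight in 30 days


class UserProfile:
    """Artist-level taste profile with exponential forgetting (illustrative only)."""

    def __init__(self, half_life_days: float = HALF_LIFE_DAYS):
        self.decay_rate = math.log(2) / half_life_days
        self.weights: dict[str, float] = defaultdict(float)
        self.last_update_day = 0.0

    def _decay_to(self, day: float) -> None:
        # Forget old interests: scale every weight by exp(-rate * elapsed days).
        factor = math.exp(-self.decay_rate * (day - self.last_update_day))
        for artist in self.weights:
            self.weights[artist] *= factor
        self.last_update_day = day

    def add_event(self, day: float, artist: str, signal: str) -> None:
        # Add new information on top of the decayed profile.
        self._decay_to(day)
        self.weights[artist] += SIGNAL_WEIGHTS[signal]

    def top_artists(self, n: int = 3) -> list[str]:
        return sorted(self.weights, key=self.weights.get, reverse=True)[:n]
```

With these settings, a like for The Who on day 0 has decayed to a quarter of its weight by day 60, so a single stream of Stromae on day 60 already ranks above it.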
13. Big Data as a Streaming Service
Music Recommendation
Given a listening profile for user X, what music should we recommend?
14. Recommendation system – adapting to user types
• Savants
• Enthusiasts
• Casuals
• Indifferents
From riskier recommendations (Savants) to popular recommendations (Indifferents)
Finding the right mix between novelty, familiarity and relevance
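The slide gives no formula for this mix. As an illustration only, each candidate can be scored as a blend of relevance and novelty, with a per-user-type novelty weight that is high for Savants and low for Indifferents. The weights, and the choice of novelty as one minus normalized popularity, are assumptions.

```python
# Hypothetical novelty weights: Savants get riskier picks, Indifferents popular ones.
NOVELTY_WEIGHT = {"savant": 0.6, "enthusiast": 0.4, "casual": 0.2, "indifferent": 0.05}


def rank_candidates(candidates, user_type, n=10):
    """Rank (track_id, relevance, popularity) tuples for one user.

    relevance and popularity are assumed to be normalized to [0, 1];
    novelty is taken as 1 - popularity.
    """
    w = NOVELTY_WEIGHT[user_type]

    def score(item):
        _, relevance, popularity = item
        # Weighted blend: familiar/relevant picks vs. less popular, novel ones.
        return (1 - w) * relevance + w * (1 - popularity)

    ranked = sorted(candidates, key=score, reverse=True)
    return [track_id for track_id, _, _ in ranked[:n]]
```

With a chart hit (relevance 0.8, popularity 0.95) and a deep cut (relevance 0.6, popularity 0.05), a Savant gets the deep cut first and an Indifferent gets the chart hit first.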
15. Recommendation system – adapting to user types
Sources:
http://alchemi.co.uk/archives/mus/groups_and_beha.html
http://musicmachinery.com/2014/01/14/the-zero-button-music-player-2/
16. Big Data as a Streaming Service
Use cases
Playlist / Channel generation
Discovery
Personal Search
…
17. Big Data as a Streaming Service
Deezer features – Flow
18. Big Data as a Streaming Service
Deezer features – Hear This
19. Big Data as a Streaming Service
At Deezer
Mixing collaborative filtering with semi-supervised approaches
• Curation: Deezer Editors
• Multi-layered graph structure of tracks & artists
• Usage monitoring
Based on Hadoop + ElasticSearch + Spark
20. Big Data as a Streaming Service
Collaborative Filtering: Matching
Collaborative Filtering:
« User X listened to the Rolling Stones. Users listening to the Rolling Stones usually also listen to the Who, so let's suggest the Who to user X. »
Popularized by the Netflix Prize
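A minimal sketch of the quoted idea: count how many users co-listened to each pair of artists, then suggest the artists most often co-listened with what user X already plays. The listening histories are made up, and at Deezer's scale this counting would run on a distributed stack such as the Hadoop/Spark one mentioned earlier, not in memory.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(listening_histories):
    """Count how many users listened to each unordered pair of artists."""
    pairs = Counter()
    for artists in listening_histories.values():
        for a, b in combinations(sorted(set(artists)), 2):
            pairs[(a, b)] += 1
    return pairs


def also_listened(artist, pairs, exclude=(), n=3):
    """Artists most often co-listened with `artist`, minus ones already heard."""
    scores = Counter()
    for (a, b), count in pairs.items():
        if a == artist:
            scores[b] += count
        elif b == artist:
            scores[a] += count
    for known in exclude:
        scores.pop(known, None)
    return [other for other, _ in scores.most_common(n)]


# Toy data (hypothetical users).
histories = {
    "alice": ["The Rolling Stones", "The Who", "The Kinks"],
    "bob": ["The Rolling Stones", "The Who"],
    "carol": ["The Rolling Stones", "Stromae"],
    "user_x": ["The Rolling Stones"],
}
```

Here `also_listened("The Rolling Stones", cooccurrence_counts(histories), exclude=histories["user_x"])` puts The Who first, since two users listen to both.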
21. Big Data as a Streaming Service
Collaborative Filtering
Compute similarity between users, between items, or both
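Both variants can use the same similarity measure on different views of the play-count matrix: user rows for user-based neighbourhoods, item rows (the transposed matrix) for item-based ones. Cosine similarity and the toy play counts below are assumptions for illustration; the slide does not name a measure.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def transpose(matrix):
    """Turn {user: {item: plays}} into {item: {user: plays}}."""
    result = {}
    for user, items in matrix.items():
        for item, plays in items.items():
            result.setdefault(item, {})[user] = plays
    return result


def most_similar(key, vectors, n=2):
    """Nearest neighbours of one row (a user or an item) by cosine similarity."""
    others = [(other, cosine(vectors[key], vec)) for other, vec in vectors.items() if other != key]
    return sorted(others, key=lambda pair: pair[1], reverse=True)[:n]


# Hypothetical play counts per user.
plays = {
    "u1": {"stones": 10, "who": 8},
    "u2": {"stones": 7, "who": 5, "kinks": 2},
    "u3": {"stromae": 9, "angele": 4},
}
```

`most_similar("u1", plays)` is a user-based neighbourhood (u2 comes first); `most_similar("stones", transpose(plays))` is an item-based one ("who" comes first).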
23. Big Data as a Streaming Service
Collaborative filtering: Exemplar based
Association rules
• Market basket analysis
• Apriori algorithm
• …
But:
• Scalability issues
• Hubs and islands issues (Stromae example)
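A sketch of the first two levels of the Apriori algorithm on "baskets" (for instance the artists in one playlist or session; this mapping is an assumption), followed by rules A → B with their support and confidence. The thresholds are arbitrary. Even with Apriori's pruning, the number of candidate pairs grows quadratically with the number of frequent items, which is one face of the scalability issue above.

```python
from collections import Counter
from itertools import combinations


def frequent_pair_rules(baskets, min_support=0.5, min_confidence=0.6):
    """First two levels of Apriori, then rules lhs -> rhs between frequent pairs.

    baskets: list of sets. Supports are fractions of all baskets.
    Returns a list of (lhs, rhs, support, confidence).
    """
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    # Level 1: keep only items that are frequent on their own.
    frequent_items = {item for item, c in item_counts.items() if c / n >= min_support}

    # Level 2 (Apriori property): a pair can only be frequent if both items are.
    pair_counts = Counter()
    for basket in baskets:
        kept = sorted(basket & frequent_items)
        pair_counts.update(combinations(kept, 2))

    rules = []
    for (a, b), c in pair_counts.items():
        if c / n < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = c / item_counts[lhs]  # P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, c / n, confidence))
    return rules
```

On the toy baskets `[{"stones", "who"}, {"stones", "who", "kinks"}, {"stones", "stromae"}, {"who", "stones"}]`, the only frequent pair is (stones, who), giving the rules stones → who (confidence 0.75) and who → stones (confidence 1.0).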
24. Big Data as a Streaming Service
Collaborative filtering: Model based
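Matrix Factorization
$A \approx U \times I$, where $A$ is the $n \times m$ user-item matrix, $U$ is $n \times k$ and $I$ is $k \times m$, with $k \ll n, m$
• U is a low-dimensional model of the users
• I is a low-dimensional model of the items
Recommended items are taken from the missing entries of A, whose values are predicted by the product U × I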
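The slide does not say how U and I are learned (alternating least squares is a common choice on Spark). The sketch below swaps in plain stochastic gradient descent over the observed entries of A only, in pure Python, to show how a missing entry gets a predicted value. Learning rate, regularization and step count are illustrative, not tuned.

```python
import random


def factorize(ratings, n_users, n_items, k=2, steps=2000, lr=0.02, reg=0.02, seed=0):
    """Learn U (n_users x k) and I (k x n_items) so that U x I fits the known cells of A.

    ratings: list of (user, item, value) for the observed entries only.
    Plain SGD with L2 regularization.
    """
    rng = random.Random(seed)
    U = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    I = [[rng.gauss(0, 0.1) for _ in range(n_items)] for _ in range(k)]
    for _ in range(steps):
        for u, i, value in ratings:
            pred = sum(U[u][f] * I[f][i] for f in range(k))
            err = value - pred
            for f in range(k):
                u_f, i_f = U[u][f], I[f][i]
                # Gradient step on squared error plus L2 penalty.
                U[u][f] += lr * (err * i_f - reg * u_f)
                I[f][i] += lr * (err * u_f - reg * i_f)
    return U, I


def predict(U, I, u, i):
    """Predicted value of cell (u, i) of A, observed or missing."""
    return sum(U[u][f] * I[f][i] for f in range(len(I)))
```

For a toy rank-1 matrix with cell (2, 2) hidden, `factorize(ratings, 3, 3, k=1)` followed by `predict(U, I, 2, 2)` should land close to the hidden value of 2. Choosing k=1 matters here: with more latent dimensions than the data supports, the missing cell is not determined by the observed ones.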
25. Big Data as a Streaming Service
Collaborative Filtering: Limitations
• Cold Start problem
• Sparse user-item matrix (1% coverage)
• Only based on social behaviors
• Popularity bias (« The rich get richer »)
27. Big Data as a Streaming Service
Content-based filtering: Limitations
• Cold Start problem
• Users with atypical tastes
• Lack of novelty
• Subjectivity not taken into account
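Content-based filtering recommends items whose descriptions resemble what the user already liked. A minimal sketch, assuming each track is described by a small numeric feature vector (the feature names and values are invented): the user profile is the mean of the liked tracks' vectors and candidates are ranked by cosine similarity. With no likes there is nothing to compare against, which is the cold start problem listed above.

```python
import math

# Hypothetical per-track features: (energy, acousticness, danceability), all in [0, 1].
TRACK_FEATURES = {
    "rock_anthem": (0.9, 0.1, 0.5),
    "garage_rock": (0.85, 0.15, 0.45),
    "piano_ballad": (0.2, 0.9, 0.2),
    "club_track": (0.8, 0.05, 0.95),
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend_by_content(liked, features=TRACK_FEATURES, n=2):
    """Rank unheard tracks by similarity to the mean vector of liked tracks."""
    if not liked:
        return []  # cold start: no profile to compare against
    dims = len(next(iter(features.values())))
    profile = [sum(features[t][d] for t in liked) / len(liked) for d in range(dims)]
    candidates = [t for t in features if t not in liked]
    return sorted(candidates, key=lambda t: cosine(profile, features[t]), reverse=True)[:n]
```

`recommend_by_content(["rock_anthem"])` returns the garage rock track first and the club track second; the piano ballad is furthest away.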
28. Big Data as a Streaming Service
Content Similarity
Clustering tracks, artists, albums…
Methods:
• Matrix Factorization techniques
• Spectral clustering
• Musical feature extraction
• Louvain algorithm
• …
29. Big Data as a Streaming Service
Example: Multiple Spectral Clustering
30. Big Data as a Streaming Service
Cleaning
• Mislabeled data: different sources say different things about songs, artists and albums
• No universally adopted music ontology
• Subjectivity
• Outlier detection: confronting several sources and models
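One simple way to confront several sources, assumed here for illustration (the slide does not describe the rule), is a majority vote per field: the value most providers agree on becomes the consensus, and providers that disagree are flagged as possible outliers. Without a clear majority, nothing is decided automatically. Provider names are hypothetical.

```python
from collections import Counter


def reconcile(field_values, min_agreement=0.5):
    """Pick the value most sources agree on and flag the sources that disagree.

    field_values: {source_name: value} for one field of one track
    (e.g. release year or genre). Returns (consensus or None, flagged sources).
    """
    counts = Counter(field_values.values())
    value, votes = counts.most_common(1)[0]
    if votes / len(field_values) <= min_agreement:
        # No clear majority: flag every source, e.g. for manual review.
        return None, sorted(field_values)
    outliers = sorted(src for src, v in field_values.items() if v != value)
    return value, outliers
```

For a release year reported as 1965, 1965 and 1975 by three providers, the consensus is 1965 and the third provider is flagged.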
31. Big Data as a Streaming Service
Cleaning: Example
32. Big Data as a Streaming Service
In real life…
A/B Testing
33. Big Data as a Streaming Service
Algorithms A/B Testing
Deezer users are split between Algo A and Algo B
Observe results:
• Daily Active Users
• Streams per user
• Satisfaction
• …
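A sketch of the split, assuming hash-based bucketing (a common technique, not confirmed as Deezer's): hashing the experiment name together with the user id keeps each user in the same arm across sessions without storing assignments. The experiment name is hypothetical, and only one of the listed metrics (streams per user) is computed.

```python
import hashlib
from statistics import mean


def assign_variant(user_id: str, experiment: str, variants=("A", "B")) -> str:
    """Deterministically assign a user to a variant by hashing (experiment, user)."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]


def streams_per_user(stream_counts, experiment):
    """Average streams per user for each variant.

    stream_counts: {user_id: number of streams during the test period}.
    """
    by_variant = {}
    for user_id, streams in stream_counts.items():
        by_variant.setdefault(assign_variant(user_id, experiment), []).append(streams)
    return {variant: mean(values) for variant, values in by_variant.items()}
```

Salting the hash with the experiment name means a user who lands in arm A for one test is independently reassigned for the next one.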
34. Big Data as a Streaming Service
Algorithms A/B Testing: Example
Test: are new users (with no profile data) more satisfied with chart items or with new ones?
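To decide whether a difference in satisfaction between the two arms is more than noise, a standard option (assumed here; the slide does not name one) is a two-sided two-proportion z-test. It treats satisfaction as a yes/no outcome per user, which is itself an assumption about how satisfaction is measured.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two satisfaction rates.

    Returns (z statistic, p-value) using the pooled-proportion standard error.
    """
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

With made-up counts of 540 satisfied out of 1000 for chart items versus 500 out of 1000 for new items, z ≈ 1.79 and p ≈ 0.073, so that difference alone would not be significant at the 5% level.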
User-based neighbourhood: find similar users and recommend what they like
Item-based neighbourhood: find similar items (association rules, items in the same playlists, etc.)
The rich get richer
Collect information to describe items – and work on similarity