Delivering Music Recommendations to Millions!
Sriram Malladi
Rohan Singh (@rohansingh)
A little bit about us
Over 24 million active users
55 countries, 4 data centers
20+ million tracks
Discovering music — then
What do you want to listen to?
The right music for every moment
The Plan
1. Collect lots of data

2. Generate personalized recommendations

3. Serve those to millions of listeners each day
Data, data, data
Track plays
Radio feedback
Playlists
Follows
Collecting it
All data into and out of Spotify goes through access points
Access points and services log data
Logs are aggregated and shipped to Hadoop
(Diagram: access points, Hadoop, Cassandra, Discover backend)
Hadoop
Framework to store and process big, distributed data
Hadoop Distributed File System (HDFS)
Hadoop MapReduce
hadoop.apache.org
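
To make the MapReduce model concrete, here is a minimal single-process sketch in plain Python. It is not Hadoop code and not taken from the talk: the log records, `PLAY_LOG`, and the three phase functions are hypothetical names chosen for illustration. It counts plays per (user, track) pair, the kind of aggregate a real job might compute over logged track plays.

```python
from collections import defaultdict

# Hypothetical play log: one (user_id, track_id) record per play event.
PLAY_LOG = [
    ("alice", "track-1"),
    ("bob", "track-1"),
    ("alice", "track-2"),
    ("alice", "track-1"),
]


def map_phase(records):
    """Emit one ((user, track), 1) pair per play event."""
    for user_id, track_id in records:
        yield (user_id, track_id), 1


def shuffle_phase(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the emitted counts for each key."""
    return {key: sum(values) for key, values in grouped.items()}


play_counts = reduce_phase(shuffle_phase(map_phase(PLAY_LOG)))
```

In a real cluster the map and reduce functions run in parallel on many machines and the grouping step moves data over the network; only the shape of the computation carries over from this sketch.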
Crunch the data
What you listen to
Artists you follow
What similar users listen to
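
The slides do not say how "similar users" are found. As one possible illustration only, the sketch below uses Jaccard similarity over each user's set of played tracks and suggests tracks that similar users played but the target user did not. The function names and the `LISTENS` data are hypothetical, and a production recommender would very likely use a different technique.

```python
def jaccard(a, b):
    """Overlap of two sets: |a & b| / |a | b|, or 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, listens, top_n=3):
    """Rank unseen tracks by the summed similarity of users who played them."""
    mine = listens[user]
    scores = {}
    for other, theirs in listens.items():
        if other == user:
            continue
        similarity = jaccard(mine, theirs)
        if similarity == 0.0:
            continue  # nothing in common, so ignore this user
        for track in theirs - mine:
            scores[track] = scores.get(track, 0.0) + similarity
    # Highest score first; ties broken by track id for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [track for track, _ in ranked[:top_n]]


# Hypothetical listening history: user -> set of track ids.
LISTENS = {
    "alice": {"daft-punk-1", "justice-1"},
    "bob": {"daft-punk-1", "justice-1", "air-1"},
    "carol": {"metallica-1"},
}

suggestions = recommend("alice", LISTENS)
```

Here only bob overlaps with alice, so the one track he has that she lacks is the only suggestion.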
Your buddy listened to Daft Punk all day
An artist you follow released a new album
A friend just created a new playlist
Region
Age group
Gender
What do you want to listen to?
Roll it out
Storing recommendations
About 80 kilobytes of data per user
Regenerated daily for all active users
More than 1 TB of recommendations overall
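
A quick back-of-the-envelope check of these figures, as a sketch. It assumes decimal units (1 KB = 1,000 bytes, 1 TB = 10^12 bytes) and treats the 24 million active users quoted earlier in the deck as an upper bound on how many users get recommendations each day; the slides do not state the exact daily count.

```python
KB = 1_000                    # decimal kilobyte (assumption)
TB = 1_000_000_000_000        # decimal terabyte (assumption)
BYTES_PER_USER = 80 * KB      # "about 80 kilobytes of data per user"


def total_terabytes(users):
    """Total recommendation data, in TB, for a given number of users."""
    return users * BYTES_PER_USER / TB


# Number of users at which the daily output reaches exactly 1 TB.
users_for_one_tb = TB // BYTES_PER_USER

# Upper bound if all 24 million active users were regenerated in one day.
upper_bound_tb = total_terabytes(24_000_000)
```

At 80 KB per user, 1 TB corresponds to 12.5 million users, and 24 million users would come to about 1.92 TB, so "more than 1 TB" is consistent with the user counts in the deck.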
Storing recommendations
A terabyte a day is a fair amount of data to write and index (plus replicas)
Tradeoff between storing data and looking things up on the fly
Apache Cassandra
Highly available, distributed, scalable
Fast writes, decent reads
We have one cluster per data center
Transferring worldwide
Recommendations are all generated in Hadoop, in London
Need to be shipped to our data centers worldwide
So much Internet weather
hdfs2cass
Internal tool to copy data from Hadoop to Cassandra
Creates tables from data in HDFS
Loads tables into Cassandra using a bulk loader
github.com/spotify/hdfs2cass
Serve it up
1. Aggregate data from all our data sources

2. Decorate it with metadata

3. Shuffle!
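
The three steps above can be sketched as a single function. This is an illustrative outline, not the Discover backend: `build_discover_page`, `SOURCES`, and `METADATA` are hypothetical names, and dropping duplicate items during aggregation is an assumption the slides do not mention.

```python
import random


def build_discover_page(sources, metadata, rng):
    """Aggregate item ids, decorate them with metadata, then shuffle."""
    # 1. Aggregate: flatten all sources, dropping duplicates but keeping order.
    item_ids = list(dict.fromkeys(item for source in sources for item in source))
    # 2. Decorate: attach whatever metadata is known for each item.
    decorated = [{"id": item_id, **metadata.get(item_id, {})} for item_id in item_ids]
    # 3. Shuffle: randomize the order so the page varies between visits.
    rng.shuffle(decorated)
    return decorated


# Hypothetical inputs: two recommendation sources and a metadata lookup.
SOURCES = [
    ["track-1", "playlist-3"],   # e.g. activity from friends
    ["album-9", "track-1"],      # e.g. releases from followed artists
]
METADATA = {
    "track-1": {"title": "Around the World"},
    "playlist-3": {"title": "Friday Mix"},
    "album-9": {"title": "New Album"},
}

page = build_discover_page(SOURCES, METADATA, random.Random())
```

Passing the random generator in as a parameter keeps the function easy to test with a fixed seed.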
Serving at scale…
Discover is Spotify’s home page
500+ requests per second
Thousands of requests to other services
Database calls for each unique user
…or not?
Naive, first-stage prototype:
10 to 20 seconds per request
(doesn’t really scale)
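
One way to see why this does not scale is Little's law: the average number of requests in flight equals the arrival rate times the time each request takes. The sketch below applies it to the 500 requests per second quoted for Discover; combining the two figures this way is an illustration, not a calculation from the talk.

```python
def in_flight_requests(requests_per_second, seconds_per_request):
    """Little's law: average concurrency = arrival rate * time in system."""
    return requests_per_second * seconds_per_request


# At 500 requests per second, with 10 to 20 seconds per request:
low = in_flight_requests(500, 10)    # 5000 requests open at once
high = in_flight_requests(500, 20)   # 10000 requests open at once
```

Holding several thousand requests open at once, each fanning out to other services, is what the prototype would have needed at full traffic.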
Fail a lot
Make it webscale!
Find & rewrite slow sections
Switch to C++ from Python for critical code
More improvements
Pregenerate more data
Cache aggressively
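
As a rough illustration of aggressive caching, here is a minimal time-to-live cache in Python. The slides do not describe the caching layer that was actually used, so `TTLCache`, its `loader` callback, and the 60-second lifetime are all hypothetical choices for this sketch.

```python
import time


class TTLCache:
    """Cache loader results per key and reuse them until they expire."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader        # called on a miss to compute the value
        self._ttl = ttl_seconds      # how long a cached value stays valid
        self._clock = clock          # injectable so tests can control time
        self._entries = {}           # key -> (expiry_time, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]          # still fresh: skip the expensive call
        value = self._loader(key)    # miss or expired: recompute and store
        self._entries[key] = (now + self._ttl, value)
        return value


def slow_lookup(user_id):
    """Stand-in for an expensive call to another service or database."""
    return f"recommendations for {user_id}"


cache = TTLCache(slow_lookup, ttl_seconds=60)
first = cache.get("user-1")    # calls slow_lookup
second = cache.get("user-1")   # served from the cache
```

A longer lifetime saves more backend calls but serves staler data, which matters less when the underlying recommendations are regenerated only once a day.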
Throw hardware at it
Switch to SSDs
Scale horizontally
Takeaways
You will need hardware: big data can still be expensive
Less data can be better
Prototype, iterate, optimize: fail early but improve
Questions?

Delivering Personalized Music Discovery