SlideShare uma empresa Scribd logo
1 de 31
Baixar para ler offline
June 12, 2014
Danielle Jabin
Data Engineer, A/B Testing
Data at Spotify
I’m Danielle Jabin
•  Data Engineer in the Stockholm office
•  A/B testing infrastructure
•  California born & raised
•  If I can survive a Swedish winter, so can you!
•  Studied Computer Science, Statistics, and Real Estate
through the M&T program at the University of
Pennsylvania
3
Over 40 million active
users
As of June 9, 2014	
  
4
Access to more than 20
million songs
As of June 9, 2014	
  
Big Data
•  40 million Monthly Active Users
•  20+ million tracks
•  1.5 TB of compressed data from users per day
•  64 TB of data generated in Hadoop each day (including
replication factor of 3)
As of June 9, 2014	
  
6
So how much data is that?
Let’s compare: 64 TB
•  293, 203, 072 books (200 pages or 240,000
characters)
•  16,777,216 MP3 files (with 4MB average file size)
•  22,369,600 images (with 3MB average file size)
8
That’s a lot of selfies
9
How do we use this data?
Use Cases
•  Reporting
•  Business Analytics
•  Operational Analytics
•  Product Features
Reporting
•  Reporting to labels, licensors, partners, and advertisers
•  We support our partners
Business Analytics
•  Analyzing growth, user behavior, sign-up funnels, etc
•  Company KPIs
•  NPS analysis
Operational Metrics
•  Root cause analysis
•  Latency analysis
•  Better capacity planning (servers, people, bandwidth)
Product Features
•  Discover and Radio
•  Top lists
•  Personalized recommendations
•  A/B Testing
15
How do we collect this
data?
The three pillars of our Data Infrastructure:
Kafka
Collection
Hadoop
Processing
Databases
Analytics/Visualization
This is Dave. Data Engineer at
Spotify by day…
…chiptune DJ Demoscene Time
Machine by night.
Let’s listen to Dave’s song
Kafka
•  High volume pub-sub
system
•  “Producers publish messages to
Kafka topics, and consumers
subscribe to these topics and
consume the messages.”
Kafka
•  Robust and scalable solution for collection of logs
•  Fast data transfer
•  Low CPU overhead
•  Built-in partitioning, replication, and fault-tolerance
•  Consumers can pull data at different rates
•  Able to handle extremely high volumes
Other people listened too!
Hadoop
•  Process and store massive amounts of unstructured data
across a distributed cluster
•  One cluster with 37 nodes to 690 nodes today
•  28 PB of storage
•  The largest Hadoop cluster in Europe
Hadoop
•  Entering the land of optimizations
•  Data retention policy
•  Move to JVM-based languages
•  MapReduce languages
•  Moving to Crunch, JVM-based, for speed and scalability
•  Python with Hadoop Streaming, Java, Hive, PIG, Scala
•  Sprunch: Crunch wrapper for Scala, open sourced by Spotify
•  Spotify open-sourced scheduler, Luigi, written in Python
•  Simple and easy way to chain jobs
What if we want to know more?
vs
Databases
•  Aggregates from Hadoop put into PostgreSQL or
Cassandra
•  Sqoop
•  Core data can be used and manipulated for various needs
•  Ad hoc queries
•  Dashboards
Databases
•  Aggregates from Hadoop put into PostgreSQL or
Cassandra
•  Sqoop
•  Ad hoc queries
•  Dashboards
Databases
•  Aggregates from Hadoop put into PostgreSQL or
Cassandra
•  Sqoop
•  Ad hoc queries
•  Dashboards
Questions?
A/B testing questions? Find me!
Contr
ol
vs
Thank you!

Mais conteúdo relacionado

Mais procurados

Music Personalization At Spotify
Music Personalization At SpotifyMusic Personalization At Spotify
Music Personalization At SpotifyVidhya Murali
 
Algorithmic Music Recommendations at Spotify
Algorithmic Music Recommendations at SpotifyAlgorithmic Music Recommendations at Spotify
Algorithmic Music Recommendations at SpotifyChris Johnson
 
CF Models for Music Recommendations At Spotify
CF Models for Music Recommendations At SpotifyCF Models for Music Recommendations At Spotify
CF Models for Music Recommendations At SpotifyVidhya Murali
 
Spotify in the Cloud - An evolution of data infrastructure - Strata NYC
Spotify in the Cloud - An evolution of data infrastructure - Strata NYCSpotify in the Cloud - An evolution of data infrastructure - Strata NYC
Spotify in the Cloud - An evolution of data infrastructure - Strata NYCJosh Baer
 
Big data and machine learning @ Spotify
Big data and machine learning @ SpotifyBig data and machine learning @ Spotify
Big data and machine learning @ SpotifyOscar Carlsson
 
Spotify for Brands
Spotify for BrandsSpotify for Brands
Spotify for BrandsDT
 
Spotify Company Presentation
Spotify Company PresentationSpotify Company Presentation
Spotify Company PresentationErik Forkin
 
Spotify: Horizontal Scalability for Great Success
Spotify: Horizontal Scalability for Great SuccessSpotify: Horizontal Scalability for Great Success
Spotify: Horizontal Scalability for Great SuccessNick Barkas
 
A Spotify Presentation - Case studies
A Spotify Presentation - Case studiesA Spotify Presentation - Case studies
A Spotify Presentation - Case studiesEmily Wilkinson
 
Collaborative Filtering at Spotify
Collaborative Filtering at SpotifyCollaborative Filtering at Spotify
Collaborative Filtering at SpotifyErik Bernhardsson
 
Analysis of Spotify & New Feature Ideas
Analysis of Spotify & New Feature IdeasAnalysis of Spotify & New Feature Ideas
Analysis of Spotify & New Feature IdeasSarah L. Miller
 
Social Media Monitoring: The Case of Spotify
Social Media Monitoring: The Case of SpotifySocial Media Monitoring: The Case of Spotify
Social Media Monitoring: The Case of SpotifyValeria Aguerri
 
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)Adam Kawa
 
Spotify Marketing Analysis Project
Spotify Marketing Analysis ProjectSpotify Marketing Analysis Project
Spotify Marketing Analysis ProjectMariaHenaoVivas
 
Music Recommendations at Scale with Spark
Music Recommendations at Scale with SparkMusic Recommendations at Scale with Spark
Music Recommendations at Scale with SparkChris Johnson
 
Product School - Spotify presentation
Product School - Spotify presentationProduct School - Spotify presentation
Product School - Spotify presentationSuleiman Younossi
 
How data drives spotify
How data drives spotifyHow data drives spotify
How data drives spotifyAli Sarrafi
 

Mais procurados (20)

Spotify
SpotifySpotify
Spotify
 
Music Personalization At Spotify
Music Personalization At SpotifyMusic Personalization At Spotify
Music Personalization At Spotify
 
Algorithmic Music Recommendations at Spotify
Algorithmic Music Recommendations at SpotifyAlgorithmic Music Recommendations at Spotify
Algorithmic Music Recommendations at Spotify
 
CF Models for Music Recommendations At Spotify
CF Models for Music Recommendations At SpotifyCF Models for Music Recommendations At Spotify
CF Models for Music Recommendations At Spotify
 
Spotify in the Cloud - An evolution of data infrastructure - Strata NYC
Spotify in the Cloud - An evolution of data infrastructure - Strata NYCSpotify in the Cloud - An evolution of data infrastructure - Strata NYC
Spotify in the Cloud - An evolution of data infrastructure - Strata NYC
 
Big data and machine learning @ Spotify
Big data and machine learning @ SpotifyBig data and machine learning @ Spotify
Big data and machine learning @ Spotify
 
Spotify for Brands
Spotify for BrandsSpotify for Brands
Spotify for Brands
 
Spotify Company Presentation
Spotify Company PresentationSpotify Company Presentation
Spotify Company Presentation
 
Spotify: Data center & Backend buildout
Spotify: Data center & Backend buildoutSpotify: Data center & Backend buildout
Spotify: Data center & Backend buildout
 
Spotify: Horizontal Scalability for Great Success
Spotify: Horizontal Scalability for Great SuccessSpotify: Horizontal Scalability for Great Success
Spotify: Horizontal Scalability for Great Success
 
A Spotify Presentation - Case studies
A Spotify Presentation - Case studiesA Spotify Presentation - Case studies
A Spotify Presentation - Case studies
 
Collaborative Filtering at Spotify
Collaborative Filtering at SpotifyCollaborative Filtering at Spotify
Collaborative Filtering at Spotify
 
Analysis of Spotify & New Feature Ideas
Analysis of Spotify & New Feature IdeasAnalysis of Spotify & New Feature Ideas
Analysis of Spotify & New Feature Ideas
 
Social Media Monitoring: The Case of Spotify
Social Media Monitoring: The Case of SpotifySocial Media Monitoring: The Case of Spotify
Social Media Monitoring: The Case of Spotify
 
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
 
Spotify Marketing Analysis Project
Spotify Marketing Analysis ProjectSpotify Marketing Analysis Project
Spotify Marketing Analysis Project
 
Music Recommendations at Scale with Spark
Music Recommendations at Scale with SparkMusic Recommendations at Scale with Spark
Music Recommendations at Scale with Spark
 
Product School - Spotify presentation
Product School - Spotify presentationProduct School - Spotify presentation
Product School - Spotify presentation
 
How data drives spotify
How data drives spotifyHow data drives spotify
How data drives spotify
 
Spotify presentation
Spotify presentationSpotify presentation
Spotify presentation
 

Destaque

Making Better Mistakes Tomorrow
Making Better Mistakes TomorrowMaking Better Mistakes Tomorrow
Making Better Mistakes TomorrowDanielle Jabin
 
NAMP Conference - A/B Testing Your Way to Success
NAMP Conference - A/B Testing Your Way to SuccessNAMP Conference - A/B Testing Your Way to Success
NAMP Conference - A/B Testing Your Way to SuccessDevon Smith
 
A/B Testing - In data we trust
A/B Testing - In data we trustA/B Testing - In data we trust
A/B Testing - In data we trustPedro Marques
 
Learning a Personalized Homepage
Learning a Personalized HomepageLearning a Personalized Homepage
Learning a Personalized HomepageJustin Basilico
 
4 Steps Toward Scientific A/B Testing
4 Steps Toward Scientific A/B Testing4 Steps Toward Scientific A/B Testing
4 Steps Toward Scientific A/B TestingJanessa Lantz
 
Scaling Agile at Spotify (representation)
Scaling Agile at Spotify (representation)Scaling Agile at Spotify (representation)
Scaling Agile at Spotify (representation)Vlad Mysla
 
Growing up with agile - how the Spotify 'model' has evolved
Growing up with agile - how the Spotify 'model' has evolved Growing up with agile - how the Spotify 'model' has evolved
Growing up with agile - how the Spotify 'model' has evolved Peter Antman
 

Destaque (7)

Making Better Mistakes Tomorrow
Making Better Mistakes TomorrowMaking Better Mistakes Tomorrow
Making Better Mistakes Tomorrow
 
NAMP Conference - A/B Testing Your Way to Success
NAMP Conference - A/B Testing Your Way to SuccessNAMP Conference - A/B Testing Your Way to Success
NAMP Conference - A/B Testing Your Way to Success
 
A/B Testing - In data we trust
A/B Testing - In data we trustA/B Testing - In data we trust
A/B Testing - In data we trust
 
Learning a Personalized Homepage
Learning a Personalized HomepageLearning a Personalized Homepage
Learning a Personalized Homepage
 
4 Steps Toward Scientific A/B Testing
4 Steps Toward Scientific A/B Testing4 Steps Toward Scientific A/B Testing
4 Steps Toward Scientific A/B Testing
 
Scaling Agile at Spotify (representation)
Scaling Agile at Spotify (representation)Scaling Agile at Spotify (representation)
Scaling Agile at Spotify (representation)
 
Growing up with agile - how the Spotify 'model' has evolved
Growing up with agile - how the Spotify 'model' has evolved Growing up with agile - how the Spotify 'model' has evolved
Growing up with agile - how the Spotify 'model' has evolved
 

Semelhante a Data at Spotify

Liferay and Big Data
Liferay and Big DataLiferay and Big Data
Liferay and Big DataMiguel Pastor
 
Big Data & Hadoop Introduction
Big Data & Hadoop IntroductionBig Data & Hadoop Introduction
Big Data & Hadoop IntroductionJayant Mukherjee
 
Big Data Open Source Technologies
Big Data Open Source TechnologiesBig Data Open Source Technologies
Big Data Open Source Technologiesneeraj rathore
 
Liferay & Big Data Dev Con 2014
Liferay & Big Data Dev Con 2014Liferay & Big Data Dev Con 2014
Liferay & Big Data Dev Con 2014Miguel Pastor
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopAmir Shaikh
 
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...Simplilearn
 
Big Data + Sentiment Analysis = Awesome
Big Data + Sentiment Analysis = AwesomeBig Data + Sentiment Analysis = Awesome
Big Data + Sentiment Analysis = AwesomeAdel Rahimi
 
HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...
HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...
HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...HPCC Systems
 
Big Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and moreBig Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and moreSoftweb Solutions
 
Streaming data mining
Streaming data miningStreaming data mining
Streaming data miningAnkit Solanki
 
Searching The Enterprise Data Lake With Solr - Watch Us Do It!: Presented by...
Searching The Enterprise Data Lake With Solr  - Watch Us Do It!: Presented by...Searching The Enterprise Data Lake With Solr  - Watch Us Do It!: Presented by...
Searching The Enterprise Data Lake With Solr - Watch Us Do It!: Presented by...Lucidworks
 
An Introduction of Apache Hadoop
An Introduction of Apache HadoopAn Introduction of Apache Hadoop
An Introduction of Apache HadoopKMS Technology
 
Getting Started with Big Data in the Cloud
Getting Started with Big Data in the CloudGetting Started with Big Data in the Cloud
Getting Started with Big Data in the CloudRightScale
 
Spotify's Ad Targeting Infrastructure: Achieving Real-time Personalization fo...
Spotify's Ad Targeting Infrastructure: Achieving Real-time Personalization fo...Spotify's Ad Targeting Infrastructure: Achieving Real-time Personalization fo...
Spotify's Ad Targeting Infrastructure: Achieving Real-time Personalization fo...Hakka Labs
 

Semelhante a Data at Spotify (20)

Liferay and Big Data
Liferay and Big DataLiferay and Big Data
Liferay and Big Data
 
Architecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationArchitecting Your First Big Data Implementation
Architecting Your First Big Data Implementation
 
Hadoop HDFS.ppt
Hadoop HDFS.pptHadoop HDFS.ppt
Hadoop HDFS.ppt
 
Big Data & Hadoop Introduction
Big Data & Hadoop IntroductionBig Data & Hadoop Introduction
Big Data & Hadoop Introduction
 
Big Data Open Source Technologies
Big Data Open Source TechnologiesBig Data Open Source Technologies
Big Data Open Source Technologies
 
Big data
Big dataBig data
Big data
 
Liferay & Big Data Dev Con 2014
Liferay & Big Data Dev Con 2014Liferay & Big Data Dev Con 2014
Liferay & Big Data Dev Con 2014
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and Hadoop
 
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
 
Hands On: Introduction to the Hadoop Ecosystem
Hands On: Introduction to the Hadoop EcosystemHands On: Introduction to the Hadoop Ecosystem
Hands On: Introduction to the Hadoop Ecosystem
 
Big Data + Sentiment Analysis = Awesome
Big Data + Sentiment Analysis = AwesomeBig Data + Sentiment Analysis = Awesome
Big Data + Sentiment Analysis = Awesome
 
Data analytics & its Trends
Data analytics & its TrendsData analytics & its Trends
Data analytics & its Trends
 
From Big Data to Fast Data
From Big Data to Fast DataFrom Big Data to Fast Data
From Big Data to Fast Data
 
HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...
HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...
HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...
 
Big Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and moreBig Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and more
 
Streaming data mining
Streaming data miningStreaming data mining
Streaming data mining
 
Searching The Enterprise Data Lake With Solr - Watch Us Do It!: Presented by...
Searching The Enterprise Data Lake With Solr  - Watch Us Do It!: Presented by...Searching The Enterprise Data Lake With Solr  - Watch Us Do It!: Presented by...
Searching The Enterprise Data Lake With Solr - Watch Us Do It!: Presented by...
 
An Introduction of Apache Hadoop
An Introduction of Apache HadoopAn Introduction of Apache Hadoop
An Introduction of Apache Hadoop
 
Getting Started with Big Data in the Cloud
Getting Started with Big Data in the CloudGetting Started with Big Data in the Cloud
Getting Started with Big Data in the Cloud
 
Spotify's Ad Targeting Infrastructure: Achieving Real-time Personalization fo...
Spotify's Ad Targeting Infrastructure: Achieving Real-time Personalization fo...Spotify's Ad Targeting Infrastructure: Achieving Real-time Personalization fo...
Spotify's Ad Targeting Infrastructure: Achieving Real-time Personalization fo...
 

Último

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 

Último (20)

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 

Data at Spotify

  • 1. June 12, 2014 Danielle Jabin Data Engineer, A/B Testing Data at Spotify
  • 2. I’m Danielle Jabin •  Data Engineer in the Stockholm office •  A/B testing infrastructure •  California born & raised •  If I can survive a Swedish winter, so can you! •  Studied Computer Science, Statistics, and Real Estate through the M&T program at the University of Pennsylvania
  • 3. 3 Over 40 million active users As of June 9, 2014  
  • 4. 4 Access to more than 20 million songs As of June 9, 2014  
  • 5. Big Data •  40 million Monthly Active Users •  20+ million tracks •  1.5 TB of compressed data from users per day •  64 TB of data generated in Hadoop each day (including replication factor of 3) As of June 9, 2014  
  • 6. 6 So how much data is that?
  • 7. Let’s compare: 64 TB •  293, 203, 072 books (200 pages or 240,000 characters) •  16,777,216 MP3 files (with 4MB average file size) •  22,369,600 images (with 3MB average file size)
  • 8. 8 That’s a lot of selfies
  • 9. 9 How do we use this data?
  • 10. Use Cases •  Reporting •  Business Analytics •  Operational Analytics •  Product Features
  • 11. Reporting •  Reporting to labels, licensors, partners, and advertisers •  We support our partners
  • 12. Business Analytics •  Analyzing growth, user behavior, sign-up funnels, etc •  Company KPIs •  NPS analysis
  • 13. Operational Metrics •  Root cause analysis •  Latency analysis •  Better capacity planning (servers, people, bandwidth)
  • 14. Product Features •  Discover and Radio •  Top lists •  Personalized recommendations •  A/B Testing
  • 15. 15 How do we collect this data?
  • 16. The three pillars of our Data Infrastructure: Kafka Collection Hadoop Processing Databases Analytics/Visualization
  • 17. This is Dave. Data Engineer at Spotify by day…
  • 18. …chiptune DJ Demoscene Time Machine by night.
  • 19. Let’s listen to Dave’s song
  • 20. Kafka •  High volume pub-sub system •  “Producers publish messages to Kafka topics, and consumers subscribe to these topics and consume the messages.”
  • 21. Kafka •  Robust and scalable solution for collection of logs •  Fast data transfer •  Low CPU overhead •  Built-in partitioning, replication, and fault-tolerance •  Consumers can pull data at different rates •  Able to handle extremely high volumes
  • 23. Hadoop •  Process and store massive amounts of unstructured data across a distributed cluster •  One cluster with 37 nodes to 690 nodes today •  28 PB of storage •  The largest Hadoop cluster in Europe
  • 24. Hadoop •  Entering the land of optimizations •  Data retention policy •  Move to JVM-based languages •  MapReduce languages •  Moving to Crunch, JVM-based, for speed and scalability •  Python with Hadoop Streaming, Java, Hive, PIG, Scala •  Sprunch: Crunch wrapper for Scala, open sourced by Spotify •  Spotify open-sourced scheduler, Luigi, written in Python •  Simple and easy way to chain jobs
  • 25. What if we want to know more? vs
  • 26. Databases •  Aggregates from Hadoop put into PostgreSQL or Cassandra •  Sqoop •  Core data can be used and manipulated for various needs •  Ad hoc queries •  Dashboards
  • 27. Databases •  Aggregates from Hadoop put into PostgreSQL or Cassandra •  Sqoop •  Ad hoc queries •  Dashboards
  • 28. Databases •  Aggregates from Hadoop put into PostgreSQL or Cassandra •  Sqoop •  Ad hoc queries •  Dashboards
  • 30. A/B testing questions? Find me! Contr ol vs