Last.fm vs. Xbox


David Singleton
last.fm/user/underpangs
twitter.com/dsingleton
Music discovery powered by Scrobbling

Personalised radio, social network, events, a “Wikipedia of music”

High traffic

  Monthly visitors: 40 million

  Monthly page views: 500,000 million
Last.fm Architecture

  Load balancer
  HTTP Cache
  Web Server
  Object Cache
  Database
Xbox Live Platform, millions of users

Last.fm Radio App

Built by Microsoft

Powered by our API

Launched alongside Facebook & Twitter
So, what’s there to talk about?
Good Things™

  New users. It’s really cool.

Bad Things™

  Lots of new users, traffic spikes

  A very important, high profile, launch

How did Last.fm approach this?
Xbox Live: 15 million users
  Assuming a 10% take-up rate = 1,500,000 users

  Startup: 5 requests + starting radio: 5 requests + 15 minutes of radio: 60 requests
  1 hour of radio = 250 requests per user
  (an hour of radio per user is a rough average guess)

  1,500,000 users = 375,000,000 requests over 24 hours
  Assuming an even distribution = roughly 4,500 requests / second
  Likely peaking at more than triple = 15,000 requests / second

Last.fm: 2,000 requests / second
  Estimated max capacity of 3,500 requests / second
  (based on number of servers and Apache configuration)
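
The estimate above can be reproduced in a few lines. This is only a sketch built from the figures on the slide; the variable names are illustrative. The exact even-distribution average comes out near 4,340 requests per second, which the slide rounds up to 4,500 before tripling it and rounding again to 15,000.

```python
# Back-of-envelope launch estimate, using only the figures quoted on the slide.
XBOX_LIVE_USERS = 15_000_000
TAKE_UP_PERCENT = 10            # assumed take-up rate
REQUESTS_STARTUP = 5
REQUESTS_START_RADIO = 5
REQUESTS_PER_15_MIN_RADIO = 60
SECONDS_PER_DAY = 24 * 60 * 60
ESTIMATED_MAX_CAPACITY = 3_500  # requests per second

users = XBOX_LIVE_USERS * TAKE_UP_PERCENT // 100
requests_per_user_hour = (
    REQUESTS_STARTUP + REQUESTS_START_RADIO + 4 * REQUESTS_PER_15_MIN_RADIO
)
daily_requests = users * requests_per_user_hour

# Even distribution over 24 hours, then a peak of roughly three times that.
average_rps = daily_requests / SECONDS_PER_DAY
peak_rps = 3 * average_rps

print(f"users:                {users:,}")
print(f"requests per user:    {requests_per_user_hour}")
print(f"requests per day:     {daily_requests:,}")
print(f"average requests/sec: {average_rps:,.0f}")
print(f"peak requests/sec:    {peak_rps:,.0f}")
print(f"peak vs capacity:     {peak_rps / ESTIMATED_MAX_CAPACITY:.1f}x")
```
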
Oh fuck
What next?

Picked a metric: requests per second

Estimated traffic increase vs capacity

Selected our goals:

  Serve requests faster

  Reduce the number of requests
Profiling traffic

Used traffic generated during beta testing

Web server request logs

  Common, widely supported format

  Hundreds of existing tools

We generated some stats using AWK...
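
The talk used AWK for this; the same kind of per-method count can be sketched in Python. The log layout here is an assumption: the sample lines are made up, and the sketch assumes the API method appears as a `method=` query parameter in the request line of each log entry.

```python
import re
from collections import Counter

# Assumption: the API method is passed as a `method=` query parameter.
METHOD_RE = re.compile(r"[?&]method=([A-Za-z]+\.[A-Za-z]+)")


def count_methods(lines):
    """Count how many times each API method appears in the log lines."""
    counts = Counter()
    for line in lines:
        match = METHOD_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up sample lines in a common access-log layout, for illustration only.
sample_log = [
    '10.0.0.1 - - [01/Jan/2010:00:00:01 +0000] "GET /2.0/?method=track.getInfo&artist=x HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2010:00:00:02 +0000] "GET /2.0/?method=artist.getImages&artist=x HTTP/1.1" 200 2048',
    '10.0.0.1 - - [01/Jan/2010:00:00:03 +0000] "GET /2.0/?method=track.getInfo&artist=y HTTP/1.1" 200 498',
    '10.0.0.3 - - [01/Jan/2010:00:00:04 +0000] "GET /favicon.ico HTTP/1.1" 404 0',
]

# Print methods from most to least called, like the table on the next slide.
for method, calls in count_methods(sample_log).most_common():
    print(f"{calls:6d}  {method}")
```
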
Which API requests were made?

 Calls  Method

 71638  track.getInfo
 53941  artist.getImages
 15150  radio.getPlaylist
  7308  library.getArtists
  5020  user.getRecentStations
  4979  ads.getVideos
  4205  radio.tune
  3155  track.love
  1507  artist.getInfo
  1258  user.getRecommendedArtists
  1135  user.getInfo
  1130  geo.getTopArtists
  1128  radio.gamerStations
  1102  tag.getTopArtists
  1021  track.ban
  1006  user.getLovedTracks
   340  library.addArtist
   206  auth.getMobileSession
Raw data from beta

 Calls  Method                       Total time  Average time per call

 53941  artist.getImages                  19647   0.36
 71638  track.getInfo                     15789   0.22
 15150  radio.getPlaylist                  6962   0.46
  7308  library.getArtists                 2402   0.33
  4979  ads.getVideos                      1810   0.36
  5020  user.getRecentStations             1674   0.33
  1102  tag.getTopArtists                  1488   1.35
  1258  user.getRecommendedArtists         1457   1.16
  4205  radio.tune                          923   0.22
  1130  geo.getTopArtists                   575   0.51
  1507  artist.getInfo                      440   0.29
  1128  radio.gamerStations                 298   0.26
  1006  user.getLovedTracks                 271   0.27
  1135  user.getInfo                        171   0.15
   206  auth.getMobileSession                38   0.19
   136  user.signUp                          32   0.24
   123  user.terms                           16   0.13
  3155  track.love                            0   0.00
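
In the table above, the Average column is the Total divided by Calls (for example 19647 / 53941 ≈ 0.36). A minimal sketch of producing such a table follows. It assumes each log record has already been parsed into a (method, time taken) pair; the sample values are invented and the time unit is assumed to be seconds.

```python
from collections import defaultdict


def summarise(records):
    """Return (calls, method, total_time, average_time) rows, slowest total first."""
    calls = defaultdict(int)
    total = defaultdict(float)
    for method, seconds in records:
        calls[method] += 1
        total[method] += seconds
    rows = [(calls[m], m, total[m], total[m] / calls[m]) for m in calls]
    # Sort by total time so the methods costing the most overall come first.
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


# Invented sample records: (method, time taken in seconds).
sample_records = [
    ("tag.getTopArtists", 1.5),
    ("track.getInfo", 0.25),
    ("track.getInfo", 0.25),
    ("tag.getTopArtists", 1.0),
]

for n_calls, method, total_time, average in summarise(sample_records):
    print(f"{n_calls:6d}  {method:28s} {total_time:8.2f} {average:6.2f}")
```

Sorting by total time rather than average is a deliberate choice here: a cheap call made very often can cost more overall than a slow call made rarely.
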
How long did each method take?
Why so many track.getInfo calls?
A tiny UI tweak...

...responsible for 25% of calls.

Arrggghhhhhh

Added that information to a sensible API call

Microsoft kindly updated the app
What next?
What about the getImages calls?

Powers an artist slideshow visualisation

Results of this call won’t change often

  Set an HTTP cache timeout

Set caching on a few other calls too
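
One way to express a per-method cache timeout is a `Cache-Control: max-age` response header, which an HTTP cache in front of the web servers can honour. The sketch below is an illustration only: the lifetimes and the choice of which methods are cacheable are hypothetical, not the values Last.fm used.

```python
# Hypothetical cache lifetimes in seconds; the real values are not in the talk.
CACHE_TTL_SECONDS = {
    "artist.getImages": 24 * 60 * 60,  # results rarely change
    "geo.getTopArtists": 60 * 60,
}


def cache_headers(method):
    """Return response headers telling HTTP caches how long to keep a result."""
    ttl = CACHE_TTL_SECONDS.get(method)
    if ttl is None:
        # Per-user or write calls should not be served from a shared cache.
        return {"Cache-Control": "no-store"}
    return {"Cache-Control": f"public, max-age={ttl}"}


print(cache_headers("artist.getImages"))
print(cache_headers("track.love"))
```
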
Cached Requests
Request generation
kcachegrind
  http://kcachegrind.sourceforge.net

webgrind
  http://code.google.com/p/webgrind/
What happens if things break?

Simulated failing calls

Highlighted essential calls

Acted as a dry-run for launch day failures

  Informed our backup plans
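
A rough sketch of the idea of simulating failing calls: fail one API method at a time and see whether a client flow still completes. Everything here is hypothetical, including the `start_radio` flow and which calls it treats as optional; it only illustrates how essential calls can be separated from the rest.

```python
class SimulatedFailure(Exception):
    """Raised by the fake client when a method has been switched off."""


def make_client(failing_methods):
    """Build a fake API client whose listed methods always fail."""
    def call(method):
        if method in failing_methods:
            raise SimulatedFailure(method)
        return {"method": method, "ok": True}
    return call


def start_radio(call):
    """Hypothetical app flow: tuning and the playlist are required, images are not."""
    call("radio.tune")
    playlist = call("radio.getPlaylist")
    try:
        images = call("artist.getImages")
    except SimulatedFailure:
        images = None  # degrade gracefully: play without the slideshow
    return {"playlist": playlist, "images": images}


def find_essential(methods, flow):
    """Fail each method in turn; a method is essential if the flow breaks."""
    essential = []
    for method in methods:
        try:
            flow(make_client({method}))
        except SimulatedFailure:
            essential.append(method)
    return essential


print(find_essential(["radio.tune", "radio.getPlaylist", "artist.getImages"], start_radio))
```
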
Only essential requests
Prepare for the worst

 Unexpected problems we’ve had:

  Servers overheating (twice)

  Hardware (almost) stolen from data-centers

  Power outage in the office
Backup plans, AKA The “Kill List”

  Plan                                Effect                          Severity

  Disable radio DB-backing            Faster calls                    Minor
  Disable Flash Player                Save 200 req/sec                Major
  Drop non-essential Xbox API calls   Reduce Xbox traffic by 0 - 50%  Extreme
  Drop X% of radio tune calls         Reduce Xbox traffic by X%       Nuclear
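
The "Nuclear" option, dropping X% of radio tune calls, amounts to probabilistic load shedding. A minimal sketch, assuming the decision is made per request before any real work is done; the function name and the `radio.tune` check are illustrative, not Last.fm's implementation.

```python
import random


def should_drop(method, drop_percent, rng=random):
    """Decide whether to reject a request; only radio.tune calls are shed."""
    if method != "radio.tune":
        return False
    # rng.random() is in [0, 1), so 0 never drops and 100 always drops.
    return rng.random() * 100 < drop_percent


# Shed roughly 30% of tune calls, with a seeded generator for a repeatable demo.
rng = random.Random(42)
dropped = sum(should_drop("radio.tune", 30, rng) for _ in range(10_000))
print(f"dropped {dropped} of 10000 tune calls")
```

Keeping the percentage as a single runtime parameter means the dial can be turned up or down during an incident without a deploy.
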
Communication
Last.fm: Launch Day (When traffic attacks)
How did it go?

Our estimate was about 50% too high

Didn’t exceed capacity (but got quite close)

Profiling and caching were essential

Without them we would have gone down
What did we learn?

Use time zones to roll out slowly

Traffic will follow daily trends

Live monitoring is essential

Backup plans are comforting

Pre-fill caches before launch
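
Live monitoring of the chosen metric, requests per second, can be as simple as a sliding window over request timestamps. The class below is a hypothetical sketch, not the monitoring Last.fm ran; timestamps are passed in explicitly so that it is easy to test.

```python
from collections import deque


class RequestRateMonitor:
    """Track requests per second over a sliding time window."""

    def __init__(self, window_seconds=10):
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def _expire(self, now):
        # Drop timestamps that have fallen out of the window.
        while self.timestamps and self.timestamps[0] <= now - self.window_seconds:
            self.timestamps.popleft()

    def record(self, now):
        """Register one request seen at time `now` (in seconds)."""
        self.timestamps.append(now)
        self._expire(now)

    def requests_per_second(self, now):
        """Average request rate over the window ending at `now`."""
        self._expire(now)
        return len(self.timestamps) / self.window_seconds


monitor = RequestRateMonitor(window_seconds=10)
for i in range(20):
    monitor.record(i * 0.5)  # two requests per second for ten seconds
print(monitor.requests_per_second(9.5))
```
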
So, how does this help me?
1. Estimate

Choose your metric

Estimate launch traffic

Compare against capacity

Make performance targets

Know your limitations
2. Profile requests

Start with a sample of traffic

Extract data for your metric

Visualise the results

Identify expensive requests for your metric

Use profiling tools on individual requests
3. Optimise
Reduce number of requests

Set the right HTTP caching headers

  Combine with a reverse proxy

  Prime caches for common calls

Use an object cache

Avoid language level optimisation
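
An object cache with priming can be sketched as a read-through cache with a time-to-live. This in-process dictionary is only a stand-in for whatever shared cache sits between the web servers and the database; the class and method names are hypothetical.

```python
import time


class ObjectCache:
    """Read-through cache: compute a value once, then reuse it until it expires."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (expiry time, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: skip the expensive work
        value = compute()
        self._store[key] = (now + self.ttl_seconds, value)
        return value

    def prime(self, keys, compute_for):
        """Fill the cache for common keys before real traffic arrives."""
        for key in keys:
            self.get_or_compute(key, lambda k=key: compute_for(k))


def load_top_artists(tag):
    # Placeholder for an expensive database query.
    return f"top artists for {tag}"


cache = ObjectCache(ttl_seconds=300)
cache.prime(["rock", "pop"], load_top_artists)
print(cache.get_or_compute("rock", lambda: load_top_artists("rock")))
```
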
Web Request

  Load balancer
  HTTP Cache
  Web Server
  Object Cache
  Database
4. Plan for failure

Simulate failures

Know your weak spots

Prepare backup plans

Communicate with users and partners
5. Launch it!
Roll out slowly, if you can

Set up live monitoring

If something goes wrong:

  Don’t panic

  Keep people updated

Have some champagne on ice
1. Start with an estimate
2. Profile your traffic
3. Make optimisations
4. Prepare for the worst
5. Launch it!
Last.fm vs. Xbox

Questions?
David Singleton
last.fm/user/underpangs
twitter.com/dsingleton
