Distributed “Web Scale” Systems



                      Ricardo Vice Santos
                            @ricardovice
Who am I?
•  I’m Ricardo!
•  Lead Engineer at Spotify
•  ricardovice on twitter, spotify, about.me, kiva, slideshare, github,
   bitbucket, delicious…
•  Portuguese
•  Previously working in the video streaming industry
•  (only) Discovered Spotify late 2009
•  Joined in 2010
spotifiera (spo·ti·fie·ra), verb:
to use Spotify;
to provide a service free of cost;
What’s Spotify all about?
•  A big catalogue, tons of music
•  Available everywhere
•  Great user experience
•  More convenient than piracy
•  Reliable, high availability
•  Scalable for many, many users
But what really got me hooked:
•  Free, legal ad-supported service
•  Very fast
The importance of being fast
•  High latency can be a problem, not only in First
   Person Shooters
•  Slow performance is a major user experience killer
•  At Velocity 2009, Eric Schurman (Bing) and Jake
   Brutlag (Google Search) showed that increased
   latency directly hurt usage and revenue per user[1].
•  Latency leads to users leaving; many won’t ever
   come back
•  Users will share their experience with friends


          [1] http://radar.oreilly.com/2009/07/velocity-making-your-site-fast.html
So how fast is Spotify?
•  We monitor playback latency on the client side
•  Current median latency to play any track is 265ms
•  On average, the human notion of “instant” is
   anything under 200ms
•  Due to disk lookup, at times it's actually faster to
   start playing a track from network than from disk
•  Below 1% of playbacks experienced stutter
“Spotify is fast due to P2P”
•  This is something I read a lot around the web
•  P2P does play a crucial role in the picture, but…
•  Experience at Spotify showed me that most latency issues are
   directly linked to backend problems
•  It’s a mistake to think that we could be this fast without a smart and
   scalable backend architecture

So let’s give credit where credit is due.
Going web scale!!1




“Scaling Twitter”
Blaine Cook, 2007
http://www.slideshare.net/Blaine/scaling-twitter
Handling growth
Things to keep in mind:
•  Scaling is not an exact science
•  There is no such thing as a magic formula
•  Usage patterns differ
•  There is always a limit to what you can handle
•  Fail gracefully
•  Continuous evolution process
Scaling horizontally
•    You can always add more machines!
•    Stateless services
•    Several processes can share memcached
•    Possible to run in “the cloud” (EC2, Rackspace)
•    Need some kind of load balancer
•    Data sharing/synchronization can be hard
•    Complexity: many pieces, maybe hidden SPOFs
•    Fundamental to the application’s design
Usage patterns
Typically, some services are more demanding than
others. This can be due to:
•  Higher popularity
•  Higher complexity
•  Low latency expectation
•  All combined
Decoupling
•    Divide and conquer!
•    The Unix way
•    Resources assigned individually
•    Using the right tools to address each problem
•    Organization and delegation
•    Problems are isolated
•    Easier to handle growth
Read only services
•    The easiest to scale
•    Stateless
•    Use indices, large read-optimized data containers
•    Each node has its local copy
•    Data structured according to service
•    Updated periodically, during off-peak hours
•    Take advantage of OS page cache
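As a rough illustration of a read-optimized container that leans on the OS page cache, the sketch below memory-maps a sorted, fixed-width index file and binary-searches it. The record layout (`RECORD`), the `build_index` helper and the `ReadOnlyIndex` class are assumptions made up for this example, not Spotify's actual format.

```python
import mmap
import struct

# Hypothetical layout: 8-byte unsigned key followed by 8-byte unsigned value.
RECORD = struct.Struct(">QQ")


def build_index(path, pairs):
    """Write (key, value) pairs sorted by key as fixed-width records."""
    with open(path, "wb") as out:
        for key, value in sorted(pairs):
            out.write(RECORD.pack(key, value))


class ReadOnlyIndex:
    """Read-only view over a local index file (must not be empty)."""

    def __init__(self, path):
        self._file = open(path, "rb")
        # Memory-map the file; hot pages are served from the OS page cache.
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self._count = len(self._map) // RECORD.size

    def get(self, key):
        """Binary search over the sorted records; None if the key is absent."""
        low, high = 0, self._count
        while low < high:
            mid = (low + high) // 2
            found, value = RECORD.unpack_from(self._map, mid * RECORD.size)
            if found == key:
                return value
            if found < key:
                low = mid + 1
            else:
                high = mid
        return None

    def close(self):
        self._map.close()
        self._file.close()
```

Because the file is never written while it is being served, every node can hold its own copy and a periodic update can simply swap in a freshly built file.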
Read-write services
•  User generated content, e.g. playlists
•  Hard to ensure consistency of data across instances

Solutions:
•  Eventual consistency:
   •  Reads of just written data not guaranteed to be up-to-date
•  Locking, atomic operations
    •  Creating globally unique keys, e.g. usernames
    •  Transactions, e.g. billing
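To make the "atomic operation" case concrete, here is a minimal single-process sketch of claiming a globally unique username with a check-and-set under a lock. `UsernameRegistry` is a hypothetical stand-in; in a real deployment the same check-and-set would have to live in one authoritative store rather than in local memory.

```python
import threading


class UsernameRegistry:
    """In-process stand-in for a store offering an atomic check-and-set."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owners = {}

    def claim(self, username: str, user_id: int) -> bool:
        """Reserve the username; return False if it is already taken."""
        key = username.lower()
        with self._lock:
            # The check and the write happen under the same lock, so two
            # concurrent callers can never both see the name as free.
            if key in self._owners:
                return False
            self._owners[key] = user_id
            return True
```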
Decoupling at Spotify
Finding a service via DNS
Each service has an SRV DNS record:
•  One record with same name for each service instance
•  Clients (AP) resolve to find servers providing that service
•  Records with the lowest priority value are preferred; one is picked by weighted shuffle
•  Clients retry other instances in case of failures

Example SRV record
_frobnicator._http.example.com. 3600 SRV 10     50   8081 frob1.example.com.
       name                     TTL type prio weight port      host
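The selection and retry rules above could look roughly like the sketch below. It assumes the SRV records have already been resolved into `SrvRecord` objects (DNS resolution itself is left out); `order_candidates`, `call_with_retry` and the `request` callback are names invented for this example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SrvRecord:
    priority: int
    weight: int
    port: int
    host: str


def order_candidates(records, rng=random):
    """Return records in the order a client might try them.

    Lower priority values come first; within one priority level the
    order is a weighted shuffle (heavier records tend to come earlier).
    """
    ordered = []
    for priority in sorted({r.priority for r in records}):
        pool = [r for r in records if r.priority == priority]
        while pool:
            # Small floor so zero-weight records can still be picked.
            weights = [max(r.weight, 0) + 1e-9 for r in pool]
            pick = rng.choices(pool, weights=weights, k=1)[0]
            ordered.append(pick)
            pool.remove(pick)
    return ordered


def call_with_retry(records, request, rng=random):
    """Try each candidate in turn until one request succeeds."""
    last_error = None
    for record in order_candidates(records, rng):
        try:
            return request(record.host, record.port)
        except ConnectionError as exc:
            last_error = exc  # fall through to the next instance
    raise RuntimeError("all instances failed") from last_error
```

Adding capacity then only means publishing one more SRV record; clients pick it up once the TTL expires.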
Request assignment
•    Hardware load balancers
•    Round-robin DNS
•    Proxy servers
•    Sharding:
      •  Each server/instance responsible for subset of data
      •  Directs client to instance that has its data
      •  Easy if nothing is shared
      •  Hard if you require replication
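The simplest form of sharding, where nothing is shared, can be sketched as a stable hash of the key taken modulo the number of instances. The function names and the choice of SHA-256 are assumptions for illustration only.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def instance_for(key: str, instances: list) -> str:
    """Pick the instance responsible for the key."""
    return instances[shard_for(key, len(instances))]
```

The weakness of this scheme is that changing the number of instances remaps most keys, which is one reason to move to a hash ring when data has to be replicated or moved.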
Sharding using a DHT
Some Spotify services use Dynamo inspired DHTs[1]:
•  Each request has a key
•  Each service node is responsible for a range of hash keys
•  Data is distributed among service nodes
•  Redundancy is ensured by re-hashing and writing to replica node
•  Data must be transitioned when ring changes




         [1] http://dl.acm.org/citation.cfm?id=1294281
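A minimal sketch of such a ring, assuming each node owns the range of positions ending at its "last key" and that replicas are found by re-hashing the previous position until a different node comes up. The `Ring` class, `hash_key` and the 128-bit ring size are illustrative assumptions, not the actual Spotify implementation.

```python
import bisect
import hashlib


def hash_key(value: str) -> int:
    """Hash a string onto a 128-bit ring position."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:16], "big")


class Ring:
    def __init__(self, last_keys):
        # last_keys: {node name: last ring position that node owns}
        self._points = sorted((pos, node) for node, pos in last_keys.items())
        self._positions = [pos for pos, _ in self._points]

    def owner(self, position: int) -> str:
        """Node whose range contains the position (wraps past the end)."""
        index = bisect.bisect_left(self._positions, position)
        return self._points[index % len(self._points)][1]

    def nodes_for(self, key: str, replicas: int = 1):
        """Primary node plus replica nodes found by re-hashing."""
        position = hash_key(key)
        chosen = [self.owner(position)]
        wanted = min(replicas + 1, len(self._points))
        attempts = 0
        while len(chosen) < wanted and attempts < 1000:
            # Re-hash the previous position to get the next candidate.
            position = hash_key(format(position, "032x"))
            node = self.owner(position)
            if node not in chosen:
                chosen.append(node)
            attempts += 1
        return chosen
```

When a node joins or leaves, only the keys in the affected ring segments change owner; those are the keys whose data must be transitioned.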
DHT example
Spotify’s DNS powered DHT
Configuration of DHT
config._frobnicator._http.example.com.     3600    TXT      "slaves=0"
      config.srv_name.                     TTL     type     no replication

config._frobnicator._http.example.com.     3600    TXT      "slaves=2 redundancy=host"
      config.srv_name.                     TTL     type     three replicas on separate hosts

Ring segment, one per node
tokens.8081.frob1.example.com.   3600    TXT      "00112233445566778899aabbccddeeff"
      tokens.port.host.          TTL     type                last key
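A client could turn these TXT records into an in-memory ring along the lines of the sketch below. It assumes the TXT payloads have already been fetched as plain strings; the parsing rules are inferred from the two examples above, and all function names are hypothetical.

```python
def parse_config(txt: str) -> dict:
    """Parse a TXT payload such as 'slaves=2 redundancy=host'."""
    settings = {}
    for item in txt.split():
        name, _, value = item.partition("=")
        settings[name] = int(value) if value.isdigit() else value
    return settings


def parse_token_name(name: str):
    """Split 'tokens.8081.frob1.example.com.' into (port, host)."""
    label, port, host = name.split(".", 2)
    if label != "tokens":
        raise ValueError(f"not a tokens record: {name}")
    return int(port), host


def ring_from_records(records: dict) -> list:
    """Build a sorted list of (last_key, host, port) ring segments.

    records maps a tokens record name to its hexadecimal TXT payload.
    """
    segments = []
    for name, token in records.items():
        port, host = parse_token_name(name)
        segments.append((int(token, 16), host, port))
    return sorted(segments)
```

With `slaves=2`, each key would be stored on the owning node plus two replicas, which matches the "three replicas" annotation above.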
And if none of this works for you
Remember
/dev/null is
web scale!!




          http://www.xtranormal.com/watch/6995033/
Questions?
                     get in touch!
                    @ricardovice
             ricardo@spotify.com
Thank you.

                    @ricardovice
             ricardo@spotify.com

Distributed "Web Scale" Systems

  • 1. Distributed “Web Scale” Systems Ricardo Vice Santos @ricardovice
  • 2. Who am I? •  I’m Ricardo! •  Lead Engineer at Spotify •  ricardovice on twitter, spotify, about.me, kiva, slideshare, github, bitbucket, delicious… •  Portuguese •  Previously working in the video streaming industry •  (only) Discovered Spotify late 2009 •  Joined in 2010
  • 3. spotifiera (spo·ti·fie·ra), verb: to use Spotify; to provide a service free of cost.
  • 4. What’s Spotify all about? •  A big catalogue, tons of music •  Available everywhere •  Great user experience •  More convenient than piracy •  Reliable, high availability •  Scalable for many, many users But what really got me hooked: •  Free, legal ad-supported service •  Very fast
  • 5. The importance of being fast •  High latency can be a problem, not only in First Person Shooters •  Slow performance is a major user experience killer •  At Velocity 2009, Eric Schurman (Bing) and Jake Brutlag (Google Search) showed that increased latency directly hurt usage and revenue per user[1]. •  Latency leads to users leaving; many won’t ever come back •  Users will share their experience with friends [1] http://radar.oreilly.com/2009/07/velocity-making-your-site-fast.html
  • 6. So how fast is Spotify? •  We monitor playback latency on the client side •  Current median latency to play any track is 265ms •  On average, the human notion of “instant” is anything under 200ms •  Due to disk lookup, at times it's actually faster to start playing a track from network than from disk •  Below 1% of playbacks experienced stutter
  • 7. “Spotify is fast due to P2P” •  This is something I read a lot around the web •  P2P does play a crucial role in the picture, but… •  Experience at Spotify showed me that most latency issues are directly linked to backend problems •  It’s a mistake to think that we could be this fast without a smart and scalable backend architecture So let’s give credit where credit is due.
  • 8. Going web scale!!1 “Scaling Twitter” Blaine Cook, 2007 http://www.slideshare.net/Blaine/scaling-twitter
  • 9. Handling growth Things to keep in mind: •  Scaling is not an exact science •  There is no such thing as a magic formula •  Usage patterns differ •  There is always a limit to what you can handle •  Fail gracefully •  Continuous evolution process
  • 10. Scaling horizontally •  You can always add more machines! •  Stateless services •  Several processes can share memcached •  Possible to run in “the cloud” (EC2, Rackspace) •  Need some kind of load balancer •  Data sharing/synchronization can be hard •  Complexity: many pieces, maybe hidden SPOFs •  Fundamental to the application’s design
  • 11. Usage patterns Typically, some services are more demanding than others. This can be due to: •  Higher popularity •  Higher complexity •  Low latency expectation •  All combined
  • 12. Decoupling •  Divide and conquer! •  The Unix way •  Resources assigned individually •  Using the right tools to address each problem •  Organization and delegation •  Problems are isolated •  Easier to handle growth
  • 13. Read only services •  The easiest to scale •  Stateless •  Use indices, large read-optimized data containers •  Each node has its local copy •  Data structured according to service •  Updated periodically, during off-peak hours •  Take advantage of OS page cache
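The read-only pattern on slide 13 can be sketched as a sorted, fixed-width index file that each node memory-maps, so repeated lookups are served from the OS page cache. This is only an illustration under stated assumptions: the record layout (two big-endian 64-bit integers), the names `build_index` and `ReadOnlyIndex`, and the use of binary search are all hypothetical and not taken from the talk.

```python
import mmap
import struct

# Hypothetical record layout: (key, value) as two unsigned 64-bit integers.
RECORD = struct.Struct(">QQ")


def build_index(path, pairs):
    """Write (key, value) pairs sorted by key, e.g. in an off-peak batch job."""
    with open(path, "wb") as f:
        for key, value in sorted(pairs):
            f.write(RECORD.pack(key, value))


class ReadOnlyIndex:
    """Local, read-only copy of the index; the file must not be empty."""

    def __init__(self, path):
        self._file = open(path, "rb")
        # Mapping the file lets the kernel keep hot pages in its page cache.
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self._count = len(self._map) // RECORD.size

    def get(self, key):
        """Binary search over the sorted records; returns None if absent."""
        lo, hi = 0, self._count
        while lo < hi:
            mid = (lo + hi) // 2
            k, v = RECORD.unpack_from(self._map, mid * RECORD.size)
            if k == key:
                return v
            if k < key:
                lo = mid + 1
            else:
                hi = mid
        return None

    def close(self):
        self._map.close()
        self._file.close()
```

Because the service holds no state beyond the file, a periodic update could amount to building a new file and reopening it.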
  • 14. Read-write services •  User generated content, e.g. playlists •  Hard to ensure consistency of data across instances Solutions: •  Eventual consistency: •  Reads of just written data not guaranteed to be up-to-date •  Locking, atomic operations •  Creating globally unique keys, e.g. usernames •  Transactions, e.g. billing
  • 16. Finding a service via DNS Each service has an SRV DNS record: •  One record with the same name for each service instance •  Clients (AP) resolve to find servers providing that service •  Among the records with the lowest priority value, one is chosen by weighted shuffle •  Clients retry other instances in case of failures Example SRV record: _frobnicator._http.example.com. 3600 SRV 10 50 8081 frob1.example.com. (fields: name, TTL, type, prio, weight, port, host)
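The selection and retry behaviour on slide 16 can be sketched roughly as follows. This assumes the SRV records have already been resolved into plain tuples; `SrvRecord`, `order_candidates`, `call_with_retry` and the `send` callback are hypothetical names for illustration, not Spotify's client code. Records are grouped by priority (lowest value first) and shuffled within each group in proportion to their weight.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SrvRecord:
    priority: int
    weight: int
    port: int
    host: str


def order_candidates(records, rng=random):
    """Return records in the order a client might try them."""
    ordered = []
    for prio in sorted({r.priority for r in records}):
        group = [r for r in records if r.priority == prio]
        # Weighted shuffle: repeatedly draw one record, weighted, without replacement.
        while group:
            if sum(r.weight for r in group) == 0:
                pick = rng.choice(group)  # all weights zero: pick uniformly
            else:
                pick = rng.choices(group, weights=[r.weight for r in group], k=1)[0]
            group.remove(pick)
            ordered.append(pick)
    return ordered


def call_with_retry(records, send, rng=random):
    """Try instances in order; move on to the next one on connection failure."""
    last_error = None
    for rec in order_candidates(records, rng):
        try:
            return send(rec.host, rec.port)
        except ConnectionError as exc:
            last_error = exc
    raise ConnectionError("all instances failed") from last_error
```

Here `send(host, port)` stands in for whatever request the client actually makes; only `ConnectionError` is treated as a reason to try the next instance.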
  • 17. Request assignment •  Hardware load balancers •  Round-robin DNS •  Proxy servers •  Sharding: •  Each server/instance responsible for subset of data •  Directs client to instance that has its data •  Easy if nothing is shared •  Hard if you require replication
  • 18. Sharding using a DHT Some Spotify services use Dynamo-inspired DHTs[1]: •  Each request has a key •  Each service node is responsible for a range of hash keys •  Data is distributed among service nodes •  Redundancy is ensured by re-hashing and writing to replica node •  Data must be transitioned when ring changes [1] http://dl.acm.org/citation.cfm?id=1294281
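A minimal sketch of the ring lookup described on slide 18, under several assumptions: keys are hashed with MD5 (chosen only because it yields a 128-bit value), each node is identified by the last hash key of its range, and replicas are located by hashing the previous digest again. The class `Ring`, the method `nodes_for` and the fallback walk are hypothetical; the talk does not specify these details.

```python
import bisect
import hashlib


class Ring:
    def __init__(self, tokens):
        # tokens: mapping of node name -> last hash key (hex string) of its range
        self._entries = sorted((int(tok, 16), node) for node, tok in tokens.items())
        self._keys = [k for k, _ in self._entries]

    def _owner(self, point):
        # First node whose last key is >= point; wrap around past the end.
        i = bisect.bisect_left(self._keys, point)
        return self._entries[i % len(self._entries)][1]

    def nodes_for(self, key, replicas=0):
        """Return the primary node plus up to `replicas` distinct replica nodes."""
        wanted = min(replicas + 1, len(self._entries))
        digest = hashlib.md5(key.encode("utf-8")).digest()
        chosen = []
        for _ in range(32):  # bounded number of re-hash attempts
            if len(chosen) == wanted:
                break
            node = self._owner(int.from_bytes(digest, "big"))
            if node not in chosen:
                chosen.append(node)
            digest = hashlib.md5(digest).digest()  # re-hash to find the next replica
        # Assumed fallback: if re-hashing kept hitting the same nodes, walk the ring.
        for _, node in self._entries:
            if len(chosen) == wanted:
                break
            if node not in chosen:
                chosen.append(node)
        return chosen
```

When a node joins or leaves, the ranges change, which is why the slide notes that data must be transitioned whenever the ring changes.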
  • 20. Spotify’s DNS powered DHT Configuration of DHT: config._frobnicator._http.example.com. 3600 TXT "slaves=0" (fields: config.srv_name., TTL, type; meaning: no replication); config._frobnicator._http.example.com. 3600 TXT "slaves=2 redundancy=host" (meaning: three replicas on separate hosts) Ring segment, one per node: tokens.8081.frob1.example.com. 3600 TXT "00112233445566778899aabbccddeeff" (fields: tokens.port.host., TTL, type, last key)
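The TXT payloads on slide 20 are simple enough to parse by hand. The sketch below assumes they are space-separated `key=value` pairs and that a missing `slaves` setting means no replication; the function names are hypothetical, and fetching the records from DNS is left out.

```python
def parse_dht_config(txt):
    """Parse a config TXT payload such as 'slaves=2 redundancy=host'."""
    settings = dict(pair.split("=", 1) for pair in txt.split())
    return {
        # Assumption: no 'slaves' entry is treated as zero slaves.
        "slaves": int(settings.get("slaves", "0")),
        "redundancy": settings.get("redundancy"),
    }


def replica_count(config):
    """Total copies of each item: the primary plus its slaves."""
    return config["slaves"] + 1


def parse_token_name(name):
    """Split 'tokens.8081.frob1.example.com.' into (port, host)."""
    label, port, host = name.split(".", 2)
    if label != "tokens":
        raise ValueError("not a tokens record: " + name)
    return int(port), host
```

With the slide's second example, `slaves=2` gives three copies in total, matching the "three replicas on separate hosts" annotation.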
  • 21. And if none of this works for you Remember /dev/null is web scale!! http://www.xtranormal.com/watch/6995033/
  • 22. Questions? get in touch! @ricardovice ricardo@spotify.com
  • 23. Thank you. @ricardovice ricardo@spotify.com