November 3rd, 2014
Email: jedberg@{gmail,netflix}.com
Twitter: @jedberg
Web: www.jedberg.net
Facebook: facebook.com/jedberg
Linkedin: www.linkedin.com/in/jedberg
You won't believe how the biggest
sites build scalable and resilient
systems!
Email: pfisher-ogden@netflix.com
Twitter: @philip_pfo
Linkedin: www.linkedin.com/in/philfish
InfoQ.com: News & Community Site
• 750,000 unique visitors/month
• Published in 4 languages (English, Chinese, Japanese and Brazilian
Portuguese)
• Post content from our QCon conferences
• News 15-20 / week
• Articles 3-4 / week
• Presentations (videos) 12-15 / week
• Interviews 2-3 / week
• Books 1 / month
Watch the video with slide
synchronization on InfoQ.com!
http://www.infoq.com/presentations/scalable-resilient-systems
Purpose of QCon
- to empower software development by facilitating the spread of
knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of
change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
Presented at QCon San Francisco
www.qconsf.com
We used to live in a world where
the assumption was that nothing
breaks
That’s just not true anymore
Scale
breaks hardware
Speed
breaks software
Speed at Scale
breaks everything
@adrianco
What you’ll hear today
• Operational Best Practices
• Data Best Practices
• Evolution of the
architecture of the Video
History Service at Netflix
Cloud Native
10s of thousands of instances, thousands
created and removed daily
Thousands of storage nodes, petabytes of
data, nodes can be removed without harm
(Some folks call this Microservices)
Why do we use the Public Cloud?
Things We Don’t Do
Better Business Agility
Benefits of Public Clouds
• Don’t have to procure servers anymore
• No racking or imaging servers anymore
• Systems are always “just the right size”
• Machines can be named by function
• Time to market is faster
• Multiple physical locations with AZs and regions
• Elasticity!
We want to use clouds, not build
them
• Public cloud for agility and scale
• We use electricity, but we don’t
build our own power stations
• AWS because they are big enough
to allocate thousands of instances
per hour when necessary
What about private clouds?
• Some of the problems you don’t
have: noisy neighbors, lack of
physical access
• Problem you do have: You have
to pay for your spare capacity
instead of someone else
All system
choices assume
some part will fail
at some point.
“Build for three”
Photo courtesy of NASA
• Easier auto-scaling
• Easier capacity planning
• Identify problematic code-paths
more easily
• Narrow down the effects of a change
• More efficient local caching
Advantages to a Service Oriented
Architecture
12B outbound
requests per day to
API dependencies
Movie
Ratings
Personalization Engine
User Info
Movie
Metadata
Similar
Movies
Reviews
A/B Test
Engine
2B requests per
day
into the Netflix API
Discovery
API
Streaming
API
Movie
Ratings
Personalization Engine
User Info
Movie
Metadata
Similar
Movies
Reviews
A/B Test
Engine
Discovery
API
Streaming
API
Content
Encoding
CDN
Management
QOS
Logging
DRM
OpenConnect
Edge
Locations
Browse
Play
Watch
• Services are built by different
teams who work together to
figure out what each service
will provide.
• The service owner publishes
an API that anyone can use.
Highly aligned, loosely coupled
Automate all the things!
http://hyperboleandahalf.blogspot.com/2010/06/this-is-why-ill-never-be-adult.html
• Application startup
• Configuration
• Code deployment
• System deployment
Automate all the things!
The Netflix way
• Fully automated build tools
to test and make packages
• Fully automated machine
image bakery
• Fully automated image
deployment
• Standard base image
• Tools to manage all the
systems
• Reduce errors through
reproducibility
Automation
Continuous Integration
• Each checkin results in a deployment
• Runs automatically with a new checkin
• Includes running tests and canaries
Self Service
• The goal is to make
everything self service
• This is how an organization
scales its operations more
slowly than it grows
What’s going
on?!
How we built it
• Built our own big data
system
• Based on S3 and EMR
• Fewer copies, lower
resolution, and slower
speed retrieval based on
age of data
• But all the data is there if
we need it
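The "lower resolution ... based on age of data" bullet can be pictured as a rollup step. The following is a minimal sketch of that one idea only, not the system described in the talk: `downsample`, `age_out`, and every threshold are hypothetical names and numbers, and the raw points are assumed to be kept elsewhere so that "all the data is there if we need it".

```python
def downsample(points, bucket_seconds):
    """Average (timestamp, value) points into fixed-width time buckets."""
    buckets = {}
    for ts, value in points:
        start = ts - (ts % bucket_seconds)  # start of the bucket this point falls in
        total, count = buckets.get(start, (0.0, 0))
        buckets[start] = (total + value, count + 1)
    return [(start, total / count)
            for start, (total, count) in sorted(buckets.items())]


def age_out(points, now, full_res_seconds, bucket_seconds):
    """Keep recent points at full resolution and roll older ones up.

    Assumption: anything older than full_res_seconds is served as bucket
    averages; the original points are archived somewhere cheaper.
    """
    recent = [(ts, v) for ts, v in points if now - ts <= full_res_seconds]
    old = [(ts, v) for ts, v in points if now - ts > full_res_seconds]
    return downsample(old, bucket_seconds) + recent
```

For example, with points at 0, 30, 60 and 100 seconds, `now=100`, a 50-second full-resolution window and 60-second buckets, the two oldest points collapse into one averaged point and the two newest are returned unchanged.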
Self Serve is the Key
• Developers choose
what metrics to
submit
• What graphs they put
on their dashboards
• What to alert on
• They are closest to
the app, so they know
best
• Simulate things
that go wrong
• Find things that
are different
The Monkey Theory
• What went wrong?
• How could we have detected it sooner?
• How could we have prevented it?
• How can we prevent this class of
problem in the future?
• How can we improve our behavior for
next time?
Ask the key questions:
Incident Reviews
PR
Customer Service
Metrics Impact / Feature Disable
No Impact -- Fast recovery or automatic failover
Data
Data is the most
important asset
your business
will have.
Shared state should
be stored in a
shared service
Data on an instance
should be replicated
to other instances
• Have multiple copies of all data
• Keep those copies in multiple datacenters (AZs)
• Avoid keeping state on a single instance
• Take frequent snapshots of EBS disks
• No secret keys on the instance
Best Practices for Data
Queues are your friend
• Any unpredictable workload, i.e.
anything based on a user interaction
• Gives great insight because you can
see if the queue is processing fast
enough
• Aids in autoscaling as an input into
the calculation
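One way to read the last bullet: queue depth, together with how fast a worker drains it, can feed a desired-capacity calculation. This is a rough sketch under stated assumptions; `desired_workers`, its parameters, and the limits are hypothetical and do not come from the talk or from any autoscaling API.

```python
import math


def desired_workers(queue_depth, per_worker_rate, target_drain_seconds,
                    min_workers=1, max_workers=100):
    """Estimate how many workers would clear the backlog in time.

    queue_depth: messages currently waiting
    per_worker_rate: messages one worker processes per second (assumed known)
    target_drain_seconds: how quickly the backlog should be cleared
    """
    if per_worker_rate <= 0 or target_drain_seconds <= 0:
        raise ValueError("rate and target must be positive")
    needed = math.ceil(queue_depth / (per_worker_rate * target_drain_seconds))
    # Clamp so an empty queue never scales to zero and a spike cannot
    # request unbounded capacity.
    return max(min_workers, min(max_workers, needed))
```

With a backlog of 6,000 messages, 10 messages per second per worker, and a 60-second target, the sketch asks for 10 workers. The same queue-depth number doubles as the "is the queue processing fast enough" signal from the second bullet.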
Second class users
• Logged out users get
cached content.
• CDN bears the brunt of
the traffic
Database Scaling with Sharding
Sharding
• Split writes across master databases
• Each can have one slave, or many
slaves, depending on workload
• Avoid reading from the
master where possible
• Picking the sharding key well is
essential and fraught with peril
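The routing these bullets describe can be sketched as follows. This is an illustration only: the `Shard` class, the MD5-based key hash, and the host names are assumptions made for the example, and the code calls the slides' "slaves" replicas.

```python
import hashlib
import random


class Shard:
    """One master plus zero or more read replicas (hypothetical structure)."""

    def __init__(self, master, replicas=()):
        self.master = master
        self.replicas = list(replicas)


def shard_for(key, shards):
    # Use a stable hash (not Python's per-process randomized hash()) so the
    # same sharding key always maps to the same shard.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


def write_target(key, shards):
    """Writes always go to the master of the key's shard."""
    return shard_for(key, shards).master


def read_target(key, shards, rng=random):
    """Reads prefer a replica and fall back to the master if there is none."""
    shard = shard_for(key, shards)
    return rng.choice(shard.replicas) if shard.replicas else shard.master
```

The choice of key decides which rows live together. Note that with modulo placement like this, changing the number of shards later remaps most keys, which is part of why the choice is "fraught with peril".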
Building a data model
• What questions do you want to
ask your data?
• Don’t try to normalize
anything
• Instead of changing a value,
keep a record of what
happened
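"Keep a record of what happened" can be pictured as an append-only log from which the current value is derived. A minimal in-memory sketch; the `EventLog` class and the voting example are hypothetical and only illustrate the idea.

```python
from collections import namedtuple

# One immutable fact: who did what to which item, and when.
Event = namedtuple("Event", ["user", "item", "delta", "ts"])


class EventLog:
    """Append-only record of changes; nothing is updated in place."""

    def __init__(self):
        self._events = []

    def record(self, user, item, delta, ts):
        self._events.append(Event(user, item, delta, ts))

    def score(self, item):
        # The "current value" is derived by replaying the history.
        return sum(e.delta for e in self._events if e.item == item)

    def history(self, item):
        return [e for e in self._events if e.item == item]
```

Because the history is kept, questions nobody thought of at design time (who voted, in what order, how the score moved over time) can still be answered later.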
Data schemas
• Unless you are really really
sure of your business model...
• The less schema the better
• reddit’s database is literally
just keys and values, despite
being in Postgres
Cassandra
• Availability over consistency
• Writes over reads
• We know Java
• Open source + support
Why Cassandra?
•Priam
• Zero touch auto-config
• State management
• Token assignment
• Node replacement
• Backup/restore to/from S3
Using Cassandra at Netflix
•Astyanax
• OO abstraction
to Cassandra
• Multi-region
support
[Diagram: nodes A, B, C and items 2, 3, shown before and after node D is added]
Rendezvous Hashing
or Highest Random
Weight Hashing
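For readers who have not met the technique, here is a minimal sketch of rendezvous (highest random weight) hashing. The SHA-256 weight function and the node names are choices made for this example, not details from the talk.

```python
import hashlib


def _weight(node, key):
    """Pseudo-random weight for a (node, key) pair, stable across runs."""
    digest = hashlib.sha256(f"{node}|{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_node(key, nodes):
    """Return the node with the highest weight for this key."""
    if not nodes:
        raise ValueError("no nodes available")
    return max(nodes, key=lambda node: _weight(node, key))
```

Each key scores every node independently and goes to the highest scorer. When a node D is added, a key moves only if D now scores highest for it. When a node is removed, only the keys it owned are reassigned.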
Going multi-zone
Cassandra Architecture
Going Multi-region
• 100% uptime is theoretically possible.
• You have to replicate your data
• This will cost money
Leveraging Multi-region
Expire your data
• It’s a lot easier to manage
if your data is either gone
or in static form
• Users will almost never
notice
Think of SSDs as cheap RAM,
not expensive disk
• Data replication
• Cache invalidation
• Misdirected users
• Sudden load increase during
failover
• When do you fail over?
Multi-Region Challenges
• Three strategies available to
users:
• No replication
• Invalidation only
• Full copy
Cache Replication
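The three strategies differ only in what happens to the other region's cache on a write. A toy sketch, with plain dictionaries standing in for the per-region caches; the `Strategy` enum and `write` function are hypothetical names for illustration.

```python
from enum import Enum


class Strategy(Enum):
    NONE = "no replication"
    INVALIDATE = "invalidation only"
    FULL_COPY = "full copy"


def write(key, value, local, remote, strategy):
    """Write to the local region's cache and apply the chosen strategy remotely."""
    local[key] = value
    if strategy is Strategy.INVALIDATE:
        # The remote region drops its entry and refills from its data store on
        # the next read.
        remote.pop(key, None)
    elif strategy is Strategy.FULL_COPY:
        # The remote region gets the new value and stays warm, at the cost of
        # shipping every write across regions.
        remote[key] = value
    # Strategy.NONE: the remote cache is untouched and may serve a stale value.
```

The trade-off is between cross-region traffic and how stale or cold the other region's cache is when users are redirected there.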
Dynomite
Lambda/Kappa Architecture
http://lambda-architecture.net/
http://radar.oreilly.com/2014/07/questioning-the-lambda-architecture.html
Lambda/Kappa Architecture
Stream processing
• Storm, Kafka, Spark, Spark Streaming, etc.
• Spark is nice because you can use the same
programming model for both batch and
stream processing
Netflix Architecture Evolution
Other Netflix Talks
• Mon 11:50 — Asynchronous programming at
Netflix — Bayview
• Mon 1:40 — Mantis: Netflix’s Event Stream
Processing System — Seacliff
• Mon 2:55 — How We Learned to Stop
Worrying and Start Deploying the Netflix API
— Ballroom B/C
• Tue 10:35 — Reactive Programming with RX
— Ballroom A
• Wed 2:55 — Scalable Microservices at Netflix
— Ballroom B/C
Scalable data architectures –
from thousands to billions of events
@philip_pfo
Story
Netflix streaming – 2007 to present
Device Growth
2007
1 device
2008
10s of devices
2009
10s of devices
2010
100s of devices
2011+
1000+ devices
Experience Evolution
Subscriber & Viewing Growth
Improved
Personalization
Better
Experience
Viewing
Virtuous Cycle
Viewing Data
Who, What, When, Where, How Long
Real time data use cases
What have I watched?
Real time data use cases
Where was I at?
Real time data use cases
What else am I watching?
Session Analytics
Session Analytics
Active
Sessions
Last
Position
Viewing
History
Data
Feed
Generic Architecture
Start Stop
Collect
Process
Stream
State
Session
Summary
Event
Stream
Provide
Architecture Evolution
• Different generations
• Pain points & learnings
• Re-architecture
motivations
Real Time Data
2007 2008 2009 2010 2011 2012 2013 2014 Future
SQL
NoSQL
Caching
redis memcached
Real Time Data – gen 1
2007 2008 2009 2010 2011 2012 2013 2014 Future
SQL
NoSQL
Caching
redis memcached
Real Time Data – gen 1
Start Stop
Sessions
Logs /
Events
History /
Position
SQL
Real Time Data – gen 1 pain points
• Scalability
– DB scaled up not out
• Event Data Analytics
– ad hoc
• Fixed schema
Real Time Data – gen 2
2007 2008 2009 2010 2011 2012 2013 2014 Future
SQL
NoSQL
Caching
redis memcached
Real Time Data – gen 2 motivations
• Scalability
– Scale out not up
• Flexible schema
– Key/value attributes
• Service oriented
Real Time Data – gen 2
Start Stop
NoSQL
50 data partitions
Viewing Service
Real Time Data – gen 2 pain points
• Scale out
– Resharding was painful
• Performance
– Hot spots
• Disaster Recovery
– SimpleDB had no backups
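A small experiment shows why resharding tends to hurt. It assumes keys are placed by `hash(key) % partitions`, which is a common scheme but an assumption here: the talk does not say how the 50 partitions were assigned. The function names are hypothetical.

```python
import hashlib


def partition(key, n):
    """Assumed placement rule: stable hash of the key, modulo partition count."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n


def moved_fraction(keys, old_n, new_n):
    """Fraction of keys whose partition changes when the count changes."""
    moved = sum(1 for k in keys if partition(k, old_n) != partition(k, new_n))
    return moved / len(keys)
```

Under this rule, going from 50 to 51 partitions leaves a key in place only when its hash gives the same remainder for both counts. For a uniform hash that happens with probability 1/51, so about 98% of keys are expected to move. Adding one partition therefore means migrating nearly all the data.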
Real Time Data – gen 3
2007 2008 2009 2010 2011 2012 2013 2014 Future
SQL
NoSQL
Caching
redis memcached
Real Time Data – gen 3 landscape
• Cassandra 0.6
• Before SSDs in AWS
• Netflix in 1 AWS region
Real Time Data – gen 3 motivations
• Order of magnitude
increase in requests
• Scalability
– Actually scale out
rather than up
Real Time Data – gen 3
ViewingService
Stateful
Tier
0
1
n-2
n-1
…
Active Sessions
Latest Positions
View Summary
Stateless
Tier
(fallback)
Sessions
Viewing History
Memcached
Real Time Data – gen 3 writes
ViewingService
Stateful
Tier
0
1
n-2
n-1
…
Start
Stop
Real Time Data – gen 3 writes
ViewingService
Stateful
Tier
0
1
n-2
n-1
…
Active Sessions
Latest Positions
View Summary
Start
Stop
Real Time Data – gen 3 writes
ViewingService
Stateful
Tier
0
1
n-2
n-1
…
Active Sessions
Latest Positions
View Summary
Start
Stop
update
Real Time Data – gen 3 writes
ViewingService
Stateful
Tier
0
1
n-2
n-1
…
Active Sessions
Latest Positions
View Summary
Start
Stop
snapshot
Sessions
Real Time Data – gen 3 writes
ViewingService
Stateful
Tier
0
1
n-2
n-1
…
Active Sessions
Latest Positions
View Summary
Start
Stop
Viewing History
Memcached
Real Time Data – gen 3 writes
ViewingService
Stateful
Tier
0
1
n-2
n-1
…
Active Sessions
Latest Positions
View Summary
Start
Stop
Viewing History
Memcached
stop
Real Time Data – gen 3 writes
ViewingService
Stateful
Tier
0
1
n-2
n-1
…
Active Sessions
Latest Positions
View Summary
Stateless
Tier
(fallback)
Sessions
Viewing History
Memcached
Real Time Data – gen 3 reads
ViewingService
Stateful
Tier
What
have I
watched?
Viewing History
Memcached
View Summary
Real Time Data – gen 3 reads
ViewingService
Stateful
Tier
Latest Positions
Where
was I at?
Viewing History
Stateless
Tier
(fallback)
Memcached
Real Time Data – gen 3 reads
ViewingService
Stateful
Tier
What else
am I
watching?
Active Sessions
gen 3 – Requests Scale
Operation                        Scale
Create (start streaming)         1,000s per second
Update (heartbeat, close)        100,000s per second
Append (session events/logs)     10,000s per second
Read viewing history             10,000s per second
Read latest position             100,000s per second
gen 3 – Cluster Scale
Cluster                          Scale
Cassandra Viewing History        ~100 hi1.4xl nodes; ~48 TB total space used
Viewing Service Stateful Tier    ~1700 r3.2xl nodes; 50GB heap memory per node
Memcached                        ~450 r3.2xl/xl nodes; ~8TB memory used
Real Time Data – gen 3 pain points
• Stateful tier
– Hot spots
– Multi-region complexity
• Monolithic service
• read-modify-write poorly
suited for memcached
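The read-modify-write problem can be shown with a toy cache. `ToyCache` below is a hypothetical in-memory stand-in written for this example, not a memcached client. Its `gets`/`cas` methods are only modelled on memcached's check-and-set commands of the same names.

```python
class ToyCache:
    """In-memory stand-in for a cache that supports check-and-set."""

    def __init__(self):
        self._data = {}  # key -> (value, version)

    def get(self, key):
        return self._data.get(key, (None, 0))[0]

    def set(self, key, value):
        _, version = self._data.get(key, (None, 0))
        self._data[key] = (value, version + 1)

    def gets(self, key):
        """Return (value, version) so a later cas can detect other writers."""
        return self._data.get(key, (None, 0))

    def cas(self, key, value, version):
        """Store value only if nobody has written since `version` was read."""
        _, current = self._data.get(key, (None, 0))
        if current != version:
            return False
        self._data[key] = (value, current + 1)
        return True


def lost_update(cache):
    """Two writers interleave a plain get/set; one update is silently lost."""
    cache.set("minutes_watched", 0)
    a = cache.get("minutes_watched")      # writer A reads 0
    b = cache.get("minutes_watched")      # writer B reads 0
    cache.set("minutes_watched", a + 10)  # A writes 10
    cache.set("minutes_watched", b + 5)   # B overwrites with 5; A's 10 is gone
    return cache.get("minutes_watched")


def add_with_cas(cache, key, amount, max_retries=5):
    """Retry the read-modify-write until no other writer got in between."""
    for _ in range(max_retries):
        value, version = cache.gets(key)
        if cache.cas(key, (value or 0) + amount, version):
            return True
    return False
```

Check-and-set makes the update safe, but every conflicting writer pays for an extra round trip and a retry. That cost is one reason a heartbeat-heavy, read-modify-write workload fits a plain cache poorly.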
Real Time Data – gen 3 learnings
• Distributed stateful
systems are hard
– Go stateless, use
C*/memcached/redis…
• Decompose into
microservices
Real Time Data – gen 4
ViewingService
Stateful
Tier
0
1
n-2
n-1
…
Active Sessions
Latest Positions
View Summary
Stateless
Tier
(fallback)
Viewing History
Sessions
Memcached
Real Time Data – gen 4
Stream State/Event
Collectors
Stateless Microservices
Data Processors
Data Services Data Feeds
Real Time Data – gen 4
Viewing History
Session State Session Positions
Session Events
Data Tiers
redis
Session Analytics
• Summarize detailed
event data
• Non-real time, but
near real time
• Some shared logic
with real time
Session Analytics - Processing
2007 2008 2009 2010 2011 2012 2013 2014 Future
Custom
Service
(Java on AWS)
Mantis
Batch
Near
Real-Time
Stream
Processing
Session Analytics - Storage
2007 2008 2009 2010 2011 2012 2013 2014 Future
Batch
Near
Real-Time
Stream
Processing
Session Analytics – gen 1
• Storage • Processing
Sessions Logs
Session Analytics – gen 1 pain points
• MapReduce good for batch
– Not for near real time
• Complexity
– Code in 2 systems /
frameworks
– Operational burden of 2
systems
Session Analytics – gen 2
• Storage • Processing
Session Events
& Logs
Java
Session Analytics – gen 2 learnings
• Reduced complexity
– shared code and ops
• Batch still available
• New bottleneck
– harder to extend logic
Session Analytics – gen 3 (*)
• Storage • Processing
Mantis
Storm
Samza
Spark Streaming
Stream Processing Frameworks
Takeaways
• Polyglot Persistence
– One size fits all doesn’t
fit all
• Strong opinions, loosely
held
– Design for long term, but
be open to redesigns
Thanks!
@philip_pfo
Photo Credits
• http://www.flickr.com/photos/jmarty/440330328/
• http://www.flickr.com/photos/aarghj/4208003744/
• NASA
• http://www.flickr.com/photos/historyinanhour/4775644390/
• http://www.flickr.com/photos/usnavy/5957825634/
• http://www.flickr.com/photos/specialkrb/3376739919/
• http://www.flickr.com/photos/marc_smith/6246433861/sizes/l/
Photo Credits
• http://www.flickr.com/photos/rachelpasch/2815827189/sizes/l/
• http://www.flickr.com/photos/9305729@N05/8488900567/sizes/l/
• http://www.flickr.com/photos/11707873@N00/4312361721/sizes/l/
• http://www.flickr.com/photos/webatelier/5929298123/sizes/l/
Questions?
Don’t forget to vote!
We’ll be at the open space
for this track immediately
following the break
Getting in touch
Email: jedberg@{gmail,netflix}.com
Twitter: @jedberg
Web: www.jedberg.net
Facebook: facebook.com/jedberg
Linkedin: www.linkedin.com/in/jedberg
Email: pfisher-ogden@netflix.com
Twitter: @philip_pfo
Linkedin: www.linkedin.com/in/philfish

You Won't Believe How the Biggest Sites Build Scalable and Resilient Systems!

  • 1. November 3rd, 2014 Email: jedberg@{gmail,netflix}.com Twitter: @jedberg Web: www.jedberg.net Facebook: facebook.com/jedberg Linkedin: www.linkedin.com/in/jedberg You won't believe how the biggest sites build scalable and resilient systems! Email: pfisher-ogden@netflix.com Twitter: @philip_pfo Linkedin: www.linkedin.com/in/philfish
  • 2. InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations /scalable-resilient-systems
  • 3. Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide Presented at QCon San Francisco www.qconsf.com
  • 4. We used to live in a world where the assumption was that nothing breaks
  • 5. That’s just not true anymore
  • 8. Speed at Scale breaks everything @adrianco
  • 9. What you’ll hear today • Operational Best Practices • Data Best Practices • Evolution of the architecture of theVideo History Service at Netflix
  • 10. Cloud Native 10s of thousands of instances, thousands created and removed daily Thousands of storage nodes, petabytes of data, nodes can be removed without harm (Some folks call this Microservices)
  • 11. Why do we use the Public Cloud?
  • 14. Benefits of Public Clouds • Don’t have to procure servers anymore • No racking or imaging servers anymore • Systems are always “just the right size” • Machines can be named by function • Time to market is faster • Multiple physical locations with AZs and regions • Elasticity!
  • 15. We want to use clouds, not build them • Public cloud for agility and scale • We use electricity, but we don’t build our own power stations • AWS because they are big enough to allocate thousands of instances per hour when necessary
  • 16. What about private clouds? • Some of the problems you don’t have: noisy neighbors, lack of physical access • Problem you do have: You have to pay for your spare capacity instead of someone else
  • 17. All systems choices assume some part will fail at some point.
  • 21. • Easier auto-scaling • Easier capacity planning • Identify problematic code-paths more easily • Narrow in the effects of a change • More efficient local caching Advantages to a Service Oriented Architecture
  • 22. 12B outbound requests per day to API dependencies Movie Ratings Personalizatio n Engine User Info Movie Metadata Similar Movies Reviews A/B Test Engine 2B requests per day into the Netflix API Discovery API Streaming API
  • 23. Movie Ratings Personalizatio n Engine User Info Movie Metadata Similar Movies Reviews A/B Test Engine Discovery API Streaming API Content Encoding CDN Management QOS Logging DRM OpenConnect Edge Locations Browse Play Watch
  • 24. • Services are built by different teams who work together to figure out what each service will provide. • The service owner publishes an API that anyone can use. Highly aligned, loosely coupled
  • 25. Automate all the things! http://hyperboleandahalf.blogspot.com/2010/06/this-is-why-ill-never-be-adult.html
  • 26. • Application startup • Configuration • Code deployment •System deployment Automate all the things!
  • 27. The Netflix way • Fully automated build tools to test and make packages • Fully automated machine image bakery • Fully automated image deployment
  • 28. • Standard base image • Tools to manage all the systems • Reduce errors through reproducibility Automation
  • 29. Continuous Integration • Each checkin results in a deployment • Runs automatically with a new checkin • Includes running tests and canaries
  • 30. Self Service • The goal is to make everything self service • This is how an organization scales their operations slower than their growth
  • 32.
  • 33. How we built it • Built our own big data system • Based on S3 and EMR • Less copies, lower resolution, and slower speed retrieval based on age of data • But all the data is there if we need it
  • 34. Self Serve is the Key • Developers choose what metrics to submit • What graphs they put on their dashboards • What to alert on • They are closest to the app, so they know best
  • 35. • Simulate things that go wrong • Find things that are different The Monkey Theory
  • 36. • What went wrong? • How could we have detected it sooner? • How could we have prevented it? • How can we prevent this class of problem in the future? • How can we improve our behavior for next time? Ask the key questions: Incident Reviews
  • 37. PR Customer Service Metrics Impact / Feature Disable No Impact -- Fast recovery or automatic failover
  • 38. Data
  • 39. Data is the most important asset your business will have.
  • 40. Shared state should be stored in a shared service Data on an instance should be replicated to other instances
  • 41. • Have multiple copies of all data • Keep those copies in multiple datacenter (AZs) • Avoid keeping state on a single instance • Take frequent snapshots of EBS disks • No secret keys on the instance Best Practices for Data
  • 42. Queues are your friend • Any unpredictable workload, i.e. anything based on a user interaction • Gives great insight because you can see if the queue is processing fast enough • Aids in autoscaling as an input into the calculation
  • 43. Second class users • Logged out users get cached content. • CDN bears the brunt of the traffic
  • 45. Sharding • Split writes across master databases • Each can have a slave, some many slaves based on workload • One can avoid reading from the master if possible • Picking the sharing key well is essential and fraught with peril
  • 46. Building a data model •What questions you want to ask your data? •Don’t try and normalize anything •Instead of changing a value keep a record of what happened
  • 47. Data schemas • Unless you are really really sure of your business model... • The less schema the better • reddit’s database is literally just keys and values, despite being in Postgress
  • 49. • Availability over consistency • Writes over reads • We know Java • Open source + support Why Cassandra?
  • 50. •Priam • Zero touch auto-config • State management • Token assignment • Node replacement • Backup/restore to/from S3 Using Cassandra at Netflix •Astyanax • OO abstraction to Cassandra • Multi-region support
  • 53. Rendezvous Hashing or Highest Random Weight Hashing
  • 57. • 100% uptime is theoretically possible. • You have to replicate your data • This will cost money Leveraging Mutli-region
  • 58.
  • 59.
  • 60.
  • 61. Expire your data • It’s a lot easier to manage if your data is either gone or in static form • Users will almost never notice
  • 62. Think of SSDs as cheap RAM, not expensive disk
  • 63. • Data replication • Cache invalidation • Misdirected users • Sudden load increase during failover • When do you fail over? Multi-Region Challenges
  • 64. • Three strategies available to users: • No replication • Invalidation only • Full copy Cache Replication
  • 68. Stream processing • Storm, Kafka, Spark, Spark Streaming, etc. • Spark is nice because you can use the same programming model for both batch and stream processing
  • 70. Other Netflix Talks • Mon 11:50 — Asynchronous programming at Netflix — Bayview • Mon 1:40 — Mantis: Netflix’s Event Stream Processing System — Seacliff • Mon 2:55 — How We Learned to Stop Worrying and Start Deploying the Netflix API — Ballroom B/C • Tue 10:35 — Reactive Programming with RX — Ballroom A • Wed 2:55 — Scalable Microservices at Netflix — Ballroom B/C
  • 71. Scalable data architectures – from thousands to billions of events @philip_pfo
  • 72. Story Netflix streaming – 2007 to present
  • 73. Device Growth 2007 1 device 2008 10s of devices 2009 10s of devices 2010 100s of devices 2011+ 1000+ devices
  • 77. Viewing Data Who, What, When, Where, How Long
  • 78. Real time data use cases What have I watched?
  • 79. Real time data use cases Where was I at?
  • 80. Real time data use cases What else am I watching?
  • 84. Architecture Evolution • Different generations • Pain points & learnings • Re-architecture motivations
  • 85. Real Time Data • Timeline: 2007 2008 2009 2010 2011 2012 2013 2014 Future • SQL • NoSQL • Caching: redis, memcached
  • 86. Real Time Data – gen 1 • Timeline: 2007 2008 2009 2010 2011 2012 2013 2014 Future • SQL • NoSQL • Caching: redis, memcached
  • 87. Real Time Data – gen 1 Start Stop Sessions Logs / Events History / Position SQL
  • 88. Real Time Data – gen 1 pain points • Scalability – DB scaled up not out • Event Data Analytics – ad hoc • Fixed schema
  • 89. Real Time Data – gen 2 • Timeline: 2007 2008 2009 2010 2011 2012 2013 2014 Future • SQL • NoSQL • Caching: redis, memcached
  • 90. Real Time Data – gen 2 motivations • Scalability – Scale out not up • Flexible schema – Key/value attributes • Service oriented
  • 91. Real Time Data – gen 2 Start Stop NoSQL 50 data partitions Viewing Service
  • 92. Real Time Data – gen 2 pain points • Scale out – Resharding was painful • Performance – Hot spots • Disaster Recovery – SimpleDB had no backups
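To illustrate why resharding a fixed set of partitions can hurt, the sketch below assumes keys are placed with hash-modulo-N and measures how many keys change partition when one partition is added. The placement scheme is an assumption for illustration; the slides give only the partition count (50), not how keys were mapped.

```python
import hashlib


def partition_for(key, partition_count):
    # Assumed placement rule: stable hash of the key, modulo the partition count.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def fraction_moved(keys, old_count, new_count):
    # Share of keys whose partition changes when the count changes.
    moved = sum(
        1 for k in keys
        if partition_for(k, old_count) != partition_for(k, new_count)
    )
    return moved / len(keys)


keys = [f"account-{i}" for i in range(10_000)]
moved = fraction_moved(keys, 50, 51)
print(f"{moved:.0%} of keys move when going from 50 to 51 partitions")
```

Under this assumed scheme nearly every key moves, so adding capacity means rewriting most of the data. Schemes such as consistent or rendezvous hashing move only a small share of keys.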
  • 93. Real Time Data – gen 3 • Timeline: 2007 2008 2009 2010 2011 2012 2013 2014 Future • SQL • NoSQL • Caching: redis, memcached
  • 94. Real Time Data – gen 3 landscape • Cassandra 0.6 • Before SSDs in AWS • Netflix in 1 AWS region
  • 95. Real Time Data – gen 3 motivations • Order of magnitude increase in requests • Scalability – Actually scale out rather than up
  • 96. Real Time Data – gen 3 ViewingService Stateful Tier 0 1 n-2 n-1 … Active Sessions Latest Positions View Summary Stateless Tier (fallback) Sessions Viewing History Memcached
  • 97. Real Time Data – gen 3 writes ViewingService Stateful Tier 0 1 n-2 n-1 … Start Stop
  • 98. Real Time Data – gen 3 writes ViewingService Stateful Tier 0 1 n-2 n-1 … Active Sessions Latest Positions View Summary Start Stop
  • 99. Real Time Data – gen 3 writes ViewingService Stateful Tier 0 1 n-2 n-1 … Active Sessions Latest Positions View Summary Start Stop update
  • 100. Real Time Data – gen 3 writes ViewingService Stateful Tier 0 1 n-2 n-1 … Active Sessions Latest Positions View Summary Start Stop snapshot Sessions
  • 101. Real Time Data – gen 3 writes ViewingService Stateful Tier 0 1 n-2 n-1 … Active Sessions Latest Positions View Summary Start Stop Viewing History Memcached
  • 102. Real Time Data – gen 3 writes ViewingService Stateful Tier 0 1 n-2 n-1 … Active Sessions Latest Positions View Summary Start Stop Viewing History Memcached stop
  • 103. Real Time Data – gen 3 writes ViewingService Stateful Tier 0 1 n-2 n-1 … Active Sessions Latest Positions View Summary Stateless Tier (fallback) Sessions Viewing History Memcached
  • 104. Real Time Data – gen 3 reads ViewingService Stateful Tier What have I watched? Viewing History Memcached View Summary
  • 105. Real Time Data – gen 3 reads ViewingService Stateful Tier Latest Positions Where was I at? Viewing History Stateless Tier (fallback) Memcached
  • 106. Real Time Data – gen 3 reads ViewingService Stateful Tier What else am I watching? Active Sessions
  • 107. gen 3 - Requests Scale • Create (start streaming): 1,000s per second • Update (heartbeat, close): 100,000s per second • Append (session events/logs): 10,000s per second • Read viewing history: 10,000s per second • Read latest position: 100,000s per second
  • 108. gen 3 – Cluster Scale • Cassandra Viewing History: ~100 hi1.4xl nodes, ~48 TB total space used • Viewing Service Stateful Tier: ~1700 r3.2xl nodes, 50GB heap memory per node • Memcached: ~450 r3.2xl/xl nodes, ~8TB memory used
  • 109. Real Time Data – gen 3 pain points • Stateful tier – Hot spots – Multi-region complexity • Monolithic service • read-modify-write poorly suited for memcached
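A sketch of why read-modify-write is awkward on a cache that only offers get, set, and compare-and-swap: each update needs a read, a client-side change, and a conditional write that must be retried when another writer got there first. `VersionedCache` is a hypothetical in-memory stand-in with gets/cas-style semantics, not a real memcached client.

```python
class VersionedCache:
    """In-memory stand-in for a cache with compare-and-swap semantics."""

    def __init__(self):
        self._data = {}  # key -> (value, version)

    def gets(self, key):
        # Returns (value, version); a missing key reads as (None, 0).
        return self._data.get(key, (None, 0))

    def cas(self, key, value, version):
        # Write only if nobody else has written since `version` was read.
        _, current = self._data.get(key, (None, 0))
        if current != version:
            return False
        self._data[key] = (value, current + 1)
        return True


def append_position(cache, key, position, max_retries=5):
    """Append to a cached list using a read-modify-write retry loop."""
    for _ in range(max_retries):
        value, version = cache.gets(key)
        updated = (value or []) + [position]
        if cache.cas(key, updated, version):
            return True
        # Another writer won the race; re-read and try again.
    return False
```

Every append costs at least two round trips and rewrites the whole value, and contention on a hot key multiplies the retries. A store with server-side list or hash operations avoids that loop.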
  • 110. Real Time Data – gen 3 learnings • Distributed stateful systems are hard – Go stateless, use C*/memcached/redis… • Decompose into microservices
  • 111. Real Time Data – gen 4 ViewingService Stateful Tier 0 1 n-2 n-1 … Active Sessions Latest Positions View Summary Stateless Tier (fallback) Viewing History Sessions Memcached
  • 112. Real Time Data – gen 4 Stream State/Event Collectors Stateless Microservices Data Processors Data Services Data Feeds
  • 113. Real Time Data – gen 4 Viewing History Session State Session Positions Session Events Data Tiers redis redis
  • 114. Session Analytics • Summarize detailed event data • Non-real time, but near real time • Some shared logic with real time
  • 115. Session Analytics - Processing • Timeline: 2007 2008 2009 2010 2011 2012 2013 2014 Future • Custom Service (Java on AWS) • Mantis • Batch • Near Real-Time • Stream Processing
  • 116. Session Analytics - Storage • Timeline: 2007 2008 2009 2010 2011 2012 2013 2014 Future • Batch • Near Real-Time • Stream Processing
  • 117. Session Analytics – gen 1 • Storage • Processing Sessions Logs
  • 118. Session Analytics – gen 1 pain points • MapReduce good for batch – Not for near real time • Complexity – Code in 2 systems / frameworks – Operational burden of 2 systems
  • 119. Session Analytics – gen 2 • Storage • Processing Session Events & Logs Java
  • 120. Session Analytics – gen 2 learnings • Reduced complexity – shared code and ops • Batch still available • New bottleneck – harder to extend logic
  • 121. Session Analytics – gen 3 (*) • Storage • Processing Mantis Storm Samza Spark Streaming Stream Processing Frameworks
  • 122. Takeaways • Polyglot Persistence – One size fits all doesn’t fit all • Strong opinions, loosely held – Design for long term, but be open to redesigns
  • 124. Photo Credits • http://www.flickr.com/photos/jmarty/440330328/ • http://www.flickr.com/photos/aarghj/4208003744/ • NASA • http://www.flickr.com/photos/historyinanhour/4775644390/ • http://www.flickr.com/photos/usnavy/5957825634/ • http://www.flickr.com/photos/specialkrb/3376739919/ • http://www.flickr.com/photos/marc_smith/6246433861/sizes/l/
  • 125. Photo Credits • http://www.flickr.com/photos/rachelpasch/2815827189/sizes/l/ • http://www.flickr.com/photos/9305729@N05/8488900567/sizes/l/ • http://www.flickr.com/photos/11707873@N00/4312361721/sizes/l/ • http://www.flickr.com/photos/webatelier/5929298123/sizes/l/
  • 127. Don’t forget to vote! We’ll be at the open space for this track immediately following the break
  • 128. Getting in touch Email: jedberg@{gmail,netflix}.com Twitter: @jedberg Web: www.jedberg.net Facebook: facebook.com/jedberg Linkedin: www.linkedin.com/in/jedberg Email: pfisher-ogden@netflix.com Twitter: @philip_pfo Linkedin: www.linkedin.com/in/philfish