Video and slides synchronized; mp3 and slide download available at http://bit.ly/1y3uXp6.
The authors discuss the lessons the biggest sites on the internet have learned about building scalable and resilient architectures. Filmed at qconsf.com.
Jeremy Edberg is currently the Reliability Architect for Netflix, the largest video streaming service in the world. Philip Fisher-Ogden is the Director of Engineering for Playback Services at Netflix, responsible for systems that ensure every play-request to Netflix results in a play.
1. November 3rd, 2014
Email: jedberg@{gmail,netflix}.com
Twitter: @jedberg
Web: www.jedberg.net
Facebook: facebook.com/jedberg
Linkedin: www.linkedin.com/in/jedberg
You won't believe how the biggest sites build scalable and resilient systems!
Email: pfisher-ogden@netflix.com
Twitter: @philip_pfo
Linkedin: www.linkedin.com/in/philfish
2. InfoQ.com: News & Community Site
• 750,000 unique visitors/month
• Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese)
• Post content from our QCon conferences
• News 15-20 / week
• Articles 3-4 / week
• Presentations (videos) 12-15 / week
• Interviews 2-3 / week
• Books 1 / month
Watch the video with slide synchronization on InfoQ.com!
http://www.infoq.com/presentations/scalable-resilient-systems
3. Purpose of QCon
- to empower software development by facilitating the spread of knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
Presented at QCon San Francisco
www.qconsf.com
4. We used to live in a world where the assumption was that nothing breaks
9. What you’ll hear today
• Operational Best Practices
• Data Best Practices
• Evolution of the architecture of the Video History Service at Netflix
10. Cloud Native
10s of thousands of instances, thousands created and removed daily
Thousands of storage nodes, petabytes of data, nodes can be removed without harm
(Some folks call this Microservices)
14. Benefits of Public Clouds
• Don’t have to procure servers anymore
• No racking or imaging servers anymore
• Systems are always “just the right size”
• Machines can be named by function
• Time to market is faster
• Multiple physical locations with AZs and regions
• Elasticity!
15. We want to use clouds, not build them
• Public cloud for agility and scale
• We use electricity, but we don't build our own power stations
• AWS because they are big enough to allocate thousands of instances per hour when necessary
16. What about private clouds?
• Some of the problems you don't have: noisy neighbors, lack of physical access
• A problem you do have: you pay for your spare capacity instead of someone else paying for it
21. Advantages of a Service-Oriented Architecture
• Easier auto-scaling
• Easier capacity planning
• Identify problematic code paths more easily
• Narrow down the effects of a change
• More efficient local caching
22. [Diagram: 2B requests per day into the Netflix API (Discovery API, Streaming API); 12B outbound requests per day to API dependencies: Movie Ratings, Personalization Engine, User Info, Movie Metadata, Similar Movies, Reviews, A/B Test Engine]
24. Highly aligned, loosely coupled
• Services are built by different teams who work together to figure out what each service will provide.
• The service owner publishes an API that anyone can use.
25. Automate all the things!
http://hyperboleandahalf.blogspot.com/2010/06/this-is-why-ill-never-be-adult.html
26. Automate all the things!
• Application startup
• Configuration
• Code deployment
• System deployment
27. The Netflix way
• Fully automated build tools to test and make packages
• Fully automated machine image bakery
• Fully automated image deployment
28. Automation
• Standard base image
• Tools to manage all the systems
• Reduce errors through reproducibility
29. Continuous Integration
• Each checkin results in a deployment
• Runs automatically with a new checkin
• Includes running tests and canaries
30. Self Service
• The goal is to make everything self service
• This is how an organization scales its operations more slowly than its growth
33. How we built it
• Built our own big data system
• Based on S3 and EMR
• Fewer copies, lower resolution, and slower retrieval as data ages
• But all the data is there if we need it
34. Self Serve is the Key
• Developers choose what metrics to submit
• What graphs they put on their dashboards
• What to alert on
• They are closest to the app, so they know best
36. Incident Reviews
Ask the key questions:
• What went wrong?
• How could we have detected it sooner?
• How could we have prevented it?
• How can we prevent this class of problem in the future?
• How can we improve our behavior for next time?
39. Data is the most important asset your business will have.
40. Shared state should be stored in a shared service.
Data on an instance should be replicated to other instances.
41. Best Practices for Data
• Have multiple copies of all data
• Keep those copies in multiple datacenters (AZs)
• Avoid keeping state on a single instance
• Take frequent snapshots of EBS disks
• No secret keys on the instance
42. Queues are your friend
• For any unpredictable workload, e.g. anything based on a user interaction
• Gives great insight because you can see if the queue is processing fast enough
• Aids in autoscaling as an input into the calculation
43. Second class users
• Logged out users get cached content.
• The CDN bears the brunt of the traffic.
45. Sharding
• Split writes across master databases
• Each can have a slave; some have many slaves, depending on workload
• Avoid reading from the master if possible
• Picking the sharding key well is essential and fraught with peril
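A minimal sketch of splitting writes across masters: hash a shard key to pick the database. Names are illustrative, and plain modulo hashing like this is also part of the peril: adding a shard remaps most keys, which is why schemes like consistent hashing exist.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a shard key to one of num_shards master databases.
    Uses a stable hash (md5) rather than Python's hash(), which
    varies from process to process."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# All writes for one user land on the same master, spreading
# different users across the fleet.
master_index = shard_for("user:42", 50)
```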
46. Building a data model
• What questions do you want to ask of your data?
• Don't try to normalize anything
• Instead of changing a value, keep a record of what happened
47. Data schemas
• Unless you are really, really sure of your business model...
• The less schema the better
• reddit's database is literally just keys and values, despite being in Postgres
63. Multi-Region Challenges
• Data replication
• Cache invalidation
• Misdirected users
• Sudden load increase during failover
• When do you fail over?
64. Cache Replication
• Three strategies available to users:
  • No replication
  • Invalidation only
  • Full copy
68. Stream processing
• Storm, Kafka, Spark, Spark Streaming, etc.
• Spark is nice because you can use the same programming model for both batch and stream processing
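Not Spark code, but a plain-Python illustration of that point: one transformation function reused both over a whole dataset and over micro-batches yields the same result either way.

```python
from collections import Counter

def count_events(events):
    """One transformation, shared by both modes."""
    return Counter(e["type"] for e in events)

# Batch: run once over the whole dataset.
batch = [{"type": "start"}, {"type": "stop"}, {"type": "start"}]
batch_counts = count_events(batch)

# "Streaming": fold the same function over micro-batches,
# the way Spark Streaming applies batch logic to small windows.
stream_counts = Counter()
for micro_batch in ([{"type": "start"}],
                    [{"type": "stop"}, {"type": "start"}]):
    stream_counts += count_events(micro_batch)

assert batch_counts == stream_counts  # same logic, same result
```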
85. Real Time Data
[Timeline, 2007 through 2014 and beyond: SQL, then NoSQL, then Caching (memcached, redis)]
86. Real Time Data – gen 1
[Timeline: gen 1 highlighted]
87. Real Time Data – gen 1
[Diagram: Start/Stop events write Sessions, Logs/Events, and History/Position to SQL]
88. Real Time Data – gen 1 pain points
• Scalability
– DB scaled up not out
• Event Data Analytics
– ad hoc
• Fixed schema
89. Real Time Data – gen 2
[Timeline: gen 2 highlighted]
90. Real Time Data – gen 2 motivations
• Scalability
– Scale out not up
• Flexible schema
– Key/value attributes
• Service oriented
91. Real Time Data – gen 2
[Diagram: Start/Stop events flow into the Viewing Service, backed by a NoSQL store with 50 data partitions]
92. Real Time Data – gen 2 pain points
• Scale out
– Resharding was painful
• Performance
– Hot spots
• Disaster Recovery
– SimpleDB had no backups
93. Real Time Data – gen 3
[Timeline: gen 3 highlighted]
94. Real Time Data – gen 3 landscape
• Cassandra 0.6
• Before SSDs in AWS
• Netflix in 1 AWS region
95. Real Time Data – gen 3 motivations
• Order of magnitude increase in requests
• Scalability
  – Actually scale out rather than up
96. Real Time Data – gen 3
[Diagram: Viewing Service stateful tier, partitioned across nodes 0 … n-1, holding Active Sessions, Latest Positions, and View Summary; a stateless fallback tier; Sessions and Viewing History stores; memcached]
97. Real Time Data – gen 3 writes
[Diagram: Start and Stop requests route to a partition in the stateful tier]
98. Real Time Data – gen 3 writes
[Diagram: the owning partition holds the session's Active Sessions, Latest Positions, and View Summary state]
99. Real Time Data – gen 3 writes
[Diagram: each Start/Stop updates the in-memory Active Sessions, Latest Positions, and View Summary]
100. Real Time Data – gen 3 writes
[Diagram: session state is periodically snapshotted from the stateful tier to the Sessions store]
101. Real Time Data – gen 3 writes
[Diagram: Viewing History and memcached join the write path]
102. Real Time Data – gen 3 writes
[Diagram: a stop event writes the finished session to Viewing History and memcached]
103. Real Time Data – gen 3 writes
[Diagram: the complete write path: stateful tier (Active Sessions, Latest Positions, View Summary), stateless fallback tier, Sessions, Viewing History, memcached]
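Piecing the gen 3 write slides together, the flow might be sketched as follows; the class and the dict-backed stores are stand-ins for the real stateful tier, Cassandra, and memcached, so treat every name here as an assumption rather than Netflix's actual code:

```python
class StatefulNode:
    """Toy model of one partition in the stateful tier."""

    def __init__(self, cassandra, memcached):
        self.active = {}            # session_id -> session state
        self.latest_position = {}   # session_id -> seconds
        self.cassandra = cassandra  # stand-in: Sessions / Viewing History
        self.memcached = memcached  # stand-in: read cache

    def on_start(self, session_id, video_id):
        # A start creates in-memory session state on the owning node.
        self.active[session_id] = {"video": video_id, "events": []}
        self.latest_position[session_id] = 0

    def on_update(self, session_id, position):
        # Heartbeats update the in-memory latest position.
        self.latest_position[session_id] = position
        self.active[session_id]["events"].append(("heartbeat", position))

    def snapshot(self, session_id):
        # Periodic durability: flush in-memory session state out.
        self.cassandra[("session", session_id)] = dict(self.active[session_id])

    def on_stop(self, session_id, position):
        # A stop flushes the session to viewing history and the cache.
        session = self.active.pop(session_id)
        self.cassandra[("history", session["video"])] = position
        self.memcached[("history", session["video"])] = position
        self.latest_position.pop(session_id, None)
```

The pattern is memory-first with periodic snapshots, which is exactly what makes hot spots and multi-region operation hard, as the pain-point slide notes.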
104. Real Time Data – gen 3 reads
[Diagram: "What have I watched?" is answered from View Summary, Viewing History, and memcached]
105. Real Time Data – gen 3 reads
[Diagram: "Where was I at?" is answered from Latest Positions in the stateful tier, with the stateless tier, Viewing History, and memcached as fallback]
106. Real Time Data – gen 3 reads
[Diagram: "What else am I watching?" is answered from Active Sessions]
107. gen 3 – Request Scale
• Create (start streaming): 1,000s per second
• Update (heartbeat, close): 100,000s per second
• Append (session events/logs): 10,000s per second
• Read viewing history: 10,000s per second
• Read latest position: 100,000s per second
108. gen 3 – Cluster Scale
• Cassandra Viewing History: ~100 hi1.4xl nodes, ~48 TB total space used
• Viewing Service Stateful Tier: ~1700 r3.2xl nodes, 50 GB heap memory per node
• Memcached: ~450 r3.2xl/xl nodes, ~8 TB memory used
109. Real Time Data – gen 3 pain points
• Stateful tier
  – Hot spots
  – Multi-region complexity
• Monolithic service
• Read-modify-write poorly suited for memcached
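To see why read-modify-write is awkward on memcached: two writers doing a naive get/modify/set can silently lose an update, and the usual fix is a compare-and-swap retry loop (memcached's `gets`/`cas`). This stand-in cache mimics that protocol; it is illustrative, not memcached client code:

```python
class CasCache:
    """Toy cache exposing a gets/cas protocol like memcached's."""

    def __init__(self):
        self._data = {}   # key -> (value, version token)

    def gets(self, key):
        # Return the value plus a version token, like memcached `gets`.
        return self._data.get(key, (None, 0))

    def cas(self, key, value, version):
        """Write only if nobody else wrote since our read."""
        if self._data.get(key, (None, 0))[1] != version:
            return False  # lost the race; caller must retry
        self._data[key] = (value, version + 1)
        return True

def append_title(cache, key, title):
    # The retry loop every read-modify-write caller must carry.
    while True:
        value, version = cache.gets(key)
        new_value = (value or []) + [title]
        if cache.cas(key, new_value, version):
            return new_value
```

Every caller has to carry this retry loop, and contended keys can spin, which is part of why the gen 4 redesign moves such state elsewhere.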
110. Real Time Data – gen 3 learnings
• Distributed stateful systems are hard
  – Go stateless; use C*/memcached/redis…
• Decompose into microservices
111. Real Time Data – gen 4
[Diagram: the gen 3 architecture revisited: stateful tier (Active Sessions, Latest Positions, View Summary), stateless fallback tier, Viewing History, Sessions, memcached]
112. Real Time Data – gen 4
[Diagram: gen 4 as stateless microservices: stream state/event collectors, data processors, data services, data feeds]
113. Real Time Data – gen 4
[Diagram: gen 4 data tiers: Viewing History, Session State, Session Positions, Session Events, backed by redis]
114. Session Analytics
• Summarize detailed event data
• Non-real time, but near real time
• Some shared logic with real time
115. Session Analytics – Processing
[Timeline, 2007 through 2014 and beyond: a custom batch service (Java on AWS), then Mantis for near-real-time stream processing]
118. Session Analytics – gen 1 pain points
• MapReduce good for batch
  – Not for near real time
• Complexity
  – Code in 2 systems / frameworks
  – Operational burden of 2 systems
122. Takeaways
• Polyglot Persistence
  – One size fits all doesn't fit all
• Strong opinions, loosely held
  – Design for the long term, but be open to redesigns