Video and slides synchronized; mp3 and slide download available at http://bit.ly/1y3uXp6.
The authors discuss the lessons the biggest sites on the internet have learned about building scalable and resilient architectures. Filmed at qconsf.com.
Jeremy Edberg is currently the Reliability Architect for Netflix, the largest video streaming service in the world. Philip Fisher-Ogden is the Director of Engineering for Playback Services at Netflix, responsible for systems that ensure every play-request to Netflix results in a play.
1. November 3rd, 2014
Email: jedberg@{gmail,netflix}.com
Twitter: @jedberg
Web: www.jedberg.net
Facebook: facebook.com/jedberg
Linkedin: www.linkedin.com/in/jedberg
You won't believe how the biggest sites build scalable and resilient systems!
Email: pfisher-ogden@netflix.com
Twitter: @philip_pfo
Linkedin: www.linkedin.com/in/philfish
2. InfoQ.com: News & Community Site
• 750,000 unique visitors/month
• Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese)
• Post content from our QCon conferences
• News 15-20 / week
• Articles 3-4 / week
• Presentations (videos) 12-15 / week
• Interviews 2-3 / week
• Books 1 / month
Watch the video with slide synchronization on InfoQ.com!
http://www.infoq.com/presentations/scalable-resilient-systems
3. Purpose of QCon
- to empower software development by facilitating the spread of knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
Presented at QCon San Francisco
www.qconsf.com
4. We used to live in a world where the assumption was that nothing breaks
9. What you’ll hear today
• Operational Best Practices
• Data Best Practices
• Evolution of the architecture of the Video History Service at Netflix
10. Cloud Native
10s of thousands of instances, thousands created and removed daily
Thousands of storage nodes, petabytes of data, nodes can be removed without harm
(Some folks call this Microservices)
14. Benefits of Public Clouds
• Don’t have to procure servers anymore
• No racking or imaging servers anymore
• Systems are always “just the right size”
• Machines can be named by function
• Time to market is faster
• Multiple physical locations with AZs and regions
• Elasticity!
15. We want to use clouds, not build them
• Public cloud for agility and scale
• We use electricity, but we don't build our own power stations
• AWS because they are big enough to allocate thousands of instances per hour when necessary
16. What about private clouds?
• Some of the problems you don't have: noisy neighbors, lack of physical access
• A problem you do have: you pay for your spare capacity instead of someone else paying for it
21. Advantages of a Service-Oriented Architecture
• Easier auto-scaling
• Easier capacity planning
• Identify problematic code paths more easily
• Narrow down the effects of a change
• More efficient local caching
22. [Diagram: 2B requests per day into the Netflix API (Discovery API, Streaming API); 12B outbound requests per day to API dependencies: Movie Ratings, Personalization Engine, User Info, Movie Metadata, Similar Movies, Reviews, A/B Test Engine]
24. Highly aligned, loosely coupled
• Services are built by different teams who work together to figure out what each service will provide.
• The service owner publishes an API that anyone can use.
25. Automate all the things!
http://hyperboleandahalf.blogspot.com/2010/06/this-is-why-ill-never-be-adult.html
26. Automate all the things!
• Application startup
• Configuration
• Code deployment
• System deployment
27. The Netflix way
• Fully automated build tools to test and make packages
• Fully automated machine image bakery
• Fully automated image deployment
28. Automation
• Standard base image
• Tools to manage all the systems
• Reduce errors through reproducibility
29. Continuous Integration
• Each checkin results in a deployment
• Runs automatically with a new checkin
• Includes running tests and canaries
30. Self Service
• The goal is to make everything self service
• This is how an organization scales its operations more slowly than its growth
33. How we built it
• Built our own big data system
• Based on S3 and EMR
• Fewer copies, lower resolution, and slower retrieval as data ages
• But all the data is there if we need it
34. Self Serve is the Key
• Developers choose what metrics to submit
• What graphs they put on their dashboards
• What to alert on
• They are closest to the app, so they know best
36. Incident Reviews
Ask the key questions:
• What went wrong?
• How could we have detected it sooner?
• How could we have prevented it?
• How can we prevent this class of problem in the future?
• How can we improve our behavior for next time?
39. Data is the most important asset your business will have.
40. Shared state should be stored in a shared service.
Data on an instance should be replicated to other instances.
41. Best Practices for Data
• Have multiple copies of all data
• Keep those copies in multiple datacenters (AZs)
• Avoid keeping state on a single instance
• Take frequent snapshots of EBS disks
• No secret keys on the instance
42. Queues are your friend
• For any unpredictable workload, e.g. anything based on a user interaction
• Gives great insight because you can see if the queue is processing fast enough
• Aids in autoscaling as an input into the calculation
43. Second class users
• Logged out users get cached content.
• The CDN bears the brunt of the traffic.
45. Sharding
• Split writes across master databases
• Each can have a slave; some have many slaves, depending on workload
• Avoid reading from the master if possible
• Picking the sharding key well is essential and fraught with peril
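A minimal sketch of splitting writes across masters: hash a shard key to pick the database. Names are illustrative, and plain modulo hashing like this is also part of the peril: adding a shard remaps most keys, which is why schemes like consistent hashing exist.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a shard key to one of num_shards master databases.
    Uses a stable hash (md5) rather than Python's hash(), which
    varies from process to process."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# All writes for one user land on the same master, spreading
# different users across the fleet.
master_index = shard_for("user:42", 50)
```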
46. Building a data model
• What questions do you want to ask of your data?
• Don't try to normalize anything
• Instead of changing a value, keep a record of what happened
47. Data schemas
• Unless you are really, really sure of your business model...
• The less schema the better
• reddit's database is literally just keys and values, despite being in Postgres
63. Multi-Region Challenges
• Data replication
• Cache invalidation
• Misdirected users
• Sudden load increase during failover
• When do you fail over?
64. Cache Replication
• Three strategies available to users:
  • No replication
  • Invalidation only
  • Full copy
68. Stream processing
• Storm, Kafka, Spark, Spark Streaming, etc.
• Spark is nice because you can use the same programming model for both batch and stream processing
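Not Spark code, but a plain-Python illustration of that point: one transformation function reused both over a whole dataset and over micro-batches yields the same result either way.

```python
from collections import Counter

def count_events(events):
    """One transformation, shared by both modes."""
    return Counter(e["type"] for e in events)

# Batch: run once over the whole dataset.
batch = [{"type": "start"}, {"type": "stop"}, {"type": "start"}]
batch_counts = count_events(batch)

# "Streaming": fold the same function over micro-batches,
# the way Spark Streaming applies batch logic to small windows.
stream_counts = Counter()
for micro_batch in ([{"type": "start"}],
                    [{"type": "stop"}, {"type": "start"}]):
    stream_counts += count_events(micro_batch)

assert batch_counts == stream_counts  # same logic, same result
```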
85. Real Time Data
[Timeline, 2007 through 2014 and beyond: SQL, then NoSQL, then Caching (memcached, redis)]
86. Real Time Data – gen 1
[Timeline: gen 1 highlighted]
87. Real Time Data – gen 1
[Diagram: Start/Stop events write Sessions, Logs/Events, and History/Position to SQL]
88. Real Time Data – gen 1 pain points
• Scalability
– DB scaled up not out
• Event Data Analytics
– ad hoc
• Fixed schema
89. Real Time Data – gen 2
[Timeline: gen 2 highlighted]
90. Real Time Data – gen 2 motivations
• Scalability
– Scale out not up
• Flexible schema
– Key/value attributes
• Service oriented
91. Real Time Data – gen 2
[Diagram: Start/Stop events flow into the Viewing Service, backed by a NoSQL store with 50 data partitions]
92. Real Time Data – gen 2 pain points
• Scale out
– Resharding was painful
• Performance
– Hot spots
• Disaster Recovery
– SimpleDB had no backups
93. Real Time Data – gen 3
[Timeline: gen 3 highlighted]
94. Real Time Data – gen 3 landscape
• Cassandra 0.6
• Before SSDs in AWS
• Netflix in 1 AWS region
95. Real Time Data – gen 3 motivations
• Order of magnitude increase in requests
• Scalability
  – Actually scale out rather than up
96. Real Time Data – gen 3
[Diagram: Viewing Service stateful tier, partitioned across nodes 0 … n-1, holding Active Sessions, Latest Positions, and View Summary; a stateless fallback tier; Sessions and Viewing History stores; memcached]
97. Real Time Data – gen 3 writes
[Diagram: Start and Stop requests route to a partition in the stateful tier]
98. Real Time Data – gen 3 writes
[Diagram: the owning partition holds the session's Active Sessions, Latest Positions, and View Summary state]
99. Real Time Data – gen 3 writes
[Diagram: each Start/Stop updates the in-memory Active Sessions, Latest Positions, and View Summary]
100. Real Time Data – gen 3 writes
[Diagram: session state is periodically snapshotted from the stateful tier to the Sessions store]
101. Real Time Data – gen 3 writes
[Diagram: Viewing History and memcached join the write path]
102. Real Time Data – gen 3 writes
[Diagram: a stop event writes the finished session to Viewing History and memcached]
103. Real Time Data – gen 3 writes
[Diagram: the complete write path: stateful tier (Active Sessions, Latest Positions, View Summary), stateless fallback tier, Sessions, Viewing History, memcached]
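Piecing the gen 3 write slides together, the flow might be sketched as follows; the class and the dict-backed stores are stand-ins for the real stateful tier, Cassandra, and memcached, so treat every name here as an assumption rather than Netflix's actual code:

```python
class StatefulNode:
    """Toy model of one partition in the stateful tier."""

    def __init__(self, cassandra, memcached):
        self.active = {}            # session_id -> session state
        self.latest_position = {}   # session_id -> seconds
        self.cassandra = cassandra  # stand-in: Sessions / Viewing History
        self.memcached = memcached  # stand-in: read cache

    def on_start(self, session_id, video_id):
        # A start creates in-memory session state on the owning node.
        self.active[session_id] = {"video": video_id, "events": []}
        self.latest_position[session_id] = 0

    def on_update(self, session_id, position):
        # Heartbeats update the in-memory latest position.
        self.latest_position[session_id] = position
        self.active[session_id]["events"].append(("heartbeat", position))

    def snapshot(self, session_id):
        # Periodic durability: flush in-memory session state out.
        self.cassandra[("session", session_id)] = dict(self.active[session_id])

    def on_stop(self, session_id, position):
        # A stop flushes the session to viewing history and the cache.
        session = self.active.pop(session_id)
        self.cassandra[("history", session["video"])] = position
        self.memcached[("history", session["video"])] = position
        self.latest_position.pop(session_id, None)
```

The pattern is memory-first with periodic snapshots, which is exactly what makes hot spots and multi-region operation hard, as the pain-point slide notes.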
104. Real Time Data – gen 3 reads
[Diagram: "What have I watched?" is answered from View Summary, Viewing History, and memcached]
105. Real Time Data – gen 3 reads
[Diagram: "Where was I at?" is answered from Latest Positions in the stateful tier, with the stateless tier, Viewing History, and memcached as fallback]
106. Real Time Data – gen 3 reads
[Diagram: "What else am I watching?" is answered from Active Sessions]
107. gen 3 – Request Scale
• Create (start streaming): 1,000s per second
• Update (heartbeat, close): 100,000s per second
• Append (session events/logs): 10,000s per second
• Read viewing history: 10,000s per second
• Read latest position: 100,000s per second
108. gen 3 – Cluster Scale
• Cassandra Viewing History: ~100 hi1.4xl nodes, ~48 TB total space used
• Viewing Service Stateful Tier: ~1700 r3.2xl nodes, 50 GB heap memory per node
• Memcached: ~450 r3.2xl/xl nodes, ~8 TB memory used
109. Real Time Data – gen 3 pain points
• Stateful tier
  – Hot spots
  – Multi-region complexity
• Monolithic service
• Read-modify-write poorly suited for memcached
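To see why read-modify-write is awkward on memcached: two writers doing a naive get/modify/set can silently lose an update, and the usual fix is a compare-and-swap retry loop (memcached's `gets`/`cas`). This stand-in cache mimics that protocol; it is illustrative, not memcached client code:

```python
class CasCache:
    """Toy cache exposing a gets/cas protocol like memcached's."""

    def __init__(self):
        self._data = {}   # key -> (value, version token)

    def gets(self, key):
        # Return the value plus a version token, like memcached `gets`.
        return self._data.get(key, (None, 0))

    def cas(self, key, value, version):
        """Write only if nobody else wrote since our read."""
        if self._data.get(key, (None, 0))[1] != version:
            return False  # lost the race; caller must retry
        self._data[key] = (value, version + 1)
        return True

def append_title(cache, key, title):
    # The retry loop every read-modify-write caller must carry.
    while True:
        value, version = cache.gets(key)
        new_value = (value or []) + [title]
        if cache.cas(key, new_value, version):
            return new_value
```

Every caller has to carry this retry loop, and contended keys can spin, which is part of why the gen 4 redesign moves such state elsewhere.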
110. Real Time Data – gen 3 learnings
• Distributed stateful systems are hard
  – Go stateless; use C*/memcached/redis…
• Decompose into microservices
111. Real Time Data – gen 4
[Diagram: the gen 3 architecture revisited: stateful tier (Active Sessions, Latest Positions, View Summary), stateless fallback tier, Viewing History, Sessions, memcached]
112. Real Time Data – gen 4
[Diagram: gen 4 as stateless microservices: stream state/event collectors, data processors, data services, data feeds]
113. Real Time Data – gen 4
[Diagram: gen 4 data tiers: Viewing History, Session State, Session Positions, Session Events, backed by redis]
114. Session Analytics
• Summarize detailed event data
• Non-real time, but near real time
• Some shared logic with real time
115. Session Analytics – Processing
[Timeline, 2007 through 2014 and beyond: a custom batch service (Java on AWS), then Mantis for near-real-time stream processing]
118. Session Analytics – gen 1 pain points
• MapReduce good for batch
  – Not for near real time
• Complexity
  – Code in 2 systems / frameworks
  – Operational burden of 2 systems
122. Takeaways
• Polyglot Persistence
  – One size fits all doesn't fit all
• Strong opinions, loosely held
  – Design for the long term, but be open to redesigns