Netflix viewing data architecture evolution - QCon 2014
Please read the notes associated with each slide for the full context of the presentation.
Who am I?
Philip Fisher-Ogden
• Director of Engineering @ Netflix
• Playback Services (making “click play” work)
• 6 years @ Netflix, from 10 servers to 10,000s
Story
Netflix streaming – 2007 to present
Device Growth
2007: 1 device
2008: 10s of devices
2009: 10s of devices
2010: 100s of devices
2011+: 1000+ devices
Experience Evolution
Subscribers & Viewing
53M global subscribers
50 countries
>2 billion hours viewed per month
Virtuous Cycle
Viewing → Improved Personalization → Better Experience → More Viewing
Viewing Data
Who, What, When, Where, How Long
Real time data use cases
What have I watched?
Where was I at?
What else am I watching?
Session Analytics
Active Sessions
Last Position
Viewing History
Data Feed
Generic Architecture
Start / Stop → Collect → Process → Provide
Stream State, Session Summary, Event Stream
Architecture Evolution
• Different generations
• Pain points & learnings
• Re-architecture motivations
Real Time Data
2007 2008 2009 2010 2011 2012 2013 2014 Future
SQL
NoSQL
Caching (memcached, redis)
Real Time Data – gen 1
2007 2008 2009 2010 2011 2012 2013 2014 Future
SQL
NoSQL
Caching (memcached, redis)
Real Time Data – gen 1
Start / Stop
Sessions, Logs / Events, History / Position
SQL
Real Time Data – gen 1 pain points
• Scalability
– DB scaled up not out
• Event Data Analytics
– ad hoc
• Fixed schema
Real Time Data – gen 2
2007 2008 2009 2010 2011 2012 2013 2014 Future
SQL
NoSQL
Caching (memcached, redis)
Real Time Data – gen 2 motivations
• Scalability
– Scale out not up
• Flexible schema
– Key/value attributes
• Service oriented
Real Time Data – gen 2
Start / Stop
NoSQL
50 data partitions
Viewing Service
Real Time Data – gen 2 pain points
• Scale out
– Resharding was painful
• Performance
– Hot spots
• Disaster Recovery
– SimpleDB had no backups
Real Time Data – gen 3
2007 2008 2009 2010 2011 2012 2013 2014 Future
SQL
NoSQL
Caching (memcached, redis)
Real Time Data – gen 3 landscape
• Cassandra 0.6
• Before SSDs in AWS
• Netflix in 1 AWS region
Real Time Data – gen 3 motivations
• Order of magnitude increase in requests
• Scalability
– Actually scale out rather than up
Real Time Data – gen 3
Viewing Service
Stateful Tier (partitions 0, 1, … n-2, n-1): Active Sessions, Latest Positions, View Summary
Stateless Tier (fallback)
Sessions, Viewing History
Memcached
Real Time Data – gen 3 writes
Viewing Service
Stateful Tier (partitions 0, 1, … n-2, n-1)
Start / Stop
Real Time Data – gen 3 writes
Viewing Service
Stateful Tier (partitions 0, 1, … n-2, n-1): Active Sessions, Latest Positions, View Summary
Start / Stop
Real Time Data – gen 3 writes
Viewing Service
Stateful Tier (partitions 0, 1, … n-2, n-1): Active Sessions, Latest Positions, View Summary
Start / Stop
update
Real Time Data – gen 3 writes
Viewing Service
Stateful Tier (partitions 0, 1, … n-2, n-1): Active Sessions, Latest Positions, View Summary
Start / Stop
snapshot → Sessions
Real Time Data – gen 3 writes
Viewing Service
Stateful Tier (partitions 0, 1, … n-2, n-1): Active Sessions, Latest Positions, View Summary
Start / Stop
Viewing History
Memcached
Real Time Data – gen 3 reads
Viewing Service
Stateful Tier
What have I watched?
View Summary, Viewing History, Memcached
Real Time Data – gen 3 reads
Viewing Service
Stateful Tier
Where was I at?
Latest Positions, Viewing History, Memcached
Stateless Tier (fallback)
Real Time Data – gen 3 reads
Viewing Service
Stateful Tier
What else am I watching?
Active Sessions
gen 3 – Requests Scale
Create (start streaming): 1,000s per second
Update (heartbeat, close): 100,000s per second
Append (session events/logs): 10,000s per second
Read viewing history: 10,000s per second
Read latest position: 100,000s per second
gen 3 – Cluster Scale
Cassandra Viewing History: ~100 hi1.4xl nodes, ~48 TB total space used
Viewing Service Stateful Tier: ~1700 r3.2xl nodes, 50 GB heap memory per node
Memcached: ~450 r3.2xl/xl nodes, ~8 TB memory used
Real Time Data – gen 3 pain points
• Stateful tier
– Hot spots
– Multi-region complexity
• Monolithic service
• Read-modify-write poorly suited for memcached
Real Time Data – gen 3 learnings
• Distributed stateful systems are hard
– Go stateless, use C*/memcached/redis…
• Decompose into microservices
Real Time Data – gen 4
Viewing Service
Stateful Tier (partitions 0, 1, … n-2, n-1): Active Sessions, Latest Positions, View Summary
Stateless Tier (fallback)
Viewing History, Sessions
Memcached
Real Time Data – gen 4
Stateless Microservices: Stream State/Event Collectors, Data Processors, Data Services, Data Feeds
Real Time Data – gen 4
Data Tiers: Viewing History, Session State, Session Positions, Session Events
redis
Session Analytics
• Summarize detailed event data
• Non-real time, but near real time
• Some shared logic with real time
Session Analytics – Processing
2007 2008 2009 2010 2011 2012 2013 2014 Future
Batch → Near Real-Time → Stream Processing
Custom Service (Java on AWS), Mantis
Session Analytics – Storage
2007 2008 2009 2010 2011 2012 2013 2014 Future
Batch → Near Real-Time → Stream Processing
Session Analytics – gen 1
• Storage • Processing
Sessions, Logs
Session Analytics – gen 1 pain points
• MapReduce good for batch
– Not for near real time
• Complexity
– Code in 2 systems / frameworks
– Operational burden of 2 systems
Session Analytics – gen 2
• Storage • Processing
Session Events & Logs
Java
Session Analytics – gen 2 learnings
• Reduced complexity
– shared code and ops
• Batch still available
• New bottleneck
– harder to extend logic
Session Analytics – gen 3 (*)
• Storage • Processing
Stream Processing Frameworks: Mantis, Storm, Samza, Spark Streaming
Takeaways
• Polyglot Persistence
– One size fits all doesn’t fit all
• Strong opinions, loosely held
– Design for long term, but be open to redesigns
Thanks!
@philip_pfo


Editor's Notes

  1. Over the past 7 years, Netflix streaming has expanded from thousands of customers watching occasionally, to millions of customers watching billions of hours every month. Each time a customer views, Netflix gathers events describing that view – both user-driven events like pause and resume, fast forward and rewind, and device-driven events like network throughput traces and video quality selections. To organize, understand, and create value out of these events, Netflix has built a data architecture to process these events. This architecture has evolved rapidly, keeping pace with the rapid global expansion of streaming itself.
  2. Including the notes on the previous slide :-).
  3. @philip_pfo, http://www.linkedin.com/in/philfish
  4. Before we talk about the evolution of the data architecture, let’s see how Netflix streaming has evolved.
  5. Today you can stream Netflix on over 1000+ device types; most consumers will have at least one way to stream Netflix already in their home. It wasn’t always that way, though. When streaming started in early 2007, you could only stream on a Windows PC. That was a time before iPhones and Android phones, before Roku set top boxes, when CRT TVs were nearly half of TV sales, and before most TVs became “smart”. Between 2007 and 2011, the consumer electronics landscape changed to include a device in every pocket and smart capabilities on every big screen. Netflix helped fuel this change by rapidly expanding the number of supported streaming devices. Timeline from https://pr.netflix.com/WebClient/loginPageSalesNetWorksAction.do?contentGroupId=10477&contentGroup=Company+Timeline
  6. Similar to the growth in the number of supported devices, the interaction experience has also evolved rapidly. The first iteration was simple “queue readers”, where customers had to choose their list of titles using the website before they would show up on a device. That evolved into a richer UI with personalized suggestions displayed as a list of lists, plus search support. The website was no longer the sole source for discovering interesting titles to watch. Last year that evolved even further by bringing a rich cinematic feel to the TV user interface.
  7. 2014 public data, from https://pr.netflix.com/WebClient/loginPageSalesNetWorksAction.do?contentGroupId=10476&contentGroup=Company+Facts
  8. Core to the experience is viewing. Data about viewing improves the experience through better personalization. A better experience leads to more viewing. Data about what is being watched fuels this virtuous cycle.
  9. What do we mean by “Viewing Data”? Who (customer) watched what (title), when (date/time), where (location, device), and for how long (duration, position).
  10. Viewing data is processed for use in real time and non-real time use cases. The first real time use case is “what have I watched”. This includes knowing a customer’s entire viewing history (for as long as they have subscribed), which feeds the recommendation algorithms so that they can find that perfect title for whatever mood they’re in. It also feeds the “recent titles you’ve watched” row in the UI.
  11. Nothing is worse than getting 2/3rds of the way into a movie, having to stop, and then finding that your movie player doesn’t remember where you left off. You’re stuck fast forwarding for hours to find the last segment you watched. To solve this, Netflix collects data on how much you watched (duration) and where you are currently at (position), so you can easily continue watching where you left off.
  12. Sharing your account with other family members usually means everyone gets to enjoy what they like when they’d like. It also means having to have that hard conversation about who has to stop watching when you’ve hit your account’s concurrent screens limit. Netflix’s viewing data system gathers periodic signals throughout each view to know when a user is versus isn’t watching.
  13. Beyond the real time cases, Netflix has a variety of non-real time use cases for the viewing data. By analyzing the events during a viewing session, Netflix can determine things like the average video quality, frequency and duration of any rebuffers (playback interruptions due to network problems), and if there were any errors that prevented playback.
  14. To understand the deeper details of what happened during playback, trace data is organized into segments and sent to the viewing data system. Network traces (top graph) describe the network throughput during a viewing session. Play traces (bottom graph) show what was actually displayed on the user’s screen. Gathering these traces enables understanding the environment and decisions made by the adaptive streaming algorithm. By measuring and analyzing multiple aspects of the viewing experience, problems can be understood and fixed and QoE algorithm improvements can be tested and deployed at scale.
  15. The problem space looks similar today compared with what it was many years ago, but the scale of the problem has changed dramatically. In generic terms, users view, events are collected about those views, processing logic is applied, and summarized data is provided for each real time and non-real time use case.
  16. For each architectural generation, let’s look at what were the driving forces behind a design, what pain points and learnings emerged, and what motivated a redesign. We’ll start with the real time data architecture, and then discuss the systems for offline / non-real time data analysis.
  17. The real time data architecture evolved on two key dimensions – the primary data store and the caching tier. Note: Netflix’s flavor of memcached on AWS is EVCache - http://techblog.netflix.com/2013/02/announcing-evcache-distributed-in.html, and redis on AWS (with the added value of Dynamo semantics) is Dynomite - http://techblog.netflix.com/2014/11/introducing-dynomite.html.
  18. For the primary data store, we started with a SQL database running on a single powerful machine running in our data center.
  19. Devices at that time communicated when they started playback and when they stopped, but had no consistent periodic communications in between those two states. Semi-structured logs were sent during playback, but at unpredictable intervals. Data was collected into different SQL tables: session state with details on active and recent views, logs/events for the raw events, and history and position tables with summaries of what was watched and what the latest positions were.
  20. The first generation architecture scaled for the first order of magnitude, from thousands to the low millions of events per day. The database scaling strategy was to scale up, meaning buy more expensive hardware to handle the increased load (scale vertically), rather than scale out, meaning add more machines to handle the additional load (scale horizontally). After one round of scaling up, which took weeks to migrate from one class of hardware to another, we were motivated to find solutions that scaled faster. In 2009 Netflix was starting to move from a fixed data center architecture to a cloud-based architecture, and solutions that scaled out could take advantage of the cloud’s elastic compute capabilities. We did ad hoc analytics on the detailed session event data by writing look-back time based queries against the log store and applying custom code to process those events. This limited us to providing only a small set of summarized analytics data for a sample of all views. We needed to find a scalable solution to meet the growing analytic needs of our rapidly expanding ecosystem. When you are a pioneer discovering a new world, as Netflix was when streaming started, data schemas need to adapt to the changing landscape. SQL database schema extension mechanisms existed but were brittle. NoSQL solutions were emerging with the promise of flexible schemas.
  21. In 2010, we moved away from a SQL based solution and onto a NoSQL based solution, starting with SimpleDB.
  22. Our second generation system was built with entirely different fundamentals than the first, fundamentals that we thought would enable us to scale to the next order of magnitude.
  23. For the second generation, we migrated data from a SQL database in our data center over to Amazon’s SimpleDB running in EC2. We partitioned the data into 50 shards, moving closer to our goal of scale out not up. We extracted our collection and processing logic into a cloud-native viewing service, enabling independent evolution of that layer.
  24. Some promises were kept, but many were broken. SimpleDB didn’t actually scale out – adding more nodes required painful repartitioning. The architecture had hot spots, both due to poorly chosen partitioning mechanisms and architectural flaws in SimpleDB’s design. SimpleDB had no backups, which was a step back from our previous primary data store’s disaster recovery solution.
  25. Informed by the learnings from our first cloud native architecture, we set off to build the next generation architecture. We could see an expected order of magnitude increase in event volume on the horizon. The second generation struggled with handling millions of events; the third needed to effortlessly handle billions.
  26. At that time, Cassandra was not even 1.0, SSD instance types didn’t exist in AWS yet, and Netflix infrastructure only ran in the US-East AWS region. An internal partner team analyzed and benchmarked multiple scalable database solutions, assessing current performance, future growth trajectory, and estimated cost of making each work for our expected future scale. Cassandra emerged as the leader, but required investment to make it work on AWS and to help it evolve to meet our future needs. We created an internal team to enable Cassandra success at Netflix, and also to contribute any extensions we developed back to the community. (For an overview of the general migration to C* and all the goodness it offers, see http://www.snia.org/sites/default/files2/SDC2013/GeneralSession/JasonBrown_Migrating_Cassandra_in_the_Cloud.pdf)
  27. We redesigned the way that devices communicate information about what’s being played. Devices shifted to sending a periodic heartbeat during playback, enabling better session metrics and state management, but also significantly increasing the volume of requests to our viewing data architecture. This increase plus the continued growth in customers and viewing hours demanded a system that could easily scale out rather than up.
  28. The third generation architecture’s primary interface is the viewing service, which is segmented into a stateful and a stateless tier. The stateful tier has the latest data for all active views stored in memory. Data is partitioned into N stateful nodes by a simple mod N of the customer’s account id. When stateful nodes come online they go through a slot selection process to determine which data partition will belong to them. The stateless tier serves as a pass-through to external data stores, acting as a fallback when a stateful node is unreachable. Cassandra is the primary data store for all persistent data. Memcached is layered on top of Cassandra as a guaranteed low latency path for materialized, but possibly stale, views of the data. Let’s look at this architecture in action: how does it handle the write and read use cases?
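To make the mod-N partitioning in the note above concrete, here is a minimal Java sketch; the node names are hypothetical, and the slot-selection handshake and stateless-tier fallback are not shown.

```java
import java.util.List;

// Minimal sketch of the account-id mod N routing described above.
// Node names are hypothetical; slot selection and failover are not shown.
public class StatefulRouter {
    private final List<String> statefulNodes; // index in this list == slot

    public StatefulRouter(List<String> statefulNodes) {
        this.statefulNodes = statefulNodes;
    }

    // All events for an account land on the same stateful node:
    // slot = accountId mod N, where N is the number of stateful nodes.
    public String nodeFor(long accountId) {
        int slot = Math.floorMod(accountId, statefulNodes.size());
        return statefulNodes.get(slot);
    }

    public static void main(String[] args) {
        StatefulRouter router =
                new StatefulRouter(List.of("node-0", "node-1", "node-2", "node-3"));
        System.out.println(router.nodeFor(123456789L)); // prints node-1
    }
}
```

As the gen 3 pain points note later, this simple scheme is prone to hot spots, because viewing load is not evenly distributed across accounts.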
  29. On starting a new playback, a create event is sent to the viewing service, which gets processed by the stateful node for that customer’s account.
  30. Upon receiving the create event, the stateful tier establishes an active session entry for that view, updates the position data to reference where the user started playing from, and creates a summary entry, to be updated as the user continues watching. All of this state is stored in memory.
  31. As the user continues watching, update events are periodically sent to the viewing service. The presence of an update event refreshes the active sessions data, the absence of an update event (after a configured timeout period) removes that view from the active sessions data. The positions data and the summary for that view also are updated in response to this event.
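A minimal sketch of the heartbeat-driven expiry described in the note above, assuming a hypothetical in-memory map keyed by view; the timeout value is illustrative, not Netflix’s actual configuration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of heartbeat-driven session expiry: the presence of an update
// event refreshes a session, its absence past the timeout removes it.
public class ActiveSessions {
    private static final long TIMEOUT_MILLIS = 5 * 60 * 1000; // illustrative value

    // viewId -> timestamp of the most recent create/update event
    private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();

    public void onEvent(String viewId) {
        lastHeartbeat.put(viewId, System.currentTimeMillis());
    }

    // Called periodically: views with no update past the timeout are dropped.
    public void expireStale() {
        long cutoff = System.currentTimeMillis() - TIMEOUT_MILLIS;
        lastHeartbeat.entrySet().removeIf(e -> e.getValue() < cutoff);
    }
}
```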
  32. Node failures can and do happen, so the in memory state needs to be synced to an external store to recover from such failures. Not all in memory state is synced, though. The active session information is periodically synced to Cassandra, used to recover some state if node failures occur.
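The periodic sync to Cassandra might look roughly like the sketch below; the SnapshotStore interface is a hypothetical stand-in for the Cassandra client, and the 30-second interval is illustrative.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch of periodically syncing in-memory session state to an external
// store for crash recovery. Best effort by design: only some state can be
// recovered after a node failure, per the note above.
public class SessionSnapshotter {
    public interface SnapshotStore {                 // stands in for Cassandra
        void write(Map<String, Long> activeSessions);
    }

    private final Map<String, Long> activeSessions = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public void onEvent(String viewId) {
        activeSessions.put(viewId, System.currentTimeMillis());
    }

    public void start(SnapshotStore store) {
        scheduler.scheduleAtFixedRate(
                () -> store.write(Map.copyOf(activeSessions)),
                30, 30, TimeUnit.SECONDS);           // illustrative interval
    }
}
```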
  33. The view summary gets updated using the active sessions and positions data. On certain state transitions, this summary gets added to the viewing history information stored in Cassandra, and a cached form of that is published to memcached.
  34. The first read use case is “what have I watched?”. The memcached tier is consulted first, with the viewing service called on a cache miss. The viewing service merges any live views it has in memory with the historical data from Cassandra, returning the merged data and then updating memcached.
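A sketch of this read path, with hypothetical interfaces standing in for memcached, Cassandra, and the stateful tier’s in-memory views:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the "what have I watched?" read path: memcached first, then
// merge live views with Cassandra history on a miss, then repopulate the
// cache. All interfaces are hypothetical stand-ins for the real clients.
public class ViewingHistoryReader {
    public interface Cache {            // stands in for memcached
        List<String> get(long accountId);
        void set(long accountId, List<String> history);
    }
    public interface HistoryStore {     // stands in for Cassandra
        List<String> read(long accountId);
    }
    public interface LiveViews {        // stands in for the stateful tier's memory
        List<String> current(long accountId);
    }

    private final Cache cache;
    private final HistoryStore store;
    private final LiveViews live;

    public ViewingHistoryReader(Cache cache, HistoryStore store, LiveViews live) {
        this.cache = cache;
        this.store = store;
        this.live = live;
    }

    public List<String> whatHaveIWatched(long accountId) {
        List<String> cached = cache.get(accountId);
        if (cached != null) {
            return cached;              // low-latency, possibly stale, materialized view
        }
        List<String> merged = new ArrayList<>(live.current(accountId));
        merged.addAll(store.read(accountId));
        cache.set(accountId, merged);   // repopulate memcached for the next read
        return merged;
    }
}
```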
  35. The second read use case is “where am I/was I at?”. The freshest data is available in the stateful tier, so that is consulted first to retrieve the latest position data. If the read requests position data for a movie that the user isn’t currently watching, the memcached and Cassandra layers are queried. Those results are cached in memory in the stateful tier, to enable faster response times to any subsequent position queries for that customer. The stateless tier is called if there is a stateful tier node failure, and it queries the same backing stores but does not cache the results in memory.
  36. The third read use case is “what else is being watched on my account?”. Given that all active sessions for a given account are stored in memory in the stateful tier, that query is answered by that tier.
  37. The third generation architecture has scaled out to support two orders of magnitude growth over the previous architecture. Both the read and the write paths are able to support 1,000s to 100,000s of requests per second with acceptable average and 99th percentile latencies.
  38. Further demonstrating the scale out capabilities of this architecture, here are recent stats on the instance types, number of nodes, and memory/disk usage for the third gen viewing data architecture. Note that our Cassandra clusters ran on spinning disk instance types for a few years before we migrated to the SSD instance types, and we’re looking to migrate to the latest SSD storage optimized EC2 instance types in the near future.
  39. This architecture has scaled to handle Netflix’s growth over the past 3+ years. We’ve made incremental improvements and tunings, but the fundamental architecture has stayed the same throughout that time. As we developed new features and operated this system at scale, pain points emerged. Our stateful tier uses a simple sharding technique (account id mod N) that is subject to hot spots, as Netflix viewing usage is not evenly distributed across all current customers. Our Cassandra layer is not subject to these hot spots, as it uses consistent hashing with virtual nodes to partition the data. Additionally, when we moved from a single region to running in multiple AWS regions, we had to build a custom mechanism to communicate the state between stateful tiers in different regions. This added significant unnecessary complexity to our overall system. We created the viewing service to encapsulate the domain of viewing data collection, processing, and providing. As that system evolved to include more functionality and various read/write/update use cases, we identified multiple distinct components that were combined into this single monolithic service. These components would be easier to develop, test, debug, deploy, and operate if they were extracted into their own microservices. Lastly, memcached offers amazing throughput and latency characteristics, but isn’t well suited for our use case. For us to update the data in memcached, we read the latest data, append a new view entry (if none exists for that movie) or modify an existing entry (moving it to the front of the time-ordered list), and then write the updated data back to memcached. To avoid consistency issues, we’re restricted to a single writer per customer to do those updates. Redis offers similar throughput/latency characteristics as memcached, but has richer data types with operations better suited for our read-modify-write use case.
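The memcached read-modify-write cycle described above looks roughly like this sketch (hypothetical cache interface). With redis, the move-to-front could instead be a round of server-side list commands (LREM then LPUSH), which is what makes its richer data types a better fit.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the read-modify-write cycle memcached forces for viewing
// history updates. To avoid lost updates, this must run on the single
// designated writer for the customer, as described in the note above.
public class HistoryCacheUpdater {
    public interface Cache {            // stands in for memcached
        List<String> get(long accountId);
        void set(long accountId, List<String> history);
    }

    private final Cache cache;

    public HistoryCacheUpdater(Cache cache) {
        this.cache = cache;
    }

    public void recordView(long accountId, String movieId) {
        List<String> history = cache.get(accountId);   // 1. read the latest data
        List<String> updated =
                history == null ? new ArrayList<>() : new ArrayList<>(history);
        updated.remove(movieId);                       // 2. modify: move the entry
        updated.add(0, movieId);                       //    to the front of the list
        cache.set(accountId, updated);                 // 3. write the whole value back
    }
}
```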
  40. We created the stateful tier because we wanted the benefit of memory speed for our highest volume read/write use cases. Cassandra was in its pre-1.0 versions and wasn’t running on SSDs in AWS. We thought we could design a simple but robust distributed stateful system exactly suited to our needs, but ended up with a complex solution that was less robust than mature open source technologies. Rather than solve the hard distributed systems problems ourselves, we’d rather build on top of proven solutions like Cassandra, allowing us to focus our attention on solving the problems in our viewing data domain.
  41. For our fourth generation architecture, we’re going stateless in our services. All persistent state will be externalized, and the distinct components needed to collect, process, and provide the viewing data will be separated into their own microservices. Data replication concerns will be handled at the persistence tiers rather than the application tier. The stateless microservices can easily expand and contract to support varying workloads, something that was difficult to do with the previous stateful service architecture.
  42. Each microservice fits into one of four categories. Data collectors gather session events and semi-structured logs. Data processors analyze, summarize, and derive new data from each session’s event stream. Data services provide low latency access to the data, in a form tailored to each specific query use case. Data feeds provide the raw events and summary data to others, for them to extend the data according to their use cases. Asynchronous message passing is the preferred method of signaling between services.
  43. State that formerly lived in the 3rd gen architecture’s stateful tier is externalized, and accessed via the microservices.
  44. In addition to the real time use cases, we have non-real time analytics processing that we prefer to execute in near-real time. Similar to the evolution of our real time architectures, our analytics architecture has evolved multiple times in both the storage and processing dimensions.
  45. Distributed storage and distributed processing – Hadoop + HDFS, Custom + C*, Mantis + Kafka
  46. Distributed storage and distributed processing – Hadoop + HDFS, Custom + C*, Mantis + Kafka
  47. Our first generation of storage and processing was built on S3 and Hadoop. We shipped logs from our data center database up to S3, and then developed and operated a Hadoop-based analytics pipeline.
  48. There was a high conceptual and actual overlap in the processing logic between the real time and batch processing systems. Changing the logic meant changing code in two different systems, each with different paradigms.
  49. For the second generation analytics architecture, our goal was to unify the development and operations by moving the logic into our real time system. Because we had experience with Cassandra, we chose to store the event and semi-structured log data in a Cassandra ring. The analytics processing happens when a viewing session is considered complete (either explicitly closed or implicitly closed via a timeout), and produces results in near-real time. The data is available in S3, too, so MapReduce/batch processing is still an option for ad hoc use cases.
  50. We realized the benefits of a unified architecture, but found a new pain point. Changes to the analytics logic required code changes to be developed, tested, and deployed to our real time system. Development bottlenecked on the small number of people that knew how to code and deploy changes in that system.
  51. We haven’t built a third generation of our analytics infrastructure, but here are some options we’re considering. Our main goal is to have a platform that allows for easy development of changes to the session analytics logic, without compromising on quality. We’re interested in Kafka as a possible replacement for Cassandra for storing the raw event data. On the processing side, we’re investigating many of the stream processing frameworks listed, all of which promise ease of development and powerful extension capabilities for the processing logic.