Netflix and Containers: Not A Stranger Thing

Andrew Spyker (@aspyker) - Engineering Manager
About Netflix
● 86.7M members
● A few thousand employees
● 190+ countries
● More than ⅓ of North American internet download traffic
● 500+ microservices
● Many tens of thousands of VMs
● 3 regions across the world
Netflix has an elastic, cloud-native, immutable microservice
architecture using full DevOps, built on VMs!
Why are we messing around with containers?
Technical motivating factors for containers
● Simpler management of compute resources
● Simpler deployment packaging artifacts for compute jobs
● Need for a consistent local developer environment
Sampling of realized container benefits
Media Encoding - encoding research development time
● VM platform vs. container platform: 1 month vs. 1 week
Continuous Integration Testing
● Build all Netflix codebases in hours
● Saves developers hundreds of hours of debugging
Edge Re-architecture using NodeJS
● Focus returns to app development
● Simplifies and speeds up testing and deployment
Batch applications
Multi-tenancy (Linux cgroups/Mesos) has historically been used for batch
What do batch users want?
● Simple shared resources, run till done, job files
● NOT
○ EC2 instance sizes, autoscaling, AMI OSes
● WHY
○ Offloads resource management ops; simpler
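To make "run till done, job files" concrete, here is a minimal sketch of what a batch job file might carry: an image, a command, and basic resource needs, with no instance sizes or AMIs. The class `BatchJobSpec` and all of its field names are hypothetical illustrations, not the actual Titus job API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BatchJobSpec:
    """Hypothetical batch job file: what to run and roughly what it needs.

    Deliberately absent: EC2 instance size, autoscaling policy, AMI.
    Choosing hosts for the job is left to the platform.
    """

    image: str          # Docker image reference to run
    command: List[str]  # Command that runs until the work is done
    cpus: float = 1.0
    memory_mb: int = 1024
    gpus: int = 0
    retries: int = 2    # Extra attempts allowed after a failure

    def __post_init__(self) -> None:
        # Reject specs the scheduler could never place or run.
        if not self.image or not self.command:
            raise ValueError("image and command are required")
        if self.cpus <= 0 or self.memory_mb <= 0:
            raise ValueError("cpus and memory_mb must be positive")
        if self.gpus < 0 or self.retries < 0:
            raise ValueError("gpus and retries must be non-negative")


# Example: a GPU training job (image name and command are made up).
training = BatchJobSpec(
    image="registry.example.com/model-training:1.0",
    command=["train", "--epochs", "10"],
    cpus=8,
    memory_mb=30_000,
    gpus=1,
)
```

The point of the shape is what the user does not have to say: picking a g2 versus an m4 host becomes a scheduling decision rather than part of the job file.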
Introducing Titus
● Layers: Job Management; Resource Management & Optimization; Container Execution; Integration
● Batch work arrives from upstream systems: workflow, data analysis, ad hoc
Netflix Batch Job Examples
● Algorithm Model Training (with GPUs)
● Media Encoding
● Digital Watermarking
● Open Connect CDN Reporting
● Ad hoc Reporting
Lessons Learned from Batch
● Docker helped generalize use cases
● Scheduling required (GPU, elastic)
● Initially ignored failures (with retries)
● Time-sensitive batch came later
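The early "ignore failures, just retry" policy can be sketched as a small loop. This assumes a task is a callable returning a process-style exit code (0 means success); `run_until_done` is a hypothetical name, not part of Titus.

```python
from typing import Callable, Tuple


def run_until_done(task: Callable[[], int], retries: int) -> Tuple[int, int]:
    """Run a batch task, re-running it on failure up to `retries` extra times.

    Failures are not diagnosed, only retried. Returns
    (final_exit_code, attempts_made).
    """
    attempts = 0
    exit_code = 1
    while attempts <= retries:
        attempts += 1
        exit_code = task()
        if exit_code == 0:
            break
    return exit_code, attempts


# A flaky task that fails twice, then succeeds on the third attempt.
outcomes = iter([1, 1, 0])
code, attempts = run_until_done(lambda: next(outcomes), retries=2)
```

Blind retries work when jobs are idempotent and nobody is waiting on a deadline, which is why time-sensitive batch needed more than this.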
Current Container Usage - Batch
● Thousands of container hosts (g2, m4, r3 instances)
● Thousands of containers per hour on average
● Large spikes from CI testing and Digital Watermarking
On the single day of 10/26:
● 47K containers
● Bursts of 1,000 containers in a minute
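A quick back-of-the-envelope check, using only the numbers on this slide, shows why the scheduler must be sized for bursts rather than averages.

```python
# Figures from the slide: one day (10/26) with 47K containers,
# and bursts of 1,000 containers in a minute.
containers_per_day = 47_000
burst_per_minute = 1_000

avg_per_hour = containers_per_day / 24            # ~1,958, consistent with "thousands per hour"
avg_per_minute = containers_per_day / (24 * 60)   # ~32.6
peak_to_average = burst_per_minute / avg_per_minute  # ~30.6x

print(f"{avg_per_hour:.0f}/hour, {avg_per_minute:.1f}/minute, "
      f"burst is about {peak_to_average:.0f}x the average")
```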
Service applications
Why Services in containers?
[Figure: Theory vs. Reality, from the developer's view]
Opportunities to evolve our baking
● Our supported, Java-focused AMI and baking work well!
● However, we wanted to allow:
○ other stacks to evolve independently of OS updates
○ simpler builds (vs. Java- and OS-based tooling)
○ reliable smaller instances for dynamic languages
○ local development with the same image
Services are just long-running batch?
● Same Titus layers: Job Management; Resource Management & Optimization; Container Execution; Integration
● Now serving both Service Apps and Batch
Nope, not that easy - Titus Details
● Control plane: Titus UI, Titus API, Rhea, Titus Master (Job Management & Scheduler, using Fenzo), Mesos Master, Zookeeper, Cassandra, S3, Docker Registry, EC2 Autoscaling API
● Titus Agent: Mesos agent, Titus executor, docker, metrics agent, logging agent, zfs, Pod & VPC network drivers, AWS metadata proxy, and the containers themselves
● Integration: AWS VMs, CI/CD
Services more complex
● Services resize constantly and run forever
○ Autoscaling
○ Hard to upgrade underlying hosts
● Require IPC integration
○ Routable IPs, service discovery
○ Ready for traffic vs. just started/stopped
● Existing, well-defined dev, deploy, runtime & ops tools to integrate with
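"Ready for traffic vs. just started/stopped" can be illustrated by deriving a discovery status from container state. The status names below are a subset of the instance statuses Eureka uses; the mapping rules and the function `discovery_status` are hypothetical assumptions for illustration, not Titus behavior.

```python
from enum import Enum


class ContainerState(Enum):
    RUNNING = "running"  # the container's process has started
    STOPPED = "stopped"


class DiscoveryStatus(Enum):
    # A subset of the instance status values used by Eureka.
    STARTING = "STARTING"
    UP = "UP"
    OUT_OF_SERVICE = "OUT_OF_SERVICE"
    DOWN = "DOWN"


def discovery_status(state: ContainerState, healthy: bool, enabled: bool) -> DiscoveryStatus:
    """Decide what service discovery should advertise for one container.

    Started is not the same as ready: a running container is only UP once
    its health check passes and the deployment has enabled it for traffic.
    Simplification: any running-but-unhealthy container is reported as
    STARTING; a real system would also distinguish one that has gone bad.
    """
    if state is ContainerState.STOPPED:
        return DiscoveryStatus.DOWN
    if not healthy:
        return DiscoveryStatus.STARTING
    if not enabled:
        return DiscoveryStatus.OUT_OF_SERVICE
    return DiscoveryStatus.UP
```

Batch jobs only need the first branch; services need all four, which is part of why they are more than long-running batch.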
Real networking is hard
● Multi-tenant
● Need an IP per container, in VPC
● Using security groups
● Using IAM roles
● Considering network resource isolation
Enabling VPC Networking
● Each ENI on a Titus EC2 host VM carries one security group set: eth1/ENI1 (SecGrp=A), eth2/ENI2 (SecGrp=X), eth3/ENI3 (SecGrp=Y,Z)
● Task 0: SecGrp A, no IP of its own
● Tasks 1 and 2: SecGrp X, each with its own IP (IP 1, IP 2), each in its own pod root connected via veth<id>
● Task 3: SecGrp Y,Z, with IP 3, in a pod root connected via veth<id>
● Linux policy-based routing + traffic control on the host
● IPTables NAT sends container requests for 169.254.169.254 (the non-routable EC2 metadata address) to the Titus EC2 Metadata Proxy
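Because AWS attaches security groups to an ENI rather than to individual IPs, containers can only share an ENI if they want the same security group set. A minimal sketch of that grouping, assuming a per-host ENI limit; `assign_enis` is hypothetical and ignores the veth, routing, and traffic-control setup the real driver performs.

```python
from typing import Dict, FrozenSet, List


def assign_enis(task_groups: Dict[str, FrozenSet[str]], max_enis: int) -> Dict[FrozenSet[str], List[str]]:
    """Group tasks onto ENIs so every task on an ENI shares its security groups.

    Each distinct security-group set needs its own ENI; tasks with the same
    set share that ENI, each getting its own secondary IP on it.
    Raises ValueError if the host would need more ENIs than `max_enis`.
    """
    enis: Dict[FrozenSet[str], List[str]] = {}
    for task, groups in sorted(task_groups.items()):
        enis.setdefault(groups, []).append(task)
    if len(enis) > max_enis:
        raise ValueError(f"need {len(enis)} ENIs, host supports {max_enis}")
    return enis


# Tasks 1-3 from the diagram: two share SecGrp X, one uses SecGrp Y,Z.
enis = assign_enis(
    {"task1": frozenset({"X"}), "task2": frozenset({"X"}), "task3": frozenset({"Y", "Z"})},
    max_enis=3,
)
```

The ENI limit per instance type is what turns security groups into a scheduling concern: a host with its ENIs in use cannot accept a task needing yet another group set.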
Solutions
● VPC Networking driver
○ Supports ENIs: full IP functionality
○ Scheduled security groups
○ Supports traffic control (resource isolation)
● EC2 Metadata proxy
○ Adds container “node” identity
○ Delivers IAM roles
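The metadata proxy idea can be sketched as a pure routing function. The two paths are real EC2 instance metadata paths; the routing policy, the action names (`answer`, `assume_role`, `deny`, `forward`), and `ContainerIdentity` are hypothetical, not the Titus proxy's actual behavior.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

CREDS_PREFIX = "/latest/meta-data/iam/security-credentials/"
LOCAL_IPV4 = "/latest/meta-data/local-ipv4"


@dataclass(frozen=True)
class ContainerIdentity:
    task_id: str
    ip: str        # the container's own VPC IP
    iam_role: str  # role assigned to this container, not the host's role


def route_metadata_request(path: str, who: ContainerIdentity) -> Tuple[str, Optional[str]]:
    """Decide how a proxy might handle one metadata request from a container."""
    if path == CREDS_PREFIX:
        # Role listing: expose only the container's role, hiding the host's.
        return "answer", who.iam_role
    if path.startswith(CREDS_PREFIX):
        requested = path[len(CREDS_PREFIX):].rstrip("/")
        if requested == who.iam_role:
            # A real proxy would fetch credentials for this role here.
            return "assume_role", who.iam_role
        return "deny", None
    if path == LOCAL_IPV4:
        # Container "node" identity: report the container's IP, not the host's.
        return "answer", who.ip
    return "forward", None


who = ContainerIdentity(task_id="t1", ip="10.0.0.5", iam_role="encoder-role")
```

Fetching the actual credentials (for example via STS `AssumeRole`) is deliberately left out; the sketch only shows that the proxy, not the container, decides which role's credentials are reachable.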
Reuse existing infrastructure services
● VMs run on EC2 in a VPC, scaled by the AWS AutoScaler; apps use the Cloud Platform (metrics, IPC, health) backed by Atlas, Eureka and Edda
Enable them for containers
● Netflix Cloud Infrastructure (VMs + Containers): VMs are managed by the AWS AutoScaler, and service and batch containers by Titus Job Control
● Containerized apps use the same Cloud Platform (metrics, IPC, health), Atlas, Eureka and Edda
Spinnaker
● Deploy based on new image tags
● Basic resource requirements
● IAM roles & security groups per container
● Deploy strategies same as VMs
● Easily see health & discovery
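"Deploy strategies same as VMs" includes Spinnaker's red/black strategy. A minimal in-memory sketch of its steps, assuming a new image tag triggers the deploy; `ServerGroup`, `red_black_deploy`, and the `app-vNNN` naming are illustrative assumptions, not Spinnaker's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServerGroup:
    name: str
    image_tag: str
    size: int = 0
    enabled: bool = False  # whether it receives traffic (e.g. UP in discovery)


def red_black_deploy(groups: List[ServerGroup], app: str, new_tag: str, desired: int) -> ServerGroup:
    """Sketch of a red/black deploy driven by a new image tag.

    1. Create a new server group from the new image tag.
    2. Scale it to the desired size and enable it for traffic.
    3. Disable, but keep, the previous groups so rollback is a re-enable.
    """
    new = ServerGroup(name=f"{app}-v{len(groups):03d}", image_tag=new_tag)
    new.size = desired
    new.enabled = True  # in practice, only after health checks pass
    for old in groups:
        old.enabled = False
    groups.append(new)
    return new


groups = [ServerGroup("api-v000", "1.0", size=3, enabled=True)]
new = red_black_deploy(groups, "api", "1.1", desired=3)
```

Keeping the old group at full size but disabled is the design choice that makes rollback fast: no new containers have to be scheduled to undo a bad deploy.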
Secure Multi-tenancy
As with VMs, tiered security is needed
● Protect the host's reduced IAM role, while allowing containers to have specific IAM roles
● Container networking needed to support the same security groups as VMs
User namespacing
● Docker 1.10 - Introduced user namespaces
● Didn’t work with shared networking namespaces
● Docker 1.11 - Fixed shared networking namespaces
● But namespacing is per daemon, not per container as hoped
● Waiting on Linux
● Considering mass chmod / ZFS clones
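Why "per daemon, not per container" matters can be shown with the UID arithmetic alone. Docker's `--userns-remap` daemon option applies one subordinate ID range (in the style of an `/etc/subuid` "start:count" entry) to every container; `SubIdRange` is a hypothetical helper that models only the mapping, not the kernel mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubIdRange:
    """One "start:count" subordinate ID range."""
    start: int
    count: int

    def to_host(self, container_uid: int) -> int:
        # Container uid N maps to host uid start + N.
        if not 0 <= container_uid < self.count:
            raise ValueError("uid outside mapped range")
        return self.start + container_uid


# Per-daemon remapping: every container shares one range.
daemon_range = SubIdRange(start=100_000, count=65_536)
root_in_a = daemon_range.to_host(0)
root_in_b = daemon_range.to_host(0)
# Same host uid: isolated from the host, but not from each other.

# Per-container ranges (what was hoped for): disjoint host uids.
range_a = SubIdRange(start=100_000, count=65_536)
range_b = SubIdRange(start=100_000 + 65_536, count=65_536)
```

With disjoint per-container ranges, files in a shared image would need ownership matching each container's range, which is plausibly where the mass chmod / ZFS clone ideas come in.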
Titus Advanced Scheduling
● Support for AZ balancing
● Multiple instance types selected based on workload
● Elastic underlying common resource pool
○ Bin packing managed transparently across all apps
● Hard and soft constraints
● Resource affinity and task affinity
● Capacity guarantees (critical tier)
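AZ balancing can be sketched as "place each new task in the zone holding the fewest tasks of the same job". `balance_across_zones` is hypothetical; it treats balancing as a strict rule, whereas a scheduler such as Titus could also apply it as a soft constraint traded off against others.

```python
from typing import Dict, List, Optional


def balance_across_zones(zones: List[str], count: int,
                         existing: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Spread `count` new tasks of one job across availability zones.

    Each task goes to the zone currently holding the fewest tasks of this
    job (ties broken by zone name), so losing one zone loses as little of
    the job as possible. Returns the per-zone task counts after placement.
    """
    placed = {z: (existing or {}).get(z, 0) for z in zones}
    for _ in range(count):
        zone = min(zones, key=lambda z: (placed[z], z))
        placed[zone] += 1
    return placed


zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
fresh = balance_across_zones(zones, 7)
topped_up = balance_across_zones(zones, 3, existing={"us-east-1a": 2})
```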
Fenzo - Keep resource scheduling extensible
Fenzo - Extensible Scheduling Library
Features:
● Heterogeneous resources & tasks
● Autoscaling of Mesos cluster
○ Multiple instance types
● Plugin-based scheduling objectives
○ Bin packing, etc.
● Plugin-based constraint evaluators
○ Resource affinity, task locality, etc.
● Scheduling actions visibility
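The plugin idea behind Fenzo (hard constraints filter hosts, fitness functions rank them) can be sketched in a few lines. This is loosely modeled on Fenzo's split between constraint evaluators and fitness calculators but is not its Java API; all names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Host:
    name: str
    cpus_total: float
    cpus_used: float
    instance_type: str


@dataclass
class Task:
    name: str
    cpus: float
    instance_type: Optional[str] = None  # optional hard constraint


Constraint = Callable[[Task, Host], bool]  # plugin: host must pass
Fitness = Callable[[Task, Host], float]    # plugin: higher is better


def fits_resources(task: Task, host: Host) -> bool:
    return host.cpus_total - host.cpus_used >= task.cpus


def matches_instance_type(task: Task, host: Host) -> bool:
    return task.instance_type is None or task.instance_type == host.instance_type


def cpu_bin_packing(task: Task, host: Host) -> float:
    # Prefer the host that would be fullest after placement, leaving other
    # hosts idle so cluster autoscaling can remove them.
    return (host.cpus_used + task.cpus) / host.cpus_total


def assign(task: Task, hosts: List[Host], constraints: List[Constraint],
           fitness: Fitness) -> Optional[Host]:
    candidates = [h for h in hosts if all(c(task, h) for c in constraints)]
    if not candidates:
        return None
    best = max(candidates, key=lambda h: fitness(task, h))
    best.cpus_used += task.cpus
    return best


hosts = [Host("h1", 16, 12, "m4.4xlarge"), Host("h2", 16, 2, "m4.4xlarge"),
         Host("g1", 8, 0, "g2.2xlarge")]
rules = [fits_resources, matches_instance_type]
```

Swapping `cpu_bin_packing` for a spreading fitness, or adding an AZ-balancing constraint, changes the policy without touching `assign`, which is the extensibility the slide describes.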
Current Container Usage - Service
● Still small: ~2,000 long-running containers
● NodeJS device UI script apps
● Stream Processing Jobs - Flink
● Various Internal Dashboards
Questions?
Andrew Spyker (@aspyker) - Engineering Manager