Netflix: Embracing the Cloud
Neil Hunt, CPO / Yury Izrailevsky, VP Engineering
Netflix – Service Unavailable – Database Crashed

Rest assured that the right people
are losing sleep to fix this problem!

We expect to resume service in approximately 72h


12 Aug 2008 03:12am
Availability: 4 x nines
Scale: Unconstrained horizontal scaling
Performance: Unlimited compute
• Experimented with both
• Ended up with NoSQL for almost everything important
Transitional Infrastructure: “Roman Riding”
Phase                    Components                      Data & Prerequisites
Trial (2009)             Streaming Player                Content keys (RO); Membership status (RO)
Development (2010-11)    Member product pages and APIs   Content catalog (RW); Personalization data (RW) & recs algorithms; AB Test data (RW)
Followthrough (2011-12)  Account and membership          Membership data (RW)
Final (2013)             Payments                        PCI and SOX data
Scalability   Performance   Availability
Scaling Netflix Streaming Service: Weekly Streaming Starts
(Chart: x-axis runs from 1/4/2009 to 8/4/2012)
Netflix Cross-Regional Cloud Architecture
Goal: Regional Failover
Building Global Netflix Streaming Product
Scalability   Performance   Availability
Weekly Cloud Cost Per Streaming Start (last 12 months)
Simian Army: Cloud Efficiency Automation
• Janitor Monkey
  - Regularly scrape unused capacity
  - Clean up instances, ASGs, ELBs, SGs, etc.
• Efficiency Monkey
  - AI-based resource under-usage detection (CPU, memory, etc.)
• Automated Deletion of Old Data
  - TTL for S3 (using ObjectExpiration)
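The "TTL for S3" item can be illustrated with a lifecycle rule. This is a minimal sketch, not Netflix's configuration: the prefix, the 30-day value, the rule ID, and the bucket name in the comment are all hypothetical. The dictionary follows the shape that boto3's `put_bucket_lifecycle_configuration` accepts.

```python
def build_expiration_config(prefix: str, days: int, rule_id: str = "expire-old-data") -> dict:
    """Build an S3 lifecycle configuration that expires objects under a prefix.

    All argument values are illustrative assumptions, not taken from the talk.
    """
    if days <= 0:
        raise ValueError("days must be a positive integer")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # S3 deletes matching objects once they are this many days old.
                "Expiration": {"Days": days},
            }
        ]
    }


config = build_expiration_config("logs/", 30)

# Applying it would look roughly like this (assumes boto3 is installed and
# credentials are configured; "example-bucket" is a placeholder):
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket", LifecycleConfiguration=config)
```

Keeping the rule as plain data makes it easy to review and version before it is applied to a bucket.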
Cyclical Streaming Usage Pattern
Load-Based Auto Scaling
• Move to Load-Based Scaling
• Scale up/down by 70%+
• 50%+ Cost Saving
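A rough sketch of the idea behind load-based scaling: size the fleet from the current request rate instead of provisioning for peak all day. The function name, the per-instance capacity, the headroom, and the traffic numbers are hypothetical and chosen only for illustration. They are not figures from the talk.

```python
def desired_instances(current_rps: int, rps_per_instance: int,
                      min_instances: int, max_instances: int,
                      headroom_pct: int = 20) -> int:
    """Return an instance count sized to current load plus a safety headroom.

    Integer arithmetic is used throughout so the ceiling division is exact.
    """
    if rps_per_instance <= 0:
        raise ValueError("rps_per_instance must be positive")
    numerator = current_rps * (100 + headroom_pct)
    denominator = 100 * rps_per_instance
    needed = -(-numerator // denominator)  # ceiling division
    # Clamp to the configured floor and ceiling of the group.
    return max(min_instances, min(max_instances, needed))


# Hypothetical daily cycle: evening peak vs. overnight trough.
peak = desired_instances(10_000, 100, min_instances=10, max_instances=500)
trough = desired_instances(2_500, 100, min_instances=10, max_instances=500)
reduction_pct = 100 * (peak - trough) // peak
```

With these made-up inputs the fleet shrinks from 120 to 30 instances between peak and trough, which shows how a cyclical usage pattern can translate into large swings in capacity.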
Scalability   Performance   Availability
A Truly Great Service… Has To Just Work!
Availability Goal: 99.99%
(30 secs/week at peak traffic)
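For reference, an availability target converts to a downtime budget by multiplying the allowed failure fraction by the length of the window. The sketch below computes only the wall-clock budget for a full seven-day week. The slide states its 30-second figure for peak traffic, and since the source does not say how that peak window is defined, the code does not attempt to reproduce it.

```python
WEEK_SECONDS = 7 * 24 * 3600  # 604,800 seconds in a week


def downtime_budget_seconds(availability: float, window_seconds: float) -> float:
    """Seconds of allowed unavailability in a window for a given availability."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * window_seconds


weekly_budget = downtime_budget_seconds(0.9999, WEEK_SECONDS)
print(f"99.99% over a full week allows about {weekly_budget:.1f} s of downtime")
```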
Using Redundancy in AWS Infrastructure to Survive Failures
(Chart: Historical Streaming Availability (13wkMA), weekly from 7/17/2011 to 11/11/2012; annotations: "Other AWS Outages" and "AWS / Netflix Outage, June 29th, 2012")
Cascading Failures
(Diagram labels: API, Instant Queue, SimpleDB)
Netflix Cloud Architecture
Cascading Failures
99% Availability × 99% Availability × … × 99% Availability
(99%)^300 = 4.90%
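The arithmetic on this slide can be checked directly. The sketch assumes that every dependency is required for a request to succeed and that failures are independent, which is the simplification the 99%-to-the-300th-power figure implies.

```python
def serial_availability(per_service: float, service_count: int) -> float:
    """Availability of a chain where all services must be up at the same time.

    Assumes independent failures and that every service is a hard dependency.
    """
    if service_count < 0:
        raise ValueError("service_count must be non-negative")
    return per_service ** service_count


combined = serial_availability(0.99, 300)
print(f"300 services at 99% each: {combined:.2%} combined availability")
```

Under these assumptions, individually respectable services compound into a system that is almost never fully healthy, which is the motivation for the strategies on the following slides.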
Strategies to Improve Availability
• Graceful Degradation
• Redundancy
Graceful Degradation
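One common way to implement graceful degradation is to wrap a call to a dependency with a fallback that returns a simpler, still-useful response. The sketch below is a generic illustration of that pattern, not the mechanism described in the talk. The function names and the returned rows are hypothetical.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def with_fallback(primary: Callable[[], T], fallback: Callable[[], T]) -> T:
    """Call primary(); if it raises, return the degraded result from fallback()."""
    try:
        return primary()
    except Exception:
        # A real system would also log the failure and record a metric here.
        return fallback()


def personalized_rows() -> List[str]:
    # Hypothetical dependency that is currently failing.
    raise TimeoutError("personalization service unavailable")


def default_rows() -> List[str]:
    # Hypothetical non-personalized response served when the dependency is down.
    return ["Popular titles"]


rows = with_fallback(personalized_rows, default_rows)
```

The member still gets a usable page when the dependency fails, so one service's outage does not cascade into a full outage.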
Redundancy
• Redundancy Across Availability Zones: Zone A, Zone B, Zone C
• Secure Cloud Backup: Cassandra (A, B, C), S3 Backup
• Storage Redundancy Across Regions, Vendors
Testing Fault Tolerance: Simian Army
• Chaos Monkey
• Latency Monkey
• Chaos Gorilla
Open Source Portal at http://netflix.github.com
Superstorm Sandy
• AWS Infrastructure Held Up
• >2x Netflix Streaming Usage in East Coast Markets: Boston, New York, Philadelphia, Baltimore, D.C.
Focus on Building a Great Streaming Product
Netflix at 2012 re:Invent

Date/Time         Presenter             Topic
Wed 8:30-10:00    Reed Hastings         Keynote with Andy Jassy
Wed 1:00-1:45     Coburn Watson         Optimizing Costs with AWS
Wed 2:05-2:55     Kevin McEntee         Netflix’s Transcoding Transformation
Wed 3:25-4:15     Neil Hunt / Yury I.   Netflix: Embracing the Cloud
Wed 4:30-5:20     Adrian Cockcroft      High Availability Architecture at Netflix
Thu 10:30-11:20   Jeremy Edberg         Rainmakers – Operating Clouds
Thu 11:35-12:25   Kurt Brown            Data Science with Elastic Map Reduce (EMR)
Thu 11:35-12:25   Jason Chan            Security Panel: Learn from CISOs working with AWS
Thu 3:00-3:50     Adrian Cockcroft      Compute & Networking Masters Customer Panel
Thu 3:00-3:50     Ruslan M./Gregg U.    Optimizing Your Cassandra Database on AWS
Thu 4:05-4:55     Ariel Tseitlin        Intro to Chaos Monkey and the Simian Army
We are sincerely eager to hear your feedback on this presentation and on re:Invent.
Please fill out an evaluation form when you have a chance.

Netflix: Embracing the Cloud

  • 1. Netflix: Embracing the Cloud Neil Hunt, CPO / Yury Izrailevsky, VP Engineering
  • 2.
  • 3. Netflix – Service Unavailable – Database Crashed Rest assured that the right people are losing sleep to fix this problem! We expect to resume service in approximately 72h 12 Aug 2008 03:12am
  • 4.
  • 5. Availability 4 x nines Scale Performance Unconstrained Unlimited horizontal scaling compute
  • 6.
  • 7.
  • 8.
  • 9. • Experimented with both • Ended up with NoSQL for almost everything important
  • 17. Phase | Components | Data & Prerequisites
        Trial (2009) | Streaming Player | Content keys (RO); Membership status (RO)
        Development (2010-11) | Member product pages and APIs | Content catalog (RW); Personalization data (RW) & recs algorithms; AB Test data (RW)
        Followthrough (2011-12) | Account and membership | Membership data (RW)
        Final (2013) | Payments | PCI and SOX data
  • 20. Availability: 4 x nines. Scale: unconstrained horizontal scaling. Performance: unlimited compute.
  • 21. Scalability Performance Availability
  • 22. Scalability Performance Availability
  • 23. [Chart] Scaling Netflix Streaming Service: Weekly Streaming Starts (x-axis: monthly ticks, 1/4/2009 to 8/4/2012)
  • 26. Building Global Netflix Streaming Product
  • 27. Scalability Performance Availability
  • 28. Weekly Cloud Cost Per Streaming Start (last 12 months)
  • 29. Simian Army: Cloud Efficiency Automation
      • Janitor Monkey
          • Regularly scrape unused capacity
          • Clean up instances, ASGs, ELBs, SGs, etc.
      • Efficiency Monkey
          • AI-based resource under-usage detection (CPU, memory, etc.)
      • Automated Deletion of Old Data
          • TTL for S3 (using ObjectExpiration)
  • 31. Load-Based Auto Scaling: move to load-based scaling; scale up/down by 70%+; 50%+ cost saving
  • 32. Scalability Performance Availability
  • 33. A Truly Great Service… Has To Just Work! Availability Goal: 99.99% (30 secs/week at peak traffic)
  • 34. [Chart] Historical Streaming Availability (13wkMA), x-axis: weekly ticks, 7/17/2011 to 11/11/2012. Annotations: Other AWS Outages; AWS / Netflix Outage, June 29th, 2012; Using Redundancy in AWS Infrastructure to Survive Failures
  • 35. Cascading Failures: API, Instant Queue, SimpleDB
  • 37. Cascading Failures: 99% Availability x 99% Availability x … x 99% Availability; 99%^300 = 4.90%
  • 38. Strategies to Improve Availability: Graceful Degradation; Redundancy
  • 40. Redundancy
      • Redundancy Across Availability Zones: Cassandra in Zone A, Zone B, Zone C
      • Redundancy Across Regions, Vendors: S3 Backup; Secure Cloud Backup Storage
  • 41. Testing Fault Tolerance: Simian Army (Chaos Monkey, Latency Monkey, Chaos Gorilla)
  • 42. Open Source Portal at http://netflix.github.com
  • 43. Superstorm Sandy: AWS Infrastructure Held Up. >2x Netflix Streaming Usage in East Coast Markets
      • Boston
      • New York
      • Philadelphia
      • Baltimore
      • D.C.
  • 44. Focus on Building a Great Streaming Product
  • 45. Netflix at 2012 re:Invent
        Date/Time | Presenter | Topic
        Wed 8:30-10:00 | Reed Hastings | Keynote with Andy Jassy
        Wed 1:00-1:45 | Coburn Watson | Optimizing Costs with AWS
        Wed 2:05-2:55 | Kevin McEntee | Netflix’s Transcoding Transformation
        Wed 3:25-4:15 | Neil Hunt / Yury I. | Netflix: Embracing the Cloud
        Wed 4:30-5:20 | Adrian Cockcroft | High Availability Architecture at Netflix
        Thu 10:30-11:20 | Jeremy Edberg | Rainmakers – Operating Clouds
        Thu 11:35-12:25 | Kurt Brown | Data Science with Elastic Map Reduce (EMR)
        Thu 11:35-12:25 | Jason Chan | Security Panel: Learn from CISOs working with AWS
        Thu 3:00-3:50 | Adrian Cockcroft | Compute & Networking Masters Customer Panel
        Thu 3:00-3:50 | Ruslan M./Gregg U. | Optimizing Your Cassandra Database on AWS
        Thu 4:05-4:55 | Ariel Tseitlin | Intro to Chaos Monkey and the Simian Army
  • 46. We are sincerely eager to hear your feedback on this presentation and on re:Invent. Please fill out an evaluation form when you have a chance.
  • 47. We are sincerely eager to hear your feedback on this presentation and on re:Invent. Please fill out an evaluation form when you have a chance.
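
The availability arithmetic on slide 37 and the redundancy strategy on slides 38 and 40 can be checked with a short Python sketch. It is only an illustration, not anything from the talk: the function names are hypothetical, failures are assumed to be independent, and the 99% per-zone figure in the second calculation is an assumed value chosen to mirror slide 37.

```python
def serial_availability(per_service: float, n_services: int) -> float:
    """Availability of a request that needs ALL n services to be up.

    Assumes the services fail independently of one another.
    """
    return per_service ** n_services


def redundant_availability(per_replica: float, n_replicas: int) -> float:
    """Availability when ANY ONE of n replicas is enough to serve the request.

    Assumes the replicas (for example, zones) fail independently.
    """
    return 1.0 - (1.0 - per_replica) ** n_replicas


if __name__ == "__main__":
    # Slide 37: 300 dependencies at 99% each (the slide states 4.90%).
    chain = serial_availability(0.99, 300)
    print(f"300 services in series at 99% each: {chain:.2%}")

    # Hypothetical: three zones at 99% each, any one of which can serve.
    zones = redundant_availability(0.99, 3)
    print(f"3 redundant zones at 99% each: {zones:.4%}")
```

The first function shows why a long chain of hard dependencies erodes availability even when each link looks reliable. The second shows, under the independence assumption, why replicating across zones pushes availability back up.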

Editor's Notes

  1. Make clear it’s still tentative, not a committed project – longer term…