2012 re:Invent Netflix: embracing the cloud final

  1. Netflix: Embracing the Cloud. Neil Hunt, CPO / Yury Izrailevsky, VP Engineering
  2. Netflix – Service Unavailable – Database Crashed. Rest assured that the right people are losing sleep to fix this problem! We expect to resume service in approximately 72h. 12 Aug 2008 03:12am
  3. Availability: 4 x nines. Scale: Unconstrained horizontal scaling. Performance: Unlimited compute.
  4. • Experimented with both • Ended up with NoSQL for almost everything important
  5. Transitional Infrastructure: “Roman Riding”
  6. Phase | Components | Data & Prerequisites
     Trial (2009) | Streaming Player | Content keys (RO); Membership status (RO)
     Development (2010-11) | Member product pages and APIs & recs algorithms | Content catalog (RW); Personalization data (RW); AB Test data (RW)
     Followthrough (2011-12) | Account and membership | Membership data (RW)
     Final (2013) | Payments | PCI and SOX data
  7. Availability: 4 x nines. Scale: Unconstrained horizontal scaling. Performance: Unlimited compute.
  8. Scalability Performance Availability
  9. Scalability Performance Availability
  10. Scaling Netflix Streaming Service: Weekly Streaming Starts (chart, 1/4/2009 to 8/4/2012)
  11. Netflix Cross-Regional Cloud Architecture
  12. Goal: Regional Failover
  13. Building Global Netflix Streaming Product
  14. Scalability Performance Availability
  15. Weekly Cloud Cost Per Streaming Start (last 12 months)
  16. Simian Army: Cloud Efficiency Automation
      • Janitor Monkey: regularly scrape unused capacity; clean up instances, ASGs, ELBs, SGs, etc.
      • Efficiency Monkey: AI-based resource under-usage detection (CPU, memory, etc.)
      • Automated Deletion of Old Data: TTL for S3 (using ObjectExpiration)
  17. Cyclical Streaming Usage Pattern
  18. Load-Based Auto Scaling: 50%+ Cost Saving. Scale up/down by 70%+. Move to Load-Based Scaling.
  19. Scalability Performance Availability
  20. A Truly Great Service… Has To Just Work! Availability Goal: 99.99% (30 secs/week at peak traffic)
  21. Using Redundancy in AWS Infrastructure to Survive Failures. Historical Streaming Availability (13wkMA) (chart, 7/17/2011 to 11/11/2012; annotations: Other AWS Outages; AWS / Netflix Outage, June 29th, 2012)
  22. Cascading Failures: API, Instant Queue, SimpleDB
  23. Netflix Cloud Architecture
  24. Cascading Failures: 99% Availability × 99% Availability × … × 99% Availability; 99%^300 = 4.90%
  25. Strategies to Improve Availability: Graceful Degradation, Redundancy
  26. Graceful Degradation
  27. Redundancy
      • Redundancy Across Availability Zones: Cassandra in Zone A, Zone B, Zone C
      • Redundancy Across Regions, Vendors: S3 Backup; Secure Cloud Backup Storage
  28. Testing Fault Tolerance: Simian Army. Chaos Monkey, Latency Monkey, Chaos Gorilla
  29. Open Source Portal at http://netflix.github.com
  30. Superstorm Sandy: AWS Infrastructure Held Up. >2x Netflix Streaming Usage in East Coast Markets: Boston, New York, Philadelphia, Baltimore, D.C.
  31. Focus on Building a Great Streaming Product
  32. Netflix at 2012 re:Invent
      Date/Time | Presenter | Topic
      Wed 8:30-10:00 | Reed Hastings | Keynote with Andy Jassy
      Wed 1:00-1:45 | Coburn Watson | Optimizing Costs with AWS
      Wed 2:05-2:55 | Kevin McEntee | Netflix’s Transcoding Transformation
      Wed 3:25-4:15 | Neil Hunt / Yury I. | Netflix: Embracing the Cloud
      Wed 4:30-5:20 | Adrian Cockcroft | High Availability Architecture at Netflix
      Thu 10:30-11:20 | Jeremy Edberg | Rainmakers – Operating Clouds
      Thu 11:35-12:25 | Kurt Brown | Data Science with Elastic Map Reduce (EMR)
      Thu 11:35-12:25 | Jason Chan | Security Panel: Learn from CISOs working with AWS
      Thu 3:00-3:50 | Adrian Cockcroft | Compute & Networking Masters Customer Panel
      Thu 3:00-3:50 | Ruslan M./Gregg U. | Optimizing Your Cassandra Database on AWS
      Thu 4:05-4:55 | Ariel Tseitlin | Intro to Chaos Monkey and the Simian Army
  33. We are sincerely eager to hear your feedback on this presentation and on re:Invent. Please fill out an evaluation form when you have a chance.
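
The Cascading Failures arithmetic on slide 24 (99% availability compounded across 300 dependencies, giving 4.90%) can be reproduced with a short sketch. It assumes every dependency fails independently and that a request succeeds only if all of them do; the function name `serial_availability` is illustrative and does not come from the deck.

```python
def serial_availability(per_service: float, count: int) -> float:
    """Availability of a request that needs `count` independent services,
    each of which is available with probability `per_service`.

    Assumption (not stated in the slides): failures are independent.
    """
    if not 0.0 <= per_service <= 1.0:
        raise ValueError("per_service must be between 0 and 1")
    if count < 0:
        raise ValueError("count must be non-negative")
    return per_service ** count


combined = serial_availability(0.99, 300)
print(f"{combined:.2%}")  # prints 4.90%
```

Under these assumptions, adding more dependencies in series pushes combined availability toward zero, which is the motivation the deck gives for graceful degradation and redundancy.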
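
Slides 25 and 26 name graceful degradation as a strategy but the transcript does not show how it is implemented. The following is a minimal sketch of the general idea only: when a dependency call fails, return a reduced default instead of propagating the error. The names `with_fallback` and `fetch_instant_queue` are hypothetical and are not Netflix APIs.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def with_fallback(primary: Callable[[], T], fallback: T) -> T:
    """Return primary() if it succeeds, otherwise the fallback value."""
    try:
        return primary()
    except Exception:
        # Degrade instead of letting the failure cascade to the caller.
        return fallback


def fetch_instant_queue() -> List[str]:
    # Hypothetical dependency call, hard-coded to fail for illustration.
    raise ConnectionError("queue store unavailable")


queue = with_fallback(fetch_instant_queue, fallback=[])
print(queue)  # prints []
```

In this sketch the caller still gets a usable (empty) result, so a page could render without the failing component rather than failing entirely.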
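
Slide 18 describes a move to load-based auto scaling with capacity scaling up and down by 70%+. The deck does not give the scaling rule, so the sketch below is only one plausible form: size the fleet from current load and a per-instance capacity, clamped to fixed bounds. The function name, parameters, and numbers are all assumptions made for illustration.

```python
import math


def desired_instances(
    requests_per_sec: float,
    capacity_per_instance: float,
    min_instances: int = 1,
    max_instances: int = 100,
) -> int:
    """Instance count needed to serve the current load, within bounds.

    Hypothetical rule: ceil(load / per-instance capacity), clamped.
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    needed = math.ceil(requests_per_sec / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))


peak = desired_instances(10_000, 100)   # evening peak (made-up load)
trough = desired_instances(3_000, 100)  # overnight trough (made-up load)
print(peak, trough)  # prints 100 30
```

With these made-up loads the fleet shrinks by 70% between peak and trough, matching the scale of the swing the slide mentions for a cyclical usage pattern.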
