Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is doing Continuous Delivery

52,285 views

The Flowcon keynote was given a few days before CMG; this version has a few tweaks and some extra content added at the start and end. It was the opening keynote talk at both conferences, on how Speed Wins and how Netflix is doing Continuous Delivery.

Published in: Technology

  1. 1. Now Playing on Netflix: Adventures in a Cloudy Future CMG November 2013 Adrian Cockcroft @adrianco @NetflixOSS http://www.linkedin.com/in/adriancockcroft
  2. 2. Netflix Member Web Site Home Page Personalization Driven – How Does It Work?
  3. 3. How Netflix Used to Work Consumer Electronics Oracle Monolithic Web App AWS Cloud Services MySQL CDN Edge Locations Oracle Datacenter Customer Device (PC, PS3, TV…) Monolithic Streaming App MySQL Content Management Limelight/Level 3 Akamai CDNs Content Encoding
  4. 4. How Netflix Streaming Works Today Consumer Electronics User Data Web Site or Discovery API AWS Cloud Services Personalization CDN Edge Locations DRM Datacenter Customer Device (PC, PS3, TV…) Streaming API QoS Logging OpenConnect CDN Boxes CDN Management and Steering Content Encoding
  5. 5. Nov 2012 Streaming Bandwidth March 2013 Mean Bandwidth +39% 6mo
  6. 6. Netflix Scale
     • Tens of thousands of instances on AWS
       – Typically 4 core, 30 GB, Java business logic
       – Thousands created/removed every day
     • Thousands of Cassandra NoSQL storage nodes
       – Mostly 8 core, 60 GB, 2 TB of SSD
       – 65 different clusters, over 300 TB data, triple zone
       – Over 40 are multi-region clusters (6, 9 or 12 zone)
       – Biggest 288 nodes, 300K rps, 1.3M wps
  7. 7. Reactions over time
     2009 “You guys are crazy! Can’t believe it”
     2010 “What Netflix is doing won’t work”
     2011 “It only works for ‘Unicorns’ like Netflix”
     2012 “We’d like to do that but can’t”
     2013 “We’re on our way using Netflix OSS code”
  8. 8. "This is the IT swamp draining manual for anyone who is neck deep in alligators." Adrian Cockcroft, Cloud Architect at Netflix
  9. 9. Web-scale Cloud Commodity ClientServer Mainframe
  10. 10. Goal of Traditional IT: Reliable hardware running stable software
  11. 11. SCALE Breaks hardware
  12. 12. ….SPEED Breaks software
  13. 13. SPEED at SCALE Breaks everything
  14. 14. Incidents – Impact and Mitigation
     • PR (Public Relations, Media Impact): X incidents; Y incidents mitigated by Active Active, game day practicing
     • CS (High Customer Service Calls): XX incidents; YY incidents mitigated by better tools and practices
     • Metrics impact – Feature disable (Affects AB Test Results): XXX incidents; YYY incidents mitigated by better data tagging
     • No Impact – fast retry or automated failover: XXXX incidents
  15. 15. Web Scale Architecture AWS Route53 DynECT DNS UltraDNS DNS Automation Regional Load Balancers Regional Load Balancers Zone A Zone B Zone C Zone A Zone B Zone C Cassandra Replicas Cassandra Replicas Cassandra Replicas Cassandra Replicas Cassandra Replicas Cassandra Replicas
  16. 16. CIO Says Speed IT Up!
  17. 17. “Get inside your adversaries' OODA loop to disorient them” Colonel Boyd, USAF
  18. 18. Land grab opportunity Engage customers Deliver Measure Customers Act Competitive Move Observe Colonel Boyd, USAF “Get inside your adversaries' OODA loop to disorient them” Customer Pain Point Analysis Orient Model Hypotheses Implement Decide Commit Resources Plan Response Get Buy-in
  19. 19. Territory Expansion Print Ad Campaign Upgrade Mainframe Measure Revenue Act Foreign Competition Observe Mainframe Era - 1 year cycle Customer Pain Point Systems Analysis Orient Capacity Model Customize Vendor SW Decide Vendor Evaluation 5 year Plan Board Level Buyin
  20. 20. 80’s Mainframe Innovation Cycle
     • Cost $1M to $100M
     • Duration 1 to 5 years
     • Bet the whole company
     • Cost of failure – bankrupt or bought
     • Cobol and DB2 on MVS
  21. 21. Territory Expansion TV Advert Campaign Install Servers Measure Revenue Act Foreign Competition Observe Client/Server Era – 3 month cycle Customer Pain Point Data Warehouse Orient Capacity Estimate Customize Vendor SW Decide Vendor Evaluation 1 year Plan CIO Level Buy-in
  22. 22. 90’s Client Server Innovation Cycle
     • Cost $100K to $10M
     • Duration 3 – 12 months
     • Bet a product line or division
     • Cost of failure – revenue hit, CIO’s job
     • C++ and Oracle on Solaris
  23. 23. Territory Expansion Web Display Ads Measure Sales Install Capacity Act Competitive Moves Observe Commodity Era – 2 week agile train Customer Pain Point Data Warehouse Orient Capacity Estimate Code Feature Decide Feature Priority 2 Week Plan Business Buy-in
  24. 24. 00’s Commodity Agile Innovation Cycle
     • Cost $10K to $1M
     • Duration 2 – 12 weeks
     • Bet a product feature
     • Cost of failure – product mgr reputation
     • Java and MySQL on RedHat Linux
  25. 25. Train Model Process Hand-Off Steps Product Manager Developer QA Integration Team Operations Deploy Team BI Analytics Team
  26. 26. What Happened? Rate of change increased Cost and size and risk of change reduced
  27. 27. Cloud Native Construct a highly agile and highly available service from ephemeral and assumed broken components
  28. 28. Real Web Server Dependencies Flow (Netflix Home page business transaction as seen by AppDynamics) Each icon is three to a few hundred instances across three AWS zones Cassandra memcached Start Here Personalization movie group choosers (for US, Canada and Latam) Web service S3 bucket
  29. 29. Continuous Deployment No time for handoff to IT
  30. 30. Developer Self Service Freedom and Responsibility
  31. 31. Developers run what they wrote Root access and pagerduty
  32. 32. IT is a Cloud API DEVops automation
  33. 33. Github all the things! Leverage social coding
  34. 34. Putting it all together…
  35. 35. Land grab opportunity Launch AB Test Automatic Deploy Measure Customers Act Competitive Move Observe Continuous Delivery on Cloud Customer Pain Point Analysis Orient Model Hypotheses Increment Implement Decide Plan Response Share Plans JFDI
  36. 36. Continuous Innovation Cycle
     • Cost near zero, variable expense
     • Duration hours to days
     • Bet a decoupled microservice code push
     • Cost of failure – near zero, instant rollback
     • Clojure/Scala/Python on NoSQL on Cloud
  37. 37. Continuous Deploy Hand-Off Steps
     • Product Manager: A/B test setup and enable; Self service hypothesis test results
     • Developer: Automated test; Self service deploy, on call; Self service analytics
  38. 38. Continuous Deploy Automation
     • Check in code, Jenkins build
     • Bake AMI, launch in test env
     • Functional and performance test
     • Production canary test
     • Production red/black push
  39. 39. Bad Canary Signature
  40. 40. Happy Canary Signature
  41. 41. Global Deploy Automation Afternoon in California Night-time in Europe If passes test suite, canary then deploy West Coast Load Balancers East Coast Load Balancers Europe Load Balancers Zone A Zone B Zone C Zone A Zone B Zone C Zone A Zone B Zone C Cassandra Replicas Cassandra Replicas Cassandra Replicas Cassandra Replicas Cassandra Replicas Cassandra Replicas Cassandra Replicas Cassandra Replicas Cassandra Replicas Canary then deploy Next day on West Coast Canary then deploy Next day on East Coast After peak in Europe
  42. 42. Ephemeral Instances
     • Largest services are autoscaled
     • Average lifetime of an instance is 36 hours
     Autoscale Up, Autoscale Down, Push
  43. 43. (New Today!) Predictive Autoscaling
     24 Hours predicted traffic vs. actual
     • More morning load
     • Sat/Sun high traffic
     • Lower load on Weds
     Prediction driving AWS Autoscaler to plan capacity
  44. 44. Inspiration
  45. 45. Takeaway
     Speed Wins
     Assume Broken
     Cloud Native Automation
     Github is your “app store” and resumé
     @adrianco @NetflixOSS http://netflix.github.com
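
The stage order on slide 38 (build, bake, test, canary, red/black push) can be illustrated with a small gate-by-gate sketch. This is not Netflix's tooling: the `Stage` class, `canary_is_happy`, `build_stages`, and the 10% error-rate tolerance are all hypothetical names and values chosen for illustration, and the slides do not say which metrics make up a happy or bad canary signature.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    name: str
    run: Callable[[], bool]  # returns True when the stage passes


def canary_is_happy(baseline_errors: float, canary_errors: float,
                    tolerance: float = 0.10) -> bool:
    # Hypothetical rule: the canary passes if its error rate is no more
    # than `tolerance` (10% by default) above the baseline error rate.
    return canary_errors <= baseline_errors * (1.0 + tolerance)


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    # Run stages in order and stop at the first failure, so a bad canary
    # never reaches the production push.
    passed: List[str] = []
    for stage in stages:
        if not stage.run():
            return False, passed
        passed.append(stage.name)
    return True, passed


def build_stages(baseline_errors: float, canary_errors: float) -> List[Stage]:
    # Stage names follow slide 38; every stage except the canary is
    # stubbed out to always pass (assumption for the sketch).
    return [
        Stage("check in code, Jenkins build", lambda: True),
        Stage("bake AMI, launch in test env", lambda: True),
        Stage("functional and performance test", lambda: True),
        Stage("production canary test",
              lambda: canary_is_happy(baseline_errors, canary_errors)),
        Stage("production red/black push", lambda: True),
    ]


if __name__ == "__main__":
    ok, passed = run_pipeline(build_stages(0.010, 0.012))
    print(ok, passed)
```

With a baseline error rate of 1.0% and a canary at 1.2%, the canary stage fails and the run stops after the functional and performance test; a canary at 1.05% would let all five stages pass.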
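
Slide 43 says a traffic prediction drives the AWS Autoscaler to plan capacity, but it does not describe the prediction method. The sketch below substitutes a deliberately naive stand-in, averaging the same hour across past days, and then converts predicted load into an instance count with some headroom. The function names, the 20% headroom, and the requests-per-instance figure are assumptions for illustration only.

```python
import math
from typing import List


def predict_hourly(history: List[List[float]]) -> List[float]:
    # Naive stand-in predictor (assumption): the forecast for each hour is
    # the mean of that hour's requests/sec over all past days in `history`.
    hours = len(history[0])
    return [sum(day[h] for day in history) / len(history)
            for h in range(hours)]


def plan_capacity(predicted_rps: List[float], rps_per_instance: float,
                  headroom: float = 0.2, min_instances: int = 1) -> List[int]:
    # Instances needed per hour: predicted load plus headroom, rounded up,
    # never below `min_instances`.
    return [max(min_instances,
                math.ceil(rps * (1.0 + headroom) / rps_per_instance))
            for rps in predicted_rps]


if __name__ == "__main__":
    # Two past days, two hours each (hypothetical numbers).
    history = [[100.0, 200.0], [300.0, 400.0]]
    forecast = predict_hourly(history)
    print(forecast, plan_capacity(forecast, rps_per_instance=100.0))
```

A planned count per hour like this could be fed to a scheduled scaling mechanism ahead of the load, instead of reacting only after traffic has already risen.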
