2. Operations at Web Scale is the ability to consistently create and deploy reliable software to an unreliable platform that scales horizontally.
http://radar.oreilly.com/2007/10/operations-is-a-competitive-ad.html
9. This man is John Allspaw
http://www.flickr.com/photos/norby/7446208116/
10. This is Allspaw’s Monster ("RAAAWR!!! I’m SCARY!")
Image Courtesy of John Allspaw - http://www.slideshare.net/jallspaw/ops-metametrics-the-currency-you-pay-for-change
11-13. Slides Courtesy of John Allspaw - http://www.slideshare.net/jallspaw/ops-metametrics-the-currency-you-pay-for-change
14. I can haz cuddle?
Images Courtesy of John Allspaw - http://www.slideshare.net/jallspaw/ops-metametrics-the-currency-you-pay-for-change
20. Tools* are not enough
(* even really great tools like Chef!)
21. Conway’s law:
“Organizations which design systems ...
are constrained to produce designs
which are copies of the communication
structures of these organizations...”
24. Choose:
Discourage change in the interests of stability
or
Allow change to happen as often as it needs to
Slide Courtesy of John Allspaw - http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr
25. Common Attributes of Web Scale Cultures
Infrastructure as Code
‣ Full Stack Automation
‣ Commodity Hardware
‣ Reliability in software stack
‣ Datacenter APIs
‣ Core Infra Services
‣ Infrastructure as Product
‣ App as Customer
Application as Services
‣ Service Orientation
‣ Versioned APIs
‣ Software Resiliency (Design for Failure)
‣ Database/Storage Abstraction
‣ Complexity pushed up the stack
‣ Deep Instrumentation
Dev / Ops as Teams
‣ Shared Metrics / Monitoring
‣ Incident Management
‣ Service Owners On-call
‣ Tight integration
‣ Continuous Integration
‣ Continuous Deployment
‣ SRE/SRO
‣ GameDay
26. This isn’t new
‣ Theory of Constraints
‣ Lean / JIT
‣ Six Sigma
‣ Toyota Production System
‣ Agile
‣ etc...
27. ...but we can’t do it that way because...
elephants cannot fly just by flapping their ears harder...
http://www.flickr.com/photos/garymacfadyen/6860004327/
28. To fly you must have wings, surface area, and a high power-to-weight ratio...
elephants cannot fly by flapping their ears harder...
http://www.flickr.com/photos/lhirlimann/4872199920/
30. Common Attributes of Web Scale Cultures
Infrastructure as Code
‣ Full Stack Automation
‣ Commodity Hardware
‣ Reliability in software stack
‣ Datacenter APIs
‣ Core Infra Services
‣ Infrastructure as Product
‣ App as Customer
Application as Services
‣ Service Orientation
‣ Versioned APIs
‣ Software Resiliency (Design for Failure)
‣ Database/Storage Abstraction
‣ Complexity pushed up the stack
‣ Deep Instrumentation
Dev / Ops as Teams
‣ Shared Metrics / Monitoring
‣ Incident Management
‣ Service Owners On-call
‣ Tight integration
‣ Continuous Integration & Deployment
‣ SRE/SRO
‣ GameDay
31. Infrastructure as Code:
Enable the reconstruction of the business from nothing but a source code repository, an application data backup, and bare resources.
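The definition above implies that servers are described as desired state kept in the repository and then converged onto bare resources. A minimal Python sketch of that idea, assuming a toy `Node` that only tracks packages and files; the names `Node`, `DESIRED`, and `converge` are hypothetical and this is not Chef's API. The key property is idempotence: converging an already-correct node changes nothing.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a bare server: installed packages and file contents."""
    packages: set = field(default_factory=set)
    files: dict = field(default_factory=dict)


# Desired state expressed as data, the kind of thing kept in source control.
# The package names and config content are illustrative assumptions.
DESIRED = {
    "packages": ["nginx", "memcached"],
    "files": {"/etc/nginx/nginx.conf": "worker_processes 4;\n"},
}


def converge(node: Node, desired: dict) -> list:
    """Make only the changes needed to reach the desired state; return the actions taken."""
    actions = []
    for pkg in desired["packages"]:
        if pkg not in node.packages:
            node.packages.add(pkg)  # stands in for a real package install
            actions.append(f"install {pkg}")
    for path, content in desired["files"].items():
        if node.files.get(path) != content:
            node.files[path] = content  # stands in for writing the file to disk
            actions.append(f"write {path}")
    return actions


if __name__ == "__main__":
    server = Node()  # "bare resources"
    print(converge(server, DESIRED))  # first run installs packages and writes the file
    print(converge(server, DESIRED))  # second run: [] (nothing left to change)
```

Because the desired state is plain data under version control, rebuilding a lost server is the same operation as configuring a new one.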
33. Managing Complexity Then
To Add a New Server…
Add 1 server (tiers: Frontend Web Servers, Application Servers, Database) → 20+ Changes
• 2x Web Server Configurations
• 2 Web Server Restarts
• 4x Database Configurations
• 8x Firewall Configurations
• DNS Service
• Network Configuration
• Deployer
• 8x Monitoring Changes
The Bottom Line…
20+ Changes
12+ New Infrastructure Dependencies
4+ Hours
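The per-server list above can be tallied to check the "Bottom Line". A small Python sketch, assuming "DNS Service", "Network Configuration", and "Deployer" each count as one change and that every added server needs the same set of changes; the dictionary and the `manual_changes` helper are hypothetical.

```python
# Change counts for adding one server, taken from the slide's list.
# Assumption: "DNS Service", "Network Configuration" and "Deployer" count as one change each.
CHANGES_PER_NEW_SERVER = {
    "web server configurations": 2,
    "web server restarts": 2,
    "database configurations": 4,
    "firewall configurations": 8,
    "DNS service": 1,
    "network configuration": 1,
    "deployer": 1,
    "monitoring changes": 8,
}


def manual_changes(servers_added: int) -> int:
    """Manual changes needed, assuming each new server costs the same fixed set of changes."""
    return servers_added * sum(CHANGES_PER_NEW_SERVER.values())


if __name__ == "__main__":
    print(manual_changes(1))   # 27, consistent with "20+ Changes"
    print(manual_changes(10))  # 270
```

This linear model is optimistic: every additional tier (load balancers, caches, search appliances) brings its own configuration changes per server.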
34. Managing Complexity Later
We added:
• Load Balancers
• MemCache
• Search Appliances
• Lots of VMs
• More Scale
Exponential Increase In:
• Configuration Changes
• Infrastructure Dependencies
• Skills Needed
• Greater Risk
35. Managing Complexity Today
How Do We Manage This at Cloud Scale?
• Thousands of infrastructure dependencies and configurations needed for each change
• Huge Amounts of Time
• Increased Cost of Correcting Manual Errors
• Huge Need for Talent
• Risk of Critical Skills Shortage
37. How you get to Continuous Deployment
[Diagram: stages on the path to continuous deployment, as laid out top to bottom]
‣ Full Infrastructure Automation
‣ Continuous Application Deployment
‣ Configuration Management
‣ Common Management Automation Tasks: Scripts, OS Compliance, Updates & Patches
‣ Discovery and Visibility
38. Continuous Deployment: Version Control
Keep every relevant artifact in version control.
‣ Infrastructure
‣ Operations
‣ Applications
‣ Tests
‣ Documentation
39. Continuous Deployment: Code Review
Review your code before deployment.
‣ Gate what gets pushed
‣ Code reviews
‣ Partial pre-testing with continuous integration
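Gating what gets pushed can be expressed as a single check that the deploy step runs before anything ships. A minimal Python sketch, assuming each change record carries a review-approval count and a CI result; the `Change` type, its fields, `deploy_blockers`, and the one-approval default are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """Hypothetical record of a proposed change."""
    sha: str
    approvals: int   # number of reviewers who approved the change
    ci_passed: bool  # result of the continuous-integration run


def deploy_blockers(change: Change, required_approvals: int = 1) -> list:
    """Return the reasons this change may not be deployed; empty means it passes the gate."""
    reasons = []
    if change.approvals < required_approvals:
        reasons.append(f"needs {required_approvals - change.approvals} more approval(s)")
    if not change.ci_passed:
        reasons.append("CI failed")
    return reasons


if __name__ == "__main__":
    print(deploy_blockers(Change("a1b2c3", approvals=1, ci_passed=True)))  # []
    print(deploy_blockers(Change("d4e5f6", approvals=0, ci_passed=False)))
```

Returning reasons rather than a bare boolean lets the pipeline tell the author why a push was held back.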
43. Step 2 – Introducing the Build Stage
Application Devs and Infrastructure Devs → Software Configuration Management (SCM)
Build: pulling changes in SCM triggers builds and tests
Tag: Payload 1, Payload 2, Payload 3 … Payload N
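The build stage above can be modeled as a handler that runs whenever new changes are pulled from SCM. A minimal Python sketch, assuming payloads are numbered sequentially and that only builds whose tests pass receive a payload tag (the slide does not say what happens to failing builds); `BuildStage`, `on_pull`, and the `payload-N` tag format are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class BuildStage:
    """Hypothetical build stage: builds and tests each pulled commit, then tags it."""
    run_tests: Callable[[str], bool]  # stands in for the real build-and-test job
    payloads: List[Tuple[str, str]] = field(default_factory=list)  # (tag, commit)

    def on_pull(self, commit: str) -> Optional[str]:
        """Build and test a commit; tag it as the next payload if the tests pass."""
        if not self.run_tests(commit):
            return None  # assumption: failing builds are not tagged
        tag = f"payload-{len(self.payloads) + 1}"
        self.payloads.append((tag, commit))
        return tag


if __name__ == "__main__":
    stage = BuildStage(run_tests=lambda commit: not commit.startswith("broken"))
    for commit in ["3f2a91c", "broken-7c1d", "9e0b44a"]:
        print(commit, "->", stage.on_pull(commit))
```

Numbering only successful builds keeps every payload tag deployable, so later stages never have to filter out broken artifacts.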
44. Step 3 – Introducing Chef Server and the CD Process
Application Devs and Infrastructure Devs → Software Configuration Management (SCM)
Build: pulling the latest codebase triggers a build; builds are tagged Payload 1, 2, … N
Create Data (#) → Upload Cookbook to Chef Server → Update DEV (Autodeploy)
Request Portal: Promote DEV → QA → PROD (Autodeploy)
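The promotion path can be sketched as each environment pinning one payload, with "Promote" copying that pin forward one step. A minimal Python sketch, assuming strict DEV → QA → PROD ordering; the `Pipeline` class and its methods are hypothetical. Chef environments can constrain cookbook versions, which plays a similar role, but this code does not use Chef.

```python
ENVIRONMENTS = ["DEV", "QA", "PROD"]  # promotion order shown on the slide


class Pipeline:
    """Hypothetical model: each environment runs exactly one pinned payload."""

    def __init__(self) -> None:
        self.pinned = {env: None for env in ENVIRONMENTS}

    def autodeploy_dev(self, payload: str) -> None:
        """Every new build payload goes straight to DEV."""
        self.pinned["DEV"] = payload

    def promote(self, source: str, target: str) -> str:
        """Copy the payload pinned in `source` to the next environment, `target`."""
        if ENVIRONMENTS.index(target) != ENVIRONMENTS.index(source) + 1:
            raise ValueError(f"cannot promote {source} -> {target}: not the next stage")
        payload = self.pinned[source]
        if payload is None:
            raise ValueError(f"nothing deployed in {source} to promote")
        self.pinned[target] = payload  # stands in for the autodeploy to `target`
        return payload


if __name__ == "__main__":
    p = Pipeline()
    p.autodeploy_dev("payload-2")
    p.promote("DEV", "QA")
    p.autodeploy_dev("payload-3")  # DEV moves ahead while QA still holds payload-2
    p.promote("QA", "PROD")
    print(p.pinned)
```

Keeping one pinned payload per environment also makes rollback simple: re-pin an earlier payload and let autodeploy converge the environment.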
45. Common Attributes of Web Scale Cultures