Keynote presentation by Damon Edwards, co-founder of Rundeck, at DevOps Days Austin, May 4, 2017.
Deployment is a solved problem. Sure, there is still work to be done, but the DevOps community has successfully proven that anyone can both scale deployment automation and distribute the capability to execute deployments. Now we have to turn our attention to the next critical constraint: What happens after deployment?
We all know that failure is inevitable and could come our way at any moment. How do we respond quickly and effectively to those failures? What works when there is just a small set of teams or an isolated system to manage will quickly break down when the organization grows in size and complexity. On the other hand, what has been commonly practiced in large-scale enterprises is proving to be too cumbersome, too silo-dependent, and simply too slow for today's business needs.
How do we rapidly respond to incidents and recover complex interdependent systems while working within an equally complex and interdependent organization? How do Ops teams embrace the DevOps- and Agile-inspired demand for speed while maintaining quality and control?
This talk examines the trial-and-error lessons learned by some forward-thinking enterprises that are currently streamlining how they:
- Resolve incidents
- Reduce friction between teams
- Divide up operational responsibilities
- Improve the quality of their ongoing operations (and organizational learning)
15. Operations is getting squeezed
Diagram labels: Business, Idea, Dev, Ops, Running Services
Shorter Time-to-Market, Fast Feedback from Users, Improved Quality
Digital and DevOps: "Go faster!" "Open up!"
Security, Compliance, Availability, Auditing: "Be more secure!" "Be more reliable!"
Costs: "Spend less!" "Do more!"
Ops is Difficult
19. Planned + Unplanned = Expensive Context Switching
Gerald Weinberg via Jeff Atwood
https://blog.codinghorror.com/the-multi-tasking-myth/
20. Silos are everywhere
Silo A and Silo B, each with its own Context, Queue, and Work Tasks
! Handoffs
! Feedback
23. Cross-functional teams? Never enough to go around.
Team A, Team B, and Team C, each with its own Context, Queue, and Work Tasks
! Handoffs
! Handoffs
29. Silos + Tool Evolution = Islands of Automation
Puppet, Chef, Shell Scripts, Data ETL, PowerShell Scripts, Network Management, Monitoring, Ansible, Legacy Datacenter Automation, Container Management, SQL Tools, New Tools
31. Again: What makes operations so difficult?
Unplanned and planned work
+ Silos are everywhere
+ It's a complex system2
+ Crushing technical debt
+ Islands of automation
= Not enough time or people!
32. “Self-Service Operations” to improve capacity
"Consumer": Define actions (optional), Execute actions
"Producer": Define actions (or vet actions), Control policies, Build and Scale
Allows definition, execution, and management control to be separated and moved to wherever each makes the most effective use of labor
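The split between defining actions, executing them, and controlling policy can be sketched in a few lines of Python. This is only an illustration of the pattern under stated assumptions: the `Action` and `SelfServicePortal` names, the role-based policy, and the audit log are all hypothetical and do not represent Rundeck's API or any tool named in the talk.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class Action:
    """A vetted operational procedure (hypothetical structure)."""
    name: str
    run: Callable[[], str]
    # Control policy: which roles may execute this action.
    allowed_roles: Set[str] = field(default_factory=set)


class SelfServicePortal:
    """Hypothetical portal that keeps definition, execution, and policy separate."""

    def __init__(self) -> None:
        self._actions: Dict[str, Action] = {}
        self.audit_log: List[Tuple[str, str, str]] = []

    def define(self, action: Action) -> None:
        # "Producer" side: define (or vet) an action and attach its policy.
        self._actions[action.name] = action

    def execute(self, name: str, user: str, role: str) -> str:
        # "Consumer" side: run an action, but only if the policy allows it.
        action = self._actions[name]
        if role not in action.allowed_roles:
            self.audit_log.append((user, name, "denied"))
            raise PermissionError(f"role {role!r} may not run {name!r}")
        result = action.run()
        self.audit_log.append((user, name, "ok"))
        return result


portal = SelfServicePortal()
portal.define(Action("restart-web", lambda: "web restarted", {"support", "dev"}))
print(portal.execute("restart-web", user="alice", role="support"))
```

In this sketch the producer never has to be present when the action runs, and the consumer never needs to know how the action is implemented; the policy check and the audit log are where management control stays with Ops.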
38. Two prevailing models of operations support
“You build it. They run it.”: Development Team, Operations Team, Running Service
“You build it. You run it.”: Integrated Delivery Team (Dev, Ops; “two-pizza team”), Running Service
39. “You build it. They run it.” (aka… the way it always was)
It’s 2am ….
It’s 2pm ….
It’s the NOC…
Talk them through: health checks, reviewing log files, and the process of diagnosing and recovering the system.
Same as you did for dev teams 2 months ago, QA teams last month, Ops during deploy last week, etc.
41. “You build it. They run it.” (aka… the way it always was)
It’s 2am ….
It’s 2pm ….
It’s Ops…
“Will your applications be affected if we take down EU-West?”
“Is it ok if we change these firewall rules?”
“We are getting customer complaints about performance. Are you sure you didn’t change something?”
42. “You build it. They run it.” (aka… the way it always was)
Development Team, Operations Team, Running Service
51. “You build it. You run it.”
Integrated Delivery Team (Dev, Ops)
Running Service (×6)
?
Incident!! Incident!! What would happen if… New feature!! New feature!! New API!!
Running Service: "Add this to your responsibilities!" (×4)
“Two-pizza teams”? Just change how business is structured, funded, and operated.
54. Self-Service Operations lets you…
Have the labor scaling benefits of “you build it, they run it” without the frequent escalations and the bad handoffs
Have the responsiveness/control of “you build it, you run it” without the scaling limitations
62. Enables Ops managers to focus on creating value
Self-Service Operations
"Consumer": Define actions (optional), Execute actions
"Producer": Define actions (or vet actions), Control policies, Build and Scale
Manager, old mindset: Protect capacity, Say “no”
Manager, new mindset: Scaling service, Get more users
68. Example of putting these types of principles to work
Ticketmaster’s “Support at the Edge” model (Mark Maun, Jody Mulkey, Justin Dean)
• Automated Ops procedures written/vetted by the delivery teams
• Ops remained in full control of what can run and security policy
• Empowered support teams with self-service ops tasks
• Empowered developers with limited self-service operations
• Combined with new incident response model
Sources: https://www.youtube.com/watch?v=_hr4KiB19bQ
http://rundeck.org/stories/mark_maun.html
69. Better for the business and a better way to work
90% Reduction in MTTR
50% Reduction in escalations
55% Reduction of overall support costs
70. Recap
Understand the pressures on Ops (Ops is Difficult)
Move definition, execution, and management controls to wherever each makes the best use of labor (Self-Service Operations Pattern)
Explicit investment in process and tooling