Presented at DevOps Days Newcastle 2018, this talk examines the breadth and depth of DevOps through the lens of “failure”. Understanding failure is essential to gaining the rewards DevOps offers. The talk covers reliability engineering, testing, culture, psychological safety, and more.
9. The Second Way: Feedback
• Devs on call with ops: experience failure rather than distance from it
• Greater sense of “we”
• Greater understanding of how business value is created
• Fix defects faster
“You build it, you run it” (Werner Vogels)
10. The Second Way: Feedback
• Define metrics at all layers
• Business
• Application
• Operations
• Deployments
• Invest in Telemetry
• Provide an easy way to instrument code
• Instrument everything
• Centralise metrics collection
• Extract value
If it moves, measure it
StatsD::increment("AllTheThings");
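As a rough illustration of "provide an easy way to instrument code", here is a minimal Python sketch of a StatsD-style counter. It assumes a StatsD daemon listening for UDP on 127.0.0.1:8125 (the conventional default port); the helper names `format_counter` and `increment` are hypothetical and not taken from the talk or from any particular client library.

```python
import socket


def format_counter(name: str, value: int = 1) -> bytes:
    """Build a StatsD counter datagram, e.g. b"AllTheThings:1|c"."""
    return f"{name}:{value}|c".encode("ascii")


def increment(name: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget: send one counter increment over UDP."""
    payload = format_counter(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # UDP does not wait for a reply from the collector, so
        # instrumentation stays cheap enough to sprinkle everywhere.
        # The host and port are assumptions about the local setup.
        sock.sendto(payload, (host, port))
```

Calling `increment("AllTheThings")` from application code would then be a one-liner, which is the point: the lower the cost of adding a metric, the more likely it is that everything gets instrumented.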
11. The Second Way: Feedback
Test pyramid (top to bottom): UI, Service, Unit
I said I wanted to be a Test Pilot
13. The Third Way: Continual Experimentation and Learning
• Out-experimenting the competition means failing fast
• A-B Testing
• “You’re either a learning organisation or you’re losing to someone who is” (Andrew Shafer)
There’s no compression algorithm for experience
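One common way to run an A-B test is to assign each user to a variant deterministically, so the same user always sees the same experience. The sketch below is an assumption about how that might look in Python; the function name `assign_variant`, the experiment name, and the choice of SHA-256 are illustrative only and do not come from the talk.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, variants=("A", "B")) -> str:
    """Deterministically map a user to a variant for one experiment."""
    # Hash experiment + user so the same user always gets the same variant,
    # while different experiments bucket users independently of each other.
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]
```

For example, `assign_variant("user-42", "new-checkout")` returns either "A" or "B" and keeps returning the same answer for that user, which lets the outcome metrics for each variant be compared afterwards.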
14. The Third Way: Continual Experimentation and Learning
• Netflix can only survive failure by failing all the time
• Everything fails, all the time
• Design for Failure
• Deliberately break things to make things better
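To make "design for failure" and "deliberately break things" concrete, here is a small Python sketch that injects failures into a dependency and checks that the caller degrades gracefully. Everything in it is hypothetical: `fetch_recommendations` stands in for a real remote service, and `flaky` and `with_fallback` are illustrative helpers, not part of any chaos engineering tool.

```python
import random


class InjectedFailure(Exception):
    """Raised by the fault injector to simulate a failing dependency."""


def flaky(func, failure_rate: float, rng: random.Random):
    """Wrap func so that a fraction of calls fail on purpose."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise InjectedFailure(f"injected failure in {func.__name__}")
        return func(*args, **kwargs)
    return wrapper


def with_fallback(func, fallback, attempts: int = 3):
    """Retry func a few times, then degrade to a fallback value."""
    def wrapper(*args, **kwargs):
        for _ in range(attempts):
            try:
                return func(*args, **kwargs)
            except InjectedFailure:
                # A real system would catch its own transport errors here.
                continue
        return fallback
    return wrapper


def fetch_recommendations(user_id: str) -> list:
    # Hypothetical dependency call; stands in for a real remote service.
    return [f"personalised-for-{user_id}"]
```

Wrapping the call as `with_fallback(flaky(fetch_recommendations, 0.2, random.Random()), fallback=["popular"])` would fail roughly one call in five on purpose, which exercises the fallback path continuously rather than only during a real outage.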
15. The Third Way: Continual Experimentation and Learning
• Every incident is a learning opportunity
• Blameless Post-Mortems
• Beware…
• Hindsight Bias
• The Fundamental Attribution Error
• Finding a single root cause
• Are they learning focused?
• What stopped this being worse?
• It’s not all about the action items
“Incidents are unplanned investments in your company’s survival” (John Allspaw)
16. Safety Culture & Psychological Safety
Safety Culture
“a culture that allows people to give the boss bad news”
• Industries and environments where safe practices are questions of life and death (airlines, hospitals, mining, steel making)
• Safety 1
Who caused the problem
(and how should they be punished)
• Safety 2
What caused the problem
(who’s hurt, what are their needs, how can we help)
Psychological safety
• The belief that no one will be punished or humiliated for speaking up with ideas, questions, concerns or mistakes
• Far and away the most important of the five key dynamics that set successful teams apart at Google
17. When people fail
Who comes to work to look…
• Ignorant
• Incompetent
• Intrusive
• Negative
If you don’t want to look…
• Ignorant: don’t ask questions
• Incompetent: don’t admit mistakes
• Intrusive: don’t offer ideas
• Negative: don’t challenge the status quo
I’m with Stupid
18. Failure vs Accountability
The freedom to fail is the hallmark of learning
#life-long-learning
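Psychological safety vs. accountability to meet demanding goals:
• High psychological safety, low accountability: Comfort Zone
• High psychological safety, high accountability: Learning Zone
• Low psychological safety, low accountability: Apathy Zone
• Low psychological safety, high accountability: Anxiety Zone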
Source: “The Competitive Imperative of Learning”, Amy C. Edmondson
https://hbr.org/2008/07/the-competitive-imperative-of-learning
19. I’d love to continue the conversation and find even more DevOps T-shirt inspiration
James Boswell
@xcapee