What if we could measure the indirect costs of pain building up on a software project? What if we could measure the effects of learning curves, collaboration pain, and problems building up in the code?
We could:
Identify the highest leverage opportunities for improvement
Make the case to management that budget should be allocated for a solution
Lead the organization in making better decisions with a data-driven feedback loop to guide the way
Several years ago, I stumbled into a solution for measuring the growing “friction” in developer experience. Visibility turned my world upside-down.
We've been trying to explain the pain of Technical Debt for generations, but we've never been able to measure it. Visibility introduces a whole new world of possibilities.
In this talk, I'll show you what I'm measuring, how exactly I'm measuring it, then we'll talk through the implications for our teams, our organizations, and our industry.
We can identify the highest leverage improvement opportunities and steer our projects with a data-driven feedback loop.
We can break down the "wall of ignorance" between developers and management by defining an explicit language for managing technical risk.
We can teach the art of software development with a data-driven feedback loop and codify our knowledge into sharable decision principles.
We can revolutionize our business accounting methods to take the pain of software development into account, so the costs and risks are visible at the highest levels of the organization.
We can conquer the challenges across the software industry by working together, learning together, and sharing our knowledge with the world.
With visibility, we can start a revolution in data-driven learning.
10. RESET
“A description of the goal is not a strategy.”
-- Richard P. Rumelt
What’s wrong with our current strategy?
11. Our “Strategy” for Success
High Quality Code
Low Technical Debt
Easy to Maintain
Good Code Coverage
12. RESET
“A good strategy is a specific and coherent response to—
and approach for overcoming—the obstacles to progress.”
-- Richard P. Rumelt
The problem is we don’t have a strategy...
13. What are the biggest obstacles that
prevent us from breaking the cycle?
Start Over
Unmaintainable Software
Why Can’t We Break the Cycle?
14. Consulting with Failing Projects
Engineers: “We’re going to CRASH!”
Manager: “What do I do? We can’t miss these deadlines.”
25. The amount of PAIN was caused by…
Likeliness of Unexpected Behavior
Cost to Troubleshoot and Repair
High Frequency, Low Impact
Low Frequency, Low Impact
Low Frequency, High Impact
PAIN
26. What causes PAIN?
What Causes Unexpected Behavior (likeliness)?
Semantic Mistakes
Stale Memory Mistakes
Association Mistakes
Bad Input Assumption
Tedious Change Mistakes
Copy-Edit Mistakes
Transposition Mistakes
Failed Refactor Mistakes
False Alarm
What Makes Troubleshooting Time-Consuming (impact)?
Non-Deterministic Behavior
Ambiguous Clues
Lots of Code Changes
Noisy Output
Cryptic Output
Long Execution Time
Environment Cleanup
Test Data Creation
Using Debugger
Most of the pain was caused by human factors.
29. What causes PAIN?
PAIN is a consequence of how we interact with the code.
30. The Consequences of Problem-Breakdown
Iterative Validation with Unit Tests: timeline 0:00 to 7:01
Skipping Tests and Validating at the End: timeline 0:00 to 14:23
31. The Consequences of Problem-Breakdown
Iterative Validation with Unit Tests: timeline 0:00 to 7:01
Skipping Tests and Validating at the End: timeline 0:00 to 14:23
Urgency Leads to High-Risk Decisions
If I make no mistakes I save ~2 hours.
If I make several mistakes I lose ~8 hours.
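The trade-off on this slide can be worked through as a rough expected-value calculation. The sketch below is only an illustration, not something from the talk: it assumes a simplified two-outcome model (either "no mistakes" or "several mistakes"), and the function names and the probability argument are hypothetical. Only the ~2 hour and ~8 hour figures come from the slide.

```python
def expected_hours_saved(p_no_mistakes, hours_saved=2.0, hours_lost=8.0):
    """Expected net hours gained by skipping tests (two-outcome assumption).

    p_no_mistakes is a guess at the chance of making no mistakes; the
    defaults are the ~2h saved / ~8h lost figures quoted on the slide.
    """
    return p_no_mistakes * hours_saved - (1.0 - p_no_mistakes) * hours_lost


def break_even_probability(hours_saved=2.0, hours_lost=8.0):
    """Chance of 'no mistakes' at which skipping tests neither gains nor loses."""
    return hours_lost / (hours_saved + hours_lost)


print(break_even_probability())   # 0.8
print(expected_hours_saved(0.5))  # -3.0
```

Under these assumptions, skipping tests only pays off when the chance of making no mistakes is above 80%, which is one way to see why urgency leads to high-risk decisions.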
35. PAIN occurs during the process of
understanding and extending the software
Complex Software
PAIN
Not the Code.
Optimize “Idea Flow”
36. My team spent tons of time working on
improvements that didn’t make much difference.
We had tons of automation, but the
automation didn’t catch our bugs.
37. My team spent tons of time working on
improvements that didn’t make much difference.
We had well-modularized code,
but it was still extremely time-consuming to troubleshoot defects.
38. The hard part isn’t solving the problems;
it’s identifying the right problems to solve.
“What are the specific problems
that are causing the team’s pain?”
39. Idea Flow measures the time spent on:
Troubleshooting (Quality Risk)
Learning (Familiarity Risk)
Rework (Assumption Risk)
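To make "time spent on" concrete, here is a minimal sketch of totaling recorded time per category and deriving the share of capacity that went to friction. The record format (category name, minutes), the function names, and the sample numbers are all hypothetical; this is not the format used by the author's tools.

```python
from collections import defaultdict

# Hypothetical category names for friction, taken from the slide's three headings.
FRICTION = ("troubleshooting", "learning", "rework")


def minutes_by_category(intervals):
    """Sum minutes per category from (category, minutes) records."""
    totals = defaultdict(int)
    for category, minutes in intervals:
        totals[category] += minutes
    return dict(totals)


def friction_share(totals):
    """Fraction of all recorded time spent in the friction categories."""
    total = sum(totals.values())
    if total == 0:
        return 0.0
    return sum(totals.get(category, 0) for category in FRICTION) / total


# Made-up sample data for one developer-day.
sample = [
    ("progress", 90),
    ("troubleshooting", 50),
    ("learning", 30),
    ("progress", 60),
    ("rework", 20),
]
totals = minutes_by_category(sample)
print(totals)
print(f"{friction_share(totals):.0%}")  # 40%
```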
40. Why Measure These Things?
Idea Flow measures the time spent on:
Troubleshooting
Learning
Rework
41. The Rhythm of Software Development
Write a little code.
Work out the kinks.
Write a little code.
Work out the kinks.
Write a little code.
Work out the kinks.
42. The Rhythm of Software Development
Progress Loop / Conflict Loop
Diagram labels: Learn, Modify, Validate, Confirm, Conflict, Troubleshoot, Rework
51. Reading Visual Indicators in Idea Flow Maps
Left Atrium
Left Ventricle
Right Ventricle
Right Atrium
What’s causing this pattern?
Similar to how an EKG helps doctors diagnose heart problems...
52. ...Idea Flow Maps help developers diagnose software problems.
Problem-Solving
Machine
Reading Visual Indicators in Idea Flow Maps
78. “What seems to be our
biggest cause of pain?”
Add up the Friction by Tag
#ReportingEngine
#Hibernate
#MergeHell
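"Add up the Friction by Tag" can be sketched as a small aggregation. The incident format below (minutes of friction plus a list of hashtags) and the numbers are made up for illustration; only the three tag names come from the slide.

```python
from collections import Counter


def friction_by_tag(incidents):
    """Rank tags by total friction minutes, largest first.

    Each incident is (minutes, [tags]). An incident with several tags
    counts its full duration toward every one of them.
    """
    totals = Counter()
    for minutes, tags in incidents:
        for tag in tags:
            totals[tag] += minutes
    return totals.most_common()


# Hypothetical incidents.
incidents = [
    (120, ["#ReportingEngine"]),
    (45, ["#Hibernate"]),
    (90, ["#MergeHell"]),
    (60, ["#ReportingEngine", "#Hibernate"]),
]
ranking = friction_by_tag(incidents)
for tag, minutes in ranking:
    print(f"{tag}: {minutes} min")
```

The top entry of the ranking is a candidate answer to "What seems to be our biggest cause of pain?", subject to how consistently the team tags its incidents.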
79. Case Study: Huge Mess with Great Team
1. Test Data Generation
2. Merging Problems
3. Repairing False Alarms
1000 hours/month
The Biggest Problem:
~700 hours/month generating test data
81. Experiment
Time
Our Perception of Time was WAY OFF
Setup Experiment
Execute
Analyze Results & Decide Next Experiment
waiting - time goes slow
doing - time zooms by
84. Distill Lessons Learned into
“Decision Principles”
Answers two questions
How do I evaluate my situation?
What should I optimize for?
Trade-off Decisions
85. The Code Sandwich Principle
The thickness of the sandwich increases troubleshooting difficulty
Behavior Complexity
Observability
Ease of Manipulation
Optimize for low diagnostic difficulty.
87. The Waxy Coating Principle
Optimize the signal to noise ratio.
Software
(before)
Software
(after)
Tests are like a waxy coating poured over the code.
88. Friction in the BRAND NEW microservices code…
27:15
“Why is there so much friction?”
94. “The Idea Flow Factory”
(supply chain model)
Optimize the Rate of Idea Flow
Across the Organization (or the Industry)
95. Case Study: From Monolith to Microservices
18 months after a Micro-Services/Continuous Delivery rewrite.
40-60% of dev capacity on “friction”
Legend: Troubleshooting, Progress, Learning
Timelines: 0:00 to 28:15; 0:00 to 12:23
98. Long-Term Pain
Chart: Percentage Capacity spent on Troubleshooting (red) and Learning (blue), 0% to 100%, across Release 1, Release 2, and Release 3 (extrapolated from samples)
Legend: Troubleshooting, Progress, Learning
99. Long-Term Pain
Figure out what to do
Learning is front-loaded
100. Long-Term Pain
Rush Before the Deadline
Validation is Deferred
102. Long-Term Pain
Chaos Reigns
Unpredictable work stops fitting in the timebox
103. Developers tried to explain to management:
“Technical debt is building up in the code!”
Managers:
“We’re already behind schedule!!”
104. PAIN
Eventually the problems got so bad,
they couldn’t be ignored.
Long-Term Pain
Builds were breaking
Releases were painful
Productivity slowing to a crawl
Begging for time
105. Case Study: From Monolith to Microservices
The Team’s Focus: Reducing technical debt in the micro-services code + test automation
The Biggest Problem: Integration problems across teams leading to environment down time.
Cost: 1000 hours/month!!
106. The cost of bad architecture
in the microservices world
is EXTREMELY HIGH.
Visibility gives us a way to
manage long-term technical risk.
107. Risk is the Bridge Language Between Managers and Developers
Quality Risk, Familiarity Risk
Chart: Percentage Capacity spent on Troubleshooting (red) and Learning (blue), 0% to 100%, across Release 1, Release 2, and Release 3
115. How much work does it take to complete a software task?
Trade-off Decisions: Writing Unit Tests (Direct Cost) or Troubleshooting Mistakes (Indirect Costs)
Side-Effects from Ignoring the Risk
We make high-risk decisions because the indirect costs are hard to quantify.
Risk = Likeliness of Event × Potential Impact
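The risk relationship on this slide (likeliness of event times potential impact) is simple enough to write down directly. The sketch below is a hedged illustration: the function name, the units (impact in hours), and the two example scenarios are assumptions, not figures from the talk.

```python
def risk(likeliness, impact_hours):
    """Risk as likeliness of the event times its potential impact.

    likeliness is a probability in [0, 1]; impact_hours is an assumed
    unit (hours of troubleshooting and repair).
    """
    if not 0.0 <= likeliness <= 1.0:
        raise ValueError("likeliness must be between 0 and 1")
    return likeliness * impact_hours


# Hypothetical scenarios: a frequent small problem vs. a rare large one.
scenarios = {
    "high frequency, low impact": risk(0.5, 1.0),
    "low frequency, high impact": risk(0.25, 8.0),
}
for name, expected_hours in scenarios.items():
    print(f"{name}: {expected_hours} expected hours")
```

With these made-up inputs the rarer problem carries four times the risk, which is the kind of indirect cost that is hard to see without measurement.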
126. Risk is the Bridge Language Between Managers and Developers
Quality Risk, Familiarity Risk
“How do we control the risk?”
127. Process Control in Manufacturing
This is “Out of Control”
Lower Variability = Better Control
128. “Pain Control” in Software Development
Average Pain per Incident
This is Control
Target
Control Limit
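One way to read the "Target" and "Control Limit" lines on this chart is as a standard process-control check. The sketch below borrows the common convention of placing the upper control limit three standard deviations above the baseline average; that convention, the function names, and all of the numbers are assumptions for illustration and are not stated in the talk.

```python
from statistics import mean, pstdev


def control_limits(baseline, sigmas=3.0):
    """Return (center, upper_limit) from baseline pain-per-incident samples.

    The 3-sigma upper limit is an assumed convention from manufacturing
    control charts, not a rule given in the talk.
    """
    center = mean(baseline)
    return center, center + sigmas * pstdev(baseline)


def out_of_control(samples, upper_limit):
    """Return (index, value) for every sample above the control limit."""
    return [(i, value) for i, value in enumerate(samples) if value > upper_limit]


# Hypothetical average pain per incident, in minutes.
baseline = [20, 25, 30, 25, 20, 30, 25, 25]
center, upper = control_limits(baseline)
flagged = out_of_control([22, 28, 95, 24], upper)
print(center, round(upper, 1), flagged)
```

Lower variability in the baseline gives a tighter limit, which matches the slide's point that lower variability means better control.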
129. This is “Out of Control”
16:10
This is “Control”
130. Idea Flow Learning Framework
Input: Decision Constraints
Target: Optimize the Rate of Idea Flow
Output: “Friction” in Idea Flow
short-term loop
long-term loop
Focus!
1. Visibility
2. Clarity
3. Awareness
Improve Quality of Decisions
131. Idea Flow Learning Framework
Target - The direction of “better”
Target: Optimize the Rate of Idea Flow
132. Idea Flow Learning Framework
Input - The constraints that limit our short-term choices…
133. Idea Flow Learning Framework
Output - The pain signal we’re trying to improve
134. Idea Flow Learning Framework
Focus on the biggest pain…
135. Idea Flow Learning Framework
1. Visibility - Identify the specific patterns.
136. Idea Flow Learning Framework
2. Clarity - Understand cause and effect.
137. Idea Flow Learning Framework
3. Awareness - Learn strategies to avoid the pain.
138. Idea Flow Learning Framework
Improve Quality of Decisions
139. “The Idea Flow Factory”
(supply chain model)
Optimize the Rate of Idea Flow
Across the Software Supply Chain
140. What should we be optimizing for?
Brevity?
Modularity?
Code Coverage?
What does “better” really mean?
142. Data-Driven Software Mastery
Input: Decision Constraints
Target: Optimize the Rate of Idea Flow
Output: “Friction” in Idea Flow
short-term loop
long-term loop
Focus!
1. Visibility
2. Clarity
3. Awareness
Improve Quality of Decisions
143. Idea Flow measures the time spent on:
Troubleshooting (Quality Risk)
Learning (Familiarity Risk)
Rework (Assumption Risk)
144. Idea Flow gives us a
universal definition of effective practice.
146. Idea Flow gives us a
universal definition of effective practice.
What are the consequences of our decisions?
Quality Risk, Familiarity Risk, Assumption Risk
147. Idea Flow gives us a
universal language for sharing our experiences.
148. Idea Flow gives us a
universal language for sharing our experiences.
What strategies tend to minimize the friction in Idea Flow? And in what contexts?
Quality Risk, Familiarity Risk, Assumption Risk
149. Idea Flow gives us the
capability to learn together as an industry.
150. Idea Flow gives us the
capability to learn together as an industry.
We have an opportunity to learn together like we’ve never had before.
Quality Risk, Familiarity Risk, Assumption Risk
151. Why? Because this REALLY Sucks.
Seeing your creation get stomped on by organizational
dysfunction is an emotionally damaging experience.
154. Collaboration Platform
IFM Tools
Team Mastery Platform
Team: Joe, Sally, Mark, Eric
Projects: Project Tiger, Project Bear
Industry Collaboration Platform
Anonymized Data
Integrated #HashTag Glossary
(REST)
156. Change Starts with Making
the PAIN Visible!
Janelle Klein
openmastery.org @janellekz
157. Read my Book.
How to Measure the PAIN in Software Development
Janelle Klein
Think About It.
FREE with Reading Group
Buy It
Check out openmastery.org for details.
158. Why read Idea Flow?
Because Rene and Matt said so!
“If you just want to read ONE book about software engineering, this year, read this one: leanpub.com/ideaflow #ideaflow”
Rene Gröschke, Gradle Inc. (@breskeby)
“If you don't know about @janellekz's work on Idea Flow Learning, you're missing out. leanpub.com/ideaflow”
Matt Stine, NFJS Tour Speaker (@mstine)