To meet expectations and optimize flow, managing risk is an important part of Kanban. Anticipating and adapting to things that "go wrong" and the uncertainty they cause is the topic of this session. We look at techniques for quantifying which risks are important enough to deal with.
Although discouraging, forecasting size, effort, staff and cost is sometimes necessary. We should do as little of this as possible, but when we do, we have to do it well with the data we have available. Forecasting is made difficult by unreliable inputs to our process: the amount of work is uncertain, and the historical data we base our forecasts on is biased and tainted. The situation seems hopeless, but it isn't. Good decisions can be made on imperfect data, and this session discusses how. It shows simple, immediately usable techniques to capture, analyze, cleanse and assess data, and then use that data for reliable forecasting.
Second and hopefully final draft of LKCE 2014 talk.
Risk Management and Reliable Forecasting using Un-reliable Data (magennis) - LKCE 2014
1. Get Slides: http://bitly.com/1E9Hh8l
Risk Management and Reliable
Forecasting Using Un-Reliable Data
First Presented at Lean Kanban Central
Europe, Hamburg. November 2014
Troy Magennis Twitter: @t_magennis
6. People
• People are biased
– intentionally and/or un-intentionally
• In order to forecast and manage risk
– We need good expert opinions
– We need to confirm these opinions against reality
– We need to learn from our forecast errors
• Often we get opinions on a fractional
understanding of the eventual problem solved
7. Not Getting Data (At All or Early Enough)
8. Getting Reliable Data from People
• Why would people take the time?
– We tell them (rarely works as intended)
– We politely ask them (works sometimes)
– We make it part of their self-interest (most likely)
• Gamification
• Challenge their view on fairness
• NEVER: Embarrass a team or individual
– you will totally destroy reliable data capture….
9. Strategy 1 – “Gamify” Presentation
• Teams don’t like being “Red” (default to red; teams will make them green)
• Interactive charts get attention; vibrant colors for teams with good data
• Coloring teams in dull (grey) based on poor quality data capture often gets action
• Make it sexy. Show how “my” metric connects to strategy
[Chart labels: Teams, Strategies, Features]
10. Strategy 2 – Visibility to Decisions
• Operations Reviews! Giving meaning to data!
• Make it clear when data has led to decisions
– “Based on the data and analysis presented, this is clearly
an opportunity we will pursue.”
– “Let’s track the first month actuals against the model and fully invest if it is tracking well.”
• Make it clear when more data would have “won”
– “If I could clearly see the impact of giving you those extra team members, this would be easy”
• Promote lively debate around data
– React quickly if data presented is gamed or teams
repetitively fail against THEIR models
11. Strategy 3 – Perceived Fairness
• One team gets some “extra” attention based
on an argument supported by data
– Extra resources, More Investment
– More time to demo
• With just a few examples, often there is an
avalanche of willing metric support by others
• Make it clear why the data swayed a decision
13. Checking for Gaming & Errors
• We can ask tougher questions
– What assumptions are built into this forecast?
• Why would we be 2x better than we ever have before?
– Walk me through the logic supporting your analysis
– Looking at historical data, we predict very poorly
when there are 3 or more dependent teams. Have you
considered this?
• We can test for unlikely patterns
– Distribution analysis
– Benford’s Law
14. Evidence of data quality is a well-formed and explainable distribution shape
Customer: “Our data is crap. You can’t use any of it”
[Histogram: Throughput per week]
15. Distribution Shape & Outliers
• Plot visually using a histogram
• Set a rule, e.g. >10 times the mode? (state it)
[Histogram: mode is 3; 50 and 100 are outliers worth discussion]
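The outlier rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming weekly throughput counts held in a plain list; the function name `flag_outliers`, the sample values, and the default factor of 10 are hypothetical stand-ins for whatever rule a team chooses to state.

```python
from collections import Counter

def flag_outliers(values, factor=10):
    """Return (mode, outliers), where outliers exceed factor * mode."""
    counts = Counter(values)
    # most_common(1) gives the most frequent value; ties go to the first seen.
    mode = counts.most_common(1)[0][0]
    outliers = sorted(v for v in values if v > factor * mode)
    return mode, outliers

# Hypothetical weekly throughput samples (not data from the talk)
throughput = [3, 2, 3, 4, 3, 5, 3, 1, 50, 2, 3, 100, 4]
mode, outliers = flag_outliers(throughput)
print(f"mode={mode}, outliers worth discussing={outliers}")
```

Flagged values are prompts for a conversation about what happened that week, not values to delete automatically.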
16. Benford’s Law
• Benford's Law, also called the First-Digit Law, refers to the frequency distribution of digits in many real-life sources of data.
• Known to apply to: electricity bills, street addresses, stock prices, population numbers, death rates, lengths of rivers, …, and processes described by power laws.
Source: Wikipedia
Common in story counts per epic in software projects. Also probable in lead time and cycle time values.
17. Benford’s Law Applied to Story Count
• Story count estimates for 48 randomly picked epics
• The frequency of the first digits was computed
• These were compared to Benford’s prediction (green within 1.5%)

d   Benford’s Prediction P(d)   Actual Data P(d)
1   30.1%                       31.3%
2   17.6%                       18.8%
3   12.5%                       20.8%
4   9.7%                        8.3%
5   7.9%                        8.3%
6   6.7%                        8.3%
7   5.8%                        0%
8   5.1%                        4.2%
9   4.6%                        0%

Based on real data, n = 48
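The comparison on this slide can be sketched as follows. Benford's prediction for a leading digit d is log10(1 + 1/d), which gives the 30.1%, 17.6%, ... column. The helper names, the sample story counts, and reading "within 1.5%" as 1.5 percentage points are assumptions made for illustration; the 48 epic estimates from the talk are not reproduced here.

```python
import math
from collections import Counter

def benford_expected(d):
    """Benford's predicted probability that the first digit is d (1-9)."""
    return math.log10(1 + 1 / d)

def first_digit_frequencies(values):
    """Observed share of each leading digit among positive integers."""
    digits = [int(str(v)[0]) for v in values if v > 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}

def compare_to_benford(values, tolerance=0.015):
    """Return (digit, expected, actual, within_tolerance) rows."""
    actual = first_digit_frequencies(values)
    return [(d, benford_expected(d), actual[d],
             abs(actual[d] - benford_expected(d)) <= tolerance)
            for d in range(1, 10)]

# Hypothetical story-count estimates (assumption, not the talk's data)
story_counts = [12, 5, 18, 3, 27, 1, 9, 14, 31, 2, 11, 6, 19, 22, 4, 13]
for d, expected, actual, ok in compare_to_benford(story_counts):
    print(f"{d}  {expected:6.1%}  {actual:6.1%}  {'ok' if ok else 'check'}")
```

With small samples a few digits will usually miss the tolerance, so a "check" row is a reason to look closer rather than proof of gaming.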
18. Data Analysis Spreadsheet
https://github.com/FocusedObjective/FocusedObjective.Resources
24. Good Contextual Forecasting
• Know the past
– Track the date of significant company events
• Reorgs, releases, competitor releases
– Track reference data that may show context
• Staff numbers by date, national holidays
– Mark up all charts and data with context labels
• Consider the future
– What events are likely over the forecast period
– Draw samples considering these contexts
25. Some Context Events…
• Internal differences in team skills
• Any change (Hawthorne Effect)
• Change of Risk Profile
• Unstable WIP
• Poor Quality
• Unstable Test Environment
• Seasons - Vacations
• Executive Re-org
• Natural Disasters
• Exceptional Sickness
• Changes in Staff
• Team Changes
• Location
• Environmental Disturbance
• Morale Shifts
• Process Change
• Architectural Change
• Fatigue (Low Work Morale)
• Change of demand for different classes of service
• Account of Expedites
• Changes in how to measure
• Poor record keeping
• Delivery frequency / cadence
• Org changes / staffing
• Gaming the System
• Mergers and Acquisitions
• Multitasking
• High attrition rates
• Staff availability due to prod issues
• Critical specialists not available
• Introduce new technology
• Technical architectural changes
• Legal requirements (date fixed)
• Beginning the project
• User stories too large
• Dependency identification
• Technical complexity
• External spot demands
• Changing prioritization
• Expedited work
• External dependencies
• Better coffee
• Relevant training
• Process changes
• Process problem moving tickets
• New management policy
26. Forecasting using poor estimates from “Experts”
“Uncertain Uncertainty”
27. Improving Estimates
Stop
• Point estimates
• Ignoring uncertainty
• Thinking it’s easy
• “Never speak of this again”
• Inventing units (points)
• Rewarding gaming
• Tolerating ambiguity
Start
• Using Range estimates
• Expressing uncertainty
• Train & practice estimation
• Learning with feedback
• Using dollars, time, counts
• Rewarding honesty
• Presenting unbiased data
29. Estimation Training
• How sure are you about your guesses?
• This can be practiced
• Calibration – Trivia Game
– Ask a question about a known actual
– Ask people to guess the range
• “True or False: A hockey puck fits in a golf hole”
• “Confidence: Choose the probability that best represents your chance of getting this question right... 50% 60% 70% 80% 90% 100%”
– Disclose the result – 50% (no idea) should get 50% of the questions right by guess alone
Source: http://en.wikipedia.org/wiki/Calibrated_probability_assessment
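The scoring step of the trivia game can be sketched as below: group answers by the confidence each person stated, then compare that to the share they actually got right. The function name `calibration_report` and the quiz results are hypothetical; a well-calibrated estimator would show a gap near zero at every confidence level.

```python
from collections import defaultdict

def calibration_report(answers):
    """answers: iterable of (stated_confidence, was_correct) pairs.

    Returns {stated_confidence: (hit_rate, question_count)}.
    """
    buckets = defaultdict(list)
    for confidence, correct in answers:
        buckets[confidence].append(1 if correct else 0)
    return {c: (sum(hits) / len(hits), len(hits))
            for c, hits in sorted(buckets.items())}

# Hypothetical quiz results: (confidence chosen, answered correctly?)
quiz = [(0.5, True), (0.5, False), (0.9, True), (0.9, True),
        (0.9, False), (0.9, False), (1.0, True), (1.0, True)]
for confidence, (hit_rate, n) in calibration_report(quiz).items():
    gap = hit_rate - confidence
    print(f"stated {confidence:.0%}  actual {hit_rate:.0%}  n={n}  gap {gap:+.0%}")
```

In this made-up example the person claiming 90% is right only half the time, which is the kind of overconfidence the training is meant to expose.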
30. No Lead Time Data?
• No team yet? No history?
• We need two estimates with probability
– 1 in 5 tasks should take less than 1 day
– 4 in 5 tasks should take less than 5 days
• We need to solve the curve that fits these two
probabilities (and hopefully the others)
32. How do we get experts to estimate ranges and predict higher-order percentiles from two estimates?
20% <= 1 Day (1 in 5)
80% <= 5 Days (4 in 5)
33. [Chart: curve through two estimated points; the rest of the curve is unknown (?)]
(p1, x1): 20% <= 1 Day
(p2, x2): 80% <= 5 Days
See detailed paper on the mathematics:
http://www.johndcook.com/quantiles_parameters.pdf
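As a minimal sketch of solving a curve through two estimates, the code below assumes a two-parameter Weibull distribution, for which the two points (p1, x1) and (p2, x2) give the shape and scale in closed form. The choice of Weibull and the function names are assumptions for illustration; the linked paper covers the general problem. With the 20% / 1 day and 80% / 5 days estimates, this fit puts the 95th percentile at about 8.29 days, which agrees with the 95% figure on the later "What is Risk?" slide.

```python
import math

def fit_weibull_from_two_quantiles(p1, x1, p2, x2):
    """Solve shape k and scale lam so that F(x1) = p1 and F(x2) = p2,
    where F(x) = 1 - exp(-(x / lam) ** k)."""
    # Linearize the CDF: ln(-ln(1 - p)) = k * ln(x) - k * ln(lam)
    y1 = math.log(-math.log(1 - p1))
    y2 = math.log(-math.log(1 - p2))
    k = (y2 - y1) / (math.log(x2) - math.log(x1))
    lam = x1 / (-math.log(1 - p1)) ** (1 / k)
    return k, lam

def weibull_quantile(p, k, lam):
    """Value x such that a share p of outcomes is <= x."""
    return lam * (-math.log(1 - p)) ** (1 / k)

# The two expert estimates from the slides: 1 in 5 <= 1 day, 4 in 5 <= 5 days
shape, scale = fit_weibull_from_two_quantiles(0.20, 1.0, 0.80, 5.0)
p95 = weibull_quantile(0.95, shape, scale)
print(f"shape={shape:.2f} scale={scale:.2f} 95th percentile={p95:.2f} days")
```

The fitted curve is only as good as the two estimates and the assumed distribution family, so it is worth replacing with real lead time data as soon as some exists.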
37. Long Tail Distribution Sampling
[Chart labels: Good chance of samples; Low chance of samples]
38. Hard to sample high-end percentiles…
• You find high end quickly for uniform dist.
– 12 samples (50% certain of finding 90% range)
• Not so, for long tail distribution (e.g. Weibull shape: 1.5)
– 88% never found after 1000 trials, avg. 425 if lucky
[Chart labels: From samples (likely in practice); By formula (NOT likely in practice)]
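One way to see why the tail is hard to find from samples: the largest of n draws has a median at the 0.5^(1/n) quantile of the distribution. The sketch below compares how close that typical maximum gets to the 99th percentile for a uniform distribution and for a Weibull with shape 1.5. The 99th percentile target, the unit scale, and the function names are assumptions chosen for illustration; this does not attempt to reproduce the 12-sample or 1000-trial figures on the slide.

```python
import math

def uniform_quantile(p, low=0.0, high=1.0):
    return low + p * (high - low)

def weibull_quantile(p, shape, scale=1.0):
    return scale * (-math.log(1 - p)) ** (1 / shape)

def median_max_coverage(quantile_fn, n, target=0.99):
    """Median of the largest of n samples, as a fraction of the target percentile."""
    # P(max of n draws <= x) = F(x) ** n, so the median max sits at F(x) = 0.5 ** (1 / n)
    level = 0.5 ** (1 / n)
    return quantile_fn(level) / quantile_fn(target)

for n in (12, 50, 200):
    u = median_max_coverage(uniform_quantile, n)
    w = median_max_coverage(lambda p: weibull_quantile(p, shape=1.5), n)
    print(f"n={n:4d}  uniform {u:.0%}  weibull(1.5) {w:.0%}")
```

For the same number of samples, the typical maximum lands much further below the high percentile in the long-tail case, which is why a fitted curve is often needed to estimate the high end.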
39. What is Risk?
95% <= 8.29 Days
Big Risks: How can we identify these?
41. Contact Details
www.FocusedObjective.com
Download latest software, videos, presentations and articles on
forecasting and applied predictive analytics
Troy.Magennis@focusedobjective.com
My email address for all questions and comments
@t_magennis
Twitter feed from Troy Magennis
42. Do we have to break down EVERY epic to estimate story counts?
CASE STUDY: ESTIMATING TOTAL STORY COUNT
43. Problem: Getting a high-level time and cost estimate for a proposed business strategy
Approach: Randomly sample epics from the 328 proposed and perform story breakdown. Then use throughput history to estimate time and costs
44. Sample with replacement
Number of stories drawn in each trial:
Trial 1: 9, 13, 13, 5, 11 (Sum: 51)
Trial 2: 1, 4, 7, 5, 11 (Sum: 28)
…
Trial 100: 35, 19, 5, 13, 11 (Sum: 83)
Remember to put the piece of paper back in after each draw!
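The sampling procedure on this slide can be sketched as a small Monte Carlo simulation. The pool of story counts below borrows the values visible in the slide's example draws and is used only for illustration; the function names, the trial count, the seed, and the simple percentile helper are assumptions, not the tooling used in the talk.

```python
import random

def percentile(sorted_values, p):
    """Simple percentile lookup on an already sorted list (p in 0..1)."""
    index = max(0, min(len(sorted_values) - 1, int(p * len(sorted_values))))
    return sorted_values[index]

def simulate_totals(sample_counts, epics_to_draw, trials=1000, seed=1):
    """Sum epics_to_draw random draws (with replacement) for each trial."""
    rng = random.Random(seed)
    totals = [sum(rng.choices(sample_counts, k=epics_to_draw))
              for _ in range(trials)]
    return sorted(totals)

# Illustrative pool of story counts from a handful of broken-down epics
sampled_story_counts = [9, 13, 5, 11, 1, 4, 7, 35, 19]
totals = simulate_totals(sampled_story_counts, epics_to_draw=5)
for p in (0.50, 0.75, 0.95):
    print(f"{p:.0%} level: about {percentile(totals, p)} stories")
```

To forecast a full portfolio, `epics_to_draw` would be set to the number of epics being estimated, so that each trial represents one plausible total story count.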
45. Epic Breakdown – Sample Count
Facilitated by a well-known consulting company, the team performed story breakdown (counts) of epics. 48 (out of 328) epics were analyzed.
Actual Sum: 262

Process          50% CI   75% CI   95% CI
MC 48 samples    261      282      315
MC 24 samples    236      257      292
MC 12 samples    223      239      266
MC 6 samples     232      247      268
47. Fibonacci Bias…
Scale: 1 2 3 5 8 13 … 21
Question: What is the middle value for this scale?
Perceived (5) Mathematical (10.5)

Team (3 of 130, 82% Median 5)        Median   Mean   SD
Team A: Process Change Team          5        4.4    3
Team B: UI Software Dev Team         5        5.4    6
Team C: Library Software Dev Team    5        5.7    5.5

Being < 0 at MEAN – 1 SD should be an indicator something is wrong!
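The "mean minus one standard deviation" check from this slide is easy to automate. The sketch below uses the three team rows shown on the slide; the function name is hypothetical. Estimates on this scale cannot be negative, so a negative result hints that the data is heavily skewed and that mean and SD describe it poorly.

```python
# (team, median, mean, standard deviation) as listed on the slide
teams = [
    ("Team A: Process Change Team", 5, 4.4, 3.0),
    ("Team B: UI Software Dev Team", 5, 5.4, 6.0),
    ("Team C: Library Software Dev Team", 5, 5.7, 5.5),
]

def mean_minus_sd_below_zero(mean, sd):
    """True when one standard deviation below the mean is negative."""
    return mean - sd < 0

flagged = [name for name, _median, mean, sd in teams
           if mean_minus_sd_below_zero(mean, sd)]
print(flagged)
```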
49. Paper: Does the use of Fibonacci numbers in Planning Poker affect effort estimates?
“Conclusion: The use of a Fibonacci scale, and possibly other non-linear scales, is likely to affect the effort estimates towards lower values compared to linear scales. A possible explanation for this scale-induced effect is that people tend to be biased toward the middle of the provided scale, especially when the uncertainty is substantial. The middle value is likely to be perceived as lower for the Fibonacci than for the linear scale.”
R. Tamrakar and M. Jørgensen (2012)
https://www.simula.no/publications/Simula.simula.1282
50. Really, really, know the question…
• What is the goal or question being asked?
• How is this question answered now?
– Good enough? Is it believed?
– Current cost OK?
• What data would be necessary to answer this
question slightly better?
– Is the cost justified?
– Would the result be more reliable?
54. Correlation != Causation
• Criteria for causality
– The cause precedes the effect in sequence
– The cause and effect are empirically correlated
and have a plausible interaction
– The correlation is not spurious
Sources: Kan, 2003, p. 80 and Babbie, 1986
(HTTP://XKCD.COM/552/ CREATIVE COMMONS ATTRIBUTION-NONCOMMERCIAL 2.5 LICENSE)