Risk Management and Reliable Forecasting using Un-reliable Data (magennis) - LKCE 2014

Get Slides: http://bitly.com/1E9Hh8l
Risk Management and Reliable
Forecasting Using Un-Reliable Data
First Presented at Lean Kanban Central
Europe, Hamburg. November 2014
Troy Magennis Twitter: @t_magennis

Don’t Follow the Light

Question Current Approaches to…

Sources of Forecast Risk
• Work
• Throughput
• Dependencies

People
• People are biased
– intentionally and/or un-intentionally
• In order to forecast and manage risk
– We need good expert opinions
– We need to confirm these opinions against reality
– We need to learn from our forecast errors
• Often we get opinions on a fractional
understanding of the eventual problem solved

Not Getting Data
(At All or Early Enough)

Getting Reliable Data from People
• Why would people take the time?
– We tell them (rarely works as intended)
– We politely ask them (works sometimes)
– We make it part of their self-interest (most likely)
• Gamification
• Challenge their view on fairness
• NEVER: Embarrass a team or individual
– you will totally destroy reliable data capture….

Strategy 1 – “Gamify” Presentation
• Teams don’t like being “Red”
(default to red; teams will make them green)
• Interactive charts get attention; use vibrant colors for teams with good data
• Coloring teams in dull (grey) based on poor quality data capture often gets action
• Make it sexy. Show how “my” metric connects to strategy
[Chart labels: Teams, Strategies, Features]

Strategy 2 – Visibility to Decisions
• Operations Reviews! Giving meaning to data!
• Make it clear when data has led to decisions
– “Based on the data and analysis presented, this is clearly
an opportunity we will pursue.”
– “Let’s track the first month’s actuals against the model and
fully invest if it is tracking well.”
• Make it clear when more data would have “won”
– “If I could clearly see the impact of giving you those extra team
members, this would be easy”
• Promote lively debate around data
– React quickly if data presented is gamed or teams
repetitively fail against THEIR models

Strategy 3 – Perceived Fairness
• One team gets some “extra” attention based
on an argument supported by data
– Extra resources, More Investment
– More time to demo
• With just a few examples, often there is an
avalanche of willing metric support by others
• Make it clear why the data swayed a decision

Uncertain Data Quality

Checking for Gaming & Errors
• We can ask tougher questions
– What assumptions are built into this forecast?
• Why would we be 2x better than we ever have before?
– Walk me through the logic supporting your analysis
– Looking at historical data, we predict very poorly
when there are 3 or more dependent teams. Have you
considered this?
• We can test for unlikely patterns
– Distribution analysis
– Benford’s Law

Evidence of data quality is a
well formed and explainable
distribution shape
Customer: “Our data is crap.
You can’t use any of it”
Throughput per week

Distribution Shape & Outliers
• Plot visually using Histogram
• Set a rule: e.g. >10 times the mode? (state it)
• Example: Mode is 3; 50 & 100 are outliers worth discussion
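A minimal sketch of the stated rule, assuming the rule is "flag anything greater than 10 times the mode". The function name `flag_outliers` and the sample values are hypothetical; the values are chosen only so that the mode is 3, as in the example above.

```python
from statistics import mode

def flag_outliers(samples, multiple=10):
    """Return the mode and every value greater than `multiple` times the mode."""
    most_common = mode(samples)
    threshold = multiple * most_common
    return most_common, [x for x in samples if x > threshold]

# Hypothetical throughput samples (not from the slides); the mode is 3.
samples = [1, 2, 3, 3, 3, 4, 5, 3, 2, 6, 50, 100]
most_common, outliers = flag_outliers(samples)
print("mode:", most_common, "outliers worth discussing:", outliers)
```

The flagged values are candidates for a conversation, not for automatic deletion; the point of stating the rule is that everyone can see why a value was questioned.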

Benford’s Law
• Benford's Law, also called the First-Digit Law, refers to the frequency distribution of leading digits in many real-life sources of data.
• Known to apply to: electricity bills, street addresses, stock prices, population numbers, death rates, lengths of rivers, …, and processes described by power laws.
Source: Wikipedia
• Common in story counts per epic in software projects. Also probable in lead time and cycle time values.

Benford’s Law Applied to Story Count
• Story count estimate for 48 randomly picked epics
• The frequency of the first digits was computed
• These were compared to Benford’s prediction (green within 1.5%)

d   Benford’s Prediction P(d)   Actual Data P(d)
1   30.1%                       31.3%
2   17.6%                       18.8%
3   12.5%                       20.8%
4   9.7%                        8.3%
5   7.9%                        8.3%
6   6.7%                        8.3%
7   5.8%                        0%
8   5.1%                        4.2%
9   4.6%                        0%
Based on real data, n = 48
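The comparison above can be sketched in a few lines. Benford's prediction for a first digit d is log10(1 + 1/d); the "actual" percentages below are copied from the table, and the 1.5 percentage-point tolerance is the "green" rule from the slide. The function names are illustrative only.

```python
import math

def benford_probability(d):
    """Benford's predicted probability that the first digit is d (1..9)."""
    return math.log10(1 + 1 / d)

# Actual first-digit frequencies (percent) as listed in the table above.
actual_pct = {1: 31.3, 2: 18.8, 3: 20.8, 4: 8.3, 5: 8.3,
              6: 8.3, 7: 0.0, 8: 4.2, 9: 0.0}

def compare_to_benford(actual, tolerance=1.5):
    """Return (digit, predicted %, actual %, within tolerance?) for each digit."""
    rows = []
    for d in range(1, 10):
        predicted = 100 * benford_probability(d)
        within = abs(actual[d] - predicted) <= tolerance
        rows.append((d, round(predicted, 1), actual[d], within))
    return rows

rows = compare_to_benford(actual_pct)
for d, predicted, actual, within in rows:
    print(d, predicted, actual, "green" if within else "check")
```

With only 48 data points, a digit that misses the tolerance is a prompt for a question, not proof of gaming.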

Data Analysis Spreadsheet
https://github.com/FocusedObjective/FocusedObjective.Resources

Forecasting using data without
considering context

Throughput Trend by Week
[Chart: throughput per week, W2-2012 to W21-2014; y-axis 0 to 1600; series: All, Enabling, Spec, Bugs, NFRs]


[Chart: throughput trend by week, W2-2012 to W21-2014, y-axis 0 to 1600, annotated with context: High Volatility; Decline?; Restructure?; Training? Coaches added; end of year break]

Good Contextual Forecasting
• Know the past
– Track the date of significant company events
• Reorgs, releases, competitor releases,
– Track reference data that may show context
• Staff numbers by date, National Holidays
– Markup all charts and data with context labels
• Consider the future
– What events are likely over the forecast period
– Draw samples considering these contexts
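One way to "draw samples considering these contexts" is to label each historical week and leave out weeks whose context will not recur in the forecast period. The sketch below is an assumption about how that could look, not the method used in the talk; the history, the labels, the 85th percentile, and the function name `forecast_weeks` are all hypothetical.

```python
import random

# Hypothetical weekly history: (week label, items completed, context label).
# None of these numbers come from the slides.
HISTORY = [
    ("W47", 12, ""), ("W48", 14, ""), ("W49", 11, ""), ("W50", 10, ""),
    ("W51", 3, "end of year break"), ("W52", 2, "end of year break"),
    ("W1", 13, ""), ("W2", 12, ""),
]

def forecast_weeks(history, backlog, exclude=(), trials=1000, pct=0.85, seed=1):
    """Weeks needed to finish `backlog`, read at the `pct` percentile of trials."""
    pool = [done for _, done, context in history if context not in exclude]
    if not pool or max(pool) <= 0:
        raise ValueError("need at least one week with positive throughput")
    rng = random.Random(seed)
    outcomes = []
    for _ in range(trials):
        remaining, weeks = backlog, 0
        while remaining > 0:
            remaining -= rng.choice(pool)  # sample one past week, with replacement
            weeks += 1
        outcomes.append(weeks)
    outcomes.sort()
    return outcomes[min(trials - 1, int(pct * trials))]

# The forecast period contains no year-end break, so those weeks are excluded.
weeks_all = forecast_weeks(HISTORY, backlog=120)
weeks_no_break = forecast_weeks(HISTORY, backlog=120, exclude=("end of year break",))
print(weeks_all, weeks_no_break)
```

If the forecast period does include a break, the labeled weeks would be kept in the pool, or sampled only for the calendar weeks they apply to.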

Some Context Events…
• Internal differences in team skills
• Any change (Hawthorne Effect)
• Change of Risk Profile
• Unstable WIP
• Poor Quality
• Unstable Test Environment
• Seasons - Vacations
• Executive Re-org
• Natural Disasters
• Exceptional Sickness
• Changes in Staff
• Team Changes
• Location
• Environmental Disturbance
• Morale Shifts
• Process Change
• Architectural Change
• Fatigue (Low Work Morale)
• Change of demand for different classes of service
• Account of Expedites
• Changes in how to measure
• Poor record keeping
• Delivery frequency / cadence
• Org changes / staffing
• Gaming the System
• Mergers and Acquisitions
• Multi tasking
• High attrition rates
• Staff availability due to prod issues
• Critical specialists not available
• Introduce new technology
• Technical architectural changes
• Legal requirements (date fixed)
• Beginning the project
• User stories too large
• Dependency identification
• Technical complexity
• External spot demands
• Changing prioritization
• Expedited work
• External dependencies
• Better coffee
• Relevant training
• Process changes
• Process problem moving tickets
• New management policy

Forecasting using poor
estimates from “Experts”
“Uncertain Uncertainty”

Improving Estimates
Stop
• Point estimates
• Ignoring uncertainty
• Thinking it’s easy
• “Never speak of this again”
• Inventing units (points)
• Rewarding gaming
• Tolerating ambiguity
Start
• Using Range estimates
• Expressing Un-certainty
• Train & practice estimation
• Learning with feedback
• Using dollars, time, counts
• Rewarding honesty
• Presenting unbiased data

http://ccnss.org/materials/pdf/sigman/callibration_probabilities_lichtenstein_fischoff_philips.pdf

Estimation Training
• How sure are you about your guesses?
• This can be practiced
• Calibration – Trivia Game
– Ask a question about a known actual
– Ask people to guess the range
• “True or False: A hockey puck fits in a golf hole”
• “Confidence: Choose the probability that best
represents your chance of getting this question
right:
50% 60% 70% 80% 90% 100%”
– Disclose the result: people answering 50% (no idea) should
get 50% of the questions right by guessing alone
Source: http://en.wikipedia.org/wiki/Calibrated_probability_assessment
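Scoring the trivia game is simple bookkeeping: group the answers by stated confidence and compare each group's hit rate with the confidence claimed. A minimal sketch follows; the answers and the function name `calibration_table` are hypothetical.

```python
from collections import defaultdict

def calibration_table(answers):
    """Map each stated confidence (%) to the percentage of answers that were right.

    `answers` is an iterable of (stated_confidence_pct, was_correct) pairs.
    """
    totals = defaultdict(lambda: [0, 0])  # confidence -> [right, asked]
    for confidence, correct in answers:
        totals[confidence][0] += int(correct)
        totals[confidence][1] += 1
    return {c: 100 * right / asked for c, (right, asked) in sorted(totals.items())}

# Hypothetical game results for one person (not from the slides).
answers = [
    (50, True), (50, False), (50, True), (50, False),
    (90, True), (90, True), (90, False), (90, False),
    (100, True), (100, True),
]
table = calibration_table(answers)
for confidence, hit_rate in table.items():
    print(f"said {confidence}% sure, was right {hit_rate:.0f}% of the time")
```

In this made-up result the person is calibrated at 50% and 100% but overconfident at 90% (right only half the time), which is the kind of feedback the game is meant to surface.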

No Lead Time Data?
• No team yet? No history?
• We need two estimates with probability
– 1 in 5 tasks should take less than 1 day
– 4 in 5 tasks should take less than 5 days
• We need to solve the curve that fits these two
probabilities (and hopefully the others)

http://bit.ly/1tC1Phy
• Why lead time is Weibull, Why you care…

How do we get experts to
estimate ranges and predict
higher order percentiles
from two estimates?
• 20% <= 1 Day (1 in 5)
• 80% <= 5 Days (4 in 5)

Two estimates fix the curve:
• (x1, p1): 20% <= 1 Day
• (x2, p2): 80% <= 5 Days
See detailed paper on the mathematics:
http://www.johndcook.com/quantiles_parameters.pdf

https://github.com/FocusedObjective/FocusedObjective.Resources
Excel formulas (Weibull fit from two estimates):
Shape: =(LN(-LN(1-p2_param))-LN(-LN(1-p1_param)))/(LN(x2_param)-LN(x1_param))
Scale: =x1_param/(POWER((-LN(1-p1_param)),(1/Shape_result)))
Value at the probability in cell A27: =Scale_result*POWER(-LN(1-A27),1/Shape_result)
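The same three formulas translate directly to Python. This is a sketch that mirrors the Excel cells, assuming the two-parameter Weibull CDF F(x) = 1 - exp(-(x/scale)^shape); the function names are illustrative and the inputs are the two estimates from the earlier slide (20% within 1 day, 80% within 5 days).

```python
import math

def weibull_from_two_quantiles(x1, p1, x2, p2):
    """Solve Weibull shape and scale so that F(x1) = p1 and F(x2) = p2."""
    shape = (math.log(-math.log(1 - p2)) - math.log(-math.log(1 - p1))) / (
        math.log(x2) - math.log(x1)
    )
    scale = x1 / (-math.log(1 - p1)) ** (1 / shape)
    return shape, scale

def weibull_quantile(p, shape, scale):
    """Value below which a fraction p of outcomes fall (inverse CDF)."""
    return scale * (-math.log(1 - p)) ** (1 / shape)

# 1 in 5 tasks take less than 1 day; 4 in 5 take less than 5 days.
shape, scale = weibull_from_two_quantiles(x1=1, p1=0.20, x2=5, p2=0.80)
p95 = weibull_quantile(0.95, shape, scale)
print(f"shape={shape:.2f} scale={scale:.2f} 95th percentile={p95:.2f} days")
# The 95th percentile comes out at about 8.29 days.
```

Feeding the two input probabilities back through `weibull_quantile` should return 1 and 5 days, which is a quick check that the fit is consistent.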

Missing HUGE delays and
workload beyond the 95th
Percentile

http://connected-knowledge.com/

Long Tail Distribution Sampling
[Chart: long tail distribution; labels: “Good chance of Samples” and “Low chance of Samples”]

Hard to sample high-end percentiles…
• You find high end quickly for uniform dist.
– 12 samples (50% certain of finding 90% range)
• Not so, for long tail distribution (e.g. Weibull shape: 1.5)
– 88% never found after 1000 trials, avg. 425 if lucky
[Chart labels: From samples (likely in practice); By Formula (NOT likely in practice)]

What is Risk?
95% <= 8.29 Days
Big Risks: How can we identify these?

The RISK is out there…

Contact Details
www.FocusedObjective.com
Download latest software, videos, presentations and articles on
forecasting and applied predictive analytics
Troy.Magennis@focusedobjective.com
My email address for all questions and comments
@t_magennis
Twitter feed from Troy Magennis

Do we have to break down EVERY epic to estimate story counts?
CASE STUDY: ESTIMATING TOTAL
STORY COUNT

Problem: Getting a high-level
time and cost estimate for a
proposed business strategy
Approach: Randomly sample
epics from the 328 proposed
and perform story breakdown.
Then use throughput history to
estimate time and costs

Sample with replacement
Remember to put the piece of paper
back in after each draw!
Number of stories drawn per trial:
Trial 1: 9, 13, 13, 5, 11 (Sum: 51)
Trial 2: 1, 4, 7, 5, 11 (Sum: 28)
…
Trial 100: 35, 19, 5, 13, 11 (Sum: 83)

Epic Breakdown – Sample Count
Facilitated by a well-known consulting
company, the team performed a story
breakdown (counts) of epics.
48 (out of 328) epics were analyzed.
Actual Sum: 262

Process          50% CI   75% CI   95% CI
MC 48 samples    261      282      315
MC 24 samples    236      257      292
MC 12 samples    223      239      266
MC 6 samples     232      247      268
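A sketch of the Monte Carlo (MC) rows, under the assumption that each trial draws one story count per epic, with replacement, from the epics that were actually broken down, and sums them. The sample counts below are hypothetical (the 48 real counts are not listed in the slides), and `monte_carlo_total` is an illustrative name.

```python
import random

def monte_carlo_total(sample_counts, epics_to_estimate, trials=1000, seed=7):
    """Estimate a total story count by resampling `sample_counts` with replacement.

    Returns the 50th, 75th and 95th percentile of the simulated totals.
    """
    rng = random.Random(seed)
    totals = sorted(
        sum(rng.choices(sample_counts, k=epics_to_estimate)) for _ in range(trials)
    )

    def percentile(p):
        return totals[min(trials - 1, int(p * trials))]

    return {p: percentile(p) for p in (0.50, 0.75, 0.95)}

# Hypothetical story counts from six epics that were broken down.
sample_counts = [9, 13, 5, 11, 1, 4]
estimate = monte_carlo_total(sample_counts, epics_to_estimate=48)
print(estimate)
```

Passing 328 instead of 48 for `epics_to_estimate` would give a range for the whole proposal; the table suggests that even a small sample can land close to the actual sum, which is the argument against breaking down every epic.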

PROBLEMS WITH NON-LINEAR
SCALES

Fibonacci Bias…
Scale: 1 2 3 5 8 13 … 21
Question: What is the middle value for this scale?
Perceived (5) Mathematical (10.5)

Team (3 of 130, 82% Median 5)        Median   Mean   SD
Team A, Process Change Team          5        4.4    3
Team B, UI Software Dev Team         5        5.4    6
Team C, Library Software Dev Team    5        5.7    5.5

Being < 0 at MEAN – 1 SD should be an indicator something is wrong!
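The "mean minus one SD below zero" check is plain arithmetic on the table above. The sketch below applies it to the three teams' mean and SD; the function name is illustrative.

```python
# (mean, standard deviation) of story point estimates, from the table above.
teams = {
    "Team A": (4.4, 3.0),
    "Team B": (5.4, 6.0),
    "Team C": (5.7, 5.5),
}

def below_zero_at_minus_one_sd(stats):
    """Names of teams where mean - 1 SD is negative.

    Estimates cannot be negative, so a negative value here suggests the data
    is too skewed for mean and SD to describe it well.
    """
    return [name for name, (mean, sd) in stats.items() if mean - sd < 0]

flagged = below_zero_at_minus_one_sd(teams)
print(flagged)
```

Team C is only just above zero (5.7 minus 5.5), so it deserves the same scrutiny even though the rule does not flag it.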

Normal?
[Chart annotations: Expect ~50%; Expect ~15%; Expect ~35%]

Paper: Does the use of Fibonacci
numbers in Planning Poker affect
effort estimates?
“Conclusion: The use of a Fibonacci scale, and possibly
other non-linear scales, is likely to affect the effort
estimates towards lower values compared to linear scales.
A possible explanation for this scale-induced effect is that
people tend to be biased toward the middle of the
provided scale, especially when the uncertainty is
substantial. The middle value is likely to be perceived as
lower for the Fibonacci than for the linear scale.”
R. Tamrakar and M. Jørgensen (2012)
https://www.simula.no/publications/Simula.simula.1282

Really, really, know the question…
• What is the goal or question being asked?
• How is this question answered now?
– Good enough? Is it believed?
– Current cost OK?
• What data would be necessary to answer this
question slightly better?
– Is the cost justified?
– Would the result be more reliable?

Import/Cleaning Tools
• Importing
• Normalizing
• Imputing (estimating missing values)
• Visualization
• Re-runnable / Automation
• Machine Learning

Spurious Correlations: http://tylervigen.com/

Correlation != Causation
• Criteria for causality
– The cause precedes the effect in sequence
– The cause and effect are empirically correlated
and have a plausible interaction
– The correlation is not spurious
Sources: Kan, 2003, p. 80, and Babbie, 1986
(HTTP://XKCD.COM/552/ CREATIVE COMMONS ATTRIBUTION-NONCOMMERCIAL 2.5 LICENSE)
