This document discusses integrating technical risk management with decision analysis. It notes that NASA currently manages risks individually without considering overall risk. The document proposes using decision analysis and probabilistic risk assessment to evaluate alternatives based on performance measures related to objectives like safety, cost and schedule. This would allow uncertainty to be considered and provide a more rigorous approach to risk-informed decision making.
1. Integration of Technical Risk
Management with Decision Analysis
Presented at NASA Project Management
Challenge 2007 Conference
Galveston, Texas
February 6-7, 2007
Homayoon Dezfuli, Ph.D., NASA HQ (hdezfuli@nasa.gov)
Robert Youngblood, Ph.D., ISL, Inc. (ryoungblood@islinc.com)
2. Acknowledgement
• Some of the materials in this presentation are based on
methodology developed by the Office of Safety and Mission
assurance in support of the following activities:
– Revision of NASA requirements for system safety (NPR 8715.3A
dated September 12, 2006)
– Revision of NASA System Engineering Handbook (to be released
June 2007)
– Development of a new NASA standard entitled “Risk-informed
Management of Safety and Mission Success” (planned for release
before the end of fiscal year 2007)
2
3. Why Integrate Technical Risk Management with
Decision Analysis?
Excerpt from 2006 Third Quarterly Aerospace Safety Advisory Panel
(ASAP) Report
“With regards to risk assessments that are being made to support launch
decisions, it appears that a series of fragmented, non standardized tools and
methodologies are in use. The result is that risk recommendations to senior
management concerning individual hazards effecting launch are sometimes
made in isolation without consideration of overall launch risk. For example, the
most recent Shuttle launch focused heavily on two of the 569 potentially
catastrophic hazards currently known to exist, without any assessment of the
overall likelihood of such a catastrophic failure. A lack of confidence in the
technical basis for the assessments also appears to sometimes exist and
variations in risk matrix definitions among programs has been observed. Lastly,
only limited guidance is available concerning agency policies on what risks
should be accepted under what conditions. The ASAP recommends that a
comprehensive risk assessment, communication and acceptance process be
implemented to ensure that overall launch risk is considered in an integrated
and consistent manner. The process should be sound, mature, consistently
implemented to yield high confidence and consistent results that are generally
accepted by the majority of the community.”
3
4. Key Points
• Decision analysis can be a powerful tool for aligning
decisions with program objectives, given the
current state of knowledge and the decision-maker’s
preferences
• Modern risk analysis methodology can significantly
improve on the one-at-a-time management of risk
issues criticized by the ASAP report
• Therefore:
– Perform Technical Risk Management within a decision
analysis framework
– Use modern risk analysis methodology, including
probabilistic risk assessment (PRA)
– Retain the track, control, and accountability strengths of
the Continuous Risk Management (CRM) Process
4
5. Outline
• Background
• State of Practice of Risk Management (RM) at NASA
• An Example of How Risks are Handled in Decision Processes
• Need for a More Rigorous Approach to Inform Risk
Management Decisions
• An Analytical Framework to Inform Risk Management
Decisions
• The Role of Probabilistic Risk Assessment (PRA) in Risk
Management
• PRA Methodology Synopsis
• Summary
5
6. Background
• NASA is embarking on a comprehensive space exploration
program
• The goals of the exploration program are “safe, sustained,
affordable human and robotic exploration of the Moon, Mars,
and beyond ... for less than one percent of the federal budget”
• Meeting these goals requires development of a constellation of
new systems
• The design and development of these systems will involve
many decisions that require weighting/trading various
competing programmatic and technical considerations against
one another
6
7. Programmatic and Technical Considerations
that Should be Factored into Our Decisions
• Affordability
– Meet budget constraints related to design and development cost,
technology development cost, operation cost, and facility cost
• Mission Technical Objectives and Performance
– Accomplish technical objectives in a sustainable manner and on
schedule
– Enhance effectiveness and performance
• Safety
– Protect the health of public and workforce, the environment and
mission assets
• Stakeholder Expectations
– Meet needs of intra-agency and extra-agency stakeholders
7
8. The Role of Risk Management
• Purpose of risk management is to promote program success
– Incorporating risk-informed decisionmaking in the design and
formulation of the program baseline
– Proactive identification and control of departures from the
program baseline
• An effective and proactive RM process should
– Provide input to determine the preferred decision alternative in
light of programmatic objectives
– Assess risk associated with implementation of the selected
alternative
– Assist in setting resource priorities (including prioritization of
work to resolve uncertainties if warranted)
– Plan, track, and control risk during the implementation of the
selected alternative
– Iterate with previous steps in light of new information
8
10. Continuous Risk Management (CRM) Process
• Identify – Identify program risk “issues”
Identify
rol
• Analyze – Estimate the likelihood and
Anal
Cont
consequence components of the risk Communicate and
yze
issues Document
Tra
• Plan – Plan the Track and Control actions ck
n
Pla
• Track – Track and compile the necessary
risk data, measuring how the CRM process
is progressing
Current emphasis on
• Control – Determine the appropriate
Control action, execute the decision linked • “management” of individual
to that action, and verify its effectiveness “risks,” given a decision already
made somewhere else
• Communicate and Document –
communicating and documenting all risk • monitoring and accountability for
information throughout each program action items associated with
phase “risks”
10
11. Risk Matrix Paradigm
Iso-risk contours
LIKELIHOOD
These two points
have the same
expected risk
• Risk “issues” are mapped onto the matrix individually
– Interaction between risks is not considered
– Total risk from all “issues” is not evaluated
– Highly subjective; uncertainties are not formally accounted for
– Unsuitable for combining risks to obtain aggregate risk
• Risk tolerance boundaries are defined based on iso-risk contours
– Expected consequences (probability times consequences) do not adequately inform
decisions relating to safety
• Limited ability to support risk-trade studies
11
12. An Example of How Risks are Handled in Decision
Processes
12
13. STS-121 (OV-103 Discovery) Contingency
Planning
Decision Alternatives
1. Crew Rescue:
– Success of contingency
(rescue) flight (STS-300)
– Success of Contingency
Shuttle Crew Support (CSCS)
capability (on-orbit safe
haven)
2. Repair/Entry:
– Successful application of
patching materials (STA-54)
to repair damaged heat shield
– Vehicle integrity during
reentry (no downstream
heating damage) Source of Figure: Shuttle Program, NASA-JSC
– Successful approach and
landing (operations)
13
14. Pros and Cons Associated with Decision
Alternative 1 (Data Source: Shuttle Program)
STS-300
CSCS
OBSERVATIONS
Adverse consequences of Interest are:
• Loss of crew
• Loss of vehicle (OV-103)
• Evacuation of station
It appears that Low (L), Medium (M), and High
(H) are expressions of degree of belief in the
truth of the proposition
14
15. Pros and Cons Associated with Decision
Alternative 2 (Data Source: Shuttle Program)
Materials
OBSERVATIONS
• Additional adverse consequence of
interest: Public death or injury
• Acknowledgment of uncertainties
(imperfect models and a limited
state of knowledge)
Vehicle Integrity
Operations
15
16. Which Decision Alternative is More Desirable?
Source of Matrices: Shuttle Program, NASA-JSC
COMMENTS
• The shuttle program uses the above matrix to assign a level of risk to a hazard based on its likelihood and
consequence severity
• Catastrophic: hazard could result in a mishap causing fatal injury to personnel and/or loss of one or more
major elements of the flight vehicle or ground facility
• Infrequent: Could happen in the life of the program. Controls have significant limitations or uncertainties
• The matrix is used here to analyze decisions
• The definition of “catastrophic” does not discriminate between human safety and asset safety (defined as an
inclusive consequence severity level)
16
17. Need for a More Rigorous Approach to Inform
Risk Management Decisions
• Characteristics of our Decision Situations
– They are complex: Need a systematic framework to sort
things out
– Must deal with uncertainty: Need to assess uncertainty
analytically, know when it is necessary to reduce it
– Must deal with multiple Objectives: Need to employ analytic
decision techniques to handle competing priorities (apples
vs. oranges)
17
18. An Analytic Framework to Inform Risk
Management Decisions
• Model consequences of decision alternatives in terms of
impact on fundamental objectives
– Use performance Measures (PMs) as metrics to characterize
performance of the decision alternatives with respect to a
particular fundamental objective. Examples:
• PM for crew safety is the probability of loss of crew
• PM for space asset safety is the probability of loss of space
vehicle
• PM for public safety is the probability of public death or injury
• PM for station occupancy is the probability of evacuation
– Inclusion of uncertainties is critical in evaluating PMs
• Compare the consequences of decision alternatives on the
PMs
• Collectively consider all PMs and associated uncertainties to
inform risk management decisions (deliberation has an
important role here)
18
19. Decision Analysis
Formulation of
Objectives Hierarchy and
Performance Measures
(PMs)
Input from Technical and Programmatic Processes
Proposing and or
Identifying Decision
Alternatives
Risk Analysis of
Decision Alternatives,
Performing Trade
Studies, & Quantification
of Performance Index
Deliberation and
Recommendation on
Decision Alternative
Decision Making and
Implementation of
Decision Alternative
Tracking and Controlling
Performance Deviations
19
20. Integration of CRM Process with Decision
Analysis
Probabilistic Risk Decision Analysis
Assessment Plays a
Major Role in this Formulation of
Objectives Hierarchy and
Activity Performance Measures
(PMs)
See Chart 21
Input from Technical and Programmatic Processes
Proposing and or
Identifying Decision
Alternatives
Continuous Risk
Management (CRM) Process Risk Analysis of
Decision Alternatives,
Performing Trade
Studies, & Quantification
of Performance Index
See Chart 24
Identify
rol
Analy
Cont
Communicate, Deliberation and
Deliberate, and
ze
Recommendation on
Document Decision Alternative
See Chart 26
Tra
ck Decision Making and
n
Pla Implementation of
Decision Alternative
Tracking and Controlling
Performance Deviations
See Chart 27
20
21. Formulation of Objective Hierarchy and
Performance Measures (PMs)
Mission Success
Objectives
Technical Objectives Other
Hierarchy Affordability Safety
and Performance Stakeholders
Support
Enhance Provide Protect Protect Realize Other
Meet Budget Meet Achieve Mission Protect
Effectiveness/ Extensibility/ Workforce & Mission and Stakeholders
Constraints Schedules Critical Functions Environment
Performance Supportability Public Health Public Assets Expectations
Design/ Loss of Astronauts Loss of
Schedule Mass/ Cargo Commercial Earth Public
Development Mission Death or Flight
Slippage Capacity Extensibility Contamination Support
Performance Cost Overrun Function x Injury Systems
Measures
(PMs)
Representative
PMs are Loss of Planetary Loss of Science
Operation …….. …….. Reliability/ Public Death
Shown Support Contamination Public Community
Cost Overrun Availability or Injury
Capability x Property Support
Focus of Technical Risk Assessment
Economics and Schedule Models Models Quantifying Performance Measures (PMs) Stakeholder
Model-Based Models
Models to Assess Life Cycle Cost and
Analysis of Models Quantifying Capability Metrics Relating to Mission Models Quantifying Metrics Relating to Frequency
Schedule Performance
Performance Requirements (Mass, Thrust, Cargo Capacity, …) or Probability of Failure to Meet High-Level Safety
Measures Objectives (e.g., PRA)
Models Quantifying Metrics Relating to Probability of
Loss Mission Critical System / Function (e.g., PRA)
Decision
Alternative
21
22. PMs of Interest to the Decision Situation
Discussed Earlier
Mission Success
Objectives
Technical Objectives Other
Hierarchy Affordability Safety
and Performance Stakeholders
Support
Protect Protect
Meet Achieve Mission
Workforce & Mission and
Schedules Critical Functions
Public Health Public Assets
Loss of Astronauts
Schedule Loss of
Mission Death or
Slippage Vehicles(s)
Function x Injury
Performance
Measures
(PMs)
Public Death Station
or Injury Evacuation
Amenable to PRA
22
23. PMs of Interest to NASA's Exploration Systems
Architecture Study (ESAS)
In ESAS study, the
PMs were referred
to as “Figures of
Merit (FOMs)”
Source: ESAS Report
23
24. Risk Analysis of Decision Alternatives
Examples of Decisions
Architecture A vs. Architecture B vs. Architecture C
Technology A vs. Technology B
Making changes to existing systems
Extending the life of existing systems
Changing requirements
Responding to operational occurrences in real time Deliberation and
Prioritization
Contingency Plan A vs. Contingency Plan B
Ranking / Selection of
Launch or No Launch Preferred Alternative
Decision
Alternatives
For Analysis Identify Qualitative
Risk and PM
Analyze Techniques Results
Available Techniques
Spectrum of
Scoping & Is the
Determination of Preliminary
Ranking / Yes
Methods To Be Risk & PM
Comparison
Used Results
Robust?
No
Identify Quantitative Cost-
Analyze No
Techniques Beneficial
to Reduce
Uncertainty?
Risk Analysis
Techniques Yes
PRA Iteration
Techniques
Additional Uncertainty Reduction If Necessary Per Stakeholders
24
25. Implications of Uncertainty in Comparing
Alternatives
PDF(PM(Alt 1)) & PDF(PM(Alt2))
PDF(PM(Alt1)) & PDF(PM(Alt2))
3
PDF(Performance
PDF(Performance
2.5 1
0.8
Measure)
Measure
2
Alternative 1 0.6 Alternative 1
1.5
Alternative 2 0.4 Alternative 2
1
0.2
0.5
0
0
0 1 2 3 4
0 1 2 3 4
Good Bad Good Performance Measure Bad
Performance Measure
PDF (PM (Alt X)) : Probability Density Function of Performance Measure associated with Decision Alternative X
Alt 2 is clearly better than Alt 1 Alt 1 is slightly better on average, but
uncertainty in performance implies
Assuming that the uncertainty some probability that Alt 2 is actually
distributions reflect the actual state better than Alt 1
of knowledge, it is very unlikely that
uncertainty reduction will improve Uncertainty reduction will improve
the decision the decision
25
26. Deliberation and RM Planning
Need for Need to
Risk & TPM Present Preliminary No No
Additional Refine / Adjust
Results to
Results Uncertainty Decision
Stakeholders
Reduction? Alternatives?
Yes Yes
Analysis
• Deliberation Selection
– To make use of collective wisdom to Furnish Recommendation to
Decision-Maker
promote selection of the best Capture Basis
Decision Making and
Implementation of
alternative for actual Develop / Update Risk Decision Alternative
Management Plan
implementation Refine Metrics and Develop
Monitoring Thresholds
• RM Planning
– Identified and analyzed risks are
managed in one of four possible
actions (Mitigate, Research, Watch,
or Accept)
– A portfolio of observables and
thresholds needs to be identified
• Measurable (or calculable)
parameters
• provides timely indication of
performance issues
– RM Protocols are determined
– Responsibility for Tracking
activities is assigned
26
27. Monitoring Performance Deviations
Decision Making and
Implementation of
Decision Alternative
Performance Monitoring and Feedback / Control
• Detection of significant Track Control
deviations from program intent Trend Performance
Based on PMs and
Intervene if
Performance
in a timely fashion, without Thresholds or Other
Decision Rules
Thresholds Exceeded
over-burdening the program
• Mitigation/Control of risk when Identify
Iterate Continuous
a performance threshold is Metrics & Thresholds from “Plan” Risk Management
Process
exceeded
– A perceived insignificant risk
“issue” becomes significant
– Emergence of new risk “issues”
27
28. The Role of Probabilistic Risk Assessment (PRA)
in Risk Management
28
29. PRA can be a Powerful Tool for Risk
Management
• Quantifies goal-related performance measures
– Probability of Loss of Crew (LOC) versus Reliability Analysis of
sub-systems
– PRA metrics are integral risk metrics as opposed to limited
metrics such as sub-system reliability
• Captures dependences and other relationships between sub-
systems
• Works within a scenario-based concept of risk that best
informs decision-making
– Identifies contributing elements (initiating events, pivotal events,
basic events)
– Quantifies the risk significance of contributing elements, helping
focus on where improvements will be effective
– Provides a means of re-allocating analytical priorities according to
where the dominant risk contributors appear to be coming from
– Provides a framework for a monitoring / trending program to
detect risk-significant adverse trends in performance
29
30. PRA can be a Powerful Tool for Risk
Management (cont.)
• Quantifies uncertainty and ranks contributors to uncertainty
• Supports trade studies by quantifying
– The most goal-related metrics
– System interfaces and dependencies
– Responses to system and function challenges
– Effects of varying performance levels of different systems
30
31. PRA Methodology Synopsis
Understanding of Consequence of Interest Master Logic Diagram (Hierarchical Structure) Event Sequence Diagram (Inductive Logic)
(Performance Measures (PMs)) to Decision- maker
End State: ES1 IE A B End State: OK
End State: ES2
End State: ES2
End State: ES2
C D E End State: ES1
End State: ES2
Modeling of Pivotal Events Using Techniques such as Modeling of Basic Events Using Various Techniques
Event Tree (Inductive Logic)
Fault Tree (Some Examples are Shown Below)
End Not A
IE A B C D E
State
1
Logic Gate Q(km ) = α Q
1: OK Basic Event ⎛ m - 1⎞ k t
⎜ ⎟
2: ES1
⎝ k - 1⎠
3: ES2
4: ES2
5: ES2
Prf ( t ) = 1 − e − λt
6: ES2
Link to another fault tree
Probabilistic Modeling of Basic Events Communicating Risk Results and Insights to
Model Integration and Quantification
30 50 Decision-maker
60
25 40
50
20 100 End State: ES2
40 30
15
30
20 10 20 80 Reporting PMs and associated uncertainties
5 10 End State: ES1
10
60 Ranking of risk scenarios
0.01 0.02 0.03 0.04 0.02 0.04 0.06 0.08 0.02 0.04 0.06 0.08 Quantification of PMs Ranking of risk significance contributors (e.g.,
40
and propagation of hardware failure, human errors, etc.)
Examples:
Probability that the hardware x fails when needed
20
epistemic uncertainties Insights into how various systems interact
Probability of nozzle burn-through 0.01 0.02 0.03 0.04 0.05 Tabulation of key modeling assumptions
Probability of common cause failure of two redundant devices
Identification of key contributors to uncertainty
The uncertainty in occurrence frequency of an event Proposing candidate risk reduction strategies
is characterized by a probability distribution
31
32. Integrated Nature of PRA Addresses Some
ASAP Concerns
• Some of the pros and cons of the example discussed earlier
would have been modeled explicitly in an integrated risk
analysis
• Performance uncertainties are included explicitly
• Treating the issues as consistently as possible, within an
integrated framework making use of all available information,
is an improvement over mapping each issue separately into a
5x5 and trying to trade them off against each other
32
33. Summary
• The proposed RM approach is based on an analytic-
deliberative decision-making methodology
• Embeds current Continuous Risk Management (CRM) process
in a broader decision analysis framework
• It is analytical because
– It promotes analyses of the consequences of decision options in
terms of PMs relating to program fundamental objectives
– It promotes the use of analytical methods rather than judgment,
wherever the methods are practical
– It promotes systematic incorporation of decision maker’s
preferences (values) into decision-alternative ranking process
• It is deliberative because
– Allows the consideration of elements that have not been captured
by the formal analysis
– Provides an opportunity to scrutinize the modeling assumptions
of the analysis and the relevant uncertainties
33
34. Acronyms
• Alt. Decision Alternative
• Cat. Catastrophic
• CDRA Carbon Dioxide Removal Assembly
• ASAP Aerospace Safety Advisory Panel
• Crit. Critical
• CRM Continuous Risk Management
• CSCS Contingency Shuttle Crew Support
• ECLSS Environmental Control and Life Support Systems
• ESAS Exploration Systems Architecture Study
• EVA Extravehicular Activity
• FOM Figure of Merit
• HAC Heading Alignment Circle
• IE Initiating Event
• ISS International Space Station
• LiOH Lithium Hydroxide
• LOC Loss of Crew
• Marg. Marginal
• PDF Probability Density Function
• PRA Probabilistic Risk Assessment
• PM Performance Measure
• RM Risk Management
• STA-54 Shuttle Tile Ablator 54
• STS Space Transportation System
34