Why do we fail?
And how do we stop doing that?
The major cause of incidents
    is the human factor
               -- folk knowledge
Outages become less frequent and shorter in duration as data centers increase in size. The
smaller the data center the longer and more common the outages.

Root causes and responses by organizations

Eighty percent of respondents know the root cause of the unplanned outage. The most frequently
cited root causes of data center outages are: UPS battery failure (65 percent), UPS capacity
exceeded (53 percent), accidental EPO/human error (51 percent) and UPS equipment failure (49
percent). Most common responses to unplanned outages are to repair, replace or purchase
additional IT or infrastructure equipment (60 and 56 percent, respectively) followed by contacting
the equipment vendor for support (51 percent).

Fifty-seven percent believe all or most of the unplanned outages could have been prevented. The
most common prevention tactics to avoid downtime are investing in improved equipment (50
percent), increasing the budget and staff of the data center (34 and 20 percent, respectively),
improving infrastructure design/incorporating redundant components (19 and 18 percent,
respectively) as well as performing preventative maintenance of critical infrastructure (16
percent).

       National Survey on Data Center Outages
       Sponsored by Emerson Network Power
       Independently conducted by Ponemon Institute LLC
       Publication Date: 30 September 2010
       Ponemon Institute© Research Report
ABS TECHNICAL PAPERS 2006

TRENDING THE CAUSES OF MARINE INCIDENTS
D.B. McCafferty, American Bureau of Shipping, USA
C.C. Baker, American Bureau of Shipping, USA
Presented at the Learning From Marine Incidents 3 Conference held in London,
January 25-26, 2006, and reprinted with the kind permission of the Royal Institution of Naval Architects

Common patterns that were found included:
• Human error continues to be the dominant factor in maritime accidents,
  contributing to 80 to 85 percent of accidents.
• Based on USCG data, for all accidents over the reporting period of 1999 to 2001,
  approximately 80 to 85% of the accidents analyzed involved human error. Of these,
  about 50% of maritime accidents were initiated by human error, and another 30%
  were associated with human error.
• In MARS reports (voluntary mariner self reporting of accident and near misses),
  mariners note human error in the majority of reports, and chiefly attribute
  accidents and near misses to: lack of competence, knowledge and ability;
  human fatigue; workload; manning; complacency, and; risk tolerance.
• USCG data on offshore pollution events in California suggests that 46% are
  caused or associated with human error.
• For accidents associated with pollution events in the State of California,
  accident causes are chiefly attributed to failures of situation awareness (94%).
• Among all human error types classified in numerous databases and libraries of
  accident reports, failures of situation awareness and situation assessment
  overwhelmingly predominate, being a causal factor in about 45% (offshore) to
  about 70% (ships) of the recorded accidents associated with human error
  (see Figure 1, below).
more sources?
Other industries have
   figured it out
• Aviation
• Medicine
• Nuclear power plants
• Military
• …?
Usability studies in IT
 focus on end users.
SYSADMIN
How do other industries
 ensure reliability?
• Procedures & checklists
• Verification & redundancy
• Eliminating the human factor where it’s
  unnecessary

• Focusing human effort where it’s actually
  needed
Procedures & checklists
• A lot of what we do is following known
  procedures
• If we write it down & follow the
  instructions, we can focus on what is
  special, not on doing the usual steps
• Also, we won’t forget any of the usual steps
  when the situation goes sideways
ITIL?
http://gawande.com/the-checklist-manifesto
ctors 35(2), pp. 28-43.




                     COCKPIT CHECKLISTS:
                   CONCEPTS, DESIGN, AND USE

                                  Asaf Degani
                      San Jose State University Foundation
                                  San Jose, CA
                                Earl L. Wiener
                              University of Miami
                               Coral Gables, FL
task-checklists for almost all segments of the flight, i.e., PREFLIGHT, TAXI, BEFORE
LANDING, etc.; and in particular before the critical segments: TAKEOFF, APPROACH,
and LANDING. Two other checklists are also used on the flight-deck: the abnormal and
emergency checklist. This paper will address only the normal checklist.
We believe that normal checklists are intended to achieve the following objectives:
   1. Provide a standard foundation for verifying aircraft configuration that will attempt
      to defeat any reduction in the flight crew's psychological and physical condition.
   2. Provide a sequential framework to meet internal and external cockpit operational
      requirements.
   3. Allow mutual supervision (cross checking) among crew members.
   4. Dictate the duties of each crew member in order to facilitate optimum crew
      coordination as well as logical distribution of cockpit workload.
   5. Enhance a team concept for configuring the plane by keeping all crew members
      “in the loop.”
   6. Serve as a quality control tool by flight management and government regulators
      over the flight crews.
Another objective of an effective checklist, often overlooked, is the promotion of a
positive “attitude” toward the use of this procedure. For this to occur, the checklist must
be well grounded within the “present day” operational environment, so that the flight
crews will have a sound realization of its importance, and not regard it as a nuisance task
Various types of checklist devices have evolved over the years. Among them are the
scroll, mechanical, and vocal checklist (Degani and Wiener, 1990; Turner and Huntley,
1991). More modern ones involve computer-based text displayed on a CRT and
electronic checklist devices that sense sub-system’s state (Rouse, Rouse, and Hammer,
1982; Palmer and Degani, 1991).
The paper checklist is the most common checklist device used today in commercial
operation. It has a list of items written on a paper card (see Figure 1). Usually, the card is
held in the pilot’s hand. Because of the wide prevalence of this device, it will be the focus
of this paper.
Sanders and McCormick (1987) state that “because humans are often the weak link in the
system, it is common to see human-machine systems designed to provide parallel
redundancy” (p. 18). A similar principle of backup and redundancy is applied in the
checklist procedure. There are two types of redundancies embedded in this procedure.
The first is the redundancy between configuring the aircraft from memory and only then
using the checklist procedure to verify that all items have been accomplished properly
(set-up redundancy). The second is the redundancy between the two or three pilots
monitoring each other while conducting the checklist procedure (mutual redundancy).

The Method
There are two dominant methods of conducting (“running”) a checklist—the do-list and
the challenge-response. Each is the product of a different operational philosophy.
Do-list. This method can be better termed “call-do-response.” The checklist itself is used
to lead and direct the pilot in configuring the aircraft, using a step-by-step “cookbook”
approach. The setup redundancy is eliminated here, and therefore, a skipped item can
easily pass unnoticed once the sequence is interrupted.
Challenge-response. In this method, which can be more accurately termed “challenge-
verification-response,” the checklist is a backup procedure. First, the pilots configure the
plane according to memory. Only then, the pilots use the checklist to verify that all the
items listed on the checklist have been correctly accomplished. This is the most common
checklist method used today by commercial operators.
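
The challenge-verification-response idea carries over to systems work: configure first, then walk a written list and compare each item against the actual state. The sketch below is only an illustration of that pattern. The `Item` class, the `run_challenge_response` function, and the sample items and values are all hypothetical and are not taken from the paper or from any real cockpit procedure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Item:
    """One checklist line: a challenge plus the expected response."""
    challenge: str
    expected: str


def run_challenge_response(
    checklist: List[Item], read_state: Callable[[str], str]
) -> List[str]:
    """Verify an already-configured system; return the challenges that failed."""
    failures = []
    for item in checklist:
        actual = read_state(item.challenge)
        if actual != item.expected:
            failures.append(item.challenge)
    return failures


# Hypothetical state left behind after configuring "from memory":
# one item was forgotten.
state: Dict[str, str] = {"flaps": "up", "trim": "set", "brakes": "released"}

# Hypothetical checklist, made up for this example.
before_takeoff = [
    Item("flaps", "takeoff"),
    Item("trim", "set"),
    Item("brakes", "released"),
]

missed = run_challenge_response(
    before_takeoff, lambda name: state.get(name, "unknown")
)
print(missed)  # ['flaps']
```

Because the list only verifies and never configures, a skipped step shows up as a failed item instead of passing unnoticed, which is the set-up redundancy described above.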




been created by people and can be documented and understood. But when it comes
to people, we are faced with a system element that comes with no operating manual
and no performance specifications, and that occasionally performs in ways not
anticipated by the system designers. Some of these failures can be easily explained,
an arithmetic error for example, while others are harder to predict. Although
individuals differ, researchers have discovered general principles of human
performance that can help us to create safer and more efficient systems. The focus
of this paper is on the functioning of people as elements of maintenance systems in
aviation.


The cost of maintenance error
Since the end of World War II, human factors researchers have studied pilots and
the tasks they perform, as well as air traffic control and cabin safety issues. Yet
until recently, maintenance personnel were overlooked by the human factors
profession. Whatever the reason for this, it is not because maintenance is
insignificant. Maintenance is one of the largest costs facing airlines. It has been
estimated that for every hour of flight, 12 man-hours of maintenance occur. Most
significantly, maintenance errors can have grave implications for flight safety.
Accident statistics for the worldwide commercial jet transport industry show
maintenance as the ‘primary cause factor’ in a relatively low four per cent of hull
loss accidents, compared with flight crew actions that are implicated as a primary
cause factor in more than 60 per cent of accidents. Yet primary cause statistics
may tend to understate the significance of maintenance as a contributing factor in
accidents. In 2003, Flight International reported that ‘technical/maintenance
failure’ emerged as the leading cause of airline accidents and fatalities, surpassing
controlled flight into terrain, which had previously been the predominant cause of
SYSADMIN
What do we do now?
• Use procedures where it makes sense
 • at least, write stuff down
• Automation, verification, redundancy
 • scripting not only for doing stuff, also for
    making sure it worked

• Usability of our panels & dashboards
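
A minimal sketch of "scripting not only for doing stuff, also for making sure it worked": the script performs a change and then checks the result by re-reading it, rather than assuming success. The function names, the file name `example.conf`, and the setting `max_connections` are assumptions made up for this illustration.

```python
import os
import tempfile


def write_setting(path: str, key: str, value: str) -> None:
    """The 'doing' step: append a key=value line to a config file."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(f"{key}={value}\n")


def verify_setting(path: str, key: str, value: str) -> bool:
    """The 'making sure it worked' step: re-read the file from disk."""
    with open(path, encoding="utf-8") as fh:
        lines = [line.strip() for line in fh]
    return f"{key}={value}" in lines


def apply_and_verify(path: str, key: str, value: str) -> None:
    """Do the change, then fail loudly if the check does not confirm it."""
    write_setting(path, key, value)
    if not verify_setting(path, key, value):
        raise RuntimeError(f"{key} was not applied to {path}")


with tempfile.TemporaryDirectory() as tmp:
    conf = os.path.join(tmp, "example.conf")  # hypothetical file name
    apply_and_verify(conf, "max_connections", "100")
    ok = verify_setting(conf, "max_connections", "100")
    stale = verify_setting(conf, "max_connections", "200")

print(ok, stale)  # True False
```

Keeping the check in its own function means the same verification can be run later on its own, for example from a monitoring job.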

More Related Content

Similar to Why do we fail? (And how do we stop doing that?

Safety and Environmental Risk and Reliability Model for Inland Waterway Colli...
Safety and Environmental Risk and Reliability Model for Inland Waterway Colli...Safety and Environmental Risk and Reliability Model for Inland Waterway Colli...
Safety and Environmental Risk and Reliability Model for Inland Waterway Colli...Oladokun Sulaiman Olanrewaju
 
Risk cost benefit analysis of colision aversion model 74 f493
Risk cost benefit analysis of colision aversion model 74 f493Risk cost benefit analysis of colision aversion model 74 f493
Risk cost benefit analysis of colision aversion model 74 f493Oladokun Sulaiman Olanrewaju
 
Risk cost benefit analysis of colision aversion model 74 f493
Risk cost benefit analysis of colision aversion model 74 f493Risk cost benefit analysis of colision aversion model 74 f493
Risk cost benefit analysis of colision aversion model 74 f493Oladokun Sulaiman
 
Risk cost benefit analysis of colision aversion model 74 f493
Risk cost benefit analysis of colision aversion model 74 f493Risk cost benefit analysis of colision aversion model 74 f493
Risk cost benefit analysis of colision aversion model 74 f493Oladokun Sulaiman
 
Risk cost beefit of colision aversion 74 f493
Risk cost beefit  of  colision aversion 74 f493Risk cost beefit  of  colision aversion 74 f493
Risk cost beefit of colision aversion 74 f493Oladokun Sulaiman
 
Risk cost benefit of colision aversion 74 f493
Risk cost benefit  of  colision aversion 74 f493Risk cost benefit  of  colision aversion 74 f493
Risk cost benefit of colision aversion 74 f493Oladokun Sulaiman
 
research proposal-project
research proposal-projectresearch proposal-project
research proposal-projectMarios Gaitanos
 
Normalization of Deviance
Normalization of Deviance Normalization of Deviance
Normalization of Deviance mmcharter
 
145 173 pp jmee mb 37(kader)
145 173 pp jmee mb 37(kader)145 173 pp jmee mb 37(kader)
145 173 pp jmee mb 37(kader)Oladokun Sulaiman
 
Safety and environmental risk and reliability modeling
Safety and environmental risk and reliability modelingSafety and environmental risk and reliability modeling
Safety and environmental risk and reliability modelingOladokun Sulaiman Olanrewaju
 
Flood risk assessment: Introduction and examples.
Flood risk assessment: Introduction and examples.Flood risk assessment: Introduction and examples.
Flood risk assessment: Introduction and examples.Ahmed Saleh, Ph.D
 
Predictive and reliability o o_sulaiman _revision 2_
Predictive and reliability o o_sulaiman _revision 2_Predictive and reliability o o_sulaiman _revision 2_
Predictive and reliability o o_sulaiman _revision 2_Oladokun Sulaiman Olanrewaju
 

Similar to Why do we fail? (And how do we stop doing that? (16)

Safety and Environmental Risk and Reliability Model for Inland Waterway Colli...
Safety and Environmental Risk and Reliability Model for Inland Waterway Colli...Safety and Environmental Risk and Reliability Model for Inland Waterway Colli...
Safety and Environmental Risk and Reliability Model for Inland Waterway Colli...
 
Risk cost benefit analysis of colision aversion model 74 f493
Risk cost benefit analysis of colision aversion model 74 f493Risk cost benefit analysis of colision aversion model 74 f493
Risk cost benefit analysis of colision aversion model 74 f493
 
Risk cost benefit analysis of colision aversion model 74 f493
Risk cost benefit analysis of colision aversion model 74 f493Risk cost benefit analysis of colision aversion model 74 f493
Risk cost benefit analysis of colision aversion model 74 f493
 
Risk cost benefit analysis of colision aversion model 74 f493
Risk cost benefit analysis of colision aversion model 74 f493Risk cost benefit analysis of colision aversion model 74 f493
Risk cost benefit analysis of colision aversion model 74 f493
 
Risk cost beefit of colision aversion 74 f493
Risk cost beefit  of  colision aversion 74 f493Risk cost beefit  of  colision aversion 74 f493
Risk cost beefit of colision aversion 74 f493
 
Risk cost benefit of colision aversion 74 f493
Risk cost benefit  of  colision aversion 74 f493Risk cost benefit  of  colision aversion 74 f493
Risk cost benefit of colision aversion 74 f493
 
Procedure Based Maintenance
Procedure Based MaintenanceProcedure Based Maintenance
Procedure Based Maintenance
 
ADA502230 (1)
ADA502230 (1)ADA502230 (1)
ADA502230 (1)
 
research proposal-project
research proposal-projectresearch proposal-project
research proposal-project
 
Normalization of Deviance
Normalization of Deviance Normalization of Deviance
Normalization of Deviance
 
145 173 pp jmee mb 37(kader)
145 173 pp jmee mb 37(kader)145 173 pp jmee mb 37(kader)
145 173 pp jmee mb 37(kader)
 
145 173 pp jmee mb 37(kader)
145 173 pp jmee mb 37(kader)145 173 pp jmee mb 37(kader)
145 173 pp jmee mb 37(kader)
 
Safety and environmental risk and reliability modeling
Safety and environmental risk and reliability modelingSafety and environmental risk and reliability modeling
Safety and environmental risk and reliability modeling
 
Risk and Hazop Analysis
Risk and Hazop AnalysisRisk and Hazop Analysis
Risk and Hazop Analysis
 
Flood risk assessment: Introduction and examples.
Flood risk assessment: Introduction and examples.Flood risk assessment: Introduction and examples.
Flood risk assessment: Introduction and examples.
 
Predictive and reliability o o_sulaiman _revision 2_
Predictive and reliability o o_sulaiman _revision 2_Predictive and reliability o o_sulaiman _revision 2_
Predictive and reliability o o_sulaiman _revision 2_
 

More from Maciej Pasternacki

More from Maciej Pasternacki (6)

A Continuous Packaging Pipeline
A Continuous Packaging PipelineA Continuous Packaging Pipeline
A Continuous Packaging Pipeline
 
Odin Authenticator
Odin AuthenticatorOdin Authenticator
Odin Authenticator
 
Monitoringsucks
MonitoringsucksMonitoringsucks
Monitoringsucks
 
Test-driven development: a case study
Test-driven development: a case studyTest-driven development: a case study
Test-driven development: a case study
 
Amazon Web Services (cloud: is it good for anything?)
Amazon Web Services (cloud: is it good for anything?)Amazon Web Services (cloud: is it good for anything?)
Amazon Web Services (cloud: is it good for anything?)
 
Devops lightning talk
Devops lightning talkDevops lightning talk
Devops lightning talk
 

Recently uploaded

What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 

Recently uploaded (20)

What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx

Why do we fail? (And how do we stop doing that?)

  • 1. Why do we fail? And how do we stop doing that?
  • 2. Major cause of incidents is human factor -- folk knowledge
  • 3. Outages become less frequent and shorter in duration as data centers increase in size. The smaller the data center the longer and more common the outages.
    Root causes and responses by organizations
    Eighty percent of respondents know the root cause of the unplanned outage. The most frequently cited root causes of data center outages are: UPS battery failure (65 percent), UPS capacity exceeded (53 percent), accidental EPO/human error (51 percent) and UPS equipment failure (49 percent). Most common responses to unplanned outages are to repair, replace or purchase additional IT or infrastructure equipment (60 and 56 percent, respectively) followed by contacting the equipment vendor for support (51 percent).
    Fifty-seven percent believe all or most of the unplanned outages could have been prevented. The most common prevention tactics to avoid downtime are investing in improved equipment (50 percent), increasing the budget and staff of the data center (34 and 20 percent, respectively), improving infrastructure design/incorporating redundant components (19 and 18 percent, respectively) as well as performing preventative maintenance of critical infrastructure (16 percent).
    National Survey on Data Center Outages. Sponsored by Emerson Network Power. Independently conducted by Ponemon Institute LLC. Publication Date: 30 September 2010. Ponemon Institute© Research Report
  • 4. ABS TECHNICAL PAPERS 2006. TRENDING THE CAUSES OF MARINE INCIDENTS. D.B. McCafferty, American Bureau of Shipping, USA; C.C. Baker, American Bureau of Shipping, USA. Presented at the Learning From Marine Incidents 3 Conference held in London, January 25-26, 2006, and reprinted with the kind permission of the Royal Institution of Naval Architects.
    Common patterns that were found included:
      • Human error continues to be the dominant factor in maritime accidents, contributing to 80 to 85 percent of accidents.
      • Based on USCG data, for all accidents over the reporting period of 1999 to 2001, approximately 80 to 85% of the accidents analyzed involved human error. Of these, about 50% of maritime accidents were initiated by human error, and another 30% were associated with human error.
      • In MARS reports (voluntary mariner self reporting of accident and near misses), mariners note human error in the majority of reports, and chiefly attribute accidents and near misses to: lack of competence, knowledge and ability; human fatigue; workload; manning; complacency, and; risk tolerance.
      • USCG data on offshore pollution events in California suggests that 46% are caused or associated with human error.
      • For accidents associated with pollution events in the State of California, accident causes are chiefly attributed to failures of situation awareness (94%).
      • Among all human error types classified in numerous databases and libraries of accident reports, failures of situation awareness and situation assessment overwhelmingly predominate, being a causal factor in about 45% (offshore) to about 70% (ships) of the recorded accidents associated with human error (see Figure 1, below).
    [The adjacent columns of the page are cropped on the slide; only fragments describing the ABS MaRCAT approach to investigating and documenting marine incidents are legible.]
  • 6. Other industries have figured it out
      • Aviation
      • Medicine
      • Nuclear power plants
      • Military
      • …?
  • 7. Usability studies in IT focus on end users.
  • 9. How do other industries ensure reliability?
  • 10.
      • Procedures & checklists
      • Verification & redundancy
      • Eliminating human factor where it’s unnecessary
      • Focusing human effort where it’s actually needed
  • 11. Procedures & checklists
      • A lot of what we do is following known procedures
      • If we write it down & follow the instructions, we can focus on what is special, not on doing the usual steps
      • Also, we won’t forget any of the usual steps when the situation goes sideways
  • 12. ITIL?
  • 17. Human Factors 35(2), pp. 28-43. COCKPIT CHECKLISTS: CONCEPTS, DESIGN, AND USE. Asaf Degani, San Jose State University Foundation, San Jose, CA; Earl L. Wiener, University of Miami, Coral Gables, FL
  • 18. task-checklists for almost all segments of the flight, i.e., PREFLIGHT, TAXI, BEFORE LANDING, etc.; and in particular before the critical segments: TAKEOFF, APPROACH, and LANDING. Two other checklists are also used on the flight-deck: the abnormal and emergency checklist. This paper will address only the normal checklist. We believe that normal checklists are intended to achieve the following objectives:
      1. Provide a standard foundation for verifying aircraft configuration that will attempt to defeat any reduction in the flight crew's psychological and physical condition.
      2. Provide a sequential framework to meet internal and external cockpit operational requirements.
      3. Allow mutual supervision (cross checking) among crew members.
      4. Dictate the duties of each crew member in order to facilitate optimum crew coordination as well as logical distribution of cockpit workload.
      5. Enhance a team concept for configuring the plane by keeping all crew members “in the loop.”
      6. Serve as a quality control tool by flight management and government regulators over the flight crews.
    Another objective of an effective checklist, often overlooked, is the promotion of a positive “attitude” toward the use of this procedure. For this to occur, the checklist must be well grounded within the “present day” operational environment, so that the flight crews will have a sound realization of its importance, and not regard it as a nuisance task
  • 19. Various types of checklist devices have evolved over the years. Among them are the scroll, mechanical, and vocal checklist (Degani and Wiener, 1990; Turner and Huntley, 1991). More modern ones involve computer-based text displayed on a CRT and electronic checklist devices that sense sub-system’s state (Rouse, Rouse, and Hammer, 1982; Palmer and Degani, 1991). The paper checklist is the most common checklist device used today in commercial operation. It has a list of items written on a paper card (see Figure 1). Usually, the card is held in the pilot’s hand. Because of the wide prevalence of this device, it will be the focus of this paper.
    Sanders and McCormick (1987) state that “because humans are often the weak link in the system, it is common to see human-machine systems designed to provide parallel redundancy” (p. 18). A similar principle of backup and redundancy is applied in the checklist procedure. There are two types of redundancies embedded in this procedure.
  • 20. The first is the redundancy between configuring the aircraft from memory and only then using the checklist procedure to verify that all items have been accomplished properly (set-up redundancy). The second is the redundancy between the two or three pilots monitoring each other while conducting the checklist procedure (mutual redundancy).
    The Method
    There are two dominant methods of conducting (“running”) a checklist: the do-list and the challenge-response. Each is the product of a different operational philosophy.
    Do-list. This method can be better termed “call-do-response.” The checklist itself is used to lead and direct the pilot in configuring the aircraft, using a step-by-step “cookbook” approach. The setup redundancy is eliminated here, and therefore, a skipped item can easily pass unnoticed once the sequence is interrupted.
    Challenge-response. In this method, which can be more accurately termed “challenge-verification-response,” the checklist is a backup procedure. First, the pilots configure the plane according to memory. Only then, the pilots use the checklist to verify that all the items listed on the checklist have been correctly accomplished. This is the most common checklist method used today by commercial operators.
  • 23. been created by people and can be documented and understood. But when it comes to people, we are faced with a system element that comes with no operating manual and no performance specifications, and that occasionally performs in ways not anticipated by the system designers. Some of these failures can be easily explained, an arithmetic error for example, while others are harder to predict. Although individuals differ, researchers have discovered general principles of human performance that can help us to create safer and more efficient systems. The focus of this paper is on the functioning of people as elements of maintenance systems in aviation.
    The cost of maintenance error
    Since the end of World War II, human factors researchers have studied pilots and the tasks they perform, as well as air traffic control and cabin safety issues. Yet until recently, maintenance personnel were overlooked by the human factors profession. Whatever the reason for this, it is not because maintenance is insignificant. Maintenance is one of the largest costs facing airlines. It has been estimated that for every hour of flight, 12 man-hours of maintenance occur. Most significantly, maintenance errors can have grave implications for flight safety.
    Accident statistics for the worldwide commercial jet transport industry show maintenance as the ‘primary cause factor’ in a relatively low four per cent of hull loss accidents, compared with flight crew actions that are implicated as a primary cause factor in more than 60 per cent of accidents. Yet primary cause statistics may tend to understate the significance of maintenance as a contributing factor in accidents. In 2003, Flight International reported that ‘technical/maintenance failure’ emerged as the leading cause of airline accidents and fatalities, surpassing controlled flight into terrain, which had previously been the predominant cause of
  • 25. What do we do now?
  • 26.
      • Use procedures where it makes sense
          • at least, write stuff down
      • Automation, verification, redundancy
          • scripting not only for doing stuff, also for making sure it worked
      • Usability of our panels & dashboards
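
The last slide's point about scripting "not only for doing stuff, also for making sure it worked" can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not something from the talk: the `Step` class, the `run_checklist` function, and the in-memory `config` dictionary are all hypothetical names invented here. The shape loosely follows the challenge-response method quoted on slide 20: first carry out every action, then go through the list again and verify each item independently.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One checklist item: an action plus an independent check of its result."""
    name: str
    do: Callable[[], None]       # performs the change
    verify: Callable[[], bool]   # confirms the change actually took effect


def run_checklist(steps: List[Step]) -> List[str]:
    """Run every step, then verify all of them; return names of failed checks."""
    for step in steps:
        step.do()
    # Verification is a separate pass, so a step that silently did nothing
    # is still caught instead of being assumed done.
    return [step.name for step in steps if not step.verify()]


# Hypothetical example: an in-memory "config" standing in for a real system.
config = {}

steps = [
    Step("enable maintenance mode",
         do=lambda: config.update(maintenance=True),
         verify=lambda: config.get("maintenance") is True),
    Step("raise connection limit",
         do=lambda: None,  # deliberately broken: the action does nothing
         verify=lambda: config.get("max_connections") == 500),
]

failed = run_checklist(steps)
print(failed)  # ['raise connection limit']
```

The second step is broken on purpose: its action reports no error, yet the verification pass still flags it. In a real runbook the `verify` callables would query the live system (a health endpoint, a config read-back) rather than trust the return value of the action.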
