1. MANAGING INFO IN THE INFORMATION AGE – A CLIENT CASE. MATT FOURIE, THINKING DIMENSIONS
2. Some of our recent clients: Barclays IT, Macquarie ITG, Unisys, Woolworths IT, Capita UK, SITA Global, BT Financial, McDonalds IT. Thinking Dimensions International has operated KEPNERandFOURIE RCA company initiatives for the last 23 years and specialises in RCA for IT, Telecoms & Manufacturing.
3. AGENDA: Introduction; Intro Client Case; Stakeholder commitment; Managing Information; Quality of Information; Investigation support; Process demonstration; Client outcomes; Questions & answers. “Most incident investigators ask the wrong questions, so don’t change your people, change the questions they are asking.”
4. Investigation Info: “A company without a formal and effective Root Cause Analysis culture takes up to 3 days to restore service after an incident, but up to 25 days to find the root cause.” KEPNERandFOURIE, 2010
5. Client Case situation (international Australian investment bank’s IT division, 2007-2010): Lack of stakeholder commitment. Poor management of information. Working with poor-quality information. Poor incident investigation support.
6. Client situation, results: Reduced downtime of critical systems by at least 60%. Virtually eliminated recurring incidents. Level of escalations dropped by more than 50%. Visible improvement in productivity. “The key to success is to be insistent about specificity – the more specific you are the better your chances to solve the incident.” KEPNERandFOURIE
7. How did they do it? They decided to follow four strategies to improve the management and quality of incident investigation information: (1) improve stakeholder involvement and commitment; (2) improve management of information; (3) improve quality of information, thus shortening incident investigation cycles; (4) improve support for incident investigations.
8. Strategy 1: Improve stakeholder commitment. Specific challenges: lack of cross-silo collaboration; poor stakeholder buy-in; reluctant contributions from subject matter experts (SMEs). Client actions: introduced a formal division-wide Root Cause Analysis (RCA) system; provided common processes for troubleshooting and solution finding; introduced stakeholder/info source analysis; provided an easy way for SMEs to contribute meaningfully.
9. Stakeholder Commitment, best in class: Time to repair a critical outage of 3 hours vs 45 hours. A 71% improvement in mean time to repair of critical business applications vs an 11% decline. 98% availability of critical business applications vs 82% availability. Source: Aberdeen Group, Boston, Feb 2010, J. DeBarros & G. Patil
10. Stakeholder Commitment, best in class with RCA: 69% of Best in Class companies implemented RCA over the last 2 years, with a 50% improvement in productivity and a 19% improvement in profitability; 28% indicated they will do RCA in the next year. 19% of Average-rated companies implemented RCA, with a 12% improvement in productivity; only 19% are planning to do RCA in the next 12 months. The Laggards did not do any RCA and saw a 9% drop in productivity; nearly 30% plan to implement RCA.
12. Common process: Everybody uses the same process for finding causes and solutions. The process determines which questions to ask at each step for each type of incident investigation approach. It is designed for minimalistic information combined with a good focus to provide quick answers. Step 1: Identify Problem Situation. Step 2: Gather Incident Information. Step 3: Analyse Incident Information. Step 4: Determine Conclusion.
13. Stakeholder analysis: What do you know? What don’t you know? Who has the information? How will you obtain the missing information? Stakeholders: decision makers, implementers, influencers.
16. Achieving more in less time and not averse to attending Incident Investigation meetings
17. Management promoting the use of the formal RCA processes. “If a team could not solve a problem, the person with the information was not invited!” Chuck Kepner
18. Strategy 2: Improve management of information. Specific challenges: inappropriate use of information sources; either too much or too little information; high level of escalations; duplication of efforts. Client actions: introduced “rules of engagement”; introduced a framework of “levels of troubleshooting” to align with PM’s severity levels; taught staff to trust the processes (templates with questions) to deliver the correct answers; introduced the “minimalistic” principle.
19. Rules of engagement. TOP: Commitment to training of key staff and facilitators; publicise the rules of engagement. MIDDLE: Commitment to declare a situation an unresolved incident; instructs direct reports to do an RCA exercise to resolve the incident. WORKFORCE: IT professionals are allowed 2-8 hours to resolve a problem; if they cannot, they are allowed to escalate the incident and apply the RCA process.
20. Levels of troubleshooting. SEV 3, Thinking on Your Feet: “checklist” problem solving using appropriate checklists. Leadership allows the IT professional to resolve an incident within 8 hours; if this does not happen, the incident is escalated. SEV 2, Intuitive Analysis: Leadership instructs and allows the natural team to perform an intuitive RCA on the incident; if it is not resolved, the team escalates the incident. SEV 1, Investigative Analysis: In-house trained RCA facilitators have the permission of Leadership to assemble a cross-silo team to formally investigate the incident with the appropriate RCA tools and systematically arrive at the TRUE & ROOT causes of a problem situation.
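The escalation path on this slide can be read as a small decision rule. The following is only a sketch in Python, assuming a hypothetical `Incident` record and a hypothetical `team_cannot_resolve` flag (neither appears in the deck). The 8-hour limit for SEV 3 is the only time limit the slide states, so SEV 2 escalation is modelled here as the team's own call.

```python
from dataclasses import dataclass

# Approach names as given on the slide, keyed by severity level.
APPROACH = {
    3: "Thinking on Your Feet (checklist problem solving)",
    2: "Intuitive Analysis (natural team)",
    1: "Investigative Analysis (facilitated cross-silo RCA)",
}

SEV3_TIME_LIMIT_HOURS = 8  # stated on the slide for SEV 3 only


@dataclass
class Incident:  # hypothetical record, for illustration only
    severity: int
    hours_open: float = 0.0
    resolved: bool = False


def next_severity(incident, team_cannot_resolve=False):
    """Return the severity level the incident should be handled at next."""
    if incident.resolved or incident.severity == 1:
        return incident.severity  # nothing to escalate, or already at the top
    if incident.severity == 3 and incident.hours_open >= SEV3_TIME_LIMIT_HOURS:
        return 2  # time limit passed without a fix
    if incident.severity == 2 and team_cannot_resolve:
        return 1  # assumption: the natural team decides when to hand over
    return incident.severity


incident = Incident(severity=3, hours_open=9)
level = next_severity(incident)
print(f"Handle at SEV {level}: {APPROACH[level]}")
```

Keeping the rule explicit like this is one way to make the "permission to escalate" visible to everyone, which is the point of publishing the rules of engagement.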
21. “Minimalistic principle”: “Too much information can cause confusion. The key is to get all the relevant information onto one page and that is normally substantially less than gathering ‘all’ the information.” (Innovation – the FreeZone thinking experience, by Kepner & Fourie). Analyse only the information that is relevant to the incident. Work through questions within a customised “factor analysis” framework. Get a quick factual “snapshot” of the characteristics of the incident, then use SME experience and gut feel to explain the snapshot. Test SME inputs against the logic of the snapshot.
24. Gave IT professionals the confidence that they were working through a problem situation systematically and comprehensively
25. Developed a “no-nonsense” incident investigation culture: you ask a question; you either have the answer or you need to go and get it. “Every incident has multiple entry points. To be successful in solving the incident you need to find the correct entry point.” Matt Fourie
26. Strategy 3: Improve quality of information. Specific challenges: wasted time and effort from having to do too many replications; mostly dealing with raw data instead of information; long investigation cycle times; high levels of recurring incidents. Client actions: introduced a set of interrogative questions to convert raw data into meaningful information; created a “deductive” reasoning culture to arrive at answers quickly and effectively; tested possible causes on paper to eliminate 90% of replication time, effort and money.
28. Snapshot info for causes. OBJECT: What object and which other object(s) not? FAULT: What fault and which other typical faults not? USERS: Who has the problem and who does not? WHERE: Where are these users and where could they have been but are not? TIMING: When did it happen the first time and when not? PATTERN: What is the pattern of faults and what could it have been but is not? CYCLE: In which cycle does the problem occur and in which cycle does it not occur?
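The snapshot questions pair each fact about where the problem is with a contrasting fact about where it is not, which is what makes testing possible causes on paper (slide 26) workable. The deck does not show the exact test, so the sketch below rests on an assumption: a possible cause survives only if the SMEs judge that it explains the IS / IS NOT contrast for every factor recorded in the snapshot. All incident data and cause names are hypothetical.

```python
FACTORS = ["OBJECT", "FAULT", "USERS", "WHERE", "TIMING", "PATTERN", "CYCLE"]

# Hypothetical snapshot: factor -> (what the problem IS, what it IS NOT).
snapshot = {
    "OBJECT": ("payments application", "reporting application"),
    "USERS": ("branch staff", "head-office staff"),
    "TIMING": ("first seen after Monday's release", "never seen before it"),
}

# Hypothetical SME judgements: for each possible cause, the factors whose
# IS / IS NOT contrast that cause can account for.
possible_causes = {
    "branch login change in Monday's release": {"OBJECT", "USERS", "TIMING"},
    "general network congestion": {"USERS"},
}


def unexplained_factors(snapshot, explained):
    """List the recorded factors that a possible cause fails to explain."""
    return [f for f in FACTORS if f in snapshot and f not in explained]


def surviving_causes(snapshot, possible_causes):
    """Keep only the causes that explain every recorded factor."""
    return [
        cause
        for cause, explained in possible_causes.items()
        if not unexplained_factors(snapshot, explained)
    ]


for cause, explained in possible_causes.items():
    gaps = unexplained_factors(snapshot, explained)
    print(cause, "->", gaps if gaps else "fits the snapshot")
```

Only the causes that fit the whole snapshot would then be worth replicating, which is where the saving in replication effort would come from.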
30. Snapshot info for Solutions, Four Question Drill: What are the results you want to achieve with this solution? What are the existing problems you would like to remove with this solution? What are the potential risks you would like to avoid with this solution? What money and time do you have or do you need to preserve? What are the restrictions out of your control?
41. Cycle times for incident investigations reduced drastically. “I keep six honest serving-men / (They taught me all I knew); / Their names are What and Why and When / And How and Where and Who. / I send them over land and sea, / I send them east and west; / But after they have worked for me, / I give them all a rest.” Rudyard Kipling
42. Strategy 4: Improve support for incident investigations. Specific challenges: did not know “Who, What, How and When”; no “Go To” person to help with effective investigations. Client actions: trained in-house professional RCA investigators; established “rules of engagement” for facilitators; publicised successes; recognition by Management.
43. Training in-house facilitators: Advice to the Incident Owner on who to invite to the RCA meeting to improve the chances of a quick success (Stakeholders & Info Sources). How to prepare a team for an effective RCA meeting. Exceptional investigation facilitation skills (the art of asking the right questions and verifying the answers for authenticity). RCA process skills to enable the facilitator to lead any team at any level in investigations. “One of the main reasons for incident investigation failure is ‘analysis paralysis’ – having to work with too much information.” Infrastructure Manager, Airline Software Platforms
45. Facilitators are now also used to help solve vendor issues affecting application performance
46. Facilitators started to feed results into an agreed knowledge database, and staff were encouraged to record informal uses of RCA on incidents as well
47. Increased division awareness of how well they are doing with application performance issues. “It is always a good strategy to stand a few steps back and look at the incident from a different angle.” Unknown
48. Application Performance results: M-T-T-R went from weeks to a couple of hours. M-T-T-R practices improved by nearly 50%. Availability of critical systems went from 77% to 94%.
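To put the availability figures in terms of downtime, the arithmetic can be sketched as follows. The deck does not state the measurement period, so the 30-day month used here is an assumption, as are the function and variable names.

```python
HOURS_IN_PERIOD = 30 * 24  # assumption: availability measured over a 30-day month


def downtime_hours(availability_pct, period_hours=HOURS_IN_PERIOD):
    """Convert an availability percentage into hours of downtime."""
    return (100 - availability_pct) * period_hours / 100


before = downtime_hours(77)  # availability before the programme
after = downtime_hours(94)   # availability after the programme
reduction_pct = (before - after) / before * 100

print(f"{before:.1f} h -> {after:.1f} h, a {reduction_pct:.0f}% reduction in downtime")
```

Under that assumption, downtime falls from about 165.6 to 43.2 hours a month, a reduction of roughly 74%. The relative reduction does not depend on the period chosen, and it is in line with the "at least 60%" downtime reduction quoted in the client results.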
49. Improvement in escalations: Escalation of severity 3 to severity 2 reduced by nearly 24%. Escalation of severity 2 to severity 1 reduced by 76%. Recurring incidents reduced by 35%.
50. Lessons learned: Most recurring incidents and problems are caused by “out of date procedures” and a lack of proper documentation. RCA is a “mental orientation” in which people have to be trained; it “does not come with experience”. IT professionals need a “thinking approach” that can be applied in most situations. Rules of Engagement should become a standing order. Encourage use in all incident investigation meetings: ask for the paperwork/evidence. Sponsor continuous RCA training. Send regular email communications to publish successes.
51. Thank you for your time! If you have any further questions about Minor or Major Investigations, or about how to acquire the in-house skills to drastically improve your metrics, please do not hesitate to speak to us after this session or to contact Andrew at andrew@thinkingdimensions.com.au