11. Business Alignment
Corporate: Vision, Mission, SWOT, Strategies And Goals
IT: Vision, Mission, SWOT, Strategic Goals, Tactical Goals, Operational Goals
Questions: Are we working towards achieving our vision? Are we operating within the purpose and scope? What needs to be measured?
What needs to be measured: Business Objectives, IT Objectives, Performance Indicators
14. Service Portfolio Management
DEFINE: Collect information and inventories of existing services. Establish the requirements for the requested service, and establish the business case for implementing the service.
ANALYZE: Review the long-term business goals, and determine what services are required to meet those goals. Then analyze the requested service for financial viability, operational capability, and technical feasibility to determine how the organization is going to get there. (You may decide to obtain the service from an outsourcer rather than develop it internally.)
15. Service Portfolio Management
APPROVE: Make a decision to retain, replace, renew, or retire the services.
CHARTER: Communicate action items to the organization to implement the approved service, and allocate budget and resources.
The Define, Analyze, and Approve steps are described in the ITIL V3 Service Strategy book. The Charter step is discussed in the ITIL V3 Service Design book.
16. Configuration Management System (CMS)
CMS was introduced in ITIL V3. It provides a strong foundation for the service catalog. CMS is an ecosystem that feeds, manages, analyzes, and presents the information contained in the Configuration Management Database (CMDB), which is another fundamental component of ITIL. Although the CMDB is depicted in the ITIL books as merely a core component of CMS, a well-architected, federated CMDB implements much of the functionality of the CMS.
18. Processes vs. Departments
Department View: IT Service Desk, 2nd Dept., 3rd Dept.
Process View: Step 1, Step 2, Step 3, Step 4, Step 5
27. The Service Delivery Process Model
The Business, Customers & users: Queries, Enquiries; Communication, Updates, Reports
Service Level Management: SLAs, OLAs, SLRs, Service requests, Service catalogue, SIP, Exception reports, Audit reports; Requirements, Targets, Achievements
Capacity: Capacity Plan, CDB, Targets/Thresholds, Capacity Reports, Schedule, Audit Reports
IT Service Continuity: IT Continuity Plans, BIA & Risk Analysis, Define Requirements, Control Centers, DR Contacts, Reports, Audit Reports
Availability: Availability Plan, AMDB, Design Criteria, Targets/Thresholds, Reports, Audit Reports
IT Financial Management: Financial Plans, Types & Models, Costs & Charges, Reports, Budgets & Forecasts, Audit Reports
Management Tools: Alerts, Exceptions, Changes
30. Generic Process Model
Definition of a process: A connected series of actions, activities, changes etc. performed by agents with the intent of satisfying a purpose or achieving a goal.
Process Control: Process Owner, Process Goal, Quality Parameters & Key Performance Indicators
Process: Input & Input Specifications, Activities & Sub-Processes, Output & Output Specifications
Process Enablers: Resources, Roles
35. Service Support KPIs
Columns: Process | Core Objective | Example Core KPIs | Category

Process: Incident
  Core Objective: Restore service degradations to expected level ASAP
  Example Core KPIs: # of Incidents by category, priority and resolution type by LOB; # of Incidents restored within SLA Targets
  Category: Quality, Performance
Process: Problem
  Core Objective: Identify systemic Infrastructure Errors and eliminate them to minimize impact and improve availability
  Example Core KPIs: # of problems identified & root cause determined with solution or workaround; # of repeat incidents by category trending downwards
  Category: Quality, Value
Process: Change
  Core Objective: Handle changes efficiently while minimizing impact to service delivery
  Example Core KPIs: # of changes by type / category / group / customer (emergency changes trending down); # of changes that have resulting incidents, or fail and have to be backed out
  Category: Quality, Value
Process: Config.
  Core Objective: Identify / control / manage IT resources within a Configuration Management Database
  Example Core KPIs: % of CMDB data population and accuracy vs actual, according to scope; % growth or change by CI type over an elapsed time period
  Category: Quality, Value
Process: Release
  Core Objective: Ensure production readiness, quality and authorization of new or modified CIs and their planned deployment
  Example Core KPIs: # of releases by type that satisfy release management criteria when submitted to Change; # of releases that bypass the process
  Category: Quality, Compliance
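One of the Incident KPIs on this slide, the number of Incidents restored within SLA targets, could be computed along the following lines. This is only a minimal sketch: the `Incident` record, its field names, and the sample figures are assumptions made for the illustration and are not defined by ITIL.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    # Hypothetical record; the field names are assumptions for this sketch.
    category: str
    minutes_to_restore: int
    sla_target_minutes: int


def restored_within_sla(incidents):
    """Return (count, percentage) of incidents restored within their SLA target."""
    if not incidents:
        return 0, 0.0
    met = sum(1 for i in incidents if i.minutes_to_restore <= i.sla_target_minutes)
    return met, 100.0 * met / len(incidents)


# Made-up sample data for illustration only.
sample = [
    Incident("network", 45, 120),
    Incident("network", 200, 120),
    Incident("email", 30, 60),
    Incident("email", 60, 60),
]
count, pct = restored_within_sla(sample)
print(f"{count} of {len(sample)} incidents restored within SLA ({pct:.0f}%)")
```

Reporting both the count and the percentage keeps the figure comparable between periods with different Incident volumes.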
36. Service Delivery KPIs
Columns: Process | Core Objective | Example Core KPI | Category

Process: SLM
  Core Objective: Define services; agree on level, scope, quality, performance; monitor & manage
  Example Core KPI: % score of customer satisfaction survey trends up over time (i.e. Customer Satisfaction Survey)
  Category: Value
Process: Avail.
  Core Objective: Define and plan for service availability to meet or exceed stated business requirements through process, technology and people resource planning and implementation
  Example Core KPI: % of service availability within SLA negotiated requirements
  Category: Quality
Process: Capacity
  Core Objective: Current and future resources are greater than or equal to demand, but excess is planned
  Example Core KPI: % of components that breach tolerance thresholds in correspondence to planned capacity levels for components and complete IT systems
  Category: Quality
Process: ITSCM
  Core Objective: Recover IT systems to normal state in an alternate way after a disaster within an expected timeframe
  Example Core KPI: % of systems that fail recovery test; time to execute test of plan and recover IT services in a contingency state against expected targets
  Category: Quality, Performance
Process: Finance
  Core Objective: Plan for and deliver IT Services within a forecasted budget against actual cost
  Example Core KPI: % deviation of forecasted versus actual cost of IT services within defined tolerance limits (% of deviation, $ of deviation)
  Category: Compliance
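The Finance KPI on this slide (deviation of forecasted versus actual cost, expressed in both $ and %, checked against a tolerance limit) can be illustrated with a short calculation. The function names, the choice of the forecast as the base of the percentage, and the figures are assumptions for this sketch.

```python
def cost_deviation(forecast, actual):
    """Return (deviation in currency units, deviation as % of forecast)."""
    if forecast == 0:
        raise ValueError("forecast must be non-zero")
    amount = actual - forecast
    return amount, 100.0 * amount / forecast


def within_tolerance(forecast, actual, tolerance_pct):
    """True if the absolute % deviation does not exceed the tolerance limit."""
    _, pct = cost_deviation(forecast, actual)
    return abs(pct) <= tolerance_pct


# Made-up figures: 200,000 forecast, 210,000 actual, 5% tolerance.
amount, pct = cost_deviation(200_000, 210_000)
print(f"deviation: {amount} ({pct:.1f}%)")
print("within tolerance:", within_tolerance(200_000, 210_000, 5.0))
```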
38. Incident Management Activities
Ownership, Monitoring, Tracking & Communication
- Incident Detection & Recording
- Classification & Initial Support
- Service Request? Yes: Service Request Procedure. No: continue.
- Investigation & Diagnosis
- Resolution & Recovery
- Incident Closure
39. Incident Management Process
Incidents enter process from: Service Desk, Computer Operations, Networking, Procedures, Other Sources Of Incidents
Within the process: Routing, Monitoring, Service Request Procedures
CMDB: Configuration Details
Problem & Error DB: Resolution / Work-around
Change Management Process: RFC, Resolution
Resolutions / Work-arounds leave process
40. First, Second & Third Line Support
1st level: Incident Detection & Recording; Service Request? (Yes: Service Request Procedure); Classification & Initial Support; Investigation & Diagnosis; Resolution & Recovery; Resolved? Yes: Incident Closure. No: 2nd level.
2nd level: Investigation & Diagnosis; Resolution & Recovery; Resolved? Yes: Incident Closure. No: 3rd level.
3rd level: Investigation & Diagnosis; Resolution & Recovery; Resolved? Yes: Incident Closure. No: Nth level.
Nth level: Etc.
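The functional escalation shown on this slide (each level tries to resolve the Incident and passes it on if it cannot) can be sketched as a simple loop. The resolver functions and sample Incidents below are hypothetical and stand in for whatever each support level actually does.

```python
def escalate(incident, support_levels):
    """Pass the incident to each level in turn until one resolves it.

    support_levels is an ordered list of (name, resolver) pairs; each
    resolver returns a resolution string, or None if it cannot resolve.
    Returns (level name, resolution), or (None, None) if nobody resolved it.
    """
    for name, resolver in support_levels:
        resolution = resolver(incident)
        if resolution is not None:
            return name, resolution
    return None, None


# Hypothetical resolvers, for illustration only.
levels = [
    ("1st level", lambda inc: "password reset" if inc == "locked account" else None),
    ("2nd level", lambda inc: "restart print spooler" if inc == "printer queue stuck" else None),
    ("3rd level", lambda inc: "vendor patch applied"),
]

print(escalate("locked account", levels))
print(escalate("printer queue stuck", levels))
```

The sketch only models who resolves the Incident; closure would still be handled by whoever the organization makes responsible for it.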
41. Prioritization
                      IMPACT
URGENCY      Low    Medium    High
High          3       2        1    (Highest Priority)
Medium        4       3        2
Low           5       4        3
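The prioritization grid on this slide can be expressed as a small lookup. This is a sketch only: the row and column layout is an assumed reading of the slide (the grid is symmetric, so swapping the axes gives the same priorities), and the function and constant names are made up.

```python
LEVELS = ("low", "medium", "high")

# Rows: urgency from high to low; columns: impact from low to high.
# The axis layout is an assumption; the values are those on the slide.
MATRIX = [
    [3, 2, 1],  # urgency high
    [4, 3, 2],  # urgency medium
    [5, 4, 3],  # urgency low
]


def priority(impact, urgency):
    """Return the priority (1 = highest) for an impact/urgency pair."""
    row = 2 - LEVELS.index(urgency)
    col = LEVELS.index(impact)
    return MATRIX[row][col]


print(priority("high", "high"))
print(priority("low", "high"))
print(priority("low", "low"))
```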
45. Incident Matching Procedure Incident alert Y Y N N Update Incident record with ID of Known Error Update incident record with category data Extract resolution or circumvention action from Known Errors Db Support required? Execute resolution action Assess Incident and follow routine procedure Update Incident count on Problem Record Update Incident record with ID of Problem Update Incident record with category data Extract resolution or circumvention action from Problem Db Support required? Execute resolution action Routine Incident? Raise new record on Problem Db Allocate to PM Team N Inform Customer of Work-around Match on Problem Db? Match on Known Errors Db New Problem Alert Process Incident/ Problem Update Incident count on Known Error record Y Y Y N N
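The matching procedure on this slide can be sketched roughly as follows. The order of the checks (Known Errors Db first, then Problem Db, then the routine-Incident test) is one reading of the diagram and should be treated as an assumption, as should the record layout, the `PRB-` identifier scheme, and the function name.

```python
def match_incident(incident, known_errors, problems):
    """Match an incident against Known Errors, then Problems.

    known_errors and problems map a category string to a record dict with
    'id', 'workaround' and 'incident_count' keys (hypothetical schema).
    Returns a short description of the action taken.
    """
    category = incident["category"]
    for db_name, db in (("known_error", known_errors), ("problem", problems)):
        record = db.get(category)
        if record is not None:
            record["incident_count"] += 1             # update incident count on the record
            incident[f"{db_name}_id"] = record["id"]  # update incident with the record ID
            return f"apply workaround: {record['workaround']}"
    if incident.get("routine"):
        return "follow routine procedure"
    # No match and not routine: raise a new record on the Problem Db.
    new_id = f"PRB-{len(problems) + 1}"
    problems[category] = {"id": new_id, "workaround": None, "incident_count": 1}
    incident["problem_id"] = new_id
    return "new problem raised and allocated to PM team"


known_errors = {"vpn": {"id": "KE-1", "workaround": "use backup gateway", "incident_count": 2}}
problems = {"email": {"id": "PRB-1", "workaround": "restart mail client", "incident_count": 1}}

vpn_incident = {"category": "vpn"}
printer_incident = {"category": "printer"}
vpn_action = match_incident(vpn_incident, known_errors, problems)
printer_action = match_incident(printer_incident, known_errors, problems)
print(vpn_action)
print(printer_action)
```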
49. Reactive Problem Management
Problem Control: Problem Identification & Recording; Problem Classification; Problem Investigation & Diagnosis; Problem Resolution & Evolution To Known Error; Tracking & Monitoring Of Problems
Error Control: Error Identification & Recording; Error Assessment; Record Error Resolution (RFC); Change Successfully Implemented; Close Error & Associated Problems; Tracking & Monitoring Of Errors
55. The Error Cycle In The Live & Development Environments Live Known Errors Db Development Known Errors Db Investigation & Diagnosis Applications Development & Maintenance Live Operations Investigation & Diagnosis Change Management Requests For Change Problems Problems Release
61. From Incident(s) To A Problem To A Known Error To A Change
Stages: Incident(s), Problem, Known Error, Change
Processes: Incident Management, Problem Management, Change Management
Diagram labels: CI at Fault; Incident Matching; Workaround; Root cause determined; Problem evolves into error record; Temporary solution; RFC
ITIL stands for the “Information Technology Infrastructure Library”. It was developed by the CCTA (Central Computer and Telecommunications Agency) in the late 1980s and has become the de facto standard in Service Management. The CCTA is now the OGC (Office of Government Commerce), part of HM Treasury and the authority for best practices within Government. ITIL underpins ISO 9000 quality standards and is considered a fast path toward quality certification. It is referred to as a “library” because the information and guidance surrounding the implementation of each process in an organization can be found in books.
Business goals are the referee between innovation and formalization. This slide illustrates the dynamic action between innovation and formalization. Formalization is the red tape, procedures, etc. Innovation is the business’s adaptability to the global economy. The red arrow is the dynamic tension. The referee in this game is the Business Goals. In IT we often speak of the PPT triangle: people, processes and technology are key ingredients in IT service delivery. Achieving a balance between these is often challenging. It’s more than just people, processes and technology; these are tied to IT strategy, which is tied to Corporate strategy. Draw the triangle on a flip chart: P P T
This is a modified slide incorporating additional arrows to illustrate the more normal flow of operations. The process view has been added to show the similarities and differences from the department view of things. Animation was added to this slide: each item comes up when <enter> is pressed, and when the process view appears the department view is dimmed. Instead of having to go through management to get things done, people are empowered to perform the required activities, and boundaries (silos) between functional groups are removed.
This is the end in mind. After covering the service support processes, this slide will be revisited and the delegates will have a much better understanding of the whole picture. This slide illustrates the relationships between the support processes. There is a similar one for the delivery processes. However, one must not conclude that delivery and support processes do not share information; on the contrary, they do. It is simply that a diagram attempting to illustrate the relationships between all processes would be too busy.
Ownership, Monitoring, Tracking & Communication: The Service Desk is responsible for owning and overseeing the resolution of all outstanding Incidents. The Service Desk is 1st line Incident Management.
Incident Detection & Recording: Symptoms, basic diagnostic data, and information about the related Configuration Item should be included in Incident records during detection and recording.
Classification & Initial Support: Classification is the process of identifying the reason for the Incident. Many Incidents are regularly experienced and the appropriate resolution actions are well known; in that case the Service Desk should apply the known fix ASAP. This is not always the case, however, and a procedure (Investigation & Diagnosis) for matching Incident classification data against that for Problems and Known Errors is necessary (Incident Matching). Successful matching gives access to proven resolution actions, which should require no further investigation effort.
Service Request: Specific to the Service Desk, non-failure related, and follows an established path. It does not leave first line.
Investigation & Diagnosis: Investigation and diagnosis may become an iterative process, starting with a different specialist support group and following elimination of a previous possible cause. It may involve multi-site support groups and support staff from different vendors. It may continue overnight with a new shift of support staff taking over the next day. All this demands a rigorous, disciplined approach and a comprehensive record of actions taken with corresponding results.
Resolution & Recovery: After successful execution of the resolution or some circumvention activity, service recovery can be effected and recovery actions carried out, often by specialist staff (second- or third-level support). The Incident Management system should allow for the recording of events and actions during the resolution and recovery activity.
Incident Closure: The Service Desk must ensure the following before closure can occur:
- details of the action taken to resolve the Incident are concise and readable
- classification is complete and accurate according to root cause
- resolution/action is agreed with the Customer, verbally or, preferably, by email or in writing
- all details applicable to this phase of the Incident control are recorded, such that: the Customer is satisfied; cost-centre project codes are allocated; the time spent on the Incident is recorded; the person, date and time of closure are recorded.
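The closure conditions listed under Incident Closure lend themselves to a simple pre-closure check. The sketch below assumes a ticket is a plain dictionary; the field names are hypothetical and would map to whatever the Incident Management tool actually records.

```python
# Hypothetical field names, one per closure condition in the notes.
REQUIRED_BEFORE_CLOSURE = (
    "resolution_details",         # action taken, concise and readable
    "root_cause_classification",  # classification according to root cause
    "customer_agreement",         # resolution agreed with the Customer
    "cost_centre_code",           # cost-centre project code allocated
    "time_spent_minutes",         # time spent on the Incident
    "closed_by",                  # person closing the Incident
    "closed_at",                  # date and time of closure
)


def missing_closure_fields(incident):
    """Return the required fields that are absent or empty on the ticket."""
    return [f for f in REQUIRED_BEFORE_CLOSURE if incident.get(f) in (None, "")]


def can_close(incident):
    """True only when every closure condition has been recorded."""
    return not missing_closure_fields(incident)


# Made-up ticket for illustration.
ticket = {
    "resolution_details": "Replaced faulty network cable",
    "root_cause_classification": "hardware/cabling",
    "customer_agreement": "confirmed by email",
    "cost_centre_code": "CC-1001",
    "time_spent_minutes": 35,
    "closed_by": "service desk agent",
    "closed_at": "2024-01-15 10:30",
}
print(can_close(ticket))
```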
Here is an example of a discussion you can have with the delegates. Although the Service Desk is not a process as per ITIL, let’s use it as an example:
- What is the goal of the Service Desk?
- Assume you have more than one Service Desk in your organization. Who ensures that all help desks do things the same way? A: The process owner.
- How can you say you have a quality Service Desk? What is quality? A: Fit for purpose. Timex and Rolex, which is a better product?
- What are some KPIs for a Service Desk?
- What are a few activities of a Service Desk?
- Inputs: What does a Service Desk need to know to do a good job?
- Outputs: What does a Service Desk produce?
- What resources does a Service Desk need to have at its disposal? A: Phone system, escalation, call display, call tracking, knowledge base, equipment used at customer sites.
- What roles will you find in a Service Desk? A: Manager, team leader, supervisor, customer liaison, escalation management, agent, etc.
Give examples where they can be shared. For example, Priority and Categorization models should be shared by IM, PM, and ChM. A difference between the models would be the response times: IM may be 2 hrs, PM may be 2 days, etc.
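The idea of one shared priority model with process-specific response times can be sketched as a single priority scale plus a per-process table of targets. Only the "2 hrs" and "2 days" figures come from the note above; every other value, and all of the names, are invented for illustration.

```python
from datetime import timedelta

# One priority scale shared by all three processes (1 = highest).
PRIORITIES = (1, 2, 3)

# Per-process response targets. Only the priority-1 figures for incident
# (2 hours) and problem (2 days) echo the note; the rest are made up.
RESPONSE_TARGETS = {
    "incident": {1: timedelta(hours=2), 2: timedelta(hours=8), 3: timedelta(days=1)},
    "problem": {1: timedelta(days=2), 2: timedelta(days=5), 3: timedelta(days=10)},
    "change": {1: timedelta(days=1), 2: timedelta(days=7), 3: timedelta(days=30)},
}


def response_target(process, priority):
    """Look up the response target for a process at a shared priority level."""
    return RESPONSE_TARGETS[process][priority]


print(response_target("incident", 1))
print(response_target("problem", 1))
```

Keeping the priority scale in one place means a priority 1 Incident and the Problem raised from it are ranked the same way, even though their clocks differ.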
Talk through slide points. Actions have to come from goals and objectives.
Animation on this slide helps the trainer show the delegates the flow of incident management and its relationship to other processes. The animation sequence is:
- Sources of incidents: incidents can be detected by many sources, not just the service desk.
- Service request procedure: these are often FAQs, IMACs, and “how-to” questions. They still have to be recorded but have a separate procedure.
- Consulting the CMDB: ideally the service desk would have access to the CMDB to auto-populate the incident ticket and to support investigation and diagnosis.
- Inputs and outputs with problem management: lightly introduce the problem management process.
- Inputs and outputs with change management: an incident can require an RFC to resolve it.
- Resolutions and workarounds leave the process: closing the loop back.
This is a new slide to illustrate the linkage with all other support levels. It is complementary to the previous two slides. The diagram is taken from the book and replaces 2 slides that were part of the previous version. The arrows from 2nd, 3rd, and Nth level support back towards first level indicate that information, education and training will be provided back to first level when required. In many organizations, only the service desk (i.e. usually 1st level) can close an incident ticket. The animation sequence is:
- 1st level: see slide xx (activities)
- 2nd level: functional escalation plus the activities expected from 2nd level
- 3rd level: functional escalation plus the activities expected from 3rd level
- Nth level: functional escalation until someone finds a temporary solution, fix or workaround
No changes to this slide. This is a prioritization matrix that was developed by Pink Elephant consultants for a client. Therefore, the matrix itself is not defined within ITIL, but it demonstrates how the principles of “Impact” and “Urgency” were practically applied to an organization. Be sure to mention that “expected effort” could be used as a tie breaker. This grid is used as a guideline only. Impact and urgency can move along the grid over time.
Categorization is one of the most important aspects of ensuring that proper management information is available. Category trees need to be maintained and must not be too cumbersome. IMPACT, URGENCY, EXPECTED EFFORT: these should be consistent across Incident, Problem and Change Management.
Exercise: Incident matching. Create a grid on a flip chart for teams to enter matches. Look at the differences and similarities, and discuss why items are or are not matched. The incident ticket and description are not the challenge; the challenge is the problem ticket. Guide delegates through this slide, but don’t make a major issue out of remembering it. This process will be different in each organization. The main message is that it should exist and that this is what ITIL regards as best practice.
Proactive: Proactive activities minimize the adverse impact on the business of Incidents and Problems that are caused by errors within the IT Infrastructure, and prevent the recurrence of Incidents related to these errors. The activity involved in reaching this goal is root cause analysis of the Incidents, followed by actions to improve or correct the situation. Proactive Problem Management is concerned with identifying and solving Problems and Known Errors before Incidents occur in the first place. Book: “Analysing Incidents over differing time periods.”
Reactive: This is concerned with solving Problems in response to one or more Incidents. Book: “Analysing Incidents as they occur.”
This illustrates the reactive activities within Problem Management. The attendees should be very comfortable with this information now.
Occurrence of Incidents is unavoidable: increasingly complex IT infrastructures will always cause Incidents to occur and interruptions to normal service. Reactive Problem Control is concerned with the first three phases involved in the Problem Control process.
Problem Identification and Recording takes place when:
- Matching the Incident and CIs to existing Problems and Known Errors is unsuccessful during the stage of initial Incident support and classification
- Analysis of Incident data reveals recurrent Incidents
- Analysis of Incident data reveals Incidents that are not yet matched to existing Problems or Known Errors
- Analysis of the IT infrastructure indicates a Problem that could potentially lead to Incidents
- A major or significant Incident (serious and adverse impact on services to the Customer) occurs for which a structural solution has to be found.
Problems should be recorded in the CMDB and linked to any related Incidents on the Incident records. Solutions and work-arounds for Incidents should be recorded in the relevant Problem Records for others to access should other related Incidents occur.
NOTES
Some Problems will be identified from other processes, e.g. Capacity and Availability Management, which are concerned with detection and avoidance of problems and incidents in the IT infrastructure.
It may not be cost effective to put this possibly expensive resource skill into place for some Incidents and Problems. If this is the case then a dummy Problem Record can be introduced in the CMDB related to all connected Incidents, Known Errors, RFCs and CIs.
Problem Classification: Problem classification is similar to Incident Classification (see next slides).
Error control covers the processes involved in successful correction of Known Errors. The objective is to change IT components to remove Known Errors affecting the IT infrastructure and thus to prevent any recurrence of Incidents. Many IT departments are concerned with error control, and it spans both the live and development environments. It directly interfaces with, and operates alongside, Change Management processes. The figure shows the three phases of the error control process. The monitoring and tracking phase covers the entire Problem/error life-cycle.
Diagnosis frequently reveals that the cause of a Problem is not an error in a registered CI (hardware, software item, documentation or procedure) but is procedural. Incorrect release of a version of a program is one example. These situations result in Problem closure with an appropriate categorization code. Problems of this type do not automatically achieve the formal status of Known Error. To ensure that these Problems are followed up and that action is taken to address them, consider creating a dummy CI record for the offending procedure and re-classifying the Problem as a Known Error, or raise an RFC. Diagnosis showing the cause to be a fault in a registered CI should automatically change the status of the Problem into a Known Error. At this point the error control system and procedures take over. As indicated earlier, the objectives of Problem investigation frequently conflict with those of Incident resolution. For example, Problem investigation may require detailed diagnostics data, which is available only when an Incident has occurred; its capture may significantly delay the restoration of normal services. Be sure to liaise closely with Incident control and the computer operations or network control functions to get a balanced view of the right time for such actions.
Problem Management staff perform an initial assessment of the means of resolving the error, in collaboration with specialist staff. If necessary, they then complete an RFC according to Change Management procedures. The priority of the RFC is determined by the urgency and impact of the error on the business. To maintain a full audit trail, the RFC identifier should be included in the Known Error record and vice versa, or the two records should be linked. The final stages of error resolution (impact analysis, detailed assessment of the resolution action to be carried out, amendment of the item in error, and testing of the Change) are under the control of Change Management. In extreme circumstances, authorization and execution of an urgent resolution may be necessary.
This just supports the previous slides and explains how the live environment and the development environment interact to cope with Known Errors and resolutions.
Error control in the software environments: The processes of Problem and error control are essentially the same in the live and development environments. The support tools described earlier for Problem Management in the live environment are precisely those required in the development environment. The figure shows the cyclical relationship between error control in the live and development environments. Interlocking and integrated Problem Management systems facilitate the handling of this situation. Errors found during live operations result in an accumulation of RFCs. The Release strategy (Release Management) allows for the eventual creation of a Release to incorporate authorized Changes for the amendment of system facilities. Development staff should be aware of all Known Errors and Problems that are associated with the package Release. They are required to delete Known Errors as they are corrected, but they add any newly introduced errors from the development activity itself to a revised errors database (or CMDB). Upon implementation of a new Release, this revised errors database replaces the database of the previous Release as the live version. The cycle then repeats itself as new errors are discovered in live operation.
Trend Analysis
“Incident and Problem analysis reports provide information for proactive measures to improve service quality.”
Trend analysis looks for trends such as:
- the post-change occurrence of particular Problem types
- incipient faults of a particular type
- recurring Problems of a particular type or with an individual item
- the need for more customer training or better documentation
It may reveal:
- Problems occurring on one platform that may occur on another platform. For example, a Problem concerning network software on a midrange system may well be of significance on a mainframe.
- The existence of recurring problems. For example, if three routers are substituted serially because of the same failure, it may indicate that the router type concerned is not appropriate and should be replaced by another type. When a software application is involved, complete redevelopment might be necessary, which would be classed as a major change.
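The router example above amounts to counting how often the same failure recurs for the same type of item. A minimal sketch of that count follows; the record fields, the threshold of three, and the sample data are assumptions for illustration.

```python
from collections import Counter


def recurring_failures(incidents, threshold=3):
    """Return {(ci_type, failure): count} for pairs seen at least `threshold` times."""
    counts = Counter((i["ci_type"], i["failure"]) for i in incidents)
    return {key: n for key, n in counts.items() if n >= threshold}


# Made-up incident history: three routers of the same type replaced for
# the same failure, plus one unrelated switch incident.
history = [
    {"ci_type": "router type A", "failure": "power supply"},
    {"ci_type": "router type A", "failure": "power supply"},
    {"ci_type": "switch type B", "failure": "port fault"},
    {"ci_type": "router type A", "failure": "power supply"},
]
print(recurring_failures(history))
```

A pair that crosses the threshold is only a candidate for a Problem record; whether the item type should be replaced still needs investigation.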
Targeting preventive action
Proactive Problem Management can estimate the business-related impact of Incidents in a specific Problem area in order to carry out targeted preventive actions.
“With this concept a pain value is given to each Incident category on the basis of a formula, taking into account, for instance:
- the volume of Incidents
- the number of customers impacted
- the duration and related costs of resolving the Incidents
- the cost to the business, this being perhaps the most important factor of all.”
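The quoted passage does not give the pain value formula itself. One possible form is a weighted sum of per-factor scores, sketched below. The weights, the 1 to 5 scoring scale, the factor names, and the sample categories are all assumptions; the only link to the text is that cost to the business is weighted highest because the passage calls it perhaps the most important factor.

```python
# Hypothetical weights; not taken from ITIL.
WEIGHTS = {
    "volume": 1,
    "customers_impacted": 1,
    "resolution_effort": 1,  # duration and related cost of resolving
    "business_cost": 2,      # weighted highest, per the quoted passage
}


def pain_value(scores):
    """Weighted sum of per-factor scores (each scored, say, from 1 to 5)."""
    return sum(weight * scores[factor] for factor, weight in WEIGHTS.items())


def rank_by_pain(categories):
    """Return category names ordered from highest to lowest pain value."""
    return sorted(categories, key=lambda name: pain_value(categories[name]), reverse=True)


# Made-up scores for two Incident categories.
categories = {
    "email": {"volume": 5, "customers_impacted": 4, "resolution_effort": 2, "business_cost": 2},
    "billing": {"volume": 2, "customers_impacted": 3, "resolution_effort": 3, "business_cost": 5},
}
print(rank_by_pain(categories))
```

In this made-up data the lower-volume category ranks first because of its business cost, which is the kind of result a pain value is meant to surface.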
Each incident is the result of an error in the IT infrastructure. To be able to discover the error, a problem is defined, and this problem can be researched and diagnosed. As soon as the CI that causes the error is found, the problem changes into a known error. A known error is removed by means of a change, and Problem Management ensures that an RFC is sent to Change Management. To remove the error, a CI must, for example, be replaced or repaired. Only after a successful change has the cause of the incidents (the error) been removed, so that these incidents should not recur. Only then has a structural solution to the incidents been applied. In the interests of clarity, the differences between the Help Desk and Problem Management are explained below.
Help Desk: The objective of the Help Desk process is to resolve incidents as quickly as possible. Incidents arise during the utilization of the IT infrastructure and all incidents are recorded.
Problem Management: Problem Management focuses on introducing structural improvements to the IT infrastructure. The main concern here is not that a problem is solved quickly, as with the Help Desk, but that the solution is reliable and robust. A problem is defined on the basis of the incidents recorded by the Help Desk.
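The progression described above (problem, then known error once the CI at fault is found, then RFC, then resolution after a successful change) can be modelled as a small state machine. This is a sketch under assumptions: the class, the status names, and the identifiers are invented for illustration and do not come from ITIL.

```python
from enum import Enum


class Status(Enum):
    PROBLEM = "problem"
    KNOWN_ERROR = "known error"
    RFC_RAISED = "RFC raised"
    RESOLVED = "resolved"


# Allowed transitions, following the sequence in the notes.
TRANSITIONS = {
    Status.PROBLEM: Status.KNOWN_ERROR,     # CI at fault identified
    Status.KNOWN_ERROR: Status.RFC_RAISED,  # RFC sent to Change Management
    Status.RFC_RAISED: Status.RESOLVED,     # change successfully implemented
}


class ProblemRecord:
    def __init__(self, incident_ids):
        self.incident_ids = list(incident_ids)
        self.status = Status.PROBLEM
        self.ci_at_fault = None

    def advance(self, ci_at_fault=None):
        """Move to the next status; the first step requires the CI at fault."""
        if self.status not in TRANSITIONS:
            raise ValueError("record is already resolved")
        if self.status is Status.PROBLEM:
            if ci_at_fault is None:
                raise ValueError("the CI at fault must be identified first")
            self.ci_at_fault = ci_at_fault
        self.status = TRANSITIONS[self.status]
        return self.status


record = ProblemRecord(["INC-1", "INC-2"])
print(record.advance(ci_at_fault="router-07"))
print(record.advance())
print(record.advance())
```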