Information Technology
                       Quality of Service
                      Metrics at ibm.com


                                October 2003



                      IBM Technical Report TR-40.0031




Tegan Lee, TeleWeb Operations Program Manager
David Leip, Sr. Manager and Corporate Webmaster




© 2003 IBM CORPORATION           Page 1 of 11
Abstract
Information Technology Operations Management within IBM includes the management of
Solution (i.e. infrastructure, application, business process) Performance Metrics. This
case study reviews how IBM’s Web organization (ibm.com) manages Quality of Service
for web site availability and response time. At ibm.com we find that as our Quality of
Service improves, we are rewarded in the marketplace with increased overall customer
satisfaction and larger revenue capture on the web. To maximize the benefit of Quality
of Service metrics, ibm.com focuses its resources on Key Applications and Business
Processes, and drives broad improvement through basic Quality of Service management
across the entire portfolio of applications. IBM sets targets for high availability and
response time based on best-of-breed benchmarking. To deliver on the response time and
availability requirements, ibm.com uses an exception alerting system to draw immediate
attention to issues. To deliver proactively on high Quality of Service metrics,
management reviews the metrics and identifies actions within a standing calendar of
Quality of Service reviews.

Introduction
IBM has undergone a major financial, competitive, and cultural transformation since
1993. The Business Transformation Management System (BTMS) is a component of
that transformation, and is used by IBM worldwide to identify, develop, and deploy IBM
information technology and infrastructure. BTMS provides Operational Management
guidance related to Solution Performance Management. Solution Performance
Management Metrics include reporting on web traffic, customer events (such as quantity
of orders), customer satisfaction, and availability and response time. In the area of
availability and response time metrics ibm.com has exceeded the base guidance, and has
been a pioneer in extending standards.

ibm.com is the organization within IBM that develops, deploys and manages IBM’s web
presence. This organization controls the ibm.com Internet domain, provides web sites for
Commerce and stakeholder (i.e. customers, investors, the press and potential employees)
support, and provides guidance to all external IBM web sites.

This paper reflects the experiences and lessons learned by ibm.com in managing the
Quality of Service metrics, availability and response time, for IBM’s web presence.




Marketplace impact of Availability and Response Time
Service availability and response time expectations are two basic Quality of Service
metrics an institution needs to achieve to maintain satisfied constituents. For example,
if there are two gas stations close to your home, and one is open more hours and has a
significantly shorter wait time to be served, over time you are likely to use the more
available gas station, and possibly switch for good.

We at ibm.com know our customers rely on the web to learn about our goods and
services, shop, buy and effectively use the goods and services. The ibm.com retail
segment presents the most significant customer retention challenges to IBM. Retail
customer sites are sticky, which is to say someone keeps going back or sticks to the same
web site as long as it satisfies their need. If the retail web site is unavailable, the
customer will switch to a new site if their needs are immediate. Once they switch sites,
they may not switch again until the competitor site fails to satisfy their needs.


[Figure: Customer Sat trend compared to site. Y-axis: Average Daily Score (4 to 10).
X-axis: Day 1 to Day 15. Series: Overall Sat POS, Web Site Performance Sat, Likelihood
to buy again POS. Two annotations mark an "Outage in prime shift".]

ibm.com Retail commerce site survey data shows a strong correlation between outages and
decreased customer satisfaction, web site satisfaction, and likelihood to do business
with IBM.



ibm.com has found that web site response time can influence customer satisfaction, web
site satisfaction, and likelihood to do business with IBM. Improvements in site response
time of 20-30% produce modest increases in overall customer satisfaction of 5-10%.
Significant increases of 20-50% in response time result in a decline in customer
satisfaction of 10-15%. ibm.com’s conclusion is that response time alone does not drive
significant increases in customer satisfaction, yet substantial increases in response
time can drive customers away.




Key Applications and Business Processes
Key Applications or Business Processes are designations at ibm.com that mandate a
minimal level of system availability, system response time, and metric reporting.

[Figure: ibm.com Key Applications as % of Portfolio. Segments: Key Applications,
Applications.]

Key Applications or Business Processes meet one or more of the following criteria:


1. Used by external IBM Customers
2. Quantity of revenue or order capture volume
3. Web Site or Event influencing company image in a major way
      a. Investor webcasts or hosting of a major sports event website are extremely
          visible events that can define the IBM image to an influential segment of the
          IBM stakeholders and customers.
      b. In general we use the quantity of expected site visits to ascertain whether the
          Web Site or Event is critical. ibm.com measures web site traffic using Surfaid™.
4. Quantity of visits by entitled customers
      a. ibm.com customers have executed contracts with IBM for specific customer
          functions (like technical support) that have implied service level objectives.
5. Alternative processing cost exposure
      a. IBM has found that certain functions related to product delivery, like order
          status, when unavailable generate a deluge of alternative contacts into IBM
          that are dealt with in a less cost-effective manner.
      b. In high-volume, low-margin item handling, an order processed by means other than
          the web is prohibitive, as alternative order processing costs reduce profit
          margins.



Setting Availability Targets
The service level expectations, in commercial and non-profit operations, historically
started out with the hours the physical facility was open for business. For commercial
enterprises the hours of operation, when not regulated by law, became a differentiator
among firms. A web site, or any technology that enables access to a commercial
organization (e.g. automated teller machines for banks), raises customer expectations that
these institutions are always available to process a request with prompt response time.

IBM strives to achieve the highest possible availability, tempered by the costs for
supporting the infrastructure or application architecture. The ibm.com availability
standard is 99.5% for the underlying web infrastructure, excluding the specific system
maintenance time requirements. On top of that infrastructure availability standard, we
have put in place specific Key Application or Business Process availability requirements.
For example, for the Key Application www.ibm.com homepage we have set a 99.95%
availability target (about four hours of downtime per year), which we have exceeded for
the prior two years, as IBM has had no measurable outages. Other applications, such as
Commerce, have 99.5% availability targets where our attainment still has room for
improvement. (See Figure 1, Sample Availability and Response Time Report.)
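
The relationship between an availability target and the downtime it allows is simple
arithmetic. The following is a minimal sketch of that calculation; the function name and
the 365-day year with no excluded maintenance windows are assumptions made for
illustration, not a description of ibm.com tooling.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year, no excluded maintenance windows


def downtime_budget_hours(availability_pct: float,
                          period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the hours of downtime permitted by an availability target."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be a percentage between 0 and 100")
    return period_hours * (100.0 - availability_pct) / 100.0


if __name__ == "__main__":
    for target in (99.5, 99.95):
        print(f"{target}% -> {downtime_budget_hours(target):.1f} hours per year")
```

Under these assumptions a 99.95% target allows roughly 4.4 hours of downtime per year,
in line with the "about four hours" quoted above, while a 99.5% target allows roughly
43.8 hours.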

Setting Response Time Targets
IBM response time standards evolved from IBM internally defined timings to targets based
on competitive intelligence. The initial standard was based on the ibm.com assessment of
what the web site could achieve. The revolution in thinking was the transition to true
marketplace measures of acceptable response time, as defined during benchmarking.

[Figure: Performance Standards, Evolution over Time. Stages: Application Specific
Standards, Corporate Wide Generic Transactions, Competitive Timings Based.]



Annually, for Key Applications or Key Business Processes, we set response time targets
by benchmarking our competitors’ sites (see footnote 1). The competitors chosen by the
business teams are those doing well through the web channel. We do this benchmark on a
geographic basis, so that we ensure we meet the challenge in each of the geographies we
serve. With the benchmark data of similar competitor sites, we set response time
standards that are at or lower than our competitors’ response times. The output of this
competitive exercise can be somewhat sobering to the technology and business staff, as it
sets targets based on what you need to achieve in the marketplace, and not just on what
you can achieve.
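
As an illustration of setting targets per geography from benchmark data, the sketch below
takes the fastest benchmarked competitor time in each geography as the target. Choosing
the fastest competitor is one reading of "at or lower than our competitors", and the
function name, geography labels and timings are hypothetical.

```python
from typing import Dict


def response_time_targets(benchmarks: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """For each geography, use the fastest benchmarked competitor time (seconds) as target."""
    targets = {}
    for geography, competitor_times in benchmarks.items():
        if not competitor_times:
            raise ValueError(f"no benchmark data for {geography}")
        targets[geography] = min(competitor_times.values())
    return targets


if __name__ == "__main__":
    # Hypothetical benchmark timings in seconds; not measured data.
    sample = {
        "North America": {"competitor_a": 42.0, "competitor_b": 38.5},
        "Europe": {"competitor_a": 55.0, "competitor_c": 61.2},
    }
    print(response_time_targets(sample))
```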


Alerts and underlying monitoring

At IBM we strive to identify performance or availability service delivery issues prior to
customer impact. We have alerts generated for infrastructure component issues,
application availability issues, or errors in business process monitors. We analyze all
alerts via automated analysis or staff intervention. In cases where ibm.com has redundant
resources to handle the customer requests, often there is no visible customer impact.

For infrastructure components the costs to set up monitoring and act upon alerts are
built into the service delivery rates, and are not considered discretionary. Setup of
discrete web process or web link monitoring, alert handling, and reporting is done for
Key Applications or Business Processes. The alert handling and reporting costs can be
significant if staff are required to review the information and initiate corrective
actions. For infrastructure components we have either available staff or a page-out
procedure to initiate complex problem analysis and correction. For Key Applications, or
applications that have unique interim requirements, there is 24x7x365 staff coverage to
respond to alerts. The staff response to alerts is: verification, problem definition, and
then initiation of corrective action.

Footnote 1: ibm.com reviews with our Legal staff to validate our perception of publicly
available data, versus unethical competitive practices.

ibm.com does monitoring at two levels: operational level and user perspective.
Operational level monitoring covers individual technology components. User perspective
monitoring includes business scenario validation. Alerts issued for processing errors are
returned from either the operational or end-to-end monitoring staff. Operational alerts are
generated upon a change of status or a lack of response. End-to-end monitoring generates
alerts when there is no response before the timeout value (see footnote 2) or when the
response does not match the expected content.

Operations level monitoring techniques include:
• Enabling Tivoli monitoring to alert any time equipment, previously present and
   operational, does not respond.
• Running System Resource Monitoring for servers to check on CPU Usage, Run
   Queue for AIX, Memory and Storage Capacity used; I/O Wait and Paging.
• Validating DB2 Tables are at acceptable capacities
• Measuring Network usage versus committed capacity.
• Using IP pings. With bi-directional probing the failing component (e.g. firewall,
   network node, virtual private network link) in the network path can be identified.
• Issuing HTTP head requests to specific servers that confirm a web server is running
   and responding to requests. This type of request is “light” and with minimal impact
   on capacity provides a significant measurement of server health.
End-to-end (user perspective) monitoring techniques include:
• Initiating XML requests for key common services (e.g. authentication) to ensure the
   directory is responsive.
• Running a simple routine to ensure initial page load responds to a browser.
• Processing a fully scripted business scenario with business response validation for
   Key Applications or Business Processes.
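
Two of the techniques above can be sketched with the Python standard library: a
lightweight HTTP HEAD check at the operations level, and classification of an end-to-end
probe result by timeout and expected content. This is an illustrative sketch only. The
function names are hypothetical, the 45 second default follows the timeout stated in the
footnote on web transactions, and the tools ibm.com actually used are not shown here.

```python
import urllib.error
import urllib.request
from typing import Optional

DEFAULT_TIMEOUT_SECONDS = 45.0  # web transaction timeout stated in the report's footnote


def evaluate_probe(elapsed_seconds: float, body: Optional[str], expected_text: str,
                   timeout_seconds: float = DEFAULT_TIMEOUT_SECONDS) -> str:
    """Classify one end-to-end probe result as 'ok', 'timeout' or 'content_mismatch'."""
    if body is None or elapsed_seconds >= timeout_seconds:
        return "timeout"
    if expected_text not in body:
        return "content_mismatch"
    return "ok"


def head_check(url: str, timeout_seconds: float = DEFAULT_TIMEOUT_SECONDS) -> bool:
    """Operations-level check: send an HTTP HEAD request and report whether it succeeded."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers connection failures, timeouts and HTTP error statuses.
        return False
```

A HEAD request returns only headers, which is why it places little load on the server
being checked. The end-to-end classification is kept as a pure function so that the same
rule can be applied to results gathered by any probing tool.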

Periodically our staff performs site monitoring and validation directly. The most likely
reason for manual site verification is that a significant upcoming event requires additional
focus. For example, we at IBM do marketing campaigns that are intended to generate
significant interest in our products and services. We may have extra monitoring of
specific sites or perform specific functional verification to validate that we will reap the
maximum benefit from the marketing activity. Another example is key demonstrations
for selected customers. We offer custom web sites for our largest customers. We do
manual monitoring of those sites during key customer demonstrations to ensure we
effectively support our marketing efforts.

Monitors are established from different points of presence depending on the information
needed for operational management. We place monitors on both our internal network and
external network points of presence. Monitors on our internal networks are used for
technology level monitoring, for business process measurement for Key Applications
heavily used by IBM staff, and for providing a quick method to validate that the problem
is external to ibm.com.

Footnote 2: For web transactions we set the timeout value to 45 seconds, unless the
nature of the request requires a shorter or longer time.

For web sites where the majority of users are coming in via the Internet, we monitor
from external network points of presence throughout the geographical area that serves
our stakeholders. The chart "Probing Application from Europe" illustrates Points of
Presence in Europe and Africa used for monitoring an application hosted in North America.

[Figure: Probing Application from Europe]


ibm.com uses commercially available IBM Global Services, IBM software products, and
external service providers for monitoring. At ibm.com we are evaluating Client Perceived
Response Time tools to collect the performance information for Key Applications or
Business Processes. Client Perceived Response Time (CPRT) is a technology for
accurately measuring the customer experience of a WWW service by instrumenting web
pages with executables which send back to a collection point the response time data.
While this technology may provide us a new basis to collect the data, we are too early in
our evaluation to comment on the implications to our management system.

At ibm.com we put great value on monitoring tools that provide real time, or close to real
time, reports that aid in operational issue identification. The real time business probe has
become fundamental in both immediate and historical problem analysis. The historical
data from monitoring tools can pinpoint when a change in the response time or
availability arose. Often the response time changes in our sites have been related to
content, and by identifying the date and time we can narrow the review of changes.

ibm.com End-to-End probes are excluded from business usage metrics. We eliminate the
monitoring traffic so that ibm.com can ascertain true customer usage trends and
directions. We periodically assess the volume of our end-to-end probes to make sure we
are optimizing our activity across IBM. For example, we have unique portals for each of
our large customers that are customized to their needs to learn about, shop for, buy and
use IBM goods and services. We found that groups in multiple IBM brands (e.g. Software,
Server, Learning Services, Sales and Distribution) were doing Business Scenario probing
for their own Quality of Service metrics. In one case we found that approximately 30% of
the portal web hits were monitoring transactions. Upon understanding this statistic, we
consolidated the probe scenarios, yielding both system resource efficiency in dealing
with customer requests and reduced internal reporting costs once the shared reporting was
put in place.
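
Separating probe traffic from customer traffic, and measuring how much of the total the
probes account for, might look like the sketch below. It assumes probes identify
themselves with a marker in the User-Agent header; that marker, the record layout and the
function name are hypothetical and are not taken from the ibm.com systems.

```python
from typing import Iterable, List, Mapping, Tuple

PROBE_MARKER = "qos-probe"  # hypothetical token carried in the probes' User-Agent header


def split_probe_traffic(
    hits: Iterable[Mapping[str, str]],
) -> Tuple[List[Mapping[str, str]], float]:
    """Return (customer hits, fraction of all hits that were monitoring probes)."""
    all_hits = list(hits)
    customer_hits = [h for h in all_hits if PROBE_MARKER not in h.get("user_agent", "")]
    if not all_hits:
        return customer_hits, 0.0
    probe_share = (len(all_hits) - len(customer_hits)) / len(all_hits)
    return customer_hits, probe_share
```

Usage metrics would then be computed from the customer hits only, while the probe share
gives the kind of figure (such as the 30% case above) that signals probes should be
consolidated.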




Availability and Response Time Reviews

ibm.com performance and availability reports are based on data generated by technology
monitors or business process probes, tempered by qualitative analysis. We use raw
technology monitor data to represent the Quality of Service metrics of availability and
response time for most applications. For our Key Applications or Business Processes we
refine this data with verified outage data and quality of system usability criteria.

Figure 1: Sample Availability and Response Time Report



13-Month Availability and Response Summary

Legend: Green = Met or Exceeded; Amber = Degraded Service Miss; Red = Missed Target;
][ = New Application

Availability Summary (SLO* and values in %)

Application            SLO*   Aug    Sept   Oct    Nov    Dec    Jan    Feb    Mar    Apr    May    June   July   Aug    YTD 2003
Portal - AP SP 4.0     99.5   99.5   100.0  99.5   99.0   100.0  99.9   99.9   99.6   100.0  100.0  100.0  100.0  100.0  99.9
Portal - NA SP 4.0     99.5   99.1   100.0  99.9   97.0   100.0  100.0  99.8   98.9   99.5   99.8   99.2   100.0  99.7   99.6
Portal - Japan SP 3.6  99.5   98.2   99.7   98.4   100.0  100.0  100.0  100.0  97.7   99.8   100.0  100.0  100.0  100.0  99.7
Portal - EMEA SP 4.0   99.5   98.6   100.0  99.8   97.9   99.9   100.0  100.0  97.7   99.4   100.0  100.0  100.0  99.7   99.6

Response time (SLO and values in seconds)

Application            SLO    Aug    Sept   Oct    Nov    Dec    Jan    Feb    Mar    Apr    May    June   July   Aug    YTD 2003
Portal - AP SP 4.0     105.8  70.9   71.1   68.9   65.7   66.0   70.6   85.8   74.6   64.2   62.1   68.4   64.7   64.3   69.3
Portal - NA SP 4.0     117.8  49.0   53.3   41.7   38.3   36.3   40.7   38.7   40.3   41.5   36.5   41.2   48.3   47.6   42.0
Portal - Japan SP 3.6  90     30.1   32.8   32.2   30.7   31.7   34.5   27.9   33.2   35.3   33.7   35.5   38.9   40.5   34.9
Portal - EMEA SP 4.0   125    64.3   65.4   60.1   57.0   52.9   52.1   52.3   56.3   55.9   50.0   53.3   53.5   53.2   53.3
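
The sample report can be checked with a few lines of code. The sketch below flags the
months in which availability fell below the SLO and computes a year-to-date figure. The
YTD values in the sample appear consistent with a simple mean of the January through
August values, but that is an inference from the numbers, not a method stated in this
report, and the function names are hypothetical.

```python
from statistics import mean
from typing import List, Tuple

# Column order of the 13-month report; January is at index 5.
MONTHS = ["Aug", "Sept", "Oct", "Nov", "Dec", "Jan", "Feb",
          "Mar", "Apr", "May", "June", "July", "Aug"]


def ytd_average(monthly: List[float], first_ytd_index: int = 5) -> float:
    """Mean of the values from January onward, rounded to one decimal (assumed method)."""
    return round(mean(monthly[first_ytd_index:]), 1)


def months_below_slo(monthly: List[float], slo_pct: float) -> List[Tuple[int, str]]:
    """Return (column index, month label) for each month whose availability missed the SLO."""
    return [(i, MONTHS[i]) for i, value in enumerate(monthly) if value < slo_pct]


if __name__ == "__main__":
    # "Portal - NA SP 4.0" availability row from the sample report.
    na_portal = [99.1, 100.0, 99.9, 97.0, 100.0, 100.0, 99.8,
                 98.9, 99.5, 99.8, 99.2, 100.0, 99.7]
    print(ytd_average(na_portal))
    print(months_below_slo(na_portal, 99.5))
```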

Immediate verification upon an alert gives the business a more accurate picture of
whether the web site is failing or whether there is a potential monitoring issue. We have
found that technology monitors or business process probes themselves may fail in
isolated cases. To guard against these alerts distracting our attention from hard
outages, we have a rule that two failures must occur in a row for an alert to count as a
verified outage. Secondly, for Key Applications or Business Processes, staff with
scripts verify any reported failure. This is to guard against situations where there is
no perceived customer impact. For example, a probe that fails automated validation
because of content changes that are acceptable to a business user is a different problem
than the site failing. In the case where the probes are failing, but the business script
executed by a human is working, we do not count it as a system outage. The best analogy
is a doctor who may re-run a test if the first results are not consistent, and then
possibly do further exploratory work before telling a patient definitively that they have
a critical health issue.
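
The two-failures-in-a-row rule can be expressed compactly. The following is a minimal
sketch, assuming probe results arrive as an ordered sequence of booleans; the function
name and data shape are hypothetical, and the manual script verification step described
above is not modelled.

```python
from typing import Iterable, List


def verified_outage_points(results: Iterable[bool], required_failures: int = 2) -> List[int]:
    """Return the indices at which a run of failed probes reaches the required length.

    `results` holds one boolean per probe firing: True for success, False for failure.
    """
    points = []
    run_length = 0
    for index, succeeded in enumerate(results):
        if succeeded:
            run_length = 0
            continue
        run_length += 1
        if run_length == required_failures:
            points.append(index)  # the outage becomes verified at this firing
    return points
```

A single isolated failure never reaches the threshold, so a transient probe fault does not
register, while a sustained failure is reported once, at the second consecutive miss.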



Quality of system usability criteria include judgments as to web site usability,
tolerances for intermittent failures, and thresholds for response times. We have had
situations where the technology monitors and business process probes work fine, but the
site is unusable for the target audience. In one embarrassing episode, the content for
the United Kingdom site was loaded with that for another country. The web site was
responding, and business verification of the monitors was satisfied; however, the
customers were not getting relevant information. This warranted the reporting of an
outage, although it was the content, and not the technology, that failed. At times there
can be intermittent failures that customers can overcome with effort. ibm.com declares an
outage when 40% or more of the probe firings within a time period fail due to timeout.
ibm.com also investigates significant (>50%) fluctuations in response time to identify
potential availability issues. Many of our Key Applications have dynamic content. We have
had cases where system response time is dramatically higher or lower than historical
levels. This may indicate that content or functions were so significantly revised that
we need to either recalibrate our monitors or address a production problem.
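
The two thresholds in this paragraph translate directly into checks. The sketch below is
illustrative only: the function names are hypothetical, and how the historical baseline
and the time period are chosen is an assumption left to the caller.

```python
def is_outage(timeouts: int, firings: int, threshold: float = 0.40) -> bool:
    """True when the share of probe firings that timed out reaches the outage threshold."""
    if firings <= 0:
        raise ValueError("at least one probe firing is required")
    return timeouts / firings >= threshold


def needs_investigation(current_seconds: float, baseline_seconds: float,
                        threshold: float = 0.50) -> bool:
    """True when response time moved more than the threshold, up or down, from baseline."""
    if baseline_seconds <= 0:
        raise ValueError("baseline must be positive")
    change = abs(current_seconds - baseline_seconds) / baseline_seconds
    return change > threshold
```

The fluctuation check is symmetric because a response time that drops sharply can be as
telling as one that rises, for example when a page has lost content or a function has
stopped executing.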

Quality of Service reports are reviewed daily, weekly and monthly with different
objectives. Attendees of the daily operational reviews include the application
maintenance staff, service delivery staff and business owner. The response time and
availability data is then compiled into a weekly report for review with the management of
the application maintenance, service delivery, and business owner organizations. A
monthly compilation is then reviewed with the Executive management of the application
maintenance, service delivery, business transformation and business owner organizations.
At each review the observed trends are discussed, and recommendations to resolve the
problems are refined. The table below indicates the content of the reports by review
cycle:

                    Daily               Weekly                  Monthly
Availability        Current Day +       Prior Week + 13         Prior Month + 13 Months
                    other Weekdays      Weeks trend chart       Trend Chart
Performance         Current Day +       Prior Week + 13         Prior Month + 13 Months
                    other Weekdays      Weeks trend chart       Trend Chart
Root Cause          Prior Day Issue     Open RCA’s and          RCA’s approved for
Analysis (RCA)                          requests for closure    closure
                                        Current Month           Current Month Outage
                                        Outage Analysis by      Analysis by Root Cause,
                                        Root Cause              with 13 month trend.

In our Quality of Service reporting we have 24x7 system availability and response time
numbers, and Root Cause Analyses for outages. The 24x7 system availability statistics
allow us to continuously highlight the need to always be available, and to eschew system
maintenance windows as much as practical.

For select Key Applications or Business Processes, reports are produced for each
Geography during core business hours in the local time zone. Most IBM sites are
worldwide, so we probe from worldwide points of presence and generate reports for
availability and response time in the local time zone, so the business can understand the
customer experience by Geography. The underlying data for the reporting is maintained in
Greenwich Mean Time (GMT), so that combined reporting worldwide and across the
application portfolio can easily be accomplished.
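
Keeping the underlying data in GMT and converting only at report time can be sketched as
follows. The fixed UTC offset, the 09:00 to 17:00 weekday definition of core business
hours and the function name are all assumptions for illustration; a production report
would need full time zone rules, including daylight saving time.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed offset is enough for this example (Japan observes no daylight saving).
JAPAN = timezone(timedelta(hours=9), "JST")


def in_core_hours(sample_utc: datetime, local_zone: timezone,
                  start_hour: int = 9, end_hour: int = 17) -> bool:
    """Return True if a GMT-stamped sample falls in local weekday core business hours."""
    local = sample_utc.astimezone(local_zone)
    return local.weekday() < 5 and start_hour <= local.hour < end_hour


if __name__ == "__main__":
    sample = datetime(2003, 8, 4, 1, 0, tzinfo=timezone.utc)  # Monday 10:00 in Japan
    print(in_core_hours(sample, JAPAN))
```

Because every sample carries a single GMT timestamp, the same data set can feed both the
worldwide 24x7 report and each Geography's core-hours report without being duplicated.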

 The Root Cause Analysis is provided not only for discrete outages, but we also include
trending analysis by category. The categories for the Root Cause Analysis trending are:
Application Package or Customization, Hardware Failure, System Software (OS,
Middleware), Network, Application Maintenance Process, Service Delivery Management
Process, and Business Owner (i.e. Site Content).

Web Attributable Outage Impact

For Key Applications or Business Processes ibm.com has established quantitative
estimates of jeopardized revenue and alternate process impact during an outage. For any
given hour in the typical week we estimate alternate processing volumes as an
incremental cost, and the typical order volume as the revenue at risk.

[Figure: Outage Impact. Y-axis: USD $ Thousands (0 to 80). Categories: App 1, App 2,
App 3. Series: Alternate Processing Costs, Revenue.]

IBM Customers who cannot access us via the Web may call our staff or may elect to
order from a competitor whose web site store is open. This quantitative analysis has been
used to stress to everyone involved in the delivery of the web experience the immediate
cost impact of an outage at any time, and to articulate the impact of outages we
experience. These measurements have been powerful tools to justify the incremental
technology investments to get to that next level of availability.
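
An hourly outage impact estimate of the kind described above can be sketched as below.
The field names, the linear scaling with outage duration and all figures are hypothetical;
the report does not give the actual model or its inputs.

```python
from dataclasses import dataclass


@dataclass
class HourlyProfile:
    """Typical values for one hour of the week; all figures used here are hypothetical."""
    orders: float               # typical number of web orders in the hour
    average_order_value: float  # USD per order
    alternate_contacts: float   # contacts expected to shift to other channels in the hour
    cost_per_contact: float     # USD incremental cost of handling one such contact


def outage_impact(profile: HourlyProfile, outage_hours: float) -> dict:
    """Estimate revenue at risk and alternate processing cost for an outage."""
    revenue_at_risk = profile.orders * profile.average_order_value * outage_hours
    alternate_cost = profile.alternate_contacts * profile.cost_per_contact * outage_hours
    return {"revenue_at_risk": revenue_at_risk, "alternate_processing_cost": alternate_cost}


if __name__ == "__main__":
    print(outage_impact(HourlyProfile(100, 250.0, 40, 12.5), outage_hours=2))
```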



Conclusions

ibm.com has made Quality of Service availability and response time metrics part of the
management of our business. The active management to improve these statistics has
yielded metrics with a high degree of credibility. These credible metrics are then
analyzed to ascertain tactical and strategic action items, with the ultimate goal of
improving our web site's value proposition for our customers and stockholders.


Acknowledgements
The authors would like to acknowledge the contributions of for review of the transcripts,
and suggestions for improvement.





About the Authors

David Leip is IBM’s corporate webmaster, with direct technical responsibility for IBM’s
corporate portal which spans 83 countries on the wired and wireless web. Prior to
becoming the corporate webmaster in 1999, David worked for IBM’s CIO office as
program manager of web enablement. Earlier he worked in the IBM Software Development
Lab in Toronto. David has an MSc in Computing & Information Science from the
University of Guelph. His personal web site can be found at: http://www.Leip.ca/

Tegan Lee is a lead for Information Technology Operations Management in the ibm.com
unit. From 1995 to 2001 Tegan was the Project Executive for an IBM-provided home
banking web platform that at its peak had over 3,000,000 subscribers. Tegan has over 25
years of Information Technology experience in developing, deploying and operating
business application computing platforms. Tegan has an MBA in Marketing and Finance
from Pace University, and a BBA in Statistics from Baruch College.




Information Technology Quality of Service Metrics at ibm.com

  • 1. Information Technology Quality of Service Metrics at ibm.com October 2003 IBM Technical Report TR-40.0031 Tegan Lee David Leip TeleWeb Operations Program Manager Sr. Manager and Corporate Webmaster © 2003 IBM CORPORATION Page 1 of 11
  • 2. Abstract Information Technology Operations Management within IBM includes the managing of Solution (i.e. infrastructure, application, business process) Performance Metrics. This case study reviews how IBM’s Web organization (ibm.com) performs effective Quality of Service management for web site availability and response time. At ibm.com we find that as our Quality of Service improves, we are rewarded in the marketplace with increased overall customer satisfaction and larger revenue capture on the web. To maximize the benefit of Quality of Service metrics ibm.com focuses its resources on Key Applications and Business Processes, and drives broad improvement through basic Quality of Service management across the entire portfolio of applications. IBM sets targets for high availability and response time based on best of breed benchmarking. To deliver on the response time and availability requirements, ibm.com uses an exception alerting system to generate immediate attention to issues. To proactively deliver on high Quality of Service metrics the management reviews and identifies actions in the framework of a standing calendar of Quality of Service reviews. Introduction IBM has undergone a major financial, competitive, and cultural transformation since 1993. The Business Transformation Management System (BTMS) is a component of that transformation, and is used by IBM worldwide to identify, develop, and deploy IBM information technology and infrastructure. BTMS provides Operational Management guidance related to Solution Performance Management. Solution Performance Management Metrics include reporting on web traffic, customer events (such as quantity of orders), customer satisfaction, and availability and response time. In the area of availability and response time metrics ibm.com has exceeded the base guidance, and has been a pioneer in extending standards. ibm.com is the organization within IBM that develops, deploys and manages IBM’s web presence. 
This organization controls the ibm.com Internet domain, provides web sites for Commerce and stakeholder (i.e. customers, investors, the press and potential employees) support, and provides guidance to all external IBM web sites. This paper reflects the experiences and lessons learned by ibm.com in managing the Quality of Service metrics, availability and response time, for IBM’s web presence. © 2003 IBM CORPORATION Page 2 of 11
  • 3. Marketplace impact of Availability and Response Time Service availability and response time expectations are two basic Quality of Service metrics an institution needs to achieve to maintain satisfied constituents. For example if there are two gas stations close to your home, and one is open more hours and the wait time to be served is significantly shorter, over time you are likely to use the more available gas station, and possibly switch forever. We at ibm.com know our customers rely on the web to learn about our goods and services, shop, buy and effectively use the goods and services. The ibm.com retail segment presents the most significant customer retention challenges to IBM. Retail customer sites are sticky, which is to say someone keeps going back or sticks to the same web site as long as it satisfies their need. If the retail web site is unavailable, the customer will switch to a new site if their needs are immediate. Once they switch sites, they may not switch again until the competitor site fails to satisfy their needs. Customer Sat trend compared to site ibm.com Retail 10 commerce site survey 9 data shows a strong Average Daily Score correlation between 8 outages and decreased 7 customer 6 satisfaction, web site age 5 age satisfaction, and e in Out prim t e shif in Out prim t shif likelihood to do 4 Day 1 Day 3 Day 5 Day 7 Day 9 Day 11 Day 13 Day 15 business with IBM. Overall Sat POS Web Site Performance Sat Likelihood to buy again POS ibm.com has found web site response time can influence Customer Satisfaction, web site satisfaction and likelihood to do business with IBM. ibm.com has found that improvements in site response time by 20-30% produce modest increases in overall customer satisfaction of 5-10%. Significant increases of 20-50% in response time result in a decline of customer satisfaction of 10-15%. 
ibm.com’s conclusion is response time alone does not drive significant increases in customer satisfaction; yet substantial increases in response time can drive customers away. © 2003 IBM CORPORATION Page 3 of 11
  • 4. Key Applications and Business Processes Key Applications or Business Processes ibm.com Key Applications as % of Portfolio are designations at ibm.com that mandate a minimal level of system availability, system response time, and metric reporting. Key Applications or Business Processes meet one or more of the following criteria: Key Applications Applications 1. Used by external IBM Customers 2. Quantity of revenue or order capture volume 3. Web Site or Event influencing company image in major way a. Investor webcasts or hosting of a major sports event website are extremely visible events that can define the IBM image to an influential segment of the IBM stakeholders and customers. b. In general we use quantity of expected site visits to ascertain the Web Site or Event is critical. ibm.com measures web site traffic using Surfaid™. 4. Quantity of visits by entitled customers a. ibm.com customers have executed contracts with IBM for specific customer functions (like technical support) that have implied service level objectives 5. Alternative processing cost exposure a. IBM has found certain functions related to product delivery, like order status, when unavailable generate a deluge of alternative contacts into IBM that are dealt with in a less cost effective manner b. In high volume, low margin item handling an order processed by other then the web is prohibitive as alternative order processing costs reduce profit margins Setting Availability Targets The service level expectations, in commercial and non-profit operations, historically started out with the hours the physical facility was open for business. For commercial enterprises the hours of operation, when not regulated by law, became a differentiator among firms. A web site, or any technology that enables access to a commercial organization (i.e. automated teller machines for banks), raises customer expectations that these institutions are always available to process a request with prompt response time. 
IBM strives to achieve the highest possible availability, tempered by the costs for supporting the infrastructure or application architecture. The ibm.com availability standard is 99.5% for the underlying web infrastructure, excluding the specific system maintenance time requirements. On top of that infrastructure availability standard, we have put in place specific Key Application or Business Process availability requirements. An example is that for the Key Application www.ibm.com homepage we have set a 99.95% availability target (about four hours per year), which we have exceeded for the © 2003 IBM CORPORATION Page 4 of 11
  • 5. prior two years, as IBM has had no measurable outages. Other applications, such as Commerce, have 99.5% availability targets that we still have room for improvement in our attainment. (See Figure 1Sample Availability and Response Time Report on Page 21). Setting Response Time Targets IBM Response time standards evolved Performance Standards from IBM internally defined timings to targets based on competitive intelligence. The initial standard was based on the ibm.com assessment of what the web Competitive site could achieve. The revolution in Timings Based thinking was the transition to true Corporate Wide marketplace measures of acceptable Application Generic Specific Transactions response time, as defined during Standards benchmarking. Evolution over Time Annually, for Key Applications or Key Business Processes, we set target response time targets by benchmarking our competitors’ sites1. The competitors chosen by the business teams are those doing well through the web channel. We do this benchmark on a geographic basis, so that we ensure we meet the challenge in each of the geographies we serve. With the benchmark data of similar competitor sites, we set response time standards that are at or lower then our competitor’s response time. The output of this competitive exercise can be somewhat sobering to the technology and business staff, as it sets targets based on what you need to achieve in the marketplace, and not just setting targets based on what you can achieve. Alerts and underlying monitoring At IBM we strive to identify performance or availability service delivery issues prior to customer impact. We have alerts generated for infrastructure component issues, application availability issues, or errors in business process monitors. We analyze all alerts via automated analysis or staff intervention. In cases where ibm.com has redundant resources to handle the customer requests, often there is no visible customer impact. 
For infrastructure components the costs to setup monitoring and act upon alerts are built into the service delivery rates, and are not considered discretionary. Discrete web processes or web link monitoring setup alert handling and reporting, is done for Key Applications or Business Processes. The alert handling and reporting costs can be significant, if it requires staff to review the information and initiate corrective actions. For infrastructure components we have either available staff or a page-out procedure to initiate complex problem analysis and correction. For Key Applications, or applications that have unique interim requirements, there is 24X7X365 hour staff coverage to respond 1 ibm.com reviews with our Legal staff to validate our perception of publicly available data, versus unethical competitive practices. © 2003 IBM CORPORATION Page 5 of 11
  • 6. to alerts. The staff response to alerts is: verification, problem definition, and then initiation of corrective action. ibm.com does monitoring at two levels: operational level and user perspective. Operations level monitors individual technology components. User perspective monitoring includes business scenario validation. Alerts issued for processing errors are returned from either the operational or end-to-end monitoring staff. Operational alerts are generated upon a change of status or a lack of response. End-to-end monitoring generates alerts due to a failure to respond prior to the timeout value2 or the response does not match the expected content anticipated. Operations level monitoring techniques include: • Enabling Tivoli monitoring to alert any time equipment, previously present and operational, does not respond. • Running System Resource Monitoring for servers to check on CPU Usage, Run Queue for AIX, Memory and Storage Capacity used; I/O Wait and Paging. • Validating DB2 Tables are at acceptable capacities • Measuring Network usage versus committed capacity. • Using IP Pings . With bi-directional probing the failing component (i.e. Firewall, network node, virtual private network link) in the network path can be identified. • Issuing HTTP head requests to specific servers that confirm a web server is running and responding to requests. This type of request is “light” and with minimal impact on capacity provides a significant measurement of server health. End-to-end (user perspective) monitoring techniques include: • Initiating XML requests for key common services (i.e. authentication) to ensure the directory is responsive. • Running a simple routine to ensure initial page load responds to a browser • Processing a fully scripted business scenario with business response validation for Key Applications or Business Processes. Periodically our staff performs site monitoring and validation directly. 
The most likely reason for manual site verification is that a significant upcoming event requires additional focus. For example, IBM runs marketing campaigns that are intended to generate significant interest in our products and services. We may add extra monitoring of specific sites or perform specific functional verification to make sure we reap the maximum benefit from the marketing activity. Another example is key demonstrations for selected customers. We offer custom web sites for our largest customers, and we monitor those sites manually during key customer demonstrations to ensure we effectively support our marketing efforts.

Monitors are established from different points of presence depending on the information needed for operational management. We place monitors on both our internal network and external network points of presence. Monitors on our internal networks are used for technology level monitoring, for business process measurement for Key Applications heavily used by IBM staff, and for providing a quick method to validate that a problem is external to ibm.com. For web sites where the majority of users are coming in via the Internet, we monitor from external network points of presence throughout the geographical area that serves our stakeholders. The chart "Probing Application from Europe" illustrates Points of Presence in Europe and Africa used for monitoring an application hosted in North America.

[2] For web transactions we set the timeout value to 45 seconds, unless the nature of the request requires a shorter or longer time.

ibm.com uses commercially available IBM Global Services offerings, IBM software products, and external service providers for monitoring.

At ibm.com we are evaluating Client Perceived Response Time tools to collect the performance information for Key Applications or Business Processes. Client Perceived Response Time (CPRT) is a technology for accurately measuring the customer experience of a WWW service by instrumenting web pages with executables that send the response time data back to a collection point. While this technology may give us a new basis for collecting the data, we are too early in our evaluation to comment on the implications for our management system.

At ibm.com we put great value on monitoring tools that provide real time, or close to real time, reports that aid in operational issue identification. The real time business probe has become fundamental in both immediate and historical problem analysis. The historical data from monitoring tools can pinpoint when a change in response time or availability arose. Often the response time changes in our sites have been related to content, and by identifying the date and time of a change we can narrow the review of changes.

ibm.com End-to-End probes are excluded from business usage metrics. Monitoring traffic is eliminated so that ibm.com can ascertain true customer usage trends and directions.
We periodically assess the volume of our end-to-end probes to make sure we are optimizing our activity across the IBM company. For example, we have unique portals for each of our large customers, customized to their needs to learn about, shop for, buy and use IBM goods and services. We found that groups from multiple IBM brands (e.g. Software, Server, Learning Services, Sales and Distribution) were doing Business Scenario probing for their own Quality of Service metrics. In one case we found that approximately 30% of the portal web hits were monitoring transactions. Upon understanding this statistic, we consolidated the probe scenarios, yielding both system resource efficiency in dealing with customer requests and, once shared reporting was put in place, reduced internal reporting costs.
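Separating probe traffic from customer traffic, and measuring how much of the load the probes represent, can be sketched as follows. This is a minimal sketch under stated assumptions: it assumes each hit record carries an `agent` field and that probes identify themselves through known agent strings. The report does not say how ibm.com tags monitoring traffic, and the names `split_hits` and `probe_agents` are hypothetical.

```python
def split_hits(hits, probe_agents):
    """Return (customer_hits, probe_share) for a list of hit records.

    hits: list of dicts, each with an "agent" key (assumed log format).
    probe_agents: set of agent strings used by monitoring probes (assumed).
    """
    # Customer usage metrics are computed only from non-probe hits.
    customer_hits = [h for h in hits if h["agent"] not in probe_agents]
    probe_count = len(hits) - len(customer_hits)
    # Share of all hits that were monitoring transactions.
    probe_share = probe_count / len(hits) if hits else 0.0
    return customer_hits, probe_share


if __name__ == "__main__":
    sample = [{"agent": "browser"}] * 7 + [{"agent": "qos-probe"}] * 3
    customers, share = split_hits(sample, {"qos-probe"})
    print(len(customers), f"{share:.0%}")
```

A probe share close to the 30% reported for the portal above would be a signal to consolidate probe scenarios across the groups running them.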
Availability and Response Time Reviews
ibm.com performance and availability reports are based on data generated by technology monitors or business process probes, tempered by qualitative analysis. We use raw technology monitor data to represent the Quality of Service metrics of availability and response time for most applications. For our Key Applications or Business Processes we refine this data with verified outage data and quality of system usability criteria.

Figure 1 Sample Availability and Response Time Report

13-Month Availability and Response Summary
Legend: Green = Met or Exceeded; Amber = Degraded Service Miss; Red = Missed Target; ][ = New Application

Availability (%), with YTD 2003
Application             SLO*   Aug  Sept   Oct   Nov   Dec   Jan   Feb   Mar   Apr   May  June  July   Aug   YTD
Portal - AP SP 4.0      99.5  99.5 100.0  99.5  99.0 100.0  99.9  99.9  99.6 100.0 100.0 100.0 100.0 100.0  99.9
Portal - NA SP 4.0      99.5  99.1 100.0  99.9  97.0 100.0 100.0  99.8  98.9  99.5  99.8  99.2 100.0  99.7  99.6
Portal - Japan SP 3.6   99.5  98.2  99.7  98.4 100.0 100.0 100.0 100.0  97.7  99.8 100.0 100.0 100.0 100.0  99.7
Portal - EMEA SP 4.0    99.5  98.6 100.0  99.8  97.9  99.9 100.0 100.0  97.7  99.4 100.0 100.0 100.0  99.7  99.6

Response time (seconds), with YTD 2003
Application              SLO   Aug  Sept   Oct   Nov   Dec   Jan   Feb   Mar   Apr   May  June  July   Aug   YTD
Portal - AP SP 4.0     105.8  70.9  71.1  68.9  65.7  66.0  70.6  85.8  74.6  64.2  62.1  68.4  64.7  64.3  69.3
Portal - NA SP 4.0     117.8  49.0  53.3  41.7  38.3  36.3  40.7  38.7  40.3  41.5  36.5  41.2  48.3  47.6  42.0
Portal - Japan SP 3.6     90  30.1  32.8  32.2  30.7  31.7  34.5  27.9  33.2  35.3  33.7  35.5  38.9  40.5  34.9
Portal - EMEA SP 4.0     125  64.3  65.4  60.1  57.0  52.9  52.1  52.3  56.3  55.9  50.0  53.3  53.5  53.2  53.3

Immediate verification upon an alert gives the business a more accurate picture of whether the web site is failing or whether there is a potential monitoring issue.
We have found that technology monitors or business process probes may themselves fail in isolated cases. To keep these alerts from distracting our attention from hard outages, we have a rule that two failures must occur in a row before an outage is considered verified. Secondly, for Key Applications or Business Processes, staff with scripts verify any reported failure. This guards against reporting outages when there is no perceived customer impact. For example, a probe that fails because of a content change that is acceptable to a business user but fails automated validation is a different problem than the site failing. Where the probes are failing but the business script executed by a person works, we do not count it as a system outage. The best analogy is a doctor who may re-run a test if the first results are not consistent, and then possibly do further exploratory work, before telling a patient definitively that they have a critical health issue.
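The "two failures in a row" rule can be sketched as a scan over probe results in firing order. This is an illustrative sketch: the function name `verified_outage_indexes` and the boolean result encoding are assumptions, and the manual script verification that follows for Key Applications is not modelled.

```python
def verified_outage_indexes(results):
    """Return the firing indexes at which an outage becomes verified.

    results: list of booleans in firing order, True meaning the probe passed.
    An outage is verified at the second of two consecutive failures;
    further failures in the same run do not verify it again.
    """
    verified = []
    consecutive_failures = 0
    for index, passed in enumerate(results):
        # A pass resets the run; a failure extends it.
        consecutive_failures = 0 if passed else consecutive_failures + 1
        if consecutive_failures == 2:
            verified.append(index)
    return verified
```

An isolated probe failure therefore produces no verified outage, which matches the intent of not letting monitor glitches distract from hard outages.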
Quality of system usability criteria include judgments as to web site usability, tolerances for intermittent failures, and thresholds for response times. We have had situations where the technology monitors and business process probes work fine, yet the site is unusable for the target audience. In one embarrassing episode, the content for the United Kingdom site was loaded with that of another country. The web site was responding, and business verification of the monitors was satisfied; however, the customers were not getting relevant information. This warranted the reporting of an outage, although it was the content, and not the technology, that failed.

At times there can be intermittent failures that customers can overcome with effort. ibm.com defines an outage as 40% or more of the probe firings within a time period failing due to timeout. ibm.com also investigates significant (>50%) fluctuations in response time to identify potential availability issues. Many of our Key Applications have dynamic content. We have had cases where system response time is dramatically higher or lower than historical levels. This may indicate that content or functions were so significantly revised that we need to either recalibrate our monitors or address a production problem.

Quality of Service reports are reviewed daily, weekly and monthly with different objectives. Attendees of the daily operational reviews include the application maintenance staff, service delivery staff and business owner. The response time and availability data is then compiled into a weekly report for review with the management of the application maintenance, service delivery, and business owner organizations. A monthly compilation is then reviewed with the Executive management of the application maintenance, service delivery, business transformation and business owner organizations. At each review the observed trends are discussed, and recommendations to resolve the problems are refined.
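The two numeric thresholds stated above, an outage at 40% or more timed-out probe firings in a period and an investigation when response time moves by more than 50%, can be written down as simple checks. This is a sketch under assumptions: the report does not define the length of the time period or how the historical baseline is computed, so both are left to the caller, and the function names are hypothetical.

```python
def is_outage(timed_out_firings, total_firings, threshold=0.40):
    """True when the share of timed-out probe firings reaches the threshold."""
    if total_firings <= 0:
        raise ValueError("total_firings must be positive")
    return timed_out_firings / total_firings >= threshold


def needs_investigation(current_seconds, baseline_seconds, threshold=0.50):
    """True when response time differs from the baseline by more than the threshold.

    Both increases and decreases count, since a dramatically faster response
    can also mean that content or functions changed.
    """
    if baseline_seconds <= 0:
        raise ValueError("baseline_seconds must be positive")
    change = abs(current_seconds - baseline_seconds) / baseline_seconds
    return change > threshold
```

For example, 4 timeouts out of 10 firings counts as an outage, while a response time of 60 seconds against a 50-second baseline (a 20% change) does not trigger an investigation.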
The table below indicates the content of the reports by review cycle:

                      Daily            Weekly                      Monthly
Availability          Current Day +    Prior Week +                Prior Month +
                      other Weekdays   13 Weeks trend chart        13 Months Trend Chart
Performance           Current Day +    Prior Week +                Prior Month +
                      other Weekdays   13 Weeks trend chart        13 Months Trend Chart
Root Cause            Prior Day Issue  Open RCA's and requests     RCA's approved for closure;
Analysis (RCA)                         for closure; Current Month  Current Month Outage Analysis
                                       Outage Analysis by          by Root Cause, with
                                       Root Cause                  13 month trend

Our quality of service reporting includes 24x7 system availability and response time numbers, and Root Cause Analyses for outages. The 24x7 system availability statistics allow us to continuously highlight the need to always be available, and to eschew system maintenance windows as much as practical. For select Key Applications or Business Processes, reports are produced for each Geography during core business hours in the local time zone. Most IBM sites are worldwide, so we probe from worldwide points of presence and generate reports for availability and response time in the local time zone, so the business can understand the customer experiences by Geography. The underlying data for the reporting is maintained in Greenwich Mean Time (GMT), so that combined reporting worldwide and across the application portfolio can easily be accomplished.

The Root Cause Analysis is provided not only for discrete outages; we also include trending analysis by category. The categories for the Root Cause Analysis trending are: Application Package or Customization, Hardware Failure, System Software (OS, Middleware), Network, Application Maintenance Process, Service Delivery Management Process, and Business Owner (e.g. Site Content).

Web Attributable Outage Impact
For Key Applications or Business Processes ibm.com has established quantitative estimates of jeopardized revenue and alternate process impact during an outage. For any given hour in the typical week we estimate alternate processing volumes as an incremental cost, and the typical order volume as the revenue at risk. IBM Customers who cannot access us via the Web may call our staff or may elect to order from a competitor whose web site store is open. This quantitative analysis has been used to stress to everyone involved in the delivery of the web experience the immediate cost impact of an outage at any time, and to articulate the impact of the outages we experience. These measurements have been powerful tools to justify the incremental technology investments needed to get to the next level of availability.

[Figure: Outage Impact. Bar chart in thousands of USD (axis 0 to 80) for App 1, App 2 and App 3; series: Alternate Processing Costs and Revenue.]

Conclusions
ibm.com has made Quality of Service availability and response time metrics part of the management of our business. The active management to improve these statistics has yielded metrics with a high degree of credibility. These credible metrics are then analyzed to ascertain tactical and strategic action items, with the ultimate goal of improving our web sites' value proposition for our customers and stockholders.
Acknowledgements
The authors would like to acknowledge the contributions of for review of the transcripts, and suggestions for improvement.
About the Authors
David Leip is IBM’s corporate webmaster, with direct technical responsibility for IBM’s corporate portal, which spans 83 countries on the wired and wireless web. Prior to becoming the corporate webmaster in 1999, David worked for IBM’s CIO office as program manager of web enablement. Earlier he worked in the IBM Software Development Lab in Toronto. David has an MSc in Computing & Information Science from the University of Guelph. His personal web site can be found at: http://www.Leip.ca/

Tegan Lee is a lead for Information Technology Operations Management in the ibm.com unit. From 1995 to 2001, Tegan was the Project Executive for an IBM-provided home banking web platform that at its peak had over 3,000,000 subscribers. Tegan has over 25 years of Information Technology experience in developing, deploying and operating business application computing platforms. Tegan has an MBA in Marketing and Finance from Pace University, and a BBA in Statistics from Baruch College.