The document summarizes the findings of a benchmark study that captured best practices in software quality metrics and dashboards from 10 technology companies. Key findings include: (1) there is no standard approach but best practices include automated metrics systems, root cause analysis, and normalization; (2) the best companies measure quality beyond defects to include predictability and customer satisfaction; (3) external benchmarks are used to set goals. Recommendations include focusing on important metrics like time to repair, adopting practices like root cause analysis of critical defects, and using automation and targets to track improvements over time.
1. Software Quality Metrics Benchmark Study
How Software Metrics and Dashboards are Applied in High Technology Companies

John Carter
TCGen, Inc.
Menlo Park, CA
www.tcgen.com
May 1, 2012

[Cover charts: a Release Slip Rate Percentage chart and a radar chart comparing Best vs. Rest on Uses Automated Metrics System, Root Cause Analysis, Normalization, External Benchmarks, and Total Quality (Predictability/Features)]
2. Executive Summary from 10 Public Companies
The purpose of the benchmark study was to capture best practices in the application of software quality metrics dashboards.

Participants: ten technology companies – 3 highly regulated and 7 from technology (networking, computer, storage) – were benchmarked against these questions:
• What metrics on software quality are reported to management?
• Internal quality metrics, external field-detected metrics?
• How are they normalized? Customers in the field, LOC?
• Which are the most important?
• Are they tabular or graphical? How many? Are target values shown?
• How frequently are they reported? How many do you report on?
• What are the key target values you look at for key metrics?

Key Highlights:
• There is no standard for the number of metrics, type of metrics, or frequency of reporting
• However, there are best practices around software quality metrics – we can look at what separates the best from the rest
• The BEST have:
1. Automated metrics tracking and analysis systems that allow drill-down and reporting by product, release, and customer (see the sketch after this list)
2. Normalization that ensures the metrics remain meaningful as the number of customers or the complexity of the code increases
3. A root cause analysis system that systematically analyzes defects that escape the company and are found in the field
4. Quality metrics that go beyond product defects to include release predictability and feature expectations
5. External benchmarks that are used to set goals (created by third parties that maintain databases or perform surveys)
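As a concrete illustration of practice 1, here is a minimal sketch, assuming a defect-tracking export loaded into pandas with hypothetical columns (product, release, customer, defect_id), of the kind of drill-down reporting the best participants automate; the schema and data are illustrative only:

```python
import pandas as pd

# Hypothetical defect-tracking export; real systems would pull this
# automatically from the defect database.
defects = pd.DataFrame({
    "product":   ["Router", "Router", "Switch", "Switch", "Switch"],
    "release":   ["A.1", "A.2", "B.1", "B.1", "B.2"],
    "customer":  ["Acme", "Globex", "Acme", "Initech", "Acme"],
    "defect_id": [101, 102, 103, 104, 105],
})

# Top level: defect counts per product.
print(defects.groupby("product")["defect_id"].count())

# Drill down one level: defect counts per product and release.
print(defects.groupby(["product", "release"])["defect_id"].count())

# Drill down further: counts per customer within a single product.
print(defects[defects["product"] == "Switch"]
      .groupby("customer")["defect_id"].count())
```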
3. How We Approached the Analysis
• The Capability Maturity Model (CMM) defines five levels of process maturity
– Level 1 (Initial, Chaotic)
– Level 2 (Repeatable)
– Level 3 (Defined)
– Level 4 (Managed, Measured)
– Level 5 (Optimizing)
• Metrics are a key part of the CMM model, and Level 4 indicates mastery of metrics
• SW metrics are well characterized, and are often divided into Product Quality Metrics, In-Process Metrics, and Metrics for SW Maintenance*
• From our survey of ten companies, we have derived a sense of metrics maturity, and have
created our own rating of SW Metrics Maturity using five factors
– Automated, Root Cause Analysis, Normalized, External Benchmarks, and Total Quality (not just
defects)
– The Best tend to have excellent scores on all five dimensions, the rest lag behind in one or more
areas
– The best tend to have measures in the three areas defined above (Product, In-Process, and
Maintenance)
“Best vs. Rest”
* Stephen Kan, “Metrics and Models in Software Quality Engineering”, Addison-Wesley, 2003
4. Example SW Metrics Maturity
The five factors, plotted on a hypothetical radar chart using a 5-point scale, where mastery is indicated as a 5 (outermost) and absent as a 0 (innermost):
1. Automated metrics tracking and analysis systems that allow drill-down and reporting by product, release, and customer
2. Normalization that ensures the metrics remain meaningful as the number of customers or the complexity of the code increases
3. A root cause analysis system that systematically analyzes defects that escape the company and are found in the field
4. Quality metrics that go beyond product defects to include release predictability and feature expectations
5. External benchmarks that are used to set goals (created by third parties that maintain databases or perform surveys)

[Radar chart: axes are Uses Automated Metrics System, Root Cause Analysis, Normalization, External Benchmarks, and Total Quality (Predictability/Features); two series compare Best vs. Rest]

The nature of the survey did not allow us to complete this chart for each participant, but this treatment would be very useful for evaluating where you are today and where you should focus in the future to close the gaps between the best and the rest.
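For readers who want to reproduce this view for their own organization, here is a minimal sketch of the radar chart using matplotlib; the Best and Rest scores are illustrative placeholders, not survey data:

```python
import numpy as np
import matplotlib.pyplot as plt

factors = [
    "Uses Automated\nMetrics System",
    "Root Cause\nAnalysis",
    "Normalization",
    "External\nBenchmarks",
    "Total Quality\n(Predictability/Features)",
]
best = [5, 5, 4, 4, 5]   # hypothetical "Best" profile (0-5 scale)
rest = [3, 2, 1, 1, 2]   # hypothetical "Rest" profile

# One angle per axis; repeat the first point to close each polygon.
angles = np.linspace(0, 2 * np.pi, len(factors), endpoint=False).tolist()
angles += angles[:1]

fig, ax = plt.subplots(subplot_kw={"polar": True})
for label, scores in (("Best", best), ("Rest", rest)):
    data = scores + scores[:1]
    ax.plot(angles, data, label=label)
    ax.fill(angles, data, alpha=0.15)

ax.set_xticks(angles[:-1])
ax.set_xticklabels(factors)
ax.set_ylim(0, 5)   # mastery = 5 (outermost), absent = 0 (innermost)
ax.legend(loc="lower right")
plt.show()
```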
5. Dashboard – Drawn from Benchmarking
Guiding Principles: Each metric should be linked to your overall quality objectives, which were derived from your overall strategy.

From the benchmark sample, the goals might be:
• Increasing Net Promoter Score (how highly you are recommended)
• Increasing Release Predictability
• Increasing Customer Satisfaction
• Increasing Reported Quality (Field Quality)
• Reducing time to repair
• Reducing the number of Critical Accounts

Dashboard layout guidelines:
• There should be between 4 and 8 metrics
• Two related metrics per screen
• Text describing and analyzing the data represented

Each chart has the following graphical properties:
• The charts are composed so that the 'so what' is very clear and is repeated on each chart, so that managers who only see them once a quarter know why the metric is there and, if there is any significance in the data, what that significance is
• Targets should be on all graphs
• Where benchmark data exists, it will also be shown on the chart
• Each chart should have: Title & Description, So What, Consistent Design, Labeled Axes, Target Curves, Narrative (illustrated in the sketch below)
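To make these properties concrete, here is a minimal matplotlib sketch of one such chart – a hypothetical Net Promoter Score trend with a target curve and a benchmark line; all values are illustrative assumptions, not benchmark data:

```python
import matplotlib.pyplot as plt

quarters  = ["Q1'11", "Q2'11", "Q3'11", "Q4'11", "Q1'12"]
nps       = [18, 22, 21, 27, 31]   # measured Net Promoter Score (illustrative)
target    = [20, 23, 26, 29, 32]   # target curve toward the goal
benchmark = 35                      # hypothetical third-party benchmark

fig, ax = plt.subplots()
ax.plot(quarters, nps, marker="o", label="NPS (actual)")
ax.plot(quarters, target, linestyle="--", label="Target")
ax.axhline(benchmark, color="gray", linestyle=":", label="Benchmark")

# Title carries the 'so what' so a once-a-quarter reader gets the point.
ax.set_title("Net Promoter Score by Quarter\n"
             "So what: on track to target, still below benchmark")
ax.set_xlabel("Quarter")
ax.set_ylabel("Net Promoter Score")
ax.legend()
plt.show()
```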
6. Example Charts: Mean Time to Repair and Percent of Release Slips

[Chart: Mean Time to Repair, plotted per release with a benchmark line]
This chart plots the average time, in weeks, that customers had to wait for resolution. Measured in weekly intervals; data captured per release.
• The target is derived to get to the fastest resolution (and reduce the number outstanding)
• The increase shown in January 2012 is driven by the A.x release
• The new methods for engineering releases should impact this in 2013

[Chart: Percent of Release Slips, plotted for Major Release 2 and Major Release 3 with a benchmark line]
This chart plots the percentage of actual versus planned schedule for major and minor releases.
• The target is derived to get to less than 5% slip by 2014, closing the gap in a straight line, coming down from the 22% where we are today
• The increase shown in November 2011 is driven by the A.2a release, which had to go through two alpha releases
• We expect a steeper drop in July 2012 because of our new "Darken the Sky" program to provide requirements stability
• Benchmarking indicates that the best-in-class number is a slip rate of less than 15% (for 9-month release cycles)
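The two calculations behind the release-slip chart are simple; here is a minimal sketch, with hypothetical dates and durations, of the slip percentage and the straight-line target curve from 22% today to 5% by 2014 described above:

```python
from datetime import date

def slip_rate(planned_days: int, actual_days: int) -> float:
    """Percentage by which a release slipped past its planned duration."""
    return (actual_days - planned_days) / planned_days * 100.0

def target_curve(today: date, goal: date, start_pct: float,
                 goal_pct: float, when: date) -> float:
    """Linear interpolation of the slip-rate target between today and the goal."""
    frac = (when - today).days / (goal - today).days
    return start_pct + (goal_pct - start_pct) * frac

# A 180-day plan that took 220 days: roughly a 22% slip.
print(slip_rate(planned_days=180, actual_days=220))

# Interim target for March 2013 on the straight line from 22% to 5%.
print(target_curve(date(2012, 5, 1), date(2014, 1, 1),
                   start_pct=22.0, goal_pct=5.0,
                   when=date(2013, 3, 1)))
```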
7. Best Practices
In benchmarking studies like this, we often see exemplary practices that demonstrate creative and effective ways to stay ahead.

Top 5:
1. Use of third-party firms to assess where your software defect performance stacks up against the competition, and use of industry-standard databases for software quality
2. A Test Escapes Analysis process to perform root cause analysis on all significant escapes to the field
3. SW defects reported on the dashboard include broader measures like predictability and expectations
4. An automated, integrated system for real-time metrics analysis, so that presentation to management is simply pulling up current data and reviewing it formally
5. Normalization for complexity and/or accounts in the field to ensure that proper comparisons are made

Metrics to Consider:
6. Create a compound metric that pulls together several important factors for the business (see the sketch after this list)
7. Institute metrics that show (unit and integration) statement coverage, branch coverage, and all tests passing; for functional testing, show requirements coverage and all tests passing
8. Institute metrics that show defect backlog, number of test cases planned, Upgrade/Update failure rate, Early Return Index, and Fault Slip Through
9. A bug tool kit that goes to the field with exhaustive, searchable data to help customers avoid reporting defects, learn about workarounds, and search with Google-like strength

Other Tips:
10. If external benchmark targets are not known, track improvement release over release
11. Focus on what is important. One participant tracks only release predictability and customer satisfaction
12. Use parametric estimation metrics – for example, 4 days per test case – to ensure high-quality, data-driven schedule estimates (this also helps demonstrate improvements over time)
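For practice 6, here is a minimal sketch of one possible compound metric; the weights, reference points, and inputs are illustrative assumptions, not a formula reported by any participant:

```python
def compound_quality_score(mtbf_hours: float, availability: float,
                           mean_days_to_fix: float) -> float:
    """Weighted combination of reliability, availability, and time to fix."""
    # Normalize each factor to a 0..1 scale (assumed reference points).
    reliability = min(mtbf_hours / 10_000.0, 1.0)        # 10k hours MTBF -> 1.0
    avail       = availability                            # already 0..1 (e.g. 0.999)
    repair      = max(0.0, 1.0 - mean_days_to_fix / 30.0) # 30+ days -> 0.0

    # Assumed business weighting; tune to your own quality objectives.
    return 0.4 * reliability + 0.3 * avail + 0.3 * repair

# One number for the dashboard, trended release over release.
print(compound_quality_score(mtbf_hours=8000, availability=0.999,
                             mean_days_to_fix=12))
```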
8. Summary Statistics
Key Highlights:
• 8 report customer-found defects to management (the remaining 2 report customer satisfaction at a high level)
• 6 report on the order of 4 metrics to management; the remaining 4 report more or fewer
• 5 include time to market as a metric in their quality dashboard
• 4 report escapes or customer-found defects caused by bad fixes
• 4 companies have real-time visibility of metrics, automatically updated on a daily basis
• 3 companies reported on compound metrics that combine reliability, availability, and time to fix
• 3 do not use targets for metrics reported to management, but only report the improvement release to release
• 3 normalize metrics (by LOC for internal metrics, or by units in the field for external metrics)
9. Implications
• Root cause analysis should be performed on defects from the field that are either
critical or from regressions
– Many companies have special processes for doing this effectively
• It appears that some participants have higher levels of automation and coverage for unit, integration, and functional testing
– And it is measured
• Planning metrics, such as the number of days per test case, should be used for prediction and improvement
• If you are growing, some normalization should be used.
– It should be coarse (like judged Lines of Code, converted from Function Points)
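Here is a minimal sketch of that coarse normalization: converting function points to judged lines of code using commonly cited "backfiring" factors (treated here as assumptions; calibrate against your own codebase), then computing defects per KLOC:

```python
# Approximate LOC-per-function-point factors, of the kind published in
# industry backfiring tables; values are assumptions for illustration.
LOC_PER_FUNCTION_POINT = {"C": 128, "C++": 55, "Java": 53}

def defects_per_kloc(defects: int, function_points: float,
                     language: str) -> float:
    """Defect density normalized by judged LOC derived from function points."""
    loc = function_points * LOC_PER_FUNCTION_POINT[language]
    return defects / (loc / 1000.0)

# 42 field defects against a 500-FP Java product.
print(defects_per_kloc(defects=42, function_points=500, language="Java"))
```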
• Walker Survey, Quest Database, and Manager-Tools.com are three recommended
vendors for metrics and management
– Walker Survey can determine how you stack up against your competitors regarding quality and
satisfaction
– Quest is a TL 9000 database
– Manager-Tools are helpful for developing QA managers
• Where absolute targets don’t exist, a target curve based on prior improvement
should be used to answer ‘are we getting better?’