Estimating test effort part 2 of 2

Ian McDonald
Techniques for Estimating within Testing

Part 2 of 2 July 2010 v1

January 2013 v3

© 2010 Ian McDonald - First created 2010

Estimating Risk for Functionality
1) Count Defects raised against requirement features (or Widget / Page
functionality). The higher the number of defects the greater the risk (or the
greater the need to do regression testing). HOWEVER this has to be read
against other aspects, for example more defects might simply relate to improved
test coverage, in which case one may want to do the reverse.
2) Investigate code coverage – using a code coverage tool (line/statement and
decision/block) – or look at tests linked to requirements. Areas of low coverage
will need additional tests and will need to ensure more regression testing.
3) Investigate Risk Score, how likely an error is to occur – based upon code
Cyclomatic complexity, or simply ask Dev to score how difficult it is to code (High,
Medium, Low or 3, 2, 1). Multiply the score by the impact upon the customer if
this failed (get from requirements team as a similar score). The product will
flag areas (high score) that will need to be included in any regression pack.
4) If code exists and tooling is present to display Halstead Metrics, then it may be
possible to:
1) Extract the Cyclomatic Complexity (C). Values of C above 10 present greater risk.
So risk can be updated and reassessed as code is delivered.
2) Calculate the Bug level (B). This indicates the number of bugs expected in code, based
upon proven numeric analysis. While it can underestimate for C/C++ code (unlike
Java), it can still help to contrast areas to identify greater risk.
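
The risk score in point 3 can be sketched in a few lines of Python. This is only an illustration: the feature names and their scores are hypothetical, and the High/Medium/Low mapping to 3/2/1 follows the scoring suggested above.

```python
# Hypothetical example data: feature -> (likelihood from Dev, impact from requirements team).
SCORES = {"High": 3, "Medium": 2, "Low": 1}


def risk_score(likelihood, impact):
    """Multiply the likelihood score by the customer impact score."""
    return SCORES[likelihood] * SCORES[impact]


def rank_by_risk(features):
    """Return (feature, score) pairs, highest risk first."""
    scored = [(name, risk_score(l, i)) for name, (l, i) in features.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


features = {
    "login": ("Medium", "High"),
    "report export": ("Low", "Low"),
    "payment": ("High", "High"),
}

ranking = rank_by_risk(features)
for name, score in ranking:
    print(f"{name}: {score}")
```

Features at the top of the ranking are the candidates for the regression pack. Where to draw the cut-off is a project decision and is not implied by the sketch.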

Techniques for Deciding Sufficient Testing Done

Tests done can be a measure of completeness. However
if the test coverage is poor, then even trebling the number
of tests – stepping over the same lines of code will not
remove the risk from that part of the code that remains
untested.
Hence a range of techniques need to be used to define
when to stop testing.
This however will need to be outlined within the Test
Policy, so the criteria for “Good Enough” is understood
and accepted. With this acceptance comes an acceptable
level of risk linked to the “Good Enough” decision.

Sufficient Testing Overview

This section looks at the techniques to decide if sufficient testing is
carried out. It provides a way of assessing risk if the product is
delivered with no further testing.
The techniques however are not suitable for safety critical systems, or
where there is high commercial risk and every test case must be
tested.
These are more appropriate for an Agile delivery where Risk is
mitigated in line with a project budget.


Deciding upon Sufficient Testing

 Deciding upon the quality of testing is always going to be difficult. No
matter how much testing you do, there is a risk that a number of defects
will leak through. The decision is therefore not so much about sufficient
testing, but sufficient risk mitigation. The following text is general
advice that has to be taken within the context of the project being
delivered. If a project has a safety element or critical function then more
testing will be required. If there are millions of pounds at stake in a
critical business or banking system then again more testing will be
required. For some customers there may be an urgent need to go live
but that has to be balanced with the company's reputation and future
business if the product fails.
 In looking at testing, the traditional approach is to look at the number of
defects that are found and fixed. When the defects found plateau, then a
decision is taken to test more or not.

Why Estimate Defects
1. Comparing the number of defects expected vs the number found,
gives an indication of the number left to find and so amount of effort
till the end of the project.
2. Defect prediction, which is decremented by the number found each
day or week, can provide valuable insight to the trend towards
product completeness. So a trend line (with a gradient +/- error line)
can give an indication to when all defects are found and so when the
product shall actually be delivered (within a range).
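
The trend line idea in point 2 can be sketched as a simple least-squares fit. This is a minimal illustration, not a full method: the weekly figures are hypothetical, the function name is made up for this example, and the +/- error line mentioned above is left out for brevity.

```python
def weeks_until_zero(remaining):
    """Fit a straight line to the remaining-defect counts (one per week,
    starting at week 0) and return the week at which the line reaches zero.
    Returns None if the trend is flat or rising."""
    n = len(remaining)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(remaining) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, remaining))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope >= 0:
        return None
    return -intercept / slope


# Hypothetical data: predicted defects minus defects found so far, per week.
remaining_defects = [100, 90, 80, 70]
print(weeks_until_zero(remaining_defects))
```

With real data the points will scatter around the line, so the result should be quoted as a range rather than a single date.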

Defect Plateau for the Right Reasons – page 1 of 2

In reaching a plateau of rising and fixed defects does the plateau truly
reflect that all defects have been found? Alternative reasons may be:
 The test cases may not be exercising the code fully. While we may have
a large number of test cases:
 Do our tests penetrate sufficiently within the code?
 Have we considered all appropriate boundary values?
 Have we considered combinations of values and code paths?
 Have we considered negative as well as the positive test cases?
 Have we understood the requirements sufficiently to test all requirements at
each logical statement?
 Were our tests derived from the requirements, or (not to be recommended)
were the tests derived from the code or the programmer (who may have an
incorrect understanding of the code)?
 Have we considered all security and performance issues?

Defect Plateau for the Right Reasons – page 2 of 2

 We may believe that we have sufficient testing and have done
additional exploratory testing, but was that exploratory testing
targeted to minimise risk? That is, did we target the testing at our
most vulnerable code?
 How did we decide where that vulnerability was?
 How did we measure the risk level and mitigate the risk?
 It is not uncommon to reach a plateau in rising defect numbers
detected, then uncover a weakness, only to see the number of
defects start to rise again to reach a further plateau. This part of
the presentation is about deciding if the plateau is reached at an
expected level and if it represents a remaining acceptable level of
risk.

How can you estimate the number of defects in code

 Rules of thumb:
 Based upon number of requirements
 Based upon number of lines of code or derivatives leading to that
estimate.
 Past experience – similar project.

 Semi Quantitative
 Number of Use Cases

 Quantitative Approach
 Halstead Metrics
 Function Point Analysis / Test Point Analysis

Test Estimate based upon Number of Requirements

 Requirements are single logical statements. If a
requirement calls in a specification, then a judgement
needs to be made if that specification should be
treated as a large list of separate requirements that
need to be added into the sum.
 Assume 1 to 1.5 defects per requirement.
 This gives you a minimum and maximum value for this
approach.
 So 655 requirements will give a range of 655 to 983
defects.
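
As a rough sketch, the requirement-based range can be computed as follows. The function name is made up for this example, and rounding up to whole defects is an assumption chosen to match the 655 to 983 figures above.

```python
import math


def defects_from_requirements(num_requirements, low_rate=1.0, high_rate=1.5):
    """Return (minimum, maximum) expected defects, rounded up to whole defects."""
    return (
        math.ceil(num_requirements * low_rate),
        math.ceil(num_requirements * high_rate),
    )


print(defects_from_requirements(655))
```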

Test Estimate based upon Number of Lines of Code

 Assume 10% of lines of code will have errors, so number of lines x 0.1 gives an
estimate for defects. You can refine this by adding the following:
 Typically 10% to 15% of defect fixes will be rejected. In industry this can be as high as
25% to 30%. However a low level of failed defect fixes can mean good coding
or poor testing, so take care. If you take a 10% margin and a 15% margin as
your lower and upper limits, this will give a range.
 You can then add your adjustments for a 50% reduction at each iteration, and so for N
iterations this will give you a max and min total for this estimation approach.
 You can estimate the number of lines of code on the basis of the number of developers. So
8 developers working for time T will, say, each produce 5 code modules of 50 lines of
code, giving 8 x 5 x 50 = 2000 lines and so 2000 x 0.1 = 200 defects. With 3 iterations in
total and a 50% reduction at each iteration we have: 200 + 100 + 50 = 350 defects.
 Now add in 10% or 15% defect fix failure and this gives a range of (350 + 35) to (350 +
53) defects, that is 385 to 403 defects.
 You will also need to add in an estimate for existing code that is reused. Do not
assume that this will be error free, since interfaces will have changed and
potentially some data handling will have changed.
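
The worked example above can be reproduced with a short sketch. The function names are made up for this example, the defaults are the rule-of-thumb percentages from this slide, and rounding failed fixes up to whole defects is an assumption. Reused code is not included and would need to be added separately.

```python
import math


def total_defects(lines_of_code, iterations, defect_percent=10, reduction_percent=50):
    """Sum the expected defects over all iterations, reducing each time."""
    total = 0.0
    current = lines_of_code * defect_percent / 100
    for _ in range(iterations):
        total += current
        current = current * (100 - reduction_percent) / 100
    return total


def add_fix_failures(defects, low_percent=10, high_percent=15):
    """Add the lower and upper margin for failed defect fixes."""
    low = defects + math.ceil(defects * low_percent / 100)
    high = defects + math.ceil(defects * high_percent / 100)
    return low, high


# 8 developers x 5 modules each x 50 lines per module.
lines = 8 * 5 * 50
base = total_defects(lines, iterations=3)
print(base, add_fix_failures(base))
```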

Test Estimate based upon Similar Project

 A similar Project may have given you a defect count of
say 225 defects. However you estimate that the
difficulty is between 2 and 3 times greater. So an
estimate might be therefore between 450 and 675
defects respectively, depending upon difficulty.
 Difficulty and Complexity can be simply assessed to
begin with. However once code is written, the difficulty
and complexity can be reassessed quantitatively using
Halstead Metrics.

Test Estimate based upon Previous Phases or Code Drops

 You may decide that you have run a number of code drops and
typically you may expect an average of 10 defects per drop.
 However you may have a larger delivery or a smaller delivery
expected and may decide to scale the expected defect level.
 You may have decided to do a Risk, Likelihood Impact analysis
and decide that there are more defects detected with high risk
functionality. So you may decide to reference the risk analysis to
refine your prediction of defect levels.
 A simple method however is to expect that 10% of tests will find
defects, assuming adequate test analysis.
 The defect level prediction provides an indication of the amount of
re-test that is required. It will however not account for the number
of tests that fail a defect fix. That can be estimated as 5% of all
fixes.
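
A minimal sketch of the re-test arithmetic, using the 10% and 5% rules of thumb above. The count of 400 tests is hypothetical, and treating each failed fix as one further re-test is an assumption of this example.

```python
def retest_estimate(num_tests, defect_percent=10, fix_failure_percent=5):
    """Return (expected defects, expected failed fixes, total re-tests)."""
    defects = num_tests * defect_percent / 100
    failed_fixes = defects * fix_failure_percent / 100
    return defects, failed_fixes, defects + failed_fixes


print(retest_estimate(400))
```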

Test Estimate Prediction - Reducing Error

 We added in values for defect fix failure. Did we do this in every
estimation approach, and was it at an appropriate level? Should we
perhaps have considered 20%?
 Are the developers more or less experienced in this project than in the
previous project upon which our core estimate was based? If so,
the level of complexity might need adjusting.
 Did we count all requirements? Perhaps we decided not to include
sub-requirements or technical requirements. The closer
the methods of prediction align, the narrower the percentage error for
the prediction.

Code Coverage - page 1 of 3
 While progress in testing may be thought of in terms of the number of test scripts
successfully passed, there are significant risks in this approach:
 It is important to check the value that testing contributes, by checking the amount
of code paths covered by the test set. It may be difficult to check every single
path, however a good code coverage tool can be deployed to monitor code
coverage.
 In monitoring code coverage, remember that 100% coverage by any single code
coverage metric does not mean 100% of the code is covered. For full coverage a
number of different code coverage metrics need to be deployed. 100% of
coverage at the following levels will provide 100% true coverage:
 Branch Condition Combination Coverage
 Linear Code Sequence and Jump
 Definition-Use Path Coverage (DU-Path).
 Note it can be extremely difficult to reach 100% coverage, since exception
conditions may be difficult and at times almost impossible to introduce. So
targets need to be realistic. Where testing is not possible, then test statically with
targeted reviews.


Adapted from BCS Sigist working party draft 3.3, dated 28 April 1997, which is the basis of BS7925-2.

 Typically one would use a variety of Coverage metrics and aim to
achieve:
 67% to 80% Line / Statement Coverage
 70% to 90% Decision / Block Coverage
 Remember however that 100% Decision / Block Coverage can be
achieved simply by touching each block. This is not the spirit of
what is required and the effectiveness of block Coverage needs
to be read alongside the code coverage metric.
 Investigate structural analysis to ensure that there are no
unreferenced or discarded code portions present. Structural
analysis tools, e.g. Coverity Prevent, can help in spotting defects.

Iteration Approach
 This works on measuring the time taken to test one iteration.
 We measure the defects open and see the level rise to a
plateau and fall as defects are fixed.
 An assumption is then made that the next iteration will have
fewer defects, say 50% fewer. This is then used to predict the
time to completion.

Problem with Iteration Approach

 The problem with an iteration approach is that it does not take account of
unfound defects. If a plateau was reached due to poor testing or low
coverage or even not exercising a specific function, then the first plateau
should have been higher and so the estimate for the second iteration of
testing needs to be adjusted to take into account the unfound defects.
As a safeguard for a good, experienced test team, assume a further 10%
of defects went undiscovered.
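
One way to apply the safeguard is sketched below. This is an interpretation, not a definitive method: it assumes the 10% is added to the defects found in the first iteration before the 50% reduction is applied, and the figure of 200 found defects is hypothetical.

```python
def adjusted_iteration_forecast(found_first, undiscovered_percent=10, reduction_percent=50):
    """Return (adjusted first plateau, predicted defects for the next iteration)."""
    adjusted_first = found_first * (100 + undiscovered_percent) / 100
    next_iteration = adjusted_first * (100 - reduction_percent) / 100
    return adjusted_first, next_iteration


print(adjusted_iteration_forecast(200))
```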

Defect Tracking Tool – Graphical Output

 Defect tracking tools and task management tools like Jira (with
GreenHopper) can produce graphical representations of defect records,
which can help to indicate time expectations. But adjustment is still
needed as previously discussed.

Halstead Metrics
 This is a method based upon tried and well proven (since the 1970s)
quantitative approaches. It provides a way of refining an estimate,
based upon qualities within code that can be physically measured. For defect
estimation it will underestimate slightly for C and C++ (assume a 20%
underestimate as general guidance). However it will provide guidance and help to
get a ball park figure. Other languages provide good levels of indication.
 The main measures made are:
 Number of unique Operators n1 used in code (+, -, =, reserved, etc)
 Number of Unique Operands n2 used in code (ID, Type, Count, etc)
 Total number of Operators (N1)
 Total number of Operands (N2)

 NOTE: Code Coverage and Static Analysis Tools routinely provide values for
Halstead Metrics, quite often the values just need to be extracted from the tooling
deployed. It is simply knowing that the facility is there and if the value cannot be
extracted in one step, knowing how to create the value from the root metrics.

Halstead Calculations

 Halstead Programme Length N = N1 + N2
 Where:
 Total number of Operators = N1

 Total number of Operands = N2

 Halstead Programme Vocabulary n = n1 + n2
 Where
 Number of unique Operators n1 used in code
(+, -, =, reserved, etc)
 Number of Unique Operands n2 used in code
(ID, Type, Count, etc)

Halstead Code Difficulty & Effort

 Halstead Code Difficulty D = [(n1)/2] x [(N2)/(n2)]
 Where
 Number of unique Operators = n1
 Number of Unique Operands = n2
 Total number of Operands = N2

 Halstead Code Effort E = V x D
 Where
 V = Volume
 D = Difficulty

Halstead Code Volume

 Halstead Code Volume V = N x log2(n)
 Where
 N = Programme Length
 n = Programme Vocabulary

Halstead Delivered Defects

 Delivered Bugs B = E^(2/3) / 3000
 Where
 Effort = E

 The delivered Bugs is the total number of
defects that would be reasonably expected to
be delivered within a measured Volume of
code of measured Difficulty and taking a
measured amount of Effort to recreate.
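
The Halstead calculations on the preceding slides can be chained together as in the sketch below. The input counts are hypothetical and chosen only to make the arithmetic easy to follow. In practice the four counts would come from a static analysis or code coverage tool.

```python
import math


def halstead(n1, n2, N1, N2):
    """Compute Halstead metrics from the four root counts.

    n1, n2: unique operators and unique operands
    N1, N2: total operators and total operands
    """
    length = N1 + N2                          # N
    vocabulary = n1 + n2                      # n
    volume = length * math.log2(vocabulary)   # V = N x log2(n)
    difficulty = (n1 / 2) * (N2 / n2)         # D
    effort = volume * difficulty              # E = V x D
    bugs = effort ** (2 / 3) / 3000           # B
    return {
        "length": length,
        "vocabulary": vocabulary,
        "volume": volume,
        "difficulty": difficulty,
        "effort": effort,
        "bugs": bugs,
    }


metrics = halstead(n1=4, n2=4, N1=8, N2=8)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

B is most useful when summed over modules or compared between modules, since the value for a single small piece of code is well below one defect.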

Cyclomatic Complexity
 Cyclomatic Complexity (C) is a way of measuring the
level of risk associated with code, through the number
of code paths and so the need to run more test
scenarios.
 Cyclomatic Complexity can also be used to compare
code modules and can help to indicate the areas of
code that perhaps need more review effort, more
testing and greater attention to general risk.
 As a general rule a value for C of 10 or above has
higher risk for a code module.

Function Point Analysis

 A further tool that can be used for
estimation is Function Point (FP)
Analysis. This is a formula based
approach that takes into account,
language for development, number of
requirements, inputs, outputs,
experience of the development and test
team. Explanation of FP is out of scope
of this slide set.

System Integration Testing

 Your application will need to sit on a
system with other software. This will
include other applications, management
systems (e.g. enterprise monitoring) and
monitoring software such as security
protective monitoring. All interfaces will
need to be tested.
 This assumes that the other products are
supplied by 3rd parties as fully tested.

Your target risks
 In SIT you need to establish which risks you
are testing to protect against. Typically you will need to:
 Spend focused time on protective monitoring interface tests.
 Spend time checking enterprise monitoring.
 Test any PKI certificates, revoke certificates, invalid
certificates, etc. So you need to budget for the certificates
and have them ready.
 Test application interfaces.
 Test the creation of users and licences.
 Test system access, valid and invalid.
 Ensure you can test performance. Note: this is covered in another
slide set.

Estimating Testing for Protective Monitoring Testing

Large systems will have software that will monitor the
system for unauthorised activity. This monitoring also
needs to be tested.

Overview

Testing of Protective Monitoring (PM) is relatively new. There is no
reliable guidance as yet for estimating testing.

The level of PM testing needs to be defined within the Test Policy for
the project.


Learning

 For a team with no previous PM testing
experience allow 2 weeks learning time
and provide budgeted training and
support for that period.

Overview of PM Estimating
 The best baseline for estimate would be to take the CESG Good
Practice Guide 13 (GPG13) as the standard for implementation of
the solution.
 There are 12 protective monitoring controls (PMCs) that would
require testing, but the breakdown into test cases is dependent
on the system architecture breakdown of boundary definitions,
the level of alerting defined, the verbosity of logging and how the
overall design of the security model influences the system
implementation.
 GPG13 is a guide and as such is open to interpretation, however
one should at least plan for the following:

Target Protective Monitoring Controls

 PMC 1 - Accurate time in logs
 PMC 2 - Recording of business traffic crossing a boundary
 PMC 3 - Recording relating to suspicious activity at the boundary
 PMC 4 - Recording on internal workstation, server or device
 PMC 5 - Recording relating to suspicious internal network activity
 PMC 6 - Recording relating to network connections
 PMC 7 - Recording on session activity by user and workstation
 PMC 8 - Recording on data backup status
 PMC 9 - Alerting critical events
 PMC 10 - Reporting on the status of the audit system
 PMC 11 - Production of sanitised and statistical management reports
 PMC 12 - Providing a legal framework for Protective Monitoring activities

Information over syslog or to Windows event logs

 Note: If the software produced is actually
writing information over syslog or to the
Windows event logs then each of the
entries it is writing would need to be
checked against the PMC it should be
related to.
 Standard operating system events need
to be tested.

Totals for Scripts
 Research time to identify events and scripts: typically 10 days.
 Number of scripts required =
number of events written to logs by software +
reported hardware events
(typically 140).
 The time to write a PM test plan, having defined the detailed scripts… =
around 3 to 5 days of effort minimum (typically 70 pages), plus review
time (typically a full day).
 The time to write the test scripts = number of scripts (as above) x 5 minutes per script.
 The time to execute the tests = number of scripts (as above) x 15 minutes per script.
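
The totals above can be added up as in the sketch below. The 7.5 hour working day is an assumption of this example and does not come from the slides, and the function name is made up. The 2 weeks of learning time for an inexperienced team is not included.

```python
def pm_test_estimate(num_scripts=140, hours_per_day=7.5):
    """Return the PM test effort in days as a dict with a low and high total.

    hours_per_day is an assumed working day length, not a figure from the slides.
    """
    minutes_per_day = hours_per_day * 60
    research_days = 10
    review_days = 1
    write_days = num_scripts * 5 / minutes_per_day      # 5 minutes per script
    execute_days = num_scripts * 15 / minutes_per_day   # 15 minutes per script
    variable_days = write_days + execute_days
    return {
        "write_days": write_days,
        "execute_days": execute_days,
        "total_low": research_days + 3 + review_days + variable_days,   # 3 day test plan
        "total_high": research_days + 5 + review_days + variable_days,  # 5 day test plan
    }


estimate = pm_test_estimate()
for name, days in estimate.items():
    print(f"{name}: {days:.1f}")
```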

Mobile Testing

 This section looks at testing on mobile devices and
personal digital assistants (PDAs).

Deciding your target test set

 You will need to consider which
operating systems you plan to test,
which devices and if you are going to
test on real devices, hardware emulators
or software emulators.

Emulation vs Real Device
 Real devices will find defects that cannot be
found on emulated devices.
 Emulators will also be sensitive to defects that
may not be so simple to find on a real device.
 Emulation can be quick to set up, while testing
on real devices can be slow.
 It may be worth considering cloud solutions to
obtaining remote access to a wide range of
devices.
 Some emulators may not be available for new
devices.

Which devices
 Create a table of the operating systems you want to test.
 Create a list of the most popular devices currently in use
and the most recent releases.
 Ask your marketing department to help create the list, or check on
the web and with phone providers.
 Decide which devices offer best value for targeting your primary
tests. Then decide which devices you will run a subset of tests on.
 You might decide to run most tests on emulation, however
consider memory, CPU and other hardware related issues that
may impact a test when using real devices. Failure to delete an
item correctly might cause memory to fill up rapidly, so single
tests may not be appropriate. Try to stretch the infrastructure of
the device.

PDA Decisions

 Having chosen your approach for mobile
access, you will need to estimate the
number of tests, regression runs and any
time to set up the infrastructure.
 If you are operating within a secure
environment you will need to talk to your
IT security adviser in setting up
appropriate access for testing. This will
all need to be budgeted.

Acceptance Testing
 You are likely to go through a number of
phases for acceptance.
 Typically this may include:
 Testing of application wire frames.
 Factory Acceptance testing (FAT) of the
Application.
 System Acceptance Testing (SAT) of the integrated
system (tested as SIT).
 User Acceptance Testing (UAT)
 Operational Acceptance Testing (OAT) showing
operational readiness, disaster recovery, etc.

FAT
 FAT will have input criteria reflecting the
functionality delivered and the defects outstanding, with
permitted limits for each severity type.
 FAT is to demonstrate the functionality of the
developed application against functional requirements.
It might also include some early verification of
non-functional readiness.
 FAT will probably use end to end test cases that map
to functional requirements.
 If using delivery iterations, there may be a FAT for
each iteration or group of iterations.

UAT

 User Acceptance Testing is usually
planned and carried out by the customer
or, for an internal project, by a group of users
acting as the customer.
 If you need to plan and carry out UAT,
you might be able to save time by using
end to end functional tests as the basis
for UAT planning.

SAT
 System Acceptance Testing focuses on the
application functionality within the system,
demonstrating that the application interacts
with the system as a whole. This will include
evidence of any monitoring software and
interfaces.
 Penetration testing and Load and Performance
Testing may be part of the SAT results.
 Some SAT activities might be moved to OAT
and some OAT activities might get moved to
SAT.

OAT

 Operational or Readiness Acceptance
Testing focuses on the complete system.
 It will take account of items such as:
 Backup /Restore
 Disaster recovery

 Fault reporting and resolution life cycle

 Day time and after hours support

Items to consider in OAT

 The following should be considered in
OAT and time allowed:
 Each site to be tested.
 Support processes and escalation.
 Issue logging and reporting.
 Environment monitoring.
 Protective monitoring.
 Data backup and recovery.
 Deployment of new builds and recovery.

OAT Planning

 Create flow charts for all your support
processes. Include out of hours as separate
activities. Incorporate within an OAT plan.
 Consider the delivery and setting up of pagers for
out of hours support, and the setting up of pagers for
testing.
 Tooling may need setting up for cross
reporting between application support teams
and environment support teams.
