Static analysis is most efficient when used regularly. We'll tell you why...
Author: Evgeniy Ryzhkov
Date: 22.04.2013
Some of our users run static analysis only occasionally. They find new errors in their code and, glad
about this, willingly renew their PVS-Studio licenses. I should feel glad too, shouldn't I? But I feel sad -
because you get only 10-20% of the tool's potential when using it this way, while you could get at
least 80-90% by using it differently. In this post I will tell you about the most common mistake
made by users of static code analysis tools.
Discovering static analysis
Let's first discuss the simplest scenario, the most common among those trying the static code
analysis technology for the first time. Some member of a developer team once came across an article,
a conference talk, or even an advertisement for some static analyzer and decided to try it on his/her
project. I'm not talking about PVS-Studio in particular - it might be any static analysis tool. Now the
programmer, more or less easily, deploys the static analysis system and launches a code check. The
following outcomes are possible:
• The tool fails to work. It doesn't pick up the project settings, or gathers the environment
parameters incorrectly, or fails in any other way. The user naturally grows less inclined to trust
it.
• The tool performs the check successfully and generates some diagnostic messages. The user
studies them and finds them irrelevant. This doesn't mean the tool is hopeless; it has simply
failed to demonstrate its strong points on this particular project. Perhaps it should be given
a chance on another one.
• The tool generates a few relevant messages (among others) which obviously indicate that
genuine bugs are present in the code.
Strictly speaking, it's only in the third case, when something real is found, that the team starts using the
tool in practice.
The biggest mistake you can make when adopting static analysis
But at this point, when integrating static analysis into the development process, one may make a
serious mistake: adopting the practice of running static analysis only before every release, for
instance, or once a month. Let's see why such an approach is bad:
First, since you don't use the false positive suppression mechanism, you will see the same old false
positives again and again, and therefore waste time investigating them. The more messages the
analyzer generates, the less focused the programmer is.
Second, diagnostic messages are generated even for the code which you didn't touch between the
checks. This means you'll get even more messages to examine.
Third, and most important, with such an approach the static analyzer won't catch, between two
checks, all those errors you have so long and so painfully been finding through other methods. This
point is very important, and I want to discuss it in detail - also because it is the thing people forget
about when estimating the usefulness of static analysis. See the next section.
A fallacy: "Efficiency of static analysis can be estimated by comparing
analysis results for the last year's code base release and the current one"
Some programmers suggest the following method to estimate the efficiency of static code analysis. Imagine a
team working on some project for several years. It keeps all the project's release versions (1.0, 1.1, 1.2,
etc.). It is suggested that they get the latest version of some code analyzer and run it on last year's
project source code - say, version 1.3. Then the same version of the static analyzer is run on the
latest code base release - let it be version 1.7. After that we get two analyzer reports. We study
the first report and find that the older project contains 50 genuine errors. Then we study the second
report and see that the latest project still contains 20 of those 50 bugs (plus some new ones, of
course). It means that 50 - 20 = 30 bugs have been fixed by other means, without using the
analyzer. These errors could have been found, for instance, through manual testing, or by users
working with the release version, or otherwise. We conclude that the static analyzer could have
helped to detect and fix those 30 errors quickly. If this number is large for the project and the developers'
time is expensive, we can estimate the economic benefit of purchasing and using the static code
analyzer.
This approach to estimating economic efficiency is absolutely incorrect! You cannot use it to evaluate a
static analyzer! The reason is that you make several mistakes at once when trying to do so.
First, you don't take into account the bugs that were introduced in version 1.4 of the code base
and eliminated in version 1.6. You may argue: "Then we should compare two successive releases, for
example 1.4 and 1.5!" But that is wrong too, since you don't take into account the errors that
appeared after release 1.4 and were fixed before release 1.5.
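The undercounting is easy to see with a toy model (illustrative numbers only, not real project data): give each bug an introduction day and a fix day, take analyzer "snapshots" at two release dates, and compare what the snapshot diff reports against what was actually fixed in between.

```python
# Toy model of bug lifetimes (illustrative numbers, not real project data).
# Each bug is (introduced_day, fixed_day); fixed_day is None if still open.
bugs = [
    (10, 200),   # open at both release snapshots
    (20, 120),   # open at release 1.4, fixed before release 1.5
    (60, 140),   # born AND fixed between the snapshots -- invisible to both
    (70, None),  # introduced after 1.4, still open at 1.5
]
REL_14, REL_15 = 50, 150  # days when the two release snapshots were taken

def open_at(bug, day):
    introduced, fixed = bug
    return introduced <= day and (fixed is None or fixed > day)

report_14 = [b for b in bugs if open_at(b, REL_14)]
report_15 = [b for b in bugs if open_at(b, REL_15)]

# What the snapshot comparison reports as "fixed between releases":
seen_fixed = [b for b in report_14 if b not in report_15]
# What was actually fixed between the two snapshots:
truly_fixed = [b for b in bugs if b[1] is not None and REL_14 < b[1] <= REL_15]

print(len(seen_fixed), len(truly_fixed))  # prints "1 2"
```

The bug (60, 140) lived its whole life between the two releases, so the snapshot comparison never sees it - exactly the error described above.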
Second, code base release versions are themselves already debugged and contain few bugs - unlike
the current version the developers are working on. After all, the release wasn't that buggy and crash-
prone, was it? You surely fix bugs between releases, but you detect them through other methods,
which are naturally more expensive.
Here we should recall the well-known table showing how the cost of fixing a bug depends on when it
was introduced into the code and when it was detected. You should understand that running static
analysis only "before a release" automatically increases the cost of bug fixes.
Thus, you cannot truly evaluate efficiency of static analysis by simply running it on the last year's code
base release and the current one and comparing the results.
Tips on how to obtain the maximum benefit when adopting static
analysis
Over the years we have been working in the field of static code analysis, we have worked out several
practices for obtaining the maximum benefit from static analysis tools. Although I will specify how
these mechanisms are supported in PVS-Studio, the tips apply to any other static analysis tool.
Mark false positives to get fewer messages to study the next time
Any static code analyzer generates false positives. This is the nature of the tool, and it can't be helped. Of
course, everybody tries to reduce the number of false positives, but you can't bring it down to zero. To
avoid getting the same false positives again and again, a good tool provides a mechanism to suppress
them. You simply mark a message as a "false positive", and the analyzer will not generate it
at the next check. In PVS-Studio, the "Mark As False Alarm" function is responsible for this. See the
documentation section "Suppression of false alarms" for details.
Despite being very simple, this recommendation may help you to save much time. Moreover, you will
stay more focused when you have fewer messages to examine.
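To make the bookkeeping concrete, here is a minimal sketch of how such suppression might work, assuming each diagnostic is identified by its file, rule, and line text. This is a generic illustration, not PVS-Studio's actual "Mark As False Alarm" mechanism (which stores special comments in the source code itself):

```python
# Generic sketch of false-positive suppression (NOT PVS-Studio's real format).
# A diagnostic is keyed by (file, rule, line text), so the suppression survives
# when unrelated lines shift the line numbers around.
def key(diag):
    return (diag["file"], diag["rule"], diag["text"].strip())

def filter_report(report, baseline):
    """Drop every diagnostic previously marked as a false positive."""
    return [d for d in report if key(d) not in baseline]

run1 = [
    {"file": "a.cpp", "rule": "V501", "text": "if (x == x)"},
    {"file": "b.cpp", "rule": "V512", "text": "memcpy(dst, src, n)"},
]
# The reviewer marks the b.cpp message as a false positive:
baseline = {key(run1[1])}

# Next run: the old messages reappear plus one new diagnostic.
run2 = run1 + [{"file": "a.cpp", "rule": "V576", "text": 'printf("%d", s)'}]
visible = filter_report(run2, baseline)  # the marked message no longer shows up
```

Only two messages remain visible: the unreviewed V501 and the genuinely new V576, which is exactly the point of marking false positives once.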
Use incremental analysis (automated check of freshly recompiled files)
The effective way to handle incremental analysis is to integrate the tool into the IDE so that it
launches automatically when freshly modified files are compiled. Ideally, incremental analysis should
run on the computer of every developer currently working on the code base. In this case, many
bugs will be detected and fixed before the code even gets into the version control system. This practice
greatly reduces the "cost of bug fixes". PVS-Studio supports the incremental analysis mode.
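The core idea - re-analyze only what changed - can be sketched as follows. This is a toy illustration of change detection via content hashing, not PVS-Studio's actual implementation:

```python
import hashlib

def fingerprint(source):
    """Cheap content hash used to decide whether a file needs re-analysis."""
    return hashlib.sha256(source.encode()).hexdigest()

def files_to_recheck(files, cache):
    """files: {name: source text}; cache: {name: fingerprint from the last run}.
    Returns the names whose contents changed since the last run and updates
    the cache - mimicking how incremental analysis skips untouched files."""
    changed = [name for name, src in files.items()
               if cache.get(name) != fingerprint(src)]
    for name in changed:
        cache[name] = fingerprint(files[name])
    return changed

cache = {}
v1 = {"a.cpp": "int main() { return 0; }", "b.cpp": "void f() {}"}
first = files_to_recheck(v1, cache)    # everything is new: both files

v2 = dict(v1, **{"b.cpp": "void f() { g(); }"})
second = files_to_recheck(v2, cache)   # only the edited file: ["b.cpp"]
```

An IDE-integrated analyzer does the same bookkeeping automatically, triggering on recompilation instead of an explicit call.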
Check files modified in the last several days
If for some reason you cannot install the analyzer on all the developers' computers, you may check the
code once every few days. In order not to get a pile of messages referring to old files, static analysis tools
provide the option "check files modified in the last N days". You can set it to 3 days, for example.
Although technically nothing prevents you from setting this parameter to any number
(say, 7 or 10 days), we don't recommend doing that. When you check the code just once in 10 days,
you repeat the "occasional use of the tool" mistake. You see, if a bug is added today, found by testers
tomorrow, described in the bug tracker the day after tomorrow, and fixed in 5 days, running the analysis
once in 10 days is useless.
But the capability of checking the code once every two or three days can prove very useful. PVS-Studio
supports this option - see the settings command "Check only Files Modified In".
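Under the hood, such an option boils down to a modification-time filter. A minimal sketch (a generic illustration, not PVS-Studio's implementation):

```python
import os
import tempfile
import time

def modified_within(paths, days):
    """Keep only files whose modification time falls in the last `days` days."""
    cutoff = time.time() - days * 86400  # 86400 seconds per day
    return [p for p in paths if os.path.getmtime(p) >= cutoff]

# Demo on two temporary files: one fresh, one with its mtime pushed 10 days back.
tmp = tempfile.mkdtemp()
fresh = os.path.join(tmp, "fresh.cpp")
stale = os.path.join(tmp, "stale.cpp")
for p in (fresh, stale):
    open(p, "w").close()
old = time.time() - 10 * 86400
os.utime(stale, (old, old))  # make stale.cpp look 10 days old

recent = modified_within([fresh, stale], days=3)  # only fresh.cpp survives
```

The analyzer then checks only the surviving list, so old files stop flooding the report.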
Set the static analyzer to run every night on the build server
Regardless of whether or not you use incremental analysis on every developer's computer, it is very useful
to perform a complete run of the analyzer on the whole code base every night. An important
capability of the tool is therefore command-line launch, which is of course
supported in PVS-Studio.
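Command-line launch makes the nightly run trivial to script. The sketch below merely builds such an invocation; `analyzer_cmd` and its flags are hypothetical placeholders, not PVS-Studio's real options - consult the tool's documentation for those:

```python
def nightly_command(project_file, report_file):
    # "analyzer_cmd", "--project" and "--output" are hypothetical placeholders;
    # substitute your analyzer's actual command-line runner and flags here.
    return ["analyzer_cmd", "--project", project_file, "--output", report_file]

cmd = nightly_command("MySolution.sln", "nightly-report.xml")
# A scheduler (cron, Jenkins, etc.) would run this every night, e.g. via
# subprocess.run(cmd, check=True), and publish the report to the team.
```

The point is only that a headless, scriptable entry point is what makes the nightly build-server run possible at all.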
The more practices you follow, the greater effect you get
Let's once again enumerate the tips on how to enhance efficiency of static code analysis:
1. Mark false positives to get fewer messages to study the next time.
2. Use incremental analysis (automated check of freshly recompiled files).
3. Check files modified in the last several days.
4. Set the static analyzer to run every night on the build server.
If you follow all four recommendations, you'll get the highest payback from investing in static
analysis tools. Of course, this sometimes cannot be achieved for various reasons, but you should
certainly strive for it.