Using Static Analysis in Program
Development
Authors: Alexey Kolosov

Date: 31.01.2008


Abstract
Static analysis allows checking program code before the tested program is executed. The static analysis
process consists of three steps. First, the analyzed program code is split into tokens, i.e. constants,
identifiers, reserved symbols, etc. This operation is performed by a lexer. Second, the tokens are passed
to a parser, which builds an abstract syntax tree (AST) from them. Finally, static analysis is
performed over the AST. This article describes three static analysis techniques: AST walker analysis, data
flow analysis and path-sensitive data flow analysis.


Introduction
Application testing is an important part of the software development process. There are many different
types of software testing. Two of them work with the application's code: static analysis and dynamic
analysis.

Dynamic analysis is performed on the executable code of a compiled program. It checks only the behavior
that is actually exercised, that is, only the code executed during a test run. A dynamic analyzer can
provide the developer with information on memory leaks, the program's performance, the call stack, etc.

Static analysis allows checking program code before the tested program is executed. A compiler always
performs some static analysis during compilation. However, in large, real-life projects it is often
necessary to make the entire source code meet some additional requirements. These additional
requirements may range from variable naming to portability (for example, the code should run
successfully on both x86 and x64 platforms). The most common requirements are:

    •   Reliability - fewer bugs in the tested program.
    •   Maintainability - better understanding of the source code by others so that it is easier to
        upgrade/change the source code.
    •   Testability - shorter testing time due to a more effective testing process.
    •   Portability - flexibility when the tested program is launched on different hardware platforms (for
        example, x86 and x64, as it has already been mentioned above).
    •   Readability - better understanding of the code by others and therefore shorter review times and
        clearer code reading[1].

All requirements can be divided into two categories: rules and guidelines. Rules describe what is
mandatory, while guidelines describe what is recommended (by analogy with errors and warnings
produced by built-in code analyzers of standard compilers).

Rules and guidelines form a coding standard. A coding standard defines the way a developer must and
should write program code.

A static analyzer finds source code lines that presumably violate the specified coding standard
and displays diagnostic messages so that the developer can understand what is wrong with these lines.
The static analysis process is similar to compilation except that no executable or object code is
generated. This article describes the static analysis process step by step.


The Analysis Process
The static analysis process consists of two major stages: abstract syntax tree creation, which covers
lexing and parsing, and abstract syntax tree analysis.

In order to analyze source code, a static analysis tool should "understand" the code, that is, parse it and
create a structure that describes the code in a convenient form. This form is called an abstract syntax
tree (often referred to as an AST). To check whether source code fulfils a coding standard, this tree
must be built.

In the general case, an abstract syntax tree is built only for the analyzed fragment of the source code
(for example, for a specific function). Before the tree can be built, the source code is first processed
by a lexer and then by a parser.

The lexer is responsible for dividing the input stream into individual tokens, identifying the type of
each token, and passing the tokens one at a time to the next stage of the analysis. The lexer reads text
data line by line and splits each line into reserved words, identifiers and constants, which are called
tokens. After a token is retrieved, the lexer identifies its type.

If the first character of the token is a digit, the token is a number; if the first character is a minus
sign, the token is a negative number. A number may be a real or an integer. If it contains a
decimal point or the letter E (which indicates scientific notation), it is a real; otherwise it is an
integer. Note that this could mask a lexical error. If the analyzed source code contains the token
"4xyz", the lexer will turn it into the integer 4. It is likely that any such error will cause a syntax
error, which the parser can catch. Alternatively, a lexer can be designed to report such errors itself.

If the token is not a number, it could be a string. String constants can be identified by double quotation
marks, single quotation marks or other symbols, depending on the syntax of the analyzed language.

Finally, if the token is not a string, it must be an identifier, a reserved word, or a reserved symbol. If
the token is not identified as one of them, it is a lexical error. In the scheme described here, the lexer
does not handle errors itself; it simply notifies the parser that a token of an unidentified type has been
found. The parser will handle the error[2].
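
The classification rules above can be sketched as a small regular-expression lexer. This is only an illustration under stated assumptions: the names tokenize, RESERVED_WORDS and TOKEN_RE are hypothetical, the reserved word and symbol sets cover only a tiny C-like subset, and a leading minus sign is always read as part of a number, as in the description above.

```python
import re

# Tiny illustrative subset of a C-like language (an assumption, not a full grammar).
RESERVED_WORDS = {"int", "return"}

# Alternatives are tried left to right, so a real is tested before an integer.
TOKEN_RE = re.compile(r"""
    (?P<real>-?\d+(?:\.\d+(?:[eE][+-]?\d+)?|[eE][+-]?\d+))
  | (?P<integer>-?\d+)
  | (?P<string>"[^"\n]*")
  | (?P<word>[A-Za-z_]\w*)
  | (?P<symbol>[(){};])
  | (?P<space>\s+)
  | (?P<unknown>.)
""", re.VERBOSE)


def tokenize(source):
    """Return a list of (token type, token text) pairs."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "space":
            continue  # whitespace only separates tokens
        if kind == "word":
            kind = "reserved word" if text in RESERVED_WORDS else "identifier"
        elif kind == "symbol":
            kind = "reserved symbol"
        # "unknown" tokens are passed on so that the next stage can report them
        tokens.append((kind, text))
    return tokens
```

For the input 4xyz this sketch yields an integer 4 followed by an identifier xyz, which reproduces the masked lexical error mentioned above. A character such as @ is returned with the type "unknown" and left for the parser to report.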

The parser has an understanding of the language's grammar. It is responsible for identifying syntax
errors and for translating an error-free program into internal data structures, abstract syntax trees,
that can be processed by the static analyzer.

While the lexer knows only how individual tokens are formed, the parser also recognizes context. For
example, consider the following C function:

int Func(){return 0;}

The lexer will process this line in the following way (see table 1):

Token     Type
int       reserved word
Func      identifier
(         reserved symbol
)         reserved symbol
{         reserved symbol
return    reserved word
0         integer constant
;         reserved symbol
}         reserved symbol
Table 1. Tokens of the "int Func(){return 0;}" string.

The line will be identified as 9 correct tokens, and these tokens will be passed to the parser. The parser
will check the context and find out that it is a definition of a function that takes no parameters, returns
an integer, and always returns 0.

The parser finds this out by creating an abstract syntax tree from the tokens provided by the lexer and
analyzing the tree. If the tokens and the tree built from them are found to be correct, the tree
is used for the static analysis. Otherwise, the parser reports an error.

However, building an abstract syntax tree is not just organizing a set of tokens in a tree form.


Abstract Syntax Trees
An abstract syntax tree captures the essential structure of the input in a tree form, while omitting
unnecessary syntactic details. ASTs can be distinguished from concrete syntax trees by their omission of
tree nodes to represent punctuation marks such as semi-colons to terminate statements or commas to
separate function arguments. ASTs also omit tree nodes that represent unary productions in the
grammar. Such information is directly represented in ASTs by the structure of the tree.

ASTs can be created with hand-written parsers or by code produced by parser generators. ASTs are
generally created bottom-up.

When designing the nodes of the tree, a common design choice is the granularity of the
representation of the AST. That is, whether every construct of the source language is represented by its
own type of AST node or whether some constructs of the source language are represented by a
common type of AST node and differentiated using a value. One example of choosing the granularity of
representation is determining how to represent binary arithmetic operations. One choice is to have a
single binary operation tree node, which has the operation, e.g. "+", as one of its attributes. The other
choice is to have a tree node for every binary operation. In an object-oriented language, this would
result in classes like AddBinary, SubtractBinary, MultiplyBinary, etc. with an abstract base class of
Binary[3].
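
The two choices can be compared side by side in a short sketch. The class names AddBinary, MultiplyBinary and Binary follow the text. Num, BinaryOp and the two evaluation helpers are hypothetical names added here for illustration only.

```python
import operator
from dataclasses import dataclass


@dataclass
class Num:
    value: int


# Choice 1: a single node type; the operation is stored as an attribute.
@dataclass
class BinaryOp:
    op: str          # e.g. "+" or "*"
    left: object
    right: object


# Choice 2: one node class per operation under a common base class.
@dataclass
class Binary:
    left: object
    right: object


@dataclass
class AddBinary(Binary):
    pass


@dataclass
class MultiplyBinary(Binary):
    pass


def eval_coarse(node):
    """Evaluate a tree built from Num and BinaryOp nodes."""
    if isinstance(node, Num):
        return node.value
    ops = {"+": operator.add, "*": operator.mul}
    return ops[node.op](eval_coarse(node.left), eval_coarse(node.right))


def eval_fine(node):
    """Evaluate a tree built from Num and per-operation nodes."""
    if isinstance(node, Num):
        return node.value
    left, right = eval_fine(node.left), eval_fine(node.right)
    if isinstance(node, AddBinary):
        return left + right
    if isinstance(node, MultiplyBinary):
        return left * right
    raise TypeError(f"unsupported node: {node!r}")
```

With the first choice, code that processes the tree dispatches on the value of op. With the second, it dispatches on the node type, which suits visitor-style code but needs more classes.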

For example, let us parse two expressions: 1 + 2 * 3 + 4 * 5 and 1 + 2 * (3 + 4) * 5 (see figure 1).

Figure 1. Parsed expressions: 1 + 2 * 3 + 4 * 5 (left) and 1 + 2 * (3 + 4) * 5 (right).

As one can see from the figure, the expression can be restored to its original form by walking the tree
from left to right (an in-order traversal) and re-inserting parentheses where the tree structure requires them.
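
As an illustration, the two expressions can be parsed and restored with a short hand-written recursive-descent parser. This is a sketch under assumptions rather than the method used by any particular tool: only integers, "+", "*" and parentheses are supported, tree nodes are plain tuples of the form (operator, left, right), and the names parse and unparse are hypothetical.

```python
import re


def parse(text):
    """Build a tuple-based AST; '*' binds tighter than '+'."""
    tokens = re.findall(r"\d+|[+*()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def primary():
        tok = advance()
        if tok == "(":
            node = expression()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node          # the parentheses themselves are not stored
        if not tok.isdigit():
            raise SyntaxError("unexpected token: " + tok)
        return int(tok)

    def term():
        node = primary()
        while peek() == "*":
            advance()
            node = ("*", node, primary())
        return node

    def expression():
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    tree = expression()
    if peek() is not None:
        raise SyntaxError("unexpected token: " + peek())
    return tree


def unparse(node, parent_op=None):
    """Restore the text by an in-order walk (left subtree, operator, right subtree)."""
    if isinstance(node, int):
        return str(node)
    op, left, right = node
    text = f"{unparse(left, op)} {op} {unparse(right, op)}"
    # Parentheses are re-derived from the tree shape: a sum under a product needs them.
    return f"({text})" if parent_op == "*" and op == "+" else text
```

For example, parse("1 + 2 * 3 + 4 * 5") returns ("+", ("+", 1, ("*", 2, 3)), ("*", 4, 5)). Passing the tree for the second expression to unparse gives back 1 + 2 * (3 + 4) * 5, although the tree contains no nodes for the parentheses.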

After the abstract syntax tree is created and verified, the static analyzer can check whether
the source code fulfils the rules and guidelines specified by the coding standard.


Static Analysis Techniques
There are many different analysis techniques, such as AST walker analysis, data flow analysis,
path-sensitive data flow analysis, etc. Concrete implementations of these techniques vary from tool to tool.
Static analysis tools for different programming languages can be based on various analysis frameworks.
These frameworks contain core sets of common techniques, which can be used in static analysis tools so
that these tools reuse the same infrastructure. The supported analysis techniques and the way these
techniques are implemented vary from framework to framework. For example, a framework may
provide an easy way to create an AST walker analyzer but have no support for data flow analysis[4].

Although all three of the above-mentioned analysis techniques use the AST created by the parser, the
techniques differ in their algorithms and purposes.

AST walker analysis, as the name suggests, is performed by walking the AST and checking
whether it fulfils the coding standard, specified as a set of rules and guidelines. Compilers perform
this kind of analysis.
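
A minimal AST walker can be sketched with Python's standard ast module. Note the assumptions: the sketch checks Python source rather than C, because the Python standard library ships a ready-made parser, and the guideline itself (function names must be lower-case) is hypothetical, as are the names NamingChecker and check.

```python
import ast


class NamingChecker(ast.NodeVisitor):
    """Walk the AST and flag functions whose names contain upper-case letters."""

    def __init__(self):
        self.diagnostics = []

    def visit_FunctionDef(self, node):
        if node.name != node.name.lower():
            self.diagnostics.append(
                f"line {node.lineno}: function '{node.name}' is not lower-case"
            )
        self.generic_visit(node)  # continue into nested definitions


def check(source):
    """Parse the source, walk its AST and return the diagnostic messages."""
    checker = NamingChecker()
    checker.visit(ast.parse(source))
    return checker.diagnostics
```

Each rule or guideline of a coding standard would typically become one more visit method, or one more visitor class, over the same tree.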

Data flow analysis is the process of collecting information about the use, definition, and
dependencies of data in programs. The data flow analysis algorithm operates on a control flow graph
(CFG) generated from the source code AST. The CFG represents all possible execution paths of a given
computer program: the nodes represent pieces of code and the edges represent possible control
transfers between these pieces. Since the analysis is performed without executing the tested
program, it is impossible to determine the exact output of the program, i.e. to find out which execution
path in the control flow graph is actually taken. That is why data flow analysis algorithms
approximate this behavior, for example, by considering both branches of an if-then-else statement
and by performing a fixed-point computation for the body of a while statement. Such a fixed point
always exists: the data flow equations compute sets of variables, and a program with a finite
number of statements has only a finite number of variables, so the size of the computed sets has a
finite upper limit. In terms of control flow graphs, static analysis treats all possible
execution paths as actual execution paths. As a result, one
can only obtain approximate solutions for certain data flow analysis problems[5].
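
The fixed-point computation can be illustrated with live-variable analysis, a classic data flow problem chosen here only as an example; the text above does not name a specific problem. The function name live_variables, the block numbering and the sample program in the comment are assumptions.

```python
def live_variables(successors, use, define):
    """Iterate the liveness equations until nothing changes (a fixed point).

    live_out[b] = union of live_in[s] for every successor s of b
    live_in[b]  = use[b] | (live_out[b] - define[b])
    """
    live_in = {block: set() for block in successors}
    live_out = {block: set() for block in successors}
    changed = True
    while changed:
        changed = False
        for block in successors:
            out = set()
            for succ in successors[block]:
                out |= live_in[succ]
            new_in = use[block] | (out - define[block])
            if out != live_out[block] or new_in != live_in[block]:
                live_out[block], live_in[block] = out, new_in
                changed = True
    return live_in, live_out


# CFG of a hypothetical program:
#   block 0:  i = 0; s = 0
#   block 1:  while i < n:
#   block 2:      s = s + i; i = i + 1
#   block 3:  return s
SUCCESSORS = {0: [1], 1: [2, 3], 2: [1], 3: []}
USE = {0: set(), 1: {"i", "n"}, 2: {"s", "i"}, 3: {"s"}}
DEFINE = {0: {"i", "s"}, 1: set(), 2: {"s", "i"}, 3: set()}

LIVE_IN, LIVE_OUT = live_variables(SUCCESSORS, USE, DEFINE)
```

Both successors of block 1 are taken into account, and the loop formed by blocks 1 and 2 is re-evaluated until the sets stop growing. The sets can only grow and are bounded by the three variables of the program, so the iteration terminates.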

The data flow analysis algorithm described above is path-insensitive, because it lets all execution
paths - whether feasible or infeasible, heavily or rarely executed - contribute to the solution. However,
programs execute only a small fraction of their potential paths and, moreover, execution time and cost
are usually concentrated in a far smaller subset of hot paths. It is therefore natural to reduce the
analyzed CFG, and with it the amount of calculation, so that only a subset of the CFG paths is
analyzed. Path-sensitive analysis operates on a reduced CFG, which excludes infeasible paths
and paths that do not contain "dangerous" code. The path selection criteria differ from tool to tool. For
example, a tool may analyze only the paths containing declarations of dynamic arrays, which are considered
to be "dangerous" according to the tool's settings.
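
The effect of excluding infeasible paths can be shown with a small sketch. It makes several assumptions: the CFG is acyclic, branch conditions are plain boolean variables that are not modified along the path, and a path is called infeasible when it assumes the same condition to be both true and false. The graph, the node names and the functions paths and may_deref_null are hypothetical.

```python
# CFG of a hypothetical fragment:
#   if flag: p = None        (set_null)   else: p = obj   (set_obj)
#   if flag: do nothing      (skip)       else: use p     (deref)
# Each edge is (successor, condition), where condition is None or (variable, value).
EDGES = {
    "entry": [("set_null", ("flag", True)), ("set_obj", ("flag", False))],
    "set_null": [("join", None)],
    "set_obj": [("join", None)],
    "join": [("skip", ("flag", True)), ("deref", ("flag", False))],
    "skip": [("exit", None)],
    "deref": [("exit", None)],
    "exit": [],
}


def paths(edges, node, goal, feasible_only=True, assumed=None, trail=()):
    """Yield the paths from node to goal, optionally dropping infeasible ones."""
    assumed = assumed or {}
    trail = trail + (node,)
    if node == goal:
        yield trail
        return
    for succ, cond in edges[node]:
        branch = dict(assumed)
        if cond is not None:
            name, value = cond
            if feasible_only and branch.get(name, value) != value:
                continue  # contradicts an earlier branch on the same condition
            branch[name] = value
        yield from paths(edges, succ, goal, feasible_only, branch, trail)


def may_deref_null(candidate_paths):
    """Report a possible null dereference if a path sets p to None and then uses it."""
    return any("set_null" in p and "deref" in p for p in candidate_paths)
```

With feasible_only=False all four paths are considered and a null dereference is reported, although the path through set_null and deref can never be executed. With the default setting only the two feasible paths remain and no warning is produced.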


Conclusion
The number of static analysis tools and techniques grows from year to year, which reflects the growing
interest in static analyzers. The reason for this interest is that the software under development
becomes more and more complex, so it becomes impossible for developers to check the source code
manually.

This article has given a brief description of the static analysis process and analysis techniques.


References
    1. Dirk Giesen. Philosophy and practical implementation of static analyzer tools [Electronic
       resource]. - Electronic data. - Dirk Giesen, cop. 1998. - Access mode:
       http://www.viva64.com/go.php?url=63
    2. James Alan Farrell. Compiler Basics [Electronic resource]. - Electronic data. - James Alan Farrell,
       cop. 1995. - Access mode: http://www.viva64.com/go.php?url=64
    3. Joel Jones. Abstract syntax tree implementation idioms [Electronic resource]. - Proceedings of the
       10th Conference on Pattern Languages of Programs 2003, cop. 2003.
    4. Ciera Nicole Christopher. Evaluating Static Analysis Frameworks [Electronic resource]. - Ciera
       Nicole, cop. 2006. - Access mode: http://www.viva64.com/go.php?url=64
    5. Leon Moonen. A Generic Architecture for Data Flow Analysis to Support Reverse Engineering
       [Electronic resource]. - Proceedings of the 2nd International Workshop on the Theory and
       Practice of Algebraic Specifications, cop. 1997.

Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 

Using Static Analysis in Program Development
Authors: Alexey Kolosov

Date: 31.01.2008


Abstract
Static analysis allows checking program code before the tested program is executed. The static analysis
process consists of three steps. First, the analyzed program code is split into tokens, i.e. constants,
identifiers, reserved symbols, etc. This operation is performed by the lexer. Second, the tokens are
passed to the parser, which builds an abstract syntax tree (AST) from them. Finally, the static analysis
is performed over the AST. This article describes three static analysis techniques: AST walker analysis,
data flow analysis and path-sensitive data flow analysis.


Introduction
Application testing is an important part of the software development process. There are many different
types of software testing. Two of them involve the application's code: static analysis and dynamic
analysis.

Dynamic analysis is performed on the executable code of a compiled program. It checks only the behavior
exercised by a particular run; that is, only the code executed during a test is checked. A dynamic
analyzer can provide the developer with information on memory leaks, the program's performance, the call
stack, etc.

Static analysis allows checking program code before the tested program is executed. A compiler always
performs static analysis during the compilation process. However, in large, real-life projects it is
often necessary to make the entire source code fulfill some additional requirements. These requirements
may range from variable naming to portability (for example, the code should execute successfully on both
x86 and x64 platforms). The most common requirements are:

    •   Reliability - fewer bugs in the tested program.
    •   Maintainability - better understanding of the source code by others so that it is easier to
        upgrade/change the source code.
    •   Testability - shorter testing time due to a more effective testing process.
    •   Portability - flexibility when the tested program is launched on different hardware platforms
        (for example, x86 and x64, as mentioned above).
    •   Readability - better understanding of the code by others and therefore shorter review times and
        clearer code reading[1].

All requirements can be divided into two categories: rules and guidelines. Rules describe what is
mandatory, while guidelines describe what is recommended (by analogy with the errors and warnings
produced by the built-in code analyzers of standard compilers).

Rules and guidelines form a coding standard. A coding standard defines the way a developer must and
should write program code.

A static analyzer finds source code lines that presumably do not fulfill the specified coding standard
and displays diagnostic messages so that the developer can understand what is wrong with these lines.
The static analysis process is similar to compilation except that no executable or object code is
generated. This article describes the static analysis process step by step.


The Analysis Process
The static analysis process consists of two steps: abstract syntax tree creation (which covers lexing
and parsing) and abstract syntax tree analysis.

In order to analyze source code, a static analysis tool should "understand" the code, that is, parse it
and create a structure describing the code in a convenient form. This form is called an abstract syntax
tree (often referred to as an AST). To check whether source code fulfills a coding standard, this tree
must be built.

In the general case, an abstract syntax tree is built only for the analyzed fragment of source code (for
example, for a specific function). Before the tree can be built, the source code is processed first by a
lexer and then by a parser.

The lexer is responsible for dividing the input stream into individual tokens, identifying the type of
each token, and passing the tokens one at a time to the next stage of the analysis. The lexer reads text
data line by line and splits each line into reserved words, identifiers and constants, which are called
tokens. After a token is retrieved, the lexer identifies its type.

If the first character of the token is a digit, the token is a number; if the first character is a minus
sign, the token is a negative number. A number may be a real or an integer. If it contains a decimal
point or the letter E (which indicates scientific notation), it is a real; otherwise it is an integer.
Note that this rule can mask a lexical error: if the analyzed source code contains the token "4xyz", the
lexer will turn it into the integer 4.
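
The token classification described above can be sketched in a few lines of Python, used here only for
brevity. This is an illustration under stated assumptions, not the lexer of any particular tool: the
names RESERVED_WORDS, RESERVED_SYMBOLS, classify_number and tokenize are hypothetical, the reserved sets
cover only a tiny C-like subset, and unrecognized characters are merely tagged as errors.

```python
RESERVED_WORDS = {"int", "return"}   # assumed subset of C keywords
RESERVED_SYMBOLS = set("(){};=")     # assumed subset of C punctuation


def classify_number(text):
    # Rule from the text: a decimal point or the letter E marks a real number.
    return "real" if "." in text or "e" in text.lower() else "integer"


def tokenize(source):
    """Split source into (kind, text) pairs, scanning left to right."""
    tokens = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch.isspace():
            i += 1
            continue
        j = i + 1
        if ch.isdigit() or (ch == "-" and j < n and source[j].isdigit()):
            # A number: consume digits, decimal points and exponent letters.
            while j < n and (source[j].isdigit() or source[j] in ".eE"):
                j += 1
            kind = classify_number(source[i:j])
        elif ch.isalpha() or ch == "_":
            # An identifier or a reserved word.
            while j < n and (source[j].isalnum() or source[j] == "_"):
                j += 1
            kind = "reserved word" if source[i:j] in RESERVED_WORDS else "identifier"
        elif ch in RESERVED_SYMBOLS:
            kind = "reserved symbol"
        else:
            kind = "error"  # unidentified token type, left for a later stage
        tokens.append((kind, source[i:j]))
        i = j
    return tokens
```

With these assumptions, tokenize("return 4xyz;") yields the reserved word return, the integer 4, the
identifier xyz and the reserved symbol ;. The malformed token "4xyz" is silently split in two, which is
the masking effect noted above.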

It is likely that such a masked lexical error will cause a syntax error, which the parser can catch.
However, a lexer can also be designed to process such errors itself.

If the token is not a number, it could be a string. String constants can be identified by double quote
marks, single quote marks or other symbols, depending on the syntax of the analyzed language.

Finally, if the token is not a string, it must be an identifier, a reserved word, or a reserved symbol.
If the token is not identified as one of them, it is a lexical error. The lexer does not handle the
error itself; it simply notifies the parser that an unidentified token type has been found. The parser
will handle the error[2].

The parser has an understanding of the language's grammar. It is responsible for identifying syntax
errors and for translating an error-free program into internal data structures, abstract syntax trees,
that can be processed by the static analyzer.

While the lexer knows only how individual tokens are formed, the parser also recognizes the context in
which they appear. For example, let's declare a C function:

    int Func(){return 0;}

The lexer will process this line in the following way (see table 1):

    Token     Type
    int       reserved word
    Func      identifier
    (         reserved symbol
    )         reserved symbol
    {         reserved symbol
    return    reserved word
    0         integer constant
    ;         reserved symbol
    }         reserved symbol

Table 1. Tokens of the "int Func(){return 0;}" string.

The line will be identified as 9 correct tokens, and these tokens will be passed to the parser. The
parser will check the context and find out that it is a declaration of a function that takes no
parameters, returns an integer, and always returns 0.

The parser finds this out by creating an abstract syntax tree from the tokens provided by the lexer and
analyzing the tree. If the tokens and the tree built from them are considered correct, the tree will be
used for the static analysis. Otherwise, the parser will report an error.

However, building an abstract syntax tree is more than organizing a set of tokens in a tree form.


Abstract Syntax Trees
An abstract syntax tree captures the essential structure of the input in a tree form, while omitting
unnecessary syntactic details. ASTs can be distinguished from concrete syntax trees by their omission of
tree nodes that represent punctuation marks, such as semicolons that terminate statements or commas that
separate function arguments. ASTs also omit tree nodes that represent unary productions in the grammar.
Such information is directly represented in ASTs by the structure of the tree.

ASTs can be created with hand-written parsers or by code produced by parser generators. ASTs are
generally created bottom-up.

When designing the nodes of the tree, a common design choice is the granularity of the AST
representation. That is, whether every construct of the source language is represented by a different
type of AST node, or whether some constructs are represented by a common type of AST node and
differentiated using a value. One example of choosing the granularity of representation is determining
how to represent binary arithmetic operations.
One choice is to have a single binary operation tree node, which has the operation, e.g. "+", as one of
its attributes. The other choice is to have a tree node for every binary operation. In an
object-oriented language, this would result in classes like AddBinary, SubtractBinary, MultiplyBinary,
etc. with an abstract base class of Binary[3].

For example, let us parse two expressions: 1 + 2 * 3 + 4 * 5 and 1 + 2 * (3 + 4) * 5 (see figure 1).

Figure 1. Parsed expressions: 1 + 2 * 3 + 4 * 5 (left) and 1 + 2 * (3 + 4) * 5 (right).

As one can see from the figure, the expression can be restored to its original form by walking the tree
from left to right.

After the abstract syntax tree is created and verified, the static analyzer is able to check whether the
source code fulfills the rules and guidelines specified by the coding standard.


Static Analysis Techniques
There are many different analysis techniques, such as AST walker analysis, data flow analysis,
path-sensitive data flow analysis, etc. Concrete implementations of these techniques vary from tool to
tool. Static analysis tools for different programming languages can be based on various analysis
frameworks. These frameworks contain core sets of common techniques, which can be used in static
analysis tools so that the tools reuse the same infrastructure. The supported analysis techniques and
the way they are implemented vary from framework to framework. For example, a framework may provide an
easy way to create an AST walker analyzer but have no support for data flow analysis[4].

Although all three of the above-mentioned analysis techniques use the AST created by the parser, they
differ in their algorithms and purposes.

AST walker analysis, as the term suggests, is performed by walking the AST and checking whether it
fulfills the coding standard, specified as a set of rules and guidelines. This is the analysis performed
by compilers.

Data flow analysis can be described as a process that collects information about the use, definition,
and dependencies of data in programs. The data flow analysis algorithm operates on a control flow graph
(CFG) generated from the source code AST. The CFG represents all possible execution paths of a given
computer program: the nodes represent pieces of code and the edges represent possible control transfers
between these pieces.
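
As a minimal sketch of data flow analysis over a CFG, the Python fragment below computes live variables
(variables whose current value may still be read later). This is one classic data flow problem, chosen
here purely for illustration. The Node class, the live_variables function and the four-node example
graph are assumptions made for this sketch and do not come from any particular analyzer. Each node
records which variables it defines and uses, and the edges are stored as successor names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    defs: set                                  # variables assigned in this piece of code
    uses: set                                  # variables read before any assignment here
    succs: list = field(default_factory=list)  # names of possible next nodes


def live_variables(cfg):
    """Return, for every node, the set of variables live on entry to it."""
    live_in = {name: set() for name in cfg}
    changed = True
    while changed:                             # repeat until no set changes
        changed = False
        for name, node in cfg.items():
            # Every successor is treated as a possible continuation.
            live_out = set()
            for succ in node.succs:
                live_out |= live_in[succ]
            new_in = node.uses | (live_out - node.defs)
            if new_in != live_in[name]:
                live_in[name] = new_in
                changed = True
    return live_in


# Hypothetical CFG for: i = 0; s = 0; while (i < n) { s = s + i; i = i + 1; } return s;
cfg = {
    "entry": Node(defs={"i", "s"}, uses=set(), succs=["cond"]),
    "cond": Node(defs=set(), uses={"i", "n"}, succs=["body", "exit"]),
    "body": Node(defs={"i", "s"}, uses={"i", "s"}, succs=["cond"]),
    "exit": Node(defs=set(), uses={"s"}, succs=[]),
}
```

For this graph, live_variables(cfg) reports that only n is live on entry, i.e. n is read before the
fragment assigns it. Both successors of the condition node are treated as possible, since the analysis
never executes the code.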

Since the analysis is performed without executing the tested program, it is impossible to determine the
exact output of the program, i.e. to find out which execution path in the control flow graph is actually
taken. That is why data flow analysis algorithms approximate this behavior, for example, by considering
both branches of an if-then-else statement and by performing a fixed-point computation for the body of a
while statement. Such a fixed point always exists: the data flow equations compute sets of variables,
and a program with a finite number of statements has only a finite number of variables, so there is a
finite upper limit to the number of elements in the computed sets. In terms of control flow graphs,
static analysis means that all possible execution paths are considered to be actual execution paths. As
a result of this assumption, one can only obtain approximate solutions for certain data flow analysis
problems[5].

The data flow analysis algorithm described above is path-insensitive, because it takes all execution
paths into account in its solution, whether feasible or infeasible, heavily or rarely executed. However,
programs execute only a small fraction of their potential paths and, moreover, execution time and cost
are usually concentrated in a far smaller subset of hot paths. Therefore, it is natural to reduce the
analyzed CFG, and with it the amount of calculation, so that only a subset of the CFG paths is analyzed.
Path-sensitive analysis operates on a reduced CFG, which excludes infeasible paths and code that is not
considered "dangerous". The path selection criteria differ from tool to tool. For example, a tool may
analyze only the paths containing dynamic array declarations, which are considered "dangerous" according
to the tool's settings.


Conclusion
The number of static analysis tools and techniques grows from year to year, which reflects the growing
interest in static analyzers. The cause of this interest is that the software under development becomes
more and more complex and, therefore, it becomes impossible for developers to check the source code
manually.

This article gave a brief description of the static analysis process and analysis techniques.


References
    1. Dirk Giesen. Philosophy and practical implementation of static analyzer tools [Electronic
       resource]. - Electronic data. - Dirk Giesen, cop. 1998. - Access mode:
       http://www.viva64.com/go.php?url=63
    2. James Alan Farrell. Compiler Basics [Electronic resource]. - Electronic data. - James Alan
       Farrell, cop. 1995. - Access mode: http://www.viva64.com/go.php?url=64
    3. Joel Jones. Abstract syntax tree implementation idioms [Electronic resource]. - Proceedings of
       the 10th Conference on Pattern Languages of Programs 2003, cop. 2003.
    4. Ciera Nicole Christopher. Evaluating Static Analysis Frameworks [Electronic resource]. - Ciera
       Nicole, cop. 2006. - Access mode: http://www.viva64.com/go.php?url=64
    5. Leon Moonen. A Generic Architecture for Data Flow Analysis to Support Reverse Engineering
       [Electronic resource]. - Proceedings of the 2nd International Workshop on the Theory and
       Practice of Algebraic Specifications, cop. 1997.