What is a compiler?
A compiler is a program that takes a program written in one language (the
source language) as input and translates it into an equivalent program in
another language (the target language).
A compiler is divided into two parts: analysis and synthesis.
The basic model of a compiler can be represented as follows:
Conceptually, a compiler operates in phases, each of which transforms the
source program from one representation to another.
Each phase takes its input from the previous phase, has its own representation
of the source program, and feeds its output to the next phase of the compiler.
The compiler has six phases: lexical analyzer, syntax analyzer,
semantic analyzer, intermediate code generator, code optimizer and code
generator.
Two other activities, symbol-table management and error handling,
interact with all six phases of the compiler.
Phases of compiler
Symbol-table management
To support the phases of the compiler, a symbol table is maintained. Its task
is to store the identifiers used in the program.
A symbol table is a data structure that stores information about
identifiers.
The symbol table allows us to find the record for each identifier quickly and
to store or retrieve data from that record efficiently.
During semantic analysis and intermediate code generation, we need to know
the types of the identifiers.
During code generation, information about how much storage is allocated to
each identifier is typically needed.
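A symbol table of this kind could be sketched as a hash map from identifier names to records. The sketch below is only illustrative: the class names SymbolRecord and SymbolTable, the fields type and size, and the 4-byte float are assumptions made for the example, not part of any particular compiler.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SymbolRecord:
    """Information kept for one identifier (fields are illustrative)."""
    name: str
    type: Optional[str] = None   # filled in during semantic analysis
    size: int = 0                # storage in bytes, consulted by code generation


class SymbolTable:
    def __init__(self) -> None:
        # A dict gives fast average-case lookup by identifier name.
        self._records: Dict[str, SymbolRecord] = {}

    def insert(self, name: str) -> SymbolRecord:
        # Return the existing record if the identifier was seen before.
        return self._records.setdefault(name, SymbolRecord(name))

    def lookup(self, name: str) -> Optional[SymbolRecord]:
        return self._records.get(name)


table = SymbolTable()
for ident in ("total", "count", "rate"):
    table.insert(ident)          # e.g. done by the lexical analyzer
table.lookup("rate").type = "float"   # assumed type, set by a later phase
table.lookup("rate").size = 4         # assumed size in bytes
```

Each phase can then call lookup to read the attributes that earlier phases stored.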
Error detection and reporting
In compilation, each phase can detect errors. These errors must be reported to
the error handler, whose task is to handle the error so that compilation can
proceed.
Normally, errors are reported in the form of messages.
A large number of errors can be detected in the syntax analysis phase. Such
errors are known as syntax errors.
During semantic analysis, type-mismatch errors are usually detected.
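One possible way to let every phase report errors without stopping compilation is a shared handler that collects messages. This is a minimal sketch; the class ErrorHandler, its method names, and the sample messages are hypothetical.

```python
class ErrorHandler:
    """Collects error messages so that compilation can continue."""

    def __init__(self):
        self.messages = []

    def report(self, phase, line, text):
        # Record the error instead of aborting; the phase keeps going.
        self.messages.append(f"{phase} error, line {line}: {text}")

    def has_errors(self):
        return bool(self.messages)


errors = ErrorHandler()
errors.report("syntax", 3, "missing ')'")                   # illustrative
errors.report("semantic", 7, "type mismatch in assignment")  # illustrative
for message in errors.messages:
    print(message)
```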
Lexical analysis
Lexical analysis is also known as scanning.
It is the phase of compilation in which the complete source code is scanned
and the source program is broken up into groups of characters called tokens.
A token is a sequence of characters having a collective meaning.
E.g. total = count + rate
After lexical analysis, the statement is broken up into the following series
of tokens:
identifier total, assignment operator, identifier count, plus sign, identifier
rate
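The scanning step for the statement above can be sketched with regular expressions. The token names (IDENTIFIER, ASSIGN, and so on) and the small set of patterns are assumptions chosen for this example; a real lexical analyzer would cover the full language.

```python
import re

# (token name, pattern) pairs; names and patterns are illustrative.
TOKEN_SPEC = [
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("NUMBER", r"[0-9]+"),
    ("ASSIGN", r"="),
    ("PLUS", r"\+"),
    ("TIMES", r"\*"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Break the source string into (token name, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":      # whitespace carries no meaning
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("total = count + rate"))
# [('IDENTIFIER', 'total'), ('ASSIGN', '='), ('IDENTIFIER', 'count'),
#  ('PLUS', '+'), ('IDENTIFIER', 'rate')]
```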
Syntax analysis
Syntax analysis is also known as parsing.
In this phase, the tokens generated by lexical analysis are grouped together
to form a hierarchical structure known as a syntax tree.
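As an illustration of grouping tokens into a tree, here is a minimal recursive-descent sketch for assignments of the form identifier = expression. The grammar, the token names, and the nested-tuple tree representation are assumptions made for this example.

```python
def parse_assignment(tokens):
    """Parse `identifier = expr` into a nested-tuple syntax tree."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def advance(expected):
        nonlocal pos
        if peek() != expected:
            raise SyntaxError(f"expected {expected}, found {peek()}")
        text = tokens[pos][1]
        pos += 1
        return text

    def factor():                      # factor -> NUMBER | IDENTIFIER
        if peek() == "NUMBER":
            return ("num", advance("NUMBER"))
        return ("id", advance("IDENTIFIER"))

    def term():                        # term -> factor ('*' factor)*
        node = factor()
        while peek() == "TIMES":
            advance("TIMES")
            node = ("*", node, factor())
        return node

    def expr():                        # expr -> term ('+' term)*
        node = term()
        while peek() == "PLUS":
            advance("PLUS")
            node = ("+", node, term())
        return node

    target = advance("IDENTIFIER")
    advance("ASSIGN")
    tree = ("=", ("id", target), expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()}")
    return tree


tokens = [("IDENTIFIER", "total"), ("ASSIGN", "="), ("IDENTIFIER", "count"),
          ("PLUS", "+"), ("IDENTIFIER", "rate"), ("TIMES", "*"), ("NUMBER", "10")]
tree = parse_assignment(tokens)
# ('=', ('id', 'total'), ('+', ('id', 'count'), ('*', ('id', 'rate'), ('num', '10'))))
```

Because term is called from expr, the multiplication ends up deeper in the tree than the addition, which encodes the usual operator precedence.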
Semantic analysis
Once the syntax has been checked by the syntax analyzer, the next phase, the
semantic analyzer, determines the meaning of the source string.
For example, determining the meaning of the source string includes matching
parentheses, matching if…else statements, checking that arithmetic operations
are performed on type-compatible operands, and checking the scope of
operations.
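The type-compatibility check can be sketched as a walk over an expression tree. In this sketch the nested-tuple tree, the function name check, and the declared types of count and rate are all assumptions; integer operands mixed with float operands are wrapped in an int_to_float node, and any non-numeric operand is reported as a type mismatch.

```python
NUMERIC = {"int", "float"}


def check(node, types):
    """Return (typed_node, type); raise TypeError on a type mismatch."""
    kind = node[0]
    if kind == "num":
        return node, "int"
    if kind == "id":
        if node[1] not in types:
            raise NameError(f"undeclared identifier {node[1]}")
        return node, types[node[1]]
    # Binary operator: check both operands first.
    left, ltype = check(node[1], types)
    right, rtype = check(node[2], types)
    if ltype not in NUMERIC or rtype not in NUMERIC:
        raise TypeError(f"type mismatch: {ltype} {kind} {rtype}")
    if ltype == rtype:
        return (kind, left, right), ltype
    # Mixed int/float operands: convert the int operand to float.
    if ltype == "int":
        left = ("int_to_float", left)
    else:
        right = ("int_to_float", right)
    return (kind, left, right), "float"


expr = ("+", ("id", "count"), ("*", ("id", "rate"), ("num", "10")))
declared = {"count": "float", "rate": "float"}   # assumed declarations
typed, result_type = check(expr, declared)
# typed: ('+', ('id', 'count'), ('*', ('id', 'rate'), ('int_to_float', ('num', '10'))))
```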
Intermediate code generation
Intermediate code is a kind of code that is easy to generate and that can
be easily converted to target code.
Intermediate code can take a variety of forms, such as three-address code,
quadruples, triples, and postfix notation.
For example, total = count + rate * 10
The intermediate code using the three-address code method is:
t1 := int_to_float(10)
t2 := rate * t1
t3 := count + t2
total := t3
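The three-address code above can be produced by a post-order walk that allocates a new temporary for every operator. The following is a sketch under assumptions: the input is a hypothetical nested-tuple syntax tree in which the semantic analyzer has already inserted the int_to_float node, and gen_tac is a name chosen for the example.

```python
def gen_tac(tree):
    """Translate an assignment tree into a list of three-address statements."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"t{counter}"

    def walk(node):
        kind = node[0]
        if kind in ("id", "num"):
            return node[1]                 # leaves need no instruction
        if kind == "int_to_float":
            operand = walk(node[1])
            temp = new_temp()
            code.append(f"{temp} := int_to_float({operand})")
            return temp
        left = walk(node[1])               # operands first (post-order)
        right = walk(node[2])
        temp = new_temp()
        code.append(f"{temp} := {left} {kind} {right}")
        return temp

    _, target, expr = tree
    value = walk(expr)
    code.append(f"{target[1]} := {value}")
    return code


tree = ("=", ("id", "total"),
        ("+", ("id", "count"),
         ("*", ("id", "rate"), ("int_to_float", ("num", "10")))))
for line in gen_tac(tree):
    print(line)
# t1 := int_to_float(10)
# t2 := rate * t1
# t3 := count + t2
# total := t3
```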
Code optimization
The code optimization phase attempts to improve the intermediate code.
This is necessary to obtain faster-executing code or lower memory
consumption.
Thus, by optimizing the code, the overall running time of the target program
can be improved.
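As one small illustration, the intermediate code for total = count + rate * 10 can be improved by converting the constant 10 to 10.0 at compile time and by dropping the final copy from a temporary. The sketch below works on quadruples of the form (operator, argument 1, argument 2, result); this layout and the function name optimize are assumptions, and the copy removal assumes the temporary is not used again afterwards.

```python
def optimize(quads):
    """Fold constant int_to_float conversions and remove a trailing temp copy."""
    constants = {}   # temporary name -> folded constant text
    out = []
    for op, arg1, arg2, result in quads:
        # Replace temporaries whose value is already known at compile time.
        arg1 = constants.get(arg1, arg1)
        arg2 = constants.get(arg2, arg2)
        if op == "int_to_float" and arg1.isdigit():
            constants[result] = str(float(arg1))   # "10" -> "10.0"
            continue
        if op == ":=" and out and out[-1][3] == arg1:
            # Copy of the value just computed: write it to the destination
            # directly (assumes the temporary is not used later).
            prev_op, a, b, _ = out.pop()
            out.append((prev_op, a, b, result))
            continue
        out.append((op, arg1, arg2, result))
    return out


quads = [
    ("int_to_float", "10", None, "t1"),
    ("*", "rate", "t1", "t2"),
    ("+", "count", "t2", "t3"),
    (":=", "t3", None, "total"),
]
print(optimize(quads))
# [('*', 'rate', '10.0', 't2'), ('+', 'count', 't2', 'total')]
```

Four instructions become two, and the conversion no longer happens at run time.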
Code generation
In this phase, the target code is generated.
The intermediate code instructions are translated into a sequence of machine
instructions.
For example, for total = count + rate * 10
the target code will be:
MOV rate, R1
MUL #10.0, R1
MOV count, R2
ADD R2, R1
MOV R1, total
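A listing like the one above could be produced from optimized quadruples by a very simple generator that keeps each temporary in a register. This is only a sketch: the quadruple layout, the function name generate, and the unlimited supply of registers are assumptions, and only the commutative operators + and * are handled.

```python
import re

OPCODES = {"+": "ADD", "*": "MUL"}   # both operators are commutative


def generate(quads):
    """Translate (op, arg1, arg2, result) quadruples into two-address code."""
    asm = []
    reg_of = {}          # temporary name -> register currently holding it
    next_reg = 0

    def new_reg():
        nonlocal next_reg
        next_reg += 1    # no register limit in this sketch
        return f"R{next_reg}"

    def operand(name):
        if name in reg_of:
            return reg_of[name]
        return f"#{name}" if name[0].isdigit() else name   # '#' marks a constant

    for op, arg1, arg2, result in quads:
        if arg1 in reg_of:
            reg = reg_of[arg1]
            asm.append(f"{OPCODES[op]} {operand(arg2)}, {reg}")
        elif arg2 in reg_of:
            # Commutative operator: load arg1, accumulate into arg2's register.
            loaded = new_reg()
            asm.append(f"MOV {operand(arg1)}, {loaded}")
            reg = reg_of[arg2]
            asm.append(f"{OPCODES[op]} {loaded}, {reg}")
        else:
            reg = new_reg()
            asm.append(f"MOV {operand(arg1)}, {reg}")
            asm.append(f"{OPCODES[op]} {operand(arg2)}, {reg}")
        if re.fullmatch(r"t\d+", result):
            reg_of[result] = reg            # keep the temporary in its register
        else:
            asm.append(f"MOV {reg}, {result}")
    return asm


quads = [("*", "rate", "10.0", "t2"), ("+", "count", "t2", "total")]
print("\n".join(generate(quads)))
# MOV rate, R1
# MUL #10.0, R1
# MOV count, R2
# ADD R2, R1
# MOV R1, total
```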