2. WHAT IS A COMPILER
A compiler is a program that reads a program written in one
language (the source language) and translates it into an equivalent
program in another language (the target language).
3. COMPILER VS INTERPRETER
Compiler: converts human-readable instructions to
machine-readable instructions once, before the program is run.
Interpreter: converts human-readable instructions to machine
instructions each time the program is run.
Compiler:
Advantages            | Disadvantages
Fast                  | Not cross-platform
Source code private   | Requires extra compiling step
Ready to run          | Inflexible
Interpreter:
Advantages            | Disadvantages
Cross-platform        | Slower
Simple to test        | Public source code
Simple to debug       | Interpreter required
4. WHY COMPILER DESIGN
Compiler technology has many applications:
Parsers for HTML in web browsers
Machine code generation for high-level languages
Software testing
Program optimization
Malicious code detection
Design of new computer architectures
5. COMPLEXITY OF COMPILER TECHNOLOGY
Must map a programmer’s requirements (expressed in
a high-level language (HLL) program) to architectural details.
Uses algorithms and techniques from a very large
number of areas in computer science.
Translates intricate theory into practice, which enables
tool building.
7. COUSINS OF THE COMPILER
Several other programs are used together with the
compiler:
Preprocessor
Interpreter
Assembler
Linker
Loader
Cross-Compiler
Source-to-source Compiler
8. COUSINS OF THE COMPILER (CONT…)
Preprocessor:
A tool that produces input for the compiler and is often
considered part of the compiler.
Deals with macro processing, file inclusion,
language extension, etc.
Assembler
Translates assembly language programs into machine
code.
The output of an assembler is called an object file
(a combination of machine instructions and the data
required to place these instructions in memory).
9. COUSINS OF THE COMPILER (CONT…)
Linker
A program that links and merges various object
files together to make an executable file.
These files might have been produced by
separate assemblers.
Searches for and locates the referenced
modules/routines in a program.
Determines the memory locations where this
code will be loaded.
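The two linker tasks above (locating referenced routines and deciding where code is placed) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout of an "object file" (keys size, defines, references) and the function name link are hypothetical, and real object formats are far richer.

```python
def link(object_files):
    """Assign an address to every defined symbol and check all references."""
    symbol_addresses = {}
    base = 0
    # Pass 1: lay the modules out one after another and record
    # the final address of each symbol they define.
    for obj in object_files:
        for symbol, offset in obj["defines"].items():
            if symbol in symbol_addresses:
                raise ValueError(f"duplicate symbol {symbol!r}")
            symbol_addresses[symbol] = base + offset
        base += obj["size"]
    # Pass 2: every referenced symbol must be defined by some module.
    for obj in object_files:
        for symbol in obj["references"]:
            if symbol not in symbol_addresses:
                raise ValueError(f"undefined symbol {symbol!r}")
    return symbol_addresses


# Hypothetical object files: main.o calls helper, which util.o defines.
main_o = {"size": 16, "defines": {"main": 0}, "references": ["helper"]}
util_o = {"size": 8, "defines": {"helper": 4}, "references": []}
print(link([main_o, util_o]))  # {'main': 0, 'helper': 20}
```

Here helper sits at offset 4 inside util.o, which is placed after the 16 bytes of main.o, so its final address is 20.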
10. COUSINS OF THE COMPILER (CONT…)
Loader
Part of the operating system; responsible for loading executable files
into memory and executing them.
It calculates the size of a program (instructions and
data) and creates memory space for it.
It initializes various registers to initiate execution.
Cross-Compiler
A compiler that runs on platform (A) and is capable of
generating executable code for another platform (B).
Source-to-source Compiler
A compiler that takes the source code of one programming
language and translates it into the source code of another
programming language.
11. PHASES OF A COMPILER
The two main parts of compilation are:
Analysis (machine independent / language dependent)
Synthesis (machine dependent / language independent)
13. ANALYSIS AND SYNTHESIS
Analysis:
Breaks up the source program into its constituent pieces
and creates an intermediate representation of the
source program.
Synthesis:
Constructs the target program from the intermediate
representation.
14. ANALYSIS OF THE SOURCE PROGRAM
1. Lexical / Linear Analysis (scanning)
Scans the source code as a stream of characters
and converts it into meaningful lexemes.
The lexical analyzer represents these lexemes as
tokens of the form:
<token-name, attribute-value>
A token is the basic lexical unit: the smallest
meaningful element that a compiler understands.
Examples of tokens are: identifiers, keywords,
literals, operators and special symbols.
Blanks, newlines, tabs and comments are
removed from the source program.
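As a rough illustration of scanning, the sketch below turns a character stream into (token-name, lexeme) pairs using Python's re module. The token classes and the tiny C-like keyword set are assumptions made for this example, not part of the slides.

```python
import re

# Hypothetical token classes for a tiny C-like language.
# Order matters: comments before operators, keywords before identifiers.
TOKEN_SPEC = [
    ("COMMENT",    r"//[^\n]*"),
    ("NUMBER",     r"\d+"),
    ("KEYWORD",    r"\b(?:int|if|else|while|return)\b"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPERATOR",   r"==|[+\-*/=<>]"),
    ("SYMBOL",     r"[();{}]"),
    ("SKIP",       r"\s+"),
    ("MISMATCH",   r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (token-name, lexeme) pairs for the source text."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, lexeme = match.lastgroup, match.group()
        if kind in ("SKIP", "COMMENT"):
            continue  # whitespace and comments are dropped
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {lexeme!r}")
        tokens.append((kind, lexeme))
    return tokens


print(tokenize("int x = 42; // answer"))
# [('KEYWORD', 'int'), ('IDENTIFIER', 'x'), ('OPERATOR', '='), ('NUMBER', '42'), ('SYMBOL', ';')]
```

The comment and the blanks never reach the token list, which mirrors the removal step described above.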
15. LEXICAL ANALYSIS (CONT…)
Lexical analyzers can be generated automatically
from regular expression specifications.
Lex and Flex are two such tools.
A lexical analyzer is typically implemented as a
deterministic finite automaton (DFA).
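To make the automaton view concrete, here is a minimal hand-written DFA that recognizes identifiers and unsigned integers. The state names and the transition table are illustrative assumptions; a generated scanner would build a much larger table from the regular expressions.

```python
# States of a small hand-written DFA (names are illustrative).
START, IN_ID, IN_NUM, DEAD = "start", "in_id", "in_num", "dead"
ACCEPTING = {IN_ID: "IDENTIFIER", IN_NUM: "NUMBER"}


def char_class(ch):
    """Map a character to the input class used by the transition table."""
    if ch.isalpha() or ch == "_":
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"


# (state, input class) -> next state; missing entries lead to DEAD.
TRANSITIONS = {
    (START, "letter"): IN_ID,
    (START, "digit"): IN_NUM,
    (IN_ID, "letter"): IN_ID,
    (IN_ID, "digit"): IN_ID,
    (IN_NUM, "digit"): IN_NUM,
}


def classify(lexeme):
    """Run the DFA over the lexeme; return its token name or None."""
    state = START
    for ch in lexeme:
        state = TRANSITIONS.get((state, char_class(ch)), DEAD)
        if state == DEAD:
            return None
    return ACCEPTING.get(state)


print(classify("x1"))  # IDENTIFIER
print(classify("42"))  # NUMBER
print(classify("4x"))  # None
```

Each input character causes exactly one table lookup, which is why DFA-based scanners run in time linear in the input length.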
16. 2. Syntax / Hierarchical Analysis – Parsing
Tokens are grouped hierarchically into nested
collections with collective meaning.
The result is generally a parse tree.
In this phase expressions, statements, declarations,
etc. are identified by using the results of lexical
analysis.
Most syntactic errors in the source program are
caught in this phase.
The syntactic rules of the source language are given by
a grammar.
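The grouping of tokens into nested structures can be sketched with a small recursive-descent parser. The grammar below (expr, term, factor for arithmetic expressions) and the nested-tuple tree shape are assumptions chosen for illustration; the tuples are a simplified stand-in for a full parse tree, and the one-line tokenizer silently ignores characters it does not know.

```python
import re

# Assumed grammar:
#   expr   -> term (('+' | '-') term)*
#   term   -> factor (('*' | '/') factor)*
#   factor -> NUMBER | IDENTIFIER | '(' expr ')'


def parse(source):
    """Parse an arithmetic expression into nested (op, left, right) tuples."""
    tokens = re.findall(r"\d+|[A-Za-z_]\w*|[-+*/()]", source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():
        token = advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token in ("+", "-", "*", "/", ")"):
            raise SyntaxError(f"unexpected token {token!r}")
        return token  # a number or an identifier

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse("a + 2 * (b - 1)"))
# ('+', 'a', ('*', '2', ('-', 'b', '1')))
```

Operator precedence falls out of the grammar itself: because term sits below expr, the multiplication is grouped before the addition. An input such as "a +" is rejected with a SyntaxError, which is the kind of error this phase catches.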
18. 3. Semantic Analysis
Certain checks are performed to make sure that the
components of the program fit together
meaningfully.
Unlike parsing, this phase checks for semantic
errors in the source program (e.g. type mismatch).
Type checking of various programming language
constructs is one of the most important tasks.
Stores type information (types of variables, function
parameters, array dimensions, etc.) in the symbol
table or the syntax tree.
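A minimal sketch of type checking against a symbol table follows. The tree shape (nested (op, left, right) tuples), the three type names and the int-to-float promotion rule are all assumptions made for this example.

```python
# Symbol table mapping declared names to their types (entries are illustrative).
symbol_table = {"count": "int", "ratio": "float", "name": "string"}


def type_of(node, symbols):
    """Return the type of an expression tree, or raise on a semantic error."""
    if isinstance(node, int):
        return "int"
    if isinstance(node, float):
        return "float"
    if isinstance(node, str):  # a variable reference
        if node not in symbols:
            raise NameError(f"undeclared variable {node!r}")
        return symbols[node]
    op, left, right = node
    left_type = type_of(left, symbols)
    right_type = type_of(right, symbols)
    numeric = ("int", "float")
    if left_type in numeric and right_type in numeric:
        # Assumed rule: mixing int and float promotes the result to float.
        return "float" if "float" in (left_type, right_type) else "int"
    raise TypeError(f"type mismatch: {left_type} {op} {right_type}")


print(type_of(("+", "count", 1), symbol_table))        # int
print(type_of(("*", "count", "ratio"), symbol_table))  # float
```

An expression such as ("+", "name", 1) is syntactically valid but raises TypeError here, which is the kind of error parsing alone cannot detect.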
20. INTERMEDIATE CODE GENERATION
While generating machine code directly from
source code is possible, it entails two problems:
With m languages and n target machines, we
need to write m × n compilers.
The code optimizer cannot be reused.
By converting source code to an intermediate
code, a machine-independent code optimizer
may be written, and only m front ends and n
back ends are needed instead of m × n compilers.
Intermediate code must be easy to produce and
easy to translate to machine code.
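One common intermediate form is three-address code, in which each instruction has at most one operator and results are held in temporaries. The sketch below assumes expression trees given as nested (op, left, right) tuples; the function name generate_tac and the temporary names t1, t2, ... are hypothetical.

```python
from itertools import count


def generate_tac(tree):
    """Flatten an expression tree into three-address instructions."""
    code = []
    temps = count(1)

    def visit(node):
        if not isinstance(node, tuple):
            return str(node)  # a variable name or a constant
        op, left, right = node
        left_name = visit(left)
        right_name = visit(right)
        temp = f"t{next(temps)}"
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    result = visit(tree)
    return code, result


code, result = generate_tac(("+", "a", ("*", "b", "c")))
print(code)    # ['t1 = b * c', 't2 = a + t1']
print(result)  # t2
```

The output is easy to produce (one tree walk) and easy to translate, since each line maps onto a handful of machine instructions.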
22. SYNTHESIS OF THE TARGET PROGRAM
The synthesis of the target program is
composed of two phases, which are:
1) Code Optimization, and
2) Code Generation.
The synthesis part of the compilation
process is also called the back-end of
a compiler.
23. CODE OPTIMIZATION
Changes the intermediate code (IC) by removing
inefficiencies, so that the code generator
produces a faster and less memory-consuming
program.
The improvement may be in time, space, or
power consumption.
It changes the structure of programs,
sometimes beyond recognition:
inlines functions, unrolls loops, eliminates
some programmer-defined variables, etc.
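Constant folding is one of the simplest optimizations of this kind (it is not named on the slide and is used here only as an example): subexpressions whose operands are all known at compile time are evaluated by the compiler. The sketch assumes trees of nested (op, left, right) tuples with Python ints as constants and leaves division alone to avoid divide-by-zero questions.

```python
import operator

# Operators this sketch is willing to evaluate at compile time.
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(node):
    """Replace constant subexpressions by their computed value."""
    if not isinstance(node, tuple):
        return node  # a constant or a variable name
    op, left, right = node
    left, right = fold(left), fold(right)
    if op in FOLDABLE and isinstance(left, int) and isinstance(right, int):
        return FOLDABLE[op](left, right)
    return (op, left, right)


print(fold(("+", "x", ("*", 2, ("+", 3, 4)))))
# ('+', 'x', 14)
```

The folded tree needs one addition at run time instead of two additions and a multiplication.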
25. CODE GENERATION
Converts intermediate code to machine code.
Each intermediate code instruction may result
in many machine instructions, or vice versa.
Must handle all aspects of the machine
architecture:
registers, pipelining, cache, multiple functional
units, etc.
Storage allocation decisions are made here.
Register allocation and assignment are the most
important problems.
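The sketch below shows, for a deliberately naive case, how one intermediate instruction can expand into several machine instructions. The target is a hypothetical machine with a single register R0 and the mnemonics LOAD, STORE, ADD, SUB, MUL and DIV; none of this corresponds to a real instruction set, and the input format (dest, left, op, right) is an assumption.

```python
# Mnemonics of a hypothetical single-register target machine.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def generate_code(tac):
    """Translate (dest, left, op, right) instructions to pseudo-assembly."""
    asm = []
    for dest, left, op, right in tac:
        asm.append(f"LOAD R0, {left}")            # bring the left operand into R0
        asm.append(f"{OPCODES[op]} R0, {right}")  # combine it with the right operand
        asm.append(f"STORE {dest}, R0")           # write the result back to memory
    return asm


for line in generate_code([("t1", "b", "*", "c"), ("t2", "a", "+", "t1")]):
    print(line)
# LOAD R0, b
# MUL R0, c
# STORE t1, R0
# LOAD R0, a
# ADD R0, t1
# STORE t2, R0
```

Every temporary is stored to memory and reloaded, which is wasteful: keeping t1 in a second register would remove a store and a memory access. Avoiding this waste is the job of register allocation.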
27. COMPILER CONSTRUCTION TOOLS
Various tools are used to construct
the different parts of a compiler.
1) Scanner generators
e.g. Lex, Flex, JLex
These tools generate a scanner (lexical
analyzer) from regular expression specifications.
2) Parser generators
e.g. Yacc, Bison, CUP
These tools produce a parser (syntax
analyzer) from a context-free grammar
(CFG) that describes the syntax of the
source language.