2. Single Pass compiler with example
• What is a Pass?
• What is a Single Pass compiler?
• Difference between Single and Multi Pass Compiler?
• Example
• Issue with Single Pass Compiler
3. The Major “Phases” of a Compiler
• Syntax Analysis: Source Program → Abstract Syntax Tree (or Error Reports)
• Semantic Analysis: Abstract Syntax Tree → Decorated Abstract Syntax Tree (or Error Reports)
• Code Generation: Decorated Abstract Syntax Tree → Object Code
4. Compiler Passes
• A “pass” is a complete traversal of the source program, or a complete
traversal of some internal representation of the source program (such
as an AST).
• A pass can correspond to a “phase” but it does not have to!
• Sometimes a single pass corresponds to several phases that are
interleaved in time.
• Which passes a compiler makes over the source program, and how many,
is an important design decision.
5. Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing, and
generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
• Compiler Driver calls Syntactic Analyzer
• Syntactic Analyzer calls Contextual Analyzer
• Syntactic Analyzer calls Code Generator
6. Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding
phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
• Compiler Driver calls Syntactic Analyzer (input: Source Text, output: AST)
• Compiler Driver calls Contextual Analyzer (input: AST, output: Decorated AST)
• Compiler Driver calls Code Generator (input: Decorated AST, output: Object Code)
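The data flow above can be sketched in Python. This is a minimal illustration, not the design of any particular compiler: the tiny source language (integer sums such as "1 + 2 + 3"), the tuple-based trees, and the function names parse, check and generate are assumptions made for this example. The instruction names LOADL and CALL add follow the assembly code used elsewhere in these slides.

```python
# Hypothetical three-pass pipeline for integer sums such as "1 + 2 + 3".

def parse(source):
    """Pass 1 (syntactic analysis): source text -> AST as nested tuples."""
    tokens = source.split()
    ast = ("int", int(tokens[0]))
    for i in range(1, len(tokens), 2):
        if tokens[i] != "+":
            raise SyntaxError(f"expected '+', got {tokens[i]!r}")
        ast = ("add", ast, ("int", int(tokens[i + 1])))
    return ast

def check(ast):
    """Pass 2 (contextual analysis): AST -> decorated AST (type appended)."""
    if ast[0] == "int":
        return ("int", ast[1], "integer")
    return ("add", check(ast[1]), check(ast[2]), "integer")

def generate(decorated):
    """Pass 3 (code generation): decorated AST -> list of instructions."""
    if decorated[0] == "int":
        return [f"LOADL {decorated[1]}"]
    return generate(decorated[1]) + generate(decorated[2]) + ["CALL add"]

def compile_multi_pass(source):
    # The driver calls each pass in turn and hands over its output.
    return generate(check(parse(source)))

print(compile_multi_pass("1 + 2 + 3"))
```

Each pass here sees a complete data structure produced by the previous one, which is what makes the passes separable.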
7. Example: Single Pass Compilation of ...
Source Program
let var n: integer;
    var c: char
in begin
  c := '&';
  n := n+1
end
Assembly Code
PUSH 2
LOADL 38
STORE 1[SB]
LOAD 0[SB]
LOADL 1
CALL add
STORE 0[SB]
POP 2
HALT
Ident  Type  Address
n      int   0[SB]
c      char  1[SB]
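The example on this slide can be reproduced with a short Python sketch. It assumes that every variable occupies one word and that addresses are handed out in declaration order. It skips parsing entirely: the method calls at the bottom stand in for what a parser would trigger as it reads the source. The class and method names are hypothetical.

```python
class SinglePassCompiler:
    """Hypothetical sketch: each construct is processed as soon as it is seen."""

    def __init__(self):
        self.table = {}   # identifier -> (type, address)
        self.code = []    # instructions emitted so far

    def declare(self, name, type_name):
        # Assumption: one word per variable, allocated in declaration order.
        self.table[name] = (type_name, f"{len(self.table)}[SB]")

    def lookup(self, name):
        if name not in self.table:
            raise NameError(f"undeclared identifier: {name}")
        return self.table[name]

    def assign_literal(self, name, value):
        _, address = self.lookup(name)
        literal = ord(value) if isinstance(value, str) else value
        self.code += [f"LOADL {literal}", f"STORE {address}"]

    def increment(self, name, amount):
        _, address = self.lookup(name)
        self.code += [f"LOAD {address}", f"LOADL {amount}",
                      "CALL add", f"STORE {address}"]

compiler = SinglePassCompiler()
compiler.declare("n", "int")                          # var n: integer
compiler.declare("c", "char")                         # var c: char
compiler.code.append(f"PUSH {len(compiler.table)}")   # reserve space
compiler.assign_literal("c", "&")                     # c := '&'
compiler.increment("n", 1)                            # n := n+1
compiler.code += [f"POP {len(compiler.table)}", "HALT"]
print("\n".join(compiler.code))
```

Because the table is filled in while the source is read, a use of an identifier can only be resolved if its declaration has already been processed.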
8. Compiler Design Issues
                        Single Pass                 Multi Pass
Speed                   better                      worse
Memory                  better for large programs   (potentially) better for small programs
Modularity              worse                       better
Flexibility             worse                       better
“Global” optimization   impossible                  possible
Source Language         single pass compilers are not possible for many programming languages
9. Language Issues
Example Pascal:
Pascal was explicitly designed to be easy to implement with a
single pass compiler:
• Every identifier must be declared before its first use.
Declaration before use is accepted:
var n:integer;
procedure inc;
begin
  n:=n+1
end;
Use before declaration is rejected (Undeclared Variable!):
procedure inc;
begin
  n:=n+1
end;
var n:integer;
10. Language Issues
Example Pascal:
• Every identifier must be declared before it is used.
• How to handle mutual recursion then?
procedure ping(x:integer);
begin
  ... pong(x-1); ...
end;
procedure pong(x:integer);
begin
  ... ping(x-1); ...
end;
11. Language Issues
Example Pascal:
• Every identifier must be declared before it is used.
• How to handle mutual recursion then? Announce pong with a forward
declaration before ping uses it:
procedure pong(x:integer); forward;
procedure ping(x:integer);
begin
  ... pong(x-1); ...
end;
procedure pong(x:integer);
begin
  ... ping(x-1); ...
end;
OK!
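The effect of the declare-before-use rule on a single pass can be sketched as follows. This Python fragment is only an illustration: the event list is a hypothetical stand-in for what a parser would report while reading the source from top to bottom, and the names are invented for the example.

```python
def check_single_pass(events):
    """events: ("declare" | "forward" | "use", name) pairs in source order."""
    declared = set()
    errors = []
    for kind, name in events:
        if kind in ("declare", "forward"):
            declared.add(name)            # name is known from here on
        elif name not in declared:
            errors.append(f"undeclared identifier: {name}")
    return errors

# ping calls pong before pong has been declared.
without_forward = [("declare", "ping"), ("use", "pong"),
                   ("declare", "pong"), ("use", "ping")]
# The same program with a forward declaration of pong in front.
with_forward = [("forward", "pong")] + without_forward

print(check_single_pass(without_forward))
print(check_single_pass(with_forward))
```

The checker never looks ahead, so the forward declaration is the only way the call to pong inside ping can be accepted.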
12. Virtual Machine for Compiler
• What is a Compiler?
• Phases of Compiler
• What is a Virtual Machine?
• Java Virtual Machine
• Structure of JVM
• Loaders
• Linkers
• Assembler
13. What is a Compiler?
“A compiler is a piece of software that translates a source program
from a source language into an equivalent program in a target language.
An important feature of the compiler is that it identifies errors in the
source program during the compilation/translation process.”
15. Lexical Analyzer
• The lexical analyzer reads the stream of characters making up the
source program and groups the characters into meaningful sequences
called lexemes.
• For each lexeme, the lexical analyzer produces as output a token of
the form (token-name, attribute-value).
• It is also called a scanner.
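A minimal scanner along these lines can be written with Python's standard re module. The token names and the small set of lexeme patterns below are assumptions chosen for illustration; a real language would need many more.

```python
import re

# Hypothetical token classes: (token-name, regular expression for its lexemes).
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("id", r"[A-Za-z_]\w*"),
    ("assign", r":="),
    ("op", r"[+\-*/]"),
    ("skip", r"\s+"),
]
PATTERN = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in TOKEN_SPEC))

def tokenize(text):
    """Group characters into lexemes and return (token-name, attribute-value) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = PATTERN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "skip":          # whitespace produces no token
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

print(tokenize("n := n + 1"))
```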
16. Syntax Analyzer
• Analyzes the token stream against the production rules of the
grammar to detect any errors in the code.
• The output of this phase is a parse tree.
• Goal: Recover the structure described by that series of tokens.
• Goal: Report errors if those tokens do not properly encode a structure.
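Both goals can be seen in a small hand-written parser. The sketch below assumes a toy grammar, assignment -> id ':=' operand { op operand }, and a token list of (token-name, attribute-value) pairs. The grammar, the tuple-shaped tree and the function name are assumptions for illustration only.

```python
def parse_assignment(tokens):
    """Recover the tree for: assignment -> id ':=' operand { op operand }."""
    pos = 0

    def expect(kind):
        # Consume one token of the given kind or report a syntax error.
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != kind:
            found = tokens[pos][1] if pos < len(tokens) else "end of input"
            raise SyntaxError(f"expected {kind}, found {found}")
        pos += 1
        return tokens[pos - 1][1]

    def operand():
        if pos < len(tokens) and tokens[pos][0] == "number":
            return ("number", expect("number"))
        return ("id", expect("id"))

    target = expect("id")
    expect("assign")
    tree = operand()
    while pos < len(tokens):
        operator = expect("op")
        tree = (operator, tree, operand())   # operators group to the left
    return (":=", target, tree)

tokens = [("id", "n"), ("assign", ":="), ("id", "n"), ("op", "+"), ("number", "1")]
print(parse_assignment(tokens))
```

A token sequence that does not fit the grammar, such as one ending in an operator, raises SyntaxError instead of producing a tree.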
17. Semantic Analyzer
• The semantic analyzer uses the syntax tree and the information in the
symbol table to check the source program for semantic consistency.
• Functions
  • Scope checking: verify that all applied occurrences of identifiers are
  declared
  • Type checking: verify that all operations in the program are used
  according to their type rules
• Output: a semantically verified parse tree
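The two checks can be sketched over a small tree. In this Python illustration the tree shape ((":=", target, expression) with ("id", name), ("number", value), ("char", value) and (operator, left, right) nodes), the symbol table layout and the rule that operators take int operands are all assumptions made for the example.

```python
def check_assignment(tree, symbol_table):
    """Return the semantic errors found in a (':=', target, expr) tree."""
    errors = []

    def type_of(node):
        kind = node[0]
        if kind == "number":
            return "int"
        if kind == "char":
            return "char"
        if kind == "id":
            if node[1] not in symbol_table:                  # scope checking
                errors.append(f"undeclared identifier: {node[1]}")
                return None
            return symbol_table[node[1]]
        left, right = type_of(node[1]), type_of(node[2])     # operator node
        if None not in (left, right) and (left, right) != ("int", "int"):
            errors.append(f"operator {kind} expects int operands")   # type checking
        return "int"

    _, target, expr = tree
    target_type = type_of(("id", target))
    expr_type = type_of(expr)
    if None not in (target_type, expr_type) and target_type != expr_type:
        errors.append(f"cannot assign {expr_type} to {target_type} variable {target}")
    return errors

table = {"n": "int", "c": "char"}
increment = ("+", ("id", "n"), ("number", 1))
print(check_assignment((":=", "n", increment), table))
print(check_assignment((":=", "c", increment), table))
print(check_assignment((":=", "n", ("id", "m")), table))
```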
18. Intermediate Code Generation
• This is the phase where the compiler produces a low-level intermediate
representation of the source program, which can take a variety of forms:
  • Postfix
  • Syntax Tree
  • Three Address Code
19. Three Address Code
Three address code is a sequence of statements of the general form
a := b op c
where a, b and c are identifier names, op is an operator, and the
symbol “:=” is the assignment operator.
This representation is called three address code because each statement
has three addresses: two for the operands and one for the result.
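As a rough illustration, the Python sketch below flattens a nested expression into statements of this form. It assumes the expression arrives as a tuple (op, left, right) whose leaves are identifier names, and it invents temporaries t1, t2, ... for intermediate results. The function name and the tree shape are hypothetical.

```python
def to_three_address(target, expr):
    """Translate target := expr into a list of three address statements."""
    code = []
    temp_count = 0

    def emit(node):
        # Return the name that holds the value of node, emitting code if needed.
        nonlocal temp_count
        if isinstance(node, str):
            return node                      # a plain identifier needs no code
        op, left, right = node
        a, b = emit(left), emit(right)
        temp_count += 1
        temp = f"t{temp_count}"
        code.append(f"{temp} := {a} {op} {b}")
        return temp

    op, left, right = expr                   # the outermost operation writes to target
    code.append(f"{target} := {emit(left)} {op} {emit(right)}")
    return code

# i := (i + j) + k
for statement in to_three_address("i", ("+", ("+", "i", "j"), "k")):
    print(statement)
```

Every emitted statement has at most one operator, so each fits the a := b op c pattern.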
20. Quadruples
• Three address code with four fields
  • Op (operator)
  • Arg1 (argument 1)
  • Arg2 (argument 2)
  • Result
• Quadruple representation of
t1 := i + j
i := t1 + k
     Op  Arg1  Arg2  Result
(0)  +   i     j     t1
(1)  +   t1    k     i
21. Triples
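• Three address code with three fields; a result is referred to by the
number of the triple that computes it
  • Op (operator)
  • Arg1 (argument 1)
  • Arg2 (argument 2)
• Triple representation of the same two statements
     Op  Arg1  Arg2
(0)  +   i     j
(1)  +   (0)   k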
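The relationship between the two representations can be sketched in Python. The Quad record and the to_triples function are hypothetical names, and the conversion assumes that each result name is assigned only once. As in the table above, the final store into i is not represented in the triples.

```python
from collections import namedtuple

# Hypothetical record type for one quadruple.
Quad = namedtuple("Quad", "op arg1 arg2 result")

quads = [
    Quad("+", "i", "j", "t1"),   # t1 := i + j
    Quad("+", "t1", "k", "i"),   # i := t1 + k
]

def to_triples(quads):
    """Drop the result field and refer to earlier results by position."""
    position = {}    # result name -> index of the triple that computes it
    triples = []
    for index, quad in enumerate(quads):
        args = [f"({position[a]})" if a in position else a
                for a in (quad.arg1, quad.arg2)]
        triples.append((quad.op, args[0], args[1]))
        position[quad.result] = index
    return triples

for index, triple in enumerate(to_triples(quads)):
    print(f"({index})", *triple)
```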
22. Code Optimization
• It is a program transformation technique that tries to improve the
code by making it consume fewer resources (e.g., CPU, memory) and
run faster.
• Optimization depends on various factors, such as
  • Memory
  • Algorithm
  • Execution time
  • Programming language
  • Others
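One simple transformation of this kind is constant folding, chosen here only as an example since the slide does not name a specific technique. The Python sketch below works on quadruples written as (op, arg1, arg2, result) tuples, where integers are constants and strings are names. It assumes that a folded result is a temporary used only later in the same list.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_constants(quads):
    """Evaluate operations on constants at compile time and drop their quadruples."""
    known = {}        # name -> constant value computed at compile time
    optimized = []
    for op, arg1, arg2, result in quads:
        arg1 = known.get(arg1, arg1)     # substitute already known constants
        arg2 = known.get(arg2, arg2)
        if isinstance(arg1, int) and isinstance(arg2, int):
            known[result] = OPS[op](arg1, arg2)
        else:
            known.pop(result, None)      # result is no longer a known constant
            optimized.append((op, arg1, arg2, result))
    return optimized

# t1 := 2 * 3 ; y := x + t1
print(fold_constants([("*", 2, 3, "t1"), ("+", "x", "t1", "y")]))
```

The multiplication disappears from the output because its value is computed once by the compiler instead of on every run.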
23. Code Generation
• The code generator takes as input an intermediate representation of
the source program and maps it into the target language.
• The target program may be in
• Assembly Language
• Relocatable machine code or
• Absolute machine code
24. What is a Virtual Machine?
• “A virtual machine (VM) is a software implementation of a machine
that executes programs like a physical machine. It shares physical
hardware resources with the other users but isolates the OS or
application to avoid changing the end-user experience.”
25. Why use Virtual Machines?
• Used as the target of compilers for languages like Java and C#
• To make a language platform independent
• Security (sandboxing)
• Strong syntactic and structural constraints
• Garbage collection
• Robustness
• Others
28. Class Loader Subsystem
Loading is the part responsible for bringing bytecode into memory. Classes are loaded from different
sources by different class loaders (the rt.jar and jre/lib/ext locations apply to Java 8 and earlier):
• Bootstrap class loader: loads the internal Java classes from rt.jar, which is distributed with the JVM.
• Extension class loader: loads additional application jars that reside in jre/lib/ext.
• Application class loader: loads classes from the locations specified in the CLASSPATH environment
variable or with the -cp option.
29. Class Loader Subsystem (cont.)
• Link is the phase where much of the work is done. It consists of three
parts:
  • Verify: the bytecode is verified against the JVM class file
  specification.
  • Prepare: memory is allocated for the static variables of the class, and
  the memory locations are then initialized with default values.
  • Resolve: the symbolic references in the current class (for example, a
  reference to another class) are replaced with actual references.
• Initialization is the phase where the static variables are set to the
values defined in the source code, unlike Prepare, where only the
default values are set.
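The difference between Prepare and Initialization can be mimicked in a few lines of Python. This is only a model of the idea, not JVM code: the field descriptions, the function names and the dictionary used as storage are assumptions, while the default values (0, false, 0.0, null) are those Java gives to static fields.

```python
# Default values by field type, written with their Python equivalents.
DEFAULTS = {"int": 0, "boolean": False, "double": 0.0, "reference": None}

def prepare(static_fields):
    """Allocate storage for every static field and fill it with the default value."""
    return {name: DEFAULTS[type_name] for name, (type_name, _) in static_fields.items()}

def initialize(static_fields, storage):
    """Overwrite defaults with the values written in the source code, if any."""
    for name, (_, initializer) in static_fields.items():
        if initializer is not None:
            storage[name] = initializer
    return storage

# Models: static int count = 42; static Object cache; (no initializer)
fields = {"count": ("int", 42), "cache": ("reference", None)}
storage = prepare(fields)
print(storage)
print(initialize(fields, storage))
```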
30. Runtime Data Areas
• Method Area: the place where the metadata of each class is stored.
• Heap: the place where object data is stored.
• PC Registers: program counter registers. Each thread has one, holding
the address of the instruction it is executing.
• Stack Area: contains, per thread, the stack frames of the methods
currently being executed.
• Native Method Stacks: contain, per thread, the stack used for native
method execution. The stack holds memory for the parts of a function
call, such as parameters and local variables.
32. Assembler
• An assembler is a program that converts assembly language into machine
code. It translates in much the same way as a compiler, but an
assembler translates a low-level programming language, whereas a
compiler translates a high-level programming language.
• An assembler performs the following functions
  • Convert mnemonic operation codes to their machine language codes
  • Convert symbolic operands (e.g., jump labels, variable names) to their
  machine addresses
  • Use proper addressing modes and formats to build efficient machine
  instructions
  • Translate data constants into internal machine representations
  • Output the object program and provide other information (e.g., for the
  linker and loader)
33. Two Pass Assembler
• Pass 1
• Assign addresses to all statements in the program
• Save the values (addresses) assigned to all labels (including label and
variable names) for use in Pass 2 (deal with forward references)
• Perform some processing of assembler directives (e.g., BYTE, RESW,
these can affect address assignment)
34. Two Pass Assembler (cont.)
• Pass 2
• Assemble instructions (generate opcode and look up addresses)
• Generate data values defined by BYTE, WORD
• Perform processing of assembler directives not done in Pass 1
• Write the object program and the assembly listing
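The two passes can be sketched in Python. This is a deliberately reduced model: each statement is assumed to occupy exactly one address, opcodes are left as mnemonics instead of being encoded, and the mnemonics in the example program are made up for illustration.

```python
def assemble(lines):
    """lines: (label or None, mnemonic, operand or None) tuples in program order."""
    # Pass 1: assign an address to every statement and save the label addresses.
    symbols = {}
    for address, (label, _, _) in enumerate(lines):
        if label is not None:
            symbols[label] = address

    # Pass 2: look up symbolic operands; forward references are known by now.
    program = []
    for _, mnemonic, operand in lines:
        if operand in symbols:
            operand = symbols[operand]
        program.append((mnemonic, operand))
    return symbols, program

source = [
    (None, "JUMP", "end"),     # forward reference to a label defined later
    ("loop", "LOAD", "x"),
    (None, "JUMP", "loop"),
    ("end", "HALT", None),
    ("x", "WORD", 0),
]
symbols, program = assemble(source)
print(symbols)
print(program)
```

The first JUMP refers to a label that appears later in the program, which is why the addresses have to be collected before any operand is translated.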
35. Linker
• A programming tool which combines one or more partial Object Files
and libraries into a (more) complete executable object file
36. Linker (cont.)
• Three tasks
• Searches the program to find library routines used by program, e.g. printf(),
math routines,…
• Determines the memory locations that code from each module will occupy
and relocates its instructions by adjusting absolute references
• Resolves references among files
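The placement and resolution tasks can be sketched in Python. The module description used here (a size, the symbols a module defines with their offsets, and the symbols it uses) is a hypothetical simplification of an object file, and modules are simply placed one after another starting at address 0.

```python
def link(modules):
    """modules: dicts with 'size', 'defines' (symbol -> offset) and 'uses' (symbols)."""
    base = 0
    addresses = {}
    for module in modules:
        for symbol, offset in module["defines"].items():
            addresses[symbol] = base + offset    # relocate to an absolute address
        base += module["size"]                   # the next module starts after this one
    used = {symbol for module in modules for symbol in module["uses"]}
    unresolved = sorted(used - set(addresses))   # candidates for a library search
    return addresses, unresolved

main_module = {"size": 10, "defines": {"main": 0}, "uses": ["printf", "helper"]}
util_module = {"size": 5, "defines": {"helper": 2}, "uses": []}

addresses, unresolved = link([main_module, util_module])
print(addresses)
print(unresolved)
```

Symbols that remain unresolved, such as printf here, are the ones a linker would go on to look for in libraries.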
37. Loader
“Part of the OS that brings an executable file residing on disk into
memory and starts it running.”
• Steps
  • Reads the executable file’s header to determine the size of the text and data segments
  • Creates a new address space for the program
  • Copies instructions and data into the address space
  • Copies arguments passed to the program onto the stack
  • Initializes the machine registers, including the stack pointer
  • Jumps to a startup routine that copies the program’s arguments from the
  stack to registers and calls the program’s main routine