3. Objectives
Lahore Garrison University
▶ In this course we will study compilers that translate a program written in a high-level source language into lower-level assembly code. We will study the theories and algorithms that can be applied to any language.
▶ The course is divided into 3 sections:
▶ Front End (deals with the syntactic and semantic analysis of the source language)
▶ Middle End (deals with the intermediate representation into which the source code is translated, and with its analysis and optimization)
▶ Back End or Code Generator (generates the machine code)
4. Today’s Lecture
▶ In this lesson we will try to understand the overall working of a compiler in terms of its various phases and their interaction.
▶ In particular, we will understand:
▶ What a compiler is
▶ The two main parts of a compiler
▶ How it works
▶ Programs that help the compiler
▶ Phases of a compiler
This is done through a simple language and its processing, based on its grammar and lexical specification.
8. ▶ A compiler is a program that reads a program written in a high-level language and translates it into an equivalent program in another language.
▶ Input language: Source Program
▶ Output language: Target Program
▶ The compiler also reports errors present in the source program as part of its translation.
10. Single Pass and Multi Pass Compilers
Single pass compiler
▶ It compiles the whole program in only one pass over the source code.
Multi pass compiler
▶ It processes the source code of a program multiple times (in several passes).
11. Two main parts of a compiler
Analysis
▶ It takes the input source program, breaks it into parts, and then creates an intermediate code representation of the source program
Synthesis
▶ It takes the intermediate code as input and creates the target program
12. Programs that help the compiler
▶ Preprocessor: accepts the source code as input and is responsible for
▶ Macro expansion (e.g. a macro such as limit, defined in a single line of code, is expanded wherever it is used)
▶ File inclusion (e.g. #include<stdio.h>)
▶ Assembler
▶ It takes the assembly code generated by the compiler and converts it into machine code
▶ Linker and Loader
▶ The linker combines the object code with libraries and other linked files and creates an executable file
▶ The loader loads the executable program into memory
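To make macro expansion concrete, here is a minimal Python sketch of a preprocessor pass. It assumes only object-like macros of the form #define NAME text. The function name expand_macros and the value 100 for limit are hypothetical. File inclusion, function-like macros, string literals and comments are not handled.

```python
import re

def expand_macros(source):
    """Expand object-like macros defined with '#define NAME text' (a sketch only)."""
    macros = {}          # macro name -> replacement text
    output = []
    for line in source.splitlines():
        directive = re.match(r"\s*#define\s+([A-Za-z_]\w*)\s+(.*)", line)
        if directive:
            macros[directive.group(1)] = directive.group(2).strip()
            continue     # the directive line itself does not reach the compiler
        for name, body in macros.items():
            # \b keeps 'limit' from matching inside a longer word such as 'limitless'
            line = re.sub(rf"\b{re.escape(name)}\b", lambda _m, b=body: b, line)
        output.append(line)
    return "\n".join(output)

print(expand_macros("#define limit 100\nif (count < limit) count = count + 1;"))
```

The replacement is passed as a function so that backslashes in a macro body are copied literally instead of being read as regex escapes.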
14. Phases of Compiler
Source Program (a stream of characters)
▶ Lexical Analysis (Scanner): stream of characters → tokens
▶ Syntax Analysis (Parser): tokens → parse tree
▶ Semantic Analysis (logical / type errors): uses syntax-directed translation (SDT), i.e. actions applied along with the grammar rules
▶ Intermediate Code Generator: produces three-address code
▶ Code Optimization (e.g. reducing the number of lines, optimizing loops)
▶ Target Code Generator: produces the Target Program
▶ The Symbol Table Manager and the Error Handler interact with all of these phases
15. Lexical Analysis
▶ The lexical analyzer generates tokens
▶ It scans the program character by character
▶ It identifies operators, identifiers, and keywords
▶ It groups the characters together to create tokens
▶ It also removes spaces (spaces are used only for readability)
▶ Example: x = a + b * 20
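A small regular-expression scanner can produce the tokens for x = a + b * 20. The Python sketch below is an illustration, not a production lexer. The token names (ID, NUMBER, ASSIGN, OP, KEYWORD) and the keyword set are assumptions chosen for this example.

```python
import re

KEYWORDS = {"int", "float", "if", "else", "while", "return"}   # assumed keyword set

TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("ASSIGN", r"="),
    ("OP", r"[+\-*/]"),
    ("SKIP", r"[ \t\n]+"),        # white space is matched and then discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)          # try the patterns at the current position
        if m is None:
            raise SyntaxError(f"illegal character {text[pos]!r} at position {pos}")
        kind, lexeme = m.lastgroup, m.group()
        if kind == "ID" and lexeme in KEYWORDS:
            kind = "KEYWORD"                 # keywords look like identifiers
        if kind != "SKIP":
            tokens.append((kind, lexeme))
        pos = m.end()
    return tokens

print(tokenize("x=a +b *20"))
```

Because white space matches SKIP and is dropped, "x=a +b *20" and "x = a + b * 20" give the same token stream.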
16. Syntax Analysis
▶ The parser generates a parse tree
▶ It checks whether the structure of the program follows the rules of the language
▶ It checks the syntax using the grammar of the language
▶ Example: x = a + b * 20
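A recursive-descent parser is one way to check x = a + b * 20 against a grammar. This Python sketch assumes a small grammar:

- stmt → id = expr
- expr → term { (+|-) term }
- term → factor { (*|/) factor }
- factor → id | number

It builds a simplified tree of (operator, left, right) tuples, which is closer to a syntax tree than to a full parse tree. The function name parse_assignment is hypothetical.

```python
import re

def parse_assignment(text):
    """Parse 'id = expr' and return a nested (operator, left, right) tree."""
    tokens = re.findall(r"\d+|[A-Za-z_]\w*|\S", text)   # crude tokenizer for the sketch
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok=None):
        nonlocal pos
        cur = peek()
        if cur is None or (tok is not None and cur != tok):
            raise SyntaxError(f"expected {tok or 'a token'}, found {cur}")
        pos += 1
        return cur

    def is_operand(tok):
        return tok.isdigit() or tok[0].isalpha() or tok[0] == "_"

    def factor():                       # factor -> id | number
        tok = expect()
        if not is_operand(tok):
            raise SyntaxError(f"unexpected token {tok}")
        return tok

    def term():                         # term -> factor { (*|/) factor }
        node = factor()
        while peek() in ("*", "/"):
            op = expect()
            node = (op, node, factor())
        return node

    def expr():                         # expr -> term { (+|-) term }
        node = term()
        while peek() in ("+", "-"):
            op = expect()
            node = (op, node, term())
        return node

    target = expect()                   # stmt -> id = expr
    if not is_operand(target) or target.isdigit():
        raise SyntaxError(f"expected an identifier, found {target}")
    expect("=")
    tree = ("=", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()}")
    return tree

print(parse_assignment("x=a +b *20"))
```

Since term sits below expr in the grammar, * binds tighter than +. That is why b * 20 appears as a subtree under +.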
18. Intermediate Code Generator
▶ It generates three-address code: each instruction has at most three addresses, e.g. t1 = b * 20
▶ An address can be a memory location or a register
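Three-address code can be produced by a post-order walk of a tree like the one for x = a + b * 20, allocating a new temporary for every operator. This Python sketch assumes the tree is made of (operator, left, right) tuples, with identifiers and constants stored as strings. The temporary names t1, t2, ... and the function name gen_tac are assumptions.

```python
from itertools import count

def gen_tac(tree):
    """Emit three-address code for a tree of (operator, left, right) tuples."""
    code = []
    temps = count(1)                      # supplies 1, 2, 3, ... for t1, t2, t3, ...

    def walk(node):
        if isinstance(node, str):         # identifier or constant: already an address
            return node
        op, left, right = node
        if op == "=":
            code.append(f"{left} = {walk(right)}")
            return left
        l_addr = walk(left)               # operands are evaluated first (post-order)
        r_addr = walk(right)
        temp = f"t{next(temps)}"
        code.append(f"{temp} = {l_addr} {op} {r_addr}")
        return temp

    walk(tree)
    return code

for line in gen_tac(("=", "x", ("+", "a", ("*", "b", "20")))):
    print(line)
```

For x = a + b * 20 it emits t1 = b * 20, t2 = a + t1 and x = t2. Each instruction has at most one operator on its right-hand side.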
19. Symbol table
▶ A symbol table is a data structure that stores the various identifiers/tokens of a program and their attributes.
▶ e.g. int a;
▶ a is a variable of integer data type.
21. Lexical Analysis
▶ Lexical analysis converts the source program into a stream of valid words of the language, known as tokens
▶ It is also known as scanning
▶ Lexical analysis has two parts:
▶ Scanning: reads the program character by character
▶ Lexical analysis: generates tokens
22. Basic functions of Lexical Analysis
1. Reads the input program character by character and produces a stream of tokens, which is used by the parser
2. Removes comments from the source program
3. Removes whitespace characters (blank spaces, tabs, newlines)
4. Handles preprocessor directives (#include, #define)
5. Displays errors (if present) in the source program along with their line numbers
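The Python sketch below covers functions 1, 2, 3 and 5. It drops comments and white space, tracks line numbers, and reports an illegal character together with its line number. It assumes C-style comments and a small hypothetical token set. Preprocessor directives (function 4) are not handled, and keywords are reported as ID.

```python
import re

TOKEN_RE = re.compile(
    r"(?P<COMMENT>//[^\n]*|/\*.*?\*/)"    # C-style comments (assumed syntax)
    r"|(?P<NEWLINE>\n)"
    r"|(?P<SPACE>[ \t]+)"
    r"|(?P<NUMBER>\d+)"
    r"|(?P<ID>[A-Za-z_]\w*)"
    r"|(?P<PUNCT>[=+\-*/;,(){}])",
    re.DOTALL,                              # lets /* ... */ span several lines
)

def scan(source):
    """Return (kind, lexeme, line) tuples; comments and white space are dropped."""
    tokens, line, pos = [], 1, 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"line {line}: illegal character {source[pos]!r}")
        kind, text = m.lastgroup, m.group()
        if kind in ("COMMENT", "NEWLINE"):
            line += text.count("\n")        # keep line numbers right for error messages
        elif kind != "SPACE":
            tokens.append((kind, text, line))
        pos = m.end()
    return tokens

print(scan("int a; /* first\nline */ a = 2;"))
```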
24. ▶ A token is a valid word of the language
▶ It may be a keyword, operator, identifier or any punctuation character
25. Token, Pattern, Lexeme
▶ Tokens are the terminal symbols of the source language (identifiers, operators, punctuation symbols)
▶ A pattern is a rule that describes the lexemes of a particular token in the source language
▶ e.g. id: starts with a letter or underscore, followed by any number of alphanumeric characters
▶ A lexeme is a sequence of characters in the source program that matches the pattern of a token; it is a specific instance of that token
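A pattern can be written as a regular expression. A lexeme belongs to the token when the whole lexeme matches the pattern. This Python sketch assumes C-style identifiers, where underscores are also allowed after the first character. The helper is_id_lexeme is hypothetical.

```python
import re

# Assumed pattern for the token id (C-style: underscores allowed after the first character too)
ID_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def is_id_lexeme(lexeme):
    """A lexeme belongs to the token id only if the whole lexeme matches the pattern."""
    return ID_PATTERN.fullmatch(lexeme) is not None

for lexeme in ["count", "_temp1", "2fast", "x+y"]:
    print(lexeme, is_id_lexeme(lexeme))
```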
26. Example
▶ count = count + temp;
▶ Here count and temp are identifiers, = and + are operators, and ; is punctuation
▶ 31 + 28 - 59
▶ Here 31, 28 and 59 match the pattern for numbers, [0-9]+, and + and - are operators
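Both examples can be tokenized by trying one pattern per token class. In this Python sketch the class names and the pattern set are assumptions. Note that re.finditer silently skips characters that match no pattern. The sketch is therefore only suitable for inputs already known to be lexically valid.

```python
import re

TOKEN_CLASSES = [
    ("number", r"[0-9]+"),
    ("id", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("operator", r"[=+\-*/]"),
    ("punctuation", r"[;,(){}]"),
    ("space", r"\s+"),
]
LEXER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_CLASSES))

def classify(text):
    """Return (token class, lexeme) pairs; characters matching no pattern are skipped."""
    return [(m.lastgroup, m.group()) for m in LEXER.finditer(text) if m.lastgroup != "space"]

print(classify("count = count + temp;"))
print(classify("31 + 28 - 59"))
```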
27. Lexical Errors
▶ A lexical analyzer cannot proceed if none of the rules/patterns matches a prefix of the remaining input; this is a lexical error.
28. Attribute for a token
▶ When a lexeme is encountered in a program, it is necessary to keep track of other occurrences of the same lexeme (i.e. whether the lexeme has been seen before or not).
▶ Tracking operators is not necessary
▶ Tracking identifiers is necessary
▶ Identifier lexemes are stored in the symbol table
▶ The pointer to the symbol table entry becomes an attribute of that particular token
▶ e.g. in E = m * c ^ 2, the identifier E is passed to the parser as <id, pointer>, where the pointer indicates the entry of the lexeme in the symbol table
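The idea of a token attribute can be sketched in Python by returning each identifier as <id, index>. The index into a list plays the role of the symbol table pointer. The function name is hypothetical, and using a constant's numeric value as its attribute is an assumption. Operators get no attribute.

```python
import re

def tokenize_with_symtab(text):
    """Return (tokens, symtab); an identifier token is ('id', index into symtab)."""
    symtab = []        # identifier lexemes; the list index acts as the "pointer"
    index_of = {}      # lexeme -> index, so a repeated lexeme reuses its entry
    tokens = []
    for lexeme in re.findall(r"[A-Za-z_]\w*|\d+|\S", text):
        if lexeme[0].isalpha() or lexeme[0] == "_":
            if lexeme not in index_of:             # first occurrence: new entry
                index_of[lexeme] = len(symtab)
                symtab.append(lexeme)
            tokens.append(("id", index_of[lexeme]))
        elif lexeme.isdigit():
            tokens.append(("num", int(lexeme)))    # the value serves as the attribute
        else:
            tokens.append((lexeme, None))          # operators need no attribute
    return tokens, symtab

tokens, symtab = tokenize_with_symtab("E = m * c ^ 2")
print(tokens)
print(symtab)
```

For E = m * c ^ 2 the table holds E, m and c at indices 0, 1 and 2. A repeated lexeme, such as count in count = count + temp, reuses the same entry.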
29. The process followed by the lexical analyzer
▶ Take the source program as input
▶ Scan it character by character
▶ Group these characters into lexemes
▶ Pass the tokens and their attributes to the parser
30. Look ahead
▶ a >= b
▶ Reading the character after the current lexeme is necessary to identify the correct lexeme (e.g. after reading >, the next character decides whether the lexeme is > or >=)
▶ Any extra character that has been read is then pushed back to the input
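Lookahead and pushback can be sketched with a character stream that supports retracting one character. This Python example assumes the operator set <, <=, >, >= and =. CharStream and scan_relop are hypothetical names.

```python
class CharStream:
    """Character source with one-character pushback (retract)."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def next_char(self):
        ch = self.text[self.pos] if self.pos < len(self.text) else ""   # "" means end of input
        self.pos += 1
        return ch

    def retract(self):
        self.pos -= 1                         # push the last character back

def scan_relop(stream):
    """Recognize <, <=, >, >= or = at the current position of the stream."""
    first = stream.next_char()
    if first not in ("<", ">", "="):
        raise ValueError(f"no relational operator at position {stream.pos - 1}")
    second = stream.next_char()               # look ahead one character
    if first in ("<", ">") and second == "=":
        return first + "="                    # the lookahead is part of the lexeme
    stream.retract()                          # it is not: push it back
    return first

s = CharStream(">= b")
print(scan_relop(s), s.pos)   # >= 2
```

The membership test uses a tuple rather than the string "<>=". That matters because the empty string (end of input) counts as a substring of every string in Python.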
31. Input Buffer
▶ The lexical analyzer scans the input from left to right, one character at a time. It uses two pointers, the begin pointer (bp) and the forward pointer (fp), to keep track of the portion of the input scanned.
34. ▶ The forward pointer (fp) moves ahead to search for the end of the lexeme. As soon as a blank space is encountered, it indicates the end of the lexeme. In the example above, as soon as fp encounters a blank space, the lexeme “int” is identified.
▶ When fp encounters white space, it ignores it and moves ahead. Then both the begin pointer (bp) and the forward pointer (fp) are set at the next token.
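The movement of bp and fp can be sketched over a single in-memory string, assuming lexemes are separated by blanks as described above. This Python function is illustrative only. The name split_lexemes is hypothetical, and buffer refilling is ignored.

```python
def split_lexemes(buffer):
    """Split blank-separated lexemes using a begin pointer bp and a forward pointer fp."""
    lexemes = []
    bp = 0
    while bp < len(buffer):
        if buffer[bp].isspace():          # white space is skipped, not part of a lexeme
            bp += 1
            continue
        fp = bp                           # both pointers start at the new lexeme
        while fp < len(buffer) and not buffer[fp].isspace():
            fp += 1                       # fp moves ahead to find the end of the lexeme
        lexemes.append(buffer[bp:fp])     # the lexeme lies between bp and fp
        bp = fp                           # continue from where fp stopped
    return lexemes

print(split_lexemes("int i , j ;"))
```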
36. ▶ One Buffer Scheme:
In this scheme, only one buffer is used to store the input string. The problem with this scheme is that a very long lexeme can cross the buffer boundary. To scan the rest of the lexeme the buffer has to be refilled, which overwrites the first part of the lexeme.
38. ▶ Initially both bp and fp point to the first character of the first buffer. Then fp moves towards the right in search of the end of the lexeme. As soon as a blank character is recognized, the string between bp and fp is identified as the corresponding token. To identify the boundary of the first buffer, an end-of-buffer character should be placed at the end of the first buffer.
▶ Similarly, the end of the second buffer is recognized by the end-of-buffer mark present at the end of the second buffer. When fp encounters the first eof, the end of the first buffer is recognized and filling of the second buffer is started. In the same way, the second eof indicates the end of the second buffer. The two buffers are filled alternately until the end of the input program, and the stream of tokens is identified. The eof character introduced at the end of each buffer is called a sentinel and is used to identify the end of the buffer.
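Here is a minimal Python sketch of the two-buffer scheme with sentinels. It rests on these assumptions:

- Each half holds n characters.
- "\0" is the sentinel, so the input must not contain "\0".
- A sentinel that is not at the end of a half marks the real end of input.

For simplicity it copies characters into the current lexeme instead of extracting the text between bp and fp. The class and function names are hypothetical.

```python
SENTINEL = "\0"   # assumed end-of-buffer marker; the input must not contain "\0"

class TwoBufferReader:
    """Input split into two halves of n characters, each followed by a sentinel."""

    def __init__(self, text, n=4):
        self.text, self.n = text, n
        self.loaded = 0                            # characters of text read so far
        self.buf = [SENTINEL] * (2 * (n + 1))      # [first half | s | second half | s]
        self._fill(0)                              # fill the first half
        self.fp = 0                                # forward pointer

    def _fill(self, start):
        chunk = self.text[self.loaded:self.loaded + self.n]
        self.loaded += len(chunk)
        for i in range(self.n):
            self.buf[start + i] = chunk[i] if i < len(chunk) else SENTINEL
        self.buf[start + self.n] = SENTINEL        # sentinel closes this half

    def next_char(self):
        """Return the next character, or "" at the real end of input."""
        if self.buf[self.fp] == SENTINEL:
            if self.fp == self.n:                  # end of first half: fill the second
                self._fill(self.n + 1)
                self.fp = self.n + 1
            elif self.fp == 2 * self.n + 1:        # end of second half: fill the first
                self._fill(0)
                self.fp = 0
            else:                                  # sentinel inside a half: end of input
                return ""
            if self.buf[self.fp] == SENTINEL:      # the refilled half is empty
                return ""
        ch = self.buf[self.fp]
        self.fp += 1
        return ch

def lexemes(text, n=4):
    """Collect blank-separated lexemes read through the two-buffer scheme."""
    reader, result, current = TwoBufferReader(text, n), [], []
    while True:
        ch = reader.next_char()
        if ch == "" or ch.isspace():
            if current:                            # a blank or end of input ends a lexeme
                result.append("".join(current))
                current = []
            if ch == "":
                return result
        else:
            current.append(ch)

print(lexemes("calculate sum"))
```

The common case needs only one comparison per character (is this a sentinel?). The position checks run only when a sentinel is actually found.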
39. Symbol Table
▶ It is a data structure that the compiler builds in memory for the program being translated. A symbol table is commonly implemented as a hash table.
40. Symbol table
▶ An essential function of a compiler is to record the variable names used in the source program and collect information about the various attributes of each name.
▶ A symbol table is a data structure containing all the identifiers (names of variables, procedures, etc.) of a source program together with all the attributes of each identifier.
▶ The symbol table stores information about the entire source program and is used by all phases of the compiler.
41. ▶ The analysis part also collects information about the source program and stores it in a data structure called a symbol table, which is passed along with the intermediate representation to the synthesis part.
▶ The symbol table is an important data structure created and maintained by compilers in order to store information about the occurrence of various entities such as variable names, function names, objects, classes, interfaces, etc.
42. Purpose of Symbol Table
▶ To store the names of all entities in a structured form at one place.
▶ To provide quick and uniform access to identifier attributes throughout the compilation process.
▶ To verify whether a variable has been declared.
▶ To implement type checking, by verifying that assignments and expressions in the source code are semantically correct.
▶ To determine the scope of a name (scope resolution).
43. Interaction between the symbol table and the phases of a compiler
▶ Virtually every phase of the compiler uses the symbol table.
▶ The initialization phase places keywords, operators, and standard identifiers in it.
▶ The scanner places user-defined identifiers and literals in it and returns the corresponding tokens.
▶ The parser uses these tokens to create the parse tree, the product of the syntactic analysis of the program.
▶ The semantic action routines place data types in its entries and use this information to perform basic type checking.
▶ The intermediate code generation phase uses pointers to entries in the symbol table in creating the intermediate representation of the program.
▶ The object code generation phase uses pointers to entries in the symbol table in allocating storage for variables and constants, as well as to store the addresses of procedures and functions.
45. A symbol table may serve the following purposes
▶ The symbol table is used in both analysis and synthesis
▶ Information is stored in the symbol table by the analysis phases:
▶ The scanner enters an identifier into the symbol table
▶ The parser / semantic analyzer enters the corresponding attributes
▶ It helps to determine the scope of an identifier
46. Continued …
▶ It helps to determine whether a variable is already defined or not
▶ To add a new name to the table
▶ To delete a name from the table
▶ To access the information associated with a given name
▶ Type checking for semantic correctness
▶ For procedures/functions, it also stores the number of arguments and their types
47. Why we need a Symbol table
▶ Type checking
▶ Verifying declarations of variables (a variable must be declared before its use)
▶ Example:
▶ int b;
▶ b=2+3;
▶ Sum=b+2;
▶ Here b is declared, so b=2+3; is valid, but Sum is never declared, so Sum=b+2; is reported as an error.
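The declaration check in this example can be sketched in Python with a dictionary as the symbol table. The mini-language is an assumption: only int x; declarations and assignment statements. The function name check_declarations is hypothetical.

```python
import re

def check_declarations(statements):
    """Return error messages for identifiers used before they are declared."""
    symtab = {}                                     # name -> type
    errors = []
    for stmt in statements:
        decl = re.fullmatch(r"\s*int\s+([A-Za-z_]\w*)\s*;\s*", stmt)
        if decl:                                    # declaration: insert the name
            symtab[decl.group(1)] = "int"
            continue
        for name in re.findall(r"[A-Za-z_]\w*", stmt):
            if name not in symtab:                  # lookup failed: undeclared
                errors.append(f"'{name}' used but not declared")
    return errors

print(check_declarations(["int b;", "b=2+3;", "Sum=b+2;"]))
```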
48. Operations of Symbol Table
▶ Insert(name, type)
▶ int b;
▶ Insert(b, int)
▶ Lookup(name)
▶ It searches for the name in the symbol table
▶ If found, it returns the attributes of the identifier
▶ If not found, it returns 0
▶ Delete operation
▶ Modify operation
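The four operations can be sketched as a small Python class backed by a dictionary, which is a hash table. Returning 0 from lookup follows the slide. Raising KeyError on redeclaration, or on modifying an unknown name, is an assumption. The class and method names are also assumptions.

```python
class SymbolTable:
    """Symbol table with Insert, Lookup, Delete and Modify operations (a sketch)."""

    def __init__(self):
        self._entries = {}                    # name -> dict of attributes

    def insert(self, name, type_):
        if name in self._entries:
            raise KeyError(f"'{name}' is already declared")
        self._entries[name] = {"type": type_}

    def lookup(self, name):
        return self._entries.get(name, 0)     # 0 signals "not found", as on the slide

    def delete(self, name):
        self._entries.pop(name, None)

    def modify(self, name, **attributes):
        if name not in self._entries:
            raise KeyError(f"'{name}' is not declared")
        self._entries[name].update(attributes)

table = SymbolTable()
table.insert("b", "int")       # int b;  ->  Insert(b, int)
print(table.lookup("b"))
print(table.lookup("Sum"))     # Sum was never inserted
```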
49. Data Structures used for Symbol table
▶ Linear structures (sorted / unsorted)
▶ Binary trees
▶ Hash table
▶ The linear approach is the simplest, as it stores names in their order of arrival; however, looking up a name in an unsorted list requires a linear search.
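To contrast the two linear structures, here is a Python sketch of each:

- An unsorted table keeps entries in arrival order and uses linear search.
- A sorted table uses binary search through the standard bisect module.

The class names are hypothetical. Python's dict is itself a hash table, so it could serve as the hash-table option.

```python
import bisect

class LinearTable:
    """Unsorted list: names stored in order of arrival."""

    def __init__(self):
        self.entries = []                       # (name, type) pairs

    def insert(self, name, type_):
        self.entries.append((name, type_))      # constant time: append at the end

    def lookup(self, name):
        for entry_name, type_ in self.entries:  # linear search
            if entry_name == name:
                return type_
        return 0

class SortedTable:
    """Sorted list: binary search on lookup, but elements shift on insert."""

    def __init__(self):
        self.names, self.types = [], []

    def insert(self, name, type_):
        i = bisect.bisect_left(self.names, name)
        self.names.insert(i, name)
        self.types.insert(i, type_)

    def lookup(self, name):
        i = bisect.bisect_left(self.names, name)   # binary search
        if i < len(self.names) and self.names[i] == name:
            return self.types[i]
        return 0

lin, srt = LinearTable(), SortedTable()
for name in ["calculate", "sum", "a", "b"]:
    lin.insert(name, "int")
    srt.insert(name, "int")
print([n for n, _ in lin.entries])
print(srt.names)
```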
50. Symbol table representation
▶ int calculate;
▶ int sum;
▶ int a,b;
Fixed Length (each name is stored in a fixed-size field)
Name                 Type
c a l c u l a t e    int
s u m                int
a                    int
b                    int
Variable Length (all names are stored in one character array, each followed by $)
c a l c u l a t e $ s u m $ a $ b $    (array positions 0 to 17)
Starting Index   Length (including $)   Type
0                10                     int
10               4                      int
14               2                      int
16               2                      int
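The variable-length layout can be reproduced in Python. Names are packed into one string, each followed by $. Each row records the starting index, the length including $, and the type. The function names are hypothetical.

```python
def build_name_array(names, types, sep="$"):
    """Pack names into one string; return (array, rows of (start, length, type))."""
    array = ""
    rows = []
    for name, type_ in zip(names, types):
        rows.append((len(array), len(name) + len(sep), type_))   # length includes sep
        array += name + sep
    return array, rows

def name_at(array, start, length, sep="$"):
    """Recover a name from its row (drop the trailing separator)."""
    return array[start:start + length - len(sep)]

array, rows = build_name_array(["calculate", "sum", "a", "b"], ["int"] * 4)
print(array)
print(rows)
```

For calculate, sum, a and b this gives the array calculate$sum$a$b$. The rows are (0, 10), (10, 4), (14, 2) and (16, 2), matching the table.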
51. Example
int var1;
int procA()
{
    int var2, var3;
    {
        int var4, var5;
    }
    int var6;
    {
        int var7, var8;
    }
}
int procB()
{
    int var9, var10;
    {
        int var11, var12;
    }
    int var13;
}
52. Symbol tables for the example
Symbol table (Global)
Name    Kind        Type
var1    var         int
procA   procedure   int
procB   procedure   int
Symbol table procA
var2    var   int
var3    var   int
var6    var   int
Symbol table procB
var9    var   int
var10   var   int
var13   var   int
53. Symbol tables for the inner scopes
Symbol table inner scope 1 (first block inside procA)
var4    var   int
var5    var   int
Symbol table inner scope 2 (second block inside procA)
var7    var   int
var8    var   int
Symbol table inner scope 3 (block inside procB)
var11   var   int
var12   var   int
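One common way to organize these tables is a tree of scopes. Each table points to the table of its enclosing scope, and a lookup searches outward. This Python sketch builds the tables of the example that way. The Scope class, its methods and its default kind/type arguments are hypothetical. Returning 0 for a name that is not visible mirrors the Lookup operation described earlier.

```python
class Scope:
    """Symbol table for one scope, linked to the table of its enclosing scope."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.entries = {}                 # identifier -> (kind, type)
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def insert(self, ident, kind="var", type_="int"):
        self.entries[ident] = (kind, type_)

    def lookup(self, ident):
        """Search this scope first, then each enclosing scope; 0 if not visible."""
        scope = self
        while scope is not None:
            if ident in scope.entries:
                return scope.name, scope.entries[ident]
            scope = scope.parent
        return 0

glob = Scope("global")
glob.insert("var1")
glob.insert("procA", "procedure")
glob.insert("procB", "procedure")

proc_a = Scope("procA", glob)
for v in ("var2", "var3", "var6"):
    proc_a.insert(v)
inner1 = Scope("inner scope 1", proc_a)
for v in ("var4", "var5"):
    inner1.insert(v)
inner2 = Scope("inner scope 2", proc_a)
for v in ("var7", "var8"):
    inner2.insert(v)

proc_b = Scope("procB", glob)
for v in ("var9", "var10", "var13"):
    proc_b.insert(v)
inner3 = Scope("inner scope 3", proc_b)
for v in ("var11", "var12"):
    inner3.insert(v)

print(inner1.lookup("var2"))
print(inner1.lookup("var9"))
```

From inner scope 1, var2 is found in procA and var1 in the global table. var9, which is declared in procB, is not visible there.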