SlideShare uma empresa Scribd logo
1 de 235
Introduction of Compiler Design
• Compiler is a software which converts a
program written in high level language (Source
Language) to low level language
(Object/Target/Machine Language).
• Cross Compiler that runs on a machine ‘A’ and
produces a code for another machine ‘B’.
• It is capable of creating code for a platform
other than the one on which the compiler is
running.
• Source-to-source Compiler or transcompiler
or transpiler is a compiler that translates
source code written in one programming
language into source code of another
programming language.
Language processing systems (using
Compiler)
• We know a computer is a logical assembly of
Software and Hardware.
• The hardware knows a language, that is hard for
us to grasp, consequently we tend to write
programs in high-level language, that is much less
complicated for us to comprehend and maintain
in thoughts.
• Now these programs go through a series of
transformation so that they can readily be used
machines.
• This is where language procedure systems come
handy.
• High Level Language – If a program contains #define or
#include directives such as #include or #define it is called
HLL. They are closer to humans but far from machines.
These (#) tags are called pre-processor directives. They
direct the pre-processor about what to do.
• Pre-Processor – The pre-processor removes all the #include
directives by including the files called file inclusion and all
the #define directives using macro expansion. It performs
file inclusion, augmentation, macro-processing etc.
• Assembly Language – Its neither in binary form nor high
level. It is an intermediate state that is a combination of
machine instructions and some other useful data needed
for execution.
• Assembler – For every platform (Hardware + OS) we will
have a assembler. They are not universal since for each
platform we have one. The output of assembler is called
object file. Its translates assembly language to machine
code.
• Interpreter – An interpreter converts high level language
into low level machine language, just like a compiler. But
they are different in the way they read the input. The
Compiler in one go reads the inputs, does the processing
and executes the source code whereas the interpreter does
the same line by line. Compiler scans the entire program
and translates it as a whole into machine code whereas an
interpreter translates the program one statement at a time.
Interpreted programs are usually slower with respect to
compiled ones.
• Relocatable Machine Code – It can be loaded at any point
and can be run. The address within the program will be in
such a way that it will cooperate for the program
movement.
• Loader/Linker – It converts the relocatable code into
absolute code and tries to run the program resulting in a
running program or an error message (or sometimes both
can happen). Linker loads a variety of object files into a
single file to make it executable. Then loader loads it in
memory and executes it.
Phases of a Compiler
• There are two major phases of compilation,
which in turn have many parts.
• Each of them take input from the output of
the previous level and work in a coordinated
way.
Analysis Phase
• An intermediate representation is created
from the give source code :
1. Lexical Analyzer
2. Syntax Analyzer
3. Semantic Analyzer
4. Intermediate Code Generator
• Lexical analyzer divides the program into
“tokens”,
• Syntax analyzer recognizes “sentences” in the
program using syntax of language and
• Semantic analyzer checks static semantics of
each construct.
• Intermediate Code Generator generates
“abstract” code.
Synthesis Phase
• Equivalent target program is created from the
intermediate representation.
• It has two parts :
1. Code Optimizer
2. Code Generator
• Code Optimizer optimizes the abstract code,
and
• final Code Generator translates abstract
intermediate code into specific machine
instructions.
Compiler construction tools
• The compiler writer can use some specialized
tools that help in implementing various
phases of a compiler.
• These tools assist in the creation of an entire
compiler or its parts.
• Some commonly used compiler construction
tools include:
1. Parser Generator –
• It produces syntax analyzers (parsers) from the
input that is based on a grammatical
description of programming language or on a
context-free grammar.
• It is useful as the syntax analysis phase is
highly complex and consumes more manual
and compilation time.
Example: PIC, EQM
2. Scanner Generator
• It generates lexical analyzers from the input
that consists of regular expression description
based on tokens of a language.
• It generates a finite automation to recognize
the regular expression.
Example: Lex
• Syntax directed translation engines –
It generates intermediate code with three
address format from the input that consists of a
parse tree. These engines have routines to
traverse the parse tree and then produces the
intermediate code. In this, each node of the parse
tree is associated with one or more translations.
• Automatic code generators –
It generates the machine language for a target
machine. Each operation of the intermediate
language is translated using a collection of rules
and then is taken as an input by the code
generator. Template matching process is used. An
intermediate language statement is replaced by
its equivalent machine language statement using
templates.
• Data-flow analysis engines –
It is used in code optimization.
• Data flow analysis is a key part of the code
optimization that gathers the information,
that is the values that flow from one part of a
program to another.
• Compiler construction toolkits –
It provides an integrated set of routines that
aids in building compiler components or in the
construction of various phases of compiler.
Symbol Table in Compiler
• Symbol Table is an important data structure
created and maintained by the compiler in
order to keep track of semantics of variable
i.e. it stores information about scope and
binding information about names, information
about instances of various entities such as
variable and function names, classes, objects,
etc.
• It is built in lexical and syntax analysis phases.
• The information is collected by the analysis
phases of compiler and is used by synthesis
phases of compiler to generate code.
• It is used by compiler to achieve compile time
efficiency.
• It is used by various phases of compiler as follows :-
• Lexical Analysis: Creates new table entries in the table,
example like entries about token.
• Syntax Analysis: Adds information regarding attribute
type, scope, dimension, line of reference, use, etc in
the table.
• Semantic Analysis: Uses available information in the
table to check for semantics i.e. to verify that
expressions and assignments are semantically
correct(type checking) and update it accordingly.
• Intermediate Code generation: Refers symbol table for
knowing how much and what type of run-time is
allocated and table helps in adding temporary variable
information.
• Code Optimization: Uses information present in
symbol table for machine dependent optimization.
• Target Code generation: Generates code by using
address information of identifier present in the table.
• Symbol Table entries –
• Each entry in symbol table is associated with
attributes that support compiler in different
phases.
Items stored in Symbol table:
• Variable names and constants
• Procedure and function names
• Literal constants and strings
• Compiler generated temporaries
• Labels in source languages
• Operations of Symbol table –
• The basic operations defined on a symbol
table include:
Implementation of Symbol table
Consider the following expression and
construct a DAG for it
( a + b ) x ( a + b + c )
Intermediate Code Generation in
Compiler Design
• In the analysis-synthesis model of a compiler, the
front end of a compiler translates a source
program into an independent intermediate code,
then the back end of the compiler uses this
intermediate code to generate the target code
(which can be understood by the machine).
• The benefits of using machine independent
intermediate code are:
• Because of the machine independent
intermediate code, portability will be enhanced.
• For ex, suppose, if a compiler translates the
source language to its target machine
language without having the option for
generating intermediate code, then for each
new machine, a full native compiler is
required.
• Because, obviously, there were some
modifications in the compiler itself according
to the machine specifications.
• Retargeting is facilitated
• It is easier to apply source code modification
to improve the performance of source code by
optimising the intermediate code.
• If we generate machine code directly from
source code then for n target machine we will
have n optimisers and n code generators but if
we will have a machine independent
intermediate code,
we will have only one optimiser.
• Intermediate code can be either language
specific (e.g., Bytecode for Java) or language.
independent (three-address code).
The following are commonly used
intermediate code representation:
• Postfix Notation –
• The ordinary (infix) way of writing the sum of a and b is
with operator in the middle : a + b
• The postfix notation for the same expression places the
operator at the right end as ab +.
• In general, if e1 and e2 are any postfix expressions, and + is
any binary operator, the result of applying + to the values
denoted by e1 and e2 is postfix notation by e1e2 +.
• No parentheses are needed in postfix notation because the
position and arity (number of arguments) of the operators
permit only one way to decode a postfix expression.
• In postfix notation the operator follows the operand.
• Example – The postfix representation of the expression (a –
b) * (c + d) + (a – b) is : ab – cd + *ab -+.
Read more: Infix to Postfix
Introduction of Lexical Analysis
• Lexical Analysis is the first phase of compiler
also known as scanner.
• It converts the High level input program into a
sequence of Tokens.
• Lexical Analysis can be implemented with
the Deterministic finite Automata.
• The output is a sequence of tokens that is sent
to the parser for syntax analysis
• What is a token?
A lexical token is a sequence of characters that
can be treated as a unit in the grammar of the
programming languages.
• Example of tokens:
• Type token (id, number, real, . . . )
• Punctuation tokens (IF, void, return, . . . )
• Alphabetic tokens (keywords)
Example of Non-Tokens:
Comments, preprocessor directive, macros, blanks, tabs,
newline etc
• Lexeme:
• The sequence of characters matched by a
pattern to form the corresponding token or a
sequence of input characters that comprises a
single token is called a lexeme.
• eg- “float”, “abs_zero_Kelvin”, “=”, “-”, “273”,
“;” .
• How Lexical Analyzer functions
• 1. Tokenization .i.e Dividing the program into
valid tokens.
2. Remove white space characters.
3. Remove comments.
4. It also provides help in generating error
message by providing row number and
column number.
• Exercise 2:
• Count number of tokens :
• int max(int i);
• Lexical analyzer first read int and finds it to be
valid and accepts as token
• max is read by it and found to be valid
function name after reading (
• int is also a token , then again i as another
token and finally ;
Error detection and Recovery in
Compiler
• In this phase of compilation, all possible errors
made by the user are detected and reported
to the user in form of error messages. This
process of locating errors and reporting it to
user is called Error Handling process.
Functions of Error handler
• Detection
• Reporting
• Recovery
Compile time errors are of three
types
• These errors are detected during the lexical
analysis phase. Typical lexical errors are
1. Exceeding length of identifier or numeric
constants.
2. Appearance of illegal characters
3. Unmatched string
Syntactic phase errors
• These errors are detected during syntax
analysis phase. Typical syntax errors are
• Errors in structure
• Missing operator
• Misspelled keywords
• Unbalanced parenthesis
Semantic errors
• These errors are detected during semantic
analysis phase. Typical semantic errors are
1. Incompatible type of operands
2. Undeclared variables
3. Not matching of actual arguments with
formal one
Code Optimization in Compiler Design
• The code optimization in the synthesis phase is a
program transformation technique, which tries to
improve the intermediate code by making it
consume fewer resources (i.e. CPU, Memory) so
that faster-running machine code will result.
Compiler optimizing process should meet the
following objectives :
• The optimization must be correct, it must not, in
any way, change the meaning of the program.
• Optimization should increase the speed and
performance of the program.
• The compilation time must be kept reasonable.
• The optimization process should not delay the
overall compiling process.
• When to Optimize?
Optimization of the code is often performed at the end
of the development stage since it reduces readability
and adds code that is used to increase the
performance.
• Why Optimize?
Optimizing an algorithm is beyond the scope of the
code optimization phase. So the program is optimized.
And it may involve reducing the size of the code. So
optimization helps to:
• Reduce the space consumed and increases the speed
of compilation.
• Manually analyzing datasets involves a lot of time.
Hence we make use of software like Tableau for data
analysis. Similarly manually performing the
optimization is also tedious and is better done using a
code optimizer.
• An optimized code often promotes re-usability.
Types of Code Optimization –The optimization process
can be broadly classified into two types :
• Machine Independent Optimization – This code
optimization phase attempts to improve
the intermediate code to get a better target code as
the output. The part of the intermediate code which is
transformed here does not involve any CPU registers or
absolute memory locations.
• Machine Dependent Optimization – Machine-
dependent optimization is done after the target
code has been generated and when the code is
transformed according to the target machine
architecture. It involves CPU registers and may have
absolute memory references rather than relative
references. Machine-dependent optimizers put efforts
to take maximum advantage of the memory hierarchy
Code Optimization is done in the
following different ways
Hence, after variable propagation, a*b and x*b will be identified as common sub-
expression.
Three address code in Compiler
• Three address code is a type of intermediate
code which is easy to generate and can be
easily converted to machine code.
• It makes use of at most three addresses and
one operator to represent an expression and
the value computed at each instruction is
stored in temporary variable generated by
compiler.
• The compiler decides the order of operation
given by three address code.
General representation
• Where a, b or c represents operands like
names, constants or compiler generated
temporaries and op represents the operator
• Example-2: Write three address code for
following code
Parse Tree
• Parse : It means to resolve (a sentence) into its
component parts and describe their syntactic
roles or simply it is an act of parsing a string or
a text.
• Tree : A tree may be a widely used abstract
data type that simulates a hierarchical tree
structure, with a root value and sub-trees of
youngsters with a parent node, represented as
a group of linked nodes.
• Uses of Parse Tree :
• It helps in making syntax analysis by reflecting
the syntax of the input language.
• It uses an in-memory representation of the
input with a structure that conforms to the
grammar.
• The advantages of using parse trees rather
than semantic actions: you’ll make multiple
passes over the info without having to re-
parse the input.
Types of Parsers in Compiler Design
• Parser is that phase of compiler which takes
token string as input and with the help of
existing grammar, converts it into the
corresponding parse tree.
• Parser is also known as Syntax Analyzer.
Bottom Up or Shift Reduce Parsers |
Set 2
• Bottom Up Parsers / Shift Reduce Parsers
Build the parse tree from leaves to root.
• Bottom-up parsing can be defined as an
attempt to reduce the input string w to the
start symbol of grammar by tracing out the
rightmost derivations of w in reverse.
Eg.
Recursive Descent Parser
• Parsing is the process to determine whether
the start symbol can derive the program or
not.
• If the Parsing is successful then the program is
a valid program otherwise the program is
invalid.
There are generally two types of
Parsers
Recursive Descent Parser
• It is a kind of Top-Down Parser.
• A top-down parser builds the parse tree from
the top to down, starting with the start non-
terminal.
• A Predictive Parser is a special case of
Recursive Descent Parser, where no Back
Tracking is required.
• By carefully writing a grammar means
eliminating left recursion and left factoring
from it, the resulting grammar will be a
grammar that can be parsed by a recursive
descent parser.
**Here e is Epsilon
• A general shift reduce parsing is LR parsing.
• The L stands for scanning the input from left
to right and R stands for constructing a
rightmost derivation in reverse.
Benefits of LR parsing:
• Many programming languages using some
variations of an LR parser. It should be noted
that C++ and Perl are exceptions to it.
• LR Parser can be implemented very efficiently
• Here we will look at the construction of GOTO
graph of grammar by using all the four LR
parsing techniques
S – attributed and L – attributed SDTs
in Syntax directed translation
• Types of attributes –
Attributes may be of two types – Synthesized or
Inherited.
1. Synthesized attributes –
A Synthesized attribute is an attribute of the
non-terminal on the left-hand side of a
production.
• Synthesized attributes represent information
that is being passed up the parse tree.
• The attribute can take value only from its
children (Variables in the RHS of the
production).
• For eg. let’s say A -> BC is a production of a
grammar, and A’s attribute is dependent on B’s
attributes or C’s attributes then it will be
synthesized attribute.
2. Inherited attributes –
An attribute of a nonterminal on the right-
hand side of a production is called an
inherited attribute.
• The attribute can take value either from its
parent or from its siblings (variables in the LHS
or RHS of the production).
• For example, let’s say A -> BC is a production
of a grammar and B’s attribute is dependent
on A’s attributes or C’s attributes then it will
be inherited attribute.
• Now, let’s discuss about S-attributed and L-
attributed SDT.
• S-attributed SDT :
– If an SDT uses only synthesized attributes, it is
called as S-attributed SDT.
– S-attributed SDTs are evaluated in bottom-up
parsing, as the values of the parent nodes depend
upon the values of the child nodes.
– Semantic actions are placed in rightmost place of
RHS.
• L-attributed SDT:
– If an SDT uses both synthesized attributes and
inherited attributes with a restriction that
inherited attribute can inherit values from left
siblings only, it is called as L-attributed SDT.
– Attributes in L-attributed SDTs are evaluated by
depth-first and left-to-right parsing manner.
– Semantic actions are placed anywhere in RHS.
• For example,
• A -> XYZ {Y.S = A.S, Y.S = X.S, Y.S = Z.S} is not an
L-attributed grammar
• since Y.S = A.S and Y.S = X.S are allowed but Y.S
= Z.S violates the L-attributed SDT definition as
attributed is inheriting the value from its right
sibling.
Compiler Design | Detection of a Loop
in Three Address Code
• Loop optimization is the phase after the
Intermediate Code Generation.
• The main intention of this phase is to reduce
the number of lines in a program.
• In any program majority of the time is spent
by any program is actually inside the loop for
an iterative program.
• In case of the recursive program a block will
be there and the majority of the time will
present inside the block
• Loop Optimization –
• To apply loop optimization we must first
detect loops.
• For detecting loops we use Control Flow
Analysis(CFA) using Program Flow Graph(PFG).
• To find program flow graph we need to find
Basic Block
• Basic Block –
• A basic block is a sequence of three address
statements where control enters at the beginning
and leaves only at the end without any jumps or
halts.
• Finding the Basic Block –
In order to find the basic blocks, we need to find
the leaders in the program.
• Then a basic block will start from one leader to
the next leader but not including the next leader.
Which means if you find out that lines no 1 is a
leader and line no 15 is the next leader, then the
line from 1 to 14 is a basic block, but not
including line no 15
• Identifying leader in a Basic Block –
• First statement is always a leader
• Statement that is target of conditional or un-
conditional statement is a leader
• Statement that follows immediately a
conditional or un-conditional statement is a
leader
Language Processors: Assembler,
Compiler and Interpreter
• Language Processors –
Assembly language is machine dependent yet
mnemonics that are being used to represent
instructions in it are not directly understandable
by machine and high Level language is machine
independent.
• A computer understands instructions in machine
code, i.e. in the form of 0s and 1s.
• It is a tedious task to write a computer program
directly in machine code.
• The programs are written mostly in high level
languages like Java, C++, Python etc. and are
called source code.
• These source code cannot be executed
directly by the computer and must be
converted into machine language to be
executed.
• Hence, a special translator system software is
used to translate the program written in high
level language into machine code is
called Language Processor and the program
after translated into machine code (object
program / object code).
The language processors can be any
of the following three types:
• Compiler –
The language processor that reads the complete source
program written in high level language as a whole in
one go and translates it into an equivalent program in
machine language is called as a Compiler.
Example: C, C++, C#, Java In a compiler, the source
code is translated to object code successfully if it is free
of errors.
• The compiler specifies the errors at the end of
compilation with line numbers when there are any
errors in the source code.
• The errors must be removed before the compiler can
successfully recompile the source code again.
• Assembler –
The Assembler is used to translate the
program written in Assembly language into
machine code.
• The source program is a input of assembler
that contains assembly language instructions.
• The output generated by assembler is the
object code or machine code understandable
by the computer.
• Interpreter –
The translation of single statement of source
program into machine code is done by language
processor and executes it immediately before
moving on to the next line is called an interpreter.
• If there is an error in the statement, the
interpreter terminates its translating process at
that statement and displays an error message.
• The interpreter moves on to the next line for
execution only after removal of the error.
• An Interpreter directly executes instructions
written in a programming or scripting language
without previously converting them to an object
code or machine code.
Example: Perl, Python and Matlab.
FIRST Set in Syntax Analysis
Note
FOLLOW Set in Syntax Analysis
• Follow(X) to be the set of terminals that can
appear immediately to the right of Non-
Terminal X in some sentential form.
• 1) FOLLOW(S) = { $ } // where S is the starting
Non-Terminal
• 2) If A -> pBq is a production, where p, B and q
are any grammar symbols, then everything in
FIRST(q) except Є is in FOLLOW(B).
• 3) If A->pB is a production, then everything in
FOLLOW(A) is in FOLLOW(B).
• 4) If A->pBq is a production and FIRST(q)
contains Є, then FOLLOW(B) contains {
FIRST(q) – Є } U FOLLOW(A)
• FOLLOW Set FOLLOW(E) = { $ , ) } // Note ')' is
there because of 5th rule
• FOLLOW(E’) = FOLLOW(E) = { $, ) } // See 1st
production rule FOLLOW(T) = { FIRST(E’) – Є }
U FOLLOW(E’) U FOLLOW(E) = { + , $ , ) }
FOLLOW(T’) = FOLLOW(T) = { + , $ , ) }
FOLLOW(F) = { FIRST(T’) – Є } U FOLLOW(T’) U
FOLLOW(T) = { *, +, $, ) }
Classification of Context Free
Grammars
• Context Free Grammars (CFG) can be classified on the
basis of following two properties:
• 1) Based on number of strings it generates.
• If CFG is generating finite number of strings, then CFG
is Non-Recursive (or the grammar is said to be Non-
recursive grammar)
• If CFG can generate infinite number of strings then the
grammar is said to be Recursive grammar
• During Compilation, the parser uses the grammar of
the language to make a parse tree(or derivation tree)
out of the source code. The grammar used must be
unambiguous. An ambiguous grammar must not be
used for parsing.
• 2) Based on number of derivation trees.
• If there is only 1 derivation tree then the CFG
is unambiguous.
• If there are more than 1 derivation tree, then
the CFG is ambiguous.
Examples of Recursive Grammars
Examples of Non-Recursive Grammars
Ambiguous Grammar
• Context Free Grammars(CFGs) are classified
based on:
1. Number of Derivation trees
2. Number of strings
Depending on Number of Derivation trees, CFGs
are sub-divided into 2 types:
1. Ambiguous grammars
2. Unambiguous grammars
Ambiguous grammar
• For Example:
1. Let us consider this grammar : E -> E+E|id
• We can create 2 parse tree from this grammar
to obtain a string id+id+id :
Note
• Both the above parse trees are derived from
same grammar rules but both parse trees are
different.
• Hence the grammar is ambiguous.
• From the above grammar String 3*2+5 can be
derived in 2 ways:
• Derivation tree:
• It tells how string is derived using production
rules from S and has been shown in Figure 1.
Removing Left Recursion
Inherently ambiguous Language
• Note : Ambiguity of a grammar is undecidable,
i.e. there is no particular algorithm for
removing the ambiguity of a grammar, but we
can remove ambiguity by:
• Disambiguate the grammar i.e., rewriting the
grammar such that there is only one
derivation or parse tree possible for a string of
the language which the grammar represents.
Clone a Directed Acyclic Graph
• A directed acyclic graph (DAG) is a graph
which doesn’t contain a cycle and has directed
edges.
• We are given a DAG, we need to clone it, i.e.,
create another graph that has copy of its
vertices and edges connecting them.
Syntax Directed Translation in
Compiler Design
• Background : Parser uses a CFG(Context-free-
Grammer) to validate the input string and
produce output for next phase of the
compiler.
• Output could be either a parse tree or abstract
syntax tree.
• Now to interleave semantic analysis with
syntax analysis phase of the compiler, we use
Syntax Directed Translation.
• Definition
Syntax Directed Translation are augmented rules
to the grammar that facilitate semantic analysis.
SDT involves passing information bottom-up
and/or top-down the parse tree in form of
attributes attached to the nodes.
• Syntax directed translation rules use 1) lexical
values of nodes, 2) constants & 3) attributes
associated to the non-terminals in their
definitions.
• The general approach to Syntax-Directed
Translation is to construct a parse tree or syntax
tree and compute the values of attributes at the
nodes of the tree by visiting them in some order.
In many cases, translation can be done during
parsing without building an explicit tree.
• To evaluate translation rules, we can employ one
depth first search traversal on the parse tree.
• This is possible only because SDT rules don’t
impose any specific order on evaluation until
children attributes are computed before parents
for a grammar having all synthesized attributes.
• Otherwise, we would have to figure out the best
suited plan to traverse through the parse tree and
evaluate all the attributes in one or more
traversals.
• For better understanding, we will move bottom
up in left to right fashion for computing
translation rules of our example.
• Syntax Directed Definition (SDD) is a kind of
abstract specification.
• It is generalization of context free grammar in
which each grammar production X –> a is
associated with it a set of production rules of
the form s = f(b1, b2, ……bk) where s is the
attribute obtained from function f.
• The attribute can be a string, number, type or
a memory location.
• Semantic rules are fragments of code which
are embedded usually at the end of
production and enclosed in curly braces ({ }).
• Let us assume an input string 4 * 5 + 6 for
computing synthesized attributes.
• The annotated parse tree for the input string
is
• For computation of attributes we start from
leftmost bottom node.
• The rule F –> digit is used to reduce digit to F and
the value of digit is obtained from lexical analyzer
which becomes value of F i.e. from semantic
action F.val = digit.lexval.
• Hence, F.val = 4 and since T is parent node of F so,
we get T.val = 4 from semantic action T.val = F.val.
Then, for T –> T1 * F production, the
corresponding semantic action is T.val = T1.val *
F.val . Hence, T.val = 4 * 5 = 20
• Similarly, combination of E1.val + T.val becomes
E.val i.e. E.val = E1.val + T.val = 26.
• Then, the production S –> E is applied to reduce
E.val = 26 and semantic action associated with it
prints the result E.val .
• Hence, the output will be 26.
• Let us assume an input string int a, c for
computing inherited attributes.
• The annotated parse tree for the input string
is
• The value of L nodes is obtained from T.type
(sibling) which is basically lexical value
obtained as int, float or double.
• Then L node gives type of identifiers a and c.
The computation of type is done in top down
manner or preorder traversal.
• Using function Enter_type the type of
identifiers a and c is inserted in symbol table
at corresponding id.entry.
Removing Direct and Indirect Left
Recursion in a Grammar
Check if the given grammar contains left recursion, if present
then separate the production and start working on it.
In our example,
S-->S a/ S b /c / d
Bootstrapping in Compiler Design
• Step-4: Thus we get a compiler written in ASM
which compiles C and generates code in ASM.
Peephole Optimization in Compiler
Design
• Peephole optimization is a type of Code
Optimization performed on a small part of the
code.
• It is performed on the very small set of
instructions in a segment of code.
• It basically works on the theory of replacement in
which a part of code is replaced by shorter and
faster code without change in output.
• Peephole is the machine dependent optimization.
Objectives of Peephole Optimization
• The objective of peephole optimization is:
1. To improve performance
2. To reduce memory footprint
3. To reduce code size
Peephole Optimization Techniques
Construction of LL(1) Parsing Table
• A top-down parser builds the parse tree from
the top down, starting with the start non-
terminal.
• There are two types of Top-Down Parsers:
1. Top-Down Parser with Backtracking
2. Top-Down Parsers without Backtracking
3. Top-Down Parsers without backtracking can
further be divided into two parts:
• In the table, rows will contain the Non-
Terminals and the column will contain the
Terminal Symbols.
• All the Null Productions of the Grammars will
go under the Follow elements and the
remaining productions will lie under the
elements of the First set.
• As you can see that all the null productions
are put under the Follow set of that symbol
and all the remaining productions are lie
under the First of that symbol.
• Note: Every grammar is not feasible for LL(1)
Parsing table.
• It may be possible that one cell may contain
more than one production.
SLR, CLR and LALR Parsers
• SLR Parser
The SLR parser is similar to LR(0) parser except
that the reduced entry.
• The reduced productions are written only in
the FOLLOW of the variable whose production
is reduced.
CLR PARSER
• In the SLR method we were working with
LR(0)) items.
• In CLR parsing we will be using LR(1) items.
• LR(k) item is defined to be an item using
lookaheads of length k.
• So , the LR(1) item is comprised of two parts :
the LR(0) item and the lookahead associated
with the item.
LR(1) parsers are more powerful parser.
For LR(1) items we modify the Closure and
GOTO function.
Introduction to YACC
• A parser generator is a program that takes as
input a specification of a syntax, and produces
as output a procedure for recognizing that
language. Historically, they are also called
compiler-compilers.
YACC (yet another compiler-compiler) is
an LALR(1) (LookAhead, Left-to-right,
Rightmost derivation producer with 1
lookahead token) parser generator.
• YACC was originally designed for being
complemented by Lex.
Input File
Operator grammar and precedence
parser
• A grammar that is used to define
mathematical operators is called an operator
grammar or operator precedence grammar.
• Such grammars have the restriction that no
production has either an empty right-hand
side (null productions) or two adjacent non-
terminals in its right-hand side.
• Operator precedence parser –
An operator precedence parser is a bottom-up
parser that interprets an operator grammar.
• This parser is only used for operator
grammars.
• Ambiguous grammars are not allowed in any
parser except operator precedence parser.
• There are two methods for determining what
precedence relations should hold between a
pair of terminals:
1. Use the conventional associativity and
precedence of operator.
2. The second method of selecting operator-
precedence relations is first to construct an
unambiguous grammar for the language, a
grammar that reflects the correct
associativity and precedence in its parse
trees.
• This parser relies on the following three
precedence relations: ⋖, ≐, ⋗
• a ⋖ b This means a “yields precedence to” b.
a ⋗ b This means a “takes precedence over” b.
a ≐ b This means a “has same precedence as”
b.
• There is not given any relation between id and
id as id will not be compared and two
variables can not come side by side.
• There is also a disadvantage of this table – if
we have n operators then size of table will be
n*n and complexity will be 0(n2).
• In order to decrease the size of table, we
use operator function table.
• Operator precedence parsers usually do not store
the precedence table with the relations; rather
they are implemented in a special way.
• Operator precedence parsers use precedence
functions that map terminal symbols to integers,
and the precedence relations between the
symbols are implemented by numerical
comparison.
• The parsing table can be encoded by two
precedence functions f and g that map terminal
symbols to integers. We select f and g such that:
• f(a) < g(b) whenever a yields precedence to b
• f(a) = g(b) whenever a and b have the same
precedence
• f(a) > g(b) whenever a takes precedence over b
• Since there is no cycle in the graph, we can
make this function table:
Size of the table is 2n
• One disadvantage of function tables is that
even though we have blank entries in relation
table we have non-blank entries in function
table.
• Blank entries are also called error.
• Hence error detection capability of relation
table is greater than function table.

Mais conteúdo relacionado

Mais procurados

Life cycle of a computer program
Life cycle of a computer programLife cycle of a computer program
Life cycle of a computer programAbhay Kumar
 
Error Detection & Recovery
Error Detection & RecoveryError Detection & Recovery
Error Detection & RecoveryAkhil Kaushik
 
Sorting and Searching Techniques
Sorting and Searching TechniquesSorting and Searching Techniques
Sorting and Searching TechniquesProf Ansari
 
Lexical Analysis - Compiler design
Lexical Analysis - Compiler design Lexical Analysis - Compiler design
Lexical Analysis - Compiler design Aman Sharma
 
Compiler Construction Course - Introduction
Compiler Construction Course - IntroductionCompiler Construction Course - Introduction
Compiler Construction Course - IntroductionMuhammad Sanaullah
 
Symbol table design (Compiler Construction)
Symbol table design (Compiler Construction)Symbol table design (Compiler Construction)
Symbol table design (Compiler Construction)Tech_MX
 
Introduction to systems programming
Introduction to systems programmingIntroduction to systems programming
Introduction to systems programmingMukesh Tekwani
 
The role of the parser and Error recovery strategies ppt in compiler design
The role of the parser and Error recovery strategies ppt in compiler designThe role of the parser and Error recovery strategies ppt in compiler design
The role of the parser and Error recovery strategies ppt in compiler designSadia Akter
 
Chap 1-language processor
Chap 1-language processorChap 1-language processor
Chap 1-language processorshindept123
 
Compiler Design
Compiler DesignCompiler Design
Compiler DesignMir Majid
 
Lexical Analysis - Compiler Design
Lexical Analysis - Compiler DesignLexical Analysis - Compiler Design
Lexical Analysis - Compiler DesignAkhil Kaushik
 
Passes of compilers
Passes of compilersPasses of compilers
Passes of compilersVairavel C
 
Overview of Language Processor : Fundamentals of LP , Symbol Table , Data Str...
Overview of Language Processor : Fundamentals of LP , Symbol Table , Data Str...Overview of Language Processor : Fundamentals of LP , Symbol Table , Data Str...
Overview of Language Processor : Fundamentals of LP , Symbol Table , Data Str...Bhavin Darji
 

Mais procurados (20)

Life cycle of a computer program
Life cycle of a computer programLife cycle of a computer program
Life cycle of a computer program
 
Introduction to Compiler design
Introduction to Compiler design Introduction to Compiler design
Introduction to Compiler design
 
Linkers in compiler
Linkers in compilerLinkers in compiler
Linkers in compiler
 
Phases of compiler
Phases of compilerPhases of compiler
Phases of compiler
 
Error Detection & Recovery
Error Detection & RecoveryError Detection & Recovery
Error Detection & Recovery
 
Language processors
Language processorsLanguage processors
Language processors
 
System Programming Overview
System Programming OverviewSystem Programming Overview
System Programming Overview
 
Sorting and Searching Techniques
Sorting and Searching TechniquesSorting and Searching Techniques
Sorting and Searching Techniques
 
Lexical Analysis - Compiler design
Lexical Analysis - Compiler design Lexical Analysis - Compiler design
Lexical Analysis - Compiler design
 
Compiler Construction Course - Introduction
Compiler Construction Course - IntroductionCompiler Construction Course - Introduction
Compiler Construction Course - Introduction
 
Symbol table design (Compiler Construction)
Symbol table design (Compiler Construction)Symbol table design (Compiler Construction)
Symbol table design (Compiler Construction)
 
Introduction to systems programming
Introduction to systems programmingIntroduction to systems programming
Introduction to systems programming
 
The role of the parser and Error recovery strategies ppt in compiler design
The role of the parser and Error recovery strategies ppt in compiler designThe role of the parser and Error recovery strategies ppt in compiler design
The role of the parser and Error recovery strategies ppt in compiler design
 
Chap 1-language processor
Chap 1-language processorChap 1-language processor
Chap 1-language processor
 
Phases of Compiler
Phases of CompilerPhases of Compiler
Phases of Compiler
 
Phases of compiler
Phases of compilerPhases of compiler
Phases of compiler
 
Compiler Design
Compiler DesignCompiler Design
Compiler Design
 
Lexical Analysis - Compiler Design
Lexical Analysis - Compiler DesignLexical Analysis - Compiler Design
Lexical Analysis - Compiler Design
 
Passes of compilers
Passes of compilersPasses of compilers
Passes of compilers
 
Overview of Language Processor : Fundamentals of LP , Symbol Table , Data Str...
Overview of Language Processor : Fundamentals of LP , Symbol Table , Data Str...Overview of Language Processor : Fundamentals of LP , Symbol Table , Data Str...
Overview of Language Processor : Fundamentals of LP , Symbol Table , Data Str...
 

Semelhante a Compiler Design Introduction

Compiler Design Basics
Compiler Design BasicsCompiler Design Basics
Compiler Design BasicsAkhil Kaushik
 
Pros and cons of c as a compiler language
  Pros and cons of c as a compiler language  Pros and cons of c as a compiler language
Pros and cons of c as a compiler languageAshok Raj
 
Phases of Compiler.pptx
Phases of Compiler.pptxPhases of Compiler.pptx
Phases of Compiler.pptxssuser3b4934
 
Compiler an overview
Compiler  an overviewCompiler  an overview
Compiler an overviewamudha arul
 
4_5802928814682016556.pptx
4_5802928814682016556.pptx4_5802928814682016556.pptx
4_5802928814682016556.pptxAshenafiGirma5
 
Cd ch1 - introduction
Cd   ch1 - introductionCd   ch1 - introduction
Cd ch1 - introductionmengistu23
 
CD - CH1 - Introduction to compiler design.pptx
CD - CH1 - Introduction to compiler design.pptxCD - CH1 - Introduction to compiler design.pptx
CD - CH1 - Introduction to compiler design.pptxZiyadMohammed17
 
Introduction to Compilers
Introduction to CompilersIntroduction to Compilers
Introduction to CompilersAkhil Kaushik
 
Concept of compiler in details
Concept of compiler in detailsConcept of compiler in details
Concept of compiler in detailskazi_aihtesham
 
System software module 1 presentation file
System software module 1 presentation fileSystem software module 1 presentation file
System software module 1 presentation filejithujithin657
 
Chapter1pdf__2021_11_23_10_53_20.pdf
Chapter1pdf__2021_11_23_10_53_20.pdfChapter1pdf__2021_11_23_10_53_20.pdf
Chapter1pdf__2021_11_23_10_53_20.pdfDrIsikoIsaac
 
Introduction to compiler
Introduction to compilerIntroduction to compiler
Introduction to compilerAbha Damani
 
Week 08_Basics of Compiler Construction.pdf
Week 08_Basics of Compiler Construction.pdfWeek 08_Basics of Compiler Construction.pdf
Week 08_Basics of Compiler Construction.pdfAnonymousQ3EMYoWNS
 
Introduction to Compiler Construction
Introduction to Compiler Construction Introduction to Compiler Construction
Introduction to Compiler Construction Sarmad Ali
 
PCSG_Computer_Science_Unit_1_Lecture_2.pptx
PCSG_Computer_Science_Unit_1_Lecture_2.pptxPCSG_Computer_Science_Unit_1_Lecture_2.pptx
PCSG_Computer_Science_Unit_1_Lecture_2.pptxAliyahAli19
 
Compiler construction tools
Compiler construction toolsCompiler construction tools
Compiler construction toolsAkhil Kaushik
 

Semelhante a Compiler Design Introduction (20)

Chapter 1.pptx
Chapter 1.pptxChapter 1.pptx
Chapter 1.pptx
 
Compiler Design Basics
Compiler Design BasicsCompiler Design Basics
Compiler Design Basics
 
Pros and cons of c as a compiler language
  Pros and cons of c as a compiler language  Pros and cons of c as a compiler language
Pros and cons of c as a compiler language
 
Phases of Compiler.pptx
Phases of Compiler.pptxPhases of Compiler.pptx
Phases of Compiler.pptx
 
Compiler an overview
Compiler  an overviewCompiler  an overview
Compiler an overview
 
COMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptxCOMPILER DESIGN PPTS.pptx
COMPILER DESIGN PPTS.pptx
 
4_5802928814682016556.pptx
4_5802928814682016556.pptx4_5802928814682016556.pptx
4_5802928814682016556.pptx
 
Cd ch1 - introduction
Cd   ch1 - introductionCd   ch1 - introduction
Cd ch1 - introduction
 
CD - CH1 - Introduction to compiler design.pptx
CD - CH1 - Introduction to compiler design.pptxCD - CH1 - Introduction to compiler design.pptx
CD - CH1 - Introduction to compiler design.pptx
 
Introduction to Compilers
Introduction to CompilersIntroduction to Compilers
Introduction to Compilers
 
Chap01-Intro.ppt
Chap01-Intro.pptChap01-Intro.ppt
Chap01-Intro.ppt
 
Concept of compiler in details
Concept of compiler in detailsConcept of compiler in details
Concept of compiler in details
 
Chapter1.pdf
Chapter1.pdfChapter1.pdf
Chapter1.pdf
 
System software module 1 presentation file
System software module 1 presentation fileSystem software module 1 presentation file
System software module 1 presentation file
 
Chapter1pdf__2021_11_23_10_53_20.pdf
Chapter1pdf__2021_11_23_10_53_20.pdfChapter1pdf__2021_11_23_10_53_20.pdf
Chapter1pdf__2021_11_23_10_53_20.pdf
 
Introduction to compiler
Introduction to compilerIntroduction to compiler
Introduction to compiler
 
Week 08_Basics of Compiler Construction.pdf
Week 08_Basics of Compiler Construction.pdfWeek 08_Basics of Compiler Construction.pdf
Week 08_Basics of Compiler Construction.pdf
 
Introduction to Compiler Construction
Introduction to Compiler Construction Introduction to Compiler Construction
Introduction to Compiler Construction
 
PCSG_Computer_Science_Unit_1_Lecture_2.pptx
PCSG_Computer_Science_Unit_1_Lecture_2.pptxPCSG_Computer_Science_Unit_1_Lecture_2.pptx
PCSG_Computer_Science_Unit_1_Lecture_2.pptx
 
Compiler construction tools
Compiler construction toolsCompiler construction tools
Compiler construction tools
 

Mais de Thapar Institute

Mais de Thapar Institute (19)

Digital Electronics Unit_4_new.pptx
Digital Electronics Unit_4_new.pptxDigital Electronics Unit_4_new.pptx
Digital Electronics Unit_4_new.pptx
 
Digital Electronics Unit_3.pptx
Digital Electronics Unit_3.pptxDigital Electronics Unit_3.pptx
Digital Electronics Unit_3.pptx
 
Digital Electronics Unit_2.pptx
Digital Electronics Unit_2.pptxDigital Electronics Unit_2.pptx
Digital Electronics Unit_2.pptx
 
Digital Electronics Unit_1.pptx
Digital Electronics Unit_1.pptxDigital Electronics Unit_1.pptx
Digital Electronics Unit_1.pptx
 
C Language Part 1
C Language Part 1C Language Part 1
C Language Part 1
 
Web Technology Part 4
Web Technology Part 4Web Technology Part 4
Web Technology Part 4
 
Web Technology Part 3
Web Technology Part 3Web Technology Part 3
Web Technology Part 3
 
Web Technology Part 2
Web Technology Part 2Web Technology Part 2
Web Technology Part 2
 
Web Technology Part 1
Web Technology Part 1Web Technology Part 1
Web Technology Part 1
 
TOC Introduction
TOC Introduction TOC Introduction
TOC Introduction
 
CSS Introduction
CSS Introduction CSS Introduction
CSS Introduction
 
COA (Unit_4.pptx)
COA (Unit_4.pptx)COA (Unit_4.pptx)
COA (Unit_4.pptx)
 
COA(Unit_3.pptx)
COA(Unit_3.pptx)COA(Unit_3.pptx)
COA(Unit_3.pptx)
 
COA (Unit_2.pptx)
COA (Unit_2.pptx)COA (Unit_2.pptx)
COA (Unit_2.pptx)
 
COA (Unit_1.pptx)
COA (Unit_1.pptx)COA (Unit_1.pptx)
COA (Unit_1.pptx)
 
Software Testing Introduction (Part 4))
 Software Testing Introduction (Part 4)) Software Testing Introduction (Part 4))
Software Testing Introduction (Part 4))
 
Software Testing Introduction (Part 3)
 Software Testing Introduction (Part 3) Software Testing Introduction (Part 3)
Software Testing Introduction (Part 3)
 
Software Testing Introduction (Part 2)
Software Testing Introduction (Part 2)Software Testing Introduction (Part 2)
Software Testing Introduction (Part 2)
 
Software Testing Introduction (Part 1)
Software Testing Introduction (Part 1)Software Testing Introduction (Part 1)
Software Testing Introduction (Part 1)
 

Último

VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
Minimum and Maximum Modes of microprocessor 8086
Minimum and Maximum Modes of microprocessor 8086Minimum and Maximum Modes of microprocessor 8086
Minimum and Maximum Modes of microprocessor 8086anil_gaur
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayEpec Engineered Technologies
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTbhaskargani46
 
Block diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptBlock diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptNANDHAKUMARA10
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VDineshKumar4165
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueBhangaleSonal
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXssuser89054b
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapRishantSharmaFr
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . pptDineshKumar4165
 
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...tanu pandey
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startQuintin Balsdon
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationBhangaleSonal
 

Último (20)

VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
 
Minimum and Maximum Modes of microprocessor 8086
Minimum and Maximum Modes of microprocessor 8086Minimum and Maximum Modes of microprocessor 8086
Minimum and Maximum Modes of microprocessor 8086
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
Block diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptBlock diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.ppt
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leap
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equation
 

Compiler Design Introduction

  • 2. • Compiler is a software which converts a program written in high level language (Source Language) to low level language (Object/Target/Machine Language).
  • 3. • Cross Compiler that runs on a machine ‘A’ and produces a code for another machine ‘B’. • It is capable of creating code for a platform other than the one on which the compiler is running. • Source-to-source Compiler or transcompiler or transpiler is a compiler that translates source code written in one programming language into source code of another programming language.
  • 4. Language processing systems (using Compiler) • We know a computer is a logical assembly of Software and Hardware. • The hardware knows a language, that is hard for us to grasp, consequently we tend to write programs in high-level language, that is much less complicated for us to comprehend and maintain in thoughts. • Now these programs go through a series of transformation so that they can readily be used machines. • This is where language procedure systems come handy.
  • 5.
  • 6. • High Level Language – If a program contains #define or #include directives such as #include or #define it is called HLL. They are closer to humans but far from machines. These (#) tags are called pre-processor directives. They direct the pre-processor about what to do. • Pre-Processor – The pre-processor removes all the #include directives by including the files called file inclusion and all the #define directives using macro expansion. It performs file inclusion, augmentation, macro-processing etc. • Assembly Language – Its neither in binary form nor high level. It is an intermediate state that is a combination of machine instructions and some other useful data needed for execution. • Assembler – For every platform (Hardware + OS) we will have a assembler. They are not universal since for each platform we have one. The output of assembler is called object file. Its translates assembly language to machine code.
  • 7. • Interpreter – An interpreter converts high level language into low level machine language, just like a compiler. But they are different in the way they read the input. The Compiler in one go reads the inputs, does the processing and executes the source code whereas the interpreter does the same line by line. Compiler scans the entire program and translates it as a whole into machine code whereas an interpreter translates the program one statement at a time. Interpreted programs are usually slower with respect to compiled ones. • Relocatable Machine Code – It can be loaded at any point and can be run. The address within the program will be in such a way that it will cooperate for the program movement. • Loader/Linker – It converts the relocatable code into absolute code and tries to run the program resulting in a running program or an error message (or sometimes both can happen). Linker loads a variety of object files into a single file to make it executable. Then loader loads it in memory and executes it.
  • 8. Phases of a Compiler • There are two major phases of compilation, which in turn have many parts. • Each of them take input from the output of the previous level and work in a coordinated way.
  • 9.
  • 10. Analysis Phase • An intermediate representation is created from the give source code : 1. Lexical Analyzer 2. Syntax Analyzer 3. Semantic Analyzer 4. Intermediate Code Generator
  • 11. • Lexical analyzer divides the program into “tokens”, • Syntax analyzer recognizes “sentences” in the program using syntax of language and • Semantic analyzer checks static semantics of each construct. • Intermediate Code Generator generates “abstract” code.
  • 12. Synthesis Phase • Equivalent target program is created from the intermediate representation. • It has two parts : 1. Code Optimizer 2. Code Generator • Code Optimizer optimizes the abstract code, and • final Code Generator translates abstract intermediate code into specific machine instructions.
  • 13. Compiler construction tools • The compiler writer can use some specialized tools that help in implementing various phases of a compiler. • These tools assist in the creation of an entire compiler or its parts. • Some commonly used compiler construction tools include:
  • 14. 1. Parser Generator – • It produces syntax analyzers (parsers) from the input that is based on a grammatical description of programming language or on a context-free grammar. • It is useful as the syntax analysis phase is highly complex and consumes more manual and compilation time. Example: PIC, EQM
  • 15.
  • 16. 2. Scanner Generator • It generates lexical analyzers from the input that consists of regular expression description based on tokens of a language. • It generates a finite automation to recognize the regular expression. Example: Lex
  • 17.
  • 18. • Syntax directed translation engines – It generates intermediate code with three address format from the input that consists of a parse tree. These engines have routines to traverse the parse tree and then produces the intermediate code. In this, each node of the parse tree is associated with one or more translations. • Automatic code generators – It generates the machine language for a target machine. Each operation of the intermediate language is translated using a collection of rules and then is taken as an input by the code generator. Template matching process is used. An intermediate language statement is replaced by its equivalent machine language statement using templates.
  • 19. • Data-flow analysis engines – It is used in code optimization. • Data flow analysis is a key part of the code optimization that gathers the information, that is the values that flow from one part of a program to another. • Compiler construction toolkits – It provides an integrated set of routines that aids in building compiler components or in the construction of various phases of compiler.
  • 20. Symbol Table in Compiler • Symbol Table is an important data structure created and maintained by the compiler in order to keep track of semantics of variable i.e. it stores information about scope and binding information about names, information about instances of various entities such as variable and function names, classes, objects, etc.
  • 21. • It is built in lexical and syntax analysis phases. • The information is collected by the analysis phases of compiler and is used by synthesis phases of compiler to generate code. • It is used by compiler to achieve compile time efficiency.
  • 22. • It is used by various phases of compiler as follows :- • Lexical Analysis: Creates new table entries in the table, example like entries about token. • Syntax Analysis: Adds information regarding attribute type, scope, dimension, line of reference, use, etc in the table. • Semantic Analysis: Uses available information in the table to check for semantics i.e. to verify that expressions and assignments are semantically correct(type checking) and update it accordingly. • Intermediate Code generation: Refers symbol table for knowing how much and what type of run-time is allocated and table helps in adding temporary variable information. • Code Optimization: Uses information present in symbol table for machine dependent optimization. • Target Code generation: Generates code by using address information of identifier present in the table.
  • 23. • Symbol Table entries – • Each entry in symbol table is associated with attributes that support compiler in different phases. Items stored in Symbol table: • Variable names and constants • Procedure and function names • Literal constants and strings • Compiler generated temporaries • Labels in source languages
  • 24.
  • 25. • Operations of Symbol table – • The basic operations defined on a symbol table include:
  • 27.
  • 28.
  • 29.
  • 30.
  • 31.
  • 32.
  • 33.
  • 34. Consider the following expression and construct a DAG for it ( a + b ) x ( a + b + c )
  • 35.
  • 36. Intermediate Code Generation in Compiler Design • In the analysis-synthesis model of a compiler, the front end of a compiler translates a source program into an independent intermediate code, then the back end of the compiler uses this intermediate code to generate the target code (which can be understood by the machine). • The benefits of using machine independent intermediate code are: • Because of the machine independent intermediate code, portability will be enhanced.
  • 37. • For ex, suppose, if a compiler translates the source language to its target machine language without having the option for generating intermediate code, then for each new machine, a full native compiler is required. • Because, obviously, there were some modifications in the compiler itself according to the machine specifications. • Retargeting is facilitated • It is easier to apply source code modification to improve the performance of source code by optimising the intermediate code.
  • 38.
  • 39. • If we generate machine code directly from source code then for n target machine we will have n optimisers and n code generators but if we will have a machine independent intermediate code, we will have only one optimiser. • Intermediate code can be either language specific (e.g., Bytecode for Java) or language. independent (three-address code).
  • 40. The following are commonly used intermediate code representation: • Postfix Notation – • The ordinary (infix) way of writing the sum of a and b is with operator in the middle : a + b • The postfix notation for the same expression places the operator at the right end as ab +. • In general, if e1 and e2 are any postfix expressions, and + is any binary operator, the result of applying + to the values denoted by e1 and e2 is postfix notation by e1e2 +. • No parentheses are needed in postfix notation because the position and arity (number of arguments) of the operators permit only one way to decode a postfix expression. • In postfix notation the operator follows the operand. • Example – The postfix representation of the expression (a – b) * (c + d) + (a – b) is : ab – cd + *ab -+. Read more: Infix to Postfix
  • 41.
  • 42.
  • 43.
  • 44. Introduction of Lexical Analysis • Lexical Analysis is the first phase of compiler also known as scanner. • It converts the High level input program into a sequence of Tokens. • Lexical Analysis can be implemented with the Deterministic finite Automata. • The output is a sequence of tokens that is sent to the parser for syntax analysis
  • 45.
  • 46. • What is a token? A lexical token is a sequence of characters that can be treated as a unit in the grammar of the programming languages. • Example of tokens: • Type token (id, number, real, . . . ) • Punctuation tokens (IF, void, return, . . . ) • Alphabetic tokens (keywords)
  • 47. Example of Non-Tokens: Comments, preprocessor directive, macros, blanks, tabs, newline etc
  • 48. • Lexeme: • The sequence of characters matched by a pattern to form the corresponding token or a sequence of input characters that comprises a single token is called a lexeme. • eg- “float”, “abs_zero_Kelvin”, “=”, “-”, “273”, “;” .
  • 49. • How Lexical Analyzer functions • 1. Tokenization .i.e Dividing the program into valid tokens. 2. Remove white space characters. 3. Remove comments. 4. It also provides help in generating error message by providing row number and column number.
  • 50.
  • 51. • Exercise 2: • Count number of tokens : • int max(int i); • Lexical analyzer first read int and finds it to be valid and accepts as token • max is read by it and found to be valid function name after reading ( • int is also a token , then again i as another token and finally ;
  • 52. Error detection and Recovery in Compiler • In this phase of compilation, all possible errors made by the user are detected and reported to the user in form of error messages. This process of locating errors and reporting it to user is called Error Handling process. Functions of Error handler • Detection • Reporting • Recovery
  • 53.
  • 54. Compile time errors are of three types • These errors are detected during the lexical analysis phase. Typical lexical errors are 1. Exceeding length of identifier or numeric constants. 2. Appearance of illegal characters 3. Unmatched string
  • 55.
  • 56. Syntactic phase errors • These errors are detected during syntax analysis phase. Typical syntax errors are • Errors in structure • Missing operator • Misspelled keywords • Unbalanced parenthesis
  • 57.
  • 58. Semantic errors • These errors are detected during semantic analysis phase. Typical semantic errors are 1. Incompatible type of operands 2. Undeclared variables 3. Not matching of actual arguments with formal one
  • 59.
  • 60. Code Optimization in Compiler Design • The code optimization in the synthesis phase is a program transformation technique, which tries to improve the intermediate code by making it consume fewer resources (i.e. CPU, Memory) so that faster-running machine code will result. Compiler optimizing process should meet the following objectives : • The optimization must be correct, it must not, in any way, change the meaning of the program. • Optimization should increase the speed and performance of the program. • The compilation time must be kept reasonable. • The optimization process should not delay the overall compiling process.
  • 61. • When to Optimize? Optimization of the code is often performed at the end of the development stage since it reduces readability and adds code that is used to increase the performance. • Why Optimize? Optimizing an algorithm is beyond the scope of the code optimization phase. So the program is optimized. And it may involve reducing the size of the code. So optimization helps to: • Reduce the space consumed and increases the speed of compilation. • Manually analyzing datasets involves a lot of time. Hence we make use of software like Tableau for data analysis. Similarly manually performing the optimization is also tedious and is better done using a code optimizer. • An optimized code often promotes re-usability.
  • 62. Types of Code Optimization –The optimization process can be broadly classified into two types : • Machine Independent Optimization – This code optimization phase attempts to improve the intermediate code to get a better target code as the output. The part of the intermediate code which is transformed here does not involve any CPU registers or absolute memory locations. • Machine Dependent Optimization – Machine- dependent optimization is done after the target code has been generated and when the code is transformed according to the target machine architecture. It involves CPU registers and may have absolute memory references rather than relative references. Machine-dependent optimizers put efforts to take maximum advantage of the memory hierarchy
  • 63. Code Optimization is done in the following different ways
  • 64. Hence, after variable propagation, a*b and x*b will be identified as common sub- expression.
  • 65.
  • 66.
  • 67.
  • 68.
  • 69. Three address code in Compiler • Three address code is a type of intermediate code which is easy to generate and can be easily converted to machine code. • It makes use of at most three addresses and one operator to represent an expression and the value computed at each instruction is stored in temporary variable generated by compiler. • The compiler decides the order of operation given by three address code.
  • 70. General representation • Where a, b or c represents operands like names, constants or compiler generated temporaries and op represents the operator
  • 71.
  • 72. • Example-2: Write three address code for following code
  • 73.
  • 74. Parse Tree • Parse : It means to resolve (a sentence) into its component parts and describe their syntactic roles or simply it is an act of parsing a string or a text. • Tree : A tree may be a widely used abstract data type that simulates a hierarchical tree structure, with a root value and sub-trees of youngsters with a parent node, represented as a group of linked nodes.
  • 75.
  • 76.
  • 77. • Uses of Parse Tree : • It helps in making syntax analysis by reflecting the syntax of the input language. • It uses an in-memory representation of the input with a structure that conforms to the grammar. • The advantages of using parse trees rather than semantic actions: you’ll make multiple passes over the info without having to re- parse the input.
  • 78. Types of Parsers in Compiler Design • Parser is that phase of compiler which takes token string as input and with the help of existing grammar, converts it into the corresponding parse tree. • Parser is also known as Syntax Analyzer.
  • 79.
  • 80.
  • 81.
  • 82.
  • 83.
  • 84. Bottom Up or Shift Reduce Parsers | Set 2 • Bottom Up Parsers / Shift Reduce Parsers Build the parse tree from leaves to root. • Bottom-up parsing can be defined as an attempt to reduce the input string w to the start symbol of grammar by tracing out the rightmost derivations of w in reverse. Eg.
  • 85.
  • 86.
  • 87.
  • 88.
  • 89. Recursive Descent Parser • Parsing is the process to determine whether the start symbol can derive the program or not. • If the Parsing is successful then the program is a valid program otherwise the program is invalid.
  • 90. There are generally two types of Parsers
  • 91. Recursive Descent Parser • It is a kind of Top-Down Parser. • A top-down parser builds the parse tree from the top to down, starting with the start non- terminal. • A Predictive Parser is a special case of Recursive Descent Parser, where no Back Tracking is required. • By carefully writing a grammar means eliminating left recursion and left factoring from it, the resulting grammar will be a grammar that can be parsed by a recursive descent parser.
  • 92. **Here e is Epsilon
  • 93.
  • 94. • A general shift reduce parsing is LR parsing. • The L stands for scanning the input from left to right and R stands for constructing a rightmost derivation in reverse. Benefits of LR parsing: • Many programming languages using some variations of an LR parser. It should be noted that C++ and Perl are exceptions to it. • LR Parser can be implemented very efficiently • Here we will look at the construction of GOTO graph of grammar by using all the four LR parsing techniques
  • 95.
  • 96.
  • 97.
  • 98.
  • 99.
  • 100.
  • 101.
  • 102.
  • 103.
  • 104.
  • 105.
  • 106. S – attributed and L – attributed SDTs in Syntax directed translation • Types of attributes – Attributes may be of two types – Synthesized or Inherited. 1. Synthesized attributes – A Synthesized attribute is an attribute of the non-terminal on the left-hand side of a production. • Synthesized attributes represent information that is being passed up the parse tree. • The attribute can take value only from its children (Variables in the RHS of the production).
  • 107. • For eg. let’s say A -> BC is a production of a grammar, and A’s attribute is dependent on B’s attributes or C’s attributes then it will be synthesized attribute.
  • 108. 2. Inherited attributes – An attribute of a nonterminal on the right- hand side of a production is called an inherited attribute. • The attribute can take value either from its parent or from its siblings (variables in the LHS or RHS of the production). • For example, let’s say A -> BC is a production of a grammar and B’s attribute is dependent on A’s attributes or C’s attributes then it will be inherited attribute.
  • 109. • Now, let’s discuss about S-attributed and L- attributed SDT. • S-attributed SDT : – If an SDT uses only synthesized attributes, it is called as S-attributed SDT. – S-attributed SDTs are evaluated in bottom-up parsing, as the values of the parent nodes depend upon the values of the child nodes. – Semantic actions are placed in rightmost place of RHS.
  • 110. • L-attributed SDT: – If an SDT uses both synthesized attributes and inherited attributes with a restriction that inherited attribute can inherit values from left siblings only, it is called as L-attributed SDT. – Attributes in L-attributed SDTs are evaluated by depth-first and left-to-right parsing manner. – Semantic actions are placed anywhere in RHS.
  • 111. • For example, • A -> XYZ {Y.S = A.S, Y.S = X.S, Y.S = Z.S} is not an L-attributed grammar • since Y.S = A.S and Y.S = X.S are allowed but Y.S = Z.S violates the L-attributed SDT definition as attributed is inheriting the value from its right sibling.
  • 112.
  • 113.
  • 114. Compiler Design | Detection of a Loop in Three Address Code • Loop optimization is the phase after the Intermediate Code Generation. • The main intention of this phase is to reduce the number of lines in a program. • In any program majority of the time is spent by any program is actually inside the loop for an iterative program. • In case of the recursive program a block will be there and the majority of the time will present inside the block
  • 115. • Loop Optimization – • To apply loop optimization we must first detect loops. • For detecting loops we use Control Flow Analysis(CFA) using Program Flow Graph(PFG). • To find program flow graph we need to find Basic Block
  • 116. • Basic Block – • A basic block is a sequence of three address statements where control enters at the beginning and leaves only at the end without any jumps or halts. • Finding the Basic Block – In order to find the basic blocks, we need to find the leaders in the program. • Then a basic block will start from one leader to the next leader but not including the next leader. Which means if you find out that lines no 1 is a leader and line no 15 is the next leader, then the line from 1 to 14 is a basic block, but not including line no 15
  • 117. • Identifying leader in a Basic Block – • First statement is always a leader • Statement that is target of conditional or un- conditional statement is a leader • Statement that follows immediately a conditional or un-conditional statement is a leader
  • 118.
  • 119. Language Processors: Assembler, Compiler and Interpreter • Language Processors – Assembly language is machine dependent yet mnemonics that are being used to represent instructions in it are not directly understandable by machine and high Level language is machine independent. • A computer understands instructions in machine code, i.e. in the form of 0s and 1s. • It is a tedious task to write a computer program directly in machine code. • The programs are written mostly in high level languages like Java, C++, Python etc. and are called source code.
  • 120. • These source code cannot be executed directly by the computer and must be converted into machine language to be executed. • Hence, a special translator system software is used to translate the program written in high level language into machine code is called Language Processor and the program after translated into machine code (object program / object code).
  • 121. The language processors can be any of the following three types: • Compiler – The language processor that reads the complete source program written in high level language as a whole in one go and translates it into an equivalent program in machine language is called as a Compiler. Example: C, C++, C#, Java In a compiler, the source code is translated to object code successfully if it is free of errors. • The compiler specifies the errors at the end of compilation with line numbers when there are any errors in the source code. • The errors must be removed before the compiler can successfully recompile the source code again.
  • 122.
  • 123. • Assembler – The Assembler is used to translate the program written in Assembly language into machine code. • The source program is a input of assembler that contains assembly language instructions. • The output generated by assembler is the object code or machine code understandable by the computer.
  • 124.
  • 125. • Interpreter – The translation of single statement of source program into machine code is done by language processor and executes it immediately before moving on to the next line is called an interpreter. • If there is an error in the statement, the interpreter terminates its translating process at that statement and displays an error message. • The interpreter moves on to the next line for execution only after removal of the error. • An Interpreter directly executes instructions written in a programming or scripting language without previously converting them to an object code or machine code. Example: Perl, Python and Matlab.
  • 126. FIRST Set in Syntax Analysis
  • 127.
  • 128.
  • 129. Note
  • 130. FOLLOW Set in Syntax Analysis • Follow(X) to be the set of terminals that can appear immediately to the right of Non- Terminal X in some sentential form.
  • 131.
  • 132. • 1) FOLLOW(S) = { $ } // where S is the starting Non-Terminal • 2) If A -> pBq is a production, where p, B and q are any grammar symbols, then everything in FIRST(q) except Є is in FOLLOW(B). • 3) If A->pB is a production, then everything in FOLLOW(A) is in FOLLOW(B). • 4) If A->pBq is a production and FIRST(q) contains Є, then FOLLOW(B) contains { FIRST(q) – Є } U FOLLOW(A)
  • 133.
  • 134. • FOLLOW Set FOLLOW(E) = { $ , ) } // Note ')' is there because of 5th rule • FOLLOW(E’) = FOLLOW(E) = { $, ) } // See 1st production rule FOLLOW(T) = { FIRST(E’) – Є } U FOLLOW(E’) U FOLLOW(E) = { + , $ , ) } FOLLOW(T’) = FOLLOW(T) = { + , $ , ) } FOLLOW(F) = { FIRST(T’) – Є } U FOLLOW(T’) U FOLLOW(T) = { *, +, $, ) }
  • 135.
  • 136.
  • 137.
  • 138. Classification of Context Free Grammars • Context Free Grammars (CFG) can be classified on the basis of following two properties: • 1) Based on number of strings it generates. • If CFG is generating finite number of strings, then CFG is Non-Recursive (or the grammar is said to be Non- recursive grammar) • If CFG can generate infinite number of strings then the grammar is said to be Recursive grammar • During Compilation, the parser uses the grammar of the language to make a parse tree(or derivation tree) out of the source code. The grammar used must be unambiguous. An ambiguous grammar must not be used for parsing.
  • 139. • 2) Based on number of derivation trees. • If there is only 1 derivation tree then the CFG is unambiguous. • If there are more than 1 derivation tree, then the CFG is ambiguous.
  • 142.
  • 143. Ambiguous Grammar • Context Free Grammars(CFGs) are classified based on: 1. Number of Derivation trees 2. Number of strings Depending on Number of Derivation trees, CFGs are sub-divided into 2 types: 1. Ambiguous grammars 2. Unambiguous grammars
  • 145. • For Example: 1. Let us consider this grammar : E -> E+E|id • We can create 2 parse tree from this grammar to obtain a string id+id+id :
  • 146. Note • Both the above parse trees are derived from same grammar rules but both parse trees are different. • Hence the grammar is ambiguous.
  • 147.
  • 148. • From the above grammar String 3*2+5 can be derived in 2 ways:
  • 149.
  • 150. • Derivation tree: • It tells how string is derived using production rules from S and has been shown in Figure 1.
  • 151.
  • 152.
  • 153.
  • 155.
  • 156.
  • 157.
  • 159. • Note : Ambiguity of a grammar is undecidable, i.e. there is no particular algorithm for removing the ambiguity of a grammar, but we can remove ambiguity by: • Disambiguate the grammar i.e., rewriting the grammar such that there is only one derivation or parse tree possible for a string of the language which the grammar represents.
  • 160. Clone a Directed Acyclic Graph • A directed acyclic graph (DAG) is a graph which doesn’t contain a cycle and has directed edges. • We are given a DAG, we need to clone it, i.e., create another graph that has copy of its vertices and edges connecting them.
  • 161. Syntax Directed Translation in Compiler Design • Background : Parser uses a CFG(Context-free- Grammer) to validate the input string and produce output for next phase of the compiler. • Output could be either a parse tree or abstract syntax tree. • Now to interleave semantic analysis with syntax analysis phase of the compiler, we use Syntax Directed Translation.
  • 162. • Definition Syntax Directed Translation are augmented rules to the grammar that facilitate semantic analysis. SDT involves passing information bottom-up and/or top-down the parse tree in form of attributes attached to the nodes. • Syntax directed translation rules use 1) lexical values of nodes, 2) constants & 3) attributes associated to the non-terminals in their definitions. • The general approach to Syntax-Directed Translation is to construct a parse tree or syntax tree and compute the values of attributes at the nodes of the tree by visiting them in some order. In many cases, translation can be done during parsing without building an explicit tree.
  • 163.
  • 164.
  • 165.
  • 166. • To evaluate translation rules, we can employ one depth first search traversal on the parse tree. • This is possible only because SDT rules don’t impose any specific order on evaluation until children attributes are computed before parents for a grammar having all synthesized attributes. • Otherwise, we would have to figure out the best suited plan to traverse through the parse tree and evaluate all the attributes in one or more traversals. • For better understanding, we will move bottom up in left to right fashion for computing translation rules of our example.
  • 167.
  • 168.
  • 169.
  • 170. • Syntax Directed Definition (SDD) is a kind of abstract specification. • It is generalization of context free grammar in which each grammar production X –> a is associated with it a set of production rules of the form s = f(b1, b2, ……bk) where s is the attribute obtained from function f. • The attribute can be a string, number, type or a memory location. • Semantic rules are fragments of code which are embedded usually at the end of production and enclosed in curly braces ({ }).
  • 171.
  • 172.
  • 173.
  • 174.
  • 175. • Let us assume an input string 4 * 5 + 6 for computing synthesized attributes. • The annotated parse tree for the input string is
  • 176. • For computation of attributes we start from leftmost bottom node. • The rule F –> digit is used to reduce digit to F and the value of digit is obtained from lexical analyzer which becomes value of F i.e. from semantic action F.val = digit.lexval. • Hence, F.val = 4 and since T is parent node of F so, we get T.val = 4 from semantic action T.val = F.val. Then, for T –> T1 * F production, the corresponding semantic action is T.val = T1.val * F.val . Hence, T.val = 4 * 5 = 20 • Similarly, combination of E1.val + T.val becomes E.val i.e. E.val = E1.val + T.val = 26. • Then, the production S –> E is applied to reduce E.val = 26 and semantic action associated with it prints the result E.val . • Hence, the output will be 26.
  • 177.
  • 178.
  • 179.
  • 180. • Let us assume an input string int a, c for computing inherited attributes. • The annotated parse tree for the input string is
  • 181. • The value of L nodes is obtained from T.type (sibling) which is basically lexical value obtained as int, float or double. • Then L node gives type of identifiers a and c. The computation of type is done in top down manner or preorder traversal. • Using function Enter_type the type of identifiers a and c is inserted in symbol table at corresponding id.entry.
  • 182.
  • 183.
  • 184. Removing Direct and Indirect Left Recursion in a Grammar
  • 185. Check if the given grammar contains left recursion, if present then separate the production and start working on it. In our example, S-->S a/ S b /c / d
  • 186.
  • 187.
  • 188.
  • 189.
  • 191.
  • 192.
  • 193.
  • 194.
  • 195. • Step-4: Thus we get a compiler written in ASM which compiles C and generates code in ASM.
  • 196. Peephole Optimization in Compiler Design • Peephole optimization is a type of Code Optimization performed on a small part of the code. • It is performed on the very small set of instructions in a segment of code. • It basically works on the theory of replacement in which a part of code is replaced by shorter and faster code without change in output. • Peephole is the machine dependent optimization.
  • 197. Objectives of Peephole Optimization • The objective of peephole optimization is: 1. To improve performance 2. To reduce memory footprint 3. To reduce code size
  • 199.
  • 200.
  • 201.
  • 202. Construction of LL(1) Parsing Table • A top-down parser builds the parse tree from the top down, starting with the start non- terminal. • There are two types of Top-Down Parsers: 1. Top-Down Parser with Backtracking 2. Top-Down Parsers without Backtracking 3. Top-Down Parsers without backtracking can further be divided into two parts:
  • 203.
  • 204.
  • 205.
  • 206. • In the table, rows will contain the Non- Terminals and the column will contain the Terminal Symbols. • All the Null Productions of the Grammars will go under the Follow elements and the remaining productions will lie under the elements of the First set.
  • 207.
  • 208.
  • 209.
  • 210. • As you can see that all the null productions are put under the Follow set of that symbol and all the remaining productions are lie under the First of that symbol. • Note: Every grammar is not feasible for LL(1) Parsing table. • It may be possible that one cell may contain more than one production.
  • 211.
  • 212.
  • 213. SLR, CLR and LALR Parsers • SLR Parser The SLR parser is similar to LR(0) parser except that the reduced entry. • The reduced productions are written only in the FOLLOW of the variable whose production is reduced.
  • 214.
  • 215.
  • 216. CLR PARSER • In the SLR method we were working with LR(0)) items. • In CLR parsing we will be using LR(1) items. • LR(k) item is defined to be an item using lookaheads of length k. • So , the LR(1) item is comprised of two parts : the LR(0) item and the lookahead associated with the item. LR(1) parsers are more powerful parser. For LR(1) items we modify the Closure and GOTO function.
  • 217.
  • 218.
  • 219. Introduction to YACC • A parser generator is a program that takes as input a specification of a syntax, and produces as output a procedure for recognizing that language. Historically, they are also called compiler-compilers. YACC (yet another compiler-compiler) is an LALR(1) (LookAhead, Left-to-right, Rightmost derivation producer with 1 lookahead token) parser generator. • YACC was originally designed for being complemented by Lex.
  • 220.
  • 221.
  • 222.
  • 223.
  • 225.
  • 226. Operator grammar and precedence parser • A grammar that is used to define mathematical operators is called an operator grammar or operator precedence grammar. • Such grammars have the restriction that no production has either an empty right-hand side (null productions) or two adjacent non- terminals in its right-hand side.
  • 227.
  • 228. • Operator precedence parser – An operator precedence parser is a bottom-up parser that interprets an operator grammar. • This parser is only used for operator grammars. • Ambiguous grammars are not allowed in any parser except operator precedence parser. • There are two methods for determining what precedence relations should hold between a pair of terminals:
  • 229. 1. Use the conventional associativity and precedence of operator. 2. The second method of selecting operator- precedence relations is first to construct an unambiguous grammar for the language, a grammar that reflects the correct associativity and precedence in its parse trees.
  • 230. • This parser relies on the following three precedence relations: ⋖, ≐, ⋗ • a ⋖ b This means a “yields precedence to” b. a ⋗ b This means a “takes precedence over” b. a ≐ b This means a “has same precedence as” b.
  • 231. • There is not given any relation between id and id as id will not be compared and two variables can not come side by side. • There is also a disadvantage of this table – if we have n operators then size of table will be n*n and complexity will be 0(n2). • In order to decrease the size of table, we use operator function table.
  • 232. • Operator precedence parsers usually do not store the precedence table with the relations; rather they are implemented in a special way. • Operator precedence parsers use precedence functions that map terminal symbols to integers, and the precedence relations between the symbols are implemented by numerical comparison. • The parsing table can be encoded by two precedence functions f and g that map terminal symbols to integers. We select f and g such that: • f(a) < g(b) whenever a yields precedence to b • f(a) = g(b) whenever a and b have the same precedence • f(a) > g(b) whenever a takes precedence over b
  • 233.
  • 234. • Since there is no cycle in the graph, we can make this function table: Size of the table is 2n
  • 235. • One disadvantage of function tables is that even though we have blank entries in relation table we have non-blank entries in function table. • Blank entries are also called error. • Hence error detection capability of relation table is greater than function table.