COMPILER DESIGN - Introduction & Lexical Analysis
1. COMPILER DESIGN
Dr R Jegadeesan Prof-CSE
Jyothishmathi Institute of Technology and Science, Karimnagar
2. SYLLABUS
Introduction: Language Processors, the structure of a compiler, the science of building a compiler, programming language basics.
Lexical Analysis: The Role of the Lexical Analyzer, Input Buffering, Recognition of Tokens, The Lexical-Analyzer Generator Lex, Finite Automata, From Regular Expressions to Automata, Design of a Lexical-Analyzer Generator, Optimization of DFA-Based Pattern Matchers.
3. UNIT-I : INTRODUCTION
Topic Name : Language processors
Aim & Objective : To understand how language processors convert a program into target code.
Principle & Operation/ Detailed Explanation :
A translator is a programming language processor that takes a program written in one language (the source code) and converts it into an equivalent program in another language, typically machine code. It also detects and reports errors found during translation.
There are three common types of translators:
Compiler
A compiler is a translator that converts a program written in a high-level
programming language into a low-level language such as assembly or machine code.
It translates the whole program at once and reports the errors it detects after
the translation.
Interpreter
Like a compiler, an interpreter is a translator for high-level programming
languages. Instead of translating the whole program first, it translates and
executes the program one statement at a time, and it reports an error as soon as
it reaches the faulty statement. This makes errors easier to locate than with a
compiler. An interpreter starts running the program immediately, without a
separate compilation step, although the program as a whole usually runs more
slowly than compiled machine code.
Assembler
An assembler is a translator that converts assembly language into machine
language; it plays the role of a compiler for assembly language. Assembly
language is a low-level programming language and is difficult for people to
understand. An assembler translates it into an even lower-level language, machine
code, which the CPU can execute directly.
Universities & Important Questions:
1. What are the differences between Compiler and Interpreter?
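The difference in error reporting between a compiler and an interpreter can be sketched with a toy language. The language (statements `print N` and `add N M`) and all function names below are hypothetical, made up for illustration; the sketch only models when errors are reported and when execution starts, not the generation of target code.

```python
# Hypothetical toy language: one statement per line, "print N" or "add N M".

def check(stmt):
    """Return None if stmt is well formed, otherwise an error message."""
    parts = stmt.split()
    if not parts or parts[0] not in ("print", "add"):
        return f"error: {stmt!r}"
    expected_args = 1 if parts[0] == "print" else 2
    args = parts[1:]
    if len(args) != expected_args or not all(a.isdigit() for a in args):
        return f"error: {stmt!r}"
    return None


def execute(stmt, out):
    """Run one well-formed statement and record its value in out."""
    values = [int(a) for a in stmt.split()[1:]]
    out.append(sum(values))  # "print N" yields N, "add N M" yields N + M


def interpret(program):
    """Interpreter style: check and execute one statement at a time."""
    out = []
    for stmt in program:
        error = check(stmt)
        if error:
            return out, [error]  # stop at the first error; earlier statements already ran
        execute(stmt, out)
    return out, []


def compile_then_run(program):
    """Compiler style: check the whole program first, run it only if error-free."""
    errors = [e for e in map(check, program) if e]
    if errors:
        return [], errors  # every error is reported, nothing is executed
    out = []
    for stmt in program:
        execute(stmt, out)
    return out, []


program = ["print 1", "add 2 3", "prnt 4", "add 5"]
print(interpret(program))         # ([1, 5], ["error: 'prnt 4'"])
print(compile_then_run(program))  # ([], ["error: 'prnt 4'", "error: 'add 5'"])
```

The interpreter has already produced output for the first two statements when it stops at `prnt 4`, and it never sees the second error; the compiler-style version reports both errors and runs nothing.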
5. The Structure of the Compiler
Topic Name : Structure of the compiler
Aim & Objective : To show the different forms the code takes before it becomes machine code.
Principle & Operation/ Detailed Explanation :
Lexical Analyzer (scanner) –
It takes the output of the preprocessor, which is pure high-level language, as its input. It
reads the characters of the source program and groups them into lexemes (sequences of
characters that “go together”). Each lexeme corresponds to a token. Tokens are described
by patterns written as regular expressions, which the lexical analyzer uses to recognize them.
It also removes comments and white space and reports lexical errors (e.g., invalid characters).
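A minimal sketch of a lexical analyzer, assuming a hypothetical token set (identifiers, numbers, single-character operators and `//` comments). Python's `re` alternation takes the first alternative that matches rather than the longest one, so the order of the hypothetical `TOKEN_SPEC` list matters here.

```python
import re

# Hypothetical token specification: (token name, regular expression).
# re tries alternatives left to right, so COMMENT must come before OP ("/").
TOKEN_SPEC = [
    ("COMMENT", r"//[^\n]*"),   # comment: removed, never passed on
    ("WS", r"\s+"),             # white space: removed
    ("NUMBER", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=]"),
    ("ERROR", r"."),            # any other character is a lexical error
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))


def tokenize(source):
    """Group the characters of source into lexemes; return (token, lexeme) pairs."""
    tokens = []
    for m in MASTER.finditer(source):
        kind, lexeme = m.lastgroup, m.group()
        if kind in ("COMMENT", "WS"):
            continue
        if kind == "ERROR":
            raise ValueError(f"lexical error: unexpected {lexeme!r} at offset {m.start()}")
        tokens.append((kind, lexeme))
    return tokens


print(tokenize("position = initial + rate * 60  // update"))
```

For the sample line this yields seven (token, lexeme) pairs, from `("ID", "position")` to `("NUMBER", "60")`; the white space and the comment never reach the parser.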
Syntax Analyzer – It is also called the parser. It takes the tokens one by one and uses a
context-free grammar to construct the parse tree.
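A recursive-descent sketch of a parser for a small, assumed expression grammar (shown in the docstring). For brevity it builds a simplified syntax tree of nested tuples `(operator, left, right)` instead of a full parse tree with a node for every grammar symbol, and its `tokenize` helper is deliberately crude.

```python
import re


def tokenize(text):
    """Crude helper: numbers, identifiers, + - * / ( ). Other characters are skipped."""
    return re.findall(r"\d+|[A-Za-z_]\w*|[-+*/()]", text)


class Parser:
    """Recursive-descent parser for the (assumed) grammar
         expr   -> expr + term | expr - term | term
         term   -> term * factor | term / factor | factor
         factor -> NUMBER | ID | ( expr )
    Left recursion is replaced by loops, the usual rewrite for top-down parsing."""

    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())  # left-associative
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):
        tok = self.advance()
        if tok == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok.isdigit():
            return int(tok)
        if tok[0].isalpha() or tok[0] == "_":
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


print(parse("a + b * 60"))  # ('+', 'a', ('*', 'b', 60))
```

Because `term` handles `*` and `/` below `expr`, multiplication binds tighter than addition without any extra precedence table.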
Semantic Analyzer – It checks whether the parse tree is meaningful and produces a verified
parse tree. It also performs type checking, label checking and flow-of-control checking.
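Type checking, one of the semantic analyzer's jobs, can be sketched over a tuple tree of the form `(operator, left, right)`. The symbol table, the type names and the rule that mixing `int` and `float` yields `float` are assumptions for this example, not the rules of any particular language.

```python
# Hypothetical symbol table built from earlier declarations: name -> type.
SYMBOLS = {"position": "float", "initial": "float", "rate": "float",
           "count": "int", "done": "bool"}
NUMERIC = ("int", "float")


def type_of(node, symbols=SYMBOLS):
    """Return the type of an expression tree, or raise TypeError."""
    if isinstance(node, bool):  # test bool first: bool is a subclass of int in Python
        return "bool"
    if isinstance(node, int):
        return "int"
    if isinstance(node, float):
        return "float"
    if isinstance(node, str):  # identifier: look it up in the symbol table
        if node not in symbols:
            raise TypeError(f"undeclared identifier {node!r}")
        return symbols[node]
    op, left, right = node
    lt, rt = type_of(left, symbols), type_of(right, symbols)
    if lt not in NUMERIC or rt not in NUMERIC:
        raise TypeError(f"operator {op!r} applied to {lt} and {rt}")
    # int mixed with float: the int operand is converted, so the result is float
    return "float" if "float" in (lt, rt) else "int"


print(type_of(("+", "initial", ("*", "rate", 60))))  # float
```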
Intermediate Code Generator – It generates intermediate code, a machine-independent form
that is easy to produce and easy to translate into machine code; three-address code is a
common example. The last two phases, code optimization and target code generation, then
turn the intermediate code into machine language for a specific platform.
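A sketch of intermediate code generation: a post-order walk of a tuple expression tree that emits three-address code, creating a fresh temporary (`t1`, `t2`, ...) for each operator. The tree format and the `=` node for assignment are assumptions of this example.

```python
import itertools


def gen_tac(tree):
    """Emit three-address code for a tuple tree; return (instructions, result name)."""
    code = []
    temps = (f"t{i}" for i in itertools.count(1))  # fresh temporaries t1, t2, ...

    def walk(node):
        if not isinstance(node, tuple):  # identifier or constant: already an address
            return str(node)
        op, left, right = node
        if op == "=":  # assignment: copy the computed value into the target name
            value = walk(right)
            code.append(f"{left} = {value}")
            return left
        a, b = walk(left), walk(right)
        t = next(temps)
        code.append(f"{t} = {a} {op} {b}")  # at most one operator per instruction
        return t

    result = walk(tree)
    return code, result


code, _ = gen_tac(("=", "position", ("+", "initial", ("*", "rate", 60))))
print("\n".join(code))
# t1 = rate * 60
# t2 = initial + t1
# position = t2
```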
Code Optimizer – It transforms the code so that it consumes fewer resources and runs faster,
without altering the meaning of the code being transformed. Optimization can be categorized
into two types: machine dependent and machine independent.
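Constant folding is one common machine-independent optimization: an operation whose operands are constants is computed at compile time. The sketch below applies it, with simple constant propagation, to straight-line three-address code written as hypothetical `(dest, op, a, b)` tuples. Division is left out so that folding can never raise a division-by-zero error at compile time.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def is_const(v):
    return isinstance(v, (int, float))


def fold_constants(code):
    """Constant folding + propagation over straight-line (dest, op, a, b) code."""
    known = {}  # name -> value already known at compile time
    out = []
    for dest, op, a, b in code:
        # substitute operands whose values are already known constants
        a = known.get(a, a) if isinstance(a, str) else a
        b = known.get(b, b) if isinstance(b, str) else b
        if op == "=" and is_const(a):  # copy of a constant
            known[dest] = a
            out.append((dest, "=", a, None))
        elif op in OPS and is_const(a) and is_const(b):
            value = OPS[op](a, b)  # computed now instead of at run time
            known[dest] = value
            out.append((dest, "=", value, None))
        else:
            known.pop(dest, None)  # dest no longer holds a known constant
            out.append((dest, op, a, b))
    return out


code = [("t1", "*", 2, 30), ("t2", "*", "rate", "t1"), ("position", "+", "initial", "t2")]
print(fold_constants(code))
# [('t1', '=', 60, None), ('t2', '*', 'rate', 60), ('position', '+', 'initial', 't2')]
```

The store to `t1` is kept even though nothing uses it any more; removing such dead stores is a separate pass (dead-code elimination).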
Target Code Generator – Its main purpose is to produce code that the machine can understand;
it also performs register allocation, instruction selection, etc. The output depends on the
type of assembler. This is the final phase of compilation.
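A naive target code generator can be sketched over `(dest, op, a, b)` three-address tuples. The instruction set (`LD`, `ST`, `ADD`, `SUB`, `MUL`, `DIV`, with `#` marking a constant) is a hypothetical load/store machine, and register allocation is trivial: every instruction uses only `R1` and `R2`.

```python
# Hypothetical mnemonics for a simple load/store machine.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def operand(v):
    """Constants are written as immediates (#60); names as memory operands."""
    return f"#{v}" if isinstance(v, (int, float)) else v


def gen_target(code):
    """Translate each (dest, op, a, b): load into R1/R2, compute into R1, store R1."""
    asm = []
    for dest, op, a, b in code:
        asm.append(f"LD R1, {operand(a)}")
        if op == "=":  # plain copy: dest = a
            asm.append(f"ST {dest}, R1")
            continue
        asm.append(f"LD R2, {operand(b)}")
        asm.append(f"{OPCODES[op]} R1, R1, R2")
        asm.append(f"ST {dest}, R1")
    return asm


for line in gen_target([("t1", "*", "rate", 60), ("position", "+", "initial", "t1")]):
    print(line)
```

The output stores `t1` and later loads it back (`ST t1, R1` ... `LD R2, t1`); a code generator with real register allocation would keep `t1` in a register and avoid both instructions.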
Universities & Important Questions:
1. Explain the phases of a compiler with a neat diagram.
8. INPUT BUFFERING
Topic Name : Input Buffering.
Aim & Objective : To reduce the time taken to read the input string.
Principle & Operation/ Detailed Explanation :
The lexical analyzer scans the input from left to right, one character at a time. It uses two pointers,
the begin pointer (bp) and the forward pointer (fp), to keep track of the portion of the input being scanned.
The forward pointer moves ahead to search for the end of the lexeme. A blank space marks the end
of a lexeme. In the above example, as soon as fp encounters a blank space, the lexeme “int” is identified.
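The begin/forward pointer scheme can be sketched as follows, assuming, as in the description above, that a blank space ends a lexeme. Real scanners also use the token patterns to decide where a lexeme ends and read the input through buffers rather than from one in-memory string; both are left out here, and `scan_lexemes` is a hypothetical name.

```python
def scan_lexemes(text):
    """Split text into blank-separated lexemes using begin (bp) and forward (fp) pointers."""
    lexemes = []
    bp, n = 0, len(text)
    while bp < n:
        if text[bp].isspace():  # skip blanks before the next lexeme
            bp += 1
            continue
        fp = bp
        while fp < n and not text[fp].isspace():  # fp searches for the end of the lexeme
            fp += 1
        lexemes.append(text[bp:fp])  # the lexeme is text[bp:fp]
        bp = fp  # the next lexeme starts where this one ended
    return lexemes


print(scan_lexemes("int i = 10 ;"))  # ['int', 'i', '=', '10', ';']
```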
10. Lexical Analyzer Generator-Lex
An input file, which we call lex.l, is written in the Lex language and
describes the lexical analyzer to be generated. The Lex compiler transforms
lex.l to a C program, in a file that is always named lex.yy.c. The latter file is
compiled by the C compiler into a file called a.out, as always. The C-compiler
output is a working lexical analyzer that can take a stream of input characters
and produce a stream of tokens.
11. Structure of Lex Programs
A Lex program has the following form:
declarations
%%
translation rules
%%
auxiliary functions
The declarations section includes declarations of variables, manifest constants (identifiers
declared to stand for a constant, e.g., the name of a token), and regular definitions.
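The translation rules each have the form
Pattern { Action }
Each pattern is a regular expression, and each action is a fragment of code (typically C) that is
executed when the pattern matches.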
The third section holds whatever additional functions are used in the actions. Alternatively,
these functions can be compiled separately and loaded with the lexical analyzer.
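Lex resolves conflicts between patterns by preferring the longest possible lexeme and, when two patterns match the same length, the pattern listed first. That rule can be imitated in Python to see how `Pattern { Action }` pairs drive a scanner. This is not Lex itself: the rule table and the `scan` function are hypothetical, and each action returns a token or `None` to discard the lexeme.

```python
import re

# Hypothetical rules in the spirit of Lex's "Pattern { Action }", tried in this order.
RULES = [
    (r"[ \t\n]+", lambda lexeme: None),  # white space: no token
    (r"if|else|while", lambda lexeme: ("KEYWORD", lexeme)),
    (r"[A-Za-z_][A-Za-z0-9_]*", lambda lexeme: ("ID", lexeme)),
    (r"[0-9]+", lambda lexeme: ("NUMBER", int(lexeme))),
    (r"<=|>=|==|<|>|=", lambda lexeme: ("RELOP", lexeme)),
]
COMPILED = [(re.compile(pattern), action) for pattern, action in RULES]


def scan(text):
    tokens, pos = [], 0
    while pos < len(text):
        best_len, best_action = 0, None
        for regex, action in COMPILED:
            m = regex.match(text, pos)
            # longest match wins; strict ">" keeps the earlier rule on a tie
            if m and m.end() - pos > best_len:
                best_len, best_action = m.end() - pos, action
        if best_action is None:
            raise ValueError(f"lexical error at position {pos}: {text[pos]!r}")
        token = best_action(text[pos:pos + best_len])
        if token is not None:
            tokens.append(token)
        pos += best_len
    return tokens


print(scan("if x1 <= 42"))  # [('KEYWORD', 'if'), ('ID', 'x1'), ('RELOP', '<='), ('NUMBER', 42)]
```

`scan("ifx")` yields a single identifier because the ID rule matches three characters while the keyword rule matches only two; `scan("if")` yields a keyword because both rules match two characters and the keyword rule is listed first.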
12. Finite Automata
A finite automaton (FA) is the simplest machine for recognizing patterns.
A finite automaton consists of the following:
Q : Finite set of states.
Σ : Finite set of input symbols (the alphabet).
q : Initial state.
F : Set of final (accepting) states.
δ : Transition function.
Formally, the machine is specified by the 5-tuple
$(Q, \Sigma, q, F, \delta)$.
Finite automata are classified into two types:
1) Deterministic Finite Automata (DFA)
2) Nondeterministic Finite Automata (NFA)
13. Deterministic Finite Automata
In a DFA, for a particular input character, the machine goes to exactly one state. The transition function is
defined for every state and every input symbol, i.e., $\delta: Q \times \Sigma \to Q$. Null (or ε) moves are not
allowed in a DFA, i.e., a DFA cannot change state without reading an input character.
For example, the DFA below, with Σ = {0, 1}, accepts all strings that end with 0.
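One DFA with this behavior needs only two states, sketched below as a transition table. The state names `q0` and `q1` are chosen for this example: `q0` is the start state, `q1` is the only final state, and the machine is in `q1` exactly when the last symbol read was 0.

```python
# Transition table for a DFA over {0, 1} accepting strings that end with 0.
DELTA = {
    ("q0", "0"): "q1", ("q0", "1"): "q0",
    ("q1", "0"): "q1", ("q1", "1"): "q0",
}
START, FINAL = "q0", {"q1"}


def dfa_accepts(s, delta=DELTA, start=START, final=FINAL):
    state = start
    for ch in s:
        if (state, ch) not in delta:  # symbol outside the alphabet: reject
            return False
        state = delta[(state, ch)]  # exactly one next state per (state, symbol)
    return state in final


print([w for w in ["0", "10", "01", "1100", ""] if dfa_accepts(w)])  # ['0', '10', '1100']
```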
14. Non-Deterministic Finite Automata
An NFA is similar to a DFA except for the following additional features:
1. Null (or ε) moves are allowed, i.e., it can move to another state without reading a symbol.
2. It can transition to any number of states for a particular input.
However, these features do not add any power to an NFA: NFAs and DFAs are equivalent in power,
since every NFA can be converted into an equivalent DFA (the subset construction).
Because of these additional features, an NFA has a different transition function; the rest of its
definition is the same as a DFA's.
δ: Transition Function
$\delta: Q \times (\Sigma \cup \{\varepsilon\}) \to 2^{Q}$
That is, for any input, including null (ε), an NFA can go to any number of states, i.e., to any
subset of Q.
For example, below is an NFA for the above problem (strings over {0, 1} ending with 0).
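One such NFA, sketched below with state names chosen for this example, stays in `p0` while reading any symbol and, on reading a 0, may also guess that this is the last symbol and move to the final state `p1`. The simulation tracks the set of states the NFA could be in, mirroring δ returning a subset of Q; this is also the idea behind the subset construction. It handles ε-moves through an ε-closure even though this particular NFA has none.

```python
EPS = ""  # label used for ε-moves (never equal to a real input character)

# NFA over {0, 1} accepting strings that end with 0; missing entries mean "no move".
NFA_DELTA = {
    ("p0", "0"): {"p0", "p1"},
    ("p0", "1"): {"p0"},
}
NFA_START, NFA_FINAL = "p0", {"p1"}


def eps_closure(states, delta):
    """All states reachable from `states` using ε-moves only."""
    stack, closure = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, EPS), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure


def nfa_accepts(s, delta=NFA_DELTA, start=NFA_START, final=NFA_FINAL):
    """Accept if some path ends in a final state: track every reachable state."""
    current = eps_closure({start}, delta)
    for ch in s:
        step = set()
        for q in current:
            step |= delta.get((q, ch), set())
        current = eps_closure(step, delta)
    return bool(current & final)


print([w for w in ["0", "10", "01", "1100", ""] if nfa_accepts(w)])  # ['0', '10', '1100']
```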