ICS 313 - Fundamentals of Programming Languages
4. Lexical and Syntax Analysis
4.1 Introduction
Language implementation systems must analyze source code,
regardless of the specific implementation approach
Nearly all syntax analysis is based on a formal description of the
syntax of the source language (BNF)
The syntax analysis portion of a language processor nearly always
consists of two parts:
A low-level part called a lexical analyzer (mathematically, a finite
automaton based on a regular grammar)
A high-level part called a syntax analyzer, or parser (mathematically, a
push-down automaton based on a context-free grammar, or BNF)
Reasons to use BNF to describe syntax:
Provides a clear and concise syntax description
The parser can be based directly on the BNF
Parsers based on BNF are easy to maintain
4.1 Introduction (continued)
Reasons to separate lexical and syntax analysis:
Simplicity - less complex approaches can be used for
lexical analysis; separating them simplifies the parser
Efficiency - separation allows optimization of the lexical
analyzer
Portability - parts of the lexical analyzer may not be
portable, but the parser is always portable
4.2 Lexical Analysis
A lexical analyzer is a pattern matcher for character
strings
A lexical analyzer is a “front-end” for the parser
Identifies substrings of the source program that
belong together - lexemes
Lexemes match a character pattern, which is
associated with a lexical category called a token
sum is a lexeme; its token may be IDENT
4.2 Lexical Analysis (continued)
The lexical analyzer is usually a function that is called by the
parser when it needs the next token
Three approaches to building a lexical analyzer:
1. Write a formal description of the tokens and use a software tool
that constructs table-driven lexical analyzers given such a
description
2. Design a state diagram that describes the tokens and write a
program that implements the state diagram
3. Design a state diagram that describes the tokens and hand-
construct a table-driven implementation of the state diagram
We only discuss approach 2
4.2 Lexical Analysis (continued)
State diagram design:
A naive state diagram would have a transition from every state
on every character in the source language - such a diagram
would be very large!
In many cases, transitions can be combined to simplify the
state diagram
When recognizing an identifier, all uppercase and lowercase
letters are equivalent - Use a character class that includes all
letters
When recognizing an integer literal, all digits are equivalent -
use a digit class
Reserved words and identifiers can be recognized together
(rather than having a part of the diagram for each reserved
word)
Use a table lookup to determine whether a possible identifier is in fact a reserved
word
4.2 Lexical Analysis (continued)
Convenient utility subprograms:
getChar - gets the next character of input, puts it in
nextChar, determines its class and puts the class in
charClass
addChar - appends the character in nextChar to the string in
which the lexeme is being accumulated, lexeme
lookup - determines whether the string in lexeme is a
reserved word (returns a code)
4.2 Lexical Analysis (continued)
Implementation (assume initialization):
int lex() {
switch (charClass) {
case LETTER:
addChar();
getChar();
while (charClass == LETTER ||
charClass == DIGIT) {
addChar();
getChar();
}
return lookup(lexeme);
case DIGIT:
addChar();
getChar();
while (charClass == DIGIT) {
addChar();
getChar();
}
return INT_LIT;
} /* End of switch */
} /* End of function lex */
4.3 The Parsing Problem
Goals of the parser, given an input program:
Find all syntax errors; for each, produce an appropriate diagnostic
message and recover quickly
Produce the parse tree, or at least a trace of the parse tree, for the
program
Two categories of parsers
Top down - produce the parse tree, beginning at the root
Order is that of a leftmost derivation
Bottom up - produce the parse tree, beginning at the leaves
Order is that of the reverse of a rightmost derivation
Parsers look only one token ahead in the input
Top-down Parsers
Given a sentential form, xAα, the parser must choose the correct A-rule to
get the next sentential form in the leftmost derivation, using only the first
token produced by A
The most common top-down parsing algorithms:
Recursive descent - a coded implementation
LL parsers - table driven implementation
4.3 The Parsing Problem (continued)
Bottom-up parsers
Given a right sentential form, α, determine which substring of α is the
right-hand side of the rule in the grammar that must be reduced to
produce the previous sentential form in the rightmost derivation
The most common bottom-up parsing algorithms are in the LR
family (LALR, SLR, canonical LR)
The Complexity of Parsing
Parsers that work for any unambiguous grammar are complex
and inefficient (O(n³), where n is the length of the input)
Compilers use parsers that only work for a subset of all
unambiguous grammars, but do it in linear time (O(n), where n
is the length of the input)
4.4 Recursive-Descent Parsing
Recursive Descent Process
There is a subprogram for each nonterminal in the grammar, which can
parse sentences that can be generated by that nonterminal
EBNF is ideally suited for being the basis for a recursive-descent
parser, because EBNF minimizes the number of nonterminals
A grammar for simple expressions:
<expr> → <term> {(+ | -) <term>}
<term> → <factor> {(* | /) <factor>}
<factor> → id | ( <expr> )
Assume we have a lexical analyzer named lex, which puts the next
token code in nextToken
The coding process when there is only one RHS:
For each terminal symbol in the RHS, compare it with the next input token;
if they match, continue, else there is an error
For each nonterminal symbol in the RHS, call its associated parsing
subprogram
4.4 Recursive-Descent Parsing (continued)
/* Function expr
Parses strings in the language generated by the rule:
<expr> → <term> {(+ | -) <term>} */
void expr() {
/* Parse the first term */
term();
/* As long as the next token is + or -, call
lex to get the next token, and parse the next term */
while (nextToken == PLUS_CODE ||
nextToken == MINUS_CODE){
lex();
term();
}
}
This particular routine does not detect errors
Convention: Every parsing routine leaves the next token in nextToken
4.4 Recursive-Descent Parsing (continued)
A nonterminal that has more than one RHS requires
an initial process to determine which RHS it is to
parse
The correct RHS is chosen on the basis of the next token
of input (the lookahead)
The next token is compared with the first token that can
be generated by each RHS until a match is found
If no match is found, it is a syntax error
4.4 Recursive-Descent Parsing (continued)
/* Function factor
Parses strings in the language generated by
the rule: <factor> -> id | (<expr>) */
void factor() {
/* Determine which RHS */
if (nextToken == ID_CODE)
/* For the RHS id, just call lex */
lex();
/* If the RHS is (<expr>) – call lex to pass
over the left parenthesis, call expr, and
check for the right parenthesis */
else if (nextToken == LEFT_PAREN_CODE) {
lex();
expr();
if (nextToken == RIGHT_PAREN_CODE)
lex();
else
error();
} /* End of else if (nextToken == ... */
else error(); /* Neither RHS matches */
}
4.4 Recursive-Descent Parsing (continued)
The LL Grammar Class
The Left Recursion Problem
If a grammar has left recursion, either direct or indirect, it cannot be the
basis for a top-down parser
A grammar can be modified to remove left recursion
The other characteristic of grammars that disallows top-down parsing is
the lack of pairwise disjointness
The inability to determine the correct RHS on the basis of one token of
lookahead
Def: FIRST(α) = {a | α =>* aβ } (If α =>* ε, ε is in FIRST(α))
Pairwise Disjointness Test:
For each nonterminal, A, in the grammar that has more than one RHS, for
each pair of rules, A → αi and A → αj, it must be true that FIRST(αi) ∩
FIRST(αj) = ∅
Examples:
A → a | bB | cAb passes the test (the FIRST sets {a}, {b}, and {c} are
pairwise disjoint)
A → a | aB fails the test (both RHSs begin with a)
4.4 Recursive-Descent Parsing (continued)
Left factoring can resolve the problem
Replace
<variable> → identifier | identifier [<expression>]
with
<variable> → identifier <new>
<new> → ε | [<expression>]
or
<variable> → identifier [[<expression>]]
(the outer brackets are metasymbols of EBNF)
4.5 Bottom-up Parsing
The parsing problem is finding the correct RHS in a right-
sentential form to reduce to get the previous right-sentential
form in the derivation
Intuition about handles:
Def: β is the handle of the right sentential form
γ = αβw if and only if S =>*rm αAw =>rm αβw
Def: β is a phrase of the right sentential form
γ if and only if S =>* γ = α1Aα2 =>+ α1βα2
Def: β is a simple phrase of the right sentential form γ if and
only if S =>* γ = α1Aα2 => α1βα2
The handle of a right sentential form is its leftmost simple
phrase
Given a parse tree, it is now easy to find the handle
Parsing can be thought of as handle pruning
4.5 Bottom-up Parsing (continued)
Shift-Reduce Algorithms
Reduce is the action of replacing the handle on the top of the
parse stack with its corresponding LHS
Shift is the action of moving the next token to the top of the
parse stack
Advantages of LR parsers:
They will work for nearly all grammars that describe
programming languages
They work on a larger class of grammars than other bottom-up
algorithms, but are as efficient as any other bottom-up parser
They can detect syntax errors as soon as possible
The LR class of grammars is a superset of the class parsable
by LL parsers
4.5 Bottom-up Parsing (continued)
LR parsers must be constructed with a tool
Knuth’s insight: A bottom-up parser could use the
entire history of the parse, up to the current point, to
make parsing decisions
There are only a finite and relatively small number of
different parse situations that can occur, so the
history can be stored in a parser state, on the parse
stack
An LR configuration stores the state of an LR parser
(S0X1S1X2S2…XmSm, aiai+1…an$)
4.5 Bottom-up Parsing (continued)
LR parsers are table driven, where the table has two
components, an ACTION table and a GOTO table
The ACTION table specifies the action of the parser,
given the parser state and the next token
Rows are state names; columns are terminals
The GOTO table specifies which state to put on top of the
parse stack after a reduction action is done
Rows are state names; columns are nonterminals
4.5 Bottom-up Parsing (continued)
Initial configuration: (S0, a1…an$)
Parser actions:
If ACTION[Sm, ai] = Shift S, the next configuration is:
(S0X1S1X2S2…XmSmaiS, ai+1…an$)
If ACTION[Sm, ai] = Reduce A → β and S = GOTO[Sm-r, A],
where r = the length of β, the next configuration is
(S0X1S1X2S2…Xm-rSm-rAS, aiai+1…an$)
If ACTION[Sm, ai] = Accept, the parse is complete and no errors
were found
If ACTION[Sm, ai] = Error, the parser calls an error-handling
routine
A parser table can be generated from a given grammar
with a tool, e.g., yacc
4.5 Bottom-up Parsing (continued)
Example: the beginning of a parse of id + id * id, using the grammar
1. E → E + T
2. E → T
3. T → T * F
4. T → F
5. F → (E)
6. F → id

Stack        Input            Action
0            id + id * id $   Shift 5
0id5         + id * id $      Reduce 6 (use GOTO[0, F])
0F3          + id * id $      Reduce 4 (use GOTO[0, T])
0T2          + id * id $      Reduce 2 (use GOTO[0, E])
0E1          + id * id $      Shift 6
0E1+6        id * id $        Shift 5
0E1+6id5     * id $           Reduce 6 (use GOTO[6, F])
0E1+6F3      * id $           Reduce 4 (use GOTO[6, T])
…            …                …

Mais conteúdo relacionado

Mais procurados

Mais procurados (20)

Lexical Analyzers and Parsers
Lexical Analyzers and ParsersLexical Analyzers and Parsers
Lexical Analyzers and Parsers
 
Lexical Analysis
Lexical AnalysisLexical Analysis
Lexical Analysis
 
Lexical Analysis - Compiler design
Lexical Analysis - Compiler design Lexical Analysis - Compiler design
Lexical Analysis - Compiler design
 
About Tokens and Lexemes
About Tokens and LexemesAbout Tokens and Lexemes
About Tokens and Lexemes
 
Token, Pattern and Lexeme
Token, Pattern and LexemeToken, Pattern and Lexeme
Token, Pattern and Lexeme
 
Lexical analysis - Compiler Design
Lexical analysis - Compiler DesignLexical analysis - Compiler Design
Lexical analysis - Compiler Design
 
Lexical analyzer
Lexical analyzerLexical analyzer
Lexical analyzer
 
Lexical analyzer
Lexical analyzerLexical analyzer
Lexical analyzer
 
Lexical analyzer
Lexical analyzerLexical analyzer
Lexical analyzer
 
4 lexical and syntax
4 lexical and syntax4 lexical and syntax
4 lexical and syntax
 
A simple approach of lexical analyzers
A simple approach of lexical analyzersA simple approach of lexical analyzers
A simple approach of lexical analyzers
 
Language for specifying lexical Analyzer
Language for specifying lexical AnalyzerLanguage for specifying lexical Analyzer
Language for specifying lexical Analyzer
 
Lex
LexLex
Lex
 
Lexical analysis-using-lex
Lexical analysis-using-lexLexical analysis-using-lex
Lexical analysis-using-lex
 
Structure of the compiler
Structure of the compilerStructure of the compiler
Structure of the compiler
 
Lexical
LexicalLexical
Lexical
 
Lecture 02 lexical analysis
Lecture 02 lexical analysisLecture 02 lexical analysis
Lecture 02 lexical analysis
 
3. Lexical analysis
3. Lexical analysis3. Lexical analysis
3. Lexical analysis
 
role of lexical anaysis
role of lexical anaysisrole of lexical anaysis
role of lexical anaysis
 
Cd ch2 - lexical analysis
Cd   ch2 - lexical analysisCd   ch2 - lexical analysis
Cd ch2 - lexical analysis
 

Destaque

Slide combine
Slide combineSlide combine
Slide combine
Bluecn
 
Compiler Design
Compiler DesignCompiler Design
Compiler Design
Mir Majid
 

Destaque (17)

Introduction
IntroductionIntroduction
Introduction
 
Introduction to course
Introduction to courseIntroduction to course
Introduction to course
 
Complier designer
Complier designerComplier designer
Complier designer
 
Slide combine
Slide combineSlide combine
Slide combine
 
Compiler Design Full Curse
Compiler Design Full CurseCompiler Design Full Curse
Compiler Design Full Curse
 
LR Parsing
LR ParsingLR Parsing
LR Parsing
 
Lexing and parsing
Lexing and parsingLexing and parsing
Lexing and parsing
 
Buffers
BuffersBuffers
Buffers
 
Compiler Design Basics
Compiler Design BasicsCompiler Design Basics
Compiler Design Basics
 
Module 11
Module 11Module 11
Module 11
 
Lexical analyzer
Lexical analyzerLexical analyzer
Lexical analyzer
 
Syntax analysis
Syntax analysisSyntax analysis
Syntax analysis
 
Compiler Design
Compiler DesignCompiler Design
Compiler Design
 
Analysis of the source program
Analysis of the source programAnalysis of the source program
Analysis of the source program
 
Input-Buffering
Input-BufferingInput-Buffering
Input-Buffering
 
Compiler Chapter 1
Compiler Chapter 1Compiler Chapter 1
Compiler Chapter 1
 
Lexical Approach
Lexical ApproachLexical Approach
Lexical Approach
 

Semelhante a 4 lexical and syntax analysis

match the following attributes to the parts of a compilerstrips ou.pdf
match the following attributes to the parts of a compilerstrips ou.pdfmatch the following attributes to the parts of a compilerstrips ou.pdf
match the following attributes to the parts of a compilerstrips ou.pdf
arpitaeron555
 
Chapter One
Chapter OneChapter One
Chapter One
bolovv
 

Semelhante a 4 lexical and syntax analysis (20)

Pcd question bank
Pcd question bank Pcd question bank
Pcd question bank
 
sabesta.ppt
sabesta.pptsabesta.ppt
sabesta.ppt
 
match the following attributes to the parts of a compilerstrips ou.pdf
match the following attributes to the parts of a compilerstrips ou.pdfmatch the following attributes to the parts of a compilerstrips ou.pdf
match the following attributes to the parts of a compilerstrips ou.pdf
 
Parsing
ParsingParsing
Parsing
 
Chapter-3 compiler.pptx course materials
Chapter-3 compiler.pptx course materialsChapter-3 compiler.pptx course materials
Chapter-3 compiler.pptx course materials
 
Compiler Design
Compiler DesignCompiler Design
Compiler Design
 
CH 2.pptx
CH 2.pptxCH 2.pptx
CH 2.pptx
 
Cs6660 compiler design may june 2016 Answer Key
Cs6660 compiler design may june 2016 Answer KeyCs6660 compiler design may june 2016 Answer Key
Cs6660 compiler design may june 2016 Answer Key
 
Compier Design_Unit I.ppt
Compier Design_Unit I.pptCompier Design_Unit I.ppt
Compier Design_Unit I.ppt
 
Compier Design_Unit I.ppt
Compier Design_Unit I.pptCompier Design_Unit I.ppt
Compier Design_Unit I.ppt
 
Control structure
Control structureControl structure
Control structure
 
Compilers Design
Compilers DesignCompilers Design
Compilers Design
 
lec00-Introduction.pdf
lec00-Introduction.pdflec00-Introduction.pdf
lec00-Introduction.pdf
 
Plc part 2
Plc  part 2Plc  part 2
Plc part 2
 
Parser
ParserParser
Parser
 
Syntax Analysis in Compiler Design
Syntax Analysis in Compiler Design Syntax Analysis in Compiler Design
Syntax Analysis in Compiler Design
 
Compiler design important questions
Compiler design   important questionsCompiler design   important questions
Compiler design important questions
 
Lecture 24Recursive decent parsing and back tracking.pptx
Lecture 24Recursive decent parsing and back tracking.pptxLecture 24Recursive decent parsing and back tracking.pptx
Lecture 24Recursive decent parsing and back tracking.pptx
 
LANGUAGE PROCESSOR
LANGUAGE PROCESSORLANGUAGE PROCESSOR
LANGUAGE PROCESSOR
 
Chapter One
Chapter OneChapter One
Chapter One
 

Mais de jigeno

Basic introduction to ms access
Basic introduction to ms accessBasic introduction to ms access
Basic introduction to ms access
jigeno
 
16 logical programming
16 logical programming16 logical programming
16 logical programming
jigeno
 
15 functional programming
15 functional programming15 functional programming
15 functional programming
jigeno
 
15 functional programming
15 functional programming15 functional programming
15 functional programming
jigeno
 
14 exception handling
14 exception handling14 exception handling
14 exception handling
jigeno
 
13 concurrency
13 concurrency13 concurrency
13 concurrency
jigeno
 
12 object oriented programming
12 object oriented programming12 object oriented programming
12 object oriented programming
jigeno
 
11 abstract data types
11 abstract data types11 abstract data types
11 abstract data types
jigeno
 
9 subprograms
9 subprograms9 subprograms
9 subprograms
jigeno
 
8 statement-level control structure
8 statement-level control structure8 statement-level control structure
8 statement-level control structure
jigeno
 
7 expressions and assignment statements
7 expressions and assignment statements7 expressions and assignment statements
7 expressions and assignment statements
jigeno
 
6 data types
6 data types6 data types
6 data types
jigeno
 
5 names
5 names5 names
5 names
jigeno
 
3 describing syntax and semantics
3 describing syntax and semantics3 describing syntax and semantics
3 describing syntax and semantics
jigeno
 
2 evolution of the major programming languages
2 evolution of the major programming languages2 evolution of the major programming languages
2 evolution of the major programming languages
jigeno
 
1 preliminaries
1 preliminaries1 preliminaries
1 preliminaries
jigeno
 
Access2007 m2
Access2007 m2Access2007 m2
Access2007 m2
jigeno
 
Access2007 m1
Access2007 m1Access2007 m1
Access2007 m1
jigeno
 

Mais de jigeno (20)

Access2007 part1
Access2007 part1Access2007 part1
Access2007 part1
 
Basic introduction to ms access
Basic introduction to ms accessBasic introduction to ms access
Basic introduction to ms access
 
Bsit1
Bsit1Bsit1
Bsit1
 
16 logical programming
16 logical programming16 logical programming
16 logical programming
 
15 functional programming
15 functional programming15 functional programming
15 functional programming
 
15 functional programming
15 functional programming15 functional programming
15 functional programming
 
14 exception handling
14 exception handling14 exception handling
14 exception handling
 
13 concurrency
13 concurrency13 concurrency
13 concurrency
 
12 object oriented programming
12 object oriented programming12 object oriented programming
12 object oriented programming
 
11 abstract data types
11 abstract data types11 abstract data types
11 abstract data types
 
9 subprograms
9 subprograms9 subprograms
9 subprograms
 
8 statement-level control structure
8 statement-level control structure8 statement-level control structure
8 statement-level control structure
 
7 expressions and assignment statements
7 expressions and assignment statements7 expressions and assignment statements
7 expressions and assignment statements
 
6 data types
6 data types6 data types
6 data types
 
5 names
5 names5 names
5 names
 
3 describing syntax and semantics
3 describing syntax and semantics3 describing syntax and semantics
3 describing syntax and semantics
 
2 evolution of the major programming languages
2 evolution of the major programming languages2 evolution of the major programming languages
2 evolution of the major programming languages
 
1 preliminaries
1 preliminaries1 preliminaries
1 preliminaries
 
Access2007 m2
Access2007 m2Access2007 m2
Access2007 m2
 
Access2007 m1
Access2007 m1Access2007 m1
Access2007 m1
 

Último

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Último (20)

[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 

4 lexical and syntax analysis

  • 1. ICS 313 - Fundamentals of Programming Languages 1 4. Lexical and Syntax Analysis 4.1 Introduction Language implementation systems must analyze source code, regardless of the specific implementation approach Nearly all syntax analysis is based on a formal description of the syntax of the source language (BNF) The syntax analysis portion of a language processor nearly always consists of two parts: A low-level part called a lexical analyzer (mathematically, a finite automaton based on a regular grammar) A high-level part called a syntax analyzer, or parser (mathematically, a push-down automaton based on a context-free grammar, or BNF) Reasons to use BNF to describe syntax: Provides a clear and concise syntax description The parser can be based directly on the BNF Parsers based on BNF are easy to maintain
  • 2. ICS 313 - Fundamentals of Programming Languages 2 4.1 Introduction (continued) Reasons to separate lexical and syntax analysis: Simplicity - less complex approaches can be used for lexical analysis; separating them simplifies the parser Efficiency - separation allows optimization of the lexical analyzer Portability - parts of the lexical analyzer may not be portable, but the parser always is portable 4.2 Lexical Analysis A lexical analyzer is a pattern matcher for character strings A lexical analyzer is a “front-end” for the parser Identifies substrings of the source program that belong together - lexemes Lexemes match a character pattern, which is associated with a lexical category called a token sum is a lexeme; its token may be IDENT
  • 3. ICS 313 - Fundamentals of Programming Languages 3 4.2 Lexical Analysis (continued) The lexical analyzer is usually a function that is called by the parser when it needs the next token Three approaches to building a lexical analyzer: 1. Write a formal description of the tokens and use a software tool that constructs table-driven lexical analyzers given such a description 2. Design a state diagram that describes the tokens and write a program that implements the state diagram 3. Design a state diagram that describes the tokens and hand- construct a table-driven implementation of the state diagram We only discuss approach 2 4.2 Lexical Analysis (continued) State diagram design: A naive state diagram would have a transition from every state on every character in the source language - such a diagram would be very large! In many cases, transitions can be combined to simplify the state diagram When recognizing an identifier, all uppercase and lowercase letters are equivalent - Use a character class that includes all letters When recognizing an integer literal, all digits are equivalent - use a digit class Reserved words and identifiers can be recognized together (rather than having a part of the diagram for each reserved word) Use a table lookup to determine whether a possible identifier is in fact a reserved word
  • 4. ICS 313 - Fundamentals of Programming Languages 4 4.2 Lexical Analysis (continued) Convenient utility subprograms: getChar - gets the next character of input, puts it in nextChar, determines its class and puts the class in charClass addChar - puts the character from nextChar into the place the lexeme is being accumulated, lexeme lookup - determines whether the string in lexeme is a reserved word (returns a code) 4.2 Lexical Analysis (continued) Implementation (assume initialization): int lex() { switch (charClass) { case LETTER: addChar(); getChar(); while (charClass == LETTER || charClass == DIGIT) { addChar(); getChar(); } return lookup(lexeme); break; case DIGIT: addChar(); getChar(); while (charClass == DIGIT) { addChar(); getChar(); } return INT_LIT; break; } /* End of switch */ } /* End of function lex */
  • 5. ICS 313 - Fundamentals of Programming Languages 5 4.3 The Parsing Problem Goals of the parser, given an input program: Find all syntax errors; For each, produce an appropriate diagnostic message, and recover quickly Produce the parse tree, or at least a trace of the parse tree, for the program Two categories of parsers Top down - produce the parse tree, beginning at the root Order is that of a leftmost derivation Bottom up - produce the parse tree, beginning at the leaves Order is the that of the reverse of a rightmost derivation Parsers look only one token ahead in the input Top-down Parsers Given a sentential form, xAα , the parser must choose the correct A-rule to get the next sentential form in the leftmost derivation, using only the first token produced by A The most common top-down parsing algorithms: Recursive descent - a coded implementation LL parsers - table driven implementation 4.3 The Parsing Problem (continued) Bottom-up parsers Given a right sentential form, α, what substring of α is the right- hand side of the rule in the grammar that must be reduced to produce the previous sentential form in the right derivation The most common bottom-up parsing algorithms are in the LR family (LALR, SLR, canonical LR) The Complexity of Parsing Parsers that works for any unambiguous grammar are complex and inefficient (O(n3), where n is the length of the input) Compilers use parsers that only work for a subset of all unambiguous grammars, but do it in linear time (O(n), where n is the length of the input)
ICS 313 - Fundamentals of Programming Languages 6
4.4 Recursive-Descent Parsing
Recursive Descent Process
There is a subprogram for each nonterminal in the grammar, which can parse sentences that can be generated by that nonterminal
EBNF is ideally suited for being the basis for a recursive-descent parser, because EBNF minimizes the number of nonterminals
A grammar for simple expressions:
<expr> → <term> {(+ | -) <term>}
<term> → <factor> {(* | /) <factor>}
<factor> → id | ( <expr> )
Assume we have a lexical analyzer named lex, which puts the next token code in nextToken
The coding process when there is only one RHS:
For each terminal symbol in the RHS, compare it with the next input token; if they match, continue, else there is an error
For each nonterminal symbol in the RHS, call its associated parsing subprogram
4.4 Recursive-Descent Parsing (continued)

/* Function expr
   Parses strings in the language generated by the rule:
   <expr> → <term> {(+ | -) <term>}
*/
void expr() {
  /* Parse the first term */
  term();
  /* As long as the next token is + or -, call lex to get the
     next token, and parse the next term */
  while (nextToken == PLUS_CODE || nextToken == MINUS_CODE) {
    lex();
    term();
  }
}

This particular routine does not detect errors
Convention: Every parsing routine leaves the next token in nextToken
ICS 313 - Fundamentals of Programming Languages 7
4.4 Recursive-Descent Parsing (continued)
A nonterminal that has more than one RHS requires an initial process to determine which RHS it is to parse
The correct RHS is chosen on the basis of the next token of input (the lookahead)
The next token is compared with the first token that can be generated by each RHS until a match is found
If no match is found, it is a syntax error
4.4 Recursive-Descent Parsing (continued)

/* Function factor
   Parses strings in the language generated by the rule:
   <factor> -> id | (<expr>)
*/
void factor() {
  /* Determine which RHS */
  if (nextToken == ID_CODE)
    /* For the RHS id, just call lex */
    lex();
  /* If the RHS is (<expr>) - call lex to pass over the left
     parenthesis, call expr, and check for the right parenthesis */
  else if (nextToken == LEFT_PAREN_CODE) {
    lex();
    expr();
    if (nextToken == RIGHT_PAREN_CODE)
      lex();
    else
      error();
  } /* End of else if (nextToken == ... */
  else
    error(); /* Neither RHS matches */
}
ICS 313 - Fundamentals of Programming Languages 8
4.4 Recursive-Descent Parsing (continued)
The LL Grammar Class
The Left Recursion Problem
If a grammar has left recursion, either direct or indirect, it cannot be the basis for a top-down parser
A grammar can be modified to remove left recursion
The other characteristic of grammars that disallows top-down parsing is the lack of pairwise disjointness
The inability to determine the correct RHS on the basis of one token of lookahead
Def: FIRST(α) = {a | α =>* aβ } (If α =>* ε, ε is in FIRST(α))
Pairwise Disjointness Test:
For each nonterminal, A, in the grammar that has more than one RHS, for each pair of rules, A → αi and A → αj, it must be true that FIRST(αi) ∩ FIRST(αj) = φ
Examples:
A → a | bB | cAb (passes the test)
A → a | aB (fails the test: FIRST(a) ∩ FIRST(aB) = {a})
4.4 Recursive-Descent Parsing (continued)
Left factoring can resolve the problem
Replace
<variable> → identifier | identifier [<expression>]
with
<variable> → identifier <new>
<new> → ε | [<expression>]
or
<variable> → identifier [[<expression>]]
(the outer brackets are metasymbols of EBNF)
ICS 313 - Fundamentals of Programming Languages 9
4.5 Bottom-up Parsing
The parsing problem is finding the correct RHS in a right-sentential form to reduce to get the previous right-sentential form in the derivation
Intuition about handles:
Def: β is the handle of the right sentential form γ = αβw if and only if S =>*rm αAw =>rm αβw
Def: β is a phrase of the right sentential form γ if and only if S =>* γ = α1Aα2 =>+ α1βα2
Def: β is a simple phrase of the right sentential form γ if and only if S =>* γ = α1Aα2 => α1βα2
The handle of a right sentential form is its leftmost simple phrase
Given a parse tree, it is now easy to find the handle
Parsing can be thought of as handle pruning
4.5 Bottom-up Parsing (continued)
Shift-Reduce Algorithms
Reduce is the action of replacing the handle on the top of the parse stack with its corresponding LHS
Shift is the action of moving the next token to the top of the parse stack
Advantages of LR parsers:
They will work for nearly all grammars that describe programming languages
They work on a larger class of grammars than other bottom-up algorithms, but are as efficient as any other bottom-up parser
They can detect syntax errors as soon as it is possible
The LR class of grammars is a superset of the class parsable by LL parsers
ICS 313 - Fundamentals of Programming Languages 10
4.5 Bottom-up Parsing (continued)
LR parsers must be constructed with a tool
Knuth's insight: a bottom-up parser could use the entire history of the parse, up to the current point, to make parsing decisions
There were only a finite and relatively small number of different parse situations that could have occurred, so the history could be stored in a parser state, on the parse stack
An LR configuration stores the state of an LR parser:
(S0X1S1X2S2…XmSm, aiai+1…an$)
4.5 Bottom-up Parsing (continued)
LR parsers are table driven, where the table has two components, an ACTION table and a GOTO table
The ACTION table specifies the action of the parser, given the parser state and the next token
Rows are state names; columns are terminals
The GOTO table specifies which state to put on top of the parse stack after a reduction action is done
Rows are state names; columns are nonterminals
ICS 313 - Fundamentals of Programming Languages 11
4.5 Bottom-up Parsing (continued)
Initial configuration: (S0, a1…an$)
Parser actions:
If ACTION[Sm, ai] = Shift S, the next configuration is:
(S0X1S1X2S2…XmSmaiS, ai+1…an$)
If ACTION[Sm, ai] = Reduce A → β and S = GOTO[Sm-r, A], where r = the length of β, the next configuration is:
(S0X1S1X2S2…Xm-rSm-rAS, aiai+1…an$)
If ACTION[Sm, ai] = Accept, the parse is complete and no errors were found
If ACTION[Sm, ai] = Error, the parser calls an error-handling routine
A parser table can be generated from a given grammar with a tool, e.g., yacc
4.5 Bottom-up Parsing (continued)
Grammar:
1. E → E + T
2. E → T
3. T → T * F
4. T → F
5. F → (E)
6. F → id
Trace of a parse of id + id * id:

Stack       Input           Action
0           id + id * id $  Shift 5
0id5        + id * id $     Reduce 6 (use GOTO[0, F])
0F3         + id * id $     Reduce 4 (use GOTO[0, T])
0T2         + id * id $     Reduce 2 (use GOTO[0, E])
0E1         + id * id $     Shift 6
0E1+6       id * id $       Shift 5
0E1+6id5    * id $          Reduce 6 (use GOTO[6, F])
0E1+6F3     * id $          Reduce 4 (use GOTO[6, T])
…           …               …