SlideShare uma empresa Scribd logo
1 de 18
Baixar para ler offline
Context Free Grammar
1
Course Name: Compiler Design
Course Code: CSE331
Level:3, Term:3
Department of Computer Science and Engineering
Daffodil International University
Syntax Analysis
Parsing or syntactic analysis is the process of analyzing a string of symbols,
either in natural language or in computer languages, conforming to the
rules of a formal grammar.
The second phase of compiler and the sometimes also being called as
Hierarchical Analysis.
2
Why Syntax Analysis?
• We have seen that a lexical analyzer can identify tokens with the help of
regular expressions and pattern rules.
• But a lexical analyzer cannot check the syntax of a given sentence due to the
limitations of the regular expressions.
• Regular expressions cannot check balancing tokens, such as parenthesis.
• Therefore, this phase uses context-free grammar (CFG), which is recognized by
push-down automata.
3
Syntax Analysis
4
Detail Diagram
5
• A syntax analyzer or parser takes the input from a lexical analyzer in the
form of token streams.
• The parser analyzes the source code (token stream) against the
production rules to detect any errors in the code. The output of this
phase is a parse tree.
• This way, the parser accomplishes two tasks, i.e., parsing the code,
looking for errors and generating a parse tree as the output of the
phase.
6
CFG
• CFG, on the other hand, is a superset of Regular Grammar, as depicted below:
A context-free grammar (CFG) consisting of a finite set of grammar
rules is a quadruple (N, T, P, S)
where,
• N is a set of non-terminal symbols.
• T is a set of terminals where N ∩ T = NULL.
• P is a set of rules, P: N → (N ∪ T)*, i.e., the left-hand side of the
production rule P does have any right context or left context.
• S is the start symbol.
7
• A context-free grammar has four components: G = ( V, Σ, P, S )
A set of non-terminals (V). Non-terminals are syntactic variables that denote sets of strings.
The non-terminals define sets of strings that help define the language generated by the
grammar.
A set of tokens, known as terminal symbols (Σ). Terminals are the basic symbols from which
strings are formed.
A set of productions (P). The productions of a grammar specify the manner in which the
terminals and non-terminals can be combined to form strings. Each production consists of
a non-terminal called the left side of the production, an arrow, and a sequence of tokens
and/or on- terminals, called the right side of the production.
One of the non-terminals is designated as the start symbol (S); from where the production
begins.
CFG: Context Free Grammar
8
Example of CFG:
• G = ( V, Σ, P, S ) Where:
• V = { Q, Z, N }
• Σ = { 0, 1 }
• P = { Q → Z | Q → N | Q → ℇ | Z → 0Q0 |N → 1Q1 }
• S = { Q }
This grammar describes palindrome language,
such as: 1001, 11100111, 00100, 1010101, 11111, etc.
9
The Role of the Lexical Analyzer
• Roles:
• Primary role: Scan a source program (a string) and break it up into small, meaningful units,
called tokens.
• Example: position := initial + rate * 60;
• Transform into meaningful units: identifiers, constants, operators, and punctuation.
• Other roles:
• Removal of comments
• Case conversion
• Removal of white spaces
• Interpretation of compiler directives or pragmas.
• Communication with symbol table: Store information regarding an identifier in the symbol
table. Not advisable in cases where scopes can be nested.
• Preparation of output listing: Keep track of source program, line numbers, and
correspondences between error messages and line numbers.
• Why Lexical Analyzer is separate from parser?
• Simpler design of both LA and parser.
• More efficient & portable compiler.
10
Tokens
A lexical token is a sequence of characters that can be treated as a unit in the grammar of the
programming languages.
Example of tokens:
Type token (id, number, real, . . . )
Punctuation tokens (IF, void, return, . . . )
Alphabetic tokens (keywords)
Keywords; Examples-for, while, if etc.
Identifier; Examples-Variable name, function name etc.
Operators; Examples '+', '++', '-' etc.
Separators; Examples ',' ';' etc.
Example of Non-Tokens:
Comments, preprocessor directive, macros, blanks, tabs, newline etc.
Lexeme: The sequence of characters matched by a pattern to form the corresponding token or a
sequence of input characters that comprises a single token is called a lexeme.
eg- “float”, “abs_zero_Kelvin”, “=”, “-”, “273”, “;” .
11
Tokens
Operators = + − > ( { := == <>
Keywords if while for int double
Numeric literals 43 6.035 -3.6e10 0x13F3A
Character literals ‘a’ ‘~’ ‘’’
String literals “3.142” “aBcDe” “”
White space space(‘ ’) tab(‘t’) newline(‘n’)
Comments /*this is not a token*/
• Examples of non-tokens
• Examples of Tokens
12
• Type of tokens in C++:
• Constants:
• char constants: ‘a’
• string constants: “i=%d”
• int constants: 50
• float point constants
• Identifiers: i, j, counter, ……
• Reserved words: main, int, for, …
• Operators: +, =, ++, /, …
• Misc. symbols: (, ), {, }, …
• Tokens are specified by regular expressions.
main() {
int i, j;
for (i=0; i<50; i++) {
printf(“i = %d”, i);
}
}
13
Lexical Analysis vs Parsing
• There are a number of reasons why the analysis portion of a compiler is normally separated into
lexical analysis and parsing (syntax analysis) phases.
• Simplicity of design is the most important consideration.
The separation of lexical and syntactic analysis often allows us to simplify at least one of these
tasks. For example, a parser that had to deal with comments and whitespace as syntactic units
would be considerably more complex than one that can assume comments and whitespace have
already been removed by the lexical analyzer.
• Compiler efficiency is improved.
a separate lexical analyzer allows us to apply specialized techniques that serve only the lexical
task, not the job of parsing. In addition, specialized buffering techniques for reading input
characters can speed up the compiler significantly.
• Compiler portability is enhanced.
Input-device-specific peculiarities can be restricted to the lexical analyzer
14
More about Tokens, Patterns and Lexemes
• Token: a certain classification of entities of a program.
• four kinds of tokens in previous example: identifiers, operators, constraints, and punctuation.
• Lexeme: A specific instance of a token. Used to differentiate tokens. For instance, both position
and initial belong to the identifier class, however each a different lexeme.
• Lexical analyzer may return a token type to the Parser, but must also keep track of
“attributes” that distinguish one lexeme from another.
• Examples of attributes: Identifiers: string, Numbers: value
• Attributes are used during semantic checking and code generation. They are not needed
during parsing.
• Patterns: Rule describing how tokens are specified in a program. Needed because a language can
contain infinite possible strings. They all cannot be enumerated (calculated/specified)
• Formal mechanisms used to represent these patterns. Formalism helps in describing precisely
(i) which strings belong to the language, and (ii) which do not.
• Also, form basis for developing tools that can automatically determine if a string belongs to a
language.
15
Lexical errors
fi (a==f(x)) - fi is misspelled or keyword? Or undeclared function identifier?
• If fi is a valid lexeme for the token id, the lexical analyzer must return the token
id to the parser and let some other phase of the compiler - handle the error
How?
i. Delete one character from the remaining input.
ii. Insert a missing character into the remaining input.
iii. Replace a character by another character.
iv. Transpose two adjacent characters.
16
Lexical Errors and Recovery
• Panic mode error recovery
• Deleting an extraneous character
• Inserting a missing character
• Replacing an incorrect character by another
• Transposing two adjacent character
17
THANK YOU
18

Mais conteúdo relacionado

Semelhante a 3a. Context Free Grammar.pdf

New compiler design 101 April 13 2024.pdf
New compiler design 101 April 13 2024.pdfNew compiler design 101 April 13 2024.pdf
New compiler design 101 April 13 2024.pdfeliasabdi2024
 
role of lexical anaysis
role of lexical anaysisrole of lexical anaysis
role of lexical anaysisSudhaa Ravi
 
Structure of the compiler
Structure of the compilerStructure of the compiler
Structure of the compilerSudhaa Ravi
 
Compiler Construction
Compiler ConstructionCompiler Construction
Compiler ConstructionSarmad Ali
 
match the following attributes to the parts of a compilerstrips ou.pdf
match the following attributes to the parts of a compilerstrips ou.pdfmatch the following attributes to the parts of a compilerstrips ou.pdf
match the following attributes to the parts of a compilerstrips ou.pdfarpitaeron555
 
Unitiv 111206005201-phpapp01
Unitiv 111206005201-phpapp01Unitiv 111206005201-phpapp01
Unitiv 111206005201-phpapp01riddhi viradiya
 
Lecture 04 syntax analysis
Lecture 04 syntax analysisLecture 04 syntax analysis
Lecture 04 syntax analysisIffat Anjum
 
Syntax Analysis in Compiler Design
Syntax Analysis in Compiler Design Syntax Analysis in Compiler Design
Syntax Analysis in Compiler Design MAHASREEM
 
4 compiler lab - Syntax Ana
4 compiler lab - Syntax Ana4 compiler lab - Syntax Ana
4 compiler lab - Syntax AnaMashaelQ
 
COMPILER CONSTRUCTION KU 1.pptx
COMPILER CONSTRUCTION KU 1.pptxCOMPILER CONSTRUCTION KU 1.pptx
COMPILER CONSTRUCTION KU 1.pptxRossy719186
 
automata theroy and compiler designc.pptx
automata theroy and compiler designc.pptxautomata theroy and compiler designc.pptx
automata theroy and compiler designc.pptxYashaswiniYashu9555
 
Principles of Compiler Design
Principles of Compiler DesignPrinciples of Compiler Design
Principles of Compiler DesignMarimuthu M
 

Semelhante a 3a. Context Free Grammar.pdf (20)

New compiler design 101 April 13 2024.pdf
New compiler design 101 April 13 2024.pdfNew compiler design 101 April 13 2024.pdf
New compiler design 101 April 13 2024.pdf
 
Lexical Analysis.pdf
Lexical Analysis.pdfLexical Analysis.pdf
Lexical Analysis.pdf
 
role of lexical anaysis
role of lexical anaysisrole of lexical anaysis
role of lexical anaysis
 
Chap 1 and 2
Chap 1 and 2Chap 1 and 2
Chap 1 and 2
 
Plc part 2
Plc  part 2Plc  part 2
Plc part 2
 
Structure of the compiler
Structure of the compilerStructure of the compiler
Structure of the compiler
 
Compiler Construction
Compiler ConstructionCompiler Construction
Compiler Construction
 
11700220036.pdf
11700220036.pdf11700220036.pdf
11700220036.pdf
 
Compiler Design
Compiler DesignCompiler Design
Compiler Design
 
match the following attributes to the parts of a compilerstrips ou.pdf
match the following attributes to the parts of a compilerstrips ou.pdfmatch the following attributes to the parts of a compilerstrips ou.pdf
match the following attributes to the parts of a compilerstrips ou.pdf
 
Unitiv 111206005201-phpapp01
Unitiv 111206005201-phpapp01Unitiv 111206005201-phpapp01
Unitiv 111206005201-phpapp01
 
1._Introduction_.pptx
1._Introduction_.pptx1._Introduction_.pptx
1._Introduction_.pptx
 
Syntax
SyntaxSyntax
Syntax
 
Lecture 04 syntax analysis
Lecture 04 syntax analysisLecture 04 syntax analysis
Lecture 04 syntax analysis
 
Syntax Analysis in Compiler Design
Syntax Analysis in Compiler Design Syntax Analysis in Compiler Design
Syntax Analysis in Compiler Design
 
4 compiler lab - Syntax Ana
4 compiler lab - Syntax Ana4 compiler lab - Syntax Ana
4 compiler lab - Syntax Ana
 
Syntax analysis
Syntax analysisSyntax analysis
Syntax analysis
 
COMPILER CONSTRUCTION KU 1.pptx
COMPILER CONSTRUCTION KU 1.pptxCOMPILER CONSTRUCTION KU 1.pptx
COMPILER CONSTRUCTION KU 1.pptx
 
automata theroy and compiler designc.pptx
automata theroy and compiler designc.pptxautomata theroy and compiler designc.pptx
automata theroy and compiler designc.pptx
 
Principles of Compiler Design
Principles of Compiler DesignPrinciples of Compiler Design
Principles of Compiler Design
 

Último

Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfchloefrazer622
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...anjaliyadav012327
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room servicediscovermytutordmt
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...Sapna Thakur
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 

Último (20)

Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 

3a. Context Free Grammar.pdf

  • 1. Context Free Grammar 1 Course Name: Compiler Design Course Code: CSE331 Level:3, Term:3 Department of Computer Science and Engineering Daffodil International University
  • 2. Syntax Analysis Parsing or syntactic analysis is the process of analyzing a string of symbols, either in natural language or in computer languages, conforming to the rules of a formal grammar. The second phase of compiler and the sometimes also being called as Hierarchical Analysis. 2
  • 3. Why Syntax Analysis? • We have seen that a lexical analyzer can identify tokens with the help of regular expressions and pattern rules. • But a lexical analyzer cannot check the syntax of a given sentence due to the limitations of the regular expressions. • Regular expressions cannot check balancing tokens, such as parenthesis. • Therefore, this phase uses context-free grammar (CFG), which is recognized by push-down automata. 3
  • 6. • A syntax analyzer or parser takes the input from a lexical analyzer in the form of token streams. • The parser analyzes the source code (token stream) against the production rules to detect any errors in the code. The output of this phase is a parse tree. • This way, the parser accomplishes two tasks, i.e., parsing the code, looking for errors and generating a parse tree as the output of the phase. 6
  • 7. CFG • CFG, on the other hand, is a superset of Regular Grammar, as depicted below: A context-free grammar (CFG) consisting of a finite set of grammar rules is a quadruple (N, T, P, S) where, • N is a set of non-terminal symbols. • T is a set of terminals where N ∩ T = NULL. • P is a set of rules, P: N → (N ∪ T)*, i.e., the left-hand side of the production rule P does have any right context or left context. • S is the start symbol. 7
  • 8. • A context-free grammar has four components: G = ( V, Σ, P, S ) A set of non-terminals (V). Non-terminals are syntactic variables that denote sets of strings. The non-terminals define sets of strings that help define the language generated by the grammar. A set of tokens, known as terminal symbols (Σ). Terminals are the basic symbols from which strings are formed. A set of productions (P). The productions of a grammar specify the manner in which the terminals and non-terminals can be combined to form strings. Each production consists of a non-terminal called the left side of the production, an arrow, and a sequence of tokens and/or on- terminals, called the right side of the production. One of the non-terminals is designated as the start symbol (S); from where the production begins. CFG: Context Free Grammar 8
  • 9. Example of CFG: • G = ( V, Σ, P, S ) Where: • V = { Q, Z, N } • Σ = { 0, 1 } • P = { Q → Z | Q → N | Q → ℇ | Z → 0Q0 |N → 1Q1 } • S = { Q } This grammar describes palindrome language, such as: 1001, 11100111, 00100, 1010101, 11111, etc. 9
  • 10. The Role of the Lexical Analyzer • Roles: • Primary role: Scan a source program (a string) and break it up into small, meaningful units, called tokens. • Example: position := initial + rate * 60; • Transform into meaningful units: identifiers, constants, operators, and punctuation. • Other roles: • Removal of comments • Case conversion • Removal of white spaces • Interpretation of compiler directives or pragmas. • Communication with symbol table: Store information regarding an identifier in the symbol table. Not advisable in cases where scopes can be nested. • Preparation of output listing: Keep track of source program, line numbers, and correspondences between error messages and line numbers. • Why Lexical Analyzer is separate from parser? • Simpler design of both LA and parser. • More efficient & portable compiler. 10
  • 11. Tokens A lexical token is a sequence of characters that can be treated as a unit in the grammar of the programming languages. Example of tokens: Type token (id, number, real, . . . ) Punctuation tokens (IF, void, return, . . . ) Alphabetic tokens (keywords) Keywords; Examples-for, while, if etc. Identifier; Examples-Variable name, function name etc. Operators; Examples '+', '++', '-' etc. Separators; Examples ',' ';' etc. Example of Non-Tokens: Comments, preprocessor directive, macros, blanks, tabs, newline etc. Lexeme: The sequence of characters matched by a pattern to form the corresponding token or a sequence of input characters that comprises a single token is called a lexeme. eg- “float”, “abs_zero_Kelvin”, “=”, “-”, “273”, “;” . 11
  • 12. Tokens Operators = + − > ( { := == <> Keywords if while for int double Numeric literals 43 6.035 -3.6e10 0x13F3A Character literals ‘a’ ‘~’ ‘’’ String literals “3.142” “aBcDe” “” White space space(‘ ’) tab(‘t’) newline(‘n’) Comments /*this is not a token*/ • Examples of non-tokens • Examples of Tokens 12
  • 13. • Type of tokens in C++: • Constants: • char constants: ‘a’ • string constants: “i=%d” • int constants: 50 • float point constants • Identifiers: i, j, counter, …… • Reserved words: main, int, for, … • Operators: +, =, ++, /, … • Misc. symbols: (, ), {, }, … • Tokens are specified by regular expressions. main() { int i, j; for (i=0; i<50; i++) { printf(“i = %d”, i); } } 13
  • 14. Lexical Analysis vs Parsing • There are a number of reasons why the analysis portion of a compiler is normally separated into lexical analysis and parsing (syntax analysis) phases. • Simplicity of design is the most important consideration. The separation of lexical and syntactic analysis often allows us to simplify at least one of these tasks. For example, a parser that had to deal with comments and whitespace as syntactic units would be considerably more complex than one that can assume comments and whitespace have already been removed by the lexical analyzer. • Compiler efficiency is improved. a separate lexical analyzer allows us to apply specialized techniques that serve only the lexical task, not the job of parsing. In addition, specialized buffering techniques for reading input characters can speed up the compiler significantly. • Compiler portability is enhanced. Input-device-specific peculiarities can be restricted to the lexical analyzer 14
  • 15. More about Tokens, Patterns and Lexemes • Token: a certain classification of entities of a program. • four kinds of tokens in previous example: identifiers, operators, constraints, and punctuation. • Lexeme: A specific instance of a token. Used to differentiate tokens. For instance, both position and initial belong to the identifier class, however each a different lexeme. • Lexical analyzer may return a token type to the Parser, but must also keep track of “attributes” that distinguish one lexeme from another. • Examples of attributes: Identifiers: string, Numbers: value • Attributes are used during semantic checking and code generation. They are not needed during parsing. • Patterns: Rule describing how tokens are specified in a program. Needed because a language can contain infinite possible strings. They all cannot be enumerated (calculated/specified) • Formal mechanisms used to represent these patterns. Formalism helps in describing precisely (i) which strings belong to the language, and (ii) which do not. • Also, form basis for developing tools that can automatically determine if a string belongs to a language. 15
  • 16. Lexical errors fi (a==f(x)) - fi is misspelled or keyword? Or undeclared function identifier? • If fi is a valid lexeme for the token id, the lexical analyzer must return the token id to the parser and let some other phase of the compiler - handle the error How? i. Delete one character from the remaining input. ii. Insert a missing character into the remaining input. iii. Replace a character by another character. iv. Transpose two adjacent characters. 16
  • 17. Lexical Errors and Recovery • Panic mode error recovery • Deleting an extraneous character • Inserting a missing character • Replacing an incorrect character by another • Transposing two adjacent character 17