SlideShare uma empresa Scribd logo
1 de 40
Baixar para ler offline
Chapter 3: Lexical Analysis 
Principles of Programming Languages
Contents 
•Terminology 
•Chomsky Hierarchy 
•Lexical analysis in syntax analysis 
•Using Finite Automata to describe tokens 
•Using Regular Expression to describe tokens 
•RegexLibrary in Scala
Introduction 
•Syntax:the form or structure of the expressions, statements, and program units 
•Semantics:the meaning of the expressions, statements, and program units 
•Syntax and semantics provide a language’s definition 
–Users of a language definition 
•Other language designers 
•Implementers 
•Programmers (the users of the language)
•A sentence is a string of characters over some alphabet 
•A languageis a set of sentences 
Terminology
Terminology 
•Sentences: a = b + c;or c = (a + b) * c; 
•Syntax: 
<assign> →<id> = <expr> ; 
<id> →a | b | c 
<expr> →<id> + <expr> 
| <id> * <expr> 
| ( <expr> ) 
| <id> 
•Sematicsof a = b + c;
Formal Definition of Languages 
•Recognizers 
–A recognition device reads input strings of the language and decides whether the input strings belong to the language 
–Example: syntax analysis part of a compiler 
•Generators 
–A device that generates sentences of a language 
–One can determine if the syntax of a particular sentence is correct by comparing it to the structure of the generator
Recognizers vs. Generators 
LANGUAGELanguage Generator 
Language Recognizer 
Grammar 
(regular expression) 
Automaton
Chomsky Hierarchy 
Grammars 
Languages 
Automaton 
Restrictions 
(w1w2) 
Type-0 
Phrase-structure 
Turing machine 
w1= any string with at least 1 non-terminal 
w2 = any string 
Type-1 
Context-sensitive 
Bounded Turing machine 
w1= any string with at least 1 non-terminal 
w2 = any string at least as long as w1 
Type-2 
Context-free 
Non-deterministic pushdown automaton 
w1= one non-terminal 
w2 = any string 
Type-3 
Regular 
Finitestate automaton 
w1= one non-terminal 
w2 = tAor t 
(t = terminal 
A = non-terminal)
Syntax Analysis 
•The syntax analysis portion of a language processor nearly always consists of two parts: 
–A low-level part called a lexical analyzer(mathematically, a finite automaton based on a regular grammar) 
–A high-level part called a syntax analyzer, or parser (mathematically, a push-down automaton based on a context-free grammar, or BNF)
•Simplicity-less complex approaches can be used for lexical analysis; separating them simplifies the parser 
•Efficiency-separation allows optimization of the lexical analyzer 
•Portability-parts of the lexical analyzer may not be portable, but the parser always is portable 
Reasons to Separate Lexical and Syntax Analysis
Lexical Analysis 
•A lexical analyzer is a pattern matcher for character strings 
•A lexical analyzer is a “front-end” for the parser 
•Identifies substrings of the source program that belong together –lexemes 
–Lexemes match a character pattern, which is associated with a lexical category called a token
Lexeme vs. Token 
result = oldsum–value / 100; 
Lexemes 
Tokens 
result 
IDENT 
= 
ASSIGN_OP 
oldsum 
IDENT 
– 
SUBSTRACT_OP 
value 
IDENT 
/ 
DIVISION_OP 
100 
INT_LIT 
; 
SEMICOLON
Lexical Analysis 
•The lexical analyzer is usually a function that is called by the parser when it needs the next token 
•Three approaches to building a lexical analyzer: 
–Design a state diagram that describes the tokens and write a program that implements the state diagram 
–Design a state diagram that describes the tokens and hand-construct a table-driven implementation of the state diagram 
–Write a formal description of the tokens and use a software tool that constructs table-driven lexical analyzers given such a description
Deterministic Finite Automata
DFA 
•DFA is a 5-tuple 
•= a finite set of states 
•= alphabet 
•is the initial state 
•is the set of final states 
•transition function, a function from to 
(,,,,)MKsF  KsK FK KK
DFA 
• E.g., 
• Test with the input aabba 
M  (K,, , s,F) 
0 1 K {q ,q }  {a,b} 0 s  q 0 F {q } 
 
q σ δ(q, σ) 
q_0 a q_0 
q_0 b q_1 
q_1 a q_1 
q_1 b q_0 
What is the language accepted by M, a.k.a. L(M)?
DFA 
•Test with the input aabba 
•Or we can say 
•So, aabbais accepted by M 
000100(,)(,)(,)(,)(,)(,)qaabbaqabbaqbbaqbaqaqe * 00(,)(,)qaabbaqe
State Diagram 
0 q 1 q 
a 
a 
b 
b 
q σ δ(q, σ) 
q_0 a q_0 
q_0 b q_1 
q_1 a q_1 
q_1 b q_0
Example 
•Design a DFA Mthat accepts the language L(M) = {w: w ϵ{a,b}*and wdoes not contain three consecutive b’s}.
Nondeterministic Finite Automata 
•Permit several possible “next states” for a given combination of current state and input symbol 
•Accept the empty string ein state diagram 
•Help simplifying the description of automata 
•Every NFA is equivalent to a DFA
Example 
• Language L = (ab U aba)* 
0 q 1 q 
a 
b 
2 q 
a b
Example 
• Language L = (ab U aba)* 
0 q 
1 q 
a 
e 
2 q 
b 
a
Example 
• Language L = (ab U aba)* 
0 q 
ab aba
Example 
•Design a NFA that accepts the following definition for IDENT 
–Starts with a letter 
–Has any number of letter or digit or “_” afterwards
Regular Expression (regex) 
•Describe “regular” sets of strings 
•Symbols other than ( ) | * stand for themselves 
•Concatenationα β=First part matches α, second part β 
•Unionα | β = Match α or β 
•Kleenestarα* = 0 or more matches of α 
•Use ( ) for grouping
Regular Expression (regex) 
E(0|1|2|3|4|5|6|7|8|9)* 
•An E followed by a (possibly empty) sequence of digits 
E123 
E9 
E
Regular Expression (regex) b 
a 
ab 
ab 
a | ba 
a*
Convenience Notation 
•α+= one or more (i.e. αα*) 
•α?= 0 or 1 (i.e. (α|e)) 
•[xyz]= x|y|z 
•[x-y] = all characters from xto y, e.g. [0- 9] = all ASCII digits 
•[^x-y] = all characters other than [x-y]
Convenience Notation 
•p{Name}, where Nameis a Unicode category (ex. L, N, Z for letter, number, space) 
•P{Name}: complement of p{Name} 
•.matches any character 
•is an escape. For example, .is a period, a backslash
RegexExamples 
•Reserved words: easy 
WHILE = while BEGIN = begin 
DO = do END = end 
•Integers:[+-]?[0-9]+, or maybe [+- ]?p{N}+ 
•Note: +loses its normal meaning inside [], and a -just before ]denotes itself
RegexExamples 
•Hexadecimal numbers 0[Xx][0-9A-Fa-f]+ 
•Quoted C++ strings: ".*" 
•Well, actually not; the . will match a quote 
•Better: "[^"]*" 
•Well, actually not; you can have a "in a quoted string 
•"([^"]|.)*"
Exercises 
•IDENT 
–Starts with a letter 
–Has any number of letter or digit or “_” afterwards 
•C++ floating-point literals 
–See http://msdn.microsoft.com/en- us/library/tfh6f0w2.aspx
ScalaRegexLibrary 
•Find all matches 
import scala.util.matching._ 
valregex= new Regex("[0-9]+") 
regex.findAllIn("99 bottles, 98 bottles").toList 
List[String] = List(99, 98) 
Check whether beginning matches 
regex.findPrefixOf("99 bottles, 98 bottles").getOrElse(null) 
String = 99
ScalaRegexLibrary 
•Groups 
valregex= new Regex("([0-9]+) bottles") 
valmatches = regex.findAllIn("99 bottles, 98 bottles, 97 cans").matchData.toList 
matches : List[scala.util.matching.Regex.Match] = List(99 bottles, 98 bottles) 
matches.map(_.group(1)) 
List[String] = List(99, 98)
Exercises 
•Find NFA and regexfor anbm: n+mis even 
•Find NFA and regexfor anbm: n 1, m 1
Remind 
•Design a state diagram that describes the tokens and write a program that implements the state diagram 
•Design a state diagram that describes the tokens and hand- construct a table-driven implementation of the state diagram 
•Write a formal description of the tokens and use a software tool that constructs table-driven lexical analyzers given such a description
What do lexical analyzers do? 
•Lexical analyzers extract lexemes from a given input string and produce the corresponding tokens 
•Old compilers: processed an entire source program 
•New compilers: locate the next lexeme with token code, then return to syntax analyzer
What else? 
•Skip comments and blanks outside lexemes 
•Insert user-defined lexemes into the symbol table 
•Detect syntactic errors in tokens 
–e.g. ill-formed floating-point literals
In the next lecture 
•How can we describe grammar? 
•What do syntax analyzers do after receiving lexemes from lexical analyzers? 
•Build grammar for some parts of your popular programming languages
Summary 
•Syntax analysis is a common part of language implementation 
•A lexical analyzer is a pattern matcher that isolates small-scale parts of a program 
•Regular expressions are built based on Finite Automata

Mais conteúdo relacionado

Mais procurados

Mais procurados (20)

Compiler design and lexical analyser
Compiler design and lexical analyserCompiler design and lexical analyser
Compiler design and lexical analyser
 
Lexical analysis - Compiler Design
Lexical analysis - Compiler DesignLexical analysis - Compiler Design
Lexical analysis - Compiler Design
 
Lexical analyzer
Lexical analyzerLexical analyzer
Lexical analyzer
 
Compier Design_Unit I_SRM.ppt
Compier Design_Unit I_SRM.pptCompier Design_Unit I_SRM.ppt
Compier Design_Unit I_SRM.ppt
 
Lecture3 lexical analysis
Lecture3 lexical analysisLecture3 lexical analysis
Lecture3 lexical analysis
 
1.Role lexical Analyzer
1.Role lexical Analyzer1.Role lexical Analyzer
1.Role lexical Analyzer
 
Syntax directed translation
Syntax directed translationSyntax directed translation
Syntax directed translation
 
Lecture 13 intermediate code generation 2.pptx
Lecture 13 intermediate code generation 2.pptxLecture 13 intermediate code generation 2.pptx
Lecture 13 intermediate code generation 2.pptx
 
A Role of Lexical Analyzer
A Role of Lexical AnalyzerA Role of Lexical Analyzer
A Role of Lexical Analyzer
 
Lexical analysis-using-lex
Lexical analysis-using-lexLexical analysis-using-lex
Lexical analysis-using-lex
 
Lexical analyzer generator lex
Lexical analyzer generator lexLexical analyzer generator lex
Lexical analyzer generator lex
 
Lecture 04 syntax analysis
Lecture 04 syntax analysisLecture 04 syntax analysis
Lecture 04 syntax analysis
 
Implementation of lexical analyser
Implementation of lexical analyserImplementation of lexical analyser
Implementation of lexical analyser
 
Lexical Analysis
Lexical AnalysisLexical Analysis
Lexical Analysis
 
Role-of-lexical-analysis
Role-of-lexical-analysisRole-of-lexical-analysis
Role-of-lexical-analysis
 
Chapter Three(1)
Chapter Three(1)Chapter Three(1)
Chapter Three(1)
 
Compiler lec 8
Compiler lec 8Compiler lec 8
Compiler lec 8
 
Syntax analysis
Syntax analysisSyntax analysis
Syntax analysis
 
Lexical Analysis
Lexical AnalysisLexical Analysis
Lexical Analysis
 
4 lexical and syntax analysis
4 lexical and syntax analysis4 lexical and syntax analysis
4 lexical and syntax analysis
 

Semelhante a Lexical

Semelhante a Lexical (20)

Lexicalanalyzer
LexicalanalyzerLexicalanalyzer
Lexicalanalyzer
 
Lexicalanalyzer
LexicalanalyzerLexicalanalyzer
Lexicalanalyzer
 
Lexical analysis - Compiler Design
Lexical analysis - Compiler DesignLexical analysis - Compiler Design
Lexical analysis - Compiler Design
 
Module4 lex and yacc.ppt
Module4 lex and yacc.pptModule4 lex and yacc.ppt
Module4 lex and yacc.ppt
 
Lex and Yacc ppt
Lex and Yacc pptLex and Yacc ppt
Lex and Yacc ppt
 
Control structure
Control structureControl structure
Control structure
 
Syntax Analyzer.pdf
Syntax Analyzer.pdfSyntax Analyzer.pdf
Syntax Analyzer.pdf
 
Lexical analysis, syntax analysis, semantic analysis. Ppt
Lexical analysis, syntax analysis, semantic analysis. PptLexical analysis, syntax analysis, semantic analysis. Ppt
Lexical analysis, syntax analysis, semantic analysis. Ppt
 
Ch 2.pptx
Ch 2.pptxCh 2.pptx
Ch 2.pptx
 
9781284077247_PPTx_CH01.pptx
9781284077247_PPTx_CH01.pptx9781284077247_PPTx_CH01.pptx
9781284077247_PPTx_CH01.pptx
 
Ch3.ppt
Ch3.pptCh3.ppt
Ch3.ppt
 
Ch3.ppt
Ch3.pptCh3.ppt
Ch3.ppt
 
Ch3.ppt
Ch3.pptCh3.ppt
Ch3.ppt
 
Lecture 1 - Lexical Analysis.ppt
Lecture 1 - Lexical Analysis.pptLecture 1 - Lexical Analysis.ppt
Lecture 1 - Lexical Analysis.ppt
 
New compiler design 101 April 13 2024.pdf
New compiler design 101 April 13 2024.pdfNew compiler design 101 April 13 2024.pdf
New compiler design 101 April 13 2024.pdf
 
lex and yacc.pdf
lex and yacc.pdflex and yacc.pdf
lex and yacc.pdf
 
Unit1.ppt
Unit1.pptUnit1.ppt
Unit1.ppt
 
Regular Expression to Finite Automata
Regular Expression to Finite AutomataRegular Expression to Finite Automata
Regular Expression to Finite Automata
 
Compilers Design
Compilers DesignCompilers Design
Compilers Design
 
Theory of computing
Theory of computingTheory of computing
Theory of computing
 

Mais de baran19901990

Config websocket on apache
Config websocket on apacheConfig websocket on apache
Config websocket on apachebaran19901990
 
Nhập môn công tác kỹ sư
Nhập môn công tác kỹ sưNhập môn công tác kỹ sư
Nhập môn công tác kỹ sưbaran19901990
 
Tìm đường đi xe buýt trong TPHCM bằng Google Map
Tìm đường đi xe buýt trong TPHCM bằng Google MapTìm đường đi xe buýt trong TPHCM bằng Google Map
Tìm đường đi xe buýt trong TPHCM bằng Google Mapbaran19901990
 
How to build a news website use CMS wordpress
How to build a news website use CMS wordpressHow to build a news website use CMS wordpress
How to build a news website use CMS wordpressbaran19901990
 
How to install nginx vs unicorn
How to install nginx vs unicornHow to install nginx vs unicorn
How to install nginx vs unicornbaran19901990
 
Untitled Presentation
Untitled PresentationUntitled Presentation
Untitled Presentationbaran19901990
 
10 logic+programming+with+prolog
10 logic+programming+with+prolog10 logic+programming+with+prolog
10 logic+programming+with+prologbaran19901990
 
09 implementing+subprograms
09 implementing+subprograms09 implementing+subprograms
09 implementing+subprogramsbaran19901990
 
07 control+structures
07 control+structures07 control+structures
07 control+structuresbaran19901990
 
How to install git on ubuntu
How to install git on ubuntuHow to install git on ubuntu
How to install git on ubuntubaran19901990
 

Mais de baran19901990 (20)

Config websocket on apache
Config websocket on apacheConfig websocket on apache
Config websocket on apache
 
Nhập môn công tác kỹ sư
Nhập môn công tác kỹ sưNhập môn công tác kỹ sư
Nhập môn công tác kỹ sư
 
Tìm đường đi xe buýt trong TPHCM bằng Google Map
Tìm đường đi xe buýt trong TPHCM bằng Google MapTìm đường đi xe buýt trong TPHCM bằng Google Map
Tìm đường đi xe buýt trong TPHCM bằng Google Map
 
How to build a news website use CMS wordpress
How to build a news website use CMS wordpressHow to build a news website use CMS wordpress
How to build a news website use CMS wordpress
 
How to install nginx vs unicorn
How to install nginx vs unicornHow to install nginx vs unicorn
How to install nginx vs unicorn
 
Untitled Presentation
Untitled PresentationUntitled Presentation
Untitled Presentation
 
Subprogram
SubprogramSubprogram
Subprogram
 
Introduction
IntroductionIntroduction
Introduction
 
Datatype
DatatypeDatatype
Datatype
 
10 logic+programming+with+prolog
10 logic+programming+with+prolog10 logic+programming+with+prolog
10 logic+programming+with+prolog
 
09 implementing+subprograms
09 implementing+subprograms09 implementing+subprograms
09 implementing+subprograms
 
08 subprograms
08 subprograms08 subprograms
08 subprograms
 
07 control+structures
07 control+structures07 control+structures
07 control+structures
 
How to install git on ubuntu
How to install git on ubuntuHow to install git on ubuntu
How to install git on ubuntu
 
Ruby notification
Ruby notificationRuby notification
Ruby notification
 
Rails notification
Rails notificationRails notification
Rails notification
 
Linux notification
Linux notificationLinux notification
Linux notification
 
Lab4
Lab4Lab4
Lab4
 
Lab5
Lab5Lab5
Lab5
 
09
0909
09
 

Último

(+971568250507 ))# Young Call Girls in Ajman By Pakistani Call Girls in ...
(+971568250507  ))#  Young Call Girls  in Ajman  By Pakistani Call Girls  in ...(+971568250507  ))#  Young Call Girls  in Ajman  By Pakistani Call Girls  in ...
(+971568250507 ))# Young Call Girls in Ajman By Pakistani Call Girls in ...Escorts Call Girls
 
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...APNIC
 
Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.soniya singh
 
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebGDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebJames Anderson
 
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptxAWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptxellan12
 
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort ServiceEnjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort ServiceDelhi Call girls
 
Pune Airport ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready...
Pune Airport ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready...Pune Airport ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready...
Pune Airport ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready...tanu pandey
 
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service AvailableCall Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service AvailableSeo
 
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...Delhi Call girls
 
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine ServiceHot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Servicesexy call girls service in goa
 
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRLLucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRLimonikaupta
 
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...Neha Pandey
 
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779Best VIP Call Girls Noida Sector 75 Call Me: 8448380779
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779Delhi Call girls
 
INDIVIDUAL ASSIGNMENT #3 CBG, PRESENTATION.
INDIVIDUAL ASSIGNMENT #3 CBG, PRESENTATION.INDIVIDUAL ASSIGNMENT #3 CBG, PRESENTATION.
INDIVIDUAL ASSIGNMENT #3 CBG, PRESENTATION.CarlotaBedoya1
 
How is AI changing journalism? (v. April 2024)
How is AI changing journalism? (v. April 2024)How is AI changing journalism? (v. April 2024)
How is AI changing journalism? (v. April 2024)Damian Radcliffe
 

Último (20)

(+971568250507 ))# Young Call Girls in Ajman By Pakistani Call Girls in ...
(+971568250507  ))#  Young Call Girls  in Ajman  By Pakistani Call Girls  in ...(+971568250507  ))#  Young Call Girls  in Ajman  By Pakistani Call Girls  in ...
(+971568250507 ))# Young Call Girls in Ajman By Pakistani Call Girls in ...
 
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
 
Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.
 
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
 
VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
 
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebGDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
 
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptxAWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
 
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort ServiceEnjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
 
Pune Airport ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready...
Pune Airport ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready...Pune Airport ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready...
Pune Airport ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready...
 
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service AvailableCall Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
 
(INDIRA) Call Girl Pune Call Now 8250077686 Pune Escorts 24x7
(INDIRA) Call Girl Pune Call Now 8250077686 Pune Escorts 24x7(INDIRA) Call Girl Pune Call Now 8250077686 Pune Escorts 24x7
(INDIRA) Call Girl Pune Call Now 8250077686 Pune Escorts 24x7
 
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
 
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine ServiceHot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Service
 
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝
 
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRLLucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
 
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
 
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779Best VIP Call Girls Noida Sector 75 Call Me: 8448380779
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779
 
Rohini Sector 6 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 6 Call Girls Delhi 9999965857 @Sabina Saikh No AdvanceRohini Sector 6 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 6 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
 
INDIVIDUAL ASSIGNMENT #3 CBG, PRESENTATION.
INDIVIDUAL ASSIGNMENT #3 CBG, PRESENTATION.INDIVIDUAL ASSIGNMENT #3 CBG, PRESENTATION.
INDIVIDUAL ASSIGNMENT #3 CBG, PRESENTATION.
 
How is AI changing journalism? (v. April 2024)
How is AI changing journalism? (v. April 2024)How is AI changing journalism? (v. April 2024)
How is AI changing journalism? (v. April 2024)
 

Lexical

  • 1. Chapter 3: Lexical Analysis Principles of Programming Languages
  • 2. Contents •Terminology •Chomsky Hierarchy •Lexical analysis in syntax analysis •Using Finite Automata to describe tokens •Using Regular Expression to describe tokens •RegexLibrary in Scala
  • 3. Introduction •Syntax:the form or structure of the expressions, statements, and program units •Semantics:the meaning of the expressions, statements, and program units •Syntax and semantics provide a language’s definition –Users of a language definition •Other language designers •Implementers •Programmers (the users of the language)
  • 4. •A sentence is a string of characters over some alphabet •A languageis a set of sentences Terminology
  • 5. Terminology •Sentences: a = b + c;or c = (a + b) * c; •Syntax: <assign> →<id> = <expr> ; <id> →a | b | c <expr> →<id> + <expr> | <id> * <expr> | ( <expr> ) | <id> •Sematicsof a = b + c;
  • 6. Formal Definition of Languages •Recognizers –A recognition device reads input strings of the language and decides whether the input strings belong to the language –Example: syntax analysis part of a compiler •Generators –A device that generates sentences of a language –One can determine if the syntax of a particular sentence is correct by comparing it to the structure of the generator
  • 7. Recognizers vs. Generators LANGUAGELanguage Generator Language Recognizer Grammar (regular expression) Automaton
  • 8. Chomsky Hierarchy Grammars Languages Automaton Restrictions (w1w2) Type-0 Phrase-structure Turing machine w1= any string with at least 1 non-terminal w2 = any string Type-1 Context-sensitive Bounded Turing machine w1= any string with at least 1 non-terminal w2 = any string at least as long as w1 Type-2 Context-free Non-deterministic pushdown automaton w1= one non-terminal w2 = any string Type-3 Regular Finitestate automaton w1= one non-terminal w2 = tAor t (t = terminal A = non-terminal)
  • 9. Syntax Analysis •The syntax analysis portion of a language processor nearly always consists of two parts: –A low-level part called a lexical analyzer(mathematically, a finite automaton based on a regular grammar) –A high-level part called a syntax analyzer, or parser (mathematically, a push-down automaton based on a context-free grammar, or BNF)
  • 10. •Simplicity-less complex approaches can be used for lexical analysis; separating them simplifies the parser •Efficiency-separation allows optimization of the lexical analyzer •Portability-parts of the lexical analyzer may not be portable, but the parser always is portable Reasons to Separate Lexical and Syntax Analysis
  • 11. Lexical Analysis •A lexical analyzer is a pattern matcher for character strings •A lexical analyzer is a “front-end” for the parser •Identifies substrings of the source program that belong together –lexemes –Lexemes match a character pattern, which is associated with a lexical category called a token
  • 12. Lexeme vs. Token result = oldsum–value / 100; Lexemes Tokens result IDENT = ASSIGN_OP oldsum IDENT – SUBSTRACT_OP value IDENT / DIVISION_OP 100 INT_LIT ; SEMICOLON
  • 13. Lexical Analysis •The lexical analyzer is usually a function that is called by the parser when it needs the next token •Three approaches to building a lexical analyzer: –Design a state diagram that describes the tokens and write a program that implements the state diagram –Design a state diagram that describes the tokens and hand-construct a table-driven implementation of the state diagram –Write a formal description of the tokens and use a software tool that constructs table-driven lexical analyzers given such a description
  • 15. DFA •DFA is a 5-tuple •= a finite set of states •= alphabet •is the initial state •is the set of final states •transition function, a function from to (,,,,)MKsF  KsK FK KK
  • 16. DFA • E.g., • Test with the input aabba M  (K,, , s,F) 0 1 K {q ,q }  {a,b} 0 s  q 0 F {q }  q σ δ(q, σ) q_0 a q_0 q_0 b q_1 q_1 a q_1 q_1 b q_0 What is the language accepted by M, a.k.a. L(M)?
  • 17. DFA •Test with the input aabba •Or we can say •So, aabbais accepted by M 000100(,)(,)(,)(,)(,)(,)qaabbaqabbaqbbaqbaqaqe * 00(,)(,)qaabbaqe
  • 18. State Diagram 0 q 1 q a a b b q σ δ(q, σ) q_0 a q_0 q_0 b q_1 q_1 a q_1 q_1 b q_0
  • 19. Example •Design a DFA Mthat accepts the language L(M) = {w: w ϵ{a,b}*and wdoes not contain three consecutive b’s}.
  • 20. Nondeterministic Finite Automata •Permit several possible “next states” for a given combination of current state and input symbol •Accept the empty string ein state diagram •Help simplifying the description of automata •Every NFA is equivalent to a DFA
  • 21. Example • Language L = (ab U aba)* 0 q 1 q a b 2 q a b
  • 22. Example • Language L = (ab U aba)* 0 q 1 q a e 2 q b a
  • 23. Example • Language L = (ab U aba)* 0 q ab aba
  • 24. Example •Design a NFA that accepts the following definition for IDENT –Starts with a letter –Has any number of letter or digit or “_” afterwards
  • 25. Regular Expression (regex) •Describe “regular” sets of strings •Symbols other than ( ) | * stand for themselves •Concatenationα β=First part matches α, second part β •Unionα | β = Match α or β •Kleenestarα* = 0 or more matches of α •Use ( ) for grouping
  • 26. Regular Expression (regex) E(0|1|2|3|4|5|6|7|8|9)* •An E followed by a (possibly empty) sequence of digits E123 E9 E
  • 27. Regular Expression (regex) b a ab ab a | ba a*
  • 28. Convenience Notation •α+= one or more (i.e. αα*) •α?= 0 or 1 (i.e. (α|e)) •[xyz]= x|y|z •[x-y] = all characters from xto y, e.g. [0- 9] = all ASCII digits •[^x-y] = all characters other than [x-y]
  • 29. Convenience Notation •p{Name}, where Nameis a Unicode category (ex. L, N, Z for letter, number, space) •P{Name}: complement of p{Name} •.matches any character •is an escape. For example, .is a period, a backslash
  • 30. RegexExamples •Reserved words: easy WHILE = while BEGIN = begin DO = do END = end •Integers:[+-]?[0-9]+, or maybe [+- ]?p{N}+ •Note: +loses its normal meaning inside [], and a -just before ]denotes itself
  • 31. RegexExamples •Hexadecimal numbers 0[Xx][0-9A-Fa-f]+ •Quoted C++ strings: ".*" •Well, actually not; the . will match a quote •Better: "[^"]*" •Well, actually not; you can have a "in a quoted string •"([^"]|.)*"
  • 32. Exercises •IDENT –Starts with a letter –Has any number of letter or digit or “_” afterwards •C++ floating-point literals –See http://msdn.microsoft.com/en- us/library/tfh6f0w2.aspx
  • 33. ScalaRegexLibrary •Find all matches import scala.util.matching._ valregex= new Regex("[0-9]+") regex.findAllIn("99 bottles, 98 bottles").toList List[String] = List(99, 98) Check whether beginning matches regex.findPrefixOf("99 bottles, 98 bottles").getOrElse(null) String = 99
  • 34. ScalaRegexLibrary •Groups valregex= new Regex("([0-9]+) bottles") valmatches = regex.findAllIn("99 bottles, 98 bottles, 97 cans").matchData.toList matches : List[scala.util.matching.Regex.Match] = List(99 bottles, 98 bottles) matches.map(_.group(1)) List[String] = List(99, 98)
  • 35. Exercises •Find NFA and regexfor anbm: n+mis even •Find NFA and regexfor anbm: n 1, m 1
  • 36. Remind •Design a state diagram that describes the tokens and write a program that implements the state diagram •Design a state diagram that describes the tokens and hand- construct a table-driven implementation of the state diagram •Write a formal description of the tokens and use a software tool that constructs table-driven lexical analyzers given such a description
  • 37. What do lexical analyzers do? •Lexical analyzers extract lexemes from a given input string and produce the corresponding tokens •Old compilers: processed an entire source program •New compilers: locate the next lexeme with token code, then return to syntax analyzer
  • 38. What else? •Skip comments and blanks outside lexemes •Insert user-defined lexemes into the symbol table •Detect syntactic errors in tokens –e.g. ill-formed floating-point literals
  • 39. In the next lecture •How can we describe grammar? •What do syntax analyzers do after receiving lexemes from lexical analyzers? •Build grammar for some parts of your popular programming languages
  • 40. Summary •Syntax analysis is a common part of language implementation •A lexical analyzer is a pattern matcher that isolates small-scale parts of a program •Regular expressions are built based on Finite Automata