3. Syntax Analysis
• Scanner and parser
– Scanner looks at the lower level part of the programming language
• Only the language for the tokens
– Parser looks at the higher lever part of the programming language
• E.g., if statement, functions
• The tokens are abstracted
• Parser = Syntax analyzer
– Input: sequence of tokens from lexical analysis
– Output: a parse tree of the program
• E.g., AST
– Process:
• Try to derive from a starting symbol to the input string (How?)
• Build the parse tree following the derivation
4. Top-Down Parsing
• Top down parsing
– Recursive descent parsing
– Making Recursive descent a Predictive parsing
algorithm
• Left recursion elimination
• Left factoring
– Predictive parsing
• First set and Follow set
• Parse table construction
• Parsing procedure
– LL grammars and languages
• LL(1) grammar
5. Predictive parsing
A predictive parser is an efficient way of implementing recursive-descent
parsing by handling the stack of activation records explicitly.
6. First set and Follow set
The construction of a predictive parser is aided by two functions associated with
a grammar G. These functions, FIRST and FOLLOW, allow us to fill in the proper entries of a
predictive parsing table for G, if such a parsing table for G exist.
First :
If a is any string of grammar symbols, let FIRST(a) be the set of terminals that
begin the strings derived from a. If a ⇒ ε then e is also in FIRST(a).
Follow :
Define FOLLOW(A), for nonterminal A, to be the set of terminals a that can
appear immediately to the right of A in some sentential form, that is, the set of terminals a
such that there exists a derivation of the form S ⇒ αAa β for some a and b. Note that there
may, at some time during the derivation, have been symbols between A and a, but if so,
they derived ε and disappeared. If A can be the rightmost symbol in some sentential form,
then $, representing the input right endmarker, is in FOLLOW(A).
7. First
Rules for Compute First set:
1. If X is terminal, then FIRST (X) is {X}.
2. If X is nonterminal and X aα is a production, then add a to
FIRST (X). If X ε is a production, then add ε to FIRST (X).
3. If XY1Y2…..Yk is a production, then for all i such that all of
Y1,Y2…..,Yi-1 are nonterminals and FIRST (Yj) contains ε for
j=1,2,…,i-1 (i.e. Y1Y2…. Yi-1 ⇒ ε), add every non-ε symbol in
FIRST (Yi) to FIRST(X). If ε is in FIRST (Yj) for all j=1, 2… k, then
add ε to FIRST(X).
8. Follow
Rules for Compute Follow set:
1. $ is in FOLLOW(S), where S is the starting symbol.
2. If there is a production A αBβ, β ≠ ε, then everything is in
FIRST (Β) except for ε is placed in FOLLOW (B).
3. If there is a production A αB, or a production A αBβ,
where FIRST (Β) contains ε (i.e. β ⇒ ε), then everything in
FOLLOW (A) is in FOLLOW (B).
9. Example
Grammar
E TE’
E’ +TE’ |
T FT’
T’ *FT’ |
F (E) | id | num
Note : This grammar is non left-recursive type so,
we get Both FIRST and FOLLOW sets
10. First
Grammar
E TE’
E’ +TE’ |
T FT’
T’ *FT’ |
F (E) | id | num
First set
FIRST (E) = { } FIRST (+) = { + } FIRST (id) = { id }
FIRST(E') = { } FIRST (*) = { * } FIRST (num) = { num }
FIRST(T) = { } FIRST (‘(‘) = { ( }
FIRST(T') = { } FIRST (‘)’) = { ) }
FIRST( F ) = { } FIRST ( ) = { }
The First thing we do is Add terminal
itself in its FIRST set by applying 1st
rule of FIRST set
11. First
Grammar
E TE’
E’ +TE’ |
T FT’
T’ *FT’ |
F (E) | id | num
First set
FIRST (E) = { } FIRST (+) = { + } FIRST (id) = { id }
FIRST(E') = { + , ε } FIRST (*) = { * } FIRST (num) = { num }
FIRST(T) = { } FIRST (‘(‘) = { ( }
FIRST(T') = { * , ε } FIRST (‘)’) = { ) }
FIRST( F ) = { ( , id , num } FIRST ( ) = { }
Next, we apply rule 2 i.e. for every
production X aα add a to First (X)
Rule 2 also says that for every
production X ε add ε to First (X)
12. First
Grammar
E TE’
E’ +TE’ |
T FT’
T’ *FT’ |
F (E) | id | num
First set
FIRST (E) = { ( , id , num } FIRST (+) = { + } FIRST (id) = { id }
FIRST(E') = { + , ε } FIRST (*) = { * } FIRST (num) = { num }
FIRST(T) = { ( , id , num } FIRST (‘(‘) = { ( }
FIRST(T') = { * , ε } FIRST (‘)’) = { ) }
FIRST( F ) = { ( , id , num } FIRST ( ) = { }
For Production T FT’ 3rd rule says that
everything in FIRST (F) is into FIRST (T).
(since production is like XY1Y2…Yk and
FIRST (Y1) not contain ε )
Similarly For Production E TE’ 3rd rule says
that everything in FIRST (F) is into FIRST (T).
13. Follow
Grammar
E TE’
E’ +TE’ |
T FT’
T’ *FT’ |
F (E) | id | num
The First thing we do is Add $ to the
start Symbol nonterminal “E”
Follow set
FOLLOW(E) = { $ }
FOLLOW(E') = { }
FOLLOW(T) = { }
FOLLOW(T') = { }
FOLLOW( F ) = { }
14. Follow
Grammar
E TE’
E’ +TE’ |
T FT’
T’ *FT’ |
F (E) | id | num
Follow set
FOLLOW(E) = { $ , ) }
FOLLOW(E') = { }
FOLLOW(T) = { + }
FOLLOW(T') = { }
FOLLOW( F ) = { * }
Next we get this productions on which
we apply rule 2we apply rule 2 to E' →+TE' This says that
everything in First(E') except for ε should be
in Follow(T)Similarly we can apply this rule 2 to other
two productions T' → *FT‘ and F → (E)
which give Follow(F) & Follow(E)
respectively
15. Follow
Grammar
E TE’
E’ +TE’ |
T FT’
T’ *FT’ |
F (E) | id | num
Follow set
FOLLOW(E) = { $ , ) }
FOLLOW(E') = { $ , ) }
FOLLOW(T) = { + , $ , ) }
FOLLOW(T') = { + , $ , ) }
FOLLOW(F) = { * , + , $ , ) }
Now applying 3rd rule on this
production we can say everything
in FOLLOW (E) is into FOLLOW (E’).
(since here β = ε )
Now applying 3rd rule on this
production we can say everything
in FOLLOW (E’) is into FOLLOW (T).
(since here β ⇒ ε )
Same here everything in
FOLLOW (T) is into FOLLOW (T’).
(since here β = ε )
Same here everything in
FOLLOW (T’) is into FOLLOW (F).
(since here β ⇒ ε )
16. Parse table
Construct a parse table M[N, T {$}]
Non-terminals in the rows and terminals in the columns
For each production A
For each terminal a First( )
add A to M[A, a]
Meaning: When at A and seeing input a, A should be used
If First( ) then for each terminal a Follow(A)
add A to M[A, a]
Meaning: When at A and seeing input a, A should be used
• In order to continue expansion to
• X AC A B B b | C cc
If First( ) and $ Follow(A)
add A to M[A, $]
Same as the above
17. Construct a Parse Table
Grammar
E TE’
E’ +TE’ |
T FT’
T’ *FT’ |
F (E) | id | num
First(*) = {*}
First(F) = {(, id, num}
First(T’) = {*, }
First(T) {(, id, num}
First(E’) = {+, }
First(E) {(, id, num}
Follow(E) = {$, )}
Follow(E’) = {$, )}
Follow(T) = {$, ), +}
Follow(T) = {$, ), +}
Follow(T’) = {$, ), +}
Follow(F) = {*, $, ), +}
E TE’: First(TE’) = {(, id, num}E’ +TE’: First(+TE’) = {+}E’ : Follow(E’) = {$,)}T FT’: First(FT’) = {(, id, num}T’ *FT’: First(*FT’) = {*}T’ : Follow(T’) = {$, ), +}
id num * + ( ) $
E E TE’ E TE’ E TE’
E’ E’ +TE’ E’ E’
T T FT’ T FT’ T FT’
T’ T’ *FT’ T’ T’ T’
F F id F num F (E)
18. Now we can have a predictive parsing mechanism
Use a stack to keep track of the expanded form
Initialization
Put starting symbol S and $ into the stack
Add $ to the end of the input string
$ is for the recognition of the termination configuration
If ‘a’ is at the top of the stack and ‘a’ is the next input symbol then
Simply pop ‘a’ from stack and advance on the input string
If ‘A’ is on top of the stack and ‘a’ is the next input symbol then
Assume that M [A, a] = A
Replace A by in the stack
Termination
When only $ in the stack and in the input string
If ‘A’ is on top of the stack and ‘a’ is the next input but
M[A, a] = empty
Error
Parsing procedure
19. id num * + ( ) $
E E TE’ E TE’ E TE’
E’ E’ +TE’ E’ E’
T T FT’ T FT’ T FT’
T’ T’ *FT’ T’ T’ T’
F F id F num F (E)
Stack Input Action
E $ id + num * id $ E TE’
T E’ $ id + num * id $ T FT’
F T’ E’ $ id + num * id $ F id
T’ E’ $ + num * id $ T’
E’ $ + num * id $ E’ +TE’
T E’ $ num * id $ T FT’
F T’ E’ $ num * id $ F num
T’ E’ $ * id $ T’ *FT’
F T’ E’ $ id $ F id
T’ E’ $ $ T’
E’ $ $ E’
$ $ Accept
ERROR
Example of Predictive Parsing:
id + num * id