CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
YACC Parser Generator for Language Processors
1. System Programming
Walchand Institute of Technology
Aim:
Parser generator using YACC
Theory:
Language processor development tools (LPDTs) focusing on generation of the
analysis phase of language processors.
Figure 1 shows a schematic
language processor whose source
two inputs:
1. Specification
2. Specification
phase.
Figure 1: A Language Processor Development Tool (LPDT)
It generates programs that perform lexical, syntax and semantic analysis
source program and construct the IR. These programs collectively form the
analysis phase of the language processor.
Sunita M. Dol, CSE Dept
Technology, Solapur
HANDOUT#03
Parser generator using YACC (Yet Another Compiler Compiler)
guage processor development tools (LPDTs) focusing on generation of the
analysis phase of language processors.
shows a schematic of an LPDT which generates the analysis phase
whose source language is L. The LPDT requires the following
Specification of a grammar of language L
Specification of semantic actions to be performed in the analysis
: A Language Processor Development Tool (LPDT)
It generates programs that perform lexical, syntax and semantic analysis
source program and construct the IR. These programs collectively form the
language processor.
Sunita M. Dol, CSE Dept
Page 1
(Yet Another Compiler Compiler)
guage processor development tools (LPDTs) focusing on generation of the
an LPDT which generates the analysis phase of a
is L. The LPDT requires the following
semantic actions to be performed in the analysis
: A Language Processor Development Tool (LPDT)
It generates programs that perform lexical, syntax and semantic analysis of the
source program and construct the IR. These programs collectively form the
2. System Programming
Walchand Institute of Technology
Two LPDTs are widely used in practice. These are, the lexical analyzer generator
LEX, and the parser generator YACC. The input to these tools is a specification
the lexical and syntactic constructs
on recognizing the constructs. The specification consists
rules of the form
<
where < semantic action >
matching < string specification
erate C programs which contain the code for scanning and parsing, respectively,
and the semantic actions contained in the specification.
A YACC generated parser can use a LEX generated scanner as a routine i
scanner and parser use same con
Figure 2 shows a schematic for developing the analysis phase
language L using LEX and YACC. The analysis phase processes the source
program to build an intermediate represen
Figure
YACC
Each string specification in the input to YACC resembles a grammar production.
The parser generated by
The actions associated with a string specification are executed when a reduction is
made according to the specification. An attribute is associated with every
nonterminal symbol. The value
The attribute can be given any user
Sunita M. Dol, CSE Dept
Technology, Solapur
Two LPDTs are widely used in practice. These are, the lexical analyzer generator
LEX, and the parser generator YACC. The input to these tools is a specification
the lexical and syntactic constructs of L, and the semantic actions to be performed
nizing the constructs. The specification consists of a set
<string specification> {< semantic action>}
> consists of C code. This code is executed when a string
string specification > is encountered in the input. LEX and YACC gen
erate C programs which contain the code for scanning and parsing, respectively,
and the semantic actions contained in the specification.
A YACC generated parser can use a LEX generated scanner as a routine i
scanner and parser use same conventions concerning the representation
Figure 2 shows a schematic for developing the analysis phase
L using LEX and YACC. The analysis phase processes the source
n intermediate representation.
Figure 2: using LEX and YACC
Each string specification in the input to YACC resembles a grammar production.
by YACC performs reductions according to this grammar.
The actions associated with a string specification are executed when a reduction is
made according to the specification. An attribute is associated with every
bol. The value of this attribute can be manipulated during parsing.
The attribute can be given any user-designed structure. A symbol ‘$n’ in the action
Sunita M. Dol, CSE Dept
Page 2
Two LPDTs are widely used in practice. These are, the lexical analyzer generator
LEX, and the parser generator YACC. The input to these tools is a specification of
L, and the semantic actions to be performed
a set of translation
string specification> {< semantic action>}
C code. This code is executed when a string
> is encountered in the input. LEX and YACC gen-
erate C programs which contain the code for scanning and parsing, respectively,
A YACC generated parser can use a LEX generated scanner as a routine if the
ventions concerning the representation of tokens.
Figure 2 shows a schematic for developing the analysis phase of a compiler for
L using LEX and YACC. The analysis phase processes the source
Each string specification in the input to YACC resembles a grammar production.
YACC performs reductions according to this grammar.
The actions associated with a string specification are executed when a reduction is
made according to the specification. An attribute is associated with every
ute can be manipulated during parsing.
designed structure. A symbol ‘$n’ in the action
3. System Programming
Walchand Institute of Technology
part of a translation rule refers to the attribute
string specification. ‘$$’ represents the attr
specification.
Example
Figure 3 shows sample input to YACC. The input consists
which only two are shown. The routine gendesc builds a descriptor containing the
name and type of an id or constant. The routine gencode takes an op
attributes of two operands, generates code and returns with the attribute
Figure
Parsing of the* string b+c*d where b, c and d are
generated by YACC from the input
C routines:
Gendesc (id#1);
Gendesc (id#2);
Gendesc (id#3);
Gencode (*, [c,real], [d, real]);
Gencode (+, [b, real], [t, real]);
where an attribute has the f
used to store the result of
Compiling the Example Program
To create the desk calculator example program, do the following:
Sunita M. Dol, CSE Dept
Technology, Solapur
a translation rule refers to the attribute of the nth
symbol in the RHS
string specification. ‘$$’ represents the attribute of the LHS symbol
Figure 3 shows sample input to YACC. The input consists of four com
which only two are shown. The routine gendesc builds a descriptor containing the
an id or constant. The routine gencode takes an op
two operands, generates code and returns with the attribute
Figure 3: A sample YACC specification
the* string b+c*d where b, c and d are of type real, using the p
YACC from the input of above Figure leads to following
Gencode (*, [c,real], [d, real]);
Gencode (+, [b, real], [t, real]);
where an attribute has the form <name>, <type> and t is the name
of c*d in the code generated by the first call on gencode.
Compiling the Example Program
To create the desk calculator example program, do the following:
Sunita M. Dol, CSE Dept
Page 3
symbol in the RHS of the
the LHS symbol of the string
four components, of
which only two are shown. The routine gendesc builds a descriptor containing the
an id or constant. The routine gencode takes an operator and the
two operands, generates code and returns with the attribute
: A sample YACC specification
type real, using the parser
Figure leads to following-calls on the
and t is the name of a location
the first call on gencode.
To create the desk calculator example program, do the following:
4. System Programming Sunita M. Dol, CSE Dept
Walchand Institute of Technology, Solapur Page 4
1. Process the yacc grammar file using the -d optional flag (which informs the
yacc command to create a file that defines the tokens used in addition to the
C language source code):
yacc -d calc.yacc
2. Use the ls command to verify that the following files were created:
y.tab.c
The C language source file that the yacc command created for the parser
y.tab.h
A header file containing define statements for the tokens used by the parser
3. Process the lex specification file:
lex calc.lex
4. Use the ls command to verify that the following file was created:
lex.yy.c
The C language source file that the lex command created for the lexical
analyzer
5. Compile and link the two C language source files:
cc y.tab.c lex.yy.c
6. Use the ls command to verify that the following files were created:
y.tab.o
The object file for the y.tab.c source file
lex.yy.o
The object file for the lex.yy.c source file
a.out
The executable program file
To run the program directly from the a.out file, type:
$ a.out
Program:
YACC program
The following example shows the contents of the calc.yacc file. This file has
entries in all three sections of a yacc grammar file: declarations, rules, and
programs.
5. System Programming Sunita M. Dol, CSE Dept
Walchand Institute of Technology, Solapur Page 5
%{
#include <stdio.h>
int regs[26];
int base;
%}
%start list
%union { int a; }
%type <a> expr number
%token DIGIT LETTER
%left '|'
%left '&'
%left '+' '-'
%left '*' '/' '%'
%left UMINUS /*supplies precedence for unary minus */
%% /* beginning of rules section */
list: /*empty */
|
list stat 'n'
|
list error 'n'
{
yyerrok;
} ;
stat: expr
{
printf("%dn",$1);
}
|
LETTER '=' expr
{
regs[$1] = $3;
} ;
7. System Programming Sunita M. Dol, CSE Dept
Walchand Institute of Technology, Solapur Page 7
|
expr '|' expr
{
$$ = $1 | $3;
}
|
'-' expr %prec UMINUS
{
$$ = -$2;
}
|
LETTER
{
$$ = regs[$1];
}
|
number
;
number: DIGIT
{
$$ = $1;
base = ($1==0) ? 8 : 10;
} |
number DIGIT
{
$$ = base * $1 + $2;
} ;
%%
main()
{
return(yyparse());
}
yyerror(s)
8. System Programming Sunita M. Dol, CSE Dept
Walchand Institute of Technology, Solapur Page 8
char *s;
{
fprintf(stderr, "%sn",s);
}
yywrap()
{
return(1);
}
The file contains the following sections:
• Declarations Section. This section contains entries that:
o Include standard I/O header file
o Define global variables
o Define the list rule as the place to start processing
o Define the tokens used by the parser
o Define the operators and their precedence
• Rules Section. The rules section defines the rules that parse the input
stream.
o %start - Specifies that the whole input should match stat.
o %union - By default, the values returned by actions and the lexical
analyzer are integers. yacc can also support values of other types,
including structures. In addition, yacc keeps track of the types, and
inserts appropriate union member names so that the resulting parser
will be strictly type checked. The yacc value stack is declared to be a
union of the various types of values desired. The user declares the
union, and associates union member names to each token and
nonterminal symbol having a value. When the value is referenced
through a $$ or $n construction, yacc will automatically insert the
appropriate union name, so that no unwanted conversions will take
place.
o %type - Makes use of the members of the %union declaration and
gives an individual type for the values associated with each part of the
grammar.
o %token - Lists the tokens which come from lex tool with their type.
9. System Programming Sunita M. Dol, CSE Dept
Walchand Institute of Technology, Solapur Page 9
• Programs Section. The programs section contains the following
subroutines. Because these subroutines are included in this file, you do not
need to use the yacc library when processing this file.
main
The required main program that calls the yyparse subroutine to
start the program.
yyerror(s)
This error-handling subroutine only prints a syntax error
message.
yywrap
The wrap-up subroutine that returns a value of 1 when the end
of input occurs.
LEX program
This file contains include statements for standard input and output, as well as for
the y.tab.h file. If you use the -d flag with the yacc command, the yacc program
generates that file from the yacc grammar file information. The y.tab.h file
contains definitions for the tokens that the parser program uses. In addition, the
calc.lex file contains the rules to generate these tokens from the input stream. The
following are the contents of the calc.lex file.
%{
#include <stdio.h>
#include "y.tab.h"
int c;
extern int yylval;
%}
%%
" " ;
[a-z] {
c = yytext[0];
yylval = c - 'a';
return(LETTER);
}
[0-9] {
10. System Programming Sunita M. Dol, CSE Dept
Walchand Institute of Technology, Solapur Page 10
c = yytext[0];
yylval = c - '0';
return(DIGIT);
}
[^a-z0-9b] {
c = yytext[0];
return(c);
}
Input:
2+3
Output:
5
Conclusion:
YACC file has entries in three sections of a yacc grammar file: declarations,
rules, and programs.
The lexical analyzer file contains include statements for standard input and
output, as well as for the y.tab.h file. If you use the -d flag with the yacc
command, the yacc program generates that file from the yacc grammar file
information.
The y.tab.h file contains definitions for the tokens that the parser program
uses. In addition, the lexical analyzer file contains the rules to generate
these tokens from the input stream.
Thus the parser is implemented using YACC