2. Symbol Table
• When names are found, they will be entered into a symbol
table, which will hold all relevant information about
identifiers, function names, objects, classes, interfaces, etc.
• This information will be used later by the semantic analyzer
and the code generator.
[Figure: the Lexical Analyzer, Syntax Analyzer, Semantic Analyzer, and Code Generator each connect to the Symbol Table.]
3. Usage of Symbol Table by Various Phases of Compiler
• Lexical Analysis: Creates new entries in the table for tokens such as identifiers.
• Syntax Analysis: Adds information about attributes such as type, scope,
dimension, line of reference, etc. to the table.
• Semantic Analysis: Uses the information in the table to check semantics,
i.e. to verify that expressions and assignments are semantically
correct (type checking), and updates the table accordingly.
• Intermediate Code Generation: Refers to the symbol table to find out how much
memory and what type is allocated; the table also helps in adding temporary
variable information.
• Code Optimization: Uses information in the symbol table for machine-
dependent optimization.
• Target Code Generation: Generates code using the address information of
identifiers in the table.
4. Symbol table
• Symbol table: A data structure used by a compiler to keep track of
semantics of names.
– Determine whether the variable is defined already or not.
– Determine the scope.
• The effective context where a name is valid.
– Where it is stored: storage address.
– Type checking for semantic correctness determination.
• Operations:
– Find / Lookup / Search: access the information associated with a given
name.
– Insert: add a name into the table.
– Delete: remove a name when its scope is closed.
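The three operations above can be sketched as a small Python class. This is only an illustration under simple assumptions: a flat table with no scopes, backed by a dictionary; the class name `SymbolTable` and the attribute names `type` and `address` are hypothetical.

```python
class SymbolTable:
    """Minimal flat symbol table: maps a name to its attributes."""

    def __init__(self):
        self._entries = {}

    def insert(self, name, **attrs):
        # Reject a second definition of the same name.
        if name in self._entries:
            raise KeyError(f"'{name}' is already defined")
        self._entries[name] = dict(attrs)

    def lookup(self, name):
        # Return the stored attributes, or None if the name is unknown.
        return self._entries.get(name)

    def delete(self, name):
        # Remove the name, e.g. when its scope is closed.
        self._entries.pop(name, None)


table = SymbolTable()
table.insert("count", type="int", address=0)
```

Here `lookup("count")` returns the attributes stored by `insert`, and after `delete("count")` the same lookup returns `None`.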
5. Symbol Table
• The compiler uses the symbol table to keep track of scope (block) and binding information
about names.
• The symbol table is updated every time
– a new name is discovered
– new information about an existing name is discovered
• The symbol table must have mechanisms to:
– add new entries
– find existing information efficiently
• Two common mechanisms:
– linear lists: simple to implement, poor performance
– hash tables: more programming effort, good performance
• The compiler should be able to grow the symbol table dynamically.
• If the size is fixed, it must be large enough for the largest program.
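To make the performance difference concrete, the sketch below looks up the same name in a linear list and in a hash table. It assumes Python's built-in `list` and `dict` as stand-ins for the two mechanisms (both grow dynamically); the helper `linear_lookup` and the generated names `v0` ... `v999` are made up for illustration.

```python
def linear_lookup(entries, name):
    """Scan a list of (name, info) pairs; return (info, comparisons made)."""
    comparisons = 0
    for entry_name, info in entries:
        comparisons += 1
        if entry_name == name:
            return info, comparisons
    return None, comparisons


names = [f"v{i}" for i in range(1000)]

# Linear list: entries are kept in insertion order.
linear_table = [(n, {"type": "int"}) for n in names]

# Hash table: a dict hashes the name to find its entry directly.
hash_table = {n: {"type": "int"} for n in names}

info, comparisons = linear_lookup(linear_table, "v999")
```

Finding the last name in the list takes 1000 comparisons, while `hash_table["v999"]` reaches the entry without scanning the other names.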
6. Symbol table Information
Symbol table stores:
• For each type name, its type definition (e.g. for the C type
declaration typedef int* mytype, it maps the name mytype to a data
structure that represents the type int*).
• For each variable name, its type. If the variable is an array, it also stores
dimension information. It may also store the storage class, the offset in the
activation record, etc.
• For each constant name, its type and value.
• For each function and procedure, its formal parameter list and its return
type. Each formal parameter must have a name, a type, a passing mode
(by-reference or by-value), etc.
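One possible way to represent these four kinds of entries is with Python dataclasses. The class and field names below are assumptions chosen to mirror the list above, not a prescribed layout, and the sample entries `matrix` and `swap` are invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TypeEntry:
    name: str
    definition: str                  # e.g. "int*" for typedef int* mytype


@dataclass
class VariableEntry:
    name: str
    type: str
    dimensions: List[int] = field(default_factory=list)  # empty for scalars
    storage_class: Optional[str] = None
    offset: Optional[int] = None     # offset in the activation record


@dataclass
class ConstantEntry:
    name: str
    type: str
    value: object


@dataclass
class Parameter:
    name: str
    type: str
    passing: str                     # "by-value" or "by-reference"


@dataclass
class FunctionEntry:
    name: str
    parameters: List[Parameter]
    return_type: str


mytype = TypeEntry("mytype", "int*")
matrix = VariableEntry("matrix", "float", dimensions=[3, 4], offset=16)
swap = FunctionEntry(
    "swap",
    [Parameter("a", "int", "by-reference"), Parameter("b", "int", "by-reference")],
    "void",
)
```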
8. Variable Name
– Must be present so that other phases can identify a particular variable
– Major issue is the variable length of names
– Two popular approaches
• Set a fixed maximum length for the variable name
• Keep only a descriptor in the variable name field and keep the
name in a general string area referenced by this descriptor
• The first approach gives quick table access, while the second supports efficient
storage of variable names
• The first approach wastes space for short variable names, while the second has
slower table access because of the extra indirection
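The two approaches can be contrasted in a few lines of Python. This is a sketch under assumptions: the field width of 8, the `(start, length)` descriptor format, and the names `fixed_field` and `StringArea` are all hypothetical.

```python
MAX_LEN = 8  # assumed fixed width of the name field


def fixed_field(name):
    """Approach 1: truncate or pad the name to exactly MAX_LEN characters."""
    return name[:MAX_LEN].ljust(MAX_LEN)


class StringArea:
    """Approach 2: all names share one character area."""

    def __init__(self):
        self._chars = []

    def add(self, name):
        # The descriptor stored in the table is (start position, length).
        start = len(self._chars)
        self._chars.extend(name)
        return (start, len(name))

    def get(self, descriptor):
        # Following the descriptor is the extra step that slows access.
        start, length = descriptor
        return "".join(self._chars[start:start + length])


area = StringArea()
d1 = area.add("i")
d2 = area.add("total_amount")
```

With the fixed field, `"i"` occupies eight characters and `"total_amount"` is cut to `"total_am"`; with the string area, both names are stored in full using 13 characters in total.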
12. Four Symbol Table Structures for Non-Block-Structured Languages
• Unordered list: (linked list/array)
– for a very small set of variables;
– coding is easy, but performance is bad for large number of variables.
• Ordered linear list:
– use binary search on arrays;
– insertion and deletion are expensive;
– coding is relatively easy.
• Binary search tree:
– O(log n) time per operation (search, insert or delete) on average for n variables; O(n) in the worst case if the tree becomes unbalanced;
– coding is relatively difficult.
• Hash table:
– most commonly used;
– very efficient provided the memory space is adequately larger than the
number of variables;
– performance may be bad if unlucky (many collisions) or if the table is saturated;
– coding is not too difficult.
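As an illustration of the ordered linear list, the sketch below keeps names sorted and uses binary search from Python's standard `bisect` module. The class name `OrderedListTable` and its methods are hypothetical.

```python
import bisect


class OrderedListTable:
    """Symbol table kept as a sorted array of names."""

    def __init__(self):
        self._names = []   # always sorted
        self._infos = []   # _infos[i] belongs to _names[i]

    def _find(self, name):
        # Binary search: O(log n) comparisons.
        i = bisect.bisect_left(self._names, name)
        found = i < len(self._names) and self._names[i] == name
        return i, found

    def insert(self, name, info):
        i, found = self._find(name)
        if found:
            raise KeyError(f"'{name}' is already defined")
        # Inserting shifts every later element: O(n), hence "expensive".
        self._names.insert(i, name)
        self._infos.insert(i, info)

    def lookup(self, name):
        i, found = self._find(name)
        return self._infos[i] if found else None

    def names(self):
        return list(self._names)


table = OrderedListTable()
for name in ("z", "a", "m"):
    table.insert(name, "int")
```

After these inserts, `table.names()` is `["a", "m", "z"]` regardless of insertion order.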
15. Hash Table Efficiency
• For a given hash table capacity,
– If there are too many buckets, then many buckets will
not be used, leading to space inefficiency.
– If there are too few buckets, then there will be many
collisions, causing the searches to degenerate into
predominantly sequential searches, leading to time
inefficiency.
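The trade-off can be observed with a small experiment. The sketch below assumes a chained hash table and a deliberately simple hash function (sum of character codes modulo the bucket count); the function names and the 20 generated variable names are invented for illustration.

```python
def simple_hash(name, bucket_count):
    # Toy hash function, used only to keep the example deterministic.
    return sum(ord(c) for c in name) % bucket_count


def build(names, bucket_count):
    """Place each name in the chain of its bucket."""
    buckets = [[] for _ in range(bucket_count)]
    for name in names:
        buckets[simple_hash(name, bucket_count)].append(name)
    return buckets


def stats(buckets):
    """Return (number of unused buckets, length of the longest chain)."""
    unused = sum(1 for chain in buckets if not chain)
    longest = max(len(chain) for chain in buckets)
    return unused, longest


names = [f"var{i}" for i in range(20)]
few_unused, few_longest = stats(build(names, 2))
many_unused, many_longest = stats(build(names, 1000))
```

With 2 buckets, at least one chain holds 10 or more of the 20 names, so each search is mostly a sequential scan. With 1000 buckets, at least 980 buckets stay empty.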
19. Symbol tables in block-structured languages
Two approaches:
– 1. many small symbol tables
– 2. one global symbol table
1. Many small tables (stack-based implementation)
– one symbol table per scope.
– use a stack of tables.
– The symbol table for the current scope is on top of the stack.
– The symbol tables for other enclosing scopes are placed under the current
one.
– Push a new table when a new scope is entered.
– Pop a symbol table when a scope is closed.
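The push and pop steps above can be sketched as follows, assuming each per-scope table is a Python dictionary and the stack is a list whose last element is the top. The class `ScopeStack` and its method names are hypothetical.

```python
class ScopeStack:
    """Stack of per-scope symbol tables; the last list element is the top."""

    def __init__(self):
        self._tables = []

    def enter_scope(self):
        # Push a new, empty table for the scope being entered.
        self._tables.append({})

    def exit_scope(self):
        # Pop and return the table of the scope being closed.
        return self._tables.pop()

    def declare(self, name, type_name):
        # New names always go into the table on top of the stack.
        current = self._tables[-1]
        if name in current:
            raise KeyError(f"'{name}' is already declared in this scope")
        current[name] = type_name

    def depth(self):
        return len(self._tables)


scopes = ScopeStack()
scopes.enter_scope()              # outer block
for name in ("H", "A", "L"):
    scopes.declare(name, "int")
scopes.enter_scope()              # inner block
scopes.declare("x", "real")
scopes.declare("y", "real")
popped = scopes.exit_scope()      # inner block closes
```

After the inner block closes, `popped` holds the table for `x` and `y`, and only the outer table remains on the stack.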
20. Example
Multiple symbol tables in one stack
Program:
{
    int H,A,L;
    {
        real x,y;
        :
        :
    }
    {
        char A,C,M;
        print(M);
        H + A ..... ;
        X + L ...... ;
    }
}
Symbol table stack while the first inner block is being processed:
    top:    x:real, y:real
    bottom: H:int, A:int, L:int
21. Example
Multiple symbol tables in one stack
Program:
{
    int H,A,L;
    {
        real x,y;
        :
        :
    }
    {
        char A,C,M;
        print(M);
        H + A ..... ;
        X + L ...... ;
    }
}
The second scope (the block declaring x,y) is completed, so pop (remove) the symbol table of x,y.
Symbol table stack after the pop:
    H:int, A:int, L:int
23. Many small tables
• To search for a name, we check the symbol tables on the stack from top to
bottom.
• We may need to search multiple tables.
E.g. a global name is defined in the bottom-most symbol table.
• Space may be wasted if a fixed-size hash table is used to
implement symbol tables.
– hash table too big: wastes memory space
– hash table too small: more collisions
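The top-to-bottom search can be sketched as a short function that also reports how many tables it had to examine. It assumes the stack is a Python list of dictionaries with the top at the end; the function name `lookup` and the sample stack (the outer block plus a block declaring `char A,C,M`) are chosen for illustration.

```python
def lookup(stack, name):
    """Search the tables from top (end of list) to bottom.

    Returns (type, tables_searched); type is None if the name is not found.
    """
    searched = 0
    for table in reversed(stack):
        searched += 1
        if name in table:
            return table[name], searched
    return None, searched


stack = [
    {"H": "int", "A": "int", "L": "int"},      # bottom: outer block
    {"A": "char", "C": "char", "M": "char"},   # top: current block
]
```

Here `lookup(stack, "A")` finds the inner `char` declaration in the first table searched, `lookup(stack, "H")` has to search both tables, and `lookup(stack, "X")` searches both and finds nothing.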
24. 2. One global table
One Symbol Table with chaining
• All names are in the single global table.
• What if the same name is declared several times?
• Each name is given a scope number.
• <name, scope number> should be unique in the table.
• Easy to search for a name.
• New names are placed at the front of lists.
• To close a scope, we need to remove all entries defined in that
scope.
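A minimal sketch of the single-table approach is shown below. It assumes a dictionary that maps each name to a list (chain) of `(scope number, type)` pairs with the newest entry at the front; the class `GlobalSymbolTable` and its methods are hypothetical. Closing a scope here scans every chain, which is a simplification; a real implementation would likely also link together the entries of each scope.

```python
class GlobalSymbolTable:
    """One table for all scopes; same-named entries are chained."""

    def __init__(self):
        self._chains = {}       # name -> [(scope_number, type), ...], newest first
        self._open = [0]        # stack of currently open scope numbers
        self._next_scope = 1

    def enter_scope(self):
        self._open.append(self._next_scope)
        self._next_scope += 1

    def insert(self, name, type_name):
        scope = self._open[-1]
        chain = self._chains.setdefault(name, [])
        # <name, scope number> must be unique.
        if chain and chain[0][0] == scope:
            raise KeyError(f"'{name}' is already declared in scope {scope}")
        chain.insert(0, (scope, type_name))   # new names go to the front

    def lookup(self, name):
        # The front of the chain is the innermost visible declaration.
        chain = self._chains.get(name)
        return chain[0] if chain else None

    def exit_scope(self):
        scope = self._open.pop()
        # Remove all entries that were defined in the closed scope.
        for name in list(self._chains):
            chain = self._chains[name]
            if chain[0][0] == scope:
                chain.pop(0)
            if not chain:
                del self._chains[name]


table = GlobalSymbolTable()
for name in ("H", "A", "L"):
    table.insert(name, "int")     # scope 0
table.enter_scope()               # scope 1
table.insert("A", "char")
inner = table.lookup("A")
table.exit_scope()
outer = table.lookup("A")
```

While scope 1 is open, the lookup of `A` returns the `char` entry at the front of the chain; after the scope is closed, the same lookup returns the `int` entry from scope 0.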
26. One Symbol Table with chaining
• Binary search tree with chaining.
• Use a doubly linked list to chain all entries with the same name.
28. Reference
• A. V. Aho, M. S. Lam, R. Sethi, J. D. Ullman,
Compilers: Principles, Techniques, and Tools,
Pearson Edition, 2013.
P. Kuppusamy - Lexical Analyzer