APOLLO ENGINEERING COLLEGE CHAPTER - 4 SDT & RTE CS6660 - COMPILER DESIGN M.MANORANJITHAM / AP/IT

UNIT IV SYNTAX DIRECTED TRANSLATION & RUN TIME ENVIRONMENT

Syllabus: Syntax-directed definitions - Construction of syntax trees - Bottom-up evaluation of S-attributed definitions - Design of predictive translator - Type systems - Specification of a simple type checker - Equivalence of type expressions - Type conversions. RUN-TIME ENVIRONMENT: Source language issues - Storage organization - Storage allocation - Parameter passing - Symbol tables - Dynamic storage allocation - Storage allocation in FORTRAN.

4.1 - Introduction

Intermediate codes are machine-independent codes, but they are close to machine instructions. The intermediate code generator converts the given program in a source language into an equivalent program in an intermediate language.

Advantages of using a machine-independent intermediate form:
1. Retargeting is facilitated; a compiler for a different machine can be created by attaching a back end for the new machine to an existing front end.
2. A machine-independent code optimizer can be applied to the intermediate representation.

For intermediate code generation there is a notational framework that is an extension of CFGs. This framework is called a syntax-directed translation scheme. It allows a subroutine, or semantic action, to be attached to a production; the subroutine generates intermediate code when called at appropriate times by a parser for that grammar.

There are two notations for associating semantic rules with productions:
(i) Syntax-directed definitions: high-level specifications for translations. They hide many implementation details, and the user need not specify the order in which translation takes place.
(ii) Translation schemes: indicate the order in which semantic rules are to be evaluated, so they allow some implementation details to be shown.
In both syntax-directed definitions and translation schemes, we parse the input token stream, build the parse tree, and then traverse the tree as needed to evaluate the semantic rules at the parse-tree nodes. Evaluation of the semantic rules may generate code, save information in a symbol table, issue error messages, or perform any other activities. The translation of the token stream is the result obtained by evaluating the semantic rules.

4.2 - Syntax-Directed Definitions

A syntax-directed definition (SDD) is a context-free grammar together with attributes and rules. Attributes are associated with grammar symbols, and rules are associated with productions.

Node: if we think of a node for a grammar symbol in a parse tree as a record with fields for holding information, then an attribute corresponds to the name of a field. An attribute can represent anything we choose: a string, a type, a memory location, or whatever. The value of an attribute at a parse-tree node is defined by a semantic rule associated with the production used at that node.

There are two kinds of attributes for nonterminals: synthesized attributes and inherited attributes. Semantic rules set up dependencies between attributes that will be represented by a graph. From the dependency graph, we derive an evaluation order for the semantic rules. Evaluation of the semantic rules defines the values of the attributes at the nodes in the parse tree for the input string. A semantic rule may also have side effects, e.g., printing a value or updating a global variable.
A parse tree showing the values of attributes at each node is called an annotated parse tree. The process of computing the attribute values at the nodes is called annotating or decorating the parse tree.

Example: The SDD in Fig. 5.1 is based on our familiar grammar for arithmetic expressions with operators + and *. It evaluates expressions terminated by an endmarker n. In the SDD, each of the nonterminals has a single synthesized attribute, called val. We also suppose that the terminal digit has a synthesized attribute lexval, which is an integer value returned by the lexical analyzer.

The rule for production 1, L → E n, says that the starting nonterminal L is just a procedure that prints as output the value of the arithmetic expression generated by E. Production 2, E → E1 + T, also has one rule, which computes the val attribute for the head E as the sum of the values at E1 and T. At any parse-tree node N labeled E, the value of val for E is the sum of the values of val at the children of node N labeled E and T. Production 3, E → T, has a single rule that defines the value of val for E to be the same as the value of val at the child for T. Production 4 is similar to the second production; its rule multiplies the values at the children instead of adding them. The rules for productions 5 and 6 copy values at a child, like that for the third production. Production 7 gives F.val the value of a digit, that is, the numerical value of the token digit that the lexical analyzer returned.

1. A synthesized attribute for a nonterminal A at a parse-tree node N is defined by a semantic rule associated with the production at N. Note that the production must have A as its head. A synthesized attribute at node N is defined only in terms of attribute values at the children of N and at N itself.
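The val rules of Fig. 5.1 can be sketched with a small recursive evaluator; each function returns the synthesized val attribute of its nonterminal, computed from its children's values first. This is an illustrative sketch, not the notes' implementation; the grammar handles single-digit numbers only, matching the digit token.

```python
def evaluate(expr: str) -> int:
    """Compute val for the grammar L -> E n, E -> E+T | T,
    T -> T*F | F, F -> (E) | digit.  Synthesized attributes are
    computed children-first, mirroring bottom-up annotation."""
    tokens = list(expr.replace(" ", ""))
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def F():                    # F -> (E) | digit
        if peek() == '(':
            advance()
            v = E()
            advance()           # consume ')'
            return v
        v = int(peek())         # digit.lexval from the lexer
        advance()
        return v

    def T():
        v = F()                 # T -> F:      T.val := F.val
        while peek() == '*':    # T -> T1 * F: T.val := T1.val * F.val
            advance()
            v = v * F()
        return v

    def E():
        v = T()                 # E -> T:      E.val := T.val
        while peek() == '+':    # E -> E1 + T: E.val := E1.val + T.val
            advance()
            v = v + T()
        return v

    return E()                  # L -> E n: print(E.val)
```

For the input 3*5+4 the evaluator returns 19, matching the worked example below.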
A syntax-directed definition that uses synthesized attributes exclusively is said to be an S-attributed definition. A parse tree for an S-attributed definition can always be annotated by evaluating the semantic rules for the attributes at each node bottom-up, from the leaves to the root. For example, given the expression 3*5+4 followed by a newline, the program prints the value 19.

2. An inherited attribute for a nonterminal B at a parse-tree node N is defined by a semantic rule associated with the production at the parent of N. Note that the production must have B as a symbol in its body. An inherited attribute at node N is defined only in terms of attribute values at N's parent, N itself, and N's siblings.

For example, consider a declaration generated by the nonterminal D in the syntax-directed definition given below. The grammar consists of the keyword int or real, followed by a list of identifiers. The nonterminal T has a synthesized attribute type, whose value is determined by the keyword in the declaration. The semantic rule L.in := T.type, associated with production
D → T L, sets the inherited attribute L.in to the type in the declaration. The rules then pass this type down the parse tree using the inherited attribute L.in. Rules associated with the productions for L call the procedure addtype to add the type of each identifier to its entry in the symbol table.

Dependency Graphs:

If an attribute b at a node in a parse tree depends on an attribute c, then the semantic rule for b at that node must be evaluated after the semantic rule that defines c. The interdependencies among the inherited and synthesized attributes at the nodes in a parse tree can be depicted by a directed graph called a dependency graph.

Before constructing a dependency graph for a parse tree, we put each semantic rule into the form b := f(c1, c2, ..., ck), by introducing a dummy synthesized attribute b for each semantic rule that consists of a procedure call. The graph has a node for each attribute and an edge to the node for b from the node for c if attribute b depends on attribute c. In more detail, the dependency graph for a given parse tree is constructed as follows:

    for each node n in the parse tree do
        for each attribute a of the grammar symbol at n do
            construct a node in the dependency graph for a;
    for each node n in the parse tree do
        for each semantic rule b := f(c1, c2, ..., ck)
                associated with the production used at n do
            for i := 1 to k do
                construct an edge from the node for ci to the node for b;
Example: Consider the following production and rule:

    PRODUCTION        SEMANTIC RULE
    E → E1 + E2       E.val := E1.val + E2.val

We add the edges as shown below. The three nodes of the dependency graph marked by ● represent the synthesized attributes E.val, E1.val and E2.val, and the edge to E.val from E2.val shows that E.val also depends on E2.val. The dotted lines represent the parse tree and are not part of the dependency graph.

Evaluation Order:

A topological sort of a directed acyclic graph is any ordering m1, m2, ..., mk of the nodes of the graph such that edges go from nodes earlier in the ordering to later nodes; that is, if mi → mj is an edge from mi to mj, then mi appears before mj in the ordering. Any topological sort of a dependency graph gives a valid order in which the semantic rules associated with the nodes in a parse tree can be evaluated; that is, in the topological sort, the dependent attributes c1, c2, ..., ck in a semantic rule b := f(c1, c2, ..., ck) are available at a node before f is evaluated.
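A topological sort of the dependency graph can be computed with Kahn's algorithm, sketched below. Edges run from c to b when b depends on c, as in the construction above; the node names here are illustrative stand-ins for attribute instances.

```python
from collections import defaultdict, deque

def topological_sort(nodes, edges):
    """Kahn's algorithm.  edges contains pairs (c, b) meaning
    'b depends on c'; the result places every c before its b."""
    indeg = {n: 0 for n in nodes}
    succ = defaultdict(list)
    for c, b in edges:
        succ[c].append(b)
        indeg[b] += 1
    ready = deque(n for n in nodes if indeg[n] == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    if len(order) != len(nodes):
        raise ValueError("dependency graph has a cycle")
    return order

# Dependency graph for E -> E1 + E2 with E.val := E1.val + E2.val:
nodes = ["E1.val", "E2.val", "E.val"]
edges = [("E1.val", "E.val"), ("E2.val", "E.val")]
order = topological_sort(nodes, edges)
```

Any order it returns evaluates E1.val and E2.val before E.val, so the rule's arguments are available before f is applied.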
Example: Each of the edges in the dependency graph in Fig. 5.7 goes from a lower-numbered node to a higher-numbered node. Hence, a topological sort of the dependency graph is obtained by writing down the nodes in the order of their numbers. From this topological sort we obtain the following program, writing a_n for the attribute associated with the node numbered n in the dependency graph:

    a4 := real;
    a5 := a4;
    addtype(id3.entry, a5);
    a7 := a5;
    addtype(id2.entry, a7);
    a9 := a7;
    addtype(id1.entry, a9);

Evaluating these semantic rules stores the type real in the symbol-table entry for each identifier.

Several methods have been proposed for evaluating semantic rules:
1. Parse-tree methods: At compile time, these methods obtain an evaluation order from a topological sort of the dependency graph constructed from the parse tree for each input. These methods will fail to find an evaluation order only if the dependency graph for the particular parse tree under consideration has a cycle.
2. Rule-based methods: At compiler-construction time, the semantic rules associated with productions are analyzed, either by hand or by a specialized tool. For each production, the order in which the attributes associated with that production are evaluated is predetermined at compiler-construction time.
3. Oblivious methods: An evaluation order is chosen without considering the semantic rules. For example, if translation takes place during parsing, then the order of evaluation is forced by the parsing method, independent of the semantic rules. An oblivious evaluation order restricts the class of syntax-directed definitions that can be implemented.

Rule-based and oblivious methods need not explicitly construct the dependency graph at compile time, so they can be more efficient in their use of compile time and space.
4.3 - Construction of Syntax Trees

The use of syntax trees as an intermediate representation allows translation to be decoupled from parsing. Translation routines that are invoked during parsing must live with two kinds of restrictions:
1. A grammar that is suitable for parsing may not reflect the natural hierarchical structure of the constructs in the language.
2. The parsing method constrains the order in which nodes in a parse tree are considered. This order may not match the order in which information about a construct becomes available.

Syntax Tree:

An (abstract) syntax tree is a condensed form of parse tree useful for representing language constructs. In a syntax tree, operators and keywords do not appear as leaves, but rather are associated with the interior node that would be the parent of those leaves in the parse tree.

Constructing Syntax Trees for Expressions:

The construction of a syntax tree for an expression is similar to the translation of the expression into postfix form. We construct subtrees for the subexpressions by creating a node for each operator and operand. The children of an operator node are the roots of the nodes representing the subexpressions constituting the operands of that operator.

Each node in a syntax tree can be implemented as a record with several fields. In the node for an operator, one field identifies the operator and the remaining fields contain pointers to the nodes for the operands. The operator is often called the label of the node. When used for translation, the nodes in a syntax tree may have additional fields to hold the values of attributes attached to the node. Each of the following functions returns a pointer to a newly created node.
1. mknode(op, left, right): creates an operator node with label op and two fields containing the pointers left and right.
2. mkleaf(id, entry): creates an identifier node with label id and a field containing entry, a pointer to the symbol-table entry for the identifier.
3. mkleaf(num, val): creates a number node with label num and a field containing val, the value of the number.

Example: Consider the expression a - 4 + c. In the following sequence, p1, p2, ..., p5 are pointers to nodes, and entry-a and entry-c are pointers to the symbol-table entries for identifiers a and c, respectively.

    (1) p1 := mkleaf(id, entry-a);
    (2) p2 := mkleaf(num, 4);
    (3) p3 := mknode('-', p1, p2);
    (4) p4 := mkleaf(id, entry-c);
    (5) p5 := mknode('+', p3, p4);

A Syntax-Directed Definition for Constructing Syntax Trees:

Fig. 5.9 contains an S-attributed definition for constructing a syntax tree for an expression containing the operators + and -. It uses the underlying productions of the grammar to schedule the calls of the functions mknode and mkleaf to construct the tree. The synthesized attribute nptr for E and T keeps track of the pointers returned by the function calls. An annotated parse tree depicting the construction of a syntax tree for the expression a - 4 + c is shown below.
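A minimal sketch of mknode and mkleaf follows, mirroring steps (1)-(5) for a - 4 + c. The overloaded mkleaf is split into two helpers here, and the string "symtab:a" stands in for a real symbol-table entry; both choices are illustrative assumptions, not the notes' record layout.

```python
class Node:
    """One syntax-tree node: a label plus operand pointers and
    optional fields for a symbol-table entry or numeric value."""
    def __init__(self, label, left=None, right=None, entry=None, value=None):
        self.label, self.left, self.right = label, left, right
        self.entry, self.value = entry, value

def mknode(op, left, right):
    return Node(op, left=left, right=right)   # operator node

def mkleaf_id(entry):
    return Node('id', entry=entry)            # identifier leaf

def mkleaf_num(val):
    return Node('num', value=val)             # number leaf

# Build the tree for a - 4 + c, as in steps (1)-(5):
entry_a, entry_c = "symtab:a", "symtab:c"     # stand-ins for entries
p1 = mkleaf_id(entry_a)
p2 = mkleaf_num(4)
p3 = mknode('-', p1, p2)
p4 = mkleaf_id(entry_c)
p5 = mknode('+', p3, p4)
```

The root p5 is the '+' node whose left child is the subtree for a - 4, exactly as in the annotated parse tree described below.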
Directed Acyclic Graphs for Expressions:

A directed acyclic graph (dag) for an expression identifies the common subexpressions in the expression. Like a syntax tree, a dag has a node for every subexpression of the expression; an interior node represents an operator and its children represent its operands. The difference is that a node in a dag representing a common subexpression has more than one "parent"; in a syntax tree, the common subexpression would be represented as a duplicated subtree. The leaf for a has two parents because a is common to the two subexpressions a and a*(b-c).
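Node sharing in a dag can be obtained by hashing constructor arguments, so that asking for the same (label, children) combination twice returns the same node. This hash-consing sketch is one common way to build such dags, not the notes' algorithm; the tuple node representation is an assumption for brevity.

```python
def make_dag_builder():
    """Return a constructor mk(label, *children) that reuses an
    existing node when an identical one was already built, so
    common subexpressions share a single node."""
    table = {}
    def mk(label, *children):
        # Children are already interned, so their ids identify them.
        key = (label,) + tuple(id(c) for c in children)
        if key not in table:
            table[key] = (label, children)
        return table[key]
    return mk

mk = make_dag_builder()
# Leaves: a second request for leaf a yields the same node,
# so a gets two parents in a + a*(b-c).
a, b, c = mk('a'), mk('b'), mk('c')
expr = mk('+', a, mk('*', a, mk('-', b, c)))
```

Because mk('a') always returns the same object, the dag stores the shared leaf once instead of duplicating the subtree as a syntax tree would.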
4.4 - Bottom-Up Evaluation of S-Attributed Definitions

A translator for an arbitrary syntax-directed definition can be difficult to build. However, there are large classes of useful syntax-directed definitions for which it is easy to construct translators, among them the S-attributed definitions, that is, the syntax-directed definitions with only synthesized attributes. Synthesized attributes can be evaluated by a bottom-up parser as the input is being parsed. The parser keeps the values of the synthesized attributes on its stack; whenever a reduction is made, the values of the new synthesized attributes are computed from the attributes appearing on the stack for the grammar symbols on the right side of the reducing production.

Synthesized Attributes on the Parser Stack:

A translator for an S-attributed definition can often be implemented with the help of an LR-parser generator. A bottom-up parser uses a stack to hold subtrees that have been parsed. We can use extra fields in the parser stack to hold the values of synthesized attributes. Let us suppose, as in the figure, that the stack is implemented by a pair of arrays, state and val. Each state entry is a pointer into an LR(1) parsing table. It is convenient, however, to refer to each state by the unique grammar symbol that it covers when placed on the parsing stack. If the ith state symbol is A, then val[i] will hold the value of the attribute associated with the parse-tree node corresponding to this A. The current top of the stack is indicated by the pointer top.

We assume that synthesized attributes are evaluated just before each reduction. Suppose the semantic rule A.a := f(X.x, Y.y, Z.z) is associated with the production A → XYZ. Before XYZ is reduced to A, the value of the attribute Z.z is in val[top], that of Y.y in val[top-1], and that of X.x in val[top-2]. If a symbol has no attribute, then the corresponding entry in the val array is undefined. After the reduction, top is decremented by 2, the state covering A is put in state[top], and the value of the synthesized attribute A.a is put in val[top].

Example:
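The val-stack mechanism can be sketched for a single reduction. This hypothetical helper (not the notes' figure) shows E → E + T with rule E.val := E1.val + T.val: the top three stack entries are popped and replaced by the one value for the head.

```python
def reduce_plus(val):
    """Apply the reduction E -> E1 + T on the attribute stack val.
    Before: val[top] = T.val, val[top-1] = '+' (no attribute),
    val[top-2] = E1.val.  After: top drops by 2 and val[top]
    holds E.val = E1.val + T.val."""
    t_val = val.pop()            # T.val in val[top]
    val.pop()                    # '+' carries no attribute
    e1_val = val.pop()           # E1.val in val[top-2]
    val.append(e1_val + t_val)   # push the new synthesized E.val
    return val

stack = [7, '+', 5]              # attribute entries for E1, +, T
reduce_plus(stack)               # stack becomes [12]
```

Each reduction shrinks the stack by the length of the production body minus one, exactly as top is decremented by 2 in the text above.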
4.5 - L-Attributed Definitions

When translation takes place during parsing, the order of evaluation of attributes is linked to the order in which nodes of a parse tree are "created" by the parsing method. A natural order that characterizes many top-down and bottom-up translation methods is the one obtained by applying the procedure dfvisit to the root of a parse tree. We call this evaluation order the depth-first order. Even if the parse tree is not actually constructed, it is useful to study translation during parsing by considering depth-first evaluation of attributes at the nodes of a parse tree.

We now introduce a class of syntax-directed definitions, called L-attributed definitions, whose attributes can always be evaluated in depth-first order. (The L is for "left", because attribute information appears to flow from left to right.)

L-Attributed Definition: A syntax-directed definition is L-attributed if each inherited attribute of Xj, 1 ≤ j ≤ n, on the right side of A → X1 X2 ... Xn, depends only on
1. the attributes of the symbols X1, X2, ..., Xj-1 to the left of Xj in the production, and
2. the inherited attributes of A.

Note that every S-attributed definition is L-attributed, because restrictions (1) and (2) apply only to inherited attributes.
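The depth-first order given by dfvisit can be sketched as follows. The comments mark where inherited and synthesized attributes would be evaluated; the PNode class and the recording of a visit order are illustrative assumptions, since the notes only name the procedure.

```python
class PNode:
    """A parse-tree node: a grammar symbol plus its children."""
    def __init__(self, symbol, children=()):
        self.symbol = symbol
        self.children = list(children)

def dfvisit(n, finished):
    """Visit the subtree at n in depth-first order.  For each child:
    inherited attributes would be evaluated just before descending,
    synthesized attributes just after returning.  Here we only
    record when each node's attributes are fully evaluated."""
    for c in n.children:
        # evaluate inherited attributes of c here
        dfvisit(c, finished)
        # evaluate synthesized attributes of c here
    finished.append(n.symbol)

root = PNode('A', [PNode('X'), PNode('Y', [PNode('Z')])])
order = []
dfvisit(root, order)             # children finish before parents
```

For the tree A(X, Y(Z)) the nodes finish in the order X, Z, Y, A: every child is completed before its parent, which is exactly why L-attributed definitions can be evaluated this way.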
Translation Schemes:

A translation scheme is a context-free grammar in which attributes are associated with the grammar symbols, and semantic actions enclosed between braces { } are inserted within the right sides of productions.
4.6 - Top-Down Translation

In this section, L-attributed definitions will be implemented during predictive parsing. We work with translation schemes rather than syntax-directed definitions, so we can be explicit about the order in which actions and attribute evaluations take place. We also extend the algorithm for eliminating left recursion to translation schemes with synthesized attributes.

Eliminating Left Recursion from a Translation Scheme:

Since most arithmetic operators associate to the left, it is natural to use left-recursive grammars for expressions.
Design of a Predictive Translator:

The following algorithm generalizes the construction of predictive parsers to implement a translation scheme based on a grammar suitable for top-down parsing.

Algorithm: Construction of a predictive syntax-directed translator.

Input: A syntax-directed translation scheme with an underlying grammar suitable for predictive parsing.

Output: Code for a syntax-directed translator.

Method: The technique is a modification of the predictive-parser construction.
1. For each nonterminal A, construct a function that has a formal parameter for each inherited attribute of A and that returns the values of the synthesized attributes of A. For simplicity, we assume that each nonterminal has just one synthesized attribute. The function for A has a local variable for each attribute of each grammar symbol that appears in a production for A.
2. The code for nonterminal A decides what production to use based on the current input symbol.
3. The code associated with each production does the following. We consider the tokens, nonterminals, and actions on the right side of the production from left to right.
   (i) For token X with synthesized attribute x, save the value of x in the variable declared for X.x; then generate a call to match token X and advance the input.
   (ii) For nonterminal B, generate an assignment c := B(b1, b2, ..., bk) with a function call on the right side, where b1, b2, ..., bk are the variables for the inherited attributes of B and c is the variable for the synthesized attribute of B.
   (iii) For an action, copy the code into the parser, replacing each reference to an attribute by the variable for that attribute.
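The algorithm's output can be sketched for the left-factored grammar E → T R, R → + T R | - T R | ε, where R carries an inherited attribute i (the value so far) as a function parameter and returns its synthesized attribute. The grammar and single-digit num tokens are illustrative assumptions.

```python
def make_parser(tokens):
    """Return the function for start symbol E; each nonterminal is a
    function taking its inherited attributes and returning its
    synthesized attribute, as the construction prescribes."""
    toks = list(tokens) + ['$']  # endmarker
    pos = 0

    def lookahead():
        return toks[pos]

    def match(t):
        nonlocal pos
        assert toks[pos] == t, f"expected {t}"
        pos += 1

    def T():                     # returns T.val
        v = int(lookahead())     # token num with synthesized lexval
        match(lookahead())
        return v

    def R(i):                    # i is R's inherited attribute
        if lookahead() == '+':   # R -> + T R1
            match('+')
            tv = T()
            return R(i + tv)     # R1.i := R.i + T.val; R.s := R1.s
        if lookahead() == '-':   # R -> - T R1
            match('-')
            tv = T()
            return R(i - tv)
        return i                 # R -> epsilon: R.s := R.i

    def E():                     # returns E.val
        return R(T())            # R.i := T.val; E.val := R.s

    return E

parse = make_parser(['9', '-', '5', '+', '2'])
result = parse()                 # left-associative: (9-5)+2
```

Passing the running value down through R's parameter is what preserves left associativity after the left recursion has been eliminated.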
4.7 - Type Checking

A compiler must check that the source program follows both the syntactic and semantic conventions of the source language. This checking, called static checking, ensures that certain kinds of programming errors will be detected and reported. Examples of static checks include:
(i) Type checks. A compiler should report an error if an operator is applied to an incompatible operand; for example, if an array variable and a function variable are added together.
(ii) Flow-of-control checks. Statements that cause flow of control to leave a construct must have some place to which to transfer the flow of control. For example, a break statement in C causes control to leave the smallest enclosing while, for, or switch statement; an error occurs if such an enclosing statement does not exist.
(iii) Uniqueness checks. There are situations in which an object must be defined exactly once; for example, the labels in a case statement.
(iv) Name-related checks. Sometimes the same name must appear two or more times. For example, in Ada, a loop or block may have a name that appears at the beginning and end of the construct. The compiler must check that the same name is used at both places.

4.7.1 TYPE SYSTEMS

The design of a type checker for a language is based on information about the syntactic constructs in the language, the notion of types, and the rules for assigning types to language constructs. The following excerpts, from the Pascal report and the C reference manual respectively, are examples of information that a compiler writer might have to start with:
- "If both operands of the arithmetic operators of addition, subtraction and multiplication are of type integer, then the result is of type integer."
- "The result of the unary & operator is a pointer to the object referred to by the operand. If the type of the operand is '...', the type of the result is 'pointer to ...'."

In both Pascal and C, types are either basic or constructed. Basic types are the atomic types with no internal structure as far as the programmer is concerned. In Pascal, the basic types are boolean, character, integer and real. Subrange types, like 1..10, and enumerated types, like (violet, indigo, blue, green, yellow, orange, red), can be treated as basic types.
Pascal allows a programmer to construct types from basic types and other constructed types, with arrays, records, and sets being examples. In addition, pointers and functions can also be treated as constructed types.

Type Expressions:

The type of a language construct will be denoted by a "type expression." Informally, a type expression is either a basic type or is formed by applying an operator called a type constructor to other type expressions. The sets of basic types and constructors depend on the language to be checked. We use the following definition of type expressions:
1. A basic type is a type expression. Among the basic types are boolean, char, integer, and real. A special basic type, type_error, will signal an error during type checking. Finally, a basic type void, denoting "the absence of a value", allows statements to be checked.
2. Since type expressions may be named, a type name is a type expression. An example of the use of type names appears in 3(c) below.
3. A type constructor applied to type expressions is a type expression. Constructors include:
   a) Arrays. If T is a type expression, then array(I, T) is a type expression denoting the type of an array with elements of type T and index set I. I is often a range of integers. For example, the Pascal declaration

       var A: array[1..10] of integer;

   associates the type expression array(1..10, integer) with A.
   b) Products. If T1 and T2 are type expressions, then their Cartesian product T1 x T2 is a type expression. We assume that x associates to the left.
   c) Records. The difference between a record and a product is that the fields of a record have names. The record type constructor will be applied to a tuple formed from field names and field types. For example, the Pascal program fragment

       type row = record
           address: integer;
           lexeme: array [1..15] of char
       end;
       var table: array [1..101] of row;

   declares the type name row representing the type expression
       record((address x integer) x (lexeme x array(1..15, char)))

   and the variable table to be an array of records of this type.
   d) Pointers. If T is a type expression, then pointer(T) is a type expression denoting the type "pointer to an object of type T." For example, in Pascal, the declaration var p: ↑row declares variable p to have type pointer(row).
   e) Functions. Mathematically, a function maps elements of one set, the domain, to another set, the range. We may treat functions in programming languages as mapping a domain type D to a range type R. The type of such a function will be denoted by the type expression D → R. For example, the built-in function mod of Pascal has domain type int x int, i.e., a pair of integers, and range type int. Thus, we say mod has the type int x int → int.
4. Type expressions may contain variables whose values are type expressions.

A convenient way to represent a type expression is to use a graph. Using the syntax-directed approach, we can construct a tree or a dag for a type expression, with interior nodes for type constructors and leaves for basic types, type names, and type variables (Fig. 6.2).

Type Systems:

A type system is a collection of rules for assigning type expressions to the various parts of a program. A type checker implements a type system.
(i) Static checking of types: checking done by a compiler is said to be static.
(ii) Dynamic checking of types: checking done when the target program runs is termed dynamic.

A language is said to be strongly typed if its compiler can guarantee that the programs it accepts will execute without type errors.
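Type expressions can be encoded directly as data. The tagged-tuple encoding below is one simple choice (an assumption for these sketches, not the notes' graph representation); each constructor from the list above becomes a small helper.

```python
# Tagged-tuple encoding of type expressions: ('basic', name),
# ('array', index_set, t), ('product', t1, t2), ('pointer', t),
# ('fn', domain, range).
def basic(name):      return ('basic', name)
def array(index, t):  return ('array', index, t)
def product(t1, t2):  return ('product', t1, t2)
def pointer(t):       return ('pointer', t)
def fn(dom, ran):     return ('fn', dom, ran)

integer = basic('integer')
char = basic('char')

# var A: array[1..10] of integer  ~  array(1..10, integer)
A_type = array((1, 10), integer)

# mod has the type int x int -> int
mod_type = fn(product(integer, integer), integer)

# var p: ^row  ~  pointer(row), with the type name as a leaf
p_type = pointer(basic('row'))
```

Because equal tuples compare equal, this encoding makes the structural-equivalence test of Section 4.7.3 easy to express.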
4.7.2 Specification of a Simple Type Checker

The type checker is a translation scheme that synthesizes the type of each expression from the types of its subexpressions. The type checker can handle arrays, pointers, statements, and functions.

A Simple Language:

The grammar in Fig. 6.3 generates programs, represented by the nonterminal P, consisting of a sequence of declarations D followed by a single expression E. The language has two basic types, char and integer; a third basic type, type_error, is used to signal errors.

Type Checking of Expressions:

In the following rules, the synthesized attribute type for E gives the type expression assigned by the type system to the expression generated by E. The following semantic rules say that constants represented by the tokens literal and num have type char and integer, respectively:

    E → literal    { E.type := char }
    E → num        { E.type := integer }
We use a function lookup(e) to fetch the type saved in the symbol-table entry pointed to by e. When an identifier appears in an expression, its declared type is fetched and assigned to the attribute type:

    E → id    { E.type := lookup(id.entry) }

The expression formed by applying the mod operator to two subexpressions of type integer has type integer; otherwise, its type is type_error. The rule is:

    E → E1 mod E2    { E.type := if E1.type = integer and E2.type = integer
                                 then integer else type_error }

In an array reference E1 [ E2 ], the index expression E2 must have type integer, in which case the result is the element type t obtained from the type array(s, t) of E1; we make no use of the index set s of the array.

    E → E1 [ E2 ]    { E.type := if E2.type = integer and E1.type = array(s, t)
                                 then t else type_error }

Within expressions, the postfix operator ↑ yields the object pointed to by its operand. The type of E1↑ is the type t of the object pointed to by the pointer E1:

    E → E1 ↑    { E.type := if E1.type = pointer(t) then t else type_error }

Type Checking of Statements
The first rule checks that the left and right sides of an assignment statement have the same type. The second and third rules specify that a conditional or while statement has type void only if its expression has type boolean and each substatement has type void.

Type Checking of Functions:

The application of a function to an argument can be captured by the production E → E ( E ), in which an expression is the application of one expression to another. The rules for associating type expressions with the nonterminal T can be augmented by the following production and action to permit function types in declarations:

    T → T1 '→' T2    { T.type := T1.type → T2.type }

Quotes around the arrow used as a function constructor distinguish it from the arrow used as the metasymbol in a production. The rule for checking the type of a function application is:

    E → E1 ( E2 )    { E.type := if E2.type = s and E1.type = s → t
                                 then t else type_error }

This rule says that in an expression formed by applying E1 to E2, the type of E1 must be a function s → t from the type s of E2 to some range type t; the type of E1(E2) is t.
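The expression rules above can be collected into one recursive checker. The tuple AST shapes like ('mod', e1, e2) and the symtab dictionary standing in for lookup(id.entry) are illustrative assumptions; the rules themselves follow the semantic actions given in this section.

```python
INTEGER = ('basic', 'integer')
CHAR = ('basic', 'char')
TYPE_ERROR = ('basic', 'type_error')

def type_of(e, symtab):
    """Synthesize E.type for a tuple-encoded expression e."""
    tag = e[0]
    if tag == 'literal':                          # E -> literal
        return CHAR
    if tag == 'num':                              # E -> num
        return INTEGER
    if tag == 'id':                               # E -> id
        return symtab[e[1]]                       # lookup(id.entry)
    if tag == 'mod':                              # E -> E1 mod E2
        t1, t2 = type_of(e[1], symtab), type_of(e[2], symtab)
        return INTEGER if t1 == t2 == INTEGER else TYPE_ERROR
    if tag == 'index':                            # E -> E1 [ E2 ]
        t1, t2 = type_of(e[1], symtab), type_of(e[2], symtab)
        if t2 == INTEGER and t1[0] == 'array':
            return t1[2]                          # element type t of array(s, t)
        return TYPE_ERROR
    if tag == 'deref':                            # E -> E1 ^
        t1 = type_of(e[1], symtab)
        return t1[1] if t1[0] == 'pointer' else TYPE_ERROR
    if tag == 'apply':                            # E -> E1 ( E2 )
        t1, t2 = type_of(e[1], symtab), type_of(e[2], symtab)
        if t1[0] == 'fn' and t1[1] == t2:         # E1.type = s -> t, E2.type = s
            return t1[2]
        return TYPE_ERROR
    return TYPE_ERROR

symtab = {'a': ('array', (1, 10), INTEGER),
          'p': ('pointer', CHAR),
          'f': ('fn', INTEGER, CHAR)}
```

A query such as type_of(('index', ('id','a'), ('num',)), symtab) synthesizes integer bottom-up from the leaves, exactly as the translation scheme does.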
4.7.3 EQUIVALENCE OF TYPE EXPRESSIONS

The checking rules are of the form "if two type expressions are equal then return a certain type else return type_error." It is therefore important to have a precise definition of when two type expressions are equivalent. Potential ambiguities arise when names are given to type expressions and the names are then used in subsequent type expressions. The key issue is whether a name in a type expression stands for itself or whether it is an abbreviation for another type expression. Since there is interaction between the notion of equivalence of types and the representation of types, we shall discuss both together. For efficiency, compilers use representations that allow type equivalence to be determined quickly. The notion of type equivalence implemented by a specific compiler can often be explained using the concepts of structural and name equivalence.

Structural Equivalence of Type Expressions

As long as type expressions are built from basic types and constructors, a natural notion of equivalence between two type expressions is structural equivalence; i.e., two expressions are either the same basic type, or are formed by applying the same constructor to structurally equivalent types. That is, two type expressions are structurally equivalent if and only if they are identical. For example, the type expression integer is equivalent only to integer because they are the same basic type. Similarly, pointer(integer) is equivalent only to pointer(integer) because the two are formed by applying the same constructor pointer to equivalent types.
The algorithm for testing structural equivalence in Fig. 7.1 can be adapted to test modified notions of equivalence. It assumes that the only type constructors are for arrays, products, pointers, and functions. The algorithm recursively compares the structure of type expressions without checking for cycles, so it can be applied to a tree or a dag representation. Identical type expressions do not need to be represented by the same node in the dag. The array bounds s1 and t1 in s = array(s1, s2) and t = array(t1, t2) are ignored if the test for array equivalence in lines 4 and 5 of Fig. 7.1 is reformulated as

else if s = array(s1, s2) and t = array(t1, t2) then return sequiv(s2, t2)

In certain situations, we can find a representation for type expressions that is significantly more compact than the type graph notation.

Names for Type Expressions

In some languages, types can be given names. For example, in the Pascal program fragment

type link = ↑ cell;
var next : link;
    last : link;
    p : ↑ cell;
    q, r : ↑ cell;

the identifier link is declared to be a name for the type ↑cell. The question arises, do the variables next, last, p, q, r all have identical types? Surprisingly, the answer depends on the implementation. The problem arose because the Pascal Report did not define the term "identical type." To model this situation, we allow type expressions to be named and allow these names to appear in type expressions where we previously had only basic types. For example, if cell is the name of a type expression, then pointer(cell) is a type expression. For the time being, suppose there are no circular type expression definitions, such as defining cell to be the name of a type expression containing cell.
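The sequiv test can be sketched as a recursive C function over a tree representation of type expressions. The node layout below is an assumption made for illustration; the pseudocode in Fig. 7.1 is the authoritative form. Note how the ARRAY case compares only element types, matching the reformulated test above that ignores array bounds.

```c
#include <stdbool.h>
#include <stdlib.h>

/* A sketch of sequiv over tree-shaped type expressions built from basic
   types and the constructors array, pointer, product, and function. */
typedef enum { BASIC, ARRAY, POINTER, PRODUCT, FUNCTION } Kind;

typedef struct TypeExpr {
    Kind kind;
    int basic;                    /* tag for basic types: 0=integer, 1=real */
    struct TypeExpr *left, *right;
} TypeExpr;

TypeExpr *mk(Kind k, int basic, TypeExpr *l, TypeExpr *r) {
    TypeExpr *e = malloc(sizeof *e);
    e->kind = k; e->basic = basic; e->left = l; e->right = r;
    return e;
}

bool sequiv(TypeExpr *s, TypeExpr *t) {
    if (s->kind == BASIC && t->kind == BASIC)
        return s->basic == t->basic;        /* same basic type */
    if (s->kind != t->kind)
        return false;                       /* different constructors */
    if (s->kind == ARRAY)
        return sequiv(s->right, t->right);  /* bounds ignored, as above */
    if (s->kind == POINTER)
        return sequiv(s->left, t->left);
    /* PRODUCT and FUNCTION: compare both components */
    return sequiv(s->left, t->left) && sequiv(s->right, t->right);
}
```

As the text warns, this version does not check for cycles, so it terminates only on trees and dags.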
When names are allowed in type expressions, two notions of equivalence of type expressions arise, depending on the treatment of names. Name equivalence views each type name as a distinct type, so two type expressions are name equivalent if and only if they are identical. Under structural equivalence, names are replaced by the type expressions they define, so two type expressions are structurally equivalent if they represent two structurally equivalent type expressions when all names have been substituted out.

Cycles in Representations of Types

Basic data structures like linked lists and trees are often defined recursively; e.g., a linked list is either empty or consists of a cell with a pointer to a linked list. Such data structures are usually implemented using records that contain pointers to similar records, and type names play an essential role in defining the types of such records. Consider a linked list of cells, each containing some integer information and a pointer to the next cell in the list. Pascal declarations of type names corresponding to links and cells are:

type link = ↑ cell;
     cell = record
              info : integer;
              next : link
            end;

Note that the type name link is defined in terms of cell and that cell is defined in terms of link, so their definitions are recursive. Recursively defined type names can be substituted out if we are willing to introduce cycles into the type graph. If pointer(cell) is substituted for link, the type expression shown in Fig. 6.8(a) is obtained for cell. Using cycles as in Fig. 6.8(b), we can eliminate mention of cell from the part of the type graph below the node labeled record.
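The same recursive type is written directly in C, where the self-reference goes through a pointer to the struct being defined. This is a sketch of the Pascal cell/link pair; the sum_list helper is added only to show the type in use.

```c
#include <stddef.h>

/* C counterpart of the recursive Pascal declarations: a cell holds an
   integer and a pointer to the next cell, so the type refers to itself. */
struct cell {
    int info;
    struct cell *next;    /* plays the role of link = ^cell */
};

/* walk a list and sum the info fields */
int sum_list(struct cell *p) {
    int s = 0;
    for (; p != NULL; p = p->next)
        s += p->info;
    return s;
}
```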
4.7.4 TYPE CONVERSIONS

Consider an expression like x + i where x is of type real and i is of type integer. Since the representation of integers and reals is different within a computer, and different machine instructions are used for operations on integers and reals, the compiler may have to first convert one of the operands of + to ensure that both operands are of the same type when the addition takes place. The language definition specifies what conversions are necessary. When an integer is assigned to a real, or vice versa, the conversion is to the type of the left side of the assignment. In expressions, the usual transformation is to convert the integer into a real number and then perform a real operation on the resulting pair of real operands. The type checker in a compiler can be used to insert these conversion operations into the intermediate representation of the source program. For example, postfix notation for x + i might be

x i inttoreal real+

Here, the inttoreal operator converts i from integer to real, and then real+ performs real addition on its operands. Type conversion often arises in another context: a symbol having different meanings depending on its context is said to be overloaded.

Coercions

Conversion from one type to another is said to be implicit if it is done automatically by the compiler. Implicit type conversions, also called coercions, are limited in many languages to situations where no information is lost in principle; e.g., an integer may be converted to a real but not vice versa. In practice, however, loss is possible when a real number must fit into the same number of bits as an integer. Conversion is said to be explicit if the programmer must write something to cause the conversion. For all practical purposes, all conversions in Ada are explicit.
Explicit conversions look just like function applications to a type checker, so they present no new problems. For example, in Pascal, a built-in function ord maps a character to an integer, and chr does the inverse mapping from an integer to a character, so these conversions are explicit. C, on the other hand, coerces (i.e., implicitly converts) ASCII characters to integers between 0 and 127 in arithmetic expressions.
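How a checker might insert the conversion while emitting postfix code for an addition can be sketched as follows. The operator names (inttoreal, real+, int+) follow the x + i example above; the emit function itself and its string-based output format are illustrative assumptions.

```c
#include <string.h>

typedef enum { TY_INT, TY_REAL } Ty;

/* Emit postfix code for e1 + e2 into out, inserting inttoreal after
   whichever operand is an integer when the other operand is real. */
void emit_add(const char *e1, Ty t1, const char *e2, Ty t2, char *out) {
    out[0] = '\0';
    strcat(out, e1);
    if (t1 == TY_INT && t2 == TY_REAL) strcat(out, " inttoreal");
    strcat(out, " ");
    strcat(out, e2);
    if (t2 == TY_INT && t1 == TY_REAL) strcat(out, " inttoreal");
    strcat(out, (t1 == TY_REAL || t2 == TY_REAL) ? " real+" : " int+");
}
```

With x real and i integer, this reproduces the postfix string x i inttoreal real+ from the text.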
4.8 – RUN-TIME ENVIRONMENTS

A runtime environment examines the relationship between names and data objects. Allocation and de-allocation are managed by a run-time support package, consisting of routines loaded with the generated target code. Each execution of a procedure is referred to as an activation of the procedure. If a procedure is recursive, several activations may be alive at the same time. If a and b are activations of two procedures, then their lifetimes are either non-overlapping or nested. A procedure is recursive if an activation can begin before an earlier activation of the same procedure has ended.

4.8.1 – Source Language Issues

This section distinguishes between the source text of a procedure and its activations at run time.

Procedures: A procedure definition is a declaration that associates an identifier with a statement. The identifier is the procedure name, and the statement is the procedure body.
For example, the Pascal code in Fig. 7.1 contains the definition of the procedure named readarray on lines 3-7; the body of the procedure is on lines 5-7. Procedures that return values are called functions in many languages; however, it is convenient to refer to them as procedures. A complete program will also be treated as a procedure. When a procedure name appears within an executable statement, we say that the procedure is called at that point. The basic idea is that a procedure call executes the procedure body. The main program in lines 21-25 of Fig. 7.1 calls the procedure readarray at line 23 and then calls quicksort at line 24. Note that procedure calls can also occur within expressions, as on line 16. Some of the identifiers appearing in a procedure definition are special, and are called formal parameters of the procedure. (C calls them "formal arguments" and Fortran calls them "dummy arguments.") The identifiers m and n on line 12 are formal parameters of quicksort. Arguments, known as actual parameters (or actuals), may be passed to a called procedure; they are substituted for the formals in the body.

Activation Trees

Let us make the following assumptions about the flow of control among procedures during the execution of a program:

1. Control flows sequentially; that is, the execution of a program consists of a sequence of steps, with control being at some specific point in the program at each step.
2. Each execution of a procedure starts at the beginning of the procedure body and eventually returns control to the point immediately following the place where the procedure was called. This means the flow of control between procedures can be depicted using trees.
Each execution of a procedure body is referred to as an activation of the procedure. The lifetime of an activation of a procedure p is the sequence of steps between the first and last steps in the execution of the procedure body, including time spent executing procedures called by p, the procedures called by them, and so on. In general, the term "lifetime" refers to a consecutive sequence of steps during the execution of a program. In languages like Pascal, each time control enters a procedure q from a procedure p, it eventually returns to p. More precisely, each time control flows from an activation of a procedure p to an activation of a procedure q, it returns to the same activation of p.
If a and b are procedure activations, then their lifetimes are either non-overlapping or are nested. That is, if b is entered before a is left, then control must leave b before it leaves a. This nested property of activation lifetimes can be illustrated by inserting two print statements in each procedure, one before the first statement of the procedure body and the other after the last. The first statement prints enter followed by the name of the procedure and the values of the actual parameters; the last statement prints leave followed by the same information. One execution of the program in Fig. 7.1 with these print statements produced the output shown in Fig. 7.2. The lifetime of the activation quicksort(1,9) is the sequence of steps executed between printing enter quicksort(1,9) and printing leave quicksort(1,9). In Fig. 7.2, it is assumed that the value returned by partition(1,9) is 4. A procedure is recursive if a new activation can begin before an earlier activation of the same procedure has ended. Figure 7.2 shows that control enters the activation quicksort(1,9) from line 24 early in the execution of the program but leaves this activation almost at the end. In the meantime, there are several other activations of quicksort, so quicksort is recursive. A recursive procedure p need not call itself directly; p may call another procedure q, which may then call p through some sequence of procedure calls. We can use a tree, called an activation tree, to depict the way control enters and leaves activations. In an activation tree,

1. each node represents an activation of a procedure,
2. the root represents the activation of the main program,
3. the node for a is the parent of the node for b if and only if control flows from activation a to b, and
4. the node for a is to the left of the node for b if and only if the lifetime of a occurs before the lifetime of b.

Since each node represents a unique activation, and vice versa, it is convenient to talk of control being at a node when it is in the activation represented by the node. The activation tree in Fig. 7.3 is constructed from the output in Fig. 7.2.

Control Stacks

The flow of control in a program corresponds to a depth-first traversal of the activation tree that starts at the root, visits a node before its children, and recursively visits the children at each node in left-to-right order. We can use a stack, called a control stack, to keep track of live procedure activations. The idea is to push the node for an activation onto the control stack as the activation begins and to pop the node when the activation ends. Then the contents of the control stack are related to paths to the root of the activation tree. When node n is at the top of the control stack, the stack contains the nodes along the path from n to the root. In the previous example, if we consider the activation tree when control reaches q(2,3), then at this point the control stack will contain the following nodes.
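This push/pop discipline can be sketched directly in C. The activation names in the test are illustrative stand-ins for the quicksort example; a real compiler's control stack holds activation records, not strings.

```c
#include <string.h>

/* A sketch of a control stack: push a node as its activation begins,
   pop it as the activation ends. The stack then always holds the path
   from the current activation to the root of the activation tree. */
#define MAXDEPTH 32
static const char *control_stack[MAXDEPTH];
static int depth = 0;

void enter_activation(const char *name) { control_stack[depth++] = name; }
void leave_activation(void)             { depth--; }

/* the activation that control is currently in */
const char *current_activation(void) { return control_stack[depth - 1]; }
```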
The Scope of a Declaration

A declaration in a language is a syntactic construct that associates information with a name. Declarations may be explicit, as in the Pascal fragment

var i : integer;

or they may be implicit. For example, in a Fortran program any variable name starting with one of the letters I through N is assumed to denote an integer, unless otherwise declared. There may be independent declarations of the same name in different parts of a program. The scope rules of a language determine which declaration of a name applies when the name appears in the text of a program. In the Pascal program in Fig. 7.1, i is declared three times, on lines 4, 9, and 13, and the uses of the name i in procedures readarray, partition, and quicksort are independent of each other. The declaration on line 4 applies to the uses of i on line 6. That is, the two occurrences of i on line 6 are in the scope of the declaration on line 4. The three occurrences of i on lines 16-18 are in the scope of the declaration of i on line 13. The portion of the program to which a declaration applies is called the scope of that declaration. An occurrence of a name in a procedure is said to be local to the procedure if it is in the scope of a declaration within the procedure; otherwise, the occurrence is said to be nonlocal. The distinction between local and nonlocal names carries over to any syntactic construct that can have declarations within it. While scope is a property of the declaration of a name, it is sometimes convenient to use the abbreviation "the scope of name x" for "the scope of the declaration of name x that applies to this occurrence of x." In this sense, the scope of i on line 17 in Fig. 7.1 is the body of quicksort. At compile time, the symbol table can be used to find the declaration that applies to an occurrence of a name. When a declaration is seen, a symbol-table entry is created for it.
As long as we are in the scope of the declaration, its entry is returned when the name is looked up.
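A minimal sketch of such a symbol table, assuming a flat array of entries with a scope depth recorded per declaration; a production table would use hashing and richer attributes. Lookup scans from the most recent entry, so the innermost enclosing declaration is found first, and leaving a scope discards its declarations.

```c
#include <string.h>

/* A sketch of scoped symbol-table lookup. 'info' stands in for whatever
   a real entry records (type, line of declaration, offset, ...). */
#define MAXSYM 64
struct entry { const char *name; int scope_depth; int info; };
static struct entry table[MAXSYM];
static int nsyms = 0;

void declare(const char *name, int depth, int info) {
    table[nsyms].name = name;
    table[nsyms].scope_depth = depth;
    table[nsyms].info = info;
    nsyms++;
}

/* returns the info of the applicable declaration, or -1 if undeclared */
int lookup(const char *name) {
    for (int i = nsyms - 1; i >= 0; i--)   /* most recent entry first */
        if (strcmp(table[i].name, name) == 0)
            return table[i].info;
    return -1;
}

/* leaving a scope discards its declarations */
void close_scope(int depth) {
    while (nsyms > 0 && table[nsyms - 1].scope_depth == depth)
        nsyms--;
}
```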
Bindings of Names

Even if each name is declared once in a program, the same name may denote different data objects at run time. The informal term "data object" corresponds to a storage location that can hold values. In programming language semantics, the term environment refers to a function that maps a name to a storage location, and the term state refers to a function that maps a storage location to the value held there, as in Fig. 7.5. Using the terms l-value and r-value, an environment maps a name to an l-value, and a state maps the l-value to an r-value. Environments and states are different; an assignment changes the state, but not the environment. For example, suppose that storage address 100, associated with variable pi, holds 0. After the assignment pi := 3.14, the same storage address is associated with pi, but the value held there is 3.14. When an environment associates storage location s with a name x, we say that x is bound to s; the association itself is referred to as a binding of x. The term storage "location" is to be taken figuratively. If x is not of a basic type, the storage s for x may be a collection of memory words. A binding is the dynamic counterpart of a declaration, as shown in Fig. 7.6. As we have seen, more than one activation of a recursive procedure can be alive at the same time. In Pascal, a local variable name in a procedure is bound to a different storage location in each activation of the procedure.
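The environment/state distinction can be sketched as two tiny functions. The name-to-location mapping below is a hard-coded toy; the point is only that assignment updates the state array while the environment function is untouched.

```c
#include <string.h>

/* A sketch of environment vs. state: the environment maps a name to a
   storage location, the state maps a location to the value held there. */
#define NSLOTS 8
static double state[NSLOTS];            /* location -> value (the state) */

/* toy environment: name -> location; "pi" is bound to location 3 */
static int environment(const char *name) {
    return (strcmp(name, "pi") == 0) ? 3 : 0;
}

/* an assignment changes the state but not the environment */
static void assign(const char *name, double v) {
    state[environment(name)] = v;
}
```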
4.8.2 – STORAGE ORGANIZATION

The organization of run-time storage described in this section can be used for languages such as Fortran, Pascal, and C.

Subdivision of Run-Time Memory

Suppose that the compiler obtains a block of storage from the operating system for the compiled program to run in. From the discussion in the last section, this run-time storage might be subdivided to hold:

1. the generated target code,
2. data objects, and
3. a counterpart of the control stack to keep track of procedure activations.

The size of the generated target code is fixed at compile time, so the compiler can place it in a statically determined area, perhaps in the low end of memory. Similarly, the size of some of the data objects may also be known at compile time, and these too can be placed in a statically determined area, as in Fig. 7.7. One reason for statically allocating as many data objects as possible is that the addresses of these objects can be compiled into the target code. All data objects in Fortran can be allocated statically. Implementations of languages like Pascal and C use extensions of the control stack to manage activations of procedures. When a call occurs, execution of an activation is interrupted and information about the status of the machine, such as the value of the program counter and machine registers, is saved on the stack. When control returns from the call, this activation can be restarted after restoring the values of the relevant registers and setting the program counter to the point immediately after the call. Data objects whose lifetimes are contained in that of an activation can be allocated on the stack, along with other information associated with the activation. A separate area of run-time memory, called a heap, holds all other information. Pascal allows data to be allocated under program control; the storage for such data is taken from the heap.
Implementations of languages in which the lifetimes of activations cannot be represented by an activation tree might use the heap to keep information about activations. The controlled way in which data is allocated and deallocated on a stack makes it cheaper to place data on the stack than on the heap.
The sizes of the stack and the heap can change as the program executes, so we show them at opposite ends of memory in Fig. 7.7, where they can grow toward each other as needed. Pascal and C need both a run-time stack and heap, but not all languages do. By convention, stacks grow down; that is, the "top" of the stack is drawn towards the bottom of the page. Since memory addresses increase as we go down a page, "downwards-growing" means toward higher addresses. If top marks the top of the stack, offsets from the top of the stack can be computed by subtracting the offset from top. On many machines this computation can be done efficiently by keeping the value of top in a register. Stack addresses can then be represented as offsets from top.

Activation Records

Information needed by a single execution of a procedure is managed using a contiguous block of storage called an activation record or frame, consisting of the collection of fields shown in Fig. 7.8. Not all languages, nor all compilers, use all of these fields; often registers can take the place of one or more of them. For languages like Pascal and C, it is customary to push the activation record of a procedure on the run-time stack when the
procedure is called and to pop the activation record off the stack when control returns to the caller. The purpose of the fields of an activation record is as follows, starting from the field for temporaries.

1. Temporary values, such as those arising in the evaluation of expressions, are stored in the field for temporaries.
2. The field for local data holds data that is local to an execution of a procedure.
3. The field for saved machine status holds information about the state of the machine just before the procedure is called. This information includes the values of the program counter and machine registers that have to be restored when control returns from the procedure.
4. For a language like Fortran, access links are not needed because nonlocal data is kept in a fixed place. Access links, or the related "display" mechanism, are needed for Pascal.
5. The optional control link points to the activation record of the caller.
6. The field for actual parameters is used by the calling procedure to supply parameters to the called procedure. We show space for parameters in the activation record, but in practice parameters are often passed in machine registers for greater efficiency.
7. The field for the returned value is used by the called procedure to return a value to the calling procedure. Again, in practice this value is often returned in a register for greater efficiency.

The sizes of each of these fields can be determined at the time a procedure is called. In fact, the sizes of almost all fields can be determined at compile time. An exception occurs if a procedure may have a local array whose size is determined by the value of an actual parameter, available only when the procedure is called at run time.

Compile-Time Layout of Local Data

Suppose run-time storage comes in blocks of contiguous bytes, where a byte is the smallest unit of addressable memory.
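The seven fields above can be pictured as a C struct. The field types and array sizes below are placeholders; real activation-record layouts are machine- and compiler-specific, and several of these fields usually live in registers rather than memory.

```c
#include <stddef.h>

/* A sketch of an activation record as a C struct; the comments give the
   field numbers from the list above. Sizes are illustrative only. */
struct activation_record {
    long   returned_value;     /* 7: value returned to the caller */
    long   actual_params[4];   /* 6: actual parameters */
    struct activation_record *control_link;  /* 5: caller's record */
    struct activation_record *access_link;   /* 4: for nonlocal data */
    long   saved_status[8];    /* 3: program counter, machine registers */
    long   local_data[16];     /* 2: locals of this activation */
    long   temporaries[8];     /* 1: compiler-generated temporaries */
};
```

Because the fields are laid out contiguously, each one is reachable at a fixed offset from the start of the record, which is what lets compiled code address locals and parameters relative to a frame pointer.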
On many machines, a byte is eight bits and some number of bytes forms a machine word. Multibyte objects are stored in consecutive bytes and given the address of the first byte. The amount of storage needed for a name is determined from its type. An elementary data type, such as a character, integer, or real, can usually be stored in an integer number of
bytes. Storage for an aggregate, such as an array or record, must be large enough to hold all its components. For easy access to the components, storage for aggregates is typically allocated in one contiguous block of bytes. The field for local data is laid out as the declarations in a procedure are examined at compile time. Variable-length data is kept outside this field. We keep a count of the memory locations that have been allocated for previous declarations. From the count we determine a relative address of the storage for a local with respect to some position such as the beginning of the activation record. The relative address, or offset, is the difference between the addresses of the position and the data object. The storage layout for data objects is strongly influenced by the addressing constraints of the target machine. For example, instructions to add integers may expect integers to be aligned, that is, placed at certain positions in memory such as an address divisible by 4. Although an array of ten characters needs only enough bytes to hold ten characters, a compiler may therefore allocate 12 bytes, leaving 2 bytes unused. Space left unused due to alignment considerations is referred to as padding. When space is at a premium, a compiler may pack data so that no padding is left; additional instructions may then need to be executed at run time to position packed data so that it can be operated on as if it were properly aligned.
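The count-and-offset scheme, including alignment padding, can be sketched as two small functions; the sizes and alignments passed in are whatever the target machine requires.

```c
/* A sketch of compile-time layout of local data: each new local gets
   the next offset rounded up to its required alignment, and 'count'
   tracks the storage allocated so far, as described in the text. */
static int align_up(int offset, int alignment) {
    return (offset + alignment - 1) / alignment * alignment;
}

/* returns the relative address (offset) of a new local of the given
   size and alignment, and advances the running count */
static int alloc_local(int *count, int size, int alignment) {
    int offset = align_up(*count, alignment);
    *count = offset + size;     /* any gap before 'offset' is padding */
    return offset;
}
```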