Compiler an overview


  1. 1. Introduction to Compiler • Dr. P. Amudha, Associate Professor, Dept. of CSE
  2. 2. Introduction • To reduce the complexity of designing and building computers, they are made to execute relatively simple commands. • A program is built by combining these very simple commands in what is called machine language. • Since this is a tedious and error-prone process, most programming is instead done in a high-level programming language. • This language can be very different from the machine language that the computer can execute, so some means of bridging the gap is required. • This is where the compiler comes in.
  3. 3. Contd. • Using a high-level language for programming has a large impact on how fast programs can be developed. The main reasons are: • Compared to machine language, the notation used by programming languages is closer to the way humans think about problems. • The compiler can spot some obvious programming mistakes. • Programs written in a high-level language tend to be shorter than equivalent programs written in machine language. • Another advantage of using a high-level language is that the same program can be compiled to many different machine languages and, hence, run on many different machines.
  4. 4. Translator • A program written in a high-level language is called source code. To convert the source code into machine code, translators are needed. • A translator takes a program written in a source language as input and converts it into a program in a target language as output. • It also detects and reports errors during translation. The roles of a translator are: • Translating the high-level language program into an equivalent machine language program. • Providing diagnostic messages wherever the programmer violates the specification of the high-level language.
  5. 5. Different types of translators • Compiler: A compiler is a translator that converts programs in a high-level language to a low-level language. It translates the entire program and reports the errors in the source program encountered during translation. • Interpreter: An interpreter also processes programs in a high-level language, but it translates and executes them line by line and reports an error as soon as it is encountered. • It directly executes the operations specified in the source program on the input given by the user. • It gives better error diagnostics than a compiler.
  6. 6. • Assembler: An assembler is a translator that translates assembly language code into machine language code.
  7. 7. Compiler vs. Interpreter

     | S.No. | Compiler | Interpreter |
     |---|---|---|
     | 1 | Performs the translation of a program as a whole. | Performs statement-by-statement translation. |
     | 2 | Execution is faster. | Execution is slower. |
     | 3 | Requires more memory, as linking is needed for the generated intermediate object code. | Memory usage is efficient, as no intermediate object code is generated. |
     | 4 | Debugging is hard, as the error messages are generated only after scanning the entire program. | It stops translation when the first error is met. Hence, debugging is easy. |
     | 5 | Programming languages like C and C++ use compilers. | Programming languages like Python, BASIC, and Ruby use interpreters. |
  8. 8. Why learn about compilers? a) It is considered a topic that you should know in order to be “well-cultured” in computer science. b) A good craftsman should know his tools, and compilers are important tools for programmers and computer scientists. c) The techniques used for constructing a compiler are useful for other purposes as well. d) There is a good chance that a programmer or computer scientist will need to write a compiler or interpreter for a domain-specific language.
  9. 9. Compilers and Interpreters • “Compilation”: translation of a program written in a source language into a semantically equivalent program written in a target language. [Diagram: Source Program -> Compiler -> Target Program, with the compiler also producing error messages; the target program then turns Input into Output.]
  10. 10. Compilers and Interpreters (cont’d) • “Interpretation”: performing the operations implied by the source program. [Diagram: Source Program and Input -> Interpreter -> Output, with the interpreter also producing error messages.]
  11. 11. The Analysis-Synthesis Model of Compilation • There are two parts to compilation: – Analysis determines the operations implied by the source program which are recorded in a tree structure – Synthesis takes the tree structure and translates the operations therein into the target program
  12. 12. Other Tools that Use the Analysis-Synthesis Model • Editors (syntax highlighting) • Pretty printers (e.g. Doxygen) • Static checkers (e.g. Lint and Splint) • Interpreters • Text formatters (e.g. TeX and LaTeX) • Silicon compilers (e.g. VHDL) • Query interpreters/compilers (Databases)
  13. 13. Language Processing System [Diagram: Skeletal Source Program -> Preprocessor -> Source Program -> Compiler -> Target Assembly Program -> Assembler -> Relocatable Object Code -> Linker (together with Libraries and Relocatable Object Files) -> Absolute Machine Code.]
  14. 14. Language Processing System • Preprocessor: A preprocessor, generally considered a part of the compiler, is a tool that produces input for compilers. It deals with macro-processing, augmentation, file inclusion, language extension, etc. • Compiler: A compiler translates a high-level language into a low-level machine language. • Assembler: An assembler translates assembly language programs into machine code. The output of an assembler is called an object file, which contains a combination of machine instructions as well as the data required to place these instructions in memory.
  15. 15. Contd. • Loader: The loader is a part of the operating system and is responsible for loading executable files into memory and executing them. It calculates the size of a program (instructions and data) and creates memory space for it. • Cross-compiler: A compiler that runs on platform A and is capable of generating executable code for platform B is called a cross-compiler. • Source-to-source compiler: A compiler that takes the source code of one programming language and translates it into the source code of another programming language is called a source-to-source compiler.
  16. 16. Example: how a program is executed on a host machine using a C compiler. • The user writes a program in the C language (high-level language). • The C compiler compiles the program and translates it into an assembly program (low-level language). • An assembler then translates the assembly program into machine code (object code). • A linker is used to link all the parts of the program together for execution (executable machine code). • A loader loads all of them into memory and then the program is executed.
  17. 17. The Grouping of Phases • Compiler front and back ends: – Front end: analysis (machine independent) – Back end: synthesis (machine dependent) • Compiler passes: – A collection of phases is done only once (single pass) or multiple times (multi pass) • Single pass: usually requires everything to be defined before being used in source program • Multi pass: compiler may have to keep entire program representation in memory
  18. 18. Phases of a compiler • Lexical Analysis: A token is a string of characters, categorized according to the rules as a symbol (e.g. IDENTIFIER, NUMBER, COMMA, etc.). The process of forming tokens from an input stream of characters is called tokenization, and the lexer categorizes them according to symbol type. A token can look like anything that is useful for processing an input text stream or text file.
  19. 19. Contd. • A lexical analyzer generally does nothing with combinations of tokens, a task left for a parser. For example, a typical lexical analyzer recognizes parentheses as tokens, but does nothing to ensure that each '(' is matched with a ')'. • In a compiler, linear analysis is called lexical analysis or scanning. For example, in lexical analysis the characters in the assignment statement position := initial + rate * 60 would be grouped into the following tokens: 1. The identifier position. 2. The assignment symbol :=. 3. The identifier initial. 4. The plus sign +. 5. The identifier rate. 6. The multiplication sign *. 7. The number 60. The blanks separating the characters of these tokens would be eliminated.
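The grouping of characters into tokens described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the scanner of any particular compiler: the category names (`IDENTIFIER`, `ASSIGN`, and so on) and the function name `tokenize` are assumptions chosen for the example.

```python
import re

# Token categories and their patterns; the names are illustrative assumptions.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("ASSIGN", r":="),
    ("PLUS", r"\+"),
    ("TIMES", r"\*"),
    ("SKIP", r"\s+"),  # blanks are matched and then dropped
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Group the characters of `text` into (category, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            # The remaining input does not form any token: a lexical error.
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for token in tokenize("position := initial + rate * 60"):
    print(token)
```

Note that the scanner accepts any sequence of valid tokens; checking that the sequence is well formed is left to the parser.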
  20. 20. Syntax Analysis • Hierarchical analysis is called parsing or syntax analysis. • It involves grouping the tokens of the source program into grammatical phrases that are used by the compiler to synthesize output. • Usually, the grammatical phrases of the source program are represented by a parse tree
  21. 21. • The hierarchical structure of a program is usually expressed by recursive rules. • For example, we might have the following rules as part of the definition of expressions: 1. Any identifier is an expression. 2. Any number is an expression. 3. If expression1 and expression2 are expressions, then so are expression1 + expression2, expression1 * expression2, and (expression1). Rules (1) and (2) are non-recursive basic rules, while (3) defines expressions in terms of operators applied to other expressions. Thus, by rule (1), initial and rate are expressions. By rule (2), 60 is an expression, while by rule (3), we can first infer that rate * 60 is an expression and finally that initial + rate * 60 is an expression.
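The recursive rules above map naturally onto a recursive-descent parser. The sketch below is one possible reading, under two assumptions that the slide does not state: `*` binds tighter than `+`, and both operators associate to the left. The function name `parse_expression` and the nested-tuple tree format are hypothetical.

```python
def parse_expression(tokens):
    """Parse a list of token strings into a nested-tuple tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        # Rules (1) and (2): an identifier or a number; rule (3): ( expression ).
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if tok.isidentifier() or tok.isdigit():
            return advance()
        raise SyntaxError(f"unexpected token {tok!r}")

    def term():
        # Rule (3): expression * expression, assumed to bind tighter than +.
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():
        # Rule (3): expression + expression.
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


print(parse_expression(["initial", "+", "rate", "*", "60"]))
```

The resulting tree groups rate * 60 first and then adds initial, mirroring the order of inference given in the text.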
  22. 22. Semantic Analysis • The semantic analysis phase checks the source program for semantic errors and gathers type information for the subsequent code-generation phase. • It uses the hierarchical structure determined by the syntax-analysis phase to identify the operators and operands of expressions and statements. • An important component of semantic analysis is type checking. • Here the compiler checks that each operator has operands that are permitted by the source language specification. • For example, many programming language definitions require a compiler to report an error every time a real number is used to index an array. • However, the language specification may permit some operand coercions, for example, when a binary arithmetic operator is applied to an integer and a real. • In this case, the compiler may need to convert the integer to a real.
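The two type-checking cases mentioned above (rejecting a real array index, and converting an integer operand to a real) can be sketched as follows. The type names `"int"` and `"real"` and both function names are assumptions made for this example; a real compiler would work on its own type representation.

```python
def check_arithmetic(op, left_type, right_type):
    """Return (result_type, operand_to_convert) for a binary arithmetic operator.

    operand_to_convert is "left", "right", or None when no conversion is needed.
    """
    numeric = ("int", "real")
    if left_type not in numeric or right_type not in numeric:
        raise TypeError(
            f"operator {op!r} needs numeric operands, got {left_type} and {right_type}"
        )
    if left_type == right_type:
        return left_type, None
    # Mixed int/real: the integer operand is converted to a real.
    return "real", ("left" if left_type == "int" else "right")


def check_index(index_type):
    """Report an error when a non-integer is used to index an array."""
    if index_type != "int":
        raise TypeError(f"array index must be int, got {index_type}")


print(check_arithmetic("*", "real", "int"))
```

For rate * 60 with rate of type real, the check reports that the right operand (the integer 60) must be converted before the multiplication.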
  23. 23. The Phases of a Compiler
  24. 24. • The six phases of compilation: lexical analysis, syntax analysis, semantic analysis, intermediate code generation, code optimization, and code generation. • The six phases are divided into two groups: 1. Front end: depends on the source language (it works on the stream of tokens and the parse tree). 2. Back end: depends on the target machine and is independent of the source code. Symbol-Table Management: • A symbol table is a data structure in a compiler that contains a record for each identifier, with fields for the attributes of the identifier. • The data structure allows us to find the record for each identifier quickly and to store or retrieve data from that record quickly.
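A symbol table with one record per identifier and fast lookup can be sketched with a hash map. The class and method names below are hypothetical, and the sketch ignores scoping, which production compilers usually handle with nested or chained tables.

```python
class SymbolTable:
    """One record per identifier, stored in a dict for fast lookup."""

    def __init__(self):
        self._records = {}

    def insert(self, name, **attributes):
        # Create the record on first sight, then add or update its fields.
        record = self._records.setdefault(name, {"name": name})
        record.update(attributes)
        return record

    def lookup(self, name):
        # Returns None when the identifier has not been entered.
        return self._records.get(name)


table = SymbolTable()
table.insert("rate")                 # entered by the lexical analyzer
table.insert("rate", type="real")    # type filled in by a later phase
print(table.lookup("rate"))
```

The example reflects the usual division of labour: an identifier is entered when it is first seen, and later phases fill in attributes such as its type.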
  25. 25. Error Detection and Reporting • Each phase can encounter errors. • However, after detecting an error, a phase must somehow deal with that error, so that compilation can proceed, allowing further errors in the source program to be detected. • A compiler that stops when it finds the first error is not as helpful as it could be. • The syntax and semantic analysis phases usually handle a large fraction of the errors detectable by the compiler. • The lexical phase can detect errors where the characters remaining in the input do not form any token of the language. • Errors where the token stream violates the structure rules (syntax) of the language are determined by the syntax analysis phase.
  26. 26. • Intermediate Code Generation: An intermediate representation of the final machine language code is produced. This phase bridges the analysis and synthesis phases of translation. • Code Optimization: This is an optional phase designed to improve the intermediate code so that the output runs faster and takes less space. • Code Generation: The last phase of translation is code generation. A number of optimizations to reduce the length of the machine language program are carried out during this phase. The output of the code generator is the machine language program for the specified computer.
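One common intermediate representation is three-address code, in which each instruction applies at most one operator. The slides do not name a specific representation, so the sketch below is an assumption: it walks an expression tree given as nested tuples `(operator, left, right)` and emits one instruction per operator, using hypothetical temporaries `t1`, `t2`, and so on.

```python
def generate_tac(tree):
    """Return (instructions, result_name) for a nested-tuple expression tree."""
    code = []
    counter = 0

    def visit(node):
        nonlocal counter
        if isinstance(node, str):
            return node  # a leaf: identifier or number
        op, left, right = node
        left_name = visit(left)
        right_name = visit(right)
        counter += 1
        temp = f"t{counter}"
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    result = visit(tree)
    return code, result


instructions, result = generate_tac(("+", "initial", ("*", "rate", "60")))
for line in instructions:
    print(line)
print("position =", result)
```

A later optimization phase could then work on these instructions, for instance by removing temporaries that are used only once.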
  27. 27. Compiler construction tools: 1. Parser generators. 2. Scanner generators. 3. Syntax-directed translation engines. 4. Automatic code generators. 5. Data-flow analysis engines. 6. Compiler-construction toolkits. Parser Generators • Input: grammatical description of a programming language. Output: syntax analyzers. • A parser generator takes the grammatical description of a programming language and produces a syntax analyzer.
  28. 28. Scanner Generators • Input: regular expression description of the tokens of a language. Output: lexical analyzers. • A scanner generator generates lexical analyzers from a regular expression description of the tokens of a language. Syntax-directed Translation Engines • Input: parse tree. Output: intermediate code. • Syntax-directed translation engines produce collections of routines that walk a parse tree and generate intermediate code.
  29. 29. Automatic Code Generators • Input: intermediate language. Output: machine language. • A code generator takes a collection of rules that define the translation of each operation of the intermediate language into the machine language for a target machine. Data-flow Analysis Engines • A data-flow analysis engine gathers information about how values are transmitted from one part of a program to each of the other parts. • Data-flow analysis is a key part of code optimization. Compiler Construction Toolkits • Compiler construction toolkits provide an integrated set of routines for constructing the various phases of a compiler.
  30. 30. Cousins of the Compiler • Interpreters: discussed in detail in the first lecture. • Preprocessors: They produce input for the compiler. They perform jobs such as deleting comments, including files, and expanding macros. • Assemblers: They are translators for assembly language. Sometimes the compiler generates assembly language in symbolic form and then hands it over to an assembler. • Linkers: Both compilers and assemblers rely on linkers to collect code that was separately compiled or assembled into object files into a file that is directly executable. • Loaders: They resolve all relocatable addresses relative to a base address.
  31. 31. Applications of compiler technology 1. Implementation of High-Level Programming Languages 2. Optimizations for Computer Architectures • Parallelism • Memory hierarchies 3. Design of New Computer Architectures • RISC • Specialized Architectures 4. Program Translations • Binary Translation • Hardware Synthesis • Database Query Interpreters • Compiled Simulation 5. Software Productivity Tools • Type Checking • Bounds Checking • Memory-Management Tools
  32. 32. Complexity of compiler technology • A compiler is possibly the most complex system software • The complexity arises from the fact that it is required to map a programmer’s requirements (in a HLL program) to architectural details • It uses algorithms and techniques from a very large number of areas in computer science • Translates intricate theory into practice - enables tool building
  33. 33. Compiler Algorithms Makes practical application of • Greedy algorithms - register allocation • Heuristic search - list scheduling • Graph algorithms - dead code elimination, register allocation • Dynamic programming - instruction selection • Optimization techniques - instruction scheduling • Finite automata - lexical analysis • Pushdown automata - parsing • Fixed point algorithms - data-flow analysis • Complex data structures - symbol tables, parse trees, data dependence graphs • Computer architecture - machine code generation
  34. 34. Context Free Grammars • Grammars are used to describe the syntax of a programming language. A grammar specifies the structure of expressions and statements. • For example, stmt -> if (expr) then stmt, where stmt denotes statements and expr denotes expressions. Types of grammar • Type 0 grammar (unrestricted) • Type 1 grammar (context-sensitive) • Type 2 grammar (context-free) • Type 3 grammar (regular) • A context free grammar is also called a Type 2 grammar.
  35. 35. A context free grammar G is defined by a 4-tuple, G = (V, T, P, S), where • G - Grammar • V - Set of variables • T - Set of terminals • P - Set of productions • S - Start symbol • It produces a Context Free Language (CFL), which is the collection of strings of terminals that can be derived from the start symbol of the grammar in one or more steps: $L(G) = \{\, w \in T^{*} \mid S \Rightarrow^{*} w \,\}$, where • L - Language • G - Grammar • w - Input string • S - Start symbol • T - Set of terminals
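To make the 4-tuple concrete, the sketch below writes out a small grammar and enumerates the strings of its language up to a length bound. The grammar S -> a S b | ε is an example chosen for this illustration and does not come from the slides; the function name `language` is hypothetical. The search is only suitable for small grammars like this one, in which every derivation step either adds terminals or removes a variable.

```python
from collections import deque

# G = (V, T, P, S) for the example grammar S -> a S b | ε (an assumption).
V = {"S"}
T = {"a", "b"}
P = {"S": [["a", "S", "b"], []]}
START = "S"


def language(max_len):
    """Return all terminal strings of length <= max_len derivable from START."""
    found = set()
    queue = deque([(START,)])
    seen = {(START,)}
    while queue:
        form = queue.popleft()
        # Position of the leftmost variable in the sentential form, if any.
        idx = next((i for i, sym in enumerate(form) if sym in V), None)
        if idx is None:
            found.add("".join(form))  # only terminals left: a string of L(G)
            continue
        for body in P[form[idx]]:
            new_form = form[:idx] + tuple(body) + form[idx + 1:]
            terminals = sum(1 for sym in new_form if sym in T)
            if terminals <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return found


print(sorted(language(4), key=len))
```

Each string found corresponds to a leftmost derivation from S, which is what the condition that w be derivable from the start symbol expresses.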
  36. 36. Conventions. Terminals are symbols from which strings are formed: • Lowercase letters, e.g. a, b, c. • Operators, e.g. +, -, *. • Punctuation symbols, e.g. comma, parentheses. • Digits, e.g. 0, 1, 2, ..., 9. • Boldface strings, e.g. id, if. Non-terminals are syntactic variables that denote sets of strings: • Uppercase letters, e.g. A, B, C. • Lowercase italic names, e.g. expr, stmt.
  37. 37. • The start symbol is the head of the production stated first in the grammar. • A production is of the form LHS -> RHS (or head -> body), where the head contains exactly one non-terminal and the body contains a sequence of terminals and non-terminals. • (e.g.) Let G be,
  38. 38. Context Free Grammars vs Regular Expressions • Grammars are more powerful than regular expressions. • Every construct that can be described by a regular expression can be described by a grammar, but not vice versa. • Every regular language is a context free language, but the reverse does not hold. (e.g.) • RE = (a | b)*abb (the set of strings ending with abb). • Grammar
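As an illustration of the first direction (every regular expression has an equivalent grammar), the sketch below compares the regular expression (a | b)*abb with one possible right-linear grammar for the same language. The grammar A0 -> a A0 | b A0 | a A1, A1 -> b A2, A2 -> b A3, A3 -> ε is an assumption of this example, as are the names `GRAMMAR`, `ACCEPTING`, and `derives`.

```python
import re

PATTERN = re.compile(r"(a|b)*abb")

# Each production "X -> t Y" is stored as the pair (t, Y) under X.
GRAMMAR = {
    "A0": [("a", "A0"), ("b", "A0"), ("a", "A1")],
    "A1": [("b", "A2")],
    "A2": [("b", "A3")],
    "A3": [],
}
ACCEPTING = {"A3"}  # non-terminals that have an ε-production


def derives(string):
    """True if the grammar can derive `string` starting from A0."""
    current = {"A0"}
    for ch in string:
        # Follow every production whose terminal matches the next character.
        current = {nxt for nt in current for (t, nxt) in GRAMMAR[nt] if t == ch}
    return bool(current & ACCEPTING)


for s in ["abb", "aabb", "babb", "ab", "abba", ""]:
    print(repr(s), derives(s), bool(PATTERN.fullmatch(s)))
```

The converse fails for languages such as balanced parentheses, which a grammar can describe but no regular expression can.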
  39. 39. Syntax directed definition • A syntax directed definition specifies the values of attributes by associating semantic rules with the grammar productions. • It is a context free grammar together with attributes, which are associated with grammar symbols, and rules, which are associated with productions. • The process of syntax directed translation is two-fold: • Construction of the syntax tree. • Computing the values of attributes at each node by visiting the nodes of the syntax tree.
  40. 40. Semantic actions • Semantic actions are fragments of code embedded within production bodies in syntax directed translation. • They are usually enclosed within curly braces ({ }). • An action can occur anywhere in a production body, but usually appears at the end. • (e.g.) E -> E1 + T {print '+'}
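The effect of an action such as {print '+'} can be seen in a small infix-to-postfix translator. The sketch below is an assumption-laden illustration: it handles only single digits and `+`, replaces the left-recursive production E -> E1 + T by an equivalent loop, and collects output in a list instead of printing. The function name `to_postfix` is hypothetical.

```python
def to_postfix(text):
    """Translate an infix sum of single digits, e.g. "9+5+2", to postfix."""
    out = []
    pos = 0

    def term():
        nonlocal pos
        if pos < len(text) and text[pos].isdigit():
            out.append(text[pos])  # action for T -> digit: {print digit}
            pos += 1
        else:
            raise SyntaxError(f"digit expected at position {pos}")

    term()
    while pos < len(text) and text[pos] == "+":
        pos += 1
        term()
        out.append("+")            # action for E -> E1 + T: {print '+'}
    if pos != len(text):
        raise SyntaxError(f"unexpected character {text[pos]!r}")
    return "".join(out)


print(to_postfix("9+5+2"))  # prints 95+2+
```

Because the action sits at the end of the production body, the operator is emitted only after both operands have been translated, which is exactly what postfix notation requires.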
  41. 41. Types of translation • L-attributed translation: It performs translation during parsing itself, so there is no need for explicit tree construction. 'L' stands for 'left to right'. • S-attributed translation: It is performed in connection with bottom-up parsing. 'S' stands for 'synthesized'.
  42. 42. Types of attributes. Inherited attributes • An inherited attribute at a node is defined by the semantic rule associated with the production at the parent of that node. • Its value is defined only in terms of attribute values at the node's parent, its siblings, and the node itself. • The non-terminal concerned must be in the body of the production. Synthesized attributes • A synthesized attribute at a node is defined by the semantic rule associated with the production at that node. • Its value is defined only in terms of attribute values at the node's children and the node itself. • The non-terminal concerned must be in the head of the production. • Terminals have synthesized attributes, which are the lexical values (denoted by lexval) generated by the lexical analyzer.
  43. 43. Syntax directed definition of a simple desk calculator

      | Production | Semantic rules |
      |---|---|
      | L -> E n | L.val = E.val |
      | E -> E1 + T | E.val = E1.val + T.val |
      | E -> T | E.val = T.val |
      | T -> T1 * F | T.val = T1.val × F.val |
      | T -> F | T.val = F.val |
      | F -> ( E ) | F.val = E.val |
      | F -> digit | F.val = digit.lexval |
  44. 44. Syntax-directed definition with inherited attributes

      | Production | Semantic rules |
      |---|---|
      | D -> T L | L.inh = T.type |
      | T -> int | T.type = integer |
      | T -> float | T.type = float |
      | L -> L1 , id | L1.inh = L.inh; addType(id.entry, L.inh) |
      | L -> id | addType(id.entry, L.inh) |

      • Symbol T is associated with a synthesized attribute type. • Symbol L is associated with an inherited attribute inh.
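The way the declared type flows down the identifier list through L.inh can be sketched with two small functions, one per non-terminal. The function names and the use of a plain dict as the symbol table are assumptions made for this example; `addType` from the table is modelled as a dict assignment, and the identifier list must contain at least one name.

```python
def process_L(ids, inh, table):
    """Handle L -> L1 , id and L -> id for a non-empty list of identifiers."""
    if len(ids) > 1:
        # L -> L1 , id : L1.inh = L.inh, so the same type is passed down.
        process_L(ids[:-1], inh, table)
    # addType(id.entry, L.inh) for the last identifier of this L.
    table[ids[-1]] = inh


def process_D(type_name, ids):
    """Handle D -> T L: T.type is synthesized, then inherited by L."""
    table = {}
    process_L(ids, type_name, table)  # L.inh = T.type
    return table


print(process_D("float", ["a", "b", "c"]))
```

For a declaration such as float a, b, c, every identifier ends up with the type float, although the type appears only once in the source text.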
  45. 45. Types of Syntax Directed Definitions • S-attributed Definitions • A syntax directed definition that involves only synthesized attributes is called S-attributed. • Attribute values for the non-terminal at the head are computed from the attribute values of the symbols in the body of the production. • The attributes of an S-attributed SDD can be evaluated in bottom-up order of the nodes of the parse tree, i.e., by performing a post-order traversal of the parse tree and evaluating the attributes at a node when the traversal leaves that node for the last time.
  46. 46. Example:

      | Production | Semantic rules |
      |---|---|
      | L -> E n | L.val = E.val |
      | E -> E1 + T | E.val = E1.val + T.val |
      | E -> T | E.val = T.val |
      | T -> T1 * F | T.val = T1.val × F.val |
      | T -> F | T.val = F.val |
      | F -> ( E ) | F.val = E.val |
      | F -> digit | F.val = digit.lexval |
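Bottom-up evaluation of the synthesized attribute val amounts to a post-order traversal. The sketch below makes a simplifying assumption: it works on a condensed syntax tree of nested tuples (`("digit", n)`, `("+", left, right)`, `("*", left, right)`) rather than on the full parse tree, so the copy rules such as E.val = T.val disappear. The function name `val` is hypothetical.

```python
def val(node):
    """Post-order evaluation of the synthesized attribute val."""
    kind = node[0]
    if kind == "digit":
        return node[1]                      # F.val = digit.lexval
    if kind == "+":
        return val(node[1]) + val(node[2])  # E.val = E1.val + T.val
    if kind == "*":
        return val(node[1]) * val(node[2])  # T.val = T1.val × F.val
    raise ValueError(f"unknown node kind {kind!r}")


# Tree for the input 3 * 5 + 4
tree = ("+", ("*", ("digit", 3), ("digit", 5)), ("digit", 4))
print(val(tree))  # prints 19
```

The value at each node is computed only after the values of its children are known, which is why such definitions fit bottom-up parsing.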
  47. 47. L-attributed Definitions • A syntax directed definition in which the dependency-graph edges between attributes in a production body can go from left to right, but not from right to left, is called an L-attributed definition. • Attributes of L-attributed definitions may be either synthesized or inherited. • If an attribute is inherited, it must be computed only from: • Inherited attributes associated with the production head. • Inherited or synthesized attributes associated with the symbols located to the left of the symbol whose attribute is being computed. • Inherited or synthesized attributes associated with the symbol under consideration itself, but only in such a way that no cycles are formed in the dependency graph.
  48. 48. Example:

      | Production | Semantic rules |
      |---|---|
      | T -> F T' | T'.inh = F.val |
      | T' -> * F T1' | T1'.inh = T'.inh × F.val |

      In production 1, the inherited attribute T'.inh is computed from the value of F, which is to its left. In production 2, the inherited attribute T1'.inh is computed from T'.inh, associated with the head, and from the value of F, which appears to its left in the production. That is, an inherited attribute must be computed using only information from above (the head) or from the left in the production.
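The left-to-right flow of the inherited attribute can be sketched as a pair of mutually dependent functions. The slide lists only the rules for inh; the sketch assumes, as in the usual textbook version of this grammar, an additional production T' -> ε that returns the accumulated value. Function names are hypothetical, and the input is a list of integers standing for the F operands of F * F * ... * F.

```python
def parse_T(operands):
    """T -> F T' : the first F.val becomes T'.inh."""
    first, rest = operands[0], operands[1:]
    return parse_T_prime(rest, inh=first)  # T'.inh = F.val


def parse_T_prime(rest, inh):
    """T' -> * F T1' | ε, carrying the running product in inh."""
    if not rest:
        return inh  # assumed rule for T' -> ε: the result is T'.inh
    f_val = rest[0]
    # T1'.inh = T'.inh × F.val, using only the head and the symbol to the left.
    return parse_T_prime(rest[1:], inh=inh * f_val)


print(parse_T([3, 5, 2]))  # prints 30
```

Every value passed down comes from the head or from a symbol further left, so the attributes can be evaluated in a single left-to-right pass during parsing.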