Mais conteúdo relacionado

Lecture 1 introduction to language processors

  1. Lectured by Rebaz Najeeb Compilers (CPL5316) Software Engineering Koya university 2015-2016 e-mail: rebaz.najeeb at koyauniversity dot org Lecture 1 : Introduction to language processors
  2. Outline Overview  Introduction to compiler  The Architecture of a Compiler  Lexical analysis (Scanning)  Syntax Analysis (Parsing)  Semantic Analysis  Intermediate Code Generation  Code Optimization  Code Generation
  3. What to know? The essential knowledge in Computation Theory RE, FSA, and CFG. The essential skills of a Java developer. The Dragon book
  4. Why compilers? • Increase productivity • Low level programming languages are harder to write , less portable , error-prone area, harder to maintain. • Hardware synthesis : Verilog , VHDL to describe high-level hardware description language. Register Transfer Language (RTL)  gates  transistors  physical layout. • Reverse engineering eg. (Executble applications .exe ) runtime (0100101100)  assembly code  high level language.
  5. Understanding binary language is hard • 01000101 01110110 01100101 01110010 01111001 01101111 01101110 01100101 00100000 01100011 01100001 01101110 00100000 01110000 01100001 01110011 01110011 00101100 00100000 01110101 01101110 01101100 01100101 01110011 01110011 00100000 01010011 00101111 01101000 01100101 00100000 01100100 01101111 01100101 01110011 01101110 00100111 01110100 00100000 01110111 01100001 01101110 01110100 00100000 01110100 01101111 00101110 • What does that mean ?
  6. Programming language classification (Levels) • Low-level language Assembly language Machine language • High-level language C, C++, java, Pascal, Prolog, Scheme • Natural language English, Kurdish. * There is also a classification by programming language generations.
  7. Who is that girl? Tell me what is her name?
  8. History of compilers • Grace Hopper, in 1952 ,A-0 System . • Alick Glennie in 1952, Autocode. • John W. Backus Speedcoding , 1953 . - More productive , but 10-20 slower in execution. - took 300 bytes of memory (30% memory) • Fortran, first complete compiler, in 1954-1957 (18 yrs) %50 code were in Fortran. • 1960 Cobol, lisp • 1970 Pascal , C • 1980 OOP Ada , smalltalk , C++ . • 1990 Java , script , Perl • 2000 language specifications
  9. Compilers • A compiler translates the code written in one language (HLL) to some other language (LLL) without changing the meaning of the program. • It is also expected that a compiler should make the target code efficient and optimized in terms of time and space. • Compiler design covers basic translation mechanism and error detection & recovery. • Fortran , Ada, C, C++ , C# , Cobol. Source program Compiler Output (result) Error/ Warning Executable program Executable program Input(data)
  10. interpreters • Interpreter is a type of language processor that directly executes the operations specified in the source program on inputs supplied by the user. • Python , Perl , Basic , Ruby, AWK. Source program Interpreter Output (result) Error/ Warning Input(data)
  11. Compilers vs interpreters •No Compiler Interpreter 1 Compiler Takes Entire program as input Interpreter Takes Single instruction as input . 2 Intermediate Object Code is Generated No Intermediate Object Code isGenerated 3 Conditional Control Statements are Executes faster Conditional Control Statements are Executes slower 4 Memory Requirement : More(Since Object Code is Generated) Memory Requirement is Less 5 Program need not be compiled every time Every time higher level program is converted into lower level program 6 Errors are displayed after entire program is checked Errors are displayed for every instruction interpreted (if any) 7 Example : C Compiler Example : BASIC
  12. Java Virtual Machine • Why Java machine independent ? • Tools to view and edit bytecodes - ASM ( - Jasmin ( • Show demo by CMD
  13. Java compilers List of compilers
  14. Language processing phases? Source Program Preprocessor Compiler Assembler Linker/Loader Target machine code Modified source program Target assembly language Relocatable machine code Library files Relocatable object files Memory
  15. Language processing phases? Source Program Preprocessor Compiler Assembler Linker/Loader Target machine code Modified source program Target assembly language Relocatable machine code Compiler Analysis Synthesis Library files Relocatable object files Memory
  16. Analysis and Synthesis Compiler Analysis Synthesis Lexical analysis Syntax analysis Semantic analysis Intermediate code generation Front-end Code optimization Code Generation Back-end
  17. Analysis VS Synthesis Analysis part Breaks up the source program into constituent pieces and create an intermediate representation of the source program. It is often called the front end of the compiler. The analysis part can be divided along the following phases: Lexical analysis , syntax analysis , semantic analysis and intermediate code generation (optional) Synthesis part Construct the desired target program from the intermediate representation and the information of the symbol table. It is often called the back end. The Synthesis part can be divided along the following phases: Intermediate code generator, code optimizer, code generator
  18. Compiler phases Lexical analysis Syntax analysis Semantic analysis Intermediate code generation Machine-independent Code optimization Code Generator Character stream Token stream Syntax tree Syntax tree Intermediate code representation Intermediate code representation Target machine code Machine-independent Code optimization Target machine code Symbol table
  19. Compiler phases Target code 1. Lexical analysis 2. Syntax analysis 3. Semantic analysis 4. Code optimization 5. Code Generation mages/compiler_phases.jpg
  20. Lexical analysis (Scanning) • How do we understand English ? • Break up sentence and recognize words Iamasmartstudent. I am a smart student. Separator Noun
  21. Lexical analysis (Scanning) lexical analysis is the process of converting a sequence of characters (such as a computer program or web page) into a sequence of tokens (strings with an identified "meaning").
  22. Syntax analysis • How do we understand English? The Smart students never ever give up. S Adv V Sentence
  23. Syntax analysis (Parsing) A Parser reads a stream of tokens from scanner, and determines if the syntax of the program is correct according to the context-free grammar (CFG) of the source language. Then, Tokens are grouped into grammatical phrases represented by a Parse Tree or an abstract syntax tree, which gives a hierarchical structure to the source program.
  24. Syntax analysis example
  25. Semantic analysis For example : A woman without her man is nothing. Vs A woman: without her, man is nothing
  26. Semantic analysis The Semantic Analysis phase checks the (meaning of) source program for semantic errors (Type Checking) and gathers type information for the successive phases. Semantic analysis is the heart of compiler. Also, type checking is the important part in this phase. Check language requirements like proper declarations. Semantic analysis catches inconsistencies for instance mismatching datatypes.
  27. Intermediate code generation After syntax and semantic analysis of the source program , many compilers generate an explicit low-level or machine-like intermediate code. In some compilers, a source program is translated into an intermediate code first and then translated into the target language. In other compilers, translated directly into the target language. One of the popular intermediate code is three-address code. Example: temp1 = int_to_float(60) temp2 = id3 ∗ temp1 temp3 = id2 + temp2 id1 = temp3
  28. Optimization This phase attempts to improve the intermediate code, which is produced. So that faster-running machine code can be achieved in the term of time and space. The optimized code MUST be CORRECT Run Faster (time) Minimize power consumption (Mobile devices) Use less memory Shorter is better Consider network, database access.
  29. Optimize this code
  30. Code Generation The code generator takes as input an intermediate code representation of the source program and maps it into the target language. If the target language is machine code , registers or memory locations are selected for each of the variables used by the program.
  31. Symbol table • A symbol table is a data structure containing a record for each variable name, with fields for the attributes of the name. • Symbol table should allow find the record for each name quickly and to store and retrieve data from that record quickly. • Attributes may provide information about storage allocation, type , scope , number and type of arguments , method of passing, the type returned.
  32. All phases in one example