Inside PHP
Tom Lee @tglee

 OSCON 2012
19th July, 2012
Overview

• About me!
 • New Relic’s PHP Agent escapee.
 • Now on New Projects, doing unspeakably un-PHP things.
 • Wannabe compiler nerd.

• Terminology & brief intro to compilers:
 • Grammars, Scanners & Parsers
 • General architecture of a bytecode compiler

• Hands on: Modifying the PHP language
 • PHP/Zend compiler architecture & summary
 • Case study in adding a new keyword
“Zend” vs. “Zend Engine” vs. “PHP”

• I will use all of these interchangeably throughout this talk.
• Referring to the bytecode compiler in the “Zend Engine 2” in most cases.
• The distinction doesn’t really matter here.
Compilers 101: Scanners

• Or lexical analyzers, or tokenizers
• Input: raw source code
• Output: a stream of tokens

Example: while ($x == $y) is scanned into the token stream

 T_WHILE
 '('
 T_VARIABLE("x")
 T_IS_EQUAL
 T_VARIABLE("y")
 ')'
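To make the scanner slide concrete, here is a minimal sketch of a tokenizer in Python. It is not how PHP does it (the real scanner is generated by re2c); the `KEYWORDS` table, the `TOKEN_SPEC` patterns and the `scan` function are all hypothetical names made up for illustration. Only the token names T_WHILE, T_VARIABLE, T_IS_EQUAL and T_STRING come from PHP itself.

```python
import re

# Hypothetical keyword table: words that the scanner treats as special.
KEYWORDS = {"while": "T_WHILE"}

# Hypothetical token patterns, tried in order at each position.
TOKEN_SPEC = [
    ("T_VARIABLE", r"\$[A-Za-z_][A-Za-z0-9_]*"),
    ("T_IS_EQUAL", r"=="),
    ("WORD", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("CHAR", r"[(){};]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(source):
    """Convert raw source code into a flat list of (token, value) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue  # whitespace produces no token
        if kind == "T_VARIABLE":
            tokens.append(("T_VARIABLE", text[1:]))  # drop the leading "$"
        elif kind == "WORD":
            # Keywords get their own token; any other word is a plain T_STRING.
            if text in KEYWORDS:
                tokens.append((KEYWORDS[text], None))
            else:
                tokens.append(("T_STRING", text))
        elif kind == "CHAR":
            tokens.append((text, None))  # single characters stand for themselves
        else:
            tokens.append((kind, None))
    return tokens


for token in scan("while ($x == $y)"):
    print(token)
```

Note that the output is a flat list: nothing here knows that the parentheses belong to the while. Also note that a word like until comes out as an ordinary T_STRING until somebody adds it to the keyword table.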
Compilers 101: Parsers

• Input: a stream of tokens from the scanner
• Output is implementation dependent
 • Often an intermediate, in-memory representation of the program in tree form.
   • e.g. Parse Tree or Abstract Syntax Tree
 • Or directly generate bytecode.
• Goal of a parser is to structure the token stream.
• Parsers are frequently generated from a DSL
 • See parser generators like Yacc/Bison, ANTLR, etc. or e.g. parser combinators in Haskell, Scala, ML.

Example: the token stream T_WHILE '(' T_VARIABLE("x") T_IS_EQUAL T_VARIABLE("y") ')' compiled directly to bytecode

 0: ZEND_IS_EQUAL ~0 !0 !1
 1: ZEND_JMPZ ~0 ->3
 2: …
 3: …
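As a rough illustration of "structuring the token stream", here is a hand-written recursive-descent parser sketch in Python that takes the tree-building route. It is an assumption-laden toy: the tiny grammar in the docstrings, the tuple-based tree nodes and the helper names (`peek`, `expect`, `parse_expr`, `parse_statement`) are invented for this example, and PHP's real parser is generated by Bison rather than written by hand.

```python
def peek(tokens, pos):
    """Return the kind of the token at pos, or "EOF" past the end."""
    return tokens[pos][0] if pos < len(tokens) else "EOF"


def expect(tokens, pos, kind):
    """Check that the token at pos has the given kind and step past it."""
    if peek(tokens, pos) != kind:
        raise SyntaxError(f"expected {kind}, got {peek(tokens, pos)}")
    return pos + 1


def parse_expr(tokens, pos):
    """expr := T_VARIABLE T_IS_EQUAL T_VARIABLE  (toy grammar)"""
    expect(tokens, pos, "T_VARIABLE")
    left = ("var", tokens[pos][1])
    pos = expect(tokens, pos + 1, "T_IS_EQUAL")
    expect(tokens, pos, "T_VARIABLE")
    right = ("var", tokens[pos][1])
    return ("==", left, right), pos + 1


def parse_statement(tokens, pos=0):
    """statement := T_WHILE '(' expr ')' statement | ';'  (toy grammar)"""
    if peek(tokens, pos) == ";":
        return ("empty",), pos + 1
    pos = expect(tokens, pos, "T_WHILE")
    pos = expect(tokens, pos, "(")
    condition, pos = parse_expr(tokens, pos)
    pos = expect(tokens, pos, ")")
    body, pos = parse_statement(tokens, pos)
    return ("while", condition, body), pos


# Token stream for: while ($x == $y) ;
TOKENS = [
    ("T_WHILE", None),
    ("(", None),
    ("T_VARIABLE", "x"),
    ("T_IS_EQUAL", None),
    ("T_VARIABLE", "y"),
    (")", None),
    (";", None),
]

tree, next_pos = parse_statement(TOKENS)
print(tree)
```

The flat token list goes in and a nested tuple comes out: the condition and the body are now children of the while node, which is exactly the structure the scanner could not provide.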
Compilers 101: Context-free grammars

• Or simply “grammar”
• A grammar describes the complete syntax of a (programming) language.
• Usually expressed in Extended Backus-Naur Form (EBNF)
 • Or some variant thereof.
• Variants of EBNF are used by a lot of DSL-based parser generators
 • e.g. Yacc/Bison, ANTLR, etc.
Generalized Compiler Architecture*

Source files -> (source code) -> Scanner -> (token stream) -> Parser -> (Abstract Syntax Tree) -> Code Generator -> (bytecode) -> Bytecode Interpreter

* Actually a generalized *bytecode* compiler architecture
Generalized *PHP* Compiler Architecture
Source files -> (source code) -> Scanner -> (token stream) -> Parser -> (Abstract Syntax Tree) -> Code Generator -> (bytecode) -> Bytecode Interpreter

• Scanner: Zend/zend_language_scanner.l
• Parser: Zend/zend_language_parser.y
• Code Generator: Zend/zend_compile.c
• Bytecode Interpreter: Zend/zend_execute.c
• No Abstract Syntax Tree step: PHP compiles directly to bytecode!
Case Study: The “until” statement

It’s basically while (!...) ...

           <?php

           $x = 5;

           until ($x == 0) {
             $x--;
             echo "Oh hi, Mark [$x]\n";
           }

           -- output --

           Oh hi, Mark [4]
           Oh hi, Mark [3]
           Oh hi, Mark [2]
           Oh hi, Mark [1]
           Oh hi, Mark [0]
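To double-check the "basically while (!...)" claim against the output listing, here is the same loop written as a negated while. Python is used only as a neutral stand-in since stock PHP has no until yet, and the function name `run_until_example` is made up for this sketch.

```python
def run_until_example():
    """Emulate: $x = 5; until ($x == 0) { $x--; echo "Oh hi, Mark [$x]\n"; }"""
    lines = []
    x = 5
    while not (x == 0):  # until (cond) behaves like while (!cond)
        x -= 1
        lines.append(f"Oh hi, Mark [{x}]")
    return lines


print("\n".join(run_until_example()))
```

The condition is checked before every iteration, so the body runs five times and the last line printed is the one for 0.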
How to add “until” to the PHP language

1. Tell the scanner how to tokenize new keyword(s)
2. Describe the syntax of the new construct
3. Emit bytecode
Before you start...

• You’ll need the usual gcc toolchain, GNU Bison, etc.
  • Debian/Ubuntu: apt-get install build-essential
  • OSX: Xcode command line tools should give you most of what you need.

• Also ensure that you have re2c
  • Debian/Ubuntu: apt-get install re2c
  • OSX (Homebrew): brew install re2c
  • Used to generate the scanner
  • Silently ignored if not found by the configure script!

• And, of course, source code for some recent version of PHP 5.
  • I’m working with PHP 5.4.4
1. Tell the scanner how to tokenize “until”

• Zend/zend_language_scanner.l
  • Input for re2c, which will generate the Zend language scanner.
  • Describes how raw source code should be converted into tokens.
  • Note that no structure is implied here: that’s the parser’s job.

• Tell the scanner that the word “until” is special.
• The parser also needs to know about new tokens!
• How is this done for the while keyword?

Example: until ($x == $y) should be scanned into T_UNTIL '(' T_VARIABLE("x") T_IS_EQUAL T_VARIABLE("y") ')'
2. Describe the syntax of “until”

• Zend/zend_language_parser.y
  • Essentially serves as the grammar for the Zend language.
  • Also describes actions to perform during parsing.
  • Input for the parser generator (Bison) used to generate the PHP parser.

• Tell PHP how until statements are structured syntactically.
• How was it done for a while statement?

Syntax: T_UNTIL '(' expr ')' statement
3. Emit bytecode

• Add actions to Zend/zend_language_parser.y
 • What should they do?

• Recall that PHP generates bytecode during the parsing process.
• Generate bytecode describing the semantics of until in terms of the PHP VM.
 • Er, wait -- what bytecode do we need to generate?

Compiler -> Bytecode
Intermission: PHP bytecode intro

• opline: <opcode> <result?> <op1?> <op2?>
 • Data structure representing a single line of PHP VM “assembly”
 • Includes opcode + operands
 • opline # associated with each opline

• Different variable types, differentiated by prefix:
 • Variables ($)
 • Compiled variables (!)
 • Temporary variables (~)

• ZEND_JMP
 • “goto”
 • Conditional variants: ZEND_JMPZ, ZEND_JMPNZ
 • opline #s used as address operand for JMP instructions (->)

ZEND_JMP <op1>
 Unconditional jump to the opline # in op1
 e.g. jump to opline #10: ZEND_JMP ->10

ZEND_JMPZ <op1> <op2>
 Conditional jump to the opline # in op2 iff op1 is zero
 e.g. jump to opline #3 if ~0 is zero: ZEND_JMPZ ~0 ->3

ZEND_IS_EQUAL <result> <op1> <op2>
 result=1 if op1 == op2, otherwise result=0
 e.g. set ~0=1 if !0 == 10: ZEND_IS_EQUAL ~0 !0 10
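If it helps to see oplines actually execute, here is a toy interpreter sketch in Python. It is a big simplification and not the Zend VM: each opline is a plain `(opcode, result, op1, op2)` tuple, operands are strings like `"!0"` and `"~0"` or literal integers, and jump targets are bare opline numbers. ZEND_PRE_DEC and ZEND_ECHO are real opcode names, but their behaviour here is reduced to "subtract one" and "collect the value in a list" purely for illustration.

```python
def run(oplines, compiled_vars):
    """Execute toy oplines; returns everything passed to ZEND_ECHO."""
    cv = dict(compiled_vars)  # compiled variables, e.g. {"!0": 1}
    tmp = {}                  # temporary variables, e.g. {"~0": 0}
    output = []

    def read(operand):
        # "!n" is a compiled variable, "~n" a temporary, anything else a literal.
        if isinstance(operand, str):
            return cv[operand] if operand.startswith("!") else tmp[operand]
        return operand

    pc = 0  # opline number of the next opline to execute
    while pc < len(oplines):
        opcode, result, op1, op2 = oplines[pc]
        if opcode == "ZEND_IS_EQUAL":
            tmp[result] = 1 if read(op1) == read(op2) else 0
        elif opcode == "ZEND_JMP":
            pc = op1
            continue
        elif opcode == "ZEND_JMPZ":
            if read(op1) == 0:
                pc = op2
                continue
        elif opcode == "ZEND_JMPNZ":
            if read(op1) != 0:
                pc = op2
                continue
        elif opcode == "ZEND_PRE_DEC":
            cv[op1] -= 1
        elif opcode == "ZEND_ECHO":
            output.append(read(op1))
        else:
            raise ValueError(f"unknown opcode {opcode}")
        pc += 1
    return output


# Roughly: while ($x == $y) { --$x; echo $x; }  with $x as !0 and $y as !1
WHILE_LOOP = [
    ("ZEND_IS_EQUAL", "~0", "!0", "!1"),  # 0: ~0 = (!0 == !1)
    ("ZEND_JMPZ", None, "~0", 5),         # 1: leave the loop if ~0 is zero
    ("ZEND_PRE_DEC", None, "!0", None),   # 2: loop body
    ("ZEND_ECHO", None, "!0", None),      # 3: loop body
    ("ZEND_JMP", None, 0, None),          # 4: back to the condition
]

print(run(WHILE_LOOP, {"!0": 1, "!1": 1}))
```

With both variables starting at 1 the body runs once, after which the comparison fails and ZEND_JMPZ jumps past the end of the opline list.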
Unconditional jump: ZEND_JMP


 0: ...
 1: ...
 2: ZEND_JMP ->0
Conditional jump: ZEND_JMPZ / ZEND_JMPNZ

 0: ...
 1: ...
 2: ZEND_JMPZ ~0 ->0
 3: ...
3. Emit bytecode (cont.)

• Zend/zend_compile.c
  • The Zend language’s code generation logic lives here.
  • No DSLs here: plain old C source code.

• First, let’s try to understand the bytecode for while
• How do we need to modify it for until?
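One way to think about the two questions above, sketched in Python rather than C: a while loop is "condition, conditional exit jump, body, jump back", and until only flips the exit jump from ZEND_JMPZ to ZEND_JMPNZ. This is a sketch under assumptions, not the code in Zend/zend_compile.c. The function names (`emit_loop`, `emit_while`, `emit_until`) and the `(opcode, result, op1, op2)` tuple layout are hypothetical, and the loop is assumed to start at opline 0 so that jump targets can be absolute.

```python
def emit_loop(cond_oplines, cond_result, body_oplines, exit_opcode):
    """Lay out: condition, conditional exit jump, body, jump back to condition."""
    oplines = list(cond_oplines)
    exit_index = len(oplines)
    oplines.append(None)  # placeholder: the exit target is not known yet
    oplines.extend(body_oplines)
    oplines.append(("ZEND_JMP", None, 0, None))  # back to the condition at opline 0
    end = len(oplines)  # first opline # after the loop
    oplines[exit_index] = (exit_opcode, None, cond_result, end)  # fill in the target
    return oplines


def emit_while(cond_oplines, cond_result, body_oplines):
    # while: leave the loop when the condition is zero (false)
    return emit_loop(cond_oplines, cond_result, body_oplines, "ZEND_JMPZ")


def emit_until(cond_oplines, cond_result, body_oplines):
    # until: leave the loop when the condition is non-zero (true)
    return emit_loop(cond_oplines, cond_result, body_oplines, "ZEND_JMPNZ")


# until ($x == 0) { --$x; echo $x; }  with $x as compiled variable !0
CONDITION = [("ZEND_IS_EQUAL", "~0", "!0", 0)]
BODY = [
    ("ZEND_PRE_DEC", None, "!0", None),
    ("ZEND_ECHO", None, "!0", None),
]

for number, opline in enumerate(emit_until(CONDITION, "~0", BODY)):
    print(number, opline)
```

The placeholder trick is there because the exit jump has to be emitted before the body, at a point where the opline # just past the loop is still unknown; it gets filled in once the body and the backward ZEND_JMP have been emitted.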
Demo!

• Time to build!
 • The usual ./configure && make dance on Linux & OSX.

• To be thorough, regenerate data used by the tokenizer extension:
 (cd ext/tokenizer && ./tokenizer_data_gen.sh)
 • http://php.net/manual/en/book.tokenizer.php
 • You’ll need to run make again once you’ve done this.

• With a little luck, magic happens and you get a binary in sapi/cli/php
• Take until out for a spin!
And exhale.

• Lots to take in, right?
 • In my experience, this stuff is best learned bit-by-bit through practice.

• Ask questions!
 • Google
 • php-internals
 • Or hey, ask me...
Thanks!




oscon@tomlee.co    @tglee

http://newrelic.com

... and come see Inside Python @ 5pm in D135 :)

Mais conteúdo relacionado

Mais procurados

Mais procurados (14)

Your Own Metric System
Your Own Metric SystemYour Own Metric System
Your Own Metric System
 
Lexing and parsing
Lexing and parsingLexing and parsing
Lexing and parsing
 
Php extensions
Php extensionsPhp extensions
Php extensions
 
Python - Introduction
Python - IntroductionPython - Introduction
Python - Introduction
 
Python Workshop
Python WorkshopPython Workshop
Python Workshop
 
Os Goodger
Os GoodgerOs Goodger
Os Goodger
 
Python idiomatico
Python idiomaticoPython idiomatico
Python idiomatico
 
Python Foundation – A programmer's introduction to Python concepts & style
Python Foundation – A programmer's introduction to Python concepts & stylePython Foundation – A programmer's introduction to Python concepts & style
Python Foundation – A programmer's introduction to Python concepts & style
 
Python by Rj
Python by RjPython by Rj
Python by Rj
 
Introduction to Python
Introduction to Python Introduction to Python
Introduction to Python
 
name name2 n2
name name2 n2name name2 n2
name name2 n2
 
ppt18
ppt18ppt18
ppt18
 
name name2 n2.ppt
name name2 n2.pptname name2 n2.ppt
name name2 n2.ppt
 
ppt9
ppt9ppt9
ppt9
 

Destaque

What's New In PHP7
What's New In PHP7What's New In PHP7
What's New In PHP7Petra Barus
 
PHP 7 - Above and Beyond
PHP 7 - Above and BeyondPHP 7 - Above and Beyond
PHP 7 - Above and Beyondrafaelfqf
 
The Php Life Cycle
The Php Life CycleThe Php Life Cycle
The Php Life CycleXinchen Hui
 
AST - the only true tool for building JavaScript
AST - the only true tool for building JavaScriptAST - the only true tool for building JavaScript
AST - the only true tool for building JavaScriptIngvar Stepanyan
 
PHP7 - For Its Best Performance
PHP7 - For Its Best PerformancePHP7 - For Its Best Performance
PHP7 - For Its Best PerformanceXinchen Hui
 
PHP7 - The New Engine for old good train
PHP7 - The New Engine for old good trainPHP7 - The New Engine for old good train
PHP7 - The New Engine for old good trainXinchen Hui
 
The secret of PHP7's Performance
The secret of PHP7's Performance The secret of PHP7's Performance
The secret of PHP7's Performance Xinchen Hui
 
PHP 7 Crash Course - php[world] 2015
PHP 7 Crash Course - php[world] 2015PHP 7 Crash Course - php[world] 2015
PHP 7 Crash Course - php[world] 2015Colin O'Dell
 
Quick tour of PHP from inside
Quick tour of PHP from insideQuick tour of PHP from inside
Quick tour of PHP from insidejulien pauli
 
Climbing the Abstract Syntax Tree (Bulgaria PHP 2016)
Climbing the Abstract Syntax Tree (Bulgaria PHP 2016)Climbing the Abstract Syntax Tree (Bulgaria PHP 2016)
Climbing the Abstract Syntax Tree (Bulgaria PHP 2016)James Titcumb
 

Destaque (14)

ElePHPant7 - Introduction to PHP7
ElePHPant7 - Introduction to PHP7ElePHPant7 - Introduction to PHP7
ElePHPant7 - Introduction to PHP7
 
Sllideshare
SllideshareSllideshare
Sllideshare
 
TDC SP 2015 - PHP7: better & faster
TDC SP 2015 - PHP7: better & fasterTDC SP 2015 - PHP7: better & faster
TDC SP 2015 - PHP7: better & faster
 
What's New In PHP7
What's New In PHP7What's New In PHP7
What's New In PHP7
 
PHP 7 - Above and Beyond
PHP 7 - Above and BeyondPHP 7 - Above and Beyond
PHP 7 - Above and Beyond
 
The Php Life Cycle
The Php Life CycleThe Php Life Cycle
The Php Life Cycle
 
AST - the only true tool for building JavaScript
AST - the only true tool for building JavaScriptAST - the only true tool for building JavaScript
AST - the only true tool for building JavaScript
 
PHP7 - For Its Best Performance
PHP7 - For Its Best PerformancePHP7 - For Its Best Performance
PHP7 - For Its Best Performance
 
PHP7 - The New Engine for old good train
PHP7 - The New Engine for old good trainPHP7 - The New Engine for old good train
PHP7 - The New Engine for old good train
 
The secret of PHP7's Performance
The secret of PHP7's Performance The secret of PHP7's Performance
The secret of PHP7's Performance
 
PHP 7 Crash Course - php[world] 2015
PHP 7 Crash Course - php[world] 2015PHP 7 Crash Course - php[world] 2015
PHP 7 Crash Course - php[world] 2015
 
PHP7 is coming
PHP7 is comingPHP7 is coming
PHP7 is coming
 
Quick tour of PHP from inside
Quick tour of PHP from insideQuick tour of PHP from inside
Quick tour of PHP from inside
 
Climbing the Abstract Syntax Tree (Bulgaria PHP 2016)
Climbing the Abstract Syntax Tree (Bulgaria PHP 2016)Climbing the Abstract Syntax Tree (Bulgaria PHP 2016)
Climbing the Abstract Syntax Tree (Bulgaria PHP 2016)
 

Semelhante a Inside PHP [OSCON 2012]

Introduction to compiler
Introduction to compilerIntroduction to compiler
Introduction to compilerAbha Damani
 
Embedding Languages Without Breaking Tools
Embedding Languages Without Breaking ToolsEmbedding Languages Without Breaking Tools
Embedding Languages Without Breaking ToolsLukas Renggli
 
Creating a Fibonacci Generator in Assembly - by Willem van Ketwich
Creating a Fibonacci Generator in Assembly - by Willem van KetwichCreating a Fibonacci Generator in Assembly - by Willem van Ketwich
Creating a Fibonacci Generator in Assembly - by Willem van KetwichWillem van Ketwich
 
Lecture 3 getting_started_with__c_
Lecture 3 getting_started_with__c_Lecture 3 getting_started_with__c_
Lecture 3 getting_started_with__c_eShikshak
 
Specialized Compiler for Hash Cracking
Specialized Compiler for Hash CrackingSpecialized Compiler for Hash Cracking
Specialized Compiler for Hash CrackingPositive Hack Days
 
Os Worthington
Os WorthingtonOs Worthington
Os Worthingtonoscon2007
 
College Project - Java Disassembler - Description
College Project - Java Disassembler - DescriptionCollege Project - Java Disassembler - Description
College Project - Java Disassembler - DescriptionGanesh Samarthyam
 
HES2011 - James Oakley and Sergey bratus-Exploiting-the-Hard-Working-DWARF
HES2011 - James Oakley and Sergey bratus-Exploiting-the-Hard-Working-DWARFHES2011 - James Oakley and Sergey bratus-Exploiting-the-Hard-Working-DWARF
HES2011 - James Oakley and Sergey bratus-Exploiting-the-Hard-Working-DWARFHackito Ergo Sum
 
Lex tool manual
Lex tool manualLex tool manual
Lex tool manualSami Said
 

Semelhante a Inside PHP [OSCON 2012] (20)

Inside Python
Inside PythonInside Python
Inside Python
 
Unit 1 cd
Unit 1 cdUnit 1 cd
Unit 1 cd
 
1 cc
1 cc1 cc
1 cc
 
C tutorial
C tutorialC tutorial
C tutorial
 
Introduction to compiler
Introduction to compilerIntroduction to compiler
Introduction to compiler
 
Embedding Languages Without Breaking Tools
Embedding Languages Without Breaking ToolsEmbedding Languages Without Breaking Tools
Embedding Languages Without Breaking Tools
 
Creating a Fibonacci Generator in Assembly - by Willem van Ketwich
Creating a Fibonacci Generator in Assembly - by Willem van KetwichCreating a Fibonacci Generator in Assembly - by Willem van Ketwich
Creating a Fibonacci Generator in Assembly - by Willem van Ketwich
 
Lecture 3 getting_started_with__c_
Lecture 3 getting_started_with__c_Lecture 3 getting_started_with__c_
Lecture 3 getting_started_with__c_
 
Specialized Compiler for Hash Cracking
Specialized Compiler for Hash CrackingSpecialized Compiler for Hash Cracking
Specialized Compiler for Hash Cracking
 
Xtext Webinar
Xtext WebinarXtext Webinar
Xtext Webinar
 
Os Worthington
Os WorthingtonOs Worthington
Os Worthington
 
College Project - Java Disassembler - Description
College Project - Java Disassembler - DescriptionCollege Project - Java Disassembler - Description
College Project - Java Disassembler - Description
 
HES2011 - James Oakley and Sergey bratus-Exploiting-the-Hard-Working-DWARF
HES2011 - James Oakley and Sergey bratus-Exploiting-the-Hard-Working-DWARFHES2011 - James Oakley and Sergey bratus-Exploiting-the-Hard-Working-DWARF
HES2011 - James Oakley and Sergey bratus-Exploiting-the-Hard-Working-DWARF
 
Writing Parsers and Compilers with PLY
Writing Parsers and Compilers with PLYWriting Parsers and Compilers with PLY
Writing Parsers and Compilers with PLY
 
C tutorial
C tutorialC tutorial
C tutorial
 
Xtext Webinar
Xtext WebinarXtext Webinar
Xtext Webinar
 
Assembler
AssemblerAssembler
Assembler
 
7986-lect 7.pdf
7986-lect 7.pdf7986-lect 7.pdf
7986-lect 7.pdf
 
Lex tool manual
Lex tool manualLex tool manual
Lex tool manual
 
LANGUAGE TRANSLATOR
LANGUAGE TRANSLATORLANGUAGE TRANSLATOR
LANGUAGE TRANSLATOR
 

Último

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 

Último (20)

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 

Inside PHP [OSCON 2012]

  • 1. Inside PHP Tom Lee @tglee OSCON 2012 19th July, 2012
  • 2. Overview • About me! • New Relic’s PHP Agent escapee. • Now on New Projects, doing unspeakably un-PHP things. • Wannabe compiler nerd. • Terminology & brief intro to compilers: • Grammars, Scanners & Parsers • General architecture of a bytecode compiler • Hands on: Modifying the PHP language • PHP/Zend compiler architecture & summary • Case study in adding a new keyword
  • 3. “Zend” vs. “Zend Engine” vs. “PHP” •I will use all of these interchangeably throughout this talk. • Referring to the bytecode compiler in the “Zend Engine 2” in most cases. • The distinction doesn’t really matter here.
  • 4. Compilers 101: Scanners • Or lexical analyzers, or tokenizers T_WHILE • Input: raw source code '(' • Output: a stream of tokens T_VARIABLE("x") while ($x == $y) T_IS_EQUAL T_VARIABLE("y") ')'
  • 5. Compilers 101: Parsers • Input: a stream of tokens from the scanner • Output is implementation dependent • Often an intermediate, in-memory representation of the program in tree form. • e.g. Parse Tree or Abstract Syntax Tree • Or directly generate bytecode. • Goal of a parser is to structure the token stream. • Parsers are frequently generated from a DSL • See parser generators like Yacc/Bison, ANTLR, etc. or e.g. parser combinators in Haskell, Scala, ML. • Example: T_WHILE '(' T_VARIABLE("x") T_IS_EQUAL T_VARIABLE("y") ')' → 0: ZEND_IS_EQUAL ~0 !0 !1 1: ZEND_JMPZ ~0 ->3 2: … 3: …
  • 6. Compilers 101: Context-free grammars • Or simply “grammar” • A grammar describes the complete syntax of a (programming) language. • Usually expressed in Extended Backus-Naur Form (EBNF) • Or some variant thereof. • Variants of EBNF used for a lot of DSL-based parser generators • e.g. Yacc/Bison, ANTLR, etc.
  • 7. Generalized Compiler Architecture* • Source files → (source code) → Scanner → (token stream) → Parser → (abstract syntax tree) → Code Generator → (bytecode) → Bytecode Interpreter • * Actually a generalized *bytecode* compiler architecture
  • 8. Generalized *PHP* Compiler Architecture • Source files → (source code) → Scanner [Zend/zend_language_scanner.l] → (token stream) → Parser [Zend/zend_language_parser.y] → Code Generator [Zend/zend_compile.c] → (bytecode) → Bytecode Interpreter [Zend/zend_execute.c] • No Abstract Syntax Tree stage: PHP compiles directly to bytecode!
  • 9. Case Study: The “until” statement • It’s basically while (!...) ... • <?php $x = 5; until ($x == 0) { $x--; echo "Oh hi, Mark [$x]\n"; } • -- output -- Oh hi, Mark [4] Oh hi, Mark [3] Oh hi, Mark [2] Oh hi, Mark [1] Oh hi, Mark [0]
  • 10. How to add “until” to the PHP language 1. Tell the scanner how to tokenize new keyword(s) 2. Describe the syntax of the new construct 3. Emit bytecode
  • 11. Before you start... • You’ll need the usual gcc toolchain, GNU Bison, etc. • Debian/Ubuntu: apt-get install build-essential • OSX: Xcode command line tools should give you most of what you need. • Also ensure that you have re2c • Debian/Ubuntu: apt-get install re2c • OSX (Homebrew): brew install re2c • Used to generate the scanner • Silently ignored if not found by the configure script! • And, of course, source code for some recent version of PHP 5. • I’m working with PHP 5.4.4
  • 12. 1. Tell the scanner how to tokenize “until” • Zend/zend_language_scanner.l • Input for re2c, which will generate the Zend language scanner. • Describes how raw source code should be converted into tokens. • Note that no structure is implied here: that’s the parser’s job. • Tell the scanner that the word “until” is special. • The parser also needs to know about new tokens! • How is this done for the while keyword? • Example: until ($x == $y) → T_UNTIL '(' T_VARIABLE("x") T_IS_EQUAL T_VARIABLE("y") ')'
  • 13. 2. Describe the syntax of “until” • Zend/zend_language_parser.y • Essentially serves as the grammar for the Zend language. • Also describes actions to perform during parsing. • Input for the parser generator (Bison) used to generate the PHP parser. • Tell PHP how until statements are structured syntactically. • How was it done for a while statement? • Target syntax: T_UNTIL '(' expr ')' statement
  • 14. 3. Emit bytecode • Add actions to Zend/zend_language_parser.y • What should they do? • Recall that PHP generates bytecode during the parsing process. • Generate bytecode describing the semantics of until in terms of the PHP VM. • Er, wait -- what bytecode do we need to generate? • Compiler → Bytecode
  • 15. Intermission: PHP bytecode intro • opline: <opcode> <result?> <op1?> <op2?> • Data structure representing a single line of PHP VM “assembly” • Includes opcode + operands • opline # associated with each opline • Different variable types, differentiated by prefix: • Variables ($) • Compiled variables (!) • Temporary variables (~) • ZEND_JMP • “goto” • Conditional variants: ZEND_JMPZ, ZEND_JMPNZ • opline #s used as address operand for JMP instructions (->) • ZEND_JMP <op1>: unconditional jump to the opline # in op1, e.g. jump to opline #10: ZEND_JMP ->10 • ZEND_JMPZ <op1> <op2>: conditional jump to the opline # in op2 iff op1 is zero, e.g. jump to opline #3 if ~0 is zero: ZEND_JMPZ ~0 ->3 • ZEND_IS_EQUAL <result> <op1> <op2>: result=1 if op1 == op2, otherwise result=0, e.g. set ~0=1 if !0 == 10: ZEND_IS_EQUAL ~0 !0 10
  • 16-19. Unconditional jump: ZEND_JMP (animation frames, identical text) 0: ... 1: ... 2: ZEND_JMP ->0
  • 20-26. Conditional jump: ZEND_JMPZ / ZEND_JMPNZ (animation frames, identical text) 0: ... 1: ... 2: ZEND_JMPZ ~0 ->0 3: ...
  • 27. 3. Emit bytecode (cont.) • Zend/zend_compile.c • The Zend language’s code generation logic lives here. • No DSLs here: plain old C source code. • First, let’s try to understand the bytecode for while • How do we need to modify it for until?
  • 28. Demo! • Time to build! • The usual ./configure && make dance on Linux & OSX. • To be thorough, regenerate data used by the tokenizer extension: (cd ext/tokenizer && ./tokenizer_data_gen.sh) • http://php.net/manual/en/book.tokenizer.php • You’ll need to run make again once you’ve done this. • With a little luck, magic happens and you get a binary in sapi/cli/php • Take until out for a spin!
  • 29. And exhale. • Lots to take in, right? • In my experience, this stuff is best learned bit-by-bit through practice. • Ask questions! • Google • php-internals • Or hey, ask me...
  • 30. Thanks! oscon@tomlee.co @tglee http://newrelic.com ... and come see Inside Python @ 5pm in D135 :)
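
The scanner step from slides 4 and 12 can be sketched in plain C. This is only a hand-written toy, not the re2c input in Zend/zend_language_scanner.l: the `toy_` names and the token enum are hypothetical, and unlike PHP it matches keywords case-sensitively. It shows the one idea the slides rely on, namely that a keyword such as until is just an identifier the scanner has been told to treat as special.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds; the real T_* constants come from the Bison grammar. */
typedef enum { TOY_EOF, TOY_WHILE, TOY_UNTIL, TOY_VARIABLE, TOY_IS_EQUAL,
               TOY_STRING, TOY_CHAR } toy_token;

/*
 * Scan one token starting at *src, advance *src past it, and return its kind.
 * For TOY_CHAR tokens such as '(' and ')', *ch receives the character itself.
 */
toy_token toy_next_token(const char **src, char *ch)
{
    const char *p = *src;

    while (isspace((unsigned char)*p)) p++;            /* skip whitespace */

    if (*p == '\0') { *src = p; return TOY_EOF; }

    /* $name: a variable */
    if (*p == '$' && (isalpha((unsigned char)p[1]) || p[1] == '_')) {
        p++;
        while (isalnum((unsigned char)*p) || *p == '_') p++;
        *src = p;
        return TOY_VARIABLE;
    }

    if (p[0] == '=' && p[1] == '=') { *src = p + 2; return TOY_IS_EQUAL; }

    /* A bare word: either a keyword or an ordinary identifier. */
    if (isalpha((unsigned char)*p) || *p == '_') {
        const char *start = p;
        size_t len;
        while (isalnum((unsigned char)*p) || *p == '_') p++;
        len = (size_t)(p - start);
        *src = p;
        if (len == 5 && strncmp(start, "while", 5) == 0) return TOY_WHILE;
        if (len == 5 && strncmp(start, "until", 5) == 0) return TOY_UNTIL;
        return TOY_STRING;
    }

    /* Anything else is passed through as a single-character token. */
    *ch = *p;
    *src = p + 1;
    return TOY_CHAR;
}
```

Calling `toy_next_token` repeatedly on `until ($x == $y)` yields the same shape of token stream as the slide 12 diagram. No structure is implied by that stream; grouping the tokens into a statement is still the parser's job.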

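The opline notation from slide 15 and the question on slide 27 ("how do we need to modify it for until?") can also be sketched in C. This is a toy interpreter under stated assumptions, not the Zend VM: the `toy_` opcodes, the operand encoding (slot indexes into an int array) and the `TOY_DEC` and `TOY_HALT` opcodes are all hypothetical. It assumes a while loop lowers to a condition followed by a jump-if-zero past the body, so one plausible lowering for until uses the jump-if-not-zero variant instead.

```c
/* Hypothetical opcodes named after those on slide 15; not the real Zend constants. */
typedef enum { TOY_IS_EQUAL, TOY_JMP, TOY_JMPZ, TOY_JMPNZ, TOY_DEC, TOY_HALT } toy_opcode;

/* One opline: <opcode> <result?> <op1?> <op2?>. Operands are slot indexes
 * into a small variable array or, for jump targets, an opline number. */
typedef struct { toy_opcode opcode; int result, op1, op2; } toy_op;

/* Run oplines until TOY_HALT; returns how many oplines were executed. */
int toy_execute(const toy_op *ops, int *vars)
{
    int pc = 0, steps = 0;
    for (;;) {
        const toy_op *op = &ops[pc];
        steps++;
        switch (op->opcode) {
        case TOY_IS_EQUAL: vars[op->result] = (vars[op->op1] == vars[op->op2]); pc++; break;
        case TOY_DEC:      vars[op->op1]--; pc++; break;
        case TOY_JMP:      pc = op->op1; break;
        case TOY_JMPZ:     pc = (vars[op->op1] == 0) ? op->op2 : pc + 1; break;
        case TOY_JMPNZ:    pc = (vars[op->op1] != 0) ? op->op2 : pc + 1; break;
        case TOY_HALT:     return steps;
        }
    }
}

/* until ($x == $zero) { $x--; } with slots: 0 = $x, 1 = $zero, 2 = temporary ~0. */
const toy_op toy_until_loop[] = {
    /* 0 */ { TOY_IS_EQUAL, 2, 0, 1 },   /* ~0 = ($x == $zero)                 */
    /* 1 */ { TOY_JMPNZ,    0, 2, 4 },   /* leave the loop once ~0 is non-zero */
    /* 2 */ { TOY_DEC,      0, 0, 0 },   /* $x--                               */
    /* 3 */ { TOY_JMP,      0, 0, 0 },   /* jump back to opline 0              */
    /* 4 */ { TOY_HALT,     0, 0, 0 }
};
```

Running `toy_execute(toy_until_loop, vars)` with `vars = {5, 0, 0}` counts `$x` down to zero and then exits through opline 4. Swapping `TOY_JMPNZ` for `TOY_JMPZ` at opline 1 turns the same program into the while form, which is the only difference between the two loops in this sketch.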