(Do not be afraid of)

PHP Compiler Internals
          Sebastian Bergmann
                 August 23rd 2009
Sebastian Bergmann

   Co-Founder and
    Principal Consultant
    with thePHP.cc
   Creator of PHPUnit
   Involved in the PHP
    project since 2000
Under PHP's Hood


                                  Extensions

    (date, dom, gd, json, mysql, pcre, pdo, reflection, session, standard, …)



            PHP Core                                    Zend Engine

       Request Management                       Compilation and Execution
   File and Network Operations                Memory and Resource Allocation



                              Server API (SAPI)

                         (mod_php, FastCGI, CLI, ...)




                                                                 This slide contains material by Sara Golemon
How PHP executes code

   Lexical Analysis
      Scan the source for sequences of characters
      and convert them to a sequence of tokens
   Syntax Analysis
      Parse a sequence of tokens to determine
      their grammatical structure
   Bytecode Generation
      Generate bytecode based on the information
      gathered by analyzing the source
   Bytecode Execution
Lexical Analysis
Scan a sequence of characters
1   <?php
2   if (TRUE) {
3       print '*';
4   }
5   ?>
Lexical Analysis
Scan a sequence of characters
1 <?php                   T_OPEN_TAG
2 if (TRUE) {             T_IF
                          T_WHITESPACE
                          (
                          T_STRING
                          )
                          T_WHITESPACE
                          {
                          T_WHITESPACE
3      print '*';         T_PRINT
                          T_WHITESPACE
                          T_CONSTANT_ENCAPSED_STRING
                          ;
                          T_WHITESPACE
4 }                       }
                          T_WHITESPACE
5 ?>                      T_CLOSE_TAG
Lexical Analysis
Scan a sequence of characters
T_OPEN_TAG                    <?php
T_IF                          if
T_WHITESPACE
(
T_STRING                      TRUE
)
T_WHITESPACE
{
T_WHITESPACE
T_PRINT                       print
T_WHITESPACE
T_CONSTANT_ENCAPSED_STRING    '*'
;
T_WHITESPACE
}
T_WHITESPACE
T_CLOSE_TAG                   ?>
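To make the token stream above concrete, here is a minimal hand-written scanner sketch in C. It is an illustration only: PHP's real scanner is generated from a rule file, every toy_ name below is hypothetical, and the sketch recognizes just enough to tokenize the example script.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds; the names mirror PHP's tokens, the values do not. */
typedef enum {
    TOY_OPEN_TAG, TOY_CLOSE_TAG, TOY_IF, TOY_PRINT, TOY_STRING,
    TOY_CONSTANT_ENCAPSED_STRING, TOY_WHITESPACE, TOY_CHAR, TOY_END
} toy_token_kind;

typedef struct {
    toy_token_kind kind;
    const char *start;   /* first character of the lexeme */
    size_t length;       /* number of characters in the lexeme */
} toy_token;

static int starts_with(const char *s, const char *prefix) {
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Scan one token starting at *cursor and advance the cursor past it. */
toy_token toy_next_token(const char **cursor) {
    const char *p = *cursor;
    toy_token tok = { TOY_END, p, 0 };
    if (*p == '\0') return tok;

    if (starts_with(p, "<?php")) {
        tok.kind = TOY_OPEN_TAG;
        p += 5;
        /* Consume one trailing newline so that, as in the listing above,
           T_IF directly follows T_OPEN_TAG. */
        if (*p == '\n') p++;
    } else if (starts_with(p, "?>")) {
        tok.kind = TOY_CLOSE_TAG;
        p += 2;
    } else if (isspace((unsigned char)*p)) {
        tok.kind = TOY_WHITESPACE;
        while (isspace((unsigned char)*p)) p++;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        while (isalnum((unsigned char)*p) || *p == '_') p++;
        size_t n = (size_t)(p - tok.start);
        if (n == 2 && strncmp(tok.start, "if", 2) == 0) tok.kind = TOY_IF;
        else if (n == 5 && strncmp(tok.start, "print", 5) == 0) tok.kind = TOY_PRINT;
        else tok.kind = TOY_STRING;          /* any other word, e.g. TRUE */
    } else if (*p == '\'') {
        tok.kind = TOY_CONSTANT_ENCAPSED_STRING;
        p++;
        while (*p != '\0' && *p != '\'') p++;
        if (*p == '\'') p++;
    } else {
        tok.kind = TOY_CHAR;                 /* single characters such as ( ) { } ; */
        p++;
    }
    tok.length = (size_t)(p - tok.start);
    *cursor = p;
    return tok;
}

/* Store the token kinds of source in out[] and return how many were found. */
size_t toy_scan(const char *source, toy_token_kind *out, size_t capacity) {
    size_t count = 0;
    for (;;) {
        toy_token tok = toy_next_token(&source);
        if (tok.kind == TOY_END) break;
        assert(count < capacity);
        out[count++] = tok.kind;
    }
    return count;
}
```

Calling toy_scan() on the five-line example script yields 17 token kinds, in the same order as the listing above.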
Lexical Analysis
Scanner Generators
   You do not want to write a scanner by
    hand
      At least when the code for the scanner should
      be efficient and maintainable
   Tools such as flex or re2c generate the
    code for a scanner from a set of rules


    <ST_IN_SCRIPTING>"if" {
      return T_IF;
    }
Lexical Analysis
    PHP Tokens
    T_ABSTRACT           T_CONCAT_EQUAL                  T_ELSE                         T_FUNCTION

    T_AND_EQUAL          T_CONST                         T_ELSEIF                       T_FUNC_C

    T_ARRAY              T_CONSTANT_ENCAPSED_STRING      T_EMPTY                        T_GLOBAL

    T_ARRAY_CAST         T_CONTINUE                      T_ENCAPSED_AND_WHITESPACE      T_GOTO

    T_AS                 T_CURLY_OPEN                    T_ENDDECLARE                   T_HALT_COMPILER

    T_BAD_CHARACTER      T_DEC                           T_ENDFOR                       T_IF

    T_BOOLEAN_AND        T_DECLARE                       T_ENDFOREACH                   T_IMPLEMENTS

    T_BOOLEAN_OR         T_DEFAULT                       T_ENDIF                        T_INC

    T_BOOL_CAST          T_DIR                           T_ENDSWITCH                    T_INCLUDE

    T_BREAK              T_DIV_EQUAL                     T_ENDWHILE                     T_INCLUDE_ONCE

    T_CASE               T_DNUMBER                       T_END_HEREDOC                  T_INLINE_HTML

    T_CATCH              T_DOC_COMMENT                   T_EVAL                         T_INSTANCEOF

    T_CHARACTER          T_DO                            T_EXIT                         T_INT_CAST

    T_CLASS              T_DOLLAR_OPEN_CURLY_BRACES      T_EXTENDS                      T_INTERFACE

    T_CLASS_C            T_DOUBLE_ARROW                  T_FILE                         T_ISSET

    T_CLONE              T_DOUBLE_CAST                   T_FINAL                        T_IS_EQUAL

    T_CLOSE_TAG          T_DOUBLE_COLON                  T_FOR                          T_IS_GREATER_OR_EQUAL

    T_COMMENT            T_ECHO                          T_FOREACH                      T_IS_IDENTICAL
Lexical Analysis
    PHP Tokens
    T_IS_NOT_EQUAL             T_OBJECT_CAST               T_SR_EQUAL

    T_IS_NOT_IDENTICAL         T_OBJECT_OPERATOR           T_START_HEREDOC

    T_IS_SMALLER_OR_EQUAL      T_OLD_FUNCTION              T_STATIC

    T_LINE                     T_OPEN_TAG                  T_STRING

    T_LIST                     T_OPEN_TAG_WITH_ECHO        T_STRING_CAST

    T_LNUMBER                  T_OR_EQUAL                  T_STRING_VARNAME

    T_LOGICAL_AND              T_PAAMAYIM_NEKUDOTAYIM      T_SWITCH

    T_LOGICAL_OR               T_PLUS_EQUAL                T_THROW

    T_LOGICAL_XOR              T_PRINT                     T_TRY

    T_METHOD_C                 T_PRIVATE                   T_UNSET

    T_MINUS_EQUAL              T_PUBLIC                    T_UNSET_CAST

    T_ML_COMMENT               T_PROTECTED                 T_USE

    T_MOD_EQUAL                T_REQUIRE                   T_VAR

    T_MUL_EQUAL                T_REQUIRE_ONCE              T_VARIABLE

    T_NAMESPACE                T_RETURN                    T_WHILE

    T_NS_C                     T_SL                        T_WHITESPACE

    T_NEW                      T_SL_EQUAL                  T_XOR_EQUAL

    T_NUM_STRING               T_SR
Syntax Analysis
Parse a sequence of tokens
   You do not want to write a parser by hand
      At least when the code for the parser should
      be efficient and maintainable
   Tools such as bison or lemon generate
    the code for a parser from a set of rules

     T_IF '(' expr ')' { ... }
     statement { ... }
     elseif_list else_single { ... }
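The rule above can be mirrored by a small hand-written recursive-descent recognizer. This is only a sketch under simplifying assumptions: PHP's real parser is generated from grammar rules, whitespace tokens are assumed to have been dropped already, expr is reduced to a single name or string literal, the elseif and else parts are omitted, and all p_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds; the input array must end with P_END. */
typedef enum {
    P_IF, P_PRINT, P_NAME, P_LITERAL,
    P_LPAREN, P_RPAREN, P_LBRACE, P_RBRACE, P_SEMICOLON, P_END
} p_token;

typedef struct {
    const p_token *tokens;
    size_t pos;
} p_parser;

/* Consume the current token if it is the expected one. */
static int p_accept(p_parser *p, p_token expected) {
    if (p->tokens[p->pos] == expected) {
        p->pos++;
        return 1;
    }
    return 0;
}

/* expr := NAME | LITERAL   (a drastic simplification) */
static int p_expr(p_parser *p) {
    return p_accept(p, P_NAME) || p_accept(p, P_LITERAL);
}

/* statement := IF '(' expr ')' statement
 *            | '{' statement* '}'
 *            | PRINT expr ';'                                  */
static int p_statement(p_parser *p) {
    if (p_accept(p, P_IF)) {
        return p_accept(p, P_LPAREN) && p_expr(p)
            && p_accept(p, P_RPAREN) && p_statement(p);
    }
    if (p_accept(p, P_LBRACE)) {
        while (p->tokens[p->pos] != P_RBRACE && p->tokens[p->pos] != P_END) {
            if (!p_statement(p)) return 0;
        }
        return p_accept(p, P_RBRACE);
    }
    if (p_accept(p, P_PRINT)) {
        return p_expr(p) && p_accept(p, P_SEMICOLON);
    }
    return 0;
}

/* Returns 1 if the token array consists of exactly one statement, else 0. */
int p_parse(const p_token *tokens) {
    assert(tokens != NULL);
    p_parser p = { tokens, 0 };
    return p_statement(&p) && p_accept(&p, P_END);
}
```

A generated parser does the same job from a declarative grammar, which is why adding a new statement later only requires a new rule rather than new control flow.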
PHP Bytecode
Using bytekit-cli to disassemble bytecode
1   <?php
2   if (TRUE) {
3       print '*';
4   }
5   ?>
 sb@thinkpad ~ % bytekit if.php
 bytekit-cli 1.0.0 by Sebastian Bergmann.

 Filename:            /home/sb/if.php
 Function:            main
 Number of oplines:   8

    line #      opcode                           result operands
    -----------------------------------------------------------------------------
    2     0     EXT_STMT
          1     JMPZ                                     true, ->6

    3     2     EXT_STMT
          3     PRINT                            ~0      '*'
          4     FREE                                     ~0
    4     5     JMP                                      ->6

    6     6     EXT_STMT
          7     RETURN                                   1
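How the eight oplines above behave at run time can be sketched with a toy interpreter loop. This is a model for illustration, not the Zend Engine's executor: the vm_ names are hypothetical, the condition of JMPZ is stored as a plain int, and EXT_STMT and FREE are treated as no-ops.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    VM_EXT_STMT, VM_JMPZ, VM_JMP, VM_PRINT, VM_FREE, VM_RETURN
} vm_opcode;

typedef struct {
    vm_opcode opcode;
    int condition;      /* JMPZ only: the already evaluated condition */
    size_t target;      /* JMPZ and JMP: index of the opline to continue at */
    const char *text;   /* PRINT only: the string to emit */
} vm_op;

enum { VM_IF_EXAMPLE_OPS = 8 };

/* Fill ops[0..7] with the opline sequence shown in the disassembly above. */
void vm_load_if_example(vm_op *ops, int condition) {
    ops[0] = (vm_op){ VM_EXT_STMT, 0, 0, NULL };
    ops[1] = (vm_op){ VM_JMPZ, condition, 6, NULL };
    ops[2] = (vm_op){ VM_EXT_STMT, 0, 0, NULL };
    ops[3] = (vm_op){ VM_PRINT, 0, 0, "*" };
    ops[4] = (vm_op){ VM_FREE, 0, 0, NULL };
    ops[5] = (vm_op){ VM_JMP, 0, 6, NULL };
    ops[6] = (vm_op){ VM_EXT_STMT, 0, 0, NULL };
    ops[7] = (vm_op){ VM_RETURN, 0, 0, NULL };
}

/* Execute ops until RETURN, appending PRINT output to out.
 * Returns the number of oplines that were executed. */
size_t vm_run(const vm_op *ops, size_t count, char *out, size_t out_size) {
    size_t pc = 0, executed = 0;
    assert(out_size > 0);
    out[0] = '\0';
    while (pc < count) {
        const vm_op *op = &ops[pc];
        executed++;
        switch (op->opcode) {
        case VM_JMPZ:    /* jump if the condition is zero (false) */
            pc = op->condition ? pc + 1 : op->target;
            break;
        case VM_JMP:     /* unconditional jump */
            pc = op->target;
            break;
        case VM_PRINT:
            assert(op->text != NULL);
            assert(strlen(out) + strlen(op->text) < out_size);
            strcat(out, op->text);
            pc++;
            break;
        case VM_RETURN:
            return executed;
        default:         /* EXT_STMT and FREE do nothing in this sketch */
            pc++;
            break;
        }
    }
    return executed;
}
```

With a true condition all eight oplines run and the output is a single asterisk; with a false condition JMPZ skips to opline 6, so only four oplines run and nothing is printed.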
PHP Bytecode
Using bytekit-cli to visualize bytecode
1   <?php
2   if (TRUE) {
3       print '*';
4   }
5   ?>
 sb@thinkpad ~ % bytekit --graph /tmp --format svg if.php
How if is compiled
Zend/zend_compile.c
void zend_do_if_cond
(const znode *cond, znode *closing_bracket_token TSRMLS_DC)
{
}

typedef struct _znode {
    int op_type;
    union {
        zval constant;

        zend_uint var;
        zend_uint opline_num;
        zend_op_array *op_array;
        zend_op *jmp_addr;
        struct {
            zend_uint var;
            zend_uint type;
        } EA;
    } u;
} znode;



zend_do_if_cond() is called when an if statement is compiled
How if is compiled
Zend/zend_compile.c
void zend_do_if_cond
(const znode *cond, znode *closing_bracket_token TSRMLS_DC)
{
  int if_cond_op_number =
    get_next_op_number(CG(active_op_array));
  zend_op *opline =
    get_next_op(CG(active_op_array) TSRMLS_CC);
}

struct _zend_op {
    opcode_handler_t handler;
    znode result;
    znode op1;
    znode op2;
    ulong extended_value;
    uint lineno;
    zend_uchar opcode;
};


Allocate a new opline in the current oparray
How if is compiled
Zend/zend_compile.c
void zend_do_if_cond
(const znode *cond, znode *closing_bracket_token TSRMLS_DC)
{
  int if_cond_op_number =
  get_next_op_number(CG(active_op_array));
  zend_op *opline =
  get_next_op(CG(active_op_array) TSRMLS_CC);

    opline->opcode = ZEND_JMPZ;




}


Set the opcode of the new opline to JMPZ (jump if zero)
How if is compiled
Zend/zend_compile.c
void zend_do_if_cond
(const znode *cond, znode *closing_bracket_token TSRMLS_DC)
{
  int if_cond_op_number =
  get_next_op_number(CG(active_op_array));
  zend_op *opline =
  get_next_op(CG(active_op_array) TSRMLS_CC);

    opline->opcode = ZEND_JMPZ;
    opline->op1    = *cond;




}


Set the first operand of the new opline to the if condition
How if is compiled
Zend/zend_compile.c
void zend_do_if_cond
(const znode *cond, znode *closing_bracket_token TSRMLS_DC)
{
  int if_cond_op_number =
  get_next_op_number(CG(active_op_array));
  zend_op *opline =
  get_next_op(CG(active_op_array) TSRMLS_CC);

    opline->opcode = ZEND_JMPZ;
    opline->op1    = *cond;
    closing_bracket_token->u.opline_num =
    if_cond_op_number;
    SET_UNUSED(opline->op2);
    INC_BPC(CG(active_op_array));
}


Perform bookkeeping tasks such as marking the second operand of the
new opline as unused or incrementing the backpatching counter for the
current oparray
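The role of if_cond_op_number becomes clearer with a small model of backpatching: the JMPZ is emitted before its target is known, its index is remembered, and the target is filled in once the body has been compiled. The sketch below is a simplified model with hypothetical bp_ names; it does not reproduce the Zend Engine's data structures or its backpatching counter.

```c
#include <assert.h>
#include <stddef.h>

enum { BP_MAX_OPS = 16 };

typedef enum { BP_JMPZ, BP_JMP, BP_PRINT, BP_RETURN } bp_opcode;

typedef struct {
    bp_opcode opcode;
    size_t target;      /* jump target; 0 until it has been patched */
} bp_op;

typedef struct {
    bp_op ops[BP_MAX_OPS];
    size_t count;
} bp_op_array;

/* Append a new opline and return its index. */
size_t bp_emit(bp_op_array *a, bp_opcode opcode) {
    assert(a->count < BP_MAX_OPS);
    a->ops[a->count].opcode = opcode;
    a->ops[a->count].target = 0;
    return a->count++;
}

/* Condition compiled: emit JMPZ with an unknown target and return its
 * index so that the caller can patch it later. */
size_t bp_if_cond(bp_op_array *a) {
    return bp_emit(a, BP_JMPZ);
}

/* Body compiled: emit the JMP that leaves the if statement and point the
 * earlier JMPZ at the opline right after that JMP. */
size_t bp_if_after_statement(bp_op_array *a, size_t jmpz_index) {
    size_t jmp_index = bp_emit(a, BP_JMP);
    a->ops[jmpz_index].target = a->count;
    return jmp_index;
}

/* Statement finished: the JMP now knows where the statement ends. */
void bp_if_end(bp_op_array *a, size_t jmp_index) {
    a->ops[jmp_index].target = a->count;
}
```

For an if without an else branch, both jumps end up pointing at the same opline, which matches the two ->6 targets in the disassembly shown earlier.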
PHP Bytecode
    PHP Opcodes
    NOP                   IS_NOT_EQUAL             POST_INC         ADD_VAR                 UNSET_DIM

    ADD                   IS_SMALLER               POST_DEC         BEGIN_SILENCE           UNSET_OBJ

    SUB                   IS_SMALLER_OR_EQUAL      ASSIGN           END_SILENCE             FE_RESET

    MUL                   CAST                     ASSIGN_REF       INIT_FCALL_BY_NAME      FE_FETCH

    DIV                   QM_ASSIGN                ECHO             DO_FCALL                EXIT

    MOD                   ASSIGN_ADD               PRINT            DO_FCALL_BY_NAME        FETCH_R

    SL                    ASSIGN_SUB               JMPZ             RETURN                  FETCH_DIM_R

    SR                    ASSIGN_MUL               JMPNZ            RECV                    FETCH_OBJ_R

    CONCAT                ASSIGN_DIV               JMPZNZ           RECV_INIT               FETCH_W

    BW_OR                 ASSIGN_MOD               JMPZ_EX          SEND_VAL                FETCH_DIM_W

    BW_AND                ASSIGN_SL                JMPNZ_EX         SEND_VAR                FETCH_OBJ_W

    BW_XOR                ASSIGN_SR                CASE             SEND_REF                FETCH_RW

    BW_NOT                ASSIGN_CONCAT            SWITCH_FREE      NEW                     FETCH_DIM_RW

    BOOL_NOT              ASSIGN_BW_OR             BRK              FREE                    FETCH_OBJ_RW

    BOOL_XOR              ASSIGN_BW_AND            BOOL             INIT_ARRAY              FETCH_IS

    IS_IDENTICAL          ASSIGN_BW_XOR            INIT_STRING      ADD_ARRAY_ELEMENT       FETCH_DIM_IS

    IS_NOT_IDENTICAL      PRE_INC                  ADD_CHAR         INCLUDE_OR_EVAL         FETCH_OBJ_IS

    IS_EQUAL              PRE_DEC                  ADD_STRING       UNSET_VAR               FETCH_FUNC_ARG
PHP Bytecode
    PHP Opcodes
    FETCH_DIM_FUNC_ARG      INIT_STATIC_METHOD_CALL

    FETCH_OBJ_FUNC_ARG      ISSET_ISEMPTY_VAR

    FETCH_UNSET             ISSET_ISEMPTY_DIM_OBJ

    FETCH_DIM_UNSET         PRE_INC_OBJ

    FETCH_OBJ_UNSET         PRE_DEC_OBJ

    FETCH_DIM_TMP_VAR       POST_INC_OBJ

    FETCH_CONSTANT          POST_DEC_OBJ

    EXT_STMT                ASSIGN_OBJ

    EXT_FCALL_BEGIN         INSTANCEOF

    EXT_FCALL_END           DECLARE_CLASS

    EXT_NOP                 DECLARE_INHERITED_CLASS

    TICKS                   DECLARE_FUNCTION

    SEND_VAR_NO_REF         RAISE_ABSTRACT_ERROR

    CATCH                   ADD_INTERFACE

    THROW                   VERIFY_ABSTRACT_CLASS

    FETCH_CLASS             ASSIGN_DIM

    CLONE                   ISSET_ISEMPTY_PROP_OBJ

    INIT_METHOD_CALL        HANDLE_EXCEPTION
Extending the PHP Compiler
Test First!
--TEST--
unless statement
--FILE--
<?php
unless (FALSE) {
    print 'unless FALSE is TRUE, this is printed';
}

unless (TRUE) {
    print 'unless TRUE is TRUE, this is printed';
}
?>
--EXPECT--
unless FALSE is TRUE, this is printed
Extending the PHP Compiler

   Add token for unless to the scanner
   Add rule for unless to the parser
   Implement bytecode generation for
    unless in the compiler
   Add token for unless to ext/tokenizer
Add unless scanner token
Zend/zend_language_parser.y
%token   T_NAMESPACE
%token   T_NS_C
%token   T_DIR
%token   T_NS_SEPARATOR
%token   T_UNLESS
Add unless scanner token
Zend/zend_language_scanner.l
<ST_IN_SCRIPTING>"if" {
   return T_IF;
}

<ST_IN_SCRIPTING>"unless" {
   return T_UNLESS;
}

<ST_IN_SCRIPTING>"elseif" {
   return T_ELSEIF;
}

<ST_IN_SCRIPTING>"endif" {
   return T_ENDIF;
}

<ST_IN_SCRIPTING>"else" {
   return T_ELSE;
}
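What these rules express can be approximated by a lookup table from keyword to token. The sketch is a hand-written stand-in for the generated scanner, with hypothetical kw_ names; unlike PHP, it matches keywords case-sensitively, and it ignores scanner states such as ST_IN_SCRIPTING.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token values for this sketch. */
typedef enum { KW_NONE, KW_IF, KW_UNLESS, KW_ELSEIF, KW_ENDIF, KW_ELSE } kw_token;

typedef struct {
    const char *lexeme;
    kw_token token;
} kw_rule;

/* One entry per scanner rule; supporting unless is a one-line addition. */
static const kw_rule kw_rules[] = {
    { "if",     KW_IF },
    { "unless", KW_UNLESS },
    { "elseif", KW_ELSEIF },
    { "endif",  KW_ENDIF },
    { "else",   KW_ELSE },
};

/* Return the token for a complete word, or KW_NONE if it is not a keyword. */
kw_token kw_lookup(const char *word) {
    assert(word != NULL);
    for (size_t i = 0; i < sizeof kw_rules / sizeof kw_rules[0]; i++) {
        if (strcmp(word, kw_rules[i].lexeme) == 0) {
            return kw_rules[i].token;
        }
    }
    return KW_NONE;
}
```

Because whole words are compared, else and elseif cannot be confused with each other here; a generated scanner resolves such overlaps through its own matching rules.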
Add unless parser rule
Zend/zend_language_parser.y
unticked_statement:
   '{' inner_statement_list '}'
 | T_IF '(' expr ')' {
 .
 .
 | T_UNLESS '(' expr ')' {
    zend_do_unless_cond(&$3, &$4 TSRMLS_CC);
 } statement {
    zend_do_if_after_statement(&$4, 1 TSRMLS_CC);
 } {
    zend_do_if_end(TSRMLS_C);
 }
Add unless to the compiler
Zend/zend_compile.c
void zend_do_unless_cond
(const znode *cond, znode *closing_bracket_token TSRMLS_DC)
{
  int unless_cond_op_number =
  get_next_op_number(CG(active_op_array));
  zend_op *opline =
  get_next_op(CG(active_op_array) TSRMLS_CC);

    opline->opcode = ZEND_JMPNZ;
    opline->op1    = *cond;
    closing_bracket_token->u.opline_num =
    unless_cond_op_number;
    SET_UNUSED(opline->op2);
    INC_BPC(CG(active_op_array));
}


All we have to do to generate code for the unless statement,
compared to generating code for the if statement, is to emit
JMPNZ (jump if not zero) instead of JMPZ (jump if zero)
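The effect of swapping the opcode can be checked with a few lines of C. This is a sketch with hypothetical names (cond_opcode, cond_jump_taken, body_runs); it models only the jump decision, not the engine's opcode handlers.

```c
#include <assert.h>

typedef enum { COND_JMPZ, COND_JMPNZ } cond_opcode;

/* Returns 1 when the conditional jump is taken, which skips the body. */
int cond_jump_taken(cond_opcode opcode, int condition) {
    assert(opcode == COND_JMPZ || opcode == COND_JMPNZ);
    if (opcode == COND_JMPZ) {
        return !condition;      /* jump if zero */
    }
    return !!condition;         /* jump if not zero */
}

/* The statement body runs exactly when the jump is not taken. */
int body_runs(cond_opcode opcode, int condition) {
    return !cond_jump_taken(opcode, condition);
}
```

With JMPZ the body runs for a true condition, as in if (TRUE); with JMPNZ it runs for a false condition, as in unless (FALSE).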
Add unless to the compiler
The generated bytecode
1   <?php
2   unless (FALSE) {
3       print '*';
4   }
5   ?>
sb@thinkpad ~ % bytekit unless.php
bytekit-cli 1.0.0 by Sebastian Bergmann.

Filename:            /home/sb/unless.php
Function:            main
Number of oplines:   8

    line #      opcode                           result operands
    -----------------------------------------------------------------------------
    2     0     EXT_STMT
          1     JMPNZ                                    false, ->6

    3     2     EXT_STMT
          3     PRINT                            ~0      '*'
          4     FREE                                     ~0
    4     5     JMP                                      ->6

    6     6     EXT_STMT
          7     RETURN                                   1
Running the test
sb@thinkpad php-5.3-unless % make test TESTS=Zend/tests/unless.phpt

Build complete.
Don't forget to run 'make test'.


=====================================================================
PHP         : /usr/local/src/php/php-5.3-unless/sapi/cli/php
PHP_SAPI    : cli
PHP_VERSION : 5.3.1-dev
ZEND_VERSION: 2.3.0
PHP_OS      : Linux 2.6.28-14-generic #47-Ubuntu SMP Sat Jul 25 01:19:55 UTC 2009 i686 GNU/Linux
INI actual : /usr/local/src/php/php-5.3-unless/tmp-php.ini
More .INIs :
CWD         : /usr/local/src/php/php-5.3-unless
Extra dirs :
VALGRIND    : Not used
=====================================================================
Running selected tests.
PASS unless statement [Zend/tests/unless.phpt]
=====================================================================
Number of tests :    1                 1
Tests skipped   :    0 ( 0.0%) --------
Tests warned    :    0 ( 0.0%) ( 0.0%)
Tests failed    :    0 ( 0.0%) ( 0.0%)
Expected fail   :    0 ( 0.0%) ( 0.0%)
Tests passed    :    1 (100.0%) (100.0%)
---------------------------------------------------------------------
Time taken      :    0 seconds
=====================================================================
Add unless to ext/tokenizer

sb@thinkpad tokenizer % ./tokenizer_data_gen.sh
Wrote tokenizer_data.c
The End

Thank you for your interest!


These slides will be posted on
http://slideshare.net/sebastian_bergmann
Acknowledgements

   Thomas Lee, whose Python Language
    Internals presentation at OSDC 2008
    inspired this presentation
   Stefan Esser for creating the Bytekit
    extension that provides PHP bytecode
    access and analysis features
   Derick Rethans, David Soria Parra, and
    Scott MacVicar for reviewing these slides
References
   http://www.php.net/manual/en/tokens.php
   http://www.zapt.info/opcodes.html
   "Extending and Embedding PHP",
    Sara Golemon
   http://bytekit.org/
   http://github.com/sebastianbergmann/bytekit-cli/
License
    This presentation material is published under the Attribution-Share Alike 3.0 Unported
    license.
    You are free:
      ✔   to Share – to copy, distribute and transmit the work.
      ✔   to Remix – to adapt the work.
    Under the following conditions:
      ●   Attribution. You must attribute the work in the manner specified by the author or
          licensor (but not in any way that suggests that they endorse you or your use of the
          work).
      ●   Share Alike. If you alter, transform, or build upon this work, you may distribute the
          resulting work only under the same, similar or a compatible license.
    For any reuse or distribution, you must make clear to others the license terms of this
    work.
    Any of the above conditions can be waived if you get permission from the copyright
    holder.
    Nothing in this license impairs or restricts the author's moral rights.

Mais conteúdo relacionado

Destaque

Php under the_hood
Php under the_hoodPhp under the_hood
Php under the_hood
frank_neff
 
Understanding PHP memory
Understanding PHP memoryUnderstanding PHP memory
Understanding PHP memory
julien pauli
 
The Php Life Cycle
The Php Life CycleThe Php Life Cycle
The Php Life Cycle
Xinchen Hui
 
Php Extensions for Dummies
Php Extensions for DummiesPhp Extensions for Dummies
Php Extensions for Dummies
Elizabeth Smith
 

Destaque (12)

How PHP Works ?
How PHP Works ?How PHP Works ?
How PHP Works ?
 
Php under the_hood
Php under the_hoodPhp under the_hood
Php under the_hood
 
Building Custom PHP Extensions
Building Custom PHP ExtensionsBuilding Custom PHP Extensions
Building Custom PHP Extensions
 
Accelerating or Complicating PHP execution by LLVM Compiler Infrastructure
Accelerating or Complicating PHP execution by LLVM Compiler Infrastructure Accelerating or Complicating PHP execution by LLVM Compiler Infrastructure
Accelerating or Complicating PHP execution by LLVM Compiler Infrastructure
 
PHP Internals
PHP InternalsPHP Internals
PHP Internals
 
Build Programming Language Runtime with LLVM
Build Programming Language Runtime with LLVMBuild Programming Language Runtime with LLVM
Build Programming Language Runtime with LLVM
 
Understanding PHP memory
Understanding PHP memoryUnderstanding PHP memory
Understanding PHP memory
 
The Php Life Cycle
The Php Life CycleThe Php Life Cycle
The Php Life Cycle
 
Php Extensions for Dummies
Php Extensions for DummiesPhp Extensions for Dummies
Php Extensions for Dummies
 
PHP 7 new engine
PHP 7 new enginePHP 7 new engine
PHP 7 new engine
 
About Tokens and Lexemes
About Tokens and LexemesAbout Tokens and Lexemes
About Tokens and Lexemes
 
Recognition-of-tokens
Recognition-of-tokensRecognition-of-tokens
Recognition-of-tokens
 

Semelhante a Phpcompilerinternals 090824022750-phpapp02 (7)

Basic of Python- Hands on Session
Basic of Python- Hands on SessionBasic of Python- Hands on Session
Basic of Python- Hands on Session
 
Advanced perl finer points ,pack&amp;unpack,eval,files
Advanced perl   finer points ,pack&amp;unpack,eval,filesAdvanced perl   finer points ,pack&amp;unpack,eval,files
Advanced perl finer points ,pack&amp;unpack,eval,files
 
Applying Generics
Applying GenericsApplying Generics
Applying Generics
 
Chapter2pp
Chapter2ppChapter2pp
Chapter2pp
 
Chapter 6 Intermediate Code Generation
Chapter 6   Intermediate Code GenerationChapter 6   Intermediate Code Generation
Chapter 6 Intermediate Code Generation
 
Diving deep into twig
Diving deep into twigDiving deep into twig
Diving deep into twig
 
Generics_RIO.ppt
Generics_RIO.pptGenerics_RIO.ppt
Generics_RIO.ppt
 

Último

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Último (20)

DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 

Phpcompilerinternals 090824022750-phpapp02

  • 1. (Do not be afraid of) PHP Compiler Internals Sebastian Bergmann August 23rd 2009
  • 2. Sebastian Bergmann  Co-Founder and Principal Consultant with thePHP.cc  Creator of PHPUnit  Involved in the PHP project since 2000
  • 3. Under PHP's Hood Extensions (date, dom, gd, json, mysql, pcre, pdo, reflection, session, standard, …) PHP Core Zend Engine Request Management Compilation and Execution File and Network Operations Memory and Resource Allocation Server API (SAPI) (mod_php, FastCGI, CLI, ...) This slide contains material by Sara Golemon
  • 4. How PHP executes code  Lexical Analysis Scan the source for sequences of characters and convert them to a sequence of tokens
  • 5. How PHP executes code  Lexical Analysis  Syntax Analysis Parse a sequence of tokens to determine their grammatical structure
  • 6. How PHP executes code  Lexical Analysis  Syntax Analysis  Bytecode Generation Generate bytecode based on the information gathered by analyzing the source
  • 7. How PHP executes code  Lexical Analysis  Syntax Analysis  Bytecode Generation  Bytecode Execution
  • 8. Lexical Analysis Scan a sequence of characters 1 <?php 2 if (TRUE) { 3 print '*'; 4 } 5 ?>
  • 9. Lexical Analysis Scan a sequence of characters 1 <?php T_OPEN_TAG 2 if (TRUE) { 3 print '*'; 4 } 5 ?>
  • 10. Lexical Analysis Scan a sequence of characters 1 <?php T_OPEN_TAG 2 if (TRUE) { T_IF T_WHITESPACE ( T_STRING ) T_WHITESPACE { T_WHITESPACE 3 print '*'; 4 } 5 ?>
  • 11. Lexical Analysis Scan a sequence of characters 1 <?php T_OPEN_TAG 2 if (TRUE) { T_IF T_WHITESPACE ( T_STRING ) T_WHITESPACE { T_WHITESPACE 3 print '*'; T_PRINT T_WHITESPACE T_CONSTANT_ENCAPSED_STRING ; 4 } 5 ?>
  • 12. Lexical Analysis Scan a sequence of characters 1 <?php T_OPEN_TAG 2 if (TRUE) { T_IF T_WHITESPACE ( T_STRING ) T_WHITESPACE { T_WHITESPACE 3 print '*'; T_PRINT T_WHITESPACE T_CONSTANT_ENCAPSED_STRING ; T_WHITESPACE 4 } } 5 ?>
  • 13. Lexical Analysis Scan a sequence of characters 1 <?php T_OPEN_TAG 2 if (TRUE) { T_IF T_WHITESPACE ( T_STRING ) T_WHITESPACE { T_WHITESPACE 3 print '*'; T_PRINT T_WHITESPACE T_CONSTANT_ENCAPSED_STRING ; T_WHITESPACE 4 } } T_WHITESPACE 5 ?> T_CLOSE_TAG
  • 14. Lexical Analysis Scan a sequence of characters T_OPEN_TAG <?php T_IF if T_WHITESPACE ( T_STRING TRUE ) T_WHITESPACE { T_WHITESPACE T_PRINT print T_WHITESPACE T_CONSTANT_ENCAPSE '*' D_STRING ; T_WHITESPACE } T_WHITESPACE ?> T_CLOSE_TAG
  • 16. Lexical Analysis Scanner Generators  You do not want to write a scanner by hand At least when the code for the scanner should be efficient and maintainable  Tools such as flex or re2c generate the code for a scanner from a set of rules <ST_IN_SCRIPTING>"if" { "if" { return T_IF; }
  • 17. Lexical Analysis PHP Tokens  T_ABSTRACT  T_CONCAT_EQUAL  T_ELSE  T_FUNCTION  T_AND_EQUAL  T_CONST  T_ELSEIF  T_FUNC_C  T_ARRAY  T_CONSTANT_ENCAPSED_STRING  T_EMPTY  T_GLOBAL  T_ARRAY_CAST  T_CONTINUE  T_ENCAPSED_AND_WHITESPACE  T_GOTO  T_AS  T_CURLY_OPEN  T_ENDDECLARE  T_HALT_COMPILER  T_BAD_CHARACTER  T_DEC  T_ENDFOR  T_IF  T_BOOLEAN_AND  T_DECLARE  T_ENDFOREACH  T_IMPLEMENTS  T_BOOLEAN_OR  T_DEFAULT  T_ENDIF  T_INC  T_BOOL_CAST  T_DIR  T_ENDSWITCH  T_INCLUDE  T_BREAK  T_DIV_EQUAL  T_ENDWHILE  T_INCLUDE_ONCE  T_CASE  T_DNUMBER  T_END_HEREDOC  T_INLINE_HTML  T_CATCH  T_DOC_COMMENT  T_EVAL  T_INSTANCEOF  T_CHARACTER  T_DO  T_EXIT  T_INT_CAST  T_CLASS  T_DOLLAR_OPEN_CURLY_BRACES  T_EXTENDS  T_INTERFACE  T_CLASS_C  T_DOUBLE_ARROW  T_FILE  T_ISSET  T_CLONE  T_DOUBLE_CAST  T_FINAL  T_IS_EQUAL  T_CLOSE_TAG  T_DOUBLE_COLON  T_FOR  T_IS_GREATER_OR_EQUAL  T_COMMENT  T_ECHO  T_FOREACH  T_IS_IDENTICAL
  • 18. Lexical Analysis PHP Tokens  T_IS_NOT_EQUAL  T_OBJECT_CAST  T_SR_EQUAL  T_IS_NOT_IDENTICAL  T_OBJECT_OPERATOR  T_START_HEREDOC  T_IS_SMALLER_OR_EQUAL  T_OLD_FUNCTION  T_STATIC  T_LINE  T_OPEN_TAG  T_STRING  T_LIST  T_OPEN_TAG_WITH_ECHO  T_STRING_CAST  T_LNUMBER  T_OR_EQUAL  T_STRING_VARNAME  T_LOGICAL_AND  T_PAAMAYIM_NEKUDOTAYIM  T_SWITCH  T_LOGICAL_OR  T_PLUS_EQUAL  T_THROW  T_LOGICAL_XOR  T_PRINT  T_TRY  T_METHOD_C  T_PRIVATE  T_UNSET  T_MINUS_EQUAL  T_PUBLIC  T_UNSET_CAST  T_ML_COMMENT  T_PROTECTED  T_USE  T_MOD_EQUAL  T_REQUIRE  T_VAR  T_MUL_EQUAL  T_REQUIRE_ONCE  T_VARIABLE  T_NAMESPACE  T_RETURN  T_WHILE  T_NS_C  T_SL  T_WHITESPACE  T_NEW  T_SL_EQUAL  T_XOR_EQUAL  T_NUM_STRING  T_SR
  • 20. Syntax Analysis Parse a sequence of tokens  You do not want to write a parser by hand At least when the code for the parser should be efficient and maintainable  Tools such as bison or lemon generate the code for a parser from a set of rules T_IF '(' expr ')' { ... } statement { ... } elseif_list else_single { ... }
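As an illustration of what the grammar rule above checks, the following C sketch recognizes a drastically reduced version of it with a hand-written recursive-descent parser. This is a different technique from the table-driven parsers that bison generates. The sketch assumes that whitespace tokens have already been dropped, that `expr` is a single `T_STRING`, and that `elseif_list` and `else_single` are empty; all names (`TOK_*`, `struct parser`, `eat()`, `parse()`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token ids; single characters stand for themselves, as in bison. */
enum {
    TOK_END = 0,
    TOK_IF = 256, TOK_PRINT, TOK_STRING, TOK_CONSTANT_ENCAPSED_STRING
};

struct parser {
    const int *tokens; /* terminated by TOK_END */
    size_t pos;
};

/* Consume the current token if it matches tok. */
static int eat(struct parser *p, int tok)
{
    if (p->tokens[p->pos] == tok) {
        p->pos++;
        return 1;
    }
    return 0;
}

static int parse_statement(struct parser *p);

/* expr: T_STRING  (a tiny stand-in for PHP's real expr rule) */
static int parse_expr(struct parser *p)
{
    return eat(p, TOK_STRING);
}

/* Remainder of a block after '{': zero or more statements, then '}' */
static int parse_block(struct parser *p)
{
    while (!eat(p, '}')) {
        if (!parse_statement(p)) {
            return 0;
        }
    }
    return 1;
}

/* statement: '{' ... '}'
 *          | T_IF '(' expr ')' statement
 *          | T_PRINT T_CONSTANT_ENCAPSED_STRING ';' */
static int parse_statement(struct parser *p)
{
    if (eat(p, '{')) {
        return parse_block(p);
    }
    if (eat(p, TOK_IF)) {
        return eat(p, '(') && parse_expr(p) && eat(p, ')')
            && parse_statement(p);
    }
    if (eat(p, TOK_PRINT)) {
        return eat(p, TOK_CONSTANT_ENCAPSED_STRING) && eat(p, ';');
    }
    return 0;
}

/* Return 1 if the token sequence is exactly one valid statement, else 0. */
static int parse(const int *tokens)
{
    struct parser p = { tokens, 0 };

    assert(tokens != NULL);
    return parse_statement(&p) && p.tokens[p.pos] == TOK_END;
}
```

In the real grammar, the `{ ... }` actions between the symbols are where the parser calls into the compiler; this sketch only answers whether the input is grammatical.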
  • 21. PHP Bytecode Using bytekit-cli to disassemble bytecode 1 <?php 2 if (TRUE) { 3 print '*'; 4 } 5 ?> sb@thinkpad ~ % bytekit if.php bytekit-cli 1.0.0 by Sebastian Bergmann. Filename: /home/sb/if.php Function: main Number of oplines: 8 line # opcode result operands ----------------------------------------------------------------------------- 2 0 EXT_STMT 1 JMPZ true, ->6 3 2 EXT_STMT 3 PRINT ~0 '*' 4 FREE ~0 4 5 JMP ->6 6 6 EXT_STMT 7 RETURN 1
  • 22. PHP Bytecode Using bytekit-cli to visualize bytecode 1 <?php 2 if (TRUE) { 3 print '*'; 4 } 5 ?> sb@thinkpad ~ % bytekit --graph /tmp --format svg if.php
  • 23. How if is compiled Zend/zend_compile.c void zend_do_if_cond (const znode *cond, znode *closing_bracket_token TSRMLS_DC) { } typedef struct _znode { int op_type; union { zval constant; zend_uint var; zend_uint opline_num; zend_op_array *op_array; zend_op *jmp_addr; struct { zend_uint var; zend_uint type; } EA; } u; } znode; zend_do_if_cond() is called when an if statement is compiled
  • 24. How if is compiled Zend/zend_compile.c void zend_do_if_cond (const znode *cond, znode *closing_bracket_token TSRMLS_DC) { int if_cond_op_number = get_next_op_number(CG(active_op_array)); zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC); } struct _zend_op { opcode_handler_t handler; znode result; znode op1; znode op2; ulong extended_value; uint lineno; zend_uchar opcode; }; Allocate a new opline in the current oparray
  • 25. How if is compiled Zend/zend_compile.c void zend_do_if_cond (const znode *cond, znode *closing_bracket_token TSRMLS_DC) { int if_cond_op_number = get_next_op_number(CG(active_op_array)); zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC); opline->opcode = ZEND_JMPZ; } Set the opcode of the new opline to JMPZ (jump if zero)
  • 26. How if is compiled Zend/zend_compile.c void zend_do_if_cond (const znode *cond, znode *closing_bracket_token TSRMLS_DC) { int if_cond_op_number = get_next_op_number(CG(active_op_array)); zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC); opline->opcode = ZEND_JMPZ; opline->op1 = *cond; } Set the first operand of the new opline to the if condition
  • 27. How if is compiled Zend/zend_compile.c void zend_do_if_cond (const znode *cond, znode *closing_bracket_token TSRMLS_DC) { int if_cond_op_number = get_next_op_number(CG(active_op_array)); zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC); opline->opcode = ZEND_JMPZ; opline->op1 = *cond; closing_bracket_token->u.opline_num = if_cond_op_number; SET_UNUSED(opline->op2); INC_BPC(CG(active_op_array)); } Perform bookkeeping tasks, such as marking the second operand of the new opline as unused and incrementing the backpatching counter for the current oparray
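The reason the opline number is stored in closing_bracket_token is that the jump target is not yet known when JMPZ is emitted: it can only be filled in once the body of the if statement has been compiled. The following self-contained C sketch illustrates this backpatching idea with a toy op array. It is loosely modelled on the code above and is not the Zend Engine's implementation; `struct op`, `struct op_array`, `emit()`, `if_cond()`, `if_after_statement()` and `compile_if_print()` are hypothetical, and EXT_STMT and FREE oplines are left out.

```c
#include <assert.h>
#include <stddef.h>

enum opcode { OP_JMPZ, OP_PRINT, OP_JMP, OP_RETURN };

struct op {
    enum opcode opcode;
    int op1;    /* operand: condition, character to print, or return value */
    int target; /* jump target; -1 while it is still unknown */
};

struct op_array {
    struct op ops[16];
    size_t last; /* number of oplines emitted so far */
};

/* Allocate the next opline and return its number. */
static size_t emit(struct op_array *a, enum opcode opcode, int op1)
{
    size_t n = a->last;

    assert(n < sizeof a->ops / sizeof a->ops[0]);
    a->ops[n].opcode = opcode;
    a->ops[n].op1 = op1;
    a->ops[n].target = -1;
    a->last++;
    return n;
}

/* Emit JMPZ with an unknown target; the caller remembers the opline number. */
static size_t if_cond(struct op_array *a, int cond)
{
    return emit(a, OP_JMPZ, cond);
}

/* Called after the body has been emitted: now the targets are known. */
static void if_after_statement(struct op_array *a, size_t if_cond_op_number)
{
    size_t jmp = emit(a, OP_JMP, 0);

    /* Backpatch: a false condition jumps to the first opline after the body. */
    a->ops[if_cond_op_number].target = (int)a->last;
    /* There is no else branch in this sketch, so JMP goes to the same place. */
    a->ops[jmp].target = (int)a->last;
}

/* Compile the equivalent of: if (cond) { print ch; } */
static void compile_if_print(struct op_array *a, int cond, int ch)
{
    size_t n;

    a->last = 0;
    n = if_cond(a, cond);
    emit(a, OP_PRINT, ch);
    if_after_statement(a, n);
    emit(a, OP_RETURN, 1);
}
```

For `if (TRUE) { print '*'; }` this produces JMPZ, PRINT, JMP, RETURN, with both jumps pointing at the RETURN opline. That is the same shape as the bytekit listing shown earlier, where JMPZ and JMP both target opline 6.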
  • 28. PHP Bytecode PHP Opcodes  NOP  IS_NOT_EQUAL  POST_INC  ADD_VAR  UNSET_DIM  ADD  IS_SMALLER  POST_DEC  BEGIN_SILENCE  UNSET_OBJ  SUB  IS_SMALLER_OR_EQUAL  ASSIGN  END_SILENCE  FE_RESET  MUL  CAST  ASSIGN_REF  INIT_FCALL_BY_NAME  FE_FETCH  DIV  QM_ASSIGN  ECHO  DO_FCALL  EXIT  MOD  ASSIGN_ADD  PRINT  DO_FCALL_BY_NAME  FETCH_R  SL  ASSIGN_SUB  JMPZ  RETURN  FETCH_DIM_R  SR  ASSIGN_MUL  JMPNZ  RECV  FETCH_OBJ_R  CONCAT  ASSIGN_DIV  JMPZNZ  RECV_INIT  FETCH_W  BW_OR  ASSIGN_MOD  JMPZ_EX  SEND_VAL  FETCH_DIM_W  BW_AND  ASSIGN_SL  JMPNZ_EX  SEND_VAR  FETCH_OBJ_W  BW_XOR  ASSIGN_SR  CASE  SEND_REF  FETCH_RW  BW_NOT  ASSIGN_CONCAT  SWITCH_FREE  NEW  FETCH_DIM_RW  BOOL_NOT  ASSIGN_BW_OR  BRK  FREE  FETCH_OBJ_RW  BOOL_XOR  ASSIGN_BW_AND  BOOL  INIT_ARRAY  FETCH_IS  IS_IDENTICAL  ASSIGN_BW_XOR  INIT_STRING  ADD_ARRAY_ELEMENT  FETCH_DIM_IS  IS_NOT_IDENTICAL  PRE_INC  ADD_CHAR  INCLUDE_OR_EVAL  FETCH_OBJ_IS  IS_EQUAL  PRE_DEC  ADD_STRING  UNSET_VAR  FETCH_FUNC_ARG
  • 29. PHP Bytecode PHP Opcodes  FETCH_DIM_FUNC_ARG  INIT_STATIC_METHOD_CALL  FETCH_OBJ_FUNC_ARG  ISSET_ISEMPTY_VAR  FETCH_UNSET  ISSET_ISEMPTY_DIM_OBJ  FETCH_DIM_UNSET  PRE_INC_OBJ  FETCH_OBJ_UNSET  PRE_DEC_OBJ  FETCH_DIM_TMP_VAR  POST_INC_OBJ  FETCH_CONSTANT  POST_DEC_OBJ  EXT_STMT  ASSIGN_OBJ  EXT_FCALL_BEGIN  INSTANCEOF  EXT_FCALL_END  DECLARE_CLASS  EXT_NOP  DECLARE_INHERITED_CLASS  TICKS  DECLARE_FUNCTION  SEND_VAR_NO_REF  RAISE_ABSTRACT_ERROR  CATCH  ADD_INTERFACE  THROW  VERIFY_ABSTRACT_CLASS  FETCH_CLASS  ASSIGN_DIM  CLONE  ISSET_ISEMPTY_PROP_OBJ  INIT_METHOD_CALL  HANDLE_EXCEPTION
  • 30. Extending the PHP Compiler Test First! --TEST-- unless statement --FILE-- <?php unless (FALSE) { print 'unless FALSE is TRUE, this is printed'; } unless (TRUE) { print 'unless TRUE is TRUE, this is printed'; } ?> --EXPECT-- unless FALSE is TRUE, this is printed
  • 31. Extending the PHP Compiler  Add token for unless to the scanner  Add rule for unless to the parser  Implement bytecode generation for unless in the compiler  Add token for unless to ext/tokenizer
  • 32. Add unless scanner token Zend/zend_language_parser.y %token T_NAMESPACE %token T_NS_C %token T_DIR %token T_NS_SEPARATOR %token T_UNLESS
  • 33. Add unless scanner token Zend/zend_language_scanner.l <ST_IN_SCRIPTING>"if" { return T_IF; } <ST_IN_SCRIPTING>"unless" { return T_UNLESS; } <ST_IN_SCRIPTING>"elseif" { return T_ELSEIF; } <ST_IN_SCRIPTING>"endif" { return T_ENDIF; } <ST_IN_SCRIPTING>"else" { return T_ELSE; }
  • 34. Add unless parser rule Zend/zend_language_parser.y unticked_statement: '{' inner_statement_list '}' | T_IF '(' expr ')' { . . | T_UNLESS '(' expr ')' { zend_do_unless_cond(&$3, &$4 TSRMLS_CC); } statement { zend_do_if_after_statement(&$4, 1 TSRMLS_CC); } { zend_do_if_end(TSRMLS_C); }
  • 35. Add unless to the compiler Zend/zend_compile.c void zend_do_unless_cond (const znode *cond, znode *closing_bracket_token TSRMLS_DC) { int unless_cond_op_number = get_next_op_number(CG(active_op_array)); zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC); opline->opcode = ZEND_JMPNZ; opline->op1 = *cond; closing_bracket_token->u.opline_num = unless_cond_op_number; SET_UNUSED(opline->op2); INC_BPC(CG(active_op_array)); } All we have to do to generate code for the unless statement, compared to generating code for the if statement, is to emit JMPNZ (jump if not zero) instead of JMPZ (jump if zero)
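To see why swapping the opcode is enough, here is a small C sketch of a toy interpreter loop that executes a three-opline program under either jump opcode. It is an illustration under simplifying assumptions, not the Zend VM: the operand is a plain int rather than a znode, and `struct op`, `compile_conditional()` and `execute()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

enum opcode { OP_JMPZ, OP_JMPNZ, OP_PRINT, OP_RETURN };

struct op {
    enum opcode opcode;
    int op1;       /* condition, character to print, or return value */
    size_t target; /* jump target, used by OP_JMPZ and OP_JMPNZ only */
};

/* Build: 0: <jump> cond, ->2   1: PRINT ch   2: RETURN 1 */
static void compile_conditional(struct op ops[3], enum opcode jump,
                                int cond, char ch)
{
    ops[0].opcode = jump;      ops[0].op1 = cond; ops[0].target = 2;
    ops[1].opcode = OP_PRINT;  ops[1].op1 = ch;   ops[1].target = 0;
    ops[2].opcode = OP_RETURN; ops[2].op1 = 1;    ops[2].target = 0;
}

/* Execute the oplines; printed characters are collected in out
 * (NUL-terminated). Returns the number of characters printed. */
static size_t execute(const struct op *ops, char *out, size_t cap)
{
    size_t pc = 0, len = 0;

    assert(cap > 0);
    for (;;) {
        const struct op *op = &ops[pc];

        switch (op->opcode) {
        case OP_JMPZ:  /* jump if zero: skips the body of a false if */
            pc = (op->op1 == 0) ? op->target : pc + 1;
            break;
        case OP_JMPNZ: /* jump if not zero: skips the body of a true unless */
            pc = (op->op1 != 0) ? op->target : pc + 1;
            break;
        case OP_PRINT:
            if (len + 1 < cap) {
                out[len++] = (char)op->op1;
            }
            pc++;
            break;
        case OP_RETURN:
            out[len] = '\0';
            return len;
        }
    }
}
```

With `OP_JMPNZ` and a condition of 0, which corresponds to `unless (FALSE)`, the body runs and one character is printed; with a condition of 1, which corresponds to `unless (TRUE)`, the body is skipped. This matches the expectation encoded in the test from the "Test First!" slide.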
  • 36. Add unless to the compiler The generated bytecode 1 <?php 2 unless (FALSE) { 3 print '*'; 4 } 5 ?> sb@thinkpad ~ % bytekit unless.php bytekit-cli 1.0.0 by Sebastian Bergmann. Filename: /home/sb/unless.php Function: main Number of oplines: 8 line # opcode result operands ----------------------------------------------------------------------------- 2 0 EXT_STMT 1 JMPNZ true, ->6 3 2 EXT_STMT 3 PRINT ~0 '*' 4 FREE ~0 4 5 JMP ->6 6 6 EXT_STMT 7 RETURN 1
  • 37. Running the test sb@thinkpad php-5.3-unless % make test TESTS=Zend/tests/unless.phpt Build complete. Don't forget to run 'make test'. ===================================================================== PHP : /usr/local/src/php/php-5.3-unless/sapi/cli/php PHP_SAPI : cli PHP_VERSION : 5.3.1-dev ZEND_VERSION: 2.3.0 PHP_OS : Linux 2.6.28-14-generic #47-Ubuntu SMP Sat Jul 25 01:19:55 UTC 2009 i686 GNU/Linux INI actual : /usr/local/src/php/php-5.3-unless/tmp-php.ini More .INIs : CWD : /usr/local/src/php/php-5.3-unless Extra dirs : VALGRIND : Not used ===================================================================== Running selected tests. PASS unless statement [Zend/tests/unless.phpt] ===================================================================== Number of tests : 1 1 Tests skipped : 0 ( 0.0%) -------- Tests warned : 0 ( 0.0%) ( 0.0%) Tests failed : 0 ( 0.0%) ( 0.0%) Expected fail : 0 ( 0.0%) ( 0.0%) Tests passed : 1 (100.0%) (100.0%) --------------------------------------------------------------------- Time taken : 0 seconds =====================================================================
  • 38. Add unless to ext/tokenizer sb@thinkpad tokenizer % ./tokenizer_data_gen.sh Wrote tokenizer_data.c
  • 39. The End Thank you for your interest! These slides will be posted on http://slideshare.net/sebastian_bergmann
  • 40. Acknowledgements  Thomas Lee, whose Python Language Internals presentation at OSDC 2008 inspired this presentation  Stefan Esser for creating the Bytekit extension that provides PHP bytecode access and analysis features  Derick Rethans, David Soria Parra, and Scott MacVicar for reviewing these slides
  • 41. References  http://www.php.net/manual/en/tokens.php  http://www.zapt.info/opcodes.html  "Extending and Embedding PHP", Sara Golemon  http://bytekit.org/  http://github.com/sebastianbergmann/bytekit-cli/
  • 42. License   This presentation material is published under the Attribution-Share Alike 3.0 Unported license.   You are free: ✔ to Share – to copy, distribute and transmit the work. ✔ to Remix – to adapt the work.   Under the following conditions: ● Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). ● Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.   For any reuse or distribution, you must make clear to others the license terms of this work.   Any of the above conditions can be waived if you get permission from the copyright holder.   Nothing in this license impairs or restricts the author's moral rights.