PROFILE GUIDED OPTIMIZATION (POGO)


   ANKIT ASTHANA
PROGRAM MANAGER
INDEX
• History
• What is Profile Guided Optimization (POGO) ?
• POGO Build Process
• Steps to do POGO (Demo)
• POGO under the hood
• POGO case studies
• Questions
HISTORY          ~ In a nutshell POGO is a major constituent which makes up the DNA for many Microsoft products ~


 • POGO, as shipped in Visual Studio, started in the late 1990s as a joint venture between the
   Visual C++ team and the Microsoft Research group.

 • POGO initially focused only on the Itanium platform.

 • For almost an entire decade, only a few components, even within Microsoft, were POGO’ized.

 • POGO first shipped in 2005 in all pro-plus SKUs.

 • Today POGO is a KEY optimization that provides a significant performance boost to a wide
   range of Microsoft products.
HISTORY

  [Figure: POGO at the center of Microsoft products, including browsers, business analytics and productivity software]

  DIRECTLY or INDIRECTLY, you have used products that ship with POGO technology!
What is Profile Guided Optimization (POGO)?

              Really? NO!
 But how many people here have used POGO?
What is Profile Guided Optimization (POGO)?
  • Static analysis of code leaves many open questions for the compiler:

        if (a < b)
            foo();
        else
            baz();
      How often is a < b?

        switch (i) {
            case 1: …
            case 2: …
        }
      What is the typical value of i?

        for (i = 0; i < count; ++i)
            bar();
      What is the typical value of count?

        for (i = 0; i < count; ++i)
            (*p)(x, y);
      What is the typical value of pointer p?
What is Profile Guided Optimization (POGO)?
 • PGO (Profile Guided Optimization) is a compiler optimization that leverages profile data
   collected at run time, by running important or performance-centric user scenarios, to
   build an optimized version of the application.

 • PGO optimizations have a significant advantage over traditional static optimizations:
   they are based on how the application is likely to behave in a production environment.
   This lets the optimizer optimize hot code paths (common user scenarios) for speed and
   cold code paths (less common user scenarios) for size, producing faster and smaller code
   and significant performance gains.

 • PGO can be used on traditional desktop applications and is currently supported on the
   x86 and x64 platforms.


                                                              The mantra behind PGO is ‘Faster and Smaller Code’
POGO Build Process



     INSTRUMENT           TRAIN            OPTIMIZE




      ~ Three steps to perform Profile Guided Optimization ~
POGO Build Process

  [Figure: build pipeline with steps (1), (2) and (3)]

                     TRIVIA?
                     Does anyone know what (1), (2) and (3) do?
POGO Build Process

  (1) /GL: This flag tells the compiler to defer code generation until you link your
      program. At link time, the linker calls back into the compiler to finish
      compilation. If you compile all your sources this way, the compiler optimizes your
      program as a whole rather than one source file at a time.

      Although /GL enables many optimizations, one major advantage is that, with Link Time
      Code Generation (LTCG), functions defined in one source file (foo.obj) can be inlined
      into callers defined in another source file (bar.obj).
POGO Build Process
  /LTCG
      The linker invokes link-time code generation if it is passed a module that was
      compiled by using /GL. If you do not explicitly specify /LTCG when you pass /GL or
      MSIL modules to the linker, the linker eventually detects this and restarts the link
      by using /LTCG. Explicitly specify /LTCG when you pass /GL and MSIL modules to the
      linker for the fastest possible build performance.

  (2) /LTCG:PGI
      Specifies that the linker outputs a .pgd file in preparation for instrumented test
      runs on the application.

  (3) /LTCG:PGO
      Specifies that the linker uses the profile data that is created after the
      instrumented binary is run to create an optimized image.
STEPS to do POGO (DEMO)

                        TRIVIA
                        Does anyone know what an N-body simulation is all about?

STEPS to do POGO (DEMO)

                        NBODY sample application

                        Put simply, an N-body simulation simulates a system of particles,
                        usually under the influence of physical forces such as gravity.
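A minimal sketch of the kind of hot loop an N-body sample spends its time in, useful as a mental model of what the training runs exercise. It is not the sample used in the demo: `Body`, `step`, the gravitational constant `G`, the softening term `eps`, and the simple explicit-Euler integrator are all illustrative assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Body {
    double x, y;    // position
    double vx, vy;  // velocity
    double m;       // mass
};

// One explicit-Euler time step of a direct-summation (O(n^2)) gravitational
// N-body simulation in 2D. G and the softening term eps are illustrative.
void step(std::vector<Body>& bodies, double dt, double G = 1.0, double eps = 1e-9) {
    const std::size_t n = bodies.size();
    std::vector<double> ax(n, 0.0), ay(n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            if (i == j) continue;  // no self-interaction
            double dx = bodies[j].x - bodies[i].x;
            double dy = bodies[j].y - bodies[i].y;
            double r2 = dx * dx + dy * dy + eps;
            double inv_r3 = 1.0 / (r2 * std::sqrt(r2));
            ax[i] += G * bodies[j].m * dx * inv_r3;
            ay[i] += G * bodies[j].m * dy * inv_r3;
        }
    }
    // Update velocities first, then positions, using this step's accelerations.
    for (std::size_t i = 0; i < n; ++i) {
        bodies[i].vx += ax[i] * dt;
        bodies[i].vy += ay[i] * dt;
        bodies[i].x += bodies[i].vx * dt;
        bodies[i].y += bodies[i].vy * dt;
    }
}
```

Two equal masses placed at rest at x = -1 and x = 1 start moving toward each other after one `step`, and their total momentum stays zero.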
POGO Under the hood!
Remember this?

        if (a < b)
            foo();
        else
            baz();
      How often is a < b?

        switch (i) {
            case 1: …
            case 2: …
        }
      What is the typical value of i?

        for (i = 0; i < count; ++i)
            bar();
      What is the typical value of count?

        for (i = 0; i < count; ++i)
            (*p)(x, y);
      What is the typical value of pointer p?
POGO Under the hood                                                  Instrument Phase
• The code is instrumented with “probes” inserted into it.
  There are two kinds of probes:
          1. Count (simple/entry) probes
                Used to count the number of times a path is taken (e.g., function entry/exit).
          2. Value probes
                Used to construct a histogram of values (e.g., switch values, indirect-call target addresses).
• To simplify the correlation process, some optimizations, such as the inliner, are turned off.
• The instrumented build runs 1.5x to 2x slower than an optimized build.




                          Side-effects: Instrumented build of the application, empty .pgd file
POGO Under the hood                                                  Instrument Phase

  [Figure: control-flow graph of function Foo annotated with probes: an entry probe at function entry, simple probes 1 and 2 on the branch paths, and value probe 1 on switch (i) { case 1: … default: … }; all probe data is stored in a single dataset (entry probe, simple probe 1, simple probe 2, value probe 1)]
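To make count and value probes concrete, here is a hand-instrumented sketch of the earlier if/else and switch example. It is hypothetical: real POGO probes are inserted by the compiler and their storage format is not shown here; `ProbeData`, `g_probes`, `foo`, `baz`, and `instrumented` are made-up names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-in for one function's probe dataset. In a real
// instrumented build the compiler inserts equivalent counters itself.
struct ProbeData {
    std::uint64_t entry = 0;                     // count probe: function entry
    std::uint64_t a_less_b = 0;                  // simple probe: if-branch taken
    std::uint64_t a_not_less_b = 0;              // simple probe: else-branch taken
    std::map<int, std::uint64_t> switch_values;  // value probe: histogram of i
};

ProbeData g_probes;

int foo() { return 1; }
int baz() { return 2; }

// Hand-instrumented version of the slide's if/else and switch example.
int instrumented(int a, int b, int i) {
    ++g_probes.entry;
    int r = 0;
    if (a < b) {
        ++g_probes.a_less_b;
        r = foo();
    } else {
        ++g_probes.a_not_less_b;
        r = baz();
    }
    ++g_probes.switch_values[i];  // record the operand before dispatching
    switch (i) {
        case 1: r += 10; break;
        case 2: r += 20; break;
        default: break;
    }
    return r;
}
```

After three calls with a < b and i == 10 and one call with a >= b and i == 1, the counters answer the open questions directly: a < b held in 3 of 4 calls, and 10 is the most common value of i.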
POGO Under the hood                                                  Training Phase

• Run your training scenarios. During this phase, the user runs the instrumented version
  of the application and exercises only common, performance-centric user scenarios.
  Exercising these training scenarios creates .pgc files, which contain the training data
  for each user scenario.

• For example, for modern applications a common performance-critical user scenario is
  application startup.

• Training on these scenarios creates appname!#.pgc files (where appname is the name of
  the running application and # is 1 + the number of appname!#.pgc files already in the
  directory).


                                                                 Side-effects: A bunch of .pgc files
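The file-naming rule above can be sketched as follows, assuming the directory listing is already available as a list of file names. `is_pgc_for` and `next_pgc_name` are hypothetical helpers that only model the stated rule; they are not how the PGO runtime is implemented.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// True if `name` has the form "<app>!<digits>.pgc".
bool is_pgc_for(const std::string& name, const std::string& app) {
    const std::string prefix = app + "!";
    const std::string suffix = ".pgc";
    if (name.size() <= prefix.size() + suffix.size()) return false;  // need >= 1 digit
    if (name.compare(0, prefix.size(), prefix) != 0) return false;
    if (name.compare(name.size() - suffix.size(), suffix.size(), suffix) != 0) return false;
    for (std::size_t k = prefix.size(); k < name.size() - suffix.size(); ++k) {
        if (!std::isdigit(static_cast<unsigned char>(name[k]))) return false;
    }
    return true;
}

// Name for the next training run: # = 1 + number of existing appname!#.pgc files.
std::string next_pgc_name(const std::string& app, const std::vector<std::string>& dir_listing) {
    int existing = 0;
    for (const auto& f : dir_listing) {
        if (is_pgc_for(f, app)) ++existing;
    }
    return app + "!" + std::to_string(existing + 1) + ".pgc";
}
```

For a directory holding nbody!1.pgc and nbody!2.pgc, the next training run of nbody would write nbody!3.pgc.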
POGO Under the hood             Optimize Phase

•   Full and partial inlining
•   Function layout
•   Speed and size decision
•   Basic block layout
•   Code separation
•   Virtual call speculation
•   Switch expansion
•   Data separation
•   Loop unrolling
POGO Under the hood                                                  Optimize Phase

CALL GRAPH PATH PROFILING
  • The behavior of a function on one call path may be drastically different from its
    behavior on another.
  • Call-path-specific information results in better inlining and optimization decisions.
  • Let us take an example (next slide).
POGO Under the hood                                                  Optimize Phase
EXAMPLE: CALL GRAPH PATH PROFILING
• Assign path numbers bottom-up.
• Number of paths out of a function = sum of its callees' paths + 1.

  Call graph: Start → A; A → B, C, D; B, C, D → Foo
  Paths out of each function: Foo = 1, B = 2, C = 2, D = 2, A = 7

  Path 1: Foo
  Path 2: B
  Path 3: B-Foo
  Path 4: C
  Path 5: C-Foo
  Path 6: D
  Path 7: D-Foo
  Path 8: A
  Path 9: A-B
  Path 10: A-B-Foo
  Path 11: A-C
  Path 12: A-C-Foo
  Path 13: A-D
  Path 14: A-D-Foo

  There are 7 paths that reach Foo (paths 1, 3, 5, 7, 10, 12 and 14).
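The bottom-up counting rule (paths out of a function = sum of callee paths + 1) can be checked with a short sketch. `CallGraph` and `count_paths` are hypothetical names, and the sketch assumes an acyclic call graph; recursive call chains would need extra handling that is not covered here.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using CallGraph = std::map<std::string, std::vector<std::string>>;

// Number of call paths starting at `fn`: 1 for fn alone plus all paths out of
// each callee. Results are memoized so each function is computed once.
long long count_paths(const CallGraph& g, const std::string& fn,
                      std::map<std::string, long long>& memo) {
    auto cached = memo.find(fn);
    if (cached != memo.end()) return cached->second;
    long long paths = 1;  // the path consisting of fn alone
    auto callees = g.find(fn);
    if (callees != g.end()) {
        for (const auto& callee : callees->second) paths += count_paths(g, callee, memo);
    }
    memo[fn] = paths;
    return paths;
}
```

On the example graph this gives A = 7, B = C = D = 2 and Foo = 1, and the per-function counts add up to the 14 numbered paths.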
POGO Under the hood                                                  Optimize Phase

INLINING

  [Figure: call graph with edge counts: goo → bar (10), foo → bar (20), bat → bar (100), bar → baz (140)]
POGO Under the hood                                                  Optimize Phase

INLINING
     POGO uses call graph path profiling.

  [Figure: the same call graph split by call path: goo → bar (10) → baz (75); foo → bar (20) → baz (50); bat → bar (100) → baz (15)]
POGO Under the hood                                                  Optimize Phase

INLINING
   Inlining decisions are made at each call site.

   Call-site-specific, profile-directed inlining minimizes the code bloat due to inlining
   while still gaining performance where needed.

  [Figure: goo → bar (10) and foo → bar (20) share one bar, whose call to baz has count 125 (75 + 50); bat → bar (100) → baz (15) is handled separately]
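A sketch of why per-path counts matter for the inlining decision, using the bar → baz counts from the slides. The threshold `kInlineThreshold` and the names `PathProfile`, `merged_count`, and `should_inline` are hypothetical; a real inliner also weighs callee size, code growth, and other factors.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical hotness threshold for a call site.
constexpr long long kInlineThreshold = 50;

// Count of the bar -> baz call, recorded separately for each caller of bar
// (goo, foo, bat), as call graph path profiling allows.
using PathProfile = std::map<std::string, long long>;

// What a context-insensitive profile would see: all paths folded together.
long long merged_count(const PathProfile& p) {
    long long total = 0;
    for (const auto& entry : p) total += entry.second;
    return total;
}

// Inline a call site only if it is hot on the path being compiled.
bool should_inline(long long call_site_count) {
    return call_site_count >= kInlineThreshold;
}
```

With the merged count of 140, baz looks hot on every path; per path, only the goo (75) and foo (50) paths clear the arbitrary threshold, so the bat path (15) avoids the extra code growth.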
POGO Under the hood                                                  Optimize Phase

INLINE HEURISTICS
 The POGO inlining decision is made before layout, the speed/size decision, and
 all other optimizations.
POGO Under the hood                                                  Optimize Phase
SPEED AND SIZE
    The decision is based on the post-inliner dynamic instruction count:
    code segments with a higher dynamic instruction count are optimized for SPEED;
    code segments with a lower dynamic instruction count are optimized for SIZE.

  [Figure: post-inlining call graph: goo → bar (10) and foo → bar (20) with bar → baz (125); bat → bar (100) with bar → baz (15)]
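One way to turn "higher dynamic instruction count = SPEED" into a concrete rule is a coverage cutoff. The policy below (optimize the hottest segments for speed until they cover `coverage_percent` of all executed instructions) is an assumption for illustration, not POGO's documented heuristic; `Segment`, `Mode`, and `classify` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

enum class Mode { Speed, Size };

using Segment = std::pair<std::string, long long>;  // name, dynamic instruction count

// Hypothetical policy: walk segments from hottest to coldest and optimize for
// speed while the already-covered share of all executed instructions is below
// coverage_percent; everything colder is optimized for size.
std::map<std::string, Mode> classify(std::vector<Segment> segs, long long coverage_percent = 90) {
    std::sort(segs.begin(), segs.end(),
              [](const Segment& a, const Segment& b) { return a.second > b.second; });
    long long total = 0;
    for (const auto& s : segs) total += s.second;
    std::map<std::string, Mode> modes;
    long long covered = 0;
    for (const auto& s : segs) {
        // Integer comparison avoids floating-point rounding at the cutoff.
        modes[s.first] = (covered * 100 < coverage_percent * total) ? Mode::Speed : Mode::Size;
        covered += s.second;
    }
    return modes;
}
```

With made-up counts of 900, 60, 30 and 10 instructions, only the 900-instruction segment is optimized for speed at 90% coverage; raising the cutoff to 95% also pulls in the 60-instruction segment.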
POGO Under the hood                                                  Optimize Phase

BLOCK LAYOUT
   Basic blocks are ordered so that the most frequent path falls through.

   Control-flow graph: A → B (100), A → C (10), B → D (100), C → D (10)
   Default layout:   A, B, C, D
   Optimized layout: A, B, D, C
                              Better Instruction Cache Locality
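A greedy sketch of "the most frequent path falls through": starting at the entry block, keep appending the hottest not-yet-placed successor, and when a chain ends, continue with the next unplaced block in the default order. This is a simple illustrative heuristic; `Edge`, `Cfg`, and `layout_blocks` are hypothetical names, and the compiler's actual block-layout algorithm may differ.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

struct Edge {
    std::string to;
    long long count;  // profiled number of times this edge was taken
};
using Cfg = std::map<std::string, std::vector<Edge>>;

std::vector<std::string> layout_blocks(const Cfg& cfg, const std::vector<std::string>& default_order) {
    std::vector<std::string> out;
    std::set<std::string> placed;
    for (const auto& start : default_order) {
        std::string cur = start;
        while (!cur.empty() && !placed.count(cur)) {
            out.push_back(cur);
            placed.insert(cur);
            // Pick the hottest successor that has not been placed yet.
            std::string next;
            long long best = -1;
            auto it = cfg.find(cur);
            if (it != cfg.end()) {
                for (const auto& e : it->second) {
                    if (!placed.count(e.to) && e.count > best) {
                        best = e.count;
                        next = e.to;
                    }
                }
            }
            cur = next;  // empty string ends this chain
        }
    }
    return out;
}
```

On the slide's graph this produces A, B, D, C, matching the optimized layout.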
POGO Under the hood                                                  Optimize Phase
LIVE AND PGO DEAD CODE SEPARATION
•   Dead functions/blocks are placed in a special section.

   Control-flow graph: A → B (100), A → C (0), B → D (100), C → D (0)
   Default layout:   A, B, C, D
   Optimized layout: A, B, D, C   (C is never executed during training)

   To minimize the working set and improve code locality, code that is dead in the
   training scenarios can be moved out of the way.
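A sketch of the split into live and scenario-dead code: blocks are partitioned by their training execution count while keeping their relative order. Note that "dead" code is only moved, never deleted, since other scenarios may still reach it. `Sections` and `separate` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct Sections {
    std::vector<std::string> live;  // executed during training, laid out together
    std::vector<std::string> dead;  // never executed during training ("scenario dead")
};

// Split blocks by their training execution count, preserving relative order
// within each section. Blocks missing from the profile count as 0.
Sections separate(const std::vector<std::string>& order,
                  const std::map<std::string, long long>& exec_count) {
    Sections s;
    for (const auto& b : order) {
        auto it = exec_count.find(b);
        long long c = (it == exec_count.end()) ? 0 : it->second;
        (c > 0 ? s.live : s.dead).push_back(b);
    }
    return s;
}
```

For the slide's example, A, B and D stay in the live section and C moves to the separate section.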
POGO Under the hood                                                  Optimize Phase
 FUNCTION LAYOUT
   • Based on the post-inliner and post-code-separation call graph and profile data.
   • Only functions/segments in the live section are laid out; POGO dead blocks are not
     included.
   • The overall strategy is “closest is best”: strongly connected functions are placed
     together.
   • A call is considered to achieve page locality if the callee is located in the same
     page as the caller.
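POGO Under the hood                                                  Optimize Phase
EXAMPLE: FUNCTION LAYOUT
   Call graph: A → B (1000), A → C (12), B → E (300), B → D (100), C → D (500)

   Step 1: merge A and B (1000)            →  [A B]
   Step 2: merge C and D (500)             →  [A B]  [C D]
   Step 3: place E next to B (300)         →  [A B E]  [C D]
   Step 4: join the two groups (100, 12)   →  A B E C D

   • In general, >70% page locality is achieved regardless of the component size.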
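The "closest is best" steps can be reproduced with a simplified greedy chain merge: process call edges from heaviest to lightest and concatenate the chains containing the caller and the callee. This is a swapped-in technique in the spirit of Pettis and Hansen's code-positioning algorithm, not necessarily what POGO does, and the edge counts are this sketch's reading of the figure (A→B 1000, A→C 12, B→E 300, B→D 100, C→D 500). `CallEdge` and `layout_functions` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct CallEdge {
    std::string caller, callee;
    long long count;  // profiled call count
};

std::vector<std::string> layout_functions(std::vector<CallEdge> edges) {
    // Heaviest edges first, so the most strongly connected pairs merge first.
    std::stable_sort(edges.begin(), edges.end(),
                     [](const CallEdge& a, const CallEdge& b) { return a.count > b.count; });
    std::map<std::string, int> chain_of;           // function -> chain id
    std::vector<std::vector<std::string>> chains;  // each chain is laid out contiguously
    auto chain_id = [&](const std::string& f) -> int {
        auto it = chain_of.find(f);
        if (it != chain_of.end()) return it->second;
        chains.push_back({f});
        int id = static_cast<int>(chains.size()) - 1;
        chain_of[f] = id;
        return id;
    };
    for (const auto& e : edges) {
        int a = chain_id(e.caller);
        int b = chain_id(e.callee);
        if (a == b) continue;  // already adjacent in the same chain
        // Append the callee's chain after the caller's chain.
        for (const auto& f : chains[b]) {
            chains[a].push_back(f);
            chain_of[f] = a;
        }
        chains[b].clear();
    }
    std::vector<std::string> out;
    for (const auto& c : chains) out.insert(out.end(), c.begin(), c.end());
    return out;
}
```

The merges happen in the same order as the slide's steps (A–B, C–D, B–E, then B–D joins the two groups), ending with A B E C D.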
POGO Under the hood                                                  Optimize Phase
SWITCH EXPANSION
• There are many ways to expand a switch: linear search, jump table, binary search, etc.
• POGO collects the values of the switch expression.
  The most frequent values are pulled out of the switch.

     Before:
     // 90% of the time, i == 10
     switch (i) {
         case 1: …
         case 2: …
         case 3: …
         default: …
     }

     After:
     if (i == 10)
         goto default;   // jump straight to the default case
     switch (i) {
         case 1: …
         case 2: …
         case 3: …
         default: …
     }
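The transformation above written as compilable C++, by hand, only to show that the peeled version behaves identically. Since `goto default` is not valid C++, the hot value returns the default result directly; the return values stand in for the elided case bodies, and `dispatch` and `dispatch_pgo` are hypothetical names. The compiler performs this rewrite itself when the profile shows a dominant value.

```cpp
#include <cassert>

// Original dispatch: may be lowered to a jump table or a chain of compares.
int dispatch(int i) {
    switch (i) {
        case 1: return 100;
        case 2: return 200;
        case 3: return 300;
        default: return -1;
    }
}

// Profile-guided shape: the profile said i == 10 about 90% of the time, so
// that value is tested first and skips the switch dispatch on the hot path.
int dispatch_pgo(int i) {
    if (i == 10) return -1;  // hottest value peeled out; same result as default
    switch (i) {
        case 1: return 100;
        case 2: return 200;
        case 3: return 300;
        default: return -1;
    }
}
```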
POGO Under the hood                                                  Optimize Phase
VIRTUAL CALL SPECULATION
The profiles showed that the dynamic type of object A in function Bar was almost always Foo.

     class Base {
         …
         virtual void call();
     };

     class Foo : Base {            class Bar : Base {
         …                             …
         void call();                  void call();
     };                            };

     Before:
     void Bar(Base *A)
     {
         …
         while (true)
         {
             …
             A->call();
             …
         }
     }

     After:
     void Bar(Base *A)
     {
         …
         while (true)
         {
             …
             if (type(A) == Foo)
             {
                 … // inline of A->call();
             }
             else
                 A->call();
             …
         }
     }
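A compilable sketch of the speculation, written by hand. The function is renamed `sum_calls` to avoid clashing with class `Bar`, and `typeid` is used for readability; the guard a compiler emits is typically a cheaper comparison (for example of the indirect-call target address that the value probes record). All names other than `Base`, `Foo`, `Bar`, and `call` are hypothetical.

```cpp
#include <cassert>
#include <typeinfo>

struct Base {
    virtual ~Base() = default;
    virtual int call() const = 0;
};
struct Foo : Base { int call() const override { return 1; } };
struct Bar : Base { int call() const override { return 2; } };

// Before: every iteration pays for an indirect (virtual) call.
int sum_calls(const Base* a, int n) {
    int total = 0;
    for (int k = 0; k < n; ++k) total += a->call();
    return total;
}

// After: guard on the type the profile saw most often. Inside the guard the
// target is known, so the call is direct and can be inlined; any other type
// falls back to ordinary virtual dispatch.
int sum_calls_speculated(const Base* a, int n) {
    int total = 0;
    for (int k = 0; k < n; ++k) {
        if (typeid(*a) == typeid(Foo)) {
            total += static_cast<const Foo*>(a)->Foo::call();  // qualified: no virtual dispatch
        } else {
            total += a->call();
        }
    }
    return total;
}
```

Both versions return the same result for every type; the speculated one only changes how the common case is dispatched.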
POGO Under the hood                                                  Optimize Phase
• During this phase, the application is rebuilt for the last time to generate the
  optimized version of the application. Behind the scenes, the .pgc training data files
  are merged into the empty profile database (.pgd) file created in the instrument phase.
• The compiler back end then uses this .pgd file to make better-informed optimization
  decisions, generating a highly optimized version of the application.




                                                 Side-effect: An optimized version of the application!
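Conceptually, merging several training runs means combining their counters so that scenarios exercised more often weigh more. The sketch below sums per-probe counts with optional per-run weights; `RunCounts`, `merge_runs`, the probe names, and the weighting are assumptions for illustration and do not reflect the real .pgc/.pgd formats.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for one training run's counters
// (probe name -> count).
using RunCounts = std::map<std::string, unsigned long long>;

// Merge runs into one profile. An empty weights vector means weight 1 each.
RunCounts merge_runs(const std::vector<RunCounts>& runs, const std::vector<unsigned>& weights) {
    assert(weights.empty() || weights.size() == runs.size());
    RunCounts merged;
    for (std::size_t r = 0; r < runs.size(); ++r) {
        unsigned long long w = weights.empty() ? 1ull : weights[r];
        for (const auto& kv : runs[r]) merged[kv.first] += kv.second * w;
    }
    return merged;
}
```

Merging a startup run with a rendering run keeps each run's hot loops and adds up shared probes such as the program entry.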
POGO CASE STUDIES
             SPEC2K
    SPEC2K benchmark           Sjeng   Gobmk    Perl     Povray   Gcc
    Application size           Small   Medium   Medium   Medium   Large
    LTCG size (MB)             0.14    0.57     0.79     0.92     2.36
    POGO size (MB)             0.14    0.52     0.74     0.82     2.0
    Live section size          0.5     0.3      0.25     0.17     0.77
    # of functions             129     2588     1824     1928     5247
    % of live functions        54%     62%      47%      39%      47%
    % of speed funcs           18%     2.9%     5%       2%       4.2%
    # of LTCG inlines          163     2678     8050     9977     21898
    # of POGO inlines          235     938      1729     4976     3936
    % of inlined edge counts   50%     53%      25%      79%      65%
    % of page locality         97%     75%      85%      98%      80%
    % of speed gain            8.5%    6.6%     14.9%    36.9%    7.9%
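The two size rows can be turned into relative code-size savings of the POGO build over the plain LTCG build. The helper name `size_reduction_percent` is hypothetical; the inputs are taken directly from the table.

```cpp
#include <cassert>
#include <cmath>

// Relative code-size reduction (in percent) of the POGO build versus LTCG.
double size_reduction_percent(double ltcg_mb, double pogo_mb) {
    assert(ltcg_mb > 0.0);
    return (ltcg_mb - pogo_mb) / ltcg_mb * 100.0;
}
```

From the table: Gcc shrinks about 15.3% (2.36 MB to 2.0 MB), Povray about 10.9%, Gobmk about 8.8%, Perl about 6.3%, and Sjeng not at all.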
         ANKIT ASTHANA
AASTHAN@MICROSOFT.COM

Golang from Scala developer’s perspectiveGolang from Scala developer’s perspective
Golang from Scala developer’s perspective
 
Mender.io | Develop embedded applications faster | Comparing C and Golang
Mender.io | Develop embedded applications faster | Comparing C and GolangMender.io | Develop embedded applications faster | Comparing C and Golang
Mender.io | Develop embedded applications faster | Comparing C and Golang
 
Apache Beam: Lote portátil y procesamiento de transmisión
Apache Beam: Lote portátil y procesamiento de transmisiónApache Beam: Lote portátil y procesamiento de transmisión
Apache Beam: Lote portátil y procesamiento de transmisión
 
Build Great Networked APIs with Swift, OpenAPI, and gRPC
Build Great Networked APIs with Swift, OpenAPI, and gRPCBuild Great Networked APIs with Swift, OpenAPI, and gRPC
Build Great Networked APIs with Swift, OpenAPI, and gRPC
 
Debugging Python with gdb
Debugging Python with gdbDebugging Python with gdb
Debugging Python with gdb
 
OpenAPI and gRPC Side by-Side
OpenAPI and gRPC Side by-SideOpenAPI and gRPC Side by-Side
OpenAPI and gRPC Side by-Side
 
LF_APIStrat17_OpenAPI and gRPC Side-by-Side
LF_APIStrat17_OpenAPI and gRPC Side-by-SideLF_APIStrat17_OpenAPI and gRPC Side-by-Side
LF_APIStrat17_OpenAPI and gRPC Side-by-Side
 
Golang testing
Golang testingGolang testing
Golang testing
 
Golang testing
Golang testingGolang testing
Golang testing
 
Code quality par Simone Civetta
Code quality par Simone CivettaCode quality par Simone Civetta
Code quality par Simone Civetta
 
Debugging Hung Python Processes With GDB
Debugging Hung Python Processes With GDBDebugging Hung Python Processes With GDB
Debugging Hung Python Processes With GDB
 
Reverse engineering
Reverse engineeringReverse engineering
Reverse engineering
 
Debugging ZFS: From Illumos to Linux
Debugging ZFS: From Illumos to LinuxDebugging ZFS: From Illumos to Linux
Debugging ZFS: From Illumos to Linux
 
Lab 2Lab ObjectivesThe objective for this lab is to review.docx
Lab 2Lab ObjectivesThe objective for this lab is to review.docxLab 2Lab ObjectivesThe objective for this lab is to review.docx
Lab 2Lab ObjectivesThe objective for this lab is to review.docx
 
Protocol T50: Five months later... So what?
Protocol T50: Five months later... So what?Protocol T50: Five months later... So what?
Protocol T50: Five months later... So what?
 
Debugging Modern C++ Application with Gdb
Debugging Modern C++ Application with GdbDebugging Modern C++ Application with Gdb
Debugging Modern C++ Application with Gdb
 
Introduction to Google Colaboratory.pdf
Introduction to Google Colaboratory.pdfIntroduction to Google Colaboratory.pdf
Introduction to Google Colaboratory.pdf
 
Oh the compilers you'll build
Oh the compilers you'll buildOh the compilers you'll build
Oh the compilers you'll build
 

Último

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024SynarionITSolutions
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 

Último (20)

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 

Profile Guided Optimization

  • 1. POGO: Profile Guided Optimization. Ankit Asthana, Program Manager
  • 2. INDEX • History • What is Profile Guided Optimization (POGO)? • POGO Build Process • Steps to do POGO (Demo) • POGO under the hood • POGO case studies • Questions
  • 3. HISTORY ~ In a nutshell, POGO is a major constituent which makes up the DNA for many Microsoft products ~ • The POGO that ships in Visual Studio started in the late '90s as a joint venture between the Visual C++ team and the Microsoft Research group. • POGO initially focused only on the Itanium platform. • For almost an entire decade, only a few components, even within Microsoft, were POGO'ized. • POGO first shipped in 2005, in all pro-plus SKUs. • Today POGO is a key optimization that provides a significant performance boost to a large number of Microsoft products.
  • 4. HISTORY [Figure: Microsoft products that ship with POGO, grouped as browsers, business analytics and productivity software] Directly or indirectly, you have used products that ship with POGO technology!
  • 5. What is Profile Guided Optimization (POGO)? Really? No! But how many people here have used POGO?
  • 6. What is Profile Guided Optimization (POGO)? • Static analysis of code leaves many open questions for the compiler:
      if (a < b) foo(); else baz();             How often is a < b?
      switch (i) { case 1: … case 2: … }        What is the typical value of i?
      for (i = 0; i < count; ++i) bar();        What is the typical value of count?
      for (i = 0; i < count; ++i) (*p)(x, y);   What is the typical value of pointer p?
  • 7. What is Profile Guided Optimization (POGO)? • PGO (profile guided optimization) is a compiler optimization that uses profile data, collected at run time by running important or performance-centric user scenarios, to build an optimized version of the application. • PGO optimizations have a significant advantage over traditional static optimizations because they are based on how the application is likely to behave in a production environment. This lets the optimizer optimize hotter code paths (common user scenarios) for speed and colder code paths (less common user scenarios) for size, producing faster and smaller code and significant performance gains. • PGO can be used on traditional desktop applications and is currently supported on the x86 and x64 platforms. The mantra behind PGO is 'Faster and Smaller Code'.
  • 8. POGO Build Process ~ Three steps to perform Profile Guided Optimization ~ INSTRUMENT → TRAIN → OPTIMIZE
  • 10. POGO Build Process [Figure: build-process diagram with steps (1), (2) and (3)] TRIVIA: Does anyone know what (1), (2) and (3) do?
  • 11. POGO Build Process /GL: This flag tells the compiler to defer code generation until you link your program. Then, at link time, the linker calls back into the compiler to finish compilation. If you compile all your sources this way, the compiler optimizes your program as a whole rather than one source file at a time. Although /GL enables a large number of optimizations, one major advantage is that with Link Time Code Generation the compiler can inline functions from one source file (foo.obj) into callers defined in another source file (bar.obj).
  • 12. POGO Build Process • /LTCG: The linker invokes link-time code generation if it is passed a module that was compiled with /GL. If you do not explicitly specify /LTCG when you pass /GL or MSIL modules to the linker, the linker eventually detects this and restarts the link with /LTCG. Explicitly specify /LTCG when you pass /GL and MSIL modules to the linker for the fastest possible build performance. • /LTCG:PGI: Specifies that the linker outputs a .pgd file in preparation for instrumented test runs of the application. • /LTCG:PGO: Specifies that the linker uses the profile data created by running the instrumented binary to create an optimized image.
  • 13. STEPS to do POGO (DEMO) TRIVIA: Does anyone know what an N-body simulation is all about?
  • 14. STEPS to do POGO (DEMO): NBody sample application. Put plainly, an N-body simulation simulates a system of particles, usually under the influence of physical forces such as gravity.
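To make the demo concrete, here is a minimal N-body sketch in C++. It assumes 2-D bodies, Newtonian gravity with G = 1, a small softening term and a semi-implicit Euler step; it is not the NBody sample's actual code, and the names Body and step are hypothetical. Its O(n²) pairwise force loop is the kind of hot path a training run would exercise.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical 2-D body: position, velocity and mass.
struct Body {
    double x, y;
    double vx, vy;
    double mass;
};

// Advance all bodies by one time step dt under pairwise Newtonian gravity.
// `softening` keeps the force finite if two bodies coincide.
void step(std::vector<Body>& bodies, double dt, double G = 1.0,
          double softening = 1e-9) {
    assert(dt > 0.0);
    const std::size_t n = bodies.size();
    std::vector<double> ax(n, 0.0), ay(n, 0.0);
    // O(n^2) pairwise force loop: the hot path a training run would exercise.
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            if (i == j) continue;
            const double dx = bodies[j].x - bodies[i].x;
            const double dy = bodies[j].y - bodies[i].y;
            const double r2 = dx * dx + dy * dy + softening;
            const double inv_r3 = 1.0 / (r2 * std::sqrt(r2));
            ax[i] += G * bodies[j].mass * dx * inv_r3;
            ay[i] += G * bodies[j].mass * dy * inv_r3;
        }
    }
    // Semi-implicit Euler: update velocities first, then positions.
    for (std::size_t i = 0; i < n; ++i) {
        bodies[i].vx += ax[i] * dt;
        bodies[i].vy += ay[i] * dt;
        bodies[i].x += bodies[i].vx * dt;
        bodies[i].y += bodies[i].vy * dt;
    }
}
```

Two equal masses at rest pull toward each other, and because the pairwise forces are equal and opposite, their total momentum stays zero.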
  • 15. POGO Under the hood! Remember this?
      if (a < b) foo(); else baz();             How often is a < b?
      switch (i) { case 1: … case 2: … }        What is the typical value of i?
      for (i = 0; i < count; ++i) bar();        What is the typical value of count?
      for (i = 0; i < count; ++i) (*p)(x, y);   What is the typical value of pointer p?
  • 16. POGO Under the hood, Instrument Phase • Instrument the code with "probes" inserted into it. There are two kinds of probes: 1. Count (simple/entry) probes, used to count the number of times a path is taken (function entry/exit). 2. Value probes, used to construct a histogram of values (switch values, indirect call target addresses). • To simplify the correlation process, some optimizations, such as the inliner, are turned off. • The instrumented build is 1.5x to 2x slower than the optimized build. Side-effects: an instrumented build of the application and an empty .pgd file.
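The following is a hand-written C++ sketch of what the two probe kinds conceptually record for the if/switch example from slide 6. The names Counters, g_counters and instrumented are hypothetical, and the compiler's real probes and the .pgd/.pgc file formats are not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical per-function probe data (one global dataset, for illustration).
struct Counters {
    std::uint64_t entry = 0;      // count (entry) probe: function entries
    std::uint64_t taken = 0;      // count (simple) probe: a < b branch
    std::uint64_t not_taken = 0;  // count (simple) probe: else branch
    std::map<int, std::uint64_t> switch_values;  // value probe: histogram of i
};

Counters g_counters;

int instrumented(int a, int b, int i) {
    ++g_counters.entry;              // entry probe
    int result = 0;
    if (a < b) {
        ++g_counters.taken;          // simple probe 1
        result += 1;
    } else {
        ++g_counters.not_taken;      // simple probe 2
        result -= 1;
    }
    ++g_counters.switch_values[i];   // value probe on the switch expression
    switch (i) {
        case 1: result += 10; break;
        case 2: result += 20; break;
        default: result += 100; break;
    }
    return result;
}
```

After a training run, taken / (taken + not_taken) answers "How often is a < b?", and the switch_values histogram answers "What is the typical value of i?".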
  • 17. POGO Under the hood, Instrument Phase [Figure: function Foo instrumented with an entry probe, simple probes 1 and 2, and value probe 1 on switch (i) { case 1: … default: … }]
  • 18. POGO Under the hood, Training Phase • Run your training scenarios. During this phase the user runs the instrumented version of the application and exercises only common, performance-centric user scenarios. Exercising these training scenarios creates .pgc files, which contain the training data for each user scenario. • For example, a common performance scenario for modern applications is application startup. • Training these scenarios creates appname!#.pgc files (where appname is the name of the running application and # is 1 + the number of appname!#.pgc files already in the directory). Side-effects: a set of .pgc files.
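The .pgc naming rule can be sketched in C++ as below. next_pgc_name and is_pgc_for are hypothetical helpers that take a directory listing as a vector of file names instead of reading the file system.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// True if `file` looks like app!N.pgc, where N is a non-empty run of digits.
bool is_pgc_for(const std::string& file, const std::string& app) {
    const std::string prefix = app + "!";
    const std::string suffix = ".pgc";
    if (file.size() <= prefix.size() + suffix.size()) return false;
    if (file.compare(0, prefix.size(), prefix) != 0) return false;
    if (file.compare(file.size() - suffix.size(), suffix.size(), suffix) != 0)
        return false;
    for (std::size_t k = prefix.size(); k < file.size() - suffix.size(); ++k)
        if (!std::isdigit(static_cast<unsigned char>(file[k]))) return false;
    return true;
}

// Name of the .pgc file the next training run of `app` would write:
// app!N.pgc with N = 1 + number of existing app!#.pgc files.
std::string next_pgc_name(const std::string& app,
                          const std::vector<std::string>& existing) {
    std::size_t count = 0;
    for (const auto& f : existing)
        if (is_pgc_for(f, app)) ++count;
    return app + "!" + std::to_string(count + 1) + ".pgc";
}
```

Files for other applications, and the .pgd file itself, are ignored when counting.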
  • 19. POGO Under the hood Optimize Phase • Full and partial inlining • Function layout • Speed and size decision • Basic block layout • Code separation • Virtual call speculation • Switch expansion • Data separation • Loop unrolling
  • 20. POGO Under the hood, Optimize Phase: CALL GRAPH PATH PROFILING • The behavior of a function on one call path may be drastically different from its behavior on another. • Call-path-specific information results in better inlining and optimization decisions. • Let us take an example (next slide).
  • 21. POGO Under the hood, Optimize Phase: EXAMPLE: CALL GRAPH PATH PROFILING • Assign path numbers bottom-up. • Number of paths out of a function = sum of its callees' paths + 1. [Figure: call graph Start → A; A calls B, C and D; B, C and D each call Foo; path counts A = 7, B = 2, C = 2, D = 2, Foo = 1] Path 1: Foo; Path 2: B; Path 3: B-Foo; Path 4: C; Path 5: C-Foo; Path 6: D; Path 7: D-Foo; Path 8: A; Path 9: A-B; Path 10: A-B-Foo; Path 11: A-C; Path 12: A-C-Foo; Path 13: A-D; Path 14: A-D-Foo. There are 7 paths for Foo (paths 1, 3, 5, 7, 10, 12 and 14).
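The bottom-up counting rule can be checked with a small C++ sketch. It assumes an acyclic call graph stored as a map from caller to callees; paths_from and paths_ending_at are hypothetical names, and recursion in the call graph is not handled.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using CallGraph = std::map<std::string, std::vector<std::string>>;

// Paths starting at f: the path {f} itself plus every path through each
// callee. Memoized so each function's count is computed once.
int paths_from(const CallGraph& g, const std::string& f,
               std::map<std::string, int>& memo) {
    if (auto it = memo.find(f); it != memo.end()) return it->second;
    int n = 1;
    if (auto it = g.find(f); it != g.end())
        for (const auto& callee : it->second) n += paths_from(g, callee, memo);
    memo[f] = n;
    return n;
}

// Paths that start at f and end at `target`.
int paths_ending_at(const CallGraph& g, const std::string& f,
                    const std::string& target) {
    int n = (f == target) ? 1 : 0;
    if (auto it = g.find(f); it != g.end())
        for (const auto& callee : it->second)
            n += paths_ending_at(g, callee, target);
    return n;
}
```

For the slide's graph this reproduces A = 7, B = C = D = 2 and Foo = 1; summing over all functions gives the 14 numbered paths, 7 of which end in Foo.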
  • 22. POGO Under the hood, Optimize Phase: INLINING [Figure: profiled call graph of goo, foo, bat, bar and baz, annotated with call counts]
  • 23. POGO Under the hood, Optimize Phase: INLINING POGO uses call graph path profiling. [Figure: the same call graph, with the bar and baz call counts broken out per call path through goo, foo and bat]
  • 24. POGO Under the hood, Optimize Phase: INLINING Inlining decisions are made at each call site. Call-site-specific, profile-directed inlining minimizes the code bloat due to inlining while still gaining performance where needed. [Figure: the call graph after call-site-specific inlining decisions]
  • 25. POGO Under the hood, Optimize Phase: INLINE HEURISTICS The POGO inlining decision is made before layout, the speed/size decision and all other optimizations.
  • 26. POGO Under the hood, Optimize Phase: SPEED AND SIZE The decision is based on the post-inliner dynamic instruction count. Code segments with a higher dynamic instruction count are optimized for SPEED; code segments with a lower dynamic instruction count are optimized for SIZE. [Figure: the post-inlining call graph of goo, foo, bat, bar and baz, annotated with call counts]
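A minimal C++ sketch of a speed/size split, under stated assumptions: the dynamic instruction count is approximated as static instruction count times execution count, and the hottest functions are marked SPEED until they cover a chosen fraction (the hypothetical `coverage` parameter) of all executed instructions. The slide does not give POGO's actual threshold; Func and decide_speed_size are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct Func {
    std::string name;
    std::uint64_t instructions;  // static size after inlining
    std::uint64_t executions;    // from the profile
    bool speed = false;          // true = optimize for speed, false = size
};

// Sorts funcs hottest-first and marks SPEED until `coverage` of all
// executed instructions is covered; everything else stays SIZE.
void decide_speed_size(std::vector<Func>& funcs, double coverage = 0.99) {
    assert(coverage >= 0.0 && coverage <= 1.0);
    auto dyn = [](const Func& f) { return f.instructions * f.executions; };
    std::sort(funcs.begin(), funcs.end(),
              [&](const Func& a, const Func& b) { return dyn(a) > dyn(b); });
    std::uint64_t total = 0;
    for (const auto& f : funcs) total += dyn(f);
    std::uint64_t covered = 0;
    for (auto& f : funcs) {
        // SPEED while the hot set still falls short of the target coverage.
        f.speed = total > 0 && covered < coverage * static_cast<double>(total);
        covered += dyn(f);
    }
}
```

With a 90% target, a function that executes 10,000 of 10,150 dynamic instructions is enough on its own, so only it is marked SPEED.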
  • 27. POGO Under the hood, Optimize Phase: BLOCK LAYOUT Basic blocks are ordered so that the most frequent path falls through. [Figure: A branches to B (count 100) and C (count 10), and B and C both continue to D (counts 100 and 10); default layout A, B, C, D; optimized layout A, B, D, C]
  • 28. POGO Under the hood, Optimize Phase: BLOCK LAYOUT Basic blocks are ordered so that the most frequent path falls through. [Figure: A branches to B (count 100) and C (count 10), and B and C both continue to D (counts 100 and 10); default layout A, B, C, D; optimized layout A, B, D, C] Better instruction cache locality.
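The block-layout idea can be sketched as a greedy chain in C++: starting at the entry block, always place the most frequently taken unplaced successor next, so that edge becomes the fall-through. This is a simplified illustration rather than POGO's actual algorithm; Edge and layout_blocks are hypothetical names, and blocks are numbered A = 0, B = 1, C = 2, D = 3.

```cpp
#include <cassert>
#include <vector>

struct Edge {
    int to;      // successor block index
    long count;  // profile count for this edge
};

// succ[b] lists the outgoing edges of block b; block 0 is the entry.
std::vector<int> layout_blocks(const std::vector<std::vector<Edge>>& succ) {
    const int n = static_cast<int>(succ.size());
    std::vector<bool> placed(n, false);
    std::vector<int> order;
    int cur = 0;
    while (static_cast<int>(order.size()) < n) {
        placed[cur] = true;
        order.push_back(cur);
        // Prefer the hottest successor that has not been placed yet.
        int best = -1;
        long best_count = -1;
        for (const Edge& e : succ[cur])
            if (!placed[e.to] && e.count > best_count) {
                best = e.to;
                best_count = e.count;
            }
        // No unplaced successor: continue with the first unplaced block.
        if (best == -1)
            for (int b = 0; b < n; ++b)
                if (!placed[b]) { best = b; break; }
        if (best == -1) break;  // every block has been placed
        cur = best;
    }
    return order;
}
```

For the slide's graph (A→B 100, A→C 10, B→D 100, C→D 10) it returns 0, 1, 3, 2, which is the optimized layout A, B, D, C.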
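  • 29. POGO Under the hood, Optimize Phase: LIVE AND PGO-DEAD CODE SEPARATION • Dead functions/blocks are placed in a special section. [Figure: A branches to B (count 100) and C (count 0), and B and C both continue to D; default layout A, B, C, D; optimized layout A, B, D, C, with the dead block C moved out of the live code] To minimize the working set and improve code locality, code that is dead in the training scenarios ("scenario dead") can be moved out of the way.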
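Live/dead separation amounts to a stable partition on the profile counts. This C++ sketch assumes a block is dead when its training count is zero and models the special section as the tail of the vector; Block and separate_dead are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Block {
    std::string name;
    long count;  // profile execution count
};

// Moves never-executed blocks after all live ones, preserving the relative
// order inside each group. Returns the index of the first dead block.
std::size_t separate_dead(std::vector<Block>& layout) {
    auto first_dead = std::stable_partition(
        layout.begin(), layout.end(),
        [](const Block& b) { return b.count > 0; });
    return static_cast<std::size_t>(first_dead - layout.begin());
}
```

For A, B, C, D with counts 100, 100, 0, 100 it yields A, B, D followed by the dead block C.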
  • 30. POGO Under the hood, Optimize Phase: FUNCTION LAYOUT • Based on the post-inliner and post-code-separation call graph and profile data. • Only functions/segments in the live section are laid out; POGO-dead blocks are not included. • The overall strategy is "closest is best": strongly connected functions are put together. • A call is considered to achieve page locality if the callee is located in the same page as the caller.
  • 31. POGO Under the hood, Optimize Phase: EXAMPLE: FUNCTION LAYOUT [Figure: call graph of functions A, B, C, D and E annotated with call counts, and the resulting function layout A B E C D] • In general, >70% page locality is achieved regardless of component size.
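The page-locality measure can be computed with a small C++ sketch. Its assumptions are loud ones: functions are packed back to back in layout order, a function's page is its start address divided by a 4096-byte page size, and calls are weighted by their profile counts. Call and page_locality are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Call {
    std::string caller, callee;
    std::uint64_t count;  // profiled number of calls along this edge
};

// layout: (function name, size in bytes) in final layout order.
// Returns the fraction of profiled calls whose caller and callee start on
// the same page (0.0 if there are no calls).
double page_locality(
    const std::vector<std::pair<std::string, std::uint64_t>>& layout,
    const std::vector<Call>& calls, std::uint64_t page_size = 4096) {
    std::map<std::string, std::uint64_t> page;
    std::uint64_t addr = 0;
    for (const auto& entry : layout) {
        page[entry.first] = addr / page_size;
        addr += entry.second;
    }
    std::uint64_t local = 0, total = 0;
    for (const Call& c : calls) {
        total += c.count;
        if (page.at(c.caller) == page.at(c.callee)) local += c.count;
    }
    return total == 0 ? 0.0
                      : static_cast<double>(local) / static_cast<double>(total);
}
```

Placing heavily connected functions next to each other, as "closest is best" does, raises this fraction because more hot calls stay within one page.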
  • 32. POGO Under the hood, Optimize Phase: SWITCH EXPANSION • There are many ways to expand switches: linear search, jump table, binary search, etc. • POGO collects the values of the switch expression, and the most frequent values are pulled out:
      // Before: 90% of the time, i == 10
      switch (i) { case 1: … case 2: … case 3: … default: … }
      // After: the most frequent value is tested first
      if (i == 10) goto default;
      switch (i) { case 1: … case 2: … case 3: … default: … }
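Here is the switch expansion written out by hand in compilable C++. The cases, their return values and handle_default are hypothetical, and the slide's `goto default` is expressed by calling the same default handler. The profile assumption (i == 10 about 90% of the time) comes from the slide; the compiler performs this transformation internally.

```cpp
#include <cassert>

// Hypothetical default handler; 10 has no case label, so it lands here.
int handle_default(int i) { return -i; }

int original(int i) {
    switch (i) {
        case 1: return 100;
        case 2: return 200;
        case 3: return 300;
        default: return handle_default(i);
    }
}

int expanded(int i) {
    if (i == 10)  // hot value pulled out: one compare, no switch dispatch
        return handle_default(i);
    switch (i) {  // rarer values keep the normal switch expansion
        case 1: return 100;
        case 2: return 200;
        case 3: return 300;
        default: return handle_default(i);
    }
}
```

The transformation only pays off when the profile is right: the extra compare is cheap when i == 10 dominates, and is pure overhead otherwise.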
  • 33. POGO Under the hood, Optimize Phase: VIRTUAL CALL SPECULATION The profiles show that the type of object A in function Bar was almost always Foo.
      class Base { … virtual void call(); };
      class Foo : Base { … void call(); };
      class Bar : Base { … void call(); };
      // Before
      void Bar(Base *A) { … while (true) { … A->call(); … } }
      // After
      void Bar(Base *A) {
          …
          while (true) {
              …
              if (type(A) == Foo:Base) {
                  // inline of A->call();
              } else
                  A->call();
              …
          }
      }
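Virtual call speculation can be sketched in standard C++, with typeid standing in for the slide's type(A) test. The class bodies, return values and the loop_original/loop_speculated names are hypothetical; the function is renamed so it does not clash with class Bar.

```cpp
#include <cassert>
#include <typeinfo>

struct Base {
    virtual ~Base() = default;
    virtual int call() { return 0; }
};
struct Foo : Base { int call() override { return 1; } };
struct Bar : Base { int call() override { return 2; } };

int loop_original(Base* A, int n) {
    int sum = 0;
    for (int k = 0; k < n; ++k) sum += A->call();  // indirect call each time
    return sum;
}

// Assumes the profile showed A is almost always exactly a Foo.
int loop_speculated(Base* A, int n) {
    int sum = 0;
    for (int k = 0; k < n; ++k) {
        if (typeid(*A) == typeid(Foo))
            sum += 1;          // inlined body of Foo::call()
        else
            sum += A->call();  // fallback: ordinary virtual dispatch
    }
    return sum;
}
```

`typeid(*A) == typeid(Foo)` matches only the exact dynamic type Foo, so a class derived from Foo that overrides call() still takes the virtual-dispatch fallback and behavior is unchanged.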
  • 34. POGO Under the hood, Optimize Phase • During this phase the application is rebuilt for the last time to generate the optimized version of the application. Behind the scenes, the .pgc training data files are merged into the empty program database file (.pgd) created in the instrument phase. • The compiler back end then uses this program database file to make more intelligent optimization decisions, generating a highly optimized version of the application. Side-effect: an optimized version of the application!
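The merge step can be modeled as summing per-probe counts across training runs. This C++ sketch assumes each .pgc run is a map from a probe id to a count and allows an optional per-run weight. The real .pgc/.pgd formats and merge options are not modeled, and Profile, merge_runs and the probe ids are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using Profile = std::map<std::string, std::uint64_t>;  // probe id -> count

// Adds every run's counts (times its weight, default 1) into one profile.
Profile merge_runs(const std::vector<Profile>& runs,
                   const std::vector<std::uint64_t>& weights = {}) {
    assert(weights.empty() || weights.size() == runs.size());
    Profile merged;
    for (std::size_t r = 0; r < runs.size(); ++r) {
        const std::uint64_t w = weights.empty() ? 1 : weights[r];
        for (const auto& entry : runs[r]) merged[entry.first] += w * entry.second;
    }
    return merged;
}
```

Weighting lets a scenario that matters more (for example, startup) count more heavily without re-running it several times.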
  • 35. POGO CASE STUDIES: SPEC2K
      SPEC2K                     Sjeng   Gobmk   Perl    Povray  Gcc
      Application size           Small   Medium  Medium  Medium  Large
      LTCG size (MB)             0.14    0.57    0.79    0.92    2.36
      POGO size (MB)             0.14    0.52    0.74    0.82    2.0
      Live section size          0.5     0.3     0.25    0.17    0.77
      # of functions             129     2588    1824    1928    5247
      % of live functions        54%     62%     47%     39%     47%
      % of speed functions       18%     2.9%    5%      2%      4.2%
      # of LTCG inlines          163     2678    8050    9977    21898
      # of POGO inlines          235     938     1729    4976    3936
      % of inlined edge counts   50%     53%     25%     79%     65%
      % of page locality         97%     75%     85%     98%     80%
      % of speed gain            8.5%    6.6%    14.9%   36.9%   7.9%
  • 36. Ankit Asthana, AASTHAN@MICROSOFT.COM

Editor's Notes

  1. Most important machine-independent (MI) optimization: inlining. Most important machine-dependent (MD) optimization: register allocation. The inliner is crucial because it removes calling-convention overhead and exposes more information to the intra-procedural optimizer. On the other hand, inlining increases register pressure and in general substantially increases code size. This tuning saved a double-digit percentage of code size on several Win8 components, and in general gave a 5% code size reduction on Spec2k & Spec2k6 without losing any CPU cycles. Yes, POGO inlining can be very aggressive for some hot functions or paths, but overall, it should be…
  2. For Spec2k programs, most achieve >99% locality. For SQL TPC-E, >75% page locality.