Performance Profiling of Virtual Machines
Jiaqing Du+, Nipun Sehrawat*, Willy Zwaenepoel+
+EPFL, Switzerland
*University of Illinois at Urbana-Champaign
Performance Profiling
•    Use CPU performance counters
•    Monitor software runtime behavior
•    Incur very low overhead
•    Used extensively: OProfile, VTune, …

                    %CYCLE       Function            Module
                    98.5529      vmx_vcpu_run        kvm-intel.ko
                    0.2226       (no symbols)        libc.so
                    0.1034       hpet_cpuhp_notify   vmlinux
                    0.1034       native_patch        vmlinux


Jiaqing Du, VEE, March 9, 2011                                      2
Terminology

      (1) native profiling        profiler runs in the OS, directly on CPU + PMU
      (2) guest-wide profiling    profiler runs in the guest, on top of the VMM
      (3) system-wide profiling   profilers run in both the guest and the VMM




Profiling with Virtual Machines

                                          Para-            Hardware     Binary
                                          virtualization   assistance   translation
                  Guest-wide profiling    ?                ?            ?
                  System-wide profiling   XenOprof         ?            ?


         Profilers do not work well with virtual machines.



Contributions

                                 (1) Give solutions

                                          Para-            Hardware     Binary
                                          virtualization   assistance   translation
                  Guest-wide profiling    ?                ?            ?
                  System-wide profiling   XenOprof         ?            ?



                                              (2) Implement prototypes
Outline
•    Native profiling
•    Guest-wide profiling
•    System-wide profiling
•    Evaluation




Native Profiling
• Performance monitoring unit (PMU)
       – consists of a set of event counters
       – generates an interrupt when a counter overflows
• PMU-based profiler
       – user level: Control, Interpret
       – kernel level: Configure, Collect
       – hardware: CPU, PMU
       – each sample records the previous PC value and the process identifier
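
The configure/collect/interpret cycle of a PMU-based profiler can be sketched as a small software model. This is a hypothetical illustration, not OProfile or kernel code: `struct profiler`, `profiler_configure`, `pmu_count_event`, and `profiler_hits` are names invented here. The counter is armed to overflow after `period` events, and each overflow stands in for the PMU interrupt on which the handler records the PC value and the process identifier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One sample, as recorded by the (hypothetical) overflow handler. */
struct sample {
    uintptr_t pc;   /* program counter at the time of the overflow */
    int       pid;  /* process that was running */
};

/* Software model of one PMU event counter. */
struct pmu_counter {
    uint64_t period;     /* events between two overflows */
    uint64_t remaining;  /* events left until the next overflow */
};

#define MAX_SAMPLES 1024

struct profiler {
    struct pmu_counter ctr;
    struct sample      samples[MAX_SAMPLES];
    size_t             nsamples;
};

/* Configure: arm the counter so that it overflows every `period` events. */
static void profiler_configure(struct profiler *p, uint64_t period)
{
    assert(period > 0);
    p->ctr.period = period;
    p->ctr.remaining = period;
    p->nsamples = 0;
}

/* Collect: called once per monitored event. On overflow, record a sample
 * and re-arm the counter, as an overflow interrupt handler would. */
static void pmu_count_event(struct profiler *p, uintptr_t pc, int pid)
{
    if (--p->ctr.remaining > 0)
        return;
    if (p->nsamples < MAX_SAMPLES) {
        p->samples[p->nsamples].pc = pc;
        p->samples[p->nsamples].pid = pid;
        p->nsamples++;
    }
    p->ctr.remaining = p->ctr.period;
}

/* Interpret: count the samples whose PC falls into [start, end). */
static size_t profiler_hits(const struct profiler *p,
                            uintptr_t start, uintptr_t end)
{
    size_t hits = 0;
    for (size_t i = 0; i < p->nsamples; i++)
        if (p->samples[i].pc >= start && p->samples[i].pc < end)
            hits++;
    return hits;
}
```

In a real profiler the address range passed to the interpret step would come from the symbol table of the sampled binary, which is how samples turn into per-function percentages.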

Guest-wide Profiling
• Profiler runs in the guest and only profiles the guest
       – layers: Guest (Control, Interpret, Configure, Collect) / VMM / CPU + PMU
       – injected interrupts should be handled right after the guest resumes execution

              Challenge: synchronous interrupt delivery to the guest
System-wide Profiling (1/3)
• Reveal runtime behavior of both VMM and guest(s)

       – layers: Guest1, Guest2 / VMM (Control, Interpret, Configure, Collect) / CPU + PMU
       – the VMM profiler does not know the internals of a guest

               Challenge: interpret samples belonging to the guest
System-wide Profiling (2/3)
• Interpret guest samples: full delegation

       – guest: Control, Interpret, Configure, Collect
       – VMM: Control, Interpret, Configure, Collect
       – hardware: CPU, PMU



System-wide Profiling (3/3)
• Interpret guest samples: interpretation delegation

       – guest: Control, Interpret, Configure, Collect
       – VMM: Control, Interpret, Configure, Collect
       – Shared Buffer (between VMM and guest)
       – hardware: CPU, PMU
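
One way to picture the shared buffer is a single-producer, single-consumer ring: one side appends samples and the other drains them for interpretation. This is a sketch under assumptions. The slide only names a "Shared Buffer", so the sample layout, `shared_buf_push`, `shared_buf_pop`, the capacity, and the direction of flow (VMM collects, guest interprets) are illustrative choices made here. A real implementation would also need memory barriers and a way to notify the consumer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BUF_SLOTS 8  /* hypothetical capacity; must be a power of two */

struct guest_sample {
    uint64_t pc;       /* guest program counter when the counter overflowed */
    uint32_t vcpu_id;  /* virtual CPU the sample was taken on */
};

/* Ring buffer placed in memory visible to both the VMM and the guest. */
struct shared_buf {
    uint32_t head;  /* next slot to write (advanced by the producer) */
    uint32_t tail;  /* next slot to read (advanced by the consumer) */
    struct guest_sample slots[BUF_SLOTS];
};

/* Producer side: returns false and drops the sample when the ring is full. */
static bool shared_buf_push(struct shared_buf *b, struct guest_sample s)
{
    assert(b != NULL);
    if (b->head - b->tail == BUF_SLOTS)
        return false;
    b->slots[b->head % BUF_SLOTS] = s;
    b->head++;
    return true;
}

/* Consumer side: returns false when there is nothing left to interpret. */
static bool shared_buf_pop(struct shared_buf *b, struct guest_sample *out)
{
    assert(b != NULL && out != NULL);
    if (b->head == b->tail)
        return false;
    *out = b->slots[b->tail % BUF_SLOTS];
    b->tail++;
    return true;
}
```

Dropping samples when the ring is full keeps the producer from ever blocking, at the cost of losing data if the consumer falls behind.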



PMU Multiplexing
• When to save & restore performance counters?
• CPU switch
       – only in-guest execution is accounted to the guest
           executing:     guest1 | VMM (I/O for guest1) | guest2 | VMM (I/O for guest2) | guest2
           accounted to:  guest1 | (none)               | guest2 | (none)               | guest2
• Domain switch
       – in-VMM execution is also accounted to the guest
           executing:     guest1 | VMM (I/O for guest1) | guest2 | VMM (I/O for guest2) | guest2
           accounted to:  guest1 | guest1               | guest2 | guest2               | guest2
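
The two accounting policies can be compared on a toy timeline. This is a minimal sketch, not the VMM's save/restore code: `struct slice`, `account`, and the event counts used with it are hypothetical. Each slice says which guest it belongs to and whether the CPU was running VMM code on that guest's behalf. CPU switch skips the in-VMM slices, while domain switch charges them to the guest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_GUESTS 4

enum mode { CPU_SWITCH, DOMAIN_SWITCH };

/* One slice of execution on a physical CPU. */
struct slice {
    int      guest;   /* guest the slice runs for (index < MAX_GUESTS) */
    int      in_vmm;  /* 1 if the CPU executes VMM code on behalf of `guest` */
    uint64_t events;  /* hardware events counted during the slice */
};

/* Sum the events of a timeline into per-guest totals under policy `m`. */
static void account(const struct slice *t, size_t n, enum mode m,
                    uint64_t per_guest[MAX_GUESTS])
{
    for (int g = 0; g < MAX_GUESTS; g++)
        per_guest[g] = 0;
    for (size_t i = 0; i < n; i++) {
        assert(t[i].guest >= 0 && t[i].guest < MAX_GUESTS);
        /* CPU switch: counters are not attributed while the VMM runs. */
        if (t[i].in_vmm && m == CPU_SWITCH)
            continue;
        per_guest[t[i].guest] += t[i].events;
    }
}
```

For a timeline of guest1 (100 events), VMM I/O for guest1 (40), guest2 (100), VMM I/O for guest2 (30), guest2 (50), CPU switch charges 100 and 150 events to the two guests, and domain switch charges 140 and 180.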

Implementation


                                          Para-            KVM   QEMU
                                          virtualization
                  Guest-wide profiling    ?                √     ?
                  System-wide profiling   XenOprof         √     √




Evaluation question #1

How much does profiling slow down programs?




Profiling Overhead
• Measure execution time
       – a computation-intensive program
       – with and without profiling
       – about 400 counter overflows per second

                     Profiling environment   Increased execution time
                     Native Linux                0.04% ± 0.004%
                     KVM guest-wide              0.39% ± 0.045%
                     KVM system-wide             0.44% ± 0.043%
                     QEMU system-wide            0.94% ± 0.044%
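
As a rough reading of the table, the slowdown can be converted into an average cost per counter overflow. The helper below is a back-of-the-envelope sketch, not a measurement from the talk: `cost_per_overflow_us` is a hypothetical name, and the calculation assumes that the whole increase in execution time is caused by overflow handling and that the rate is exactly 400 overflows per second.

```c
#include <assert.h>

/* Average added time per counter overflow, in microseconds.
 * increase_pct:      increase in execution time, in percent
 * overflows_per_sec: counter overflow rate */
static double cost_per_overflow_us(double increase_pct,
                                   double overflows_per_sec)
{
    assert(increase_pct >= 0.0 && overflows_per_sec > 0.0);
    /* Extra seconds spent for every second of unprofiled execution. */
    double extra_sec_per_sec = increase_pct / 100.0;
    return extra_sec_per_sec / overflows_per_sec * 1e6;
}
```

Under these assumptions the table corresponds to about 1 µs per overflow on native Linux, 9.75 µs for KVM guest-wide, 11 µs for KVM system-wide, and 23.5 µs for QEMU system-wide profiling.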




Evaluation question #2

                       Are profiling results accurate?




Profiling Accuracy (1/4)
• A computation-intensive benchmark
• compute_{a|b}() does floating point arithmetic
• Monitor CPU cycles

                           int main(int argc, char *argv[])
                           {
                               while (1) {
                                   compute_a();
                                   compute_b();
                               }
                           }




Profiling Accuracy (2/4)
• Comparison with native profiling
   [Figure: chart of Cycle % (y-axis, 0 to 90) by routine name (x-axis: compute_a, compute_b).
    Series: Native, KVM guest-wide, KVM system-wide, QEMU system-wide.]
Profiling Accuracy (3/4)
• A memory-intensive benchmark
• Randomly access a fixed-size region of memory
• Monitor last level cache misses

                        struct item {
                            struct item *next;
                            long pad[NUM_PAD];  /* padding that sets the item size */
                        };

                        void chase_pointer(void)
                        {
                            /* start at the first item of the randomly linked list */
                            struct item *p = &randomly_connected_items;
                            while (p != NULL) p = p->next;
                        }
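
The slide does not show how the items are linked, so the following is only a sketch of one plausible setup: shuffle the item indices with a Fisher-Yates shuffle and chain the items in that order, so that each hop of the pointer chase lands on an unpredictable address. `connect_randomly`, `chase_and_count`, and the value of `NUM_PAD` are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NUM_PAD 7  /* assumed: makes each item 64 bytes on an LP64 machine */

struct item {
    struct item *next;
    long pad[NUM_PAD];
};

/* Link the `n` items of `items` into one chain that visits them in random
 * order. Returns the head of the chain, or NULL if allocation fails.
 * The last item's next pointer is NULL. */
static struct item *connect_randomly(struct item *items, size_t n,
                                     unsigned seed)
{
    assert(items != NULL && n > 0);
    size_t *order = malloc(n * sizeof *order);
    if (order == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        order[i] = i;
    srand(seed);
    for (size_t i = n - 1; i > 0; i--) {  /* Fisher-Yates shuffle */
        size_t j = (size_t)rand() % (i + 1);
        size_t tmp = order[i];
        order[i] = order[j];
        order[j] = tmp;
    }
    for (size_t i = 0; i + 1 < n; i++)
        items[order[i]].next = &items[order[i + 1]];
    items[order[n - 1]].next = NULL;
    struct item *head = &items[order[0]];
    free(order);
    return head;
}

/* Walk the chain once, as chase_pointer() does, and count the items. */
static size_t chase_and_count(const struct item *p)
{
    size_t count = 0;
    while (p != NULL) {
        p = p->next;
        count++;
    }
    return count;
}
```

The working set size is then `n * sizeof(struct item)`. Note that `rand()` has a limited range and a slight modulo bias, so a benchmark with a very large number of items would want a better random source.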


Profiling Accuracy (4/4)
 • Comparison with native profiling
   [Figure: chart of cache misses per memory access (y-axis, 0 to 1.6) by working set size
    (x-axis, 256 KB to 3072 KB in steps of 256 KB).
    Series: Native, KVM guest-wide, KVM system-wide, QEMU system-wide.]

Evaluation question #3

                     What is the difference between
                     CPU switch and domain switch?




Recap
• CPU switch
           executing:     guest1 | VMM (I/O for guest1) | guest2 | VMM (I/O for guest2) | guest2
           accounted to:  guest1 | (none)               | guest2 | (none)               | guest2

• Domain switch
           executing:     guest1 | VMM (I/O for guest1) | guest2 | VMM (I/O for guest2) | guest2
           accounted to:  guest1 | guest1               | guest2 | guest2               | guest2




Profiling Packet Receive (1/2)
• Experiment
       – push packets to a Linux guest in KVM
       – run OProfile in the guest
       – monitor instruction retirements

• Setup
       – receiving machine: Linux guest with a virtual NIC, on KVM, on hardware with a NIC
       – other machine: Linux on hardware with a NIC




Profiling Packet Receive (2/2)
         CPU Switch                              Domain Switch
 INSTR   Function                        INSTR   Function
 167     csum_partial                    2261    cp_interrupt                (I/O related)
 106     csum_partial_copy_generic       1336    cp_rx_poll                  (I/O related)
 74      copy_to_user                    1034    cp_start_xmit               (I/O related)
 47      ipt_do_table                    421     native_apic_mem_write       (I/O related)
 38      tcp_v4_rcv                      374     native_apic_mem_read        (I/O related)
 …       …                               191     csum_partial                (packet processing)
 …       …                               105     csum_partial_copy_generic   (packet processing)
 …       …                               94      copy_to_user                (packet processing)
 …       …                               79      ipt_do_table                (packet processing)
 …       …                               51      tcp_v4_rcv                  (packet processing)

 All functions listed under CPU Switch are packet processing.


                      Domain switch gives more insight for I/O operations.
Related Work
• XenOprof
       – first profiler targeting virtual machines
       – system-wide profiling for Xen
• Linux perf
       – a profiling infrastructure for Linux
       – limited support of profiling KVM Linux guest
• VMware vmkperf
       – only read and write CPU performance counters



Conclusions


                                          Para-            Hardware     Binary
                                          virtualization   assistance   translation
                  Guest-wide profiling    √                √            √
                  System-wide profiling   XenOprof         √            √




Jiaqing Du, VEE, March 9, 2011                                               26

Mais conteúdo relacionado

Mais procurados

Transcendent memoryupdate xensummit2010-final
Transcendent memoryupdate xensummit2010-finalTranscendent memoryupdate xensummit2010-final
Transcendent memoryupdate xensummit2010-final
The Linux Foundation
 
Track A-Shmuel Panijel, Windriver
Track A-Shmuel Panijel, WindriverTrack A-Shmuel Panijel, Windriver
Track A-Shmuel Panijel, Windriver
chiportal
 
Ov psim demo_slides_power_pc
Ov psim demo_slides_power_pcOv psim demo_slides_power_pc
Ov psim demo_slides_power_pc
simon56
 
Java Standard Edition 5 Performance
Java Standard Edition 5 PerformanceJava Standard Edition 5 Performance
Java Standard Edition 5 Performance
white paper
 
SMI_SNUG_paper_v10
SMI_SNUG_paper_v10SMI_SNUG_paper_v10
SMI_SNUG_paper_v10
Igor Lesik
 
[Harvard CS264] 11a - Programming the Memory Hierarchy with Sequoia (Mike Bau...
[Harvard CS264] 11a - Programming the Memory Hierarchy with Sequoia (Mike Bau...[Harvard CS264] 11a - Programming the Memory Hierarchy with Sequoia (Mike Bau...
[Harvard CS264] 11a - Programming the Memory Hierarchy with Sequoia (Mike Bau...
npinto
 

Mais procurados (20)

Develop Your Own Operating Systems using Cheap ARM Boards
Develop Your Own Operating Systems using Cheap ARM BoardsDevelop Your Own Operating Systems using Cheap ARM Boards
Develop Your Own Operating Systems using Cheap ARM Boards
 
Energy efficient storage in vm
Energy efficient storage in vmEnergy efficient storage in vm
Energy efficient storage in vm
 
Implement Checkpointing for Android
Implement Checkpointing for AndroidImplement Checkpointing for Android
Implement Checkpointing for Android
 
Nakajima hvm-be final
Nakajima hvm-be finalNakajima hvm-be final
Nakajima hvm-be final
 
Xen PV Performance Status and Optimization Opportunities
Xen PV Performance Status and Optimization OpportunitiesXen PV Performance Status and Optimization Opportunities
Xen PV Performance Status and Optimization Opportunities
 
Transcendent memoryupdate xensummit2010-final
Transcendent memoryupdate xensummit2010-finalTranscendent memoryupdate xensummit2010-final
Transcendent memoryupdate xensummit2010-final
 
L4 Microkernel :: Design Overview
L4 Microkernel :: Design OverviewL4 Microkernel :: Design Overview
L4 Microkernel :: Design Overview
 
Track A-Shmuel Panijel, Windriver
Track A-Shmuel Panijel, WindriverTrack A-Shmuel Panijel, Windriver
Track A-Shmuel Panijel, Windriver
 
Ov psim demo_slides_power_pc
Ov psim demo_slides_power_pcOv psim demo_slides_power_pc
Ov psim demo_slides_power_pc
 
Explore Android Internals
Explore Android InternalsExplore Android Internals
Explore Android Internals
 
Java Standard Edition 5 Performance
Java Standard Edition 5 PerformanceJava Standard Edition 5 Performance
Java Standard Edition 5 Performance
 
Embedded Virtualization for Mobile Devices
Embedded Virtualization for Mobile DevicesEmbedded Virtualization for Mobile Devices
Embedded Virtualization for Mobile Devices
 
Learn C Programming Language by Using GDB
Learn C Programming Language by Using GDBLearn C Programming Language by Using GDB
Learn C Programming Language by Using GDB
 
Mobile Virtualization using the Xen Technologies
Mobile Virtualization using the Xen TechnologiesMobile Virtualization using the Xen Technologies
Mobile Virtualization using the Xen Technologies
 
Low Level View of Android System Architecture
Low Level View of Android System ArchitectureLow Level View of Android System Architecture
Low Level View of Android System Architecture
 
Keynote Speech: Xen ARM Virtualization
Keynote Speech: Xen ARM VirtualizationKeynote Speech: Xen ARM Virtualization
Keynote Speech: Xen ARM Virtualization
 
Minimizing I/O Latency in Xen-ARM
Minimizing I/O Latency in Xen-ARMMinimizing I/O Latency in Xen-ARM
Minimizing I/O Latency in Xen-ARM
 
Android Virtualization: Opportunity and Organization
Android Virtualization: Opportunity and OrganizationAndroid Virtualization: Opportunity and Organization
Android Virtualization: Opportunity and Organization
 
SMI_SNUG_paper_v10
SMI_SNUG_paper_v10SMI_SNUG_paper_v10
SMI_SNUG_paper_v10
 
[Harvard CS264] 11a - Programming the Memory Hierarchy with Sequoia (Mike Bau...
[Harvard CS264] 11a - Programming the Memory Hierarchy with Sequoia (Mike Bau...[Harvard CS264] 11a - Programming the Memory Hierarchy with Sequoia (Mike Bau...
[Harvard CS264] 11a - Programming the Memory Hierarchy with Sequoia (Mike Bau...
 

Destaque

Performance Of Pb Free Solder Pastes At Different Reflow
Performance Of Pb Free Solder Pastes At Different ReflowPerformance Of Pb Free Solder Pastes At Different Reflow
Performance Of Pb Free Solder Pastes At Different Reflow
volcanicvoltage
 
Nicolas Grekas (Blackfire) – App Profiling - the Must-Have Tool of your Daily...
Nicolas Grekas (Blackfire) – App Profiling - the Must-Have Tool of your Daily...Nicolas Grekas (Blackfire) – App Profiling - the Must-Have Tool of your Daily...
Nicolas Grekas (Blackfire) – App Profiling - the Must-Have Tool of your Daily...
Techsylvania
 

Destaque (12)

XS Boston 2008 Paravirt Ops in Linux IA64
XS Boston 2008 Paravirt Ops in Linux IA64XS Boston 2008 Paravirt Ops in Linux IA64
XS Boston 2008 Paravirt Ops in Linux IA64
 
Clock-RSM: Low-Latency Inter-Datacenter State Machine Replication Using Loose...
Clock-RSM: Low-Latency Inter-Datacenter State Machine Replication Using Loose...Clock-RSM: Low-Latency Inter-Datacenter State Machine Replication Using Loose...
Clock-RSM: Low-Latency Inter-Datacenter State Machine Replication Using Loose...
 
Performance Of Pb Free Solder Pastes At Different Reflow
Performance Of Pb Free Solder Pastes At Different ReflowPerformance Of Pb Free Solder Pastes At Different Reflow
Performance Of Pb Free Solder Pastes At Different Reflow
 
Final report reflow process control
Final report reflow process controlFinal report reflow process control
Final report reflow process control
 
The Art of CSS
The Art of CSSThe Art of CSS
The Art of CSS
 
Optimising reflow oven for SMT
Optimising reflow oven for SMTOptimising reflow oven for SMT
Optimising reflow oven for SMT
 
Reflow oven
Reflow ovenReflow oven
Reflow oven
 
NO REFLOW
NO REFLOWNO REFLOW
NO REFLOW
 
Profiles - Why getting them right is important
Profiles - Why getting them right is important Profiles - Why getting them right is important
Profiles - Why getting them right is important
 
PCBA Assembly Process Flow / PCB Assembly Manufacturing
PCBA Assembly Process Flow / PCB Assembly ManufacturingPCBA Assembly Process Flow / PCB Assembly Manufacturing
PCBA Assembly Process Flow / PCB Assembly Manufacturing
 
Nicolas Grekas (Blackfire) – App Profiling - the Must-Have Tool of your Daily...
Nicolas Grekas (Blackfire) – App Profiling - the Must-Have Tool of your Daily...Nicolas Grekas (Blackfire) – App Profiling - the Must-Have Tool of your Daily...
Nicolas Grekas (Blackfire) – App Profiling - the Must-Have Tool of your Daily...
 
Linux Profiling at Netflix
Linux Profiling at NetflixLinux Profiling at Netflix
Linux Profiling at Netflix
 

Semelhante a Performance Profiling of Virtual Machines

Analysis of Testability of a Flight Software Product Line
Analysis of Testability of a Flight Software Product LineAnalysis of Testability of a Flight Software Product Line
Analysis of Testability of a Flight Software Product Line
Dharmalingam Ganesan
 
Configuration management 101 - A tale of disaster recovery using CFEngine 3
Configuration management 101 - A tale of disaster recovery using CFEngine 3Configuration management 101 - A tale of disaster recovery using CFEngine 3
Configuration management 101 - A tale of disaster recovery using CFEngine 3
RUDDER
 
Gupta cell verification dv club
Gupta cell verification dv clubGupta cell verification dv club
Gupta cell verification dv club
Obsidian Software
 
Cell Verification Metrics
Cell Verification MetricsCell Verification Metrics
Cell Verification Metrics
DVClub
 
Architecture Analysis of Systems based on Publish-Subscribe Systems
Architecture Analysis of Systems based on Publish-Subscribe SystemsArchitecture Analysis of Systems based on Publish-Subscribe Systems
Architecture Analysis of Systems based on Publish-Subscribe Systems
Dharmalingam Ganesan
 
Getting started with Puppet
Getting started with PuppetGetting started with Puppet
Getting started with Puppet
jeyg
 
VMware Performance for Gurus - A Tutorial
VMware Performance for Gurus - A TutorialVMware Performance for Gurus - A Tutorial
VMware Performance for Gurus - A Tutorial
Richard McDougall
 
Me3D: A Model-driven Methodology Expediting Embedded Device Driver Development
Me3D: A Model-driven Methodology  Expediting Embedded Device  Driver DevelopmentMe3D: A Model-driven Methodology  Expediting Embedded Device  Driver Development
Me3D: A Model-driven Methodology Expediting Embedded Device Driver Development
huichenphd
 

Semelhante a Performance Profiling of Virtual Machines (20)

Configuration management: automating and rationalizing server setup with CFEn...
Configuration management: automating and rationalizing server setup with CFEn...Configuration management: automating and rationalizing server setup with CFEn...
Configuration management: automating and rationalizing server setup with CFEn...
 
Configuration management: automating and rationalizing server setup with CFEn...
Configuration management: automating and rationalizing server setup with CFEn...Configuration management: automating and rationalizing server setup with CFEn...
Configuration management: automating and rationalizing server setup with CFEn...
 
Xen Community Update 2011
Xen Community Update 2011Xen Community Update 2011
Xen Community Update 2011
 
Performance tuningtoolkitintroduction
Performance tuningtoolkitintroductionPerformance tuningtoolkitintroduction
Performance tuningtoolkitintroduction
 
Analysis of Testability of a Flight Software Product Line
Analysis of Testability of a Flight Software Product LineAnalysis of Testability of a Flight Software Product Line
Analysis of Testability of a Flight Software Product Line
 
Configuration management 101 - A tale of disaster recovery using CFEngine 3
Configuration management 101 - A tale of disaster recovery using CFEngine 3Configuration management 101 - A tale of disaster recovery using CFEngine 3
Configuration management 101 - A tale of disaster recovery using CFEngine 3
 
Gupta cell verification dv club
Gupta cell verification dv clubGupta cell verification dv club
Gupta cell verification dv club
 
Cell Verification Metrics
Cell Verification MetricsCell Verification Metrics
Cell Verification Metrics
 
Architecture Analysis of Systems based on Publish-Subscribe Systems
Architecture Analysis of Systems based on Publish-Subscribe SystemsArchitecture Analysis of Systems based on Publish-Subscribe Systems
Architecture Analysis of Systems based on Publish-Subscribe Systems
 
Getting started with Puppet
Getting started with PuppetGetting started with Puppet
Getting started with Puppet
 
Configuration management benefits for everyone - Rudder @ FLOSSUK Spring Conf...
Configuration management benefits for everyone - Rudder @ FLOSSUK Spring Conf...Configuration management benefits for everyone - Rudder @ FLOSSUK Spring Conf...
Configuration management benefits for everyone - Rudder @ FLOSSUK Spring Conf...
 
VMware Performance for Gurus - A Tutorial
VMware Performance for Gurus - A TutorialVMware Performance for Gurus - A Tutorial
VMware Performance for Gurus - A Tutorial
 
12th Japan CloudStack User Group Meetup MidoNet with scalable virtual router
12th Japan CloudStack User Group Meetup   MidoNet with scalable virtual router12th Japan CloudStack User Group Meetup   MidoNet with scalable virtual router
12th Japan CloudStack User Group Meetup MidoNet with scalable virtual router
 
12th Japan CloudStack User Group Meetup
12th Japan CloudStack User Group Meetup12th Japan CloudStack User Group Meetup
12th Japan CloudStack User Group Meetup
 
Rudder - Configuration management benefits for everyone (FOSDEM 2012)
Rudder - Configuration management benefits for everyone (FOSDEM 2012)Rudder - Configuration management benefits for everyone (FOSDEM 2012)
Rudder - Configuration management benefits for everyone (FOSDEM 2012)
 
Esp 100107093030-phpapp02
Esp 100107093030-phpapp02Esp 100107093030-phpapp02
Esp 100107093030-phpapp02
 
Binary translation
Binary translationBinary translation
Binary translation
 
Virtualization Primer for Java Developers
Virtualization Primer for Java DevelopersVirtualization Primer for Java Developers
Virtualization Primer for Java Developers
 
Me3D: A Model-driven Methodology Expediting Embedded Device Driver Development
Me3D: A Model-driven Methodology  Expediting Embedded Device  Driver DevelopmentMe3D: A Model-driven Methodology  Expediting Embedded Device  Driver Development
Me3D: A Model-driven Methodology Expediting Embedded Device Driver Development
 
ICSM08a.ppt
ICSM08a.pptICSM08a.ppt
ICSM08a.ppt
 

Último

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Último (20)

Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...

Performance Profiling of Virtual Machines

  • 1. Performance Profiling of Virtual Machines
      Jiaqing Du+, Nipun Sehrawat*, Willy Zwaenepoel+
      +EPFL, Switzerland
      *University of Illinois at Urbana-Champaign
  • 2. Performance Profiling
      • Use CPU performance counters
      • Monitor software runtime behavior
      • Incur very low overhead
      • Used extensively: OProfile, VTune, …

      %CYCLE    Function            Module
      98.5529   vmx_vcpu_run        kvm-intel.ko
      0.2226    (no symbols)        libc.so
      0.1034    hpet_cpuhp_notify   vmlinux
      0.1034    native_patch        vmlinux

      Jiaqing Du, VEE, March 9, 2011
  • 3. Terminology
      (1) Native profiling: the profiler runs in the OS, directly on the CPU and its PMU.
      (2) Guest-wide profiling: the profiler runs in the guest, above the VMM.
      (3) System-wide profiling: profilers run in both the guest and the VMM.
  • 4. Profiling with Virtual Machines

                              Para-virtualization   Hardware assistance   Binary translation
      Guest-wide profiling    ?                     ?                     ?
      System-wide profiling   XenOprof              ?                     ?

      Profilers do not work well with virtual machines.
  • 5. Contributions
      (1) Give solutions

                              Para-virtualization   Hardware assistance   Binary translation
      Guest-wide profiling    ?                     ?                     ?
      System-wide profiling   XenOprof              ?                     ?

      (2) Implement prototypes
  • 6. Outline
      • Native profiling
      • Guest-wide profiling
      • System-wide profiling
      • Evaluation
  • 7. Native Profiling
      • Performance monitoring unit (PMU)
          – consists of a set of event counters
          – generates an interrupt when a counter overflows
      • PMU-based profiler
          – User level: Control, Interpret
          – Kernel level: Configure, Collect (previous PC value, process identifier)
          – Hardware: CPU, PMU
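The overflow-driven sampling on this slide can be mimicked in plain software. The sketch below is a simulation, not driver code: `sim_pmu`, `sim_pmu_init`, `sim_tick`, and the sample layout are hypothetical names invented for illustration, and a real profiler would program hardware counters and take the sample inside an interrupt handler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SAMPLES 64

/* One profiling sample: what the overflow handler records. */
struct sample {
    uint64_t pc;   /* program counter value at the time of the overflow */
    int      pid;  /* identifier of the interrupted process */
};

/* Simulated PMU with a single event counter (hypothetical model). */
struct sim_pmu {
    uint64_t period;                 /* events between two overflows */
    uint64_t count;                  /* events seen since the last overflow */
    struct sample buf[MAX_SAMPLES];  /* samples collected so far */
    size_t   nsamples;
};

void sim_pmu_init(struct sim_pmu *pmu, uint64_t period)
{
    assert(period > 0);
    pmu->period = period;
    pmu->count = 0;
    pmu->nsamples = 0;
}

/* Account one event. On overflow, record a sample and re-arm the counter.
 * Returns 1 if this event caused an overflow, 0 otherwise. */
int sim_tick(struct sim_pmu *pmu, uint64_t pc, int pid)
{
    if (++pmu->count < pmu->period)
        return 0;
    pmu->count = 0;  /* re-arm for the next period */
    if (pmu->nsamples < MAX_SAMPLES) {
        pmu->buf[pmu->nsamples].pc = pc;
        pmu->buf[pmu->nsamples].pid = pid;
        pmu->nsamples++;
    }
    return 1;
}
```

Calling `sim_tick` once per simulated event with a period of 100 yields one sample every 100 events; the interpretation step would then map each recorded `pc` to a function name.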
  • 8. Guest-wide Profiling
      • Profiler runs in the guest and only profiles the guest
      [Diagram: the guest holds the profiler components (Control, Interpret, Configure, Collect); the VMM sits between the guest and the CPU/PMU]
      Injected interrupts should be handled right after the guest resumes execution.
      Challenge: synchronous interrupt delivery to the guest
  • 9. System-wide Profiling (1/3)
      • Reveal runtime behavior of both VMM and guest(s)
      [Diagram: Guest1 and Guest2 run on the VMM; the VMM holds the profiler components (Control, Interpret, Configure, Collect) above the CPU/PMU]
      The profiler in the VMM does not know the internals of a guest.
      Challenge: interpret samples belonging to the guest
  • 10. System-wide Profiling (2/3)
      • Interpret guest samples: full delegation
      [Diagram: the guest and the VMM each hold Control, Interpret, Configure, and Collect components, above the CPU/PMU]
  • 11. System-wide Profiling (3/3)
      • Interpret guest samples: interpretation delegation
      [Diagram: the guest and the VMM each hold Control, Interpret, Configure, and Collect components, connected through a Shared Buffer, above the CPU/PMU]
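One way to picture the shared buffer on this slide is a fixed-size ring that one side fills with raw samples and the other side drains for interpretation. The sketch below assumes the VMM side is the producer and the guest side is the consumer; `sample_ring`, `ring_put`, and `ring_get` are hypothetical names, and the slides do not describe the actual buffer layout. It is single-threaded: a buffer truly shared between a VMM and a guest would also need atomic index updates and memory barriers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SLOTS 8  /* one slot stays empty, so capacity is RING_SLOTS - 1 */

struct guest_sample {
    uint64_t pc;  /* guest program counter captured at counter overflow */
};

struct sample_ring {
    struct guest_sample slot[RING_SLOTS];
    size_t head;  /* next slot the producer (assumed: VMM side) writes */
    size_t tail;  /* next slot the consumer (assumed: guest side) reads */
};

void ring_init(struct sample_ring *r)
{
    r->head = 0;
    r->tail = 0;
}

/* Producer side: store one sample. Returns 1 on success, 0 if the ring
 * is full, in which case the sample is dropped. */
int ring_put(struct sample_ring *r, uint64_t pc)
{
    size_t next = (r->head + 1) % RING_SLOTS;
    if (next == r->tail)
        return 0;
    r->slot[r->head].pc = pc;
    r->head = next;
    return 1;
}

/* Consumer side: fetch the oldest sample. Returns 1 on success, 0 if the
 * ring is empty. */
int ring_get(struct sample_ring *r, struct guest_sample *out)
{
    assert(out != NULL);
    if (r->tail == r->head)
        return 0;
    *out = r->slot[r->tail];
    r->tail = (r->tail + 1) % RING_SLOTS;
    return 1;
}
```

Dropping samples when the ring is full keeps the producer from ever blocking, at the cost of losing data if the consumer falls behind; that trade-off is a choice of this sketch, not something stated in the slides.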
  • 12. PMU Multiplexing
      • When to save & restore performance counters?
      • CPU switch
          – only in-guest execution is accounted to the guest
          Timeline: guest1 | VMM (I/O for guest1) | guest2 | VMM (I/O for guest2) | guest2
          Accounting: the guest1 slice is accounted to guest 1; each of the two guest2 slices is accounted to guest 2
      • Domain switch
          – in-VMM execution is also accounted to the guest
          Timeline: guest1 | VMM (I/O for guest1) | guest2 | VMM (I/O for guest2) | guest2
          Accounting: guest1 and the VMM I/O slice for guest1 are accounted to guest 1; guest2 and the VMM I/O slice for guest2 are accounted to guest 2
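The difference between the two policies comes down to whether VMM slices are credited to the guest they serve. The sketch below models a timeline as an array of slices and sums events per guest under each policy. `struct slice`, `account`, and the event counts used with it are hypothetical and chosen only for illustration; they are not measurements from the talk.

```c
#include <assert.h>
#include <stddef.h>

enum policy { CPU_SWITCH, DOMAIN_SWITCH };

/* One slice of execution on a physical CPU. */
struct slice {
    int  guest;   /* guest that runs, or on whose behalf the VMM runs (0-based) */
    int  in_vmm;  /* 1 if the CPU executed VMM code during this slice */
    long events;  /* events the PMU counted during the slice */
};

/* Sum the events credited to each guest under the given policy. */
void account(const struct slice *tl, size_t n, enum policy p,
             long *per_guest, size_t nguests)
{
    for (size_t g = 0; g < nguests; g++)
        per_guest[g] = 0;
    for (size_t i = 0; i < n; i++) {
        assert(tl[i].guest >= 0 && (size_t)tl[i].guest < nguests);
        /* CPU switch: counters are saved when the guest leaves the CPU,
         * so in-VMM work is not credited to any guest. */
        if (tl[i].in_vmm && p == CPU_SWITCH)
            continue;
        /* Domain switch: counters keep running until another guest is
         * scheduled, so in-VMM work is credited too. */
        per_guest[tl[i].guest] += tl[i].events;
    }
}
```

Feeding it the slide's timeline (guest1, VMM I/O for guest1, guest2, VMM I/O for guest2, guest2) shows the per-guest totals growing under `DOMAIN_SWITCH` by exactly the events of the VMM slices.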
  • 13. Implementation

                              Para-virtualization   KVM   QEMU
      Guest-wide profiling    ?                     √     ?
      System-wide profiling   XenOprof              √     √
  • 14. Evaluation question #1: How much does profiling slow down programs?
  • 15. Profiling Overhead
      • Measure execution time
          – a computation-intensive program
          – with and without profiling
          – about 400 counter overflows per second

      Profiling environment   Increased execution time
      Native Linux            0.04% ± 0.004%
      KVM guest-wide          0.39% ± 0.045%
      KVM system-wide         0.44% ± 0.043%
      QEMU system-wide        0.94% ± 0.044%
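The table can be turned into a rough per-overflow cost. The helper below assumes that all added execution time comes from handling counter overflows and that the rate is exactly 400 per second (the slide says "about 400"), so its output is a back-of-the-envelope estimate rather than a measured figure. `cost_per_overflow_us` is a hypothetical name.

```c
#include <assert.h>

/* Estimated added time per counter overflow, in microseconds.
 * overhead_percent:  increase in execution time, in percent
 * overflows_per_sec: counter overflows per second of execution */
double cost_per_overflow_us(double overhead_percent, double overflows_per_sec)
{
    assert(overflows_per_sec > 0.0);
    /* Extra microseconds per second of execution, divided by overflows. */
    return overhead_percent / 100.0 * 1e6 / overflows_per_sec;
}
```

Under those assumptions, 0.04% on native Linux works out to about 1 µs per overflow and 0.94% on QEMU system-wide to about 23.5 µs.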
  • 16. Evaluation question #2: Are profiling results accurate?
  • 17. Profiling Accuracy (1/4)
      • A computation-intensive benchmark
      • compute_{a|b}() does floating-point arithmetic
      • Monitor CPU cycles

      int main(int argc, char *argv[])
      {
          while (1) {
              compute_a();
              compute_b();
          }
      }
  • 18. Profiling Accuracy (2/4)
      • Comparison with native profiling
      [Bar chart: cycle % (y-axis, 0 to 90) per routine (compute_a, compute_b) for Native, KVM guest-wide, KVM system-wide, and QEMU system-wide]
  • 19. Profiling Accuracy (3/4)
      • A memory-intensive benchmark
      • Randomly access a fixed-size region of memory
      • Monitor last-level cache misses

      struct item {
          struct item *next;
          long pad[NUM_PAD];
      };

      void chase_pointer()
      {
          /* randomly_connected_items: first item of the randomly linked list */
          struct item *p = &randomly_connected_items;
          while (p != NULL)
              p = p->next;
      }
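The slide does not show how the items get "randomly connected". One plausible setup, sketched below, shuffles the visiting order with Fisher-Yates and links the items into a single NULL-terminated chain, so the walk touches every item exactly once in an order the hardware prefetcher cannot predict. `link_randomly`, `chase_count`, the seed parameter, and the value of `NUM_PAD` are assumptions made for this sketch; with `NUM_PAD` set to 7, an item takes 64 bytes on a typical 64-bit platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NUM_PAD 7  /* assumed value; the slide does not give one */

struct item {
    struct item *next;
    long pad[NUM_PAD];
};

/* Link items[0..n-1] into one NULL-terminated chain in random order.
 * Returns the first item of the chain, or NULL if allocation fails. */
struct item *link_randomly(struct item *items, size_t n, unsigned seed)
{
    assert(items != NULL && n > 0);
    size_t *order = malloc(n * sizeof *order);
    if (order == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        order[i] = i;

    /* Fisher-Yates shuffle of the visiting order. */
    srand(seed);
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % (i + 1);
        size_t tmp = order[i];
        order[i] = order[j];
        order[j] = tmp;
    }

    /* Each item points to its successor in the shuffled order. */
    for (size_t i = 0; i + 1 < n; i++)
        items[order[i]].next = &items[order[i + 1]];
    items[order[n - 1]].next = NULL;

    struct item *head = &items[order[0]];
    free(order);
    return head;
}

/* Walk the chain the way chase_pointer() does and count the items visited. */
size_t chase_count(const struct item *p)
{
    size_t visited = 0;
    while (p != NULL) {
        p = p->next;
        visited++;
    }
    return visited;
}
```

The working set size is then the item count times `sizeof(struct item)`, which is how a sweep over region sizes could be driven.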
  • 20. Profiling Accuracy (4/4)
      • Comparison with native profiling
      [Line chart: cache misses per memory access (y-axis, 0 to 1.6) versus working set size (x-axis, 256 to 3072 KB) for Native, KVM guest-wide, KVM system-wide, and QEMU system-wide]
  • 21. Evaluation question #3: What is the difference between CPU switch and domain switch?
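  • 22. Recap
      • CPU switch
          Timeline: guest1 | VMM (I/O for guest1) | guest2 | VMM (I/O for guest2) | guest2
          Accounting: the guest1 slice is accounted to guest 1; each of the two guest2 slices is accounted to guest 2
      • Domain switch
          Timeline: guest1 | VMM (I/O for guest1) | guest2 | VMM (I/O for guest2) | guest2
          Accounting: guest1 and the VMM I/O slice for guest1 are accounted to guest 1; guest2 and the VMM I/O slice for guest2 are accounted to guest 2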
  • 23. Profiling Packet Receive (1/2)
      • Experiment
          – push packets to a Linux guest in KVM
          – run OProfile in the guest
          – monitor instruction retirements
      [Diagram: a Linux machine and a KVM host, each with a hardware NIC; on the KVM host, a Linux guest with a virtual NIC runs on KVM]
  • 24. Profiling Packet Receive (2/2)

      CPU switch                            Domain switch
      INSTR   Function                      INSTR   Function
      167     csum_partial                  2261    cp_interrupt
      106     csum_partial_copy_generic     1336    cp_rx_poll
      74      copy_to_user                  1034    cp_start_xmit
      47      ipt_do_table                  421     native_apic_mem_write
      38      tcp_v4_rcv                    374     native_apic_mem_read
      …       …                             …       …
                                            191     csum_partial
                                            105     csum_partial_copy_generic
                                            94      copy_to_user
                                            79      ipt_do_table
                                            51      tcp_v4_rcv
                                            …       …

      Slide labels: the csum_partial through tcp_v4_rcv rows are marked "Packet Processing"; the cp_interrupt through native_apic_mem_read rows are marked "I/O Related".
      Domain switch gives more insight for I/O operations.
  • 25. Related Work
      • XenOprof
          – first profiler targeting virtual machines
          – system-wide profiling for Xen
      • Linux perf
          – a profiling infrastructure for Linux
          – limited support for profiling KVM Linux guests
      • VMware vmkperf
          – only reads and writes CPU performance counters
  • 26. Conclusions

                              Para-virtualization   Hardware assistance   Binary translation
      Guest-wide profiling    √                     √                     √
      System-wide profiling   XenOprof              √                     √