peemuperf

Cache monitoring on ARM Linux
            2012
What is the PMU?
•   Cortex-A series processors contain event counting hardware which
    can be used to profile and benchmark code, including generation of
    cycle and instruction count figures and to derive figures for cache
    misses and so forth. The performance counter block contains a cycle
    counter which can count processor cycles, or be configured to count
    every 64 cycles. There are also a number of configurable 32-bit wide
    event counters which can be set to count instances of events from a
    wide-ranging list (for example, instructions executed, or MMU TLB
    misses). These counters can be accessed through debug tools, or by
    software running on the processor, through the CP15 Performance
    Monitoring Unit (PMU) registers. They provide a non-invasive debug
    feature and do not change the behavior of the processor. CP15 also
    provides a number of controls for enabling and resetting the counters
    and to indicate overflows (there is an option to generate an interrupt
    on a counter overflow). The cycle counter can be enabled
    independently of the event counters.
•   From ARM Architecture Reference Manual
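A minimal sketch of the arithmetic implied by the description above: the event counters are 32 bits wide, so a difference between two raw reads has to tolerate wraparound, and the cycle counter can be configured to tick once every 64 cycles. The function names are hypothetical, and the sketch assumes the cycle counter is also read as a 32-bit value and wraps at most once between reads.

```python
COUNTER_BITS = 32                       # width stated for the event counters
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def counter_delta(start, end):
    """Ticks between two raw 32-bit reads, assuming at most one wrap."""
    return (end - start) & COUNTER_MASK


def elapsed_cycles(start, end, divide_by_64=False):
    """Convert cycle-counter ticks to CPU cycles.

    divide_by_64 mirrors the 'count every 64 cycles' mode: each tick
    then stands for 64 processor cycles.
    """
    ticks = counter_delta(start, end)
    return ticks * 64 if divide_by_64 else ticks
```

Masking the difference keeps the result correct when the counter rolls over once between samples; anything longer needs the overflow flag or a shorter sampling interval.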
Profiling alternatives
• Oprofile
   – Supported in mainline kernel (drivers/oprofile)
   – ARM support enabled
   – Relies on “Interrupts” from HW unit, when event counters
     overflow
   – Timer fallback when no HW event monitors are available
• Unfortunately, several errata in current ARM Cortex-A8/A9
  devices make interrupt-based monitoring unreliable
   – To be fixed in later ARM cores
• Because of this, oprofile supports only timer-based CPU cycle
  measurement on the majority of ARM cores,
  at least up to the 3.2 kernel
Latest status
•   http://lists.infradead.org/pipermail/linux-arm-kernel/2012-June/103189.html
    Convert OMAP2/3 devices to use HWMOD for creating a PMU device. To support PMU
    on OMAP2/3 devices we only need to use MPU sub-system and so we can simply use
    the MPU HWMOD to create the PMU device. The MPU HWMOD for OMAP2/3 devices is
    currently missing the PMU interrupt and so add the PMU interrupt to the MPU
    HWMOD for these devices.

    This change also moves the PMU code out of the mach-omap2/devices.c files into
    its own pmu.c file as suggested by Kevin Hilman to de-clutter devices.c.

    Cc: Ming Lei <ming.lei at canonical.com>
    Cc: Will Deacon <will.deacon at arm.com>
    Cc: Benoit Cousson <b-cousson at ti.com>
    Cc: Paul Walmsley <paul at pwsan.com>
    Cc: Kevin Hilman <khilman at ti.com>

    Signed-off-by: Jon Hunter <jon-hunter at ti.com>
    ---
     arch/arm/mach-omap2/Makefile                       |  1 +
     arch/arm/mach-omap2/devices.c                      | 33 -----------
     arch/arm/mach-omap2/omap_hwmod_2xxx_ipblock_data.c |  6 ++
     arch/arm/mach-omap2/omap_hwmod_3xxx_data.c         |  6 ++
     arch/arm/mach-omap2/pmu.c                          | 59 ++++++++++++++++++++
     arch/arm/plat-omap/include/plat/irqs.h             |  1 +
     6 files changed, 73 insertions(+), 33 deletions(-)
     create mode 100644 arch/arm/mach-omap2/pmu.c
Patch status
• The patch set mentioned on the previous slide is
  in various stages of integration for
  different SoC architectures
• Beagle/OMAP35x is supported
• AM335x is not supported as of
  2012; support is expected in mainline by 2013
• In the interim, what are the options?
What is the need?
• Cache usage monitoring is key to measuring
  aspects of performance related to external
  memory bandwidth
• oprofile currently does not support this on
  various SoCs
peemuperf
• A tool to measure overall Linux
  performance using the ARM PMU hardware:
  CPU cycles, cache misses at L1
  and L2 level, stalls, NEON, ...
• Consists of a kernel module that does the
  heavy lifting and exposes all profile
  information to userspace via a proc entry
Configurable parameters
• evdelay=500 evlist=1,68,3,4 evdebug=1

• evdelay – Sampling interval (milliseconds)
• evlist – Comma-separated list of event
  IDs (see section 3.2.49, "c9, Event Selection
  Register", of the Cortex-A8 TRM)
• evdebug – Controls debug output
  messages
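As an illustration of how the three parameters fit together, the sketch below assembles the parameter string shown above from Python values. The helper name and its validation rules are assumptions rather than part of peemuperf: the limit of four events matches the Cortex-A8 counter count, and the 0 to 255 range assumes an 8-bit event selection field.

```python
def build_module_args(evlist, evdelay_ms=500, evdebug=0, max_counters=4):
    """Build the 'evdelay=... evlist=... evdebug=...' parameter string.

    Hypothetical helper: the checks are illustrative and not taken from
    the peemuperf sources.
    """
    if not evlist:
        raise ValueError("evlist needs at least one event ID")
    if len(evlist) > max_counters:
        raise ValueError("more events than hardware counters")
    for event in evlist:
        if not 0 <= event <= 0xFF:      # assumed 8-bit event selector
            raise ValueError(f"event ID out of range: {event}")
    if evdelay_ms <= 0:
        raise ValueError("evdelay must be a positive number of milliseconds")
    events = ",".join(str(event) for event in evlist)
    return f"evdelay={evdelay_ms} evlist={events} evdebug={evdebug}"
```

The resulting string would be passed on the module load command line; the exact module file name depends on how the repository is built.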
Userspace access
• Proc entry is
  – /proc/peemuperf
• Displays output in the format below
  – <COUNTER #> : <COUNTER VALUE>
  – Counter[0] : 48,
  – Counter[1] :77448,
  – Counter[2]: 13,
  – Counter[3]: 115058
  – Overflow flag: = 0, Cycle Count: = 5739253
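A userspace reader for this format could be sketched as follows. The parser is an assumption based only on the sample lines above (it tolerates the uneven spacing and trailing commas shown there, and expects decimal values); the real output of a given peemuperf build may differ.

```python
import re

# Sample text copied from the slide; on a target it would be read from
# /proc/peemuperf instead.
SAMPLE = """\
Counter[0] : 48,
Counter[1] :77448,
Counter[2]: 13,
Counter[3]: 115058
Overflow flag: = 0, Cycle Count: = 5739253
"""

COUNTER_RE = re.compile(r"Counter\[(\d+)\]\s*:\s*(\d+)")
OVERFLOW_RE = re.compile(r"Overflow flag:\s*=\s*(\d+)")
CYCLES_RE = re.compile(r"Cycle Count:\s*=\s*(\d+)")


def parse_peemuperf(text):
    """Return counter values (ordered by index), overflow flag and cycle count."""
    counters = {int(idx): int(val) for idx, val in COUNTER_RE.findall(text)}
    overflow = OVERFLOW_RE.search(text)
    cycles = CYCLES_RE.search(text)
    return {
        "counters": [counters[idx] for idx in sorted(counters)],
        "overflow": int(overflow.group(1)) if overflow else None,
        "cycles": int(cycles.group(1)) if cycles else None,
    }


if __name__ == "__main__":
    print(parse_peemuperf(SAMPLE))
```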
A8 vs A9
• A8 has 4 performance counters
• A9 has 6
• peemuperf configures itself dynamically,
  based on a run-time query
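The slides do not show how the run-time query is done. One plausible approach on ARMv7 is to read the N field, bits [15:11], of the Performance Monitor Control Register, which reports the number of implemented event counters. The sketch below only decodes that field from a raw register value; the function names are hypothetical, and reading the register itself requires privileged code on the target.

```python
PMCR_N_SHIFT = 11     # N field occupies bits [15:11] of the ARMv7 PMCR
PMCR_N_MASK = 0x1F


def num_event_counters(pmcr):
    """Number of event counters reported by a raw PMCR value."""
    return (pmcr >> PMCR_N_SHIFT) & PMCR_N_MASK


def usable_events(evlist, pmcr):
    """Keep only as many requested events as the core has counters."""
    return evlist[:num_event_counters(pmcr)]
```

With four counters reported, an evlist of six entries would be cut down to its first four; with six counters it would pass through unchanged.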
Default Events monitored
• 1 ==> Instruction fetch that causes a refill at the
  lowest level of instruction or unified cache
• 68 ==> Any cacheable miss in the L2 cache
• 3 ==> Data read or write operation that causes a
  refill at the lowest level of data or unified cache
• 4 ==> Data read or write operation that causes a
  cache access at the lowest level of data or
  unified cache
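The default events lend themselves to derived figures. The sketch below computes an L1 data cache miss ratio (event 3 refills over event 4 accesses) and L2 misses per thousand cycles (event 68), using the sample values from the "Userspace access" slide. It assumes that Counter[i] counts the i-th entry of evlist, which the slides imply but do not state, and the function names are hypothetical.

```python
DEFAULT_EVLIST = [1, 68, 3, 4]   # default event IDs from the slide


def l1_data_miss_ratio(evlist, counts):
    """Refills (event 3) divided by accesses (event 4) at the lowest cache level."""
    by_event = dict(zip(evlist, counts))
    accesses = by_event[4]
    return by_event[3] / accesses if accesses else 0.0


def l2_misses_per_kcycle(evlist, counts, cycles):
    """Cacheable L2 misses (event 68) per 1000 CPU cycles."""
    by_event = dict(zip(evlist, counts))
    return 1000.0 * by_event[68] / cycles if cycles else 0.0


if __name__ == "__main__":
    sample_counts = [48, 77448, 13, 115058]   # values from the sample output
    print(l1_data_miss_ratio(DEFAULT_EVLIST, sample_counts))
    print(l2_misses_per_kcycle(DEFAULT_EVLIST, sample_counts, 5739253))
```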
Source
• github.com/prabindh/peemuperf

Mais conteúdo relacionado

Mais procurados

Container Performance Analysis
Container Performance AnalysisContainer Performance Analysis
Container Performance AnalysisBrendan Gregg
 
Linux Performance Profiling and Monitoring
Linux Performance Profiling and MonitoringLinux Performance Profiling and Monitoring
Linux Performance Profiling and MonitoringGeorg Schönberger
 
Performance Analysis Tools for Linux Kernel
Performance Analysis Tools for Linux KernelPerformance Analysis Tools for Linux Kernel
Performance Analysis Tools for Linux Kernellcplcp1
 
Kernel Recipes 2017: Performance Analysis with BPF
Kernel Recipes 2017: Performance Analysis with BPFKernel Recipes 2017: Performance Analysis with BPF
Kernel Recipes 2017: Performance Analysis with BPFBrendan Gregg
 
BPF: Tracing and more
BPF: Tracing and moreBPF: Tracing and more
BPF: Tracing and moreBrendan Gregg
 
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven RostedtKernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven RostedtAnne Nicolas
 
USENIX ATC 2017: Visualizing Performance with Flame Graphs
USENIX ATC 2017: Visualizing Performance with Flame GraphsUSENIX ATC 2017: Visualizing Performance with Flame Graphs
USENIX ATC 2017: Visualizing Performance with Flame GraphsBrendan Gregg
 
Linux Performance 2018 (PerconaLive keynote)
Linux Performance 2018 (PerconaLive keynote)Linux Performance 2018 (PerconaLive keynote)
Linux Performance 2018 (PerconaLive keynote)Brendan Gregg
 
Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021
Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021
Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021Valeriy Kravchuk
 
Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016Brendan Gregg
 
JavaOne 2015 Java Mixed-Mode Flame Graphs
JavaOne 2015 Java Mixed-Mode Flame GraphsJavaOne 2015 Java Mixed-Mode Flame Graphs
JavaOne 2015 Java Mixed-Mode Flame GraphsBrendan Gregg
 
Netflix: From Clouds to Roots
Netflix: From Clouds to RootsNetflix: From Clouds to Roots
Netflix: From Clouds to RootsBrendan Gregg
 
Designing Tracing Tools
Designing Tracing ToolsDesigning Tracing Tools
Designing Tracing ToolsBrendan Gregg
 
Berkeley Packet Filters
Berkeley Packet FiltersBerkeley Packet Filters
Berkeley Packet FiltersKernel TLV
 
Linux 4.x Tracing: Performance Analysis with bcc/BPF
Linux 4.x Tracing: Performance Analysis with bcc/BPFLinux 4.x Tracing: Performance Analysis with bcc/BPF
Linux 4.x Tracing: Performance Analysis with bcc/BPFBrendan Gregg
 
Linux Performance Tools
Linux Performance ToolsLinux Performance Tools
Linux Performance ToolsBrendan Gregg
 
Stateless Hypervisors at Scale
Stateless Hypervisors at ScaleStateless Hypervisors at Scale
Stateless Hypervisors at ScaleAntony Messerl
 
Linux kernel debugging
Linux kernel debuggingLinux kernel debugging
Linux kernel debuggingHao-Ran Liu
 
UM2019 Extended BPF: A New Type of Software
UM2019 Extended BPF: A New Type of SoftwareUM2019 Extended BPF: A New Type of Software
UM2019 Extended BPF: A New Type of SoftwareBrendan Gregg
 
Systems@Scale 2021 BPF Performance Getting Started
Systems@Scale 2021 BPF Performance Getting StartedSystems@Scale 2021 BPF Performance Getting Started
Systems@Scale 2021 BPF Performance Getting StartedBrendan Gregg
 

Mais procurados (20)

Container Performance Analysis
Container Performance AnalysisContainer Performance Analysis
Container Performance Analysis
 
Linux Performance Profiling and Monitoring
Linux Performance Profiling and MonitoringLinux Performance Profiling and Monitoring
Linux Performance Profiling and Monitoring
 
Performance Analysis Tools for Linux Kernel
Performance Analysis Tools for Linux KernelPerformance Analysis Tools for Linux Kernel
Performance Analysis Tools for Linux Kernel
 
Kernel Recipes 2017: Performance Analysis with BPF
Kernel Recipes 2017: Performance Analysis with BPFKernel Recipes 2017: Performance Analysis with BPF
Kernel Recipes 2017: Performance Analysis with BPF
 
BPF: Tracing and more
BPF: Tracing and moreBPF: Tracing and more
BPF: Tracing and more
 
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven RostedtKernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
 
USENIX ATC 2017: Visualizing Performance with Flame Graphs
USENIX ATC 2017: Visualizing Performance with Flame GraphsUSENIX ATC 2017: Visualizing Performance with Flame Graphs
USENIX ATC 2017: Visualizing Performance with Flame Graphs
 
Linux Performance 2018 (PerconaLive keynote)
Linux Performance 2018 (PerconaLive keynote)Linux Performance 2018 (PerconaLive keynote)
Linux Performance 2018 (PerconaLive keynote)
 
Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021
Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021
Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021
 
Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016
 
JavaOne 2015 Java Mixed-Mode Flame Graphs
JavaOne 2015 Java Mixed-Mode Flame GraphsJavaOne 2015 Java Mixed-Mode Flame Graphs
JavaOne 2015 Java Mixed-Mode Flame Graphs
 
Netflix: From Clouds to Roots
Netflix: From Clouds to RootsNetflix: From Clouds to Roots
Netflix: From Clouds to Roots
 
Designing Tracing Tools
Designing Tracing ToolsDesigning Tracing Tools
Designing Tracing Tools
 
Berkeley Packet Filters
Berkeley Packet FiltersBerkeley Packet Filters
Berkeley Packet Filters
 
Linux 4.x Tracing: Performance Analysis with bcc/BPF
Linux 4.x Tracing: Performance Analysis with bcc/BPFLinux 4.x Tracing: Performance Analysis with bcc/BPF
Linux 4.x Tracing: Performance Analysis with bcc/BPF
 
Linux Performance Tools
Linux Performance ToolsLinux Performance Tools
Linux Performance Tools
 
Stateless Hypervisors at Scale
Stateless Hypervisors at ScaleStateless Hypervisors at Scale
Stateless Hypervisors at Scale
 
Linux kernel debugging
Linux kernel debuggingLinux kernel debugging
Linux kernel debugging
 
UM2019 Extended BPF: A New Type of Software
UM2019 Extended BPF: A New Type of SoftwareUM2019 Extended BPF: A New Type of Software
UM2019 Extended BPF: A New Type of Software
 
Systems@Scale 2021 BPF Performance Getting Started
Systems@Scale 2021 BPF Performance Getting StartedSystems@Scale 2021 BPF Performance Getting Started
Systems@Scale 2021 BPF Performance Getting Started
 

Destaque

Achieving Performance Isolation with Lightweight Co-Kernels
Achieving Performance Isolation with Lightweight Co-KernelsAchieving Performance Isolation with Lightweight Co-Kernels
Achieving Performance Isolation with Lightweight Co-KernelsJiannan Ouyang, PhD
 
Shoot4U: Using VMM Assists to Optimize TLB Operations on Preempted vCPUs
Shoot4U: Using VMM Assists to Optimize TLB Operations on Preempted vCPUsShoot4U: Using VMM Assists to Optimize TLB Operations on Preempted vCPUs
Shoot4U: Using VMM Assists to Optimize TLB Operations on Preempted vCPUsJiannan Ouyang, PhD
 
ENERGY EFFICIENCY OF ARM ARCHITECTURES FOR CLOUD COMPUTING APPLICATIONS
ENERGY EFFICIENCY OF ARM ARCHITECTURES FOR CLOUD COMPUTING APPLICATIONSENERGY EFFICIENCY OF ARM ARCHITECTURES FOR CLOUD COMPUTING APPLICATIONS
ENERGY EFFICIENCY OF ARM ARCHITECTURES FOR CLOUD COMPUTING APPLICATIONSStephan Cadene
 
LinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running Linux
LinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running LinuxLinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running Linux
LinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running Linuxbrouer
 
Denser, cooler, faster, stronger: PHP on ARM microservers
Denser, cooler, faster, stronger: PHP on ARM microserversDenser, cooler, faster, stronger: PHP on ARM microservers
Denser, cooler, faster, stronger: PHP on ARM microserversJez Halford
 
Preemptable ticket spinlocks: improving consolidated performance in the cloud
Preemptable ticket spinlocks: improving consolidated performance in the cloudPreemptable ticket spinlocks: improving consolidated performance in the cloud
Preemptable ticket spinlocks: improving consolidated performance in the cloudJiannan Ouyang, PhD
 
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...Eric Van Hensbergen
 
SDN - OpenFlow + OpenVSwitch + Quantum
SDN - OpenFlow + OpenVSwitch + QuantumSDN - OpenFlow + OpenVSwitch + Quantum
SDN - OpenFlow + OpenVSwitch + QuantumThe Linux Foundation
 
Q2.12: Research Update on big.LITTLE MP Scheduling
Q2.12: Research Update on big.LITTLE MP SchedulingQ2.12: Research Update on big.LITTLE MP Scheduling
Q2.12: Research Update on big.LITTLE MP SchedulingLinaro
 
Effect of Virtualization on OS Interference
Effect of Virtualization on OS InterferenceEffect of Virtualization on OS Interference
Effect of Virtualization on OS InterferenceEric Van Hensbergen
 
DOXLON November 2016: Facebook Engineering on cgroupv2
DOXLON November 2016: Facebook Engineering on cgroupv2DOXLON November 2016: Facebook Engineering on cgroupv2
DOXLON November 2016: Facebook Engineering on cgroupv2Outlyer
 
reference_guide_Kernel_Crash_Dump_Analysis
reference_guide_Kernel_Crash_Dump_Analysisreference_guide_Kernel_Crash_Dump_Analysis
reference_guide_Kernel_Crash_Dump_AnalysisBuland Singh
 
Linux Device Driver parallelism using SMP and Kernel Pre-emption
Linux Device Driver parallelism using SMP and Kernel Pre-emptionLinux Device Driver parallelism using SMP and Kernel Pre-emption
Linux Device Driver parallelism using SMP and Kernel Pre-emptionHemanth Venkatesh
 
Memory Barriers in the Linux Kernel
Memory Barriers in the Linux KernelMemory Barriers in the Linux Kernel
Memory Barriers in the Linux KernelDavidlohr Bueso
 
How Ceph performs on ARM Microserver Cluster
How Ceph performs on ARM Microserver ClusterHow Ceph performs on ARM Microserver Cluster
How Ceph performs on ARM Microserver ClusterAaron Joue
 
Linux cgroups and namespaces
Linux cgroups and namespacesLinux cgroups and namespaces
Linux cgroups and namespacesLocaweb
 
SFO15-407: Performance Overhead of ARM Virtualization
SFO15-407: Performance Overhead of ARM VirtualizationSFO15-407: Performance Overhead of ARM Virtualization
SFO15-407: Performance Overhead of ARM VirtualizationLinaro
 
Smarter Scheduling (Priorities, Preemptive Priority Scheduling, Lottery and S...
Smarter Scheduling (Priorities, Preemptive Priority Scheduling, Lottery and S...Smarter Scheduling (Priorities, Preemptive Priority Scheduling, Lottery and S...
Smarter Scheduling (Priorities, Preemptive Priority Scheduling, Lottery and S...David Evans
 
SFO15-BFO2: Reducing the arm linux kernel size without losing your mind
SFO15-BFO2: Reducing the arm linux kernel size without losing your mindSFO15-BFO2: Reducing the arm linux kernel size without losing your mind
SFO15-BFO2: Reducing the arm linux kernel size without losing your mindLinaro
 

Destaque (20)

Achieving Performance Isolation with Lightweight Co-Kernels
Achieving Performance Isolation with Lightweight Co-KernelsAchieving Performance Isolation with Lightweight Co-Kernels
Achieving Performance Isolation with Lightweight Co-Kernels
 
Shoot4U: Using VMM Assists to Optimize TLB Operations on Preempted vCPUs
Shoot4U: Using VMM Assists to Optimize TLB Operations on Preempted vCPUsShoot4U: Using VMM Assists to Optimize TLB Operations on Preempted vCPUs
Shoot4U: Using VMM Assists to Optimize TLB Operations on Preempted vCPUs
 
ENERGY EFFICIENCY OF ARM ARCHITECTURES FOR CLOUD COMPUTING APPLICATIONS
ENERGY EFFICIENCY OF ARM ARCHITECTURES FOR CLOUD COMPUTING APPLICATIONSENERGY EFFICIENCY OF ARM ARCHITECTURES FOR CLOUD COMPUTING APPLICATIONS
ENERGY EFFICIENCY OF ARM ARCHITECTURES FOR CLOUD COMPUTING APPLICATIONS
 
Docker by demo
Docker by demoDocker by demo
Docker by demo
 
LinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running Linux
LinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running LinuxLinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running Linux
LinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running Linux
 
Denser, cooler, faster, stronger: PHP on ARM microservers
Denser, cooler, faster, stronger: PHP on ARM microserversDenser, cooler, faster, stronger: PHP on ARM microservers
Denser, cooler, faster, stronger: PHP on ARM microservers
 
Preemptable ticket spinlocks: improving consolidated performance in the cloud
Preemptable ticket spinlocks: improving consolidated performance in the cloudPreemptable ticket spinlocks: improving consolidated performance in the cloud
Preemptable ticket spinlocks: improving consolidated performance in the cloud
 
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
 
SDN - OpenFlow + OpenVSwitch + Quantum
SDN - OpenFlow + OpenVSwitch + QuantumSDN - OpenFlow + OpenVSwitch + Quantum
SDN - OpenFlow + OpenVSwitch + Quantum
 
Q2.12: Research Update on big.LITTLE MP Scheduling
Q2.12: Research Update on big.LITTLE MP SchedulingQ2.12: Research Update on big.LITTLE MP Scheduling
Q2.12: Research Update on big.LITTLE MP Scheduling
 
Effect of Virtualization on OS Interference
Effect of Virtualization on OS InterferenceEffect of Virtualization on OS Interference
Effect of Virtualization on OS Interference
 
DOXLON November 2016: Facebook Engineering on cgroupv2
DOXLON November 2016: Facebook Engineering on cgroupv2DOXLON November 2016: Facebook Engineering on cgroupv2
DOXLON November 2016: Facebook Engineering on cgroupv2
 
reference_guide_Kernel_Crash_Dump_Analysis
reference_guide_Kernel_Crash_Dump_Analysisreference_guide_Kernel_Crash_Dump_Analysis
reference_guide_Kernel_Crash_Dump_Analysis
 
Linux Device Driver parallelism using SMP and Kernel Pre-emption
Linux Device Driver parallelism using SMP and Kernel Pre-emptionLinux Device Driver parallelism using SMP and Kernel Pre-emption
Linux Device Driver parallelism using SMP and Kernel Pre-emption
 
Memory Barriers in the Linux Kernel
Memory Barriers in the Linux KernelMemory Barriers in the Linux Kernel
Memory Barriers in the Linux Kernel
 
How Ceph performs on ARM Microserver Cluster
How Ceph performs on ARM Microserver ClusterHow Ceph performs on ARM Microserver Cluster
How Ceph performs on ARM Microserver Cluster
 
Linux cgroups and namespaces
Linux cgroups and namespacesLinux cgroups and namespaces
Linux cgroups and namespaces
 
SFO15-407: Performance Overhead of ARM Virtualization
SFO15-407: Performance Overhead of ARM VirtualizationSFO15-407: Performance Overhead of ARM Virtualization
SFO15-407: Performance Overhead of ARM Virtualization
 
Smarter Scheduling (Priorities, Preemptive Priority Scheduling, Lottery and S...
Smarter Scheduling (Priorities, Preemptive Priority Scheduling, Lottery and S...Smarter Scheduling (Priorities, Preemptive Priority Scheduling, Lottery and S...
Smarter Scheduling (Priorities, Preemptive Priority Scheduling, Lottery and S...
 
SFO15-BFO2: Reducing the arm linux kernel size without losing your mind
SFO15-BFO2: Reducing the arm linux kernel size without losing your mindSFO15-BFO2: Reducing the arm linux kernel size without losing your mind
SFO15-BFO2: Reducing the arm linux kernel size without losing your mind
 

Semelhante a ARM Linux Cache Monitoring Using PMU

Advanced debugging on ARM Cortex devices such as STM32, Kinetis, LPC, etc.
Advanced debugging on ARM Cortex devices such as STM32, Kinetis, LPC, etc.Advanced debugging on ARM Cortex devices such as STM32, Kinetis, LPC, etc.
Advanced debugging on ARM Cortex devices such as STM32, Kinetis, LPC, etc.Atollic
 
micro controllers 1.ppt
micro controllers 1.pptmicro controllers 1.ppt
micro controllers 1.pptsiminkhan
 
Unit 1 processormemoryorganisation
Unit 1 processormemoryorganisationUnit 1 processormemoryorganisation
Unit 1 processormemoryorganisationKarunamoorthy B
 
Unit 2 processor&amp;memory-organisation
Unit 2 processor&amp;memory-organisationUnit 2 processor&amp;memory-organisation
Unit 2 processor&amp;memory-organisationPavithra S
 
6 profiling tools
6 profiling tools6 profiling tools
6 profiling toolsvideos
 
Computer Organization: Introduction to Microprocessor and Microcontroller
Computer Organization: Introduction to Microprocessor and MicrocontrollerComputer Organization: Introduction to Microprocessor and Microcontroller
Computer Organization: Introduction to Microprocessor and MicrocontrollerAmrutaMehata
 
CSW2017Richard Johnson_harnessing intel processor trace on windows for vulner...
CSW2017Richard Johnson_harnessing intel processor trace on windows for vulner...CSW2017Richard Johnson_harnessing intel processor trace on windows for vulner...
CSW2017Richard Johnson_harnessing intel processor trace on windows for vulner...CanSecWest
 
Computer Organization : CPU, Memory and I/O organization
Computer Organization : CPU, Memory and I/O organizationComputer Organization : CPU, Memory and I/O organization
Computer Organization : CPU, Memory and I/O organizationAmrutaMehata
 
Introduction to embedded systems
Introduction  to embedded systemsIntroduction  to embedded systems
Introduction to embedded systemsRAMPRAKASHT1
 
UNIT 3 - General Purpose Processors
UNIT 3 - General Purpose ProcessorsUNIT 3 - General Purpose Processors
UNIT 3 - General Purpose ProcessorsButtaRajasekhar2
 
EE6602 Embedded System
EE6602 Embedded SystemEE6602 Embedded System
EE6602 Embedded Systemrmkceteee
 
Computer system architecture
Computer system architectureComputer system architecture
Computer system architecturejeetesh036
 
Embedded systems 101 final
Embedded systems 101 finalEmbedded systems 101 final
Embedded systems 101 finalKhalid Elmeadawy
 

Semelhante a ARM Linux Cache Monitoring Using PMU (20)

Advanced debugging on ARM Cortex devices such as STM32, Kinetis, LPC, etc.
Advanced debugging on ARM Cortex devices such as STM32, Kinetis, LPC, etc.Advanced debugging on ARM Cortex devices such as STM32, Kinetis, LPC, etc.
Advanced debugging on ARM Cortex devices such as STM32, Kinetis, LPC, etc.
 
micro controllers 1.ppt
micro controllers 1.pptmicro controllers 1.ppt
micro controllers 1.ppt
 
Dsp on an-avr
Dsp on an-avrDsp on an-avr
Dsp on an-avr
 
Unit 1 processormemoryorganisation
Unit 1 processormemoryorganisationUnit 1 processormemoryorganisation
Unit 1 processormemoryorganisation
 
Unit 2 processor&amp;memory-organisation
Unit 2 processor&amp;memory-organisationUnit 2 processor&amp;memory-organisation
Unit 2 processor&amp;memory-organisation
 
6 profiling tools
6 profiling tools6 profiling tools
6 profiling tools
 
Computer Organization: Introduction to Microprocessor and Microcontroller
Computer Organization: Introduction to Microprocessor and MicrocontrollerComputer Organization: Introduction to Microprocessor and Microcontroller
Computer Organization: Introduction to Microprocessor and Microcontroller
 
Chapter01 (1).ppt
Chapter01 (1).pptChapter01 (1).ppt
Chapter01 (1).ppt
 
CPU Architecture
CPU ArchitectureCPU Architecture
CPU Architecture
 
CSW2017Richard Johnson_harnessing intel processor trace on windows for vulner...
CSW2017Richard Johnson_harnessing intel processor trace on windows for vulner...CSW2017Richard Johnson_harnessing intel processor trace on windows for vulner...
CSW2017Richard Johnson_harnessing intel processor trace on windows for vulner...
 
Computer Organization : CPU, Memory and I/O organization
Computer Organization : CPU, Memory and I/O organizationComputer Organization : CPU, Memory and I/O organization
Computer Organization : CPU, Memory and I/O organization
 
TMS320C5x
TMS320C5xTMS320C5x
TMS320C5x
 
Introduction to embedded systems
Introduction  to embedded systemsIntroduction  to embedded systems
Introduction to embedded systems
 
UNIT 3 - General Purpose Processors
UNIT 3 - General Purpose ProcessorsUNIT 3 - General Purpose Processors
UNIT 3 - General Purpose Processors
 
EE6602 Embedded System
EE6602 Embedded SystemEE6602 Embedded System
EE6602 Embedded System
 
Computer system architecture
Computer system architectureComputer system architecture
Computer system architecture
 
Embedded systems 101 final
Embedded systems 101 finalEmbedded systems 101 final
Embedded systems 101 final
 
Os introduction
Os introductionOs introduction
Os introduction
 
Os introduction
Os introductionOs introduction
Os introduction
 
Techno-Fest-15nov16
Techno-Fest-15nov16Techno-Fest-15nov16
Techno-Fest-15nov16
 

Mais de Prabindh Sundareson

Synthetic Data and Graphics Techniques in Robotics
Synthetic Data and Graphics Techniques in RoboticsSynthetic Data and Graphics Techniques in Robotics
Synthetic Data and Graphics Techniques in RoboticsPrabindh Sundareson
 
Machine learning in the Indian Context - IEEE talk at SRM Institute
Machine learning in the Indian Context - IEEE talk at SRM InstituteMachine learning in the Indian Context - IEEE talk at SRM Institute
Machine learning in the Indian Context - IEEE talk at SRM InstitutePrabindh Sundareson
 
ICCE Asia 2017 - Program Outline
ICCE Asia 2017 - Program OutlineICCE Asia 2017 - Program Outline
ICCE Asia 2017 - Program OutlinePrabindh Sundareson
 
Call for Papers - ICCE Asia 2017
Call for Papers - ICCE Asia 2017Call for Papers - ICCE Asia 2017
Call for Papers - ICCE Asia 2017Prabindh Sundareson
 
Technology, Innovation - A Perspective
Technology, Innovation - A PerspectiveTechnology, Innovation - A Perspective
Technology, Innovation - A PerspectivePrabindh Sundareson
 
IEEE - Consumer Electronics Trends Opportunities (2015)
IEEE - Consumer Electronics Trends Opportunities (2015)IEEE - Consumer Electronics Trends Opportunities (2015)
IEEE - Consumer Electronics Trends Opportunities (2015)Prabindh Sundareson
 
GFX part 8 - Three.js introduction and usage
GFX part 8 - Three.js introduction and usageGFX part 8 - Three.js introduction and usage
GFX part 8 - Three.js introduction and usagePrabindh Sundareson
 
GFX Part 7 - Introduction to Rendering Targets in OpenGL ES
GFX Part 7 - Introduction to Rendering Targets in OpenGL ESGFX Part 7 - Introduction to Rendering Targets in OpenGL ES
GFX Part 7 - Introduction to Rendering Targets in OpenGL ESPrabindh Sundareson
 
GFX Part 6 - Introduction to Vertex and Fragment Shaders in OpenGL ES
GFX Part 6 - Introduction to Vertex and Fragment Shaders in OpenGL ESGFX Part 6 - Introduction to Vertex and Fragment Shaders in OpenGL ES
GFX Part 6 - Introduction to Vertex and Fragment Shaders in OpenGL ESPrabindh Sundareson
 
GFX Part 5 - Introduction to Object Transformations in OpenGL ES
GFX Part 5 - Introduction to Object Transformations in OpenGL ESGFX Part 5 - Introduction to Object Transformations in OpenGL ES
GFX Part 5 - Introduction to Object Transformations in OpenGL ESPrabindh Sundareson
 
GFX Part 4 - Introduction to Texturing in OpenGL ES
GFX Part 4 - Introduction to Texturing in OpenGL ESGFX Part 4 - Introduction to Texturing in OpenGL ES
GFX Part 4 - Introduction to Texturing in OpenGL ESPrabindh Sundareson
 
GFX Part 3 - Vertices and interactions in OpenGL
GFX Part 3 - Vertices and interactions in OpenGLGFX Part 3 - Vertices and interactions in OpenGL
GFX Part 3 - Vertices and interactions in OpenGLPrabindh Sundareson
 
GFX Part 2 - Introduction to GPU Programming
GFX Part 2 - Introduction to GPU ProgrammingGFX Part 2 - Introduction to GPU Programming
GFX Part 2 - Introduction to GPU ProgrammingPrabindh Sundareson
 
GFX Part 1 - Introduction to GPU HW and OpenGL ES specifications
GFX Part 1 - Introduction to GPU HW and OpenGL ES specificationsGFX Part 1 - Introduction to GPU HW and OpenGL ES specifications
GFX Part 1 - Introduction to GPU HW and OpenGL ES specificationsPrabindh Sundareson
 
John Carmack talk at SMU, April 2014 - Virtual Reality
John Carmack talk at SMU, April 2014 - Virtual RealityJohn Carmack talk at SMU, April 2014 - Virtual Reality
John Carmack talk at SMU, April 2014 - Virtual RealityPrabindh Sundareson
 

Mais de Prabindh Sundareson (20)

Synthetic Data and Graphics Techniques in Robotics
Synthetic Data and Graphics Techniques in RoboticsSynthetic Data and Graphics Techniques in Robotics
ARM Linux Cache Monitoring Using PMU

2. What is PMU?
•   Cortex-A series processors contain event counting hardware which
    can be used to profile and benchmark code, including generation of
    cycle and instruction count figures and to derive figures for cache
    misses and so forth. The performance counter block contains a cycle
    counter which can count processor cycles, or be configured to count
    every 64 cycles. There are also a number of configurable 32-bit wide
    event counters which can be set to count instances of events from a
    wide-ranging list (for example, instructions executed, or MMU TLB
    misses). These counters can be accessed through debug tools, or by
    software running on the processor, through the CP15 Performance
    Monitoring Unit (PMU) registers. They provide a non-invasive debug
    feature and do not change the behavior of the processor. CP15 also
    provides a number of controls for enabling and resetting the counters
    and to indicate overflows (there is an option to generate an interrupt
    on a counter overflow). The cycle counter can be enabled
    independently of the event counters.
•   From ARM Architecture Reference Manual
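The two counter properties quoted above (32-bit width and the optional divide-by-64 cycle counter) affect how raw readings should be interpreted. The following is a minimal sketch, not code from peemuperf; the function names are hypothetical, and it assumes a counter wraps at most once between two reads.

```python
def counter_delta(prev: int, curr: int, width: int = 32) -> int:
    """Events counted between two reads of a free-running counter.

    Assumes the counter is `width` bits wide and wrapped at most once
    between the two reads (hypothetical helper, for illustration only).
    """
    return (curr - prev) % (1 << width)


def cycles_from_ccnt(raw_ticks: int, divide_by_64: bool) -> int:
    """Convert cycle-counter ticks to processor cycles.

    When the cycle counter is configured to count every 64 cycles,
    each tick stands for 64 processor cycles.
    """
    return raw_ticks * 64 if divide_by_64 else raw_ticks
```

With these helpers, a read of 0x10 following a read of 0xFFFFFFF0 yields a delta of 32 events rather than a negative number.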
3. Profiling alternatives
• OProfile
   – Supported in the mainline kernel (drivers/oprofile)
   – ARM support enabled
   – Relies on interrupts from the HW unit when event counters
     overflow
   – Falls back to a timer when no HW event monitors are available
• Unfortunately, various errata in current ARM Cortex-A8/A9
  devices make interrupt-based monitoring unreliable
   – To be fixed in later ARM cores
• Because of this, oprofile supports only timer-based CPU cycle
  measurement on the majority of ARM cores, at least up to the
  3.2 kernel
4. Latest status
• http://lists.infradead.org/pipermail/linux-arm-kernel/2012-June/103189.html

  Convert OMAP2/3 devices to use HWMOD for creating a PMU device. To support PMU
  on OMAP2/3 devices we only need to use MPU sub-system and so we can simply use
  the MPU HWMOD to create the PMU device. The MPU HWMOD for OMAP2/3 devices is
  currently missing the PMU interrupt and so add the PMU interrupt to the MPU
  HWMOD for these devices.

  This change also moves the PMU code out of the mach-omap2/devices.c files into
  its own pmu.c file as suggested by Kevin Hilman to de-clutter devices.c.

  Cc: Ming Lei <ming.lei at canonical.com>
  Cc: Will Deacon <will.deacon at arm.com>
  Cc: Benoit Cousson <b-cousson at ti.com>
  Cc: Paul Walmsley <paul at pwsan.com>
  Cc: Kevin Hilman <khilman at ti.com>
  Signed-off-by: Jon Hunter <jon-hunter at ti.com>
  ---
   arch/arm/mach-omap2/Makefile                       |    1 +
   arch/arm/mach-omap2/devices.c                      |   33 -----------
   arch/arm/mach-omap2/omap_hwmod_2xxx_ipblock_data.c |    6 ++
   arch/arm/mach-omap2/omap_hwmod_3xxx_data.c         |    6 ++
   arch/arm/mach-omap2/pmu.c                          |   59 ++++++++++++++++++++
   arch/arm/plat-omap/include/plat/irqs.h             |    1 +
   6 files changed, 73 insertions(+), 33 deletions(-)
   create mode 100644 arch/arm/mach-omap2/pmu.c
5. Patch status
• The patch set mentioned on the previous slide is in various stages
  of integration for different SoC architectures
• Beagle/OMAP35x is supported
• AM335x is not supported as of 2012; support is expected in
  mainline by 2013
• In the interim, what are the options?
6. What is the need?
• Cache usage monitoring is key to measuring the aspects of
  performance related to external memory bandwidth
• Current oprofile does not support this on various SoCs
7. peemuperf
• A tool to measure overall Linux performance using the ARM PMU
  hardware: CPU cycles, cache misses at L1 and L2 level, stalls,
  NEON, ...
• Consists of a kernel module that does the heavy lifting and
  exposes all profile information to userspace via a proc entry
8. Configurable parameters
• Example: evdelay=500 evlist=1,68,3,4 evdebug=1
• evdelay – Sampling interval (milliseconds)
• evlist – Comma-separated list of event IDs (refer to section
  3.2.49, "c9, Event Selection Register", of the Cortex-A8 TRM)
• evdebug – Controls debug output messages
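To make the parameter format concrete, here is a small sketch that parses a parameter string of the form shown on this slide into typed values. It only illustrates the key=value layout; `parse_params` is a hypothetical helper and is not part of peemuperf, which receives these values as kernel module parameters.

```python
def parse_params(text: str) -> dict:
    """Parse 'evdelay=500 evlist=1,68,3,4 evdebug=1' into a dict.

    Hypothetical helper: only the three parameter names that appear
    on the slide are accepted; anything else raises ValueError.
    """
    cfg = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {token!r}")
        if key == "evlist":
            # Comma-separated event IDs, kept in the order given.
            cfg[key] = [int(v) for v in value.split(",") if v]
        elif key in ("evdelay", "evdebug"):
            cfg[key] = int(value)
        else:
            raise ValueError(f"unknown parameter {key!r}")
    return cfg
```

Keeping evlist as an ordered list matters if, as seems likely, each event ID is assigned to the hardware counter at the same position.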
9. Userspace access
• Proc entry is /proc/peemuperf
• Displays in the format below
   – <COUNTER #> : <COUNTER VALUE>
   – Counter[0] : 48,
   – Counter[1] :77448,
   – Counter[2]: 13,
   – Counter[3]: 115058
   – Overflow flag: = 0, Cycle Count: = 5739253
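A userspace consumer might read the proc entry and extract the numbers with a tolerant parser. The sketch below assumes the real output matches the sample on this slide (including its irregular spacing) and that all values are decimal; `parse_peemuperf` and `SAMPLE` are hypothetical names.

```python
import re

# Sample text copied from the slide; real output may differ.
SAMPLE = """Counter[0] : 48,
Counter[1] :77448,
Counter[2]: 13,
Counter[3]: 115058
Overflow flag: = 0, Cycle Count: = 5739253
"""

_COUNTER = re.compile(r"Counter\[(\d+)\]\s*:\s*(\d+)")
_OVERFLOW = re.compile(r"Overflow flag:\s*=\s*(\d+)")
_CYCLES = re.compile(r"Cycle Count:\s*=\s*(\d+)")


def parse_peemuperf(text: str) -> dict:
    """Extract counters, overflow flag and cycle count from proc text."""
    counters = {int(i): int(v) for i, v in _COUNTER.findall(text)}
    overflow = _OVERFLOW.search(text)
    cycles = _CYCLES.search(text)
    return {
        "counters": counters,
        "overflow": int(overflow.group(1)) if overflow else None,
        "cycles": int(cycles.group(1)) if cycles else None,
    }
```

On a target board the text would come from `open("/proc/peemuperf").read()` instead of `SAMPLE`.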
10. A8 vs A9
• Cortex-A8 has 4 performance counters
• Cortex-A9 has 6
• peemuperf configures itself dynamically based on a run-time query
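The slides do not say how the run-time query is done. One architectural way on ARMv7 is to read the N field (bits [15:11]) of the Performance Monitor Control Register, which reports the number of implemented event counters. The sketch below only decodes an already-read register value; the helper names are hypothetical, and whether peemuperf uses this exact method is an assumption.

```python
PMCR_N_SHIFT = 11   # ARMv7 PMCR.N occupies bits [15:11]
PMCR_N_MASK = 0x1F


def num_event_counters(pmcr: int) -> int:
    """Number of event counters reported by a raw PMCR value."""
    return (pmcr >> PMCR_N_SHIFT) & PMCR_N_MASK


def usable_events(evlist: list, pmcr: int) -> list:
    """Truncate a requested event list to the counters available.

    Hypothetical policy for illustration: extra events are dropped.
    """
    return evlist[: num_event_counters(pmcr)]
```

A value with N = 4 would correspond to the Cortex-A8 case above, and N = 6 to the Cortex-A9 case.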
11. Default events monitored
• 1 ==> Instruction fetch that causes a refill at the lowest level
  of instruction or unified cache
• 68 ==> Any cacheable miss in the L2 cache
• 3 ==> Data read or write operation that causes a refill at the
  lowest level of data or unified cache
• 4 ==> Data read or write operation that causes a cache access at
  the lowest level of data or unified cache
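Since event 3 counts data-side refills and event 4 counts data-side accesses at the same cache level, their ratio gives a rough data cache miss ratio. The following sketch assumes that Counter[i] holds the event at position i of evlist, and that the sample values on the "Userspace access" slide were collected with the default list; neither is confirmed by the slides, and the helper names are hypothetical.

```python
DEFAULT_EVLIST = [1, 68, 3, 4]  # default event IDs from the slide


def by_event(evlist: list, counters: list) -> dict:
    """Map event IDs to counter values, assuming matching order."""
    return dict(zip(evlist, counters))


def data_cache_miss_ratio(events: dict):
    """Refills (event 3) divided by accesses (event 4).

    Returns None when no accesses were counted.
    """
    accesses = events.get(4, 0)
    if accesses == 0:
        return None
    return events.get(3, 0) / accesses


# Sample counter values taken from the slide's example output.
events = by_event(DEFAULT_EVLIST, [48, 77448, 13, 115058])
ratio = data_cache_miss_ratio(events)
```

Because the counters are sampled every evdelay milliseconds, a ratio computed this way describes one sampling interval only.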