ARM AAE - Instruction Sets

SOFTWARE & SYSTEMS DESIGN
3 – Instruction Sets

AGENDA
• Instruction Sets
VFP and NEON
Pipelines
Cycle Counting
AAETC3v00

INSTRUCTION SET
• ARM instruction set
– All instructions are 32-bit
– Most instructions can be executed conditionally
• Thumb instruction set
– 16-bit instruction set
– No conditional execution (except for branches)
– Optimized for code density from C code (~65% of ARM code size)
• Thumb-2 technology
– Extension to Thumb instruction set
– Mix of 16-bit and 32-bit instructions
– Conditional execution via the IT instruction
– Higher performance than Thumb and smaller than ARM

ASSEMBLER SYNTAX
• Data processing instructions
<operation><condition> Rd, Rm, <op2>
ADDEQ r4, r5, r6 // if (EQ) r4 = r5 + r6
ORR r2, r3, r6, LSL #4 // r2 = r3 | (r6 << 4)
SUBS r5, r7, #4 // r5 = r7 - 4; set flags
MOV r4, #7 // r4 = 7
• Memory access instructions
<operation><size> Rd, [<address>]
LDR r0, [r6, #4] // r0 = *(r6 + 4)
STRB r4, [r7], #8 // *(byte *) r7 = r4; r7 += 8
<operation><addressing mode> <Rn>!, <registers list>
LDMIA r0, {r1, r2, r7}
STMFD sp!, {r4-r11, lr}
• Program flow instructions
<branch> <label>
BL foo
B bar

DATA PROCESSING INSTRUCTIONS
• These instructions operate on the contents of registers
– They DO NOT affect memory
                                   arithmetic       logical          move
manipulation (has destination      ADD ADC SUB      AND EOR ORR      MOV MVN
register)                          SBC RSB RSC      ORN BIC
comparison (set flags only)        CMN (ADDS)       TST (ANDS)
                                   CMP (SUBS)       TEQ (EORS)
ORN is available in Thumb-2 only (T2)
• Syntax:
<Operation>{S}{<cond>} {Rd,} Rn, Operand2
• Examples:
ADD r0, r1, r2 ; r0 = r1 + r2
TEQ r0, r1 ; if r0 == r1, Z flag will be set
MOV r0, r1 ; copy r1 to r0

MULTIPLY / DIVIDE
• 32-bit multiplication: MUL, MLA, MLS
– Rd = Rn × Rm, with optional accumulation (+/- Ra)
• 64-bit multiplication: UMULL, SMULL, UMLAL, SMLAL
– RdHi:RdLo = Rn × Rm, with optional accumulation (+ RdHi:RdLo)
Examples:
MLA r0, r1, r2, r3 ; r0 = r3 + (r1 * r2)
[U|S]MULL r4, r5, r2, r3 ; r5:r4 = r2 * r3
Division (optional in ARMv7-A):
SDIV r0, r1, r2 ; signed: r0 = r1 / r2
UDIV r0, r1, r2 ; unsigned: r0 = r1 / r2

BIT MANIPULATION INSTRUCTIONS
(bit 31 on the left, bit 0 on the right)
r0 = 0000 0000 1010 1100 0100 1110 0111 0100
BFI r0, r0, #9, #6 ; Bit Field Insert: copy r0[5:0] (110100) into r0[14:9]
r0 = 0000 0000 1010 1100 0110 1000 0111 0100
UBFX r1, r0, #18, #7 ; Bit Field Extract: r1 = r0[24:18], zero extended
r1 = 0000 0000 0000 0000 0000 0000 0010 1011
BFC r1, #3, #4 ; Bit Field Clear: clear r1[6:3]
r1 = 0000 0000 0000 0000 0000 0000 0000 0011
RBIT r2, r1 ; Reverse Bit Order
r2 = 1100 0000 0000 0000 0000 0000 0000 0000
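The four bit-field instructions above can be modelled in a few lines of Python. This is only a sketch of the arithmetic, not of any encoding detail; the function names simply mirror the mnemonics, and the starting value 0x00AC4E74 is the bit pattern shown for r0 in the slide.

```python
MASK32 = 0xFFFFFFFF

def bfi(rd, rn, lsb, width):
    """Bit Field Insert: copy the low `width` bits of rn into rd at bit `lsb`."""
    field = ((1 << width) - 1) << lsb
    return ((rd & ~field) | ((rn << lsb) & field)) & MASK32

def ubfx(rn, lsb, width):
    """Unsigned Bit Field Extract: rn[lsb+width-1:lsb], zero extended."""
    return (rn >> lsb) & ((1 << width) - 1)

def bfc(rd, lsb, width):
    """Bit Field Clear: clear `width` bits of rd starting at bit `lsb`."""
    field = ((1 << width) - 1) << lsb
    return rd & ~field & MASK32

def rbit(rm):
    """Reverse the order of all 32 bits."""
    return int(f"{rm & MASK32:032b}"[::-1], 2)

r0 = bfi(0x00AC4E74, 0x00AC4E74, 9, 6)   # BFI  r0, r0, #9, #6
r1 = ubfx(r0, 18, 7)                     # UBFX r1, r0, #18, #7
r1_cleared = bfc(r1, 3, 4)               # BFC  r1, #3, #4
r2 = rbit(r1_cleared)                    # RBIT r2, r1
```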

BYTE REVERSAL
• Byte Reversal Instructions
REV{cond} Rd, Rm Reverses the bytes in a word
REV16{cond} Rd, Rm Reverses the bytes in each halfword
REVSH{cond} Rd, Rm Reverses the bottom two bytes, and sign extends to 32 bits
Byte order 3 2 1 0 becomes 0 1 2 3
V6 and later
REV r0, r0
Pre-V6
EOR r1, r0, r0, ROR #16
BIC r1, r1, #0xFF0000
MOV r0, r0, ROR #8
EOR r0, r0, r1, LSR #8
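To see why the four-instruction pre-V6 sequence is equivalent to REV, it can be traced step by step in Python. This is a sketch; `ror`, `rev` and `rev_pre_v6` are illustrative helper names, not ARM-provided functions.

```python
MASK32 = 0xFFFFFFFF

def ror(x, n):
    """Rotate a 32-bit value right by n bits (0 < n < 32)."""
    return ((x >> n) | (x << (32 - n))) & MASK32

def rev(x):
    """Reference byte reversal, as performed by REV."""
    return int.from_bytes((x & MASK32).to_bytes(4, "little"), "big")

def rev_pre_v6(r0):
    """Follow the pre-V6 sequence one instruction at a time."""
    r1 = r0 ^ ror(r0, 16)          # EOR r1, r0, r0, ROR #16
    r1 &= ~0x00FF0000 & MASK32     # BIC r1, r1, #0xFF0000
    r0 = ror(r0, 8)                # MOV r0, r0, ROR #8
    return r0 ^ (r1 >> 8)          # EOR r0, r0, r1, LSR #8

swapped = rev_pre_v6(0x11223344)
```

For the input 0x11223344 both routines give 0x44332211.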

SIMD
• ARMv6 added a number of instructions which perform SIMD (Single Instruction
Multiple Data) operations using ARM registers
– Includes instructions for addition, subtraction, multiplication and sum of absolute
differences
– Instructions can work on four 8-bit quantities, or two 16-bit quantities
– Signed/unsigned and saturating versions of many instructions are available
– CPSR GE bits used instead of normal ALU flags
• There are instructions for packing (PKHBT/PKHTB) and unpacking
(UXTH/UXTB) registers
UADD16 Rd, Rm, Rs
– Rd[31:16] = Rm[31:16] + Rs[31:16], status reported in GE[3:2]
– Rd[15:0] = Rm[15:0] + Rs[15:0], status reported in GE[1:0]
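A minimal Python sketch of UADD16 makes the lane behaviour concrete. It assumes the usual definition in which each pair of GE bits is set when the corresponding 16-bit addition carries out; the function name and return shape are illustrative.

```python
def uadd16(rm, rs):
    """Add two pairs of unsigned halfwords independently.

    Returns (rd, ge) where ge is the 4-bit GE field: GE[1:0] reflects a
    carry out of the low lane, GE[3:2] a carry out of the high lane.
    """
    lo = (rm & 0xFFFF) + (rs & 0xFFFF)
    hi = ((rm >> 16) & 0xFFFF) + ((rs >> 16) & 0xFFFF)
    rd = ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)   # no carry between lanes
    ge = (0b1100 if hi > 0xFFFF else 0) | (0b0011 if lo > 0xFFFF else 0)
    return rd, ge

rd, ge = uadd16(0x0001FFFF, 0x00020001)
```

Here the low lane wraps to 0x0000 and sets GE[1:0], while the high lane is unaffected by that carry.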

SATURATED MATH AND CLZ
• Support for Saturated Arithmetic
– Targeted at DSP & control applications
– Overflow sets Q flag (sticky) not V, and sets result to +/- max value
QSUB{cond} Rd, Rm, Rn ; Rd = saturate(Rm - Rn)
QADD{cond} Rd, Rm, Rn ; Rd = saturate(Rm + Rn)
– Results saturate at 0x7FFFFFFF (most positive) and 0x80000000 (most negative)
QDSUB{cond} Rd, Rm, Rn ; Rd = saturate(Rm - saturate(Rn * 2))
QDADD{cond} Rd, Rm, Rn ; Rd = saturate(Rm + saturate(Rn * 2))
• Count Leading Zeros
CLZ{cond} Rd, Rm
– Returns number of unset bits before the most significant set bit
Rm = 0000 0000 0010 1100 0100 1110 0111 0100 (bit 31 on the left)
CLZ returns 10 in this case
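The saturating operations and CLZ can be sketched in Python as follows. Values are treated as signed Python integers; the second element of each result indicates whether the Q flag would be set by that operation (the real flag is sticky, which this sketch does not model). All function names are illustrative.

```python
INT32_MAX = 0x7FFFFFFF
INT32_MIN = -0x80000000

def saturate(x):
    """Clamp x to the signed 32-bit range; report whether clamping occurred."""
    if x > INT32_MAX:
        return INT32_MAX, True
    if x < INT32_MIN:
        return INT32_MIN, True
    return x, False

def qadd(rm, rn):
    return saturate(rm + rn)

def qsub(rm, rn):
    return saturate(rm - rn)

def qdadd(rm, rn):
    """Rd = saturate(Rm + saturate(Rn * 2))."""
    doubled, q1 = saturate(rn * 2)
    total, q2 = saturate(rm + doubled)
    return total, q1 or q2

def clz(rm):
    """Count the zero bits above the most significant set bit."""
    return 32 - (rm & 0xFFFFFFFF).bit_length()

leading = clz(0x002C4E74)   # the bit pattern from the slide
```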

SATURATION
• Saturate a value to a specified bit position (effectively saturating to any
power of 2)
– USAT - Unsigned saturate 32-bit
• Syntax: USAT Rd, #sat, Rm {shift}
• Operation: Rd = Saturate(Shift(Rm), #sat)
• Unsigned saturation: the result is clamped to the range 0 to 2^#sat - 1
– Variants
SSAT - signed saturation
USAT16 - saturates two 16-bit unsigned halfwords (no rotation allowed)
SSAT16 - signed saturation of two 16-bit halfwords (no rotation
allowed)
– #sat is specified as an immediate value in the range 0 to 31
– {shift} is optional and is limited to LSL or ASR
– Q flag is set if saturation occurs
– Signed saturation: the result is clamped to the range -2^(#sat-1) to 2^(#sat-1) - 1
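The clamping performed by USAT and SSAT can be sketched as below. The optional shift is left out (the caller is assumed to have applied it already), the input is a signed Python integer, and the boolean in the result stands for "Q flag would be set". Function names are illustrative.

```python
def usat(value, sat):
    """Clamp a signed value to the unsigned range 0 .. 2**sat - 1."""
    upper = (1 << sat) - 1
    result = min(max(value, 0), upper)
    return result, result != value

def ssat(value, sat):
    """Clamp a signed value to -2**(sat-1) .. 2**(sat-1) - 1 (sat >= 1)."""
    lower = -(1 << (sat - 1))
    upper = (1 << (sat - 1)) - 1
    result = min(max(value, lower), upper)
    return result, result != value

clamped_pixel = usat(300, 8)   # saturating to 8 bits, e.g. for pixel data
```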

SINGLE / DOUBLE REGISTER DATA TRANSFER
• Use to move data between one or two registers and memory
LDRD STRD Doubleword
LDR STR Word
LDRB STRB Byte
LDRH STRH Halfword
LDRSB Signed byte load
LDRSH Signed halfword load
• Syntax:
– LDR{<size>}{<cond>} Rd, <address>
– STR{<size>}{<cond>} Rd, <address>
• Example:
– LDRB r0, [r1] ; load bottom byte of r0 from the
; byte of memory at address in r1
– Any remaining space in Rd is zero filled or sign extended

ADDRESSING MEMORY
• The address accessed by LDR/STR is specified by a base register with
an optional offset
– Base register only (no offset)
LDR r0, [r1]
– Base register plus constant
LDR r0, [r1, #8]
– Base register, plus register (optionally shifted by an immediate value)
LDR r0, [r1, r2]
LDR r0, [r1, r2, LSL #2]
– The offset can be either added or
subtracted from the base register
LDR r0, [r1, #-8]
LDR r0, [r1, -r2]
LDR r0, [r1, -r2, LSL #2]
[Figure: memory address = r1 +/- offset, where the offset is #8 or r2, LSL #2; r0 is loaded from that address]
or

PRE- AND POST-INDEXED ADDRESSING
• Pre-indexed (add offset before memory access)
LDR r0, [r1, #12]{!}
– Address accessed is r1 + 12
– If ‘!’ present, update base register (r1)
• Post-indexed (add offset after memory access)
LDR r0, [r1], #12
– Address accessed is r1
– Always update base register (r1)
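The difference between the two forms is easiest to see by tracking the base register. The sketch below models memory as a dictionary keyed by byte address and registers as a dictionary keyed by name; both representations, and the sample contents, are assumptions made for illustration.

```python
def ldr_pre_indexed(memory, regs, rd, rn, offset, writeback=False):
    """LDR rd, [rn, #offset]{!}: add the offset before the access."""
    address = regs[rn] + offset
    regs[rd] = memory[address]
    if writeback:                  # '!' present: update the base register
        regs[rn] = address

def ldr_post_indexed(memory, regs, rd, rn, offset):
    """LDR rd, [rn], #offset: access at the base, then always update it."""
    address = regs[rn]
    regs[rd] = memory[address]
    regs[rn] = address + offset

memory = {0x1000: 0xAAAA, 0x100C: 0xBBBB}    # hypothetical contents

plain = {"r0": 0, "r1": 0x1000}
ldr_pre_indexed(memory, plain, "r0", "r1", 12)                 # LDR r0, [r1, #12]

pre = {"r0": 0, "r1": 0x1000}
ldr_pre_indexed(memory, pre, "r0", "r1", 12, writeback=True)   # LDR r0, [r1, #12]!

post = {"r0": 0, "r1": 0x1000}
ldr_post_indexed(memory, post, "r0", "r1", 12)                 # LDR r0, [r1], #12
```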

MULTIPLE REGISTER DATA TRANSFER
• These instructions move data between multiple registers and memory
• Syntax
<LDM|STM>{<addressing_mode>}{<cond>} Rb{!}, <register list>
• 4 addressing modes
– Increment after/before (IA/IB); IA is the default
– Decrement after/before (DA/DB)
• Also
PUSH/POP, equivalent to STMDB/LDMIA with SP! as base register
• Example
LDM r10, {r0,r1,r4} ; load registers, using r10 base
PUSH {r4-r6,pc} ; store registers, using SP base
[Figure: position of r0, r1 and r4 in memory relative to the base register (Rb = r10) for IA, IB, DA and DB, with addresses increasing upwards]
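The four addressing modes differ only in where the block of words sits relative to the base register. The sketch below computes the addresses touched and the base value that would be written back if ‘!’ were present; it assumes 4-byte words and the usual rule that the lowest-numbered register goes to the lowest address. The function name is illustrative.

```python
def transfer_addresses(base, count, mode="IA"):
    """Return (word addresses in ascending order, written-back base)."""
    size = 4 * count
    start = {
        "IA": base,             # increment after
        "IB": base + 4,         # increment before
        "DA": base - size + 4,  # decrement after
        "DB": base - size,      # decrement before
    }[mode]
    new_base = base + size if mode in ("IA", "IB") else base - size
    return [start + 4 * i for i in range(count)], new_base

# LDM r10, {r0,r1,r4} with r10 = 0x1000: three words starting at the base
ldm_example = transfer_addresses(0x1000, 3, "IA")

# PUSH of four registers is STMDB sp!, so the block sits just below SP
push_example = transfer_addresses(0x2000, 4, "DB")
```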

INSTRUCTIONS FOR LOADING CONSTANTS
• The assembler provides some instructions for loading values into registers
– These are the recommended mechanisms for loading constants into registers
• PC- or register-relative constants
ADR Rn, label
– Add or subtract an immediate value to or from the PC to generate the address of the label into the specified register, using one instruction
– ADRL pseudo instruction uses two instructions, giving a better range
– Can be used to generate addresses for position independent code (but only if in same code section)
– Constant determined at run time
• Absolute constants
LDR Rn, =<constant>
LDR Rn, =label
– Pseudo instruction
– Assembler will use optimal sequence to generate constant into specified register (one of MOV, MVN or an LDR from a literal pool)
– Can load to the PC, causing a branch
– Use for absolute addressing and references outside the current section (resulting in position dependent code)
– Constant determined at assembly or link time

LDR= EXAMPLES
• The following examples show how the LDR= pseudo instruction
makes code more readable, portable and flexible
Code                      Disassembly
LDR r0, =0x2543           MOV r0, #0x2543
LDR r0, =0xFFFF43FF       MVN r0, #0xBC00
LDR r0, =0xFFFFF5         LDR r0, [pc, #xx]
                          ...
                          DCD 0xFFFFF5
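The choice the assembler makes for the three examples can be reproduced with a small Python sketch. It assumes ARM state, where a data-processing immediate is an 8-bit value rotated right by an even amount, and it assumes that the `MOV r0, #0x2543` shown in the disassembly is the 16-bit immediate form (MOVW) available from ARMv6T2 onwards. The function names and returned labels are illustrative, and a real assembler may consider further options.

```python
MASK32 = 0xFFFFFFFF

def rol(x, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32

def is_arm_immediate(value):
    """True if value is an 8-bit constant rotated right by an even amount."""
    return any(rol(value, rot) <= 0xFF for rot in range(0, 32, 2))

def choose_encoding(value):
    """Pick the form an assembler could use for LDR Rn, =value (sketch)."""
    value &= MASK32
    if is_arm_immediate(value):
        return "MOV"
    if is_arm_immediate(~value & MASK32):
        return "MVN"
    if value <= 0xFFFF:
        return "MOVW"            # assumption: 16-bit immediate form of MOV
    return "LDR literal"         # load from a literal pool

choices = [choose_encoding(v) for v in (0x2543, 0xFFFF43FF, 0xFFFFF5)]
```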

BRANCH INSTRUCTIONS
• Branch instructions have the following format
B{<cond>} label
– Might not cause a pipeline flush (branch prediction)
– Branch range depends on instruction set and width
• A BL instruction additionally generates a return address in r14 (lr)
– Returning is performed by restoring the program counter (pc) from lr
void func1 (void)
{
:
func2();
:
}
func1                     func2
:                         :
BL func2                  :
:                         BX lr

BRANCH RANGES
• The range of a branch instruction depends on which instruction set
is being used
• It also varies between different types of branch
                ARM        Thumb
B               ±32MB      ±16MB
CBZ/CBNZ                   126 bytes
BL/BLX (imm)    ±32MB      ±16MB
BLX (reg)       Any        Any
BX              Any        Any
TBB                        510 bytes
TBH                        131070 bytes
“Any” indicates an instruction which can branch to any address in the 4GB address space
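The figures in the table follow directly from the width of each instruction's offset field. The sketch below derives them; the field widths used (24-bit signed offsets for B/BL, a 6-bit unsigned offset for CBZ/CBNZ, byte and halfword table entries for TBB/TBH) are stated from my understanding of the ARMv7 encodings and should be checked against the ARM ARM.

```python
def signed_range(bits, scale):
    """Byte range of a signed `bits`-bit offset in units of `scale` bytes."""
    return -(1 << (bits - 1)) * scale, ((1 << (bits - 1)) - 1) * scale

def unsigned_range(bits, scale):
    """Byte range of an unsigned (forward-only) offset."""
    return 0, ((1 << bits) - 1) * scale

MB = 1024 * 1024

arm_b = signed_range(24, 4)     # ARM B/BL: offset counted in words
thumb_b = signed_range(24, 2)   # Thumb-2 32-bit B/BL: offset counted in halfwords
cbz = unsigned_range(6, 2)      # CBZ/CBNZ: forward only
tbb = unsigned_range(8, 2)      # TBB: byte table entries, doubled
tbh = unsigned_range(16, 2)     # TBH: halfword table entries, doubled
```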

READING AND WRITING PC
• In general, writing PC causes a branch to the value written
– Bit zero controls the execution state (ARM or Thumb) at the destination
– The bottom bit of the destination address is always forced to zero
– Writing a value with ‘10’ in the bottom two bits results in unpredictable behavior
– Note that architectures prior to ARMv7 do not change state when the PC is written directly
• Loading PC from memory behaves similarly
– Architectures prior to ARMv5T do not change state when the PC is loaded from memory
• The PC reads as the address of the current instruction plus an offset
– In ARM state, the offset is 8
– In Thumb state, the offset is 4
– This reflects the 3-stage structure of the ARM7TDMI pipeline
– In Thumb state, the bottom bit always reads as zero
– In ARM state, the bottom two bits will always read as zero

CHANGING STATE
• Changing between ARM and Thumb states (or “interworking”) can be carried out
using the Branch Exchange instruction
BX Rn
BLX Rn
– Bit 0 of Rn determines the exchange behavior
• Unset (0) - change to (or remain in) ARM state
• Set (1) - change to (or remain in) Thumb state
• Branch and Link with Exchange
– Used to branch to a subroutine which is known to be in the opposite instruction set
– When branching to imported labels use BL; the linker will substitute BLX if necessary
BLX offset ; ARM/Thumb instruction which always
; changes state (and sets LR)
• All instructions which modify the PC can cause a state change
– Depending on bit 0 of the result
– For data processing instructions, the state changes only if the S variant is not used

IF-THEN
• Thumb only, makes the next 1-4 instructions conditional
• Syntax
IT{T|E}{T|E}{T|E} <cond>
– Any condition code may be used
– Doesn’t affect condition flags
– 16-bit instructions in the IT block do not affect condition flags (except CMP, CMN & TST)
– 32-bit instructions do affect condition flags (normal rules apply)
– No need to write this instruction: the assembler will insert it for you where necessary
• Current “if-then status” stored in CPSR
– Conditional block may be safely interrupted and returned to
– Not recommended to branch into or out of ‘if-then’ block
• Example
; if (r0 == 0)
; r0 = *r1 + 2;
; else
; r0 = *r2 + 4;
; if
CMP r0, #0
ITTEE EQ
; then
LDREQ r0, [r1]
ADDEQ r0, #2
; else
LDRNE r0, [r2]
ADDNE r0, #4
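The way an IT mnemonic maps onto the conditions of the following instructions can be sketched in Python: the first instruction always takes the stated condition, each further T repeats it, and each E takes its inverse. The table and function name below are illustrative helpers, not part of any ARM tool.

```python
INVERSE = {
    "EQ": "NE", "NE": "EQ", "CS": "CC", "CC": "CS",
    "MI": "PL", "PL": "MI", "VS": "VC", "VC": "VS",
    "HI": "LS", "LS": "HI", "GE": "LT", "LT": "GE",
    "GT": "LE", "LE": "GT",
}

def expand_it(mnemonic, cond):
    """Return the condition each instruction in the IT block must carry."""
    pattern = mnemonic.upper()
    suffix = pattern[2:]
    if not pattern.startswith("IT") or len(suffix) > 3 or set(suffix) - {"T", "E"}:
        raise ValueError(f"not a valid IT mnemonic: {mnemonic}")
    return [cond] + [cond if letter == "T" else INVERSE[cond] for letter in suffix]

block = expand_it("ITTEE", "EQ")
```

For the example above this yields EQ, EQ, NE, NE, matching LDREQ, ADDEQ, LDRNE, ADDNE.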

STATUS REGISTER ACCESS
• MRS and MSR allow contents of CPSR/SPSR to be transferred
to/from a general purpose register or be set to an immediate value
– MSR allows the whole status register, or just parts of it, to be updated
MRS r0,CPSR ; read CPSR into r0
BIC r0,r0,#0x80 ; clear bit 7 to enable IRQ
MSR CPSR_c,r0 ; write modified value to ‘c’ byte only
• CPS can be used to directly modify some bits in the CPSR
– These are related to interrupt enable/disable and operating mode
• SETEND instruction selects the endianness of data accesses
– For use in systems with mixed endian data (e.g. peripherals)
SETEND BE
LDR r0, [r7], #4 ; big-endian
SETEND LE
LDR r1, [r7], #4 ; little-endian
• User mode programs may read all bits of CPSR but may only change the flag bits

SYSTEM CONTROL INSTRUCTIONS
• ARM uses coprocessors for “internal functions” so as not to enforce
a particular memory map
– System Control Coprocessor: cp15
• Used for processor configuration: System ID, caches, MMU, TCMs, etc.
– Debug Coprocessor: cp14
• Can be used to access debug control registers
– VFP and NEON: cp10 and cp11
• In earlier versions of the architecture, designers were permitted to
add external coprocessors
– This is not permitted in ARMv7 architecture profiles

AGENDA
Instruction Sets
• VFP and NEON
Pipelines
Cycle Counting

VFP ARCHITECTURE
• VFP (Vector Floating Point) is ARM’s floating point architecture
– There have been 4 versions of the architecture to date (VFPv1 is no longer supported)
– VFPv2 is supported by ARM9 and ARM11 processor families
– VFPv3 and VFPv4 are optional extensions to the ARMv7-AR architecture profiles
• VFPv3 (Cortex-A8, Cortex-A9, Cortex-R4, Cortex-R5)
– Can be implemented with either 16 (VFPv3-D16) or 32 (VFPv3-D32) registers
– Can be extended with half-precision conversion functions
• VFPv4 (Cortex-A5, Cortex-A7 and Cortex-A15)
– Includes half-precision conversion functions
– Supports fused multiply-add operations

THE NEON ARCHITECTURE EXTENSION
• NEON refers to the Advanced SIMD instruction set extension
– Optional extension to ARMv7-AR architecture profiles
– The NEON register set is separate from the core register bank
– NEON instructions support parallel operations on vectors of elements held in registers
– Advanced SIMDv1 is the base NEON architecture
• Can be extended with half-precision conversion functions
– Advanced SIMDv2 adds fused multiply-add operations

AGENDA
Instruction Sets
VFP and NEON
• Pipelines
Cycle Counting

HISTORIC PIPELINES
ARM7: Fetch, Decode, Execute
ARM9: Fetch, Decode, Execute, Memory, Writeback
ARM1136: Fetch 1, Fetch 2, Decode, Issue, followed by one of
– MAC 1, MAC 2, MAC 3
– Shift, ALU, Saturate
– Address, Data 1, Data 2
and then Writeback
Cortex-A9: Fetch 1, Fetch 2, Fetch 3, Queue, Decode, Rename, Issue, followed by one of
– Execute 1, Execute 2 (two such pipelines)
– MAC 1, MAC 2
– Address, Load/Store
– Data Engine
and then Writeback

ARM7TDMI PIPELINE (DATA PROC)
[Figure: pipeline diagram over cycles 1 to 6 for the sequence ADD, SUB, MOV, AND, ORR, EOR, CMP, RSB; each instruction moves through Fetch, Decode and Execute, one stage per cycle]
• In this example it takes 6 clock cycles to execute 6 instructions
• All operations here are on registers (single cycle execution)
• Clock cycles per Instruction (CPI) = 1

ARM7TDMI PIPELINE (LDR)
[Figure: pipeline diagram over cycles 1 to 6 for the sequence ADD, SUB, LDR, MOV, AND, ORR; the LDR passes through Fetch, Decode, Execute, Data and Writeback, holding up the instructions behind it]
• In this example it takes 6 clock cycles to execute 4 instructions
• Clock cycles per Instruction (CPI) = 1.5
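The CPI figures quoted for the two ARM7TDMI examples can be reproduced with a very small model. It assumes a pipeline that is already full, one cycle in the execute stage for register-only data processing instructions, and three cycles (Execute, Data, Writeback) for LDR; the cycle table and function name are illustrative, not taken from a datasheet.

```python
# Assumed cycles spent at the execute end of the pipeline per instruction;
# anything not listed is taken to be a single-cycle register operation.
EXECUTE_CYCLES = {"LDR": 3}

def cycles_and_cpi(instructions):
    """Return (total cycles, cycles per instruction) for a full pipeline."""
    cycles = sum(EXECUTE_CYCLES.get(op, 1) for op in instructions)
    return cycles, cycles / len(instructions)

data_proc = cycles_and_cpi(["ADD", "SUB", "MOV", "AND", "ORR", "EOR"])
with_load = cycles_and_cpi(["ADD", "SUB", "LDR", "MOV"])
```

The first sequence gives 6 cycles and a CPI of 1; the second gives 6 cycles for 4 instructions, a CPI of 1.5.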

ARM7TDMI PIPELINE (BRANCH)
[Figure: pipeline diagram over cycles 1 to 5; BL at 0x8000 goes through Fetch, Decode, Execute, Linkret and Adjust; the instructions already fetched from 0x8004 (X) and 0x8008 (XX) are discarded; fetching resumes at the branch target with 0x8FEC ADD, 0x8FF0 SUB and 0x8FF4 MOV]
• The branch forces the pipeline to be refilled
• Note that the core is executing in ARM state

ARM7TDMI PIPELINE (INTERRUPT)
[Figure: pipeline diagram over cycles 1 to 8; an IRQ is taken while 0x8000 ADD, 0x8004 SUB, 0x8008 MOV and 0x800C X are in the pipeline; the core performs the IRQ entry steps (Linkret, Adjust), fetches 0x0018 B (to 0xAF00) from the IRQ vector, discards 0x001C XX and 0x0020 XXX, then fetches the handler instructions 0xAF00 STMFD, 0xAF04 MOV and 0xAF08 LDR]
IRQ interrupt minimum latency (service routine entry) = 7 cycles

ARM9TDMI PIPELINE (LDR INTERLOCK)
[Figure: pipeline diagram over cycles 1 to 9 involving ADD R1, R1, R2, SUB R3, R4, R1, ORR R8, R3, R4, AND R6, R3, R1, LDR R4, [R7] and EOR R3, R1, R2; the instruction that uses R4 directly after the LDR is held for one interlock (I) cycle]
• In this example it takes 7 clock cycles to execute 6 instructions, a CPI of approximately 1.2
• The LDR instruction immediately followed by a data operation using the same
register causes an interlock
F - Fetch  D - Decode  E - Execute  I - Interlock  M - Memory  W - Writeback

ARM9TDMI PIPELINE (LDR)
[Figure: pipeline diagram over cycles 1 to 9 for the same six instructions, ordered so that LDR R4, [R7] is not immediately followed by the instruction that uses R4; no interlock occurs]
• In this example it takes 6 cycles to execute 6 instructions, CPI of 1
• Cycle 4 has simultaneous I & D memory accesses
• Cycle 5 R4 data available to ORR before written to register
– Internal forwarding paths are used
F - Fetch  D - Decode  E - Execute  I - Interlock  M - Memory  W - Writeback
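The effect of the load-use interlock on the cycle count can be sketched with a simple model: one cycle per instruction in a full pipeline, plus one stall cycle whenever an LDR is immediately followed by an instruction that reads the loaded register. The instruction orders below are assumptions chosen to illustrate the two cases; the exact order on the slides is not recoverable from the extracted text.

```python
def count_cycles(program):
    """program is a list of (op, dest, sources) tuples.

    One cycle per instruction, plus one interlock cycle for each LDR whose
    destination is read by the very next instruction.
    """
    cycles = len(program)
    for (op, dest, _), (_, _, next_sources) in zip(program, program[1:]):
        if op == "LDR" and dest in next_sources:
            cycles += 1
    return cycles

ADD = ("ADD", "R1", ("R1", "R2"))
SUB = ("SUB", "R3", ("R4", "R1"))
LDR = ("LDR", "R4", ("R7",))
ORR = ("ORR", "R8", ("R3", "R4"))
AND = ("AND", "R6", ("R3", "R1"))
EOR = ("EOR", "R3", ("R1", "R2"))

interlocked = count_cycles([ADD, SUB, LDR, ORR, AND, EOR])  # ORR needs R4 at once
reordered = count_cycles([ADD, SUB, LDR, AND, ORR, EOR])    # AND fills the slot
```

Moving an independent instruction between the load and its first use removes the stall, which is the usual reason compilers schedule loads early.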

CORTEX-R4 PIPELINE
[Figure: Cortex-R4 pipeline. Labelled blocks: Prefetch Unit (Fetch1, Fetch2, instruction queue), Branch1, Branch2, Branch3, common decode pipeline (Pre-Decode, Decode, Issue), 4 parallel back end pipelines (stages include Shift, ALU, Sat; MAC 1, MAC 2, MAC 3; AGU, Data Cache 1, Data Cache 2, Format; Wr) and an optional FPU (FPU0, FPU1, FPU2)]
• Dual issue can occur for certain instruction sequences
• Enabled at reset, can be disabled in CP15
• AGU = Address Generation Unit
• Separate divide pipeline for hardware DIV instruction

CORTEX-A9 PIPELINE
[Figure: Cortex-A9 pipeline. Prefetch Unit (instruction fetching), IQ, De, Re, ISS and BM, followed by the Main (P0) pipeline (Ex1, Ex2, WB), the Dual (P1) pipeline (Ex1, Ex2, WB), the Mac (M) pipeline (M1, M2), the Load/store (LS) pipeline (AGU, LSU, WB) and the Data Engine (DE) pipeline (WB)]
• IQ: Instruction Queue
• Re: Register renaming
• BM: Branch Monitor
• P0: Main execution pipeline
• M: MAC pipeline
• P1: Secondary (“dual”) execution pipeline
• AGU: Address Generation Unit
• LSU: Load/Store Unit
• DE: Data Engine - (NEON and/or FPU) pipeline

CORTEX-A15 AND CORTEX-A7
[Figure: Cortex-A15 pipeline: Fetch; Decode, Rename & Dispatch; Loop Cache; Queue; Issue; execution pipelines Integer, Integer, Multiply, Floating-Point / NEON, Branch, Load, Store; Writeback]
[Figure: Cortex-A7 pipeline: Fetch; Decode; Queue; Issue; execution pipelines Integer, Multiply, Floating-Point / NEON, Dual Issue, Load/Store; Writeback]
• Cortex-A15 and Cortex-A7 form an architecturally-identical pair
• Cortex-A15 is optimized for performance
• Cortex-A7 is optimized for power consumption
• Together they can be built into a big.LITTLE configuration

AGENDA
Instruction Sets
VFP and NEON
Pipelines
• Cycle Counting

CYCLE COUNTING
• Early pipelines (e.g. ARM7TDMI) were entirely deterministic and
predictable
• Later pipelines introduce interlocks and inter-instruction
dependencies
– Address, resource and data dependencies are all possible
– Interactions between instructions become very complicated
• On ARMv7 cores, manual cycle counting is not really possible, so you need to use:
– Cycle-accurate trace
– Simulation models
– Performance Monitoring Unit (see later)

PERFORMANCE MONITORING HARDWARE
• ARMv7-A cores include a performance monitoring unit (PMU)
• A PMU provides a non-intrusive method of collecting execution information
from the core
– Enabling the PMU does not change the timing of the core
• The PMU provides:
– Cycle counter – counts execution cycles (optional 1/64 divider)
– Programmable event counters
• The number of counters and available events vary between cores
– The PMU can be configured to generate interrupts if a counter overflows
• Some example events common to most cores:
– Cache hits or misses, TLB misses (on MMU cores), correct/incorrect branch predictions, number of instructions executed, etc.
• Some events are architecturally defined while others are core-dependent
– Check the ARM ARM and your core’s TRM for a full list
