SOFTWARE & SYSTEMS
DESIGN
3 – Instruction Sets
AGENDA
• Instruction Sets
VFP and NEON
Pipelines
Cycle Counting
AAETC3v00
Instruction Sets 2
INSTRUCTION SET
• ARM instruction set
– All instructions are 32-bit
– Most instructions can be executed conditionally
• Thumb instruction set
– 16-bit instruction set
– No conditional execution (except for branches)
– Optimized for code density from C code (~65% of ARM code size)
• Thumb-2 technology
– Extension to Thumb instruction set
– Mix of 16-bit and 32-bit instructions
– Conditional execution via IT instruction
– Higher performance than Thumb and smaller than ARM
ASSEMBLER SYNTAX
• Data processing instructions
<operation><condition> Rd, Rm, <op2>
ADDEQ r4, r5, r6 // if (EQ) r4 = r5 + r6
ORR r2, r3, r6, LSL #4 // r2 = r3 | (r6 << 4)
SUBS r5, r7, #4 // r5 = r7 - 4; set flags
MOV r4, #7 // r4 = 7
• Memory access instructions
<operation><size> Rd, [<address>]
LDR r0, [r6, #4] // r0 = *(r6 + 4)
STRB r4, [r7], #8 // *(byte *) r7 = r4; r7 += 8
<operation><addressing mode> <Rn>!, <registers list>
LDMIA r0, {r1, r2, r7}
STMFD sp!, {r4-r11, lr}
• Program flow instructions
<branch> <label>
BL foo
B bar
DATA PROCESSING INSTRUCTIONS
• These instructions operate on the contents of registers
– They DO NOT affect memory
• Manipulation (has destination register)
– Arithmetic: ADD, ADC, SUB, SBC, RSB, RSC
– Logical: AND, ORR, EOR, BIC, ORN (T2: Thumb-2 only)
– Move: MOV, MVN
• Comparison (set flags only)
– CMN (ADDS), CMP (SUBS), TST (ANDS), TEQ (EORS)
• Syntax:
<Operation>{S}{<cond>} {Rd,} Rn, Operand2
• Examples:
ADD r0, r1, r2 ; r0 = r1 + r2
TEQ r0, r1 ; if r0 == r1, Z flag will be set
MOV r0, r1 ; copy r1 to r0
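The comparison instructions above behave like their flag-setting counterparts with the result discarded. A minimal Python sketch of that idea follows; the helper names `cmp_flags` and `teq_z` are hypothetical and the model covers only the NZCV flags, not any particular core.

```python
MASK32 = 0xFFFFFFFF

def cmp_flags(rn, rm):
    """Model CMP Rn, Rm: compute Rn - Rm and keep only the NZCV flags."""
    rn &= MASK32
    rm &= MASK32
    result = (rn - rm) & MASK32
    n = result >> 31                       # N: bit 31 of the result
    z = int(result == 0)                   # Z: result is zero
    c = int(rn >= rm)                      # C: set when no borrow occurred
    # V: operands differ in sign and the result sign differs from Rn
    v = ((rn ^ rm) & (rn ^ result)) >> 31
    return {"N": n, "Z": z, "C": c, "V": v}

def teq_z(rn, rm):
    """Model the Z flag of TEQ Rn, Rm (an EORS with no destination)."""
    return int(((rn ^ rm) & MASK32) == 0)
```

For example, `cmp_flags(5, 5)` sets Z and C, which is why a following `BEQ` or `ADDEQ` would execute.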
MULTIPLY / DIVIDE
• 32-bit multiplication: Rd = Rn × Rm, with optional accumulation (+/- Ra)
– MUL, MLA, MLS
• 64-bit multiplication: RdHi:RdLo = Rn × Rm, with optional accumulation (+ RdHi:RdLo)
– UMULL, SMULL, UMLAL, SMLAL
• Examples:
MLA r0, r1, r2, r3 ; r0 = r3 + (r1 * r2)
[U|S]MULL r4, r5, r2, r3 ; r5:r4 = r2 * r3
• Division (optional in ARMv7-A):
SDIV r0, r1, r2 ; signed: r0 = r1 / r2
UDIV r0, r1, r2 ; unsigned: r0 = r1 / r2
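As a rough illustration of the multiply family, the sketch below models the 32-bit accumulate forms and the way a 64-bit product is split across RdHi:RdLo. The function names are hypothetical and registers are modelled as plain unsigned 32-bit integers.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def to_signed32(x):
    """Interpret a 32-bit register value as a two's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def mla(rn, rm, ra):
    """MLA Rd, Rn, Rm, Ra: Rd = Ra + Rn * Rm (low 32 bits)."""
    return (ra + rn * rm) & MASK32

def mls(rn, rm, ra):
    """MLS Rd, Rn, Rm, Ra: Rd = Ra - Rn * Rm (low 32 bits)."""
    return (ra - rn * rm) & MASK32

def umull(rn, rm):
    """UMULL RdLo, RdHi, Rn, Rm: unsigned 64-bit product as (RdLo, RdHi)."""
    product = (rn & MASK32) * (rm & MASK32)
    return product & MASK32, product >> 32

def smull(rn, rm):
    """SMULL RdLo, RdHi, Rn, Rm: signed 64-bit product as (RdLo, RdHi)."""
    product = (to_signed32(rn) * to_signed32(rm)) & MASK64
    return product & MASK32, product >> 32
```

The same bit pattern gives different high words: `0xFFFFFFFF` is a large positive number for UMULL but -1 for SMULL.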
BIT MANIPULATION INSTRUCTIONS
BFI r0, r0, #9, #6 ; Bit Field Insert
UBFX r1, r0, #18, #7 ; Bit Field Extract (zero extend)
BFC r1, #3, #4 ; Bit Field Clear
RBIT r2, r1 ; Reverse Bit Order
[Figure: 32-bit register diagrams (bit 31 to bit 0) of r0, r1 and r2 showing the bit pattern before and after each instruction]
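Since the register diagrams did not survive extraction, here is a small Python sketch of what the four bit-field instructions compute. The helper names are hypothetical, and each function takes the `#lsb` and `#width` operands in the same order as the assembler syntax.

```python
MASK32 = 0xFFFFFFFF

def field_mask(lsb, width):
    """Mask with `width` ones starting at bit `lsb`."""
    return ((1 << width) - 1) << lsb

def bfi(rd, rn, lsb, width):
    """BFI Rd, Rn, #lsb, #width: insert the low bits of Rn into Rd."""
    mask = field_mask(lsb, width)
    return ((rd & ~mask) | ((rn << lsb) & mask)) & MASK32

def ubfx(rn, lsb, width):
    """UBFX Rd, Rn, #lsb, #width: extract a field and zero extend it."""
    return (rn >> lsb) & ((1 << width) - 1)

def bfc(rd, lsb, width):
    """BFC Rd, #lsb, #width: clear a field in Rd."""
    return rd & ~field_mask(lsb, width) & MASK32

def rbit(rm):
    """RBIT Rd, Rm: reverse the order of all 32 bits."""
    return int(format(rm & MASK32, "032b")[::-1], 2)
```

Without these instructions each operation typically needs a shift plus one or two mask operations.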
BYTE REVERSAL
• Byte Reversal Instructions
REV{cond} Rd, Rm Reverses the bytes in a word
REV16{cond} Rd, Rm Reverses the bytes in each halfword
REVSH{cond} Rd, Rm Reverses the bottom two bytes, and sign extends to 32 bits
[Figure: REV maps bytes 3 2 1 0 to 0 1 2 3]
• V6 and later
REV r0, r0
• Pre-V6
EOR r1, r0, r0, ROR #16
BIC r1, r1, #0xFF0000
MOV r0, r0, ROR #8
EOR r0, r0, r1, LSR #8
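To see that the four-instruction pre-V6 sequence really matches REV, the sketch below steps through it in Python. All function names are hypothetical; `rev_pre_v6` mirrors the instruction sequence line by line.

```python
MASK32 = 0xFFFFFFFF

def ror(x, n):
    """Rotate a 32-bit value right by n bits."""
    x &= MASK32
    return ((x >> n) | (x << (32 - n))) & MASK32

def rev(x):
    """REV: reverse the four bytes of a word."""
    return int.from_bytes((x & MASK32).to_bytes(4, "little"), "big")

def rev16(x):
    """REV16: swap the two bytes inside each halfword."""
    x &= MASK32
    return ((x & 0x00FF00FF) << 8) | ((x & 0xFF00FF00) >> 8)

def revsh(x):
    """REVSH: swap the bottom two bytes, then sign extend to 32 bits."""
    half = ((x & 0xFF) << 8) | ((x >> 8) & 0xFF)
    return half | 0xFFFF0000 if half & 0x8000 else half

def rev_pre_v6(r0):
    """The pre-V6 sequence from the slide, one statement per instruction."""
    r1 = r0 ^ ror(r0, 16)           # EOR r1, r0, r0, ROR #16
    r1 &= ~0x00FF0000 & MASK32      # BIC r1, r1, #0xFF0000
    r0 = ror(r0, 8)                 # MOV r0, r0, ROR #8
    return r0 ^ (r1 >> 8)           # EOR r0, r0, r1, LSR #8
```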
SIMD
• ARMv6 added a number of instructions which perform SIMD (Single Instruction
Multiple Data) operations using ARM registers
– Includes instructions for addition, subtraction, multiplication and sum of absolute
differences
– Instructions can work on four 8-bit quantities, or two 16-bit quantities
– Signed/unsigned and saturating versions available of many instructions
– CPSR GE bits used instead of normal ALU flags
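• There are instructions for packing (PKHBT/PKHTB) and unpacking (UXTH/UXTB) registers
• Example: UADD16 Rd, Rm, Rs
[Figure: the top and bottom halfwords of Rm and Rs are added independently to give the two halfwords of Rd; GE[3:2] reflects the top halfword result and GE[1:0] the bottom halfword result]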
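The UADD16 behaviour in the figure can be sketched as follows. This is an illustrative model only: the function name is hypothetical, and the GE bits are returned as a 4-bit integer rather than written to a CPSR.

```python
def uadd16(rm, rs):
    """Model UADD16 Rd, Rm, Rs: two independent unsigned 16-bit additions.

    Returns (rd, ge) where ge holds GE[3:0]; each pair of GE bits is set
    when the corresponding halfword addition carried out of 16 bits.
    """
    lo = (rm & 0xFFFF) + (rs & 0xFFFF)
    hi = ((rm >> 16) & 0xFFFF) + ((rs >> 16) & 0xFFFF)
    rd = ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)
    ge = (0b1100 if hi > 0xFFFF else 0) | (0b0011 if lo > 0xFFFF else 0)
    return rd, ge
```

A carry out of the bottom halfword does not ripple into the top one, which is the difference from a plain 32-bit ADD.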
SATURATED MATH AND CLZ
• Support for Saturated Arithmetic
– Targeted at DSP & control applications
– Overflow sets Q flag (sticky) not V, and sets result to +/- max value
QSUB{cond} Rd, Rm, Rn ; Rd = saturate(Rm - Rn)
QADD{cond} Rd, Rm, Rn ; Rd = saturate(Rm + Rn)
[Figure: signed number line from 0x80000000 (-ve) through 0x0 to 0x7FFFFFFF (+ve); results saturate at either end]
QDSUB{cond} Rd, Rm, Rn ; Rd = saturate(Rm - saturate(Rn * 2))
QDADD{cond} Rd, Rm, Rn ; Rd = saturate(Rm + saturate(Rn * 2))
• Count Leading Zeros
CLZ{cond} Rd, Rm
– Returns number of unset bits before the most significant set bit
– Example (bit 31 to bit 0): 0000 0000 0010 1100 0100 1110 0111 0100
CLZ returns 10 in this case
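A minimal Python sketch of the saturating operations and CLZ is given below. It is an illustration under simplifying assumptions: operands are ordinary signed Python integers, the helper names are hypothetical, and the sticky Q flag is modelled as a returned boolean.

```python
INT32_MAX = 0x7FFFFFFF
INT32_MIN = -0x80000000

def saturate(value):
    """Clamp to the signed 32-bit range; the flag is True if clamping occurred."""
    if value > INT32_MAX:
        return INT32_MAX, True
    if value < INT32_MIN:
        return INT32_MIN, True
    return value, False

def qadd(rm, rn):
    """QADD Rd, Rm, Rn: Rd = saturate(Rm + Rn)."""
    return saturate(rm + rn)

def qsub(rm, rn):
    """QSUB Rd, Rm, Rn: Rd = saturate(Rm - Rn)."""
    return saturate(rm - rn)

def qdadd(rm, rn):
    """QDADD Rd, Rm, Rn: Rd = saturate(Rm + saturate(Rn * 2))."""
    doubled, q1 = saturate(rn * 2)
    result, q2 = saturate(rm + doubled)
    return result, q1 or q2

def clz(rm):
    """CLZ Rd, Rm: count of zero bits above the most significant set bit."""
    return 32 - (rm & 0xFFFFFFFF).bit_length()
```

The CLZ example value from the slide is `0x002C4E74`, for which `clz` gives 10.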
SATURATION
• Saturate a value to a specified bit position (effectively saturating to any
power of 2)
– USAT - Unsigned saturate 32-bit
• Syntax: USAT Rd, #sat, Rm {shift}
• Operation: Rd = Saturate(Shift(Rm), #sat)
– Variants
SSAT - signed saturation
USAT16 - saturates two 16-bit unsigned halfwords (no rotation allowed)
SSAT16 - signed saturation of two 16-bit halfwords (no rotation allowed)
– #sat is specified as an immediate value in the range 0 to 31
– {shift} is optional and is limited to LSL or ASR
– Q flag is set if saturation occurs
[Figure: bit patterns around the saturation position. Unsigned saturation: max = 0 0 1 1 1. Signed saturation: max = 0 0 0 1 1, min = 1 1 1 0 0]
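The bit patterns in the figure correspond to clamping against power-of-two limits. The sketch below assumes the usual interpretation: USAT clamps to 0 .. 2^sat - 1, and SSAT clamps to -2^(sat-1) .. 2^(sat-1) - 1. The function names are hypothetical, the optional shift is omitted, and the Q flag is returned as a boolean.

```python
def usat(sat, value):
    """USAT Rd, #sat, Rm: clamp a signed value to the range 0 .. 2**sat - 1."""
    upper = (1 << sat) - 1
    clamped = min(max(value, 0), upper)
    return clamped, clamped != value

def ssat(sat, value):
    """SSAT Rd, #sat, Rm: clamp to the signed range that fits in `sat` bits."""
    lower = -(1 << (sat - 1))
    upper = (1 << (sat - 1)) - 1
    clamped = min(max(value, lower), upper)
    return clamped, clamped != value
```

For instance, `usat(8, 300)` clamps to 255, which is the typical use when converting arithmetic results to 8-bit pixel values.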
SINGLE / DOUBLE REGISTER DATA
TRANSFER
• Use to move data between one or two registers and memory
LDRD / STRD   Doubleword
LDR / STR     Word
LDRB / STRB   Byte
LDRH / STRH   Halfword
LDRSB         Signed byte load
LDRSH         Signed halfword load
• Syntax:
– LDR{<size>}{<cond>} Rd, <address>
– STR{<size>}{<cond>} Rd, <address>
• Example:
– LDRB r0, [r1] ; load bottom byte of r0 from the byte of memory at address in r1
[Figure: a value from memory is loaded into the bottom of the 32-bit register Rd (bits 31 to 0); any remaining space is zero filled or sign extended]
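The difference between the zero-filling and sign-extending loads can be sketched as below. This assumes a little-endian memory modelled as a Python `bytes` object; the `memory` contents and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

# Hypothetical little-endian memory image used only for illustration.
memory = bytes([0x80, 0xFF, 0x34, 0x12])

def ldrb(mem, addr):
    """LDRB: load one byte, zero fill bits 31:8."""
    return mem[addr]

def ldrsb(mem, addr):
    """LDRSB: load one byte, sign extend to 32 bits."""
    return int.from_bytes(mem[addr:addr + 1], "little", signed=True) & MASK32

def ldrh(mem, addr):
    """LDRH: load a halfword, zero fill bits 31:16."""
    return int.from_bytes(mem[addr:addr + 2], "little")

def ldrsh(mem, addr):
    """LDRSH: load a halfword, sign extend to 32 bits."""
    return int.from_bytes(mem[addr:addr + 2], "little", signed=True) & MASK32

def ldr(mem, addr):
    """LDR: load a full word."""
    return int.from_bytes(mem[addr:addr + 4], "little")
```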
ADDRESSING MEMORY
• The address accessed by LDR/STR is specified by a base register with
an optional offset
– Base register only (no offset)
LDR r0, [r1]
– Base register plus constant
LDR r0, [r1, #8]
– Base register, plus register (optionally shifted by an immediate value)
LDR r0, [r1, r2]
LDR r0, [r1, r2, LSL #2]
– The offset can be either added or subtracted from the base register
LDR r0, [r1, #-8]
LDR r0, [r1, -r2]
LDR r0, [r1, -r2, LSL #2]
[Figure: the memory address is formed from base register r1 plus or minus either the constant #8 or the register offset r2, LSL #2; the loaded value goes to r0]
PRE- AND POST-INDEXED ADDRESSING
• Post-indexed (add offset after memory access)
LDR r0, [r1], #12
– Always update base register (r1)
• Pre-indexed (add offset before memory access)
LDR r0, [r1, #12]{!}
– If ‘!’ present, update base register (r1)
[Figure: for each mode, how the address is formed from r1 and #12, which memory location is loaded into r0, and the value written back to r1]
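The two indexing modes differ only in when the offset is applied and whether the base register is written back. A minimal sketch, assuming a hypothetical register file modelled as a dict and a word-addressed memory modelled as a dict from address to value:

```python
def ldr_pre(mem, regs, rd, rn, offset, writeback=False):
    """LDR Rd, [Rn, #offset]{!}: offset is added before the access."""
    address = regs[rn] + offset
    regs[rd] = mem[address]
    if writeback:                  # the '!' suffix
        regs[rn] = address

def ldr_post(mem, regs, rd, rn, offset):
    """LDR Rd, [Rn], #offset: access uses Rn, then Rn is always updated."""
    address = regs[rn]
    regs[rd] = mem[address]
    regs[rn] = address + offset
```

Post-indexing is the natural form for walking an array: each load leaves the base register pointing at the next element.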
MULTIPLE REGISTER DATA TRANSFER
• These instructions move data between multiple registers and memory
• Syntax
<LDM|STM>{<addressing_mode>}{<cond>} Rb{!}, <register list>
• 4 addressing modes
– Increment after/before (IA, IB)
– Decrement after/before (DA, DB)
• Also
PUSH/POP, equivalent to STMDB/LDMIA with SP! as base register
• Example
LDM r10, {r0,r1,r4} ; load registers, using r10 base
PUSH {r4-r6,pc} ; store registers, using SP base
[Figure: where r0, r1 and r4 are placed in memory relative to the base register Rb (r10) for the IA (default), IB, DA and DB modes, with an arrow marking increasing address]
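The four addressing modes can be summarised by which addresses are touched and where the base ends up after write-back. The sketch below assumes 4-byte registers and the rule that the lowest-numbered register always goes to the lowest address; the function name is hypothetical.

```python
def transfer_addresses(base, count, mode):
    """Return (addresses, new_base) for an LDM/STM of `count` registers.

    addresses[0] is used by the lowest-numbered register in the list.
    new_base is the value written back when '!' is specified.
    """
    size = 4 * count
    if mode == "IA":               # increment after (default)
        start, new_base = base, base + size
    elif mode == "IB":             # increment before
        start, new_base = base + 4, base + size
    elif mode == "DA":             # decrement after
        start, new_base = base - size + 4, base - size
    elif mode == "DB":             # decrement before
        start, new_base = base - size, base - size
    else:
        raise ValueError("unknown addressing mode: " + mode)
    return [start + 4 * i for i in range(count)], new_base
```

With this model, a PUSH of three registers is `transfer_addresses(sp, 3, "DB")` and the matching POP is `transfer_addresses(new_sp, 3, "IA")`, which touches the same addresses and restores the original stack pointer.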
INSTRUCTIONS FOR LOADING
CONSTANTS
• The assembler provides some instructions for loading values into registers
– These are the recommended mechanisms for loading constants into registers
• PC- or register-relative constants
ADR Rn, label
– Adds or subtracts an immediate value to or from the PC to generate the address of the label into the specified register, using one instruction
– ADRL pseudo instruction uses two instructions, giving a better range
– Can be used to generate addresses for position independent code (but only if in same code section)
– Constant determined at run time
• Absolute constants
LDR Rn, =<constant>
LDR Rn, =label
– Pseudo instruction
– Assembler will use optimal sequence to generate constant into specified register (one of MOV, MVN or an LDR from a literal pool)
– Can load to the PC, causing a branch
– Use for absolute addressing and references outside the current section (resulting in position dependent code)
– Constant determined at assembly or link time
LDR= EXAMPLES
• The following examples show how the LDR= pseudo instruction
makes code more readable, portable and flexible
Code                    Disassembly
LDR r0, =0x2543         MOV r0, #0x2543
LDR r0, =0xFFFF43FF     MVN r0, #0xBC00
LDR r0, =0xFFFFF5       LDR r0, [pc, #xx]
                        ...
                        DCD 0xFFFFF5
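The choices in the table can be reproduced with a rough model of how an assembler might pick the encoding. This sketch makes several assumptions: it uses the ARM-state immediate rule (an 8-bit value rotated right by an even amount), it assumes a core with the 16-bit immediate move (labelled MOVW here, which disassembles as a MOV with a 16-bit immediate), and the function names are hypothetical rather than any assembler's real interface.

```python
MASK32 = 0xFFFFFFFF

def rol(x, n):
    """Rotate a 32-bit value left by n bits."""
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32

def is_arm_immediate(value):
    """True if value is an 8-bit constant rotated right by an even amount."""
    value &= MASK32
    return any(rol(value, r) < 256 for r in range(0, 32, 2))

def choose_load(value):
    """Pick a plausible expansion for LDR Rn, =value (illustrative only)."""
    value &= MASK32
    if is_arm_immediate(value):
        return "MOV"
    if is_arm_immediate(~value):
        return "MVN"
    if value <= 0xFFFF:
        return "MOVW"              # 16-bit immediate form of MOV
    return "literal pool"
```

Applied to the three constants in the table, this gives the 16-bit move for 0x2543, MVN for 0xFFFF43FF and a literal pool load for 0xFFFFF5.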
BRANCH INSTRUCTIONS
• Branch instructions have the following format
B{<cond>} label
– Might not cause a pipeline flush (branch prediction)
– Branch range depends on instruction set and width
• A BL instruction additionally generates a return address in r14 (lr)
– Returning is performed by restoring the program counter (pc) from lr
void func1 (void)
{
:
func2();
:
}
func1
:
BL func2
:
func2
:
BX lr
BRANCH RANGES
• The range of a branch instruction depends on which instruction set
is being used
• It also varies between different types of branch
Instruction     ARM      Thumb
B               ±32MB    ±16MB
CBZ/CBNZ        -        126 bytes
BL/BLX (imm)    ±32MB    ±16MB
BLX (reg)       Any      Any
BX              Any      Any
TBB             -        510 bytes
TBH             -        131070 bytes
“Any” indicates an instruction which can branch to any address in the 4GB address space
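The figures in the table follow from the width of the offset field in each encoding. The arithmetic below is a sketch based on the commonly documented field widths (a 24-bit signed offset counted in words for ARM B/BL and in halfwords for the 32-bit Thumb B/BL, a 6-bit unsigned halfword offset for CBZ/CBNZ, and byte or halfword table entries for TBB/TBH); the variable and function names are hypothetical.

```python
MB = 1 << 20

def signed_reach(bits, unit):
    """Approximate +/- reach in bytes of a signed offset of `bits` bits in `unit`-byte steps."""
    return (1 << (bits - 1)) * unit

arm_b_reach = signed_reach(24, 4)      # ARM B/BL: word offsets
thumb_b_reach = signed_reach(24, 2)    # Thumb B/BL (32-bit): halfword offsets
cbz_max = ((1 << 6) - 1) * 2           # CBZ/CBNZ: forward only
tbb_max = 0xFF * 2                     # TBB: byte table entry, doubled
tbh_max = 0xFFFF * 2                   # TBH: halfword table entry, doubled
```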
READING AND WRITING PC
• In general, writing PC causes a branch to the value written
– Bit zero controls the execution state (ARM or Thumb) at the destination
– The bottom bit of the destination address is always forced to zero
– Writing a value with ‘10’ in the bottom two bits results in unpredictable behavior
– Note that architectures prior to ARMv7 do not change state when the PC is written
directly
• Loading PC from memory behaves similarly
– Architectures prior to ARMv5T do not change state when the PC is loaded from memory
• The PC reads as the address of the current instruction plus an offset
– In ARM state, the offset is 8
– In Thumb state, the offset is 4
– This reflects the 3-stage structure of the ARM7TDMI pipeline
– In Thumb state, the bottom bit always reads as zero
– In ARM state, the bottom two bits will always read as zero
CHANGING STATE
• Changing between ARM and Thumb states (or “interworking”) can be carried out
using the Branch Exchange instruction
BX Rn
BLX Rn
– Bit 0 of Rn determines the exchange behavior
• Unset (0) - change to (or remain in) ARM state
• Set (1) - change to (or remain in) Thumb state
• Branch and Link with Exchange
– Used to branch to a subroutine which is known to be in the opposite instruction set
– When branching to imported labels use BL, the linker will substitute BLX if necessary
BLX offset ; ARM/Thumb instruction which always
; changes state (and sets LR)
• All instructions which modify the PC can cause a state change
– Depending on bit 0 of the result
– For data processing instructions, state changes only if S variant not used
IF-THEN
• Thumb only, makes the next 1-4 instructions conditional
• Syntax
IT{T|E}{T|E}{T|E} <cond>
– Any condition code may be used
– Doesn’t affect condition flags
– 16-bit instructions in the IT block do not affect condition flags (except CMP, CMN & TST)
– 32-bit instructions do affect condition flags (normal rules apply)
– No need to write this instruction: the assembler will insert it for you where necessary
• Current “if-then status” stored in CPSR
– Conditional block may be safely interrupted and returned to
– Not recommended to branch into or out of ‘if-then’ block
• Example
; if (r0 == 0)
; r0 = *r1 + 2;
; else
; r0 = *r2 + 4;
; if
CMP r0, #0
ITTEE EQ
; then
LDREQ r0, [r1]
ADDEQ r0, #2
; else
LDRNE r0, [r2]
ADDNE r0, #4
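The relationship between the IT mnemonic and the conditions on the following instructions can be sketched as a small expansion function. The name `expand_it` is hypothetical; the inverse-condition table lists the standard ARM condition code pairs.

```python
INVERSE = {
    "EQ": "NE", "NE": "EQ", "CS": "CC", "CC": "CS",
    "MI": "PL", "PL": "MI", "VS": "VC", "VC": "VS",
    "HI": "LS", "LS": "HI", "GE": "LT", "LT": "GE",
    "GT": "LE", "LE": "GT",
}

def expand_it(mnemonic, cond):
    """Return the condition each instruction in the IT block must carry.

    The first instruction always uses `cond`; each following T repeats
    it and each E uses the inverse condition.
    """
    pattern = mnemonic.upper()
    suffix = pattern[2:]
    if not pattern.startswith("IT") or len(suffix) > 3 or set(suffix) - {"T", "E"}:
        raise ValueError("malformed IT mnemonic: " + mnemonic)
    return [cond] + [cond if c == "T" else INVERSE[cond] for c in suffix]
```

For the slide's example, `expand_it("ITTEE", "EQ")` gives EQ, EQ, NE, NE, matching LDREQ, ADDEQ, LDRNE and ADDNE.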
STATUS REGISTER ACCESS
• MRS and MSR allow contents of CPSR/SPSR to be transferred
to/from a general purpose register or be set to an immediate value
– MSR allows the whole status register, or just parts of it, to be updated
MRS r0,CPSR ; read CPSR into r0
BIC r0,r0,#0x80 ; clear bit 7 to enable IRQ
MSR CPSR_c,r0 ; write modified value to ‘c’ byte only
• CPS can be used to directly modify some bits in the CPSR
– These are related to interrupt enable/disable and operating mode
• SETEND instruction selects the endianness of data accesses
– For use in systems with mixed endian data (e.g. peripherals)
SETEND BE
LDR r0, [r7], #4 ; big-endian
SETEND LE
LDR r1, [r7], #4 ; little-endian
• User mode programs may read all bits of CPSR but may only change the flag bits
SYSTEM CONTROL INSTRUCTIONS
• ARM uses coprocessors for “internal functions” so as not to enforce
a particular memory map
– System Control Coprocessor: cp15
• Used for processor configuration: System ID, caches, MMU, TCMs, etc.
– Debug Coprocessor: cp14
• Can be used to access debug control registers
– VFP and NEON: cp10 and cp11
• In earlier versions of the architecture, designers were permitted to
add external coprocessors
– This is not permitted in ARMv7 architecture profiles
AGENDA
Instruction Sets
• VFP and NEON
Pipelines
Cycle Counting
VFP ARCHITECTURE
• VFP (Vector Floating Point) is ARM’s floating point architecture
– There have been 4 versions of the architecture to date (VFPv1 is no longer
supported)
– VFPv2 is supported by ARM9 and ARM11 processor families
– VFPv3 and VFPv4 are optional extensions to the ARMv7-AR architecture profiles
• VFPv3 (Cortex-A8, Cortex-A9, Cortex-R4, Cortex-R5)
– Can be implemented with either 16 (VFPv3-D16) or 32 (VFPv3-D32) registers
– Can be extended with half-precision conversion functions
• VFPv4 (Cortex-A5, Cortex-A7 and Cortex-A15)
– Includes half-precision conversion functions
– Supports fused multiply-add operations
THE NEON ARCHITECTURE EXTENSION
• NEON refers to the Advanced SIMD instruction set extension
– Optional extension to ARMv7-AR architecture profiles
– The NEON register set is separate from the core register bank
– NEON instructions support parallel operations on vectors of elements held in registers
– Advanced SIMDv1 is the base NEON architecture
• Can be extended with half-precision conversion functions
– Advanced SIMDv2 adds fused multiply-add operations
AGENDA
Instruction Sets
VFP and NEON
• Pipelines
Cycle Counting
HISTORIC PIPELINES
• ARM7: Fetch, Decode, Execute
• ARM9: Fetch, Decode, Execute, Memory, Writeback
• ARM1136: Fetch 1, Fetch 2, Decode, Issue, then parallel paths ending in Writeback
– MAC 1, MAC 2, MAC 3
– Shift, ALU, Saturate
– Address, Data 1, Data 2
• Cortex-A9: Fetch 1, Fetch 2, Fetch 3, Queue, Decode, Rename, Issue, then parallel paths ending in Writeback
– Execute 1, Execute 2 (two such paths)
– MAC 1, MAC 2
– Address, Load/Store
– Data Engine
ARM7TDMI PIPELINE (DATA PROC)
[Figure: pipeline timing diagram over cycles 1 to 6 for the operations ADD, SUB, MOV, AND, ORR, EOR, CMP, RSB; each instruction passes through Fetch, Decode and Execute, one cycle behind its predecessor]
• In this example it takes 6 clock cycles to execute 6 instructions
• All operations here are on registers (single cycle execution)
• Clock cycles per Instruction (CPI) = 1
ARM7TDMI PIPELINE (LDR)
[Figure: pipeline timing diagram over cycles 1 to 6 for the operations ADD, SUB, LDR, MOV, AND, ORR; ADD and SUB pass through Fetch, Decode and Execute, while LDR passes through Fetch, Decode, Execute, Data and Writeback, delaying the instructions behind it]
• In this example it takes 6 clock cycles to execute 4 instructions
• Clock cycles per Instruction (CPI) = 1.5
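The CPI figures on the two ARM7TDMI slides can be checked with simple arithmetic. The sketch below assumes a simplified cost model in which a register data processing instruction occupies the execute stage for 1 cycle and an LDR for 3 cycles (Execute, Data, Writeback); the table `CYCLES` and the function `cpi` are hypothetical names.

```python
# Assumed per-instruction execute-stage occupancy; anything not listed costs 1 cycle.
CYCLES = {"LDR": 3}

def cpi(instructions):
    """Return (total cycles, cycles per instruction) for a steady-state pipeline."""
    total = sum(CYCLES.get(op, 1) for op in instructions)
    return total, total / len(instructions)
```

With this model, `cpi(["ADD", "SUB", "LDR", "MOV"])` gives 6 cycles for 4 instructions, a CPI of 1.5, while six register-only instructions give a CPI of 1.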
ARM7TDMI PIPELINE (BRANCH)
[Figure: pipeline timing diagram over cycles 1 to 5, by address and operation: 0x8000 BL, 0x8004 X, 0x8008 XX, 0x8FEC ADD, 0x8FF0 SUB, 0x8FF4 MOV. BL passes through Fetch, Decode, Execute, Linkret and Adjust; the fetches of X and XX are discarded and the pipeline restarts at the branch target 0x8FEC]
• Refilling the pipeline
• Note that the core is executing in ARM state
ARM7TDMI PIPELINE (INTERRUPT)
[Figure: pipeline timing diagram over cycles 1 to 8, by address and operation: 0x8000 ADD, 0x8004 SUB, 0x8008 MOV, 0x800C X, then 0x0018 B (to 0xAF00), 0x001C XX, 0x0020 XXX, then 0xAF00 STMFD, 0xAF04 MOV, 0xAF08 LDR. The IRQ entry adds IRQ Linkret and IRQ Adjust cycles, and the branch at the IRQ vector adds Linkret and Adjust cycles]
IRQ interrupt minimum latency (service routine entry) = 7 cycles
ARM9TDMI PIPELINE (LDR INTERLOCK)
Operation
ADD R1, R1, R2
SUB R3, R4, R1
ORR R8, R3, R4
AND R6, R3, R1
LDR R4, [R7]
EOR R3, R1, R2
[Figure: pipeline timing diagram over cycles 1 to 9 for the operations above. Key: F - Fetch, D - Decode, E - Execute, I - Interlock, M - Memory, W - Writeback]
• In this example it takes 7 clock cycles to execute 6 instructions, CPI of 1.2
• The LDR instruction immediately followed by a data operation using the same register causes an interlock
ARM9TDMI PIPELINE (LDR)
Operation
ADD R1, R1, R2
SUB R3, R4, R1
ORR R8, R3, R4
AND R6, R3, R1
LDR R4, [R7]
EOR R3, R1, R2
[Figure: pipeline timing diagram over cycles 1 to 9 for the operations above. Key: F - Fetch, D - Decode, E - Execute, I - Interlock, M - Memory, W - Writeback]
• In this example it takes 6 cycles to execute 6 instructions, CPI of 1
• Cycle 4 has simultaneous I & D memory accesses
• Cycle 5 R4 data available to ORR before written to register
– Internal forwarding paths are used
CORTEX-R4 PIPELINE
[Figure: Cortex-R4 pipeline. A common decode pipeline (Prefetch Unit with Fetch1 and Fetch2, instruction queue, Pre-Decode, Decode, Issue) feeds 4 parallel back end pipelines. Stage labels in the back end: Shift, ALU, Sat; MAC 1, MAC 2, MAC 3; AGU, Data Cache 1, Data Cache 2, Format; Wr. Also shown: Branch1, Branch2, Branch3 and an optional FPU with FPU0, FPU1, FPU2]
• Dual issue can occur for certain instruction sequences
• Enabled at reset, can be disabled in CP15
• AGU = Address Generation Unit
• Separate divide pipeline for hardware DIV instruction
CORTEX-A9 PIPELINE
[Figure: Cortex-A9 pipeline. The Prefetch Unit (instruction fetching) feeds IQ, De, Re and ISS, which issue to the Main (P0) pipeline (Ex1, Ex2, WB), the Dual (P1) pipeline (Ex1, Ex2, WB), the Mac (M) pipeline (M1, M2), the Load/store (LS) pipeline (AGU, LSU, WB) and the Data Engine (DE). BM is shown alongside]
• IQ: Instruction Queue
• Re: Register renaming
• BM: Branch Monitor
• P0: Main execution pipeline
• M: MAC pipeline
• P1: Secondary (“dual”) execution pipeline
• AGU: Address Generation Unit
• LSU: Load/Store Unit
• DE: Data Engine - (NEON and/or FPU) pipeline
CORTEX-A15 AND CORTEX-A7
[Figure: Cortex-A15 pipeline: Fetch; Decode, Rename & Dispatch; Loop Cache; Queue; Issue; then Integer, Integer, Multiply, Floating-Point / NEON, Branch, Load and Store pipelines; Writeback]
[Figure: Cortex-A7 pipeline: Fetch; Decode; Queue; Issue (Dual Issue); then Integer, Multiply, Floating-Point / NEON and Load/Store pipelines; Writeback]
• Cortex-A15 and Cortex-A7 form an architecturally-identical pair
• Cortex-A15 is optimized for performance
• Cortex-A7 is optimized for power consumption
• Together they can be built into a big.LITTLE configuration
AGENDA
Instruction Sets
VFP and NEON
Pipelines
• Cycle Counting
CYCLE COUNTING
• Early pipelines (e.g. ARM7TDMI) were entirely deterministic and
predictable
• Later pipelines introduce interlocks and inter-instruction
dependencies
– Address, resource and data dependencies are all possible
– Interactions between instructions become very complicated
• On ARMv7 cores, manual cycle counting is not really possible, so you need to use:
– Cycle-accurate trace
– Simulation models
– Performance Monitoring Unit (see later)
PERFORMANCE MONITORING
HARDWARE
• ARMv7-A cores include a performance monitoring unit (PMU)
• A PMU provides a non-intrusive method of collecting execution information
from the core
– Enabling the PMU does not change the timing of the core
• The PMU provides:
– Cycle counter – counts execution cycles (optional 1/64 divider)
– Programmable event counters
• The number of counters and available events vary between cores
– The PMU can be configured to generate interrupts if a counter overflows
• Some examples common to most cores:
– Cache Hits or Misses, TLB Misses (on MMU cores), Branch prediction,
correct/incorrect predictions, Number of instructions executed, etc…
• Some events are architecturally defined while others are core-dependent
– Check the ARM ARM and your core’s TRM for a full list
SOFTWARE & SYSTEMS
DESIGN
3 – Instruction Sets

Mais conteúdo relacionado

Mais procurados

RISC-V Introduction
RISC-V IntroductionRISC-V Introduction
RISC-V IntroductionYi-Hsiu Hsu
 
Advanced Pipelining in ARM Processors.pptx
Advanced Pipelining  in ARM Processors.pptxAdvanced Pipelining  in ARM Processors.pptx
Advanced Pipelining in ARM Processors.pptxJoyChowdhury30
 
An Overview on Programmable System on Chip: PSoC-5
An Overview on Programmable System on Chip: PSoC-5An Overview on Programmable System on Chip: PSoC-5
An Overview on Programmable System on Chip: PSoC-5Premier Farnell
 
Embedded System Programming on ARM Cortex M3 and M4 Course
Embedded System Programming on ARM Cortex M3 and M4 CourseEmbedded System Programming on ARM Cortex M3 and M4 Course
Embedded System Programming on ARM Cortex M3 and M4 CourseFastBit Embedded Brain Academy
 
Embedded c program and programming structure for beginners
Embedded c program and programming structure for beginnersEmbedded c program and programming structure for beginners
Embedded c program and programming structure for beginnersKamesh Mtec
 
ARM - Advance RISC Machine
ARM - Advance RISC MachineARM - Advance RISC Machine
ARM - Advance RISC MachineEdutechLearners
 
Introduction to Processor Design and ARM Processor
Introduction to Processor Design and ARM ProcessorIntroduction to Processor Design and ARM Processor
Introduction to Processor Design and ARM ProcessorDarling Jemima
 
AMBA 5 COHERENT HUB INTERFACE.pptx
AMBA 5 COHERENT HUB INTERFACE.pptxAMBA 5 COHERENT HUB INTERFACE.pptx
AMBA 5 COHERENT HUB INTERFACE.pptxSairam Chebrolu
 
Interleaved memory
Interleaved memoryInterleaved memory
Interleaved memoryashishgy
 
ARM architcture
ARM architcture ARM architcture
ARM architcture Hossam Adel
 
Module 2 ARM CORTEX M3 Instruction Set and Programming
Module 2 ARM CORTEX M3 Instruction Set and ProgrammingModule 2 ARM CORTEX M3 Instruction Set and Programming
Module 2 ARM CORTEX M3 Instruction Set and ProgrammingAmogha Bandrikalli
 
Introduction to armv8 aarch64
Introduction to armv8 aarch64Introduction to armv8 aarch64
Introduction to armv8 aarch64Yi-Hsiu Hsu
 

Mais procurados (20)

RISC-V Introduction
RISC-V IntroductionRISC-V Introduction
RISC-V Introduction
 
Advanced Pipelining in ARM Processors.pptx
Advanced Pipelining  in ARM Processors.pptxAdvanced Pipelining  in ARM Processors.pptx
Advanced Pipelining in ARM Processors.pptx
 
An Overview on Programmable System on Chip: PSoC-5
An Overview on Programmable System on Chip: PSoC-5An Overview on Programmable System on Chip: PSoC-5
An Overview on Programmable System on Chip: PSoC-5
 
Embedded System Programming on ARM Cortex M3 and M4 Course
Embedded System Programming on ARM Cortex M3 and M4 CourseEmbedded System Programming on ARM Cortex M3 and M4 Course
Embedded System Programming on ARM Cortex M3 and M4 Course
 
ARM Processor
ARM ProcessorARM Processor
ARM Processor
 
Embedded c program and programming structure for beginners
Embedded c program and programming structure for beginnersEmbedded c program and programming structure for beginners
Embedded c program and programming structure for beginners
 
Uart 16550
Uart 16550Uart 16550
Uart 16550
 
CPU Verification
CPU VerificationCPU Verification
CPU Verification
 
ARM - Advance RISC Machine
ARM - Advance RISC MachineARM - Advance RISC Machine
ARM - Advance RISC Machine
 
Introduction to Processor Design and ARM Processor
Introduction to Processor Design and ARM ProcessorIntroduction to Processor Design and ARM Processor
Introduction to Processor Design and ARM Processor
 
Microcontroller 8096
Microcontroller 8096Microcontroller 8096
Microcontroller 8096
 
AMBA 5 COHERENT HUB INTERFACE.pptx
AMBA 5 COHERENT HUB INTERFACE.pptxAMBA 5 COHERENT HUB INTERFACE.pptx
AMBA 5 COHERENT HUB INTERFACE.pptx
 
Interleaved memory
Interleaved memoryInterleaved memory
Interleaved memory
 
Dif fft
Dif fftDif fft
Dif fft
 
Architecture of pentium family
Architecture of pentium familyArchitecture of pentium family
Architecture of pentium family
 
ARM architcture
ARM architcture ARM architcture
ARM architcture
 
Blackfin core architecture
Blackfin core architectureBlackfin core architecture
Blackfin core architecture
 
I2C
I2CI2C
I2C
 
Module 2 ARM CORTEX M3 Instruction Set and Programming
Module 2 ARM CORTEX M3 Instruction Set and ProgrammingModule 2 ARM CORTEX M3 Instruction Set and Programming
Module 2 ARM CORTEX M3 Instruction Set and Programming
 
Introduction to armv8 aarch64
Introduction to armv8 aarch64Introduction to armv8 aarch64
Introduction to armv8 aarch64
 

Semelhante a ARM AAE - Intrustion Sets

Semelhante a ARM AAE - Intrustion Sets (20)

Arm teaching material
Arm teaching materialArm teaching material
Arm teaching material
 
Arm teaching material
Arm teaching materialArm teaching material
Arm teaching material
 
ARM Architecture Instruction Set
ARM Architecture Instruction SetARM Architecture Instruction Set
ARM Architecture Instruction Set
 
Unit vi
Unit viUnit vi
Unit vi
 
module 5.1.pptx
module 5.1.pptxmodule 5.1.pptx
module 5.1.pptx
 
module 5.pptx
module 5.pptxmodule 5.pptx
module 5.pptx
 
UNIT 2 ERTS.ppt
UNIT 2 ERTS.pptUNIT 2 ERTS.ppt
UNIT 2 ERTS.ppt
 
15CS44 MP & MC module 5
15CS44 MP & MC  module 515CS44 MP & MC  module 5
15CS44 MP & MC module 5
 
Module 2 PPT of ES.pptx
Module 2 PPT of ES.pptxModule 2 PPT of ES.pptx
Module 2 PPT of ES.pptx
 
Arm Cortex material Arm Cortex material3222886.ppt
Arm Cortex material Arm Cortex material3222886.pptArm Cortex material Arm Cortex material3222886.ppt
Arm Cortex material Arm Cortex material3222886.ppt
 
Unit II arm 7 Instruction Set
Unit II arm 7 Instruction SetUnit II arm 7 Instruction Set
Unit II arm 7 Instruction Set
 
Lecture8
Lecture8Lecture8
Lecture8
 
OptimizingARM
OptimizingARMOptimizingARM
OptimizingARM
 
LPC 2148 Instructions Set.ppt
LPC 2148 Instructions Set.pptLPC 2148 Instructions Set.ppt
LPC 2148 Instructions Set.ppt
 
arm-intro.ppt
arm-intro.pptarm-intro.ppt
arm-intro.ppt
 
ARM Introduction
ARM IntroductionARM Introduction
ARM Introduction
 
armcortexinsructionsetupdated (2).pptx
armcortexinsructionsetupdated (2).pptxarmcortexinsructionsetupdated (2).pptx
armcortexinsructionsetupdated (2).pptx
 
ARM instruction set
ARM instruction  setARM instruction  set
ARM instruction set
 
ARM instruction set
ARM instruction  setARM instruction  set
ARM instruction set
 
Arm instruction set
Arm instruction setArm instruction set
Arm instruction set
 

Mais de Anh Dung NGUYEN

ARM AAE - Developing Code for ARM
ARM AAE - Developing Code for ARMARM AAE - Developing Code for ARM
ARM AAE - Developing Code for ARMAnh Dung NGUYEN
 
ARM AAE - Memory Systems
ARM AAE - Memory SystemsARM AAE - Memory Systems
ARM AAE - Memory SystemsAnh Dung NGUYEN
 
AAME ARM Techcon2013 006v02 Implementation Diversity
AAME ARM Techcon2013 006v02 Implementation DiversityAAME ARM Techcon2013 006v02 Implementation Diversity
AAME ARM Techcon2013 006v02 Implementation DiversityAnh Dung NGUYEN
 
AAME ARM Techcon2013 005v02 System Startup
AAME ARM Techcon2013 005v02 System StartupAAME ARM Techcon2013 005v02 System Startup
AAME ARM Techcon2013 005v02 System StartupAnh Dung NGUYEN
 
AAME ARM Techcon2013 004v02 Debug and Optimization
AAME ARM Techcon2013 004v02 Debug and OptimizationAAME ARM Techcon2013 004v02 Debug and Optimization
AAME ARM Techcon2013 004v02 Debug and OptimizationAnh Dung NGUYEN
 
AAME ARM Techcon2013 002v02 Advanced Features
AAME ARM Techcon2013 002v02 Advanced FeaturesAAME ARM Techcon2013 002v02 Advanced Features
AAME ARM Techcon2013 002v02 Advanced FeaturesAnh Dung NGUYEN
 
AAME ARM Techcon2013 003v02 Software Development
AAME ARM Techcon2013 003v02  Software DevelopmentAAME ARM Techcon2013 003v02  Software Development
AAME ARM Techcon2013 003v02 Software DevelopmentAnh Dung NGUYEN
 
AAME ARM Techcon2013 Intro
AAME ARM Techcon2013 IntroAAME ARM Techcon2013 Intro
AAME ARM Techcon2013 IntroAnh Dung NGUYEN
 
AAME ARM Techcon2013 001v02 Architecture and Programmer's model
AAME ARM Techcon2013 001v02 Architecture and Programmer's modelAAME ARM Techcon2013 001v02 Architecture and Programmer's model
AAME ARM Techcon2013 001v02 Architecture and Programmer's modelAnh Dung NGUYEN
 

Mais de Anh Dung NGUYEN (11)

ARM AAE - System Issues
ARM AAE - System IssuesARM AAE - System Issues
ARM AAE - System Issues
 
ARM AAE - Developing Code for ARM
ARM AAE - Developing Code for ARMARM AAE - Developing Code for ARM
ARM AAE - Developing Code for ARM
 
ARM AAE - Memory Systems
ARM AAE - Memory SystemsARM AAE - Memory Systems
ARM AAE - Memory Systems
 
ARM AAE - Introduction
ARM AAE - IntroductionARM AAE - Introduction
ARM AAE - Introduction
 
AAME ARM Techcon2013 006v02 Implementation Diversity
AAME ARM Techcon2013 006v02 Implementation DiversityAAME ARM Techcon2013 006v02 Implementation Diversity
AAME ARM Techcon2013 006v02 Implementation Diversity
 
AAME ARM Techcon2013 005v02 System Startup
AAME ARM Techcon2013 005v02 System StartupAAME ARM Techcon2013 005v02 System Startup
AAME ARM Techcon2013 005v02 System Startup
 

ARM AAE - Instruction Sets

  • 1. SOFTWARE & SYSTEMS DESIGN 3 – Instruction Sets
  • 2. AGENDA
      • Instruction Sets
        VFP and NEON
        Pipelines
        Cycle Counting
      AAETC3v00 Instruction Sets
  • 3. INSTRUCTION SET
      • ARM instruction set
        – All instructions are 32-bit
        – Most instructions can be executed conditionally
      • Thumb instruction set
        – 16-bit instruction set
        – No conditional execution (except for branches)
        – Optimized for code density from C code (~65% of ARM code size)
      • Thumb-2 technology
        – Extension to Thumb instruction set
        – Mix of 16-bit and 32-bit instructions
        – Conditional execution via IT instruction
        – Higher performance than Thumb and smaller than ARM
  • 4. ASSEMBLER SYNTAX
      • Data processing instructions
          <operation><condition> Rd, Rm, <op2>
          ADDEQ r4, r5, r6          // if (EQ) r4 = r5 + r6
          ORR   r2, r3, r6, LSL #4  // r2 = r3 | (r6 << 4)
          SUBS  r5, r7, #4          // r5 = r7 - 4; set flags
          MOV   r4, #7              // r4 = 7
      • Memory access instructions
          <operation><size> Rd, [<address>]
          LDR   r0, [r6, #4]        // r0 = *(r6 + 4)
          STRB  r4, [r7], #8        // *(byte *) r7 = r4; r7 += 8
          <operation><addressing mode> <Rn>!, <registers list>
          LDMIA r0, {r1, r2, r7}
          STMFD sp!, {r4-r11, lr}
      • Program flow instructions
          <branch> <label>
          BL    foo
          B     baR
  • 5. DATA PROCESSING INSTRUCTIONS
      • These instructions operate on the contents of registers
        – They DO NOT affect memory
      • Manipulation (has destination register)
          arithmetic: ADD ADC SUB SBC RSB RSC
          logical:    AND EOR ORR ORN (T2) BIC
          move:       MOV MVN
      • Comparison (set flags only)
          CMN (ADDS)  CMP (SUBS)  TST (ANDS)  TEQ (EORS)
      • Syntax:
          <Operation>{S}{<cond>} {Rd,} Rn, Operand2
      • Examples:
          ADD r0, r1, r2 ; r0 = r1 + r2
          TEQ r0, r1     ; if r0 = r1, Z flag will be set
          MOV r0, r1     ; copy r1 to r0
  • 6. MULTIPLY / DIVIDE
      • 32-bit multiplication: Rd = Rn × Rm, with optional accumulation (+/- Ra)
          MUL MLA MLS
      • 64-bit multiplication: RdHi:RdLo = Rn × Rm, with optional accumulation (+)
          UMULL SMULL UMLAL SMLAL
      • Examples:
          MLA r0, r1, r2, r3       ; r0 = r3 + (r1 * r2)
          [U|S]MULL r4, r5, r2, r3 ; r5:r4 = r2 * r3
      • Division (optional in 7-A):
          SDIV r0, r1, r2 ; signed: r0 = r1 / r2
          UDIV r0, r1, r2 ; unsigned: r0 = r1 / r2
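The multiply forms on this slide can be modelled in a few lines of Python. This is only an illustrative sketch of the arithmetic, not ARM's pseudocode: the function names are made up, registers are passed as plain 32-bit unsigned integers, and the 64-bit results are returned as a hypothetical (RdLo, RdHi) pair.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    """Interpret a 32-bit pattern as a two's complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def mla(rn, rm, ra):
    """MLA Rd, Rn, Rm, Ra: Rd = Ra + (Rn * Rm), truncated to 32 bits."""
    return (ra + rn * rm) & MASK32

def mls(rn, rm, ra):
    """MLS Rd, Rn, Rm, Ra: Rd = Ra - (Rn * Rm), truncated to 32 bits."""
    return (ra - rn * rm) & MASK32

def umull(rn, rm):
    """UMULL RdLo, RdHi, Rn, Rm: unsigned 64-bit product as (RdLo, RdHi)."""
    product = (rn & MASK32) * (rm & MASK32)
    return product & MASK32, (product >> 32) & MASK32

def smull(rn, rm):
    """SMULL RdLo, RdHi, Rn, Rm: signed 64-bit product as (RdLo, RdHi)."""
    product = (to_signed32(rn) * to_signed32(rm)) & 0xFFFFFFFFFFFFFFFF
    return product & MASK32, (product >> 32) & MASK32
```

The same bit pattern gives different high words for UMULL and SMULL, which is why both variants exist.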
  • 7. BIT MANIPULATION INSTRUCTIONS
          BFI  r0, r0, #9, #6  ; Bit Field Insert
          UBFX r1, r0, #18, #7 ; Bit Field Extract (zero extend)
          BFC  r1, #3, #4      ; Bit Field Clear
          RBIT r2, r1          ; Reverse Bit Order
      [Figure: 32-bit register diagrams (bit 31 to bit 0) showing the contents of r0, r1 and r2 before and after each instruction]
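As a rough guide to what the four bit-field instructions compute, here is a small Python model. It assumes registers are 32-bit unsigned integers and that lsb + width does not exceed 32; the function names simply mirror the mnemonics and are not part of any library.

```python
MASK32 = 0xFFFFFFFF

def bfi(rd, rn, lsb, width):
    """BFI Rd, Rn, #lsb, #width: copy the low `width` bits of Rn into Rd at `lsb`."""
    mask = ((1 << width) - 1) << lsb
    return (rd & ~mask & MASK32) | ((rn << lsb) & mask)

def ubfx(rn, lsb, width):
    """UBFX Rd, Rn, #lsb, #width: extract a field and zero extend it."""
    return (rn >> lsb) & ((1 << width) - 1)

def bfc(rd, lsb, width):
    """BFC Rd, #lsb, #width: clear `width` bits of Rd starting at `lsb`."""
    mask = ((1 << width) - 1) << lsb
    return rd & ~mask & MASK32

def rbit(rm):
    """RBIT Rd, Rm: reverse the order of all 32 bits."""
    return int(format(rm & MASK32, "032b")[::-1], 2)
```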
  • 8. BYTE REVERSAL
      • Byte Reversal Instructions
          REV{cond} Rd, Rm      Reverses the bytes in a word
          REV16{cond} Rd, Rm    Reverses the bytes in each halfword
          REVSH{cond} Rd, Rm    Reverses the bottom two bytes, and sign extends to 32 bits
      • Example: byte order 3 2 1 0 becomes 0 1 2 3
          V6 and later:
            REV r0, r0
          Pre-V6:
            EOR r1, r0, r0, ROR #16
            BIC r1, r1, #0xFF0000
            MOV r0, r0, ROR #8
            EOR r0, r0, r1, LSR #8
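The four-instruction pre-V6 sequence is not obviously a byte reversal, so the sketch below steps through it in Python and compares it with a direct reversal. Registers are modelled as 32-bit unsigned integers; `ror`, `rev`, `rev16`, `revsh` and `rev_pre_v6` are illustrative helper names.

```python
MASK32 = 0xFFFFFFFF

def ror(x, n):
    """Rotate a 32-bit value right by n bits."""
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK32

def rev(x):
    """REV: reverse the four bytes of a word."""
    return int.from_bytes((x & MASK32).to_bytes(4, "little"), "big")

def rev16(x):
    """REV16: reverse the two bytes inside each halfword."""
    return ((x & 0x00FF00FF) << 8) | ((x & 0xFF00FF00) >> 8)

def revsh(x):
    """REVSH: reverse the bottom two bytes, then sign extend from bit 15."""
    h = ((x & 0xFF) << 8) | ((x >> 8) & 0xFF)
    return h | 0xFFFF0000 if h & 0x8000 else h

def rev_pre_v6(r0):
    """The EOR / BIC / MOV / EOR sequence from the slide, one step per line."""
    r1 = r0 ^ ror(r0, 16)       # EOR r1, r0, r0, ROR #16
    r1 &= ~0xFF0000 & MASK32    # BIC r1, r1, #0xFF0000
    r0 = ror(r0, 8)             # MOV r0, r0, ROR #8
    r0 ^= r1 >> 8               # EOR r0, r0, r1, LSR #8
    return r0
```

For 0x11223344 both `rev` and `rev_pre_v6` give 0x44332211.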
  • 9. SIMD
      • ARMv6 added a number of instructions which perform SIMD (Single Instruction Multiple Data) operations using ARM registers
        – Includes instructions for addition, subtraction, multiplication and sum of absolute differences
        – Instructions can work on four 8-bit quantities, or two 16-bit quantities
        – Signed/unsigned and saturating versions available of many instructions
        – CPSR GE bits used instead of normal ALU flags
      • There are instructions for packing (PKHBT/PKHTB) and unpacking (UXTH/UXTB) registers
      • Example:
          UADD16 Rd, Rm, Rs
      [Figure: each halfword of Rm is added to the corresponding halfword of Rs to give Rd; the top halfword sets GE[3:2] and the bottom halfword sets GE[1:0]]
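A minimal Python sketch of UADD16 may help show how the two lanes and the GE bits relate. It assumes 32-bit unsigned register values and returns the GE field as a 4-bit integer; the function name and return shape are illustrative only.

```python
def uadd16(rm, rs):
    """UADD16 Rd, Rm, Rs: two independent unsigned 16-bit additions.

    Returns (rd, ge) where ge is the 4-bit GE field: GE[1:0] is set when the
    bottom lane carries out of 16 bits, GE[3:2] when the top lane does.
    """
    lo = (rm & 0xFFFF) + (rs & 0xFFFF)
    hi = ((rm >> 16) & 0xFFFF) + ((rs >> 16) & 0xFFFF)
    rd = ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)
    ge = (0b1100 if hi > 0xFFFF else 0) | (0b0011 if lo > 0xFFFF else 0)
    return rd, ge
```

Note that a carry out of the bottom lane does not propagate into the top lane; it is only recorded in GE[1:0].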
  • 10. SATURATED MATH AND CLZ
      • Support for Saturated Arithmetic
        – Targeted at DSP & control applications
        – Overflow sets Q flag (sticky) not V, and sets result to +/- max value
          QSUB{cond} Rd, Rm, Rn  ; Rd = saturate(Rm - Rn)
          QADD{cond} Rd, Rm, Rn  ; Rd = saturate(Rm + Rn)
          QDSUB{cond} Rd, Rm, Rn ; Rd = saturate(Rm - saturate(Rn * 2))
          QDADD{cond} Rd, Rm, Rn ; Rd = saturate(Rm + saturate(Rn * 2))
        [Figure: signed 32-bit number range, marking 0x0, 0x7FFFFFFF (largest +ve) and 0x80000000 (most -ve)]
      • Count Leading Zeros
          CLZ{cond} Rd, Rm
        – Returns number of unset bits before the most significant set bit
        – Example (bit 31 on the left): for 0000 0000 0010 1100 0100 1110 0111 0100, CLZ returns 10
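The saturating operations and CLZ can be sketched as follows. This model works on ordinary signed Python integers rather than 32-bit patterns, and returns the Q flag as a boolean alongside the result; both choices are simplifications for illustration.

```python
INT32_MAX = 0x7FFFFFFF
INT32_MIN = -0x80000000

def saturate32(value):
    """Clamp to the signed 32-bit range; the second item says whether Q would be set."""
    if value > INT32_MAX:
        return INT32_MAX, True
    if value < INT32_MIN:
        return INT32_MIN, True
    return value, False

def qadd(rm, rn):
    return saturate32(rm + rn)

def qsub(rm, rn):
    return saturate32(rm - rn)

def qdadd(rm, rn):
    """Rd = saturate(Rm + saturate(Rn * 2)); Q is set if either step saturates."""
    doubled, q1 = saturate32(rn * 2)
    result, q2 = saturate32(rm + doubled)
    return result, q1 or q2

def qdsub(rm, rn):
    """Rd = saturate(Rm - saturate(Rn * 2)); Q is set if either step saturates."""
    doubled, q1 = saturate32(rn * 2)
    result, q2 = saturate32(rm - doubled)
    return result, q1 or q2

def clz(value):
    """Count the zero bits above the most significant set bit of a 32-bit value."""
    return 32 - (value & 0xFFFFFFFF).bit_length()
```

The bit pattern on the slide is 0x002C4E74, for which `clz` returns 10; a zero input gives 32.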
  • 11. SATURATION
      • Saturate a value to a specified bit position (effectively saturating to any power of 2)
        – USAT - Unsigned saturate 32-bit
          • Syntax: USAT Rd, #sat, Rm {shift}
          • Operation: Rd = Saturate(Shift(Rm), #sat)
        – Variants
            SSAT - signed saturation
            USAT16 - saturates two 16-bit unsigned halfwords (no rotation allowed)
            SSAT16 - signed saturation of two 16-bit halfwords (no rotation allowed)
        – #sat is specified as an immediate value in the range 0 to 31
        – {shift} is optional and is limited to LSL or ASR
        – Q flag is set if saturation occurs
      [Figure: bit patterns of the max and min results relative to the saturation position, for unsigned and signed saturation]
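A short Python sketch of saturating to a bit position follows. It ignores the optional shift, takes the input as a signed Python integer, and returns the Q flag as a boolean; `usat` and `ssat` are illustrative names and the range checks on `sat` are left out.

```python
def usat(value, sat):
    """USAT Rd, #sat, Rm: clamp a signed value to the unsigned range 0 .. 2**sat - 1."""
    upper = (1 << sat) - 1
    if value > upper:
        return upper, True
    if value < 0:
        return 0, True
    return value, False

def ssat(value, sat):
    """SSAT Rd, #sat, Rm: clamp to the signed range -2**(sat-1) .. 2**(sat-1) - 1."""
    upper = (1 << (sat - 1)) - 1
    lower = -(1 << (sat - 1))
    if value > upper:
        return upper, True
    if value < lower:
        return lower, True
    return value, False
```

For example, saturating 300 to 8 unsigned bits gives 255 with Q set, which is the usual clamp when converting wider intermediate results to pixel values.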
  • 12. SINGLE / DOUBLE REGISTER DATA TRANSFER
      • Use to move data between one or two registers and memory
          LDRD  STRD   Doubleword
          LDR   STR    Word
          LDRB  STRB   Byte
          LDRH  STRH   Halfword
          LDRSB        Signed byte load
          LDRSH        Signed halfword load
        – Any remaining space in Rd is zero filled or sign extended
      • Syntax:
        – LDR{<size>}{<cond>} Rd, <address>
        – STR{<size>}{<cond>} Rd, <address>
      • Example:
        – LDRB r0, [r1] ; load bottom byte of r0 from the
                        ; byte of memory at address in r1
  • 13. ADDRESSING MEMORY
      • The address accessed by LDR/STR is specified by a base register with an optional offset
        – Base register only (no offset)
            LDR r0, [r1]
        – Base register plus constant
            LDR r0, [r1, #8]
        – Base register, plus register (optionally shifted by an immediate value)
            LDR r0, [r1, r2]
            LDR r0, [r1, r2, LSL #2]
        – The offset can be either added or subtracted from the base register
            LDR r0, [r1, #-8]
            LDR r0, [r1, -r2]
            LDR r0, [r1, -r2, LSL #2]
      [Figure: the memory address is formed from r1 +/- either #8 or r2, LSL #2; the loaded value goes to r0]
  • 14. PRE- AND POST-INDEXED ADDRESSING
      • Pre-indexed (add offset before memory access)
          LDR r0, [r1, #12]{!}
        – The access uses address r1 + 12
        – If ‘!’ present, update base register (r1)
      • Post-indexed (add offset after memory access)
          LDR r0, [r1], #12
        – The access uses address r1
        – Always update base register (r1)
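The difference between the two modes comes down to which address is used for the access and when the base register changes. The sketch below models this in Python under simplifying assumptions: memory is a dict from address to word, registers are a dict from name to value, and the case where Rd and the base register are the same is not handled. The function names are hypothetical.

```python
def ldr_pre(mem, regs, rd, rn, offset, writeback=False):
    """LDR Rd, [Rn, #offset]{!}: access Rn + offset; update Rn only if '!' is given."""
    address = regs[rn] + offset
    regs[rd] = mem[address]
    if writeback:
        regs[rn] = address

def ldr_post(mem, regs, rd, rn, offset):
    """LDR Rd, [Rn], #offset: access Rn, then always add the offset to Rn."""
    address = regs[rn]
    regs[rd] = mem[address]
    regs[rn] = address + offset
```

Post-indexing suits walking through an array, since each load leaves the base register pointing at the next element.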
  • 15. MULTIPLE REGISTER DATA TRANSFER
      • These instructions move data between multiple registers and memory
      • Syntax
          <LDM|STM>{<addressing_mode>}{<cond>} Rb{!}, <register list>
      • 4 addressing modes
        – Increment after/before (IA, IB)
        – Decrement after/before (DA, DB)
      • Also PUSH/POP, equivalent to STMDB/LDMIA with SP! as base register
      • Example
          LDM r10, {r0,r1,r4} ; load registers, using r10 base
          PUSH {r4-r6,lr}     ; store registers, using SP base
      [Figure: where r0, r1 and r4 sit in memory relative to the base register (Rb = r10) for IA, IB, DA and DB, with address increasing up the diagram]
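The four addressing modes differ only in where the block of words starts relative to the base register. A small Python sketch, assuming 4-byte registers and the architectural rule that the lowest-numbered register always goes to the lowest address; `block_addresses` is a made-up helper name.

```python
def block_addresses(base, count, mode):
    """Return (addresses, new_base) for an LDM/STM of `count` registers.

    addresses[0] belongs to the lowest-numbered register in the list.
    new_base is the value written back to Rb when '!' is used.
    """
    size = 4 * count
    if mode == "IA":      # increment after
        start, new_base = base, base + size
    elif mode == "IB":    # increment before
        start, new_base = base + 4, base + size
    elif mode == "DA":    # decrement after
        start, new_base = base - size + 4, base - size
    elif mode == "DB":    # decrement before
        start, new_base = base - size, base - size
    else:
        raise ValueError("mode must be IA, IB, DA or DB")
    return [start + 4 * i for i in range(count)], new_base
```

With a base of 0x1000 and three registers, DB (the PUSH case) uses 0xFF4, 0xFF8 and 0xFFC and leaves the base at 0xFF4.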
  • 16. INSTRUCTIONS FOR LOADING CONSTANTS
      • The assembler provides some instructions for loading values into registers
        – These are the recommended mechanisms for loading constants into registers
      • PC- or register-relative constants
          ADR Rn, label
        – Add or subtract an immediate value to or from the PC to generate the address of the label into the specified register, using one instruction
        – ADRL pseudo instruction uses two instructions, giving a better range
        – Can be used to generate addresses for position independent code (but only if in same code section)
        – Constant determined at run time
      • Absolute constants
          LDR Rn, =<constant>
          LDR Rn, =label
        – Pseudo instruction
        – Assembler will use optimal sequence to generate constant into specified register (one of MOV, MVN or an LDR from a literal pool)
        – Can load to the PC, causing a branch
        – Use for absolute addressing and references outside the current section (resulting in position dependent code)
        – Constant determined at assembly or link time
  • 17. LDR= EXAMPLES
      • The following examples show how the LDR= pseudo instruction makes code more readable, portable and flexible
          Code                     Disassembly
          LDR r0, =0x2543          MOV r0, #0x2543
          LDR r0, =0xFFFF43FF      MVN r0, #0xBC00
          LDR r0, =0xFFFFF5        LDR r0, [pc, #xx]
                                   ...
                                   DCD 0xFFFFF5
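To see why the three constants above end up as different instructions, the following Python sketch applies one plausible selection rule. It is an assumption about how an assembler might choose, not a description of any particular tool: it tries an ARM-state modified immediate (an 8-bit value rotated right by an even amount) for MOV, then for MVN on the inverted value, then a 16-bit immediate MOV (the MOVW encoding, ARMv6T2 and later), and finally falls back to a literal pool load. The names `is_arm_immediate` and `choose_load` are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def is_arm_immediate(value):
    """True if value is an 8-bit constant rotated right by an even number of bits."""
    value &= MASK32
    for rot in range(0, 32, 2):
        # Rotating left by `rot` undoes a rotate right by `rot`.
        rotated = ((value << rot) | (value >> (32 - rot))) & MASK32
        if rotated < 0x100:
            return True
    return False

def choose_load(value):
    """Pick an instruction form for LDR Rn, =value (illustrative ordering)."""
    value &= MASK32
    if is_arm_immediate(value):
        return ("MOV", value)
    if is_arm_immediate(~value):
        return ("MVN", ~value & MASK32)
    if value <= 0xFFFF:
        return ("MOV", value)          # 16-bit immediate form
    return ("LDR literal", value)      # value placed in a literal pool
```

Under these assumptions, 0x2543 becomes a MOV, 0xFFFF43FF becomes MVN #0xBC00, and 0xFFFFF5 needs a literal pool entry, matching the disassembly column.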
  • 18. BRANCH INSTRUCTIONS
      • Branch instructions have the following format
          B{<cond>} label
        – Might not cause a pipeline flush (branch prediction)
        – Branch range depends on instruction set and width
      • A BL instruction additionally generates a return address in r14 (lr)
        – Returning is performed by restoring the program counter (pc) from lr
      • Example
          void func1 (void)        func1            func2
          {                          :                :
            :                        BL func2         :
            func2();                 :                BX lr
            :
          }
  • 19. BRANCH RANGES
      • The range of a branch instruction depends on which instruction set is being used
      • It also varies between different types of branch
                          ARM       Thumb
          B               ±32MB     ±16MB
          CBZ/CBNZ        -         126 bytes
          BL/BLX (imm)    ±32MB     ±16MB
          BLX (reg)       Any       Any
          BX              Any       Any
          TBB             -         510 bytes
          TBH             -         131070 bytes
      • “Any” indicates an instruction which can branch to any address in the 4GB address space
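The figures in the table follow from the width of the offset field in each encoding. The Python sketch below reproduces them; the field widths and scale factors are stated from the ARMv7 encodings as I understand them (24-bit signed word offset for ARM B/BL, 24-bit signed halfword offset for the 32-bit Thumb B/BL, 6-bit unsigned halfword offset for CBZ/CBNZ, and byte or halfword table entries counted in halfwords for TBB/TBH) rather than from the slide itself.

```python
def signed_range(bits, scale):
    """Byte range reachable with a signed `bits`-wide field scaled by `scale`."""
    return (-(1 << (bits - 1)) * scale, ((1 << (bits - 1)) - 1) * scale)

def unsigned_max(bits, scale):
    """Largest forward byte offset for an unsigned `bits`-wide field."""
    return ((1 << bits) - 1) * scale

MB = 1 << 20

arm_b = signed_range(24, 4)      # word-aligned targets
thumb_b = signed_range(24, 2)    # halfword-aligned targets
cbz_max = unsigned_max(6, 2)     # forward only
tbb_max = unsigned_max(8, 2)     # byte table entries
tbh_max = unsigned_max(16, 2)    # halfword table entries
```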
  • 20. READING AND WRITING PC
      • In general, writing PC causes a branch to the value written
        – Bit zero controls the execution state (ARM or Thumb) at the destination
        – The bottom bit of the destination address is always forced to zero
        – Writing a value with ‘10’ in the bottom two bits results in unpredictable behavior
        – Note that architectures prior to ARMv7 do not change state when the PC is written directly
      • Loading PC from memory behaves similarly
        – Architectures prior to ARMv5T do not change state when the PC is loaded from memory
      • The PC reads as the address of the current instruction plus an offset
        – In ARM state, the offset is 8
        – In Thumb state, the offset is 4
        – This reflects the 3-stage structure of the ARM7TDMI pipeline
        – In Thumb state, the bottom bit always reads as zero
        – In ARM state, the bottom two bits will always read as zero
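The rules above can be condensed into two small Python functions. This is a sketch of the ARMv7 interworking behaviour as described on the slide; the function names are illustrative, and the "unpredictable" case is modelled by raising an exception, which real hardware does not do.

```python
def pc_read(instruction_address, thumb):
    """Value seen when an instruction reads the PC."""
    return instruction_address + (4 if thumb else 8)

def write_pc(value):
    """Decode an interworking write to the PC into (branch address, state)."""
    if value & 1:
        return value & ~1, "Thumb"   # bit 0 set: Thumb state, bit 0 forced to zero
    if value & 2:
        # Bottom two bits are '10': not a valid ARM-state target.
        raise ValueError("unpredictable: ARM-state target is not word aligned")
    return value, "ARM"
```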
  • 21. CHANGING STATE
      • Changing between ARM and Thumb states (or “interworking”) can be carried out using the Branch Exchange instruction
          BX Rn
          BLX Rn
        – Bit 0 of Rn determines the exchange behavior
          • Unset (0) - change to (or remain in) ARM state
          • Set (1) - change to (or remain in) Thumb state
      • Branch and Link with Exchange
          BLX offset ; ARM/Thumb instruction which always
                     ; changes state (and sets LR)
        – Used to branch to a subroutine which is known to be in the opposite instruction set
        – When branching to imported labels use BL, the linker will substitute BLX if necessary
      • All instructions which modify the PC can cause a state change
        – Depending on bit 0 of the result
        – For data processing instructions, state changes only if S variant not used
  • 22. IF-THEN
      • Thumb only, makes the next 1-4 instructions conditional
      • Syntax
          IT{T|E}{T|E}{T|E} <cond>
        – Any condition code may be used
        – Doesn’t affect condition flags
        – 16-bit instructions in the IT block do not affect condition flags (except CMP, CMN & TST)
        – 32-bit instructions do affect condition flags (normal rules apply)
        – No need to write this instruction: the assembler will insert it for you where necessary
      • Current “if-then status” stored in CPSR
        – Conditional block may be safely interrupted and returned to
        – Not recommended to branch into or out of ‘if-then’ block
      • Example
          ; if (r0 == 0)
          ;   r0 = *r1 + 2;
          ; else
          ;   r0 = *r2 + 4;
          ; if
          CMP r0, #0
          ITTEE EQ
          ; then
          LDREQ r0, [r1]
          ADDEQ r0, #2
          ; else
          LDRNE r0, [r2]
          ADDNE r0, #4
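The T/E letters in the IT mnemonic map one-to-one onto the conditions of the following instructions. A brief Python sketch of that mapping, with a hypothetical helper `expand_it`; AL is left out of the table because it has no inverse, so an E suffix with AL is rejected here by a KeyError.

```python
INVERSE = {
    "EQ": "NE", "NE": "EQ", "CS": "CC", "CC": "CS",
    "MI": "PL", "PL": "MI", "VS": "VC", "VC": "VS",
    "HI": "LS", "LS": "HI", "GE": "LT", "LT": "GE",
    "GT": "LE", "LE": "GT",
}

def expand_it(mnemonic, cond):
    """Return the condition required on each instruction in the IT block."""
    suffix = mnemonic[2:]
    if not mnemonic.startswith("IT") or len(suffix) > 3 or set(suffix) - {"T", "E"}:
        raise ValueError("expected IT followed by up to three T/E letters")
    conditions = [cond]   # the first instruction always uses <cond>
    for letter in suffix:
        conditions.append(cond if letter == "T" else INVERSE[cond])
    return conditions
```

For the slide's example, `expand_it("ITTEE", "EQ")` gives EQ, EQ, NE, NE, matching LDREQ, ADDEQ, LDRNE, ADDNE.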
  • 23. STATUS REGISTER ACCESS
      • MRS and MSR allow contents of CPSR/SPSR to be transferred to/from a general purpose register or be set to an immediate value
        – MSR allows the whole status register, or just parts of it, to be updated
          MRS r0,CPSR     ; read CPSR into r0
          BIC r0,r0,#0x80 ; clear bit 7 to enable IRQ
          MSR CPSR_c,r0   ; write modified value to ‘c’ byte only
        – User mode programs may read all bits of CPSR but may only change the flag bits
      • CPS can be used to directly modify some bits in the CPSR
        – These are related to interrupt enable/disable and operating mode
      • SETEND instruction selects the endianness of data accesses
        – For use in systems with mixed endian data (e.g. peripherals)
          SETEND BE
          LDR r0, [r7], #4 ; big-endian
          SETEND LE
          LDR r1, [r7], #4 ; little-endian
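The MRS / BIC / MSR sequence is a read-modify-write in which only the selected byte of the CPSR is replaced. A minimal Python model, assuming the usual c, x, s, f field letters select bits 7:0, 15:8, 23:16 and 31:24; `msr_fields` is an illustrative name and privilege checks are ignored.

```python
FIELD_MASKS = {"c": 0x000000FF, "x": 0x0000FF00, "s": 0x00FF0000, "f": 0xFF000000}

def msr_fields(cpsr, value, fields="c"):
    """MSR CPSR_<fields>, Rn: replace only the selected bytes of the CPSR."""
    mask = 0
    for letter in fields:
        mask |= FIELD_MASKS[letter]
    return (cpsr & ~mask & 0xFFFFFFFF) | (value & mask)

def enable_irq(cpsr):
    """The slide's sequence: read, clear the I bit (bit 7), write the 'c' byte back."""
    r0 = cpsr                 # MRS r0, CPSR
    r0 &= ~0x80 & 0xFFFFFFFF  # BIC r0, r0, #0x80
    return msr_fields(cpsr, r0, "c")   # MSR CPSR_c, r0
```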
  • 24. SYSTEM CONTROL INSTRUCTIONS
      • ARM uses coprocessors for “internal functions” so as not to enforce a particular memory map
        – System Control Coprocessor: cp15
          • Used for processor configuration: System ID, caches, MMU, TCMs, etc.
        – Debug Coprocessor: cp14
          • Can be used to access debug control registers
        – VFP and NEON: cp10 and cp11
      • In earlier versions of the architecture, designers were permitted to add external coprocessors
        – This is not permitted in ARMv7 architecture profiles
  • 25. AGENDA
        Instruction Sets
      • VFP and NEON
        Pipelines
        Cycle Counting
  • 26. VFP ARCHITECTURE
      • VFP (Vector Floating Point) is ARM’s floating point architecture
        – There have been 4 versions of the architecture to date (VFPv1 is no longer supported)
        – VFPv2 is supported by ARM9 and ARM11 processor families
        – VFPv3 and VFPv4 are optional extensions to the ARMv7-AR architecture profiles
      • VFPv3 (Cortex-A8, Cortex-A9, Cortex-R4, Cortex-R5)
        – Can be implemented with either 16 (VFPv3-D16) or 32 (VFPv3-D32) registers
        – Can be extended with half-precision conversion functions
      • VFPv4 (Cortex-A5, Cortex-A7 and Cortex-A15)
        – Includes half-precision conversion functions
        – Supports fused multiply-add operations
  • 27. THE NEON ARCHITECTURE EXTENSION
      • NEON refers to the Advanced SIMD instruction set extension
        – Optional extension to ARMv7-AR architecture profiles
        – The NEON register set is separate from the core register bank
        – NEON instructions support parallel operations on vectors of elements held in registers
        – Advanced SIMDv1 is the base NEON architecture
          • Can be extended with half-precision conversion functions
        – Advanced SIMDv2 adds fused multiply-add operations
  • 28. AGENDA
        Instruction Sets
        VFP and NEON
      • Pipelines
        Cycle Counting
  • 29. HISTORIC PIPELINES
      • ARM7: Fetch, Decode, Execute
      • ARM9: Fetch, Decode, Execute, Memory, Writeback
      • ARM1136: Fetch 1, Fetch 2, Decode, Issue, then parallel pipelines ending in Writeback
        – MAC 1, MAC 2, MAC 3
        – Shift, ALU, Saturate
        – Address, Data 1, Data 2
      • Cortex-A9: Fetch 1, Fetch 2, Fetch 3, Queue, Decode, Rename, Issue, then parallel pipelines ending in Writeback
        – Execute 1, Execute 2 (two such pipelines)
        – MAC 1, MAC 2
        – Address, Load/Store
        – Data Engine
  • 30. ARM7TDMI PIPELINE (DATA PROC)
      [Figure: pipeline timing diagram over cycles 1 to 6; successive instructions (ADD, SUB, MOV, AND, ORR, EOR, CMP, RSB) each pass through Fetch, Decode and Execute, each starting one cycle after the previous instruction]
      • In this example it takes 6 clock cycles to execute 6 instructions
      • All operations here are on registers (single cycle execution)
      • Clock cycles per Instruction (CPI) = 1
  • 31. ARM7TDMI PIPELINE (LDR)
      [Figure: pipeline timing diagram over cycles 1 to 6 for ADD, SUB, LDR, MOV, AND, ORR; the LDR passes through Fetch, Decode, Execute, Data and Writeback, delaying the instructions behind it]
      • In this example it takes 6 clock cycles to execute 4 instructions
      • Clock cycles per Instruction (CPI) = 1.5
  • 32. ARM7TDMI PIPELINE (BRANCH)
      [Figure: pipeline timing diagram over cycles 1 to 5. BL at 0x8000 passes through Fetch, Decode and Execute, followed by Linkret and Adjust; the instructions at 0x8004 (X) and 0x8008 (XX) enter the pipeline behind it; fetching then continues at the branch target with ADD at 0x8FEC, SUB at 0x8FF0 and MOV at 0x8FF4]
      • Refilling the pipeline
      • Note that the core is executing in ARM state
  • 33. ARM7TDMI PIPELINE (INTERRUPT)
      [Figure: pipeline timing diagram over cycles 1 to 8 for an IRQ. Instructions shown: ADD at 0x8000, SUB at 0x8004, MOV at 0x8008, X at 0x800C; IRQ entry with Linkret and Adjust steps; B (to 0xAF00) at 0x0018, XX at 0x001C, XXX at 0x0020; then STMFD at 0xAF00, MOV at 0xAF04 and LDR at 0xAF08]
      • IRQ interrupt minimum latency (service routine entry) = 7 cycles
  • 34. ARM9TDMI PIPELINE (LDR INTERLOCK)
      [Figure: pipeline timing diagram over cycles 1 to 9. Instructions appearing in the diagram: ADD R1, R1, R2; SUB R3, R4, R1; ORR R8, R3, R4; AND R6, R3, R1; LDR R4, [R7]; EOR R3, R1, R2]
      • Key: F - Fetch, D - Decode, E - Execute, I - Interlock, M - Memory, W - Writeback
      • In this example it takes 7 clock cycles to execute 6 instructions, CPI of 1.2
      • The LDR instruction immediately followed by a data operation using the same register causes an interlock
  • 35. ARM9TDMI PIPELINE (LDR)
      [Figure: pipeline timing diagram over cycles 1 to 9. Instructions appearing in the diagram: ADD R1, R1, R2; SUB R3, R4, R1; ORR R8, R3, R4; AND R6, R3, R1; LDR R4, [R7]; EOR R3, R1, R2]
      • Key: F - Fetch, D - Decode, E - Execute, I - Interlock, M - Memory, W - Writeback
      • In this example it takes 6 cycles to execute 6 instructions, CPI of 1
      • Cycle 4 has simultaneous I & D memory accesses
      • Cycle 5 R4 data available to ORR before written to register
        – Internal forwarding paths are used
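The CPI figures quoted on the pipeline slides are simply cycles divided by instructions. A tiny Python check of the three quoted cases (6 cycles for 6 instructions, 6 for 4, and 7 for 6); the helper name `cpi` is illustrative.

```python
def cpi(cycles, instructions):
    """Clock cycles per instruction over a measured window."""
    return cycles / instructions

data_processing = cpi(6, 6)   # register-only operations
with_load = cpi(6, 4)         # ARM7TDMI sequence containing an LDR
with_interlock = cpi(7, 6)    # ARM9TDMI sequence with a load-use interlock
```

The interlock case works out to about 1.17, which the slide rounds to 1.2.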
  • 36. CORTEX-R4 PIPELINE
      [Figure: pipeline block diagram. A prefetch unit (Fetch1, Fetch2) and instruction queue feed a common decode pipeline (Pre-Decode, Decode, Issue), followed by 4 parallel back end pipelines. Stage labels in the diagram: Shift, ALU, Sat; MAC 1, MAC 2, MAC 3; AGU, Data Cache 1, Data Cache 2, Format; Branch1, Branch2, Branch3; Wr; and an optional FPU (FPU0, FPU1, FPU2)]
      • Dual issue can occur for certain instruction sequences
      • Enabled at reset, can be disabled in CP15
      • AGU = Address Generation Unit
      • Separate divide pipeline for hardware DIV instruction
  • 37. CORTEX-A9 PIPELINE
      [Figure: pipeline block diagram. Instruction fetching (prefetch unit, labelled 64) feeds IQ, De, Re and ISS, then the parallel pipelines Main (P0: Ex1, Ex2, WB), Dual (P1: Ex1, Ex2, WB), Mac (M: M1, M2, WB), Load/store (LS: AGU, LSU, WB) and Data Engine (DE); BM is also shown]
      • IQ: Instruction Queue
      • Re: Register renaming
      • BM: Branch Monitor
      • P0: Main execution pipeline
      • M: MAC pipeline
      • P1: Secondary (“dual”) execution pipeline
      • AGU: Address Generation Unit
      • LSU: Load/Store Unit
      • DE: Data Engine - (NEON and/or FPU) pipeline
  • 38. CORTEX-A15 AND CORTEX-A7
      [Figure: two pipeline block diagrams. The first shows Fetch; Decode, Rename & Dispatch; Loop Cache; Queue; Issue; then Integer, Integer, Multiply, Floating-Point / NEON, Branch, Load and Store; Writeback. The second shows Fetch; Decode; Queue; Issue; then Integer, Multiply, Floating-Point / NEON, Dual Issue and Load/Store; Writeback]
      • Cortex-A15 and Cortex-A7 form an architecturally-identical pair
      • Cortex-A15 is optimized for performance
      • Cortex-A7 is optimized for power consumption
      • Together they can be built into a big.LITTLE configuration
  • 39. AGENDA
        Instruction Sets
        VFP and NEON
        Pipelines
      • Cycle Counting
  • 40. CYCLE COUNTING
      • Early pipelines (e.g. ARM7TDMI) were entirely deterministic and predictable
      • Later pipelines introduce interlocks and inter-instruction dependencies
        – Address, resource and data dependencies are all possible
        – Interactions between instructions become very complicated
      • On ARMv7 cores, manual cycle counting is not really possible, so need to use…
        – Cycle-accurate trace
        – Simulation models
        – Performance Monitoring Unit (see later)
  • 41. PERFORMANCE MONITORING HARDWARE
      • ARMv7-A cores include a performance monitoring unit (PMU)
      • A PMU provides a non-intrusive method of collecting execution information from the core
        – Enabling the PMU does not change the timing of the core
      • The PMU provides:
        – Cycle counter: counts execution cycles (optional 1/64 divider)
        – Programmable event counters
          • The number of counters and available events vary between cores
        – The PMU can be configured to generate interrupts if a counter overflows
      • Some examples common to most cores:
        – Cache Hits or Misses, TLB Misses (on MMU cores), Branch prediction (correct/incorrect predictions), Number of instructions executed, etc…
      • Some events are architecturally defined while others are core-dependent
        – Check the ARM ARM and your core’s TRM for a full list
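When two readings of the cycle counter are turned into an elapsed cycle count, the divider and counter wrap-around both need handling. The Python sketch below shows the arithmetic only; it assumes a 32-bit counter that wraps at most once between the two readings, and `elapsed_cycles` is a hypothetical helper, not a PMU API.

```python
def elapsed_cycles(start, end, divider_enabled=False, width=32):
    """Cycles between two cycle counter readings.

    The subtraction is done modulo 2**width so that a single wrap between
    the readings still gives the right answer. With the 1/64 divider
    enabled, each counter tick stands for 64 core cycles.
    """
    ticks = (end - start) % (1 << width)
    return ticks * 64 if divider_enabled else ticks
```

With the divider on, resolution drops to 64 cycles but the counter takes 64 times longer to overflow, which can matter for long measurements.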
  • 42. SOFTWARE & SYSTEMS DESIGN 3 – Instruction Sets