Processor Architecture and
  Advanced RISC Machine
               Prof. Anish Goel
von Neumann/Princeton Architecture
       Memory holds both data and instructions.

       Central processing unit (CPU) fetches instructions from memory.
           Separating the CPU from memory distinguishes the stored-program computer.

       CPU registers: program counter (PC), instruction register (IR), general-
        purpose registers, etc.

       von Neumann machines are also known as stored-program computers.

       Named after John von Neumann, who wrote the First Draft of a Report on the
        EDVAC (Electronic Discrete Variable Automatic Computer), 1945
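To make the stored-program idea concrete, here is a minimal Python sketch of a fetch-decode-execute loop. It is illustrative only: the accumulator instruction set (LOAD, ADD, STORE, HALT), the tuple encoding and the function name `run` are assumptions made for this example, not part of any real machine. What it does show is a single memory holding both the program and its data, with the PC and IR playing the roles listed above.

```python
def run(memory):
    """Fetch-decode-execute loop for a hypothetical accumulator machine.

    `memory` is one list holding both instructions (tuples) and data (ints),
    as in a von Neumann (stored-program) computer.
    """
    pc = 0    # program counter: address of the next instruction
    acc = 0   # accumulator
    while True:
        ir = memory[pc]      # fetch: IR <- memory[PC]
        pc += 1              # advance PC to the next instruction
        op, addr = ir        # decode
        if op == "LOAD":     # execute
            acc = memory[addr]
        elif op == "ADD":
            acc += memory[addr]
        elif op == "STORE":
            memory[addr] = acc
        elif op == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode {op!r}")


# Hypothetical program: addresses 0-3 hold instructions, addresses 4-6 hold data.
program = [("LOAD", 4), ("ADD", 5), ("STORE", 6), ("HALT", None), 2, 3, 0]
result = run(program)
print(result[6])  # 5
```

Because instructions and data share one list, nothing in this model stops a STORE from overwriting an instruction, which is exactly the property a pure Harvard machine gives up.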




    2                                   Processor Architecure and ARM
CPU + Memory




        [Figure: the CPU sends the address in its PC (200) to memory; memory returns the
         instruction stored at address 200, ADD r5,r1,r3, which the CPU places in its IR.]




Harvard Architecture


        [Figure: the CPU, holding the PC, is connected to two separate memories, a data
         memory and a program memory, each with its own address and data lines.]




Princeton Arch. vs. Harvard Arch.
       From the programmer’s perspective, general-purpose computers appear to be
        Princeton machines.

       However, modern high-performance CPUs are, at their heart, frequently
        designed as Harvard architectures (for example, with separate instruction and
        data caches), with added hardware outside the CPU to create the appearance of
        a Princeton design.

       A pure Harvard machine can’t run self-modifying code, since instructions cannot
        be written as data.

       Harvard allows two simultaneous memory accesses: an instruction fetch and a
        data access.

       Most DSPs use Harvard architecture for streaming data:
           greater memory bandwidth;
           more predictable bandwidth.
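A rough way to see the bandwidth argument is to count memory cycles. The sketch below rests on simplifying assumptions chosen for illustration: every instruction fetch and every data access costs one memory cycle, a Princeton machine must perform them one after another on its single memory, and a Harvard machine can always overlap a data access with an instruction fetch. `memory_cycles` is a hypothetical helper.

```python
def memory_cycles(accesses_data, harvard):
    """Count memory cycles under a simplified (hypothetical) cost model.

    accesses_data: one bool per instruction, True if it also reads or writes data.
    Assumes each instruction fetch and each data access takes one memory cycle.
    """
    if harvard:
        # Separate program and data memories: the data access overlaps the fetch.
        return len(accesses_data)
    # Single shared memory: the fetch and the data access happen one after another.
    return sum(1 + int(uses_data) for uses_data in accesses_data)


program = [True, False, True, True]  # e.g. load, add, load, store
print(memory_cycles(program, harvard=False))  # 7
print(memory_cycles(program, harvard=True))   # 4
```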




Instruction Set Architecture (ISA)
       Instruction set architecture (ISA)
           interface between hardware and software
           Expresses the operations visible to the programmer or compiler
            writer
           Portion of the computer visible to software

       ISA is supported by:
           Organization of programmable storage
           Data types and data structures: encodings and representations
           Addressing modes for data and instructions
           Instruction formats
           Instruction/opcode set
           Exceptional conditions


Application Considerations in ISA Design
       Desktops
           Emphasis on performance with integer and floating-point (FP) data types. Little
            regard for program size or power consumption
       Servers
           Primarily used for databases, file servers, web applications & multi-user
            time-sharing. Performance on integers & character strings is important. However,
            FP instructions are present in virtually every server processor.
       Embedded systems
           Emphasis on low cost and low power, hence small code size. Some instruction
            types, e.g., FP, may be optional to reduce chip costs.




Processor Internal Storage vs. ISA
       The type of internal operand storage is the most basic differentiator between ISAs
       Classes of ISAs based on operand storage type
           Stack architecture: operands implicitly on top of stack
           Accumulator architecture: one operand implicitly in accumulator
           General-purpose register (GPR) architecture: only explicit operands
           Extended accumulator or special-purpose register architecture:
            restrictions on using special registers




General-Purpose Register Architecture
       Three possible choices for GPR arch.
           Register-memory architecture: memory access can be part of
            any instruction (memory operands)
           Register-register (load-store) architecture: only load & store
            instructions can access memory
               Almost all designs after 1980
           Memory-memory architecture: all operands in memory
               Not normally available nowadays




ISA Classification
       Based on internal storage in processor
        [Figure: the four ISA classes, each showing the processor’s ALU and memory (Mem):
         (a) Stack, with operands at the top of stack (TOS); (b) Accumulator;
         (c) Register-memory; (d) Register-register]

Comparison of ISAs
     Code sequences for “C=A+B”
Stack          Accumulator     Register-memory                 Register-register
Push A         Load A          Load R1, A                      Load R1, A
Push B         Add B           Add R3, R1, B                   Load R2, B
Add            Store C         Store R3, C                     Add R3, R1, R2
Pop C                                                          Store R3, C




     Implicit operands in Stack/Accumulator arch.
     Less flexibility of execution order in Stack arch.
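The four code sequences can be checked with tiny interpreters, one per ISA class. These are sketches under stated assumptions: the tuple encoding, the function names and the convention that operand names starting with "R" are registers (anything else is a memory location) are invented for this example.

```python
def run_stack(program, mem):
    """Stack machine: Add implicitly uses the top two stack entries."""
    stack = []
    for op, *args in program:
        if op == "Push":
            stack.append(mem[args[0]])
        elif op == "Add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "Pop":
            mem[args[0]] = stack.pop()
    return mem


def run_accumulator(program, mem):
    """Accumulator machine: one operand is always the accumulator."""
    acc = 0
    for op, addr in program:
        if op == "Load":
            acc = mem[addr]
        elif op == "Add":
            acc += mem[addr]
        elif op == "Store":
            mem[addr] = acc
    return mem


def run_gpr(program, mem):
    """Register-memory or load-store machine (hypothetical convention:
    names starting with "R" are registers, other names are memory)."""
    regs = {}

    def read(operand):
        return regs[operand] if operand.startswith("R") else mem[operand]

    for op, reg, *rest in program:
        if op == "Load":
            regs[reg] = mem[rest[0]]
        elif op == "Add":
            regs[reg] = read(rest[0]) + read(rest[1])
        elif op == "Store":
            mem[rest[0]] = regs[reg]
    return mem


stack_code = [("Push", "A"), ("Push", "B"), ("Add",), ("Pop", "C")]
acc_code = [("Load", "A"), ("Add", "B"), ("Store", "C")]
reg_mem_code = [("Load", "R1", "A"), ("Add", "R3", "R1", "B"), ("Store", "R3", "C")]
load_store_code = [("Load", "R1", "A"), ("Load", "R2", "B"),
                   ("Add", "R3", "R1", "R2"), ("Store", "R3", "C")]

print(run_stack(stack_code, {"A": 2, "B": 3, "C": 0})["C"])        # 5
print(run_accumulator(acc_code, {"A": 2, "B": 3, "C": 0})["C"])    # 5
print(run_gpr(reg_mem_code, {"A": 2, "B": 3, "C": 0})["C"])        # 5
print(run_gpr(load_store_code, {"A": 2, "B": 3, "C": 0})["C"])     # 5
```

The stack and accumulator versions never name their implicit operands, while the register versions name every operand explicitly.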


Instruction Set Complexity
    Depends on:
        Number of instructions
        Instruction formats
        Data formats
        Addressing modes
        General-purpose registers (number and size)
        Flow-control mechanisms (conditionals, exceptions)

    Instruction set characteristics
        Fixed vs. variable length.
        Addressing modes.
        Number of operands.
        Types of operands.


CISC vs. RISC
    Complex instruction set computer (CISC):
        many addressing modes;
            can directly operate on operands in memory
        many operations.
        variable instruction length
        Examples: Intel x86 microprocessors and compatibles

    Reduced instruction set computer (RISC):
        load/store;
            operands in memory must first be loaded into a register before any
             operation
        fixed instruction length (in general)
        pipelinable instructions.
        Examples: ARM, MIPS, Sun SPARC, PowerPC, …

Exploiting ILP (Instruction-Level Parallelism): Superscalar vs. VLIW
    A basic RISC pipeline completes at most one instruction per clock cycle.


    Superscalar machines rely on complex hardware to issue/execute multiple
     instructions per clock cycle.
        Faster execution.
        More variability in execution times.
        More expensive CPU.


    VLIW machines rely on a sophisticated compiler to identify ILP and statically
     schedule parallel instructions.




Finding Parallelism
    Independent operations can be performed in parallel:
     ADD r0, r0, r1
     ADD r2, r2, r3
     ADD r6, r4, r0
     [Figure: dataflow graph; r0 + r1 and r2 + r3 are computed in parallel, and the new
      r0 then feeds r4 + r0, which produces r6.]
    Register renaming:
     ADD r10, r0, r1
     ADD r11, r2, r3
     ADD r12, r4, r10
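The renaming idea can be sketched in Python. This is a simplified model, not how any particular processor implements renaming: each instruction is a (destination, source, source) tuple, the physical register names p0, p1, ... are hypothetical (the slide uses r10 to r12 instead), and both function names are invented here. `parallel_steps` assigns each instruction the earliest step allowed by true (read-after-write) dependencies, which is only safe once renaming has removed the false ones.

```python
def rename(program):
    """Give every destination a fresh, hypothetical physical register p0, p1, ...

    program: list of (dst, src1, src2) architectural register names.
    Sources are looked up before the destination is renamed, so an
    instruction like ADD r0, r0, r1 still reads the old r0.
    """
    current = {}  # architectural name -> physical name holding its latest value
    renamed = []
    for i, (dst, src1, src2) in enumerate(program):
        srcs = [current.get(s, s) for s in (src1, src2)]
        current[dst] = f"p{i}"
        renamed.append((f"p{i}", *srcs))
    return renamed


def parallel_steps(program):
    """Earliest step per instruction, honoring only read-after-write
    dependencies; run it on renamed code, where the false
    (write-after-read, write-after-write) dependencies are gone."""
    step_of = {}  # register -> step in which it is produced
    steps = []
    for dst, src1, src2 in program:
        step = 1 + max(step_of.get(src1, -1), step_of.get(src2, -1))
        step_of[dst] = step
        steps.append(step)
    return steps


code = [("r0", "r0", "r1"), ("r2", "r2", "r3"), ("r6", "r4", "r0")]
renamed = rename(code)
print(renamed)                  # [('p0', 'r0', 'r1'), ('p1', 'r2', 'r3'), ('p2', 'r4', 'p0')]
print(parallel_steps(renamed))  # [0, 0, 1]
```

The first two additions land in step 0 and the third in step 1, matching the dataflow graph.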



Order of Execution
    In-order:
        Instructions are issued/executed in the program order
        Machine stops issuing instructions when the next instruction
         can’t be dispatched.


    Out-of-order:
        Instructions are eligible for issue/execution once source
         operands become available
        Machine will change order of instructions to keep dispatching.
        Substantially faster but also more complex.
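The difference between the two policies can be sketched with a toy single-issue scheduler. Everything here is an assumption for illustration: one instruction issues per cycle, each instruction has a fixed latency, destination registers are unique (as after renaming), and the `schedule` function and the example latencies are hypothetical.

```python
def schedule(program, out_of_order):
    """Toy single-issue scheduler: issues at most one instruction per cycle.

    program: list of (dst, srcs, latency). Destinations are assumed unique
    (already renamed), so only true data dependencies matter.
    Returns (cycle at which the last result is ready, issue order).
    """
    produced = {dst for dst, _, _ in program}
    ready_at = {}            # register -> cycle at which its value is available
    pending = list(range(len(program)))
    order = []
    cycle = 0

    def is_ready(reg):
        # Inputs not produced by the program (live-ins) are ready immediately.
        return reg not in produced or ready_at.get(reg, float("inf")) <= cycle

    while pending:
        # In-order may only consider the oldest pending instruction;
        # out-of-order may pick any pending instruction whose operands are ready.
        window = pending if out_of_order else pending[:1]
        for i in window:
            dst, srcs, latency = program[i]
            if all(is_ready(s) for s in srcs):
                ready_at[dst] = cycle + latency
                order.append(i)
                pending.remove(i)
                break
        cycle += 1
    return max(ready_at.values()), order


example = [
    ("r1", ["a"], 3),   # e.g. a load with a (hypothetical) 3-cycle latency
    ("r2", ["r1"], 1),  # needs the loaded value
    ("r3", ["b"], 1),   # independent
    ("r4", ["c"], 1),   # independent
]
print(schedule(example, out_of_order=False))  # (6, [0, 1, 2, 3])
print(schedule(example, out_of_order=True))   # (4, [0, 2, 3, 1])
```

The in-order machine stalls behind the dependent instruction, while the out-of-order one fills the wait with the two independent instructions.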



What is VLIW?
    VLIW: very long instruction word
    A VLIW instruction consists of several operations to be
     executed in parallel
    Parallel function units with shared register file:
        [Figure: several function units share one register file and are fed by a
         common instruction decode and memory stage.]

VLIW Cluster
    Organized into clusters to accommodate available
     register bandwidth:




         [Figure: a row of clusters (cluster, cluster, ..., cluster).]




VLIW and Compilers
    VLIW requires considerably more sophisticated compiler
     technology than traditional architectures: the compiler must be able
     to extract enough parallelism to keep the instructions full.
    Many VLIWs have good compiler support.

    Contemporary VLIW processors
        TriMedia media processors by NXP (formerly Philips
         Semiconductors),
        SHARC DSP by Analog Devices,
        C6000 DSP family by Texas Instruments, and
        STMicroelectronics ST200 family based on the Lx architecture.


Static Scheduling



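     [Figure: static scheduling. Left, the expressions: c uses a and b, d uses c, and
      g uses e and f. Right, the instructions: the compiler packs the operations into
      three-slot instructions {a, b, e}, {f, c, nop}, {d, g, nop}.]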
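The packing in the figure can be reproduced with greedy list scheduling, one common way to do static scheduling (real VLIW compilers use more elaborate techniques). The sketch assumes every operation fits any of the three slots and that a result is usable in the next VLIW instruction; `pack_bundles` is a hypothetical name, and the dependences are the ones read off the figure.

```python
def pack_bundles(ops, deps, width):
    """Greedy list scheduling into VLIW instructions of `width` slots.

    ops:  operations in priority order.
    deps: op -> operations whose results it needs.
    Assumes any op fits any slot and its result is usable in the next
    VLIW instruction. Empty slots are filled with "nop".
    """
    done = set()
    remaining = list(ops)
    bundles = []
    while remaining:
        # An op may issue only if all its inputs were produced in earlier bundles.
        ready = [op for op in remaining if all(d in done for d in deps.get(op, ()))]
        bundle = ready[:width]
        if not bundle:
            raise ValueError("cyclic dependencies")
        for op in bundle:
            remaining.remove(op)
        done.update(bundle)
        bundles.append(bundle + ["nop"] * (width - len(bundle)))
    return bundles


deps = {"c": ["a", "b"], "d": ["c"], "g": ["e", "f"]}
for bundle in pack_bundles(["a", "b", "e", "f", "c", "d", "g"], deps, width=3):
    print(bundle)
# ['a', 'b', 'e']
# ['f', 'c', 'nop']
# ['d', 'g', 'nop']
```

The nops are the price of static scheduling: the compiler must fill every slot even when no independent operation is available.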
Limits in VLIW
    VLIW (at least in its original forms) has several shortcomings
     that kept it from becoming mainstream:
        VLIW instruction sets are not backward compatible between
         implementations. As wider implementations (more execution
         units) are built, the instruction set for the wider machines is
         not backward compatible with older, narrower
         implementations.

        Load responses from a memory hierarchy which includes CPU
         caches and DRAM do not give a deterministic delay of when
         the load response returns to the processor. This makes static
         scheduling of load instructions by the compiler very difficult.


EPIC
    EPIC = Explicitly parallel instruction computing.

    Used in the Intel/HP Merced (IA-64) machine.

    Incorporates several features that allow the machine to find and
     exploit increased parallelism.
        Each group of multiple software instructions is called a bundle.
         Each of the bundles has information indicating if this set of
         operations is depended upon by the subsequent bundle.
        A speculative load instruction is used as a type of data prefetch.
        A check load instruction also aids speculative loads by checking
         that a load was not dependent on a previous store.

IA-64 Instruction Format
    Instructions are bundled with tag to indicate which
     instructions can be executed in parallel:




         [Figure: a 128-bit bundle: tag | instruction 1 | instruction 2 | instruction 3]
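In IA-64 itself, the bundle's tag is called the template: a 5-bit field in the low-order bits, followed by three 41-bit instruction slots, for 5 + 3 × 41 = 128 bits. The sketch below only packs and unpacks these fields; it does not interpret template values or instruction encodings, and the function names are hypothetical.

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41
assert TEMPLATE_BITS + 3 * SLOT_BITS == 128  # the fields fill the bundle exactly


def pack_bundle(template, slots):
    """Combine a template and three instruction slots into one 128-bit integer
    (hypothetical helper; no instruction semantics are checked)."""
    if not 0 <= template < 1 << TEMPLATE_BITS or len(slots) != 3:
        raise ValueError("need a 5-bit template and exactly three slots")
    bundle = template
    for i, slot in enumerate(slots):
        if not 0 <= slot < 1 << SLOT_BITS:
            raise ValueError(f"slot {i} does not fit in {SLOT_BITS} bits")
        bundle |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def unpack_bundle(bundle):
    """Split a 128-bit bundle back into (template, [slot0, slot1, slot2])."""
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slots = [(bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & ((1 << SLOT_BITS) - 1)
             for i in range(3)]
    return template, slots


print(unpack_bundle(pack_bundle(0x10, [1, 2, 3])))  # (16, [1, 2, 3])
```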




Assembly Language
    One-to-one with instructions (more or less).
    Basic features:
        One instruction per line.
        Labels provide names for addresses (usually in first column).
        Instructions often start in later columns.
        Columns run to end of line.
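The line layout described above can be illustrated with a small parser. It is a hypothetical helper, not part of any assembler, and assumes: a label starts in the first column, `;` starts a comment that runs to the end of the line (as in the ARM examples that follow), and operands are separated by commas outside square brackets. A real assembler also has to handle directives, expressions and string literals.

```python
def split_operands(text):
    """Split 'r0,[r1,#4]' at commas that are not inside [...]."""
    parts, current, depth = [], "", 0
    for ch in text:
        if ch == "," and depth == 0:
            parts.append(current.strip())
            current = ""
            continue
        depth += (ch == "[") - (ch == "]")
        current += ch
    if current.strip():
        parts.append(current.strip())
    return parts


def parse_line(line):
    """Return (label, mnemonic, operands, comment) for one assembly line
    (hypothetical helper, assumptions as described above)."""
    code, _, comment = line.partition(";")   # a comment runs to the end of the line
    fields = code.split()
    label = None
    if fields and not code[0].isspace():     # text in the first column is a label
        label = fields.pop(0)
    mnemonic = fields[0] if fields else None
    operands = split_operands("".join(fields[1:]))
    return label, mnemonic, operands, comment.strip() or None


print(parse_line("label1     ADR   r4,c"))  # ('label1', 'ADR', ['r4', 'c'], None)
```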




ARM Instruction Set
    ARM versions.
    ARM assembly language.
    ARM programming model.
    ARM data operations.
    ARM flow of control.




ARM Versions
    ARM architecture has been extended over several
     versions.
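    Latest version covered here: ARM11 (a processor family implementing
     architecture version ARMv6; later architecture versions, such as ARMv7 and
     ARMv8, have since appeared).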
    We will concentrate on ARM7.




ARM Assembly Language Example
    Fairly standard assembly language:
     label1     ADR   r4,c
                LDR   r0,[r4] ; a comment
                ADR   r4,d
                LDR   r1,[r4]
                SUB   r0,r0,r1 ; comment (the first operand, r0, is the destination)




Pseudo-ops
    Some assembler directives don’t correspond directly to
     instructions:
        Define current address.
        Reserve storage.
        Constants.




ARM Instruction Set Format




              [Figure: ARM instruction set formats, from the ARM710T datasheet]
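The field layout of the largest instruction class, data processing, can be sketched as a decoder. The bit positions (condition in bits 31-28, I in bit 25, opcode in 24-21, S in 20, Rn in 19-16, Rd in 15-12, operand 2 in 11-0) and the opcode order follow the ARM data-processing format; the function and table names are hypothetical, and checking bits 27-26 alone does not rule out multiplies and a few other instructions that share that pattern. For reference, `ADD r0, r1, r2` assembles to 0xE0810002.

```python
CONDITIONS = ["EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
              "HI", "LS", "GE", "LT", "GT", "LE", "AL", "NV"]
OPCODES = ["AND", "EOR", "SUB", "RSB", "ADD", "ADC", "SBC", "RSC",
           "TST", "TEQ", "CMP", "CMN", "ORR", "MOV", "BIC", "MVN"]


def decode_data_processing(word):
    """Split a 32-bit ARM data-processing instruction into its fields.

    Hypothetical helper: only bits 27-26 == 00 are checked, so multiplies and
    other instructions sharing that pattern are not rejected here.
    """
    if (word >> 26) & 0b11:
        raise ValueError("not a data-processing instruction")
    return {
        "cond": CONDITIONS[word >> 28],          # bits 31-28
        "I": (word >> 25) & 1,                   # 1 = operand 2 is an immediate
        "opcode": OPCODES[(word >> 21) & 0xF],   # bits 24-21
        "S": (word >> 20) & 1,                   # 1 = update the condition flags
        "Rn": (word >> 16) & 0xF,                # first operand register
        "Rd": (word >> 12) & 0xF,                # destination register
        "operand2": word & 0xFFF,                # shifted register or rotated immediate
    }


print(decode_data_processing(0xE0810002))
# {'cond': 'AL', 'I': 0, 'opcode': 'ADD', 'S': 0, 'Rn': 1, 'Rd': 0, 'operand2': 2}
```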
ARM Data Types
    Word is 32 bits long.

    Word can be divided into four 8-bit bytes.

    ARM addresses can be 32 bits long.

    Address refers to byte.
        Address 4 starts at byte 4.

    Can be configured at power-up as either little- or big-
     endian mode.


Endianness
    Endianness: ordering of bytes within a larger object, e.g.,
     word, i.e., how a large object is stored in memory
    The Motorola 68000 is a big-endian processor.
     [Figure: memory from address 0x00..00 to 0xffffffff; the word at byte addresses
      0x00..10 to 0x00..13 is loaded into a register (bytes 3 2 1 0) under big-endian
      and under little-endian ordering.]
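Python's standard `struct` module can show both byte orders for the same 32-bit word: `<I` packs an unsigned 32-bit integer little-endian and `>I` big-endian. Each resulting byte string lists the bytes as they would appear at four consecutive addresses, lowest address first. The word value 0x11223344 is just an example.

```python
import struct

word = 0x11223344                 # example value
little = struct.pack("<I", word)  # least significant byte at the lowest address
big = struct.pack(">I", word)     # most significant byte at the lowest address

print(little.hex())  # 44332211
print(big.hex())     # 11223344

# Reading little-endian bytes as if they were big-endian scrambles the value.
print(hex(struct.unpack(">I", little)[0]))  # 0x44332211
```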
ARM Programming Model


       Registers: r0 to r15, where r15 is the PC
       CPSR (current program status register): 32 bits (bit 31 to bit 0), holding
        the N, Z, C, V condition flags in its top bits



The Program Status Registers (CPSR and SPSRs)
              Bits 31-28: N Z C V        Bits 7-5: I F T        Bits 4-0: Mode

    Condition code flags: copies of the ALU status flags (latched if the
    instruction has the "S" bit set).
      N = Negative result from ALU flag.
      Z = Zero result from ALU flag.
      C = ALU operation Carried out
      V = ALU operation oVerflowed
    Interrupt disable bits:
      I = 1, disables the IRQ.
      F = 1, disables the FIQ.
    T bit (Thumb-capable architectures, such as v4T):
      T = 0, processor in ARM state
      T = 1, processor in Thumb state
    Mode bits:
      M[4:0] define the processor mode.




Processor Modes
    The ARM has six operating modes:
        User (16) (unprivileged mode under which most tasks run)
        FIQ (17) (entered when a high priority (fast) interrupt is raised)
        IRQ (18) (entered when a low priority (normal) interrupt is
         raised)
        Supervisor (19) (entered on reset and when a Software
         Interrupt instruction is executed)
        Abort (23) (used to handle memory access violations)
        Undef (27) (used to handle undefined instructions)
    ARM Architecture Version 4 adds a seventh mode:
        System (31) (privileged mode using the same registers as user
         mode)
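The PSR layout can be checked with a small decoder. Bit positions follow the CPSR diagram (N, Z, C, V in bits 31-28; I, F, T in bits 7-5; mode in bits 4-0), and the mode numbers are the ones listed above, written in binary. The function name and the sample value 0x600000D3 (Z and C set, IRQ and FIQ disabled, Supervisor mode) are illustrative.

```python
# Mode numbers from the list above (16, 17, 18, 19, 23, 27, 31) in binary.
MODES = {0b10000: "User", 0b10001: "FIQ", 0b10010: "IRQ", 0b10011: "Supervisor",
         0b10111: "Abort", 0b11011: "Undef", 0b11111: "System"}


def decode_cpsr(psr):
    """Unpack the fields of a 32-bit CPSR/SPSR value (hypothetical helper)."""
    return {
        "N": (psr >> 31) & 1, "Z": (psr >> 30) & 1,
        "C": (psr >> 29) & 1, "V": (psr >> 28) & 1,
        "I": (psr >> 7) & 1,   # 1 = IRQ disabled
        "F": (psr >> 6) & 1,   # 1 = FIQ disabled
        "T": (psr >> 5) & 1,   # 1 = Thumb state
        "mode": MODES.get(psr & 0x1F, "reserved"),
    }


print(decode_cpsr(0x600000D3)["mode"])  # Supervisor
```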
Condition Flags

            Logical Instruction              Arithmetic Instruction

 Flag

 Negative   No meaning                       Bit 31 of the result has been set
 (N=‘1’)                                     Indicates a negative number in
                                             signed operations

 Zero       Result is all zeroes             Result of operation was zero
 (Z=‘1’)

 Carry      After Shift operation            Result was greater than 32 bits
 (C=‘1’)    ‘1’ was left in carry flag       (for a subtraction: no borrow occurred)

 oVerflow   No meaning                       Result was greater than 31 bits
 (V=‘1’)                                     Indicates a possible corruption of
                                             the sign bit in signed
                                             numbers
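How the four flags come out of a 32-bit addition or subtraction can be sketched as follows. Subtraction is modeled as a + NOT b + 1, which is why C = 1 after a subtraction means that no borrow occurred; V is set when both operands have the same sign and the result's sign differs. The function names are hypothetical.

```python
MASK = 0xFFFFFFFF


def add_flags(a, b, carry_in=0):
    """32-bit a + b + carry_in; returns (result, (N, Z, C, V))."""
    full = a + b + carry_in
    result = full & MASK
    n = result >> 31                       # bit 31 of the result
    z = int(result == 0)
    c = int(full > MASK)                   # carry out of bit 31
    # Overflow: operands share a sign that differs from the result's sign.
    v = ((a ^ result) & (b ^ result)) >> 31
    return result, (n, z, c, v)


def sub_flags(a, b):
    """32-bit a - b, computed as a + NOT b + 1 (so C = 1 means no borrow)."""
    return add_flags(a, (~b) & MASK, 1)


print(add_flags(0x7FFFFFFF, 1))  # (2147483648, (1, 0, 0, 1))
print(sub_flags(3, 5))           # (4294967294, (1, 0, 0, 0))
```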

Conditional Execution
        Most instruction sets only allow branches to be executed
         conditionally.
        However, by reusing the condition evaluation hardware, ARM
         effectively increases the number of instructions.
            All instructions contain a condition field which determines whether
             the CPU will execute them.
            Non-executed instructions soak up 1 cycle.
                  Still have to complete cycle so as to allow fetching and decoding of
                   following instructions.
        This removes the need for many branches, which stall the
         pipeline (3 cycles to refill).
            Allows very dense in-line code, without branches.
            The time penalty of not executing several conditional instructions
             is frequently less than the overhead of the branch
             or subroutine call that would otherwise be needed.
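A commonly cited illustration of conditional execution is a GCD loop with no branch inside the loop body. The ARM code in the docstring uses the GT, LT and NE conditions; the Python version models it, assuming positive inputs so that the signed comparison cannot overflow. `gcd_branch_free` is a hypothetical name.

```python
def gcd_branch_free(a, b):
    """Model of the ARM loop (r0 = a, r1 = b, positive values assumed):

        gcd  CMP   r0, r1
             SUBGT r0, r0, r1
             SUBLT r1, r1, r0
             BNE   gcd
    """
    while True:
        gt, lt = a > b, a < b   # CMP r0, r1: flags stay fixed until the next CMP
        if gt:
            a -= b              # SUBGT executes only when GT holds
        if lt:
            b -= a              # SUBLT executes only when LT holds
        if not (gt or lt):      # BNE falls through once r0 == r1
            return a


print(gcd_branch_free(12, 18))  # 6
```

Since SUBGT and SUBLT do not set the flags, at most one of them executes per pass, and the only branch left is the loop-closing BNE.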


The Condition Field
     [Figure: instruction word with the 4-bit condition field, Cond, in bits 31 to 28]

0000 = EQ - Z set (equal)
0001 = NE - Z clear (not equal)
0010 = HS / CS - C set (unsigned higher or same)
0011 = LO / CC - C clear (unsigned lower)
0100 = MI - N set (negative)
0101 = PL - N clear (positive or zero)
0110 = VS - V set (overflow)
0111 = VC - V clear (no overflow)
1000 = HI - C set and Z clear (unsigned higher)
1001 = LS - C clear or Z set (unsigned lower or same)
1010 = GE - N set and V set, or N clear and V clear (>=)
1011 = LT - N set and V clear, or N clear and V set (<)
1100 = GT - Z clear, and either N set and V set, or N clear and V clear (>)
1101 = LE - Z set, or N set and V clear, or N clear and V set (<=)
1110 = AL - always
1111 = NV - reserved.
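The table can be turned into a predicate that decides whether an instruction executes, given the current flags. This is a sketch: the function name is hypothetical, and NV (reserved) is deliberately left out. The flag values in the example are those produced by `CMP r0, r1` with r0 = 3 and r1 = 5.

```python
def condition_passed(cond, n, z, c, v):
    """Return True if an instruction with condition `cond` would execute
    (hypothetical helper; NV is reserved and not handled)."""
    checks = {
        "EQ": z == 1, "NE": z == 0,
        "CS": c == 1, "HS": c == 1,
        "CC": c == 0, "LO": c == 0,
        "MI": n == 1, "PL": n == 0,
        "VS": v == 1, "VC": v == 0,
        "HI": c == 1 and z == 0, "LS": c == 0 or z == 1,
        "GE": n == v, "LT": n != v,
        "GT": z == 0 and n == v, "LE": z == 1 or n != v,
        "AL": True,
    }
    return checks[cond]


# Flags after CMP r0, r1 with r0 = 3, r1 = 5: N=1, Z=0, C=0, V=0
print([cc for cc in ("EQ", "NE", "LT", "GT", "LO", "HI") if condition_passed(cc, 1, 0, 0, 0)])
# ['NE', 'LT', 'LO']
```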

Using and updating the Condition Field
        To execute an instruction conditionally, simply postfix it with
         the appropriate condition:
            For example an add instruction takes the form:
                  ADD r0,r1,r2            ; r0 = r1 + r2 (ADDAL)
            To execute this only if the zero flag is set:
                  ADDEQ r0,r1,r2          ; If zero flag set then…
                                           ; ... r0 = r1 + r2
        By default, data processing operations do not affect the
         condition flags (apart from the comparisons where this is the
         only effect). To cause the condition flags to be updated, the S
         bit of the instruction needs to be set by postfixing the
         instruction (and any condition code) with an “S”.
            For example to add two numbers and set the condition flags:
                  ADDS r0,r1,r2           ; r0 = r1 + r2
                                           ; ... and set flags


Data processing Instructions
        Largest family of ARM instructions, all sharing the same
         instruction format.
        Contains:
            Arithmetic operations
            Comparisons (no results - just set condition codes)
            Logical operations
            Data movement between registers
        Remember, this is a load / store architecture
            These instructions only work on registers, NOT memory.
        They each perform a specific operation on one or two
         operands.
            First operand always a register - Rn
            Second operand sent to the ALU via barrel shifter.
        We will examine the barrel shifter shortly.

Arithmetic Operations
        Operations are:
            ADD       operand1 + operand2
            ADC       operand1 + operand2 + carry
            SUB       operand1 - operand2
            SBC       operand1 - operand2 + carry - 1
            RSB       operand2 - operand1
            RSC       operand2 - operand1 + carry - 1
        Syntax:
            <Operation>{<cond>}{S} Rd, Rn, Operand2
        Examples
            ADD r0, r1, r2
            SUBGT r3, r3, #1
            RSBLES r4, r5, #5
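The six operations can be modeled on 32-bit values as below; `carry` stands for the C flag (0 or 1), and results wrap modulo 2^32 as in the hardware. The `add64` helper, a hypothetical name like `arith`, shows why ADC exists: a 64-bit addition becomes ADDS on the low words followed by ADC on the high words.

```python
MASK = 0xFFFFFFFF


def arith(op, op1, op2, carry=0):
    """32-bit result of an ARM arithmetic operation; `carry` is the C flag."""
    results = {
        "ADD": op1 + op2,
        "ADC": op1 + op2 + carry,
        "SUB": op1 - op2,
        "SBC": op1 - op2 + carry - 1,
        "RSB": op2 - op1,
        "RSC": op2 - op1 + carry - 1,
    }
    return results[op] & MASK  # wrap to 32 bits, as the hardware does


def add64(a, b):
    """Hypothetical helper: 64-bit add as ADDS (low words) then ADC (high words)."""
    low = (a & MASK) + (b & MASK)
    carry = low >> 32                     # C flag set by ADDS
    high = arith("ADC", a >> 32, b >> 32, carry)
    return (high << 32) | (low & MASK)


print(hex(arith("SUB", 3, 5)))    # 0xfffffffe (i.e. -2)
print(arith("RSB", 3, 5))         # 2
print(hex(add64(0xFFFFFFFF, 1)))  # 0x100000000
```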

Multiplication Instructions
        The basic ARM provides two multiplication instructions.
        Multiply
            MUL{<cond>}{S} Rd, Rm, Rs                 ; Rd = Rm * Rs
        Multiply Accumulate               - does addition for free
            MLA{<cond>}{S} Rd, Rm, Rs,Rn              ; Rd = (Rm * Rs) + Rn
        Restrictions on use:
            Rd and Rm cannot be the same register
                  This can be avoided by swapping Rm and Rs around. This works because
                   multiplication is commutative.
            Cannot use PC.
         These will be picked up by the assembler if overlooked.
        Operands can be considered signed or unsigned
            Up to user to interpret correctly.
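Because only the low 32 bits of the product are kept, MUL gives the same bit pattern whether the operands are read as signed or unsigned; interpreting the result is up to the programmer. The sketch below (hypothetical function names) multiplies 3 by 0xFFFFFFFF, which is 4294967295 unsigned or -1 signed.

```python
MASK = 0xFFFFFFFF


def mul(rm, rs):
    """MUL: keep only the low 32 bits of the product."""
    return (rm * rs) & MASK


def mla(rm, rs, rn):
    """MLA: (Rm * Rs) + Rn, low 32 bits."""
    return (rm * rs + rn) & MASK


def as_signed(x):
    """Interpret a 32-bit pattern as two's complement."""
    return x - (1 << 32) if x & 0x80000000 else x


minus_one = 0xFFFFFFFF      # -1 as signed, 4294967295 as unsigned
product = mul(3, minus_one)
print(hex(product))         # 0xfffffffd
print(as_signed(product))   # -3, as if both operands were signed
print(mla(2, 3, 4))         # 10
```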

Comparisons
        The only effect of the comparisons is to
            UPDATE THE CONDITION FLAGS. Thus there is no need to set the S bit.
        Operations are:
            CMP      operand1 - operand2, but result not written
            CMN      operand1 + operand2, but result not written
            TST      operand1 AND operand2, but result not written
            TEQ      operand1 EOR operand2, but result not written
        Syntax:
            <Operation>{<cond>} Rn, Operand2
        Examples:
            CMP      r0, r1
            TSTEQ    r2, #5

Logical Operations
        Operations are:
            AND     operand1 AND operand2
            EOR     operand1 EOR operand2
            ORR     operand1 OR operand2
            BIC     operand1 AND NOT operand2 [ie bit clear]
        Syntax:
            <Operation>{<cond>}{S} Rd, Rn, Operand2
        Examples:
            AND     r0, r1, r2
            BICEQ   r2, r3, #7
            EORS    r1,r3,r0

Data Movement
        Operations are:
            MOV     operand2
            MVN     NOT operand2
         Note that these make no use of operand1.
        Syntax:
            <Operation>{<cond>}{S} Rd, Operand2
        Examples:
            MOV   r0, r1
            MOVS r2, #10
            MVNEQ r1,#0


The Barrel Shifter
        The ARM doesn’t have actual shift instructions.

        Instead it has a barrel shifter which provides a
         mechanism to carry out shifts as part of other
         instructions.

        So what operations does the barrel shifter support?




Barrel Shifter - Left Shift
        Shifts left by the specified amount (multiplies by powers
         of two) e.g.
            LSL #5 = multiply by 32




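        [Figure: Logical Shift Left (LSL): bits move toward the MSB, the last bit shifted
         out goes into the carry flag (CF), and 0 enters at the LSB.]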




Barrel Shifter - Right Shifts
Logical Shift Right
• Shifts right by the specified amount (divides by powers of two), e.g.
  LSR #5 = divide by 32
  [Figure: zeros enter at the MSB; the last bit shifted out goes into the carry flag (CF).]

Arithmetic Shift Right
• Shifts right (divides by powers of two) and preserves the sign bit, for 2's
  complement operations, e.g.
  ASR #5 = divide by 32
  [Figure: the sign bit is shifted in at the MSB; the last bit shifted out goes into CF.]


Barrel Shifter - Rotations
Rotate Right (ROR)
• Similar to an ASR but the bits wrap around as they leave the LSB and appear as
  the MSB, e.g. ROR #5
• Note the last bit rotated is also used as the Carry Out.
  [Figure: Rotate Right: bits leaving the LSB re-enter at the MSB and are copied into CF.]

Rotate Right Extended (RRX)
• This operation uses the CPSR C flag as a 33rd bit.
• Rotates right by 1 bit. Encoded as ROR #0.
  [Figure: Rotate Right through Carry: CF enters at the MSB, and the bit leaving the
   LSB goes into CF.]
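The four shifts and RRX can be modeled on 32-bit values, each returning the result together with the carry out. The sketch restricts shift amounts to 1-31 for simplicity, so it ignores special encodings such as LSR #32 and register-specified amounts; the function names are hypothetical.

```python
MASK = 0xFFFFFFFF


def lsl(x, n):
    """Logical shift left by n (1-31): returns (result, carry out)."""
    return (x << n) & MASK, (x >> (32 - n)) & 1


def lsr(x, n):
    """Logical shift right by n (1-31): zeros enter at the top."""
    return x >> n, (x >> (n - 1)) & 1


def asr(x, n):
    """Arithmetic shift right by n (1-31): copies of the sign bit enter at the top."""
    signed = x - (1 << 32) if x & 0x80000000 else x
    return (signed >> n) & MASK, (x >> (n - 1)) & 1


def ror(x, n):
    """Rotate right by n (1-31): bits leaving the LSB reappear at the MSB."""
    result = ((x >> n) | (x << (32 - n))) & MASK
    return result, result >> 31   # carry out is the last bit rotated, now bit 31


def rrx(x, carry_in):
    """Rotate right by one through the carry flag (a 33-bit rotation)."""
    return (carry_in << 31) | (x >> 1), x & 1


print(lsl(3, 5))           # (96, 0): multiply by 32
print(hex(ror(1, 1)[0]))   # 0x80000000
```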



Barrel Shifter
   Barrel shifter: a hardware device that can shift or rotate a data word by any number of bits in
    a single operation. It is implemented like a multiplexer: each output can be connected to any
    input, depending on the shift distance.
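The multiplexer description can be modeled bit by bit: for a rotate right, output bit i simply selects input bit (i + distance) mod 32. This is a behavioral sketch of that idea, not a gate-level design, and the function name is hypothetical.

```python
def barrel_rotate_right(x, distance):
    """Rotate a 32-bit word right by `distance` (0-31), one multiplexer per output bit.

    Output bit i behaves like a 32-to-1 multiplexer whose select input is the
    shift distance: it picks input bit (i + distance) mod 32.
    """
    out = 0
    for i in range(32):
        src = (i + distance) % 32
        out |= ((x >> src) & 1) << i
    return out


print(hex(barrel_rotate_right(0x12345678, 8)))  # 0x78123456
```

All 32 selections happen at once in hardware, which is why the shift costs no extra cycle regardless of its distance.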




Using the Barrel Shifter: the Second Operand
     [Figure: Operand 1 goes straight to the ALU, while Operand 2 passes through the
      barrel shifter first; the ALU produces the Result.]
     Operand 2 can be:
       A register, optionally with a shift operation applied. The shift value can be
        either:
         a 5-bit unsigned integer, or
         specified in the bottom byte of another register.
       An immediate value:
         an 8-bit number,
         which can be rotated right through an even number of positions.
         The assembler will calculate the rotate for you from the constant.
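The assembler's job of finding the rotate can be sketched as a search over the 16 possible even rotations. The function names are hypothetical; this returns the first encoding found (smallest rotation), and a real assembler may also try alternatives such as substituting MVN, which this sketch does not.

```python
MASK = 0xFFFFFFFF


def encode_immediate(value):
    """Find (imm8, rot) with value == imm8 rotated right by 2*rot, or None."""
    value &= MASK
    for rot in range(16):
        # Rotating left by 2*rot undoes a rotate right by 2*rot.
        imm8 = ((value << (2 * rot)) | (value >> (32 - 2 * rot))) & MASK
        if imm8 <= 0xFF:
            return imm8, rot
    return None


def decode_immediate(imm8, rot):
    """Value represented by an 8-bit immediate rotated right by 2*rot."""
    n = 2 * rot
    return ((imm8 >> n) | (imm8 << (32 - n))) & MASK


print(encode_immediate(0x104))  # (65, 15): 0x41 rotated right by 30
print(encode_immediate(0x101))  # None: the set bits span more than 8 bits
```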
Second Operand : Shifted Register
    The amount by which the register is to be shifted is
     contained in either:
        the immediate 5-bit field in the instruction
              NO OVERHEAD
              Shift is done for free - executes in single cycle.
        the bottom byte of a register (not PC)
              Then takes extra cycle to execute
              ARM doesn’t have enough read ports to read 3 registers at
               once.
              Then same as on other processors where shift is
               separate instruction.
    If no shift is specified then a default shift is applied: LSL
     #0
        i.e. barrel shifter has no effect on value in register.




Second Operand : Using a Shifted Register
        Using a multiplication instruction to multiply by a constant means first
         loading the constant into a register and then waiting a number of internal
         cycles for the instruction to complete.
        A better solution can often be found by using some combination
         of MOVs, ADDs, SUBs and RSBs with shifts.
            Multiplications by a constant equal to a ((power of 2) ± 1) can be done in one
             cycle.
        Example: r0 = r1 * 5
         Example: r0 = r1 + (r1 * 4)
                      ADD r0, r1, r1, LSL #2
        Example: r2 = r3 * 105
         Example: r2 = r3 * 15 * 7
         Example: r2 = r3 * (16 - 1) * (8 - 1)
                      RSB r2, r3, r3, LSL #4 ; r2 = r3 * 15
                      RSB r2, r2, r2, LSL #3 ; r2 = r2 * 7
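The two shift-and-add sequences can be checked numerically. The helpers below (hypothetical names) model ADD and RSB with an LSL-shifted second operand, wrapping to 32 bits.

```python
MASK = 0xFFFFFFFF


def add_lsl(rn, rm, shift):
    """ADD rd, rn, rm, LSL #shift  ->  rn + (rm << shift)"""
    return (rn + (rm << shift)) & MASK


def rsb_lsl(rn, rm, shift):
    """RSB rd, rn, rm, LSL #shift  ->  (rm << shift) - rn"""
    return ((rm << shift) - rn) & MASK


def times5(r1):
    return add_lsl(r1, r1, 2)        # ADD r0, r1, r1, LSL #2


def times105(r3):
    r2 = rsb_lsl(r3, r3, 4)          # RSB r2, r3, r3, LSL #4 ; r2 = r3 * 15
    return rsb_lsl(r2, r2, 3)        # RSB r2, r2, r2, LSL #3 ; r2 = r2 * 7


print(times5(7), times105(3))        # 35 315
```

Each step is exact modulo 2^32, so the sequences agree with a true multiplication even when the product overflows 32 bits.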


ARM Load/Store Instructions
    LDR, LDRH, LDRB : load (half-word, byte)
    STR, STRH, STRB : store (half-word, byte)
    Addressing modes:
        register indirect : LDR r0,[r1]
        with second register : LDR r0,[r1,-r2]
        with constant : LDR r0,[r1,#4]




ARM ADR Pseudo-op
    Cannot refer to an address directly in an instruction.
    Generate value by performing arithmetic on PC.
    ADR pseudo-op generates instruction required to
     calculate address:
     ADR r1,FOO
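What ADR produces can be sketched as below. The sketch assumes ARM state, where reading the PC gives the address of the current instruction plus 8, and it does not check whether the offset fits the rotated 8-bit immediate field; the function name and addresses are hypothetical.

```python
def adr_expansion(rd, instr_addr, label_addr):
    """Return an ADD/SUB on the PC that ADR could expand to (ARM state).

    Assumes PC reads as instr_addr + 8; the offset is not checked against
    the rotated 8-bit immediate encoding.
    """
    offset = label_addr - (instr_addr + 8)
    if offset >= 0:
        return f"ADD {rd}, pc, #{offset}"
    return f"SUB {rd}, pc, #{-offset}"


print(adr_expansion("r1", 0x8000, 0x8010))  # ADD r1, pc, #8
print(adr_expansion("r1", 0x8000, 0x7FF0))  # SUB r1, pc, #24
```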




Additional addressing modes
    Base-plus-offset addressing:
     LDR r0,[r1,#16]
      Loads from location r1+16
    Auto-indexing (pre-indexing with writeback) updates the base register:
     LDR r0,[r1,#16]!
      Loads from location r1+16, then sets r1 to r1+16.
    Post-indexing fetches, then does offset:
     LDR r0,[r1],#16
      Loads r0 from r1, then adds 16 to r1.
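The three forms differ only in which address is used and whether the base register is updated. The sketch below models them with a dictionary as memory; the function name and memory contents are hypothetical, and it assumes rd and rn are different registers.

```python
def ldr(mem, regs, rd, rn, offset, mode):
    """Model LDR rd with base register rn and an immediate offset.

    mode "offset": LDR rd,[rn,#offset]    address = rn + offset, rn unchanged
    mode "pre":    LDR rd,[rn,#offset]!   address = rn + offset, then rn = address
    mode "post":   LDR rd,[rn],#offset    address = rn, then rn = rn + offset
    """
    base = regs[rn]
    if mode == "offset":
        addr = base + offset
    elif mode == "pre":
        addr = base + offset
        regs[rn] = addr
    elif mode == "post":
        addr = base
        regs[rn] = base + offset
    else:
        raise ValueError(f"unknown mode {mode!r}")
    regs[rd] = mem[addr]


mem = {0x100: 11, 0x110: 22}   # hypothetical memory contents
for mode in ("offset", "pre", "post"):
    regs = {"r0": 0, "r1": 0x100}
    ldr(mem, regs, "r0", "r1", 16, mode)
    print(mode, regs["r0"], hex(regs["r1"]))
# offset 22 0x100
# pre 22 0x110
# post 11 0x110
```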




Example: C Assignments
    C:
     x = (a + b) - c;
    Assembler:
     ADR   r4,a                 ; get address for a
     LDR   r0,[r4]    ;   get value of a
     ADR   r4,b                 ; get address for b, reusing r4
     LDR   r1,[r4]    ;   get value of b
     ADD   r3,r0,r1   ;   compute a+b
     ADR   r4,c                 ; get address for c
     LDR   r2,[r4]    ;   get value of c
     SUB   r3,r3,r2   ;   complete computation of x
     ADR   r4,x                 ; get address for x
     STR   r3,[r4]    ;   store value of x
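Hand-translated sequences like this one can be checked with a tiny interpreter. It is a sketch, not an ARM simulator: it understands only the ADR, LDR, STR, ADD, SUB and MUL forms used in these examples, has no flags or PC, and places each variable at a hypothetical address.

```python
MASK = 0xFFFFFFFF


def run_subset(source, variables):
    """Interpret the ADR/LDR/STR/ADD/SUB/MUL forms used in these examples.

    Hypothetical memory layout: variable i lives at address 0x1000 + 4*i.
    Only the register-indirect form [rN] is supported for LDR/STR.
    """
    addr_of = {name: 0x1000 + 4 * i for i, name in enumerate(variables)}
    mem = {addr_of[name]: value for name, value in variables.items()}
    regs = [0] * 16

    def r(token):
        return int(token.strip("[] ")[1:])   # "r4" or "[r4]" -> 4

    for line in source.splitlines():
        code = line.split(";")[0].strip()    # drop comments and blank lines
        if not code:
            continue
        op, operands = code.split(None, 1)
        args = [a.strip() for a in operands.split(",")]
        if op == "ADR":
            regs[r(args[0])] = addr_of[args[1]]
        elif op == "LDR":
            regs[r(args[0])] = mem[regs[r(args[1])]]
        elif op == "STR":
            mem[regs[r(args[1])]] = regs[r(args[0])]
        elif op in ("ADD", "SUB", "MUL"):
            a, b = regs[r(args[1])], regs[r(args[2])]
            result = {"ADD": a + b, "SUB": a - b, "MUL": a * b}[op]
            regs[r(args[0])] = result & MASK
        else:
            raise ValueError(f"unsupported instruction: {code}")
    return {name: mem[addr] for name, addr in addr_of.items()}


x_code = """
     ADR   r4,a       ; get address for a
     LDR   r0,[r4]    ; get value of a
     ADR   r4,b
     LDR   r1,[r4]
     ADD   r3,r0,r1   ; compute a+b
     ADR   r4,c
     LDR   r2,[r4]
     SUB   r3,r3,r2   ; complete computation of x
     ADR   r4,x
     STR   r3,[r4]    ; store value of x
"""
print(run_subset(x_code, {"a": 7, "b": 5, "c": 2, "x": 0})["x"])  # 10
```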




Example: C Assignment
    C:
     y = a*(b+c);
    Assembler:
     ADR   r4,b ; get address for b
     LDR   r0,[r4] ; get value of b
     ADR   r4,c ; get address for c
     LDR   r1,[r4] ; get value of c
     ADD   r2,r0,r1 ; compute partial result
     ADR   r4,a ; get address for a
     LDR   r0,[r4] ; get value of a
     MUL   r2,r0,r2 ; compute final value for y (Rd must differ from Rm)
     ADR   r4,y ; get address for y
     STR   r2,[r4] ; store y




Example: C Assignment
    C:
     z = (a << 2) |    (b & 15);

    Assembler:
     ADR   r4,a ; get address for a
     LDR   r0,[r4] ; get value of a
     MOV   r0,r0,LSL #2 ; perform shift
     ADR   r4,b ; get address for b
     LDR   r1,[r4] ; get value of b
     AND   r1,r1,#15 ; perform AND
     ORR   r1,r0,r1 ; perform OR
     ADR   r4,z ; get address for z
     STR   r1,[r4] ; store value for z

ARM Flow of Control
    All operations can be performed conditionally, testing the
     condition flags (N, Z, C, V) in the CPSR:
        EQ, NE, CS, CC, MI, PL, VS, VC, HI, LS,
         GE, LT, GT, LE (AL, "always", is the default)
    Branch operation:
     B #100
      Can be performed conditionally.
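Each condition suffix is a test on the N, Z, C and V flags. The sketch below models that test in C, together with the flags a CMP a,b produces (CMP computes a - b and keeps only the flags). Passing the suffix as a string and the `Flags` type are illustrative choices, not an ARM API; the flag formulas follow the ARM architecture definitions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* CPSR condition flags: Negative, Zero, Carry, oVerflow. */
typedef struct { bool n, z, c, v; } Flags;

/* Returns true if an instruction with condition suffix cond would execute. */
bool condition_passes(const char *cond, Flags f) {
    if (strcmp(cond, "EQ") == 0) return f.z;
    if (strcmp(cond, "NE") == 0) return !f.z;
    if (strcmp(cond, "CS") == 0) return f.c;
    if (strcmp(cond, "CC") == 0) return !f.c;
    if (strcmp(cond, "MI") == 0) return f.n;
    if (strcmp(cond, "PL") == 0) return !f.n;
    if (strcmp(cond, "VS") == 0) return f.v;
    if (strcmp(cond, "VC") == 0) return !f.v;
    if (strcmp(cond, "HI") == 0) return f.c && !f.z;           /* unsigned higher */
    if (strcmp(cond, "LS") == 0) return !f.c || f.z;           /* unsigned lower or same */
    if (strcmp(cond, "GE") == 0) return f.n == f.v;            /* signed >= */
    if (strcmp(cond, "LT") == 0) return f.n != f.v;            /* signed < */
    if (strcmp(cond, "GT") == 0) return !f.z && f.n == f.v;    /* signed > */
    if (strcmp(cond, "LE") == 0) return f.z || f.n != f.v;     /* signed <= */
    return true;  /* AL (always) or no suffix */
}

/* Flags set by CMP a,b, which computes a - b and discards the result. */
Flags cmp_flags(uint32_t a, uint32_t b) {
    uint32_t r = a - b;
    Flags f;
    f.n = (r >> 31) & 1;
    f.z = (r == 0);
    f.c = (a >= b);                               /* carry = NOT borrow for subtraction */
    f.v = (((a ^ b) & (a ^ r)) >> 31) & 1;        /* signed overflow of a - b */
    return f;
}
```

Comparing -3 with 5, for instance, makes LT pass (signed) while HI also passes, because 0xFFFFFFFD is larger than 5 when read as unsigned; this is why the signed (GE, LT, GT, LE) and unsigned (CS, CC, HI, LS) conditions are distinct.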

Example: if Statement
    C:
     if (a < b) { x = 5; y = c + d; } else x = c - d;



    Assembler:
; compute and test condition
          ADR r4,a ; get address for a
          LDR r0,[r4] ; get value of a
          ADR r4,b ; get address for b
          LDR r1,[r4] ; get value for b
          CMP r0,r1 ; compare a < b
          BGE fblock ; if a >= b, branch to false block

if Statement, cont’d.
; true block
        MOV r0,#5 ; generate value for x
        ADR r4,x ; get address for x
        STR r0,[r4] ; store x
        ADR r4,c ; get address for c
        LDR r0,[r4] ; get value of c
        ADR r4,d ; get address for d
        LDR r1,[r4] ; get value of d
        ADD r0,r0,r1 ; compute y
        ADR r4,y ; get address for y
        STR r0,[r4] ; store y
        B after ; branch around false block
; false block
fblock ADR r4,c ; get address for c
        LDR r0,[r4] ; get value of c
        ADR r4,d ; get address for d
        LDR r1,[r4] ; get value for d
        SUB r0,r0,r1 ; compute c-d
        ADR r4,x ; get address for x
        STR r0,[r4] ; store value of x
after ...

Conditional Instruction Implementation
; compute and test condition
   ADR r4,a ; get address for a
   LDR r0,[r4] ; get value of a
   ADR r4,b ; get address for b
   LDR r1,[r4] ; get value for b
   CMP r0,r1 ; compare a < b
; true block
   MOVLT r0,#5 ; generate value for x
   ADRLT r4,x ; get address for x
   STRLT r0,[r4] ; store x
   ADRLT r4,c ; get address for c
   LDRLT r0,[r4] ; get value of c
   ADRLT r4,d ; get address for d
   LDRLT r1,[r4] ; get value of d
   ADDLT r0,r0,r1 ; compute y
   ADRLT r4,y ; get address for y
   STRLT r0,[r4] ; store y
; false block
   ADRGE r4,c ; get address for c
   LDRGE r0,[r4] ; get value of c
   ADRGE r4,d ; get address for d
   LDRGE r1,[r4] ; get value for d
   SUBGE r0,r0,r1 ; compute c-d
   ADRGE r4,x ; get address for x
   STRGE r0,[r4] ; store value of x
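
The branching and conditional-instruction versions compute the same result: with conditional execution every instruction is fetched, but only those whose condition holds update registers or memory. The C sketch below models those semantics with a hypothetical `Vars` struct holding x and y; it illustrates the equivalence and says nothing about what code a compiler would actually emit.

```c
/* Hypothetical container for the two variables the if statement may write. */
typedef struct { int x, y; } Vars;

/* Branching form: mirrors the BGE fblock / B after version. */
Vars if_branching(int a, int b, int c, int d, Vars v) {
    if (a < b) { v.x = 5; v.y = c + d; }
    else       { v.x = c - d; }
    return v;
}

/* Predicated form: every step is evaluated, but a write only takes effect when its
   condition holds. LT and GE are exact complements after the same CMP. */
Vars if_predicated(int a, int b, int c, int d, Vars v) {
    int lt = a < b;                 /* CMP r0,r1 */
    v.x = lt  ? 5     : v.x;        /* MOVLT ... STRLT (store x) */
    v.y = lt  ? c + d : v.y;        /* ADDLT ... STRLT (store y) */
    v.x = !lt ? c - d : v.x;        /* SUBGE ... STRGE (store x) */
    return v;
}
```

In the false case y keeps its previous value in both versions, just as no STR to y executes in the assembly.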
Example: switch Statement
    C:
     switch (test) { case 0: … break; case 1: … }

    Assembler:
     ADR r2,test ; get address for test
     LDR r0,[r2] ; load value for test
     ADR r1,switchtab ; load address for switch table
     LDR r15,[r1,r0,LSL #2] ; index switch table and jump to the selected case



switchtab         DCD case0
                  DCD case1
...
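
The switch table is a jump table: each DCD entry holds a code address, the index is scaled by 4 (LSL #2) because entries are one word wide, and loading the entry into the PC transfers control. A C analogue uses an array of function pointers; the handlers `case0`/`case1` and their return values are hypothetical, and unlike the assembly above this sketch checks the index range before indexing.

```c
/* Hypothetical case handlers; each returns a value so the dispatch is observable. */
static int case0(void) { return 100; }
static int case1(void) { return 200; }

/* Table of code addresses, like the DCD entries in switchtab. */
static int (*const switchtab[])(void) = { case0, case1 };

/* Mirrors LDR r15,[r1,r0,LSL #2]: index the table by test, then transfer control.
   An out-of-range test falls through to a default result of -1. */
int dispatch(unsigned test) {
    if (test >= sizeof switchtab / sizeof switchtab[0]) return -1;
    return switchtab[test]();
}
```

The range check matters in assembly too: without a CMP and branch to a default case before the table load, an out-of-range test value would fetch an arbitrary word and jump to it.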

Example: FIR filter
    C:
     for (i=0, f=0; i<N; i++)
       f = f + c[i]*x[i];

    Assembler:
; loop initiation code
     MOV r0,#0 ; use r0 for i
     MOV r8,#0 ; use separate index for arrays
     ADR r2,N ; get address for N
     LDR r1,[r2] ; get value of N
     MOV r2,#0 ; use r2 for f
     ADR r3,c ; load r3 with base of c
     ADR r5,x ; load r5 with base of x
     CMP r0,r1 ; test i < N before the first iteration
     BGE done ; skip the loop entirely if N <= 0

FIR filter, cont’d.
; loop body
loop   LDR r4,[r3,r8] ; get c[i]
       LDR r6,[r5,r8] ; get x[i]
       MUL r4,r6,r4 ; compute c[i]*x[i] (Rd must differ from Rm on ARMv5 and earlier)
       ADD r2,r2,r4 ; add into running sum
       ADD r8,r8,#4 ; add one word offset to array index
       ADD r0,r0,#1 ; add 1 to i
       CMP r0,r1 ; exit?
       BLT loop ; if i < N, continue
done   ...
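
For reference, the loop computes a dot product of the coefficients and samples. The C sketch below mirrors it under the assumption of 32-bit int arrays; the assembly keeps two counters (i in r0 and a byte offset in r8 that advances by 4), whereas C indexing lets the compiler do that scaling. Unsigned arithmetic is used so that overflow wraps modulo 2^32 like the 32-bit MUL and ADD, instead of being undefined behavior in C.

```c
#include <stddef.h>
#include <stdint.h>

/* f = sum of c[i]*x[i] for i in [0, n); products and sum wrap modulo 2^32. */
int32_t fir(const int32_t *c, const int32_t *x, size_t n) {
    uint32_t f = 0;                                   /* running sum, like r2 */
    for (size_t i = 0; i < n; i++)                    /* test first, like CMP/BGE done */
        f += (uint32_t)c[i] * (uint32_t)x[i];         /* MUL then ADD into the sum */
    return (int32_t)f;
}
```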

ARM Subroutine Linkage
    Branch and link instruction:
     BL foo
      Copies the return address (the address of the instruction following the BL) to r14, the link register, then branches to foo.

    To return from subroutine:
     MOV r15,r14
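
The linkage can be modeled with just two registers. This is a minimal sketch, assuming ARM state with 4-byte instructions; the `Cpu` struct and the `bl`/`ret` helpers are hypothetical and only track the PC (here, the address of the instruction being executed) and r14.

```c
#include <stdint.h>

/* Hypothetical two-register model: pc = address of the current instruction, lr = r14. */
typedef struct { uint32_t pc; uint32_t lr; } Cpu;

/* BL target: r14 <- address of the next instruction, then branch to target. */
void bl(Cpu *cpu, uint32_t target) {
    cpu->lr = cpu->pc + 4;
    cpu->pc = target;
}

/* MOV r15,r14: resume at the saved return address. */
void ret(Cpu *cpu) { cpu->pc = cpu->lr; }
```

Because there is only one link register, a subroutine that itself executes BL overwrites r14 and must save it first (for example, on the stack) if it still needs to return to its caller.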

Summary
    All ARM-state instructions are 32 bits long.
    Load/store architecture
        Data processing instructions act only on registers
        Specific memory access instructions with powerful auto-indexing addressing modes.
    Most instructions execute in a single cycle.
        Some multi-register operations take longer.
    All instructions can be executed conditionally.




    67                           Processor Architecure and ARM

Mais conteúdo relacionado

Mais procurados

Arm assembly language programming
Arm assembly language programmingArm assembly language programming
Arm assembly language programmingv Kalairajan
 
Calculator design with lcd using fpga
Calculator design with lcd using fpgaCalculator design with lcd using fpga
Calculator design with lcd using fpgaHossam Hassan
 
Instruction cycle with interrupts
Instruction cycle with interruptsInstruction cycle with interrupts
Instruction cycle with interruptsShubham Jain
 
Register transfer and micro-operation
Register transfer and micro-operationRegister transfer and micro-operation
Register transfer and micro-operationNikhil Pandit
 
The ARM Architecture: ARM : ARM Architecture
The ARM Architecture: ARM : ARM ArchitectureThe ARM Architecture: ARM : ARM Architecture
The ARM Architecture: ARM : ARM Architecturesreea4
 
Embedded System Tools ppt
Embedded System Tools  pptEmbedded System Tools  ppt
Embedded System Tools pptHalai Hansika
 
ARM 32-bit Microcontroller Cortex-M3 introduction
ARM 32-bit Microcontroller Cortex-M3 introductionARM 32-bit Microcontroller Cortex-M3 introduction
ARM 32-bit Microcontroller Cortex-M3 introductionanand hd
 
Digital Electronics Question Bank
Digital Electronics Question BankDigital Electronics Question Bank
Digital Electronics Question BankMathankumar S
 
Programmable Logic Devices
Programmable Logic DevicesProgrammable Logic Devices
Programmable Logic DevicesMadhusudan Donga
 
Microprogram Control
Microprogram Control Microprogram Control
Microprogram Control Anuj Modi
 
Control Units : Microprogrammed and Hardwired:control unit
Control Units : Microprogrammed and Hardwired:control unitControl Units : Microprogrammed and Hardwired:control unit
Control Units : Microprogrammed and Hardwired:control unitabdosaidgkv
 
Ch 1 introduction to Embedded Systems (AY:2018-2019--> First Semester)
Ch 1 introduction to Embedded Systems (AY:2018-2019--> First Semester)Ch 1 introduction to Embedded Systems (AY:2018-2019--> First Semester)
Ch 1 introduction to Embedded Systems (AY:2018-2019--> First Semester)Moe Moe Myint
 
embedded systems ppt 2
embedded systems ppt 2embedded systems ppt 2
embedded systems ppt 2pavan kumar
 

Mais procurados (20)

Arm assembly language programming
Arm assembly language programmingArm assembly language programming
Arm assembly language programming
 
Calculator design with lcd using fpga
Calculator design with lcd using fpgaCalculator design with lcd using fpga
Calculator design with lcd using fpga
 
ARM Processor
ARM ProcessorARM Processor
ARM Processor
 
8255 Programmable parallel I/O
8255 Programmable parallel I/O 8255 Programmable parallel I/O
8255 Programmable parallel I/O
 
Instruction cycle with interrupts
Instruction cycle with interruptsInstruction cycle with interrupts
Instruction cycle with interrupts
 
Register transfer and micro-operation
Register transfer and micro-operationRegister transfer and micro-operation
Register transfer and micro-operation
 
The ARM Architecture: ARM : ARM Architecture
The ARM Architecture: ARM : ARM ArchitectureThe ARM Architecture: ARM : ARM Architecture
The ARM Architecture: ARM : ARM Architecture
 
Arm architecture
Arm architectureArm architecture
Arm architecture
 
Embedded System Tools ppt
Embedded System Tools  pptEmbedded System Tools  ppt
Embedded System Tools ppt
 
ARM 32-bit Microcontroller Cortex-M3 introduction
ARM 32-bit Microcontroller Cortex-M3 introductionARM 32-bit Microcontroller Cortex-M3 introduction
ARM 32-bit Microcontroller Cortex-M3 introduction
 
Digital Electronics Question Bank
Digital Electronics Question BankDigital Electronics Question Bank
Digital Electronics Question Bank
 
Interfacing of LCD with LPC2148
Interfacing of LCD with LPC2148Interfacing of LCD with LPC2148
Interfacing of LCD with LPC2148
 
ADDRESSING MODES
ADDRESSING MODESADDRESSING MODES
ADDRESSING MODES
 
loaders and linkers
 loaders and linkers loaders and linkers
loaders and linkers
 
Programmable Logic Devices
Programmable Logic DevicesProgrammable Logic Devices
Programmable Logic Devices
 
Microprogram Control
Microprogram Control Microprogram Control
Microprogram Control
 
Control Units : Microprogrammed and Hardwired:control unit
Control Units : Microprogrammed and Hardwired:control unitControl Units : Microprogrammed and Hardwired:control unit
Control Units : Microprogrammed and Hardwired:control unit
 
Ch 1 introduction to Embedded Systems (AY:2018-2019--> First Semester)
Ch 1 introduction to Embedded Systems (AY:2018-2019--> First Semester)Ch 1 introduction to Embedded Systems (AY:2018-2019--> First Semester)
Ch 1 introduction to Embedded Systems (AY:2018-2019--> First Semester)
 
embedded systems ppt 2
embedded systems ppt 2embedded systems ppt 2
embedded systems ppt 2
 
Unit vi (1)
Unit vi (1)Unit vi (1)
Unit vi (1)
 

Destaque

Processor powerpoint
Processor powerpointProcessor powerpoint
Processor powerpointbrennan_jame
 
CDA4411: Chapter 4 - Processor Technology and Architecture
CDA4411: Chapter 4 - Processor Technology and ArchitectureCDA4411: Chapter 4 - Processor Technology and Architecture
CDA4411: Chapter 4 - Processor Technology and ArchitectureFreddy San
 
Chapter 04 the processor
Chapter 04   the processorChapter 04   the processor
Chapter 04 the processorBảo Hoang
 
Basic circuit or cad
Basic circuit or cadBasic circuit or cad
Basic circuit or cadanishgoel
 
8051 Microcontroller Timer
8051 Microcontroller Timer8051 Microcontroller Timer
8051 Microcontroller Timeranishgoel
 
Llpc2148 sci
Llpc2148 sciLlpc2148 sci
Llpc2148 scianishgoel
 
8051 Microcontroller I/O ports
8051 Microcontroller I/O ports8051 Microcontroller I/O ports
8051 Microcontroller I/O portsanishgoel
 
Xilinx lca and altera flex
Xilinx lca and altera flexXilinx lca and altera flex
Xilinx lca and altera flexanishgoel
 
Embedded systems ppt iv part c
Embedded systems ppt iv   part cEmbedded systems ppt iv   part c
Embedded systems ppt iv part canishgoel
 
Serial Communication Interfaces
Serial Communication InterfacesSerial Communication Interfaces
Serial Communication Interfacesanishgoel
 
Processors - an overview
Processors - an overviewProcessors - an overview
Processors - an overviewLorenz Lo Sauer
 
8086 instruction set with types
8086 instruction set with types8086 instruction set with types
8086 instruction set with typesRavinder Rautela
 
Processor architecture design using 3 d integration technologies
Processor architecture design using 3 d integration technologiesProcessor architecture design using 3 d integration technologies
Processor architecture design using 3 d integration technologiesAvinash Reddy Penugonda
 

Destaque (20)

Processor powerpoint
Processor powerpointProcessor powerpoint
Processor powerpoint
 
CDA4411: Chapter 4 - Processor Technology and Architecture
CDA4411: Chapter 4 - Processor Technology and ArchitectureCDA4411: Chapter 4 - Processor Technology and Architecture
CDA4411: Chapter 4 - Processor Technology and Architecture
 
Presentation1(1)
Presentation1(1)Presentation1(1)
Presentation1(1)
 
Chapter 04 the processor
Chapter 04   the processorChapter 04   the processor
Chapter 04 the processor
 
Basic circuit or cad
Basic circuit or cadBasic circuit or cad
Basic circuit or cad
 
8051 Microcontroller Timer
8051 Microcontroller Timer8051 Microcontroller Timer
8051 Microcontroller Timer
 
Llpc2148 sci
Llpc2148 sciLlpc2148 sci
Llpc2148 sci
 
PLD's
PLD'sPLD's
PLD's
 
8051 Microcontroller I/O ports
8051 Microcontroller I/O ports8051 Microcontroller I/O ports
8051 Microcontroller I/O ports
 
Parallel processing extra
Parallel processing extraParallel processing extra
Parallel processing extra
 
Xilinx lca and altera flex
Xilinx lca and altera flexXilinx lca and altera flex
Xilinx lca and altera flex
 
Embedded systems ppt iv part c
Embedded systems ppt iv   part cEmbedded systems ppt iv   part c
Embedded systems ppt iv part c
 
Gre edited
Gre editedGre edited
Gre edited
 
Cpld fpga
Cpld fpgaCpld fpga
Cpld fpga
 
Lpc2148 i2c
Lpc2148 i2cLpc2148 i2c
Lpc2148 i2c
 
Serial Communication Interfaces
Serial Communication InterfacesSerial Communication Interfaces
Serial Communication Interfaces
 
Processors - an overview
Processors - an overviewProcessors - an overview
Processors - an overview
 
8086 instruction set with types
8086 instruction set with types8086 instruction set with types
8086 instruction set with types
 
Vme
VmeVme
Vme
 
Processor architecture design using 3 d integration technologies
Processor architecture design using 3 d integration technologiesProcessor architecture design using 3 d integration technologies
Processor architecture design using 3 d integration technologies
 

Semelhante a Arm

Arm processors' architecture
Arm processors'   architectureArm processors'   architecture
Arm processors' architectureDr.YNM
 
Introduction to arm architecture
Introduction to arm architectureIntroduction to arm architecture
Introduction to arm architectureZakaria Gomaa
 
The sunsparc architecture
The sunsparc architectureThe sunsparc architecture
The sunsparc architectureTaha Malampatti
 
2 introduction to arm architecture
2 introduction to arm architecture2 introduction to arm architecture
2 introduction to arm architecturesatish1jisatishji
 
Unit 4 _ ARM Processors .pptx
Unit 4 _ ARM Processors .pptxUnit 4 _ ARM Processors .pptx
Unit 4 _ ARM Processors .pptxVijayKumar201823
 
ARM 7 and 9 Core Architecture Illustration
ARM 7 and 9 Core Architecture IllustrationARM 7 and 9 Core Architecture Illustration
ARM 7 and 9 Core Architecture IllustrationJason J Pulikkottil
 
SNAPDRAGON SoC Family and ARM Architecture
SNAPDRAGON SoC Family and ARM Architecture SNAPDRAGON SoC Family and ARM Architecture
SNAPDRAGON SoC Family and ARM Architecture Abdullaziz Tagawy
 
Microcontroller(18CS44) module 1
Microcontroller(18CS44)  module 1Microcontroller(18CS44)  module 1
Microcontroller(18CS44) module 1Swetha A
 
Pragmatic optimization in modern programming - modern computer architecture c...
Pragmatic optimization in modern programming - modern computer architecture c...Pragmatic optimization in modern programming - modern computer architecture c...
Pragmatic optimization in modern programming - modern computer architecture c...Marina Kolpakova
 
EC8791 ARM Processor and Peripherals.pptx
EC8791 ARM Processor and Peripherals.pptxEC8791 ARM Processor and Peripherals.pptx
EC8791 ARM Processor and Peripherals.pptxdeviifet2015
 

Semelhante a Arm (20)

Processor types
Processor typesProcessor types
Processor types
 
Unit vi (2)
Unit vi (2)Unit vi (2)
Unit vi (2)
 
Module-2 Instruction Set Cpus.pdf
Module-2 Instruction Set Cpus.pdfModule-2 Instruction Set Cpus.pdf
Module-2 Instruction Set Cpus.pdf
 
Arm processors' architecture
Arm processors'   architectureArm processors'   architecture
Arm processors' architecture
 
Arm Lecture
Arm LectureArm Lecture
Arm Lecture
 
Introduction to arm architecture
Introduction to arm architectureIntroduction to arm architecture
Introduction to arm architecture
 
The sunsparc architecture
The sunsparc architectureThe sunsparc architecture
The sunsparc architecture
 
2 introduction to arm architecture
2 introduction to arm architecture2 introduction to arm architecture
2 introduction to arm architecture
 
Unit 4 _ ARM Processors .pptx
Unit 4 _ ARM Processors .pptxUnit 4 _ ARM Processors .pptx
Unit 4 _ ARM Processors .pptx
 
ARM 7 and 9 Core Architecture Illustration
ARM 7 and 9 Core Architecture IllustrationARM 7 and 9 Core Architecture Illustration
ARM 7 and 9 Core Architecture Illustration
 
18CS44-MODULE1-PPT.pdf
18CS44-MODULE1-PPT.pdf18CS44-MODULE1-PPT.pdf
18CS44-MODULE1-PPT.pdf
 
SNAPDRAGON SoC Family and ARM Architecture
SNAPDRAGON SoC Family and ARM Architecture SNAPDRAGON SoC Family and ARM Architecture
SNAPDRAGON SoC Family and ARM Architecture
 
ARM Architecture
ARM ArchitectureARM Architecture
ARM Architecture
 
Microcontroller(18CS44) module 1
Microcontroller(18CS44)  module 1Microcontroller(18CS44)  module 1
Microcontroller(18CS44) module 1
 
Arm arc-2016
Arm arc-2016Arm arc-2016
Arm arc-2016
 
arm-cortex-a8
arm-cortex-a8arm-cortex-a8
arm-cortex-a8
 
arm_3.ppt
arm_3.pptarm_3.ppt
arm_3.ppt
 
Pragmatic optimization in modern programming - modern computer architecture c...
Pragmatic optimization in modern programming - modern computer architecture c...Pragmatic optimization in modern programming - modern computer architecture c...
Pragmatic optimization in modern programming - modern computer architecture c...
 
Arm
ArmArm
Arm
 
EC8791 ARM Processor and Peripherals.pptx
EC8791 ARM Processor and Peripherals.pptxEC8791 ARM Processor and Peripherals.pptx
EC8791 ARM Processor and Peripherals.pptx
 

Mais de anishgoel

Computer Organization
Computer OrganizationComputer Organization
Computer Organizationanishgoel
 
Learning vhdl by examples
Learning vhdl by examplesLearning vhdl by examples
Learning vhdl by examplesanishgoel
 
Dot matrix module interface wit Raspberry Pi
Dot matrix module interface wit Raspberry PiDot matrix module interface wit Raspberry Pi
Dot matrix module interface wit Raspberry Pianishgoel
 
Input interface with Raspberry pi
Input interface with Raspberry piInput interface with Raspberry pi
Input interface with Raspberry pianishgoel
 
Learning Python for Raspberry Pi
Learning Python for Raspberry PiLearning Python for Raspberry Pi
Learning Python for Raspberry Pianishgoel
 
Raspberry Pi
Raspberry PiRaspberry Pi
Raspberry Pianishgoel
 
learning vhdl by examples
learning vhdl by exampleslearning vhdl by examples
learning vhdl by examplesanishgoel
 
Digital System Design Basics
Digital System Design BasicsDigital System Design Basics
Digital System Design Basicsanishgoel
 
digital design of communication systems
digital design of communication systemsdigital design of communication systems
digital design of communication systemsanishgoel
 
Rtos concepts
Rtos conceptsRtos concepts
Rtos conceptsanishgoel
 
Embedded systems ppt iv part d
Embedded systems ppt iv   part dEmbedded systems ppt iv   part d
Embedded systems ppt iv part danishgoel
 
Embedded systems ppt iv part b
Embedded systems ppt iv   part bEmbedded systems ppt iv   part b
Embedded systems ppt iv part banishgoel
 
Embedded systems ppt ii
Embedded systems ppt iiEmbedded systems ppt ii
Embedded systems ppt iianishgoel
 
Embedded systems ppt iii
Embedded systems ppt iiiEmbedded systems ppt iii
Embedded systems ppt iiianishgoel
 
Embedded systems ppt iv part a
Embedded systems ppt iv   part aEmbedded systems ppt iv   part a
Embedded systems ppt iv part aanishgoel
 
Embedded systems ppt i
Embedded systems ppt iEmbedded systems ppt i
Embedded systems ppt ianishgoel
 
Nios2 and ip core
Nios2 and ip coreNios2 and ip core
Nios2 and ip coreanishgoel
 
Keil tutorial
Keil tutorialKeil tutorial
Keil tutorialanishgoel
 
ARM 7 LPC 2148 lecture
ARM 7 LPC 2148 lectureARM 7 LPC 2148 lecture
ARM 7 LPC 2148 lectureanishgoel
 

Mais de anishgoel (20)

Computer Organization
Computer OrganizationComputer Organization
Computer Organization
 
Learning vhdl by examples
Learning vhdl by examplesLearning vhdl by examples
Learning vhdl by examples
 
Dot matrix module interface wit Raspberry Pi
Dot matrix module interface wit Raspberry PiDot matrix module interface wit Raspberry Pi
Dot matrix module interface wit Raspberry Pi
 
Input interface with Raspberry pi
Input interface with Raspberry piInput interface with Raspberry pi
Input interface with Raspberry pi
 
Learning Python for Raspberry Pi
Learning Python for Raspberry PiLearning Python for Raspberry Pi
Learning Python for Raspberry Pi
 
Raspberry Pi
Raspberry PiRaspberry Pi
Raspberry Pi
 
learning vhdl by examples
learning vhdl by exampleslearning vhdl by examples
learning vhdl by examples
 
Digital System Design Basics
Digital System Design BasicsDigital System Design Basics
Digital System Design Basics
 
digital design of communication systems
digital design of communication systemsdigital design of communication systems
digital design of communication systems
 
Rtos concepts
Rtos conceptsRtos concepts
Rtos concepts
 
Embedded systems ppt iv part d
Embedded systems ppt iv   part dEmbedded systems ppt iv   part d
Embedded systems ppt iv part d
 
Embedded systems ppt iv part b
Embedded systems ppt iv   part bEmbedded systems ppt iv   part b
Embedded systems ppt iv part b
 
Embedded systems ppt ii
Embedded systems ppt iiEmbedded systems ppt ii
Embedded systems ppt ii
 
Embedded systems ppt iii
Embedded systems ppt iiiEmbedded systems ppt iii
Embedded systems ppt iii
 
Embedded systems ppt iv part a
Embedded systems ppt iv   part aEmbedded systems ppt iv   part a
Embedded systems ppt iv part a
 
Embedded systems ppt i
Embedded systems ppt iEmbedded systems ppt i
Embedded systems ppt i
 
Nios2 and ip core
Nios2 and ip coreNios2 and ip core
Nios2 and ip core
 
Keil tutorial
Keil tutorialKeil tutorial
Keil tutorial
 
ESD Lab1
ESD Lab1ESD Lab1
ESD Lab1
 
ARM 7 LPC 2148 lecture
ARM 7 LPC 2148 lectureARM 7 LPC 2148 lecture
ARM 7 LPC 2148 lecture
 

Último

Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersChitralekhaTherkar
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsKarinaGenton
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 

Último (20)

Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of Powders
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its Characteristics
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 

Arm

  • 1. Processor Architecture and Advanced RISC Machine Prof. Anish Goel
  • 2. von Neumann/Princeton Architecture  Memory holds both data and instructions.  Central processing unit (CPU) fetches instructions from memory.  Separate CPU and memory distinguishes stored-program computer.  CPU registers: program counter (PC), instruction register (IR), general- purpose registers, etc.  von Neumann machines also known as stored-program computers  Named after John von Neumann who wrote First Draft of a report on the EDVAC (Electronic Discrete Variable Automatic Computer), 1952 2 Processor Architecure and ARM
  • 3. CPU + Memory address 200 PC memory data CPU 200 ADD r5,r1,r3 ADD IR r5,r1,r3 3 Processor Architecure and ARM
  • 4. Harvard Architecture address data memory data PC CPU address program memory data 4 Processor Architecure and ARM
  • 5. Princeton Arch. vs. Harvard Arch.  From programmer’s perspective, general purpose computers appear to be Princeton machines  However, modern high-performance CPUs are, at their heart, frequently designed in Harvard architecture, with added hardware outside the CPU to create the appearance of a Princeton design.  Harvard can’t use self-modifying code.  Harvard allows two simultaneous memory fetches.  Most DSPs use Harvard architecture for streaming data:  greater memory bandwidth;  more predictable bandwidth. 5 Processor Architecure and ARM
  • 6. Instruction Set Architecture (ISA)  Instruction set architecture (ISA)  interface between hardware and software  Express operations visible to the programmer or compiler writer  Portion of the computer visible to software  ISA is supported by:  Organization of programmable storage  Data type &data structures: encodings & representations  Addressing modes for data and instructions  Instruction formats  Instruction/opcode set  Exceptional conditions 6 Processor Architecure and ARM
  • 7. Application Considerations in ISA Design  Desktops  Emphasis on performance with integer and floating-point (FP) data types. Little regard for program size or power consumption  Servers  Primarily used for databases, file servers, web applications & multi-user time- sharing. Performance on integers & character strings is important. However, FP instructions are virtually in every server processor  Embedded systems  Emphasis on low cost and low power  small code size. Some instruction types, eg., FP, may be optional to reduce chip costs. 7 Processor Architecure and ARM
  • 8. Processor Internal Storage vs. ISA
    - The type of internal storage is the most basic differentiator between ISAs.
    - Classes of ISAs based on operand storage type:
      - Stack architecture: operands are implicitly on the top of the stack.
      - Accumulator architecture: one operand is implicitly in the accumulator.
      - General-purpose register (GPR) architecture: only explicit operands.
      - Extended accumulator or special-purpose register architecture: restrictions on using special registers.
  • 9. General-Purpose Register Architecture
    - Three possible choices for a GPR architecture:
      - Register-memory architecture: memory access can be part of any instruction (memory operands).
      - Register-register (load-store) architecture: only load and store instructions can access memory.
        - Almost all designs after 1980.
      - Memory-memory architecture: all operands in memory.
        - Not normally available nowadays.
  • 10. ISA Classification
    - Based on internal storage in the processor: (a) Stack, (b) Accumulator, (c) Register-memory, (d) Register-register.
    [Figure: for each class, the processor's ALU and its operand sources: top of stack (TOS), accumulator, registers and memory.]
  • 11. Comparison of ISAs
    - Code sequences for "C = A + B":

      Stack      Accumulator   Register-memory   Register-register
      Push A     Load A        Load R1, A        Load R1, A
      Push B     Add B         Add R3, R1, B     Load R2, B
      Add        Store C       Store R3, C       Add R3, R1, R2
      Pop C                                      Store R3, C

    - Operands are implicit in the stack and accumulator architectures.
    - A stack architecture offers less flexibility in execution order.
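The sketch below makes the implicit operands in the first two columns concrete. It is a minimal C model of a hypothetical stack machine and a hypothetical accumulator machine, each running its "C = A + B" sequence. The opcode names, the memory slots A, B and C, and the 16-entry stack are assumptions for illustration only and do not describe any real ISA.

```c
#include <stddef.h>

/* Hypothetical memory slots for the variables A, B and C. */
enum { A, B, C, NVARS };

/* Stack machine: operands are implicit on the top of the stack. */
typedef enum { S_PUSH, S_ADD, S_POP } StackOp;
typedef struct { StackOp op; int addr; } StackInsn;

void run_stack(const StackInsn *prog, size_t n, int mem[NVARS]) {
    int stack[16];                 /* assumed to be deep enough */
    int sp = 0;                    /* number of items on the stack */
    for (size_t i = 0; i < n; i++) {
        switch (prog[i].op) {
        case S_PUSH: stack[sp++] = mem[prog[i].addr]; break;
        case S_ADD:  sp--; stack[sp - 1] += stack[sp]; break; /* pop two, push sum */
        case S_POP:  mem[prog[i].addr] = stack[--sp]; break;
        }
    }
}

/* Accumulator machine: one operand is implicitly the accumulator. */
typedef enum { A_LOAD, A_ADD, A_STORE } AccOp;
typedef struct { AccOp op; int addr; } AccInsn;

void run_acc(const AccInsn *prog, size_t n, int mem[NVARS]) {
    int acc = 0;
    for (size_t i = 0; i < n; i++) {
        switch (prog[i].op) {
        case A_LOAD:  acc = mem[prog[i].addr]; break;
        case A_ADD:   acc += mem[prog[i].addr]; break;
        case A_STORE: mem[prog[i].addr] = acc; break;
        }
    }
}
```

The stack program {Push A, Push B, Add, Pop C} never names a register, and the accumulator program names only one memory operand per instruction. That difference is exactly the "implicit operands" point above.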
  • 12. Instruction Set Complexity
    - Depends on:
      - Number of instructions
      - Instruction formats
      - Data formats
      - Addressing modes
      - General-purpose registers (number and size)
      - Flow-control mechanisms (conditionals, exceptions)
    - Instruction set characteristics:
      - Fixed vs. variable length
      - Addressing modes
      - Number of operands
      - Types of operands
  • 13. CISC vs. RISC
    - Complex instruction set computer (CISC):
      - many addressing modes;
      - can operate directly on operands in memory;
      - many operations;
      - variable instruction length.
      - Examples: Intel x86 microprocessors and compatibles.
    - Reduced instruction set computer (RISC):
      - load/store: operands in memory must first be loaded into registers before any operation;
      - fixed instruction length (in general);
      - pipelinable instructions.
      - Examples: ARM, MIPS, Sun SPARC, PowerPC, …
  • 14. Exploiting ILP: Superscalar vs. VLIW
    - Instruction-level parallelism (ILP): independent instructions that can execute at the same time.
    - A basic RISC pipeline (usually) completes one instruction per clock cycle.
    - Superscalar machines rely on complex hardware to issue/execute multiple instructions per clock cycle:
      - faster execution;
      - more variability in execution times;
      - more expensive CPU.
    - VLIW machines rely on a sophisticated compiler to identify ILP and statically schedule parallel instructions.
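  • 15. Finding Parallelism
    - Independent operations can be performed in parallel:
          ADD r0, r0, r1
          ADD r2, r2, r3
          ADD r6, r4, r0
      The first two ADDs are independent; the third needs the new r0 produced by the first.
    [Figure: dataflow graph; r0 + r1 and r2 + r3 are computed in parallel, and the new r0 feeds r4 + r0, which produces r6.]
    - Register renaming:
          ADD r10, r0, r1
          ADD r11, r2, r3
          ADD r12, r4, r10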
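Register renaming can be sketched as a table that maps each architectural register to its current physical register. Sources are read through the table, and every destination gets a fresh register. This is a minimal sketch under stated assumptions: the `Add` struct and `rename_adds` function are hypothetical names, and fresh registers are numbered from 10 upward only so that the output matches the renamed sequence above. A real renamer would allocate from a separate physical register file and recycle freed entries.

```c
#define NARCH 16   /* architectural registers r0..r15 */

/* A three-register ADD: dst = src1 + src2 (register numbers only). */
typedef struct { int dst, src1, src2; } Add;

/* Rename a sequence of ADDs. Sources are looked up in the current map
   before the destination is remapped, so "ADD r0, r0, r1" still reads
   the old r0. Fresh destinations remove false (WAR/WAW) dependencies;
   true (RAW) dependencies remain, now expressed via the new numbers. */
void rename_adds(const Add *in, Add *out, int n, int first_free) {
    int map[NARCH];
    for (int r = 0; r < NARCH; r++) map[r] = r;   /* identity mapping at start */
    int next = first_free;
    for (int i = 0; i < n; i++) {
        out[i].src1 = map[in[i].src1];
        out[i].src2 = map[in[i].src2];
        map[in[i].dst] = next;                    /* later readers see the new name */
        out[i].dst = next++;
    }
}
```

Applied to the three ADDs with `first_free = 10`, this yields ADD r10,r0,r1 / ADD r11,r2,r3 / ADD r12,r4,r10. The third instruction's dependence on the first is preserved through r10.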
  • 16. Order of Execution
    - In-order:
      - Instructions are issued/executed in program order.
      - The machine stops issuing instructions when the next instruction can’t be dispatched.
    - Out-of-order:
      - Instructions are eligible for issue/execution once their source operands become available.
      - The machine changes the order of instructions to keep dispatching.
      - Substantially faster, but also more complex.
  • 17. What is VLIW?
    - VLIW: very long instruction word.
    - A VLIW instruction consists of several operations to be executed in parallel.
    - Parallel function units with a shared register file:
    [Figure: several function units, all connected to one shared register file, fed by instruction decode and memory.]
  • 18. VLIW Cluster
    - Organized into clusters to accommodate the available register bandwidth.
    [Figure: the machine drawn as a row of clusters.]
  • 19. VLIW and Compilers
    - VLIW requires considerably more sophisticated compiler technology than traditional architectures: the compiler must extract enough parallelism to keep the instruction slots filled.
    - Many VLIWs have good compiler support.
    - Contemporary VLIW processors:
      - TriMedia media processors by NXP (formerly Philips Semiconductors),
      - SHARC DSP by Analog Devices,
      - C6000 DSP family by Texas Instruments, and
      - STMicroelectronics ST200 family based on the Lx architecture.
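  • 20. Static Scheduling
    [Figure: operations a through g of an expression graph (left) are packed by the compiler into VLIW instruction words (right); slots with no independent operation available are filled with nop.]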
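Static scheduling can be approximated by a greedy list scheduler, sketched below in C. In each cycle it fills one instruction word with operations whose predecessors all completed in earlier cycles, and it pads the remaining slots with nops. The figure's exact dependency graph is not recoverable, so the example assumes one: c depends on a and b, d on c, f on e, and g on f. The 3-slot word width and the names `Op` and `schedule` are also assumptions. Production VLIW compilers use more elaborate heuristics, such as priority by critical-path length.

```c
#include <stdbool.h>

#define MAXOPS 16
#define WIDTH 3          /* hypothetical issue width: 3 slots per VLIW word */

/* Each operation lists up to two predecessors (-1 = none); the graph must be acyclic. */
typedef struct { char name; int pred[2]; } Op;

/* Greedy list scheduling. words[cycle][slot] receives an operation index,
   or -1 for a nop. Returns the number of instruction words produced.
   Assumes n <= MAXOPS and that `words` has at least n rows. */
int schedule(const Op *ops, int n, int words[][WIDTH]) {
    int cycle_of[MAXOPS];
    for (int i = 0; i < n; i++) cycle_of[i] = -1;   /* -1 = not yet scheduled */
    int done = 0, cycle = 0;
    while (done < n) {
        int slot = 0;
        for (int i = 0; i < n && slot < WIDTH; i++) {
            if (cycle_of[i] != -1) continue;
            bool ready = true;
            for (int k = 0; k < 2; k++) {
                int p = ops[i].pred[k];
                /* a predecessor must have been placed in an EARLIER word */
                if (p != -1 && (cycle_of[p] == -1 || cycle_of[p] >= cycle)) ready = false;
            }
            if (ready) { words[cycle][slot++] = i; cycle_of[i] = cycle; done++; }
        }
        while (slot < WIDTH) words[cycle][slot++] = -1;  /* pad with nops */
        cycle++;
    }
    return cycle;
}
```

With the assumed graph, the result is three words: {a, b, e}, {c, f, nop} and {d, g, nop}. This has the same shape as the figure's packing.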
  • 21. Limits in VLIW
    - VLIW (at least in its original forms) has several shortcomings that kept it from becoming mainstream:
      - VLIW instruction sets are not backward compatible between implementations. As wider implementations (more execution units) are built, the instruction set for the wider machines is not backward compatible with older, narrower implementations.
      - Load responses from a memory hierarchy that includes CPU caches and DRAM do not arrive after a deterministic delay. This makes static scheduling of load instructions by the compiler very difficult.
  • 22. EPIC  EPIC = Explicitly parallel instruction computing.  Used in Intel/HP Merced (IA-64) machine.  Incorporates several features to allow machine to find, exploit increased parallelism.  Each group of multiple software instructions is called a bundle. Each of the bundles has information indicating if this set of operations is depended upon by the subsequent bundle.  A speculative load instruction is used as a type of data prefetch.  A check load instruction also aids speculative loads by checking that a load was not dependent on a previous store. 22 Processor Architecure and ARM
  • 23. IA-64 Instruction Format
    - Instructions are bundled with a tag that indicates which instructions can be executed in parallel:
          | tag | instruction 1 | instruction 2 | instruction 3 |   (128 bits in total)
    - In IA-64 the tag is a 5-bit template and each instruction slot is 41 bits wide: 5 + 3 × 41 = 128.
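The sketch below shows how the 128-bit bundle splits into its fields. It stores the bundle as two 64-bit halves and extracts the fields with shifts and masks. It assumes the IA-64 layout: template in bits 0 to 4, slot 0 in bits 5 to 45, slot 1 in bits 46 to 86, and slot 2 in bits 87 to 127. The names `Bundle`, `pack`, `template_of` and `slot_of` are hypothetical helpers, and decoding the meaning of the template or of the instructions is out of scope.

```c
#include <stdint.h>

/* A 128-bit IA-64 bundle held as two 64-bit halves (lo = bits 0..63). */
typedef struct { uint64_t lo, hi; } Bundle;

#define MASK41 ((UINT64_C(1) << 41) - 1)

/* Extract `len` bits (1 <= len <= 64) starting at bundle bit `start`. */
static uint64_t bits(Bundle b, unsigned start, unsigned len) {
    uint64_t v;
    if (start >= 64)      v = b.hi >> (start - 64);
    else if (start == 0)  v = b.lo;
    else                  v = (b.lo >> start) | (b.hi << (64 - start)); /* straddles halves */
    return len == 64 ? v : v & ((UINT64_C(1) << len) - 1);
}

unsigned template_of(Bundle b) { return (unsigned)bits(b, 0, 5); }
uint64_t slot_of(Bundle b, int i) { return bits(b, 5 + 41u * (unsigned)i, 41); }

/* Build a bundle from a 5-bit template and three 41-bit slots. */
Bundle pack(unsigned tmpl, uint64_t s0, uint64_t s1, uint64_t s2) {
    s0 &= MASK41; s1 &= MASK41; s2 &= MASK41;
    Bundle b;
    b.lo = (uint64_t)(tmpl & 0x1Fu) | (s0 << 5) | (s1 << 46); /* slot 1 low 18 bits */
    b.hi = (s1 >> 18) | (s2 << 23);                           /* slot 1 high 23 bits, slot 2 */
    return b;
}
```

Slot 1 straddles the 64-bit boundary (bits 46 to 86). This is why `bits` has to combine both halves, and it is a common place for off-by-one mistakes.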
  • 24. Assembly Language
    - One-to-one with instructions (more or less).
    - Basic features:
      - One instruction per line.
      - Labels provide names for addresses (usually in the first column).
      - Instructions often start in later columns.
      - Columns run to the end of the line.
  • 25. ARM Instruction Set
    - ARM versions.
    - ARM assembly language.
    - ARM programming model.
    - ARM data operations.
    - ARM flow of control.
  • 26. ARM Versions
    - The ARM architecture has been extended over several versions.
    - Latest core family at the time of these slides: ARM11 (which implements architecture version ARMv6).
    - We will concentrate on ARM7 (e.g., the ARM7TDMI, an ARMv4T core).
  • 27. ARM Assembly Language Example
    - Fairly standard assembly language:
          label1  ADR r4,c
                  LDR r0,[r4]     ; a comment
                  ADR r4,d
                  LDR r1,[r4]
                  SUB r0,r0,r1    ; comment
    - The destination register is written first (r0 in the SUB).
  • 28. Pseudo-ops
    - Some assembler directives don’t correspond directly to instructions:
      - Define current address.
      - Reserve storage.
      - Constants.
  • 29. ARM Instruction Set Format
    [Figure: ARM instruction set formats, from the ARM710T datasheet.]
  • 30. ARM Data Types
    - A word is 32 bits long.
    - A word can be divided into four 8-bit bytes.
    - ARM addresses can be 32 bits long.
    - Addresses refer to bytes: the word at address 4 starts at byte 4 (and occupies bytes 4 to 7).
    - The processor can be configured at power-up in either little- or big-endian mode.
  • 31. Endianness
    - Endianness: the ordering of bytes within a larger object, e.g., a word; i.e., how a large object is stored in memory.
    - The 68000 is a big-endian processor.
    [Figure: a register's bytes 3 2 1 0 stored in memory. Big endian places the most significant byte (3) at the lowest address; little endian places the least significant byte (0) at the lowest address.]
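Both byte orders can be demonstrated independently of the host machine's own endianness by building the bytes explicitly with shifts. The C sketch below stores and loads a 32-bit word at a byte address in either order. The function names and the `big_endian` flag are hypothetical. The example uses address 4, matching the previous slide's point that the word at address 4 occupies bytes 4 to 7.

```c
#include <stdint.h>

/* Store word w at byte address addr; memory byte addr+i receives
   byte (3 - i) of w for big endian, byte i for little endian. */
void store_word(uint8_t *mem, uint32_t addr, uint32_t w, int big_endian) {
    for (int i = 0; i < 4; i++) {
        int shift = big_endian ? 8 * (3 - i) : 8 * i;
        mem[addr + i] = (uint8_t)(w >> shift);
    }
}

/* Inverse of store_word: reassemble the word from four bytes. */
uint32_t load_word(const uint8_t *mem, uint32_t addr, int big_endian) {
    uint32_t w = 0;
    for (int i = 0; i < 4; i++) {
        int shift = big_endian ? 8 * (3 - i) : 8 * i;
        w |= (uint32_t)mem[addr + i] << shift;
    }
    return w;
}
```

Storing 0x11223344 little-endian puts 0x44 at address 4. Storing it big-endian puts 0x11 there. Reading big-endian bytes back as little-endian yields 0x44332211, which is the classic symptom of an endianness mismatch.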
  • 32. ARM Programming Model
    - Sixteen 32-bit general registers, r0 through r15; r15 is the program counter (PC).
    - The current program status register (CPSR), whose top bits (31 to 28) are the N, Z, C, V flags.
  • 33. The Program Status Registers (CPSR and SPSRs)
    - Layout: bits 31 to 28 = N Z C V; bit 7 = I; bit 6 = F; bit 5 = T; bits 4 to 0 = Mode.
    - Condition code flags: copies of the ALU status flags (latched if the instruction has the "S" bit set).
      - N = Negative result from the ALU.
      - Z = Zero result from the ALU.
      - C = ALU operation Carried out.
      - V = ALU operation oVerflowed.
    - Interrupt disable bits:
      - I = 1 disables IRQ.
      - F = 1 disables FIQ.
    - T bit (architecture v4T only):
      - T = 0: processor in ARM state.
      - T = 1: processor in Thumb state.
    - Mode bits: M[4:0] define the processor mode.
  • 34. Processor Modes
    - The ARM has six operating modes (the number is the decimal value of M[4:0]):
      - User (16): unprivileged mode under which most tasks run.
      - FIQ (17): entered when a high-priority (fast) interrupt is raised.
      - IRQ (18): entered when a low-priority (normal) interrupt is raised.
      - Supervisor (19): entered on reset and when a Software Interrupt instruction is executed.
      - Abort (23): used to handle memory access violations.
      - Undef (27): used to handle undefined instructions.
    - ARM architecture version 4 adds a seventh mode:
      - System (31): privileged mode using the same registers as User mode.
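Putting the previous two slides together, the fields of a CPSR value can be decoded with masks. The C sketch below assumes the ARMv4T bit positions listed above (flags in bits 31 to 28, I = bit 7, F = bit 6, T = bit 5, mode in bits 4 to 0) and the mode numbers from the list. The macro and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Field masks in the CPSR (ARMv4T layout). */
#define CPSR_N (1u << 31)
#define CPSR_Z (1u << 30)
#define CPSR_C (1u << 29)
#define CPSR_V (1u << 28)
#define CPSR_I (1u << 7)   /* 1 = IRQ disabled */
#define CPSR_F (1u << 6)   /* 1 = FIQ disabled */
#define CPSR_T (1u << 5)   /* 1 = Thumb state */

/* Map the mode bits M[4:0] to a mode name; NULL for an unused encoding. */
const char *cpsr_mode_name(uint32_t cpsr) {
    switch (cpsr & 0x1Fu) {
    case 0x10: return "User";        /* 16 */
    case 0x11: return "FIQ";         /* 17 */
    case 0x12: return "IRQ";         /* 18 */
    case 0x13: return "Supervisor";  /* 19 */
    case 0x17: return "Abort";       /* 23 */
    case 0x1B: return "Undef";       /* 27 */
    case 0x1F: return "System";      /* 31 */
    default:   return NULL;
    }
}
```

For example, a CPSR value of `CPSR_I | CPSR_F | 0x13` decodes as Supervisor mode, ARM state, with both interrupt types disabled.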
  • 35. Condition Flags

      Flag            Logical instruction               Arithmetic instruction
      Negative (N=1)  No meaning                        Bit 31 of the result has been set; indicates a negative number in signed operations
      Zero (Z=1)      Result is all zeroes              Result of operation was zero
      Carry (C=1)     After a shift, '1' left in carry  Result was greater than 32 bits
      oVerflow (V=1)  No meaning                        Result was greater than 31 bits; indicates a possible corruption of the sign bit in signed numbers
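The arithmetic column of the flag table can be reproduced in C for an ADDS. The C flag is the 33rd bit of the sum. The V flag is set when both operands share a sign bit that differs from the result's sign bit. This is a minimal sketch covering addition only; the `Flags` struct and `adds` function are hypothetical names. Subtraction sets C differently on ARM, where C = 1 means no borrow.

```c
#include <stdint.h>

typedef struct { int n, z, c, v; } Flags;

/* Result and N, Z, C, V flags of a 32-bit ADDS. */
uint32_t adds(uint32_t a, uint32_t b, Flags *f) {
    uint64_t wide = (uint64_t)a + b;   /* keep the 33rd bit */
    uint32_t r = (uint32_t)wide;
    f->n = (r >> 31) & 1;              /* bit 31 of the result */
    f->z = (r == 0);
    f->c = (int)(wide >> 32);          /* result needed more than 32 bits */
    /* Overflow: a and b have the same sign, and r's sign differs from a's. */
    f->v = (int)(((~(a ^ b) & (a ^ r)) >> 31) & 1);
    return r;
}
```

For example, 0xFFFFFFFF + 1 gives 0 with Z = 1 and C = 1 but V = 0: unsigned overflow without signed overflow. In contrast, 0x7FFFFFFF + 1 gives 0x80000000 with N = 1 and V = 1 but C = 0.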
  • 36. Conditional Execution
    - Most instruction sets only allow branches to be executed conditionally.
    - By reusing the condition evaluation hardware, ARM effectively increases the number of instructions: every instruction has conditional variants.
    - All instructions contain a condition field that determines whether the CPU will execute them.
    - Non-executed instructions still take 1 cycle.
      - They have to complete a cycle so as to allow fetching and decoding of the following instructions.
    - This removes the need for many branches, which stall the pipeline (3 cycles to refill).
    - Allows very dense in-line code, without branches.
    - The time penalty of not executing several conditional instructions is frequently less than the overhead of the branch or subroutine call that would otherwise be needed.
  • 37. The Condition Field
    - The condition field (Cond) occupies bits 31 to 28 of every instruction:

      0000 = EQ      Z set (equal)
      0001 = NE      Z clear (not equal)
      0010 = HS/CS   C set (unsigned higher or same)
      0011 = LO/CC   C clear (unsigned lower)
      0100 = MI      N set (negative)
      0101 = PL      N clear (positive or zero)
      0110 = VS      V set (overflow)
      0111 = VC      V clear (no overflow)
      1000 = HI      C set and Z clear (unsigned higher)
      1001 = LS      C clear or Z set (unsigned lower or same)
      1010 = GE      N set and V set, or N clear and V clear (>=)
      1011 = LT      N set and V clear, or N clear and V set (<)
      1100 = GT      Z clear, and either N set and V set, or N clear and V clear (>)
      1101 = LE      Z set, or N set and V clear, or N clear and V set (<=)
      1110 = AL      always
      1111 = NV      reserved
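The condition table translates directly into a decision function. The C sketch below evaluates a 4-bit condition code against the N, Z, C and V flags. The signed comparisons reduce to N == V (GE) and N != V (LT), which is a compact restatement of the table's "N set and V set, or N clear and V clear". Two choices here are assumptions of the sketch, not part of the slide: the function names are hypothetical, and the reserved 1111 encoding is treated as "never execute".

```c
#include <stdbool.h>
#include <stdint.h>

/* Does an instruction with condition `cond` execute under these flags? */
bool cond_passes(unsigned cond, bool n, bool z, bool c, bool v) {
    switch (cond & 0xFu) {
    case 0x0: return z;               /* EQ */
    case 0x1: return !z;              /* NE */
    case 0x2: return c;               /* HS/CS */
    case 0x3: return !c;              /* LO/CC */
    case 0x4: return n;               /* MI */
    case 0x5: return !n;              /* PL */
    case 0x6: return v;               /* VS */
    case 0x7: return !v;              /* VC */
    case 0x8: return c && !z;         /* HI */
    case 0x9: return !c || z;         /* LS */
    case 0xA: return n == v;          /* GE */
    case 0xB: return n != v;          /* LT */
    case 0xC: return !z && n == v;    /* GT */
    case 0xD: return z || n != v;     /* LE */
    case 0xE: return true;            /* AL */
    default:  return false;           /* NV: reserved; "never" in this sketch */
    }
}

/* The condition field is the top four bits of a 32-bit ARM instruction. */
unsigned cond_field(uint32_t insn) { return insn >> 28; }
```

After CMP r0,r1 with r0 = 3 and r1 = 5, the flags are N = 1, Z = 0, C = 0, V = 0. LT therefore passes and GE fails, as expected for 3 < 5.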
  • 38. Using and updating the Condition Field
    - To execute an instruction conditionally, simply postfix it with the appropriate condition.
      - For example, an add instruction takes the form:
            ADD r0,r1,r2      ; r0 = r1 + r2 (ADDAL)
      - To execute this only if the zero flag is set:
            ADDEQ r0,r1,r2    ; if zero flag set then...
                              ; ... r0 = r1 + r2
    - By default, data processing operations do not affect the condition flags (apart from the comparisons, where this is the only effect). To cause the condition flags to be updated, the S bit of the instruction needs to be set by postfixing the instruction (and any condition code) with an "S".
      - For example, to add two numbers and set the condition flags:
            ADDS r0,r1,r2     ; r0 = r1 + r2
                              ; ... and set flags
  • 39. Data Processing Instructions
    - Largest family of ARM instructions, all sharing the same instruction format.
    - Contains:
      - Arithmetic operations
      - Comparisons (no results; they just set condition codes)
      - Logical operations
      - Data movement between registers
    - Remember, this is a load/store architecture:
      - These instructions only work on registers, NOT memory.
    - Each performs a specific operation on one or two operands:
      - The first operand is always a register, Rn.
      - The second operand is sent to the ALU via the barrel shifter.
    - We will examine the barrel shifter shortly.
  • 40. Arithmetic Operations
    - Operations are:
      - ADD   operand1 + operand2
      - ADC   operand1 + operand2 + carry
      - SUB   operand1 - operand2
      - SBC   operand1 - operand2 + carry - 1
      - RSB   operand2 - operand1
      - RSC   operand2 - operand1 + carry - 1
    - Syntax:
          <Operation>{<cond>}{S} Rd, Rn, Operand2
    - Examples:
          ADD r0, r1, r2
          SUBGT r3, r3, #1
          RSBLES r4, r5, #5
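ADC and SBC are what make multi-word arithmetic possible. The low words are combined with ADDS or SUBS, which set C. The high words then use ADC or SBC, which consume C. The C sketch below mirrors that carry chain for 64-bit values split into 32-bit halves. The register assignment in the comments (a in r1:r0, b in r3:r2) is a hypothetical example. In keeping with ARM's convention, the C flag after a subtraction means "no borrow", so SBC computes operand1 - operand2 + carry - 1.

```c
#include <stdint.h>

/* 64-bit add from 32-bit halves:
     ADDS r0, r0, r2   ; low words, sets C
     ADC  r1, r1, r3   ; high words plus carry   */
void add64(uint32_t alo, uint32_t ahi, uint32_t blo, uint32_t bhi,
           uint32_t *rlo, uint32_t *rhi) {
    uint32_t lo = alo + blo;          /* wraps modulo 2^32 */
    uint32_t c = lo < alo;            /* C flag from ADDS: wrapped means carry out */
    *rlo = lo;
    *rhi = ahi + bhi + c;             /* ADC: op1 + op2 + carry */
}

/* 64-bit subtract from 32-bit halves:
     SUBS r0, r0, r2   ; low words, C = 1 means no borrow
     SBC  r1, r1, r3   ; high words: op1 - op2 + carry - 1 */
void sub64(uint32_t alo, uint32_t ahi, uint32_t blo, uint32_t bhi,
           uint32_t *rlo, uint32_t *rhi) {
    uint32_t lo = alo - blo;
    uint32_t c = alo >= blo;          /* ARM C flag after SUBS */
    *rlo = lo;
    *rhi = ahi - bhi + c - 1u;        /* SBC */
}
```

Adding 1 to 0x00000000_FFFFFFFF carries into the high word, giving 0x00000001_00000000. Subtracting 1 from that value borrows back, giving 0x00000000_FFFFFFFF.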
  • 41. Multiplication Instructions
    - The basic ARM provides two multiplication instructions.
    - Multiply:
          MUL{<cond>}{S} Rd, Rm, Rs        ; Rd = Rm * Rs
    - Multiply Accumulate (does the addition for free):
          MLA{<cond>}{S} Rd, Rm, Rs, Rn    ; Rd = (Rm * Rs) + Rn
    - Restrictions on use:
      - Rd and Rm cannot be the same register.
        - This can be avoided by swapping Rm and Rs, which works because multiplication is commutative.
      - The PC cannot be used. The assembler will flag these cases if they are overlooked.
    - Operands can be considered signed or unsigned.
      - It is up to the user to interpret them correctly.
  • 42. Comparisons
    - The only effect of the comparisons is to UPDATE THE CONDITION FLAGS, so there is no need to set the S bit.
    - Operations are:
      - CMP   operand1 - operand2, but result not written
      - CMN   operand1 + operand2, but result not written
      - TST   operand1 AND operand2, but result not written
      - TEQ   operand1 EOR operand2, but result not written
    - Syntax:
          <Operation>{<cond>} Rn, Operand2
    - Examples:
          CMP r0, r1
          TSTEQ r2, #5
  • 43. Logical Operations
    - Operations are:
      - AND   operand1 AND operand2
      - EOR   operand1 EOR operand2
      - ORR   operand1 OR operand2
      - BIC   operand1 AND NOT operand2 [i.e., bit clear]
    - Syntax:
          <Operation>{<cond>}{S} Rd, Rn, Operand2
    - Examples:
          AND r0, r1, r2
          BICEQ r2, r3, #7
          EORS r1, r3, r0
  • 44. Data Movement
    - Operations are:
      - MOV   operand2
      - MVN   NOT operand2
    - Note that these make no use of operand1.
    - Syntax:
          <Operation>{<cond>}{S} Rd, Operand2
    - Examples:
          MOV r0, r1
          MOVS r2, #10
          MVNEQ r1, #0
  • 45. The Barrel Shifter
    - The ARM doesn’t have actual shift instructions.
    - Instead, it has a barrel shifter, which provides a mechanism to carry out shifts as part of other instructions.
    - So what operations does the barrel shifter support?
  • 46. Barrel Shifter - Left Shift
    - Logical Shift Left (LSL): shifts left by the specified amount (multiplies by powers of two), e.g., LSL #5 = multiply by 32.
    [Figure: bits move toward the MSB; the last bit shifted out goes to the carry flag (CF), and zeros enter at the LSB.]
  • 47. Barrel Shifter - Right Shifts
    - Logical Shift Right (LSR): shifts right by the specified amount (divides unsigned values by powers of two), e.g., LSR #5 = divide by 32.
      [Figure: zeros enter at the MSB; the last bit shifted out goes to the carry flag (CF).]
    - Arithmetic Shift Right (ASR): shifts right (divides by powers of two) and preserves the sign bit, for 2's complement operations, e.g., ASR #5 = divide by 32 (rounding toward minus infinity for negative values).
      [Figure: copies of the sign bit are shifted in at the MSB; the last bit shifted out goes to the carry flag (CF).]
  • 48. Barrel Shifter - Rotations
    - Rotate Right (ROR):
      - Similar to LSR, but the bits wrap around as they leave the LSB and reappear as the MSB, e.g., ROR #5.
      - Note that the last bit rotated is also used as the carry out.
    - Rotate Right Extended (RRX):
      - This operation uses the CPSR C flag as a 33rd bit.
      - Rotates right by 1 bit. Encoded as ROR #0.
    [Figure: ROR feeds the bit leaving the LSB back into the MSB and into CF; RRX rotates through CF, with the old C entering the MSB.]
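The five shifter operations from the last three slides can be modelled in C, each also reporting the shifter's carry-out. This is a minimal sketch under stated assumptions: shift amounts are limited to 1 to 31, as with the immediate shift field, and the special cases for register-specified amounts of 0 or 32 and above are not modelled. The function names are hypothetical. ASR is built from unsigned operations because right-shifting a negative signed value is implementation-defined in C.

```c
#include <stdint.h>

/* Each helper returns the shifted value and writes the carry-out
   (the last bit shifted or rotated out) to *c. n must be 1..31. */
uint32_t lsl(uint32_t x, unsigned n, int *c) { *c = (x >> (32 - n)) & 1; return x << n; }
uint32_t lsr(uint32_t x, unsigned n, int *c) { *c = (x >> (n - 1)) & 1; return x >> n; }

uint32_t asr(uint32_t x, unsigned n, int *c) {
    *c = (x >> (n - 1)) & 1;
    uint32_t r = x >> n;
    if (x & 0x80000000u) r |= ~(0xFFFFFFFFu >> n);   /* copy the sign bit into the top n bits */
    return r;
}

uint32_t ror(uint32_t x, unsigned n, int *c) {
    uint32_t r = (x >> n) | (x << (32 - n));
    *c = (r >> 31) & 1;                               /* last bit rotated out is the new MSB */
    return r;
}

/* RRX: rotate right by one through the carry flag (a 33-bit rotation). */
uint32_t rrx(uint32_t x, int *c) {
    uint32_t r = (x >> 1) | ((uint32_t)(*c & 1) << 31); /* old C enters at the MSB */
    *c = (int)(x & 1);                                  /* old LSB becomes the new C */
    return r;
}
```

For example, `asr(0xFFFFFFC0, 5, &c)` returns 0xFFFFFFFE: -64 / 32 = -2 with the sign preserved. `lsr` on the same input would instead give a large positive number.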
  • 49. Barrel Shifter  Barrel shifter: a hardware device that can shift or rotate a data word by any number of bits in a single operation. It is implemented like a multiplexor, each output can be connected to any input depending on the shift distance. ECE 692 L02-ISA.49 Processor Architecure and ARM
  • 50. Using the Barrel Shifter: the Second Operand
    - Operand 1 goes straight to the ALU; operand 2 passes through the barrel shifter first.
    - Operand 2 can be:
      - A register, optionally with a shift operation applied. The shift amount can be either:
        - a 5-bit unsigned integer, or
        - specified in the bottom byte of another register.
      - An immediate value:
        - an 8-bit number,
        - which can be rotated right through an even number of positions.
        - The assembler will calculate the rotate for you from the constant.
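The immediate-operand rule determines which constants fit in a single instruction: a value fits if it equals an 8-bit number rotated right by an even amount. The C sketch below performs the search the assembler does on your behalf. It tries each even rotation r and checks whether rotating the value left by r (undoing a right rotation) leaves something that fits in 8 bits. The function names are hypothetical. The 4-bit rotate field of the instruction would hold r / 2.

```c
#include <stdbool.h>
#include <stdint.h>

static uint32_t rol32(uint32_t x, unsigned r) {
    r &= 31;
    return r ? (x << r) | (x >> (32 - r)) : x;
}

/* Try to express `value` as imm8 rotated right by an even amount
   (0, 2, ..., 30). On success, store imm8 and the rotate amount. */
bool arm_imm_encode(uint32_t value, uint32_t *imm8, unsigned *rot) {
    for (unsigned r = 0; r < 32; r += 2) {
        uint32_t candidate = rol32(value, r);   /* undo a right rotation by r */
        if (candidate <= 0xFFu) {
            *imm8 = candidate;
            *rot = r;
            return true;
        }
    }
    return false;   /* not encodable as a single immediate operand */
}
```

0x104 is encodable as 0x41 rotated right by 30, and 0xF000000F as 0xFF rotated right by 4 (the set bits wrap around). 0x102 is not encodable, because reaching it would need an odd rotation, and neither is 0x1FE.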
  • 51. Second Operand: Shifted Register
    - The amount by which the register is to be shifted is contained in either:
      - the immediate 5-bit field in the instruction:
        - NO OVERHEAD: the shift is done for free and executes in a single cycle;
      - or the bottom byte of a register (not the PC):
        - then it takes an extra cycle to execute, because the ARM doesn’t have enough read ports to read 3 registers at once;
        - this is then the same as on other processors, where a shift is a separate instruction.
    - If no shift is specified, a default shift is applied: LSL #0,
      - i.e., the barrel shifter has no effect on the value in the register.
  • 52. Second Operand: Using a Shifted Register
    - Using a multiplication instruction to multiply by a constant means first loading the constant into a register and then waiting a number of internal cycles for the instruction to complete.
    - A better solution can often be found by using some combination of MOVs, ADDs, SUBs and RSBs with shifts.
      - Multiplications by a constant equal to ((power of 2) ± 1) can be done in one cycle.
    - Example: r0 = r1 * 5
               r0 = r1 + (r1 * 4)
          ADD r0, r1, r1, LSL #2
    - Example: r2 = r3 * 105
               r2 = r3 * 15 * 7
               r2 = r3 * (16 - 1) * (8 - 1)
          RSB r2, r3, r3, LSL #4    ; r2 = r3 * 15
          RSB r2, r2, r2, LSL #3    ; r2 = r2 * 7
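The two shift-and-add sequences can be checked quickly in C. The sketch mirrors each ARM instruction with one C expression, using unsigned 32-bit arithmetic so that wrap-around matches the registers. The function names are hypothetical. The identities hold modulo 2^32, so they are exact for all inputs, not just small ones.

```c
#include <stdint.h>

/* r0 = r1 * 5 via ADD r0, r1, r1, LSL #2  (r1 + 4*r1) */
uint32_t mul5(uint32_t r1) { return r1 + (r1 << 2); }

/* r2 = r3 * 105 via two RSBs: 105 = 15 * 7 = (16 - 1) * (8 - 1) */
uint32_t mul105(uint32_t r3) {
    uint32_t r2 = (r3 << 4) - r3;   /* RSB r2, r3, r3, LSL #4 : r2 = r3 * 15 */
    r2 = (r2 << 3) - r2;            /* RSB r2, r2, r2, LSL #3 : r2 = r2 * 7  */
    return r2;
}
```

RSB is used rather than SUB because the shifted operand must be operand 2, and the term being subtracted is the unshifted register. RSB computes operand2 - operand1.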
  • 53. ARM Load/Store Instructions
    - LDR, LDRH, LDRB: load (word, half-word, byte)
    - STR, STRH, STRB: store (word, half-word, byte)
    - Addressing modes:
      - register indirect: LDR r0,[r1]
      - with second register: LDR r0,[r1,-r2]
      - with constant: LDR r0,[r1,#4]
  • 54. ARM ADR Pseudo-op
    - An address cannot be referred to directly in an instruction.
    - Instead, generate the value by performing arithmetic on the PC.
    - The ADR pseudo-op generates the instruction required to calculate the address (typically an ADD or SUB with the PC as the base):
          ADR r1,FOO
  • 55. Additional Addressing Modes
    - Base-plus-offset addressing: LDR r0,[r1,#16]
      - Loads from location r1+16; r1 is unchanged.
    - Auto-indexing increments the base register: LDR r0,[r1,#16]!
      - Loads from r1+16, then leaves r1+16 in r1.
    - Post-indexing fetches, then applies the offset: LDR r0,[r1],#16
      - Loads r0 from r1, then adds 16 to r1.
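The three addressing modes differ only in when the offset is applied and whether the base register is written back. The C sketch below models each mode as a function that takes a pointer to the base register, so the update is visible to the caller. It assumes a small word-aligned memory array; `memory`, `MEM_WORDS` and the function names are hypothetical, and unaligned access is not modelled.

```c
#include <stdint.h>

/* Tiny word-addressed memory model; addresses must be multiples of 4. */
#define MEM_WORDS 64
static uint32_t memory[MEM_WORDS];
static uint32_t rd(uint32_t addr) { return memory[addr / 4]; }

/* LDR r0,[r1,#off]  : base plus offset; r1 unchanged */
uint32_t ldr_offset(uint32_t *r1, int32_t off) { return rd(*r1 + (uint32_t)off); }

/* LDR r0,[r1,#off]! : pre-indexed with write-back; r1 becomes r1 + off */
uint32_t ldr_preindex_wb(uint32_t *r1, int32_t off) {
    *r1 += (uint32_t)off;
    return rd(*r1);
}

/* LDR r0,[r1],#off  : post-indexed; load from r1, then r1 += off */
uint32_t ldr_postindex(uint32_t *r1, int32_t off) {
    uint32_t v = rd(*r1);
    *r1 += (uint32_t)off;
    return v;
}
```

Post-indexing is the natural fit for walking an array, because each load leaves the base pointing at the next element.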
  • 56. Example: C Assignments
    - C:
          x = (a + b) - c;
    - Assembler:
          ADR r4,a        ; get address for a
          LDR r0,[r4]     ; get value of a
          ADR r4,b        ; get address for b, reusing r4
          LDR r1,[r4]     ; get value of b
          ADD r3,r0,r1    ; compute a+b
          ADR r4,c        ; get address for c
          LDR r2,[r4]     ; get value of c
          SUB r3,r3,r2    ; complete computation of x
          ADR r4,x        ; get address for x
          STR r3,[r4]     ; store value of x
  • 57. Example: C Assignment  C: y = a*(b+c);  Assembler: ADR r4,b ; get address for b LDR r0,[r4] ; get value of b ADR r4,c ; get address for c LDR r1,[r4] ; get value of c ADD r2,r0,r1 ; compute partial result ADR r4,a ; get address for a LDR r0,[r4] ; get value of a MUL r2,r2,r0 ; compute final value for y ADR r4,y ; get address for y STR r2,[r4] ; store y 57 Processor Architecure and ARM
  • 58. Example: C Assignment  C: z = (a << 2) | (b & 15);  Assembler: ADR r4,a ; get address for a LDR r0,[r4] ; get value of a MOV r0,r0,LSL 2 ; perform shift ADR r4,b ; get address for b LDR r1,[r4] ; get value of b AND r1,r1,#15 ; perform AND ORR r1,r0,r1 ; perform OR ADR r4,z ; get address for z STR r1,[r4] ; store value for z 58 Processor Architecure and ARM
  • 59. ARM Flow of Control
    - All operations can be performed conditionally, testing the CPSR:
      - EQ, NE, CS, CC, MI, PL, VS, VC, HI, LS, GE, LT, GT, LE
    - Branch operation: B #100
      - Can be performed conditionally.
  • 60. Example: if Statement
    - C:
          if (a < b) { x = 5; y = c + d; } else x = c - d;
    - Assembler:
          ; compute and test condition
          ADR r4,a        ; get address for a
          LDR r0,[r4]     ; get value of a
          ADR r4,b        ; get address for b
          LDR r1,[r4]     ; get value for b
          CMP r0,r1       ; compare a < b
          BGE fblock      ; if a >= b, branch to false block
  • 61. if Statement, cont’d.
                  ; true block
                  MOV r0,#5       ; generate value for x
                  ADR r4,x        ; get address for x
                  STR r0,[r4]     ; store x
                  ADR r4,c        ; get address for c
                  LDR r0,[r4]     ; get value of c
                  ADR r4,d        ; get address for d
                  LDR r1,[r4]     ; get value of d
                  ADD r0,r0,r1    ; compute y
                  ADR r4,y        ; get address for y
                  STR r0,[r4]     ; store y
                  B after         ; branch around false block
                  ; false block
          fblock  ADR r4,c        ; get address for c
                  LDR r0,[r4]     ; get value of c
                  ADR r4,d        ; get address for d
                  LDR r1,[r4]     ; get value for d
                  SUB r0,r0,r1    ; compute c-d
                  ADR r4,x        ; get address for x
                  STR r0,[r4]     ; store value of x
          after   ...
  • 62. Conditional Instruction Implementation
          ; compute and test condition
          ADR r4,a        ; get address for a
          LDR r0,[r4]     ; get value of a
          ADR r4,b        ; get address for b
          LDR r1,[r4]     ; get value for b
          CMP r0,r1       ; compare a < b
          ; true block
          MOVLT r0,#5     ; generate value for x
          ADRLT r4,x      ; get address for x
          STRLT r0,[r4]   ; store x
          ADRLT r4,c      ; get address for c
          LDRLT r0,[r4]   ; get value of c
          ADRLT r4,d      ; get address for d
          LDRLT r1,[r4]   ; get value of d
          ADDLT r0,r0,r1  ; compute y
          ADRLT r4,y      ; get address for y
          STRLT r0,[r4]   ; store y
          ; false block
          ADRGE r4,c      ; get address for c
          LDRGE r0,[r4]   ; get value of c
          ADRGE r4,d      ; get address for d
          LDRGE r1,[r4]   ; get value for d
          SUBGE r0,r0,r1  ; compute c-d
          ADRGE r4,x      ; get address for x
          STRGE r0,[r4]   ; store value of x
  • 63. Example: switch Statement
    - C:
          switch (test) { case 0: … break; case 1: … }
    - Assembler:
                     ADR r2,test             ; get address for test
                     LDR r0,[r2]             ; load value for test
                     ADR r1,switchtab        ; load address for switch table
                     LDR r15,[r1,r0,LSL #2]  ; index switch table; loading the PC jumps to the case
          switchtab  DCD case0
                     DCD case1
                     ...
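A jump table in C is an array of function pointers indexed by the switch value. It plays the same role as the DCD entries in `switchtab`: the table lookup scales the index by the entry size, just as LSL #2 scales by 4 bytes per entry, and then control transfers to the selected entry. This is a minimal sketch under stated assumptions. The handlers `case0` and `case1` and their return values are hypothetical stand-ins, and the range check is an addition, since the assembly assumes `test` is a valid index.

```c
/* Hypothetical case handlers standing in for the code at case0, case1. */
static int case0(void) { return 100; }
static int case1(void) { return 200; }

/* switchtab: one code address per case, like the DCD entries. */
static int (*const switchtab[])(void) = { case0, case1 };

/* Index the table with `test` and jump to the selected case. */
int dispatch(unsigned test) {
    if (test >= sizeof switchtab / sizeof switchtab[0])
        return -1;                     /* out of range: act as a default case */
    return switchtab[test]();
}
```

Compilers commonly emit this kind of table for dense switch statements. Lookup cost is then constant regardless of the number of cases, whereas a chain of compares grows with the case count.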
  • 64. Example: FIR Filter
    - C:
          for (i=0, f=0; i<N; i++)
              f = f + c[i]*x[i];
    - Assembler:
          ; loop initiation code
          MOV r0,#0       ; use r0 for i
          MOV r8,#0       ; use separate index for arrays
          ADR r2,N        ; get address for N
          LDR r1,[r2]     ; get value of N
          MOV r2,#0       ; use r2 for f
          ADR r3,c        ; load r3 with base of c
          ADR r5,x        ; load r5 with base of x
  • 65. FIR Filter, cont’d.
                  ; loop body
          loop    LDR r4,[r3,r8]  ; get c[i]
                  LDR r6,[r5,r8]  ; get x[i]
                  MUL r4,r6,r4    ; compute c[i]*x[i] (Rd must differ from Rm)
                  ADD r2,r2,r4    ; add into running sum
                  ADD r8,r8,#4    ; add one word offset to array index
                  ADD r0,r0,#1    ; add 1 to i
                  CMP r0,r1       ; exit?
                  BLT loop        ; if i < N, continue
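The C reference below shows the same FIR loop with the assembly's register roles in the comments. It keeps a separate byte offset (like r8) that advances by 4 per element while `i` (like r0) counts elements. The function name `fir` and the use of 32-bit signed elements are assumptions. One behavioural difference is worth noting: the C `for` loop tests its condition before the first iteration, while the assembly tests at the bottom. The assembly therefore runs its body at least once and effectively assumes N ≥ 1.

```c
#include <stddef.h>
#include <stdint.h>

/* f = sum of c[i] * x[i] for i = 0..n-1, mirroring the assembly loop. */
int32_t fir(const int32_t *c, const int32_t *x, int n) {
    int32_t f = 0;                                   /* r2: running sum */
    size_t off = 0;                                  /* r8: byte offset into both arrays */
    for (int i = 0; i < n; i++) {                    /* r0: element counter */
        int32_t ci = *(const int32_t *)((const char *)c + off);  /* LDR r4,[r3,r8] */
        int32_t xi = *(const int32_t *)((const char *)x + off);  /* LDR r6,[r5,r8] */
        f += ci * xi;                                /* MUL, then ADD into r2 */
        off += sizeof(int32_t);                      /* ADD r8,r8,#4 */
    }
    return f;
}
```

With c = {1, 2, 3} and x = {4, 5, 6}, the sum is 4 + 10 + 18 = 32.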
  • 66. ARM Subroutine Linkage
    - Branch and link instruction:
          BL foo
      - Copies the return address (the address of the instruction after the BL) into r14, the link register.
    - To return from the subroutine:
          MOV r15,r14
  • 67. Summary
    - All instructions are 32 bits long (in ARM state).
    - Load/store architecture:
      - Data processing instructions act only on registers.
      - Specific memory access instructions with powerful auto-indexing addressing modes.
    - Most instructions operate in a single cycle.
      - Some multi-register operations take longer.
    - All instructions can be executed conditionally.