SlideShare uma empresa Scribd logo
1 de 49
Baixar para ler offline
The World Leader in High Performance Signal Processing Solutions




Core Architecture Overview



           Presented by:
         George Kadziolka
             President
          Kaztek Systems
About This Module
    This module introduces the Blackfin® family and provides
    an overview of the Blackfin processor architecture.




2                                 Core Architecture Overview
Module Outline
    Blackfin Family Overview
    The Blackfin Core
     • Arithmetic operations
     • Data fetching
     • Sequencing

    The Blackfin Bus Architecture and Memory
     • Modified Harvard architecture
     • Hierarchical memory structure
     • Flexible memory management

    Additional Blackfin Core Features
     • DMA
     • Dynamic power management
     • On-chip debug support

    Summary



3                                      Core Architecture Overview
Blackfin Family Overview
    The Blackfin family consists of:
     • A broad range of Blackfin processors
     • Software development tools
     • Hardware evaluation and debug tools

    Extensive third-party support
     • Development tools
     • Operating systems
     • TCP/IP stacks
     • Hardware building blocks
     • Software solutions




4                                     Core Architecture Overview
Blackfin Processors
    All Blackfin processors combine extensive DSP capability with
    high end MCU functions on the same core.
     • Creates a highly efficient and cost-effective solution.
     • A single software development tool chain

    All Blackfin processors are based on the same core architecture.
     • Once you understand one Blackfin processor, you can easily migrate
       from one family member to another.
     • Code compatible across family members.

    Processors vary in clock speed, amount of on-chip memory,
    peripheral suite, package types and sizes, power, and price.
     • Large selection lets you optimize your choice of a Blackfin processor
      for your application.




5                                        Core Architecture Overview
Blackfin Family Peripherals
    The Blackfin family supports a wide variety of I/O:
     • EBIU (External Bus Interface Unit)
     • Parallel peripheral interface (PPI)
     • Serial ports (SPORTS)
     • GPIO
     • Timers
     • UARTS
     • SPI®
     • Ethernet
     • USB
     • CAN®
     • Two Wire Interface (TWI)
     • Pixel compositor
     • Lockbox™ secure technology
     • Host DMA
     • ATAPI
     • SDIO




      See the Blackfin selection guide for complete details



6                                                             Core Architecture Overview
Blackfin Processors Perform
    Signal Processing and Microcontroller Functions
                   Traditional Model

    MCU                                                                                 ASIC
    • Control                                         Signal Processing                  • Interfaces to sensors
    • Networking                                                                         • Broad peripheral mix
    • RTC                                                                                • Memory
                               MCU                    Signal Processing          ASIC
    • Watchdog
    • RTOS
                                                      Signal Processing
    • MMU
    • Byte addressable




                             New Model
                             Blackfin core can
                             perform all of these
                             functions




7                                                   Core Architecture Overview
Blackfin Architecture
    What does it mean for the developer?
    Combining controller and DSP capabilities into a single core, along
    with rich I/O, enables development of efficient, low cost embedded
    media applications.
     • For example, multimedia over IP, digital cameras, telematics,
      software radio
    From a development perspective, a single core means there
    is only one tool chain.
     • An embedded application consisting of both control and signal
       processing modules is built using the same compiler.
     • The result is dense control code and high performance DSP code.




8                                      Core Architecture Overview
Features
    Controller
     •L1 memory space for stack and heap
     •Dedicated stack and frame pointers
     •Byte addressability
     •Simple bit-level manipulation

    DSP
     • Fast, flexible arithmetic computational units
     • Unconstrained data flow to/from computational units
     • Extended precision and dynamic range
     • Efficient sequencing
     • Efficient I/O processing
     • The DSP aspect of the Blackfin core is optimized to perform FFTs and
      convolutions


              y[n] = ∑k =0 x[n − k ] ∗ h[k ]
                              N −1




9                                       Core Architecture Overview
Blackfin Core (e.g., ADSP-BF54x)




10                           Core Architecture Overview
The Blackfin Core




11      Core Architecture Overview
The Blackfin Core
     The core consists of:
     Arithmetic unit
      • Supports SIMD operation
      • Load/store architecture

     Addressing unit
      • Supports dual data fetch

     Sequencer
      • Efficient program flow control

     Register files
      • Data
      • Addressing




12                                       Core Architecture Overview
The Arithmetic Unit
     Performs arithmetic operations
     Dual 40-bit ALU (Arithmetic/
     Logic Unit)
      • Performs 16-/32-/40-bit
       arithmetic and logical
       operations
     Dual 16 x 16 multiplier
      • Performs dual MACs
       (multiply-accumulates) when
       used with ALUs
     Barrel shifter
      • Performs shifts, rotates, bit
       operations




13                                      Core Architecture Overview
Data Registers
     There are 8x 32-bit registers in the data register file.
      • Used to hold 32-bit vales or packed 16-bit

     There are also 2x 40-bit accumulators.
      • Typically used for MAC operations
                                                                      Data Registers
                                                                      39     31          15          0
                                                                      A0.X        A0.H        A0.L
                                                                      A1.X        A1.H        A1.L

                                                                             31          15          0
                                                                      R0          R0.H        R0.L
                                                                      R1          R1.H        R1.L
                                                                      R2
                                                                      R3
                                                                      R4          R4.H        R4.L
                                                                      R5
                                                                      R6
                                                                      R7          R7.H        R7.L




14                                       Core Architecture Overview
16-Bit ALU Operations—Examples

     The Algebraic Assembly syntax is intuitive and makes it easy to
     understand what the instruction is doing.

     Single 16-bit operation     Dual 16-bit operation                       Quad 16-bit operation
     R6.H = R3.H + R2.L (s);     R6 = R2 + | - R3;                       R3 = R0 - | - R1, R2 = R0 + | + R1;

         31     16    0              31    16      0                          31    16    0          31   16   0
                          R2                           R2                                     R0                   R0
                          R3                           R3                                     R1                   R1

               +                       + -                                         - -                + +
                          R6                           R6                                     R2                   R3



              These operations effectively execute in a single cycle.




15                                              Core Architecture Overview
32-Bit ALU Operations—Examples

     Single 32-bit addition      Dual 32-bit operation
     R6 = R2 + R3;                R3 = R1 - R2, R4 = R1 + R2;

         31           0             31           0                31           0
                          R2                         R1                            R1
                          R3                         R2                            R2

               +                          -                                +
                          R6                         R4                            R3




 These operations effectively execute in a single cycle.




16                                            Core Architecture Overview
Dual MAC Operations—Example
     Both MACs can be used at the same time to double the MAC throughput. The
     same two 32-bit input registers must be used (R2 and R3 in this example).

                              Dual MAC operation

                                                             R2
                                                             R3

                                              x x
                                        -                  +
                                                    A0

                                               A1
                                A1 -= R2.H * R3.H, A0 += R2.L * R3.L;




       These operations effectively execute in a single cycle.
         NOTE: This can happen in parallel with a dual data fetch.



17                                                     Core Architecture Overview
Barrel Shifter
     Enable shifting or rotating any number of bits within a 16-/32-/40-bit
     register in a single cycle
     Perform individual bit operations on 32-bit data register contents
      • BITSET, BITCLR, BITTGL, BITTST

     Field Extract and Deposit instructions
      • Extract or insert a field of bits out of or into a 32-bit data register




18                                          Core Architecture Overview
8-Bit Video ALUs




                                            Four Video ALUs




19                      Core Architecture Overview
8-Bit ALU Operations
     Four 8-bit ALUs provide parallel computational power targeted mainly
     for video operations.
      • Quad 8-bit add/subtract
      • Quad 8-bit average
      • SAA (Subtract-Absolute-Accumulate) instruction

     A quad 8-bit ALU instruction takes one cycle to complete.


                        64-Bit/8-Byte Field                           64-Bit/8-Byte Field
                         R3           R2                               R1           R0



                       4 Bytes                                                   4 Bytes

                                         Four 8-Bit Video ALUs

                                                            32-Bits

                                              Data Register File




20                                               Core Architecture Overview
Additional Arithmetic Instructions
     There are a number of specialized instructions that are used
     to speed up the inner loop on various algorithms.
     Bitwise XOR
      • Enable creating LFSR (Linear Feedback Shift Registers) for use in CRC
       calculations or the generation of PRN sequences
     Bit stream multiplexing, add on sign, compare select
      • Convolutional encoder and Viterbi decoder support

     Add/Subtract with prescale up/down
      • IEEE 1180–compliant 2D 8 x 8 DCTs (Discrete Cosine Transforms)

     Vector search
      • Enable search a vector a pair at a time for greatest or least value




21                                       Core Architecture Overview
The Addressing Unit


The addressing unit generates
addresses for data fetches.
      •Two DAG (Data Address
      Generator) arithmetic units
      enable generation of
      independent 32-bit wide
      addresses that can reach
      anywhere within the Blackfin
      memory space.
           •Up to two fetches can occur
           at the same time.




22                                        Core Architecture Overview
Address Registers
     There are 6x general-purpose Pointer
     Registers.
                                                                                              31                  0
      • Used for GP 8-/16-/32-bit fetches                                                              P0
                                                                   Address
       from memory
                                                                   Registers                           P1
                                                                                                                      Pointer Registers:
     There are four sets of registers used                                                             P2             P0-P5 are referred to
                                                                                                                      as “preg.”
     for DSP-style data accesses.                                                                      P3
                                                                                                       P4
      • Used for 16-/32-bit DSP data fetches
       such as dual data fetch, circular buffer                                                        P5

       addressing, and bit reversal                                                                    FP

     There are also dedicated stack (SP)                                                               SP

     and frame (FP) pointers.                                                                          USP
                                                         31        0 31           0 31             0    31        0
      • These are used for 32-bit accesses to
                                                              I0             L0          B0                  M0
       stack frames.                                                                                                  Index Registers:
                                                              I1             L1          B1                  M1       I0-I3 are referred to
                                                                                                                      as “ireg.”
                                                              I2             L2          B2                  M2
                                                              I3             L3          B3                  M3




23                                              Core Architecture Overview
Addressing
     Addressing Unit supports:
      • Addressing only
        • With specified Pointer or Index Register
      • Provide address and post modify
        • Add an offset after the fetch is done
        • Circular buffering supported with this method
      • Provide address with an offset
        • Add an offset before the fetch, but no pointer update
      • Update address only
      • Modify address with reverse carry add

     All addressing is Register Indirect.
      • Index Registers I0-I3 (32-/16-bit accesses)
      • Pointer Registers P0–P5 (32-/16-/8-bit accesses)
      • Stack and Frame Pointer Registers (32-bit accesses)

     All addresses are Byte addresses.
      • Ordering is Little Endian.
      • Addresses must be aligned for the word size being fetched.
        • i.e., 32-bit fetches from addresses that are a multiple of four




24                                                      Core Architecture Overview
Circular Buffer Example
Example                       Address
Base address (B) and           0x00     0x00000001         1st Access        0x00000001
Starting address (I) = 0                0x00000002                           0x00000002
                               0x04                                                       4th Access
Buffer length                  0x08     0x00000003                           0x00000003
L = 44
                               0x0C     0x00000004                           0x00000004
(There are 11 data elements
and each data element is       0x10     0x00000005        2nd Access         0x00000005
4-bytes)
                               0x14     0x00000006                           0x00000006   5th Access
Modify value M = 16            0x18     0x00000007                           0x00000007
(4 elements *
4-bytes/element)               0x1C     0x00000008                           0x00000008
                               0x20     0x00000009         3rd Access        0x00000009
Example memory access:
                               0x24     0x0000000A                           0x0000000A
R1 = [I0 ++ M2];               0x28     0x0000000B                           0x0000000B



     • The Addressing Unit supports Circular Buffer pointer addressing.
     • The process of boundary checking and pointer wrapping to stay in bounds happens
       in hardware with no overhead.
     • Buffers can be placed anywhere in memory without restriction due to the Base
       address registers.



25                                              Core Architecture Overview
The Sequencer
The sequencer’s function is to
generate addresses for fetching
instructions.
     • Uses a variety of registers to select
      the next address
Aligns instructions as they
are fetched
     • Always reads 64 bits from memory
     • Realigns what is fetched into
      individual 16-/32-/64-bit opcodes
      before sending to the execution
      pipeline
Handles events
     • Interrupts and exceptions

Conditional execution




26                                             Core Architecture Overview
Program Flow




27                  Core Architecture Overview
Sequencer Registers
     These are registers that are used in the control of program flow.
     Arithmetic Status (ASTAT) tracks status bits for core operations.
      • Used in conditional execution of instructions.
     Return address registers.
      • Hold the 32-bit return address for program flow interruptions.
     Two sets of hardware loop management registers.
      • They manage up to two nested levels of zero overhead looping.
         • There are no core cycles spent managing the loop.
         • Looped code runs as efficiently as straight line code.


                         Arithmetic Status          LC0                                    Loop Counter
               ASTAT
                                                                          LT0               Loop Top
                         Subroutine Return                                           LB0    Loop Bottom
               RETS

                RETI     Interrupt Return           LC1
                                                                          LT1
               RETX      Exception Return                                            LB1

               RETN      NMI Return             SYSCFG               System Config
               RETE      Emulation Return       SEQSTAT              Sequencer Status




28                                           Core Architecture Overview
Instruction Pipeline
     The pipeline is fully interlocked. In the event of a data hazard,
     the sequencer automatically inserts stall cycles.
               Pipeline Stage                              Description
                                     Issue instruction address to IAB bus, start compare
           Instruction Fetch 1 (IF1)
                                     tag of instruction cache
           Instruction Fetch 2 (IF2) Wait for instruction data
           Instruction Fetch 3 (IF3) Read from IDB bus and align instruction
          Instruction Decode (DEC) Decode instructions
                                     Calculation of data addresses and branch target
          Address Calculation (AC)
                                     address
                                      Issue data address to DA0 and DA1 bus, start
              Data Fetch 1 (DF1)
                                     compare tag of data cache
             Data Fetch 2 (DF2)      Read register files
                                     Read data from LD0 and LD1 bus, start multiply and
               Execute 1 (EX1)
                                     video instructions
              Execute 2 (EX2)
                                     Execute/Complete instructions (shift, add, logic, etc.)
                                     Writes back to register files, SD bus, and pointer
              Write Back (WB)
                                     updates (also referred to as the “commit” stage)


29                                       Core Architecture Overview
Instruction Pipeline
                IF1        IF2        IF3       DEC          AC            DF1     DF2       EX1       EX2       WB
     Cycle
N            INST #1
N+1          INST #2    INST #1
N+2          INST #3    INST #2    INST #1
N+3          INST #4    INST #3    INST #2    INST #1
N+4          INST #5    INST #4    INST #3    INST #2    INST #1
N+5          INST #6    INST #5    INST #4    INST #3    INST #2       INST #1
N+6          INST #7    INST #6    INST #5    INST #4    INST #3       INST #2   INST #1
N+7          INST #8    INST #7    INST #6    INST #5    INST #4       INST #3   INST #2   INST #1
N+8          INST #9    INST #8    INST #7    INST #6    INST #5       INST #4   INST #3   INST #2   INST #1
N+9          INST #10   INST #9    INST #8    INST #7    INST #6       INST #5   INST #4   INST #3   INST #2   INST #1
N+10         INST #11   INST #10   INST #9    INST #8    INST #7       INST #6   INST #5   INST #4   INST #3   INST #2
N+11         INST #12   INST #11   INST #10   INST #9    INST #8       INST #7   INST #6   INST #5   INST #4   INST #3
N+12         INST #13   INST #12   INST #11   INST #10   INST #9       INST #8   INST #7   INST #6   INST #5   INST #4


     Once the pipe is filled (e.g., at cycle N+9 in this case), one instruction will exit
     the pipeline on each core clock tick.
      • i.e., effectively, one instruction per core clock cycle executed




30                                                  Core Architecture Overview
Blackfin Event Handling
     Blackfin events include:
      • Interrupts
         • Generated by hardware (e.g., DMA complete) or software
      • Exceptions
         • Error condition or service related

     Handling is split between the CEC (Core Event Controller) and SIC
     (System Interrupt Controller).
      • CEC has 16 levels and deals with requests on a priority basis.
         • The nine lowest levels are general purpose and are used for Peripheral Interrupt Requests.
         • Each level has a 32-bit Interrupt Vector Register that points to the start of the ISR for
           that level.
      • SIC allows enabling Peripheral Interrupt Requests and mapping to specific
       CEC GP levels.
         • Allows setting peripheral request priorities




31                                                Core Architecture Overview
System and Core Interrupts (ADSP-BF533 example)
     System Interrupt Source         IVG #1
                                                                                                  Core Event
     PLL Wakeup interrupt            IVG7
                                                                     Event Source   IVG #         Name
     DMA error (generic)             IVG7
                                                                                                                      Highest
                                                          Emulator                   0            EMU
     PPI error interrupt             IVG7

     SPORT0 error interrupt          IVG7                 Reset                      1            RST

     SPORT1 error interrupt          IVG7                 Non Maskable Interrupt     2            NMI
     SPI error interrupt             IVG7                 Exceptions                 3            EVSW
     UART error interrupt            IVG7
                                                          Reserved                   4            -
     RTC interrupt                   IVG8
                                                          Hardware Error             5            IVHW                     P
     DMA 0 interrupt (PPI)           IVG8
                                                          Core Timer                 6            IVTMR                    R
     DMA 1 interrupt (SPORT0 RX)     IVG9                                                                                  I
                                                          General Purpose 7          7            IVG7                     O
     DMA 2 interrupt (SPORT0 TX)     IVG9
                                                                                                                           R
     DMA 3 interrupt (SPORT1 RX)     IVG9                 General Purpose 8          8            IVG8
                                                                                                                           I
     DMA 4 interrupt (SPORT1 TX)     IVG9                 General Purpose 9          9            IVG9                     T
     DMA 5 interrupt (SPI)           IVG10                                                                                 Y
                                                          General Purpose 10         10           IVG10
     DMA 6 interrupt (UART RX)       IVG10
                                                          General Purpose 11         11           IVG11
     DMA 7 interrupt (UART TX)       IVG10

     Timer0 interrupt                IVG11                General Purpose 12         12           IVG12

     Timer1 interrupt                IVG11                General Purpose 13         13           IVG13
     Timer2 interrupt                IVG11
                                                          General Purpose 14         14           IVG14
     PF interrupt A                  IVG12
                                                          General Purpose 15         15           IVG15
     PF interrupt B                  IVG12
                                                                                                                      Lowest
     DMA 8/9 interrupt (MemDMA0)     IVG13
                                                                                     1 Note:
     DMA 10/11 interrupt (MemDMA1)   IVG13                                                     Default IVG configuration shown.
     Watchdog timer interrupt        IVG13


32                                            Core Architecture Overview
Variable Instruction Lengths
     The Blackfin architecture uses three instruction opcode lengths to obtain the
     best code density while maintaining high performance.
     16-bit instructions
      • Most control-type instructions and data fetches are 16-bits long to improve code density.

     32-bit instructions
      • Most control-type instructions with an immediate value in the expression and most
       arithmetic instructions are 32-bits in length.
     Multi-issue 64-bit instructions
      • Certain 32-bit instructions can be executed in parallel with a pair of specific 16-bit
       instructions and specified with one 64-bit instruction.
        • Typically a 32-bit ALU/MAC instruction and one or two data fetch instructions
      • The delimiter symbol to separate instructions in a multi-issue instruction is a double pipe
       character “||.”

      Example:
                       A1+=R0.L*R1.H, A0+=R0.L*R1.L || r0 = [i0++] || r1 = [i1++];




33                                                    Core Architecture Overview
Instruction Packing
     When code is compiled and linked,                                   16-bit OP
                                                                         32-bit OP
     the instructions are packed into                                    64-bit Multi-OP
     memory as densely as possible.                                                    Instruction Formats
      • i.e., no wasted memory space
                                                                                           15       0



     No memory alignment restrictions
     for code:
      • Instructions can be placed anywhere
        in memory.
      • The sequencer fetches 64-bits of
        instruction at a time (from 8-byte
        boundaries) and performs an alignment
        operation to:
        • Isolate shorter opcodes                                                      16-bit wide memory
        • Realign larger opcodes that straddle 4-/8-byte
         boundaries                                                        No Memory Alignment Restrictions:
      • This realignment hardware is                                      Maximum Code Density and Minimum
       transparent to the user.                                                 System Memory Cost




34                                                    Core Architecture Overview
16-bit FIR Filter Example—0.5 cycles per tap
           31                               0

                  Input          Input          R0
                                                                        Input data
                                                                        delay line                          Coefficients
                Coefficient   Coefficient       R1
                                                                                         MAC into A0
                    MAC1              MAC0                   R0L              x0                                  x0          R1L
                                                                                                   A1
                                                                                              into
                                                                                         M AC

                x                 x                          R0H              x1         MAC into A0

                                                                                                   to A 1
                                                                                                                  x1          R1H
                                                                                              AC in
                      +                  +      A0
                                                             R0L              x2          M
                                                                                                                  x2          R1L

                                                                              x3                                  x3          R1H
                                                             R0H
                                                A1
      39                                    0




           Loop:    A1+=R0.H*R1.L, A0+=R0.L*R1.L || R0.L = [I0++] || nop;
           Loopend: A1+=R0.L*R1.H, A0+=R0.H*R1.H || R0.H = [I0++] || R1 = [I1++];



 Samples in R0                                                        • Performs operations in support of two filter outputs in
                                  Coefficients in R1
                                                                        each clock cycle



35                                                     Core Architecture Overview
Bus and Memory Architecture




36           Core Architecture Overview
Blackfin Memory Hierarchy
     The Blackfin architecture uses a memory hierarchy with a primary goal
     of achieving memory performance similar to that of the fastest memory
     (i.e., L1) with an overall cost close to that of the least expensive
     memory (i.e., L3).
      • Portions of L1 can be configured as cache, which allows increased memory
        performance by prefetching and storing copies of code/data from L2/L3.




                             L1 Memory                                        L2 Memory             L3 Memory
      CORE
                                 Internal                               Internal (if present)   External (e.g., SDRAM)
     (Registers)            Smallest capacity                              Larger than L1          Largest capacity
                           Single-cycle access                           Multicycle access          Highest latency




37                                               Core Architecture Overview
Internal Bus Structure of the ADSP-BF533




38                               Core Architecture Overview
Configurable Memory
     The best system performance can be achieved when executing code or
     fetching data out of L1 memory.
     Two methods can be used to fill the L1 memory—caching and dynamic
     downloading–the Blackfin processor supports both.
     • Microcontrollers have typically used the caching method, as they have
       large programs often residing in external memory and determinism is not
       as important.
     • DSPs have typically used dynamic downloading (e.g. code overlays), as they
       need direct control over which code runs in the fastest memory.
     The Blackfin processor allows the programmer to choose one or both
     methods to optimize system performance.
     • Portions of L1 instruction and data memory can be configured to be used as
      SRAM or as cache.
       • Enabling these 16K Byte blocks for use as cache reduces the amount of L1 available as
        SRAM. However, there is still space available for code, such as critical functions or interrupt
        handlers.




39                                             Core Architecture Overview
Cache and Memory Management
     Cache allows users to take advantage of single-cycle memory without
     having to specifically move instructions and or data “manually.”
      • L2/L3 memory can be used to hold large programs and data sets.
      • The paths to and from L1 memory are optimized to perform with
       cache enabled.
     Cache automatically optimizes code and data by keeping recently used
     information in L1.
      • LRU (Least Recently Used) algorithm is used for determining which cache line
       to replace.
     Cache is managed by the memory management unit though a number
     of CPLBs (Cachability, Protection, and Lookaside Buffers).
      • The CPLBs divide up the Blackfin memory space into pages and allow
       individual page control of memory management items such as:
        • User/Supervisor Access Protection
        • Read/Write Access Protection
        • Cacheable or Non-Cacheable




40                                            Core Architecture Overview
Additional Blackfin Core Features




41              Core Architecture Overview
Direct Memory Access (DMA)
     The Blackfin processors includes a DMA (Direct Memory Access)
     facility on all CPUs.
     • Enable moving data between memory and a peripheral, or memory to memory
      without using any core cycles
       • Core only involved in setting up DMA parameters
     • Core is interrupted when DMA completes, thereby improving efficiency of data
      flow operations.
       • Alternatively, DMA status allows polling operations.
     • The DMA controller has dedicated buses to connect between the DMA unit
      itself and:
       • Peripherals
       • External memory (e.g., L3 or SDRAM)
       • Core memory (e.g., L1)




42                                             Core Architecture Overview
Power Management Options
     Low active power
     • Flexible power management with automatic power-down for unused
       peripheral sections.
     • Dynamic power management allows dynamic modification of both
       frequency and voltage.
       • PLL can multiply CLKIN from 1x to 64x.
       • Core voltage can be optimized for the operating frequency.

     Low standby power
     • 5 power modes
        • Full on, active, sleep, deep sleep, hibernate
     • Real-time clock with alarm and wakeup features
        • RTC has its own power supply and clock.
        • It is unaffected by hardware or software reset.




43                                               Core Architecture Overview
Blackfin Processors Optimize Power Consumption

                                                          Vdd
             1.2V, 500 MHz

          Processor Operation                                       t
                                                                                                 0.9V, 250 MHz
                         PLL                                                                Processor Operation
                        Settling
                                                         0.8V, 100 MHz
                                                                                             PLL
                             Regulator                                                      Settling
                                                      Processor Operation      Regulator
          Power              Transition
                                                                               Transition
       Consumption

                                            Just varying the frequency



     mW


                                               Dynamic Power Management



                             Varying the voltage and frequency



44                                                Core Architecture Overview
Power Mode Transitions
     Under software control,
     the application can change
     from one power state
     to another.
     A set of libraries (i.e.,
     System Services) enable
     control of clocking, core
     voltage, and power states
     with just a function call.




45                                Core Architecture Overview
Hardware Debug Support




46         Core Architecture Overview
Advanced Support for Embedded Debug
     The JTAG port is also used to provide in-circuit emulation capabilities.
      • Nonintrusive debugging
      • BTC (background telemetry channel)


     Hardware breakpoints
      • Six instruction and two data watchpoints


     Performance monitor unit
      • Counters for cycles and occurrences of specific activities


     Execution trace buffer
      • Stores last 16 nonincremental PC values




47                                      Core Architecture Overview
Summary
     The Blackfin core has a number of features and instructions that
     support both control and DSP types of operations.
     The Blackfin’s high performance memory/bus architecture supports
     zero wait state instruction and dual fetches from L1 at the core clock
     rate, all at the same time.
      • Large applications that reside in the larger but slower external memory still
       benefit from the high performance L1 memory through the use of caching
       and/or dynamic downloading
     The Blackfin architecture enables both efficient implementation
     of media-related applications and efficient development of
     those applications.




48                                       Core Architecture Overview
Resources
     For detailed information on the Blackfin architecture, please refer to the
     Analog Devices website which has links to manuals, data sheets, FAQs,
     Knowledge Base, sample code, development tools and much more:
      • www.analog.com/blackfin



     For specific questions click on the “Ask a question” button.


     Kaztek Systems provides worldwide technical training on processor
     architecture and systems development using Analog Devices
     Processors. For more information
     Visit www.kaztek.com or Email info@kaztek.com




49                                   Core Architecture Overview

Mais conteúdo relacionado

Mais procurados

RISC - Reduced Instruction Set Computing
RISC - Reduced Instruction Set ComputingRISC - Reduced Instruction Set Computing
RISC - Reduced Instruction Set ComputingTushar Swami
 
Msp 430 architecture module 1
Msp 430 architecture module 1Msp 430 architecture module 1
Msp 430 architecture module 1SARALA T
 
Design of Accumulator Unit
Design of Accumulator UnitDesign of Accumulator Unit
Design of Accumulator UnitHarshad Koshti
 
ARM architcture
ARM architcture ARM architcture
ARM architcture Hossam Adel
 
Embedded system - Introduction to interfacing with peripherals
Embedded system - Introduction to interfacing with peripheralsEmbedded system - Introduction to interfacing with peripherals
Embedded system - Introduction to interfacing with peripheralsVibrant Technologies & Computers
 
8096 microcontrollers notes
8096 microcontrollers notes8096 microcontrollers notes
8096 microcontrollers notesDr.YNM
 
single stage amplifier Unit 5 AMVLSI
single stage amplifier Unit 5 AMVLSIsingle stage amplifier Unit 5 AMVLSI
single stage amplifier Unit 5 AMVLSIHarsha Raju
 
Basic Processing Unit
Basic Processing UnitBasic Processing Unit
Basic Processing UnitSlideshare
 
Peripheral Component Interconnect (PCI)
Peripheral Component Interconnect (PCI)Peripheral Component Interconnect (PCI)
Peripheral Component Interconnect (PCI)Suraj B.V.S.G
 

Mais procurados (20)

RISC - Reduced Instruction Set Computing
RISC - Reduced Instruction Set ComputingRISC - Reduced Instruction Set Computing
RISC - Reduced Instruction Set Computing
 
Principles of computer design
Principles of computer designPrinciples of computer design
Principles of computer design
 
Msp 430 architecture module 1
Msp 430 architecture module 1Msp 430 architecture module 1
Msp 430 architecture module 1
 
Design of Accumulator Unit
Design of Accumulator UnitDesign of Accumulator Unit
Design of Accumulator Unit
 
Ccp
CcpCcp
Ccp
 
Unit vi (1)
Unit vi (1)Unit vi (1)
Unit vi (1)
 
ARM architcture
ARM architcture ARM architcture
ARM architcture
 
Microcontroller 8096
Microcontroller 8096Microcontroller 8096
Microcontroller 8096
 
Embedded system - Introduction to interfacing with peripherals
Embedded system - Introduction to interfacing with peripheralsEmbedded system - Introduction to interfacing with peripherals
Embedded system - Introduction to interfacing with peripherals
 
Unit 4
Unit 4Unit 4
Unit 4
 
8096 microcontrollers notes
8096 microcontrollers notes8096 microcontrollers notes
8096 microcontrollers notes
 
single stage amplifier Unit 5 AMVLSI
single stage amplifier Unit 5 AMVLSIsingle stage amplifier Unit 5 AMVLSI
single stage amplifier Unit 5 AMVLSI
 
Unit 3 mpmc
Unit 3 mpmcUnit 3 mpmc
Unit 3 mpmc
 
Actel fpga
Actel fpgaActel fpga
Actel fpga
 
Basic Processing Unit
Basic Processing UnitBasic Processing Unit
Basic Processing Unit
 
Pic16f84
Pic16f84Pic16f84
Pic16f84
 
ARM Processors
ARM ProcessorsARM Processors
ARM Processors
 
8051 memory
8051 memory8051 memory
8051 memory
 
Peripheral Component Interconnect (PCI)
Peripheral Component Interconnect (PCI)Peripheral Component Interconnect (PCI)
Peripheral Component Interconnect (PCI)
 
180 hybrid-coupler
180 hybrid-coupler180 hybrid-coupler
180 hybrid-coupler
 

Semelhante a Blackfin core architecture

2 introduction to arm architecture
2 introduction to arm architecture2 introduction to arm architecture
2 introduction to arm architecturesatish1jisatishji
 
The sunsparc architecture
The sunsparc architectureThe sunsparc architecture
The sunsparc architectureTaha Malampatti
 
Arm architecture chapter2_steve_furber
Arm architecture chapter2_steve_furberArm architecture chapter2_steve_furber
Arm architecture chapter2_steve_furberasodariyabhavesh
 
MIPS Assembly Language I
MIPS Assembly Language IMIPS Assembly Language I
MIPS Assembly Language ILiEdo
 
Basics Of Embedded Systems
Basics Of Embedded SystemsBasics Of Embedded Systems
Basics Of Embedded Systemsarlabstech
 
Hardware assited x86 emulation on godson 3
Hardware assited x86 emulation on godson 3Hardware assited x86 emulation on godson 3
Hardware assited x86 emulation on godson 3Takuya ASADA
 
Introduction to arm architecture
Introduction to arm architectureIntroduction to arm architecture
Introduction to arm architectureZakaria Gomaa
 
My seminar new 28
My seminar new 28My seminar new 28
My seminar new 28rajeshkvdn
 
High Performance Computing Infrastructure: Past, Present, and Future
High Performance Computing Infrastructure: Past, Present, and FutureHigh Performance Computing Infrastructure: Past, Present, and Future
High Performance Computing Infrastructure: Past, Present, and Futurekarl.barnes
 
Microchip's PIC Micro Controller
Microchip's PIC Micro ControllerMicrochip's PIC Micro Controller
Microchip's PIC Micro ControllerMidhu S V Unnithan
 
PIC introduction + mapping
PIC introduction + mappingPIC introduction + mapping
PIC introduction + mappingOsaMa Hasan
 

Semelhante a Blackfin core architecture (20)

2 introduction to arm architecture
2 introduction to arm architecture2 introduction to arm architecture
2 introduction to arm architecture
 
The sunsparc architecture
The sunsparc architectureThe sunsparc architecture
The sunsparc architecture
 
Arm architecture chapter2_steve_furber
Arm architecture chapter2_steve_furberArm architecture chapter2_steve_furber
Arm architecture chapter2_steve_furber
 
Mips and sparc
Mips and sparcMips and sparc
Mips and sparc
 
MIPS Assembly Language I
MIPS Assembly Language IMIPS Assembly Language I
MIPS Assembly Language I
 
8085 MICROPROCESSOR
8085 MICROPROCESSOR 8085 MICROPROCESSOR
8085 MICROPROCESSOR
 
Basics Of Embedded Systems
Basics Of Embedded SystemsBasics Of Embedded Systems
Basics Of Embedded Systems
 
Hardware assited x86 emulation on godson 3
Hardware assited x86 emulation on godson 3Hardware assited x86 emulation on godson 3
Hardware assited x86 emulation on godson 3
 
Introduction to arm architecture
Introduction to arm architectureIntroduction to arm architecture
Introduction to arm architecture
 
Processor types
Processor typesProcessor types
Processor types
 
My seminar new 28
My seminar new 28My seminar new 28
My seminar new 28
 
ARM Architecture
ARM ArchitectureARM Architecture
ARM Architecture
 
High Performance Computing Infrastructure: Past, Present, and Future
High Performance Computing Infrastructure: Past, Present, and FutureHigh Performance Computing Infrastructure: Past, Present, and Future
High Performance Computing Infrastructure: Past, Present, and Future
 
MicroProcessors
MicroProcessors MicroProcessors
MicroProcessors
 
arm.pptx
arm.pptxarm.pptx
arm.pptx
 
Microchip's PIC Micro Controller
Microchip's PIC Micro ControllerMicrochip's PIC Micro Controller
Microchip's PIC Micro Controller
 
ARM cortex A15
ARM cortex A15ARM cortex A15
ARM cortex A15
 
PIC introduction + mapping
PIC introduction + mappingPIC introduction + mapping
PIC introduction + mapping
 
Arm Lecture
Arm LectureArm Lecture
Arm Lecture
 
SoC FPGA Technology
SoC FPGA TechnologySoC FPGA Technology
SoC FPGA Technology
 

Mais de Pantech ProLabs India Pvt Ltd

Choosing the right processor for embedded system design
Choosing the right processor for embedded system designChoosing the right processor for embedded system design
Choosing the right processor for embedded system designPantech ProLabs India Pvt Ltd
 

Mais de Pantech ProLabs India Pvt Ltd (20)

Registration process
Registration processRegistration process
Registration process
 
Choosing the right processor for embedded system design
Choosing the right processor for embedded system designChoosing the right processor for embedded system design
Choosing the right processor for embedded system design
 
Brain Computer Interface
Brain Computer InterfaceBrain Computer Interface
Brain Computer Interface
 
Electric Vehicle Design using Matlab
Electric Vehicle Design using MatlabElectric Vehicle Design using Matlab
Electric Vehicle Design using Matlab
 
Image processing application
Image processing applicationImage processing application
Image processing application
 
Internet of Things using Raspberry Pi
Internet of Things using Raspberry PiInternet of Things using Raspberry Pi
Internet of Things using Raspberry Pi
 
Internet of Things Using Arduino
Internet of Things Using ArduinoInternet of Things Using Arduino
Internet of Things Using Arduino
 
Brain controlled robot
Brain controlled robotBrain controlled robot
Brain controlled robot
 
Brain Computer Interface-Webinar
Brain Computer Interface-WebinarBrain Computer Interface-Webinar
Brain Computer Interface-Webinar
 
Development of Deep Learning Architecture
Development of Deep Learning ArchitectureDevelopment of Deep Learning Architecture
Development of Deep Learning Architecture
 
Future of AI
Future of AIFuture of AI
Future of AI
 
Gate driver design and inductance fabrication
Gate driver design and inductance fabricationGate driver design and inductance fabrication
Gate driver design and inductance fabrication
 
Brainsense -Brain computer Interface
Brainsense -Brain computer InterfaceBrainsense -Brain computer Interface
Brainsense -Brain computer Interface
 
Median filter Implementation using TMS320C6745
Median filter Implementation using TMS320C6745Median filter Implementation using TMS320C6745
Median filter Implementation using TMS320C6745
 
Introduction to Code Composer Studio 4
Introduction to Code Composer Studio 4Introduction to Code Composer Studio 4
Introduction to Code Composer Studio 4
 
Waveform Generation Using TMS320C6745 DSP
Waveform Generation Using TMS320C6745 DSPWaveform Generation Using TMS320C6745 DSP
Waveform Generation Using TMS320C6745 DSP
 
Interfacing UART with tms320C6745
Interfacing UART with tms320C6745Interfacing UART with tms320C6745
Interfacing UART with tms320C6745
 
Switch & LED using TMS320C6745 DSP
Switch & LED using TMS320C6745 DSPSwitch & LED using TMS320C6745 DSP
Switch & LED using TMS320C6745 DSP
 
Led blinking using TMS320C6745
Led blinking using TMS320C6745Led blinking using TMS320C6745
Led blinking using TMS320C6745
 
Introduction to tms320c6745 dsp
Introduction to tms320c6745 dspIntroduction to tms320c6745 dsp
Introduction to tms320c6745 dsp
 

Último

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 

Último (20)

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 

Blackfin core architecture

  • 1. The World Leader in High Performance Signal Processing Solutions Core Architecture Overview Presented by: George Kadziolka President Kaztek Systems
  • 2. About This Module This module introduces the Blackfin® family and provides an overview of the Blackfin processor architecture. 2 Core Architecture Overview
  • 3. Module Outline Blackfin Family Overview The Blackfin Core • Arithmetic operations • Data fetching • Sequencing The Blackfin Bus Architecture and Memory • Modified Harvard architecture • Hierarchical memory structure • Flexible memory management Additional Blackfin Core Features • DMA • Dynamic power management • On-chip debug support Summary 3 Core Architecture Overview
  • 4. Blackfin Family Overview The Blackfin family consists of: • A broad range of Blackfin processors • Software development tools • Hardware evaluation and debug tools Extensive third-party support • Development tools • Operating systems • TCP/IP stacks • Hardware building blocks • Software solutions 4 Core Architecture Overview
  • 5. Blackfin Processors All Blackfin processors combine extensive DSP capability with high end MCU functions on the same core. • Creates a highly efficient and cost-effective solution. • A single software development tool chain All Blackfin processors are based on the same core architecture. • Once you understand one Blackfin processor, you can easily migrate from one family member to another. • Code compatible across family members. Processors vary in clock speed, amount of on-chip memory, peripheral suite, package types and sizes, power, and price. • Large selection lets you optimize your choice of a Blackfin processor for your application. 5 Core Architecture Overview
  • 6. Blackfin Family Peripherals The Blackfin family supports a wide variety of I/O: • EBIU (External Bus Interface Unit) • Parallel peripheral interface (PPI) • Serial ports (SPORTS) • GPIO • Timers • UARTS • SPI® • Ethernet • USB • CAN® • Two Wire Interface (TWI) • Pixel compositor • Lockbox™ secure technology • Host DMA • ATAPI • SDIO See the Blackfin selection guide for complete details 6 Core Architecture Overview
  • 7. Blackfin Processors Perform Signal Processing and Microcontroller Functions Traditional Model MCU ASIC • Control Signal Processing • Interfaces to sensors • Networking • Broad peripheral mix • RTC • Memory MCU Signal Processing ASIC • Watchdog • RTOS Signal Processing • MMU • Byte addressable New Model Blackfin core can perform all of these functions 7 Core Architecture Overview
  • 8. Blackfin Architecture What does it mean for the developer? Combining controller and DSP capabilities into a single core, along with rich I/O, enables development of efficient, low cost embedded media applications. • For example, multimedia over IP, digital cameras, telematics, software radio From a development perspective, a single core means there is only one tool chain. • An embedded application consisting of both control and signal processing modules is built using the same compiler. • The result is dense control code and high performance DSP code. 8 Core Architecture Overview
  • 9. Features Controller •L1 memory space for stack and heap •Dedicated stack and frame pointers •Byte addressability •Simple bit-level manipulation DSP • Fast, flexible arithmetic computational units • Unconstrained data flow to/from computational units • Extended precision and dynamic range • Efficient sequencing • Efficient I/O processing • The DSP aspect of the Blackfin core is optimized to perform FFTs and convolutions y[n] = ∑k =0 x[n − k ] ∗ h[k ] N −1 9 Core Architecture Overview
  • 10. Blackfin Core (e.g., ADSP-BF54x) 10 Core Architecture Overview
  • 11. The Blackfin Core 11 Core Architecture Overview
  • 12. The Blackfin Core The core consists of: Arithmetic unit • Supports SIMD operation • Load/store architecture Addressing unit • Supports dual data fetch Sequencer • Efficient program flow control Register files • Data • Addressing 12 Core Architecture Overview
  • 13. The Arithmetic Unit Performs arithmetic operations Dual 40-bit ALU (Arithmetic/ Logic Unit) • Performs 16-/32-/40-bit arithmetic and logical operations Dual 16 x 16 multiplier • Performs dual MACs (multiply-accumulates) when used with ALUs Barrel shifter • Performs shifts, rotates, bit operations 13 Core Architecture Overview
  • 14. Data Registers There are 8x 32-bit registers in the data register file. • Used to hold 32-bit vales or packed 16-bit There are also 2x 40-bit accumulators. • Typically used for MAC operations Data Registers 39 31 15 0 A0.X A0.H A0.L A1.X A1.H A1.L 31 15 0 R0 R0.H R0.L R1 R1.H R1.L R2 R3 R4 R4.H R4.L R5 R6 R7 R7.H R7.L 14 Core Architecture Overview
  • 15. 16-Bit ALU Operations—Examples The Algebraic Assembly syntax is intuitive and makes it easy to understand what the instruction is doing. Single 16-bit operation Dual 16-bit operation Quad 16-bit operation R6.H = R3.H + R2.L (s); R6 = R2 + | - R3; R3 = R0 - | - R1, R2 = R0 + | + R1; 31 16 0 31 16 0 31 16 0 31 16 0 R2 R2 R0 R0 R3 R3 R1 R1 + + - - - + + R6 R6 R2 R3 These operations effectively execute in a single cycle. 15 Core Architecture Overview
  • 16. 32-Bit ALU Operations—Examples Single 32-bit addition Dual 32-bit operation R6 = R2 + R3; R3 = R1 - R2, R4 = R1 + R2; 31 0 31 0 31 0 R2 R1 R1 R3 R2 R2 + - + R6 R4 R3 These operations effectively execute in a single cycle. 16 Core Architecture Overview
  • 17. Dual MAC Operations—Example Both MACs can be used at the same time to double the MAC throughput. The same two 32-bit input registers must be used (R2 and R3 in this example). Dual MAC operation R2 R3 x x - + A0 A1 A1 -= R2.H * R3.H, A0 += R2.L * R3.L; These operations effectively execute in a single cycle. NOTE: This can happen in parallel with a dual data fetch. 17 Core Architecture Overview
  • 18. Barrel Shifter Enable shifting or rotating any number of bits within a 16-/32-/40-bit register in a single cycle Perform individual bit operations on 32-bit data register contents • BITSET, BITCLR, BITTGL, BITTST Field Extract and Deposit instructions • Extract or insert a field of bits out of or into a 32-bit data register 18 Core Architecture Overview
  • 19. 8-Bit Video ALUs Four Video ALUs 19 Core Architecture Overview
  • 20. 8-Bit ALU Operations Four 8-bit ALUs provide parallel computational power targeted mainly for video operations. • Quad 8-bit add/subtract • Quad 8-bit average • SAA (Subtract-Absolute-Accumulate) instruction A quad 8-bit ALU instruction takes one cycle to complete. 64-Bit/8-Byte Field 64-Bit/8-Byte Field R3 R2 R1 R0 4 Bytes 4 Bytes Four 8-Bit Video ALUs 32-Bits Data Register File 20 Core Architecture Overview
  • 21. Additional Arithmetic Instructions There are a number of specialized instructions that are used to speed up the inner loop on various algorithms. Bitwise XOR • Enable creating LFSR (Linear Feedback Shift Registers) for use in CRC calculations or the generation of PRN sequences Bit stream multiplexing, add on sign, compare select • Convolutional encoder and Viterbi decoder support Add/Subtract with prescale up/down • IEEE 1180–compliant 2D 8 x 8 DCTs (Discrete Cosine Transforms) Vector search • Enable search a vector a pair at a time for greatest or least value 21 Core Architecture Overview
  • 22. The Addressing Unit The addressing unit generates addresses for data fetches. •Two DAG (Data Address Generator) arithmetic units enable generation of independent 32-bit wide addresses that can reach anywhere within the Blackfin memory space. •Up to two fetches can occur at the same time. 22 Core Architecture Overview
  • 23. Address Registers There are 6x general-purpose Pointer Registers. 31 0 • Used for GP 8-/16-/32-bit fetches P0 Address from memory Registers P1 Pointer Registers: There are four sets of registers used P2 P0-P5 are referred to as “preg.” for DSP-style data accesses. P3 P4 • Used for 16-/32-bit DSP data fetches such as dual data fetch, circular buffer P5 addressing, and bit reversal FP There are also dedicated stack (SP) SP and frame (FP) pointers. USP 31 0 31 0 31 0 31 0 • These are used for 32-bit accesses to I0 L0 B0 M0 stack frames. Index Registers: I1 L1 B1 M1 I0-I3 are referred to as “ireg.” I2 L2 B2 M2 I3 L3 B3 M3 23 Core Architecture Overview
  • 24. Addressing Addressing Unit supports: • Addressing only • With specified Pointer or Index Register • Provide address and post modify • Add an offset after the fetch is done • Circular buffering supported with this method • Provide address with an offset • Add an offset before the fetch, but no pointer update • Update address only • Modify address with reverse carry add All addressing is Register Indirect. • Index Registers I0-I3 (32-/16-bit accesses) • Pointer Registers P0–P5 (32-/16-/8-bit accesses) • Stack and Frame Pointer Registers (32-bit accesses) All addresses are Byte addresses. • Ordering is Little Endian. • Addresses must be aligned for the word size being fetched. • i.e., 32-bit fetches from addresses that are a multiple of four 24 Core Architecture Overview
  • 25. Circular Buffer Example Example Address Base address (B) and 0x00 0x00000001 1st Access 0x00000001 Starting address (I) = 0 0x00000002 0x00000002 0x04 4th Access Buffer length 0x08 0x00000003 0x00000003 L = 44 0x0C 0x00000004 0x00000004 (There are 11 data elements and each data element is 0x10 0x00000005 2nd Access 0x00000005 4-bytes) 0x14 0x00000006 0x00000006 5th Access Modify value M = 16 0x18 0x00000007 0x00000007 (4 elements * 4-bytes/element) 0x1C 0x00000008 0x00000008 0x20 0x00000009 3rd Access 0x00000009 Example memory access: 0x24 0x0000000A 0x0000000A R1 = [I0 ++ M2]; 0x28 0x0000000B 0x0000000B • The Addressing Unit supports Circular Buffer pointer addressing. • The process of boundary checking and pointer wrapping to stay in bounds happens in hardware with no overhead. • Buffers can be placed anywhere in memory without restriction due to the Base address registers. 25 Core Architecture Overview
  • 26. The Sequencer The sequencer’s function is to generate addresses for fetching instructions. • Uses a variety of registers to select the next address Aligns instructions as they are fetched • Always reads 64 bits from memory • Realigns what is fetched into individual 16-/32-/64-bit opcodes before sending to the execution pipeline Handles events • Interrupts and exceptions Conditional execution 26 Core Architecture Overview
  • 27. Program Flow 27 Core Architecture Overview
  • 28. Sequencer Registers These are registers that are used in the control of program flow. Arithmetic Status (ASTAT) tracks status bits for core operations. • Used in conditional execution of instructions. Return address registers. • Hold the 32-bit return address for program flow interruptions. Two sets of hardware loop management registers. • They manage up to two nested levels of zero overhead looping. • There are no core cycles spent managing the loop. • Looped code runs as efficiently as straight line code. Arithmetic Status LC0 Loop Counter ASTAT LT0 Loop Top Subroutine Return LB0 Loop Bottom RETS RETI Interrupt Return LC1 LT1 RETX Exception Return LB1 RETN NMI Return SYSCFG System Config RETE Emulation Return SEQSTAT Sequencer Status 28 Core Architecture Overview
  • 29. Instruction Pipeline The pipeline is fully interlocked. In the event of a data hazard, the sequencer automatically inserts stall cycles. Pipeline Stage Description Issue instruction address to IAB bus, start compare Instruction Fetch 1 (IF1) tag of instruction cache Instruction Fetch 2 (IF2) Wait for instruction data Instruction Fetch 3 (IF3) Read from IDB bus and align instruction Instruction Decode (DEC) Decode instructions Calculation of data addresses and branch target Address Calculation (AC) address Issue data address to DA0 and DA1 bus, start Data Fetch 1 (DF1) compare tag of data cache Data Fetch 2 (DF2) Read register files Read data from LD0 and LD1 bus, start multiply and Execute 1 (EX1) video instructions Execute 2 (EX2) Execute/Complete instructions (shift, add, logic, etc.) Writes back to register files, SD bus, and pointer Write Back (WB) updates (also referred to as the “commit” stage) 29 Core Architecture Overview
  • 30. Instruction Pipeline IF1 IF2 IF3 DEC AC DF1 DF2 EX1 EX2 WB Cycle N INST #1 N+1 INST #2 INST #1 N+2 INST #3 INST #2 INST #1 N+3 INST #4 INST #3 INST #2 INST #1 N+4 INST #5 INST #4 INST #3 INST #2 INST #1 N+5 INST #6 INST #5 INST #4 INST #3 INST #2 INST #1 N+6 INST #7 INST #6 INST #5 INST #4 INST #3 INST #2 INST #1 N+7 INST #8 INST #7 INST #6 INST #5 INST #4 INST #3 INST #2 INST #1 N+8 INST #9 INST #8 INST #7 INST #6 INST #5 INST #4 INST #3 INST #2 INST #1 N+9 INST #10 INST #9 INST #8 INST #7 INST #6 INST #5 INST #4 INST #3 INST #2 INST #1 N+10 INST #11 INST #10 INST #9 INST #8 INST #7 INST #6 INST #5 INST #4 INST #3 INST #2 N+11 INST #12 INST #11 INST #10 INST #9 INST #8 INST #7 INST #6 INST #5 INST #4 INST #3 N+12 INST #13 INST #12 INST #11 INST #10 INST #9 INST #8 INST #7 INST #6 INST #5 INST #4 Once the pipe is filled (e.g., at cycle N+9 in this case), one instruction will exit the pipeline on each core clock tick. • i.e., effectively, one instruction per core clock cycle executed 30 Core Architecture Overview
  • 31. Blackfin Event Handling Blackfin events include: • Interrupts • Generated by hardware (e.g., DMA complete) or software • Exceptions • Error condition or service related Handling is split between the CEC (Core Event Controller) and SIC (System Interrupt Controller). • CEC has 16 levels and deals with requests on a priority basis. • The nine lowest levels are general purpose and are used for Peripheral Interrupt Requests. • Each level has a 32-bit Interrupt Vector Register that points to the start of the ISR for that level. • SIC allows enabling Peripheral Interrupt Requests and mapping to specific CEC GP levels. • Allows setting peripheral request priorities 31 Core Architecture Overview
  • 32. System and Core Interrupts (ADSP-BF533 example) System Interrupt Source IVG #1 Core Event PLL Wakeup interrupt IVG7 Event Source IVG # Name DMA error (generic) IVG7 Highest Emulator 0 EMU PPI error interrupt IVG7 SPORT0 error interrupt IVG7 Reset 1 RST SPORT1 error interrupt IVG7 Non Maskable Interrupt 2 NMI SPI error interrupt IVG7 Exceptions 3 EVSW UART error interrupt IVG7 Reserved 4 - RTC interrupt IVG8 Hardware Error 5 IVHW P DMA 0 interrupt (PPI) IVG8 Core Timer 6 IVTMR R DMA 1 interrupt (SPORT0 RX) IVG9 I General Purpose 7 7 IVG7 O DMA 2 interrupt (SPORT0 TX) IVG9 R DMA 3 interrupt (SPORT1 RX) IVG9 General Purpose 8 8 IVG8 I DMA 4 interrupt (SPORT1 TX) IVG9 General Purpose 9 9 IVG9 T DMA 5 interrupt (SPI) IVG10 Y General Purpose 10 10 IVG10 DMA 6 interrupt (UART RX) IVG10 General Purpose 11 11 IVG11 DMA 7 interrupt (UART TX) IVG10 Timer0 interrupt IVG11 General Purpose 12 12 IVG12 Timer1 interrupt IVG11 General Purpose 13 13 IVG13 Timer2 interrupt IVG11 General Purpose 14 14 IVG14 PF interrupt A IVG12 General Purpose 15 15 IVG15 PF interrupt B IVG12 Lowest DMA 8/9 interrupt (MemDMA0) IVG13 1 Note: DMA 10/11 interrupt (MemDMA1) IVG13 Default IVG configuration shown. Watchdog timer interrupt IVG13 32 Core Architecture Overview
  • 33. Variable Instruction Lengths The Blackfin architecture uses three instruction opcode lengths to obtain the best code density while maintaining high performance. 16-bit instructions • Most control-type instructions and data fetches are 16-bits long to improve code density. 32-bit instructions • Most control-type instructions with an immediate value in the expression and most arithmetic instructions are 32-bits in length. Multi-issue 64-bit instructions • Certain 32-bit instructions can be executed in parallel with a pair of specific 16-bit instructions and specified with one 64-bit instruction. • Typically a 32-bit ALU/MAC instruction and one or two data fetch instructions • The delimiter symbol to separate instructions in a multi-issue instruction is a double pipe character “||.” Example: A1+=R0.L*R1.H, A0+=R0.L*R1.L || r0 = [i0++] || r1 = [i1++]; 33 Core Architecture Overview
  • 34. Instruction Packing When code is compiled and linked, 16-bit OP 32-bit OP the instructions are packed into 64-bit Multi-OP memory as densely as possible. Instruction Formats • i.e., no wasted memory space 15 0 No memory alignment restrictions for code: • Instructions can be placed anywhere in memory. • The sequencer fetches 64-bits of instruction at a time (from 8-byte boundaries) and performs an alignment operation to: • Isolate shorter opcodes 16-bit wide memory • Realign larger opcodes that straddle 4-/8-byte boundaries No Memory Alignment Restrictions: • This realignment hardware is Maximum Code Density and Minimum transparent to the user. System Memory Cost 34 Core Architecture Overview
  • 35. 16-bit FIR Filter Example—0.5 cycles per tap 31 0 Input Input R0 Input data delay line Coefficients Coefficient Coefficient R1 MAC into A0 MAC1 MAC0 R0L x0 x0 R1L A1 into M AC x x R0H x1 MAC into A0 to A 1 x1 R1H AC in + + A0 R0L x2 M x2 R1L x3 x3 R1H R0H A1 39 0 Loop: A1+=R0.H*R1.L, A0+=R0.L*R1.L || R0.L = [I0++] || nop; Loopend: A1+=R0.L*R1.H, A0+=R0.H*R1.H || R0.H = [I0++] || R1 = [I1++]; Samples in R0 • Performs operations in support of two filter outputs in Coefficients in R1 each clock cycle 35 Core Architecture Overview
  • 36. Bus and Memory Architecture 36 Core Architecture Overview
  • 37. Blackfin Memory Hierarchy The Blackfin architecture uses a memory hierarchy with a primary goal of achieving memory performance similar to that of the fastest memory (i.e., L1) with an overall cost close to that of the least expensive memory (i.e., L3). • Portions of L1 can be configured as cache, which allows increased memory performance by prefetching and storing copies of code/data from L2/L3. L1 Memory L2 Memory L3 Memory CORE Internal Internal (if present) External (e.g., SDRAM) (Registers) Smallest capacity Larger than L1 Largest capacity Single-cycle access Multicycle access Highest latency 37 Core Architecture Overview
  • 38. Internal Bus Structure of the ADSP-BF533 38 Core Architecture Overview
  • 39. Configurable Memory The best system performance can be achieved when executing code or fetching data out of L1 memory. Two methods can be used to fill the L1 memory—caching and dynamic downloading–the Blackfin processor supports both. • Microcontrollers have typically used the caching method, as they have large programs often residing in external memory and determinism is not as important. • DSPs have typically used dynamic downloading (e.g. code overlays), as they need direct control over which code runs in the fastest memory. The Blackfin processor allows the programmer to choose one or both methods to optimize system performance. • Portions of L1 instruction and data memory can be configured to be used as SRAM or as cache. • Enabling these 16K Byte blocks for use as cache reduces the amount of L1 available as SRAM. However, there is still space available for code, such as critical functions or interrupt handlers. 39 Core Architecture Overview
  • 40. Cache and Memory Management Cache allows users to take advantage of single-cycle memory without having to specifically move instructions and or data “manually.” • L2/L3 memory can be used to hold large programs and data sets. • The paths to and from L1 memory are optimized to perform with cache enabled. Cache automatically optimizes code and data by keeping recently used information in L1. • LRU (Least Recently Used) algorithm is used for determining which cache line to replace. Cache is managed by the memory management unit though a number of CPLBs (Cachability, Protection, and Lookaside Buffers). • The CPLBs divide up the Blackfin memory space into pages and allow individual page control of memory management items such as: • User/Supervisor Access Protection • Read/Write Access Protection • Cacheable or Non-Cacheable 40 Core Architecture Overview
  • 41. Additional Blackfin Core Features 41 Core Architecture Overview
  • 42. Direct Memory Access (DMA) The Blackfin processors includes a DMA (Direct Memory Access) facility on all CPUs. • Enable moving data between memory and a peripheral, or memory to memory without using any core cycles • Core only involved in setting up DMA parameters • Core is interrupted when DMA completes, thereby improving efficiency of data flow operations. • Alternatively, DMA status allows polling operations. • The DMA controller has dedicated buses to connect between the DMA unit itself and: • Peripherals • External memory (e.g., L3 or SDRAM) • Core memory (e.g., L1) 42 Core Architecture Overview
  • 43. Power Management Options Low active power • Flexible power management with automatic power-down for unused peripheral sections. • Dynamic power management allows dynamic modification of both frequency and voltage. • PLL can multiply CLKIN from 1x to 64x. • Core voltage can be optimized for the operating frequency. Low standby power • 5 power modes • Full on, active, sleep, deep sleep, hibernate • Real-time clock with alarm and wakeup features • RTC has its own power supply and clock. • It is unaffected by hardware or software reset. 43 Core Architecture Overview
  • 44. Blackfin Processors Optimize Power Consumption Vdd 1.2V, 500 MHz Processor Operation t 0.9V, 250 MHz PLL Processor Operation Settling 0.8V, 100 MHz PLL Regulator Settling Processor Operation Regulator Power Transition Transition Consumption Just varying the frequency mW Dynamic Power Management Varying the voltage and frequency 44 Core Architecture Overview
  • 45. Power Mode Transitions Under software control, the application can change from one power state to another. A set of libraries (i.e., System Services) enable control of clocking, core voltage, and power states with just a function call. 45 Core Architecture Overview
  • 46. Hardware Debug Support 46 Core Architecture Overview
  • 47. Advanced Support for Embedded Debug The JTAG port is also used to provide in-circuit emulation capabilities. • Nonintrusive debugging • BTC (background telemetry channel) Hardware breakpoints • Six instruction and two data watchpoints Performance monitor unit • Counters for cycles and occurrences of specific activities Execution trace buffer • Stores last 16 nonincremental PC values 47 Core Architecture Overview
  • 48. Summary The Blackfin core has a number of features and instructions that support both control and DSP types of operations. The Blackfin’s high performance memory/bus architecture supports zero wait state instruction and dual fetches from L1 at the core clock rate, all at the same time. • Large applications that reside in the larger but slower external memory still benefit from the high performance L1 memory through the use of caching and/or dynamic downloading The Blackfin architecture enables both efficient implementation of media-related applications and efficient development of those applications. 48 Core Architecture Overview
  • 49. Resources For detailed information on the Blackfin architecture, please refer to the Analog Devices website which has links to manuals, data sheets, FAQs, Knowledge Base, sample code, development tools and much more: • www.analog.com/blackfin For specific questions click on the “Ask a question” button. Kaztek Systems provides worldwide technical training on processor architecture and systems development using Analog Devices Processors. For more information Visit www.kaztek.com or Email info@kaztek.com 49 Core Architecture Overview