DSP Architectures




Rensselaer at Hartford
ECSE 6620 - Fall 2001
     Lecture 16
         Jason M. Stripinis
    jasonstripinis@engineer.com
Basic Processor Structure




• Here we see a very simple processor structure - such as
  might be found in a small 8-bit microprocessor.
12 DEC 01           ECSE 6620 - Jason Stripinis
Basic Processor Functions
• ALU
  – Arithmetic Logic Unit - this circuit takes two operands on the
    inputs (labeled A and B) and produces a result on the output
    (labeled Y).
  – The operations will usually include, as a minimum:
      •   add, subtract
      •   and, or, not
      •   shift right, shift left
      •   ALUs in more complex processors will execute many more
          instructions.
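As a rough illustration of the minimum operation set listed above, the following Python sketch models a small ALU. The function name `alu`, the opcode strings, and the 8-bit width are assumptions made for this example and do not come from the lecture.

```python
MASK = 0xFF  # assumed 8-bit data path


def alu(op, a, b=0):
    """Return the result Y for operands A and B, truncated to 8 bits."""
    if op == "add":
        y = a + b
    elif op == "sub":
        y = a - b
    elif op == "and":
        y = a & b
    elif op == "or":
        y = a | b
    elif op == "not":
        y = ~a                 # B is ignored for unary operations
    elif op == "shl":
        y = a << 1
    elif op == "shr":
        y = (a & MASK) >> 1    # logical shift right
    else:
        raise ValueError(f"unknown opcode: {op}")
    return y & MASK            # keep only the low 8 bits
```

For example, `alu("add", 200, 100)` wraps around to 44 because the sum does not fit in 8 bits.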




Basic Processor Functions
• Register File
   – A set of storage locations (registers) for storing temporary results.
     Early machines had just one register (accumulator). Modern RISC
     processors will have at least 32 registers.
• Instruction Register
   – The instruction currently being executed by the processor is stored
     here.
• Control Unit
   – The control unit decodes the instruction in the instruction register
     and sets signals which control the operation of most other units of
     the processor. For example, the operation code (opcode) in the
     instruction will be used to determine the settings of control signals
     for the ALU which determine which operation (+,-,^,v,~,shift,etc)
     it performs.
Basic Processor Functions
• Clock
   – The vast majority of processors are synchronous, that is, they use a
     clock signal to determine when to capture the next data word and
     perform an operation on it. In a globally synchronous processor, a
     common clock needs to be routed (connected) to every unit in the
     processor.
• Program counter
   – The program counter holds the memory address of the next
     instruction to be executed. It is updated every instruction cycle to
     point to the next instruction in the program. Branch instructions
     change the program counter by other than a simple increment.
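The interplay of program counter, instruction register and control unit can be sketched as a toy fetch-decode-execute loop. The instruction format (an opcode string plus one operand), the opcode names, and the single accumulator are assumptions chosen for brevity; they do not describe any particular processor.

```python
def run(program, max_steps=100):
    """Execute a toy program given as a list of (opcode, operand) tuples."""
    pc = 0    # program counter: address of the next instruction
    acc = 0   # single accumulator register
    for _ in range(max_steps):
        if pc >= len(program):
            break
        ir = program[pc]        # fetch into the instruction register
        pc += 1                 # default update: simple increment
        opcode, operand = ir    # decode
        if opcode == "load":
            acc = operand
        elif opcode == "add":
            acc += operand
        elif opcode == "jnz":   # branch: overwrite the PC when acc is non-zero
            if acc != 0:
                pc = operand
        elif opcode == "halt":
            break
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return acc
```

A countdown such as `run([("load", 3), ("add", -1), ("jnz", 1), ("halt", 0)])` shows the branch replacing the simple increment until the accumulator reaches zero.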




Basic Processor Functions
• Memory Address Register
   – This register is loaded with the address of the next data word to be
     fetched from or stored into main memory.
• Address Bus
   – Transfers addresses to memory and memory-mapped peripherals.
     It is driven by the processor acting as a bus master.
• Data Bus
   – Carries data to and from the processor, memory and peripherals. It
     will be driven by the data source, i.e. processor, memory, etc.
• Multiplexed Bus
   – To limit device pin counts and bus complexity, some processors
     MUX address and data onto the same bus, with an adverse effect
     on performance.
DSP Implementations
• DSP Algorithm
   – Series of mathematical operations that are applied to process a
     sequence of digital signals sampled from the real (analog) world
• Application examples
   –   Filtering
   –   FFT
   –   Noise cancellation
   –   Spectral Processing




Why is a special architecture good for digital signal processing?
• DSPs are tailored to run DSP algorithms efficiently.
• Special functions to handle DSP algorithm demands:
   – Unique data access patterns
        • Streams of data requiring high bandwidth
        • Low data repetition but high code repetition
   –   Math operation focus (“number cruncher”)
   –   Real-time constraints
   –   Power and size constraints
   –   Cost requirement
   –   Attention to numeric effects (limited fixed point error)




DSP Functional Characteristics
• Typically require a few specific operations
• Consider an FIR filter:




       This requires:
          –additions & multiplications
          –delays
          –array handling
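Assuming the standard direct-form FIR definition, y[n] = h[0]x[n] + h[1]x[n-1] + ... + h[N-1]x[n-N+1], the three requirements above can be sketched in Python. The function name `fir_filter` and the list-based delay line are choices made for this illustration only.

```python
def fir_filter(x, h):
    """Direct-form FIR: y[n] = sum over k of h[k] * x[n - k]."""
    delay = [0.0] * len(h)      # delay line, newest sample first
    y = []
    for sample in x:
        # delays: shift the line by one position and insert the new sample
        delay = [sample] + delay[:-1]
        # array handling plus one multiply and one add per tap
        acc = 0.0
        for coeff, past in zip(h, delay):
            acc += coeff * past
        y.append(acc)
    return y
```

Feeding a unit impulse through the filter returns the coefficients themselves, which is a quick sanity check on any FIR implementation.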


DSP Typical Operations
• Additions & Multiplications
   – fetch two operands
   – perform the addition or multiplication (or both)
   – store the result


• Delays
   – store the result for later use


• Array Handling
   – fetch values from consecutive memory locations
   – copy data from register to register


DSP Typical Operations
• To perform these basic operations most DSPs:
   – have a parallel multiply and add
   – have multiple memory accesses (to fetch two operands and store the
     result)
   – have sufficient registers to hold data temporarily
   – have efficient address generation for array handling
   – have special features such as delays or circular addressing
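Circular addressing lets a delay line live in a fixed block of memory without copying samples on every step. The sketch below imitates it in software with a modulo on the index; the class name `CircularDelayLine` and its methods are hypothetical, and on a DSP the wrap-around would be done by the address generator rather than by explicit arithmetic.

```python
class CircularDelayLine:
    """Fixed-length delay line addressed modulo its length."""

    def __init__(self, length):
        self.buf = [0] * length
        self.head = 0   # index of the newest sample

    def push(self, sample):
        # advance the pointer with wrap-around and overwrite the oldest sample
        self.head = (self.head + 1) % len(self.buf)
        self.buf[self.head] = sample

    def tap(self, k):
        # sample delayed by k steps (k = 0 is the newest)
        return self.buf[(self.head - k) % len(self.buf)]
```

After pushing 10, 20, 30 and 40 into a line of length 3, `tap(0)` is 40 and `tap(2)` is 20; the value 10 has been overwritten without any data being moved.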




DSP Arithmetic Logic Unit
• Most DSP operations require additions and multiplications
  together. So DSP processors usually have parallel
  hardware adders and multipliers which can be used with a
  single instruction:




Register Structure
• Delays require that intermediate values be held for later
  use.
• For example, when keeping a running total - the total can
  be kept within the processor to avoid wasting repeated
  reads from and writes to memory.
• For this reason DSP processors have lots of registers which
  can be used to hold intermediate values.
• Registers may be fixed-point or floating-point.




Memory Addressing
• Array handling requires that data can be fetched efficiently
  from consecutive memory locations.
• For this reason DSP processors have address registers
  which are used to hold addresses and can be used to
  generate the next needed address efficiently.
• Usually, the next needed address can be generated during
  the data fetch or store operation, and with no overhead.




Memory Addressing
• Example DSP address generation operations:

Instruction  Name                    Description
*rP          register indirect       read the data pointed to by the address in
                                     register rP
*rP++        postincrement           having read the data, postincrement the
                                     address pointer to point to the next value
                                     in the array
*rP--        postdecrement           having read the data, postdecrement the
                                     address pointer to point to the previous
                                     value in the array
*rP++rI      register postincrement  having read the data, postincrement the
                                     address pointer by the amount held in
                                     register rI to point to rI values further
                                     down the array
*rP++rIr     bit reversed            having read the data, postincrement the
                                     address pointer to point to the next value
                                     in the array, as if the address bits were
                                     in bit reversed order
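Two of the addressing modes in the table can be imitated in software to show which addresses they visit. The helper names below are hypothetical, and the bit-reversed version assumes an array whose length is a power of two, as is usual for a radix-2 FFT.

```python
def post_increment_reads(memory, start, step, count):
    """Imitate *rP++rI: read, then advance the pointer by `step`."""
    pointer = start
    values = []
    for _ in range(count):
        values.append(memory[pointer])  # read the data first
        pointer += step                 # then postincrement by rI
    return values


def bit_reverse(index, bits):
    """Reverse the low `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reversed_order(n):
    """Addresses visited by a bit-reversed pointer over n entries (n a power of two)."""
    bits = n.bit_length() - 1
    return [bit_reverse(i, bits) for i in range(n)]
```

For an 8-point array the bit-reversed sequence is 0, 4, 2, 6, 1, 5, 3, 7, which is the reordering a radix-2 FFT needs for its input or output.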


Memory Architectures for DSP
• For arithmetic the DSP needs to fetch two operands in a
  single instruction cycle.
• Since we also need to store the result and to read the
  instruction itself, more than two memory accesses per
  instruction cycle are needed.
• Even the simplest DSP operation - an addition involving
  two operands and a store of the result to memory - requires
  four memory accesses (three to fetch the two operands and
  the instruction, plus a fourth to write the result)




Memory Architectures for DSP
• DSP processors usually support multiple memory accesses
  in the same instruction cycle.
• It is not possible to access two different memory addresses
  simultaneously over a single memory bus.
• There are two common methods to achieve multiple
  memory accesses per instruction cycle:
            • Harvard architecture
            • modified von Neumann architecture




Memory Architectures for DSP
                (Harvard Architecture)
• The Harvard architecture has two separate physical
  memory buses, allowing two simultaneous memory
  accesses.
• The true Harvard architecture dedicates one bus for
  fetching instructions, with the other available to fetch
  operands.
• This is inadequate for DSP operations, which usually
  involve at least two operands. So DSP Harvard
  architectures usually permit the 'program' bus to be used
  also for access of operands.



Memory Architectures for DSP
                (Harvard Architecture)
• Note that it is often necessary to fetch three things - the
  instruction plus two operands - and the Harvard
  architecture is inadequate to support this.
• So DSP Harvard architectures often also include a cache
  memory which can be used to store instructions which will
  be reused, leaving both Harvard buses free for fetching
  operands.
• The Harvard architecture plus cache is sometimes called
  an extended Harvard architecture or Super Harvard
  ARChitecture (SHARC).


Memory Architectures for DSP
                (Harvard Architecture)
• The Harvard architecture requires two memory buses. This
  makes it expensive to bring off the chip - for example a
  DSP using 32 bit words and with a 32 bit address space
  requires at least 64 pins for each memory bus - a total of
  128 pins if the Harvard architecture is brought off the chip.
  This results in very large chips, which are difficult to
  design into a circuit.
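The pin arithmetic above is easy to reproduce. The function name `harvard_pin_count` is an assumption for this sketch, and it counts only data and address lines, ignoring control, power and ground pins.

```python
def harvard_pin_count(data_bits, address_bits, buses=2):
    """Return (pins per bus, total pins) for data plus address lines only."""
    pins_per_bus = data_bits + address_bits
    return pins_per_bus, pins_per_bus * buses
```

With 32-bit words and a 32-bit address space, `harvard_pin_count(32, 32)` gives 64 pins per bus and 128 pins for the two buses.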




Memory Architectures for DSP
            (von Neumann Architecture)
• The von Neumann architecture uses only a single memory
  bus. This is relatively cheap, requiring fewer pins than the
  Harvard architecture, and simple to use because the
  programmer can place instructions or data anywhere
  throughout the available memory.
• But it does not permit multiple memory accesses.
• The modified von Neumann architecture allows multiple
  memory accesses per instruction cycle by running the
  memory clock faster than the instruction cycle.




Memory Architectures for DSP
            (von Neumann Architecture)
• Each instruction cycle is divided into multiple 'machine
  states' and a memory access can be made in each machine
  state, permitting multiple memory accesses per
  instruction cycle.
• The modified von Neumann architecture permits all the
  memory accesses needed to support addition or
  multiplication: fetch of the instruction; fetch of the two
  operands; and storage of the result.
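One way to picture the machine states is as time slots on the single bus. The sketch below assigns one memory access to each slot; the names `ACCESSES` and `schedule` are hypothetical, and four states per instruction cycle is an assumed figure chosen to match the four accesses listed above.

```python
ACCESSES = [
    "fetch instruction",
    "fetch operand A",
    "fetch operand B",
    "store result",
]


def schedule(accesses, states_per_cycle):
    """Group accesses into instruction cycles, one access per machine state."""
    cycles = []
    for i in range(0, len(accesses), states_per_cycle):
        cycles.append(accesses[i:i + states_per_cycle])
    return cycles
```

With four machine states, `schedule(ACCESSES, 4)` fits everything into one instruction cycle, while a plain single-access bus, `schedule(ACCESSES, 1)`, needs four.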




Why use a special architecture for
     digital signal processing?
                         The Answers
      Unique data access patterns                     Bit reversed addressing (FFT)
      Streams of data requiring high bandwidth        Multiple access memory architecture
      Low data repetition but high code repetition    Eliminate data cache (save $$)
      Math operation focus                            MAC instruction; vector processing unit
      Real-time constraints                           Zero-overhead loops
      Power and size constraints                      Limited additional functional units (unlike GPP)
      Cost requirement                                On-board peripherals (SOC)
      Attention to numeric effects                    ALU with 16-bit operands and 32-bit result
        (limited fixed point error)
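The last row can be made concrete with a fixed-point multiply-accumulate. The sketch assumes the common Q15 format (a 16-bit signed integer representing a value in [-1, 1)), which the lecture does not name explicitly; the function names are hypothetical. The product of two Q15 operands is a Q30 number that fits in 32 bits, so keeping the full product avoids rounding error at every step.

```python
def to_q15(value):
    """Convert a float in [-1, 1) to a 16-bit Q15 integer, saturating."""
    scaled = int(round(value * 32768))
    return max(-32768, min(32767, scaled))


def mac_q15(acc, a, b):
    """Accumulate the full-precision product of two Q15 operands (a Q30 value).

    Python integers never overflow; a hardware accumulator has a finite width.
    """
    return acc + a * b


def q30_to_float(acc):
    """Interpret a Q30 accumulator value as a real number."""
    return acc / float(1 << 30)
```

For instance, accumulating 0.5 times 0.25 from a zero accumulator and converting back yields exactly 0.125, since both operands are exactly representable in Q15.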



DSP Generations
• 1st Generation (1979-1982)
   – Transition from experimental signal processors
• 2nd Generation (1985-1986)
   – Move from co-processor to stand-alone processor
• 3rd Generation (1987-1989)
   – Major hardware improvements to speed
• 4th Generation (1990-1996)
   – More on-chip integration (ADC, DAC, memory, multi-processor)
• 5th Generation (1997-)




DSP Generations
              1st Generation (1979-1982)
• Primarily targeted at digital filtering
• Specialized co-processor for signal processing
• NMOS (n-Channel Metal Oxide Semi) fabrication

•   16-bit fixed point
•   fast multiplier (and adder)
•   Harvard architecture
•   Specialized Instruction set




DSP Generations
               1st Generation (1979-1982)
• Example = Texas Instruments TMS32010
   –   16-bit fixed point
   –   Harvard architecture
   –   two Address registers
   –   one A register (adder)
   –   one P register (multiplier)
   –   one T register (data shift on delay line)
   –   No zero-overhead loop
   –   Specialized Instruction set
   –   MAC Time 400 ns (<100 ns today)
   –   50 ms per 1024-FFT



DSP Generations
            1st Generation (1979-1982)
• Example = Texas Instruments TMS32010




DSP Generations
            2nd Generation (1985-1986)
• Move from co-processor to stand-alone processor
• CMOS (Complementary Metal Oxide Semi) fabrication
• Double the speed of first generation

•   Advances in memory architecture (more internal RAM)
•   better pipelining of functional units
•   address generators (bit-reversing)
•   Zero-overhead loop HW
•   Limited floating point in SW



DSP Generations
              2nd Generation (1985-1986)
• Example = Texas Instruments TMS32020 (1985)
   –   16-bit fixed point
   –   Harvard architecture
   –   Improved TMS32010
   –   RPTS allows a pipelined instruction to be performed in a single cycle
   –   Specialized Instruction set
   –   MAC Time 200 ns
   –   10 ms per 1024-FFT




DSP Generations
              3rd Generation (1987-1989)
• Increased floating point support
   – 32-bit floating point hardware DSPs released
   – Floating point emulation on fixed point processors
   – IEEE754 support
• Hardware enhancements (large speed increase)
   –   dense CMOS fabrication
   –   on chip DMA
   –   instruction caches
   –   increased clock rates (first cores above 10 MHz)
• Increased complexity of SW



DSP Generations
              3rd Generation (1987-1989)
• Example = Motorola DSP56001 (1988)
   –   24-bit data, instructions
   –   24-bit fixed point
   –   3 memory spaces (P, X, Y)
   –   parallel moves
   –   circular addressing
   –   MAC Time 75 ns (21 ns today)
   –   ~3 ms per 1024-FFT
• Other Examples:
   – AT&T DSP16A
   – Analog Devices ADSP-2100
   – TI TMS320C50

DSP Generations
              4th Generation (1990-1996)
• Hardware integration
   –   ADC
   –   DAC
   –   more memory
   –   multiple DSPs on one chip
• Decreasing power consumption
   – 5.0 VDC → 3.3 VDC → 3.0 VDC → 2.7 VDC
• GPPs start to get DSP functions
   – SIMD
   – Leads to Intel introducing MMX (MultiMedia eXtensions) for x86




DSP Generations
               4th Generation (1990-1996)
• Example = TI TMS320C541 (1995)
   –   Enhanced architecture
   –   Low voltage (3.3 VDC)
   –   More on-chip memory
   –   Application specific functional units
   –   MAC Time 20 ns (10 ns today)
   –   ~1 ms per 1024-FFT


• Example = TI TMS320C80
   – multiple processors per chip




The GPP Option
• High-performance general-purpose processors for PCs and
  workstations are increasingly suitable for some DSP
  applications.
• E.g., Intel MMX Pentium, Motorola/IBM PowerPC 604e
• These processors achieve excellent to outstanding floating
  and/or fixed-point DSP performance via:
   –   Very high clock rates (200-500 MHz)
   –   Superscalar architectures
   –   Single-cycle multiplication and arithmetic operations
   –   Good memory bandwidth
   –   Branch prediction
   –   In some cases, single-instruction, multiple-data (SIMD) ops

DSP Generations
                  5th Generation (1997-)
• Not the classic DSP architectures
   – SIMD (Single Instruction Multiple Data stream) instructions
   – VLIW (Very Long Instruction Words) allows RISC processing
       • High parallelism
       • Increased clock speeds
       • No longer application specific functional units (no MAC FU)
• Low voltage (2.5 VDC or less, even 1.2 VDC cores)
• MAC Time 3 ns (but can be power hungry)
• GPPs start to get DSP functions
   – Intel introduces MMX (MultiMedia eXtensions) for x86 in 1997
• Increased integration
   – MCU and DSP cores on same chip
   – MCU functions/ports added to DSPs
DSP Generations
                   5th Generation (1997-)
• SIMD (Single Instruction Multiple Data) instructions
   –   Enhance throughput by allowing parallelism
   –   Requires multiple functional units and wider buses
   –   May support multiple data widths (different functional groups)
   –   Example = DSP16000
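The idea of one instruction acting on several data items can be imitated with two 16-bit lanes packed into a 32-bit word. The helper names and the lane layout are assumptions for this sketch and are not taken from the DSP16000 or any other instruction set.

```python
def pack16(hi, lo):
    """Pack two 16-bit lanes into one 32-bit word."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)


def unpack16(word):
    """Split a 32-bit word into its (high, low) 16-bit lanes."""
    return (word >> 16) & 0xFFFF, word & 0xFFFF


def simd_add16(x, y):
    """Add two words lane by lane; each lane wraps around independently."""
    x_hi, x_lo = unpack16(x)
    y_hi, y_lo = unpack16(y)
    return pack16(x_hi + y_hi, x_lo + y_lo)
```

Adding the lanes (1, 0xFFFF) and (2, 1) gives (3, 0): the low lane wraps to zero without carrying into the high lane, which is what distinguishes a packed add from an ordinary 32-bit add.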




[Figure: datapath comparison, panels labeled WAS and SIMD]


DSP Generations
                  5th Generation (1997-)
• VLIW (Very Long Instruction
  Words)
   – Instruction Level Parallelism (ILP) can
     be a major performance gain
       • Superscalar implementation requires
         larger die and more power to
         dynamically pipeline instructions
   – VLIW can be used to statically pipeline
     instructions at compile time (or even by
     hand!)
   – VLIW instruction words have fixed
     "slots" for instructions that map to the
     functional units available.
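Static slot filling can be sketched as a greedy packer. The unit letters L, S, M and D are borrowed from the TMS320C6201 legend later in the lecture, but having exactly one unit of each kind per word is a simplifying assumption, as is the function name `pack_vliw`. The sketch also assumes the operations are independent; a real compiler must respect data dependences.

```python
SLOTS = ("L", "S", "M", "D")  # assumed: one slot per functional unit kind


def pack_vliw(ops):
    """Greedily pack (unit, name) operations, in order, into instruction words."""
    words = []
    current = {}
    for unit, name in ops:
        if unit not in SLOTS:
            raise ValueError(f"unknown unit: {unit}")
        if unit in current:      # slot already taken: start a new word
            words.append(current)
            current = {}
        current[unit] = name
    if current:
        words.append(current)
    # slots that were never filled become explicit NOPs
    return [[word.get(unit, "nop") for unit in SLOTS] for word in words]
```

Packing a multiply, an add, a load and a second multiply yields two words, the second of which is mostly NOPs; unfilled slots like these are one reason VLIW code size grows.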


DSP Generations
                 5th Generation (1997-)
• VLIW Advantages
   – huge theoretical pay off
       • less than 1 ns per MAC!
       • Less than 75 ns per 1024-FFT


• VLIW Drawbacks
   – Can be very difficult to program and debug
   – High power consumption if VLIW is not filled
   – Code size dramatically increases requiring more program memory




DSP Generations
            5th Generation (1997-)
• VLIW Example = TI TMS320C6201




                                             32-bit Functional Units
                                             Lx = ALU
                                             Sx = Branching
                                                 and shifting
                                             Mx = Multiplier
                                             Dx = Data Store




DSP Generational Development
• DSP processor performance has increased by a factor of
  about 400x over the past 20 years
[Chart: MAC time (ns) by generation, 1st Gen through 5th Gen; vertical axis 0 to 400 ns]

• DSP architectures will be increasingly specialized for
  applications, especially communications applications
• General-purpose processors will become viable for many
  DSP applications
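The performance claim in the first bullet can be checked against the MAC times quoted for the example processors of each generation (400, 200, 75, 20 and 3 ns). The dictionary and function names below are assumptions for this sketch.

```python
MAC_NS = {"1st": 400, "2nd": 200, "3rd": 75, "4th": 20, "5th": 3}


def speedup_vs_first(mac_ns):
    """Speedup of each generation's MAC time relative to the 1st generation."""
    base = mac_ns["1st"]
    return {gen: base / t for gen, t in mac_ns.items()}
```

On these figures the 5th generation is about 133 times faster than the first; the factor of about 400 corresponds to the sub-1 ns MAC time quoted for VLIW parts, since 400 ns divided by 1 ns is 400.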


Mais conteĂşdo relacionado

Mais procurados

Introduction to Digital Signal processors
Introduction to Digital Signal processorsIntroduction to Digital Signal processors
Introduction to Digital Signal processorsPeriyanayagiS
 
RISC - Reduced Instruction Set Computing
RISC - Reduced Instruction Set ComputingRISC - Reduced Instruction Set Computing
RISC - Reduced Instruction Set ComputingTushar Swami
 
DRAM Cell - Working and Read and Write Operations
DRAM Cell - Working and Read and Write OperationsDRAM Cell - Working and Read and Write Operations
DRAM Cell - Working and Read and Write OperationsNaman Bhalla
 
Xilinx 4000 series
Xilinx 4000 seriesXilinx 4000 series
Xilinx 4000 seriesdragonpradeep
 
ARM7-ARCHITECTURE
ARM7-ARCHITECTURE ARM7-ARCHITECTURE
ARM7-ARCHITECTURE Dr.YNM
 
Pipelining approach
Pipelining approachPipelining approach
Pipelining approachGopinathD17
 
SOC Processors Used in SOC
SOC Processors Used in SOCSOC Processors Used in SOC
SOC Processors Used in SOCA B Shinde
 
Layout & Stick Diagram Design Rules
Layout & Stick Diagram Design RulesLayout & Stick Diagram Design Rules
Layout & Stick Diagram Design Rulesvarun kumar
 
Serial Peripheral Interface(SPI)
Serial Peripheral Interface(SPI)Serial Peripheral Interface(SPI)
Serial Peripheral Interface(SPI)Dhaval Kaneria
 
ARM CORTEX M3 PPT
ARM CORTEX M3 PPTARM CORTEX M3 PPT
ARM CORTEX M3 PPTGaurav Verma
 
Arm modes
Arm modesArm modes
Arm modesabhi165
 
Architecture of 8051
Architecture of 8051Architecture of 8051
Architecture of 8051hello_priti
 
Instruction Set Architecture
Instruction Set ArchitectureInstruction Set Architecture
Instruction Set ArchitectureJaffer Haadi
 
Rs 232 interface
Rs 232 interfaceRs 232 interface
Rs 232 interfacePREMAL GAJJAR
 
Introduction to Embedded System I: Chapter 2 (5th portion)
Introduction to Embedded System I: Chapter 2 (5th portion)Introduction to Embedded System I: Chapter 2 (5th portion)
Introduction to Embedded System I: Chapter 2 (5th portion)Moe Moe Myint
 
CISC & RISC Architecture
CISC & RISC Architecture CISC & RISC Architecture
CISC & RISC Architecture Suvendu Kumar Dash
 
Pic microcontroller architecture
Pic microcontroller architecturePic microcontroller architecture
Pic microcontroller architectureDominicHendry
 

Mais procurados (20)

Introduction to Digital Signal processors
Introduction to Digital Signal processorsIntroduction to Digital Signal processors
Introduction to Digital Signal processors
 
RISC - Reduced Instruction Set Computing
RISC - Reduced Instruction Set ComputingRISC - Reduced Instruction Set Computing
RISC - Reduced Instruction Set Computing
 
DRAM Cell - Working and Read and Write Operations
DRAM Cell - Working and Read and Write OperationsDRAM Cell - Working and Read and Write Operations
DRAM Cell - Working and Read and Write Operations
 
Unit ii.arc of tms320 c5 xx
Unit ii.arc of tms320 c5 xxUnit ii.arc of tms320 c5 xx
Unit ii.arc of tms320 c5 xx
 
Xilinx 4000 series
Xilinx 4000 seriesXilinx 4000 series
Xilinx 4000 series
 
TMS320C5x
TMS320C5xTMS320C5x
TMS320C5x
 
ARM7-ARCHITECTURE
ARM7-ARCHITECTURE ARM7-ARCHITECTURE
ARM7-ARCHITECTURE
 
Pipelining approach
Pipelining approachPipelining approach
Pipelining approach
 
SOC Processors Used in SOC
SOC Processors Used in SOCSOC Processors Used in SOC
SOC Processors Used in SOC
 
Layout & Stick Diagram Design Rules
Layout & Stick Diagram Design RulesLayout & Stick Diagram Design Rules
Layout & Stick Diagram Design Rules
 
Serial Peripheral Interface(SPI)
Serial Peripheral Interface(SPI)Serial Peripheral Interface(SPI)
Serial Peripheral Interface(SPI)
 
FPGA
FPGAFPGA
FPGA
 
ARM CORTEX M3 PPT
ARM CORTEX M3 PPTARM CORTEX M3 PPT
ARM CORTEX M3 PPT
 
Arm modes
Arm modesArm modes
Arm modes
 
Architecture of 8051
Architecture of 8051Architecture of 8051
Architecture of 8051
 
Instruction Set Architecture
Instruction Set ArchitectureInstruction Set Architecture
Instruction Set Architecture
 
Rs 232 interface
Rs 232 interfaceRs 232 interface
Rs 232 interface
 
Introduction to Embedded System I: Chapter 2 (5th portion)
Introduction to Embedded System I: Chapter 2 (5th portion)Introduction to Embedded System I: Chapter 2 (5th portion)
Introduction to Embedded System I: Chapter 2 (5th portion)
 
CISC & RISC Architecture
CISC & RISC Architecture CISC & RISC Architecture
CISC & RISC Architecture
 
Pic microcontroller architecture
Pic microcontroller architecturePic microcontroller architecture
Pic microcontroller architecture
 

Destaque

Digital Signal Processors - DSP's
Digital Signal Processors - DSP'sDigital Signal Processors - DSP's
Digital Signal Processors - DSP'sHicham Berkouk
 
Digital Signal Processor ( DSP ) [French]
Digital Signal Processor ( DSP )  [French]Digital Signal Processor ( DSP )  [French]
Digital Signal Processor ( DSP ) [French]Assia Mounir
 
DIGITAL SIGNAL PROCESSING
DIGITAL SIGNAL PROCESSINGDIGITAL SIGNAL PROCESSING
DIGITAL SIGNAL PROCESSINGSnehal Hedau
 
Real time DSP algorithms for Mobile communication
Real time DSP algorithms for Mobile communicationReal time DSP algorithms for Mobile communication
Real time DSP algorithms for Mobile communicationEmbedded Plus Trichy
 
Coursdsp tdi
Coursdsp tdiCoursdsp tdi
Coursdsp tdiMan Zoubinos
 
07 - MartinièreMonplaisir - Lyon - F2000 - 2012
07 - MartinièreMonplaisir - Lyon - F2000 - 201207 - MartinièreMonplaisir - Lyon - F2000 - 2012
07 - MartinièreMonplaisir - Lyon - F2000 - 2012CÊdric Frayssinet
 
Lecture: Digital Signal Processing Batch 2009
Lecture: Digital Signal Processing Batch 2009Lecture: Digital Signal Processing Batch 2009
Lecture: Digital Signal Processing Batch 2009ubaidis
 
Digital Signal Processing
Digital Signal Processing Digital Signal Processing
Digital Signal Processing Sri Rakesh
 
presentation on digital signal processing
presentation on digital signal processingpresentation on digital signal processing
presentation on digital signal processingsandhya jois
 
Digital Image Processing
Digital Image ProcessingDigital Image Processing
Digital Image ProcessingSahil Biswas
 
Top 5 Deep Learning and AI Stories - October 6, 2017
Top 5 Deep Learning and AI Stories - October 6, 2017Top 5 Deep Learning and AI Stories - October 6, 2017
Top 5 Deep Learning and AI Stories - October 6, 2017NVIDIA
 
Slideshare ppt
Slideshare pptSlideshare ppt
Slideshare pptMandy Suzanne
 

Destaque (19)

Digital Signal Processors - DSP's
Digital Signal Processors - DSP'sDigital Signal Processors - DSP's
Digital Signal Processors - DSP's
 
Digital Signal Processor ( DSP ) [French]
Digital Signal Processor ( DSP )  [French]Digital Signal Processor ( DSP )  [French]
Digital Signal Processor ( DSP ) [French]
 
DIGITAL SIGNAL PROCESSING
DIGITAL SIGNAL PROCESSINGDIGITAL SIGNAL PROCESSING
DIGITAL SIGNAL PROCESSING
 
Dsp ppt
Dsp pptDsp ppt
Dsp ppt
 
Real time DSP algorithms for Mobile communication
Real time DSP algorithms for Mobile communicationReal time DSP algorithms for Mobile communication
Real time DSP algorithms for Mobile communication
 
Dsp algorithms 02
Dsp algorithms 02Dsp algorithms 02
Dsp algorithms 02
 
Chap1 dsp
Chap1 dspChap1 dsp
Chap1 dsp
 
Dsp book
Dsp bookDsp book
Dsp book
 
Coursdsp tdi
Coursdsp tdiCoursdsp tdi
Coursdsp tdi
 
07 - MartinièreMonplaisir - Lyon - F2000 - 2012
07 - MartinièreMonplaisir - Lyon - F2000 - 201207 - MartinièreMonplaisir - Lyon - F2000 - 2012
07 - MartinièreMonplaisir - Lyon - F2000 - 2012
 
CV
CVCV
CV
 
DSP Processor
DSP Processor DSP Processor
DSP Processor
 
Chap2 dsp
Chap2 dspChap2 dsp
Chap2 dsp
 
Lecture: Digital Signal Processing Batch 2009
Lecture: Digital Signal Processing Batch 2009Lecture: Digital Signal Processing Batch 2009
Lecture: Digital Signal Processing Batch 2009
 
Digital Signal Processing
Digital Signal Processing Digital Signal Processing
Digital Signal Processing
 
presentation on digital signal processing
presentation on digital signal processingpresentation on digital signal processing
presentation on digital signal processing
 
Digital Image Processing
Digital Image ProcessingDigital Image Processing
Digital Image Processing
 
Top 5 Deep Learning and AI Stories - October 6, 2017
Top 5 Deep Learning and AI Stories - October 6, 2017Top 5 Deep Learning and AI Stories - October 6, 2017
Top 5 Deep Learning and AI Stories - October 6, 2017
 
Slideshare ppt
Slideshare pptSlideshare ppt
Slideshare ppt
 





DSP architecture

  • 1. DSP Architectures Rensselaer at Hartford ECSE 6620 - Fall 2001 Lecture 16 Jason M. Stripinis jasonstripinis@engineer.com
  • 2. Basic Processor Structure
      • Here we see a very simple processor structure, such as might be found in a small 8-bit microprocessor.
  • 3. Basic Processor Functions
      • ALU
          – Arithmetic Logic Unit: this circuit takes two operands on the inputs (labeled A and B) and produces a result on the output (labeled Y).
          – The operations will usually include, as a minimum:
              • add, subtract
              • and, or, not
              • shift right, shift left
              • ALUs in more complex processors will execute many more instructions.
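As a rough illustration of the ALU described on this slide, the Python sketch below models two inputs A and B and one output Y. The function name `alu`, the opcode strings, and the 8-bit data width are assumptions made for the example; none of them come from the lecture.

```python
MASK = 0xFF  # assumed 8-bit data path


def alu(op, a, b=0):
    """Return output Y for operation `op` on inputs A and B (8-bit, wraps on overflow)."""
    ops = {
        "add": lambda: a + b,
        "sub": lambda: a - b,
        "and": lambda: a & b,
        "or": lambda: a | b,
        "not": lambda: ~a,                # one-operand operation, B is ignored
        "shl": lambda: a << 1,            # shift left by one bit
        "shr": lambda: (a & MASK) >> 1,   # logical shift right by one bit
    }
    # Keep only the low 8 bits, as a fixed-width register would.
    return ops[op]() & MASK
```

In a real processor the control unit would drive the equivalent of `op` from the decoded opcode; here it is simply a string lookup.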
  • 4. Basic Processor Functions
      • Register File
          – A set of storage locations (registers) for storing temporary results. Early machines had just one register (accumulator). Modern RISC processors will have at least 32 registers.
      • Instruction Register
          – The instruction currently being executed by the processor is stored here.
      • Control Unit
          – The control unit decodes the instruction in the instruction register and sets signals which control the operation of most other units of the processor. For example, the operation code (opcode) in the instruction will be used to determine the settings of control signals for the ALU which determine which operation (+, -, ^, v, ~, shift, etc.) it performs.
  • 5. Basic Processor Functions
      • Clock
          – The vast majority of processors are synchronous, that is, they use a clock signal to determine when to capture the next data word and perform an operation on it. In a globally synchronous processor, a common clock needs to be routed (connected) to every unit in the processor.
      • Program counter
          – The program counter holds the memory address of the next instruction to be executed. It is updated every instruction cycle to point to the next instruction in the program. Branch instructions change the program counter by other than a simple increment.
  • 6. Basic Processor Functions
      • Memory Address Register
          – This register is loaded with the address of the next data word to be fetched from or stored into main memory.
      • Address Bus
          – Transfers addresses to memory and memory-mapped peripherals. It is driven by the processor acting as a bus master.
      • Data Bus
          – Carries data to and from the processor, memory and peripherals. It will be driven by the data source, i.e. processor, memory, etc.
      • Multiplexed Bus
          – To limit device pin counts and bus complexity, some processors multiplex address and data onto the same bus, with an adverse effect on performance.
  • 7. DSP Implementations
      • DSP Algorithm
          – Series of mathematical operations that are applied to process a sequence of digital signals sampled from the real (analog) world
      • Application examples
          – Filtering
          – FFT
          – Noise cancellation
          – Spectral Processing
  • 8. Why is special architecture good for digital signal processing?
      • DSPs are tailored to run DSP algorithms efficiently.
      • Special functions to handle DSP algorithm demands:
          – Unique data access patterns
              • Streams of data requiring high bandwidth
              • Low data repetition but high code repetition
          – Math operation focus ("number cruncher")
          – Real-time constraints
          – Power and size constraints
          – Cost requirement
          – Attention to numeric effects (limited fixed point error)
  • 9. DSP Functional Characteristics
      • Typically require a few specific operations
      • Consider a FIR filter. This requires:
          – additions & multiplications
          – delays
          – array handling
  • 10. DSP Typical Operations
      • Additions & Multiplications
          – fetch two operands
          – perform the addition or multiplication (or both)
          – store the result
      • Delays
          – store the result for later use
      • Array Handling
          – fetch values from consecutive memory locations
          – copy data from register to register
  • 11. DSP Typical Operations
      • To perform these basic operations most DSPs:
          – have a parallel multiply and add
          – have multiple memory accesses (to fetch two operands and store the result)
          – have sufficient registers to hold data temporarily
          – have efficient address generation for array handling
          – have special features such as delays or circular addressing
  • 12. DSP Arithmetic Logic Unit
      • Most DSP operations require additions and multiplications together. So DSP processors usually have parallel hardware adders and multipliers which can be used with a single instruction.
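To make the multiply-add, delay, and array-handling requirements concrete, here is a minimal Python sketch of a FIR filter, assuming the usual definition y[n] = h[0]·x[n] + h[1]·x[n-1] + ... + h[N-1]·x[n-N+1]. The function name `fir_filter` and the choice of a circular delay line are illustrative assumptions; a DSP would perform each inner-loop step as one multiply-accumulate (MAC) instruction with hardware circular addressing.

```python
def fir_filter(coeffs, samples):
    """Filter `samples` with FIR coefficients `coeffs` using a circular delay line."""
    n_taps = len(coeffs)
    delay = [0.0] * n_taps   # delay line holding the most recent n_taps inputs
    head = 0                 # position of the newest sample in the delay line
    out = []
    for x in samples:
        # Circular addressing: step the pointer back by one and wrap around.
        head = (head - 1) % n_taps
        delay[head] = x
        acc = 0.0
        for k in range(n_taps):
            # Multiply-accumulate: coefficient k pairs with the sample k steps old.
            acc += coeffs[k] * delay[(head + k) % n_taps]
        out.append(acc)
    return out
```

Feeding a unit impulse, for example `fir_filter([1, 2, 3], [1, 0, 0, 0])`, returns the coefficients themselves followed by zero, which is a quick sanity check on the delay-line indexing.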
  • 13. Register Structure
      • Delays require that intermediate values be held for later use.
      • For example, when keeping a running total, the total can be kept within the processor to avoid wasting repeated reads from and writes to memory.
      • For this reason DSP processors have lots of registers which can be used to hold intermediate values.
      • Registers may be fixed-point or floating-point.
  • 14. Memory Addressing
      • Array handling requires that data can be fetched efficiently from consecutive memory locations.
      • For this reason DSP processors have address registers which are used to hold addresses and can be used to generate the next needed address efficiently.
      • Usually, the next needed address can be generated during the data fetch or store operation, and with no overhead.
  • 15. Memory Addressing
      • Example DSP address generation operations (instruction, name, description):
          – *rP (register indirect): read the data pointed to by the address in register rP
          – *rP++ (postincrement): having read the data, postincrement the address pointer to point to the next value in the array
          – *rP-- (postdecrement): having read the data, postdecrement the address pointer to point to the previous value in the array
          – *rP++rI (register postincrement): having read the data, postincrement the address pointer by the amount held in register rI to point to rI values further down the array
          – *rP++rIr (bit reversed): having read the data, postincrement the address pointer to point to the next value in the array, as if the address bits were in bit reversed order
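The bit-reversed mode is the least intuitive entry in this list, so the following Python sketch shows the address sequence it is meant to produce. It is only a software model under stated assumptions: the helper names `bit_reverse` and `bit_reversed_order` are hypothetical, and a real address generator would typically produce the same sequence in hardware rather than by reversing a counter.

```python
def bit_reverse(index, n_bits):
    """Reverse the lowest `n_bits` bits of `index`."""
    result = 0
    for _ in range(n_bits):
        result = (result << 1) | (index & 1)  # move the lowest bit of index into result
        index >>= 1
    return result


def bit_reversed_order(n_bits):
    """Offsets visited when stepping through a 2**n_bits array in bit-reversed order."""
    return [bit_reverse(i, n_bits) for i in range(1 << n_bits)]
```

For an 8-point FFT (`n_bits = 3`) the visiting order is 0, 4, 2, 6, 1, 5, 3, 7, which is the reordering a radix-2 FFT needs on its input or output.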
  • 16. Memory Architectures for DSP
      • For arithmetic the DSP needs to fetch two operands in a single instruction cycle.
      • Since we also need to store the result and to read the instruction itself, more than two memory accesses per instruction cycle are needed.
      • Even the simplest DSP operation, an addition involving two operands and a store of the result to memory, requires four memory accesses (three to fetch the two operands and the instruction, plus a fourth to write the result).
  • 17. Memory Architectures for DSP
      • DSP processors usually support multiple memory accesses in the same instruction cycle.
      • It is not possible to access two different memory addresses simultaneously over a single memory bus.
      • There are two common methods to achieve multiple memory accesses per instruction cycle:
          – Harvard architecture
          – modified von Neumann architecture
  • 18. Memory Architectures for DSP (Harvard Architecture)
      • The Harvard architecture has two separate physical memory buses, allowing two simultaneous memory accesses.
      • The true Harvard architecture dedicates one bus for fetching instructions, with the other available to fetch operands.
      • This is inadequate for DSP operations, which usually involve at least two operands. So DSP Harvard architectures usually permit the 'program' bus to be used also for access of operands.
  • 19. Memory Architectures for DSP (Harvard Architecture)
      • Note that it is often necessary to fetch three things, the instruction plus two operands, and the Harvard architecture is inadequate to support this.
      • So DSP Harvard architectures often also include a cache memory which can be used to store instructions which will be reused, leaving both Harvard buses free for fetching operands.
      • The Harvard architecture plus cache is sometimes called an extended Harvard architecture or Super Harvard ARChitecture (SHARC).
  • 20. Memory Architectures for DSP (Harvard Architecture)
      • The Harvard architecture requires two memory buses. This makes it expensive to bring off the chip. For example, a DSP using 32 bit words and with a 32 bit address space requires at least 64 pins for each memory bus, a total of 128 pins if the Harvard architecture is brought off the chip. This results in very large chips, which are difficult to design into a circuit.
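The pin-count arithmetic on this slide can be written out as a small Python helper. The function name `external_bus_pins` is hypothetical, and the count covers only data and address lines; control, power, and ground pins are left out, which is why the slide says "at least".

```python
def external_bus_pins(data_bits, address_bits, n_buses):
    """Minimum data + address pins needed to bring `n_buses` memory buses off chip."""
    pins_per_bus = data_bits + address_bits  # one pin per data line and per address line
    return pins_per_bus * n_buses
```

With 32-bit words and a 32-bit address space, one bus needs 64 pins and the two Harvard buses need 128, matching the figures above.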
  • 21. Memory Architectures for DSP (von Neumann Architecture)
      • The von Neumann architecture uses only a single memory bus. This is relatively cheap, requiring fewer pins than the Harvard architecture, and simple to use because the programmer can place instructions or data anywhere throughout the available memory.
      • But it does not permit multiple memory accesses.
      • The modified von Neumann architecture allows multiple memory accesses per instruction cycle by running the memory clock faster than the instruction cycle.
  • 22. Memory Architectures for DSP (von Neumann Architecture)
      • Each instruction cycle is divided into multiple 'machine states' and a memory access can be made in each machine state, permitting multiple memory accesses per instruction cycle.
      • The modified von Neumann architecture permits all the memory accesses needed to support addition or multiplication: fetch of the instruction; fetch of the two operands; and storage of the result.
  • 23. Why use a special architecture for digital signal processing? The Answers
      • Unique data access patterns: Bit reversed addressing (FFT)
      • Streams of data requiring high bandwidth: Multiple access memory architecture
      • Low data repetition but high code repetition: Eliminate data cache (save $$)
      • Math operation focus: MAC instruction; Vector processing unit
      • Real-time constraints: Zero-overhead loops
      • Power and size constraints: Limited additional functional units (unlike GPP)
      • Cost requirement: On-board peripherals (SOC)
      • Attention to numeric effects (limited fixed point error): ALU with 16-bit operands and 32-bit result
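The last answer, 16-bit operands with a 32-bit result, can be sketched in Python as follows. The Q15 fixed-point format (values in [-1, 1) scaled by 32768) and the helper names `to_q15`, `mac_q15`, and `q30_to_q15` are assumptions for illustration; the slide does not name a number format. Python integers do not overflow, so the 32-bit accumulator width is described in comments rather than enforced.

```python
def to_q15(x):
    """Convert a float in [-1, 1) to a 16-bit Q15 integer, saturating at the limits."""
    return max(-32768, min(32767, int(round(x * 32768))))


def mac_q15(a, b):
    """Multiply-accumulate pairs of Q15 operands into a wide (Q30) accumulator."""
    acc = 0
    for x, y in zip(a, b):
        # A 16-bit by 16-bit product needs up to 32 bits; keeping all of them
        # avoids rounding each product separately.
        acc += x * y
    return acc


def q30_to_q15(acc):
    """Round the Q30 accumulator back to Q15 once, at the end, with saturation."""
    return max(-32768, min(32767, (acc + (1 << 14)) >> 15))
```

For instance, 0.5 × 0.5 becomes `16384 * 16384` in the accumulator and rounds back to 8192, which is 0.25 in Q15. Rounding once after the whole sum, instead of after every product, is what limits the fixed-point error.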
  • 24. DSP Generations
      • 1st Generation (1979-1982)
          – Transition from experimental signal processors
      • 2nd Generation (1985-1986)
          – Move from co-processor to stand-alone processor
      • 3rd Generation (1987-1989)
          – Major hardware improvements to speed
      • 4th Generation (1990-1996)
          – More on-chip integration (ADC, DAC, memory, multi-processor)
      • 5th Generation (1997-)
  • 25. DSP Generations: 1st Generation (1979-1982)
      • Primarily targeted at digital filtering
      • Specialized co-processor for signal processing
      • NMOS (n-Channel Metal Oxide Semi) fabrication
      • 16-bit fixed point
      • fast multiplier (and adder)
      • Harvard architecture
      • Specialized Instruction set
  • 26. DSP Generations: 1st Generation (1979-1982)
      • Example = Texas Instruments TMS32010
          – 16-bit fixed point
          – Harvard architecture
          – two Address registers
          – one A register (adder)
          – one P register (multiplier)
          – one T register (data shift on delay line)
          – No zero-overhead loop
          – Specialized Instruction set
          – MAC Time 400 ns (<100 ns today)
          – 50 ms per 1024-FFT
  • 27. DSP Generations: 1st Generation (1979-1982)
      • Example = Texas Instruments TMS32010
  • 28. DSP Generations: 2nd Generation (1985-1986)
      • Move from co-processor to stand-alone processor
      • CMOS (Complementary Metal Oxide Semi) fabrication
      • Double the speed of first generation
      • Advances in memory architecture (more internal RAM)
      • better pipelining of functional units
      • address generators (bit-reversing)
      • Zero-overhead loop HW
      • Limited floating point in SW
  • 29. DSP Generations: 2nd Generation (1985-1986)
      • Example = Texas Instruments TMS32020 (1985)
          – 16-bit fixed point
          – Harvard architecture
          – Improved TMS32010
          – RPTS allows pipelined instruction performed in single cycle
          – Specialized Instruction set
          – MAC Time 200 ns
          – 10 ms per 1024-FFT
  • 30. DSP Generations: 3rd Generation (1987-1989)
      • Increased floating point support
          – 32-bit floating point hardware DSPs released
          – Floating point emulation on fixed point processors
          – IEEE754 support
      • Hardware enhancements (large speed increase)
          – dense CMOS fabrication
          – on chip DMA
          – instruction caches
          – increased clock rates (first cores above 10 MHz)
      • Increased complexity of SW
  • 31. DSP Generations: 3rd Generation (1987-1989)
      • Example = Motorola DSP56001 (1988)
          – 24-bit data, instructions
          – 24-bit fixed point
          – 3 memory spaces (P, X, Y)
          – parallel moves
          – circular addressing
          – MAC Time 75 ns (21 ns today)
          – ~3 ms per 1024-FFT
      • Other Examples:
          – AT&T DSP16A
          – Analog Devices ADSP-2100
          – TI TMS320C50
  • 32. DSP Generations: 4th Generation (1990-1996)
      • Hardware integration
          – ADC
          – DAC
          – more memory
          – multiple DSPs on one chip
      • Decreasing power consumption
          – 5.0 VDC → 3.3 VDC → 3.0 VDC → 2.7 VDC
      • GPPs start to get DSP functions
          – SIMD
          – Leads to Intel introducing MMX (MultiMedia eXtensions) for x86
  • 33. DSP Generations: 4th Generation (1990-1996)
      • Example = TI TMS320C541 (1995)
          – Enhanced architecture
          – Low voltage (3.3 VDC)
          – More on-chip memory
          – Application specific functional units
          – MAC Time 20 ns (10 ns today)
          – ~1 ms per 1024-FFT
      • Example = TI TMS320C80
          – multiple processors per chip
  • 34. The GPP Option
      • High-performance general-purpose processors for PCs and workstations are increasingly suitable for some DSP applications.
      • E.g., Intel MMX Pentium, Motorola/IBM PowerPC 604e
      • These processors achieve excellent to outstanding floating and/or fixed-point DSP performance via:
          – Very high clock rates (200-500 MHz)
          – Superscalar architectures
          – Single-cycle multiplication and arithmetic operations
          – Good memory bandwidth
          – Branch prediction
          – In some cases, single-instruction, multiple-data (SIMD) ops
  • 35. DSP Generations: 5th Generation (1997-)
      • Not the classic DSP architectures
          – SIMD (Single Instruction Multiple Data stream) instructions
          – VLIW (Very Long Instruction Words) allows RISC processing
      • High parallelism
      • Increased clock speeds
      • No longer application specific functional units (no MAC FU)
      • Low voltage (2.5 VDC or less, even 1.2 VDC cores)
      • MAC Time 3 ns (but can be power hungry)
      • GPPs start to get DSP functions
          – Intel introduces MMX (MultiMedia eXtensions) for x86 in 1997
      • Increased integration
          – MCU and DSP cores on same chip
          – MCU functions/ports added to DSPs
  • 36. DSP Generations: 5th Generation (1997-)
      • SIMD (Single Instruction Multiple Data) instructions
          – Enhance throughput by allowing parallelism
          – Requires multiple functional units and wider buses
          – May support multiple data widths (different functional groups)
          – Example = DSP16000 WAS SIMD
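As a small illustration of what a SIMD instruction does, the Python sketch below packs four 16-bit values into one 64-bit word and adds two such words lane by lane. The lane width, lane count, and the names `pack`, `unpack`, and `simd_add` are assumptions chosen for the example; they do not describe the instruction set of any particular DSP or of MMX.

```python
LANE_BITS = 16                      # assumed lane width
LANES = 4                           # assumed number of lanes per 64-bit word
LANE_MASK = (1 << LANE_BITS) - 1


def pack(values):
    """Pack LANES integers into a single word, lane 0 in the lowest bits."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a packed word back into its LANES lane values."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def simd_add(word_a, word_b):
    """Add corresponding lanes; each lane wraps on its own and carries never cross lanes."""
    return pack([x + y for x, y in zip(unpack(word_a), unpack(word_b))])
```

One call to `simd_add` stands in for a single instruction that performs four additions at once, which is where the throughput gain comes from; the hardware cost is the set of parallel adders and the wider bus noted on the slide.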
  • 37. DSP Generations: 5th Generation (1997-)
      • VLIW (Very Long Instruction Words)
          – Instruction Level Parallelism (ILP) can be a major performance gain
              • Superscalar implementation requires larger die and more power to dynamically pipeline instructions
          – VLIW can be used to statically pipeline instructions at compile time (or even by hand!)
          – VLIW instruction words have fixed "slots" for instructions that map to the functional units available.
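The idea of filling fixed slots at compile time can be sketched with a simple greedy scheduler in Python. Everything here is a simplified assumption: the function name `schedule_vliw`, the tuple layout of an operation, single-cycle latency for every operation, and the slot counts passed in by the caller. Real VLIW compilers use far more elaborate scheduling.

```python
def schedule_vliw(ops, slots):
    """Pack operations into VLIW instruction words.

    ops   -- list of (name, unit, deps) in program order; deps are names of earlier ops
    slots -- dict mapping a functional unit type to the number of slots per word
    Returns a list of instruction words, each a list of operation names.
    """
    done = set()            # operations issued in earlier words
    remaining = list(ops)
    words = []
    while remaining:
        free = dict(slots)  # every slot is free at the start of a new word
        word = []
        for name, unit, deps in remaining:
            # An op can issue if a slot of its unit is free and all of its
            # inputs were produced by earlier words (not by this same word).
            if free.get(unit, 0) > 0 and all(d in done for d in deps):
                free[unit] -= 1
                word.append(name)
        if not word:
            raise ValueError("unschedulable: unknown unit or unmet dependency")
        done.update(word)
        remaining = [op for op in remaining if op[0] not in done]
        words.append(word)
    return words
```

Given two loads on a data unit, a multiply that needs both, an add that needs the multiply, and an unrelated shift, a machine with two data slots issues both loads and the shift together, then the multiply, then the add. Slots left empty in the second and third words are the unfilled-VLIW cost that shows up as wasted program memory.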
  • 38. DSP Generations: 5th Generation (1997-)
      • VLIW Advantages
          – huge theoretical pay off
              • less than 1 ns per MAC!
              • Less than 75 ns per 1024-FFT
      • VLIW Drawbacks
          – Can be very difficult to program and debug
          – High power consumption if VLIW is not filled
          – Code size dramatically increases requiring more program memory
  • 39. DSP Generations: 5th Generation (1997-)
      • VLIW Example = TI TMS320C6201
      • 32-bit Functional Units
          – Lx = ALU
          – Sx = Branching and shifting
          – Mx = Multiplier
          – Dx = Data Store
  • 40. DSP Generational Development
      • DSP processor performance has increased by a factor of about 400x over the past 20 years
      • [Chart: MAC time (ns), axis 0 to 400, for 1st Gen through 5th Gen]
      • DSP architectures will be increasingly specialized for applications, especially communications applications
      • General-purpose processors will become viable for many DSP applications