MODULE – 2
Processor Design
1
Contents
■ Custom Single purpose Processor
– RT level Combinational Components
– RT level Sequential Components
– Custom Single Purpose Processor Design
– Optimizing custom single-purpose processors
– Optimizing the original program, FSMD, datapath, FSM
■ General Purpose Processors
– Basic Architecture
– Datapath
– Control unit
– Memory
– Pipelining
2
Contents (cont..)
■ Superscalar and VLIW Architectures
■ Application Specific Instruction Set Processors (ASIPs)
– Microcontrollers
– DSP
– Less general ASIP environments
■ Selecting a Microprocessor/General Purpose Processor
3
Introduction
■ Processor – Digital circuit to perform computation tasks
– Datapath
– Controller
■ General purpose processor
– Wide variety of computation tasks
■ Single purpose processor
– To carry out a particular computation task
– Common tasks
■ Custom single purpose processors
– Non-standard task
4
Introduction (cont..)
■ Why custom single purpose processor?
– Faster performance
■ Fewer clock cycles from customized datapath
■ Shorter clock cycles from simple functional units
– Smaller size
■ Simpler datapath
■ No program memory
– Less power consumption
■ More efficient computation
■ Drawbacks
– High NRE costs
– Longer time to market
– Reduced flexibility
5
Combinational Logic
■ Transistor – Basic electrical component in digital systems
■ Transistors → Logic Gates → Digital Systems
■ MOS transistor on silicon
– Acts as an on/off switch
– Voltage at “gate” controls whether current flows from source to
drain
6
[Figure: MOS transistor on silicon, showing the IC package, gate, oxide, source, drain, channel, and silicon substrate; the transistor conducts if gate=1]
CMOS Transistor Implementations
■ Complementary Metal Oxide Semiconductor (CMOS)
■ We refer to logic levels
– Typically 0 is 0V, 1 is 5V
■ nMOS conducts if gate=1
■ pMOS conducts if gate=0
■ Basic gates
[Figure: CMOS implementations of the basic gates: inverter (F = x’), NAND gate (F = (xy)’), and NOR gate (F = (x+y)’), built from nMOS transistors (conduct if gate=1) and pMOS transistors (conduct if gate=0) between the 1 and 0 supply levels]
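The two conduction rules above are enough to model a CMOS gate at switch level. A minimal sketch, assuming the standard CMOS topology (not spelled out in the surviving text): for NAND the two pMOS transistors are in parallel in the pull-up network and the two nMOS transistors are in series in the pull-down network; NOR swaps series and parallel. The function names are illustrative.

```c
#include <assert.h>

/* Switch-level rules from the slide. */
static int nmos_conducts(int gate) { return gate == 1; }
static int pmos_conducts(int gate) { return gate == 0; }

/* Resolve the output node: in a correct CMOS gate exactly one network
   conducts (no short circuit, no floating output). */
static int cmos_output(int pull_up, int pull_down) {
    assert(pull_up != pull_down);
    return pull_up ? 1 : 0;   /* pull-up drives 1, pull-down drives 0 */
}

int cmos_inverter(int x) {
    return cmos_output(pmos_conducts(x), nmos_conducts(x));
}

/* NAND: pMOS pair in parallel (pull-up), nMOS pair in series (pull-down). */
int cmos_nand(int x, int y) {
    int up = pmos_conducts(x) || pmos_conducts(y);
    int down = nmos_conducts(x) && nmos_conducts(y);
    return cmos_output(up, down);
}

/* NOR: pMOS pair in series (pull-up), nMOS pair in parallel (pull-down). */
int cmos_nor(int x, int y) {
    int up = pmos_conducts(x) && pmos_conducts(y);
    int down = nmos_conducts(x) || nmos_conducts(y);
    return cmos_output(up, down);
}
```

For example, `cmos_nand(1, 1)` turns both series nMOS transistors on and both parallel pMOS transistors off, so the output is pulled to 0.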
Basic Logic Gates
Driver: F = x    Inverter: F = x’
x | Driver F | Inverter F
0 | 0 | 1
1 | 1 | 0

AND: F = xy    OR: F = x + y    XOR: F = x ⊕ y    XNOR: F = (x ⊕ y)’    NAND: F = (xy)’    NOR: F = (x + y)’
x y | AND | OR | XOR | XNOR | NAND | NOR
0 0 | 0 | 0 | 0 | 1 | 1 | 1
0 1 | 0 | 1 | 1 | 0 | 1 | 0
1 0 | 0 | 1 | 1 | 0 | 1 | 0
1 1 | 1 | 1 | 0 | 1 | 0 | 0
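The truth tables above map directly onto C's bitwise operators when each signal is a single bit. A minimal sketch; the `gate_*` names are illustrative, and inputs are assumed to be 0 or 1 (the helper asserts this).

```c
#include <assert.h>

/* Accept only single-bit values. */
static int bit(int v) { assert(v == 0 || v == 1); return v; }

int gate_driver(int x)      { return bit(x); }             /* F = x */
int gate_inverter(int x)    { return !bit(x); }            /* F = x' */
int gate_and(int x, int y)  { return bit(x) & bit(y); }    /* F = xy */
int gate_or(int x, int y)   { return bit(x) | bit(y); }    /* F = x + y */
int gate_xor(int x, int y)  { return bit(x) ^ bit(y); }    /* F = x xor y */
int gate_nand(int x, int y) { return !gate_and(x, y); }    /* F = (xy)' */
int gate_nor(int x, int y)  { return !gate_or(x, y); }     /* F = (x + y)' */
int gate_xnor(int x, int y) { return !gate_xor(x, y); }    /* F = (x xor y)' */
```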
Combinational Logic Design
■ Combinational circuit
– Digital Circuit whose output is a function of
current inputs
– No memory of past inputs
■ Steps in designing a Combinational Logic Circuit
1. Problem Definition
2. Truth Table
3. Output Equations
4. Minimized Expressions
5. Logic Circuit
9
Combinational Logic Design
1. Problem Description
y is 1 if a is equal to 1, or b and c are 1.
z is 1 if b or c is equal to 1, but not both, or if all
are 1.
10
Combinational Logic Design
(cont..)
2. Truth Table
11
a b c y z
0 0 0 0 0
0 0 1 0 1
0 1 0 0 1
0 1 1 1 0
1 0 0 1 0
1 0 1 1 1
1 1 0 1 1
1 1 1 1 1
Combinational Logic Design
(cont..)
3. Output Equations
y = a’bc + ab’c’ + ab’c + abc’ + abc
z = a’b’c + a’bc’ + ab’c + abc’ + abc
4. Minimized Expressions
y = a + bc
z = ab + b’c + bc’
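Because the circuit has only three inputs, the equations can be checked exhaustively against the truth table. A minimal sketch: the sum-of-minterms form has one product term for each row where the output is 1, and `verify_design` asserts that both forms match every row. The function and table names are illustrative.

```c
#include <assert.h>

/* Truth table from step 2, indexed by abc read as a 3-bit number (a = MSB). */
static const int Y_TABLE[8] = {0, 0, 0, 1, 1, 1, 1, 1};
static const int Z_TABLE[8] = {0, 1, 1, 0, 0, 1, 1, 1};

/* Step 3: sum of minterms. */
int y_sop(int a, int b, int c) {
    return (!a & b & c) | (a & !b & !c) | (a & !b & c) | (a & b & !c) | (a & b & c);
}
int z_sop(int a, int b, int c) {
    return (!a & !b & c) | (!a & b & !c) | (a & !b & c) | (a & b & !c) | (a & b & c);
}

/* Step 4: minimized expressions. */
int y_min(int a, int b, int c) { return a | (b & c); }
int z_min(int a, int b, int c) { return (a & b) | (!b & c) | (b & !c); }

/* Checks all 8 rows; returns the number of rows verified. */
int verify_design(void) {
    int rows = 0;
    for (int abc = 0; abc < 8; abc++) {
        int a = (abc >> 2) & 1, b = (abc >> 1) & 1, c = abc & 1;
        assert(y_sop(a, b, c) == Y_TABLE[abc] && y_min(a, b, c) == Y_TABLE[abc]);
        assert(z_sop(a, b, c) == Z_TABLE[abc] && z_min(a, b, c) == Z_TABLE[abc]);
        rows++;
    }
    return rows;
}
```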
12
Combinational Logic Design (cont..)
13
[Figure: logic circuit implementing outputs y and z from inputs a, b, c]
Combinational Logic Design (cont..)
■ Large circuits are complex to design using logic gates
■ e.g., 16 inputs
– 2^16 = 65,536 (64K) rows in the truth table
■ Reduce complexity by using components that are more abstract than logic gates
14
Combinational Components
15
Sequential Logic Design
■ Sequential Circuit
– Output is a function of current as well as previous
input values
– Has memory
■ Basic sequential circuit – FLIP FLOP
– Stores a single bit
16
17
State Tables
Excitation Tables
Sequential Components
18
Sequential Logic Design (cont..)
■ Control Inputs
– Synchronous
– Asynchronous
■ Clear control lines are asynchronous
19
Sequential Logic Design
a) Problem Description
You want to construct a clock divider: slow down an existing clock so that the output is 1 once every four clock cycles.
20
b) State Diagram
21
c) Implementation Model
22
d) State Table
23
e) Minimized Expressions
24
f) Combinational Logic
25
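The state diagram, state table, and minimized expressions for this example are figures that did not survive extraction. The sketch below is therefore an assumption, not the slide's exact design: a plain 2-bit counter whose state Q1Q0 steps 0, 1, 2, 3, 0, ... with output x = Q1Q0, so x is 1 in exactly one of every four cycles. For a 2-bit up counter the next-state logic is I1 = Q1 xor Q0 and I0 = Q0’.

```c
#include <assert.h>

/* State register bits of the hypothetical 2-bit counter FSM. */
typedef struct { int q1, q0; } divider_t;

void divider_reset(divider_t *d) { d->q1 = 0; d->q0 = 0; }

/* Output logic: x = Q1 Q0 (1 only in state 3). */
int divider_output(const divider_t *d) { return d->q1 & d->q0; }

/* Next-state logic, applied at each rising clock edge:
   I1 = Q1 xor Q0, I0 = Q0'. */
void divider_clock(divider_t *d) {
    assert(d->q1 <= 1 && d->q0 <= 1);
    int i1 = d->q1 ^ d->q0;
    int i0 = !d->q0;
    d->q1 = i1;
    d->q0 = i0;
}
```

Counting outputs over 12 clock cycles from reset gives three 1s, one per group of four cycles.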
Custom Single-purpose Processor
Basic Model
26
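[Figure (a): controller and datapath. The controller receives external control inputs and produces external control outputs; the datapath receives external data inputs and produces external data outputs; the controller drives the datapath control inputs and reads the datapath control outputs.]
[Figure (b): a view inside the controller and datapath. The controller contains a state register and next-state and control logic; the datapath contains registers and functional units.]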
State Diagram Templates
27
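Assignment statement
a = b
next statement
■ FSMD template: one state performs a = b, then transitions to the state for the next statement.

Loop statement
while (cond) {
loop-body-statements
}
next statement
■ FSMD template: condition state C: goes to loop-body-statements on cond, and the loop body returns to C:; on !cond, C: goes to join state J:, which leads to the next statement.

Branch statement
if (c1)
c1 stmts
else if (c2)
c2 stmts
else
other stmts
next statement
■ FSMD template: condition state C: goes to c1 stmts on c1, to c2 stmts on !c1*c2, and to other stmts on !c1*!c2; every branch ends in join state J:, which leads to the next statement.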
Example: Greatest Common Divisor
■ First create algorithm
■ Convert algorithm to “complex” state machine
– Known as FSMD: finite-state machine with datapath
– Can use templates to perform such conversion
28
GCD
(a) Black-Box View
[Figure: GCD black box with inputs go_i, x_i, y_i and output d_o]
(b) Desired Functionality
29
0: int x, y;
1: while (1) {
2: while (!go_i);
3: x = x_i;
4: y = y_i;
5: while (x != y) {
6: if (x < y)
7: y = y - x;
else
8: x = x - y;
}
9: d_o = x;
}
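The desired functionality is written as pseudocode (statement labels, ports used as variables, a free-running outer loop). A minimal runnable sketch of one pass through the outer loop, assuming go_i has already been asserted, with x_i and y_i passed as arguments and d_o returned; the comments map each statement to its label. Inputs must be positive, since with a 0 input the subtraction loop never terminates.

```c
#include <assert.h>

unsigned gcd_subtract(unsigned x_i, unsigned y_i) {
    assert(x_i > 0 && y_i > 0);
    unsigned x = x_i;          /* 3: x = x_i */
    unsigned y = y_i;          /* 4: y = y_i */
    while (x != y) {           /* 5 */
        if (x < y)             /* 6 */
            y = y - x;         /* 7 */
        else
            x = x - y;         /* 8 */
    }
    return x;                  /* 9: d_o = x */
}
```

For example, `gcd_subtract(42, 8)` returns 2.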
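(c) State Diagram
30
1: → 2 on 1 (always true); the !1 exit is never taken
2: → 2 on !go_i (wait); → 2-J on !(!go_i)
2-J: → 3
3: x = x_i → 4
4: y = y_i → 5
5: → 6 on x!=y; → 5-J on !(x!=y)
6: → 7 on x<y; → 8 on !(x<y)
7: y = y - x → 6-J
8: x = x - y → 6-J
6-J: → 5
5-J: → 9
9: d_o = x → 1-J
1-J: → 1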
31
Creating the Datapath
■ Create a register for any
declared variable
■ Create a functional unit
for each arithmetic
operation
■ Connect the ports,
registers and functional
units
– Based on reads and
writes
– Use multiplexors for
multiple sources
■ Create unique identifier
– for each control input
and output of datapath
components
32
Creating the Controller
■ State 3: x_sel = 0; x_ld = 1
– Load x from x_i
■ State 4: y_sel = 0; y_ld = 1
– Load y from y_i
■ State 7: y_sel = 1; y_ld = 1
– Load the subtraction result y - x into y
■ State 8: x_sel = 1; x_ld = 1
– Load the subtraction result x - y into x
■ State 9: d_ld = 1
– Load the output register
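A cycle-level sketch of the split into controller and datapath. It assumes 2-to-1 muxes where sel = 0 selects the external input (x_i or y_i) and sel = 1 selects the subtractor output, which is what the state 3/4/7/8 control words suggest. The struct and function names are illustrative, and the join states are folded into direct transitions for brevity, so this is not the slide's exact state encoding.

```c
#include <assert.h>

typedef struct { unsigned x, y, d; } gcd_datapath_t;                /* registers */
typedef struct { int x_sel, x_ld, y_sel, y_ld, d_ld; } gcd_ctrl_t;  /* control word */

/* One clock edge: every register with ld = 1 loads its mux output.
   All next values are computed first, as in real registers. */
static void datapath_clock(gcd_datapath_t *dp, gcd_ctrl_t c,
                           unsigned x_i, unsigned y_i) {
    unsigned x_next = c.x_sel ? dp->x - dp->y : x_i;
    unsigned y_next = c.y_sel ? dp->y - dp->x : y_i;
    unsigned d_next = dp->x;
    if (c.x_ld) dp->x = x_next;
    if (c.y_ld) dp->y = y_next;
    if (c.d_ld) dp->d = d_next;
}

/* Control outputs per state; every other state drives 0s. */
static gcd_ctrl_t control_for(int state) {
    gcd_ctrl_t c = {0, 0, 0, 0, 0};
    switch (state) {
    case 3: c.x_sel = 0; c.x_ld = 1; break;
    case 4: c.y_sel = 0; c.y_ld = 1; break;
    case 7: c.y_sel = 1; c.y_ld = 1; break;
    case 8: c.x_sel = 1; c.x_ld = 1; break;
    case 9: c.d_ld = 1; break;
    default: break;
    }
    return c;
}

/* Runs from state 3 (go_i already seen) until state 9 loads d. */
unsigned gcd_run(unsigned x_i, unsigned y_i) {
    assert(x_i > 0 && y_i > 0);
    gcd_datapath_t dp = {0, 0, 0};
    int state = 3;
    for (;;) {
        int x_neq_y = dp.x != dp.y;   /* comparator outputs read by the controller */
        int x_lt_y = dp.x < dp.y;
        datapath_clock(&dp, control_for(state), x_i, y_i);
        switch (state) {              /* next-state logic */
        case 3: state = 4; break;
        case 4: state = 5; break;
        case 5: state = x_neq_y ? 6 : 9; break;
        case 6: state = x_lt_y ? 7 : 8; break;
        case 7: case 8: state = 5; break;
        case 9: return dp.d;          /* d_o */
        default: assert(0); return 0;
        }
    }
}
```

The comparator outputs are sampled before the clock edge because states 5 and 6 only test values; they never load a register.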
33
Controller Implementation Model
■ Inputs
– go_i: enable (start) input
– Q3-Q0: outputs of the state register
– x_neq_y, x_lt_y: comparator outputs from the datapath
■ Outputs
– x_sel, y_sel
– x_ld, y_ld
– d_ld
– I3-I0: next-state inputs to the state register
34
35
Completing the GCD Custom Single-Purpose Processor Design
36
[Figure: a view inside the controller and datapath. The controller contains a state register and next-state and control logic; the datapath contains registers and functional units.]
■ We finished the datapath
■ We have a state table for the
next state and control logic
■ Truth table for the
combinational logic
■ This is not an optimized
design.
Optimizing Single-Purpose Processors
■ Optimization is the task of making design metric values the best possible
■ e.g., in the GCD example, larger input values require more loop iterations
– The computation takes longer
■ Optimization opportunities
– Original Program
– FSMD
– Datapath
– FSM
37
Optimizing the Original Program
■ Analyze program attributes and look for areas of possible
improvement
– Number of computations
– Size of variables
– Time and space complexity
– Operations used
■ Multiplication and division are very expensive
38
Optimizing the Original Program
(Cont..)
39
Original Program
0: int x, y;
1: while (1) {
2: while (!go_i);
3: x = x_i;
4: y = y_i;
5: while (x != y) {
6: if (x < y)
7: y = y - x;
else
8: x = x - y;
}
9: d_o = x;
}
Optimized Program
0: int x, y, r;
1: while (1) {
2: while (!go_i);
// x must be the larger number
3: if (x_i >= y_i) {
4: x=x_i;
5: y=y_i;
}
6: else {
7: x=y_i;
8: y=x_i;
}
9: while (y != 0) {
10: r = x % y;
11: x = y;
12: y = r;
}
13: d_o = x;
}
Replace the subtraction operation(s) with a modulo operation in order to speed up the program.
GCD(42, 8), original program
■ The loop condition is evaluated 9 times (8 subtractions)
■ x and y values evaluated as follows: (42, 8), (34, 8), (26, 8), (18, 8), (10, 8), (2, 8), (2, 6), (2, 4), (2, 2)
GCD(42, 8), optimized program
■ The loop condition is evaluated 3 times (2 modulo operations)
■ x and y values evaluated as follows: (42, 8), (8, 2), (2, 0)
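A sketch that runs both programs and counts loop-body iterations, which reproduces the GCD(42, 8) traces above. The step counter is an addition for illustration only.

```c
#include <assert.h>

/* Original program: repeated subtraction. */
unsigned gcd_sub_steps(unsigned x, unsigned y, unsigned *steps) {
    assert(x > 0 && y > 0);
    *steps = 0;
    while (x != y) {
        if (x < y) y = y - x; else x = x - y;
        (*steps)++;
    }
    return x;
}

/* Optimized program: order the inputs so x is the larger, then use modulo. */
unsigned gcd_mod_steps(unsigned x_i, unsigned y_i, unsigned *steps) {
    unsigned x = x_i >= y_i ? x_i : y_i;
    unsigned y = x_i >= y_i ? y_i : x_i;
    *steps = 0;
    while (y != 0) {
        unsigned r = x % y;
        x = y;
        y = r;
        (*steps)++;
    }
    return x;
}
```

For GCD(42, 8) the subtraction loop body runs 8 times and the modulo loop body runs twice. Each modulo step is a more expensive operation than a subtraction, though.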
Optimizing the FSMD
■ Areas of possible improvements
– Merge states
■ States with constants on transitions can be eliminated, because the transition taken is already known
■ States with independent operations can be merged
– Separate states
■ States which require complex operations (a*b*c*d) can be broken
into smaller states to reduce hardware size
– Scheduling
■ Task of assigning operations from the original program to
states in an FSMD
40
Optimizing the FSMD
41
[Figure: original FSMD and optimized FSMD]
■ Eliminate state 1: its transitions have constant values
■ Merge state 2 and state 2-J: there is no loop operation between them
■ Merge state 3 and state 4: the assignment operations are independent of one another
■ Merge state 5 and state 6: the transitions from state 6 can be done in state 5
■ Eliminate states 5-J and 6-J: the transitions from each can be done from state 7 and state 8, respectively
■ Eliminate state 1-J: the transition from state 1-J can be done directly from state 9
Optimizing the FSMD (cont..)
■ Consider a = b * c * d * e
■ Generating a single state for the operation requires 3 multipliers
in the datapath.
■ Multipliers are expensive
■ Break the operation down into smaller operations
– t1 = b * c
– t2 = d * e
– a = t1 * t2
■ Each smaller operation has its own state
■ Only 1 multiplier is required in the datapath
42
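A sketch of the scheduled version. A hypothetical `multiplier` function stands in for the single multiplier in the datapath, each `case` is one FSMD state, and the assertion checks that every state uses the multiplier exactly once.

```c
#include <assert.h>

/* Stand-in for the one multiplier; counts its uses so the schedule can be checked. */
static long multiplier(long p, long q, int *uses) {
    (*uses)++;
    return p * q;
}

/* a = b * c * d * e, scheduled over three states with one multiply each. */
long scheduled_product(long b, long c, long d, long e) {
    long t1 = 0, t2 = 0, a = 0;
    for (int state = 0; state < 3; state++) {
        int uses = 0;                                  /* multiplier uses in this state */
        switch (state) {
        case 0: t1 = multiplier(b, c, &uses); break;   /* t1 = b * c */
        case 1: t2 = multiplier(d, e, &uses); break;   /* t2 = d * e */
        case 2: a = multiplier(t1, t2, &uses); break;  /* a = t1 * t2 */
        }
        assert(uses == 1);
    }
    return a;
}
```

This schedule needs three states instead of one. It trades clock cycles for a datapath with two fewer multipliers.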
Optimizing the FSMD (cont..)
■ Timing of output operations could be changed while the FSMD
is optimized
■ Reduced FSMD will generate GCD output in fewer clock cycles
■ Changing the timing is not acceptable in all cases, e.g., a clock divider, whose output timing is its whole purpose
■ Thus, when optimizing FSMD, a designer must be aware of
whether output timing may or may not be modified.
43
Optimizing the Datapath
■ Sharing of functional units
– One-to-one mapping, as done previously, is not necessary
– If same operation occurs in different states, they can share
a single functional unit
■ Multi-functional units
– An ALU supports a variety of operations, so it can be shared among operations occurring in different states
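In the GCD FSMD, states 7 and 8 both subtract, so they can share one subtractor. A minimal sketch, assuming operand muxes steered by the current state; the function name is illustrative.

```c
#include <assert.h>

/* One shared subtractor replaces the two subtractors of the one-to-one
   mapping. State 7 computes y - x, state 8 computes x - y. */
unsigned shared_subtractor(int state, unsigned x, unsigned y) {
    assert(state == 7 || state == 8);
    unsigned lhs = (state == 7) ? y : x;   /* operand mux A */
    unsigned rhs = (state == 7) ? x : y;   /* operand mux B */
    return lhs - rhs;                      /* the single functional unit */
}
```

The cost is two small muxes in place of a second subtractor, a good trade whenever the functional unit is larger than the muxes.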
44
Optimizing the FSM
■ State Encoding
– Task of assigning a unique bit pattern to each state in an
FSM
– Size of the state register and combinational logic vary with the encoding
– e.g., an FSM with n states encoded in log2 n bits has n! possible encodings
– Can be treated as an ordering problem
– More encodings are possible if more than log2 n bits are used to encode n states
– CAD tools are a great aid in searching for the best encoding
■ State Minimization
– Task of merging equivalent states into a single state
■ Two states are equivalent if, for all possible input combinations, they generate the same outputs and transition to the same next state
45
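The size of the encoding search space follows from counting: assigning distinct k-bit codes to n states gives 2^k · (2^k − 1) ··· (2^k − n + 1) choices, which equals n! when k = log2 n. A small sketch that computes this count:

```c
#include <assert.h>

/* Number of ways to assign distinct k-bit codes to n states. */
unsigned long long encodings(unsigned n, unsigned k) {
    assert(k < 64);
    unsigned long long codes = 1ULL << k;
    assert(n <= codes);
    unsigned long long count = 1;
    for (unsigned i = 0; i < n; i++)
        count *= codes - i;
    return count;
}
```

For example, 4 states in 2 bits give 24 encodings, while 4 states in 3 bits give 1680. This rapid growth is why CAD tools are used for the search.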
■ Converting a sequential program into a custom single-purpose processor
– Convert the program into an FSMD
– Split the FSMD into a simple FSM controlling a datapath
– Perform sequential logic design on the FSM
■ In many cases, we prefer not to start with a program, but instead directly with an FSMD
– Cycle-by-cycle timing of a system is central to the design
– Programming languages don’t typically support cycle-by-cycle description
46
RT-level Custom
Single-Purpose Processor Design
RT-level Custom
Single-Purpose Processor Design
■ Example
– Device to send an 8-bit number to another device (the receiver)
– Receiver can receive all 8 bits at once
– Sender sends 4 bits at a time: first the lower-order 4 bits, then the higher-order 4 bits
■ A bridge must be designed to enable the two devices to communicate
47
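The bridge's FSMD appears only in figures. The following is a cycle-level sketch under an assumed handshake: the sender raises `rdy_in` while a nibble is valid on `data_in` and then lowers it, and the bridge reports a full byte on `data_out` for one cycle. All signal, state, and function names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { WAIT_LOW, WAIT_LOW_DONE, WAIT_HIGH, WAIT_HIGH_DONE } bridge_state_t;

typedef struct {
    bridge_state_t state;
    uint8_t low, high;   /* registers holding the two received nibbles */
} bridge_t;

void bridge_reset(bridge_t *b) { b->state = WAIT_LOW; b->low = b->high = 0; }

/* One clock cycle. Returns 1 (rdy_out) and writes *data_out when a byte is ready. */
int bridge_clock(bridge_t *b, int rdy_in, uint8_t data_in, uint8_t *data_out) {
    assert(b && data_out);
    switch (b->state) {
    case WAIT_LOW:                 /* wait for the lower-order nibble */
        if (rdy_in) { b->low = data_in & 0x0F; b->state = WAIT_LOW_DONE; }
        break;
    case WAIT_LOW_DONE:            /* wait for the sender to drop rdy_in */
        if (!rdy_in) b->state = WAIT_HIGH;
        break;
    case WAIT_HIGH:                /* wait for the higher-order nibble */
        if (rdy_in) { b->high = data_in & 0x0F; b->state = WAIT_HIGH_DONE; }
        break;
    case WAIT_HIGH_DONE:           /* send all 8 bits at once */
        if (!rdy_in) {
            *data_out = (uint8_t)((b->high << 4) | b->low);
            b->state = WAIT_LOW;
            return 1;
        }
        break;
    }
    return 0;
}
```

Waiting for `rdy_in` to fall before accepting the next nibble keeps one long `rdy_in` pulse from being read twice.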
RT-level Custom
Single-Purpose Processor Design
48
49
RT-level Custom Single-Purpose Processor Design (cont..)
General Purpose Processors - Software
50
Introduction
■ General-Purpose Processor
– Processor designed for a variety of computation tasks
– Low unit cost, in part because manufacturer spreads NRE
over large numbers of units
■ Motorola sold half a billion 68HC05 microcontrollers in 1996 alone
– Carefully designed since higher NRE is acceptable
■ Can yield good performance, size and power
– Low NRE cost, short time-to-market/prototype, high
flexibility
■ User just writes software; no processor design
– Also known as “microprocessor” – “micro” used when they
were implemented on one or a few chips rather than entire
rooms
51
Basic Architecture
52
■ Control unit and
datapath
– Note similarity to
single-purpose
processor
■ Key differences
– Datapath is general
– Control unit doesn’t
store the algorithm
– the algorithm is
“programmed” into
the memory
Datapath Operations
53
■ Load
– Read memory location into register
■ ALU operation
– Input certain registers through ALU, store back in register
■ Store
– Write register to memory location
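The three datapath operations can be sketched on a toy machine. The memory size, register count, and function names are hypothetical, and addition stands in for "an ALU operation".

```c
#include <assert.h>

enum { MEM_SIZE = 1024, NUM_REGS = 4 };
typedef struct { int mem[MEM_SIZE]; int reg[NUM_REGS]; } machine_t;

static int valid_reg(int r) { return r >= 0 && r < NUM_REGS; }
static int valid_addr(int a) { return a >= 0 && a < MEM_SIZE; }

/* Load: read memory location into register. */
void dp_load(machine_t *m, int rd, int addr) {
    assert(valid_reg(rd) && valid_addr(addr));
    m->reg[rd] = m->mem[addr];
}

/* ALU operation: input registers through the ALU, store back in a register. */
void dp_add(machine_t *m, int rd, int rs1, int rs2) {
    assert(valid_reg(rd) && valid_reg(rs1) && valid_reg(rs2));
    m->reg[rd] = m->reg[rs1] + m->reg[rs2];
}

/* Store: write register to memory location. */
void dp_store(machine_t *m, int addr, int rs) {
    assert(valid_reg(rs) && valid_addr(addr));
    m->mem[addr] = m->reg[rs];
}
```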
Control Unit
■ Control unit: configures the
datapath operations
– Sequence of desired
operations
(“instructions”) stored in
memory – “program”
■ Instruction cycle – broken
into several sub-operations,
each one clock cycle, e.g.:
– Fetch: Get next
instruction into IR
– Decode: Determine what
the instruction means
– Fetch operands: Move
data from memory to
datapath register
– Execute: Move data
through the ALU
– Store results: Write data from register to memory
54
Control Unit Sub-Operations
■ Fetch
– Get next
instruction
into IR
– PC: program
counter,
always points
to next
instruction
– IR: holds the
fetched
instruction
55
Control Unit Sub-Operations
■ Decode
– Determine
what the
instruction
means
56
Control Unit Sub-Operations
■ Fetch operands
– Move data
from memory
to datapath
register
57
Control Unit Sub-Operations
■ Execute
– Move data
through the
ALU
– This particular
instruction
does nothing
during this
sub-operation
58
Control Unit Sub-Operations
■ Store results
– Write data
from register
to memory
– This particular
instruction
does nothing
during this
sub-operation
59
Instruction Cycles
60
Instruction Cycles
61
Instruction Cycles
62
Architectural Considerations
■ N-bit processor
– N-bit ALU,
registers, buses,
memory data
interface
– Embedded: 8-bit,
16-bit, 32-bit
common
– Desktop/servers: 32-bit, even 64-bit
■ PC size determines
address space
63
Architectural Considerations
■ Clock frequency
– Inverse of clock period
– The clock period must be longer than the longest register-to-register delay in the entire processor
– Memory access is often the longest
64
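Because frequency is the inverse of the period, the longest register-to-register delay sets an upper bound on the clock frequency. A small sketch, with the delay in nanoseconds and the result in MHz (a 1 ns period corresponds to 1000 MHz):

```c
#include <assert.h>

/* Upper bound on clock frequency (MHz) given the longest
   register-to-register delay (ns): f = 1 / period. */
double max_clock_mhz(double longest_delay_ns) {
    assert(longest_delay_ns > 0.0);
    return 1000.0 / longest_delay_ns;
}
```

For example, a hypothetical 5 ns worst-case path (perhaps a memory access) limits the clock to below 200 MHz.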
Pipelining: Increasing Instruction Throughput
65
66
Two Memory Architectures
■ Princeton
– Fewer memory
wires
■ Harvard
– Simultaneous
program and
data memory
access
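[Figure: Harvard architecture, where the processor connects to separate program memory and data memory; Princeton architecture, where the processor connects to a single memory holding both program and data]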
Cache Memory
■ Memory access may be slow
■ Cache is small but fast memory
close to processor
– Holds copy of part of
memory
– Hits and misses
67
[Figure: processor, cache, and memory. The cache uses fast/expensive technology, usually on the same chip as the processor; main memory uses slower/cheaper technology, usually on a different chip.]
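Hits and misses can be illustrated with the simplest cache organization. The sketch below assumes a direct-mapped cache (each memory block maps to exactly one cache line), which the slide does not specify, with illustrative sizes. Only tags are modeled, not the data.

```c
#include <assert.h>
#include <stdbool.h>

enum { NUM_LINES = 4, BLOCK_SIZE = 16 };   /* illustrative sizes (bytes per block) */

typedef struct {
    bool valid[NUM_LINES];
    unsigned tag[NUM_LINES];
    unsigned hits, misses;
} cache_t;

void cache_init(cache_t *c) {
    for (int i = 0; i < NUM_LINES; i++) { c->valid[i] = false; c->tag[i] = 0; }
    c->hits = c->misses = 0;
}

/* Returns true on a hit; on a miss the block is copied in from memory,
   replacing whatever that line held before. */
bool cache_access(cache_t *c, unsigned addr) {
    unsigned block = addr / BLOCK_SIZE;
    unsigned line = block % NUM_LINES;
    unsigned tag = block / NUM_LINES;
    assert(line < NUM_LINES);
    if (c->valid[line] && c->tag[line] == tag) { c->hits++; return true; }
    c->misses++;
    c->valid[line] = true;
    c->tag[line] = tag;
    return false;
}
```

Here addresses 0 and 4 share a block (miss, then hit), while address 64 maps to the same line with a different tag and evicts it.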
Superscalar and VLIW Architectures
■ Performance can be improved by:
– Faster clock (but there’s a limit)
– Pipelining: slice up instruction into stages, overlap stages
– Multiple ALUs to support more than one instruction
stream
■ Superscalar
– Scalar: non-vector operations
– Fetches instructions in batches, executes as many as possible
■ May require extensive hardware to detect independent
instructions
– VLIW: each word in memory has multiple independent
instructions
■ Relies on the compiler to detect and schedule instructions
■ Currently growing in popularity
68
Programmer’s View
■ Programmer doesn’t need detailed understanding of
architecture
– Instead, needs to know what instructions can be executed
■ Two levels of instructions:
– Assembly level
– Structured languages (C, C++, Java, etc.)
■ Most development today done using structured languages
– But, some assembly level programming may still be necessary
– Drivers: portion of program that communicates with and/or controls
(drives) another device
■ Often have detailed timing considerations, extensive bit manipulation
■ Assembly level may be best for these
69
Assembly-Level Instructions
■ Instruction Set
– Defines the legal set of instructions for that processor
■ Data transfer: memory/register, register/register, I/O, etc.
■ Arithmetic/logical: move register through ALU and back
■ Branches: determine next PC value when not just PC+1
70
[Figure: program memory holding Instruction 1, Instruction 2, Instruction 3, Instruction 4, ..., each in the format opcode operand1 operand2]
A Simple (Trivial) Instruction Set
71
Addressing Modes
72
Sample Programs
73
C program
int total = 0;
for (int i=10; i!=0; i--)
    total += i;
// next instructions...

Equivalent assembly program
      MOV R0, #0;    // total = 0
      MOV R1, #10;   // i = 10
      MOV R2, #1;    // constant 1
      MOV R3, #0;    // constant 0
Loop: JZ R1, Next;   // Done if i=0
      ADD R0, R1;    // total += i
      SUB R1, R2;    // i--
      JZ R3, Loop;   // Jump always
Next: // next instructions...
Programmer Considerations
74
■ Program and data memory space
– Embedded processors often very limited
■ e.g., 64 Kbytes program, 256 bytes of RAM (expandable)
■ Registers: How many are there?
– Only a direct concern for assembly-level programmers
■ I/O
– How does the processor communicate with external signals?
■ Interrupts
Operating System
75
■ Optional software layer providing low-level services to a
program (application).
– File management, disk access
– Keyboard/display interfacing
– Scheduling multiple programs for execution
■ Or even just multiple threads from one program
– Program makes system calls to the OS
Development Environment
76
■ Development processor
– The processor on which we write and debug our programs
■ Usually a PC
■ Target processor
– The processor that the program will run on in our
embedded system
■ Often different from the development processor
Software Development Process
77
■ Compilers
– Cross compiler
■ Runs on one
processor, but
generates code
for another
■ Assemblers
■ Linkers
■ Debuggers
■ Profilers
Running a Program
■ If the development processor is different from the target, how can we run our compiled code? Two options:
– Download to target processor
– Simulate
■ Simulation
– One method: Hardware description language
■ But slow, not always available
– Another method: Instruction set simulator (ISS)
■ Runs on development processor, but executes
instructions of target processor
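An ISS is a fetch-decode-execute loop that runs on the development processor. This minimal sketch covers only the instructions the sample program uses (MOV Rn, #imm; ADD; SUB; JZ). The struct-based instruction format, the HALT opcode standing in for "next instructions", and the resolved label addresses are assumptions; the slide's instruction set table did not survive extraction.

```c
#include <assert.h>

typedef enum { MOV_IMM, ADD, SUB, JZ, HALT } opcode_t;
typedef struct {
    opcode_t op;
    int a;   /* destination or tested register */
    int b;   /* source register, immediate value, or jump target */
} instr_t;

enum { NUM_REGS = 4 };

/* Returns instructions executed, or -1 if max_steps is exceeded. */
long iss_run(const instr_t *prog, int len, int reg[NUM_REGS], long max_steps) {
    int pc = 0;                                   /* program counter */
    long steps = 0;
    while (pc >= 0 && pc < len) {
        if (steps >= max_steps) return -1;
        steps++;
        instr_t ir = prog[pc];                    /* fetch into IR */
        int next_pc = pc + 1;                     /* default: PC + 1 */
        assert(ir.a >= 0 && ir.a < NUM_REGS);
        switch (ir.op) {                          /* decode, then execute */
        case MOV_IMM: reg[ir.a] = ir.b; break;
        case ADD:     reg[ir.a] += reg[ir.b]; break;
        case SUB:     reg[ir.a] -= reg[ir.b]; break;
        case JZ:      if (reg[ir.a] == 0) next_pc = ir.b; break;
        case HALT:    return steps;
        }
        pc = next_pc;
    }
    return steps;
}

/* The sample program with labels resolved: Loop = 4, Next = 8. */
static const instr_t SUM_PROGRAM[] = {
    {MOV_IMM, 0, 0},   /* MOV R0, #0   total = 0 */
    {MOV_IMM, 1, 10},  /* MOV R1, #10  i = 10 */
    {MOV_IMM, 2, 1},   /* MOV R2, #1   constant 1 */
    {MOV_IMM, 3, 0},   /* MOV R3, #0   constant 0 */
    {JZ, 1, 8},        /* Loop: JZ R1, Next */
    {ADD, 0, 1},       /* ADD R0, R1   total += i */
    {SUB, 1, 2},       /* SUB R1, R2   i-- */
    {JZ, 3, 4},        /* JZ R3, Loop  jump always */
    {HALT, 0, 0},      /* Next: */
};

int run_sum_program(void) {
    int reg[NUM_REGS] = {0, 0, 0, 0};
    long n = (long)(sizeof SUM_PROGRAM / sizeof SUM_PROGRAM[0]);
    long steps = iss_run(SUM_PROGRAM, (int)n, reg, 1000);
    assert(steps > 0);
    return reg[0];     /* total */
}
```

Running it leaves 55 (10 + 9 + ... + 1) in R0. An ISS can also pause between instructions to show register values, which is the controllability discussed next.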
78
Testing and Debugging
79
■ ISS
– Gives us control over
time – set breakpoints,
look at register values, set
values, step-by-step
execution, ...
– But, doesn’t interact with
real environment
■ Download to board
– Use device programmer
– Runs in real environment,
but not controllable
■ Compromise: Emulator
– Runs in real environment
– Supports some
controllability from the
PC
Application-Specific
Instruction-Set Processors (ASIPs)
80
■ General-Purpose Processors
– Sometimes too general to be effective in demanding
application
■ e.g., video processing – requires huge video buffers
and operations on large arrays of data, inefficient on a
GPP
– But single-purpose processor has high NRE, not
programmable
■ ASIP’s – targeted to a particular domain
– Contain architectural features specific to that domain
■ e.g., embedded control, digital signal processing, video
processing, network processing, telecommunications,
etc.
– Still programmable
A Common ASIP: Microcontroller
81
■ For embedded control applications
– Reading sensors, setting actuators
– Mostly dealing with events (bits): data is present, but not in huge
amounts
– e.g., VCR, disk drive, digital camera (assuming SPP for image
compression), washing machine, microwave oven
■ Microcontroller features
– On-chip peripherals
■ Timers, analog-digital converters, serial communication, etc.
■ Tightly integrated for programmer, typically part of register space
– On-chip program and data memory
– Direct programmer access to many of the chip’s pins
– Specialized instructions for bit-manipulation and other low-level
operations
■ Incorporating peripherals and memory onto the same IC reduces the number of required ICs, resulting in compact and low-power implementations
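Much microcontroller code sets, clears, and tests single bits of peripheral registers. A sketch in C: on real hardware the register would be memory-mapped at a device-specific address, but here an ordinary `volatile` variable stands in for it (hypothetical). On a microcontroller with bit-manipulation instructions (e.g., the 8051's SETB, CLR, and CPL), a compiler may map each helper to a single instruction when the register is bit-addressable.

```c
#include <assert.h>
#include <stdint.h>

static inline void bit_set(volatile uint8_t *reg, unsigned n) {
    assert(n < 8);
    *reg |= (uint8_t)(1u << n);
}
static inline void bit_clear(volatile uint8_t *reg, unsigned n) {
    assert(n < 8);
    *reg &= (uint8_t)~(1u << n);
}
static inline void bit_toggle(volatile uint8_t *reg, unsigned n) {
    assert(n < 8);
    *reg ^= (uint8_t)(1u << n);
}
static inline int bit_test(const volatile uint8_t *reg, unsigned n) {
    assert(n < 8);
    return (*reg >> n) & 1u;
}
```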
Another Common ASIP: Digital Signal Processors (DSP)
■ For signal processing applications
– Large amounts of digitized data, often streaming
■ Sources: a photo captured by a digital camera, a voice packet through a network router
– Data transformations must be applied fast
– e.g., cell-phone voice filter, digital TV, music synthesizer
■ DSP features
– Several instruction execution units, for filtering and transforming vectors or matrices of data
– Multiply-accumulate single-cycle instruction, other instructions
– Efficient vector operations, e.g., add two arrays
■ Vector ALUs, loop buffers, etc.
– Contains a number of ADCs, DACs, PWMs, timers, counters, etc.
– Commonly used DSPs are well supported in terms of compilers and other development tools, making them easy and cheap to integrate into most embedded systems.
82
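Filtering is the classic workload for the multiply-accumulate instruction. A sketch of one output sample of a finite impulse response (FIR) filter, y[n] = Σ h[k]·x[n−k]. Each loop iteration is one MAC, the operation a DSP typically executes in a single cycle; the function name and the use of plain integer arithmetic are illustrative choices.

```c
#include <assert.h>
#include <stddef.h>

/* One FIR output sample: y[n] = sum over k of h[k] * x[n - k]. */
long fir_sample(const int *h, const int *x, size_t taps, size_t n) {
    assert(taps > 0 && n + 1 >= taps);   /* need taps samples of history */
    long acc = 0;                        /* accumulator register */
    for (size_t k = 0; k < taps; k++)
        acc += (long)h[k] * x[n - k];    /* multiply-accumulate */
    return acc;
}
```

For example, with h = {1, 2, 1} and x = {1, 2, 3, 4, 5}, the sample at n = 2 is 3 + 4 + 1 = 8.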
Less General ASIP Environments
■ ASIPs that are less general in nature
■ Designed to perform very domain-specific processing while allowing some degree of programmability
■ ASIPs designed for networking hardware may be programmable with different network routing algorithms, checksums, and packet processing protocols
83
Trend: Even More Customized
ASIPs
84
■ In the past, microprocessors were acquired as chips
■ Today, we increasingly acquire a processor as Intellectual
Property (IP)
– e.g., synthesizable VHDL model
■ Opportunity to add custom datapath hardware and a few custom instructions, or delete a few instructions
– Can have significant performance, power and size impacts
– Problem: need compiler/debugger for customized ASIP
■ Remember, most development uses structured languages
■ One solution: Automatic compiler/debugger generation
– e.g., www.tensillica.com
■ Another solution: Re-targetable compilers
– e.g., www.improvsys.com (customized VLIW architectures)
Selecting a Microprocessor
85
■ Issues
– Technical: speed, power, size, cost
– Other: development environment, prior expertise, licensing,
etc.
■ Speed: how do we evaluate a processor’s speed?
– Clock speed – but instructions per cycle may differ
– Instructions per second – but work per instruction may differ
– Dhrystone: Synthetic benchmark, developed in 1984.
Dhrystones/sec.
■ MIPS: 1 MIPS = 1757 Dhrystones per second (based on Digital’s VAX 11/780).
A.k.a. Dhrystone MIPS. Commonly used today.
– So, 750 MIPS = 750*1757 = 1,317,750 Dhrystones per second
– SPEC: set of more realistic benchmarks, but oriented to desktops
– EEMBC – EDN Embedded Benchmark Consortium, www.eembc.org
■ Suites of benchmarks: automotive, consumer electronics,
networking, office automation, telecommunications
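The Dhrystone MIPS conversion above is a single scaling factor. A sketch using the slide's reference point of 1 MIPS = 1757 Dhrystones per second:

```c
#include <assert.h>

#define DHRYSTONES_PER_MIPS 1757.0   /* VAX 11/780 reference, per the slide */

double dmips_to_dhrystones_per_sec(double dmips) {
    assert(dmips >= 0.0);
    return dmips * DHRYSTONES_PER_MIPS;
}

double dhrystones_per_sec_to_dmips(double dps) {
    assert(dps >= 0.0);
    return dps / DHRYSTONES_PER_MIPS;
}
```

This reproduces the worked example: 750 MIPS corresponds to 1,317,750 Dhrystones per second.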
General Purpose Processors
86
Processor | Clock speed | Periph. | Bus width | MIPS | Power | Trans. | Price
General Purpose Processors
Intel PIII | 1 GHz | 2x16K L1, 256K L2, MMX | 32 | ~900 | 97W | ~7M | $900
IBM PowerPC 750X | 550 MHz | 2x32K L1, 256K L2 | 32/64 | ~1300 | 5W | ~7M | $900
MIPS R5000 | 250 MHz | 2x32K, 2-way set assoc. | 32/64 | NA | NA | 3.6M | NA
StrongARM SA-110 | 233 MHz | None | 32 | 268 | 1W | 2.1M | NA
Microcontroller
Intel 8051 | 12 MHz | 4K ROM, 128 RAM, 32 I/O, Timer, UART | 8 | ~1 | ~0.2W | ~10K | $7
Motorola 68HC811 | 3 MHz | 4K ROM, 192 RAM, 32 I/O, Timer, WDT, SPI | 8 | ~.5 | ~0.1W | ~10K | $5
Digital Signal Processors
TI C5416 | 160 MHz | 128K, SRAM, 3 T1 Ports, DMA, 13 ADC, 9 DAC | 16/32 | ~600 | NA | NA | $34
Lucent DSP32C | 80 MHz | 16K Inst., 2K Data, Serial Ports, DMA | 32 | 40 | NA | NA | $75
Chapter Summary
87
■ General-purpose processors
– Good performance, low NRE, flexible
■ Controller, datapath, and memory
■ Structured languages prevail
– But some assembly level programming still necessary
■ Many tools available
– Including instruction-set simulators, and in-circuit emulators
■ ASIPs
– Microcontrollers, DSPs, network processors, more customized ASIPs
■ Choosing among processors is an important step
■ Designing a general-purpose processor is conceptually the same
as designing a single-purpose processor
Problems
88
1. An algorithm for matrix multiplication, assuming that we have one adder and
one multiplier, follows:
a. Convert the matrix multiplication algorithm into a state diagram.
b. Rewrite the matrix multiplication algorithm given the assumption that we have
3 adders and 6 multipliers.
c. If each multiplication takes 2 cycles to compute and each addition takes one
cycle to compute, how many cycles does it take to complete the matrix
multiplication given one adder and one multiplier? Three adders and six
multipliers?
d. If each adder requires 10 transistors to implement and each multiplier requires
100 transistors to implement, what is the total number of transistors to
implement the matrix multiplication circuit using 1 adder and 1 multiplier? Three
adders and six multipliers?
89
int main(void)
{
    int A[3][2] = {{1, 2}, {3, 4}, {5, 6}};
    int B[2][3] = {{7, 8, 9}, {10, 11, 12}};
    int C[3][3], i, j, k;
    for (i = 0; i < 3; i++) {
        for (j = 0; j < 3; j++) {
            C[i][j] = 0;
            for (k = 0; k < 2; k++) {
                C[i][j] += A[i][k] * B[k][j];
            }
        }
    }
    return 0;
}
90
91
■ Cycles to complete matrix multiplication
– 1 adder + 1 multiplier = 54 cycles
– 3 adders + 6 multipliers = 9 cycles
■ Number of transistors
– 1 adder + 1 multiplier = 110 transistors
– 3 adders + 6 multipliers = 630 transistors
92
2. Design a single-purpose processor that outputs Fibonacci
numbers up to n places. Start with a function computing the
desired result, translate it into a state diagram, and sketch a
probable datapath.
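A possible starting function for this problem, assuming the sequence starts 0, 1 and that "n places" means the first n terms. Two registers hold consecutive terms, one adder produces the next term, and a counter compared with n controls the loop. These are the elements the state diagram and datapath would be derived from; the names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Writes the first n Fibonacci numbers into out[0..n-1]. */
void fibonacci(unsigned long *out, size_t n) {
    assert(out != NULL || n == 0);
    unsigned long x1 = 0, x2 = 1;          /* registers for consecutive terms */
    for (size_t count = 0; count < n; count++) {
        out[count] = x1;                   /* output the current term */
        unsigned long next = x1 + x2;      /* the single adder */
        x1 = x2;
        x2 = next;
    }
}
```

For n = 10 this produces 0, 1, 1, 2, 3, 5, 8, 13, 21, 34.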
[Figure: solution sketch; the datapath labels include c_ld, c_sel, x2_ld, x2_sel, count_lt_n, and count_ne_0]
Clock Sources on ZedBoard.pdf
 
20ME702– MECHATRONICS -UNIT-2.pptx
20ME702– MECHATRONICS -UNIT-2.pptx20ME702– MECHATRONICS -UNIT-2.pptx
20ME702– MECHATRONICS -UNIT-2.pptx
 
UNIT II –8085 MICROPROCESSOR AND 8051 MICROCONTROLLER---ME6702– MECHATRONICS
UNIT II –8085 MICROPROCESSOR AND 8051 MICROCONTROLLER---ME6702– MECHATRONICS UNIT II –8085 MICROPROCESSOR AND 8051 MICROCONTROLLER---ME6702– MECHATRONICS
UNIT II –8085 MICROPROCESSOR AND 8051 MICROCONTROLLER---ME6702– MECHATRONICS
 
8085 MICROPROCESSOR.pptx
8085 MICROPROCESSOR.pptx8085 MICROPROCESSOR.pptx
8085 MICROPROCESSOR.pptx
 
Multiplatform JIT Code Generator for NetBSD by Alexander Nasonov
Multiplatform JIT Code Generator for NetBSD by Alexander NasonovMultiplatform JIT Code Generator for NetBSD by Alexander Nasonov
Multiplatform JIT Code Generator for NetBSD by Alexander Nasonov
 
basic computer programming and micro programmed control
basic computer programming and micro programmed controlbasic computer programming and micro programmed control
basic computer programming and micro programmed control
 
Covering a function using a Dynamic Symbolic Execution approach
Covering a function using a Dynamic Symbolic Execution approach Covering a function using a Dynamic Symbolic Execution approach
Covering a function using a Dynamic Symbolic Execution approach
 
Data Acquisition
Data AcquisitionData Acquisition
Data Acquisition
 
MICROPROCESSORS AND MICROCONTROLLERS
MICROPROCESSORS AND MICROCONTROLLERSMICROPROCESSORS AND MICROCONTROLLERS
MICROPROCESSORS AND MICROCONTROLLERS
 
UNIT II MICROPROCESSOR AND MICROCONTROLLER
UNIT II MICROPROCESSOR AND MICROCONTROLLER UNIT II MICROPROCESSOR AND MICROCONTROLLER
UNIT II MICROPROCESSOR AND MICROCONTROLLER
 
Practical Implementation of AAD by Antoine Savine, Brian Huge and Hans-Jorgen...
Practical Implementation of AAD by Antoine Savine, Brian Huge and Hans-Jorgen...Practical Implementation of AAD by Antoine Savine, Brian Huge and Hans-Jorgen...
Practical Implementation of AAD by Antoine Savine, Brian Huge and Hans-Jorgen...
 
Java Jit. Compilation and optimization by Andrey Kovalenko
Java Jit. Compilation and optimization by Andrey KovalenkoJava Jit. Compilation and optimization by Andrey Kovalenko
Java Jit. Compilation and optimization by Andrey Kovalenko
 
FPGA_Logic.pdf
FPGA_Logic.pdfFPGA_Logic.pdf
FPGA_Logic.pdf
 
Joel Falcou, Boost.SIMD
Joel Falcou, Boost.SIMDJoel Falcou, Boost.SIMD
Joel Falcou, Boost.SIMD
 

Último

🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...121011101441
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girlsssuser7cb4ff
 
Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating SystemRashmi Bhat
 
National Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdfNational Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdfRajuKanojiya4
 
Research Methodology for Engineering pdf
Research Methodology for Engineering pdfResearch Methodology for Engineering pdf
Research Methodology for Engineering pdfCaalaaAbdulkerim
 
Internet of things -Arshdeep Bahga .pptx
Internet of things -Arshdeep Bahga .pptxInternet of things -Arshdeep Bahga .pptx
Internet of things -Arshdeep Bahga .pptxVelmuruganTECE
 
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...Amil Baba Dawood bangali
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...Chandu841456
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvLewisJB
 
TechTACÂź CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTACÂź CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTACÂź CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTACÂź CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catcherssdickerson1
 
Transport layer issues and challenges - Guide
Transport layer issues and challenges - GuideTransport layer issues and challenges - Guide
Transport layer issues and challenges - GuideGOPINATHS437943
 
Steel Structures - Building technology.pptx
Steel Structures - Building technology.pptxSteel Structures - Building technology.pptx
Steel Structures - Building technology.pptxNikhil Raut
 
welding defects observed during the welding
welding defects observed during the weldingwelding defects observed during the welding
welding defects observed during the weldingMuhammadUzairLiaqat
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleAlluxio, Inc.
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfROCENODodongVILLACER
 
Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substationstephanwindworld
 

Último (20)

🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girls
 
Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating System
 
National Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdfNational Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdf
 
Research Methodology for Engineering pdf
Research Methodology for Engineering pdfResearch Methodology for Engineering pdf
Research Methodology for Engineering pdf
 
Internet of things -Arshdeep Bahga .pptx
Internet of things -Arshdeep Bahga .pptxInternet of things -Arshdeep Bahga .pptx
Internet of things -Arshdeep Bahga .pptx
 
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvv
 
TechTACÂź CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTACÂź CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTACÂź CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTACÂź CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
 
young call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Serviceyoung call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Service
 
Transport layer issues and challenges - Guide
Transport layer issues and challenges - GuideTransport layer issues and challenges - Guide
Transport layer issues and challenges - Guide
 
Steel Structures - Building technology.pptx
Steel Structures - Building technology.pptxSteel Structures - Building technology.pptx
Steel Structures - Building technology.pptx
 
welding defects observed during the welding
welding defects observed during the weldingwelding defects observed during the welding
welding defects observed during the welding
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at Scale
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdf
 
Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substation
 

PROCESSOR DESIGN MODULE - CUSTOMIZING SINGLE PURPOSE PROCESSORS

  • 2. Contents
    ■ Custom Single-Purpose Processors
      – RT-level Combinational Components
      – RT-level Sequential Components
      – Custom Single-Purpose Processor Design
      – Optimizing Custom Single-Purpose Processors
      – Optimizing the original program, FSMD, datapath and FSM
    ■ General-Purpose Processors
      – Basic Architecture
      – Datapath
      – Control Unit
      – Memory
      – Pipelining
  • 3. Contents (cont..)
    ■ Superscalar and VLIW Architectures
    ■ Application-Specific Instruction-Set Processors (ASIPs)
      – Microcontrollers
      – DSP
      – Less general ASIP environments
    ■ Selecting a Microprocessor/General-Purpose Processor
  • 4. Introduction
    ■ Processor: a digital circuit that performs computation tasks
      – Consists of a datapath and a controller
    ■ General-purpose processor
      – Handles a wide variety of computation tasks
    ■ Single-purpose processor
      – Carries out one particular computation task
      – Standard versions exist for common tasks
    ■ Custom single-purpose processor
      – Designed for a non-standard task
  • 5. Introduction (cont..)
    ■ Why a custom single-purpose processor?
      – Faster performance
        ■ Fewer clock cycles, thanks to a customized datapath
        ■ Shorter clock cycles, thanks to simple functional units
      – Smaller size
        ■ Simpler datapath
        ■ No program memory
      – Lower power consumption
        ■ More efficient computation
    ■ Drawbacks
      – High NRE (non-recurring engineering) costs
      – Longer time to market
      – Reduced flexibility
  • 6. Combinational Logic
    ■ Transistor: the basic electrical component in digital systems
    ■ Transistors → Logic Gates → Digital Systems
    ■ MOS transistor on silicon
      – Acts as an on/off switch
      – The voltage at the “gate” controls whether current flows from source to drain
    [Figure: cross-section of a MOS transistor inside an IC package (gate over oxide, source and drain joined by a channel in the silicon substrate); the transistor conducts if gate = 1]
  • 7. CMOS Transistor Implementations
    ■ CMOS: Complementary Metal Oxide Semiconductor
    ■ We refer to logic levels
      – Typically 0 is 0 V and 1 is 5 V
    ■ nMOS conducts if gate = 1
    ■ pMOS conducts if gate = 0
    ■ Basic gates
    [Figure: transistor-level schematics of an inverter (F = x'), a NAND gate (F = (xy)') and a NOR gate (F = (x+y)'), built from nMOS transistors (conduct if gate = 1) and pMOS transistors (conduct if gate = 0)]
  • 8. Basic Logic Gates
    ■ Driver: F = x                Inverter: F = x'
    ■ AND: F = xy                  NAND: F = (xy)'
    ■ OR: F = x + y                NOR: F = (x + y)'
    ■ XOR: F = x ⊕ y               XNOR: F = (x ⊕ y)'

      x | Driver Inverter
      0 |   0       1
      1 |   1       0

      x y | AND OR XOR NAND NOR XNOR
      0 0 |  0   0   0    1    1    1
      0 1 |  0   1   1    1    0    0
      1 0 |  0   1   1    1    0    0
      1 1 |  1   1   0    0    0    1
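The gate functions above map directly onto C's bitwise operators. The sketch below is a software model, not a description of any hardware library, and its function names are hypothetical. It reproduces the two-input truth table, and the "& 1" keeps each inverted result to a single bit.

```c
#include <stdio.h>

/* Software models of the two-input gates; inputs are assumed to be 0 or 1. */
int gate_and(int x, int y)  { return x & y; }
int gate_or(int x, int y)   { return x | y; }
int gate_xor(int x, int y)  { return x ^ y; }
int gate_nand(int x, int y) { return ~(x & y) & 1; } /* invert, keep one bit */
int gate_nor(int x, int y)  { return ~(x | y) & 1; }
int gate_xnor(int x, int y) { return ~(x ^ y) & 1; }

/* Prints the combined truth table in the same column order as the slide. */
void print_truth_table(void) {
    printf("x y | AND OR XOR NAND NOR XNOR\n");
    for (int x = 0; x <= 1; x++)
        for (int y = 0; y <= 1; y++)
            printf("%d %d |  %d   %d   %d    %d    %d    %d\n", x, y,
                   gate_and(x, y), gate_or(x, y), gate_xor(x, y),
                   gate_nand(x, y), gate_nor(x, y), gate_xnor(x, y));
}
```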
  • 9. Combinational Logic Design
    ■ Combinational circuit
      – A digital circuit whose output is a function of the current inputs only
      – No memory of past inputs
    ■ Steps in designing a combinational logic circuit
      1. Problem definition
      2. Truth table
      3. Output equations
      4. Minimized expressions
      5. Logic circuit
  • 10. Combinational Logic Design
    1. Problem Description
       y is 1 if a is equal to 1, or if b and c are both 1.
       z is 1 if b or c is equal to 1, but not both, or if all three inputs are 1.
  • 11. Combinational Logic Design (cont..)
    2. Truth Table
       a b c | y z
       0 0 0 | 0 0
       0 0 1 | 0 1
       0 1 0 | 0 1
       0 1 1 | 1 0
       1 0 0 | 1 0
       1 0 1 | 1 1
       1 1 0 | 1 1
       1 1 1 | 1 1
  • 12. Combinational Logic Design (cont..)
    3. Output Equations (sum of minterms)
       y = a'bc + ab'c' + ab'c + abc' + abc
       z = a'b'c + a'bc' + ab'c + abc' + abc
    4. Minimized Expressions
       y = a + bc
       z = ab + b'c + bc'
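One way to check step 4 is to compare the minimized expressions against a direct reading of the problem statement for all 2^3 input combinations. The sketch below does this. The function names are hypothetical, and inputs are assumed to be 0 or 1.

```c
#include <stdbool.h>

/* Minimized expressions from step 4. */
int y_min(int a, int b, int c) { return a | (b & c); }
int z_min(int a, int b, int c) { return (a & b) | (!b & c) | (b & !c); }

/* Direct reading of the problem description from step 1. */
int y_spec(int a, int b, int c) { return a == 1 || (b == 1 && c == 1); }
int z_spec(int a, int b, int c) { return (b ^ c) == 1 || (a && b && c); }

/* Exhaustively compares all 8 rows of the truth table. */
bool minimized_matches_spec(void) {
    for (int m = 0; m < 8; m++) {
        int a = (m >> 2) & 1, b = (m >> 1) & 1, c = m & 1;
        if (y_min(a, b, c) != y_spec(a, b, c)) return false;
        if (z_min(a, b, c) != z_spec(a, b, c)) return false;
    }
    return true;
}
```

An exhaustive check like this is only practical for small input counts. As the next slide notes, 16 inputs already means 65,536 rows.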
  • 13. Combinational Logic Design (cont..)
    5. Logic Circuit
    [Figure: gate-level circuit with inputs a, b, c and outputs y and z]
  • 14. Combinational Logic Design (cont..)
    ■ Large circuits are complex to design directly with logic gates
    ■ E.g., 16 inputs → 2^16 = 65,536 (64K) rows in the truth table
    ■ Reduce complexity by using components that are more abstract than logic gates
  • 16. Sequential Logic Design
    ■ Sequential circuit
      – Output is a function of the current as well as previous input values
      – Has memory
    ■ Basic sequential circuit: the flip-flop
      – Stores a single bit
  • 19. Sequential Logic Design (cont..)
    ■ Control inputs
      – Synchronous: take effect only on a clock edge
      – Asynchronous: take effect immediately, independent of the clock
    ■ Clear control lines are typically asynchronous
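The difference between synchronous and asynchronous control inputs can be modeled in software. The following is a minimal sketch of a 1-bit D flip-flop with an asynchronous clear. It assumes rising-edge triggering, and the struct and function names are hypothetical. D is sampled only on a rising clock edge, while clear forces Q to 0 immediately.

```c
/* Hypothetical model of a 1-bit D flip-flop with asynchronous clear. */
typedef struct {
    int q;        /* stored bit (output Q) */
    int prev_clk; /* last clock level seen, used to detect rising edges */
} dff_t;

/* Evaluate the flip-flop for the current clock level and inputs. */
void dff_eval(dff_t *ff, int clk, int d, int clear) {
    if (clear) {
        ff->q = 0;                               /* asynchronous: no edge needed */
    } else if (clk == 1 && ff->prev_clk == 0) {
        ff->q = d;                               /* synchronous: sample D on rising edge */
    }
    ff->prev_clk = clk;
}
```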
  • 20. Sequential Logic Design
    A) Problem Description
       You want to construct a clock divider that outputs a 1 once every four clock cycles.
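A natural implementation of the clock divider is a 2-bit counter FSM with four states. The sketch below makes two assumptions that the slide does not state: the output is 1 while the FSM is in state 3, and the output depends only on the current state (Moore style). The names are hypothetical.

```c
/* Clock divider as a 2-bit state register (states 0..3). */
typedef struct { unsigned state; } divider_t;

/* One clock cycle: output is 1 only in state 3, then the state register
 * advances to (state + 1) mod 4. */
int divider_step(divider_t *d) {
    int x = (d->state == 3);
    d->state = (d->state + 1) & 3;
    return x;
}
```

Starting from state 0, the outputs are 0, 0, 0, 1, 0, 0, 0, 1, ..., so exactly one cycle in four produces a 1.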
  • 26. Custom Single-Purpose Processor Basic Model
    [Figure: (a) Controller and datapath: the controller receives external control inputs and produces external control outputs; the datapath receives external data inputs and produces external data outputs; the controller drives the datapath control inputs and reads the datapath control outputs. (b) A view inside: the controller contains a state register and next-state and control logic; the datapath contains registers and functional units.]
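  • 27. State Diagram Templates
    ■ Assignment statement: a = b; next statement
      – One state performs a = b, then transitions to the state for the next statement
    ■ Loop statement: while (cond) { loop-body-statements } next statement
      – Condition state C: on cond go to the loop-body statements; on !cond go to the next statement
      – The loop-body statements end at join state J, which returns to C
    ■ Branch statement: if (c1) c1 stmts else if (c2) c2 stmts else other stmts; next statement
      – Condition state C: on c1 go to the c1 stmts, on !c1*c2 to the c2 stmts, on !c1*!c2 to the other stmts
      – All branches converge on join state J, which leads to the next statement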
  • 28. Example: Greatest Common Divisor
    ■ First create the algorithm
    ■ Convert the algorithm to a “complex” state machine
      – Known as an FSMD: finite-state machine with datapath
      – Templates can be used to perform this conversion
    (a) Black-box view: GCD with inputs go_i, x_i, y_i and output d_o
  • 29. (b) Desired Functionality
      0: int x, y;
      1: while (1) {
      2:   while (!go_i);
      3:   x = x_i;
      4:   y = y_i;
      5:   while (x != y) {
      6:     if (x < y)
      7:       y = y - x;
             else
      8:       x = x - y;
           }
      9:   d_o = x;
         }
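  • 30. (c) State Diagram
    ■ 1: on 1 go to 2 (the !1 exit is never taken)
    ■ 2: on !go_i go to 2-J, which returns to 2; on !(!go_i) go to 3
    ■ 3: x = x_i → 4: y = y_i → 5
    ■ 5: on x!=y go to 6; on !(x!=y) go to 9
    ■ 6: on x<y go to 7: y = y - x; on !(x<y) go to 8: x = x - y
    ■ 7 and 8 → 6-J → 5-J → 5
    ■ 9: d_o = x → 1-J → 1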
  • 32. Creating the Datapath
    ■ Create a register for each declared variable
    ■ Create a functional unit for each arithmetic operation
    ■ Connect the ports, registers and functional units
      – Based on reads and writes
      – Use multiplexors where a register has multiple sources
    ■ Create a unique identifier
      – For each control input and output of the datapath components
  • 33. Creating the Controller
    ■ State 3: x_sel = 0; x_ld = 1;
      – Loads x from x_i
    ■ State 4: y_sel = 0; y_ld = 1;
      – Loads y from y_i
    ■ State 7: y_sel = 1; y_ld = 1;
      – Loads the subtraction result y - x into y
    ■ State 8: x_sel = 1; x_ld = 1;
      – Loads the subtraction result x - y into x
    ■ State 9: d_ld = 1;
      – Loads the output register d
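  • 34. Controller Implementation Model
    ■ Inputs
      – go_i: enable (start) signal
      – Q3-Q0: outputs of the state register (current state)
      – x_neq_y, x_lt_y: comparator outputs from the datapath
    ■ Outputs
      – x_sel, y_sel
      – x_ld, y_ld
      – d_ld
      – I3-I0: next-state inputs to the state register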
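The controller/datapath split can be simulated cycle by cycle. In the sketch below, the datapath holds registers x, y, d, two multiplexors and two subtractors, and loads a register only when its *_ld signal is 1. The controller is a switch over the FSMD states and drives the control signals listed on the previous slide. The sketch makes several assumptions. go_i is already 1, so states 1 and 2 are skipped. The join states do no work and are folded into their successors. Both inputs must be nonzero, because the subtraction algorithm never terminates otherwise.

```c
#include <stdint.h>

typedef struct { uint16_t x, y, d; } datapath_t;         /* registers */
typedef struct { int x_sel, x_ld, y_sel, y_ld, d_ld; } ctrl_t;

/* One clock edge of the datapath. sel = 0 picks the external input,
 * sel = 1 picks the subtractor output. */
static void datapath_clock(datapath_t *dp, ctrl_t c, uint16_t x_i, uint16_t y_i) {
    uint16_t x_next = c.x_sel ? (uint16_t)(dp->x - dp->y) : x_i;
    uint16_t y_next = c.y_sel ? (uint16_t)(dp->y - dp->x) : y_i;
    if (c.x_ld) dp->x = x_next;
    if (c.y_ld) dp->y = y_next;
    if (c.d_ld) dp->d = dp->x;
}

/* Controller FSM; returns d_o and reports the number of clock cycles. */
uint16_t gcd_processor(uint16_t x_i, uint16_t y_i, int *cycles) {
    datapath_t dp = {0, 0, 0};
    int state = 3;
    *cycles = 0;
    for (;;) {
        ctrl_t c = {0, 0, 0, 0, 0};
        int x_neq_y = dp.x != dp.y;                   /* comparator status */
        int x_lt_y = dp.x < dp.y;
        int next;
        switch (state) {
        case 3: c.x_sel = 0; c.x_ld = 1; next = 4; break;
        case 4: c.y_sel = 0; c.y_ld = 1; next = 5; break;
        case 5: next = x_neq_y ? 6 : 9; break;
        case 6: next = x_lt_y ? 7 : 8; break;
        case 7: c.y_sel = 1; c.y_ld = 1; next = 5; break; /* via 6-J, 5-J */
        case 8: c.x_sel = 1; c.x_ld = 1; next = 5; break; /* via 6-J, 5-J */
        default: c.d_ld = 1; next = -1; break;            /* state 9 */
        }
        datapath_clock(&dp, c, x_i, y_i);
        (*cycles)++;
        if (next < 0) return dp.d;
        state = next;
    }
}
```

For GCD(42, 8), this model takes 28 cycles: 2 load cycles, 8 subtractions at 3 cycles each, and 2 final cycles.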
  • 36. Completing the GCD Custom Single-Purpose Processor Design
    [Figure: a view inside the controller and datapath: the controller contains a state register and next-state and control logic; the datapath contains registers and functional units]
    ■ We have finished the datapath
    ■ We have a state table for the next-state and control logic
      – It serves as the truth table for the combinational logic
    ■ This is not an optimized design
  • 37. Optimizing Single-Purpose Processors
    ■ Optimization is the task of making design metric values the best possible
    ■ GCD example: when one input is much larger than the other, the subtraction loop needs many iterations (e.g., GCD(1000, 1) takes 999 subtractions)
      – Speed decreases
    ■ Optimization opportunities
      – Original program
      – FSMD
      – Datapath
      – FSM
  • 38. Optimizing the Original Program
    ■ Analyze program attributes and look for areas of possible improvement
      – Number of computations
      – Size of variables
      – Time and space complexity
      – Operations used
        ■ Multiplication and division are very expensive
  • 39. Optimizing the Original Program (cont..)
    Original program: the desired functionality of slide 29
    ■ GCD(42, 8)
      – 9 evaluations of the loop condition (8 subtractions)
      – (x, y) values: (42, 8), (34, 8), (26, 8), (18, 8), (10, 8), (2, 8), (2, 6), (2, 4), (2, 2)
    Optimized program: replace the subtraction operations with a modulo operation to speed up the program
      0: int x, y, r;
      1: while (1) {
      2:   while (!go_i);
           // x must be the larger number
      3:   if (x_i >= y_i) {
      4:     x = x_i;
      5:     y = y_i;
           }
      6:   else {
      7:     x = y_i;
      8:     y = x_i;
           }
      9:   while (y != 0) {
      10:    r = x % y;
      11:    x = y;
      12:    y = r;
           }
      13:  d_o = x;
         }
    ■ GCD(42, 8)
      – 3 evaluations of the loop condition (2 modulo operations)
      – (x, y) values: (42, 8), (8, 2), (2, 0)
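The iteration counts quoted for GCD(42, 8) can be reproduced by instrumenting both versions. The sketch below counts loop-condition evaluations. Function names are hypothetical, and the go_i handshake is left out. The subtraction version assumes nonzero inputs.

```c
/* Original algorithm: repeated subtraction. */
unsigned gcd_subtract(unsigned x, unsigned y, int *checks) {
    *checks = 0;
    for (;;) {
        (*checks)++;                 /* one evaluation of (x != y) */
        if (x == y) break;
        if (x < y) y = y - x;
        else       x = x - y;
    }
    return x;
}

/* Optimized algorithm: repeated modulo, with x the larger input. */
unsigned gcd_modulo(unsigned x_i, unsigned y_i, int *checks) {
    unsigned x, y, r;
    if (x_i >= y_i) { x = x_i; y = y_i; }
    else            { x = y_i; y = x_i; }
    *checks = 0;
    for (;;) {
        (*checks)++;                 /* one evaluation of (y != 0) */
        if (y == 0) break;
        r = x % y;
        x = y;
        y = r;
    }
    return x;
}
```

The trade-off is that a modulo unit is far larger than a subtractor in hardware, which is why the previous slide flags division-like operations as expensive.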
  • 40. Optimizing the FSMD
    ■ Areas of possible improvement
      – Merge states
        ■ States with constants on their transitions can be eliminated, since the transition taken is already known
        ■ States with independent operations can be merged
      – Separate states
        ■ States that require complex operations (e.g., a*b*c*d) can be broken into smaller states to reduce hardware size
      – Scheduling
        ■ The task of assigning operations from the original program to states in an FSMD
  • 41. Optimizing the FSMD
    [Figure: original FSMD and optimized FSMD side by side]
    – Eliminate state 1: its transitions have constant values
    – Merge state 2 and state 2-J: there is no loop operation between them
    – Merge state 3 and state 4: the assignment operations are independent of one another
    – Merge state 5 and state 6: the transitions from state 6 can be made in state 5
    – Eliminate states 5-J and 6-J: the transitions from each can be made directly from states 7 and 8, respectively
    – Eliminate state 1-J: its transition can be made directly from state 9
  • 42. Optimizing the FSMD (cont..)
    ■ Consider a = b * c * d * e
    ■ Generating a single state for this operation requires 3 multipliers in the datapath
    ■ Multipliers are expensive
    ■ Break the operation down into smaller operations
      – t1 = b * c
      – t2 = d * e
      – a = t1 * t2
    ■ Each smaller operation has its own state
    ■ Only 1 multiplier is required in the datapath
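The scheduled version can be modeled as three states that each fire the single shared multiplier once, with t1 and t2 held in registers between states. The names and the use-counter below are hypothetical instrumentation. The counter confirms that one multiplier, used once per state, suffices.

```c
static int mult_uses;                    /* how many times the multiplier fires */

/* The single multiplier functional unit in the datapath. */
static long multiplier(long p, long q) { mult_uses++; return p * q; }

/* Three-state schedule for a = b * c * d * e. */
long compute_a(long b, long c, long d, long e, int *states) {
    long t1 = 0, t2 = 0, a = 0;          /* datapath registers */
    *states = 0;
    for (int s = 0; s < 3; s++) {
        switch (s) {
        case 0: t1 = multiplier(b, c); break;    /* state 1: t1 = b * c  */
        case 1: t2 = multiplier(d, e); break;    /* state 2: t2 = d * e  */
        case 2: a = multiplier(t1, t2); break;   /* state 3: a = t1 * t2 */
        }
        (*states)++;
    }
    return a;
}
```

The cost of the smaller datapath is latency: the result takes three states instead of one.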
  • 43. Optimizing the FSMD (cont..)
    ■ Optimizing the FSMD can change the timing of output operations
    ■ The reduced FSMD generates the GCD output in fewer clock cycles
    ■ Changing the timing is not acceptable in all cases, e.g., the clock divider
    ■ Thus, when optimizing an FSMD, the designer must know whether the output timing may be modified
  • 44. Optimizing the Datapath
    ■ Sharing of functional units
      – The one-to-one mapping of operations to functional units used previously is not necessary
      – If the same operation occurs in different states, those states can share a single functional unit
    ■ Multi-functional units
      – Because an ALU supports a variety of operations, it can be shared among operations occurring in different states
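In the GCD FSMD, y = y - x (state 7) and x = x - y (state 8) never execute in the same cycle, so they can share one subtractor. Operand multiplexors choose which register is the minuend. The sketch below illustrates this with a small two-operation ALU. The enum, the function names and the mux logic are hypothetical.

```c
typedef enum { ALU_ADD, ALU_SUB } alu_op_t;

/* One multi-function unit; the operation is a control input. */
unsigned alu(alu_op_t op, unsigned a, unsigned b) {
    return op == ALU_ADD ? a + b : a - b;
}

/* GCD using a single shared subtractor (assumes nonzero inputs). */
unsigned gcd_shared_sub(unsigned x, unsigned y) {
    while (x != y) {
        int sel = x < y;                      /* operand-mux select */
        unsigned big = sel ? y : x;
        unsigned small = sel ? x : y;
        unsigned diff = alu(ALU_SUB, big, small);
        if (sel) y = diff; else x = diff;     /* load only one register */
    }
    return x;
}
```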
  • 45. Optimizing the FSM
    ■ State encoding
      – The task of assigning a unique bit pattern to each state in an FSM
      – The size of the state register and the combinational logic vary with the encoding
      – E.g., an FSM with n states encoded in log2 n bits (n a power of 2) has n! possible encodings
        ■ Can be treated as an ordering problem
      – More encodings are possible
        ■ More than log2 n bits can be used to encode n states
      – CAD tools are a great aid in searching for the best encoding
    ■ State minimization
      – The task of merging equivalent states into a single state
        ■ Two states are equivalent if, for all possible input combinations, they generate the same outputs and transition to the same next state
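The encoding count generalizes. With k bits there are 2^k codes, and assigning distinct codes to n states can be done in 2^k × (2^k - 1) × ... × (2^k - n + 1) ways. This product equals n! when 2^k = n. The sketch below computes this count and the minimum state-register width. The function names are hypothetical, and the sketch assumes small n and k so the product fits in 64 bits.

```c
/* Number of distinct k-bit encodings of n states (0 if they do not fit). */
unsigned long long encodings(unsigned n, unsigned k) {
    unsigned long long codes = 1ULL << k, ways = 1;
    if (n > codes) return 0;
    for (unsigned i = 0; i < n; i++)
        ways *= codes - i;
    return ways;
}

/* Minimum state-register width, ceil(log2 n), at least 1 bit. */
unsigned min_state_bits(unsigned n) {
    unsigned k = 1;
    while ((1u << k) < n) k++;
    return k;
}
```

For example, 8 states in 3 bits give 8! = 40,320 encodings, which is already too many to evaluate by hand. This is why CAD tools are used to search for the best encoding.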
  • 46. RT-level Custom Single-Purpose Processor Design
    ■ Converting a sequential program into a custom single-purpose processor
      – Convert the program into an FSMD
      – Split the FSMD into a simple FSM controlling a datapath
      – Perform sequential logic design on the FSM
    ■ In many cases, we prefer not to start with a program, but directly with an FSMD
      – The cycle-by-cycle timing of the system is central to the design
      – Programming languages don’t typically support cycle-by-cycle description
  • 47. RT-level Custom Single-Purpose Processor Design
    ■ Example
      – A device sends an 8-bit number to another device (the receiver)
      – The receiver can receive all 8 bits at once
      – The sender sends 4 bits at a time
        ■ First the lower-order 4 bits, then the higher-order 4 bits
    ■ Design a bridge that enables the two devices to communicate
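The bridge can be described as a two-state FSMD. It waits for the low nibble, stores it in a register, waits for the high nibble, then presents the assembled byte. The sketch below is a simplified model. The signal names rdy_in, data_in, data_out and rdy_out are hypothetical, since the slide's figure is not reproduced here. The sketch also assumes that the sender marks each nibble with a one-cycle rdy_in pulse, whereas a real handshake may hold the signal for several cycles.

```c
#include <stdint.h>

typedef enum { WAIT_LOW, WAIT_HIGH } bridge_state_t;

typedef struct {
    bridge_state_t state;
    uint8_t low;        /* register holding the low-order nibble */
    uint8_t data_out;   /* 8-bit value presented to the receiver */
    int rdy_out;        /* 1 for one cycle when data_out is valid */
} bridge_t;

/* One clock cycle of the bridge FSMD. */
void bridge_clock(bridge_t *b, int rdy_in, uint8_t data_in) {
    b->rdy_out = 0;
    if (!rdy_in) return;                         /* nothing sent this cycle */
    if (b->state == WAIT_LOW) {
        b->low = data_in & 0x0F;                 /* first nibble: bits 3..0 */
        b->state = WAIT_HIGH;
    } else {
        b->data_out = (uint8_t)(((data_in & 0x0F) << 4) | b->low);
        b->rdy_out = 1;                          /* byte complete */
        b->state = WAIT_LOW;
    }
}
```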
  • 49. RT-level Custom Single-Purpose Processor Design (cont..)
  • 50. General-Purpose Processors - Software
  • 51. Introduction
    ■ General-purpose processor
      – A processor designed for a variety of computation tasks
      – Low unit cost, in part because the manufacturer spreads NRE over large numbers of units
        ■ Motorola sold half a billion 68HC05 microcontrollers in 1996 alone
      – Carefully designed, since a higher NRE is acceptable
        ■ Can yield good performance, size and power
      – For the user: low NRE cost, short time-to-market/prototype, high flexibility
        ■ The user just writes software; no processor design is needed
      – Also known as a “microprocessor”
        ■ “Micro” was used when processors were implemented on one or a few chips rather than filling entire rooms
  • 52. Basic Architecture
    ■ Control unit and datapath
      – Note the similarity to a single-purpose processor
    ■ Key differences
      – The datapath is general
      – The control unit doesn’t store the algorithm; the algorithm is “programmed” into the memory
  • 53. Datapath Operations
    ■ Load
      – Read a memory location into a register
    ■ ALU operation
      – Pass values from certain registers through the ALU and store the result back in a register
    ■ Store
      – Write a register to a memory location
  • 54. Control Unit
    ■ Control unit: configures the datapath operations
      – The sequence of desired operations (“instructions”) is stored in memory as the “program”
    ■ Instruction cycle: broken into several sub-operations, each taking one clock cycle, e.g.:
      – Fetch: get the next instruction into the IR
      – Decode: determine what the instruction means
      – Fetch operands: move data from memory to a datapath register
      – Execute: move data through the ALU
      – Store results: write data from a register to memory
  • 55. Control Unit Sub-Operations
    ■ Fetch
      – Get the next instruction into the IR
      – PC: program counter, always points to the next instruction
      – IR: holds the fetched instruction
  • 56. Control Unit Sub-Operations
    ■ Decode
      – Determine what the instruction means
  • 57. Control Unit Sub-Operations
    ■ Fetch operands
      – Move data from memory to a datapath register
  • 58. Control Unit Sub-Operations
    ■ Execute
      – Move data through the ALU
      – This particular instruction does nothing during this sub-operation
  • 59. Control Unit Sub-Operations
    ■ Store results
      – Write data from a register to memory
      – This particular instruction does nothing during this sub-operation
  • 63. Architectural Considerations
    ■ N-bit processor
      – N-bit ALU, registers, buses and memory data interface
      – Embedded: 8-bit, 16-bit and 32-bit are common
      – Desktops/servers: 32-bit, even 64-bit
    ■ The PC size determines the address space
      – E.g., a 16-bit PC can address 2^16 = 64K locations
  • 64. Architectural Considerations
    ■ Clock frequency
      – The inverse of the clock period
      – The clock period must be longer than the longest register-to-register delay in the entire processor
      – Memory access is often the longest delay
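The constraint can be written as a worked inequality, where t_max is the longest register-to-register delay. The 10 ns figure is an illustrative assumption, not a value from the slides.

```latex
T_{\text{clk}} \ge t_{\max}
\quad\Longrightarrow\quad
f_{\text{clk}} = \frac{1}{T_{\text{clk}}} \le \frac{1}{t_{\max}},
\qquad
t_{\max} = 10\ \text{ns} \;\Rightarrow\; f_{\text{clk}} \le \frac{1}{10 \times 10^{-9}\ \text{s}} = 100\ \text{MHz}
```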
  • 66. Two Memory Architectures
    ■ Princeton
      – Fewer memory wires
    ■ Harvard
      – Simultaneous program and data memory access
    [Figure: Harvard: the processor connects to separate program memory and data memory; Princeton: the processor connects to a single memory holding both program and data]
  • 67. Cache Memory
    ■ Memory access may be slow
    ■ A cache is a small but fast memory close to the processor
      – Holds a copy of part of memory
      – Hits and misses
    [Figure: processor ↔ cache ↔ memory; the cache uses fast/expensive technology, usually on the same chip; the memory uses slower/cheaper technology, usually on a different chip]
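Hits and misses can be illustrated with the simplest organization, a direct-mapped cache. The sketch below assumes a hypothetical cache of 4 one-word lines. Each address maps to line (addr mod 4), and a stored tag tells which of the addresses sharing that line is currently cached. The slide does not specify any particular cache organization, so this is only one possible example.

```c
#include <stdbool.h>

#define NUM_LINES 4   /* hypothetical: 4 one-word lines, direct mapped */

typedef struct {
    bool valid[NUM_LINES];
    unsigned tag[NUM_LINES];
} cache_t;

/* Returns true on a hit; on a miss the line is refilled from memory. */
bool cache_access(cache_t *c, unsigned addr) {
    unsigned index = addr % NUM_LINES;   /* line the address maps to */
    unsigned tag = addr / NUM_LINES;     /* which address occupies the line */
    if (c->valid[index] && c->tag[index] == tag) return true;
    c->valid[index] = true;
    c->tag[index] = tag;
    return false;
}
```

For the access sequence 0, 1, 2, 3, 0, 1, 4, 0, only the second accesses to 0 and 1 hit. Address 4 then evicts address 0 from line 0, so the final access to 0 misses again.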
  • 68. Superscalar and VLIW Architectures
    ■ Performance can be improved by:
      – A faster clock (but there’s a limit)
      – Pipelining: slice each instruction into stages and overlap the stages
      – Multiple ALUs to support more than one instruction stream
    ■ Superscalar
      – Scalar: non-vector operations
      – Fetches instructions in batches and executes as many as possible
        ■ May require extensive hardware to detect independent instructions
    ■ VLIW: each word in memory holds multiple independent instructions
      – Relies on the compiler to detect and schedule instructions
      – Currently growing in popularity
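The benefit of overlapping stages can be quantified for the idealized case. The sketch below assumes every stage takes one cycle and that there are no hazards or stalls. Under these assumptions, n instructions on a k-stage pipeline finish in k + (n - 1) cycles instead of n × k. Real pipelines fall short of this bound.

```c
/* Idealized cycle counts: one cycle per stage, no hazards or stalls. */
unsigned long cycles_unpipelined(unsigned long n, unsigned long k) {
    return n * k;                       /* each instruction runs all k stages alone */
}

unsigned long cycles_pipelined(unsigned long n, unsigned long k) {
    return n == 0 ? 0 : k + (n - 1);    /* fill the pipeline, then one per cycle */
}
```

For example, 10 instructions on a 5-stage pipeline take 14 cycles instead of 50.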
  • 69. Programmer’s View
    ■ The programmer doesn’t need a detailed understanding of the architecture
      – Instead, needs to know what instructions can be executed
    ■ Two levels of instructions:
      – Assembly level
      – Structured languages (C, C++, Java, etc.)
    ■ Most development today is done using structured languages
      – But some assembly-level programming may still be necessary
      – Drivers: the portion of a program that communicates with and/or controls (drives) another device
        ■ Often have detailed timing considerations and extensive bit manipulation
        ■ Assembly level may be best for these
  • 70. Assembly-Level Instructions
    ■ Instruction set
      – Defines the legal set of instructions for that processor
        ■ Data transfer: memory/register, register/register, I/O, etc.
        ■ Arithmetic/logical: move register contents through the ALU and back
        ■ Branches: determine the next PC value when it is not just PC+1
    [Figure: a program as a sequence of instructions (Instruction 1, Instruction 2, ...), each consisting of an opcode followed by operand1 and operand2 fields]
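An instruction word is just bit fields that the decode step pulls apart. The sketch below assumes a hypothetical 16-bit format: a 4-bit opcode in bits 15..12, a 4-bit operand1 (e.g., a register number) in bits 11..8, and an 8-bit operand2 in bits 7..0. The slides do not fix the field widths.

```c
#include <stdint.h>

typedef struct { unsigned opcode, op1, op2; } instr_t;

/* Pack the three fields into one 16-bit instruction word. */
uint16_t encode(instr_t in) {
    return (uint16_t)(((in.opcode & 0xF) << 12) | ((in.op1 & 0xF) << 8) | (in.op2 & 0xFF));
}

/* Decode: extract each field with a shift and a mask. */
instr_t decode(uint16_t word) {
    instr_t in = { (word >> 12) & 0xF, (word >> 8) & 0xF, word & 0xFF };
    return in;
}
```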
  • 71. A Simple (Trivial) Instruction Set
  • 73. Sample Programs
    C program:
      int total = 0;
      for (int i = 10; i != 0; i--)
        total += i;
      // next instructions...
    Equivalent assembly program:
            MOV R0, #0;    // total = 0
            MOV R1, #10;   // i = 10
            MOV R2, #1;    // constant 1
            MOV R3, #0;    // constant 0
      Loop: JZ R1, Next;   // Done if i=0
            ADD R0, R1;    // total += i
            SUB R1, R2;    // i--
            JZ R3, Loop;   // Jump always
      Next: // next instructions...
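An instruction-set simulator (ISS) for just the four instruction kinds used in the sample program shows how the assembly computes total = 55. The encoding below (an enum plus a struct) is a hypothetical stand-in for the slide's instruction set. In this model, JZ targets are absolute instruction indices, which may differ from how the real instruction set encodes branches.

```c
typedef enum { MOV_IMM, ADD, SUB, JZ } op_t;

typedef struct { op_t op; int rn, rm_or_imm; } insn_t;

/* Fetch-decode-execute loop; runs until the PC leaves the program. */
int run(const insn_t *prog, int len, int reg[4]) {
    int pc = 0;
    while (pc >= 0 && pc < len) {
        insn_t in = prog[pc];                              /* fetch */
        switch (in.op) {                                   /* decode + execute */
        case MOV_IMM: reg[in.rn] = in.rm_or_imm;       pc++; break;
        case ADD:     reg[in.rn] += reg[in.rm_or_imm]; pc++; break;
        case SUB:     reg[in.rn] -= reg[in.rm_or_imm]; pc++; break;
        case JZ:      pc = (reg[in.rn] == 0) ? in.rm_or_imm : pc + 1; break;
        }
    }
    return reg[0];
}

/* The sample program: Loop is index 4, Next is index 8 (end of program). */
const insn_t sample[] = {
    {MOV_IMM, 0, 0},   /* MOV R0, #0;  total = 0     */
    {MOV_IMM, 1, 10},  /* MOV R1, #10; i = 10        */
    {MOV_IMM, 2, 1},   /* MOV R2, #1;  constant 1    */
    {MOV_IMM, 3, 0},   /* MOV R3, #0;  constant 0    */
    {JZ, 1, 8},        /* Loop: JZ R1, Next          */
    {ADD, 0, 1},       /* ADD R0, R1;  total += i    */
    {SUB, 1, 2},       /* SUB R1, R2;  i--           */
    {JZ, 3, 4},        /* JZ R3, Loop; R3 is always 0 */
};
```

Running the sample program through this model leaves R0 = 10 + 9 + ... + 1 = 55 and R1 = 0.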
  • 74. Programmer Considerations
    ■ Program and data memory space
      – Embedded processors are often very limited
        ■ E.g., 64 Kbytes of program memory, 256 bytes of RAM (expandable)
    ■ Registers: how many are there?
      – Only a direct concern for assembly-level programmers
    ■ I/O
      – How does the program communicate with external signals?
    ■ Interrupts
  • 75. Operating System
    ■ Optional software layer providing low-level services to a program (application)
      – File management, disk access
      – Keyboard/display interfacing
      – Scheduling multiple programs for execution
        ■ Or even just multiple threads from one program
      – The program makes system calls to the OS
  • 76. Development Environment
    ■ Development processor
      – The processor on which we write and debug our programs
        ■ Usually a PC
    ■ Target processor
      – The processor that the program will run on in our embedded system
        ■ Often different from the development processor
  • 77. Software Development Process
    ■ Compilers
      – Cross compiler
        ■ Runs on one processor but generates code for another
    ■ Assemblers
    ■ Linkers
    ■ Debuggers
    ■ Profilers
  • 78. Running a Program
    ■ If the development processor is different from the target, how can we run our compiled code? Two options:
      – Download to the target processor
      – Simulate
    ■ Simulation
      – One method: a hardware description language
        ■ But slow, and not always available
      – Another method: an instruction set simulator (ISS)
        ■ Runs on the development processor, but executes instructions of the target processor
  • 79. Testing and Debugging
    ■ ISS
      – Gives us control over time: set breakpoints, look at register values, set values, step-by-step execution, ...
      – But doesn’t interact with the real environment
    ■ Download to board
      – Use a device programmer
      – Runs in the real environment, but is not controllable
    ■ Compromise: emulator
      – Runs in the real environment
      – Supports some controllability from the PC
  • 80. Application-Specific Instruction-Set Processors (ASIPs)
    ■ General-purpose processors
      – Sometimes too general to be effective in demanding applications
        ■ E.g., video processing requires huge video buffers and operations on large arrays of data, which is inefficient on a GPP
      – But a single-purpose processor has high NRE and is not programmable
    ■ ASIPs: targeted to a particular domain
      – Contain architectural features specific to that domain
        ■ E.g., embedded control, digital signal processing, video processing, network processing, telecommunications, etc.
      – Still programmable
• 81. A Common ASIP: Microcontroller
■ For embedded control applications
– Reading sensors, setting actuators
– Mostly dealing with events (bits): data is present, but not in huge amounts
– e.g., VCR, disk drive, digital camera (assuming an SPP for image compression), washing machine, microwave oven
■ Microcontroller features
– On-chip peripherals
■ Timers, analog-digital converters, serial communication, etc.
■ Tightly integrated for the programmer, typically part of the register space
– On-chip program and data memory
– Direct programmer access to many of the chip’s pins
– Specialized instructions for bit manipulation and other low-level operations
■ Incorporating peripherals and memory onto the same IC reduces the number of required ICs → compact and low-power implementations
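A minimal C sketch of the bit-level, event-driven code these features are built for. `PORT`, `SENSOR_BIT`, and `LED_BIT` are hypothetical names; on real hardware `PORT` would be a memory-mapped peripheral register at an address from the datasheet, but here it is an ordinary `volatile` variable so the sketch runs on a PC. On parts with bit instructions (the 8051's SETB and CLR, for instance), a compiler can often turn each helper into a single instruction.

```c
#include <stdint.h>

/* Hypothetical I/O port. On a microcontroller this would be a memory-mapped
   register; here it is a plain volatile variable so the sketch runs anywhere. */
volatile uint8_t PORT = 0;

#define SENSOR_BIT 0u  /* hypothetical: sensor input wired to bit 0 */
#define LED_BIT    3u  /* hypothetical: LED output wired to bit 3 */

static inline void bit_set(volatile uint8_t *reg, unsigned bit)   { *reg |= (uint8_t)(1u << bit); }
static inline void bit_clear(volatile uint8_t *reg, unsigned bit) { *reg &= (uint8_t)~(1u << bit); }
static inline int  bit_test(volatile uint8_t *reg, unsigned bit)  { return (*reg >> bit) & 1u; }

/* One pass of an event-driven control loop: the LED follows the sensor bit. */
void control_step(void) {
    if (bit_test(&PORT, SENSOR_BIT))
        bit_set(&PORT, LED_BIT);
    else
        bit_clear(&PORT, LED_BIT);
}
```

Only single bits of the register are read or changed; the other bits (other pins or peripherals sharing the port) are left untouched.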
• 82. Another Common ASIP: Digital Signal Processors (DSP)
■ For signal processing applications
– Large amounts of digitized data, often streaming
■ Sources, e.g., a photo captured by a digital camera, a voice packet passing through a network router
– Data transformations must be applied fast
– e.g., cell-phone voice filter, digital TV, music synthesizer
■ DSP features
– Several instruction execution units
– Filtering, transforming vectors or matrices of data
– Single-cycle multiply-accumulate instruction, other instructions
– Efficient vector operations, e.g., add two arrays
■ Vector ALUs, loop buffers, etc.
– Many contain on-chip ADCs, DACs, PWM units, timers, counters, etc.
– Commonly used DSPs are well supported by compilers and other development tools → easy and cheap to integrate into most embedded systems
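The multiply-accumulate (MAC) feature is easiest to see in a finite impulse response (FIR) filter, a typical filtering kernel. A minimal sketch, assuming 16-bit samples and coefficients with a 32-bit accumulator (the function name `fir_sample` is hypothetical): each loop iteration is one MAC, the operation a DSP aims to issue once per cycle.

```c
#include <stddef.h>
#include <stdint.h>

/* Computes one output sample y[n] = sum over k of h[k] * x[n-k].
   Convention in this sketch: x[0] is the newest input sample x[n],
   x[1] is x[n-1], and so on; h holds ntaps coefficients. */
int32_t fir_sample(const int16_t *h, const int16_t *x, size_t ntaps) {
    int32_t acc = 0;  /* wider accumulator, as in DSP MAC units */
    for (size_t k = 0; k < ntaps; k++) {
        acc += (int32_t)h[k] * (int32_t)x[k];  /* one multiply-accumulate */
    }
    return acc;  /* no saturation: long filters with large values could overflow */
}
```

On a general-purpose processor each iteration costs a load, a multiply, and an add; a DSP with a single-cycle MAC, loop buffers, and parallel data fetch can approach one tap per cycle.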
• 83. Less General ASIP Environments
■ ASIPs that are less general in nature
■ Designed to perform very domain-specific processing while allowing some degree of programmability
■ e.g., ASIPs designed for networking hardware may be programmable with different network routing algorithms, checksum, and packet-processing protocols
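The "checksum" processing mentioned above is the kind of kernel a networking ASIP accelerates. As one well-known instance (our choice of example; the slide names no specific checksum), here is a C sketch of the 16-bit one's-complement Internet checksum described in RFC 1071. The function name `inet_checksum` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* 16-bit one's-complement checksum over a byte buffer (RFC 1071 style).
   Bytes are paired big-endian; an odd trailing byte is padded with zero.
   Assumes buffers short enough (about 128 KiB) that the 32-bit sum cannot overflow. */
uint16_t inet_checksum(const uint8_t *data, size_t len) {
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((data[i] << 8) | data[i + 1]);
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8;
    while (sum >> 16)                        /* fold carries back into the low 16 bits */
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;                   /* one's complement of the folded sum */
}
```

A receiver that recomputes the checksum over the data plus the transmitted checksum gets 0 for an undamaged packet. The per-word add-with-carry loop is the part that dedicated hardware or custom instructions speed up.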
• 84. Trend: Even More Customized ASIPs
■ In the past, microprocessors were acquired as chips
■ Today, we increasingly acquire a processor as intellectual property (IP)
– e.g., a synthesizable VHDL model
■ Opportunity to add custom datapath hardware and a few custom instructions, or delete a few instructions
– Can have significant performance, power and size impacts
– Problem: need a compiler/debugger for the customized ASIP
■ Remember, most development uses structured languages
■ One solution: automatic compiler/debugger generation
– e.g., www.tensilica.com
■ Another solution: retargetable compilers
– e.g., www.improvsys.com (customized VLIW architectures)
• 85. Selecting a Microprocessor
■ Issues
– Technical: speed, power, size, cost
– Other: development environment, prior expertise, licensing, etc.
■ Speed: how do we evaluate a processor’s speed?
– Clock speed, but instructions per cycle may differ
– Instructions per second, but work per instruction may differ
– Dhrystone: synthetic benchmark, developed in 1984; results are reported in Dhrystones/sec
■ MIPS: 1 MIPS = 1757 Dhrystones per second (based on Digital’s VAX 11/780). A.k.a. Dhrystone MIPS. Commonly used today.
– So, 750 MIPS = 750 × 1757 = 1,317,750 Dhrystones per second
– SPEC: set of more realistic benchmarks, but oriented to desktops
– EEMBC: EDN Embedded Benchmark Consortium, www.eembc.org
■ Suites of benchmarks: automotive, consumer electronics, networking, office automation, telecommunications
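The Dhrystone MIPS conversion above is simple arithmetic; a small C sketch makes the direction of each conversion explicit. The 1757 constant is the VAX 11/780 reference score quoted on the slide; the function and macro names are hypothetical.

```c
/* Dhrystone MIPS convention from the slide: 1 MIPS = 1757 Dhrystones per second,
   the score of the DEC VAX 11/780 reference machine. */
#define VAX_11_780_DHRYSTONES_PER_SEC 1757.0

/* Converts a Dhrystone MIPS rating into raw Dhrystones per second. */
double dmips_to_dhrystones(double dmips) {
    return dmips * VAX_11_780_DHRYSTONES_PER_SEC;
}

/* Converts a measured Dhrystones-per-second score into Dhrystone MIPS. */
double dhrystones_to_dmips(double dhrystones_per_sec) {
    return dhrystones_per_sec / VAX_11_780_DHRYSTONES_PER_SEC;
}
```

`dmips_to_dhrystones(750.0)` reproduces the slide's 1,317,750 Dhrystones per second. Note that a Dhrystone MIPS rating is relative to the VAX, not a literal count of millions of instructions per second on the processor being measured.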
• 86. General Purpose Processors
Processor        | Clock speed | Bus width | MIPS  | Power  | Transistors | Price | Peripherals
General Purpose Processors:
Intel PIII       | 1 GHz       | 32        | ~900  | 97 W   | ~7M         | $900  | 2x16K L1, 256K L2, MMX
IBM PowerPC 750X | 550 MHz     | 32/64     | ~1300 | 5 W    | ~7M         | $900  | 2x32K L1, 256K L2
MIPS R5000       | 250 MHz     | 32/64     | NA    | NA     | 3.6M        | NA    | 2x32K, 2-way set assoc.
StrongARM SA-110 | 233 MHz     | 32        | 268   | 1 W    | 2.1M        | NA    | None
Microcontrollers:
Intel 8051       | 12 MHz      | 8         | ~1    | ~0.2 W | ~10K        | $7    | 4K ROM, 128 RAM, 32 I/O, Timer, UART
Motorola 68HC811 | 3 MHz       | 8         | ~0.5  | ~0.1 W | ~10K        | $5    | 4K ROM, 192 RAM, 32 I/O, Timer, WDT, SPI
Digital Signal Processors:
TI C5416         | 160 MHz     | 16/32     | ~600  | NA     | NA          | $34   | 128K, SRAM, 3 T1 Ports, DMA, 13 ADC, 9 DAC
Lucent DSP32C    | 80 MHz      | 32        | 40    | NA     | NA          | $75   | 16K Inst., 2K Data, Serial Ports, DMA
• 87. Chapter Summary
■ General-purpose processors
– Good performance, low NRE, flexible
■ Controller, datapath, and memory
■ Structured languages prevail
– But some assembly-level programming is still necessary
■ Many tools available
– Including instruction-set simulators and in-circuit emulators
■ ASIPs
– Microcontrollers, DSPs, network processors, more customized ASIPs
■ Choosing among processors is an important step
■ Designing a general-purpose processor is conceptually the same as designing a single-purpose processor
• 89. 1. An algorithm for matrix multiplication, assuming that we have one adder and one multiplier, follows:
a. Convert the matrix multiplication algorithm into a state diagram.
b. Rewrite the matrix multiplication algorithm given the assumption that we have 3 adders and 6 multipliers.
c. If each multiplication takes 2 cycles to compute and each addition takes one cycle to compute, how many cycles does it take to complete the matrix multiplication given one adder and one multiplier? Three adders and six multipliers?
d. If each adder requires 10 transistors to implement and each multiplier requires 100 transistors to implement, what is the total number of transistors needed to implement the matrix multiplication circuit using 1 adder and 1 multiplier? Three adders and six multipliers?
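• 90.
int main(void) {
    int A[3][2] = {{1, 2}, {3, 4}, {5, 6}};
    int B[2][3] = {{7, 8, 9}, {10, 11, 12}};
    int C[3][3], i, j, k;
    /* C = A x B: C[i][j] accumulates row i of A times column j of B */
    for (i = 0; i < 3; i++) {
        for (j = 0; j < 3; j++) {
            C[i][j] = 0;
            for (k = 0; k < 2; k++) {
                C[i][j] += A[i][k] * B[k][j];  /* one multiply, one add */
            }
        }
    }
    return 0;
}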
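For part (b), one possible rewrite (a sketch, not necessarily the intended answer): within row i, all six products A[i][k]*B[k][j] are independent, so six multipliers can compute them in one 2-cycle step, after which three adders form the row's three sums in one 1-cycle step. That schedule takes 3 rows × 3 cycles = 9 cycles. C still executes the statements one at a time; the grouping marks what the hardware would do concurrently. The function name `matmul_row_parallel` is hypothetical.

```c
/* Hypothetical rewrite for 6 multipliers + 3 adders (3x2 times 2x3 matrices). */
void matmul_row_parallel(int A[3][2], int B[2][3], int C[3][3]) {
    for (int i = 0; i < 3; i++) {
        /* Step 1: six independent products, one per multiplier (2 cycles). */
        int p00 = A[i][0] * B[0][0], p01 = A[i][1] * B[1][0];
        int p10 = A[i][0] * B[0][1], p11 = A[i][1] * B[1][1];
        int p20 = A[i][0] * B[0][2], p21 = A[i][1] * B[1][2];
        /* Step 2: three independent sums, one per adder (1 cycle). */
        C[i][0] = p00 + p01;
        C[i][1] = p10 + p11;
        C[i][2] = p20 + p21;
    }
}
```

Unlike the original loop, this version writes each C[i][j] directly instead of initializing it to 0 and accumulating, which saves one addition per element.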
• 92.
■ Cycles to complete matrix multiplication
– 1 adder + 1 multiplier = 54 cycles
– 3 adders + 6 multipliers = 9 cycles
■ Number of transistors
– 1 adder + 1 multiplier = 110 transistors
– 3 adders + 6 multipliers = 630 transistors
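The slide gives only the totals. One reading of the cost model that reproduces all four numbers, written as C under these assumptions: a multiplication takes 2 cycles and an addition 1 cycle (from the exercise); in the one-adder/one-multiplier design no two operations overlap, so each `C[i][j] += A[i][k]*B[k][j]` costs 2 + 1 cycles; in the 3-adder/6-multiplier design each row takes one multiply step plus one add step. The function and macro names are hypothetical.

```c
#define MUL_CYCLES        2    /* from the exercise */
#define ADD_CYCLES        1
#define ADDER_TRANSISTORS 10
#define MULT_TRANSISTORS  100

/* One adder, one multiplier, no overlap: every multiply-add runs in sequence. */
int cycles_serial(int rows, int cols, int inner) {
    int mult_adds = rows * cols * inner;   /* 3 * 3 * 2 = 18 for this problem */
    return mult_adds * (MUL_CYCLES + ADD_CYCLES);
}

/* Three adders, six multipliers: per row, one parallel multiply step
   (6 products) and one parallel add step (3 sums). Valid only for this
   3x2 by 2x3 case, where a row's work fits the available units exactly. */
int cycles_row_parallel(int rows) {
    return rows * (MUL_CYCLES + ADD_CYCLES);
}

/* Total transistor count for a given mix of functional units. */
int transistors(int adders, int multipliers) {
    return adders * ADDER_TRANSISTORS + multipliers * MULT_TRANSISTORS;
}
```

With the problem's sizes, `cycles_serial(3, 3, 2)` gives 54, `cycles_row_parallel(3)` gives 9, and `transistors(1, 1)` and `transistors(3, 6)` give 110 and 630, matching the slide. The trade-off is explicit: about 6× fewer cycles for about 5.7× more transistors.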
• 93. 2. Design a single-purpose processor that outputs Fibonacci numbers up to n places. Start with a function computing the desired result, translate it into a state diagram, and sketch a probable datapath.
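One way to bridge the first two steps of this exercise, sketched in C under the assumption that "n places" means the first n terms 0, 1, 1, 2, 3, ...: write the computing function directly as a finite-state machine with datapath (FSMD), where each `case` is a state of the state diagram and each assignment is a register transfer. The state and register names (`S_INIT`, `x1`, `x2`, `count`) are hypothetical and are not taken from the datapath figure.

```c
/* Hypothetical FSMD-style model of the Fibonacci processor.
   Registers: x1 (current term), x2 (next term), count (terms output so far). */
typedef enum { S_INIT, S_CHECK, S_OUTPUT, S_UPDATE, S_DONE } FibState;

/* Runs the state machine until S_DONE, storing each output term in out[]
   (which must hold at least n values). Returns the number of terms output. */
int fib_fsmd(int n, unsigned long out[]) {
    FibState state = S_INIT;
    unsigned long x1 = 0, x2 = 0, next;
    int count = 0;
    while (state != S_DONE) {
        switch (state) {
        case S_INIT:    /* load initial register values */
            x1 = 0; x2 = 1; count = 0;
            state = S_CHECK;
            break;
        case S_CHECK:   /* comparator status: count < n */
            state = (count < n) ? S_OUTPUT : S_DONE;
            break;
        case S_OUTPUT:  /* drive the output with x1 */
            out[count] = x1;
            state = S_UPDATE;
            break;
        case S_UPDATE:  /* in hardware these loads happen in one clock; C needs
                           'next' so x1 and x2 are read before being overwritten */
            next = x1 + x2;
            x1 = x2;
            x2 = next;
            count = count + 1;
            state = S_CHECK;
            break;
        default:
            state = S_DONE;
            break;
        }
    }
    return count;
}
```

Reading the cases as states gives the state diagram (INIT → CHECK → OUTPUT → UPDATE → CHECK, leaving CHECK for DONE when count reaches n). The assignments suggest a datapath: three registers, one adder, a comparator for count < n, and an incrementer for count.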
• 95. [Figure: datapath sketch with signals c_ld, c_sel, x2_ld, x2_sel, count_lt_n, count_ne_0]