4. • Single-cycle control: hardwired
– Low CPI (1)
– Long clock period (to accommodate slowest instruction)
• Multi-cycle control: micro-programmed
– Short clock period
– High CPI
[Timing diagram: the single-cycle design runs insn0.(fetch,decode,exec) then insn1.(fetch,decode,exec) in two long cycles; the multi-cycle design runs insn0.fetch, insn0.dec, insn0.exec, insn1.fetch, insn1.dec, insn1.exec in six short cycles.]
Slide 3
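To make the trade-off concrete, here is a minimal Python sketch of the basic relation time = cycle time x CPI x instruction count; the clock periods and CPIs below are illustrative assumptions, not figures from these slides.

```python
# Illustrative numbers only, chosen to show the CPI vs. clock-period trade-off.
def total_time_ns(cycle_time_ns: float, cpi: float, n_insns: int) -> float:
    """Execution time = cycle time * cycles per instruction * instruction count."""
    return cycle_time_ns * cpi * n_insns

# Single-cycle: CPI = 1, but the long cycle must fit the slowest instruction.
print(total_time_ns(cycle_time_ns=8.0, cpi=1.0, n_insns=1000))   # 8000 ns
# Multi-cycle: short cycle, but several cycles per instruction.
print(total_time_ns(cycle_time_ns=2.0, cpi=4.2, n_insns=1000))   # 8400 ns
```

Neither approach wins decisively on its own; pipelining (next slides) aims for the short clock period and a CPI near 1 at the same time.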
5. • Start with multi-cycle design
• When insn0 goes from stage 1 to stage 2
… insn1 starts stage 1
• Each instruction passes through all stages
… but instructions enter and leave at a faster rate
[Timing diagram: the multi-cycle design executes insn0.fetch, insn0.dec, insn0.exec, then insn1.fetch, insn1.dec, insn1.exec strictly in sequence; the pipelined design overlaps them, so in each cycle one instruction fetches while the previous one decodes and the one before that executes.]
Can have as many insns in flight as there are stages
Slide 4
6. • A pipeline is a series of stages, where some work is
done at each stage in parallel.
• The stages are connected one to the next to form a
pipe - instructions enter at one end, progress through
the stages, and exit at the other end.
Slide 5
7. Pipeline categories
Linear pipelines
A linear pipeline processor is a series of
processing stages and memory accesses connected in a fixed order.
Non-linear pipelines
A non-linear pipeline (also called a dynamic pipeline) can
be configured to perform different functions at different
times. A dynamic pipeline also has feed-forward or
feed-back connections. A non-linear pipeline also allows
very long instruction words.
Slide 6
9. Instruction Pipeline
• An instruction pipeline has six operations:
Fetch instruction (FI)
Decode instruction (DI)
Calculate operands (CO)
Fetch operands (FO)
Execute instruction (EI)
Write result (WR)
• Overlap these operations (see the sketch after this slide)
Slide 8
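As a rough illustration of that overlap, here is a small Python sketch, assuming an ideal pipeline with one operation per cycle and no stalls, that prints which of the six operations each instruction performs in each cycle.

```python
# Ideal overlap of the six operations: instruction i begins FI in cycle i (0-based).
STAGES = ["FI", "DI", "CO", "FO", "EI", "WR"]

def print_schedule(n_insns: int) -> None:
    total_cycles = n_insns + len(STAGES) - 1
    for i in range(n_insns):
        cells = []
        for cycle in range(total_cycles):
            k = cycle - i                       # which operation insn i performs this cycle
            cells.append(STAGES[k] if 0 <= k < len(STAGES) else "--")
        print(f"insn{i}: " + " ".join(cells))

print_schedule(3)
# insn0: FI DI CO FO EI WR -- --
# insn1: -- FI DI CO FO EI WR --
# insn2: -- -- FI DI CO FO EI WR
```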
10. Stage 1: Fetch Diagram
[Diagram: the PC indexes the Instruction Cache; the fetched instruction bits and PC+1 are written into the IF/ID pipeline register; a MUX (with a +1 adder) selects the next PC from PC+1 or a branch target supplied by Decode.]
Slide 9
11. Stage 1: Instruction Fetch
• Fetch an instruction from memory every cycle
– Use PC to index memory
– Increment PC (assume no branches for now)
• Write state to the pipeline register (IF/ID)
– The next stage will read this pipeline register
Slide 10
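A minimal Python sketch of this stage, assuming instruction memory is a word-indexed list and the IF/ID pipeline register is modeled as a dict (both modeling choices are mine, not part of the slides):

```python
def fetch(pc: int, imem: list[int]) -> tuple[dict, int]:
    """One fetch cycle: index memory with PC, latch state into IF/ID, bump PC."""
    insn = imem[pc]                              # use PC to index memory
    if_id = {"insn": insn, "pc_plus_1": pc + 1}  # write state to the IF/ID register
    return if_id, pc + 1                         # increment PC (no branches for now)

imem = [0x1123, 0x2245, 0x3367]                  # made-up encodings, for illustration
if_id, pc = fetch(0, imem)
print(if_id, pc)                                 # {'insn': 4387, 'pc_plus_1': 1} 1
```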
12. Stage 2: Decode Diagram
[Diagram: instruction bits and PC+1 arrive from the IF/ID pipeline register; regA and regB index the read ports of the Register File; regA contents, regB contents, PC+1, control signals, and destReg are written into the ID/EX pipeline register; write-back data and destReg return to the Register File, and a branch target is passed back toward Fetch.]
Slide 11
13. Stage 2: Instruction Decode
• Decode the opcode bits
– Set up control signals for later stages
• Read input operands from register file
– Specified by decoded instruction bits
• Write state to the pipeline register (ID/EX)
– Opcode
– Register contents
– PC+1 (even though decode didn’t use it)
– Control signals (from insn) for opcode and destReg
Slide 12
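Continuing the sketch, a decode stage in Python, assuming a toy 16-bit encoding with 4-bit opcode, regA, regB, and destReg fields (this field layout is invented here for illustration):

```python
def decode(if_id: dict, regfile: list[int]) -> dict:
    """Split the instruction bits, read the register file, latch state into ID/EX."""
    insn = if_id["insn"]
    opcode  = (insn >> 12) & 0xF
    regA    = (insn >> 8)  & 0xF
    regB    = (insn >> 4)  & 0xF
    destReg =  insn        & 0xF
    return {
        "opcode": opcode,                    # stands in for the decoded control signals
        "valA": regfile[regA],               # regA contents
        "valB": regfile[regB],               # regB contents
        "destReg": destReg,
        "pc_plus_1": if_id["pc_plus_1"],     # carried along even though decode doesn't use it
    }

regfile = [0, 10, 20, 0, 0, 0, 0, 0]
print(decode({"insn": 0x1123, "pc_plus_1": 1}, regfile))   # valA=10, valB=20, destReg=3
```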
14. Stage 3: Execute Diagram
[Diagram: regA contents, regB contents, PC+1, and control signals arrive from the ID/EX pipeline register; a MUX selects regB contents or the constant offset as the second ALU input; the ALU result, the PC+1+offset branch target, regB contents, control signals, and destReg are written into the EX/Mem pipeline register.]
Slide 13
15. Stage 3: Execution
• Perform ALU operations
– Calculate result of instruction
• Control signals select operation
• Contents of regA used as one input
• Either regB or constant offset (from insn) used as second input
– Calculate PC-relative branch target
• PC+1+(constant offset)
• Write state to the pipeline register (EX/Mem)
– ALU result, contents of regB, and PC+1+offset
– Control signals (from insn) for opcode and destReg
Slide 14
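Continuing the sketch, an execute stage in Python; the opcode values (1 = register-register add, 2 = add with a constant offset) are invented here purely to show the operand MUX and the branch-target adder:

```python
def execute(id_ex: dict, offset: int = 0) -> dict:
    """ALU operation plus branch-target calculation; latch results into EX/Mem."""
    a = id_ex["valA"]                                      # regA contents as first input
    b = offset if id_ex["opcode"] == 2 else id_ex["valB"]  # MUX: regB or constant offset
    return {
        "alu_result": a + b,
        "valB": id_ex["valB"],                             # passed along for stores
        "branch_target": id_ex["pc_plus_1"] + offset,      # PC+1+(constant offset)
        "destReg": id_ex["destReg"],
        "opcode": id_ex["opcode"],
    }

id_ex = {"opcode": 1, "valA": 10, "valB": 20, "destReg": 3, "pc_plus_1": 1}
print(execute(id_ex, offset=4))   # alu_result=30, branch_target=5
```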
16. Stage 4: Memory Diagram
[Diagram: the ALU result, regB contents, PC+1+offset, control signals, and destReg arrive from the EX/Mem pipeline register; the ALU result drives in_addr and regB contents drive in_data of the Data Cache, whose en and R/W signals come from the control signals; the ALU result, loaded data, control signals, and destReg are written into the Mem/WB pipeline register.]
Slide 15
17. Stage 4: Memory
• Perform data cache access
– ALU result contains address for LD or ST
– Opcode bits control R/W and enable signals
• Write state to the pipeline register (Mem/WB)
– ALU result and Loaded data
– Control signals (from insn) for opcode and destReg
Slide 16
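Continuing the sketch, a memory stage in Python, with the data cache modeled as a dict keyed by address and invented opcode values LD = 3 and ST = 4; other opcodes simply bypass the cache:

```python
LD, ST = 3, 4   # invented opcode values, only for this sketch

def memory(ex_mem: dict, dcache: dict) -> dict:
    """Data cache access for loads and stores; latch results into Mem/WB."""
    loaded = None
    if ex_mem["opcode"] == LD:                        # ALU result is the load address
        loaded = dcache.get(ex_mem["alu_result"], 0)
    elif ex_mem["opcode"] == ST:                      # store regB contents at that address
        dcache[ex_mem["alu_result"]] = ex_mem["valB"]
    return {
        "alu_result": ex_mem["alu_result"],
        "loaded_data": loaded,
        "destReg": ex_mem["destReg"],
        "opcode": ex_mem["opcode"],
    }

dcache = {30: 99}
print(memory({"opcode": LD, "alu_result": 30, "valB": 0, "destReg": 3}, dcache))
```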
18. Stage 5: Write-back Diagram
[Diagram: the ALU result, loaded data, control signals, and destReg arrive from the Mem/WB pipeline register; a MUX selects the loaded data or the ALU result as the write-back data, which is sent with destReg back to the Register File.]
Slide 17
19. Stage 5: Write Back
• Write result to the register file (if required)
– Write Loaded data to destReg for LD
– Write ALU result to destReg for arithmetic insn
– Opcode bits control register write enable signal
Slide 18
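Completing the per-stage sketch, a write-back stage in Python, reusing the invented LD opcode from the memory-stage sketch; the MUX picks loaded data for loads and the ALU result for arithmetic instructions:

```python
LD = 3   # same invented opcode value as in the memory-stage sketch

def write_back(mem_wb: dict, regfile: list[int]) -> None:
    """Select the write-back value and update the register file."""
    data = mem_wb["loaded_data"] if mem_wb["opcode"] == LD else mem_wb["alu_result"]
    regfile[mem_wb["destReg"]] = data     # register write, enabled by the opcode

regfile = [0] * 8
write_back({"opcode": LD, "loaded_data": 99, "alu_result": 30, "destReg": 3}, regfile)
print(regfile)   # [0, 0, 0, 99, 0, 0, 0, 0]
```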
20. Putting It All Together
[Diagram: the full five-stage datapath. The PC and InstCache feed the IF/ID register; the Register File (R0–R7) is read with regA and regB, and decode produces valA, valB, offset, dest, and op in the ID/EX register; the ALU, a PC+1+offset adder, and an eq? comparison produce the branch target plus the ALU result, valB, dest, and op in the EX/Mem register; the Data Cache produces mdata alongside the ALU result, dest, and op in the Mem/WB register; a final MUX selects the value written back to the Register File.]
Slide 19
22. • Let's see and examine our datapath and control diagram.
• Associate resources with states.
• Ensure that flows do not conflict, or figure out how to resolve them.
• Assert control in the appropriate stage.
Slide 21
23. Register memory: All ALU operations are performed on register operands (the register file).
Instruction & data memory: There are two kinds of memory system: 1) instruction memory, 2) data memory.
Only load/store instructions access memory, since ALU operations work on register operands.
All five steps of the instruction cycle must complete to execute an operation.
24. 5 Steps of MIPS Datapath
[Diagram: the unpipelined MIPS datapath with five steps (Instruction Fetch, Instr. Decode / Reg. Fetch, Execute / Addr. Calc, Memory Access, Write Back), built from Next PC logic, a +4 adder, instruction memory, the register file (RS1, RS2, RD), sign extension of the immediate, the ALU with a Zero? output, data memory (LMD), and MUXes selecting the WB data.]
What do we need to do to pipeline the process?
Slide 22
25. 5 Steps of MIPS/DLX Datapath
[Diagram: the same five steps (Instruction Fetch, Instr. Decode / Reg. Fetch, Execute / Addr. Calc, Memory Access, Write Back) with IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers inserted between them; Next SEQ PC and the destination register RD travel down the pipeline with each instruction.]
• Data stationary control
– local decode for each instruction phase / pipeline stage
Slide 23
26. Graphically Representing Pipelines
• Can help with answering questions like:
– how many cycles does it take to execute this code?
– what is the ALU doing during cycle 4?
– use this representation to help understand datapaths
Slide 24
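A small Python sketch answering both questions for an ideal 5-stage pipeline with no stalls, assuming instruction 0 is fetched in cycle 1 (so EX, the ALU stage, is its third cycle):

```python
N_STAGES = 5   # IF, ID, EX, MEM, WB

def cycles_to_execute(n_insns: int) -> int:
    """Fill the pipe once, then one instruction completes per cycle."""
    return N_STAGES + (n_insns - 1)

def insn_in_alu(cycle: int):
    """Which instruction (0-based) occupies EX in the given cycle, if any."""
    i = cycle - 3          # instruction i reaches EX in cycle i + 3
    return i if i >= 0 else None

print(cycles_to_execute(100))   # 104 cycles
print(insn_in_alu(4))           # 1, i.e. the second instruction is in the ALU in cycle 4
```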
27. Visualizing Pipelining
[Diagram: instructions in program order on the vertical axis, cycles 1–7 on the horizontal axis; each of four instructions passes through Ifetch, Reg, ALU, DMem, and Reg (write) stages, each instruction offset one cycle after the previous one.]
Slide 25
28. Conventional Pipelined Execution Representation
[Diagram: six instructions, each passing through IFetch, Dcd, Exec, Mem, and WB, staggered by one cycle; program flow runs down the page and time runs across.]
Slide 26
29. Single Cycle, Multiple Cycle, vs. Pipeline
[Timing diagram: the Single Cycle Implementation gives every instruction one long clock cycle, so a Load followed by a Store leaves wasted time in the shorter instruction; the Multiple Cycle Implementation breaks Load into Ifetch, Reg, Exec, Mem, Wr and Store into Ifetch, Reg, Exec, Mem over short cycles; the Pipeline Implementation overlaps Load, Store, and an R-type instruction so a new instruction starts every cycle.]
Slide 27
30. • Suppose we execute 100 instructions
• Single Cycle Machine
– 45 ns/cycle x 1 CPI x 100 inst = 4500 ns
• Multicycle Machine
– 10 ns/cycle x 4.6 CPI (due to inst mix) x 100 inst = 4600 ns
• Ideal pipelined machine
– 10 ns/cycle x (1 CPI x 100 inst + 4 cycle drain) = 1040 ns
Slide 28
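The same arithmetic in Python, for checking (the 4-cycle drain term reflects filling and draining a 5-stage pipe):

```python
n = 100
single_cycle = 45 * 1 * n           # 45 ns/cycle, CPI 1              -> 4500 ns
multi_cycle  = 10 * 4.6 * n         # 10 ns/cycle, CPI 4.6            -> 4600 ns
pipelined    = 10 * (1 * n + 4)     # 10 ns/cycle, plus 4-cycle drain -> 1040 ns
print(single_cycle, multi_cycle, pipelined)
print(single_cycle / pipelined)     # ~4.3x speedup over the single-cycle machine
```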
31. Pipeline Performance
Pipelining
– At best no impact on latency
• Still need to wait “n” stages (cycles) for completion of instruction
– Improves “throughput”
• No single instruction executes faster but overall throughput is higher
• Average instruction execution time decreases
• Successive instructions complete in each successive cycle (no 5 cycle wait between instructions)
– Reality
• Clock determined by slowest stage
• Pipeline overhead
– Clock skew
– Register delay
– Pipeline fill and drain
Slide 29
32. Consider a non-pipelined machine with 6 execution stages of lengths 50 ns, 50 ns, 60 ns, 60 ns, 50 ns, and 50 ns.
- Find the instruction latency on this machine.
- How much time does it take to execute 100 instructions?
Solution:
Instruction latency = 50+50+60+60+50+50 = 320 ns
Time to execute 100 instructions = 100*320 = 32000 ns
Suppose we introduce pipelining on this machine. Assume that when introducing pipelining, the clock skew adds 5 ns of overhead to each execution stage.
- What is the instruction latency on the pipelined machine?
- How much time does it take to execute 100 instructions?
Solution:
Remember that in the pipelined implementation, the length of the pipe stages must all be the same, i.e., the speed of the slowest stage plus overhead. With 5 ns of overhead this comes to:
Length of a pipelined stage = MAX(lengths of unpipelined stages) + overhead = 60 + 5 = 65 ns
Instruction latency = 6 stages * 65 ns = 390 ns (an instruction still passes through all 6 stages)
Time to execute 100 instructions = 65*6*1 + 65*1*99 = 390 + 6435 = 6825 ns
Slide 30
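The same exercise worked in Python, which also makes the distinction between latency (time for one instruction) and the per-cycle completion rate explicit:

```python
stage_ns = [50, 50, 60, 60, 50, 50]
n, overhead = 100, 5

unpipelined_latency = sum(stage_ns)                 # 320 ns
unpipelined_total   = n * unpipelined_latency       # 32000 ns

cycle = max(stage_ns) + overhead                    # 65 ns per pipelined stage
pipelined_latency = cycle * len(stage_ns)           # 390 ns through all 6 stages
pipelined_total   = pipelined_latency + cycle * (n - 1)   # 390 + 6435 = 6825 ns

print(unpipelined_latency, unpipelined_total)
print(pipelined_latency, pipelined_total)
```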
33. Three types of pipeline hazards
– Structural hazards – They arise from resource conflicts when the hardware cannot support all possible combinations of instructions in simultaneous overlapped execution.
– Data hazards – They arise when an instruction depends on the result of a previous instruction in a way that is exposed by the overlapping of instructions in the pipeline.
– Control hazards – They arise from the pipelining of branches and other instructions that change the PC.
Slide 31
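As a concrete illustration of a data hazard, here is a minimal Python sketch of a read-after-write (RAW) check between two decoded instructions; the dict fields 'dest' and 'srcs' are invented for this sketch, not part of the slides:

```python
def raw_hazard(older: dict, younger: dict) -> bool:
    """True if the younger instruction reads a register the older one has not yet written back."""
    return older["dest"] is not None and older["dest"] in younger["srcs"]

add_insn = {"op": "add", "dest": "r1", "srcs": ["r2", "r3"]}
sub_insn = {"op": "sub", "dest": "r4", "srcs": ["r1", "r5"]}   # needs r1 from the add
print(raw_hazard(add_insn, sub_insn))   # True, so the pipeline must stall or forward
```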
34. Pipelining makes efficient use of resources.
It gives quicker execution of a large number of instructions.
The parallelism is invisible to the programmer.
Slide 32
35. Pipelining involves adding hardware to the chip.
The pipeline cannot continuously run at full speed because of pipeline hazards, which disrupt the smooth execution of the pipeline.
Slide 33
36. The pipelining concept brings many advantages to a wide range of systems, although it also has some hazards. Pipelining instructions reduces the CPI, increases the speed of execution, and increases the throughput of the overall system. This is a basic concept in any system, and a lot of improvement can still be made to the pipelining concept to increase system speed. The future scope is to apply the concept to different embedded systems and see how performance increases as a result.
Slide 34