CMPN301: Computer Architecture
Pipelining
Mayada Hadhoud
Computer Engineering Department
Cairo University
Agenda
• What is pipelining?
• Characteristics of pipelining
• Pipelining Hazards
– Structural Hazard
– Data Hazard
– Control Hazard
ENGR9861 Winter 2007 RV
What Is A Pipeline?
• Pipelining is used by virtually all modern microprocessors to enhance performance by overlapping the execution of instructions.
What Is Pipelining
• Laundry Example
• 4 persons (A, B, C, D) each have one load of clothes to wash, dry, and fold
• Washer takes 30 minutes
• Dryer takes 40 minutes
• "Folder" takes 20 minutes
What Is Pipelining
Sequential laundry takes 6 hours for 4 loads
If they learned pipelining, how long would laundry take?
[Figure: task order (loads A to D) against time, 6 PM to midnight; each load takes 30 + 40 + 20 minutes and starts only after the previous load has finished.]
Appendix A - Pipelining 6
What Is Pipelining
Start work ASAP
• Pipelined laundry takes 3.5 hours for 4 loads
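The two totals quoted for the laundry example (6 hours sequential, 3.5 hours pipelined) can be checked with a small simulation. This is a minimal sketch, assuming each machine handles one load at a time and a finished load can wait in front of a busy machine; the names `STAGES`, `sequential_minutes`, and `pipelined_minutes` are illustrative and not from the slides.

```python
# Stage times in minutes, taken from the laundry example.
STAGES = {"wash": 30, "dry": 40, "fold": 20}


def sequential_minutes(loads: int) -> int:
    """Each load finishes every stage before the next load starts."""
    return loads * sum(STAGES.values())


def pipelined_minutes(loads: int) -> int:
    """A load enters a stage once it has left the previous stage
    and the machine for that stage is free."""
    stage_free_at = [0] * len(STAGES)  # time at which each machine is free
    finish = 0
    for _ in range(loads):
        ready = 0  # time at which this load is ready for its next stage
        for i, duration in enumerate(STAGES.values()):
            start = max(ready, stage_free_at[i])
            ready = start + duration
            stage_free_at[i] = ready
        finish = ready
    return finish


print(sequential_minutes(4) / 60)  # 6.0 hours
print(pipelined_minutes(4) / 60)   # 3.5 hours
```

The pipelined total is 30 + 4 x 40 + 20 = 210 minutes: the 40-minute dryer is the slowest stage, so it sets the rate at which loads complete.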
[Figure: task order (loads A to D) against time, starting at 6 PM; the loads overlap, with successive intervals of 30, 40, 40, 40, 40, and 20 minutes.]
Pipelining Lessons
• Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload
• Pipeline rate is limited by the slowest pipeline stage
• Multiple tasks operate simultaneously
• Potential speedup = number of pipe stages
• Unbalanced lengths of pipe stages reduce speedup
• Time to "fill" the pipeline and time to "drain" it reduce speedup
Pipelining Theoretical Performance
• An ideal pipeline divides a task into k independent sequential subtasks
– Each subtask requires 1 time unit to complete
– The task itself requires k time units to complete
• For n iterations of the task, the execution times are:
– With no pipelining: nk time units
– With pipelining: k + (n - 1) time units
• Speedup of a k-stage pipeline is
– S = nk / [k + (n - 1)], which approaches k for large n
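To see how quickly the speedup S = nk / [k + (n - 1)] approaches k, the expression can be evaluated for growing n. The sketch below is only an illustration of the ideal formula; the function name `pipeline_speedup` and the choice k = 5 are assumptions made for the example.

```python
def pipeline_speedup(k: int, n: int) -> float:
    """Ideal speedup of a k-stage pipeline over n iterations of a task."""
    unpipelined = n * k        # every task takes all k time units in turn
    pipelined = k + (n - 1)    # k units to fill, then one result per unit
    return unpipelined / pipelined


# With k = 5 stages the speedup climbs from 1 towards 5 as n grows.
for n in (1, 10, 1000, 1_000_000):
    print(n, round(pipeline_speedup(5, n), 3))
```

For n = 1 there is no overlap at all, so the speedup is exactly 1; the limit k is only approached when n is much larger than k.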
Characteristics Of Pipelining
• The previous expression is ideal.
• In terms of a CPU, pipelining reduces the average time per instruction and therefore the average CPI.
• EX: If each instruction in a microprocessor takes 5 clock cycles (unpipelined) and we have a 4-stage pipeline, the ideal average CPI with the pipeline will be 5 / 4 = 1.25 (the unpipelined CPI divided by the number of stages).
RISC Instruction Set Basics (MIPS)
• Properties of RISC architectures:
– All operations on data apply to data in registers and typically change the entire register (32 bits or 64 bits).
– The only operations that affect memory are load/store operations: memory to register and register to memory.
– Usually, instructions are few and are typically one size.
RISC Instruction Set Basics (MIPS)
Types of Instructions
• ALU Instructions (R-type):
– Arithmetic operations take two registers as operands. The result is stored in a third register.
– Logical operations: AND, OR, XOR, shift
R-Type Instruction Example
RISC Instruction Set Basics (MIPS)
Types of Instructions
Immediate Format Instructions (I-type):
• Usually take a register (base register) as an operand and a 16-bit immediate value. The sum of the two forms the effective address. A second register acts as the destination in the case of a load operation.
• In the case of a store operation, the second register contains the data to be stored.
I-Type Instruction Example
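As one possible illustration of the I-type layout, the sketch below packs the four fields into a 32-bit word, using the bit ranges labelled on the datapath figures (Instr[31-26], Instr[25-21], Instr[20-16], Instr[15-0]). The helper name `encode_i_type` is hypothetical, and the opcode value 0x23 for `lw` comes from the standard MIPS encoding rather than from these slides.

```python
LW_OPCODE = 0x23  # standard MIPS opcode for lw (assumption, not in the slides)


def encode_i_type(op: int, rs: int, rt: int, imm: int) -> int:
    """Pack op[31-26], rs[25-21], rt[20-16], imm[15-0] into one word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


# lw $1, 100($0): base register rs = $0, destination rt = $1, offset = 100
word = encode_i_type(LW_OPCODE, rs=0, rt=1, imm=100)
print(f"{word:#010x}")  # 0x8c010064
```

Masking the immediate with 0xFFFF keeps a negative offset in its 16-bit two's-complement form, so the same helper works for offsets such as -4.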
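RISC Instruction Set Basics (MIPS)
Types of Instructions
Jump Format (J-type)
• Conditional branches are transfers of control. As described before, a branch causes an immediate value to be added to the current program counter.
• Note that in MIPS, conditional branches such as beq and bne are encoded in the I-type format; the J-type format is used by the unconditional jumps j and jal, which carry a 26-bit target field.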
RISC Instruction Set Implementation
• We first need to look at how instructions in the MIPS instruction set are implemented without pipelining. We'll assume that any instruction in the MIPS subset can be executed in at most 5 clock cycles.
• The five clock cycles are broken up into the following steps:
– Instruction Fetch Cycle
– Instruction Decode/Register Fetch Cycle
– Execution Cycle
– Memory Access Cycle
– Write-Back Cycle
Fetching Instructions (IF)
• Fetching instructions involves
– reading the instruction from the Instruction Memory
– updating the PC to hold the address of the next instruction
– The PC is updated every cycle, so it does not need an explicit write control signal
– The Instruction Memory is read every cycle, so it doesn't need an explicit read control signal
[Figure: the PC drives the Read Address input of the Instruction Memory; an adder computes PC + 4.]
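The fetch step can be mimicked in a few lines. This is a rough sketch, assuming a byte-addressed instruction memory modelled as a dictionary from address to 32-bit word; the function name `fetch` and the sample words (standard MIPS encodings of three `lw` instructions) are illustrative and not taken from the slides.

```python
def fetch(instruction_memory: dict[int, int], pc: int) -> tuple[int, int]:
    """Read the instruction at pc and return it with the updated PC."""
    instruction = instruction_memory[pc]  # read happens on every cycle
    return instruction, pc + 4            # instructions are 4 bytes wide


# Hypothetical program: lw $1, 100($0); lw $2, 200($0); lw $3, 300($0)
program = {0: 0x8C010064, 4: 0x8C0200C8, 8: 0x8C03012C}

pc = 0
fetched = []
while pc in program:
    instruction, pc = fetch(program, pc)
    fetched.append(instruction)

print([f"{w:#010x}" for w in fetched])
```

Neither the memory read nor the PC update is guarded by a condition here, which mirrors the point that no explicit read or write control signal is needed in this stage.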
Decoding Instructions (ID)
• Decoding instructions involves
– sending the fetched instruction's opcode and function field bits to the control unit
– reading two values from the Register File
• Register File addresses are contained in the instruction
[Figure: the instruction feeds the Control Unit and the Register File inputs Read Addr 1, Read Addr 2, Write Addr, and Write Data; the Register File outputs Read Data 1 and Read Data 2.]
Executing R Format Operations (IE)
• R format operations (add, sub, slt, and, or)
– perform the (op and funct) operation on values in rs and rt
– store the result back into the Register File (into location rd)
– The Register File is not written every cycle (e.g. sw), so we need an explicit write control signal for the Register File
R-type format (bit positions):
op [31-26] | rs [25-21] | rt [20-16] | rd [15-11] | shamt [10-6] | funct [5-0]
[Figure: Read Data 1 and Read Data 2 from the Register File feed the ALU (outputs: overflow, zero), which is driven by ALU control; the ALU result returns to Write Data, and RegWrite enables the write.]
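The R-type field layout (op in bits 31-26, rs in 25-21, rt in 20-16, rd in 15-11, shamt in 10-6, funct in 5-0) translates directly into shifts and masks. The sketch below is one way to write that decoding; `RType` and `decode_r_type` are hypothetical names, and the sample word 0x02324020 is the standard MIPS encoding of `add $t0, $s1, $s2`, which does not appear in the slides.

```python
from typing import NamedTuple


class RType(NamedTuple):
    op: int
    rs: int
    rt: int
    rd: int
    shamt: int
    funct: int


def decode_r_type(word: int) -> RType:
    """Split a 32-bit instruction word into its R-type fields."""
    return RType(
        op=(word >> 26) & 0x3F,     # bits 31-26
        rs=(word >> 21) & 0x1F,     # bits 25-21
        rt=(word >> 16) & 0x1F,     # bits 20-16
        rd=(word >> 11) & 0x1F,     # bits 15-11
        shamt=(word >> 6) & 0x1F,   # bits 10-6
        funct=word & 0x3F,          # bits 5-0
    )


fields = decode_r_type(0x02324020)  # add $t0, $s1, $s2
print(fields)
```

Here rs = 17 and rt = 18 select the two source registers read during decode, and rd = 8 is the register written when RegWrite is asserted.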
Executing Load and Store Operations (IE)
• Load and store operations involve
– computing the memory address by adding the base register (read from the Register File during decode) to the 16-bit sign-extended offset field in the instruction
– for a store, writing the value (read from the Register File during decode) to the Data Memory
– for a load, writing the value read from the Data Memory to the Register File
[Figure: the 16-bit offset passes through Sign Extend (16 to 32 bits) into the ALU together with Read Data 1; the ALU result drives the Data Memory Address input. Read Data 2 drives the Data Memory Write Data input, and Read Data returns to the Register File Write Data input. Control signals: ALU control, RegWrite, MemWrite, MemRead.]
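The address computation for loads and stores comes down to a sign extension followed by an addition. The following is a minimal sketch, assuming 32-bit addresses that wrap around; the names `sign_extend_16` and `effective_address` are made up for this example.

```python
def sign_extend_16(value: int) -> int:
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def effective_address(base: int, offset_field: int) -> int:
    """Base register plus sign-extended offset, wrapped to 32 bits."""
    return (base + sign_extend_16(offset_field)) & 0xFFFFFFFF


print(effective_address(0, 100))          # 100, as in lw $1, 100($0)
print(hex(effective_address(0x1000, 0xFFFC)))  # 0xffc, an offset of -4
```

Without the sign extension, the offset field 0xFFFC would be treated as +65532 instead of -4, so negative offsets from the base register would not work.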
Executing Branch Operations (IE)
• Branch operations involve
– comparing the operands read from the Register File during decode for equality (zero ALU output)
– computing the branch target address by adding the updated PC to the 16-bit sign-extended offset field in the instruction, shifted left by 2
[Figure: Read Data 1 and Read Data 2 feed the ALU, whose zero output goes to the branch control logic; the 16-bit offset passes through Sign Extend (16 to 32 bits) and Shift left 2, and an adder combines it with PC + 4 to form the branch target address.]
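The two branch steps (equality test and target computation) can be sketched as follows. This is an illustrative model of a beq-style branch only, assuming 32-bit wrap-around; `branch_target` and `next_pc` are hypothetical helper names.

```python
def sign_extend_16(value: int) -> int:
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc: int, offset_field: int) -> int:
    """Updated PC (pc + 4) plus the sign-extended offset shifted left by 2."""
    updated_pc = pc + 4
    return (updated_pc + (sign_extend_16(offset_field) << 2)) & 0xFFFFFFFF


def next_pc(pc: int, offset_field: int, operand_a: int, operand_b: int) -> int:
    """Take the branch only when the ALU subtraction yields zero."""
    zero = (operand_a - operand_b) == 0
    return branch_target(pc, offset_field) if zero else pc + 4


print(hex(next_pc(0x100, 3, 5, 5)))  # 0x110: taken, 0x104 + 3 * 4
print(hex(next_pc(0x100, 3, 5, 6)))  # 0x104: not taken
```

The shift left by 2 turns the offset from a count of instructions into a count of bytes, since every instruction is 4 bytes long.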
Memory Access (MEM) Cycle
• If a load, the effective address computed in the previous cycle is used to read memory. The actual data transfer to the register does not occur until the next cycle.
• If a store, the data from the register is written to the effective address in memory.
Write-Back (WB) Cycle
• Occurs with register-register ALU instructions or load instructions.
• A simple operation: whether the instruction is a register-register operation or a memory load, the resulting data is written to the appropriate register.
The single cycle datapath
Single Cycle Datapath with Control Unit
[Figure: single-cycle datapath with control unit. Datapath: PC, Instruction Memory, Register File, Sign Extend (16 to 32 bits), Shift left 2, ALU, Data Memory, and two adders. The Control Unit decodes Instr[31-26] and drives RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite; ALU control takes ALUOp and Instr[5-0]; PCSrc selects between PC + 4 and the branch target.]
R-type Instruction Data/Control Flow
[Figure: the single-cycle datapath with control unit, shown for an R-type instruction.]
Load Word Instruction Data/Control Flow
[Figure: the single-cycle datapath with control unit, shown for a load word instruction.]
Store Word Instruction?
Branch Instruction Data/Control Flow
[Figure: the single-cycle datapath with control unit, shown for a branch instruction.]
Fetch: 2 ns
Decode/Reg Read: 1 ns
Execute: 2 ns
Memory: 2 ns
WB: 1 ns
Clock Cycle Time
• Single Cycle: longest instruction time = 2 + 1 + 2 + 2 + 1 = 8 ns
• Multi Cycle: longest stage time = 2 ns
• Pipelined: longest stage time = 2 ns
Execution Time (1000 instructions: 50% ALU, 10% Store, 30% Branch, 10% Load)
• Single Cycle: 1000 x 8 = 8000 ns
• Multi Cycle: 500 x 4 x 2 + 100 x 4 x 2 + 300 x 3 x 2 + 100 x 5 x 2 = 7600 ns
• Pipelined: 5 x 2 + (1000 - 1) x 2 = 2008 ns
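The three execution times in this comparison can be reproduced with a short script. It is a sketch under the same assumptions as the slide: the stage times listed above, the cycle counts per instruction class used in the multi-cycle sum (4 for ALU and store, 3 for branch, 5 for load), and no pipeline stalls. All identifiers are illustrative.

```python
STAGE_NS = {"IF": 2, "ID": 1, "EX": 2, "MEM": 2, "WB": 1}

# Cycles per instruction class in the multi-cycle design, as used on the slide.
MULTI_CYCLE_CYCLES = {"alu": 4, "store": 4, "branch": 3, "load": 5}

# Instruction mix in percent.
MIX_PERCENT = {"alu": 50, "store": 10, "branch": 30, "load": 10}


def single_cycle_ns(n: int) -> int:
    """Clock period is the longest instruction: the sum of all stage times."""
    return n * sum(STAGE_NS.values())


def multi_cycle_ns(n: int) -> int:
    """Clock period is the longest stage; each class takes its own cycle count."""
    clock = max(STAGE_NS.values())
    return sum(
        (n * MIX_PERCENT[kind] // 100) * MULTI_CYCLE_CYCLES[kind] * clock
        for kind in MIX_PERCENT
    )


def pipelined_ns(n: int) -> int:
    """Fill the pipeline once, then complete one instruction per clock."""
    clock = max(STAGE_NS.values())
    return len(STAGE_NS) * clock + (n - 1) * clock


print(single_cycle_ns(1000), multi_cycle_ns(1000), pipelined_ns(1000))
# 8000 7600 2008
```

The multi-cycle design gains little over the single-cycle one here because most instructions still need 4 or 5 cycles of 2 ns each, while the pipeline completes one instruction every 2 ns once it is full.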
The Basic Pipeline For MIPS
[Figure: four instructions in program order, each passing through Ifetch, Reg, ALU, DMem, and Reg, with each instruction starting one clock cycle after the previous one; cycles 1 to 7 are shown.]
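The staggered layout of the basic pipeline can be generated programmatically. The sketch below assumes an ideal pipeline with no stalls, where instruction i enters the first stage in cycle i; the stage names follow the figure labels, and `pipeline_chart` is a hypothetical helper.

```python
STAGES = ["Ifetch", "Reg", "ALU", "DMem", "Reg"]


def pipeline_chart(num_instructions: int) -> list[list[str]]:
    """Return one row per instruction and one column per clock cycle."""
    total_cycles = len(STAGES) + num_instructions - 1
    chart = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(STAGES):
            row[i + s] = name  # instruction i is in stage s during cycle i + s
        chart.append(row)
    return chart


for i, row in enumerate(pipeline_chart(4), start=1):
    print(f"instr {i}: " + " ".join(f"{cell:<6}" for cell in row))
```

Four instructions need 5 + (4 - 1) = 8 cycles in total, and from cycle 4 onward every column contains several different stages, which is the overlap that gives the pipeline its throughput.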
CPU Pipelining: Example
• Example: single-cycle, non-pipelined execution
• Total time for 3 instructions: 24 ns
[Figure: program execution order (in instructions) against time in ns. lw $1, 100($0), lw $2, 200($0), and lw $3, 300($0) each take 8 ns (instruction fetch, Reg, ALU, data access, Reg), and each starts only after the previous one has finished.]
CPU Pipelining: Example
• Single-cycle, pipelined execution
– Improve performance by increasing instruction throughput
– Total time for 3 instructions = 14 ns
– Each instruction adds 2 ns to total execution time
– Stage time limited by slowest resource (2 ns)
– Assumptions:
  • Write to register occurs in 1st half of clock
  • Read from register occurs in 2nd half of clock
[Figure: program execution order (in instructions) against time in ns. lw $1, 100($0), lw $2, 200($0), and lw $3, 300($0) overlap, each starting 2 ns after the previous one; every stage (instruction fetch, Reg, ALU, data access, Reg) occupies a 2 ns slot.]
CPU pipelining: Example
• Time without pipelining = 24 ns
• Time with pipelining = 14 ns (not 24/5 = 4.8 ns). Why?
– The number of instructions is not large, so the time to fill the pipeline dominates
• Let's increase the number of instructions
– If the number of instructions = 1,000,000, the total time with pipelining is about 1,000,000 x 2 ns = 2,000,000 ns
– Time without pipelining = 1,000,000 x 8 ns = 8,000,000 ns
– The speedup is about 4 (up from 24/14, about 1.7, for 3 instructions)
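The effect of the instruction count on this example can be checked numerically. This is a small sketch using the figures from the example (8 ns per instruction without pipelining, five stages of 2 ns with pipelining) and assuming no stalls; the function names are illustrative.

```python
UNPIPELINED_NS = 8   # one instruction, start to finish, without pipelining
STAGE_NS = 2         # pipelined clock period (slowest stage)
NUM_STAGES = 5


def time_without_pipelining(n: int) -> int:
    return n * UNPIPELINED_NS


def time_with_pipelining(n: int) -> int:
    # The first instruction takes all five stages; each later one adds 2 ns.
    return NUM_STAGES * STAGE_NS + (n - 1) * STAGE_NS


def speedup(n: int) -> float:
    return time_without_pipelining(n) / time_with_pipelining(n)


for n in (3, 100, 1_000_000):
    print(n, time_with_pipelining(n), round(speedup(n), 3))
```

The speedup tends to 8 / 2 = 4 rather than 5, because the stages are unbalanced: the unpipelined instruction takes 8 ns, not 5 x 2 = 10 ns.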
The pipelined version of MIPS Datapath
• Need registers between stages
IF
ID
EX for Load
MEM for Load
WB for Load
• There is a BUG here: wrong register number
Corrected Datapath for Load
The pipelined data path with control signals
Control Signals

More Related Content

Similar to CMPN301-Pipelining_V2.pptx

CS304PC:Computer Organization and Architecture Session 33 demo 1 ppt.pdf
CS304PC:Computer Organization and Architecture  Session 33 demo 1 ppt.pdfCS304PC:Computer Organization and Architecture  Session 33 demo 1 ppt.pdf
CS304PC:Computer Organization and Architecture Session 33 demo 1 ppt.pdfAsst.prof M.Gokilavani
ย 
Parallel Processing Techniques Pipelining
Parallel Processing Techniques PipeliningParallel Processing Techniques Pipelining
Parallel Processing Techniques PipeliningRNShukla7
ย 
UNIT 3 - General Purpose Processors
UNIT 3 - General Purpose ProcessorsUNIT 3 - General Purpose Processors
UNIT 3 - General Purpose ProcessorsButtaRajasekhar2
ย 
MIPS IMPLEMENTATION.pptx
MIPS IMPLEMENTATION.pptxMIPS IMPLEMENTATION.pptx
MIPS IMPLEMENTATION.pptxJEEVANANTHAMG6
ย 
Microchip's PIC Micro Controller
Microchip's PIC Micro ControllerMicrochip's PIC Micro Controller
Microchip's PIC Micro ControllerMidhu S V Unnithan
ย 
BTCS501_MM_Ch9.pptx
BTCS501_MM_Ch9.pptxBTCS501_MM_Ch9.pptx
BTCS501_MM_Ch9.pptxAshokRachapalli1
ย 
Unit iii
Unit iiiUnit iii
Unit iiiJanani S
ย 
Unit - 5 Pipelining.pptx
Unit - 5 Pipelining.pptxUnit - 5 Pipelining.pptx
Unit - 5 Pipelining.pptxMedicaps University
ย 
Basic MIPS implementation
Basic MIPS implementationBasic MIPS implementation
Basic MIPS implementationkavitha2009
ย 
Basic MIPS implementation
Basic MIPS implementationBasic MIPS implementation
Basic MIPS implementationkavitha2009
ย 
POLITEKNIK MALAYSIA
POLITEKNIK MALAYSIAPOLITEKNIK MALAYSIA
POLITEKNIK MALAYSIAAiman Hud
ย 
Parallel processing and pipelining
Parallel processing and pipeliningParallel processing and pipelining
Parallel processing and pipeliningmahesh kumar prajapat
ย 
Pipelining of Processors
Pipelining of ProcessorsPipelining of Processors
Pipelining of ProcessorsGaditek
ย 
Lec3 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Performance
Lec3 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- PerformanceLec3 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Performance
Lec3 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- PerformanceHsien-Hsin Sean Lee, Ph.D.
ย 
Basic computer organization and design
Basic computer organization and designBasic computer organization and design
Basic computer organization and designmahesh kumar prajapat
ย 
CA UNIT III.pptx
CA UNIT III.pptxCA UNIT III.pptx
CA UNIT III.pptxssuser9dbd7e
ย 
Control unit design
Control unit designControl unit design
Control unit designDhaval Bagal
ย 
Design pipeline architecture for various stage pipelines
Design pipeline architecture for various stage pipelinesDesign pipeline architecture for various stage pipelines
Design pipeline architecture for various stage pipelinesMahmudul Hasan
ย 

Similar to CMPN301-Pipelining_V2.pptx (20)

CS304PC:Computer Organization and Architecture Session 33 demo 1 ppt.pdf
CS304PC:Computer Organization and Architecture  Session 33 demo 1 ppt.pdfCS304PC:Computer Organization and Architecture  Session 33 demo 1 ppt.pdf
CS304PC:Computer Organization and Architecture Session 33 demo 1 ppt.pdf
ย 
Parallel Processing Techniques Pipelining
Parallel Processing Techniques PipeliningParallel Processing Techniques Pipelining
Parallel Processing Techniques Pipelining
ย 
UNIT 3 - General Purpose Processors
UNIT 3 - General Purpose ProcessorsUNIT 3 - General Purpose Processors
UNIT 3 - General Purpose Processors
ย 
MIPS IMPLEMENTATION.pptx
MIPS IMPLEMENTATION.pptxMIPS IMPLEMENTATION.pptx
MIPS IMPLEMENTATION.pptx
ย 
Microchip's PIC Micro Controller
Microchip's PIC Micro ControllerMicrochip's PIC Micro Controller
Microchip's PIC Micro Controller
ย 
BTCS501_MM_Ch9.pptx
BTCS501_MM_Ch9.pptxBTCS501_MM_Ch9.pptx
BTCS501_MM_Ch9.pptx
ย 
Unit iii
Unit iiiUnit iii
Unit iii
ย 
Unit - 5 Pipelining.pptx
Unit - 5 Pipelining.pptxUnit - 5 Pipelining.pptx
Unit - 5 Pipelining.pptx
ย 
Basic MIPS implementation
Basic MIPS implementationBasic MIPS implementation
Basic MIPS implementation
ย 
Basic MIPS implementation
Basic MIPS implementationBasic MIPS implementation
Basic MIPS implementation
ย 
Unit 4 COA.pptx
Unit 4 COA.pptxUnit 4 COA.pptx
Unit 4 COA.pptx
ย 
POLITEKNIK MALAYSIA
POLITEKNIK MALAYSIAPOLITEKNIK MALAYSIA
POLITEKNIK MALAYSIA
ย 
Parallel processing and pipelining
Parallel processing and pipeliningParallel processing and pipelining
Parallel processing and pipelining
ย 
Pipelining of Processors
Pipelining of ProcessorsPipelining of Processors
Pipelining of Processors
ย 
Lec3 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Performance
Lec3 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- PerformanceLec3 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Performance
Lec3 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Performance
ย 
Basic computer organization and design
Basic computer organization and designBasic computer organization and design
Basic computer organization and design
ย 
CO Module 5
CO Module 5CO Module 5
CO Module 5
ย 
CA UNIT III.pptx
CA UNIT III.pptxCA UNIT III.pptx
CA UNIT III.pptx
ย 
Control unit design
Control unit designControl unit design
Control unit design
ย 
Design pipeline architecture for various stage pipelines
Design pipeline architecture for various stage pipelinesDesign pipeline architecture for various stage pipelines
Design pipeline architecture for various stage pipelines
ย 

Recently uploaded

AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdfankushspencer015
ย 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
ย 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
ย 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
ย 
NFPA 5000 2024 standard .
NFPA 5000 2024 standard                                  .NFPA 5000 2024 standard                                  .
NFPA 5000 2024 standard .DerechoLaboralIndivi
ย 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
ย 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfRagavanV2
ย 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
ย 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...SUHANI PANDEY
ย 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college projectTonystark477637
ย 
UNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICS
UNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICSUNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICS
UNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICSrknatarajan
ย 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VDineshKumar4165
ย 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
ย 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
ย 
Call Girls in Ramesh Nagar Delhi ๐Ÿ’ฏ Call Us ๐Ÿ”9953056974 ๐Ÿ” Escort Service
Call Girls in Ramesh Nagar Delhi ๐Ÿ’ฏ Call Us ๐Ÿ”9953056974 ๐Ÿ” Escort ServiceCall Girls in Ramesh Nagar Delhi ๐Ÿ’ฏ Call Us ๐Ÿ”9953056974 ๐Ÿ” Escort Service
Call Girls in Ramesh Nagar Delhi ๐Ÿ’ฏ Call Us ๐Ÿ”9953056974 ๐Ÿ” Escort Service9953056974 Low Rate Call Girls In Saket, Delhi NCR
ย 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
ย 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
ย 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
ย 

Recently uploaded (20)

AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
ย 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
ย 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
ย 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
ย 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
ย 
NFPA 5000 2024 standard .
NFPA 5000 2024 standard                                  .NFPA 5000 2024 standard                                  .
NFPA 5000 2024 standard .
ย 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
ย 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdf
ย 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
ย 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
ย 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
ย 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
ย 
UNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICS
UNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICSUNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICS
UNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICS
ย 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
ย 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
ย 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
ย 
Call Girls in Ramesh Nagar Delhi ๐Ÿ’ฏ Call Us ๐Ÿ”9953056974 ๐Ÿ” Escort Service
Call Girls in Ramesh Nagar Delhi ๐Ÿ’ฏ Call Us ๐Ÿ”9953056974 ๐Ÿ” Escort ServiceCall Girls in Ramesh Nagar Delhi ๐Ÿ’ฏ Call Us ๐Ÿ”9953056974 ๐Ÿ” Escort Service
Call Girls in Ramesh Nagar Delhi ๐Ÿ’ฏ Call Us ๐Ÿ”9953056974 ๐Ÿ” Escort Service
ย 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
ย 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
ย 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
ย 

CMPN301-Pipelining_V2.pptx

  • 1. CMPN301: Computer Architecture Pipelining Mayada Hadhoud Computer Engineering Department Cairo University
  • 2. Agenda โ€ข What is pipelining? โ€ข Characteristics of pipelining โ€ข Pipelining Hazards โ€“ Structural Hazard โ€“ Data Hazard โ€“ Control Hazard
  • 3. ENGR9861 Winter 2007 RV What Is A Pipeline? โ€ข Pipelining is used by virtually all modern microprocessors to enhance performance by overlapping the execution of instructions.
  • 4. 4 What Is Pipelining โ€ข Laundry Example โ€ข 4 persons each have one load of clothes to wash, dry, and fold โ€ข Washer takes 30 minutes โ€ข Dryer takes 40 minutes โ€ข โ€œFolderโ€ takes 20 minutes A B C D
  • 5. 5 What Is Pipelining Sequential laundry takes 6 hours for 4 loads If they learned pipelining, how long would laundry take? A B C D 30 40 20 30 40 20 30 40 20 30 40 20 6 PM 7 8 9 10 11 Midnight T a s k O r d e r Time
  • 6. Appendix A - Pipelining 6 What Is Pipelining Start work ASAP โ€ข Pipelined laundry takes 3.5 hours for 4 loads A B C D 6 PM 7 8 9 10 11 Midnight T a s k O r d e r Time 30 40 40 40 40 20
  • 7. Appendix A - Pipelining 7 Pipelining Lessons โ€ข Pipelining doesnโ€™t help latency of single task, it helps throughput of entire workload โ€ข Pipeline rate limited by slowest pipeline stage โ€ข Multiple tasks operating simultaneously โ€ข Potential speedup = Number pipe stages โ€ข Unbalanced lengths of pipe stages reduces speedup โ€ข Time to โ€œfillโ€ pipeline and time to โ€œdrain A B C D 6 PM 7 8 9 T a s k O r d e r Time 30 40 40 40 40 20 What Is Pipelining
  • 8. Pipelining Theoretical Performance โ€ข An ideal pipeline divides a task into k independent sequential subtasks โ€“ Each subtask requires 1 time unit to complete โ€“ The task itself requires k time units to complete โ€ข For n iterations of task, the execution times: โ€“ With no pipelining: nk time units โ€“ With pipelining: k + (n-1) time units โ€ข Speedup of a k-stage pipeline is โ€“ S = nk/[k+(n-1)] โ†’ = k for large n
  • 9.
  • 10. Characteristics Of Pipelining โ€ข The previous expression is ideal. โ€ข In terms of a CPU, the implementation of pipelining has the effect of reducing the average instruction time, therefore reducing the average CPI. โ€ข EX: If each instruction in a microprocessor takes 5 clock cycles (unpipelined) and we have a 4 stage pipeline, the ideal average CPI with the pipeline will be 1.25 .
  • 11. RISC Instruction Set Basics (MIPS) โ€ข Properties of RISC architectures: โ€“ All operations on data apply to data in registers and typically change the entire register (32-bits or 64-bits). โ€“ The only operations that affect memory are load/store operations. Memory to register and register to memory. โ€“ Usually, instructions are few and are typically one size.
  • 12. โ€ข ALU Instructions (R-type): โ€ข Arithmetic operations, take two registers as operands. The result is stored in a third register. โ€ข Logical operations AND OR, XOR, shift RISC Instruction Set Basics (MIPS) Types of Instructions
  • 14. Immediate Format Instructions (I-type): โ€ข Usually take a register (base register) as an operand and a 16-bit immediate value. The sum of the two will create the effective address. A second register acts as a source in the case of a load operation. โ€ข In the case of a store operation the second register contains the data to be stored. RISC Instruction Set Basics (MIPS) Types of Instructions
  • 16. Jump Format (J-type) โ€ข Conditional branches are transfers of control. As described before, a branch causes an immediate value to be added to the current program counter. RISC Instruction Set Basics (MIPS) Types of Instructions
  • 17.
  • 18. RISC Instruction Set Implementation โ€ข We first need to look at how instructions in the MIPS instruction set are implemented without pipelining. Weโ€™ll assume that any instruction of the subset of MIPS can be executed in at most 5 clock cycles. โ€ข The five clock cycles will be broken up into the following steps: โ€ข Instruction Fetch Cycle โ€ข Instruction Decode/Register Fetch Cycle โ€ข Execution Cycle โ€ข Memory Access Cycle โ€ข Write- Back
  • 19. Fetching Instructions (IF) โ€ข Fetching instructions involves โ€“ reading the instruction from the Instruction Memory โ€“ updating the PC to hold the address of the next instruction โ€“ PC is updated every cycle, so it does not need an explicit write control signal โ€“ Instruction Memory is read every cycle, so it doesnโ€™t need an explicit read control signal Read Address Instruction Instruction Memory Add PC 4
  • 20. Decoding Instructions (ID) โ€ข Decoding instructions involves โ€“ sending the fetched instructionโ€™s opcode and function field bits to the control unit โ€“ reading two values from the Register File โ€ข Register File addresses are contained in the instruction Instruction Write Data Read Addr 1 Read Addr 2 Write Addr Register File Read Data 1 Read Data 2 Control Unit
  • 21. Executing R Format Operations (IE) โ€ข R format operations (add,sub,slt,and,or) โ€“ perform the (op and funct) operation on values in rs and rt โ€“ store the result back into the Register File (into location rd) โ€“ The Register File is not written every cycle (e.g. sw), so we need an explicit write control signal for the Register File R-type: 31 25 20 15 5 0 op rs rt rd funct shamt 10 Instruction Write Data Read Addr 1 Read Addr 2 Write Addr Register File Read Data 1 Read Data 2 ALU overflow zero ALU control RegWrite
  • 22. Executing Load and Store Operations (IE) โ€ข Load and store operations involve โ€“ compute memory address by adding the base register (read from the Register File during decode) to the 16-bit signed-extended offset field in the instruction โ€“ store value (read from the Register File during decode) written to the Data Memory โ€“ load value, read from the Data Memory, written to the Register File Instruction Write Data Read Addr 1 Read Addr 2 Write Addr Register File Read Data 1 Read Data 2 ALU overflow zero ALU control RegWrite Data Memory Address Write Data Read Data Sign Extend MemWrite MemRead 16 32
  • 23. Executing Branch Operations (IE) • Branch operations involve – comparing the operands read from the Register File during decode for equality (zero ALU output) – computing the branch target address by adding the updated PC (PC + 4) to the 16-bit sign-extended offset field of the instruction, shifted left by 2 [Figure: the Register File outputs feed the ALU, whose zero output goes to the branch control logic; the offset passes through Sign Extend (16 to 32 bits) and Shift left 2 into an adder with PC + 4 to form the branch target address]
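A minimal sketch of the branch logic for a beq-style instruction, under the assumptions that the comparison is done by subtraction in the ALU and that addresses wrap to 32 bits. The names `branch_target` and `next_pc` are hypothetical.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc, offset16):
    """(PC + 4) plus the sign-extended offset shifted left by 2 (words to bytes)."""
    return (pc + 4 + (sign_extend16(offset16) << 2)) & 0xFFFFFFFF


def next_pc(pc, rs_value, rt_value, offset16):
    """Select the next PC for a branch-if-equal instruction."""
    zero = (rs_value - rt_value) == 0  # the ALU subtracts; zero means equal
    return branch_target(pc, offset16) if zero else pc + 4


taken = next_pc(0x100, 5, 5, 3)      # equal operands: branch 3 words past PC + 4
not_taken = next_pc(0x100, 5, 6, 3)  # unequal operands: fall through to PC + 4
```

The shift left by 2 reflects that the offset counts instructions (words), while the PC holds a byte address.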
  • 24. Memory Access (MEM) Cycle • If a load, the effective address computed in the previous cycle is referenced and the memory is read. The actual data transfer to the register does not occur until the next cycle. • If a store, the data from the register is written to the effective address in memory.
  • 25. Write-Back (WB) Cycle • Occurs with register-register ALU instructions or load instructions. • A simple operation: whether the instruction is a register-register operation or a memory load, the resulting data is written to the appropriate register.
  • 26. The single cycle datapath
  • 27. Single Cycle Datapath with Control Unit [Figure: single-cycle datapath built from PC, Instruction Memory, Register File, ALU, Data Memory, Sign Extend (16 to 32 bits), Shift left 2, and two adders. The Control Unit reads Instr[31-26]; ALU control reads ALUOp and Instr[5-0]. Control signals shown: RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite, PCSrc. Register addresses come from Instr[25-21], Instr[20-16], and Instr[15-11]; the offset comes from Instr[15-0]]
  • 28. R-type Instruction Data/Control Flow [Figure: the single-cycle datapath with control unit, annotated for an R-type instruction]
  • 29. Load Word Instruction Data/Control Flow [Figure: the single-cycle datapath with control unit, annotated for a load word instruction] • Store Word Instruction?
  • 30. Branch Instruction Data/Control Flow [Figure: the single-cycle datapath with control unit, annotated for a branch instruction]
  • 31. Stage times: Fetch: 2 ns, Decode/Reg Read: 1 ns, Execute: 2 ns, Memory: 2 ns, WB: 1 ns
    Clock Cycle Time
    – Single Cycle: longest instruction time = 2 + 1 + 2 + 2 + 1 = 8 ns
    – Multi Cycle: longest stage time = 2 ns
    – Pipelined: longest stage time = 2 ns
    Execution Time (1000 instructions: 50% ALU, 10% Store, 30% Branch, 10% Load)
    – Single Cycle: 1000 x 8 = 8000 ns
    – Multi Cycle: 500 x 4 x 2 + 100 x 4 x 2 + 300 x 3 x 2 + 100 x 5 x 2 = 7600 ns
    – Pipelined: 5 x 2 + (1000 - 1) x 2 = 2008 ns
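The three execution times on this slide can be reproduced with a short calculation. The stage times, instruction mix, and cycles per instruction class (4, 4, 3, 5) are taken from the slide's own arithmetic; the variable names are just illustrative.

```python
# Stage latencies in ns, as given on the slide.
STAGE_NS = {"IF": 2, "ID": 1, "EX": 2, "MEM": 2, "WB": 1}

# Instruction counts for the 1000-instruction mix.
MIX = {"alu": 500, "store": 100, "branch": 300, "load": 100}

# Cycles each class needs in the multi-cycle design (from the slide's formula).
CYCLES = {"alu": 4, "store": 4, "branch": 3, "load": 5}

n = sum(MIX.values())
single_cycle_clock = sum(STAGE_NS.values())  # clock must fit the longest instruction
stage_clock = max(STAGE_NS.values())         # clock must fit the longest stage

single_cycle_ns = n * single_cycle_clock
multi_cycle_ns = sum(MIX[k] * CYCLES[k] for k in MIX) * stage_clock
# Pipelined: the first instruction takes all 5 stages, each later one adds a cycle.
pipelined_ns = (len(STAGE_NS) + n - 1) * stage_clock

print(single_cycle_ns, multi_cycle_ns, pipelined_ns)
```

Changing `MIX` or `STAGE_NS` shows how sensitive each design is to the instruction mix and to the slowest stage.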
  • 32. The Basic Pipeline For MIPS [Figure: pipeline diagram with instructions in program order on the vertical axis and Cycle 1 to Cycle 7 on the horizontal axis; each instruction passes through Ifetch, Reg, ALU, DMem, Reg, starting one cycle after the previous instruction]
  • 33. CPU Pipelining: Example • Example: single-cycle, non-pipelined execution • Total time for 3 instructions: 24 ns [Figure: lw $1, 100($0); lw $2, 200($0); lw $3, 300($0) executed one after another in program order, each taking 8 ns through Instruction fetch, Reg, ALU, Data access, Reg; horizontal axis is time in ns]
  • 34. CPU Pipelining: Example • Single-cycle, pipelined execution – Improve performance by increasing instruction throughput – Total time for 3 instructions = 14 ns – Each instruction adds 2 ns to total execution time – Stage time limited by slowest resource (2 ns) – Assumptions: • Write to register occurs in 1st half of clock • Read from register occurs in 2nd half of clock [Figure: lw $1, 100($0); lw $2, 200($0); lw $3, 300($0) overlapped in program order, each starting 2 ns after the previous one and passing through Instruction fetch, Reg, ALU, Data access, Reg in 2 ns stages; horizontal axis is time in ns]
  • 35. CPU pipelining: Example • Time without pipelining = 24 ns • Time with pipelining = 14 ns (not 24/5 = 4.8 ns). Why? – The number of instructions is not large, so the time to fill the pipeline dominates; the speedup is only 24/14 ≈ 1.7 • Let's increase the number of instructions – If the number of instructions = 1,000,000, the total time with pipelining ≈ 1,000,000 x 2 ns = 2,000,000 ns – Time without pipelining = 1,000,000 x 8 ns = 8,000,000 ns – The speedup ≈ 4 (increased)
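To see how the speedup grows with the number of instructions, here is a small sketch using the numbers from this example (8 ns per instruction unpipelined, 5 stages of 2 ns pipelined). The function names are hypothetical; unlike the slide's approximation, the pipelined time includes the cycles needed to fill the pipeline.

```python
def unpipelined_ns(n, instr_ns=8):
    """Total time when each instruction runs to completion before the next starts."""
    return n * instr_ns


def pipelined_ns(n, stages=5, stage_ns=2):
    """First instruction takes all stages; each later one completes one stage time later."""
    return (stages + n - 1) * stage_ns


def speedup(n):
    return unpipelined_ns(n) / pipelined_ns(n)


for n in (3, 100, 1_000_000):
    print(n, unpipelined_ns(n), pipelined_ns(n), round(speedup(n), 3))
```

With 3 instructions this gives 24 ns against 14 ns. For large n the speedup approaches 8/2 = 4 rather than 5, because the clock is set by the 2 ns slowest stage while the stages sum to only 8 ns.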
  • 36. The pipelined version of MIPS Datapath • Need registers between stages
  • 37. IF
  • 38. ID
  • 43. The pipelined data path with control signals