SlideShare uma empresa Scribd logo
1 de 40
Baixar para ler offline
October 1, 2022
Computer Architecture
PIPELININING
Pipelining is an implementation technique
whereby multiple instructions are overlapped in
execution
Takes advantage of parallelism that exists among
the actions needed to execute an instruction.
Today, pipelining is the key implementation
technique used to make fast CPUs.
October 1, 2022
CS 513 Advance Computer Architecture 2
PIPELININING
A pipeline is like an assembly line.
In an automobile assembly line, there are many
steps, each contributing something to the
construction of the car.
Each step operates in parallel with the other
steps, although on a different car.
October 1, 2022
CS 513 Advance Computer Architecture 3
PIPELININING
In a computer pipeline, each step in the pipeline
completes a part of an instruction.
Like assembly line, different steps are completing
different parts of different instructions in parallel.
Each of these steps is called a pipe stage or a
pipe segment.
Stages are connected one to next to form a
pipe—instructions enter at one end, progress
through the stages, and exit at other end, just as
cars would in an assembly line.
October 1, 2022
CS 513 Advance Computer Architecture 4
PIPELININING
In an automobile assembly line, throughput is
defined as the number of cars per hour and is
determined by how often a completed car exits
the assembly line.
Likewise, the throughput of an instruction
pipeline is determined by how often an
instruction exits the pipeline.
Because the pipe stages are hooked together, all
the stages must be ready to proceed at the same
time, just as we would require in an assembly
line.
October 1, 2022
CS 513 Advance Computer Architecture 5
PIPELININING
Time required between moving an instruction one
step down the pipeline is called processor cycle.
Because all stages proceed at the same time, the
length of a processor cycle is determined by the
time required for the slowest pipe stage, just as in
an auto assembly line, the longest step would
determine the time between advancing the line.
In a computer, this processor cycle is usually 1
clock cycle (sometimes it is 2, rarely more).
October 1, 2022
CS 513 Advance Computer Architecture 6
PIPELININING
Pipeline designer's goal is to balance length of
each pipeline stage, just as designer of assembly
line tries to balance time for each step in process.
If the stages are perfectly balanced, then the time
per instruction on the pipelined processor—
assuming ideal conditions—is equal to
October 1, 2022
CS 513 Advance Computer Architecture 7
PIPELININING
Under these conditions, speedup from pipelining
equals number of pipe stages, just as an
assembly line with n stages can ideally produce
cars n times as fast.
If stages are not be perfectly balanced then
pipelining does involve some overhead.
Time per instruction on pipelined processor will
not have its minimum possible value, yet it can be
close.
October 1, 2022
CS 513 Advance Computer Architecture 8
PIPELININING
Pipelining yields a reduction in the average
execution time per instruction.
Reduction can be as decreasing number of clock
cycles per instruction (CPI), as decreasing clock
cycle time, or as a combination.
October 1, 2022
CS 513 Advance Computer Architecture 9
PIPELININING
Pipelining is an implementation technique that
exploits parallelism among the instructions in a
sequential instruction stream.
It has the substantial advantage that, unlike some
speedup techniques, it is not visible to the
programmer.
Concept of pipelining using a classic five-stage
pipeline.
October 1, 2022
CS 513 Advance Computer Architecture 10
RISC Instruction Set
RISC (reduced instruction set computer)
architecture or load-store architecture to
illustrate the basic concepts.
Our default RISC architecture is MIPS
(Microprocessor without Interlocked Pipeline
Stages)
RISC architectures are characterized by a few key
properties, which simplify implementation.
October 1, 2022
CS 513 Advance Computer Architecture 11
RISC Instruction Set
Properties
– All operations on data, apply to data in registers and
typically change the entire register (32 or 64 bits per
register).
– Only operations affect memory are load & store
operations move data from memory to a register or to
memory from a register.
– Load and store operations that load or store less than a
full register (e.g., a byte, 16 bits, or 32 bits) are also
available.
– The instruction formats are few in number with all
instructions typically being one size.
October 1, 2022
CS 513 Advance Computer Architecture 12
A Simple Implementation of a RISC
Instruction Set
How a RISC instruction set can be implemented
in a pipelined fashion, lets understand how it is
implemented without pipelining.
Lets discuss simple implementation where every
instruction takes at most 5 clock cycles.
Extend this basic implementation to a pipelined
version, resulting in a much lower CPI.
October 1, 2022
CS 513 Advance Computer Architecture 13
A Simple Implementation of a RISC
Instruction Set
The un pipelined implementation is not the most
economical or the highest-performance
implementation without pipelining.
Instead, it is designed to lead naturally to a
pipelined implementation.
October 1, 2022
CS 513 Advance Computer Architecture 14
A Simple Implementation of a
RISC Instruction Set
Every instruction in this RISC subset can be
implemented in at most 5 clock cycles. The 5
clock cycles are as follows.
1. Instruction fetch cycle (IF):
– Send the program counter (PC) to memory and fetch the
current instruction from memory.
– Update the PC to the next sequential PC by adding 4
(since each instruction is 4 bytes) to the PC.
October 1, 2022
CS 513 Advance Computer Architecture 15
A Simple Implementation of a
RISC Instruction Set
2. Instruction decode/register fetch cycle (ID):
– Decode the instruction and read the registers
corresponding to register source specifies from the
register file.
– Do the equality test on the registers as they are read, for
a possible branch.
– Compute the possible branch target address by adding
the sign-extended offset to the incremented PC.
October 1, 2022
CS 513 Advance Computer Architecture 16
A Simple Implementation of a
RISC Instruction Set
– Decoding is done in parallel with reading registers,
which is possible because the register specifiers are at
a fixed location in a RISC architecture.
This technique is known as fixed-field decoding
October 1, 2022
CS 513 Advance Computer Architecture 17
A Simple Implementation of a
RISC Instruction Set
3. Execution/'effective address cycle (EX):
ALU operates on operands prepared in prior
cycle, performing one of three functions
depending on the instruction type.
– Memory reference: ALU adds base register &
offset to form the effective address.
– Register-Register ALU instruction: ALU
performs operation specified by ALU opcode
on values read from the register file.
October 1, 2022
CS 513 Advance Computer Architecture 18
A Simple Implementation of a
RISC Instruction Set
– Register-Immediate ALU instruction: ALU
performs operation specified by the ALU
opcode on the first value read from the register
file and the sign-extended immediate.
In a load-store architecture effective address &
execution cycles can be combined into a
single clock cycle, since no instruction needs
to simultaneously calculate a data address &
perform an operation on the data.
October 1, 2022
CS 513 Advance Computer Architecture 19
A Simple Implementation of a
RISC Instruction Set
4. Memory access (MEM):
– If the instruction is a load, memory does a read using
the effective address computed in the previous cycle.
– If it is a store, then the memory writes the data from the
second register read from the register file using the
effective address.
October 1, 2022
CS 513 Advance Computer Architecture 20
A Simple Implementation of a
RISC Instruction Set
5. Write-back cycle (WB):
– Register-Register ALU instruction or Load instruction:
Write result into register file, whether it comes
from the memory system (for a load) or from ALU
(for an ALU instruction).
October 1, 2022
CS 513 Advance Computer Architecture 21
The Classic Five-Stage Pipeline for a RISC
Processor
We can pipeline the execution described above
with almost no changes by simply starting a new
instruction on each clock cycle.
Each of the clock cycles from the previous
section becomes a pipe stage—a cycle in the
pipeline.
This results in execution pattern, which is typical
way a pipeline structure is drawn.
October 1, 2022
CS 513 Advance Computer Architecture 22
The Classic Five-Stage Pipeline for a RISC
Processor
As each instruction takes 5 clock cycles to
complete, during each clock cycle the hardware
will initiate a new instruction and will be
executing some part of the five different
instructions
October 1, 2022
CS 513 Advance Computer Architecture 23
The Classic Five-Stage
Pipeline for a RISC Processor
Is Pipelining simple ? No its not
Some serious issues
Determine what happens on every clock cycle of the
processor & make sure don't try to perform two different
operations with same data path resource on same clock
cycle.
– For example, a single ALU cannot be asked to compute an effective
address and perform a subtract operation at the same time.
Ensure overlap of instructions in pipeline cannot cause
such a conflict.
October 1, 2022
CS 513 Advance Computer Architecture 24
The Classic Five-Stage
Pipeline for a RISC Processor
Fortunately, the simplicity of a RISC instruction
set makes resource evaluation relatively easy.
The major functional units are used in different
cycles, and hence overlapping execution of
multiple instructions introduces relatively few
conflicts.
October 1, 2022
CS 513 Advance Computer Architecture 25
The Classic Five-Stage Pipeline for a RISC
Processor
Some observations:
First, we use separate instruction and data memories,
which we would typically implement with separate
instruction and data caches
Second, the register file is used in the two stages: one for
reading in ID and one for writing in WB.
These uses are distinct, so we simply show the register file
in two places.
Means we need to perform two reads and one write every
clock cycle.
October 1, 2022
CS 513 Advance Computer Architecture 26
The Classic Five-Stage Pipeline for a RISC
Processor
October 1, 2022
CS 513 Advance Computer Architecture 27
The Classic Five-Stage Pipeline for a RISC
Processor
October 1, 2022
CS 513 Advance Computer Architecture 28
The Classic Five-Stage Pipeline for a RISC
Processor
Pipeline can be thought of as a series of data
paths shifted in time.
This shows the overlap among the parts of the
data path, with clock cycle 5 (CC 5) showing the
steady-state situation.
Because the register file is used as a source in
the ID stage and as a destination in the WB stage,
it appears twice.
October 1, 2022
CS 513 Advance Computer Architecture 29
The Classic Five-Stage Pipeline for a RISC
Processor
We show that it is read in one part of the stage
and written in another by using a solid line, on
the right or left, respectively, and a dashed line
on the other side.
The abbreviation IM is used for instruction
memory, DM for data memory, and CC for clock
cycle.
October 1, 2022
CS 513 Advance Computer Architecture 30
The Classic Five-Stage Pipeline for a RISC
Processor
To start a new instruction every clock, we must
increment and store the PC every clock.
– Must be done during the IF stage in preparation for the
next instruction.
Must have an adder to compute the potential
branch target during ID.
– As branch does not change the PC until the ID stage.
October 1, 2022
CS 513 Advance Computer Architecture 31
Classic Five-Stage Pipeline for RISC Processor
( Important issues )
Critical that instructions in the pipeline do not attempt
to use hardware resources at same time.
Instructions in different stages of the pipeline do not
interfere with one another.
Introduce pipeline registers between successive
stages of pipeline
– At end of a clock cycle all results from a given stage are
stored into a register that is used as input to next stage on
next clock cycle.
October 1, 2022
CS 513 Advance Computer Architecture 32
The Classic Five-Stage Pipeline for a RISC
Processor
October 1, 2022
CS 513 Advance Computer Architecture 33
Basic Performance Issues in Pipelining
Pipelining increases CPU instruction throughput
(number of instructions completed per unit of
time) but does not reduce execution time of an
individual instruction.
Slightly increases execution time of each
instruction due to overhead in control of
pipeline.
Increase in instruction throughput means that a
program runs faster & has lower total execution
time, even though no single instruction runs
faster.
October 1, 2022
CS 513 Advance Computer Architecture 34
Basic Performance Issues in Pipelining
Execution time of each instruction does not decrease
puts limits on depth of a pipeline.
Addition to limitations arising from pipeline latency,
limits arise from imbalance among the pipe stages
and from pipelining overhead.
Imbalance among pipe stages reduces performance
since clock can run no faster than time needed for
slowest pipeline stage.
Pipeline overhead arises from the combination of
pipeline register delay and clock skew.
October 1, 2022
CS 513 Advance Computer Architecture 35
Basic Performance Issues in Pipelining
Pipeline registers add setup time, which is time a
register input must be stable before clock signal
triggers a write occurs, plus propagation delay to
the clock cycle.
Clock skew, maximum delay between clock
arrives at any two registers.
Once clock cycle is as small as sum of clock
skew & latch overhead, no further pipelining is
useful, since there is no time left in cycle for
useful work
October 1, 2022
CS 513 Advance Computer Architecture 36
Major Hurdle of Pipelining
—Pipeline Hazards
Hazards, prevent next instruction in instruction stream
from executing during its designated clock cycle.
Hazards reduce the performance from the ideal
speedup gained by pipelining.
There are three classes of hazards
– Structural hazards
– Data hazards
– Control hazards
October 1, 2022
CS 513 Advance Computer Architecture 37
Hazards
Structural hazards arise from resource conflicts when
the hardware cannot support all possible
combinations of instructions simultaneously in
overlapped execution
Data hazards arise when an instruction depends on
the results of a previous instruction in a way that is
exposed by the overlapping of instructions in the
pipeline
Control hazards arise from the pipelining of branches
and other instructions that change the PC
October 1, 2022
CS 513 Advance Computer Architecture 38
Pipelining Hazards
Hazards stall the pipeline
Avoiding hazard requires some instructions to be
allowed to proceed while others are delayed
When an instruction is stalled, all instructions issued
later than stalled instruction are also stalled.
– Instructions issued earlier than stalled instruction must
continue, otherwise hazard will never clear.
– As a result, no new instructions are fetched during the stall.
October 1, 2022
CS 513 Advance Computer Architecture 39
Performance of Pipelines with Stalls
Stall causes pipeline performance to degrade
from ideal performance
Ideal CPI on a pipelined processor is almost 1.
If there are no stalls, speedup is equal to number
of pipeline stages, matching for ideal case.
October 1, 2022
CS 513 Advance Computer Architecture 40

Mais conteĂșdo relacionado

Semelhante a 10_Pipeline.pdf

INCREASING THE THROUGHPUT USING EIGHT STAGE PIPELINING
INCREASING THE THROUGHPUT USING EIGHT STAGE PIPELININGINCREASING THE THROUGHPUT USING EIGHT STAGE PIPELINING
INCREASING THE THROUGHPUT USING EIGHT STAGE PIPELININGijiert bestjournal
 
Architecture Exploration of RISC-V Processor and Comparison with ARM Cortex-A53
Architecture Exploration of RISC-V Processor and Comparison with ARM Cortex-A53Architecture Exploration of RISC-V Processor and Comparison with ARM Cortex-A53
Architecture Exploration of RISC-V Processor and Comparison with ARM Cortex-A53KarthiSugumar
 
Central processing unit
Central processing unitCentral processing unit
Central processing unitKamal Acharya
 
SOC System Design Approach
SOC System Design ApproachSOC System Design Approach
SOC System Design ApproachA B Shinde
 
Presentation on risc pipeline
Presentation on risc pipelinePresentation on risc pipeline
Presentation on risc pipelineArijit Chakraborty
 
Architectures
ArchitecturesArchitectures
ArchitecturesDarshan B B
 
Risc and cisc eugene clewlow
Risc and cisc   eugene clewlowRisc and cisc   eugene clewlow
Risc and cisc eugene clewlowChaudhary Manzoor
 
Risc and cisc eugene clewlow
Risc and cisc   eugene clewlowRisc and cisc   eugene clewlow
Risc and cisc eugene clewlowkaran saini
 
Pipelining in Computer System Achitecture
Pipelining in Computer System AchitecturePipelining in Computer System Achitecture
Pipelining in Computer System AchitectureYashiUpadhyay3
 
CS304PC:Computer Organization and Architecture Session 30 RISC.pptx
CS304PC:Computer Organization and Architecture Session 30 RISC.pptxCS304PC:Computer Organization and Architecture Session 30 RISC.pptx
CS304PC:Computer Organization and Architecture Session 30 RISC.pptxAsst.prof M.Gokilavani
 
BMC: Bare Metal Container Open Source Summit Japan 2017
BMC: Bare Metal Container Open Source Summit Japan 2017BMC: Bare Metal Container Open Source Summit Japan 2017
BMC: Bare Metal Container Open Source Summit Japan 2017Kuniyasu Suzaki
 
Automatically partitioning packet processing applications for pipelined archi...
Automatically partitioning packet processing applications for pipelined archi...Automatically partitioning packet processing applications for pipelined archi...
Automatically partitioning packet processing applications for pipelined archi...Ashley Carter
 
Pipeline Computing by S. M. Risalat Hasan Chowdhury
Pipeline Computing by S. M. Risalat Hasan ChowdhuryPipeline Computing by S. M. Risalat Hasan Chowdhury
Pipeline Computing by S. M. Risalat Hasan ChowdhuryS. M. Risalat Hasan Chowdhury
 
Uni Processor Architecture
Uni Processor ArchitectureUni Processor Architecture
Uni Processor ArchitectureAshish KC
 
Design and development of a 5-stage Pipelined RISC processor based on MIPS
Design and development of a 5-stage Pipelined RISC processor based on MIPSDesign and development of a 5-stage Pipelined RISC processor based on MIPS
Design and development of a 5-stage Pipelined RISC processor based on MIPSIRJET Journal
 
Pipeline Mechanism
Pipeline MechanismPipeline Mechanism
Pipeline MechanismAshik Iqbal
 

Semelhante a 10_Pipeline.pdf (20)

INCREASING THE THROUGHPUT USING EIGHT STAGE PIPELINING
INCREASING THE THROUGHPUT USING EIGHT STAGE PIPELININGINCREASING THE THROUGHPUT USING EIGHT STAGE PIPELINING
INCREASING THE THROUGHPUT USING EIGHT STAGE PIPELINING
 
Architecture Exploration of RISC-V Processor and Comparison with ARM Cortex-A53
Architecture Exploration of RISC-V Processor and Comparison with ARM Cortex-A53Architecture Exploration of RISC-V Processor and Comparison with ARM Cortex-A53
Architecture Exploration of RISC-V Processor and Comparison with ARM Cortex-A53
 
Central processing unit
Central processing unitCentral processing unit
Central processing unit
 
SOC System Design Approach
SOC System Design ApproachSOC System Design Approach
SOC System Design Approach
 
Chapter1.slides
Chapter1.slidesChapter1.slides
Chapter1.slides
 
Presentation on risc pipeline
Presentation on risc pipelinePresentation on risc pipeline
Presentation on risc pipeline
 
Architectures
ArchitecturesArchitectures
Architectures
 
Pipeline
PipelinePipeline
Pipeline
 
Risc and cisc eugene clewlow
Risc and cisc   eugene clewlowRisc and cisc   eugene clewlow
Risc and cisc eugene clewlow
 
Risc and cisc eugene clewlow
Risc and cisc   eugene clewlowRisc and cisc   eugene clewlow
Risc and cisc eugene clewlow
 
Pipelining in Computer System Achitecture
Pipelining in Computer System AchitecturePipelining in Computer System Achitecture
Pipelining in Computer System Achitecture
 
Risc & cisk
Risc & ciskRisc & cisk
Risc & cisk
 
R&c
R&cR&c
R&c
 
CS304PC:Computer Organization and Architecture Session 30 RISC.pptx
CS304PC:Computer Organization and Architecture Session 30 RISC.pptxCS304PC:Computer Organization and Architecture Session 30 RISC.pptx
CS304PC:Computer Organization and Architecture Session 30 RISC.pptx
 
BMC: Bare Metal Container Open Source Summit Japan 2017
BMC: Bare Metal Container Open Source Summit Japan 2017BMC: Bare Metal Container Open Source Summit Japan 2017
BMC: Bare Metal Container Open Source Summit Japan 2017
 
Automatically partitioning packet processing applications for pipelined archi...
Automatically partitioning packet processing applications for pipelined archi...Automatically partitioning packet processing applications for pipelined archi...
Automatically partitioning packet processing applications for pipelined archi...
 
Pipeline Computing by S. M. Risalat Hasan Chowdhury
Pipeline Computing by S. M. Risalat Hasan ChowdhuryPipeline Computing by S. M. Risalat Hasan Chowdhury
Pipeline Computing by S. M. Risalat Hasan Chowdhury
 
Uni Processor Architecture
Uni Processor ArchitectureUni Processor Architecture
Uni Processor Architecture
 
Design and development of a 5-stage Pipelined RISC processor based on MIPS
Design and development of a 5-stage Pipelined RISC processor based on MIPSDesign and development of a 5-stage Pipelined RISC processor based on MIPS
Design and development of a 5-stage Pipelined RISC processor based on MIPS
 
Pipeline Mechanism
Pipeline MechanismPipeline Mechanism
Pipeline Mechanism
 

Último

Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...apidays
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Navi Mumbai Call Girls đŸ„° 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls đŸ„° 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls đŸ„° 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls đŸ„° 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel AraĂșjo
 

Último (20)

Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Navi Mumbai Call Girls đŸ„° 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls đŸ„° 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls đŸ„° 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls đŸ„° 8617370543 Service Offer VIP Hot Model
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 

10_Pipeline.pdf

  • 2. PIPELININING Pipelining is an implementation technique whereby multiple instructions are overlapped in execution Takes advantage of parallelism that exists among the actions needed to execute an instruction. Today, pipelining is the key implementation technique used to make fast CPUs. October 1, 2022 CS 513 Advance Computer Architecture 2
  • 3. PIPELININING A pipeline is like an assembly line. In an automobile assembly line, there are many steps, each contributing something to the construction of the car. Each step operates in parallel with the other steps, although on a different car. October 1, 2022 CS 513 Advance Computer Architecture 3
  • 4. PIPELININING In a computer pipeline, each step in the pipeline completes a part of an instruction. Like assembly line, different steps are completing different parts of different instructions in parallel. Each of these steps is called a pipe stage or a pipe segment. Stages are connected one to next to form a pipe—instructions enter at one end, progress through the stages, and exit at other end, just as cars would in an assembly line. October 1, 2022 CS 513 Advance Computer Architecture 4
  • 5. PIPELININING In an automobile assembly line, throughput is defined as the number of cars per hour and is determined by how often a completed car exits the assembly line. Likewise, the throughput of an instruction pipeline is determined by how often an instruction exits the pipeline. Because the pipe stages are hooked together, all the stages must be ready to proceed at the same time, just as we would require in an assembly line. October 1, 2022 CS 513 Advance Computer Architecture 5
  • 6. PIPELININING Time required between moving an instruction one step down the pipeline is called processor cycle. Because all stages proceed at the same time, the length of a processor cycle is determined by the time required for the slowest pipe stage, just as in an auto assembly line, the longest step would determine the time between advancing the line. In a computer, this processor cycle is usually 1 clock cycle (sometimes it is 2, rarely more). October 1, 2022 CS 513 Advance Computer Architecture 6
  • 7. PIPELININING Pipeline designer's goal is to balance length of each pipeline stage, just as designer of assembly line tries to balance time for each step in process. If the stages are perfectly balanced, then the time per instruction on the pipelined processor— assuming ideal conditions—is equal to October 1, 2022 CS 513 Advance Computer Architecture 7
  • 8. PIPELININING Under these conditions, speedup from pipelining equals number of pipe stages, just as an assembly line with n stages can ideally produce cars n times as fast. If stages are not be perfectly balanced then pipelining does involve some overhead. Time per instruction on pipelined processor will not have its minimum possible value, yet it can be close. October 1, 2022 CS 513 Advance Computer Architecture 8
  • 9. PIPELININING Pipelining yields a reduction in the average execution time per instruction. Reduction can be as decreasing number of clock cycles per instruction (CPI), as decreasing clock cycle time, or as a combination. October 1, 2022 CS 513 Advance Computer Architecture 9
  • 10. PIPELININING Pipelining is an implementation technique that exploits parallelism among the instructions in a sequential instruction stream. It has the substantial advantage that, unlike some speedup techniques, it is not visible to the programmer. Concept of pipelining using a classic five-stage pipeline. October 1, 2022 CS 513 Advance Computer Architecture 10
  • 11. RISC Instruction Set RISC (reduced instruction set computer) architecture or load-store architecture to illustrate the basic concepts. Our default RISC architecture is MIPS (Microprocessor without Interlocked Pipeline Stages) RISC architectures are characterized by a few key properties, which simplify implementation. October 1, 2022 CS 513 Advance Computer Architecture 11
  • 12. RISC Instruction Set Properties – All operations on data, apply to data in registers and typically change the entire register (32 or 64 bits per register). – Only operations affect memory are load & store operations move data from memory to a register or to memory from a register. – Load and store operations that load or store less than a full register (e.g., a byte, 16 bits, or 32 bits) are also available. – The instruction formats are few in number with all instructions typically being one size. October 1, 2022 CS 513 Advance Computer Architecture 12
  • 13. A Simple Implementation of a RISC Instruction Set How a RISC instruction set can be implemented in a pipelined fashion, lets understand how it is implemented without pipelining. Lets discuss simple implementation where every instruction takes at most 5 clock cycles. Extend this basic implementation to a pipelined version, resulting in a much lower CPI. October 1, 2022 CS 513 Advance Computer Architecture 13
  • 14. A Simple Implementation of a RISC Instruction Set The un pipelined implementation is not the most economical or the highest-performance implementation without pipelining. Instead, it is designed to lead naturally to a pipelined implementation. October 1, 2022 CS 513 Advance Computer Architecture 14
  • 15. A Simple Implementation of a RISC Instruction Set Every instruction in this RISC subset can be implemented in at most 5 clock cycles. The 5 clock cycles are as follows. 1. Instruction fetch cycle (IF): – Send the program counter (PC) to memory and fetch the current instruction from memory. – Update the PC to the next sequential PC by adding 4 (since each instruction is 4 bytes) to the PC. October 1, 2022 CS 513 Advance Computer Architecture 15
  • 16. A Simple Implementation of a RISC Instruction Set 2. Instruction decode/register fetch cycle (ID): – Decode the instruction and read the registers corresponding to register source specifies from the register file. – Do the equality test on the registers as they are read, for a possible branch. – Compute the possible branch target address by adding the sign-extended offset to the incremented PC. October 1, 2022 CS 513 Advance Computer Architecture 16
  • 17. A Simple Implementation of a RISC Instruction Set – Decoding is done in parallel with reading registers, which is possible because the register specifiers are at a fixed location in a RISC architecture. This technique is known as fixed-field decoding October 1, 2022 CS 513 Advance Computer Architecture 17
  • 18. A Simple Implementation of a RISC Instruction Set 3. Execution/'effective address cycle (EX): ALU operates on operands prepared in prior cycle, performing one of three functions depending on the instruction type. – Memory reference: ALU adds base register & offset to form the effective address. – Register-Register ALU instruction: ALU performs operation specified by ALU opcode on values read from the register file. October 1, 2022 CS 513 Advance Computer Architecture 18
  • 19. A Simple Implementation of a RISC Instruction Set – Register-Immediate ALU instruction: ALU performs operation specified by the ALU opcode on the first value read from the register file and the sign-extended immediate. In a load-store architecture effective address & execution cycles can be combined into a single clock cycle, since no instruction needs to simultaneously calculate a data address & perform an operation on the data. October 1, 2022 CS 513 Advance Computer Architecture 19
  • 20. A Simple Implementation of a RISC Instruction Set 4. Memory access (MEM): – If the instruction is a load, memory does a read using the effective address computed in the previous cycle. – If it is a store, then the memory writes the data from the second register read from the register file using the effective address. October 1, 2022 CS 513 Advance Computer Architecture 20
  • 21. A Simple Implementation of a RISC Instruction Set 5. Write-back cycle (WB): – Register-Register ALU instruction or Load instruction: Write result into register file, whether it comes from the memory system (for a load) or from ALU (for an ALU instruction). October 1, 2022 CS 513 Advance Computer Architecture 21
  • 22. The Classic Five-Stage Pipeline for a RISC Processor We can pipeline the execution described above with almost no changes by simply starting a new instruction on each clock cycle. Each of the clock cycles from the previous section becomes a pipe stage—a cycle in the pipeline. This results in execution pattern, which is typical way a pipeline structure is drawn. October 1, 2022 CS 513 Advance Computer Architecture 22
  • 23. The Classic Five-Stage Pipeline for a RISC Processor As each instruction takes 5 clock cycles to complete, during each clock cycle the hardware will initiate a new instruction and will be executing some part of the five different instructions October 1, 2022 CS 513 Advance Computer Architecture 23
  • 24. The Classic Five-Stage Pipeline for a RISC Processor Is Pipelining simple ? No its not Some serious issues Determine what happens on every clock cycle of the processor & make sure don't try to perform two different operations with same data path resource on same clock cycle. – For example, a single ALU cannot be asked to compute an effective address and perform a subtract operation at the same time. Ensure overlap of instructions in pipeline cannot cause such a conflict. October 1, 2022 CS 513 Advance Computer Architecture 24
  • 25. The Classic Five-Stage Pipeline for a RISC Processor Fortunately, the simplicity of a RISC instruction set makes resource evaluation relatively easy. The major functional units are used in different cycles, and hence overlapping execution of multiple instructions introduces relatively few conflicts. October 1, 2022 CS 513 Advance Computer Architecture 25
  • 26. The Classic Five-Stage Pipeline for a RISC Processor Some observations: First, we use separate instruction and data memories, which we would typically implement with separate instruction and data caches Second, the register file is used in the two stages: one for reading in ID and one for writing in WB. These uses are distinct, so we simply show the register file in two places. Means we need to perform two reads and one write every clock cycle. October 1, 2022 CS 513 Advance Computer Architecture 26
  • 27. The Classic Five-Stage Pipeline for a RISC Processor October 1, 2022 CS 513 Advance Computer Architecture 27
  • 28. The Classic Five-Stage Pipeline for a RISC Processor October 1, 2022 CS 513 Advance Computer Architecture 28
  • 29. The Classic Five-Stage Pipeline for a RISC Processor Pipeline can be thought of as a series of data paths shifted in time. This shows the overlap among the parts of the data path, with clock cycle 5 (CC 5) showing the steady-state situation. Because the register file is used as a source in the ID stage and as a destination in the WB stage, it appears twice. October 1, 2022 CS 513 Advance Computer Architecture 29
  • 30. The Classic Five-Stage Pipeline for a RISC Processor We show that it is read in one part of the stage and written in another by using a solid line, on the right or left, respectively, and a dashed line on the other side. The abbreviation IM is used for instruction memory, DM for data memory, and CC for clock cycle. October 1, 2022 CS 513 Advance Computer Architecture 30
  • 31. The Classic Five-Stage Pipeline for a RISC Processor To start a new instruction every clock, we must increment and store the PC every clock. – Must be done during the IF stage in preparation for the next instruction. Must have an adder to compute the potential branch target during ID. – As branch does not change the PC until the ID stage. October 1, 2022 CS 513 Advance Computer Architecture 31
  • 32. Classic Five-Stage Pipeline for RISC Processor ( Important issues ) Critical that instructions in the pipeline do not attempt to use hardware resources at same time. Instructions in different stages of the pipeline do not interfere with one another. Introduce pipeline registers between successive stages of pipeline – At end of a clock cycle all results from a given stage are stored into a register that is used as input to next stage on next clock cycle. October 1, 2022 CS 513 Advance Computer Architecture 32
  • 33. The Classic Five-Stage Pipeline for a RISC Processor October 1, 2022 CS 513 Advance Computer Architecture 33
  • 34. Basic Performance Issues in Pipelining Pipelining increases CPU instruction throughput (number of instructions completed per unit of time) but does not reduce execution time of an individual instruction. Slightly increases execution time of each instruction due to overhead in control of pipeline. Increase in instruction throughput means that a program runs faster & has lower total execution time, even though no single instruction runs faster. October 1, 2022 CS 513 Advance Computer Architecture 34
  • 35. Basic Performance Issues in Pipelining Execution time of each instruction does not decrease puts limits on depth of a pipeline. Addition to limitations arising from pipeline latency, limits arise from imbalance among the pipe stages and from pipelining overhead. Imbalance among pipe stages reduces performance since clock can run no faster than time needed for slowest pipeline stage. Pipeline overhead arises from the combination of pipeline register delay and clock skew. October 1, 2022 CS 513 Advance Computer Architecture 35
  • 36. Basic Performance Issues in Pipelining Pipeline registers add setup time, which is time a register input must be stable before clock signal triggers a write occurs, plus propagation delay to the clock cycle. Clock skew, maximum delay between clock arrives at any two registers. Once clock cycle is as small as sum of clock skew & latch overhead, no further pipelining is useful, since there is no time left in cycle for useful work October 1, 2022 CS 513 Advance Computer Architecture 36
  • 37. Major Hurdle of Pipelining —Pipeline Hazards Hazards, prevent next instruction in instruction stream from executing during its designated clock cycle. Hazards reduce the performance from the ideal speedup gained by pipelining. There are three classes of hazards – Structural hazards – Data hazards – Control hazards October 1, 2022 CS 513 Advance Computer Architecture 37
  • 38. Hazards Structural hazards arise from resource conflicts when the hardware cannot support all possible combinations of instructions simultaneously in overlapped execution Data hazards arise when an instruction depends on the results of a previous instruction in a way that is exposed by the overlapping of instructions in the pipeline Control hazards arise from the pipelining of branches and other instructions that change the PC October 1, 2022 CS 513 Advance Computer Architecture 38
  • 39. Pipelining Hazards Hazards stall the pipeline Avoiding hazard requires some instructions to be allowed to proceed while others are delayed When an instruction is stalled, all instructions issued later than stalled instruction are also stalled. – Instructions issued earlier than stalled instruction must continue, otherwise hazard will never clear. – As a result, no new instructions are fetched during the stall. October 1, 2022 CS 513 Advance Computer Architecture 39
  • 40. Performance of Pipelines with Stalls Stall causes pipeline performance to degrade from ideal performance Ideal CPI on a pipelined processor is almost 1. If there are no stalls, speedup is equal to number of pipeline stages, matching for ideal case. October 1, 2022 CS 513 Advance Computer Architecture 40