2. Outline
Definition of pipelining
Advantages and disadvantages
Types of pipelining (hardware and software)
Latency and throughput
Pipelining and addressing modes
Pipelining and cache memory
Pipelining is a technique of decomposing a sequential
process into suboperations, with each
suboperation completed in a dedicated
segment that operates concurrently with
all other segments.
A pipeline is commonly known as an assembly-line operation.
Each suboperation is performed in
a segment within the pipeline. Each
segment has one or two registers and a
combinational circuit.
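To make "registers plus a combinational circuit per segment" concrete, here is a minimal sketch that assumes the common textbook example of computing Ai × Bi + Ci with three segments: segment 1 loads Ai and Bi into R1 and R2, segment 2 multiplies them into R3 while loading Ci into R4, and segment 3 adds R3 and R4 into R5. The register names R1 to R5 and this particular split are illustrative assumptions, not taken from the slides. All registers load on the same clock pulse, which is what lets the segments work on different items concurrently.

```python
def pipeline_multiply_add(A, B, C, verbose=False):
    """Compute A[i] * B[i] + C[i] for every i using a 3-segment pipeline (sketch).

    Segment 1: R1 <- A[i], R2 <- B[i]
    Segment 2: R3 <- R1 * R2, R4 <- C[i]
    Segment 3: R5 <- R3 + R4
    Each register holds (i, value), or None while empty.
    """
    n = len(A)
    R1 = R2 = R3 = R4 = R5 = None
    results = []
    for clock in range(n + 2):  # n items through 3 segments take n + 2 clock pulses
        # Every register loads on the same clock edge, so each new value is
        # computed from the register contents *before* this pulse.
        new_R5 = (R3[0], R3[1] + R4[1]) if R3 is not None else None
        new_R3 = (R1[0], R1[1] * R2[1]) if R1 is not None else None
        new_R4 = (R1[0], C[R1[0]]) if R1 is not None else None
        new_R1 = (clock, A[clock]) if clock < n else None
        new_R2 = (clock, B[clock]) if clock < n else None
        R1, R2, R3, R4, R5 = new_R1, new_R2, new_R3, new_R4, new_R5
        if R5 is not None:
            results.append(R5[1])
        if verbose:
            print(f"pulse {clock + 1}: R1={R1} R2={R2} R3={R3} R4={R4} R5={R5}")
    return results

print(pipeline_multiply_add([1, 2, 3], [4, 5, 6], [7, 8, 9], verbose=True))  # [11, 18, 27]
```

With `verbose=True` the trace shows that from the third pulse on, each segment is busy with a different item at the same time.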
5. The suboperations in each segment of the
pipeline are as follows:
8. Latency and throughput
Each instruction takes a certain time to complete.
The latency of an operation is how long it takes
to execute a single instruction in the pipeline.
Throughput is the number of instructions that complete
per unit of time.
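The difference between latency and throughput can be seen with a small calculation. This sketch assumes an ideal k-stage pipeline with clock period tp and no stalls: one instruction still needs k cycles to pass through (latency), but after the first result one instruction completes every cycle, so n instructions need (k + n − 1) cycles. The values k = 4, n = 100, tp = 10 ns are hypothetical.

```python
def pipeline_timing(k, n, tp_ns):
    """Latency, total time, and throughput of an ideal k-stage pipeline (no stalls)."""
    latency_ns = k * tp_ns              # a single instruction still passes through all k stages
    total_ns = (k + n - 1) * tp_ns      # first result after k cycles, then one per cycle
    throughput = n / total_ns           # instructions per nanosecond
    return latency_ns, total_ns, throughput

latency, total, thr = pipeline_timing(k=4, n=100, tp_ns=10)
print(latency, total, round(thr * 1000, 1))  # 40 1030 97.1  (throughput per microsecond)
```

As n grows, throughput approaches one instruction per clock period (100 per microsecond here), while the latency of each individual instruction stays at 40 ns.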
1- Pipelining is widely used in modern processors.
2- It gives quicker execution of a large number of
instructions.
3- It makes more efficient use of the processor.
4- The hardware is arranged so that more than one
operation can be performed at the same time.
5- The technique is efficient for applications that need
to repeat the same task many times with different
sets of data.
11. Idea of pipelining in a computer
The processor executes a program by
fetching and executing instructions, one after
the other.
Let Fi and Ei refer to the fetch and execute
steps for instruction Ii.
12. Use the Idea of Pipelining in a Computer
Figure 8.1. Basic idea of instruction pipelining: (a) sequential execution of I1, I2, I3; (b) hardware organization (fetch + execution); (c) pipelined execution over clock cycles 1 to 4.
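Assuming every fetch step Fi and execute step Ei takes exactly one clock cycle, sequential execution of n instructions needs 2n cycles, while overlapping Fi+1 with Ei needs only n + 1. This small sketch builds the overlapped schedule for the three instructions of Figure 8.1; the one-cycle-per-step timing is an assumption.

```python
def two_stage_schedule(n):
    """Return {cycle: [steps]} for n instructions on a fetch/execute pipeline.
    Assumes every Fi and Ei takes exactly one clock cycle."""
    schedule = {}
    for i in range(1, n + 1):
        schedule.setdefault(i, []).append(f"F{i}")      # Ii is fetched in cycle i
        schedule.setdefault(i + 1, []).append(f"E{i}")  # and executed one cycle later
    return schedule

def sequential_cycles(n):
    return 2 * n  # F1 E1 F2 E2 ... one step at a time

sched = two_stage_schedule(3)
for cycle in sorted(sched):
    print(cycle, sched[cycle])
# 1 ['F1']
# 2 ['E1', 'F2']
# 3 ['E2', 'F3']
# 4 ['E3']
print(sequential_cycles(3))  # 6
```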
13. Use the Idea of Pipelining in a Computer
Figure 8.2. A 4-stage pipeline: (a) instruction execution divided into four steps, shown over clock cycles 1 to 7 (F: Fetch, D: Decode, E: Execute, W: Write); (b) hardware organization, with interstage buffers B1, B2, B3 between the Fetch, Decode, Execution, and Write units.
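The timing part of Figure 8.2 is a space-time diagram, which can be generated for any list of stages. This sketch assumes an ideal pipeline in which every stage takes one clock cycle and nothing stalls, so n instructions finish in k + n − 1 cycles (7 cycles for 4 instructions and 4 stages, matching the figure). The function name `space_time` is hypothetical.

```python
def space_time(stages, n):
    """Return the lines of a space-time diagram: one row per instruction,
    one column per clock cycle. Assumes one cycle per stage and no stalls."""
    k = len(stages)
    total = k + n - 1  # ideal pipeline: k + n - 1 clock cycles
    lines = ["I\\c " + "".join(f"{c:>4}" for c in range(1, total + 1))]
    for i in range(n):
        cells = [""] * total
        for s, name in enumerate(stages):
            cells[i + s] = f"{name}{i + 1}"  # instruction i+1 is in stage s during cycle i+s+1
        lines.append(f"I{i + 1:<3}" + "".join(f"{c:>4}" for c in cells))
    return lines

for line in space_time(["F", "D", "E", "W"], 4):
    print(line)
```

The same function can draw the FI, DA, FO, EX four-stage pipeline described later by passing those stage names.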
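14. Use the Idea of Pipelining in a Computer
Consider a computer that has two separate hardware
units, one for fetching instructions and another for
executing them. The fetched instruction is placed in
an intermediate buffer. This buffer is needed to enable the execution
unit to execute an instruction while the fetch unit is fetching the next
instruction.
The computer is controlled by a clock.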
15. Role of Cache Memory
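Each pipeline stage is expected to complete in one
clock cycle.
The clock period should be long enough to let the
slowest pipeline stage complete.
Faster stages can only wait for the slowest one to
complete.
Since main memory is very slow compared to
execution, if each instruction had to be fetched
from main memory, the pipeline would be almost useless: a main memory access can take about ten
times longer than the time needed to perform a basic operation inside the processor.
Fortunately, we have cache memory: an on-chip cache can usually be accessed in about the same time as other basic operations inside the processor.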
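The "slowest stage sets the clock" rule explains why slow memory ruins a pipeline. In this sketch the stage delays are hypothetical: 2 ns per stage when the fetch is served by a cache, versus 20 ns for a fetch from main memory (about ten times slower). The pipelined time assumes an ideal k + n − 1 cycles; the unpipelined time assumes each instruction simply takes the sum of its step delays.

```python
def clock_period(stage_delays_ns):
    # The clock must be long enough for the slowest stage;
    # faster stages sit idle for the rest of the cycle.
    return max(stage_delays_ns)

def pipelined_time_ns(stage_delays_ns, n):
    # Ideal pipeline (no stalls): k + n - 1 clock cycles for n instructions.
    k = len(stage_delays_ns)
    return (k + n - 1) * clock_period(stage_delays_ns)

def unpipelined_time_ns(stage_delays_ns, n):
    # Without overlap, each instruction takes the sum of its step delays.
    return n * sum(stage_delays_ns)

with_cache = [2, 2, 2, 2]      # F, D, E, W: fetch hits in an on-chip cache
from_memory = [20, 2, 2, 2]    # every fetch goes to main memory (~10x slower)
print(pipelined_time_ns(with_cache, 1000))     # 2006
print(pipelined_time_ns(from_memory, 1000))    # 20060
print(unpipelined_time_ns(from_memory, 1000))  # 26000
```

With main-memory fetches the pipelined version (20060 ns) is barely faster than no pipelining at all (26000 ns), whereas with the cache it takes about a tenth of that.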
16. 1) Software Pipelining
1) Can handle complex instructions.
2) Allows programs to be reused.
2) Hardware Pipelining
1) Helps the designer manage complexity – a complex
task can be divided into smaller, more manageable tasks.
2) Offers higher performance.
17. Types of pipeline
Arithmetic Pipeline: Pipelined arithmetic units are
usually found in very high-speed computers. They are used for
floating-point operations, multiplication of fixed-point numbers,
and similar computations in scientific problems.
Instruction Pipeline: Pipeline processing can also occur
in the instruction stream. An instruction pipeline
reads consecutive instructions from memory while
previous instructions are being executed in other
segments.
18. Arithmetic Pipeline
• Compare the exponents
• Align the mantissas
• Add/subtract the mantissas
• Normalize the result
$X = A \times 10^{a} = 0.9504 \times 10^{3}$
$Y = B \times 10^{b} = 0.8200 \times 10^{2}$
1) Compare exponents:
$3 - 2 = 1$
2) Align mantissas:
$X = 0.9504 \times 10^{3}$
$Y = 0.08200 \times 10^{3}$
3) Add mantissas:
$Z = 1.0324 \times 10^{3}$
4) Normalize result:
$Z = 0.10324 \times 10^{4}$
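The four suboperations can be sketched in Python and checked against the worked example. This assumes decimal mantissas kept as normalized fractions (0.1 ≤ |m| < 1), as in the example; real hardware works in binary, so this is only an illustration of the segment sequence. Python's `decimal` module is used so that 0.9504 + 0.08200 is exact.

```python
from decimal import Decimal

def fp_add(ma, ea, mb, eb):
    """Add (ma x 10^ea) + (mb x 10^eb) in four suboperations.
    Assumes Decimal mantissas normalized as fractions (0.1 <= |m| < 1)."""
    # Segment 1: compare the exponents by subtraction.
    diff = ea - eb
    # Segment 2: align the mantissa of the number with the smaller exponent
    # by shifting it right |diff| decimal places.
    if diff >= 0:
        mb = mb.scaleb(-diff)
        e = ea
    else:
        ma = ma.scaleb(diff)
        e = eb
    # Segment 3: add the mantissas.
    m = ma + mb
    # Segment 4: normalize the result back into a fraction 0.1 <= |m| < 1.
    if m == 0:
        return m, 0
    while abs(m) >= 1:
        m = m.scaleb(-1)
        e += 1
    while abs(m) < Decimal("0.1"):
        m = m.scaleb(1)
        e -= 1
    return m, e

m, e = fp_add(Decimal("0.9504"), 3, Decimal("0.8200"), 2)
print(m.normalize(), e)  # 0.10324 4
```

In a pipelined unit each of the four commented steps would be a separate segment with its own registers, so a new pair of operands could enter segment 1 while earlier pairs are being aligned, added, and normalized.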
19. INSTRUCTION CYCLE
Six phases* in an instruction cycle:
• Fetch an instruction from memory
• Decode the instruction
• Calculate the effective address of the operand
• Fetch the operands from memory
• Execute the operation
• Store the result in the proper place
* Some instructions skip some phases.
* Effective address calculation can be done as part of the decoding phase.
* Storage of the operation result into a register is done automatically in the execution phase.
==> 4-Stage Pipeline
• FI: Fetch an instruction from memory
• DA: Decode the instruction and calculate the effective address of the operand
• FO: Fetch the operand
• EX: Execute the operation
21. Pipeline Performance
The potential increase in performance resulting from pipelining is
proportional to the number of pipeline stages.
However, this increase would be achieved only if all pipeline stages
require the same time to complete, and there is no interruption
throughout program execution.
Unfortunately, this is not true in practice.
For example, floating-point operations may take many clock cycles.
Stalling involves halting the flow of instructions until the required
result is ready to be used. However, stalling wastes processor time
by doing nothing while waiting for the result.
Pipeline stalls cause degradation in pipeline performance.
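The claim that speedup is proportional to the number of stages only when nothing stalls can be checked numerically. This sketch assumes equal stage times, a non-pipelined unit that needs k cycles per instruction, and that each stall adds one idle cycle; the stall count of 500 is hypothetical.

```python
def speedup(k, n, stall_cycles=0):
    """Speedup of a k-stage pipeline over a non-pipelined unit taking k cycles
    per instruction. Assumes equal stage times; each stall adds one idle cycle."""
    non_pipelined = n * k
    pipelined = k + n - 1 + stall_cycles
    return non_pipelined / pipelined

print(round(speedup(4, 1000), 2))                    # 3.99 (close to k = 4)
print(round(speedup(4, 1000, stall_cycles=500), 2))  # 2.66
```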
22. Pipeline Performance
Any condition that causes a pipeline to stall is called a hazard.
Data hazard – an instruction depends on the result of a
previous instruction, but this result is not yet available.
Instruction (control) hazard – a delay in the availability of
an instruction causes the pipeline to stall, for example, after a branch.
Structural hazard – two instructions
require the use of the same hardware resource at the same time.
23. Data Hazards
We must ensure that the results obtained when instructions are
executed in a pipelined processor are identical to those obtained
when the same instructions are executed sequentially.
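For example, the two operations
A ← 3 + A
B ← 4 × A
must be performed in this order, because the second one uses the new value of A. In contrast,
A ← 5 × C
B ← 20 + C
are independent and can be performed concurrently.
When two operations depend on each other, they must be
executed sequentially in the correct order.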
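The same situation arises with:
Mul R2, R3, R4
Add R5, R4, R6
Mul writes its result into R4 (R4 ← [R2] × [R3]) and Add reads R4 (R6 ← [R5] + [R4]), so Add cannot proceed until the Mul result is available.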
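Why the order matters for A ← 3 + A, B ← 4 × A can be shown by comparing correct sequential execution with an overlapped execution in which the second operation reads A before the first one has written it. The starting values A = 5 and C = 2 are hypothetical.

```python
def sequential(A):
    A = 3 + A           # A <- 3 + A
    B = 4 * A           # B <- 4 x A uses the NEW value of A
    return A, B

def overlapped_too_early(A):
    old_A = A
    A = 3 + old_A
    B = 4 * old_A       # second operation read A before the first one wrote it
    return A, B

def independent_pair(C, swap=False):
    # A <- 5 x C and B <- 20 + C do not read each other's results.
    ops = [("A", lambda: 5 * C), ("B", lambda: 20 + C)]
    if swap:
        ops.reverse()
    return {name: f() for name, f in ops}

print(sequential(5))            # (8, 32)
print(overlapped_too_early(5))  # (8, 20): wrong result
print(independent_pair(2) == independent_pair(2, swap=True))  # True
```

The dependent pair gives a different B when overlapped, while the independent pair gives the same result in either order, which is why only the latter may be executed concurrently.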
25. Data dependency solutions
Hardware interlock: a circuit that detects instructions whose
source operands are destinations of instructions
farther up in the pipeline, and delays them until the operands are available.
Operand forwarding: uses special hardware to detect a conflict and
then avoid it by routing the data through special paths between
pipeline segments.
Delayed load: the compiler for such computers is designed to
detect a data conflict and reorder the instructions as necessary to
delay the loading of the conflicting data by inserting no-operation
instructions.
I1: Mul R2, R3, R4
I2: Add R5, R4, R6
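The check an interlock performs, and the software alternative of inserting no-operation instructions, can both be sketched for I1 and I2. This assumes the last operand is the destination, and that a reader must issue at least `distance` slots after its writer; `distance = 3` is a hypothetical pipeline parameter, not a value from the slides. Reordering useful instructions into the gaps is not modelled; only NOPs are inserted.

```python
from typing import Dict, List, Tuple

Instr = Tuple[str, List[str], str]  # (opcode, source registers, destination register)
NOP: Instr = ("Nop", [], "")

def parse(text: str) -> Instr:
    """Parse e.g. 'Mul R2, R3, R4'; the last operand is taken as the destination."""
    opcode, operands = text.split(maxsplit=1)
    regs = [r.strip() for r in operands.split(",")]
    return opcode, regs[:-1], regs[-1]

def raw_hazard(producer: Instr, consumer: Instr) -> bool:
    """Interlock-style check: does the consumer read the producer's destination?"""
    return producer[2] != "" and producer[2] in consumer[1]

def insert_nops(program: List[Instr], distance: int) -> List[Instr]:
    """Compiler-side fix: pad with NOPs so each reader issues at least
    `distance` slots after the instruction that writes its source."""
    out: List[Instr] = []
    last_write: Dict[str, int] = {}  # register -> slot of its most recent writer
    for opcode, srcs, dst in program:
        ready = max((last_write[r] + distance for r in srcs if r in last_write), default=0)
        while len(out) < ready:
            out.append(NOP)
        out.append((opcode, srcs, dst))
        last_write[dst] = len(out) - 1
    return out

def show(instr: Instr) -> str:
    opcode, srcs, dst = instr
    return opcode if opcode == "Nop" else f"{opcode} {', '.join(srcs + [dst])}"

program = [parse("Mul R2, R3, R4"), parse("Add R5, R4, R6")]
print(raw_hazard(program[0], program[1]))  # True: I2 reads R4, which I1 writes
for instr in insert_nops(program, distance=3):
    print(show(instr))
# Mul R2, R3, R4
# Nop
# Nop
# Add R5, R4, R6
```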
26. Instruction Hazards
One of the major problems in operating an instruction
pipeline is the occurrence of branch instructions.
1- Unconditional branch: always changes the sequential
program flow by loading the program counter with the
target address.
2- Conditional branch: the control selects the target
instruction if the condition is satisfied, or the next
sequential instruction if the condition is not satisfied.
28. Unconditional Branches
The time lost as a result of a branch
instruction is referred to as the branch
penalty.
In the previous example, instruction I3 is
fetched wrongly; once the branch target address k
is known, I3 is discarded and Ik is fetched.
Typically, the fetch unit has dedicated hardware
that identifies a branch instruction and computes the branch target address
as quickly as possible after the instruction is
fetched.
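The effect of the branch penalty on overall speed can be estimated with average clock cycles per instruction (CPI). This sketch assumes an ideal pipelined CPI of 1 and that every taken branch wastes `penalty_cycles` on discarded instructions; the instruction mix (20% branches, 60% of them taken) is hypothetical.

```python
def average_cpi(branch_fraction, taken_fraction, penalty_cycles):
    """Average cycles per instruction when each taken branch wastes
    penalty_cycles. Assumes an ideal pipelined CPI of 1 otherwise."""
    return 1 + branch_fraction * taken_fraction * penalty_cycles

print(round(average_cpi(0.20, 0.60, 2), 2))  # 1.24
print(round(average_cpi(0.20, 0.60, 1), 2))  # 1.12
```

Computing the branch target earlier in the fetch unit shrinks `penalty_cycles`, which is why the second line, with a one-cycle penalty, loses half as much time.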
29. Instruction Queue and Prefetching
A branch instruction stalls the pipeline.
Many processors employ a dedicated fetch unit
that fetches instructions and puts them
into a queue.
The queue can store several instructions at a time.
A separate unit, called the dispatch unit, takes
instructions from the front of the queue and
sends them to the execution unit.
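A small cycle-by-cycle sketch shows why the queue helps: while the fetch unit is stalled, the dispatch unit keeps sending already-queued instructions to the execution unit. The parameters are assumptions: the fetch unit can fetch up to two instructions per cycle into a queue of capacity four, it is stalled in cycles 3 to 5 (for example, waiting on memory), and the dispatch unit sends one instruction per cycle.

```python
from collections import deque

def run_queue(program, capacity, fetch_width, stalled_cycles):
    """Return (cycle, instruction) pairs in the order the dispatch unit sends them."""
    queue = deque()
    next_index = 0
    dispatched = []
    cycle = 0
    while next_index < len(program) or queue:
        cycle += 1
        # Dispatch unit: send the instruction at the front of the queue to execution.
        if queue:
            dispatched.append((cycle, queue.popleft()))
        # Fetch unit: refill the queue unless it is stalled this cycle.
        if cycle not in stalled_cycles:
            for _ in range(fetch_width):
                if next_index < len(program) and len(queue) < capacity:
                    queue.append(program[next_index])
                    next_index += 1
    return dispatched

program = [f"I{i}" for i in range(1, 9)]
for cycle, instr in run_queue(program, capacity=4, fetch_width=2, stalled_cycles={3, 4, 5}):
    print(cycle, instr)
```

Dispatch continues in cycles 3, 4, and 5 despite the fetch stall; only cycle 6 is lost, when the queue has run dry.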
31. 2- Conditional Branches
A conditional branch instruction introduces
the added hazard caused by the dependency
of the branch condition on the result of a
preceding instruction.
The decision to branch cannot be made until
the execution of that instruction has been
completed.
32. Delayed Branch
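With a delayed branch, the instruction in the delay slot immediately after the branch is always executed, whether or not the branch is taken, so the compiler tries to place a useful instruction there.
Figure 8.12. Reordering of instructions for a delayed branch: (a) original program loop, starting with LOOP Shift_left R1; (b) reordered instructions, starting with LOOP Decrement R2.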
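The reordering in Figure 8.12 is safe because the delay-slot instruction runs on every iteration anyway. This sketch assumes the loop branches back to LOOP while R2 is not zero (the branch instruction itself is not visible in the slide) and that R2 starts at 1 or more; it checks that the original loop and the reordered loop, with Shift_left R1 in the delay slot, leave R1 and R2 with the same values.

```python
def original_loop(r1, r2):
    """LOOP: Shift_left R1; Decrement R2; branch back to LOOP while R2 != 0 (assumed)."""
    while True:
        r1 <<= 1          # Shift_left R1
        r2 -= 1           # Decrement R2
        if r2 == 0:       # branch not taken: fall through
            return r1, r2

def reordered_loop(r1, r2):
    """LOOP: Decrement R2; branch (delayed); delay slot: Shift_left R1."""
    while True:
        r2 -= 1           # Decrement R2
        taken = r2 != 0   # the branch outcome is decided here...
        r1 <<= 1          # ...but the delay-slot instruction always executes
        if not taken:
            return r1, r2

print(original_loop(1, 3), reordered_loop(1, 3))  # (8, 0) (8, 0)
```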
33. Addressing Modes
Addressing modes include simple ones and
complex ones.
In choosing the addressing modes to be
implemented in a pipelined processor, we
must consider the effect of each addressing
mode on instruction flow in the pipeline:
- The extent to which complex addressing modes cause
the pipeline to stall
- Whether a given mode is likely to be used by compilers
34. Addressing Modes
In a pipelined processor, complex addressing
modes do not necessarily lead to faster execution.
Advantage: reduces the number of instructions / program space.
Disadvantage: causes the pipeline to stall / needs more
hardware to decode / not convenient for compilers to work with.
Conclusion: complex addressing modes are not
suitable for pipelined execution.
35. Addressing Modes
Good addressing modes should have the following features:
- Access to an operand does not require more than one
access to the memory.
- Only load and store instructions access memory operands.
- The addressing modes used do not have side effects.
Examples: register, register indirect, and index modes.
36. RISC pipeline
• RISC (Reduced Instruction Set Computer)
• 1- To use an efficient instruction pipeline:
a) the instruction pipeline is implemented using a small number of
suboperations, with each being executed in one cycle;
b) because of the fixed-length instruction format, decoding of the
operation can occur at the same time as register selection.
• 2- Data transfer instructions in RISC are limited to load and store
instructions, which access data through cache memory.
• 3- One of the major advantages of RISC is its ability to execute instructions
at the rate of one per clock cycle, which is achieved with pipeline
segments that each require just one clock cycle.
• 4- RISC relies on compiler support: the compiler translates the high-level language
program into a machine language program.
37. RISC pipeline
Instruction Cycle of Three-Stage Instruction Pipeline.
I: Instruction Fetch from program memory
A: Decode, Read Registers, ALU Operation
E: Transfer the output of the ALU to a register, transfer the EA to the data
memory for loading or storing, or transfer the branch address to the
program counter.
Types of instructions
- 1- Data Manipulation Instructions
- 2- Load and Store Instructions
- 3- Program Control Instructions
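The work done in the I, A, and E segments for the three instruction types can be sketched as a tiny interpreter. This is an illustration under stated assumptions: the instruction encoding (`Instr` with `kind`, `rd`, `rs1`, `rs2`, `imm`) is hypothetical, only ADD is modelled for data manipulation, the branch is unconditional and PC-relative, and instructions run one at a time, so the overlap between segments is not shown. Program memory (the `program` list) and data memory (the `memory` dict) are kept separate.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple

class Instr(NamedTuple):
    kind: str                  # "alu", "load", "store", or "branch" (hypothetical encoding)
    rd: Optional[str] = None   # destination (alu/load) or register to store (store)
    rs1: Optional[str] = None  # first source / base register
    rs2: Optional[str] = None  # second source (alu only)
    imm: int = 0               # displacement (load/store) or PC-relative offset (branch)

def run(program: List[Instr], regs: Dict[str, int],
        memory: Dict[int, int]) -> Tuple[Dict[str, int], Dict[int, int]]:
    pc = 0
    while pc < len(program):
        # I: fetch the instruction from program memory.
        ins = program[pc]
        pc += 1
        # A: decode, read registers, and perform the ALU operation.
        if ins.kind == "alu":
            alu_out = regs[ins.rs1] + regs[ins.rs2]  # data manipulation (only ADD modelled)
        elif ins.kind in ("load", "store"):
            alu_out = regs[ins.rs1] + ins.imm        # effective address
        else:
            alu_out = pc + ins.imm                   # branch target address
        # E: route the ALU output to one of three destinations.
        if ins.kind == "alu":
            regs[ins.rd] = alu_out                   # destination register
        elif ins.kind == "load":
            regs[ins.rd] = memory[alu_out]           # EA to data memory, load
        elif ins.kind == "store":
            memory[alu_out] = regs[ins.rd]           # EA to data memory, store
        else:
            pc = alu_out                             # branch address to the PC
    return regs, memory

program = [
    Instr("load", rd="R1", rs1="R0", imm=0),
    Instr("load", rd="R2", rs1="R0", imm=4),
    Instr("alu", rd="R3", rs1="R1", rs2="R2"),
    Instr("branch", imm=1),                     # skip the next instruction
    Instr("alu", rd="R3", rs1="R3", rs2="R3"),  # skipped by the branch
    Instr("store", rd="R3", rs1="R0", imm=8),
]
regs, memory = run(program, {"R0": 100}, {100: 7, 104: 5})
print(regs["R3"], memory[108])  # 12 12
```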