2. Outline
Definition of pipeline
Advantages and disadvantages
Types of pipeline (hardware and software)
Latency and throughput
Hazards
Pipeline with addressing modes
Pipeline with cache memory
RISC computer
3. Pipeline
It is a technique of decomposing a sequential process into
suboperations, with each suboperation completed in a dedicated
segment that operates concurrently with all other segments.
A pipeline is commonly compared to an assembly line operation.
4. Example
Each suboperation is performed in a segment within a pipeline.
Each segment has one or two registers and a combinational circuit.
5. The suboperations in each segment of the
pipeline are as follows:
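The original list of suboperations is not reproduced in this extract. As a minimal sketch, assume the common three-segment example that computes Ai*Bi + Ci: Segment 1 loads Ai and Bi into registers R1 and R2, Segment 2 forms the product in R3 and loads Ci into R4, and Segment 3 adds R3 and R4 into R5. The Python below is illustrative only; it mimics the segment registers being clocked once per cycle.

# Hypothetical 3-segment pipeline computing Ai*Bi + Ci (assumed example).
# Registers are updated back to front so that each segment uses the values
# latched at the end of the previous clock cycle.
def run_pipeline(A, B, C):
    n = len(A)
    R1 = R2 = R3 = R4 = R5 = None        # segment registers
    results = []
    for clock in range(n + 2):           # n items drain a 3-segment pipeline
        if R3 is not None:               # Segment 3: R5 <- R3 + R4
            R5 = R3 + R4
            results.append(R5)
        if R1 is not None:               # Segment 2: R3 <- R1*R2, R4 <- Ci
            R3, R4 = R1 * R2, C[clock - 1]
        else:
            R3 = R4 = None
        if clock < n:                    # Segment 1: R1 <- Ai, R2 <- Bi
            R1, R2 = A[clock], B[clock]
        else:
            R1 = R2 = None
    return results

print(run_pipeline([1, 2, 3], [4, 5, 6], [7, 8, 9]))   # [11, 18, 27]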
8. Latency and throughput
Latency
Each instruction takes a certain time to complete.
The latency of the pipeline is the time it takes to execute a single
instruction from start to finish.
Throughput
The number of instructions that complete per second.
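A worked example with assumed numbers (a 4-stage pipeline and a 10 ns clock, neither taken from the slides): the latency of one instruction is about k x t_p, while the steady-state throughput approaches one instruction per clock period.

# Illustrative numbers only: 4 stages, 10 ns clock period.
k = 4                      # number of pipeline stages (assumed)
t_p = 10e-9                # clock period in seconds (assumed)
latency = k * t_p          # time for one instruction to pass through the pipeline
throughput = 1 / t_p       # instructions completed per second once the pipeline is full
print(latency * 1e9, "ns per instruction")   # about 40 ns
print(throughput / 1e6, "MIPS")              # about 100 MIPS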
9. Advantages
1- Pipelining is widely used in modern processors.
2- Faster execution of a large number of instructions.
3- More efficient use of the processor.
4- The hardware is arranged so that more than one operation can be
performed at the same time.
5- The technique is efficient for applications that need to repeat the
same task many times with different sets of data.
11. Idea of pipelining in a computer
The processor executes a program by fetching and executing
instructions, one after the other.
Let Fi and Ei refer to the fetch and execute steps for instruction Ii.
12. Use the Idea of Pipelining in a Computer
Figure 8.1. Basic idea of instruction pipelining:
(a) Sequential execution: the fetch and execute steps F1 E1, F2 E2, F3 E3 of
instructions I1, I2, I3 follow one another in time.
(b) Hardware organization: an instruction fetch unit and an execution unit
separated by an interstage buffer B1.
(c) Pipelined execution: the fetch of each instruction overlaps the execution
of the previous one across clock cycles 1-4 (fetch + execution).
13. Use the Idea of Pipelining in a Computer
Figure 8.2. A 4-stage pipeline:
(a) Instruction execution divided into four steps:
F: Fetch instruction
D: Decode instruction and fetch operands
E: Execute operation
W: Write results
(b) Hardware organization: the four stages are separated by interstage buffers
B1, B2, B3; instructions I1-I4 proceed through F, D, E, W in overlapping clock
cycles 1-7 (fetch + decode + execution + write).
14. Use the Idea of Pipelining in a Computer
Consider a computer that has two separate hardware units, one for
fetching instructions and another for executing them.
The interstage buffer is needed so that the execution unit can work on
one instruction while the fetch unit fetches the next instruction.
The computer is controlled by a clock.
15. Role of Cache Memory
Each pipeline stage is expected to complete in one clock cycle.
The clock period must therefore be long enough to let the slowest
pipeline stage complete.
Faster stages can only wait for the slowest one to complete.
Main memory is very slow compared to execution (an access can take
roughly ten times longer than one pipeline stage), so if every
instruction had to be fetched from main memory, the pipeline would be
almost useless.
Fortunately, we have cache memory.
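A small numeric sketch of this point (the stage delays below are invented for illustration): the clock period must cover the slowest stage, so a fetch that goes all the way to main memory slows every stage down, while a cache hit keeps the stages balanced.

# Invented stage delays in nanoseconds: decode, execute, write are fast.
decode, execute, write = 2, 2, 2

def min_clock_period(fetch_delay):
    # The clock period is set by the slowest pipeline stage.
    return max(fetch_delay, decode, execute, write)

print(min_clock_period(fetch_delay=20))   # fetch from main memory -> 20 ns clock
print(min_clock_period(fetch_delay=2))    # fetch hits the cache   -> 2 ns clock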
16. Types of pipeline
1) Software pipelining
1) Can handle complex instructions.
2) Allows programs to be reused.
2) Hardware pipelining
1) Helps designers manage complexity: a complex task can be divided
into smaller, more manageable pieces.
2) Offers higher performance.
17. Types of pipeline
Arithmetic pipeline: pipelined arithmetic units are usually found in
very high speed computers. They perform floating-point operations,
multiplication of fixed-point numbers, and similar computations that
arise in scientific problems.
Instruction pipeline: pipeline processing can also occur in the
instruction stream. An instruction pipeline reads consecutive
instructions from memory while previous instructions are being
executed in other segments.
18. Arithmetic Pipeline
Floating-point adder/subtractor:
[1] Compare the exponents
[2] Align the mantissas
[3] Add/subtract the mantissas
[4] Normalize the result
X = A x 10^a = 0.9504 x 10^3
Y = B x 10^b = 0.8200 x 10^2
1) Compare exponents: 3 - 2 = 1
2) Align mantissas:
X = 0.9504 x 10^3
Y = 0.08200 x 10^3
3) Add mantissas:
Z = 1.0324 x 10^3
4) Normalize result:
Z = 0.10324 x 10^4
[Pipeline diagram: Segment 1 compares the exponents a and b by subtraction;
Segment 2 chooses the exponent and aligns the mantissas A and B using the
exponent difference; Segment 3 adds or subtracts the mantissas; Segment 4
normalizes the result and adjusts the exponent. A register R separates each
pair of segments.]
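A hedged sketch of the same four segments in code, using the decimal (base-10) operands from the worked example above; it models only the data path of one addition, not the segment registers or the clock.

# Decimal floating-point addition split into the four pipeline segments.
# Operands are (mantissa, exponent) pairs representing m * 10**e.
def fp_add(x, y):
    (ma, ea), (mb, eb) = x, y
    diff = ea - eb                        # Segment 1: compare exponents
    if diff >= 0:                         # Segment 2: choose the larger exponent
        exp, mb = ea, mb / (10 ** diff)   #            and align the mantissas
    else:
        exp, ma = eb, ma / (10 ** -diff)
    mant = ma + mb                        # Segment 3: add the mantissas
    while abs(mant) >= 1.0:               # Segment 4: normalize the result and
        mant /= 10                        #            adjust the exponent
        exp += 1                          # (only the right-shift case is handled)
    return mant, exp

print(fp_add((0.9504, 3), (0.8200, 2)))   # approximately (0.10324, 4)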
19. INSTRUCTION CYCLE
Six Phases* in an Instruction Cycle
[1] Fetch an instruction from memory
[2] Decode the instruction
[3] Calculate the effective address of the operand
[4] Fetch the operands from memory
[5] Execute the operation
[6] Store the result in the proper place
* Some instructions skip some phases
* Effective address calculation can be done as part of the decoding phase
* Storage of the operation result into a register is done automatically in the
execution phase
==> 4-Stage Pipeline
[1] FI: Fetch an instruction from memory
[2] DA: Decode the instruction and calculate the effective address of the operand
[3] FO: Fetch the operand
[4] EX: Execute the operation
20. Execution of Three Instructions in a 4-Stage Pipeline
[Timing diagrams comparing conventional (sequential) execution with pipelined
execution of three instructions through the FI, DA, FO, EX stages.]
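Since the original timing diagram is not reproduced above, the sketch below prints an equivalent space-time table for three instructions flowing through the FI, DA, FO and EX stages, assuming one clock cycle per stage and no hazards.

# Space-time table for a 4-stage pipeline (FI, DA, FO, EX), one cycle per stage.
STAGES = ["FI", "DA", "FO", "EX"]

def timing_table(num_instructions):
    total_cycles = len(STAGES) + num_instructions - 1
    rows = ["cycle " + " ".join(f"{c:>3}" for c in range(1, total_cycles + 1))]
    for i in range(num_instructions):
        cells = []
        for c in range(total_cycles):
            stage = c - i                 # stage occupied by instruction i+1 in cycle c+1
            cells.append(f"{STAGES[stage]:>3}" if 0 <= stage < len(STAGES) else "  -")
        rows.append(f"I{i + 1:<4} " + " ".join(cells))
    return "\n".join(rows)

print(timing_table(3))
# I1 occupies FI, DA, FO, EX in cycles 1-4, I2 in cycles 2-5, I3 in cycles 3-6.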
21. Pipeline Performance
The potential increase in performance resulting from pipelining is
proportional to the number of pipeline stages.
However, this increase would be achieved only if all pipeline stages
require the same time to complete, and there is no interruption
throughout program execution.
Unfortunately, this is not true.
A floating-point operation may take many clock cycles.
Stalling involves halting the flow of instructions until the required
result is ready to be used. However, stalling wastes processor time by
doing nothing while waiting for the result.
A pipeline stall therefore degrades pipeline performance.
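To make the "proportional to the number of stages" claim concrete, a standard result (supplementary, not stated on the slide): a k-stage pipeline completes n instructions in k + n - 1 clock cycles instead of n*k, so the ideal speedup n*k / (k + n - 1) approaches k for large n, and stalls only push it further below that limit.

# Ideal speedup of a k-stage pipeline over sequential execution,
# assuming equal stage delays and no stalls.
def ideal_speedup(k, n):
    return (n * k) / (k + n - 1)

for n in (4, 100, 10_000):
    print(n, round(ideal_speedup(4, n), 2))   # 2.29, 3.88, 4.0 -> tends to k = 4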
22. Pipeline Performance
Any condition that causes a pipeline to stall is called a hazard.
Data hazard – an instruction depends on the result of a previous
instruction, but this result is not yet available.
Instruction (control) hazard – a delay in the availability of an
instruction causes the pipeline to stall, for example on a branch.
Structural hazard – two instructions require the use of a given
hardware resource at the same time.
23. Data Hazards
We must ensure that the results obtained when instructions are
executed in a pipelined processor are identical to those obtained
when the same instructions are executed sequentially.
A hazard occurs:
A ← 3 + A
B ← 4 × A
No hazard:
A ← 5 × C
B ← 20 + C
When two operations depend on each other, they must be
executed sequentially in the correct order.
Another example (the second instruction reads R4, which is written by
the first):
Mul R2, R3, R4
Add R5, R4, R6
25. Data dependency solutions
Hardware interlock: a circuit that detects instructions whose source
operands are destinations of instructions farther up in the pipeline,
and delays the dependent instruction until the conflict is resolved.
Operand forwarding: uses special hardware to detect a conflict and
then avoids it by routing the data through special paths between
pipeline segments.
Delayed load: the compiler for such computers is designed to detect a
data conflict and reorder the instructions as necessary to delay the
loading of the conflicting data, by inserting no-operation (NOP)
instructions. Example:
I1: Mul R2, R3, R4
NOP
NOP
I2: Add R5, R4, R6
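A rough sketch of the delayed-load idea as a compiler pass: it scans a simple instruction list (destination register written last, as in the Mul/Add example) and inserts NOPs whenever an instruction reads a register written by instructions still in the pipeline. The instruction encoding and the two-slot hazard window are assumptions for illustration.

# Toy "delayed load" pass: insert NOPs so that no instruction reads a register
# written by one of the previous HAZARD_WINDOW instructions.
HAZARD_WINDOW = 2   # assumed number of cycles before a result becomes usable

def insert_nops(program):
    """program: list of (opcode, src1, src2, dest) tuples; returns a new list."""
    scheduled = []
    for op, src1, src2, dest in program:
        # Registers written by the last HAZARD_WINDOW scheduled instructions.
        recent_dests = {ins[3] for ins in scheduled[-HAZARD_WINDOW:]
                        if ins[0] != "NOP"}
        while {src1, src2} & recent_dests:
            scheduled.append(("NOP", None, None, None))
            recent_dests = {ins[3] for ins in scheduled[-HAZARD_WINDOW:]
                            if ins[0] != "NOP"}
        scheduled.append((op, src1, src2, dest))
    return scheduled

prog = [("Mul", "R2", "R3", "R4"), ("Add", "R5", "R4", "R6")]
for ins in insert_nops(prog):
    print(ins)
# prints Mul, then two NOPs, then Add, matching the example above.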
26. Instruction Hazards
One of the major problems in operating an instruction pipeline is the
occurrence of branch instructions.
1- An unconditional branch always changes the sequential program flow
by loading the program counter with the target address.
2- For a conditional branch, the control unit selects the target
instruction if the condition is satisfied, or the next sequential
instruction if the condition is not satisfied.
28. Unconditional Branches
The time lost as a result of a branch instruction is referred to as
the branch penalty.
In the earlier example, instruction I3 is wrongly fetched; once the
branch target address k is known, I3 is discarded.
Typically the fetch unit has dedicated hardware that identifies the
branch target address as quickly as possible after an instruction is
fetched.
29. Instruction Queue and Prefetching
A branch instruction stalls the pipeline.
Many processors employ a dedicated fetch unit that fetches
instructions and puts them into a queue.
The queue can store several instructions at a time.
A separate unit, called the dispatch unit, takes instructions from the
front of the queue and sends them to the execution unit.
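A minimal sketch of this arrangement, with the fetch unit as a producer and the dispatch unit as a consumer of a shared queue; the class and method names are illustrative, not from any particular processor.

from collections import deque

# Toy model of a prefetching fetch unit feeding a dispatch unit via a queue.
class InstructionQueue:
    def __init__(self, capacity=4):
        self.queue = deque()
        self.capacity = capacity          # the queue can hold several instructions

    def fetch(self, memory, pc):
        """Fetch unit: prefetch the next instruction if there is room."""
        if len(self.queue) < self.capacity and pc < len(memory):
            self.queue.append(memory[pc])
            return pc + 1
        return pc

    def dispatch(self):
        """Dispatch unit: take the instruction at the front of the queue."""
        return self.queue.popleft() if self.queue else None

memory = ["I1", "I2", "I3", "I4", "I5"]
iq, pc = InstructionQueue(), 0
for _ in range(3):                        # the fetch unit runs ahead of dispatch
    pc = iq.fetch(memory, pc)
print(iq.dispatch(), list(iq.queue))      # I1 ['I2', 'I3']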
31. 2- Conditional Branches
A conditional branch instruction introduces the added hazard caused by
the dependency of the branch condition on the result of a previous
instruction.
The decision to branch cannot be made until the execution of that
instruction has been completed.
32. Delayed Branch
(a) Original program loop:
LOOP  Shift_left  R1
      Decrement   R2
      Branch=0    LOOP
NEXT  Add         R1,R3
(b) Reordered instructions:
LOOP  Decrement   R2
      Branch=0    LOOP
      Shift_left  R1
NEXT  Add         R1,R3
Figure 8.12. Reordering of instructions for a delayed branch.
The Shift_left instruction is moved into the branch delay slot, so useful work
is done in the cycle that would otherwise be wasted after the branch.
33. Addressing Modes
Addressing modes include simple ones and
complex ones.
In choosing the addressing modes to be
implemented in a pipelined processor, we
must consider the effect of each addressing
mode on instruction flow in the pipeline:
Side effects
The extent to which complex addressing modes cause
the pipeline to stall
Whether a given mode is likely to be used by compilers
34. Addressing Modes
In a pipelined processor, complex addressing
modes do not necessarily lead to faster execution.
Advantage: reduces the number of instructions / program space.
Disadvantages: cause the pipeline to stall / need more hardware to
decode / not convenient for compilers to work with.
Conclusion: complex addressing modes are not
suitable for pipelined execution.
35. Addressing Modes
Good addressing modes should satisfy these requirements:
Access to an operand does not require more than one access to memory
Only load and store instructions access memory operands
The addressing modes used do not have side effects
Modes that meet these requirements: register, register indirect, index
36. RISC pipeline
• RISC (Reduced Instruction Set Computer)
• 1- RISC uses an efficient instruction pipeline:
• a) the instruction pipeline is implemented with a small number of
suboperations, each executed in one clock cycle;
• b) because of the fixed-length instruction format, decoding of the
operation can occur at the same time as register selection.
• 2- Data transfer instructions in RISC are limited to load and store
instructions, which access memory through cache memory.
• 3- One major advantage of RISC is the ability to execute one
instruction per clock cycle, which is achievable because each pipeline
segment requires just one clock cycle.
• 4- A supporting compiler translates the high-level language program
into a machine language program.
37. RISC pipeline
Instruction cycle of a three-stage instruction pipeline:
I: Instruction fetch from program memory
A: Decode, read registers, ALU operation
E: Transfer the output of the ALU to a register, transfer the effective
address to data memory for a load or store, or transfer a branch
address to the program counter.
Types of instructions:
- 1- Data manipulation instructions
- 2- Load and store instructions
- 3- Program control instructions