INSTRUCTION LEVEL PARALLELISM
PRESENTED BY KAMRAN ASHRAF
13-NTU-4009
INTRODUCTION
• Instruction-level parallelism (ILP) is a measure of how many operations in a computer program can be performed in parallel, i.e. at the same time.
WHAT IS A PARALLEL INSTRUCTION?
• Parallel instructions are a set of instructions that do not depend on each other and can therefore be executed at the same time.
• Hierarchy of parallelism:
  - Bit-level parallelism, e.g. a 16-bit add on an 8-bit processor
  - Instruction-level parallelism
  - Loop-level parallelism, e.g. for (i = 1; i <= 1000; i = i + 1) x[i] = x[i] + y[i]; (see the loop sketch below)
  - Thread-level parallelism, e.g. multi-core computers
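A minimal C sketch of the loop-level case above, contrasting the independent loop from the slide with a second loop that carries a dependency between iterations; the array sizes, values, and the dependent loop are illustrative assumptions:

    #include <stdio.h>

    #define N 1000

    int main(void) {
        static double x[N + 1], y[N + 1];     /* zero-initialized, indices 1..N */

        /* Independent iterations: x[i] uses only x[i] and y[i], so all
         * N updates could in principle be executed in parallel.         */
        for (int i = 1; i <= N; i = i + 1)
            x[i] = x[i] + y[i];

        /* Loop-carried dependency: x[i] needs x[i-1] from the previous
         * iteration, so these updates must run one after another.       */
        for (int i = 1; i <= N; i = i + 1)
            x[i] = x[i - 1] + y[i];

        printf("x[N] = %f\n", x[N]);
        return 0;
    }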
EXAMPLE
Consider the following program:
1. e = a + b
2. f = c + d
3. g = e * f
• Operation 3 depends on the results e and f, which are produced by operations 1 and 2, so g cannot be calculated until both e and f have been computed.
• However, operations 1 and 2 do not depend on any other operation, so they can be computed simultaneously.
• If we assume that each operation completes in one unit of time, then these three instructions complete in a total of two units of time, giving an ILP of 3/2 = 1.5, i.e. a 1.5x speedup over executing them strictly one after the other (which would take three units of time).
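The same three operations written as a minimal C sketch; the input values are arbitrary, and the time-unit numbers in the comments assume one unit of time per operation on a machine with two parallel execution units:

    #include <stdio.h>

    int main(void) {
        int a = 1, b = 2, c = 3, d = 4;

        int e = a + b;   /* time unit 1: independent of f            */
        int f = c + d;   /* time unit 1: can run alongside e         */
        int g = e * f;   /* time unit 2: must wait for both e and f  */

        /* 3 operations finish in 2 units of time: ILP = 3/2 = 1.5 */
        printf("g = %d\n", g);
        return 0;
    }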
WHY ILP?
• One of the goals of compiler and processor designers is to identify and exploit as much ILP as possible.
• Ordinary programs are written to execute instructions in sequence, one after the other, in the order written by the programmer.
• ILP allows the compiler and the processor to overlap the execution of multiple instructions, or even to change the order in which instructions are executed.
ILP TECHNIQUES
Micro-architectural techniques that exploit ILP include:
• Instruction pipelining
• Superscalar execution
• Out-of-order execution
• Register renaming
• Speculative execution
• Branch prediction
INSTRUCTION PIPELINE
• An instruction pipeline is a technique used in the design of modern microprocessors, microcontrollers and CPUs to increase their instruction throughput (the number of instructions that can be executed in a unit of time).
PIPELINING
• The main idea is to divide the processing of a CPU instruction into a series of independent steps ("micro-instructions"), with storage at the end of each step.
• This allows the CPU's control logic to handle instructions at the processing rate of the slowest step, which is much faster than the time needed to process the whole instruction as a single step.
EXAMPLE
• For example, the classic RISC pipeline is broken into five stages, with a set of flip-flops (pipeline registers) between each stage, as follows:
  - Instruction fetch
  - Instruction decode & register fetch
  - Execute
  - Memory access
  - Register write-back
• In the usual pipeline diagram, the vertical axis lists successive instructions and the horizontal axis is time; within any one clock cycle (one column), the earliest instruction is in the write-back stage while the latest is still being fetched.
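A minimal C sketch that prints the timing of this five-stage pipeline for a handful of independent instructions; no hazards or stalls are modelled, and the instruction count is an illustrative assumption:

    #include <stdio.h>

    #define NUM_INSTR  5
    #define NUM_STAGES 5

    static const char *stage_name[NUM_STAGES] = { "IF", "ID", "EX", "MEM", "WB" };

    int main(void) {
        /* Instruction i enters stage s in cycle i + s (0-based), so the
         * pipeline finishes NUM_INSTR instructions in NUM_INSTR + 4 cycles. */
        for (int cycle = 0; cycle < NUM_INSTR + NUM_STAGES - 1; cycle++) {
            printf("cycle %2d:", cycle + 1);
            for (int i = 0; i < NUM_INSTR; i++) {
                int s = cycle - i;                 /* stage of instruction i */
                if (s >= 0 && s < NUM_STAGES)
                    printf("  I%d:%-3s", i + 1, stage_name[s]);
            }
            printf("\n");
        }
        return 0;
    }

Each printed row corresponds to one column of the pipeline diagram: once the pipeline is full, the oldest instruction is in WB while the newest is in IF, and one instruction completes every cycle.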
SUPERSCALAR
• A superscalar CPU architecture implements ILP within a single processor, which allows higher instruction throughput at the same clock rate.
WHY SUPERSCALAR?
• A superscalar processor executes more than one instruction during a clock cycle.
• It does this by simultaneously dispatching multiple instructions to multiple redundant functional units built into the processor.
• Each functional unit is not a separate CPU core but an execution resource within the CPU, such as an arithmetic logic unit (ALU), a floating-point unit (FPU), a bit shifter, or a multiplier.
EXAMPLE
• In a simple superscalar pipeline that fetches and dispatches two instructions at a time, a maximum of two instructions per cycle can be completed.
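A minimal C sketch of such two-wide issue: each cycle the processor dispatches up to two instructions, but holds the second back if it reads the result of the first. The four-instruction program (the earlier example plus one extra, hypothetical operation h = g + a) is an illustrative assumption:

    #include <stdio.h>

    typedef struct { const char *text; int dest, src1, src2; } Instr;

    int main(void) {
        /* e = a + b; f = c + d; g = e * f; h = g + a  (registers 0..7) */
        Instr prog[] = {
            { "e = a + b", 4, 0, 1 },
            { "f = c + d", 5, 2, 3 },
            { "g = e * f", 6, 4, 5 },
            { "h = g + a", 7, 6, 0 },
        };
        int n = sizeof prog / sizeof prog[0];
        int cycle = 1;

        for (int i = 0; i < n; cycle++) {
            printf("cycle %d: issue \"%s\"", cycle, prog[i].text);
            /* Pair the next instruction only if it does not read the
             * result produced by the first instruction of the pair.   */
            if (i + 1 < n &&
                prog[i + 1].src1 != prog[i].dest &&
                prog[i + 1].src2 != prog[i].dest) {
                printf(" and \"%s\"", prog[i + 1].text);
                i += 2;
            } else {
                i += 1;
            }
            printf("\n");
        }
        return 0;
    }

Scalar issue would need four cycles for this sequence; two-wide issue finishes it in three, because g = e * f cannot be paired with the instruction that consumes its result.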
OUT-OF-ORDER EXECUTION
• Out-of-order execution (OoOE) is a technique used in most high-performance microprocessors.
• The key concept is to allow the processor to avoid a class of delays that occur when the data needed to perform an operation are not yet available.
• Most modern CPU designs include support for out-of-order execution.
STEPS
• An out-of-order processor breaks the handling of each instruction into these steps (a simplified sketch of the issue logic follows the list):
  1. Instruction fetch.
  2. Instruction dispatch to an instruction queue (also called an instruction buffer).
  3. The instruction waits in the queue until its input operands are available.
  4. The instruction is issued to the appropriate functional unit and executed by that unit.
  5. The results are queued in a re-order buffer.
  6. Only after all older instructions have had their results written back to the register file is this result written back to the register file (in-order retirement).
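A minimal C sketch of the waiting-and-issue logic (steps 3 and 4): each cycle, one waiting instruction whose source registers are ready is issued, so a later independent instruction can overtake an earlier stalled one. The three-instruction program and the three-cycle load latency are illustrative assumptions, and the re-order buffer / in-order retirement stage is not modelled:

    #include <stdio.h>

    #define NREG 8

    typedef struct { const char *text; int dest, src1, src2, latency; } Instr;

    int main(void) {
        Instr prog[] = {
            { "r3 = load [r0]", 3, 0, 0, 3 },  /* slow: result usable 3 cycles later */
            { "r4 = r3 + r1",   4, 3, 1, 1 },  /* depends on the load                */
            { "r5 = r1 + r2",   5, 1, 2, 1 },  /* independent: can overtake r4       */
        };
        int n = sizeof prog / sizeof prog[0];
        int ready[NREG] = {0};                 /* first cycle each register is usable */
        int issued[3]   = {0};                 /* one flag per instruction            */
        int remaining   = n;

        for (int cycle = 1; remaining > 0; cycle++) {
            for (int i = 0; i < n; i++) {
                if (!issued[i] &&
                    ready[prog[i].src1] <= cycle &&
                    ready[prog[i].src2] <= cycle) {
                    printf("cycle %d: issue \"%s\"\n", cycle, prog[i].text);
                    ready[prog[i].dest] = cycle + prog[i].latency;
                    issued[i] = 1;
                    remaining--;
                    break;                     /* issue at most one per cycle         */
                }
            }
        }
        return 0;
    }

An in-order processor would stall behind r4 until the load finished; here r5 = r1 + r2 issues in cycle 2, ahead of r4, while the load is still in flight.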
OTHER ILP TECHNIQUES
• Register renaming is a technique used to avoid unnecessary serialization of program operations caused by the reuse of registers by those operations, in order to enable out-of-order execution (a small renaming sketch follows this list).
• Speculative execution allows complete instructions, or parts of instructions, to be executed before it is known whether that execution is actually required.
• Branch prediction is used to avoid stalling while control dependencies are resolved: it predicts whether a conditional branch (jump) in the instruction flow of a program is likely to be taken or not.
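A minimal C sketch of the register-renaming idea; the register names and values are illustrative assumptions. In the sequence shown in the comment, the name r1 is reused, creating a false write-after-read hazard between two logically independent pairs of operations; renaming the second use of r1 to a fresh register removes it:

    #include <stdio.h>

    int main(void) {
        int a = 10, b = 3, c = 7;

        /* Original sequence (false dependency through the name r1):
         *     r1 = a;        r1 = c;        <- the second write to r1 must
         *     r2 = r1 + b;   r3 = r1 + b;      wait for the first read of r1
         */

        /* After renaming: the second r1 becomes the fresh register r1b, so
         * the two pairs share no registers and can run in either order.    */
        int r1  = a;
        int r2  = r1 + b;

        int r1b = c;
        int r3  = r1b + b;

        printf("r2 = %d, r3 = %d\n", r2, r3);
        return 0;
    }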
THANKS
  • 15. OTHER ILP TECHNIQUES  Register renaming which is a technique used to avoid unnecessary serialization of program operations caused by the reuse of registers by those operations, in order to enable out-of-order execution.  Speculative execution which allow the execution of complete instructions or parts of instructions before being sure whether this execution is required.  Branch prediction which is used to avoid delays cause of control dependencies to be resolved. Branch prediction determines whether a conditional branch (jump) in the instruction flow of a program is likely to be taken or not.