Loop Parallelization & Pipelining 
AND 
Trends in Parallel Systems & Forms of Parallelism 
By 
Jagrat Gupta 
M.Tech (CSE), 1st Year 
(Madhav Institute of Technology and Science, Gwalior-467005)
Loop Parallelization & Pipelining 
This section describes the theory and application of loop transformations for 
vectorization and parallelization. 
 Loop Transformation Theory:- 
Parallelizing loop nests is one of the most fundamental program optimizations 
required in a vectorizing and parallelizing compiler. 
The main goal is to maximize the degree of parallelism or data locality in a 
loop nest. It also supports efficient use of the memory hierarchy on a parallel 
machine.
 Elementary Transformations:- 
 Permutation:- Simply interchange the i and j loops (a sketch follows the example below). 
Do i=1,N 
Do j=1,N 
A(j)=A(j)+C(i,j) 
End Do 
End Do 
Before Transformation 
Do j=1,N 
Do i=1,N 
A(j)=A(j)+C(i,j) 
End Do 
End Do 
After Transformation 
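A small Python sketch of this interchange (the slides use Fortran-style loops;
the values of N and C below are arbitrary assumptions) runs both loop orders
and checks that they produce the same result:

# Hypothetical sketch of the permutation (loop interchange) shown above.
N = 4
C = [[i * N + j for j in range(N)] for i in range(N)]   # assumed input matrix

def before(N, C):
    A = [0] * N
    for i in range(N):            # i outermost, as in the original nest
        for j in range(N):
            A[j] = A[j] + C[i][j]
    return A

def after(N, C):
    A = [0] * N
    for j in range(N):            # loops interchanged: j is now outermost
        for i in range(N):
            A[j] = A[j] + C[i][j]
    return A

assert before(N, C) == after(N, C)   # the interchange preserves the result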
 Reversal:- Reversal of the ith loop is represented by the identity 
matrix with the ith element on the diagonal equal to -1 (a sketch follows the example below). 
Do i=1,N 
Do j=1,N 
A(i,j)=A(i-1,j+1) 
End Do 
End Do 
Before Transformation 
Do i=1,N 
Do j=-N,-1 
A(i,-j)=A(i-1,-j+1) 
End Do 
End Do 
After Transformation 
Transformation matrix: 
1  0 
0 -1
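A Python sketch of this reversal (0-based arrays with an extra boundary row and
column; N and the initial values are assumptions) checks that reversing the
inner loop leaves the result unchanged:

# Hypothetical sketch of the reversal shown above: the inner loop runs over
# negative j and every use of j is negated.
N = 4

def stencil(reverse_inner):
    # rows/columns 0 and N+1 act as boundary values
    A = [[float(i + j) for j in range(N + 2)] for i in range(N + 2)]
    for i in range(1, N + 1):
        js = range(-N, 0) if reverse_inner else range(1, N + 1)
        for j in js:
            jj = -j if reverse_inner else j
            A[i][jj] = A[i - 1][jj + 1]
    return A

assert stencil(False) == stencil(True)   # reversal is legal here: same result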
 Skewing:- Skewing loop Ij by an integer factor f with respect to loop Ii adds 
f times the outer index to the inner index. In the following loop nest, the 
inner loop is skewed with respect to the outer loop by a factor of 1 (a sketch 
follows the example below). 
Transformation matrix: 
1 0 
1 1 
Do i=1,N 
Do j=1,N 
A(i,j)=A(i,j-1)+A(i-1,j) 
End Do 
End Do 
Before Transformation 
Do i=1,N 
Do j=i+1,i+N 
A(i,j-i)=A(i,j-i-1)+A(i-1,j-i) 
End Do 
End Do 
After Transformation
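A Python sketch of this skew (boundary values in row and column 0; N is an
assumption) confirms that skewing only relabels the iterations:

# Hypothetical sketch of skewing by factor 1: (i, j) -> (i, i + j), so the new
# inner index runs from i+1 to i+N and j is recovered as (new index) - i.
N = 4

def stencil(skewed):
    A = [[1.0] * (N + 1) for _ in range(N + 1)]   # row/column 0 hold boundaries
    for i in range(1, N + 1):
        if skewed:
            for jp in range(i + 1, i + N + 1):    # skewed bounds: i+1 .. i+N
                j = jp - i
                A[i][j] = A[i][j - 1] + A[i - 1][j]
        else:
            for j in range(1, N + 1):
                A[i][j] = A[i][j - 1] + A[i - 1][j]
    return A

assert stencil(False) == stencil(True)   # same iterations, same order, same result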
 Transformation Matrices:- 
 Unimodular transformations are defined by transformation 
matrices. 
 A unimodular matrix has 3 important properties:- 
1) It is square, i.e. it maps an n-dimensional iteration space into an 
n-dimensional iteration space. 
2) It has all integer components, so it maps integer vectors to 
integer vectors. 
3) The absolute value of its determinant is 1. 
 Wolf and Lam have stated the following conditions for 
unimodular transformations:- 
1) Let D be the set of distance vectors of a loop nest. A unimodular 
transformation T is legal if and only if, for every d ∈ D, T.d is 
lexicographically positive. 
2) Loops i through j of a nested computation with dependence vectors D are 
fully permutable if, for every d ∈ D, either (d1, ..., di-1) is 
lexicographically positive or (di, ..., dj) >= 0 componentwise. 
Consider the following example:
Do i=1,N 
Do j=1,N 
A(i,j)=f(A(i,j),A(i+1,j-1)) 
End Do 
End Do 
This code has the dependence vector d = (1,-1). The loop-interchange 
transformation is represented by the matrix 
T = 
0 1 
1 0
Here T.d = (-1,1), which is lexicographically negative, so the interchange 
alone is illegal. 
Now we compound the interchange with a reversal of the new outer loop, 
represented by the transformation matrix: 
T' = 
-1 0 
0 1 
The compound transformation is 
T'.T = 
0 -1 
1 0 
and (T'.T).d = (1,1), which is lexicographically positive, so the compound 
transformation is legal (see the legality-test sketch below).
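To make the legality test concrete, here is a small Python sketch (not from the
source) that applies a transformation matrix to each distance vector and checks
lexicographic positivity; the two matrices are the interchange and the compound
transformation from this example.

# Wolf/Lam legality test: T is legal for a loop nest iff T.d is
# lexicographically positive for every distance vector d.
def lex_positive(v):
    for x in v:
        if x > 0:
            return True
        if x < 0:
            return False
    return False                      # the zero vector does not count as positive

def apply(T, d):
    return [sum(T[r][c] * d[c] for c in range(len(d))) for r in range(len(T))]

def legal(T, deps):
    return all(lex_positive(apply(T, d)) for d in deps)

deps = [(1, -1)]                      # dependence from the example above
interchange = [[0, 1], [1, 0]]
compound    = [[0, -1], [1, 0]]       # reversal composed with interchange
print(legal(interchange, deps))       # False: T.d = (-1, 1)
print(legal(compound, deps))          # True:  T.d = (1, 1)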
 Parallelization and Wavefronting:- 
 The theory of loop transformation can be applied to execute loop 
iterations in parallel. 
 Parallelization Conditions:- The purpose of loop parallelization is 
to maximize the number of parallelizable loops. The algorithm for loop 
parallelization consists of two steps:- 
1) It first transforms the original loop nest into canonical form, 
namely a fully permutable loop nest. 
2) It then transforms the fully permutable loop nest to exploit 
coarse- and/or fine-grain parallelism according to the target 
architecture. 
 Fine Grain Wavefronting:- 
• A nest of n fully permutable loops can be transformed into code 
containing at least (n-1) degrees of parallelism. These (n-1) 
parallel loops are obtained by skewing the innermost loop in 
the fully permutable nest by each of the other loops and moving 
the innermost loop to the outermost position. 
This transformation, called the wavefront 
transformation, is represented by the following matrix:-
1 1 1 . . . 1 1 
1 0 0 . . . 0 0 
0 1 0 . . . 0 0 
. . .       . . 
0 0 0 . . . 1 0 
• Fine-grain parallelism is exploited on vector machines, superscalar 
processors and systolic arrays. 
• The wavefront transformation automatically places the maximum number of 
doall loops innermost, maximizing fine-grain parallelism (a wavefront sketch 
follows below).
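As a concrete illustration (a Python sketch with assumed N and boundary
values), the stencil from the skewing example can be wavefronted: all
iterations with i + j equal to a constant are independent, so the inner loop
below could run as a doall.

# Hypothetical fine-grain wavefront of A(i,j) = A(i,j-1) + A(i-1,j).
N = 4

def sequential():
    A = [[1.0] * (N + 1) for _ in range(N + 1)]
    for i in range(1, N + 1):
        for j in range(1, N + 1):
            A[i][j] = A[i][j - 1] + A[i - 1][j]
    return A

def wavefront():
    A = [[1.0] * (N + 1) for _ in range(N + 1)]
    for w in range(2, 2 * N + 1):               # sequential wavefront loop: w = i + j
        for i in range(max(1, w - N), min(N, w - 1) + 1):   # independent (doall)
            j = w - i
            A[i][j] = A[i][j - 1] + A[i - 1][j]
    return A

assert sequential() == wavefront()   # wavefronting preserves the result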
 Coarse Grain Parallelism:- 
• A wavefront transformation produces the maximum degree of 
parallelism, but it makes the outermost loop sequential if any loop must be 
sequential at all. 
• A heuristic, although non-optimal, approach for making loops 
doall is simply to identify loops Ii such that all di are zero; those 
loops can be made outermost doalls. The remaining loops in the 
tile can be wavefronted to obtain the remaining parallelism. 
• The loop parallelization algorithm has a common step for fine-grain 
and coarse-grain parallelism: creating an n-deep fully 
permutable loop nest by skewing. The algorithm can be tailored 
for different machines based on the following guidelines:- 
1) Move doall loops innermost for a fine-grain machine. Apply a 
wavefront transformation to create up to (n-1) doall loops. 
2) Create outermost doall loops for a coarse-grain machine. Apply 
tiling to a fully permutable loop nest. 
3) Use tiling to create loops for both fine- and coarse-grain machines.
 Tiling & Localization:- 
 The purpose is to reduce synchronization overhead and to 
enhance multiprocessor efficiency when loops are distributed for 
parallel execution. 
 It is possible to reduce the synchronization cost and improve the data 
locality of parallelized loops via an optimization known as tiling. 
 In general, tiling maps an n-deep loop nest into a 2n-deep loop nest 
where the inner n loops include only a small fixed number of iterations. 
The outer loops of the tiled code control the execution of the tiles. 
 The tiled loops also satisfy the property of full permutability. 
 We can reduce synchronization cost in the following way: we 
first tile the loops and then apply the wavefront transformation to 
the controlling loops of the tiles. In this way, the synchronization cost 
is reduced by a factor of the tile size.
 Tiling for Locality:- 
• Tiling is a technique to improve the data locality of numerical algorithms. 
• It can be used for different levels of memory (caches and registers); 
multiple levels of tiling can be used to achieve locality at multiple levels of 
the memory hierarchy simultaneously. 
Do i=1,N 
Do j=1,N 
Do k=1,N 
C(i,k)=C(i,k)+A(i,j)*B(j,k) 
End Do 
End Do 
End Do 
Before Tiling 
Do l=1,N,s 
Do m=1,N,s 
Do i=1,N 
Do j=l, min(l+s-1,N) 
Do k=m, min(m+s-1,N) 
C(i,k)=C(i,k)+A(i,j)*B(j,k) 
End Do 
End Do 
End Do 
End Do 
End Do 
After Tiling
• In the code above, the same rows of B and C are reused in later 
iterations of the middle and outer loops. Tiling reorders the 
execution sequence so that iterations from loops of the outer 
dimensions are executed before all the iterations of the inner 
loops are completed. 
• Tiling reduces the number of intervening iterations, and hence the data 
fetched, between reuses of the same data. This allows the reused data to still 
be in the cache or register file, and hence reduces memory accesses (a Python 
rendering of the tiled nest follows below).
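The following Python rendering of the tiled loop nest above (the tile size s
and the random inputs are assumptions) checks that tiling only reorders the
iterations:

# Untiled versus tiled C(i,k) += A(i,j) * B(j,k).
import random

N, s = 6, 2
A = [[random.random() for _ in range(N)] for _ in range(N)]
B = [[random.random() for _ in range(N)] for _ in range(N)]

def untiled():
    C = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            for k in range(N):
                C[i][k] += A[i][j] * B[j][k]
    return C

def tiled():
    C = [[0.0] * N for _ in range(N)]
    for l in range(0, N, s):              # controlling loops iterate over tiles
        for m in range(0, N, s):
            for i in range(N):
                for j in range(l, min(l + s, N)):
                    for k in range(m, min(m + s, N)):
                        C[i][k] += A[i][j] * B[j][k]
    return C

# floating-point sums are reassociated by tiling, so compare with a tolerance
assert all(abs(untiled()[i][k] - tiled()[i][k]) < 1e-9
           for i in range(N) for k in range(N))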
Pipelining 
 Software Pipelining:- 
 Software pipelining overlaps successive iterations of a loop in the 
source program. Its advantage is reduced execution time with 
compact object code. 
 Pipelining of loop iterations (Lam's tutorial notes):- 
Do i=1,N 
A(i)= A(i)*B+C 
End Do 
• In the above code the iterations are independent. It is assumed that 
each memory access (Read or Write) takes 1 cycle and each arithmetic 
operation (Mul or Add) takes 2 cycles.
• Without Pipelining:- 
 One iteration requires 6 cycles to execute, so N iterations require 
6N cycles, ignoring loop-control overhead. 
Cycle   Instruction   Comment 
1       Read          Fetch A(i) 
2-3     Mul           Multiply by B 
4-5     Add           Add C 
6       Write         Store A(i) 
• With Pipelining:- 
 Now the same code is executed on an 8-deep instruction pipeline.
Cycle  Iter 1  Iter 2  Iter 3  Iter 4 
  1    R 
  2    Mul 
  3            R 
  4            Mul 
  5    Add             R 
  6                    Mul 
  7            Add             R 
  8    W                       Mul 
  9                    Add 
 10            W 
 11                            Add 
 12                    W 
 13 
 14                            W
 Hence 4 iterations require 14 clock cycles. 
 Speed-up factor = 24/14 ≈ 1.7. 
 For N iterations the speed-up is 6N/(2N+6), which approaches 3 as N grows 
(a small check of these numbers follows below).
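A quick Python check of these numbers, under the latencies stated above
(Read/Write: 1 cycle, Mul/Add: 2 cycles) and one new iteration started every
2 cycles:

def unpipelined_cycles(n):
    return 6 * n                 # per iteration: R(1) + Mul(2) + Add(2) + W(1)

def pipelined_cycles(n):
    # one iteration occupies an 8-cycle window; a new one starts every 2 cycles
    return 2 * n + 6

for n in (4, 100, 10**6):
    print(n, unpipelined_cycles(n) / pipelined_cycles(n))
# n = 4 gives 24/14 = 1.7; the speed-up approaches 6/2 = 3 as n grows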
Trends towards Parallel Systems 
 From an application point of view, the mainstream usage of 
computers is experiencing a trend of four ascending levels of 
sophistication:- 
• Data processing. 
• Information processing. 
• Knowledge processing. 
• Intelligence processing. 
 Computer usage started with data processing, which is still a 
major task of today's computers. With more and more data 
structures developed, many users are shifting their computer usage 
from pure data processing to information processing. 
 As accumulated knowledge bases have expanded rapidly in recent 
years, a strong demand has grown to use computers for 
knowledge processing.
 Intelligence is very difficult to create; its processing even more so. 
 Today's computers are very fast and obedient and have many 
reliable memory cells, qualifying them for data, information and 
knowledge processing. However, computers are far from satisfactory 
at theorem proving, logical inference and creative thinking.
Forms Of Parallelism 
 Parallelism in Hardware (Uniprocessor) 
– Pipelining 
– Superscalar, VLIW etc. 
 Parallelism in Hardware (SIMD, Vector processors, GPUs) 
 Parallelism in Hardware (Multiprocessor) 
– Shared-memory multiprocessors 
– Distributed-memory multiprocessors 
– Chip-multiprocessors a.k.a. Multi-cores 
 Parallelism in Hardware (Multicomputers a.k.a. clusters) 
 Parallelism in Software 
– Task parallelism 
– Data parallelism
 Instruction Level Parallelism:- 
• Multiple instructions from the same instruction stream can be 
executed concurrently. The potential overlap among instructions is 
called instruction level parallelism. 
• Generated and managed by hardware (superscalar) or by compiler 
(VLIW). 
• Limited in practice by data and control dependences. 
• There are two approaches to instruction level parallelism: 
-Hardware. 
-Software. 
• The hardware approach exploits dynamic parallelism, whereas the 
software approach relies on static parallelism. 
• Consider the following program: 
1. e = a + b 
2. f = c + d 
3. m = e * f
• Operation 3 depends on the results of operations 1 and 2, so it 
cannot be calculated until both of them are completed. However, 
operations 1 and 2 do not depend on any other operation, so 
they can be calculated simultaneously. If we assume that each 
operation can be completed in one unit of time then these three 
instructions can be completed in a total of two units of time, 
giving an ILP of 3/2. 
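A small Python sketch (not from the source) derives the same figure by dividing
the number of operations by the length of the critical path through the
dependence graph, assuming one time unit per operation:

from functools import lru_cache

deps = {                                  # operation -> operations it waits for
    "e = a + b": (),
    "f = c + d": (),
    "m = e * f": ("e = a + b", "f = c + d"),
}

@lru_cache(maxsize=None)
def level(op):
    # earliest time step at which op can complete
    return 1 + max((level(d) for d in deps[op]), default=0)

ops = len(deps)
critical_path = max(level(op) for op in deps)
print(ops, critical_path, ops / critical_path)   # 3 ops, 2 steps -> ILP = 1.5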
 Thread-level or task-level parallelism (TLP):- 
• Multiple threads or instruction sequences from the same 
application can be executed concurrently. 
• Generated by compiler/user and managed by compiler and 
hardware. 
• Limited in practice by communication/synchronization overheads 
and by algorithm characteristics.
 Data-level parallelism (DLP):- 
• Instructions from a single stream operate concurrently on several 
data items. 
• Limited by non-regular data manipulation patterns and by 
memory bandwidth. 
 Transaction-level parallelism:- 
• Multiple threads/processes from different transactions can be 
executed concurrently. 
• Limited by access to metadata and by interconnection bandwidth.
Parallel Computing 
• Use of multiple processors or computers working together on a 
common task. 
–Each processor works on its section of the problem. 
–Processors can exchange information. 
[Figure: a grid of the problem to be solved, divided into four regions; CPU #1, 
#2, #3 and #4 each work on their own area of the problem and exchange data at 
the boundaries between neighbouring regions.]
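A minimal Python sketch of this picture (the work function and the data are
placeholders): the problem is split into sections, each worker handles its own
section, and the partial results are combined.

from multiprocessing import Pool

def work(section):
    # stand-in for "CPU #k works on this area of the problem"
    return sum(x * x for x in section)

if __name__ == "__main__":
    data = list(range(1_000_000))
    n_workers = 4
    chunk = len(data) // n_workers
    sections = [data[k * chunk:(k + 1) * chunk] for k in range(n_workers)]
    with Pool(n_workers) as pool:
        partial = pool.map(work, sections)    # each worker takes one section
    print(sum(partial) == sum(x * x for x in data))   # True: results agree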
Why Do Parallel Computing? 
 Limits of single CPU computing 
–performance 
–available memory 
 Parallel computing allows one to: 
–solve problems that don’t fit on a single CPU 
–solve problems that can’t be solved in a reasonable time 
 We can solve… 
–larger problems 
–the same problem faster 
–more cases
Brent`s Theorem 
Statement:- Given a parallel algorithm A with computation time t, if A performs 
m computational operations in total, then p processors can execute A in time 
at most:- 
t + (m - t)/p 
Proof:- Let si be the number of computational operations performed by parallel 
algorithm A at step i (1 <= i <= t). Then 
s1 + s2 + ... + st = m. 
Since we have p processors, we can simulate step i in time ceil(si/p). So the 
entire computation of A can be performed with p processors in time:- 
Σ(i=1..t) ceil(si/p) <= Σ(i=1..t) (si + p - 1)/p 
                      = Σ(i=1..t) 1 + Σ(i=1..t) (si - 1)/p 
                      = t + (m - t)/p 
(using the fact that ceil(x/p) <= (x + p - 1)/p). 
(Hence Proved)
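A small numerical check of the bound just derived (the work profile si and p
are arbitrary assumptions): simulate ceil(si/p) time per step and compare
against t + (m - t)/p.

from math import ceil

def simulated_time(s, p):
    # step i needs ceil(s_i / p) time slots on p processors
    return sum(ceil(si / p) for si in s)

s = [1, 7, 3, 12, 5]          # s_i: operations performed at parallel step i
t, m, p = len(s), sum(s), 4

print(simulated_time(s, p))   # simulated parallel time (9 here)
print(t + (m - t) / p)        # Brent's bound: t + (m - t)/p  (10.75 here)
assert simulated_time(s, p) <= t + (m - t) / p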