SlideShare uma empresa Scribd logo
1 de 24
Baixar para ler offline
Lecture 2
More about Parallel
Computing
Vajira Thambawita
Parallel Computer Memory Architectures
- Shared Memory
• Multiple processors can work independently but share the same
memory resources
• Shared memory machines can be divided into two groups based upon
memory access time:
UMA : Uniform Memory Access
NUMA : Non- Uniform Memory Access
Parallel Computer Memory Architectures
- Shared Memory
• Equal accesses and access times to memory
• Most commonly represented today by Symmetric
Multiprocessor (SMP) machines
Uniform Memory Access (UMA)
Parallel Computer Memory Architectures
- Shared Memory
Non - Uniform Memory Access (NUMA)
• Not all processors have equal memory
access time
Parallel Computer Memory Architectures
- Distributed Memory
• Processors have own memory
(There is no concept of global address space)
• It operates independently
• communications in message passing systems
are performed via send and receive operations
Parallel Computer Memory Architectures
– Hybrid Distributed-Shared Memory
• Use in largest and Fasted computers in the
world today
Parallel Programming Models
Shared Memory Model (without threads)
• In this programming model, processes/tasks share a common address
space, which they read and write to asynchronously.
Parallel Programming Models
Threads Model
• This programming model is a type of
shared memory programming.
• In the threads model of parallel
programming, a single "heavy weight"
process can have multiple "light
weight", concurrent execution paths.
• Ex: POSIX Threads, OpenMP,
Microsoft threads, Java Python
threads, CUDA threads for GPUs
Parallel Programming Models
Distributed Memory / Message
Passing Model
• A set of tasks that use their
own local memory during
computation. Multiple tasks
can reside on the same physical
machine and/or across an
arbitrary number of machines.
• Tasks exchange data through
communications by sending
and receiving messages.
• Ex:
• Message Passing Interface
(MPI)
Parallel Programming Models
Data Parallel Model
• May also be referred to as the Partitioned Global Address Space (PGAS)
model.
• Ex: Coarray Fortran, Unified Parallel C (UPC), X10
Parallel Programming Models
Hybrid Model
• A hybrid model combines more than one of the previously described
programming models.
Parallel Programming Models
SPMD and MPMD
Single Program Multiple Data (SPMD)
Multiple Program Multiple Data (MPMD)
Designing Parallel Programs
Automatic vs. Manual Parallelization
• Fully Automatic
• The compiler analyzes the source code and identifies opportunities for
parallelism.
• The analysis includes identifying inhibitors to parallelism and possibly a
cost weighting on whether or not the parallelism would actually improve
performance.
• Loops (do, for) are the most frequent target for automatic
parallelization.
• Programmer Directed
• Using "compiler directives" or possibly compiler flags, the programmer
explicitly tells the compiler how to parallelize the code.
• May be able to be used in conjunction with some degree of automatic
parallelization also.
Designing Parallel Programs
Understand the Problem and the Program
• Easy to parallelize problem
• Calculate the potential energy for each of several thousand independent
conformations of a molecule. When done, find the minimum energy
conformation.
• A problem with little-to-no parallelism
• Calculation of the Fibonacci series (0,1,1,2,3,5,8,13,21,...) by use of the
formula:
• F(n) = F(n-1) + F(n-2)
Designing Parallel Programs
Partitioning
• One of the first steps in designing a parallel program is to break the
problem into discrete "chunks" of work that can be distributed to
multiple tasks. This is known as decomposition or partitioning.
Two ways:
• Domain decomposition
• Functional decomposition
Designing Parallel Programs
Domain Decomposition
The data associated with a problem is decomposed
There are different ways to partition data:
Designing Parallel Programs
Functional Decomposition
The problem is decomposed according to the work that must be done
Designing Parallel Programs
You DON'T need communications
• Some types of problems can be
decomposed and executed in parallel with
virtually no need for tasks to share data.
• Ex: Every pixel in a black and white image
needs to have its color reversed
You DO need communications
• This require tasks to share data with each
other
• A 2-D heat diffusion problem requires a
task to know the temperatures calculated
by the tasks that have neighboring data
Designing Parallel Programs
Factors to Consider (designing your program's inter-task
communications)
• Communication overhead
• Latency vs. Bandwidth
• Visibility of communications
• Synchronous vs. asynchronous communications
• Scope of communications
• Efficiency of communications
Designing Parallel Programs
Granularity
• In parallel computing, granularity is a qualitative measure of the ratio of
computation to communication. (Computation / Communication)
• Periods of computation are typically separated from periods of
communication by synchronization events.
• Fine-grain Parallelism
• Coarse-grain Parallelism
Designing Parallel Programs
• Fine-grain Parallelism
• Relatively small amounts of computational work
are done between communication events
• Low computation to communication ratio
• Facilitates load balancing
• Implies high communication overhead and less
opportunity for performance enhancement
• If granularity is too fine it is possible that the
overhead required for communications and
synchronization between tasks takes longer than
the computation.
• Coarse-grain Parallelism
• Relatively large amounts of computational work
are done between communication/synchronization
events
• High computation to communication ratio
• Implies more opportunity for performance increase
• Harder to load balance efficiently
Designing Parallel Programs
I/O
• Rule #1: Reduce overall I/O as much as possible
• If you have access to a parallel file system, use it.
• Writing large chunks of data rather than small chunks is usually
significantly more efficient.
• Fewer, larger files performs better than many small files.
• Confine I/O to specific serial portions of the job, and then use parallel
communications to distribute data to parallel tasks. For example, Task 1
could read an input file and then communicate required data to other
tasks. Likewise, Task 1 could perform write operation after receiving
required data from all other tasks.
• Aggregate I/O operations across tasks - rather than having many tasks
perform I/O, have a subset of tasks perform it.
Designing Parallel Programs
Debugging
• TotalView from RogueWave Software
• DDT from Allinea
• Inspector from Intel
Performance Analysis and Tuning
• LC's web pages at https://hpc.llnl.gov/software/development-environment-
software
• TAU: http://www.cs.uoregon.edu/research/tau/docs.php
• HPCToolkit: http://hpctoolkit.org/documentation.html
• Open|Speedshop: http://www.openspeedshop.org/
• Vampir / Vampirtrace: http://vampir.eu/
• Valgrind: http://valgrind.org/
• PAPI: http://icl.cs.utk.edu/papi/
• mpitrace https://computing.llnl.gov/tutorials/bgq/index.html#mpitrace
• mpiP: http://mpip.sourceforge.net/
• memP: http://memp.sourceforge.net/
Summary

Mais conteúdo relacionado

Mais procurados

multiprocessors and multicomputers
 multiprocessors and multicomputers multiprocessors and multicomputers
multiprocessors and multicomputersPankaj Kumar Jain
 
Presentation on Shared Memory Parallel Programming
Presentation on Shared Memory Parallel ProgrammingPresentation on Shared Memory Parallel Programming
Presentation on Shared Memory Parallel ProgrammingVengada Karthik Rangaraju
 
Communication costs in parallel machines
Communication costs in parallel machinesCommunication costs in parallel machines
Communication costs in parallel machinesSyed Zaid Irshad
 
Multiprocessor
MultiprocessorMultiprocessor
MultiprocessorNeel Patel
 
MPI message passing interface
MPI message passing interfaceMPI message passing interface
MPI message passing interfaceMohit Raghuvanshi
 
Communication model of parallel platforms
Communication model of parallel platformsCommunication model of parallel platforms
Communication model of parallel platformsSyed Zaid Irshad
 
Introduction to MPI
Introduction to MPI Introduction to MPI
Introduction to MPI Hanif Durad
 
Physical organization of parallel platforms
Physical organization of parallel platformsPhysical organization of parallel platforms
Physical organization of parallel platformsSyed Zaid Irshad
 
Introduction to Parallel Computing
Introduction to Parallel ComputingIntroduction to Parallel Computing
Introduction to Parallel ComputingAkhila Prabhakaran
 
Evaluation of morden computer & system attributes in ACA
Evaluation of morden computer &  system attributes in ACAEvaluation of morden computer &  system attributes in ACA
Evaluation of morden computer & system attributes in ACAPankaj Kumar Jain
 
Parallel algorithms
Parallel algorithmsParallel algorithms
Parallel algorithmsDanish Javed
 
Cache coherence
Cache coherenceCache coherence
Cache coherenceEmployee
 
Parallel processing (simd and mimd)
Parallel processing (simd and mimd)Parallel processing (simd and mimd)
Parallel processing (simd and mimd)Bhavik Vashi
 
Introduction to Parallel Computing
Introduction to Parallel ComputingIntroduction to Parallel Computing
Introduction to Parallel ComputingRoshan Karunarathna
 

Mais procurados (20)

multiprocessors and multicomputers
 multiprocessors and multicomputers multiprocessors and multicomputers
multiprocessors and multicomputers
 
Presentation on Shared Memory Parallel Programming
Presentation on Shared Memory Parallel ProgrammingPresentation on Shared Memory Parallel Programming
Presentation on Shared Memory Parallel Programming
 
Pram model
Pram modelPram model
Pram model
 
Communication costs in parallel machines
Communication costs in parallel machinesCommunication costs in parallel machines
Communication costs in parallel machines
 
Multiprocessor
MultiprocessorMultiprocessor
Multiprocessor
 
Memory management
Memory managementMemory management
Memory management
 
MPI message passing interface
MPI message passing interfaceMPI message passing interface
MPI message passing interface
 
Communication model of parallel platforms
Communication model of parallel platformsCommunication model of parallel platforms
Communication model of parallel platforms
 
Introduction to MPI
Introduction to MPI Introduction to MPI
Introduction to MPI
 
Physical organization of parallel platforms
Physical organization of parallel platformsPhysical organization of parallel platforms
Physical organization of parallel platforms
 
Introduction to Parallel Computing
Introduction to Parallel ComputingIntroduction to Parallel Computing
Introduction to Parallel Computing
 
Evaluation of morden computer & system attributes in ACA
Evaluation of morden computer &  system attributes in ACAEvaluation of morden computer &  system attributes in ACA
Evaluation of morden computer & system attributes in ACA
 
Parallel algorithms
Parallel algorithmsParallel algorithms
Parallel algorithms
 
Cache coherence
Cache coherenceCache coherence
Cache coherence
 
Parallel processing (simd and mimd)
Parallel processing (simd and mimd)Parallel processing (simd and mimd)
Parallel processing (simd and mimd)
 
Introduction to Parallel Computing
Introduction to Parallel ComputingIntroduction to Parallel Computing
Introduction to Parallel Computing
 
Course outline of parallel and distributed computing
Course outline of parallel and distributed computingCourse outline of parallel and distributed computing
Course outline of parallel and distributed computing
 
Parallel programming model
Parallel programming modelParallel programming model
Parallel programming model
 
Parallel Computing
Parallel Computing Parallel Computing
Parallel Computing
 
Parallel processing
Parallel processingParallel processing
Parallel processing
 

Semelhante a Lecture 2 more about parallel computing

01-MessagePassingFundamentals.ppt
01-MessagePassingFundamentals.ppt01-MessagePassingFundamentals.ppt
01-MessagePassingFundamentals.pptHarshitPal37
 
SecondPresentationDesigning_Parallel_Programs.ppt
SecondPresentationDesigning_Parallel_Programs.pptSecondPresentationDesigning_Parallel_Programs.ppt
SecondPresentationDesigning_Parallel_Programs.pptRubenGabrielHernande
 
Parallel & Distributed processing
Parallel & Distributed processingParallel & Distributed processing
Parallel & Distributed processingSyed Zaid Irshad
 
Week # 1.pdf
Week # 1.pdfWeek # 1.pdf
Week # 1.pdfgiddy5
 
Parallel Computing-Part-1.pptx
Parallel Computing-Part-1.pptxParallel Computing-Part-1.pptx
Parallel Computing-Part-1.pptxkrnaween
 
message passing vs shared memory
message passing vs shared memorymessage passing vs shared memory
message passing vs shared memoryHamza Zahid
 
parallel programming models
 parallel programming models parallel programming models
parallel programming modelsSwetha S
 
Lec 2 (parallel design and programming)
Lec 2 (parallel design and programming)Lec 2 (parallel design and programming)
Lec 2 (parallel design and programming)Sudarshan Mondal
 
PGAS Programming Model
PGAS Programming ModelPGAS Programming Model
PGAS Programming Modelch adnan
 
Parallel Algorithms Advantages and Disadvantages
Parallel Algorithms Advantages and DisadvantagesParallel Algorithms Advantages and Disadvantages
Parallel Algorithms Advantages and DisadvantagesMurtadha Alsabbagh
 
Chapter 5.pptx
Chapter 5.pptxChapter 5.pptx
Chapter 5.pptxJoeBaker69
 
Parallelization using open mp
Parallelization using open mpParallelization using open mp
Parallelization using open mpranjit banshpal
 
Research Scope in Parallel Computing And Parallel Programming
Research Scope in Parallel Computing And Parallel ProgrammingResearch Scope in Parallel Computing And Parallel Programming
Research Scope in Parallel Computing And Parallel ProgrammingShitalkumar Sukhdeve
 
VTU 6th Sem Elective CSE - Module 3 cloud computing
VTU 6th Sem Elective CSE - Module 3 cloud computingVTU 6th Sem Elective CSE - Module 3 cloud computing
VTU 6th Sem Elective CSE - Module 3 cloud computingSachin Gowda
 

Semelhante a Lecture 2 more about parallel computing (20)

01-MessagePassingFundamentals.ppt
01-MessagePassingFundamentals.ppt01-MessagePassingFundamentals.ppt
01-MessagePassingFundamentals.ppt
 
Unit5
Unit5Unit5
Unit5
 
SecondPresentationDesigning_Parallel_Programs.ppt
SecondPresentationDesigning_Parallel_Programs.pptSecondPresentationDesigning_Parallel_Programs.ppt
SecondPresentationDesigning_Parallel_Programs.ppt
 
Parallel computing
Parallel computingParallel computing
Parallel computing
 
Parallel & Distributed processing
Parallel & Distributed processingParallel & Distributed processing
Parallel & Distributed processing
 
Week # 1.pdf
Week # 1.pdfWeek # 1.pdf
Week # 1.pdf
 
Parallel Computing-Part-1.pptx
Parallel Computing-Part-1.pptxParallel Computing-Part-1.pptx
Parallel Computing-Part-1.pptx
 
message passing vs shared memory
message passing vs shared memorymessage passing vs shared memory
message passing vs shared memory
 
Hpc 6 7
Hpc 6 7Hpc 6 7
Hpc 6 7
 
parallel programming models
 parallel programming models parallel programming models
parallel programming models
 
Lec 2 (parallel design and programming)
Lec 2 (parallel design and programming)Lec 2 (parallel design and programming)
Lec 2 (parallel design and programming)
 
Chap 1(one) general introduction
Chap 1(one)  general introductionChap 1(one)  general introduction
Chap 1(one) general introduction
 
PGAS Programming Model
PGAS Programming ModelPGAS Programming Model
PGAS Programming Model
 
Parallel Algorithms Advantages and Disadvantages
Parallel Algorithms Advantages and DisadvantagesParallel Algorithms Advantages and Disadvantages
Parallel Algorithms Advantages and Disadvantages
 
High performance computing
High performance computingHigh performance computing
High performance computing
 
Chapter 5.pptx
Chapter 5.pptxChapter 5.pptx
Chapter 5.pptx
 
Parallelization using open mp
Parallelization using open mpParallelization using open mp
Parallelization using open mp
 
Research Scope in Parallel Computing And Parallel Programming
Research Scope in Parallel Computing And Parallel ProgrammingResearch Scope in Parallel Computing And Parallel Programming
Research Scope in Parallel Computing And Parallel Programming
 
VTU 6th Sem Elective CSE - Module 3 cloud computing
VTU 6th Sem Elective CSE - Module 3 cloud computingVTU 6th Sem Elective CSE - Module 3 cloud computing
VTU 6th Sem Elective CSE - Module 3 cloud computing
 
parallel processing
parallel processingparallel processing
parallel processing
 

Mais de Vajira Thambawita

Lecture 4 principles of parallel algorithm design updated
Lecture 4   principles of parallel algorithm design updatedLecture 4   principles of parallel algorithm design updated
Lecture 4 principles of parallel algorithm design updatedVajira Thambawita
 
Lecture 12 localization and navigation
Lecture 12 localization and navigationLecture 12 localization and navigation
Lecture 12 localization and navigationVajira Thambawita
 
Lecture 11 neural network principles
Lecture 11 neural network principlesLecture 11 neural network principles
Lecture 11 neural network principlesVajira Thambawita
 
Lecture 10 mobile robot design
Lecture 10 mobile robot designLecture 10 mobile robot design
Lecture 10 mobile robot designVajira Thambawita
 
Lecture 08 robots and controllers
Lecture 08 robots and controllersLecture 08 robots and controllers
Lecture 08 robots and controllersVajira Thambawita
 
Lecture 06 pic programming in c
Lecture 06 pic programming in cLecture 06 pic programming in c
Lecture 06 pic programming in cVajira Thambawita
 
Lecture 05 pic io port programming
Lecture 05 pic io port programmingLecture 05 pic io port programming
Lecture 05 pic io port programmingVajira Thambawita
 
Lecture 04 branch call and time delay
Lecture 04  branch call and time delayLecture 04  branch call and time delay
Lecture 04 branch call and time delayVajira Thambawita
 
Lecture 02 mechatronics systems
Lecture 02 mechatronics systemsLecture 02 mechatronics systems
Lecture 02 mechatronics systemsVajira Thambawita
 
Lecture 1 - Introduction to embedded system and Robotics
Lecture 1 - Introduction to embedded system and RoboticsLecture 1 - Introduction to embedded system and Robotics
Lecture 1 - Introduction to embedded system and RoboticsVajira Thambawita
 
Lec 09 - Registers and Counters
Lec 09 - Registers and CountersLec 09 - Registers and Counters
Lec 09 - Registers and CountersVajira Thambawita
 
Lec 07 - ANALYSIS OF CLOCKED SEQUENTIAL CIRCUITS
Lec 07 - ANALYSIS OF CLOCKED SEQUENTIAL CIRCUITSLec 07 - ANALYSIS OF CLOCKED SEQUENTIAL CIRCUITS
Lec 07 - ANALYSIS OF CLOCKED SEQUENTIAL CIRCUITSVajira Thambawita
 
Lec 06 - Synchronous Sequential Logic
Lec 06 - Synchronous Sequential LogicLec 06 - Synchronous Sequential Logic
Lec 06 - Synchronous Sequential LogicVajira Thambawita
 
Lec 05 - Combinational Logic
Lec 05 - Combinational LogicLec 05 - Combinational Logic
Lec 05 - Combinational LogicVajira Thambawita
 
Lec 04 - Gate-level Minimization
Lec 04 - Gate-level MinimizationLec 04 - Gate-level Minimization
Lec 04 - Gate-level MinimizationVajira Thambawita
 
Lec 03 - Combinational Logic Design
Lec 03 - Combinational Logic DesignLec 03 - Combinational Logic Design
Lec 03 - Combinational Logic DesignVajira Thambawita
 

Mais de Vajira Thambawita (20)

Lecture 4 principles of parallel algorithm design updated
Lecture 4   principles of parallel algorithm design updatedLecture 4   principles of parallel algorithm design updated
Lecture 4 principles of parallel algorithm design updated
 
Lecture 12 localization and navigation
Lecture 12 localization and navigationLecture 12 localization and navigation
Lecture 12 localization and navigation
 
Lecture 11 neural network principles
Lecture 11 neural network principlesLecture 11 neural network principles
Lecture 11 neural network principles
 
Lecture 10 mobile robot design
Lecture 10 mobile robot designLecture 10 mobile robot design
Lecture 10 mobile robot design
 
Lecture 09 control
Lecture 09 controlLecture 09 control
Lecture 09 control
 
Lecture 08 robots and controllers
Lecture 08 robots and controllersLecture 08 robots and controllers
Lecture 08 robots and controllers
 
Lecture 07 more about pic
Lecture 07 more about picLecture 07 more about pic
Lecture 07 more about pic
 
Lecture 06 pic programming in c
Lecture 06 pic programming in cLecture 06 pic programming in c
Lecture 06 pic programming in c
 
Lecture 05 pic io port programming
Lecture 05 pic io port programmingLecture 05 pic io port programming
Lecture 05 pic io port programming
 
Lecture 04 branch call and time delay
Lecture 04  branch call and time delayLecture 04  branch call and time delay
Lecture 04 branch call and time delay
 
Lecture 03 basics of pic
Lecture 03 basics of picLecture 03 basics of pic
Lecture 03 basics of pic
 
Lecture 02 mechatronics systems
Lecture 02 mechatronics systemsLecture 02 mechatronics systems
Lecture 02 mechatronics systems
 
Lecture 1 - Introduction to embedded system and Robotics
Lecture 1 - Introduction to embedded system and RoboticsLecture 1 - Introduction to embedded system and Robotics
Lecture 1 - Introduction to embedded system and Robotics
 
Lec 09 - Registers and Counters
Lec 09 - Registers and CountersLec 09 - Registers and Counters
Lec 09 - Registers and Counters
 
Lec 08 - DESIGN PROCEDURE
Lec 08 - DESIGN PROCEDURELec 08 - DESIGN PROCEDURE
Lec 08 - DESIGN PROCEDURE
 
Lec 07 - ANALYSIS OF CLOCKED SEQUENTIAL CIRCUITS
Lec 07 - ANALYSIS OF CLOCKED SEQUENTIAL CIRCUITSLec 07 - ANALYSIS OF CLOCKED SEQUENTIAL CIRCUITS
Lec 07 - ANALYSIS OF CLOCKED SEQUENTIAL CIRCUITS
 
Lec 06 - Synchronous Sequential Logic
Lec 06 - Synchronous Sequential LogicLec 06 - Synchronous Sequential Logic
Lec 06 - Synchronous Sequential Logic
 
Lec 05 - Combinational Logic
Lec 05 - Combinational LogicLec 05 - Combinational Logic
Lec 05 - Combinational Logic
 
Lec 04 - Gate-level Minimization
Lec 04 - Gate-level MinimizationLec 04 - Gate-level Minimization
Lec 04 - Gate-level Minimization
 
Lec 03 - Combinational Logic Design
Lec 03 - Combinational Logic DesignLec 03 - Combinational Logic Design
Lec 03 - Combinational Logic Design
 

Último

On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsMebane Rash
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the ClassroomPooky Knightsmith
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024Elizabeth Walsh
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Pooja Bhuva
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17Celine George
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfNirmal Dwivedi
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17Celine George
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxDr. Ravikiran H M Gowda
 
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxExploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxPooja Bhuva
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsKarakKing
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jisc
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxheathfieldcps1
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentationcamerronhm
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfDr Vijay Vishwakarma
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxPooja Bhuva
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...Nguyen Thanh Tu Collection
 
How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17Celine George
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSCeline George
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxDr. Sarita Anand
 

Último (20)

On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the Classroom
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptx
 
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxExploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptx
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 
How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 

Lecture 2 more about parallel computing

  • 1. Lecture 2 More about Parallel Computing Vajira Thambawita
  • 2. Parallel Computer Memory Architectures - Shared Memory • Multiple processors can work independently but share the same memory resources • Shared memory machines can be divided into two groups based upon memory access time: UMA : Uniform Memory Access NUMA : Non- Uniform Memory Access
  • 3. Parallel Computer Memory Architectures - Shared Memory • Equal accesses and access times to memory • Most commonly represented today by Symmetric Multiprocessor (SMP) machines Uniform Memory Access (UMA)
  • 4. Parallel Computer Memory Architectures - Shared Memory Non - Uniform Memory Access (NUMA) • Not all processors have equal memory access time
  • 5. Parallel Computer Memory Architectures - Distributed Memory • Processors have own memory (There is no concept of global address space) • It operates independently • communications in message passing systems are performed via send and receive operations
  • 6. Parallel Computer Memory Architectures – Hybrid Distributed-Shared Memory • Use in largest and Fasted computers in the world today
  • 7. Parallel Programming Models Shared Memory Model (without threads) • In this programming model, processes/tasks share a common address space, which they read and write to asynchronously.
  • 8. Parallel Programming Models Threads Model • This programming model is a type of shared memory programming. • In the threads model of parallel programming, a single "heavy weight" process can have multiple "light weight", concurrent execution paths. • Ex: POSIX Threads, OpenMP, Microsoft threads, Java Python threads, CUDA threads for GPUs
  • 9. Parallel Programming Models Distributed Memory / Message Passing Model • A set of tasks that use their own local memory during computation. Multiple tasks can reside on the same physical machine and/or across an arbitrary number of machines. • Tasks exchange data through communications by sending and receiving messages. • Ex: • Message Passing Interface (MPI)
  • 10. Parallel Programming Models Data Parallel Model • May also be referred to as the Partitioned Global Address Space (PGAS) model. • Ex: Coarray Fortran, Unified Parallel C (UPC), X10
  • 11. Parallel Programming Models Hybrid Model • A hybrid model combines more than one of the previously described programming models.
  • 12. Parallel Programming Models SPMD and MPMD Single Program Multiple Data (SPMD) Multiple Program Multiple Data (MPMD)
  • 13. Designing Parallel Programs Automatic vs. Manual Parallelization • Fully Automatic • The compiler analyzes the source code and identifies opportunities for parallelism. • The analysis includes identifying inhibitors to parallelism and possibly a cost weighting on whether or not the parallelism would actually improve performance. • Loops (do, for) are the most frequent target for automatic parallelization. • Programmer Directed • Using "compiler directives" or possibly compiler flags, the programmer explicitly tells the compiler how to parallelize the code. • May be able to be used in conjunction with some degree of automatic parallelization also.
  • 14. Designing Parallel Programs Understand the Problem and the Program • Easy to parallelize problem • Calculate the potential energy for each of several thousand independent conformations of a molecule. When done, find the minimum energy conformation. • A problem with little-to-no parallelism • Calculation of the Fibonacci series (0,1,1,2,3,5,8,13,21,...) by use of the formula: • F(n) = F(n-1) + F(n-2)
  • 15. Designing Parallel Programs Partitioning • One of the first steps in designing a parallel program is to break the problem into discrete "chunks" of work that can be distributed to multiple tasks. This is known as decomposition or partitioning. Two ways: • Domain decomposition • Functional decomposition
  • 16. Designing Parallel Programs Domain Decomposition The data associated with a problem is decomposed There are different ways to partition data:
  • 17. Designing Parallel Programs Functional Decomposition The problem is decomposed according to the work that must be done
  • 18. Designing Parallel Programs You DON'T need communications • Some types of problems can be decomposed and executed in parallel with virtually no need for tasks to share data. • Ex: Every pixel in a black and white image needs to have its color reversed You DO need communications • This require tasks to share data with each other • A 2-D heat diffusion problem requires a task to know the temperatures calculated by the tasks that have neighboring data
  • 19. Designing Parallel Programs Factors to Consider (designing your program's inter-task communications) • Communication overhead • Latency vs. Bandwidth • Visibility of communications • Synchronous vs. asynchronous communications • Scope of communications • Efficiency of communications
  • 20. Designing Parallel Programs Granularity • In parallel computing, granularity is a qualitative measure of the ratio of computation to communication. (Computation / Communication) • Periods of computation are typically separated from periods of communication by synchronization events. • Fine-grain Parallelism • Coarse-grain Parallelism
  • 21. Designing Parallel Programs • Fine-grain Parallelism • Relatively small amounts of computational work are done between communication events • Low computation to communication ratio • Facilitates load balancing • Implies high communication overhead and less opportunity for performance enhancement • If granularity is too fine it is possible that the overhead required for communications and synchronization between tasks takes longer than the computation. • Coarse-grain Parallelism • Relatively large amounts of computational work are done between communication/synchronization events • High computation to communication ratio • Implies more opportunity for performance increase • Harder to load balance efficiently
  • 22. Designing Parallel Programs I/O • Rule #1: Reduce overall I/O as much as possible • If you have access to a parallel file system, use it. • Writing large chunks of data rather than small chunks is usually significantly more efficient. • Fewer, larger files performs better than many small files. • Confine I/O to specific serial portions of the job, and then use parallel communications to distribute data to parallel tasks. For example, Task 1 could read an input file and then communicate required data to other tasks. Likewise, Task 1 could perform write operation after receiving required data from all other tasks. • Aggregate I/O operations across tasks - rather than having many tasks perform I/O, have a subset of tasks perform it.
  • 23. Designing Parallel Programs Debugging • TotalView from RogueWave Software • DDT from Allinea • Inspector from Intel Performance Analysis and Tuning • LC's web pages at https://hpc.llnl.gov/software/development-environment- software • TAU: http://www.cs.uoregon.edu/research/tau/docs.php • HPCToolkit: http://hpctoolkit.org/documentation.html • Open|Speedshop: http://www.openspeedshop.org/ • Vampir / Vampirtrace: http://vampir.eu/ • Valgrind: http://valgrind.org/ • PAPI: http://icl.cs.utk.edu/papi/ • mpitrace https://computing.llnl.gov/tutorials/bgq/index.html#mpitrace • mpiP: http://mpip.sourceforge.net/ • memP: http://memp.sourceforge.net/