OpenMP
Algoritmi e Calcolo Parallelo

Prof. Pier Luca Lanzi
References

• Using OpenMP: Portable Shared Memory Parallel Programming, Barbara Chapman, Gabriele Jost and Ruud van der Pas
• OpenMP.org: http://openmp.org/
• OpenMP Tutorial: https://computing.llnl.gov/tutorials/openMP/
Prologue
Threads for Dummies (1)

• A thread is a sequence of instructions to be executed within a program.
• Usually a program consists of a single thread of execution that starts in main(). Each line of your code is executed in turn, exactly one line at a time.
• A (UNIX) process consists of:
 A virtual address space (i.e., stack, data, and code segments)
 System resources (e.g., open files)
 A thread of execution
Threads for Dummies (2)

• In a threaded environment, one process consists of multiple threads.
• Each user process can have several threads of execution; different parts of the user's code can be executing simultaneously.
• The collection of threads forms a single process!
• Each thread in the process sees the same virtual address space, files, etc.
You can manage the multiple threads in your process yourself (but it is quite complex!)
Or you can let OpenMP do it almost automatically!
The Parallelization Model of OpenMP

Fork-Join Parallelization
[Figure: the fork-join parallelization model]
Introduction

OpenMP = Open Multi-Processing
• API for multi-platform shared-memory multiprocessing programming
• Supports C/C++ and Fortran
• Available on Unix and on MS Windows
• Compiler directives, library routines, and environment variables
History of OpenMP

• The OpenMP Architecture Review Board (ARB) published its first API specification, OpenMP for Fortran 1.0, in October 1997. In October the following year it released the C/C++ standard.
• 2000 saw version 2.0 of the Fortran specification, with version 2.0 of the C/C++ specification released in 2002.
• Version 2.5 is a combined C/C++/Fortran specification, released in 2005.
• Version 3.0, released in May 2008, is the current version of the API specification. Among the new features in 3.0 is the concept of tasks and the task construct. These new features are summarized in Appendix F of the OpenMP 3.0 specification.
What are the Goals of OpenMP?

• Standardization
 Provide a standard among a variety of shared memory architectures
• Lean and Mean
 Establish a simple and limited set of directives for programming shared memory machines (significant parallelism can be implemented by using just 3 or 4 directives)
• Ease of Use
 Provide the capability to incrementally parallelize a serial program (unlike message-passing libraries)
 Provide the capability to implement both coarse-grain and fine-grain parallelism
• Portability
 Supports Fortran (77, 90, and 95), C, and C++
How Does It Work?

• OpenMP is an implementation of multithreading
• A master thread "forks" a specified number of slave threads
• Tasks are divided among the slaves
• Slaves run concurrently as the runtime environment allocates threads to different processors
Most OpenMP constructs are compiler directives or pragmas.
The focus of OpenMP is to parallelize loops.
OpenMP offers an incremental approach to parallelism.
Overview of OpenMP

• A compiler directive in C/C++ is called a pragma (pragmatic information). It is a preprocessor directive, thus it is declared with a hash (#). Compiler directives specific to OpenMP in C/C++ are written as follows:

#pragma omp <rest of pragma>
The OpenMP Programming Model

• Based on compiler directives
• Nested Parallelism Support
 The API allows parallel constructs inside other parallel constructs
• Dynamic Threads
 The API allows the number of threads used to execute different parallel regions to be changed dynamically
• Input/Output
 OpenMP specifies nothing about parallel I/O
 The programmer has to ensure that I/O is conducted correctly within the context of a multi-threaded program
• Memory Consistency
 Threads can "cache" their data and are not required to maintain exact consistency with real memory all of the time
 When it is critical that all threads view a shared variable identically, the programmer is responsible for ensuring that the variable is FLUSHed by all threads as needed
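As a minimal sketch of the memory-consistency point, the flush directive publishes and re-reads a shared variable; this producer/consumer handshake is an illustrative assumption, not an example from the slides:

int flag = 0;                      // shared signal variable
#pragma omp parallel num_threads(2)
{
  if (omp_get_thread_num() == 0) {
    flag = 1;                      // thread 0 sets the flag...
#pragma omp flush(flag)            // ...and forces the write out to memory
  } else {
#pragma omp flush(flag)            // thread 1 refreshes its view of flag
    while (flag == 0) {            // spins until thread 0's write is visible
#pragma omp flush(flag)
    }
  }
}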
Basic Elements

Hello World

#include <iostream>
using namespace std;
int main()
{
  cout << "Hello World\n";
}

$ ./hello
Hello World
Hello World in OpenMP

#include <omp.h>
#include <iostream>
using namespace std;
int main()
{
#pragma omp parallel num_threads(3)
  {
    cout << "Hello World\n";
    // what if
    // cout << "Hello World" << endl;
  }
}

$ ./hello
Hello World
Hello World
Hello World
Compiling an OpenMP Program

• OpenMP consists of a set of directives and macros that are translated into C/C++ and thread instructions.
• To compile a program using OpenMP, we need to include the flag "-fopenmp":

g++ -o example ./source/openmp00.cpp -fopenmp

• The command compiles the file openmp00.cpp using OpenMP and generates an executable named "example".
What Does Really Happen?

• OpenMP consists of a set of directives and macros that are translated into C/C++ and thread instructions.
• We can check how the original OpenMP program is translated using the command:

g++ -fdump-tree-omplower ./source/openmp00.cpp -fopenmp
OpenMP Directives

• A valid OpenMP directive must appear after #pragma omp
• name identifies an OpenMP directive
• The directive name can be followed by optional clauses (in any order)
• An OpenMP directive precedes the structured block which is enclosed by the directive itself

#pragma omp name [clause, ...]
{
  …
}

• Example:

#pragma omp parallel num_threads(3)
{
  cout << "Hello World\n";
}
Directive Scoping

• Static (Lexical) Extent
 The code textually enclosed between the beginning and the end of a structured block following a directive
 The static extent of a directive does not span multiple routines or code files
• Orphaned Directive
 An OpenMP directive that appears independently from another enclosing directive; it exists outside of another directive's static (lexical) extent
 Can span routines and possibly code files
• Dynamic Extent
 The dynamic extent of a directive includes both its static (lexical) extent and the extents of its orphaned directives
Directive Scoping: Example

void f(){
#pragma omp name2 [clause, ...]   // orphaned directive
  {
    …
  }
}

int main(){
#pragma omp name1 [clause, ...]
  {
    // static extent of name1; together with the body of f(),
    // it forms the dynamic extent of name1
    f();
    …
  }
}
Parallel Construct

The Parallel Construct/Directive

• Plays a crucial role in OpenMP: a program without a parallel construct will be executed sequentially.

#pragma omp parallel [clause, clause, ... ]
{
  // block
}

• When a thread reaches a parallel directive, it creates a team of threads and becomes the master of the team.
• The master is a member of that team and has thread number 0 within that team.
The Parallel Construct/Directive

• Starting from the beginning of the parallel region, the code is duplicated and all threads will execute that code.
• There is an implied barrier at the end of a parallel region; only the master thread continues execution past this point.
• If any thread terminates within a parallel region, all threads in the team will terminate, and the work done up until that point is undefined.
• Threads are numbered from 0 (master thread) to N-1.
How Many Threads?

• The number of threads in a parallel region is determined by the following factors, in order of precedence:
 Evaluation of the if clause
 Setting of the num_threads clause
 Use of the omp_set_num_threads() library function
 Setting of the OMP_NUM_THREADS environment variable
 Implementation default, e.g., the number of CPUs on a node
• When present, the if clause must evaluate to true (i.e., a non-zero integer) in order for a team of threads to be created; otherwise, the region is executed serially by the master thread (see the sketch below).
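A minimal sketch of these rules in action; the threshold MIN_SIZE and the size n are hypothetical values chosen for illustration:

#include <omp.h>
#include <cstdio>

int main()
{
  const int n = 100;           // hypothetical problem size
  const int MIN_SIZE = 1000;   // hypothetical threshold

  // Parallel only if the if clause holds; here n < MIN_SIZE,
  // so the master thread executes the region serially.
#pragma omp parallel if(n >= MIN_SIZE) num_threads(4)
  {
    printf("Thread %d of %d\n",
           omp_get_thread_num(), omp_get_num_threads());
  }
}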
Run-Time Library Routines

Overview

• The OpenMP standard defines an API for a variety of library functions:
 Query the number of threads/processors, set the number of threads to use
 General purpose locking routines (semaphores)
 Portable wall clock timing routines
 Set execution environment functions: nested parallelism, dynamic adjustment of threads
• It may be necessary to include the header file "omp.h".
• For the lock routines/functions:
 The lock variable must be accessed only through the locking routines
 The lock variable must have type omp_lock_t or type omp_nest_lock_t, depending on the function being used.
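A minimal sketch of the general-purpose lock routines; the shared counter is a made-up example:

#include <omp.h>
#include <cstdio>

int main()
{
  omp_lock_t lock;            // accessed only through the locking routines
  omp_init_lock(&lock);
  int counter = 0;            // shared variable protected by the lock

#pragma omp parallel num_threads(4)
  {
    omp_set_lock(&lock);      // blocks until the lock is acquired
    ++counter;                // one thread at a time updates the counter
    omp_unset_lock(&lock);
  }

  omp_destroy_lock(&lock);
  printf("counter = %d\n", counter);
}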
Run-Time Routines

• void omp_set_num_threads(int num_threads)
 Sets the number of threads that will be used in the next parallel region
 num_threads must be a positive integer
 This routine can only be called from the serial portions of the code
• int omp_get_num_threads(void)
 Returns the number of threads that are currently in the team executing the parallel region from which it is called
• int omp_get_thread_num(void)
 Returns the thread number of the thread, within the team, making this call
 This number will be between 0 and omp_get_num_threads()-1; the master thread of the team is thread 0
A Better Hello World in OpenMP

#include <omp.h>
#include <iostream>
using namespace std;
int main()
{
#pragma omp parallel num_threads(3)
  {
    cout << "Hello World. I am thread ";
    cout << omp_get_thread_num() << endl;
  }
}

$ ./hello
Hello World. I am thread 0
Hello World. I am thread 2
Hello World. I am thread 1
Conditional Compiling

// openmp03.cpp
#ifdef _OPENMP
#include <omp.h>
#else
#define omp_get_thread_num() 0
#endif
...
// current thread number
int TID = omp_get_thread_num();
...
Data Scope Attribute Clauses

Variables Scope in OpenMP

• OpenMP is based upon the shared memory programming model; most variables are shared by default.
• The OpenMP Data Scope Attribute Clauses are used to explicitly define how variables should be scoped.
• Four main scope types:
 private
 shared
 default
 reduction
Variables Scope in OpenMP

• Data Scope Attribute Clauses are used in conjunction with several directives to:
 Define how and which data variables in the serial section of the program are transferred to the parallel sections
 Define which variables will be visible to all threads in the parallel sections and which variables will be privately allocated to all threads
• Data Scope Attribute Clauses are effective only within their lexical/static extent.
The private Clause

• Syntax: private (list)
• The private clause declares variables in its list to be private to each thread.
• Private variables behave as follows:
 A new object of the same type is declared once for each thread in the team
 All references to the original object are replaced with references to the new object
 Variables should be assumed to be uninitialized for each thread
Example: private Clause

#pragma omp parallel for private(i,a)
for (i=0; i<n; i++)
{
  a = i+1;
  printf("Thread %d has a value of a = %d for i = %d\n",
         omp_get_thread_num(), a, i);
} /*-- End of parallel for --*/
The shared Clause

• Syntax: shared (list)
• The shared clause declares variables in its list to be shared among all threads in the team.
• A shared variable exists in only one memory location and all threads can read or write to that address.
• It is the programmer's responsibility to ensure that multiple threads properly access shared variables.
Example: shared Clause

#pragma omp parallel for shared(a)
for (i=0; i<n; i++)
{
  a[i] += i;
} /*-- End of parallel for --*/
The default Clause

• Syntax: default (shared | none)
• The default scope of all variables is set to either shared or none.
• The default clause allows the user to specify a default scope for all variables in the lexical extent of any parallel region.
• Specific variables can be exempted from the default using specific clauses (e.g., private, shared, etc.).
• Using none as the default requires the programmer to explicitly scope all variables (see the sketch below).
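A minimal sketch of default(none); the loop is a made-up example, and the compiler rejects the region if any variable used inside it is left unscoped:

int n = 100, sum = 0;
// Every variable must now be scoped explicitly:
#pragma omp parallel for default(none) shared(n) reduction(+: sum)
for (int i = 0; i < n; i++)
  sum += i;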
The reduction Clause

• Syntax: reduction (operator: list)
• Performs a reduction on the variables that appear in its list.
• operator defines the reduction operation.
• A private copy of each list variable is created for each thread.
• At the end of the reduction, the reduction operation is applied to all private copies of the shared variable, and the final result is written to the global shared variable.
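A minimal sketch of the clause on a made-up summation (the dot product examples later in the deck use the same pattern):

int n = 1000, sum = 0;
#pragma omp parallel for reduction(+: sum)
for (int i = 0; i < n; i++)
  sum += i;   // each thread accumulates into its private copy;
              // the copies are combined with + when the loop ends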
Work-Sharing

Work-Sharing Constructs

• Divide the execution of the enclosed code region among the members of the team that encounter it.
• A work-sharing construct must be enclosed dynamically within a parallel region in order for the directive to execute in parallel.
• Work-sharing constructs do not launch new threads.
• There is no implied barrier upon entry to a work-sharing construct; however, there is an implied barrier at the end of a work-sharing construct.
• Three constructs:
 for: shares iterations of a loop across the team (data parallelism)
 sections: breaks work into separate, discrete sections, each executed by a thread (functional parallelism)
 single: serializes a section of code
The for Construct/Directive

#pragma omp for [clause ...]
  for_loop

• Three main clause types: schedule, nowait, and the data-scope clauses.
• schedule (type [,chunk]) describes how iterations of the loop are divided among the threads in the team (the default schedule is implementation dependent).
• nowait: if specified, threads do not synchronize at the end of the parallel loop.
• The data-scope clauses (private, shared, reduction).
The for Construct/Directive (Schedule Types)

• static: loop iterations are divided into blocks of size chunk and then statically assigned to threads.
 If chunk is not specified, the iterations are evenly (if possible) divided contiguously among the threads.
• dynamic: loop iterations are divided into blocks of size chunk, and dynamically scheduled among the threads.
 When a thread finishes one chunk, it is dynamically assigned another. The default chunk size is 1.
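A brief sketch contrasting the two schedule types on the same loop; the chunk size 4 and the function work() are hypothetical:

// static: the blocks of 4 iterations are assigned before the loop runs
#pragma omp parallel for schedule(static, 4)
for (int i = 0; i < n; i++)
  work(i);

// dynamic: an idle thread grabs the next block of 4 at run time
#pragma omp parallel for schedule(dynamic, 4)
for (int i = 0; i < n; i++)
  work(i);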
Example: Dot Product (1)

#include <iostream>
#include <vector>
#include <algorithm>
#include <cstdlib>
using namespace std;

int main()
{
  double sum;
  vector<double> a(10000000), b(10000000);
  generate(a.begin(), a.end(), drand48);
  generate(b.begin(), b.end(), drand48);
  sum = 0;
  for (int i=0; i<a.size(); ++i)
  {
    sum = sum + a[i]*b[i];
  }
  cout << sum << endl;
}
Example: Dot Product (2)

// openmp03-dotp-openmp
int main()
{
  double sum;
  vector<double> a(10000000), b(10000000);
  generate(a.begin(), a.end(), drand48);
  generate(b.begin(), b.end(), drand48);
  sum = 0;
#pragma omp parallel for reduction(+: sum)
  for (int i=0; i<a.size(); ++i)
  {
    sum = sum + a[i]*b[i];
  }
  cout << sum << endl;
}
Example: Dot Product (3)

// openmp03-dotp-openmp3
...
int ChunkSize = a.size()/omp_get_max_threads();
#pragma omp parallel
{
#pragma omp for schedule(dynamic, ChunkSize) reduction(+: sum)
  for (int i=0; i<a.size(); ++i)
  {
    sum = sum + a[i]*b[i];
  }
}
Example: Varying Loads

// openmp06
...
void f(vector<double> &a, vector<double> &b, int n)
{
  int i, j;
#pragma omp parallel shared(n)
  {
    // schedule is dynamic with chunk size 1 to manage the varying load
#pragma omp for schedule(dynamic,1) private(i,j) nowait
    for (i = 1; i < n; i++)
      for (j = 0; j < i; j++)
        b[j + n*i] = (a[j + n*i] + a[j + n*(i-1)]) / 2.0;
  }
}

// time ./openmp04-seq 10000 (n is 10000^2)
//   11.20 real    5.68 user   0.68 sys
// time ./openmp04 10000
//    4.55 real   13.07 user   0.62 sys
The sections Construct/Directive

#pragma omp sections [clause ...]
{
#pragma omp section
  {
    // execute A
  }
#pragma omp section
  {
    // execute B
  }
}

• Clauses: private, reduction, nowait, etc.
• Specifies that the enclosed section(s) of code are to be divided among the threads in the team.
• Each section is executed once by a thread in the team. Different sections may be executed by different threads. It is possible for a thread to execute more than one section.
Example: Dot Product (Vector Initialization)

#pragma omp sections nowait
{
#pragma omp section
  {
    int ita;
#pragma omp parallel for shared(a) private(ita) schedule(dynamic,ChunkSize)
    for(ita=0; ita<a.size(); ++ita)
    {
      a[ita] = drand48();
    }
  }
#pragma omp section
  {
    int itb;
#pragma omp parallel for shared(b) private(itb) schedule(dynamic,ChunkSize)
    for(itb=0; itb<b.size(); ++itb)
    {
      b[itb] = drand48();
    }
  }
}
The single Construct/Directive

• The single construct is associated with the structured block of code immediately following it and specifies that this block should be executed by one thread only.
• It does not state which thread should execute the code block.
• The thread chosen could vary from one run to another. It can also differ for different single constructs within one application.

#pragma omp single [private | nowait | …]
{
  ...
}
Example: single Construct/Directive

#pragma omp parallel shared(a,b) private(i)
{
#pragma omp single
  {
    a = 10;
    printf("Single construct executed by thread %d\n",
           omp_get_thread_num());
  } /* A barrier is automatically inserted here */
#pragma omp for
  for (i=0; i<n; i++)
    b[i] = a;
} /*-- End of parallel region --*/
Synchronization

The critical Construct/Directive

#pragma omp critical [ name ]
{
  …
}

• The critical directive specifies a region of code that must be executed by only one thread at a time.
• If a thread is currently executing inside a critical region and another thread reaches that critical region and attempts to execute it, it will block until the first thread exits that critical region.
• The optional name enables multiple different critical regions:
 Names act as global identifiers
 Critical regions with the same name are treated as the same region
 All unnamed critical sections are treated as the same section
Example: critical Construct/Directive

sum = 0;
#pragma omp parallel shared(n,a,sum) private(TID)
{
  TID = omp_get_thread_num();
  double sumLocal = 0;   // declared inside the region, hence private to each thread
#pragma omp for
  for (size_t i=0; i<n; i++)
    sumLocal += a[i];
#pragma omp critical (update_sum)
  {
    sum += sumLocal;
    cout << "TID=" << TID << " sumLocal="
         << sumLocal << " sum=" << sum << endl;
  }
} /*-- End of parallel region --*/
cout << "Value of sum after parallel region: " << sum << endl;
The barrier Construct/Directive

• Syntax: #pragma omp barrier
• The barrier directive synchronizes all threads in the team.
• When a barrier directive is reached, a thread will wait at that point until all other threads have reached that barrier. All threads then resume executing in parallel the code that follows the barrier.
• All threads in a team (or none) must execute the barrier region.
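A minimal sketch of a barrier separating two phases; produce() and consume() are hypothetical phase functions:

#pragma omp parallel
{
  int tid = omp_get_thread_num();
  produce(tid);          // phase 1: each thread fills its part of the data
#pragma omp barrier      // no thread starts phase 2 until all finish phase 1
  consume(tid);          // phase 2: safe to read what other threads produced
}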
When to Use OpenMP?

• Target platform is multicore or multiprocessor
• Application is cross-platform
• Parallelizable loops
• Last-minute optimizations
