Parallel Programming Concepts
OpenHPI Course
Week 3 : Shared Memory Parallelism - Programming
Unit 3.1: Threads
Dr. Peter Tröger + Teaching Team
Week 2 → Week 3
■  Week 2
□  Parallelism and Concurrency
◊  Parallel programming is concurrent programming
◊  Concurrent software can leverage parallel hardware
□  Concurrency Problems - Race condition, deadlock, livelock, …
□  Critical Sections - Progress, semaphores, Mutex, …
□  Monitor Concept - Condition variables, wait(), notify(), …
□  Advanced Concurrency - Spinlocks, reader / writer locks, …
■  This week provides a walkthrough of parallel programming
approaches for shared memory systems
□  Not complete, not exhaustive
■  Consider coding exercises and additional readings
Parallel Programming for Shared Memory
■  Processes
□  Concurrent processes with dedicated memory
□  Process management by operating system
□  Support for explicit memory sharing (see the sketch after this list)
■  Light-weight processes (LWP) / threads
□  Concurrent threads with shared process memory
□  Thread scheduling by operating system or library
□  Support for thread-local storage, if needed
■  Tasks
□  Concurrent tasks with shared process memory
□  Typically operating system not involved,
dynamic mapping to threads by task library
□  Support for private variables per task
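The explicit-memory-sharing bullet above can be illustrated with a minimal C sketch (added for illustration, not part of the original deck; the object name /demo_counter is made up): two processes created by fork() communicate through a POSIX shared memory object.

/* Hypothetical sketch: two processes share one int via POSIX shared
   memory. Compile with: gcc shm.c -o shm  (add -lrt on older glibc) */
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/wait.h>

int main(void) {
    int fd = shm_open("/demo_counter", O_CREAT | O_RDWR, 0600);
    ftruncate(fd, sizeof(int));                 /* size the object */
    int *counter = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);     /* mapping is inherited across fork() */
    *counter = 0;
    if (fork() == 0) {                          /* child writes ...  */
        *counter = 42;
        return 0;
    }
    wait(NULL);                                 /* ... parent reads  */
    printf("Parent sees counter = %d\n", *counter);
    munmap(counter, sizeof(int));
    shm_unlink("/demo_counter");
    return 0;
}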
Parallel Programming for Shared Memory
■  Different programming models for concurrency in shared memory
■  Processes and threads mapped to processing elements (cores)
■  Process- and thread-based programming typically part of operating system lectures
[Figure: three models side by side. Concurrent processes, each with its own memory plus explicitly shared memory; concurrent threads (main thread plus worker threads) inside one process memory; concurrent tasks mapped onto the threads of one process.]
POSIX Threads (PThreads)
■  Part of the POSIX specification for
operating system APIs
■  Implemented by all Unix-compatible
systems (Linux, MacOS X, Solaris, …)
■  Functionality
□  Thread lifecycle management
□  Mutex-based synchronization
□  Synchronization based on condition variables
□  Synchronization based on reader/writer locks
□  Optional support for barriers
■  Semaphore API is a separate POSIX specification (sem_ prefix)
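Since the sem_ API is only mentioned here, a small sketch (added for illustration, not from the original slides) shows an unnamed semaphore as a binary lock between two PThreads; compile on Linux with gcc sem.c -pthread.

#include <stdio.h>
#include <pthread.h>
#include <semaphore.h>

sem_t sem;
int shared = 0;

void *worker(void *arg) {
    sem_wait(&sem);        /* P operation - enter the critical section */
    ++shared;
    sem_post(&sem);        /* V operation - leave the critical section */
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    sem_init(&sem, 0, 1);  /* pshared=0: shared between threads, initial value 1 */
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL); pthread_join(t2, NULL);
    printf("shared = %d\n", shared);
    sem_destroy(&sem);
    return 0;
}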
/*************************************************************************
AUTHOR: Blaise Barney
**************************************************************************/
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#define NUM_THREADS 5
void *PrintHello(void *threadid)
{
    long tid = (long)threadid;
    printf("Hello World! It's me, thread #%ld!\n", tid);
    pthread_exit(NULL);
}

int main(int argc, char *argv[])
{
    pthread_t threads[NUM_THREADS];
    int rc; long t;
    for (t = 0; t < NUM_THREADS; t++) {
        printf("In main: creating thread %ld\n", t);
        rc = pthread_create(&threads[t], NULL, PrintHello, (void *)t);
        if (rc) {
            printf("ERROR; return code from pthread_create() is %d\n", rc);
            exit(-1);
        }
    }
    /* Last thing that main() should do */
    pthread_exit(NULL);
}
POSIX Threads
■  pthread_create()
□  Run a given function as concurrent activity
□  Operating system scheduler decides upon parallel execution
■  pthread_join()
□  Blocks the caller until the specific thread terminates
□  Allows the caller to determine the exit code passed to pthread_exit()
■  pthread_exit()
□  Implicit call on function return
□  Does not release any resources; cleanup handlers are supported
[Figure: the main thread starts Worker Thread 1 via pthread_create(); the worker terminates with pthread_exit(); the main thread waits for it in pthread_join().]
// #include statements omitted for space reasons
void *BusyWork(void *t) {
    int i; long tid = (long)t; double result = 0.0;
    printf("Thread %ld starting...\n", tid);
    for (i = 0; i < 1000000; i++) { result = result + sin(i) * tan(i); }
    printf("Thread %ld done. Result = %e\n", tid, result);
    pthread_exit((void *)t);
}

int main(int argc, char *argv[]) {
    pthread_t thread[NUM_THREADS]; pthread_attr_t attr;
    int rc; long t; void *status;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
    for (t = 0; t < NUM_THREADS; t++) {
        printf("Main: creating thread %ld\n", t);
        rc = pthread_create(&thread[t], &attr, BusyWork, (void *)t);
        if (rc) {
            printf("ERROR; return code from pthread_create() is %d\n", rc);
            exit(-1);
        }
    }
    pthread_attr_destroy(&attr);
    for (t = 0; t < NUM_THREADS; t++) {
        rc = pthread_join(thread[t], &status);
        if (rc) {
            printf("ERROR; return code from pthread_join() is %d\n", rc);
            exit(-1);
        }
        printf("Main: join with thread %ld, status %ld\n", t, (long)status);
    }
    printf("Main: program completed. Exiting.\n");
    pthread_exit(NULL);
}
POSIX Threads
■  API supports (at least) mutex and condition variable concept
□  Thread synchronization and critical section protection
■  pthread_mutex_init()
□  Initialize new mutex, which is unlocked by default
□  Resource of the surrounding process
■  pthread_mutex_lock() and pthread_mutex_trylock()
□  Lock the mutex for the calling thread
□  Block / return immediately if the mutex is already locked (see the sketch below)
■  pthread_mutex_unlock()
□  Release the mutex
□  Operating system decides which other thread is woken up
□  Focus on speed of operation, no deadlock prevention
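A small sketch (added for illustration, not from the slides) of the non-blocking variant: pthread_mutex_trylock() returns 0 on success and EBUSY if the mutex is already taken, so the thread can do other work instead of blocking.

#include <stdio.h>
#include <errno.h>
#include <pthread.h>

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

void *worker(void *arg) {
    if (pthread_mutex_trylock(&m) == 0) {
        /* got the lock - critical section */
        pthread_mutex_unlock(&m);
    } else {
        /* EBUSY - mutex is taken, do something else instead */
        printf("mutex busy, doing other work\n");
    }
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_mutex_lock(&m);        /* main holds the mutex ...          */
    pthread_create(&t, NULL, worker, NULL);
    pthread_join(t, NULL);         /* ... so the worker observes EBUSY  */
    pthread_mutex_unlock(&m);
    return 0;
}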
// Example: mutex-protected access to shared data
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#define NUMTHREADS 5
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
int sharedData=0; int sharedData2=0;
void *theThread(void *parm)
{
    printf("Thread attempts to lock mutex\n");
    pthread_mutex_lock(&mutex);
    printf("Thread got the mutex lock\n");
    ++sharedData; --sharedData2;
    printf("Thread unlocks mutex\n");
    pthread_mutex_unlock(&mutex);
    return NULL;
}

int main(int argc, char **argv)
{
    pthread_t thread[NUMTHREADS]; int i;
    printf("Main thread attempts to lock mutex\n");
    pthread_mutex_lock(&mutex);
    printf("Main thread got the mutex lock\n");
    for (i = 0; i < NUMTHREADS; ++i) {   /* create NUMTHREADS threads */
        pthread_create(&thread[i], NULL, theThread, NULL);
    }
    printf("Wait a bit until we are 'done' with the shared data\n");
    sleep(3);
    printf("Main thread unlocks mutex\n");
    pthread_mutex_unlock(&mutex);
    for (i = 0; i < NUMTHREADS; ++i) {
        pthread_join(thread[i], NULL);
    }
    pthread_mutex_destroy(&mutex);
    return 0;
}
POSIX API vs. Windows API
POSIX                        Windows
pthread_create()             CreateThread()
pthread_exit()               ExitThread()
pthread_cancel()             TerminateThread()
pthread_mutex_init()         CreateMutex()
pthread_mutex_lock()         WaitForSingleObject()
pthread_mutex_trylock()      WaitForSingleObject(hThread, 0)
Condition variables          Auto-reset events
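A hedged C sketch of the right-hand column (Windows-only; added for illustration, not from the slides):

#include <windows.h>
#include <stdio.h>

HANDLE mutex;

DWORD WINAPI worker(LPVOID param) {
    WaitForSingleObject(mutex, INFINITE);    /* ~ pthread_mutex_lock()   */
    printf("worker in critical section\n");
    ReleaseMutex(mutex);                     /* ~ pthread_mutex_unlock() */
    return 0;                                /* ~ pthread_exit()         */
}

int main(void) {
    mutex = CreateMutex(NULL, FALSE, NULL);  /* ~ pthread_mutex_init()   */
    HANDLE t = CreateThread(NULL, 0, worker, NULL, 0, NULL);
    WaitForSingleObject(t, INFINITE);        /* ~ pthread_join()         */
    CloseHandle(t); CloseHandle(mutex);
    return 0;
}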
Java, .NET, C++, PHP, …
■  High-level languages also offer threading support
■  Interpreted / JIT-compiled languages
□  Rich API for thread management and shared data structures
□  Example: Java Runnable interface, .NET System.Threading
□  Today mostly 1:1 mapping of high-level threads to operating
system threads (native threads)
■  Threads as part of the language definition (C++ 11) vs.
threads as operating system functionality (PThreads)
#include <thread>
#include <iostream>
#include <string>

void write_message(std::string const& message) {
    std::cout << message;
}

int main() {
    std::thread t(write_message, "hello world from std::thread\n");
    write_message("hello world from main\n");
    t.join();
}
Parallel Programming Concepts
OpenHPI Course
Week 3 : Shared Memory Parallelism - Programming
Unit 3.2: Tasks with OpenMP
Dr. Peter Tröger + Teaching Team
Parallel Programming for Shared Memory
■  Different programming models for concurrency with shared memory
■  Processes and threads mapped to processing elements (cores)
■  Task model supports more fine-grained parallelization than with native threads
OpenMP
■  Language extension for C/C++ and Fortran
□  Versions in other languages available
■  Combination of compiler support and run-time library
□  Special compiler instructions in the code (“pragma”)
□  Expression of intended parallelization by the developer
□  Result is a binary that relies on OpenMP functionality
■  OpenMP library responsible for thread management
□  Transparent for application code
□  Additional configuration with environment variables
◊  OMP_NUM_THREADS: Upper limit for threads being used
◊  OMP_SCHEDULE: Scheduling type for parallel activities (see the sketch below)
■  State-of-the-art for portable parallel programming in C
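A minimal sketch of the interplay between code and the environment variables above (illustrative, not from the deck):

/* Build: gcc -fopenmp nthreads.c -o nthreads
   Run:   OMP_NUM_THREADS=4 OMP_SCHEDULE="dynamic,2" ./nthreads */
#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel
    {
        #pragma omp single    /* print once instead of once per thread */
        printf("running with %d threads\n", omp_get_num_threads());
    }
    return 0;
}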
OpenMP
■  Programming with the fork-join model
□  Master thread forks into declared tasks
□  Runtime environment may run them in parallel,
based on dynamic mapping to threads from a pool
□  Worker task barrier before finalization (join)
[Wikipedia]
OpenMP
■  Parallel region
□  Parallel tasks defined in a dedicated code block,
marked by #pragma omp parallel
□  Should have only one entry and one exit point
□  Implicit barrier at beginning and end of the block
[Wikipedia]
Parallel Region
■  Encountering thread for the region generates implicit tasks
■  Task execution may suspend at some scheduling point:
□  At implicit barrier regions or barrier primitives
□  At task / taskwait constructs
□  At the end of a task region
#include <omp.h>
#include <stdio.h>
int main (int argc, char * const argv[]) {
    #pragma omp parallel
    printf("Hello from thread %d, nthreads %d\n",
           omp_get_thread_num(),
           omp_get_num_threads());
    return 0;
}
>> gcc -fopenmp -o omp omp.c
Work Sharing
■  Possibilities for creation of tasks inside a parallel region
□  omp sections - Define code blocks dividable among threads
◊  Implicit barrier at the end
□  omp for - Automatically divide loop iterations into tasks
◊  Implicit barrier at the end
□  omp single / master - Denotes a block executed only by the first
arriving thread or by the master thread, respectively
◊  Implicit barrier at the end
◊  Intended for critical sections
□  omp task - Explicitly define a task (see the sketch after this list)
■  Clause combinations possible:
#pragma omp parallel for
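The constructs above in one hedged sketch (illustrative function names, not from the deck): two independent code blocks as sections, plus an explicit task.

#include <stdio.h>
#include <omp.h>

void part_a(void) { printf("A on thread %d\n", omp_get_thread_num()); }
void part_b(void) { printf("B on thread %d\n", omp_get_thread_num()); }

int main(void) {
    #pragma omp parallel
    {
        #pragma omp sections        /* code blocks divided among threads */
        {
            #pragma omp section
            part_a();
            #pragma omp section
            part_b();
        }                           /* implicit barrier at the end */

        #pragma omp single          /* executed by the first arriving thread */
        {
            #pragma omp task        /* explicit task, mapped to any thread */
            printf("task on thread %d\n", omp_get_thread_num());
        }
    }
    return 0;
}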
Loop Parallelization
■  omp for: Parallel execution of iteration chunks
■  Implications for exception handling, break-out calls, and the continue primitive
■  Mapping of threads to iteration chunk tasks controlled by the schedule clause
■  Large chunks are good for caching and overhead avoidance
■  Small chunks are good for load balancing
#include <math.h>
void compute(int n, float *a, float *b, float *c,
             float *y, float *z)
{
    int i;
    #pragma omp parallel
    {
        #pragma omp for schedule(static) nowait
        for (i = 0; i < n; i++) {
            c[i] = (a[i] + b[i]) / 2.0;
            z[i] = sqrt(c[i]);
            y[i] = z[i-1] + a[i];
        }
    }
}
#pragma omp parallel for
for (i = 0; i < n; i++) {
    value = some_complex_function(a[i]);
    #pragma omp critical
    sum = sum + value;
}
Loop Parallelization
■  schedule (static, [chunk])
□  Contiguous ranges of iterations (chunks) of equal size
□  Low overhead, round robin assignment to threads,
static scheduling
□  Default is one chunk per thread
■  schedule (dynamic, [chunk])
□  Threads grab iterations or chunks on demand (see the sketch below)
□  Higher overhead, but good for unbalanced work load
■  schedule (guided, [chunk])
□  Dynamic schedule, shrinking ranges per step
□  Starts with large block, until minimum chunk size is reached
□  Good for computations with increasing iteration length
(e.g. prime sieve test)
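A sketch of the schedule clause (illustrative, not from the deck; the usleep() call simulates growing per-iteration work): with dynamic scheduling, idle threads grab the next chunk of four iterations, which balances the unequal load.

#include <stdio.h>
#include <unistd.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel for schedule(dynamic, 4)
    for (int i = 0; i < 32; i++) {
        usleep(i * 1000);     /* iteration i takes roughly i milliseconds */
        printf("iteration %2d on thread %d\n", i, omp_get_thread_num());
    }
    return 0;
}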
Data Sharing
■  shared variable: Name provides access in all tasks
□  Only tagging for the runtime, no critical section enforcement
■  private variable: Clone variable in each task
□  Results in one data copy per task
□  firstprivate: Initialization with the last value before the region
□  lastprivate: Result from the last loop cycle or the lexically last section directive (sketch below)
#include <omp.h>
#include <stdio.h>

int main() {
    int nthreads, tid;
    #pragma omp parallel private(tid)
    {
        tid = omp_get_thread_num();
        printf("Hello World from thread = %d\n", tid);
        if (tid == 0) {
            nthreads = omp_get_num_threads();
            printf("Number of threads = %d\n", nthreads);
        }
    }
}
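A small sketch of the two initialization clauses described above (illustrative values, not from the deck):

#include <stdio.h>
#include <omp.h>

int main(void) {
    int x = 10, i;
    #pragma omp parallel for firstprivate(x) lastprivate(i)
    for (i = 0; i < 8; i++) {
        x += i;     /* private copy of x, starting at 10 in every thread */
    }
    printf("after the region: i = %d\n", i);   /* lastprivate: i == 8 */
    return 0;
}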
Memory Model
■  OpenMP considers the memory wall problem
□  Hide memory latency by deferring read / write operations
□  Task view on shared memory is not always consistent
■  Example: Keeping loop variable in a register for efficiency
□  Makes loop variable a thread-specific variable
□  Tasks in other threads see outdated values in memory
■  But: Sometimes a consistent view is demanded
□  flush operation - Finalize all read / write operations,
make view on shared memory consistent
□  Implicit flush on different occasions, such as barriers
□  Complicated issue - check the documentation (a sketch follows below)
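A sketch of an explicit flush, loosely following the producer / consumer pattern from the OpenMP documentation (assumes at least two threads run the region; modern code would prefer atomics):

#include <stdio.h>
#include <omp.h>

int main(void) {
    int done = 0, data = 0;
    #pragma omp parallel num_threads(2)
    {
        if (omp_get_thread_num() == 0) {       /* producer */
            data = 42;
            #pragma omp flush(data)            /* publish data first */
            done = 1;
            #pragma omp flush(done)
        } else {                               /* consumer */
            int ready = 0;
            while (!ready) {
                #pragma omp flush(done)        /* force a fresh read */
                ready = done;
            }
            #pragma omp flush(data)
            printf("data = %d\n", data);       /* consistent view: 42 */
        }
    }
    return 0;
}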
Task Scheduling
■  Classical task scheduling with central queue
□  All worker threads fetch tasks from a central queue
□  Scalability issue with increasing thread (resp. core) count
■  Work stealing in OpenMP (and other libraries)
□  Task queue per thread
□  Idling thread “steals” tasks from another thread
□  Independent from thread scheduling
□  Only mutual synchronization
□  No central queue
[Figure: work stealing. Each thread pushes new tasks into and takes the next task from its own task queue; an idle thread steals work from another thread's queue.]
Parallel Programming Concepts
OpenHPI Course
Week 3 : Shared Memory Parallelism - Programming
Unit 3.3: Beyond OpenMP
Dr. Peter Tröger + Teaching Team
Cilk
■  C language combined with several new keywords
□  True language extension, instead of new compiler pragmas
□  Developed at MIT since 1994
□  Initial commercial version Cilk++ with C / C++ support
■  Since 2010, offered by Intel as Cilk Plus
□  Official language specification to foster other implementations
□  Support for Windows, Linux, and MacOS X
■  Basic concept of serialization
□  Cilk keywords may be replaced by empty operations
□  Leads to non-concurrent code
□  Promise that code semantics remain the same (see the sketch below)
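The serialization promise can be made tangible with a plain C preprocessor sketch (illustrative; Cilk Plus ships a comparable stub header): mapping the keywords to empty operations must leave a valid sequential program with the same semantics.

#define cilk_spawn      /* spawn becomes a plain function call */
#define cilk_sync       /* sync becomes an empty statement */

#include <stdio.h>

void do_work(int i) { printf("work %d\n", i); }

int main(void) {
    for (int i = 0; i < 8; i++) {
        cilk_spawn do_work(i);   /* after elision: do_work(i); */
    }
    cilk_sync;                   /* after elision: ; */
    return 0;
}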
Intel Cilk Plus
■  Three keywords to express potential parallelism
■  cilk_spawn: Asynchronous function call
□  Runtime decides, spawning is not mandated
■  cilk_sync: Wait until all spawned calls are completed
□  Barrier for cilk_spawn activity
■  cilk_for: Allows loop iterations to be performed in parallel
□  Runtime decides, parallelization is not mandated
// Variant 1: parallel loop
cilk_for (int i=0; i<8; ++i)
{
    do_work(i);
}

// Variant 2: explicit spawn per iteration
for (int i=0; i<8; ++i)
{
    cilk_spawn do_work(i);
}
cilk_sync;

// Serialization: keywords elided, the sequential loop remains
for (int i=0; i<8; ++i)
{
    do_work(i);
}
Intel Cilk Plus
■  Cilk supports the high-level expression of array operations
□  Gives the runtime a chance to parallelize work
□  Intended for SIMD-style operations without any ordering constraints
■  New operator [:]
□  Specify data parallelism on an array
□  array-expression[lower-bound : length : stride]
□  Multi-dimensional sections are supported: a[:][:]
■  Short-hand description for complex loops
□  A[:] = 5;  is equivalent to  for (i = 0; i < 10; i++) A[i] = 5;
□  A[0:n] = 5;
□  A[0:5:2] = 5;  is equivalent to  for (i = 0; i < 10; i += 2) A[i] = 5;
□  A[:] = B[:];
□  A[:] = B[:] + 5;
□  D[:] = A[:] + B[:];
□  func (A[:]);
Intel Threading Building Blocks (TBB)
■  Portable C++ library, toolkit for different operating systems
■  Also available as open source version
■  Complements basic OpenMP / Cilk features
□  Loop parallelization, synchronization, explicit tasks
■  High-level concurrent containers, recursion support
□  hash map, queue, vector, set
■  High-level parallel operations
□  Prefix scan, sorting, data-flow pipelining, reduction, …
□  Scalable library implementation, based on tasks
■  Unfair scheduling approach
□  Consider data locality, optimize for cache utilization
■  Comparable: Microsoft C++ Concurrency Runtime
class FibTask: public task {
public:
    const long n;
    long* const sum;
    FibTask( long n_, long* sum_ ) :
        n(n_), sum(sum_)
    {}
    task* execute() {   // Overrides task::execute
        long x, y;
        FibTask& a = *new( allocate_child() ) FibTask(n-1, &x);
        FibTask& b = *new( allocate_child() ) FibTask(n-2, &y);
        // Set ref_count to 'two children plus one for the wait'.
        set_ref_count(3);
        // Start b running.
        spawn( b );
        // Start a running and wait for all children (a and b).
        spawn_and_wait_for_all( a );
        // Do the sum
        *sum = x + y;
        return NULL;
    }
};
[intel.com]
fib(n) = fib(n-1) + fib(n-2)
#include "tbb/compat/thread"
#include "tbb/tbb_allocator.h" // zero_allocator defined here
#include "tbb/atomic.h"
#include "tbb/concurrent_vector.h"
using namespace tbb;
typedef concurrent_vector<atomic<Foo*>,
zero_allocator<atomic<Foo*> > > FooVector;
Foo* WaitForElement( const FooVector& v, size_t i ) {
    // Wait for ith element to be allocated
    while( i >= v.size() )
        std::this_thread::yield();
    // Wait for ith element to be constructed
    while( v[i] == NULL )
        std::this_thread::yield();
    return v[i];
}
[intel.com]
Waiting for an element
Task-Based Concurrency in Java
■  Major concepts introduced with Java 5
■  Abstraction of task management with Executors
□  java.util.concurrent.Executor
□  Implementing object provides execute() method
□  Can execute submitted Runnable tasks
□  No assumption on where the task runs,
typically in managed thread pool
□  ThreadPoolExecutor provided by class library
■  java.util.concurrent.ExecutorService
□  Additional submit() function, which returns a Future object
■  Methods for submitting large collections of Callable objects
High-Level Concurrency
Code comparison: Microsoft Parallel Patterns Library vs. java.util.concurrent
Functional Programming
■  Paradigm contrary to imperative programming
□  Program is a large set of functions
□  All these functions just map input to output
□  Treats execution as collection of function evaluations
■  Foundations in lambda calculus (1930‘s) and Lisp (late 50’s)
■  Side-effect free computation when functions have no local state
□  Function result depends only on input, not on shared data
□  Order of function evaluation becomes irrelevant
□  Automated parallelization can freely schedule the work
□  Race conditions become less probable
■  Trend to add functional programming paradigms in
imperative languages (anonymous functions, filter, map, …)
Imperative to Functional
alert("get the lobster");
PutInPot("lobster");
PutInPot("water");

alert("get the chicken");
BoomBoom("chicken");
BoomBoom("coconut");

Optimize:

function Cook( i1, i2, f ) {
  alert("get the " + i1);
  f(i1); f(i2);
}

Cook( "lobster", "water", PutInPot );
Cook( "chicken", "coconut", BoomBoom );

•  Higher-order functions
•  Functions as argument or return value
•  Execution of a parameter function may be transparently parallelized

http://www.joelonsoftware.com/items/2006/08/01.html
Imperative to Functional
alert("get the lobster");
PutInPot("lobster");
PutInPot("water");

alert("get the chicken");
BoomBoom("chicken");
BoomBoom("coconut");

Optimize:

function Cook( i1, i2, f ) {
  alert("get the " + i1);
  f(i1); f(i2);
}

Cook("lobster", "water",
     function(x) { alert("pot " + x); });
Cook("chicken", "coconut",
     function(x) { alert("boom " + x); });

•  Anonymous functions
•  Also lambda function or function literal
•  Convenient tool when higher-order functions are supported

http://www.joelonsoftware.com/items/2006/08/01.html
Functional Programming
■  Higher order functions: Functions as argument or return value
■  Pure functions: No memory or I/O side effects
□  Constant result with side-effect free parameters
□  All functions (with available input) can run in parallel
■  Many functional languages are (again) available today
□  JVM-based: Clojure, Scala (parts of it), …
□  Common Lisp, Erlang, F#, Haskell, ML, Ocaml, Scheme, …
■  Functional constructs (map, reduce, filter, folding, iterators,
immutable variables, …) in all popular languages (see week 6)
■  Perfect foundation for implicit parallelism
□  Instead of spawning tasks / threads, let the runtime decide
□  Demands discipline to produce pure functions
Parallel Programming Concepts
OpenHPI Course
Week 3 : Shared Memory Parallelism - Programming
Unit 3.4: Scala
Dr. Peter Tröger + Teaching Team
Scala – “Scalable Language”
■  Example for the combination of OO and functional programming
□  Expressions, statements, blocks as in Java
□  Every value is an object, every operation is a method call
□  Functions as first-class concept
□  Programmer chooses concurrency syntax
□  Task-based parallelism supported by the language
■  Compiles to JVM (or .NET) byte code
■  Most language constructs are library functions
■  Interacts with class library of the runtime environment
■  Twitter moved some parts from Ruby to Scala in 2009
object HelloWorld extends App {
println("Hello, world!")
}
Scala Basics
■  All data types are objects, all operations are methods
■  Operator / infix notation
□  7.5 - 1.5
□  "hello" + "world"
■  Object notation
□  ("hello").+("world")
■  Implicit conversions, some given by default
□  ("hello") * 5
□  0.until(3), i.e. 0 until 3
□  (1 to 4).foreach(println)
■  Type inference
□  var name = "Foo"
■  Immutable variables with "val"
□  val name = "Scala"
Functions in Scala
■  Functions as first-class value
■  Possible to be passed as parameter, or used as result
■  () return value for procedures
def sumUpRange(f: Int => Int, a: Int, b: Int): Int =
  if (a > b) 0 else f(a) + sumUpRange(f, a + 1, b)
def id(x: Int): Int = x
def sumUpIntRange (a: Int, b: Int): Int = sumUpRange(id, a, b)
def square(x: Int): Int = x * x
def sumSquareR (a: Int, b: Int): Int = sumUpRange(square, a, b)
■  Anonymous functions, type deduction
def sumUpSquares(a: Int, b: Int): Int =
sumUpRange(x => x * x, a, b)
Example: Quicksort
■  Recursive implementation of Quicksort
■  Similar to other imperative languages
■  swap as procedure with empty result ()
■  Functions in functions
■  Read-only value definition with val
def sort(xs: Array[Int]) {
  def swap(i: Int, j: Int) {
    val t = xs(i)
    xs(i) = xs(j); xs(j) = t
    ()
  }
  def sort_recursive(l: Int, r: Int) {
    val pivot = xs((l + r) / 2)
    var i = l; var j = r
    while (i <= j) {
      while (xs(i) < pivot) i += 1
      while (xs(j) > pivot) j -= 1
      if (i <= j) {
        swap(i, j); i += 1; j -= 1
      }
    }
    if (l < j) sort_recursive(l, j)
    if (i < r) sort_recursive(i, r)
  }
  sort_recursive(0, xs.length - 1)
}
Example: Quicksort
■  Functional style (same complexity, higher memory consumption)
□  Return empty / single element array as already sorted
□  Partition array elements according to pivot element
□  Higher-order function filter takes predicate function (“pivot > x”)
as argument and applies it for filtering
□  Sorting of sub-arrays with predefined sort function
def sort(xs: Array[Int]): Array[Int] = {
  if (xs.length <= 1)
    xs
  else {
    val pivot = xs(xs.length / 2)
    Array.concat(sort(xs filter (pivot >)),
                 xs filter (pivot ==),
                 sort(xs filter (pivot <)))
  }
}
Concurrent Programming with Scala
■  Implicit superclass is scala.AnyRef, provides typical monitor functions
scala> classOf[AnyRef].getMethods.foreach(println)
def wait()
def wait(msec: Long)
def notify()
def notifyAll()
■  Synchronized function, argument expression as critical section
def synchronized[A] (e: => A): A
■  Synchronized variable with put, blocking get and unset
val v=new scala.concurrent.SyncVar()
■  Futures, reader / writer locks, semaphores, ...
val x = future(someLengthyComputation)
anotherLengthyComputation
val y = f(x()) + g(x())
■  Explicit parallelism through spawn (expr) and Actor concept (see week 5)
Parallel Collections
scala> var sum = 0
sum: Int = 0
scala> val list = (1 to 1000).toList.par
list: scala.collection.parallel.immutable.ParSeq[Int] =
ParVector(1, 2, 3,…
scala> list.foreach(sum += _); sum
res01: Int = 467766
scala> var sum = 0
sum: Int = 0
scala> list.foreach(sum += _); sum
res02: Int = 457073
scala> var sum = 0
sum: Int = 0
scala> list.foreach(sum += _); sum
res03: Int = 468520
■  foreach on a parallel collection data type is automatically parallelized
■  Rich support for different data structures
■  Example: Parallel collections can still lead to race conditions
□  The "+=" operator reads and writes the same variable
Parallel Programming Concepts
OpenHPI Course
Week 3 : Shared Memory Parallelism - Programming
Unit 3.5: Partitioned Global Address Space
Dr. Peter Tröger + Teaching Team
Memory Model
■  Concurrency as a first-class language citizen
□  Demands a memory model for the language
◊  When is a written value visible?
◊  Example: OpenMP flush directive
□  Leads to ‘promises’ about the memory access behavior
□  Beyond them, compiler and ILP can optimize the code
■  Proper memory model brings predictable concurrency
■  Examples
□  X86 processor machine code specification (native code)
□  C++11 language specification (native code)
□  C# language specification
□  Java memory model specification in JSR-133
■  Is this enough?
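For this deck's C examples, the closest answer is the C11 model, which follows the same acquire / release ideas as C++11. A hedged sketch (illustration, not from the slides) of the 'promise' such a model gives:

/* Build: gcc -std=c11 -pthread model.c */
#include <stdio.h>
#include <stdatomic.h>
#include <pthread.h>

int data = 0;
atomic_int flag = 0;

void *producer(void *arg) {
    data = 42;
    atomic_store_explicit(&flag, 1, memory_order_release);
    return NULL;
}

void *consumer(void *arg) {
    while (atomic_load_explicit(&flag, memory_order_acquire) == 0)
        ;                                /* spin until published */
    printf("data = %d\n", data);         /* guaranteed to be 42 */
    return NULL;
}

int main(void) {
    pthread_t p, c;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL); pthread_join(c, NULL);
    return 0;
}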
NUMA
■  Eight cores on 2 sockets in an SMP system
■  Memory controllers + chip interconnect realize a single memory
address space for the software
[Figure: two sockets, each with four cores (per-core L1 and L2 caches, shared L3 cache) and a memory controller with local RAM; a chip interconnect joins the sockets into a single memory address space.]
PGAS Languages
■  Non-uniform memory architectures (NUMA) became default
■  But: Understanding of memory in programming is flat
□  All variables are equal in access time
□  Considering the memory hierarchy is low-level coding
(e.g. cache-aware programming)
■  Partitioned global address space (PGAS) approach
□  Driven by high-performance computing community
□  Modern approach for large-scale NUMA
□  Explicit notion of memory partition per processor
◊  Data is designated as local (near) or global (possibly far)
◊  Programmer is aware of NUMA nodes
□  Performance optimization for deep memory hierarchies
PGAS Languages and Libraries
■  PGAS languages
□  Unified Parallel C (ANSI C)
□  Co-Array Fortran / Fortress (F90)
□  Titanium (Java), Chapel (Cray), X10 (IBM), …
□  All research; no widespread industry-level solution
■  Core data management functionality can be re-used as library
□  Global Arrays (GA) Toolkit
□  Global-Address Space Networking (GASNet)
◊  Used by many PGAS languages –
UPC, Co-Array Fortran, Titanium, Chapel
□  Aggregate Remote Memory Copy Interface (ARMCI)
□  Kernel Lattice Parallelism (KeLP)
X10
■  Parallel object-oriented PGAS language by IBM
□  Java derivate, compiles to C++ or pure Java code
□  Different binaries can interact through common runtime
□  Transport: Shared memory, TCP/IP, MPI, CUDA, …
□  Linux, MacOS X, Windows, AIX; X86, Power
□  Full developer support with Eclipse environment
■  Fork-join execution model
■  One application instance runs at a fixed number of places
□  Each place represents a NUMA node
□  Distinguishing between place-local and global data
□  main() method runs automatically at place 0
□  Each place has a private copy of static variables
X10
[Figure: APGAS in X10 - places and tasks. Places 0…N each run activities on a local heap; GlobalRef[T] spans the distributed heap; task parallelism via async S and finish S; place-shifting via at(p) S; concurrency control within a place via when(c) S and atomic S.]
■  Parallel tasks, each operating in one place of the PGAS
□  Direct variable access only in local place
■  Implementation strategy is flexible
□  One operating system process per place, manages thread pool
□  Work-stealing scheduler
[IBM]
X10
■  async S
□  Creates a new task that executes S, returns immediately
□  S may reference all variables in the enclosing block
□  Runtime chooses a (NUMA) place for execution
■  finish S
□  Execute S and wait for all transitively spawned tasks (barrier)
[IBM]
Example
public class Fib {
  public static def fib(n:int) {
    if (n <= 2) return 1;
    val f1:int;
    val f2:int;
    finish {
      async { f1 = fib(n-1); }
      f2 = fib(n-2);
    }
    return f1 + f2;
  }
  public static def main(args:Array[String](1)) {
    val n = (args.size > 0) ? int.parse(args(0)) : 10;
    Console.OUT.println("Computing Fib("+n+")");
    val f = fib(n);
    Console.OUT.println("Fib("+n+") = "+f);
  }
}
X10
■  atomic S
□  Execute S atomically, with respect to all other atomic blocks
□  S must access only local data
■  when(c) S
□  Suspend current task until c, then execute S atomically
class Buffer[T]{T isref, T haszero} {
  protected var date:T = null;
  public def send(v:T){v != null} {
    when (date == null) {
      date = v;
    }
  }
  public def receive() {
    when (date != null) {
      val v = date;
      date = null;
      return v;
    }
  }
}

class Account {
  public var value:Int;
  def transfer(src:Account, v:Int) {
    atomic {
      src.value -= v;
      this.value += v;
    }
  }
}
Tasks Can Move
■  at(p) S - Execute statement S at place p, block current task
■  at(p) e - Evaluate expression e at place p and return the result
■  at(p) async S
□  Create new task at p to run S, return immediately
Summary: Week 3
■  Short overview of shared memory parallel programming ideas
■  Different levels of abstractions
□  Process model, thread model, task model
■  Threads for concurrency and parallelization
□  Standardized POSIX interface
□  Java / .NET concurrency functionality
■  Tasks for concurrency and parallelization
□  OpenMP for C / C++, Java, .NET, Cilk, …
■  Functional language constructs for implicit parallelism
■  PGAS languages for NUMA optimization
Specialized languages help the programmer to achieve speedup.
What about correspondingly specialized parallel hardware?
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptx
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 

OpenHPI - Parallel Programming Concepts - Week 3

  • 7. POSIX Threads ■  pthread_create() □  Run a given function as concurrent activity □  Operating system scheduler decides upon parallel execution ■  pthread_join() □  Blocks the caller until the specific thread terminates □  Allows to determine exit code from pthread_exit() ■  pthread_exit() □  Implicit call on function return □  No release of any resources, cleanup handlers supported [Figure: main thread calls pthread_create(), then blocks in pthread_join(); worker thread 1 runs until pthread_exit()] 7 OpenHPI | Parallel Programming Concepts | Dr. Peter Tröger
  • 8.
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
/* Includes above and the constant below were omitted on the slide
   for space reasons; the value 4 is an assumption. */
#define NUM_THREADS 4

void *BusyWork(void *t)
{
   int i;
   long tid;
   double result = 0.0;
   tid = (long)t;
   printf("Thread %ld starting...\n", tid);
   for (i = 0; i < 1000000; i++) {
      result = result + sin(i) * tan(i);
   }
   printf("Thread %ld done. Result = %e\n", tid, result);
   pthread_exit((void*) t);
}

int main(int argc, char *argv[])
{
   pthread_t thread[NUM_THREADS];
   pthread_attr_t attr;
   int rc;
   long t;
   void *status;
   /* Create the threads in an explicitly joinable state */
   pthread_attr_init(&attr);
   pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
   for (t = 0; t < NUM_THREADS; t++) {
      printf("Main: creating thread %ld\n", t);
      rc = pthread_create(&thread[t], &attr, BusyWork, (void *)t);
      if (rc) {
         printf("ERROR; return code from pthread_create() is %d\n", rc);
         exit(-1);
      }
   }
   pthread_attr_destroy(&attr);
   /* Wait for the workers and collect their exit codes */
   for (t = 0; t < NUM_THREADS; t++) {
      rc = pthread_join(thread[t], &status);
      if (rc) {
         printf("ERROR; return code from pthread_join() is %d\n", rc);
         exit(-1);
      }
      printf("Main: join with thread %ld, status %ld\n", t, (long)status);
   }
   printf("Main: program completed. Exiting.\n");
   pthread_exit(NULL);
}
  • 9. POSIX Threads ■  API supports (at least) mutex and condition variable concept □  Thread synchronization and critical section protection ■  pthread_mutex_init() □  Initialize new mutex, which is unlocked by default □  Resource of the surrounding process ■  pthread_mutex_lock() and pthread_mutex_trylock() □  Lock the mutex for the calling thread □  Block / do not block if the mutex is already locked ■  pthread_mutex_unlock() □  Release the mutex □  Operating system decides which other thread is woken up □  Focus on speed of operation, no deadlock prevention 9 OpenHPI | Parallel Programming Concepts | Dr. Peter Tröger
  • 10.
#include <pthread.h>   /* include restored; omitted on the slide for space reasons */
#include <unistd.h>    /* for sleep() */
#include <stdio.h>
#define NUMTHREADS 5
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
int sharedData = 0;
int sharedData2 = 0;

void *theThread(void *parm)
{
   printf("Thread attempts to lock mutex\n");
   pthread_mutex_lock(&mutex);
   printf("Thread got the mutex lock\n");
   ++sharedData;
   --sharedData2;
   printf("Thread unlocks mutex\n");
   pthread_mutex_unlock(&mutex);
   return NULL;
}

int main(int argc, char **argv)
{
   pthread_t thread[NUMTHREADS];
   int i;
   printf("Main thread attempts to lock mutex\n");
   pthread_mutex_lock(&mutex);
   printf("Main thread got the mutex lock\n");
   for (i = 0; i < NUMTHREADS; ++i) {   /* create NUMTHREADS threads */
      pthread_create(&thread[i], NULL, theThread, NULL);
   }
   printf("Wait a bit until we are 'done' with the shared data\n");
   sleep(3);
   printf("Main thread unlocks mutex\n");
   pthread_mutex_unlock(&mutex);
   for (i = 0; i < NUMTHREADS; ++i) {
      pthread_join(thread[i], NULL);
   }
   pthread_mutex_destroy(&mutex);
   return 0;
}
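The deck lists condition variables among the PThreads functionality but shows no example of them. The following is a minimal sketch (added for this write-up, not from the original slides) of the usual wait/signal handshake; the shared flag ready and the producer role are illustrative assumptions.
#include <pthread.h>
#include <stdio.h>

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
int ready = 0;   /* hypothetical shared flag, guarded by the mutex */

void *producer(void *arg)
{
   pthread_mutex_lock(&lock);
   ready = 1;                    /* publish the state change ... */
   pthread_cond_signal(&cond);   /* ... and wake one waiting thread */
   pthread_mutex_unlock(&lock);
   return NULL;
}

int main(void)
{
   pthread_t t;
   pthread_create(&t, NULL, producer, NULL);
   pthread_mutex_lock(&lock);
   while (!ready)                        /* loop guards against spurious wakeups */
      pthread_cond_wait(&cond, &lock);   /* atomically unlocks the mutex and blocks */
   printf("Consumer observed ready == %d\n", ready);
   pthread_mutex_unlock(&lock);
   pthread_join(t, NULL);
   return 0;
}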
  • 11. POSIX API vs. Windows API
        POSIX                       Windows
        pthread_create()            CreateThread()
        pthread_exit()              ExitThread()
        pthread_cancel()            TerminateThread()
        pthread_mutex_init()        CreateMutex()
        pthread_mutex_lock()        WaitForSingleObject()
        pthread_mutex_trylock()     WaitForSingleObject(hThread, 0)
        Condition variables         Auto-reset events
  11 OpenHPI | Parallel Programming Concepts | Dr. Peter Tröger
  • 12. Java, .NET, C++, PHP, … ■  High-level languages also offer threading support ■  Interpreted / JIT-compiled languages □  Rich API for thread management and shared data structures □  Example: Java Runnable interface, .NET System.Threading □  Today mostly 1:1 mapping of high-level threads to operating system threads (native threads) ■  Threads as part of the language definition (C++11) vs. threads as operating system functionality (PThreads) 12 OpenHPI | Parallel Programming Concepts | Dr. Peter Tröger
#include <thread>
#include <iostream>
#include <string>
void write_message(std::string const& message) {
   std::cout << message;
}
int main() {
   std::thread t(write_message, "hello world from std::thread\n");
   write_message("hello world from main\n");
   t.join();
}
  • 13. Parallel Programming Concepts OpenHPI Course Week 3 : Shared Memory Parallelism - Programming Unit 3.2: Tasks with OpenMP Dr. Peter Tröger + Teaching Team
  • 14. Parallel Programming for Shared Memory ■  Different programming models for concurrency with shared memory ■  Processes and threads mapped to processing elements (cores) ■  Task model supports more fine-grained parallelization than with native threads [Figure: concurrent processes with explicitly shared memory, concurrent threads sharing one process' memory, and concurrent tasks mapped onto threads] 14 OpenHPI | Parallel Programming Concepts | Dr. Peter Tröger
  • 15. OpenMP ■  Language extension for C/C++ and Fortran □  Versions in other languages available ■  Combination of compiler support and run-time library □  Special compiler instructions in the code (“pragma”) □  Expression of intended parallelization by the developer □  Result is a binary that relies on OpenMP functionality ■  OpenMP library responsible for thread management □  Transparent for application code □  Additional configuration with environment variables ◊  OMP_NUM_THREADS: Upper limit for threads being used ◊  OMP_SCHEDULE: Scheduling type for parallel activities ■  State-of-the-art for portable parallel programming in C 15 OpenHPI | Parallel Programming Concepts | Dr. Peter Tröger
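As a small hedged illustration (not part of the deck) of the OMP_NUM_THREADS variable mentioned above, the following program just reports its team size; the file and binary names are assumptions.
#include <omp.h>
#include <stdio.h>

int main(void)
{
   /* The runtime honors OMP_NUM_THREADS unless the code overrides it */
   #pragma omp parallel
   {
      #pragma omp single
      printf("Team size: %d threads\n", omp_get_num_threads());
   }
   return 0;
}
Possible usage: gcc -fopenmp -o team team.c followed by OMP_NUM_THREADS=2 ./team should report a team of two threads.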
  • 16. OpenMP ■  Programming with the fork-join model □  Master thread forks into declared tasks □  Runtime environment may run them in parallel, based on dynamic mapping to threads from a pool □  Worker task barrier before finalization (join) 16 OpenHPI | Parallel Programming Concepts | Dr. Peter Tröger [Wikipedia]
  • 17. OpenMP ■  Parallel region □  Parallel tasks defined in a dedicated code block, marked by #pragma omp parallel □  Should have only one entry and one exit point □  Implicit barrier at beginning and end of the block 17 [Wikipedia] OpenHPI | Parallel Programming Concepts | Dr. Peter Tröger
  • 18. Parallel Region ■  Encountering thread for the region generates implicit tasks ■  Task execution may suspend at some scheduling point: □  At implicit barrier regions or barrier primitives □  At task / taskwait constructs □  At the end of a task region 18
#include <omp.h>
#include <stdio.h>
int main (int argc, char * const argv[]) {
   #pragma omp parallel
   printf("Hello from thread %d, nthreads %d\n",
          omp_get_thread_num(), omp_get_num_threads());
   return 0;
}
>> gcc -fopenmp -o omp omp.c
OpenHPI | Parallel Programming Concepts | Dr. Peter Tröger
  • 19. Work Sharing ■  Possibilities for creation of tasks inside a parallel region □  omp sections - Define code blocks dividable among threads ◊  Implicit barrier at the end □  omp for - Automatically divide loop iterations into tasks ◊  Implicit barrier at the end □  omp single / master - Denotes a block executed only by the first arriving thread (single) or only by the master thread (master) ◊  Implicit barrier at the end ◊  Intended for critical sections □  omp task - Explicitly define a task ■  Clause combinations possible: #pragma omp parallel for (see the sketch below) 19 OpenHPI | Parallel Programming Concepts | Dr. Peter Tröger
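A short sketch (added here, not from the deck) that combines two of these constructs inside one parallel region; the printed messages are placeholders.
#include <omp.h>
#include <stdio.h>

int main(void)
{
   #pragma omp parallel
   {
      /* Each section block is executed by exactly one thread */
      #pragma omp sections
      {
         #pragma omp section
         printf("work_a on thread %d\n", omp_get_thread_num());
         #pragma omp section
         printf("work_b on thread %d\n", omp_get_thread_num());
      }   /* implicit barrier at the end of the sections construct */

      /* Only one (arbitrary) thread executes the single block */
      #pragma omp single
      printf("single block on thread %d\n", omp_get_thread_num());
   }
   return 0;
}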
  • 20. Loop Parallelization ■  omp for: Parallel execution of iteration chunks ■  Implications on exception handling, break-out calls and the continue primitive ■  Mapping of threads to iteration chunk tasks controlled by schedule clause ■  Large chunks are good for caching and overhead avoidance ■  Small chunks are good for load balancing 20
#include <math.h>
void compute(int n, float *a, float *b, float *c, float *y, float *z)
{
   int i;
   #pragma omp parallel
   {
      #pragma omp for schedule(static) nowait
      for (i=0; i<n; i++) {
         c[i] = (a[i] + b[i]) / 2.0;
         z[i] = sqrt(c[i]);
         y[i] = z[i-1] + a[i];
      }
   }
}

#pragma omp parallel for
for(i=0; i<n; i++) {
   value = some_complex_function(a[i]);
   #pragma omp critical
   sum = sum + value;
}
OpenHPI | Parallel Programming Concepts | Dr. Peter Tröger
  • 21. Loop Parallelization ■  schedule (static, [chunk]) □  Contiguous ranges of iterations (chunks) of equal size □  Low overhead, round robin assignment to threads, static scheduling □  Default is one chunk per thread ■  schedule (dynamic, [chunk]) □  Threads grab iterations or chunks on demand □  Higher overhead, but good for unbalanced work load ■  schedule (guided, [chunk]) □  Dynamic schedule, shrinking ranges per step □  Starts with large blocks, until minimum chunk size is reached □  Good for computations with increasing iteration length (e.g. prime sieve test) ■  A dynamic schedule is sketched below 21 OpenHPI | Parallel Programming Concepts | Dr. Peter Tröger
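As a hedged illustration (not on the slide), a dynamic schedule pays off when iteration costs are skewed; the workload function busy() is a made-up placeholder.
#include <omp.h>
#include <stdio.h>

/* Hypothetical unbalanced workload: later iterations do more work */
static double busy(int i)
{
   double x = 0.0;
   for (int k = 0; k < i * 1000; k++)
      x += k * 0.5;
   return x;
}

int main(void)
{
   double sum = 0.0;
   /* Chunks of 4 iterations are handed out on demand, balancing the
      skewed iteration costs across the thread team */
   #pragma omp parallel for schedule(dynamic, 4) reduction(+:sum)
   for (int i = 0; i < 1000; i++)
      sum += busy(i);
   printf("sum = %f\n", sum);
   return 0;
}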
  • 22. Data Sharing ■  shared variable: Name provides access in all tasks □  Only tagging for the runtime, no critical section enforcement ■  private variable: Clone variable in each task □  Results in one data copy per task □  firstprivate: Initialization with last value before region □  lastprivate: Result from last loop cycle or lexically last section directive 22
#include <omp.h>
#include <stdio.h>
int main ()
{
   int nthreads, tid;
   #pragma omp parallel private(tid)
   {
      tid = omp_get_thread_num();
      printf("Hello World from thread = %d\n", tid);
      if (tid == 0) {
         nthreads = omp_get_num_threads();
         printf("Number of threads = %d\n", nthreads);
      }
   }
}
OpenHPI | Parallel Programming Concepts | Dr. Peter Tröger
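A minimal sketch (added for this write-up) of the firstprivate and lastprivate clauses named above; the variable names are illustrative.
#include <omp.h>
#include <stdio.h>

int main(void)
{
   int offset = 100;   /* copied into every task by firstprivate */
   int last = -1;      /* written back from the last iteration by lastprivate */

   #pragma omp parallel for firstprivate(offset) lastprivate(last)
   for (int i = 0; i < 8; i++) {
      last = i + offset;   /* each thread works on its own private copies */
   }
   /* 'last' now holds the value of the sequentially last iteration
      (i == 7), i.e. 107 */
   printf("last = %d\n", last);
   return 0;
}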
  • 23. Memory Model ■  OpenMP considers the memory wall problem □  Hide memory latency by deferring read / write operations □  Task view on shared memory is not always consistent ■  Example: Keeping loop variable in a register for efficiency □  Makes loop variable a thread-specific variable □  Tasks in other threads see outdated values in memory ■  But: Sometimes a consistent view is demanded □  flush operation - Finalize all read / write operations, make view on shared memory consistent □  Implicit flush on different occasions, such as barriers □  Complicated issue, check documentation 23 OpenHPI | Parallel Programming Concepts | Dr. Peter Tröger
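A sketch (not from the deck) of the hand-rolled flag handshake that flush is meant for, following the producer/consumer pattern from the OpenMP examples; it assumes the runtime grants the requested two threads.
#include <omp.h>
#include <stdio.h>

int main(void)
{
   int data = 0, flag = 0;
   /* The consumer spins, so a one-thread team that ran the consumer
      section first would hang; num_threads(2) is therefore requested */
   #pragma omp parallel sections num_threads(2)
   {
      #pragma omp section
      {   /* producer */
         data = 42;
         #pragma omp flush(data)   /* commit data before the flag */
         flag = 1;
         #pragma omp flush(flag)   /* then publish the flag */
      }
      #pragma omp section
      {   /* consumer */
         int seen;
         do {
            #pragma omp flush(flag)
            seen = flag;
         } while (!seen);
         #pragma omp flush(data)   /* re-read a consistent view */
         printf("data = %d\n", data);
      }
   }
   return 0;
}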
  • 24. Task Scheduling ■  Classical task scheduling with central queue □  All worker threads fetch tasks from a central queue □  Scalability issue with increasing thread (and core) count ■  Work stealing in OpenMP (and other libraries) □  Task queue per thread □  An idling thread “steals” tasks from another thread □  Independent from thread scheduling □  Only mutual synchronization □  No central queue [Figure: work stealing; each thread pushes new tasks to its own task queue and pops the next task from it, while idle threads steal from other queues] 24
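What such a work-stealing runtime actually schedules are explicit tasks; a minimal hedged sketch with OpenMP 3.0 tasks (added here; the cutoff-free recursion is deliberately naive).
#include <omp.h>
#include <stdio.h>

/* Naive Fibonacci: each left branch of the recursion becomes a task */
static long fib(int n)
{
   long x, y;
   if (n <= 2) return 1;
   #pragma omp task shared(x)
   x = fib(n - 1);
   y = fib(n - 2);
   #pragma omp taskwait   /* scheduling point: wait for the child task */
   return x + y;
}

int main(void)
{
   long f = 0;
   #pragma omp parallel
   {
      #pragma omp single   /* one thread seeds the task tree */
      f = fib(20);
   }
   printf("fib(20) = %ld\n", f);
   return 0;
}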
  • 25. Parallel Programming Concepts OpenHPI Course Week 3 : Shared Memory Parallelism - Programming Unit 3.3: Beyond OpenMP Dr. Peter Tröger + Teaching Team
  • 26. Cilk ■  C language combined with several new keywords □  True language extension, instead of new compiler pragmas □  Developed at MIT since 1994 □  Initial commercial version Cilk++ with C / C++ support ■  Since 2010, offered by Intel as Cilk Plus □  Official language specification to foster other implementations □  Support for Windows, Linux, and MacOS X ■  Basic concept of serialization □  Cilk keywords may be replaced by empty operations □  Leads to non-concurrent code □  Promise that code semantics remain the same 26 OpenHPI | Parallel Programming Concepts | Dr. Peter Tröger
  • 27. Intel Cilk Plus ■  Three keywords to express potential parallelism ■  cilk_spawn: Asynchronous function call □  Runtime decides, spawning is not mandated ■  cilk_sync: Wait until all spawned calls are completed □  Barrier for cilk_spawn activity ■  cilk_for: Allows loop iterations to be performed in parallel □  Runtime decides, parallelization is not mandated 27 OpenHPI | Parallel Programming Concepts | Dr. Peter Tröger cilk_for (int i=0; i<8; ++i) { do_work(i); } for (int i=0; i<8; ++i) { cilk_spawn do_work(i); } cilk_sync; for (int i=0; i<8; ++i) { do_work(i); }
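A hedged sketch (not on the slide) of the classic recursive example with these keywords; it assumes a Cilk Plus-capable compiler, e.g. icc or a GCC cilkplus branch.
#include <cilk/cilk.h>
#include <stdio.h>

long fib(int n)
{
   if (n < 2) return n;
   /* The runtime may run the spawned call in parallel, but need not */
   long x = cilk_spawn fib(n - 1);
   long y = fib(n - 2);
   cilk_sync;   /* wait for the spawned child before using x */
   return x + y;
}

int main(void)
{
   printf("fib(30) = %ld\n", fib(30));
   return 0;
}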
  • 28. Intel Cilk Plus ■  Cilk supports the high-level expression of array operations □  Gives the runtime a chance to parallelize work □  Intended for SIMD-style operations without any ordering constraints ■  New operator [:] □  Specify data parallelism on an array □  array-expression[lower-bound : length : stride] □  Multi-dimensional sections are supported: a[:][:] ■  Short-hand description for complex loops □  A[:] = 5; is equivalent to: for (i = 0; i < 10; i++) A[i] = 5; □  A[0:n] = 5; □  A[0:5:2] = 5; is equivalent to: for (i = 0; i < 10; i += 2) A[i] = 5; □  A[:] = B[:]; □  A[:] = B[:] + 5; □  D[:] = A[:] + B[:]; □  func (A[:]); 28 OpenHPI | Parallel Programming Concepts | Dr. Peter Tröger
  • 29. Intel Threading Building Blocks (TBB) ■  Portable C++ library, toolkit for different operating systems ■  Also available as open source version ■  Complements basic OpenMP / Cilk features □  Loop parallelization, synchronization, explicit tasks ■  High-level concurrent containers, recursion support □  hash map, queue, vector, set ■  High-level parallel operations □  Prefix scan, sorting, data-flow pipelining, reduction, … □  Scalable library implementation, based on tasks ■  Unfair scheduling approach □  Consider data locality, optimize for cache utilization ■  Comparable: Microsoft C++ Concurrency Runtime 29 OpenHPI | Parallel Programming Concepts | Dr. Peter Tröger
  • 30. fib(n) = fib(n-1) + fib(n-2)
class FibTask: public task {
public:
   const long n;
   long* const sum;
   FibTask( long n_, long* sum_ ) : n(n_), sum(sum_) {}
   task* execute() {   // Overrides task::execute
      long x, y;
      FibTask& a = *new( allocate_child() ) FibTask(n-1, &x);
      FibTask& b = *new( allocate_child() ) FibTask(n-2, &y);
      // Set ref_count to "two children plus one for wait".
      set_ref_count(3);
      // Start b running.
      spawn( b );
      // Start a running and wait for all children (a and b).
      spawn_and_wait_for_all( a );
      // Do the sum
      *sum = x + y;
      return NULL;
   }
};
[intel.com]
  • 31. Waiting for an element
#include "tbb/compat/thread"
#include "tbb/tbb_allocator.h"    // zero_allocator defined here
#include "tbb/atomic.h"
#include "tbb/concurrent_vector.h"
using namespace tbb;
typedef concurrent_vector<atomic<Foo*>, zero_allocator<atomic<Foo*> > > FooVector;
Foo* WaitForElement( const FooVector& v, size_t i ) {
   // Wait for ith element to be allocated
   while( i >= v.size() )
      std::this_thread::yield();
   // Wait for ith element to be constructed
   while( v[i] == NULL )
      std::this_thread::yield();
   return v[i];
}
[intel.com]
  • 32. Task-Based Concurrency in Java ■  Major concepts introduced with Java 5 ■  Abstraction of task management with Executors □  java.util.concurrent.Executor □  Implementing object provides execute() method □  Can execute submitted Runnable tasks □  No assumption on where the task runs, typically in managed thread pool □  ThreadPoolExecutor provided by class library ■  java.util.concurrent.ExecutorService □  Additional submit() function, which returns a Future object ■  Methods for submitting large collections of Callables 32
  • 33. High-Level Concurrency [Figure: examples of high-level concurrency APIs: Microsoft Parallel Patterns Library, java.util.concurrent] 33 OpenHPI | Parallel Programming Concepts | Dr. Peter Tröger
  • 34. Functional Programming ■  Contrary paradigm to imperative programming □  Program is a large set of functions □  All these functions just map input to output □  Treats execution as collection of function evaluations ■  Foundations in lambda calculus (1930s) and Lisp (late 1950s) ■  Side-effect free computation when functions have no local state □  Function result depends only on input, not on shared data □  Order of function evaluation becomes irrelevant □  Automated parallelization can freely schedule the work □  Race conditions become less probable ■  Trend to add functional programming paradigms to imperative languages (anonymous functions, filter, map, …) 34 OpenHPI | Parallel Programming Concepts | Dr. Peter Tröger
  • 35. Imperative to Functional 35
alert("get the lobster");
PutInPot("lobster");
PutInPot("water");

alert("get the chicken");
BoomBoom("chicken");
BoomBoom("coconut");

Optimize:

function Cook( i1, i2, f ) {
   alert("get the " + i1);
   f(i1);
   f(i2);
}

Cook( "lobster", "water", PutInPot);
Cook( "chicken", "coconut", BoomBoom);

http://www.joelonsoftware.com/items/2006/08/01.html •  Higher order functions •  Functions as argument or return value •  Execution of parameter function may be transparently parallelized
OpenHPI | Parallel Programming Concepts | Dr. Peter Tröger
  • 36. Imperative to Functional 36
alert("get the lobster");
PutInPot("lobster");
PutInPot("water");

alert("get the chicken");
BoomBoom("chicken");
BoomBoom("coconut");

Optimize:

function Cook( i1, i2, f ) {
   alert("get the " + i1);
   f(i1);
   f(i2);
}

Cook("lobster", "water",
   function(x) { alert("pot " + x); } );
Cook("chicken", "coconut",
   function(x) { alert("boom " + x); } );

http://www.joelonsoftware.com/items/2006/08/01.html •  Anonymous functions •  Also lambda function or function literal •  Convenient tool when higher-order functions are supported
OpenHPI | Parallel Programming Concepts | Dr. Peter Tröger
  • 37. Functional Programming ■  Higher order functions: Functions as argument or return value ■  Pure functions: No memory or I/O side effects □  Constant result with side-effect free parameters □  All functions (with available input) can run in parallel ■  Many functional languages are (again) available today □  JVM-based: Clojure, Scala (parts of it), … □  Common Lisp, Erlang, F#, Haskell, ML, Ocaml, Scheme, … ■  Functional constructs (map, reduce, filter, folding, iterators, immutable variables, …) in all popular languages (see week 6) ■  Perfect foundation for implicit parallelism □  Instead of spawning tasks / threads, let the runtime decide □  Demands discipline to produce pure functions 37 OpenHPI | Parallel Programming Concepts | Dr. Peter Tröger
  • 38. Parallel Programming Concepts OpenHPI Course Week 3 : Shared Memory Parallelism - Programming Unit 3.4: Scala Dr. Peter Tröger + Teaching Team
  • 39. Scala – “Scalable Language” ■  Example for the combination of OO and functional programming □  Expressions, statements, blocks as in Java □  Every value is an object, every operation is a method call □  Functions as first-class concept □  Programmer chooses concurrency syntax □  Task-based parallelism supported by the language ■  Compiles to JVM (or .NET) byte code ■  Most language constructs are library functions ■  Interacts with class library of the runtime environment ■  Twitter moved some parts from Ruby to Scala in 2009 39 object HelloWorld extends App { println("Hello, world!") } OpenHPI | Parallel Programming Concepts | Dr. Peter Tröger
  • 40. Scala Basics ■  All data types are objects, all operations are methods ■  Operator / infix notation □  7.5 - 1.5 □  "hello" + "world" ■  Object notation □  ("hello").+("world") ■  Implicit conversions, some given by default □  ("hello") * 5 □  0.until(3), equivalently 0 until 3 □  (1 to 4).foreach(println) ■  Type inference □  var name = "Foo" ■  Immutable variables with "val" □  val name = "Scala" 40 OpenHPI | Parallel Programming Concepts | Dr. Peter Tröger
  • 41. Functions in Scala ■  Functions as first-class values ■  Possible to be passed as parameter, or used as result ■  () return value for procedures
def sumUpRange(f: Int => Int, a: Int, b: Int): Int =
  if (a > b) 0 else f(a) + sumUpRange(f, a + 1, b)
def id(x: Int): Int = x
def sumUpIntRange(a: Int, b: Int): Int = sumUpRange(id, a, b)
def square(x: Int): Int = x * x
def sumSquareR(a: Int, b: Int): Int = sumUpRange(square, a, b)
■  Anonymous functions, type deduction
def sumUpSquares(a: Int, b: Int): Int = sumUpRange(x => x * x, a, b)
41 OpenHPI | Parallel Programming Concepts | Dr. Peter Tröger
  • 42. Example: Quicksort ■  Recursive implementation of Quicksort ■  Similar to other imperative languages ■  swap as procedure with empty result () ■  Functions in functions ■  Read-only value definition with val 42
def sort(xs: Array[Int]) {
  def swap(i: Int, j: Int) {
    val t = xs(i)
    xs(i) = xs(j); xs(j) = t; ()
  }
  def sort_recursive(l: Int, r: Int) {
    val pivot = xs((l + r) / 2)
    var i = l; var j = r
    while (i <= j) {
      while (xs(i) < pivot) i += 1
      while (xs(j) > pivot) j -= 1
      if (i <= j) { swap(i, j); i += 1; j -= 1 }
    }
    if (l < j) sort_recursive(l, j)
    if (i < r) sort_recursive(i, r)
  }
  sort_recursive(0, xs.length - 1)
}
OpenHPI | Parallel Programming Concepts | Dr. Peter Tröger
  • 43. Example: Quicksort ■  Functional style (same complexity, higher memory consumption) □  Return empty / single element array as already sorted □  Partition array elements according to pivot element □  Higher-order function filter takes a predicate function ("pivot > x") as argument and applies it for filtering □  Sorting of sub-arrays with predefined sort function 43
def sort(xs: Array[Int]): Array[Int] = {
  if (xs.length <= 1) xs
  else {
    val pivot = xs(xs.length / 2)
    Array.concat(
      sort(xs filter (pivot >)),
      xs filter (pivot ==),
      sort(xs filter (pivot <)))
  }
}
OpenHPI | Parallel Programming Concepts | Dr. Peter Tröger
  • 44. Concurrent Programming with Scala ■  Implicit superclass is scala.AnyRef, provides typical monitor functions
scala> classOf[AnyRef].getMethods.foreach(println)
def wait()
def wait(msec: Long)
def notify()
def notifyAll()
■  Synchronized function, argument expression as critical section: def synchronized[A] (e: => A): A ■  Synchronized variable with put, blocking get and unset: val v = new scala.concurrent.SyncVar() ■  Futures, reader / writer locks, semaphores, ...
val x = future(someLengthyComputation)
anotherLengthyComputation
val y = f(x()) + g(x())
■  Explicit parallelism through spawn (expr) and the Actor concept (see week 5) 44 OpenHPI | Parallel Programming Concepts | Dr. Peter Tröger
  • 45. Parallel Collections ■  foreach on a parallel collection data type is automatically parallelized ■  Rich support for different data structures ■  Example: Parallel collections can still lead to race conditions □  The "+=" operator reads and writes the same variable 45
scala> var sum = 0
sum: Int = 0
scala> val list = (1 to 1000).toList.par
list: scala.collection.parallel.immutable.ParSeq[Int] = ParVector(1, 2, 3,…
scala> list.foreach(sum += _); sum
res01: Int = 467766
scala> var sum = 0
sum: Int = 0
scala> list.foreach(sum += _); sum
res02: Int = 457073
scala> var sum = 0
sum: Int = 0
scala> list.foreach(sum += _); sum
res03: Int = 468520
OpenHPI | Parallel Programming Concepts | Dr. Peter Tröger
  • 46. Parallel Programming Concepts OpenHPI Course Week 3 : Shared Memory Parallelism - Programming Unit 3.5: Partitioned Global Address Space Dr. Peter Tröger + Teaching Team
  • 47. Memory Model ■  Concurrency a first-class language citizen □  Demands a memory model for the language ◊  When is a written value visible? ◊  Example: OpenMP flush directive □  Leads to ‘promises’ about the memory access behavior □  Beyond them, compiler and ILP can optimize the code ■  Proper memory model brings predictable concurrency ■  Examples □  X86 processor machine code specification (native code) □  C++11 language specification (native code) □  C# language specification □  Java memory model specification in JSR-133 ■  Is this enough? 47 OpenHPI | Parallel Programming Concepts | Dr. Peter Tröger
  • 48. NUMA ■  Eight cores on 2 sockets in an SMP system ■  Memory controllers + chip interconnect realize a single memory address space for the software [Figure: two sockets, each with four cores (per-core L1 and L2 caches, shared L3 cache) and a memory controller with local RAM, coupled by a chip interconnect] 48 OpenHPI | Parallel Programming Concepts | Dr. Peter Tröger
  • 49. PGAS Languages ■  Non-uniform memory architectures (NUMA) became default ■  But: Understanding of memory in programming is flat □  All variables are equal in access time □  Considering the memory hierarchy is low-level coding (e.g. cache-aware programming) ■  Partitioned global address space (PGAS) approach □  Driven by high-performance computing community □  Modern approach for large-scale NUMA □  Explicit notion of memory partition per processor ◊  Data is designated as local (near) or global (possibly far) ◊  Programmer is aware of NUMA nodes □  Performance optimization for deep memory hierarchies 49 OpenHPI | Parallel Programming Concepts | Dr. Peter Tröger
  • 50. PGAS Languages and Libraries ■  PGAS languages □  Unified Parallel C (Ansi C) □  Co-Array Fortran / Fortress (F90) □  Titanium (Java), Chapel (Cray), X10 (IBM), … □  All research, no wide-spread solution on industry level ■  Core data management functionality can be re-used as library □  Global Arrays (GA) Toolkit □  Global-Address Space Networking (GASNet) ◊  Used by many PGAS languages – UPC, Co-Array Fortran, Titanium, Chapel □  Aggregate Remote Memory Copy Interface (ARMCI) □  Kernel Lattice Parallelism (KeLP) 50 OpenHPI | Parallel Programming Concepts | Dr. Peter Tröger
  • 51. X10 ■  Parallel object-oriented PGAS language by IBM □  Java derivate, compiles to C++ or pure Java code □  Different binaries can interact through common runtime □  Transport: Shared memory, TCP/IP, MPI, CUDA, … □  Linux, MacOS X, Windows, AIX; X86, Power □  Full developer support with Eclipse environment ■  Fork-join execution model ■  One application instance runs at a fixed number of places □  Each place represents a NUMA node □  Distinguishing between place-local and global data □  main() method runs automatically at place 0 □  Each place has a private copy of static variables 51 OpenHPI | Parallel Programming Concepts | Dr. Peter Tröger
  • 52. X10 [Figure: APGAS in X10: places 0..N, each with activities and a local heap; GlobalRef[T] references span the distributed heap; task parallelism via async S / finish S; place-shifting via at(p) S; concurrency control within a place via when(c) S / atomic S] ■  Parallel tasks, each operating in one place of the PGAS □  Direct variable access only in local place ■  Implementation strategy is flexible □  One operating system process per place, manages thread pool □  Work-stealing scheduler [IBM] 52 OpenHPI | Parallel Programming Concepts | Dr. Peter Tröger
  • 53. X10 [Figure: APGAS places and tasks, as on slide 52] ■  async S □  Creates a new task that executes S, returns immediately □  S may reference all variables in the enclosing block □  Runtime chooses a (NUMA) place for execution ■  finish S □  Execute S and wait for all transitively spawned tasks (barrier) [IBM] 53 OpenHPI | Parallel Programming Concepts | Dr. Peter Tröger
  • 54. Example 54
public class Fib {
  public static def fib(n:int) {
    if (n<=2) return 1;
    val f1:int;
    val f2:int;
    finish {
      async { f1 = fib(n-1); }
      f2 = fib(n-2);
    }
    return f1 + f2;
  }
  public static def main(args:Array[String](1)) {
    val n = (args.size > 0) ? int.parse(args(0)) : 10;
    Console.OUT.println("Computing Fib("+n+")");
    val f = fib(n);
    Console.OUT.println("Fib("+n+") = "+f);
  }
}
OpenHPI | Parallel Programming Concepts | Dr. Peter Tröger
  • 55. X10 [Figure: APGAS places and tasks, as on slide 52] ■  atomic S □  Execute S atomically, with respect to all other atomic blocks □  S must access only local data ■  when(c) S □  Suspend current task until c, then execute S atomically 55 OpenHPI | Parallel Programming Concepts | Dr. Peter Tröger
  • 56.
class Buffer[T]{T isref, T haszero} {
  protected var date:T = null;
  public def send(v:T){v != null} {
    when(date == null) { date = v; }
  }
  public def receive() {
    when(date != null) {
      val v = date;
      date = null;
      return v;
    }
  }
}

class Account {
  public var value:Int;
  def transfer(src: Account, v:Int) {
    atomic {
      src.value -= v;
      this.value += v;
    }
  }
}
  • 57. Tasks Can Move [Figure: APGAS places and tasks, as on slide 52] ■  at(p) S - Execute statement S at place p, block current task ■  at(p) e - Evaluate expression e at place p and return the result ■  at(p) async S □  Create new task at p to run S, return immediately 57 OpenHPI | Parallel Programming Concepts | Dr. Peter Tröger
  • 58. Summary: Week 3 ■  Short overview of shared memory parallel programming ideas ■  Different levels of abstractions □  Process model, thread model, task model ■  Threads for concurrency and parallelization □  Standardized POSIX interface □  Java / .NET concurrency functionality ■  Tasks for concurrency and parallelization □  OpenMP for C / C++, Java, .NET, Cilk, … ■  Functional language constructs for implicit parallelism ■  PGAS languages for NUMA optimization 58 Specialized languages help the programmer to achieve speedup. What about accordingly specialized parallel hardware? OpenHPI | Parallel Programming Concepts | Dr. Peter Tröger