3. Why now?
Manycore paradigm shift
CPU speed growth is hitting production challenges
(not at the limit yet)
Processor features
Hyper-threading
SIMD
4. CPU Scope
Past (yesterday): more transistors per chip; 1x-core
Present (today): more cores per chip; 2x-core norm
Future (tomorrow): even more cores per chip, NUMA & other specialties; 4x- to 32x-core?
5. Machine Scope
Most clients are concerned with one-machine use
Clustering helps leverage performance
Clouds
(Diagram: Machine, Cluster, Cloud)
6. Multithreading vs. Parallelization
Multithreading
Using threads/thread pool to perform async
operations
Explicit (# of threads known)
Parallelization
Implicit parallelization
No explicit thread operation
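A minimal sketch of the contrast above, assuming the released .NET 4 Parallel API rather than the CTP. The class and method names are hypothetical. SumExplicit creates a known number of threads by hand; SumImplicit contains no thread operations and leaves the partitioning to the library.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ExplicitVsImplicit
{
    // Explicit multithreading: we create, start, and join a known number of threads.
    public static long SumExplicit(int[] data, int threadCount)
    {
        long total = 0;
        var threads = new Thread[threadCount];
        int chunk = (data.Length + threadCount - 1) / threadCount;
        for (int t = 0; t < threadCount; t++)
        {
            int start = t * chunk;                          // per-iteration copies for the closure
            int end = Math.Min(start + chunk, data.Length);
            threads[t] = new Thread(() =>
            {
                long local = 0;
                for (int i = start; i < end; i++) local += data[i];
                Interlocked.Add(ref total, local);          // merge the partial sum safely
            });
            threads[t].Start();
        }
        foreach (Thread thread in threads) thread.Join();
        return total;
    }

    // Implicit parallelization: no thread operations; the library decides how to split the range.
    public static long SumImplicit(int[] data)
    {
        long total = 0;
        Parallel.For(0, data.Length,
            () => 0L,                                       // initial value of each worker's partial sum
            (i, state, local) => local + data[i],           // loop body
            local => Interlocked.Add(ref total, local));    // merge each worker's partial sum
        return total;
    }
}
```

Both methods return the same sum; only the first one knows how many threads were involved.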
8. Managed
System.Threading
Libraries
Parallel Extensions (TPL + PLINQ)
PowerThreading
Languages/frameworks
Sing#, CCR
Remoting, WCF, MPI.NET, PureMPI.NET, etc.
Use over many machines
9. Unmanaged
OpenMP
– #pragma directives in C++ code
Intel multi-core libraries
Threading Building Blocks (low-level)
Integrated Performance Primitives
Math Kernel Library (also has MPI support)
MPI, PVM, etc.
Use over many machines
13. A Look at Delegates
Making a delegate for a function is easy
Given void a() { … }
– ThreadStart del = a;
Given void a(int n) { … }
– Action<int> del = a;
Given float a(int n, double m) {…}
– Func<int, double, float> del = a;
Otherwise, make your own!
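The assignments above can be put together in one small sketch. The method names and the custom delegate type RefUpdater are hypothetical, chosen only to show a signature (a ref parameter) that the built-in Action and Func types do not cover.

```csharp
using System;
using System.Threading;

public static class DelegateSketch
{
    // "Make your own": a custom delegate type for a signature with a ref parameter.
    public delegate void RefUpdater(ref int value);

    static void Hello() { Console.WriteLine("hello"); }
    static void Print(int n) { Console.WriteLine(n); }
    static float Scale(int n, double m) { return (float)(n * m); }
    static void Increment(ref int value) { value++; }

    public static float Demo()
    {
        ThreadStart start = Hello;                // void ()
        Action<int> print = Print;                // void (int)
        Func<int, double, float> scale = Scale;   // float (int, double)
        RefUpdater bump = Increment;              // custom delegate type

        start();                                  // invoking a delegate looks like a method call
        print(42);
        int counter = 0;
        bump(ref counter);                        // counter is now 1
        return scale(2, 1.5) + counter;           // 3.0f + 1
    }
}
```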
14. Delegate Methods
Invoke()
Synchronous, blocks your thread
BeginInvoke
Executes in ThreadPool
Returns IAsyncResult
EndInvoke
Waits for completion
Takes the IAsyncResult from BeginInvoke
15. Usage
Fire and forget
– del.BeginInvoke(null, null);
Fire, and wait until done
– IAsyncResult ar = del.BeginInvoke(null,null);
…
del.EndInvoke(ar);
Fire, and call a function when done
– del.BeginInvoke(firedWhenDone, null);
Callback parameter
16. WaitAny and WaitAll
To wait until either delegate completes
– WaitHandle.WaitAny(
    new WaitHandle[] {
        ar1.AsyncWaitHandle,
        ar2.AsyncWaitHandle
    }); // wait until either completes
To wait until all delegates complete
Use WaitAll instead of WaitAny
– WaitAll on several handles needs an [MTAThread]; on an STA thread, use Pulse & Wait instead
17. Example
Execute a() and b() in parallel; wait on both
ThreadStart delA = a;
ThreadStart delB = b;
IAsyncResult arA = delA.BeginInvoke(null, null);
IAsyncResult arB = delB.BeginInvoke(null, null);
WaitHandle.WaitAll(new [] {
arA.AsyncWaitHandle,
arB.AsyncWaitHandle });
18. LINQ Example
Execute a() and b() in parallel; wait on both
WaitHandle.WaitAll(
    new ThreadStart[] { a, b }                  // make an array of delegates
    .Select(f => f.BeginInvoke(null, null)      // call each delegate
                  .AsyncWaitHandle)             // get a wait handle of each
    .ToArray());                                // convert from IEnumerable to array
19. Asynchronous Programming Model (APM)
Basic goal
– IAsyncResult ar =
del.BeginXXX(null,null);
…
del.EndXXX(ar);
Supported by Framework classes, e.g.,
– FileStream
– WebRequest
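As one possible illustration of the Begin/End pairing on a Framework class, the sketch below reads the start of a file with FileStream.BeginRead and EndRead. The class name, method name, and parameters are hypothetical. No callback is passed, so EndRead simply blocks until the read has finished.

```csharp
using System;
using System.IO;
using System.Text;

public static class ApmSketch
{
    // Reads up to maxBytes from the start of a file using the Begin/End pattern.
    public static string ReadStart(string path, int maxBytes)
    {
        var buffer = new byte[maxBytes];
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                       FileShare.Read, 4096, useAsync: true))
        {
            // Start the operation; no callback and no state object here.
            IAsyncResult ar = fs.BeginRead(buffer, 0, buffer.Length, null, null);

            // ... other work could run here while the read is in flight ...

            // Finish the operation: blocks until done and returns the byte count.
            int read = fs.EndRead(ar);
            return Encoding.UTF8.GetString(buffer, 0, read);
        }
    }
}
```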
20. Difficulties
Async calls do not always succeed
Timeout
Exceptions
Cancellation
Results in too many functions/anonymous
delegates
Async workflow code becomes difficult to read
21. PowerThreading
A free library from Wintellect (Jeffrey Richter)
Get it at wintellect.com
Also check out PowerCollections
Resource locks
ReaderWriterGate
Async. prog. model
AsyncEnumerator
SyncGate
Other features
IO
State manager
NumaInformation :)
22. AsyncEnumerator
Simplifies APM programming
No need to manually manage
IAsyncResult cookies
Fewer functions, cleaner code
23. Usage patterns
1 async op → process
X async ops → process all
X async ops → process each one as it
completes
X async ops → process some, discard the rest
X async ops → process some until
cancellation/timeout occurs, discard the rest
24. AsyncEnumerator Basics
Has three methods
Execute(IEnumerator<Int32>)
BeginExecute
EndExecute
Also exists as AsyncEnumerator<T> when a
return value is required
25. Inside the Function
internal IEnumerator<Int32> GetFile(
    AsyncEnumerator ae, string uri)
{
    WebRequest wr = WebRequest.Create(uri);
    wr.BeginGetResponse(ae.End(), null);
    yield return 1;
    WebResponse resp = wr.EndGetResponse(
        ae.DequeueAsyncResult());
    // use response
}
26. Signature
// Function must return IEnumerator<Int32>
// Function must accept AsyncEnumerator as one of the parameters (order unimportant)
internal IEnumerator<Int32> GetFile(
    AsyncEnumerator ae, string uri)
{
    WebRequest wr = WebRequest.Create(uri);
    wr.BeginGetResponse(ae.End(), null);
    yield return 1;
    WebResponse resp = wr.EndGetResponse(
        ae.DequeueAsyncResult());
    // use response
}
28. Yield
internal IEnumerator<Int32> GetFile(
    AsyncEnumerator ae, string uri)
{
    WebRequest wr = WebRequest.Create(uri);
    wr.BeginGetResponse(ae.End(), null);
    // Now yield return the number of pending asynchronous operations
    yield return 1;
    WebResponse resp = wr.EndGetResponse(
        ae.DequeueAsyncResult());
    // use response
}
29. Wait & Process
internal IEnumerator<Int32> GetFile(
    AsyncEnumerator ae, string uri)
{
    WebRequest wr = WebRequest.Create(uri);
    wr.BeginGetResponse(ae.End(), null);
    yield return 1;
    // Call the async EndXXX() methods
    WebResponse resp = wr.EndGetResponse(
        ae.DequeueAsyncResult());   // pass ae.DequeueAsyncResult() as parameter
    // use response
}
30. Usage
Init the enumerator
– var ae = new AsyncEnumerator();
Use it, passing itself as a parameter
– ae.Execute(GetFile(
    ae, "http://nesteruk.org"));
31. Exception Handling
Break out of function
– try {
resp = wr.EndGetResponse(
ae.DequeueAsyncResult());
} catch (WebException e) {
// process e
yield break;
}
Propagate a parameter
32. Discard Groups
Sometimes, you want to ignore the result of
some calls
E.g., you already got the data elsewhere
To discard a group of calls
Use overloaded End(…) methods to specify
Group number
Cleanup delegate
Call DiscardGroup(…) with group number
33. Cancellation
External code can cancel the iterator
– ae.Cancel(…)
Or specify a timeout
– ae.SetCancelTimeout(…)
Check whether iterator is cancelled with
– ae.IsCanceled(…)
just call yield break if it is
35. Parallelization
Algorithms vary in how easily they parallelize
Some easily
(e.g., matrix multiplication)
Some not so
(e.g., matrix inversion)
Some not at all
36. Parallel Extensions to .NET Framework (PFX)
A library for parallelization
Consists of
Task Parallel Library
Parallel LINQ (PLINQ)
Currently in CTP stage
Maybe in .NET 4.0?
42. Cancellation
Parallel.For takes an Action<Int32>
delegate
Can also take an
Action<Int32, ParallelState>
ParallelState keeps track of the state of parallel
execution
ParallelState.Stop() stops execution in all threads
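A minimal sketch of stopping a parallel loop early. It assumes the released .NET 4 API, in which the state type is named ParallelLoopState rather than the CTP's ParallelState. The class and method names are hypothetical.

```csharp
using System;
using System.Threading.Tasks;

public static class StopSketch
{
    // Returns true if any element equals target; stops the loop once a match is seen.
    public static bool Contains(int[] data, int target)
    {
        bool found = false;
        Parallel.For(0, data.Length, (i, state) =>
        {
            if (data[i] == target)
            {
                found = true;    // every writer stores the same value, so the race is benign
                state.Stop();    // ask the loop not to start any further iterations
            }
        });
        return found;
    }
}
```

In the released API, Stop() prevents new iterations from starting; iterations that are already running finish on their own.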
43. Parallel.For Exceptions
The AggregateException class holds all
exceptions thrown
Created even if only one thread throws
Used by both Parallel.Xxx and PLINQ
Original exceptions stored in
InnerExceptions property.
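A small sketch of catching the aggregate and walking its InnerExceptions, assuming the released .NET 4 API. The class and method names are hypothetical. How many inner exceptions are collected depends on scheduling, so the sketch returns the count rather than assuming it.

```csharp
using System;
using System.Threading.Tasks;

public static class AggregateSketch
{
    // Runs a loop in which odd iterations throw, and returns how many
    // original exceptions ended up in the AggregateException.
    public static int CountFailures()
    {
        try
        {
            Parallel.For(0, 4, i =>
            {
                if (i % 2 == 1)
                    throw new InvalidOperationException("iteration " + i);
            });
            return 0;   // not reached: at least one iteration throws
        }
        catch (AggregateException ae)
        {
            // The original exceptions are stored in InnerExceptions.
            foreach (Exception inner in ae.InnerExceptions)
                Console.WriteLine(inner.Message);
            return ae.InnerExceptions.Count;
        }
    }
}
```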
44. LazyInit<T>
Lazy initialization of a single variable
Options
– AllowMultipleExecution
Init function can be called by many threads, only
one value published
– EnsureSingleExecution
Init function executed only once
– ThreadLocal
One init call & value per thread
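A sketch of the same three ideas, assuming the types that shipped in .NET 4 rather than the CTP's LazyInit<T>: Lazy<T> with LazyThreadSafetyMode.ExecutionAndPublication (init runs once), LazyThreadSafetyMode.PublicationOnly (init may run on several threads, one value is published), and ThreadLocal<T> (one init call and value per thread). The class and method names are hypothetical.

```csharp
using System;
using System.Threading;

public static class LazySketch
{
    // Single execution: reading Value twice runs the init function only once.
    public static int InitCallsAfterTwoReads()
    {
        int calls = 0;
        var lazy = new Lazy<int>(
            () => { Interlocked.Increment(ref calls); return 42; },
            LazyThreadSafetyMode.ExecutionAndPublication);
        int first = lazy.Value;     // runs the init function
        int second = lazy.Value;    // reuses the stored value
        return calls;
    }

    // Multiple execution allowed: racing threads may each run init, one value wins.
    public static int PublishedValue()
    {
        var lazy = new Lazy<int>(() => 42, LazyThreadSafetyMode.PublicationOnly);
        return lazy.Value;
    }

    // Thread-local: each thread runs its own init call and keeps its own value.
    public static bool ThreadsSeeDifferentValues()
    {
        using (var local = new ThreadLocal<int>(() => Thread.CurrentThread.ManagedThreadId))
        {
            int mine = local.Value;
            int theirs = 0;
            var worker = new Thread(() => theirs = local.Value);
            worker.Start();
            worker.Join();
            return mine != theirs;
        }
    }
}
```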
46. Futures
A future is the name of a value that will
eventually be produced by a computation
Thus, we can decide what to do with the
value before we know it
47. Futures of T
• Future is a factory
• Future<T> is the actual future (and also has
factory methods)
To make a future
– var f = Future.Create(() => g());
To use a future
Get f.Value
The accessor waits until the async computation has produced the value
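A sketch of the same idea, assuming the released API, in which the future became Task<TResult> and the value is read through Result instead of Value. Task.Run requires .NET 4.5 or later. The names FutureSketch, G, and Demo are hypothetical. The continuation shows deciding what to do with the value before it is known.

```csharp
using System;
using System.Threading.Tasks;

public static class FutureSketch
{
    static int G() { return 6 * 7; }

    public static int Demo()
    {
        // Start the computation; f stands for a value that will exist later.
        Task<int> f = Task.Run(() => G());

        // Decide what to do with the value before it is known.
        Task<string> described = f.ContinueWith(t => "answer: " + t.Result);

        Console.WriteLine(described.Result);   // blocks until the continuation has run
        return f.Result;                       // blocks until the value is available
    }
}
```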
48. Tasks & TaskManager
A better Thread+ThreadPool combination
TaskManager
A very clever thread pool :)
Adjusts worker threads to # of CPUs/cores
Keeps all cores busy
Task
A unit of work
May (or may not) run concurrently
http://channel9.msdn.com/posts/DanielMoth/ParallelFX-Task-and-friends/
49. Task
Just like a future, a task takes an Action<T>
– Task t = Task.Create(DoSomeWork);
Overloads exist :)
Fires off immediately. To wait on completion
– t.Wait();
Unlike the thread pool, task manager will use
as many threads as there are cores
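A sketch of starting tasks and waiting on them, assuming the released API: Task.Run (.NET 4.5 or later) takes the place of the CTP's Task.Create, and the default scheduler plays the role of the TaskManager. The class and method names are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TaskSketch
{
    // Starts two units of work, waits for both, and returns how many completed.
    public static int RunAndWait()
    {
        int done = 0;

        // Each task is queued as soon as it is created.
        Task t1 = Task.Run(() => { Interlocked.Increment(ref done); });
        Task t2 = Task.Run(() => { Interlocked.Increment(ref done); });

        t1.Wait();              // wait on a single task
        Task.WaitAll(t1, t2);   // or wait on several at once
        return done;
    }
}
```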
50. Parallel LINQ (PLINQ)
Parallel evaluation in
LINQ to Objects
LINQ to XML
Features
IParallelEnumerable<T>
ParallelEnumerable.AsParallel static
method
51. Example
IEnumerable<T> data = ...;
var q = data.AsParallel()
.Where(x => p(x))
.OrderBy(x => k(x))
.Select(x => f(x));
foreach (var e in q)
a(e);
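A concrete version of the query shape above, assuming the released PLINQ API, in which AsParallel returns ParallelQuery<T> rather than the CTP's IParallelEnumerable<T>. The predicate, key, and projection are made-up stand-ins for p, k, and f.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PlinqSketch
{
    // Keeps the even numbers, orders them descending, and squares them.
    public static List<int> EvenSquaresDescending(IEnumerable<int> data)
    {
        return data.AsParallel()
                   .Where(x => x % 2 == 0)           // stands in for p(x)
                   .OrderByDescending(x => x)        // stands in for ordering by k(x)
                   .Select(x => x * x)               // stands in for f(x)
                   .ToList();                        // ordering set by OrderByDescending is kept
    }
}
```

For the input 1 through 6 this yields 36, 16, 4; without an ordering operator, PLINQ makes no promise about the order of results.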
53. Message Passing Interface
An API for general-purpose IPC
Works across cores & machines
C++ and Fortran
Some Intel libraries support it explicitly
http://www.mcs.anl.gov/research/projects/mpich2/
54. PureMPI.NET
A free library available at http://purempi.net
Uses WCF endpoints for communication
Uses MPI syntax
Features
A library DLL for WCF functionality
An EXE for easy deployment over network
55. How it works
Your computers run a service that connects
them together
Your program exposes WCF endpoints
You use the MPI interfaces to communicate
56. Communicator & Rank
A communicator is a group of computers
In most scenarios, you would have one group
MPI_COMM_WORLD
comm
Useful for determining whether we are the …
57. Main
static void Main(string[] args)
{
    using (ProcessorGroup processors =
        new ProcessorGroup("MPIEnvironment",   // MPIEnvironment refers to app.config
                           MpiProcess))        // run MpiProcess on all machines
    {
        processors.Start();                    // start each one
        processors.WaitForCompletion();        // wait on all
    }
}
58. Sending & Receiving
Blocking or non-blocking methods
Send/Receive (blocking)
Begin|End Send/Receive (async)
Invoked on the comm
59. Send/Receive
static void MpiProcess(IDictionary<string, Comm> comms)
{
    // Get a default comm from dictionary
    Comm comm = comms["MPI_COMM_WORLD"];
    if (comm.Rank == 0)
    {
        // Get a message from 1 (blocking)
        string msg = comm.Receive<string>(1, string.Empty);
        Console.WriteLine("Got " + msg);
    }
    else if (comm.Rank == 1)
    {
        // Send a message to 0 (also blocking)
        comm.Send(0, string.Empty, "Hello");
    }
}
60. Extras
Can use async ops
Can send to all (Broadcast)
Can distribute work and then collect it
(Gather/Scatter)