Multithreading and Parallelization
Dmitri Nesteruk
dmitrinesteruk@gmail.com | http://nesteruk.org/seminars
Agenda
 Overview
 Multithreading
   PowerThreading (AsyncEnumerator)
 Multi-core parallelization
   Parallel Extensions to .NET Framework
 Multi-computer parallelization
   PureMPI.NET
Why now?
 Manycore paradigm shift
   CPU speeds reach production challenges
   (not at the limit yet)

   growth
 Processor features
   Hyper-threading
   SIMD
CPU Scope
 Past: more transistors per chip
 Present: more cores per chip
 Future: even more cores per chip; NUMA & other specialties
 Yesterday: 1x-core
 Today: 2x-core norm, 4x-
 Tomorrow: 32x-core?
Machine Scope
 Machine: most clients are concerned with one-machine use
 Cluster: clustering helps leverage performance
 Cloud: clouds
Multithreading vs. Parallelization
 Multithreading
    Using threads/thread pool to perform async
    operations
    Explicit (# of threads known)
 Parallelization
    Implicit parallelization
    No explicit thread operation
Ways to Parallelize/Multithread
 Managed
    System.Threading
    Parallel Extensions
    Libraries
 Unmanaged
    OpenMP
    Libraries
 Specialized
    GPGPU
    FPGA
Managed
 System.Threading
 Libraries
   Parallel Extensions (TPL + PLINQ)
   PowerThreading
 Languages/frameworks
   Sing#, CCR
 Remoting, WCF, MPI.NET, PureMPI.NET, etc.
   Use over many machines
Unmanaged
 OpenMP
 – #pragma directives in C++ code
 Intel multi-core libraries
   Threading Building Blocks (low-level)
   Integrated Performance Primitives
   Math Kernel Library (also has MPI support)
 MPI, PVM, etc.
   Use over many machines
Specialized Ex. (Intrinsic Parallelization)
  GPU Computation (GPGPU)
    Calculations on the graphics card
    Uses programmable pixel shaders
    See, e.g., NVidia CUDA, GPGPU.org
  FPGA
    Hardware-specific solutions
    E.g., in-socket accelerators
    Requires HDL programming & custom hardware
Part I

Multithreading: a look at
AsyncEnumerator
Multithreading
 Goals
   Do stuff concurrently
   Preserve safety/consistency
 Tools
   Threads
   ThreadPool
   Synchronization objects
   Framework async APIs
A Look at Delegates
 Making a delegate for a function is easy
 Given void a() { … }
  – ThreadStart del = a;
 Given void a(int n) { … }
  – Action<int> del = a;
 Given float a(int n, double m) {…}
  – Func<int, double, float> del = a;
 Otherwise, make your own!
Delegate Methods
 Invoke()
   Synchronous, blocks your thread 
 BeginInvoke
   Executes in ThreadPool
   Returns IAsyncResult
 EndInvoke
   Waits for completion
   Takes the IAsyncResult from BeginInvoke
Usage
 Fire and forget
  – del.BeginInvoke(null, null);
 Fire, and wait until done
  – IAsyncResult ar = del.BeginInvoke(null,null);
    …
    del.EndInvoke(ar);
 Fire, and call a function when done
  – del.BeginInvoke(firedWhenDone, null);
                      Callback parameter
WaitAny and WaitAll
 To wait until either delegate completes
  – WaitHandle.WaitAny(
      new WaitHandle[] {
        ar1.AsyncWaitHandle,
        ar2.AsyncWaitHandle
      }); // wait until either completes
 To wait until all delegates complete
    Use WaitAll instead of WaitAny
  – WaitAll on several handles needs an [MTAThread]; on an STA thread, use Monitor.Pulse & Wait instead
Example
Execute a() and b() in parallel; wait on both

ThreadStart delA = a;
ThreadStart delB = b;
IAsyncResult arA = delA.BeginInvoke(null, null);
IAsyncResult arB = delB.BeginInvoke(null, null);
WaitHandle.WaitAll(new [] {
  arA.AsyncWaitHandle,
  arB.AsyncWaitHandle });
LINQ Example
Execute a() and b() in parallel; wait on both
WaitHandle.WaitAll(
  new ThreadStart[] { a, b }              // make an array of delegates
  .Select(f => f.BeginInvoke(null, null)  // call each delegate
               .AsyncWaitHandle)          // get a wait handle of each
  .ToArray());                            // convert from IEnumerable to array
Asynchronous Programming Model (APM)
 Basic goal
  – IAsyncResult ar =
      del.BeginXXX(null,null);
    …
    del.EndXXX(ar);
 Supported by Framework classes, e.g.,
  – FileStream
  – WebRequest
Difficulties
  Async calls do not always succeed
    Timeout
    Exceptions
    Cancelation
  Results in too many functions/anonymous
  delegates
    Async workflow code becomes difficult to read
PowerThreading
 A free library from Wintellect (Jeffrey Richter)
 Get it at wintellect.com
 Also check out PowerCollections
 Resource locks
    ReaderWriterGate
 Async. prog. model
    AsyncEnumerator
    SyncGate
 Other features
    IO
    State manager
    NumaInformation :)
AsyncEnumerator
 Simplifies APM programming
 No need to manually manage
 IAsyncResult cookies
 Fewer functions, cleaner code
Usage patterns
 1 async op → process
 X async ops → process all
 X async ops → process each one as it
 completes
 X async ops → process some, discard the rest
 X async ops → process some until
 cancellation/timeout occurs, discard the rest
AsyncEnumerator Basics
 Has three methods
   Execute(IEnumerator<Int32>)
   BeginExecute
   EndExecute
 Also exists as AsyncEnumerator<T> when a
 return value is required
Inside the Function
internal IEnumerator<Int32> GetFile(
AsyncEnumerator ae, string uri)
{
  WebRequest wr = WebRequest.Create(uri);
  wr.BeginGetResponse(ae.End(), null);
  yield return 1;
  WebResponse resp = wr.EndGetResponse(
    ae.DequeueAsyncResult());
  // use response
}
Signature
internal IEnumerator<Int32> GetFile(
  AsyncEnumerator ae, string uri)
{
  WebRequest wr = WebRequest.Create(uri);
  wr.BeginGetResponse(ae.End(), null);
  yield return 1;
  WebResponse resp = wr.EndGetResponse(
    ae.DequeueAsyncResult());
  // use response
}
 Function must return IEnumerator<Int32>
 Function must accept AsyncEnumerator as one of the parameters (order unimportant)
Callback
internal IEnumerator<Int32> GetFile(
  AsyncEnumerator ae, string uri)
{
  WebRequest wr = WebRequest.Create(uri);
  wr.BeginGetResponse(ae.End(), null);
  yield return 1;
  WebResponse resp = wr.EndGetResponse(
    ae.DequeueAsyncResult());
  // use response
}
 Call the async BeginXXX() methods
 Pass ae.End() as callback parameter
Yield
internal IEnumerator<Int32> GetFile(
  AsyncEnumerator ae, string uri)
{
  WebRequest wr = WebRequest.Create(uri);
  wr.BeginGetResponse(ae.End(), null);
  yield return 1;
  WebResponse resp = wr.EndGetResponse(
    ae.DequeueAsyncResult());
  // use response
}
 Now yield return the number of pending asynchronous operations
Wait & Process
internal IEnumerator<Int32> GetFile(
  AsyncEnumerator ae, string uri)
{
  WebRequest wr = WebRequest.Create(uri);
  wr.BeginGetResponse(ae.End(), null);
  yield return 1;
  WebResponse resp = wr.EndGetResponse(
    ae.DequeueAsyncResult());
  // use response
}
 Call the async EndXXX() methods
 Pass ae.DequeueAsyncResult() as parameter
Usage
 Init the enumerator
  – var ae = new AsyncEnumerator();
 Use it, passing itself as a parameter
  – ae.Execute(GetFile(
      ae, "http://nesteruk.org"));
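To make the mechanism less magical, here is a minimal sketch of how an iterator can be driven by async callbacks. MiniAsyncEnumerator, MiniDemo and ReadBytes are hypothetical names invented for this illustration; this is not the PowerThreading implementation, only a simplified stand-in that assumes the iterator dequeues every result it waited for. It uses Stream.BeginRead on a MemoryStream as the APM operation so that it runs without a network.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;

// Hypothetical, simplified driver; NOT the PowerThreading AsyncEnumerator.
sealed class MiniAsyncEnumerator
{
    private readonly object gate = new object();
    private readonly Queue<IAsyncResult> completed = new Queue<IAsyncResult>();
    private readonly ManualResetEvent finished = new ManualResetEvent(false);
    private IEnumerator<int> iterator;
    private int waitingFor; // 0 while the iterator body is running

    // Callback to hand to BeginXXX: it records the completed operation.
    public AsyncCallback End() { return OnCompleted; }

    public IAsyncResult DequeueAsyncResult()
    {
        lock (gate) return completed.Dequeue();
    }

    // Runs the iterator to its end, blocking the caller until it finishes.
    public void Execute(IEnumerator<int> it)
    {
        iterator = it;
        Advance();
        finished.WaitOne();
    }

    private void OnCompleted(IAsyncResult ar)
    {
        bool resume = false;
        lock (gate)
        {
            completed.Enqueue(ar);
            if (waitingFor > 0 && completed.Count >= waitingFor)
            {
                waitingFor = 0;
                resume = true;
            }
        }
        if (resume) Advance(); // continue the iterator on the callback thread
    }

    private void Advance()
    {
        while (iterator.MoveNext())
        {
            lock (gate)
            {
                // The iterator yielded the number of operations to wait for.
                if (completed.Count < iterator.Current)
                {
                    waitingFor = iterator.Current;
                    return; // a later callback resumes the iterator
                }
            }
        }
        finished.Set();
    }
}

static class MiniDemo
{
    // Same shape as GetFile: begin, yield the pending count, then end.
    public static IEnumerator<int> ReadBytes(
        MiniAsyncEnumerator ae, Stream s, byte[] buffer, int[] result)
    {
        s.BeginRead(buffer, 0, buffer.Length, ae.End(), null);
        yield return 1;
        result[0] = s.EndRead(ae.DequeueAsyncResult());
    }

    public static int Run()
    {
        var ae = new MiniAsyncEnumerator();
        var buffer = new byte[16];
        var result = new int[1];
        using (var s = new MemoryStream(new byte[] { 1, 2, 3 }))
            ae.Execute(ReadBytes(ae, s, buffer, result));
        return result[0]; // number of bytes read
    }

    static void Main() { Console.WriteLine(Run()); }
}
```

The sketch handles a callback that fires before the iterator reaches its yield, which is why completions are queued rather than counted down. The real library adds discard groups, cancellation and timeouts on top of this idea.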
Exception Handling
 Break out of function
  – try {
      resp = wr.EndGetResponse(
        ae.DequeueAsyncResult());
    } catch (WebException e) {
      // process e
      yield break;
    }
 Propagate a parameter
Discard Groups
 Sometimes, you want to ignore the result of
 some calls
   E.g., you already got the data elsewhere
 To discard a group of calls
   Use overloaded End(…) methods to specify
     Group number
     Cleanup delegate
   Call DiscardGroup(…) with group number
Cancellation
 External code can cancel the iterator
  – ae.Cancel(…)
 Or specify a timeout
  – ae.SetCancelTimeout(…)
 Check whether iterator is cancelled with
  – ae.IsCanceled(…)
    just call yield break if it is
Part II

Parallel Extensions to .NET
Framework TPL and PLINQ
Parallelization
 Algorithms vary
    Some parallelize easily
    (e.g., matrix multiplication)
    Some not so
    (e.g., matrix inversion)
    Some not at all

 parallelize them
Parallel Extensions to .NET Framework (PFX)
 A library for parallelization
 Consists of
    Task Parallel Library
    Parallel LINQ (PLINQ)
 Currently in CTP stage
 Maybe in .NET 4.0?
Task Parallel Library Features
 System.Linq
    Parallel LINQ
 System.Threading
    Implicit parallelism (Parallel.Xxx)
 System.Threading.Collections
    Thread-safe stack and queue
 System.Threading.Tasks
    Task manager, tasks, futures
System.Threading
 Implicit parallelization (Parallel.For and ForEach)
    Parallel.For | ForEach
 Aggregate exceptions
    AggregateException
 Other useful classes
    LazyInit<T>
    WriteOnce<T>
    Other goodies
Parallel.For
 Parallelizes a for loop
 Instead of

 for (int i = 0; i < 10; ++i) { … }

 We write

 Parallel.For(0, 10, i => { … });
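A small complete example may help here. The sketch below assumes the released .NET 4 API, where Parallel lives in System.Threading.Tasks; ParallelForDemo and SumOfSquares are names made up for the illustration. Because iterations can run on different threads, the shared total is updated with Interlocked.Add rather than a plain +=.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class ParallelForDemo
{
    // Sums the squares of 0..n-1 using Parallel.For.
    public static long SumOfSquares(int n)
    {
        long total = 0;
        Parallel.For(0, n, i =>
        {
            // Iterations may run concurrently, so update the
            // shared accumulator atomically.
            Interlocked.Add(ref total, (long)i * i);
        });
        return total;
    }

    static void Main()
    {
        Console.WriteLine(SumOfSquares(10)); // 0 + 1 + 4 + ... + 81 = 285
    }
}
```

For a loop body this cheap the synchronization costs more than the work itself; the point is only to show the shape of the call and the need to protect shared state.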
Parallel.For Overloads
 Step size
 ParallelState for cancelation
 Thread-local initialization
 Thread-local finalization
 References to a TaskManager
 Task creation options
Parallel.ForEach
 Same features as Parallel.For except
    No counters or steps
 Takes an IEnumerable<T> 
Cancelation
 Parallel.For takes an Action<Int32>
 delegate
 Can also take an
 Action<Int32, ParallelState>
   ParallelState keeps track of the state of parallel
   execution
   ParallelState.Stop() stops execution in all threads
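As a sketch of early exit: in the released .NET 4 the CTP's ParallelState became ParallelLoopState, and the example below assumes that released name. StopDemo and Contains are hypothetical names for the illustration.

```csharp
using System;
using System.Threading.Tasks;

static class StopDemo
{
    // Returns true if any element equals target; stops early once found.
    public static bool Contains(int[] data, int target)
    {
        bool found = false;
        Parallel.For(0, data.Length, (i, state) =>
        {
            if (data[i] == target)
            {
                found = true;  // only ever set to true, so the race is benign
                state.Stop();  // ask the loop not to start further iterations
            }
        });
        return found;
    }

    static void Main()
    {
        Console.WriteLine(Contains(new[] { 1, 2, 3 }, 2)); // True
        Console.WriteLine(Contains(new[] { 1, 2, 3 }, 9)); // False
    }
}
```

Stop() does not abort iterations that are already running; it only prevents new ones from starting, so long-running bodies may want to check state.IsStopped themselves.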
Parallel.For Exceptions
 The AggregateException class holds all
 exceptions thrown
 Created even if only one thread throws
 Used by both Parallel.Xxx and PLINQ
 Original exceptions stored in
 InnerExceptions property.
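A minimal sketch of catching the aggregate, assuming the released .NET 4 API (AggregateDemo and CountFailures are names invented here). Note that the number of collected exceptions is not deterministic in general: once one iteration throws, the loop stops scheduling new iterations, so some failing iterations may never run.

```csharp
using System;
using System.Threading.Tasks;

static class AggregateDemo
{
    // Runs a loop in which every odd iteration throws, and returns
    // how many exceptions ended up in the AggregateException.
    public static int CountFailures(int n)
    {
        try
        {
            Parallel.For(0, n, i =>
            {
                if (i % 2 == 1)
                    throw new InvalidOperationException("iteration " + i);
            });
            return 0; // no iteration threw
        }
        catch (AggregateException ex)
        {
            // The original exceptions are preserved as inner exceptions.
            foreach (Exception inner in ex.InnerExceptions)
                Console.WriteLine(inner.Message);
            return ex.InnerExceptions.Count;
        }
    }

    static void Main()
    {
        Console.WriteLine(CountFailures(2)); // only iteration 1 throws: 1
    }
}
```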
LazyInit<T>
 Lazy initialization of a single variable
 Options
  – AllowMultipleExecution
    Init function can be called by many threads, only
    one value published
  – EnsureSingleExecution
    Init function executed only once
  – ThreadLocal
    One init call & value per thread
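LazyInit<T> is a CTP-era type; the released .NET 4 ships Lazy<T> instead. As an assumption on my part, EnsureSingleExecution roughly corresponds to LazyThreadSafetyMode.ExecutionAndPublication, AllowMultipleExecution to PublicationOnly, and the ThreadLocal option to the separate ThreadLocal<T> class. The sketch below (LazyDemo is a hypothetical name) shows the single-execution case with the released type.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class LazyDemo
{
    // Reads one lazy value from many threads and reports how many
    // times the init function actually ran.
    public static int InitCallsAfterConcurrentReads()
    {
        int calls = 0;
        var lazy = new Lazy<int>(
            () => { Interlocked.Increment(ref calls); return 42; },
            LazyThreadSafetyMode.ExecutionAndPublication);

        // Every iteration asks for the value; only the first triggers init.
        Parallel.For(0, 8, i => { int unused = lazy.Value; });
        return calls;
    }

    static void Main()
    {
        Console.WriteLine(InitCallsAfterConcurrentReads()); // 1
    }
}
```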
WriteOnce<T>
 Single-assignment structure
 Just like Nullable:
   HasValue
   Value
 Also try methods
   TryGetValue
   TrySetValue
Futures
 A future is the name of a value that will
 eventually be produced by a computation
 Thus, we can decide what to do with the
 value before we know it
Futures of T
• Future is a factory
• Future<T> is the actual future (and also has
  factory methods)
  To make a future
  – var f = Future.Create(() => g());
  To use a future
    Get f.Value
    The accessor does an async computation
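In the released .NET 4 the CTP's Future<T> became Task<TResult>, and Value became Result. A minimal sketch under that assumption (FutureDemo, G and Compute are illustrative names):

```csharp
using System;
using System.Threading.Tasks;

static class FutureDemo
{
    static int G() { return 6 * 7; }

    public static int Compute()
    {
        // Start the computation; this returns immediately.
        Task<int> f = Task.Factory.StartNew(() => G());

        // ... other work can happen here while G runs ...

        // Reading Result blocks until the value is available.
        return f.Result;
    }

    static void Main()
    {
        Console.WriteLine(Compute()); // 42
    }
}
```

The useful property is that f can be passed around and composed before the value exists; only the code that finally reads Result has to wait.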
Tasks & TaskManager
 A better Thread+ThreadPool combination
 TaskManager
   A very clever thread pool :)
   Adjusts worker threads to # of CPUs/cores
   Keeps all cores busy
 Task
   A unit of work
   May (or may not) run concurrently
 http://channel9.msdn.com/posts/DanielMoth/ParallelFX-Task-and-friends/
Task
 Just like a future, a task takes an Action<T>
  – Task t = Task.Create(DoSomeWork);
    Overloads exist :)
 Fires off immediately. To wait on completion
  – t.Wait();
 Unlike the thread pool, task manager will use
 as many threads as there are cores
Parallel LINQ (PLINQ)
 Parallel evaluation in
    LINQ to Objects
    LINQ to XML
 Features
    IParallelEnumerable<T>
    ParallelEnumerable.AsParallel static
    method
Example
IEnumerable<T> data = ...;
var q = data.AsParallel()
  .Where(x => p(x))
  .OrderBy(x => k(x))
  .Select(x => f(x));

foreach (var e in q)
  a(e);
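A self-contained version of the same query shape, assuming the released System.Linq PLINQ operators. PlinqDemo and EvenSquaresDescending are hypothetical names, and the predicate, key and projection are arbitrary stand-ins for p, k and f.

```csharp
using System;
using System.Linq;

static class PlinqDemo
{
    // Filters, sorts and projects in parallel. The OrderByDescending
    // step makes the result order well defined despite AsParallel().
    public static int[] EvenSquaresDescending(int[] data)
    {
        return data.AsParallel()
                   .Where(x => x % 2 == 0)
                   .OrderByDescending(x => x)
                   .Select(x => x * x)
                   .ToArray();
    }

    static void Main()
    {
        int[] result = EvenSquaresDescending(new[] { 1, 2, 3, 4, 5, 6 });
        Console.WriteLine(string.Join(", ", result)); // 36, 16, 4
    }
}
```

Without an ordering operator (or AsOrdered()), a parallel query is free to return elements in any order, which is usually the first surprise when converting a LINQ to Objects query.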
Part III

Interprocess communication with
PureMPI.NET
Message Passing Interface
 An API for general-purpose IPC
 Works across cores & machines
 C++ and Fortran
   Some Intel libraries support explicitly
 http://www.mcs.anl.gov/research/projects/mpich2/
PureMPI.NET
 A free library available at http://purempi.net
 Uses WCF endpoints for communication
 Uses MPI syntax
 Features
   A library DLL for WCF functionality
   An EXE for easy deployment over network
How it works
 Your computers run a service that connects
 them together
 Your program exposes WCF endpoints
 You use the MPI interfaces to communicate
Communicator & Rank
 A communicator is a group of computers
   In most scenarios, you would have one group
   MPI_COMM_WORLD

 comm
    Useful for determining whether we are the
Main
static void Main(string[] args)
{
  // "MPIEnvironment" refers to the environment set up in app.config
  using (ProcessorGroup processors =
    new ProcessorGroup("MPIEnvironment",
                       MpiProcess))   // run MpiProcess on all machines
  {
    processors.Start();               // start each one
    processors.WaitForCompletion();   // wait on all
  }
}
Sending & Receiving
 Blocking or non-blocking methods
   Send/Receive (blocking)
   Begin|End Send/Receive (async)
   Invoked on the comm
Send/Receive
static void MpiProcess(IDictionary<string, Comm> comms)
{
  // get the default comm from the dictionary
  Comm comm = comms["MPI_COMM_WORLD"];
  if (comm.Rank == 0)
  {
    // get a message from 1 (blocking)
    string msg = comm.Receive<string>(1, string.Empty);
    Console.WriteLine("Got " + msg);
  }
  else if (comm.Rank == 1)
  {
    // send a message to 0 (also blocking)
    comm.Send(0, string.Empty, "Hello");
  }
}
Extras
 Can use async ops
 Can send to all (Broadcast)
 Can distribute work and then collect it
 (Gather/Scatter)
Thank You!

.Net Multithreading and Parallelization

  • 1. Multithreading and Parallelization Dmitri Nesteruk dmitrinesteruk@gmail.com | http://nesteruk.org/seminars
  • 2. Agenda Overview Multithreading PowerThreading (AsyncEnumerator) Multi-core parallelization Parallel Extensions to .NET Framework Multi-computer parallelization PureMPI.NET
  • 3. Why now? Manycore paradigm shift CPU speeds reach production challenges (not at the limit yet) growth Processor features Hyper-threading SIMD
  • 4. CPU Scope Past: more Yesterday transistors per chip 1x-core Present: more cores per chip Today 2x-core norm Future: even more 4x- cores per chip; Tomorrow NUMA & other 32x-core? specialties
  • 5. Machine Scope Most clients are concerned with Machine one-machine use Clustering helps Cluster leverage performance Clouds Cloud
  • 6. Multithreading vs. Parallelization Multithreading Using threads/thread pool to perform async operations Explicit (# of threads known) Parallelization Implicit parallelization No explicit thread operation
  • 7. Ways to Parallelize/Multithread System.Threading Managed Parr. Extensions Libraries OpenMP Unmanaged Libraries GPGPU Specialized FPGA
  • 8. Managed System.Threading Libraries Parallel Extensions (TPL + PLINQ) PowerThreading Languages/frameworks Sing#, CCR Remoting, WCF, MPI.NET, PureMPI.NET, etc. Use over many machines
  • 9. Unmanaged OpenMP – #pragma directives in C++ code Intel multi-core libraries Threading Building Blocks (low-level) Integrated Performance Primitives Math Kernel Library (also has MPI support) MPI, PVM, etc. Use over many machines
  • 10. Specialized Ex. (Intrinsic Parallelization) GPU Computation (GPGPU) Calculations on graphic card Uses programmable pixel shaders See, e.g., NVidia CUDA, GPGPU.org FPGA Hardware-specific solutions E.g., in-socket accelerators Requires HDL programming & custom hardware
  • 11. Part I Multithreading: a look at AsyncEnumerator
  • 12. Multithreading Goals Do stuff concurrently Preserve safety/consistency Tools Threads ThreadPool Synchronization objects Framework async APIs
  • 13. A Look at Delegates Making delegate for function is easy Given void a() { … } – ThreadStart del = a; Given void a(int n) { … } – Action<int> del = a; Given float a(int n, double m) {…} – Func<int, double, float> del = a; Otherwise, make your own!
  • 14. Delegate Methods Invoke() Synchronous, blocks your thread  BeginInvoke Executes in ThreadPool Returns IAsyncResult EndInvoke Waits for completion Takes the IAsyncResult from BeginInvoke
  • 15. Usage Fire and forget – del.BeginInvoke(null, null); Fire, and wait until done – IAsyncResult ar = del.BeginInvoke(null,null); … del.EndInvoke(ar); Fire, and call a function when done – del.BeginInvoke(firedWhenDone, null); Callback parameter
  • 16. WaitOne and WaitAll To wait until either delegate completes – WaitHandle.WaitOne( new ThreadStart[] { ar1.AsyncWaitHandle, ar2.AsyncWaitHandle }); // wait until either completes To wait until all delegates complete Use WaitAll instead of WaitOne – [MTAThread]-specific, use Pulse & Wait instead
  • 17. Example Execute a() and b() in parallel; wait on both ThreadStart delA = a; ThreadStart delB = b; IAsyncResult arA = delA.BeginInvoke(null, null); IAsyncResult arB = delB.BeginInvoke(null, null); WaitHandle.WaitAll(new [] { arA.AsyncWaitHandle, arB.AsyncWaitHandle });
  • 18. LINQ Example Execute a() and b() in parallel; wait on both WaitHandle.WaitAll( new [] { a, b } Implicitly make an array of delegates .Select (f =>f.BeginInvoke(null,null) Call each delegate .AsyncWaitHandle) .ToArray()); Get a wait handle of each Convert from IEnumerable to array
  • 19. Asynchronous Programming Model (APM) Basic goal – IAsyncResult ar = del.BeginXXX(null,null); … del.EndXXX(ar); Supported by Framework classes, e.g., – FileStream – WebRequest
  • 20. Difficulties Async calls do not always succeed Timeout Exceptions Cancelation Results in too many functions/anonymous delegates Async workflow code becomes difficult to read
  • 21. PowerThreading A free library from Wintellect (Jeffrey Richter) Get it at wintellect.com Also check out PowerCollections Resource locks ReaderWriterGate Async. prog. model AsyncEnumerator SyncGate Other features IO State manager NumaInformation :)
  • 22. AsyncEnumerator Simplifies APM programming No need to manually manage IAsyncResult cookies Fewer functions, cleaner code
  • 23. Usage patterns 1 async op → process X async ops → process all X async ops → process each one as it completes X async ops → process some, discard the rest X async ops → process some until cancellation/timeout occurs, discard the rest
  • 24. AsyncEnumerator Basics Has three methods Execute(IEnumerator<Int32>) BeginExecute EndExecute Also exists as AsyncEnumerator<T> when a return value is required
  • 25. Inside the Function internal IEnumerator<Int32> GetFile( AsyncEnumerator ae, string uri) { WebRequest wr = WebRequest.Create(uri); wr.BeginGetResponse(ae.End(), null); yield return 1; WebResponse resp = wr.EndGetResponse( ae.DequeueAsyncResult()); // use response }
  • 26. Signature Function must return IEnumerator<Int32> Function must accept AsyncEnumerator as one of the parameters (order unimportant) internal IEnumerator<Int32> GetFile( AsyncEnumerator ae, string uri) { WebRequest wr = WebRequest.Create(uri); wr.BeginGetResponse(ae.End(), null); yield return 1; WebResponse resp = wr.EndGetResponse( ae.DequeueAsyncResult()); // use response }
  • 27. Callback Call the async BeginXXX() methods Pass ae.End() as callback parameter internal IEnumerator<Int32> GetFile( AsyncEnumerator ae, string uri) { WebRequest wr = WebRequest.Create(uri); wr.BeginGetResponse(ae.End(), null); yield return 1; WebResponse resp = wr.EndGetResponse( ae.DequeueAsyncResult()); // use response }
  • 28. Yield Now yield return the number of pending asynchronous operations internal IEnumerator<Int32> GetFile( AsyncEnumerator ae, string uri) { WebRequest wr = WebRequest.Create(uri); wr.BeginGetResponse(ae.End(), null); yield return 1; WebResponse resp = wr.EndGetResponse( ae.DequeueAsyncResult()); // use response }
  • 29. Wait & Process Call the async EndXXX() methods Pass ae.DequeueAsyncResult() as parameter internal IEnumerator<Int32> GetFile( AsyncEnumerator ae, string uri) { WebRequest wr = WebRequest.Create(uri); wr.BeginGetResponse(ae.End(), null); yield return 1; WebResponse resp = wr.EndGetResponse( ae.DequeueAsyncResult()); // use response }
  • 30. Usage Init the enumerator – var ae = new AsyncEnumerator(); Use it, passing itself as a parameter – ae.Execute(GetFile( ae, "http://nesteruk.org"));
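To make the iterator mechanics concrete, here is a toy driver that mimics the three calls used above (`End()`, `DequeueAsyncResult()`, `Execute()`). It is a sketch of the idea only, not the Wintellect implementation: `MiniAsyncEnumerator` and `MiniAsyncEnumeratorDemo` are hypothetical names, there is no cancellation or discard-group support, and a `MemoryStream` read stands in for the web request.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading;

// Toy stand-in for the idea behind AsyncEnumerator (NOT the Wintellect class).
public sealed class MiniAsyncEnumerator
{
    private readonly ConcurrentQueue<IAsyncResult> completed =
        new ConcurrentQueue<IAsyncResult>();
    private readonly SemaphoreSlim signal = new SemaphoreSlim(0);

    // Callback to pass to BeginXXX(): records the finished operation.
    public AsyncCallback End()
    {
        return ar => { completed.Enqueue(ar); signal.Release(); };
    }

    // Hands back one finished operation for the matching EndXXX() call.
    public IAsyncResult DequeueAsyncResult()
    {
        IAsyncResult ar;
        if (!completed.TryDequeue(out ar))
            throw new InvalidOperationException("No completed operation queued.");
        return ar;
    }

    // Drives the iterator: each yielded number is how many callbacks to await.
    public void Execute(IEnumerator<int> iterator)
    {
        while (iterator.MoveNext())
            for (int i = 0; i < iterator.Current; i++)
                signal.Wait();
    }
}

public static class MiniAsyncEnumeratorDemo
{
    // Iterators cannot have out parameters, so the result goes into an array.
    static IEnumerator<int> ReadOnce(
        MiniAsyncEnumerator ae, Stream s, byte[] buffer, int[] result)
    {
        s.BeginRead(buffer, 0, buffer.Length, ae.End(), null);
        yield return 1;                              // one operation pending
        result[0] = s.EndRead(ae.DequeueAsyncResult());
    }

    public static int Run()
    {
        var ae = new MiniAsyncEnumerator();
        var result = new int[1];
        using (var s = new MemoryStream(new byte[] { 1, 2, 3, 4, 5 }))
            ae.Execute(ReadOnce(ae, s, new byte[16], result));
        return result[0];                            // number of bytes read
    }
}
```

The point of the design is that the code before and after `yield return` reads top to bottom like synchronous code, while the driver decides when to resume it.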
  • 31. Exception Handling Break out of function – try { resp = wr.EndGetResponse( ae.DequeueAsyncResult()); } catch (WebException e) { // process e yield break; } Propagate a parameter
  • 32. Discard Groups Sometimes, you want to ignore the result of some calls E.g., you already got the data elsewhere To discard a group of calls Use overloaded End(…) methods to specify Group number Cleanup delegate Call DiscardGroup(…) with group number
  • 33. Cancellation External code can cancel the iterator – ae.Cancel(…) Or specify a timeout – ae.SetCancelTimeout(…) Check whether iterator is cancelled with – ae.IsCanceled(…) just call yield break if it is
  • 34. Part II Parallel Extensions to .NET Framework TPL and PLINQ
  • 35. Parallelization Algorithms vary in how well they parallelize Some easily (e.g., matrix multiplication) Some not so easily (e.g., matrix inversion) Some not at all
  • 36. Parallel Extensions to .NET Framework (PFX) A library for parallelization Consists of Task Parallel Library Parallel LINQ (PLINQ) Currently in CTP stage Maybe in .NET 4.0?
  • 37. Task Parallel Library Features System.Linq: Parallel LINQ System.Threading: Implicit parallelism (Parallel.Xxx) System.Threading.Collections: Thread-safe stack and queue System.Threading.Tasks: Task manager, tasks, futures
  • 38. System.Threading Implicit parallelization (Parallel.For and ForEach) Aggregate exceptions Other useful classes Parallel.For | ForEach LazyInit<T> WriteOnce<T> AggregateException Other goodies
  • 39. Parallel.For Parallelizes a for loop Instead of for (int i = 0; i < 10; ++i) { … } We write Parallel.For(0, 10, i => { … });
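A minimal sketch of the loop conversion, written against the `Parallel` class as it later shipped in `System.Threading.Tasks` (an assumption; the CTP described in these slides placed it in `System.Threading`). The class and method names are hypothetical.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class ParallelForSketch
{
    // Sums i*i for i in [0, n) with the iterations spread across cores.
    public static long SumOfSquares(int n)
    {
        long total = 0;
        Parallel.For(0, n, i =>
        {
            // Iterations run concurrently, so the shared sum needs an atomic add.
            Interlocked.Add(ref total, (long)i * i);
        });
        return total;
    }
}
```

The loop body must tolerate running in any order; shared state such as `total` has to be updated atomically or under a lock.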
  • 40. Parallel.For Overloads Step size ParallelState for cancelation Thread-local initialization Thread-local finalization References to a TaskManager Task creation options
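The thread-local initialization and finalization overload can be sketched as below, assuming the `Parallel.For<TLocal>` signature of the released `System.Threading.Tasks` API, which may differ from the CTP overloads listed on the slide. Names are hypothetical.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class ThreadLocalForSketch
{
    // Sums the integers in [0, n) using one private subtotal per worker.
    public static long Sum(int n)
    {
        long total = 0;
        Parallel.For<long>(
            0, n,
            () => 0L,                                   // thread-local init
            (i, state, subtotal) => subtotal + i,       // body: no shared writes
            subtotal =>                                 // thread-local finalization
            {
                Interlocked.Add(ref total, subtotal);
            });
        return total;
    }
}
```

Compared with an atomic add on every iteration, this touches the shared variable only once per worker.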
  • 41. Parallel.ForEach Same features as Parallel.For except No counters or steps Takes an IEnumerable<T>
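A short `Parallel.ForEach` sketch over an `IEnumerable<T>`, again assuming the released `System.Threading.Tasks` API. The class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelForEachSketch
{
    // Adds up the lengths of all words; items are processed concurrently.
    public static int TotalLength(IEnumerable<string> words)
    {
        int total = 0;
        Parallel.ForEach(words, w =>
        {
            Interlocked.Add(ref total, w.Length);
        });
        return total;
    }
}
```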
  • 42. Cancelation Parallel.For takes an Action<Int32> delegate Can also take an Action<Int32, ParallelState> ParallelState keeps track of the state of parallel execution ParallelState.Stop() stops execution in all threads
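Stopping a loop early can be sketched like this. The example assumes the released API, where the state object is called `ParallelLoopState` rather than the CTP's `ParallelState`; the `Stop()` call plays the same role. Names are hypothetical.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class ParallelStopSketch
{
    // Returns the index of target in data, or -1 if it is absent.
    // Assumes target occurs at most once, so the answer is unambiguous.
    public static int FindIndex(int[] data, int target)
    {
        int found = -1;
        Parallel.For(0, data.Length, (i, state) =>
        {
            if (data[i] == target)
            {
                Interlocked.Exchange(ref found, i);
                state.Stop();   // ask the loop not to start further iterations
            }
        });
        return found;
    }
}
```

Iterations already running when `Stop()` is called still finish, so the body should not assume it is the last one to execute.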
  • 43. Parallel.For Exceptions The AggregateException class holds all exceptions thrown Created even if only one thread throws Used by both Parallel.Xxx and PLINQ Original exceptions stored in InnerExceptions property.
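A sketch of catching the wrapper exception, assuming the released `System.Threading.Tasks.Parallel` API. Only one iteration throws here, which shows that the exception is wrapped even in the single-failure case. Names are hypothetical.

```csharp
using System;
using System.Threading.Tasks;

public static class AggregateExceptionSketch
{
    // Returns how many original exceptions the loop reported.
    public static int CountFailures()
    {
        try
        {
            Parallel.For(0, 100, i =>
            {
                if (i == 42)
                    throw new InvalidOperationException("bad item " + i);
            });
            return 0;
        }
        catch (AggregateException ex)
        {
            // The original exceptions are available in InnerExceptions.
            return ex.InnerExceptions.Count;
        }
    }
}
```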
  • 44. LazyInit<T> Lazy initialization of a single variable Options – AllowMultipleExecution Init function can be called by many threads, only one value published – EnsureSingleExecution Init function executed only once – ThreadLocal One init call & value per thread
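The single-execution option can be sketched with `Lazy<T>`, the type that shipped in .NET 4. Treating `Lazy<T>` with `LazyThreadSafetyMode.ExecutionAndPublication` as the counterpart of `EnsureSingleExecution` (and `PublicationOnly` as the counterpart of `AllowMultipleExecution`) is an assumption on my part, not something the slides state. Names are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class LazySketch
{
    // Returns how many times the init function ran under concurrent access.
    public static int Run()
    {
        int initCalls = 0;
        var lazy = new Lazy<int>(
            () => { Interlocked.Increment(ref initCalls); return 42; },
            LazyThreadSafetyMode.ExecutionAndPublication);

        // Many concurrent readers; the initializer should still run once.
        Parallel.For(0, 16, i =>
        {
            int value = lazy.Value;
        });
        return initCalls;
    }
}
```

For the per-thread option, the released framework offers `ThreadLocal<T>`, which keeps one value per thread.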
  • 45. WriteOnce<T> Single-assignment structure Just like Nullable: HasValue Value Also try methods TryGetValue TrySetValue
  • 46. Futures A future is the name of a value that will eventually be produced by a computation Thus, we can decide what to do with the value before we know it
  • 47. Futures of T • Future is a factory • Future<T> is the actual future (and also has factory methods) To make a future – var f = Future.Create(() => g()); To use a future Get f.Value The accessor does an async computation
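In the released framework the role of `Future<T>` is played by `Task<TResult>`, so a future can be sketched as below. That mapping, and the use of `Task.Run` (available from .NET 4.5), are assumptions relative to the CTP API shown on the slide. Names are hypothetical.

```csharp
using System;
using System.Threading.Tasks;

public static class FutureSketch
{
    public static int Compute(Func<int> g)
    {
        // Start computing the value in the background.
        Task<int> future = Task.Run(g);

        // ... decide what to do with the value before it is known ...

        // Reading Result blocks until the value has been produced.
        return future.Result + 1;
    }
}
```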
  • 48. Tasks & TaskManager A better Thread+ThreadPool combination TaskManager A very clever thread pool :) Adjusts worker threads to # of CPUs/cores Keeps all cores busy Task A unit of work May (or may not) run concurrently http://channel9.msdn.com/posts/DanielMoth/ParallelFX-Task-and-friends/
  • 49. Task Just like a future, a task takes an Action<T> – Task t = Task.Create(DoSomeWork); Overloads exist :) Fires off immediately. To wait on completion – t.Wait(); Unlike the thread pool, task manager will use as many threads as there are cores
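Creating tasks and waiting on them can be sketched with the released API, where `Task.Factory.StartNew` replaces the CTP's `Task.Create` (an assumption relative to the slide). Names are hypothetical.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class TaskSketch
{
    // Starts count units of work and returns how many completed.
    public static int RunAll(int count)
    {
        int done = 0;
        var tasks = new Task[count];
        for (int i = 0; i < count; i++)
        {
            // Each task is scheduled as soon as it is created.
            tasks[i] = Task.Factory.StartNew(() =>
            {
                Interlocked.Increment(ref done);
            });
        }

        Task.WaitAll(tasks);   // block until every task has finished
        return done;
    }
}
```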
  • 50. Parallel LINQ (PLINQ) Parallel evaluation in LINQ to Objects LINQ to XML Features IParallelEnumerable<T> ParallelEnumerable.AsParallel static method
  • 51. Example IEnumerable<T> data = ...; var q = data.AsParallel() .Where(x => p(x)) .OrderBy(x => k(x)) .Select(x => f(x)); foreach (var e in q) a(e);
  • 53. Message Passing Interface An API for general-purpose IPC Works across cores & machines C++ and Fortran Some Intel libraries support explicitly http://www.mcs.anl.gov/research/projects/mpich2/
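A concrete version of the query shape above, with the placeholders `p`, `k` and `f` replaced by simple hypothetical choices (keep even numbers, sort descending, square). It assumes the released PLINQ in `System.Linq`.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PlinqSketch
{
    public static int[] EvenSquaresDescending(IEnumerable<int> data)
    {
        return data.AsParallel()
                   .Where(x => x % 2 == 0)        // p(x): keep even numbers
                   .OrderByDescending(x => x)     // k(x): the value itself
                   .Select(x => x * x)            // f(x): square
                   .ToArray();
    }
}
```

The ordering operator fixes the output order, so the result is deterministic even though the filtering and projection run in parallel.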
  • 54. PureMPI.NET A free library available at http://purempi.net Uses WCF endpoints for communication Uses MPI syntax Features A library DLL for WCF functionality An EXE for easy deployment over network
  • 55. How it works Your computers run a service that connects them together Your program exposes WCF endpoints You use the MPI interfaces to communicate
  • 56. Communicator & Rank A communicator is a group of computers In most scenarios, you would have one group MPI_COMM_WORLD comm Useful for determining whether we are the …
  • 57. Main static void Main(string[] args) { /* MPIEnvironment: app.config */ using (ProcessorGroup processors = new ProcessorGroup("MPIEnvironment", MpiProcess)) { /* run MpiProcess on all machines */ processors.Start(); /* start each one */ processors.WaitForCompletion(); /* wait on all */ } }
  • 58. Sending & Receiving Blocking or non-blocking methods Send/Receive (blocking) Begin|End Send/Receive (async) Invoked on the comm
  • 59. Send/Receive static void MpiProcess(IDictionary<string, Comm> comms) { /* get a default comm from dictionary */ Comm comm = comms["MPI_COMM_WORLD"]; if (comm.Rank == 0) { /* get a message from 1 (blocking) */ string msg = comm.Receive<string>(1, string.Empty); Console.WriteLine("Got " + msg); } else if (comm.Rank == 1) { /* send a message to 0 (also blocking) */ comm.Send(0, string.Empty, "Hello"); } }
  • 60. Extras Can use async ops Can send to all (Broadcast) Can distribute work and then collect it (Gather/Scatter)