Toub parallelism tour_oct2009

  1. 1. http://go.microsoft.com/?linkid=9692084<br />
  2. 2. Parallel Programming with Visual Studio 2010 and the .NET Framework 4<br />Stephen Toub<br />Microsoft Corporation<br />October 2009<br />
  3. 3. Agenda<br />Why Parallelism, Why Now?<br />Difficulties w/ Visual Studio 2008 & .NET 3.5<br />Solutions w/ Visual Studio 2010 & .NET 4<br />Parallel LINQ<br />Task Parallel Library<br />New Coordination & Synchronization Primitives<br />New Parallel Debugger Windows<br />New Profiler Concurrency Visualizations<br />
  4. 4. Moore’s Law<br />“The number of transistors incorporated in a chip will approximately double every 24 months.” <br />Gordon Moore<br />Intel Co-Founder<br />http://www.intel.com/pressroom/kits/events/moores_law_40th/<br />
5. 5. Moore’s Law: Alive and Well?<br />The number of transistors doubles every two years…<br />More than 1 billion transistors in 2006!<br />http://upload.wikimedia.org/wikipedia/commons/2/25/Transistor_Count_and_Moore%27s_Law_-_2008_1024.png<br />
6. 6. Moore’s Law: Feel the Heat! <br />[Chart: power density (W/cm²), log scale from 1 to 10,000, of Intel processors from the 8080 through the 386, 486, and Pentium® families (’70–’10), climbing past a hot plate toward nuclear-reactor, rocket-nozzle, and sun’s-surface levels]<br />Intel Developer Forum, Spring 2004 - Pat Gelsinger<br />
7. 7. Moore’s Law: But Different<br />Frequencies will NOT get much faster!<br />Maybe 5 to 10% every year or so, a few more times…<br />And these modest gains would make the chips A LOT hotter!<br />http://www.tomshw.it/cpu.php?guide=20051121<br />
  8. 8. The Manycore Shift<br />“[A]fter decades of single core processors, the high volume processor industry has gone from single to dual to quad-core in just the last two years. Moore’s Law scaling should easily let us hit the 80-core mark in mainstream processors within the next ten years and quite possibly even less.”-- Justin Rattner, CTO, Intel (February 2007)<br />“If you haven’t done so already, now is the time to take a hard look at the design of your application, determine what operations are CPU-sensitive now or are likely to become so soon, and identify how those places could benefit from concurrency.”-- Herb Sutter, C++ Architect at Microsoft (March 2005)<br />
  9. 9. I'm convinced… now what?<br />Multithreaded programming is “hard” today<br />Doable by only a subgroup of senior specialists<br />Parallel patterns are not prevalent, well known, nor easy to implement<br />So many potential problems<br />Businesses have little desire to “go deep”<br />Best devs should focus on business value, not concurrency<br />Need simple ways to allow all devs to write concurrent code<br />
10. 10. Example: “Race Car Drivers”<br />IEnumerable&lt;RaceCarDriver&gt; drivers = ...;<br />var results = new List&lt;RaceCarDriver&gt;();<br />foreach (var driver in drivers)<br />{<br /> if (driver.Name == queryName &&<br /> driver.Wins.Count >= queryWinCount)<br /> {<br /> results.Add(driver);<br /> }<br />}<br />results.Sort((b1, b2) =><br /> b1.Age.CompareTo(b2.Age));<br />
11. 11. Manual Parallel Solution<br />IEnumerable&lt;RaceCarDriver&gt; drivers = …;<br />var results = new List&lt;RaceCarDriver&gt;();<br />int partitionsCount = Environment.ProcessorCount;<br />int remainingCount = partitionsCount;<br />var enumerator = drivers.GetEnumerator();<br />try {<br /> using (var done = new ManualResetEvent(false)) {<br /> for (int i = 0; i < partitionsCount; i++) {<br /> ThreadPool.QueueUserWorkItem(delegate {<br /> while (true) {<br /> RaceCarDriver driver;<br /> lock (enumerator) {<br /> if (!enumerator.MoveNext()) break;<br /> driver = enumerator.Current;<br /> }<br /> if (driver.Name == queryName &&<br /> driver.Wins.Count >= queryWinCount) {<br /> lock (results) results.Add(driver);<br /> }<br /> }<br /> if (Interlocked.Decrement(ref remainingCount) == 0) done.Set();<br /> });<br /> }<br /> done.WaitOne();<br /> results.Sort((b1, b2) => b1.Age.CompareTo(b2.Age));<br /> }<br />}<br />finally { if (enumerator is IDisposable) ((IDisposable)enumerator).Dispose(); }<br />
12. 12. PLINQ Solution<br />var results = from driver in drivers.AsParallel()<br /> where driver.Name == queryName &&<br /> driver.Wins.Count >= queryWinCount<br /> orderby driver.Age ascending<br /> select driver;<br />
13. 13. Visual Studio 2010: Tools, Programming Models, Runtimes<br />[Diagram of the parallel stack]<br />Tools: Visual Studio IDE, Parallel Debugger Tool Windows, Profiler Concurrency Analysis<br />Managed (.NET Framework 4): Parallel LINQ, Task Parallel Library, Data Structures; ThreadPool with Task Scheduler and Resource Manager<br />Native (Visual C++ 10): Parallel Pattern Library, Agents Library, Data Structures; Concurrency Runtime with Task Scheduler and Resource Manager<br />Operating System (Windows): Threads, UMS Threads<br />
  14. 14. Parallel Extensions<br />What is it?<br />Pure .NET libraries<br />No compiler changes necessary<br />mscorlib.dll, System.dll, System.Core.dll<br />Lightweight, user-mode runtime<br />Key ThreadPool enhancements<br />Supports imperative and declarative, data and task parallelism<br />Declarative data parallelism (PLINQ)<br />Imperative data and task parallelism (Task Parallel Library)<br />New coordination/synchronization constructs<br />Why do we need it?<br />Supports parallelism in any .NET language<br />Delivers reduced concept count and complexity, better time to solution<br />Begins to move parallelism capabilities from concurrency experts to domain experts<br />How do we get it?<br />Built into the core of .NET 4<br />Debugging and profiling support in Visual Studio 2010<br />
15. 15. Architecture<br />[Diagram]<br />.NET Program: declarative queries, compiled to IL by the C#, VB, C++, F#, or any other .NET compiler<br />PLINQ Execution Engine: query analysis; data partitioning (chunk, range, hash, striped, repartitioning, custom); operator types (map, filter, sort, search, reduce, group, join, …); merging (sync and async, order preserving, buffered, inverted)<br />Task Parallel Library: parallel algorithms, loop replacements, imperative task parallelism, scheduling<br />Coordination Data Structures: thread-safe collections, synchronization types, coordination types<br />Threads: Proc 1 … Proc p<br />
16. 16. Language Integrated Query (LINQ)<br />[Diagram]<br />C#, Visual Basic, others… compile down to the .NET Standard Query Operators<br />LINQ enabled data sources: LINQ to Objects (objects), LINQ to XML (XML), and LINQ-enabled ADO.NET — LINQ to SQL, LINQ to Datasets, LINQ to Entities (relational)<br />
17. 17. Writing a LINQ-to-Objects Query<br />Two ways to write queries<br />Comprehensions<br />Syntax extensions to C# and Visual Basic<br />APIs<br />Used as extension methods on IEnumerable&lt;T&gt;<br />System.Linq.Enumerable class<br />Compiler converts the former into the latter<br />API implementation does the actual work<br />var q = from x in Y where p(x) orderby x.f1 select x.f2;<br />var q = Y.Where(x => p(x)).OrderBy(x => x.f1).Select(x => x.f2);<br />var q = Enumerable.Select(<br /> Enumerable.OrderBy(<br /> Enumerable.Where(Y, x => p(x)),<br /> x => x.f1),<br /> x => x.f2);<br />
18. 18. LINQ Query Operators<br /><ul><li>In .NET 4, ~50 operators w/ ~175 overloads</li></ul>Aggregate(3)<br />All(1)<br />Any(2)<br />AsEnumerable(1)<br />Average(20)<br />Cast(1)<br />Concat(1)<br />Contains(2)<br />Count(2)<br />DefaultIfEmpty(2)<br />Distinct(2)<br />ElementAt(1)<br />ElementAtOrDefault(1)<br />Empty(1)<br />Except(2)<br />First(2)<br />FirstOrDefault(2)<br />GroupBy(8)<br />GroupJoin(2)<br />Intersect(2)<br />Join(2)<br />Last(2)<br />LastOrDefault(2)<br />LongCount(2)<br />Max(22)<br />Min(22)<br />OfType(1)<br />OrderBy(2)<br />OrderByDescending(2)<br />Range(1)<br />Repeat(1)<br />Reverse(1)<br />Select(2)<br />SelectMany(4)<br />SequenceEqual(2)<br />Single(2)<br />SingleOrDefault(2)<br />Skip(1)<br />SkipWhile(2)<br />Sum(20)<br />Take(1)<br />TakeWhile(2)<br />ThenBy(2)<br />ThenByDescending(2)<br />ToArray(1)<br />ToDictionary(4)<br />ToList(1)<br />ToLookup(4)<br />Union(2)<br />Where(2)<br />Zip(1)<br />var operators = from method in typeof(Enumerable).GetMethods(<br /> BindingFlags.Public | BindingFlags.Static | BindingFlags.DeclaredOnly)<br /> group method by method.Name into methods<br /> orderby methods.Key<br /> select new { Name = methods.Key, Count = methods.Count() };<br />
19. 19. Query Operators, cont.<br />Tree of operators<br />Producers<br />No input<br />Examples: Range, Repeat<br />Consumer/producers<br />Transform input stream(s) into output stream<br />Examples: Select, Where, Join, Skip, Take<br />Consumers<br />Reduce to a single value<br />Examples: Aggregate, Min, Max, First<br />Many are unary while others are binary<br /><ul><li>Data-intensive bulk transformations</li></ul>[Diagram: operator tree — two Where nodes feeding a Join, feeding a Select]<br />
  20. 20. Implementation of a Query Operator<br />What might an implementation look like?<br />Does it have to be this way?<br />What if we could do this in… parallel?!<br />public static IEnumerable<TSource> Where<TSource>(<br /> this IEnumerable<TSource> source, <br />Func<TSource, bool> predicate)<br />{<br /> if (source == null || predicate == null) <br /> throw new ArgumentNullException();<br />foreach (var item in source)<br /> {<br /> if (predicate(item)) yield return item;<br /> }<br />}<br />public static IEnumerable<TSource> Where<TSource>(<br /> this IEnumerable<TSource> source, <br />Func<TSource, bool> predicate)<br />{<br /> ...<br />}<br />
  21. 21. Parallel LINQ (PLINQ)<br />Utilizes parallel hardware for LINQ queries<br />Abstracts away most parallelism details<br />Partitions and merges data intelligently<br />Supports all .NET Standard Query Operators<br />Plus a few knobs<br />Works for any IEnumerable<T><br />Optimizations for other types (T[], IList<T>)<br />Supports custom partitioning (Partitioner<T>)<br />Built on top of the rest of Parallel Extensions<br />
  22. 22. Programming Model<br />Minimal impact to existing LINQ programming model<br />AsParallel extension method<br />ParallelEnumerable class<br />Implements the Standard Query Operators, but for ParallelQuery<T><br />public static ParallelQuery<T> <br />AsParallel<T>(this IEnumerable<T> source);<br />public static ParallelQuery<TSource> <br />Where<TSource>( this ParallelQuery<TSource> source, <br />Func<TSource, bool> predicate)<br />
23. 23. Writing a PLINQ Query<br />Two ways to write queries<br />Comprehensions<br />Syntax extensions to C# and Visual Basic<br />APIs<br />Used as extension methods on ParallelQuery&lt;T&gt;<br />System.Linq.ParallelEnumerable class<br />Compiler converts the former into the latter<br />As with serial LINQ, API implementation does the actual work<br />var q = from x in Y.AsParallel() where p(x) orderby x.f1 select x.f2;<br />var q = Y.AsParallel().Where(x => p(x)).OrderBy(x => x.f1).Select(x => x.f2);<br />var q = ParallelEnumerable.Select(<br /> ParallelEnumerable.OrderBy(<br /> ParallelEnumerable.Where(Y.AsParallel(), x => p(x)),<br /> x => x.f1),<br /> x => x.f2);<br />
24. 24. PLINQ Knobs<br />Additional Extension Methods<br />WithDegreeOfParallelism<br />AsOrdered<br />WithCancellation<br />WithMergeOptions<br />WithExecutionMode<br />var results = from driver in drivers.AsParallel().WithDegreeOfParallelism(4)<br /> where driver.Name == queryName &&<br /> driver.Wins.Count >= queryWinCount<br /> orderby driver.Age ascending<br /> select driver;<br />var results = from driver in drivers.AsParallel().AsOrdered()<br /> where driver.Name == queryName &&<br /> driver.Wins.Count >= queryWinCount<br /> orderby driver.Age ascending<br /> select driver;<br />
25. 25. Partitioning<br /><ul><li>Input to a single operator is partitioned into p disjoint subsets</li><li>Operators are replicated across the partitions</li><li>Example: from x in A where p(x) …</li><li>Partitions execute in (almost) complete isolation</li></ul>[Diagram: source A split across Tasks 1..n, each running its own copy of where p(x)]<br />
28. 28. Partitioning: Load Balancing<br />[Diagram: with static scheduling (range), CPU0…CPUN each receive a fixed contiguous block of items up front; with dynamic scheduling, items are handed out on demand, so faster CPUs end up processing more of them]<br />
29. 29. Partitioning: Algorithms<br />Several partitioning schemes built-in<br />Chunk<br />Works with any IEnumerable&lt;T&gt;<br />Single enumerator shared; chunks handed out on-demand<br />Range<br />Works only with IList&lt;T&gt;<br />Input divided into contiguous regions, one per partition<br />Stripe<br />Works only with IList&lt;T&gt;<br />Elements handed out round-robin to each partition<br />Hash<br />Works with any IEnumerable&lt;T&gt;<br />Elements assigned to partition based on hash code<br />Custom partitioning available through Partitioner&lt;T&gt;<br />Partitioner.Create available for tighter control over built-in partitioning schemes<br />
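As a concrete sketch of the range scheme above: `Partitioner.Create(int, int)` hands each worker a contiguous index range, which suits cheap, uniform per-element work. The summation here is an illustrative workload, not from the slides.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

class PartitioningDemo
{
    static void Main()
    {
        int[] data = Enumerable.Range(0, 1000).ToArray();

        // Range partitioning: contiguous index regions, one tuple per chunk.
        var rangePartitioner = Partitioner.Create(0, data.Length);

        long sum = 0;
        System.Threading.Tasks.Parallel.ForEach(rangePartitioner, range =>
        {
            // Accumulate locally, then publish once per range to limit contention.
            long localSum = 0;
            for (int i = range.Item1; i < range.Item2; i++) localSum += data[i];
            System.Threading.Interlocked.Add(ref sum, localSum);
        });

        Console.WriteLine(sum); // 499500
    }
}
```

`Partitioner.Create(data, true)` would instead give a load-balancing partitioner for irregular per-element costs.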
30. 30. Operator Fusion<br /><ul><li>Naïve approach: partition and merge for each operator</li><li>Example: (from x in D.AsParallel() where p(x) select x*x*x).Sum();</li><li>Partition and merge mean synchronization => scalability bottleneck</li><li>Instead, we can fuse operators together:</li><li>Minimizes number of partitioning/merging steps necessary</li></ul>[Diagram: the naïve plan re-partitions and merges D between where p(x), select x³, and Sum(); the fused plan runs where/select/Sum in a single pass per task, with one partition step and one merge]<br />
35. 35. Merging<br />Pipelined: separate consumer thread<br />Default for GetEnumerator()<br />And hence foreach loops<br />AutoBuffered, NoBuffering<br />Access to data as it’s available<br />But more synchronization overhead<br />Stop-and-go: consumer helps<br />Sorts, ToArray, ToList, etc.<br />FullyBuffered<br />Minimizes context switches<br />But higher latency and more memory<br />Inverted: no merging needed<br />ForAll extension method<br />Most efficient by far<br />But not always applicable<br />Requires side effects<br />[Diagram: thread layouts for pipelined, stop-and-go, and inverted merging]<br />
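The "inverted" case can be sketched as follows: `ForAll` runs the final action on the worker threads themselves, so no results are merged back to the consumer; in exchange, the action must be thread-safe (here via a `ConcurrentBag<int>`, an illustrative choice).

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

class ForAllDemo
{
    static void Main()
    {
        var results = new ConcurrentBag<int>();

        // No merge step: each partition's worker invokes the action directly.
        Enumerable.Range(0, 100)
                  .AsParallel()
                  .Where(x => x % 2 == 0)
                  .ForAll(x => results.Add(x));

        Console.WriteLine(results.Count); // 50
    }
}
```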
36. 36. Parallelism Blockers<br />Ordering not guaranteed<br />int[] values = new int[] { 0, 1, 2 };<br />var q = from x in values.AsParallel() select x * 2;<br />int[] scaled = q.ToArray(); // == { 0, 2, 4 }?<br />Exceptions<br />System.AggregateException<br />object[] data = new object[] { "foo", null, null };<br />var q = from x in data.AsParallel() select x.ToString();<br />Thread affinity<br />controls.AsParallel().ForAll(c => c.Size = ...);<br />Operations with < 1.0 speedup<br />IEnumerable&lt;int&gt; input = …;<br />var doubled = from x in input.AsParallel() select x*2;<br />Side effects and mutability are serious issues<br />Most queries do not use side effects, but it’s possible…<br />Random rand = new Random();<br />var q = from i in Enumerable.Range(0, 10000).AsParallel()<br /> select rand.Next();<br />
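The shared `Random` in the last snippet is both a race and a correctness hazard, since `Random` is not thread-safe. A common workaround (a sketch, not from the slides, with an illustrative seeding scheme) is one instance per thread via `ThreadLocal<T>`:

```csharp
using System;
using System.Linq;
using System.Threading;

class SafeRandomDemo
{
    static void Main()
    {
        // One Random per worker thread; seed offset by thread id so threads
        // created at the same tick don't produce identical sequences.
        var rand = new ThreadLocal<Random>(
            () => new Random(Environment.TickCount * Thread.CurrentThread.ManagedThreadId));

        var q = from i in Enumerable.Range(0, 10000).AsParallel()
                select rand.Value.Next();

        Console.WriteLine(q.Count()); // 10000
    }
}
```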
37. 37. Task Parallel Library: Loops<br />Loops are a common source of work<br />Can be parallelized when iterations are independent<br />Body doesn’t depend on mutable shared state, or synchronization is used<br />Synchronous<br />All iterations finish, regularly or exceptionally<br />Lots of knobs<br />Breaking, task-local state, custom partitioning, cancellation, scheduling, degree of parallelism<br />Visual Studio 2010 profiler support (as with PLINQ)<br />for (int i = 0; i < n; i++) work(i);<br />…<br />foreach (T e in data) work(e);<br />Parallel.For(0, n, i => work(i));<br />…<br />Parallel.ForEach(data, e => work(e));<br />
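One of the "knobs" above, breaking, can be sketched with the `Parallel.For` overload that passes a `ParallelLoopState` (the search-for-a-value scenario here is a hypothetical example, not from the slides):

```csharp
using System;
using System.Threading.Tasks;

class ParallelBreakDemo
{
    static void Main()
    {
        int[] data = new int[1000];
        data[123] = 42;

        long found = -1;
        // Break(): iterations below the breaking index still complete;
        // Stop() would instead halt all iterations as soon as possible.
        Parallel.For(0, data.Length, (i, loopState) =>
        {
            if (data[i] == 42)
            {
                System.Threading.Interlocked.Exchange(ref found, i);
                loopState.Break();
            }
        });

        Console.WriteLine(found); // 123
    }
}
```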
38. 38. Task Parallel Library: Statements<br />Sequence of statements<br />When independent, can be parallelized<br />Synchronous (same as loops)<br />Under the covers<br />May use Parallel.For, may use Tasks<br />StatementA();<br />StatementB();<br />StatementC();<br />Parallel.Invoke(<br /> () => StatementA(),<br /> () => StatementB(),<br /> () => StatementC());<br />
39. 39. Task Parallel Library: Tasks<br />System.Threading.Tasks<br />Task<br />Represents an asynchronous operation<br />Supports waiting, cancellation, continuations, …<br />Parent/child relationships<br />1st-class debugging support in Visual Studio 2010<br />Task&lt;TResult&gt; : Task<br />Tasks that return results<br />TaskCompletionSource&lt;TResult&gt;<br />Create Task&lt;TResult&gt;s to represent other operations<br />TaskScheduler<br />Represents a scheduler that executes tasks<br />Extensible<br />TaskScheduler.Default => ThreadPool<br />
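A minimal sketch of the pieces the slide lists: creating a `Task<TResult>` on the default (ThreadPool) scheduler, chaining a continuation, and blocking on `Result`:

```csharp
using System;
using System.Threading.Tasks;

class TaskDemo
{
    static void Main()
    {
        // Start an asynchronous computation on TaskScheduler.Default.
        Task<int> sum = Task.Factory.StartNew(() =>
        {
            int total = 0;
            for (int i = 1; i <= 100; i++) total += i;
            return total;
        });

        // Continuation runs when the antecedent task completes.
        Task<string> report = sum.ContinueWith(t => "sum = " + t.Result);

        // Result blocks until the chain has finished.
        Console.WriteLine(report.Result); // sum = 5050
    }
}
```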
40. 40. ThreadPool in .NET 3.5<br />[Diagram: the program thread enqueues work items 1–6 into a single global queue; worker threads all dequeue from that one queue]<br />Thread Management:<br /><ul><li>Starvation Detection</li><li>Idle Thread Retirement</li></ul>41. 41. ThreadPool in .NET 4<br />[Diagram: a lock-free global queue plus a local work-stealing queue per worker thread (1..p); the program thread enqueues tasks to the global queue, while workers prefer their own local queue and steal from others when idle]<br />Thread Management:<br /><ul><li>Starvation Detection</li><li>Idle Thread Retirement</li><li>Hill-climbing</li></ul>42. 42. New Primitives<br />Public, and used throughout PLINQ and TPL<br />Address many of today’s core concurrency issues<br />Thread-safe, scalable collections<br />IProducerConsumerCollection&lt;T&gt;<br />ConcurrentQueue&lt;T&gt;<br />ConcurrentStack&lt;T&gt;<br />ConcurrentBag&lt;T&gt;<br />ConcurrentDictionary&lt;TKey,TValue&gt;<br />Phases and work exchange<br />Barrier<br />BlockingCollection&lt;T&gt;<br />CountdownEvent<br />Partitioning<br />{Orderable}Partitioner&lt;T&gt;<br />Partitioner.Create<br />Exception handling<br />AggregateException<br />Initialization<br />Lazy&lt;T&gt;<br />LazyInitializer.EnsureInitialized&lt;T&gt;<br />ThreadLocal&lt;T&gt;<br />Locks<br />ManualResetEventSlim<br />SemaphoreSlim<br />SpinLock<br />SpinWait<br />Cancellation<br />CancellationToken{Source}<br />
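Two of the listed primitives compose naturally: `BlockingCollection<T>` wraps an `IProducerConsumerCollection<T>` (here an explicit `ConcurrentQueue<T>`, its default backing store) to give the classic producer/consumer shape. A sketch:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class ProducerConsumerDemo
{
    static void Main()
    {
        using (var queue = new BlockingCollection<int>(new ConcurrentQueue<int>()))
        {
            var producer = Task.Factory.StartNew(() =>
            {
                for (int i = 0; i < 10; i++) queue.Add(i);
                queue.CompleteAdding(); // lets the consumer's enumeration end
            });

            int sum = 0;
            // Blocks while empty; completes once CompleteAdding has been called
            // and the collection has drained.
            foreach (int item in queue.GetConsumingEnumerable()) sum += item;

            producer.Wait();
            Console.WriteLine(sum); // 45
        }
    }
}
```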
  44. 44. What Can I Do with These Cores?<br />Offload<br />Free up your UI<br />Go faster whenever you can<br />Parallelize the parallelizable<br />Do more<br />Use more data to get better results<br />Add more features<br />Speculate<br />Pre-fetch, Pre-process<br />Evaluate multiple solutions<br />
45. 45. Performance Tips<br />Compute intensive and/or large data sets<br />Work done should be at least 1,000s of cycles<br />Measure, and combine/optimize as necessary<br />Use the Visual Studio concurrency profiler<br />Look for common anti-patterns: load imbalance, lock convoys, etc.<br />Parallelize fine-grained, but not too fine-grained<br />e.g. parallelize the outer loop, unless N is too small to offer enough parallelism<br />Consider parallelizing only the inner loop, or both, at that point<br />Consider unrolling<br />Do not be gratuitous in task creation<br />Lightweight, but still requires object allocation, etc.<br />Prefer isolation & immutability over synchronization<br />Synchronization => !Scalable<br />Try to avoid shared state<br />Have realistic expectations<br />
46. 46. Amdahl’s Law<br />Theoretical maximum speedup is determined by the amount of sequential code<br />
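The slide's claim can be made concrete with Amdahl's formula: with parallelizable fraction p on n cores, speedup(n) = 1 / ((1 − p) + p / n), which is capped at 1 / (1 − p) as n grows. A quick sketch:

```csharp
using System;

class AmdahlDemo
{
    // Maximum theoretical speedup on n cores when fraction p of the work
    // is parallelizable (0 <= p <= 1).
    static double Speedup(double p, int n)
    {
        return 1.0 / ((1.0 - p) + p / n);
    }

    static void Main()
    {
        // Even at 95% parallelizable code, speedup never exceeds 1/(1-p) = 20x.
        Console.WriteLine(Speedup(0.95, 4));    // ~3.48
        Console.WriteLine(Speedup(0.95, 1000)); // ~19.63
    }
}
```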
  47. 47. To Infinity And Beyond…<br />The “Manycore Shift” is happening<br />Parallelism in your code is inevitable<br />Visual Studio 2010 and .NET 4 will help<br />Parallel Computing Dev Center<br />http://msdn.com/concurrency<br />Download Beta 2 (“go-live” license)<br />http://go.microsoft.com/?linkid=9692084<br />Team Blogs<br />Managed: http://blogs.msdn.com/pfxteam<br />Native: http://blogs.msdn.com/nativeconcurrency<br />Tools: http://blogs.msdn.com/visualizeconcurrency<br />Forums<br />http://social.msdn.microsoft.com/Forums/en-US/category/parallelcomputing<br />We love feedback!<br />
  48. 48. © 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.<br />The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.<br />