Exploiting Parallelism with Multi-core Technologies
Coding with TBB Contest
Problem
Three Approaches for Improvement
Family Tree (figure, timeline 1988 to 2006)
Languages: Cilk (space efficient scheduler, cache-oblivious algorithms); Threaded-C (continuation tasks, task stealing)
Pragmas: OpenMP* (fork/join tasks); OpenMP taskqueue (while & recursion)
Libraries: Chare Kernel (small tasks); JSR-166 (FJTask); STL (generic programming); STAPL (recursive ranges); ECMA .NET* (parallel iteration classes)
Other labels in the figure: containers; Intel® Threading Building Blocks
*Other names and brands may be claimed as the property of others
Enter Intel® Threading Building Blocks
Example (just a peek)

SERIAL VERSION

    void ApplyFoo( size_t n, int x ) {
        for( size_t i=0; i<n; ++i )
            Foo( i, x );
    }

PARALLEL VERSION (the way I wish I could write it)

    void ParallelApplyFoo( size_t n, int x ) {
        parallel_for( blocked_range<size_t>(0,n,10),
            <>( const blocked_range<size_t>& range ) {
                for( size_t i=range.begin(); i<range.end(); ++i )
                    Foo( i, x );
            }
        );
    }
Parallel Version (as it can be written)

    class ApplyFoo {
    public:
        int my_x;
        ApplyFoo( int x ) : my_x(x) {}
        void operator()( const blocked_range<size_t>& range ) const {
            for( size_t i=range.begin(); i!=range.end(); ++i )
                Foo( i, my_x );
        }
    };

    void ParallelApplyFoo( size_t n, int x ) {
        parallel_for( blocked_range<size_t>(0,n,10), ApplyFoo(x) );
    }
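The body-object pattern can be illustrated without the library. The following is a minimal standard-C++ sketch, not TBB itself: MiniRange, mini_parallel_for, AddX and parallel_add are hypothetical names invented here, and std::async stands in for the task scheduler. It only shows the shape of the idea: the range is split until it falls below the grain size, and the body object is applied to each leaf subrange.

```cpp
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical stand-in for blocked_range<size_t>: a half-open interval
// [begin, end) plus a grain size below which it is no longer split.
struct MiniRange {
    std::size_t begin, end, grain;
    bool is_divisible() const { return end - begin > grain; }
};

// Hypothetical stand-in for parallel_for: split the range in half until it
// is no longer divisible, then apply the body to each leaf subrange.
template <typename Body>
void mini_parallel_for(const MiniRange& r, const Body& body) {
    if (!r.is_divisible()) {
        body(r);
        return;
    }
    const std::size_t mid = r.begin + (r.end - r.begin) / 2;
    const MiniRange left{r.begin, mid, r.grain};
    const MiniRange right{mid, r.end, r.grain};
    // Run the left half asynchronously and the right half on this thread.
    auto pending = std::async(std::launch::async,
                              [&] { mini_parallel_for(left, body); });
    mini_parallel_for(right, body);
    pending.get();
}

// Body object in the style of ApplyFoo: it carries x and handles one subrange.
struct AddX {
    std::vector<int>* out;
    int x;
    void operator()(const MiniRange& r) const {
        for (std::size_t i = r.begin; i != r.end; ++i) (*out)[i] += x;
    }
};

// Adds x to every element of v, one leaf subrange at a time.
void parallel_add(std::vector<int>& v, int x) {
    mini_parallel_for(MiniRange{0, v.size(), 10}, AddX{&v, x});
}
```

Spawning one thread per split is far cruder than a work-stealing scheduler; the sketch is meant to show why the body must be a copyable, const-callable object that works on any subrange it is handed.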
Underlying concepts
Generic Programming
Key Features of Intel Threading Building Blocks
Relaxed Sequential Semantics
Generic Parallel Algorithms: parallel_for, parallel_while, parallel_reduce, pipeline, parallel_sort, parallel_scan
Concurrent Containers: concurrent_hash_map, concurrent_queue, concurrent_vector
Task scheduler
Synchronization Primitives: atomic, spin_mutex, spin_rw_mutex, queuing_mutex, queuing_rw_mutex, mutex
Memory Allocation: cache_aligned_allocator, scalable_allocator
Serial Example

    static void SerialUpdateVelocity() {
        for( int i=1; i<UniverseHeight-1; ++i )
            for( int j=1; j<UniverseWidth-1; ++j )
                V[i][j] += (S[i][j] - S[i][j-1] + T[i][j] - T[i-1][j])*M[i];
    }
Parallel Version
blue = original code, red = provided by TBB, black = boilerplate for library

    struct UpdateVelocityBody {
        void operator()( const blocked_range<int>& range ) const {
            int end = range.end();
            for( int i=range.begin(); i<end; ++i ) {
                for( int j=1; j<UniverseWidth-1; ++j ) {
                    V[i][j] += (S[i][j] - S[i][j-1] + T[i][j] - T[i-1][j])*M[i];
                }
            }
        }
    };

    void ParallelUpdateVelocity() {
        parallel_for( blocked_range<int>( 1, UniverseHeight-1 ),
                      UpdateVelocityBody(),
                      auto_partitioner() );
    }

Callouts in the figure: Task; Parallel control structure; Task subdivision handler
Range is Generic

    Signature                        Semantics
    R::R (const R&)                  Copy constructor
    R::~R()                          Destructor
    bool R::empty() const            True if range is empty
    bool R::is_divisible() const     True if range can be partitioned
    R::R (R& r, split)               Split r into two subranges
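A small type that satisfies the five requirements listed above might look like the sketch below. It is an illustration only: TrivialRange and covered_by_leaves are hypothetical names, and the split tag is declared locally so that the code compiles without any TBB headers. The recursion runs serially and simply mimics how a range is subdivided until is_divisible() returns false.

```cpp
#include <cstddef>

// Local tag type standing in for the library's split tag (an assumption
// made so the sketch compiles without TBB headers).
struct split {};

// Hypothetical range over [lo, hi). The grain is assumed to be at least 1,
// so every split produces two non-empty halves.
class TrivialRange {
public:
    TrivialRange(std::size_t lo, std::size_t hi, std::size_t grain)
        : lo_(lo), hi_(hi), grain_(grain) {}
    TrivialRange(const TrivialRange&) = default;  // copy constructor
    ~TrivialRange() = default;                    // destructor
    // Splitting constructor: this object takes the upper half and r keeps
    // the lower half.
    TrivialRange(TrivialRange& r, split)
        : lo_(r.lo_ + (r.hi_ - r.lo_) / 2), hi_(r.hi_), grain_(r.grain_) {
        r.hi_ = lo_;
    }
    bool empty() const { return lo_ >= hi_; }
    bool is_divisible() const { return hi_ - lo_ > grain_; }
    std::size_t size() const { return hi_ - lo_; }
private:
    std::size_t lo_, hi_, grain_;
};

// Keep splitting while the range is divisible and add up how many elements
// the leaf subranges cover in total.
std::size_t covered_by_leaves(TrivialRange r) {
    if (!r.is_divisible()) return r.size();
    TrivialRange upper(r, split());
    return covered_by_leaves(r) + covered_by_leaves(upper);
}
```

Because the splitting constructor modifies its argument, the two halves never overlap, which is what lets separate tasks work on them independently.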
parallel_reduce parallel_scan parallel_while parallel_sort pipeline
Parallel pipeline
Parallel pipeline (figure: items 1 to 12 flowing through the stages)
Tag incoming items with sequence numbers.
Parallel stage scales because it can process items in parallel or out of order.
Serial stage processes items one at a time, in order.
Items wait for their turn in a serial stage.
Uses sequence numbers to recover order for a serial stage.
Another serial stage.
Controls excessive parallelism by limiting the total number of items flowing through the pipeline.
Throughput is limited by the throughput of the slowest serial stage.
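The role of the sequence numbers can be sketched in a few lines of standard C++. This is not how the TBB pipeline is implemented; ReorderBuffer is a hypothetical helper that only shows how items finishing a parallel stage out of order can be released to a serial stage in their original order.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical reorder buffer placed in front of a serial in-order stage.
// Items arrive tagged with the sequence number assigned at the input stage.
class ReorderBuffer {
public:
    // Accept an item that may arrive out of order; return every item that
    // is now ready to be released in sequence order.
    std::vector<int> push(std::size_t seq, int item) {
        waiting_[seq] = item;
        std::vector<int> ready;
        auto it = waiting_.find(next_);
        while (it != waiting_.end()) {
            ready.push_back(it->second);
            waiting_.erase(it);
            ++next_;
            it = waiting_.find(next_);
        }
        return ready;
    }
private:
    std::map<std::size_t, int> waiting_;  // items parked until their turn
    std::size_t next_ = 0;                // next sequence number to release
};
```

An item with sequence number 2 that arrives first is parked; once items 0 and 1 have arrived, all three are released together in order. A real pipeline would also need locking around the buffer, which is omitted here.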
Concurrent Containers, Mutual Exclusion Memory Allocator Task Scheduler
Concurrent Containers
Concurrent interface requirements

    extern std::queue<T> q;
    if(!q.empty()) {
        // At this instant, another thread might pop last element.
        item=q.front();
        q.pop();
    }
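One way to close the gap between the emptiness check and the pop is to merge them into a single operation. The sketch below assumes a plain mutex-protected wrapper; LockedQueue and try_pop are hypothetical names chosen for illustration and are not the TBB concurrent_queue interface.

```cpp
#include <mutex>
#include <queue>

// Hypothetical wrapper (not the TBB class): the emptiness test and the pop
// happen under one lock, so no other thread can slip in between them.
template <typename T>
class LockedQueue {
public:
    void push(const T& value) {
        std::lock_guard<std::mutex> guard(mutex_);
        items_.push(value);
    }
    // Returns true and stores the front element in `out` if one was
    // available; returns false if the queue was empty.
    bool try_pop(T& out) {
        std::lock_guard<std::mutex> guard(mutex_);
        if (items_.empty()) return false;
        out = items_.front();
        items_.pop();
        return true;
    }
private:
    std::mutex mutex_;
    std::queue<T> items_;
};
```

The caller no longer asks "is it empty?" and then acts on a possibly stale answer; it asks for an element and learns from the return value whether it got one.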
concurrent_vector <T>

Example

    // Append sequence [begin,end) to x in thread-safe way.
    template<typename T>
    void Append( concurrent_vector<T>& x, const T* begin, const T* end ) {
        std::copy( begin, end, x.begin() + x.grow_by(end-begin) );
    }
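The idea behind grow_by in the Append example is that a caller first claims a block of slots atomically and only then fills it. The sketch below shows that idea with a fixed-capacity buffer and an atomic counter. ClaimBuffer is a hypothetical class, it does not grow, and it says nothing about how concurrent_vector is actually implemented.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical fixed-capacity buffer (not concurrent_vector itself).
// grow_by atomically claims n consecutive slots and returns the index of
// the first one, so each caller fills a region nobody else was given.
class ClaimBuffer {
public:
    explicit ClaimBuffer(std::size_t capacity) : data_(capacity), size_(0) {}
    std::size_t grow_by(std::size_t n) { return size_.fetch_add(n); }
    // Append [begin, end) in the style of the Append function above.
    // Assumes the total claimed never exceeds the capacity.
    void append(const int* begin, const int* end) {
        const std::size_t n = static_cast<std::size_t>(end - begin);
        const std::ptrdiff_t first = static_cast<std::ptrdiff_t>(grow_by(n));
        std::copy(begin, end, data_.begin() + first);
    }
    std::size_t size() const { return size_.load(); }
    int at(std::size_t i) const { return data_[i]; }
private:
    std::vector<int> data_;
    std::atomic<std::size_t> size_;
};
```

Two threads calling append at the same time receive disjoint index ranges from fetch_add, so their copies cannot overwrite each other.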
concurrent_queue <T>
concurrent_hash_map <Key,T,HashCompare>
Example: map strings to integers

    struct MyHashCompare {
        static long hash( const char* x ) {
            long h = 0;
            for( const char* s = x; *s; s++ )
                h = (h*157)^*s;
            return h;
        }
        static bool equal( const char* x, const char* y ) {
            return strcmp(x,y)==0;
        }
    };

    typedef concurrent_hash_map<const char*,int,MyHashCompare> StringTable;
    StringTable MyTable;

    void MyUpdateCount( const char* x ) {
        StringTable::accessor a;
        MyTable.insert( a, x );
        a->second += 1;
    }

Multiple threads can insert and update entries concurrently.
The accessor object acts as a smart pointer and a writer lock.
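The "smart pointer and writer lock" role of the accessor can be imitated with standard C++. The sketch below is a deliberately coarse simplification that guards the whole table with one mutex; CountTable, Accessor and update_count are hypothetical names, and a real concurrent hash map locks at a much finer granularity.

```cpp
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical table guarded by one mutex (a coarse simplification; it is
// not how concurrent_hash_map is implemented).
class CountTable {
public:
    // Holds the lock for as long as it lives and points at one entry,
    // mirroring the "smart pointer and writer lock" role of accessor.
    class Accessor {
    public:
        std::pair<const std::string, int>* operator->() const { return entry_; }
    private:
        friend class CountTable;
        std::unique_lock<std::mutex> lock_;
        std::pair<const std::string, int>* entry_ = nullptr;
    };
    // Find the key or insert it with a zero count, then hand back a
    // locked accessor to that entry.
    void insert(Accessor& a, const std::string& key) {
        a.lock_ = std::unique_lock<std::mutex>(mutex_);
        a.entry_ = &*map_.emplace(key, 0).first;
    }
private:
    std::mutex mutex_;
    std::unordered_map<std::string, int> map_;
};

// Same shape as MyUpdateCount: the lock is released when `a` goes out of
// scope. Returns the updated count.
int update_count(CountTable& table, const std::string& x) {
    CountTable::Accessor a;
    table.insert(a, x);
    a->second += 1;
    return a->second;
}
```

In this sketch a thread must let one accessor go out of scope before requesting another, because the single mutex would otherwise be locked twice by the same thread.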
Synchronization Primitives
Example: spin_rw_mutex promotion
Scalable Memory Allocator
Task Scheduler

    Problem               TBB Approach
    Oversubscription      One scheduler thread per hardware thread
    "Fair" scheduling     "Greedy" scheduling often wins
    Program complexity    Programmer specifies tasks, not threads.
    Load imbalance        Task chunking and work-stealing help balance load
Task Scheduler

    #include "tbb/task_scheduler_init.h"
    using namespace tbb;

    int main() {
        task_scheduler_init init;
        …
        return 0;
    }
Tasking Development tools
open source tour
Open Source – quick ‘tour’
Coding with TBB Contest
Learn more…

Os Reindersfinal
