10. Task sharing Check condition Work done? Done Task Set Acquire Task Try to get task Task Got task? No, retry Task Task Perform task Task New tasks? No, continue Add Task Task Add task
11. System Model CUDA Global Memory Gather and scatter Compare-And-Swap Fetch-And-Inc Multiprocessors Maximum number ofconcurrent thread blocks Global Memory Multi-processor Multi-processor Multi-processor Thread Block Thread Block Thread Block Thread Block Thread Block Thread Block Thread Block Thread Block Thread Block
12. Synchronization Blocking Uses mutual exclusion to only allow one process at a time to access the object. Lockfree Multiple processes can access the object concurrently. At least one operation in a set of concurrent operations finishes in a finite number of its own steps. Waitfree Multiple processes can access the object concurrently. Every operation finishes in a finite number of its own steps.
19. Non-blocking Queue TB 1 TB 1 Head TB 2 TB 2 T1 T2 T3 T4 Tail TB n Reference P. Tsigas and Y. Zhang, A simple, fast and scalable non-blocking concurrent FIFO queue for shared memory multiprocessor systems[SPAA01]
25. Task stealing T1 TB 1 T3 T2 TB 2 TB n Reference Arora N. S., Blumofe R. D., Plaxton C. G. , Thread Scheduling for Multiprogrammed Multiprocessors [SPAA 98]
55. Previous work Korch M., Raubert T., A comparison of task pools for dynamic load balancing of irregular algorithms, Concurrency and Computation: Practice & Experience, 16, 2003 Heirich A., Arvo J., A competetive analysis of load balancing strategies for parallel ray tracing, Journal of Supercomputing, 12, 1998 Foley T., Sugerman J., KD-tree acceleration structures for a GPU raytracer, Graphics Hardware 2005
56. Conclusion Synchronization plays a significant role in dynamic load-balancing Lock-free data structures/synchronization scales well and looks promising also in the GPU general purpose programming Locks perform poorly It is good that operations such as CAS and FAA have been introduced in the new GPUs Work stealing could outperform static load balancing