Hierarchical work stealing on manycore clusters
- In Fifth Conference on Partitioned Global Address Space Programming Models, 2011
Cited by 3 (1 self)
Partitioned Global Address Space languages like UPC offer a convenient way of expressing large shared data structures, especially for irregular structures that require asynchronous random access. But the static SPMD parallelism model of UPC does not support divide-and-conquer parallelism or other forms of dynamic parallelism. We introduce a dynamic tasking library for UPC that provides a simple and effective way of adding task parallelism to SPMD programs. The task library, called HotSLAW, provides a high-level API that abstracts concurrent task management details and performs dynamic load balancing. To achieve scalability, we propose a topology-aware hierarchical work stealing strategy that exploits locality in distributed-memory clusters. Our approach extends state-of-the-art techniques in shared- and distributed-memory implementations with two mechanisms: Hierarchical Victim Selection (HVS) finds the nearest victim thread to preserve locality, and Hierarchical Chunk Selection (HCS) dynamically determines the amount of work to steal based on the locality of the victim thread. We evaluate the performance of our runtime on shared- and distributed-memory systems using irregular applications. On shared memory, HotSLAW provides performance comparable to or better than hand-tuned OpenMP implementations. On distributed-memory systems, the combination of Hierarchical Victim Selection and Hierarchical Chunk Selection provides better performance than state-of-the-art approaches using random victim selection with a StealHalf strategy for the workload considered.
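The two mechanisms named in this abstract can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not HotSLAW's API: the three-level topology (socket, node, remote), the function names, and the per-level chunk sizes (one task, a quarter, a half) are all hypothetical, and a real runtime would probe victims remotely rather than inspect every queue.

```python
import random
from collections import deque

# Hypothetical topology (not from the paper): 2 threads per socket,
# 2 sockets per node.
THREADS_PER_SOCKET = 2
SOCKETS_PER_NODE = 2
THREADS_PER_NODE = THREADS_PER_SOCKET * SOCKETS_PER_NODE


def distance(a, b):
    """Locality level between two threads: 0 = same socket,
    1 = same node, 2 = remote node."""
    if a // THREADS_PER_SOCKET == b // THREADS_PER_SOCKET:
        return 0
    if a // THREADS_PER_NODE == b // THREADS_PER_NODE:
        return 1
    return 2


def select_victim(thief, queues, rng=random):
    """Victim selection sketch: search the nearest level first and move
    outward only when no thread at that level has work."""
    for level in (0, 1, 2):
        candidates = [t for t in range(len(queues))
                      if t != thief
                      and distance(thief, t) == level
                      and queues[t]]
        if candidates:
            return rng.choice(candidates), level
    return None


def chunk_size(level, available):
    """Chunk selection sketch (assumed policy): steal more from farther
    victims so that expensive remote steals happen less often."""
    if level == 0:
        return 1
    if level == 1:
        return max(1, available // 4)
    return max(1, available // 2)


def steal(thief, queues, rng=random):
    """Move a chunk of tasks to the thief; return how many were stolen."""
    found = select_victim(thief, queues, rng)
    if found is None:
        return 0
    victim, level = found
    n = chunk_size(level, len(queues[victim]))
    for _ in range(n):
        # Take the oldest tasks from the victim's queue.
        queues[thief].append(queues[victim].popleft())
    return n
```

With eight threads and work only on thread 6, a steal by thread 0 is a remote steal and takes half of the queue, while a steal by thread 4 (same node, different socket) takes a quarter.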
Selective Recovery From Failures In A Task Parallel Programming Model
We present a fault tolerant task pool execution environment that is capable of performing fine-grain selective restart using a lightweight, distributed task completion tracking mechanism. Compared with conventional checkpoint/restart techniques, this system offers a recovery penalty that is proportional to the degree of failure rather than the system size. We evaluate this system using the Self Consistent Field (SCF) kernel which forms an important component in ab initio methods for computational chemistry. Experimental results indicate that fault tolerant task pools are robust in the presence of an arbitrary number of failures and that they offer low overhead in the absence of faults.
Keywords: parallel processing, fault tolerance, task parallelism, Global Arrays, PGAS, selective recovery
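The idea of a recovery penalty proportional to the degree of failure can be sketched as follows. This is a single-process toy and not the paper's mechanism, which is distributed and built on Global Arrays; the names `run_tasks` and `recover`, the dictionaries used for tracking, and the round-robin reassignment are assumptions made for illustration.

```python
def run_tasks(task_ids, owner_of, compute, results, completed):
    """Execute tasks, storing each result under its owning process and
    recording in `completed` which process holds it."""
    for t in task_ids:
        p = owner_of(t)
        results.setdefault(p, {})[t] = compute(t)
        completed[t] = p


def recover(failed, live, results, completed, compute):
    """Selective restart sketch: re-execute only the tasks whose results
    lived on failed processes, spreading them over the survivors."""
    lost = sorted(t for t, p in completed.items() if p in failed)
    for p in failed:
        results.pop(p, None)  # data on a failed process is gone
    for i, t in enumerate(lost):
        p = live[i % len(live)]  # assumed round-robin reassignment
        results.setdefault(p, {})[t] = compute(t)
        completed[t] = p
    return lost
```

In this sketch, losing one of four processes that together hold twelve results re-executes three tasks, whereas a global rollback to the start would redo all twelve.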
Programming, 2011
Creating efficient, scalable dynamic parallel runtime systems for chip multiprocessors (CMPs) requires understanding the overheads that manifest at high core counts and small task sizes. In this article, we assess these overheads on Intel’s Threading Building Blocks (TBB) and OpenMP. First, we use real hardware and simulations to detail various scheduler and synchronization overheads. We find that these can amount to 47% of TBB benchmark runtime and 80% of OpenMP benchmark runtime. Second, we propose load balancing techniques, such as occupancy-based and criticality-guided task stealing, to boost performance. Overall, our study provides valuable insights for creating robust, scalable runtime libraries.
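One plausible reading of occupancy-based stealing is that a thief targets the queue holding the most tasks instead of a random one. The sketch below contrasts the two policies; it is an assumption about the idea, not the article's implementation, and the function names are hypothetical. Both functions assume at least two queues.

```python
import random
from collections import deque


def random_victim(thief, queues, rng=random):
    """Baseline policy: pick any other thread uniformly at random,
    even if its queue turns out to be empty."""
    others = [i for i in range(len(queues)) if i != thief]
    return rng.choice(others)


def occupancy_victim(thief, queues):
    """Occupancy-based sketch: pick the other thread whose queue holds
    the most tasks; return None when nobody has work."""
    others = [i for i in range(len(queues)) if i != thief]
    best = max(others, key=lambda i: len(queues[i]))
    return best if queues[best] else None
```

The occupancy policy avoids failed steal attempts on empty queues, at the cost of reading every queue length before each steal.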
Faster Work Stealing With Return Barriers
- Vivek Kumar, Australian National University
Work-stealing is a promising approach for effectively exploiting software parallelism on parallel hardware. A programmer who uses work-stealing explicitly identifies potential parallelism and the runtime then schedules work, keeping otherwise idle hardware busy while relieving overloaded hardware of its burden. However, work-stealing comes with substantial overheads. Our prior work demonstrates that using the exception handling mechanism of modern VMs and gathering the runtime information directly from the victim’s execution stack can significantly reduce these overheads. In this paper we identify the overhead associated with managing the work-stealing related information on a victim’s execution stack. A return barrier is a mechanism for intercepting the popping of a stack frame, and thus is a powerful tool for optimizing mechanisms that involve scanning of stack state. We present the design and preliminary findings of using return barriers on a victim’s execution stack to reduce these overheads. We evaluate our design using classical work-stealing benchmarks. On these benchmarks, compared to our prior design, we are able to reduce the overheads by as much as 58%. These preliminary findings give further hope to an already promising technique of harnessing rich features of a modern VM inside a work-stealing scheduler.
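The general return-barrier idea described in this abstract, intercepting a frame pop so that repeated stack scans can skip frames known to be unchanged, can be illustrated with a toy stack. This is a generic sketch, not the paper's VM-level design; the class `BarrierStack` and its fields are hypothetical.

```python
class BarrierStack:
    """Toy call stack with a return barrier (illustrative only)."""

    def __init__(self):
        self.frames = []
        # frames[:barrier] are unchanged since the last scan.
        self.barrier = 0
        self.barrier_hits = 0

    def push(self, frame):
        self.frames.append(frame)

    def pop(self):
        frame = self.frames.pop()
        if len(self.frames) < self.barrier:
            # The popped frame carried the barrier: intercept the return
            # and reinstall the barrier on the frame below it.
            self.barrier = len(self.frames)
            self.barrier_hits += 1
        return frame

    def scan(self):
        """Return only the frames pushed since the previous scan, then
        move the barrier to the current top of the stack."""
        fresh = self.frames[self.barrier:]
        self.barrier = len(self.frames)
        return fresh
```

Each scan costs time proportional to the number of frames above the barrier rather than the full stack depth, which is the kind of saving a scanner of stack state can gain from the mechanism.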

