Results 1 - 5 of 5
Online Scheduling of Parallel Programs on Heterogeneous Systems with Applications to Cilk
 Theory of Computing Systems Special Issue on SPAA
, 2002
Cited by 14 (2 self)
We study the problem of executing parallel programs, in particular Cilk programs, on a collection of processors of different speeds. We consider a model in which each processor maintains an estimate of its own speed, where communication between processors has a cost, and where all scheduling must be online. This problem has been considered previously in the fields of asynchronous parallel computing and scheduling theory. Our model is a bridge between the assumptions in these fields. We provide a new, more accurate analysis of an old scheduling algorithm called the maximum utilization scheduler. Based on this analysis, we generalize this scheduling policy and define the high utilization scheduler. We next focus on the Cilk platform and introduce a new algorithm for scheduling Cilk multithreaded parallel programs on heterogeneous processors. This scheduler is inspired by the high utilization scheduler and is modified to fit in a Cilk context. A crucial aspect of our algorithm is that it keeps the original spirit of the Cilk scheduler. In fact, when our new algorithm runs on homogeneous processors, it exactly mimics the dynamics of the original Cilk scheduler.
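The abstract names the maximum utilization scheduler without describing it. The following is a minimal sketch under a stated assumption: that the policy means "when i tasks are ready, run them on the i fastest processors". It also simplifies to independent tasks with no dependencies or communication cost, which the paper's model does include. The names `max_utilization_step` and `simulate` are hypothetical and do not come from the paper.

```python
def max_utilization_step(ready_tasks, speeds):
    """Advance one time step; return the work still remaining per task.

    ready_tasks: remaining work of each ready task (independent tasks only).
    speeds: work each processor completes per time step.
    ASSUMPTION: the i ready tasks run on the i fastest processors.
    """
    fastest = sorted(speeds, reverse=True)
    remaining = []
    for idx, work in enumerate(ready_tasks):
        if idx < len(fastest):
            # This task got a processor: subtract that processor's speed.
            work = max(0.0, work - fastest[idx])
        if work > 0:
            remaining.append(work)
    return remaining


def simulate(tasks, speeds):
    """Count the time steps needed to finish all tasks."""
    if not speeds or max(speeds) <= 0:
        raise ValueError("need at least one processor with positive speed")
    steps = 0
    while tasks:
        tasks = max_utilization_step(tasks, speeds)
        steps += 1
    return steps
```

For example, `simulate([4.0, 2.0], [2.0, 1.0])` pairs the first task with the speed-2 processor and the second with the speed-1 processor, so both finish after two steps.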
Scheduling Cilk Multithreaded Parallel Programs on Processors of Different Speeds
 In Proceedings of the 12th Annual Symposium on Parallel Algorithms and Architectures
, 2000
Cited by 7 (0 self)
We study the problem of executing parallel programs, in particular Cilk programs, on a collection of processors of different speeds. We consider a model in which each processor maintains an estimate of its own speed, where communication between processors has a cost, and where all scheduling must be online. This problem has been considered previously in the fields of asynchronous parallel computing and scheduling theory. Our model is a bridge between the assumptions in these fields. We provide a new, more accurate analysis of an old scheduling algorithm called the maximum utilization scheduler. Based on this analysis, we generalize this scheduling policy and define the high utilization scheduler. We next focus on the Cilk platform and introduce a new algorithm for scheduling Cilk multithreaded parallel programs on heterogeneous processors. This scheduler is inspired by the high utilization scheduler and is modified to fit in a Cilk context. A crucial aspect of our algorithm is that it kee...
Using Cilk to Write Multiprocessor Chess Programs
 The Journal of the International Computer Chess Association
, 2001
Cited by 6 (1 self)
This paper gives an overview of the Cilk language, illustrating how Cilk supports the programming of parallel game-tree search and other chess mechanisms.
A MultiThreaded Runtime System for a MultiProcessor/MultiNode Cluster
 Master's thesis, Univ. of
, 2001
Cited by 3 (0 self)
We designed and implemented an EARTH (Efficient Architecture for Running THreads) runtime system for a multiprocessor/multinode cluster. For portability, we built this runtime system on top of Pthreads under Linux. This implementation enables the overlapping of communication and computation on a cluster of Symmetric MultiProcessors (SMP), and lets the interrupts generated by the arrival of new data drive the system, rather than relying on network polling. We describe how our implementation of a multithreading model on a multiprocessor/multinode system arranges the execution and the synchronization activities to make the best use of the resources available, and how the interaction between the local processing and the network activities is organized.
Efficient Scheduling of Strict Multithreaded Computations
, 1999
Cited by 2 (1 self)
In this paper we study the problem of efficiently scheduling a wide class of multithreaded computations, called strict; that is, computations in which all dependencies from a thread go to the thread's ancestors in the computation tree. Strict multithreaded computations allow the limited use of synchronization primitives. We present the first fully distributed scheduling algorithm which applies to any strict multithreaded computation. The algorithm is asynchronous, online and follows the work-stealing paradigm. We prove that our algorithm is efficient not only in terms of its memory requirements and its execution time, but also in terms of its communication complexity. Our analysis applies to both shared and distributed memory machines. More specifically, the expected execution time of our algorithm is $O(T_1/P + hT_\infty)$, where $T_1$ is the minimum serial execution time, $T_\infty$ is the minimum execution time with an infinite number of processors, $P$ is the number of processors and $h$ is the maxi...
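The work-stealing paradigm mentioned in this abstract can be illustrated with a small round-based simulation. This is a generic sketch, not the paper's algorithm: it assumes independent unit tasks (no dependencies, so strictness plays no role), random victim selection, and a two-phase round in which busy workers execute first and idle workers then attempt one steal. The function name `run_work_stealing` and all parameters are hypothetical.

```python
import random
from collections import deque


def run_work_stealing(task_counts, seed=0):
    """Simulate round-based work stealing over independent unit tasks.

    task_counts[i] is the number of unit tasks initially on worker i.
    Returns (rounds, executed_per_worker, steal_attempts).
    """
    rng = random.Random(seed)
    deques = [deque(range(n)) for n in task_counts]
    executed = [0] * len(deques)
    steal_attempts = 0
    rounds = 0
    while any(deques):
        rounds += 1
        # Workers that start the round with no work will try to steal.
        idle = [w for w, dq in enumerate(deques) if not dq]
        # Phase 1: each busy worker executes one task from the bottom
        # (right end) of its own deque.
        for w, dq in enumerate(deques):
            if dq:
                dq.pop()
                executed[w] += 1
        # Phase 2: each idle worker picks a random victim and takes a task
        # from the top (left end) of the victim's deque, if one is there.
        for w in idle:
            if len(deques) < 2:
                break
            victim = rng.choice([v for v in range(len(deques)) if v != w])
            steal_attempts += 1
            if deques[victim]:
                deques[w].append(deques[victim].popleft())
    return rounds, executed, steal_attempts
```

Calling `run_work_stealing([8, 0, 0, 0])` starts with all work on one worker. Every round executes at least one task, so the run never takes more than 8 rounds, and successful steals can shorten it.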