Results 1 - 10 of 13
Symbolic Bounds Analysis of Pointers, Array Indices, and Accessed Memory Regions
- PLDI 2000, 2000
- Cited by 100 (14 self)
This paper presents a novel framework for the symbolic bounds analysis of pointers, array indices, and accessed memory regions. Our framework formulates each analysis problem as a system of inequality constraints between symbolic bound polynomials. It then reduces the constraint system to a linear program. The solution to the linear program provides symbolic lower and upper bounds for the values of pointer and array index variables and for the regions of memory that each statement and procedure accesses. This approach eliminates fundamental problems associated with applying standard fixed-point approaches to symbolic analysis problems. Experimental results from our implemented compiler show that the analysis can solve several important problems, including static race detection, automatic parallelization, static detection of array bounds violations, elimination of array bounds checks, and reduction of the number of bits used to store computed values.
Analyses of Load Stealing Models Based on Differential Equations
- In Proceedings of the 10th Annual ACM Symposium on Parallel Algorithms and Architectures, 1998
- Cited by 17 (0 self)
In this paper we develop models for and analyze several randomized work stealing algorithms in a dynamic setting. Our models represent the limiting behavior of systems as the number of processors grows to infinity using differential equations. The advantages of this approach include the ability to model a large variety of systems and to provide accurate numerical approximations of system behavior even when the number of processors is relatively small. We show how this approach can yield significant intuition about the behavior of work stealing algorithms in realistic settings.
Satin: Efficient Parallel Divide-and-Conquer in Java
- In: Euro-PAR 2000, no. 1900 in Lecture Notes in Computer Science, 2000
- Cited by 13 (3 self)
Satin is a system for running divide and conquer programs on distributed memory systems (and ultimately on wide-area metacomputing systems). Satin extends Java with three simple Cilk-like primitives for divide and conquer programming. The Satin compiler and runtime system cooperate to implement these primitives efficiently on a distributed system, using work stealing to distribute the jobs. Satin optimizes the overhead of local jobs using on-demand serialization, which avoids copying and serialization of parameters for jobs that are not stolen. This optimization is implemented using explicit invocation records. We have implemented Satin by extending the Manta compiler. We discuss the performance of ten applications on a Myrinet-based cluster.
Virtual Data Space - Load Balancing for Irregular Applications
- Parallel Computing, 2000
- Cited by 10 (2 self)
Load balancing is a key issue in the development of parallel algorithms with irregular structures. Existing load balancing systems each support only one specific programming paradigm and thus are of limited use. The system VDS presented here allows concurrent use of various paradigms such as fork-join, weighted tasks, and static dags (directed acyclic graphs that are known in advance). The system provides visual performance evaluation tools to facilitate the efficient application of the system. VDS supports various communication interfaces including PVM and MPI. Thus, VDS-applications can be run on architectures ranging from workstation clusters to massively parallel systems.
Compact DAG Representation and its Dynamic Scheduling
- Journal of Parallel and Distributed Computing, 1999
- Cited by 10 (2 self)
Scheduling large task graphs is an important issue in parallel computing. In this paper we tackle the following two problems: (1) How to schedule a task graph when it is too large to fit into memory? (2) How to build a generic program such that parameter values of a task graph can be given at run-time? Our answers feature the parameterized task graph (PTG), which is a symbolic representation of the task graph. We propose a dynamic scheduling algorithm which takes a PTG as input and allows a generic program to be generated. We present a theoretical study which shows that our algorithm finds good schedules for coarse grain task graphs, has a very low memory cost and a low computational complexity. When the average number of operations of each task is large enough, we prove that the scheduling overhead is negligible with respect to the makespan. We also provide experimental results that demonstrate the feasibility of our approach using several compute-intensive kernels found in numerical s...
Online Scheduling of Parallel Programs on Heterogeneous Systems with Applications to Cilk
- Theory of Computing Systems Special Issue on SPAA, 2002
- Cited by 9 (2 self)
We study the problem of executing parallel programs, in particular Cilk programs, on a collection of processors of different speeds. We consider a model in which each processor maintains an estimate of its own speed, where communication between processors has a cost, and where all scheduling must be online. This problem has been considered previously in the fields of asynchronous parallel computing and scheduling theory. Our model is a bridge between the assumptions in these fields. We provide a new more accurate analysis of an old scheduling algorithm called the maximum utilization scheduler. Based on this analysis, we generalize this scheduling policy and define the high utilization scheduler. We next focus on the Cilk platform and introduce a new algorithm for scheduling Cilk multithreaded parallel programs on heterogeneous processors. This scheduler is inspired by the high utilization scheduler and is modified to fit in a Cilk context. A crucial aspect of our algorithm is that it keeps the original spirit of the Cilk scheduler. In fact, when our new algorithm runs on homogeneous processors, it exactly mimics the dynamics of the original Cilk scheduler.
Thread Migration in a Parallel Graph Reducer
- In IFL’02, Intl. Workshop on the Implementation of Functional Languages. Springer-Verlag, LNCS 2670, 2002
- Cited by 3 (1 self)
To support high level coordination, parallel functional languages need effective and automatic work distribution mechanisms. Many implementations distribute potential work, i.e. sparks or closures, but there is good evidence that the performance of certain classes of program can be improved if current work, or threads, are also distributed.
Using Cilk to Write Multiprocessor Chess Programs
- The Journal of the International Computer Chess Association, 2001
- Cited by 3 (0 self)
This paper overviews the Cilk language, illustrating how Cilk supports the programming of parallel game-tree search and other chess mechanisms.
A Multi-Threaded Runtime System for a Multi-Processor/Multi-Node Cluster
- Master’s thesis, Univ. of, 2001
- Cited by 3 (0 self)
We designed and implemented an EARTH (Efficient Architecture for Running THreads) runtime system for a multi-processor/multi-node cluster. For portability, we built this runtime system on top of Pthreads under Linux. This implementation enables the overlapping of communication and computation on a cluster of Symmetric Multi-Processors (SMP), and lets the interruptions generated by the arrival of new data drive the system, rather than relying on network polling. We describe how our implementation of a multi-threading model on a multi-processor/multi-node system arranges the execution and the synchronization activities to make the best use of the resources available, and how the interaction between the local processing and the network activities is organized.
Efficient Scheduling of Strict Multithreaded Computations
- 1999
- Cited by 2 (1 self)
In this paper we study the problem of efficiently scheduling a wide class of multithreaded computations, called strict; that is, computations in which all dependencies from a thread go to the thread's ancestors in the computation tree. Strict multithreaded computations allow the limited use of synchronization primitives. We present the first fully distributed scheduling algorithm which applies to any strict multithreaded computation. The algorithm is asynchronous, on-line and follows the work-stealing paradigm. We prove that our algorithm is efficient not only in terms of its memory requirements and its execution time, but also in terms of its communication complexity. Our analysis applies to both shared and distributed memory machines. More specifically, the expected execution time of our algorithm is $O(T_1/P + hT_\infty)$, where $T_1$ is the minimum serial execution time, $T_\infty$ is the minimum execution time with an infinite number of processors, $P$ is the number of processors and $h$ is the maxi...

