DSC: Scheduling Parallel Tasks on an Unbounded Number of Processors
- IEEE Transactions on Parallel and Distributed Systems
Cited by 151 (9 self)
We present a low complexity heuristic named the Dominant Sequence Clustering algorithm (DSC) for scheduling parallel tasks on an unbounded number of completely connected processors. The performance of DSC is comparable or even better on average than many other higher complexity algorithms. We assume no task duplication and nonzero communication overhead between processors. Finding the optimum solution for arbitrary directed acyclic task graphs (DAGs) is NP-complete. DSC finds optimal schedules for special classes of DAGs such as fork, join, coarse grain trees and some fine grain trees. It guarantees a performance within a factor of two of the optimum for general coarse grain DAGs. We compare DSC with three higher complexity general scheduling algorithms, the MD by Wu and Gajski [19], the ETF by Hwang, Chow, Anger and Lee [12] and Sarkar's clustering algorithm [17]. We also give a sample of important practical applications where DSC has been found useful. Index Terms -- Clustering, dire...
On The Granularity And Clustering Of Directed Acyclic Task Graphs
- IEEE Transactions on Parallel and Distributed Systems
, 1990
Cited by 92 (20 self)
Clustering has been used as a compile time pre-processing step in the scheduling of task graphs on parallel architectures. A special case of the clustering problem arises in scheduling an unbounded number of completely connected processors. Using a generalization of Stone's granularity definition, the impact of the granularity on clustering strategies is analyzed. A clustering is called linear if every cluster is one simple directed path in the task graph; otherwise it is called nonlinear. For coarse grain directed acyclic task graphs (DAGs), a completely connected architecture with an unbounded number of processors and under the assumption that task duplication is not allowed, the following property is shown: For every nonlinear clustering there exists a linear clustering with less than or equal parallel time. This property, along with a performance bound for linear clustering algorithms, shows that linear clustering is the best choice for coarse grain DAGs. It provides a theoretical justificati...
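The definition of a linear clustering in the abstract above can be made concrete with a small check. The following is only an illustrative sketch, not code from the paper: the function names, the edge-list representation of the task graph, and the fork-join example are all assumptions. It reads "one simple directed path" literally, so consecutive tasks of a cluster must be joined by an edge of the DAG.

```python
from collections import deque


def topological_order(nodes, edges):
    """Kahn's algorithm: return the nodes of a DAG in topological order."""
    succ = {n: [] for n in nodes}
    indeg = {n: 0 for n in nodes}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    queue = deque(n for n in nodes if indeg[n] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("task graph contains a cycle")
    return order


def is_linear_clustering(nodes, edges, clusters):
    """True if every cluster forms one simple directed path in the DAG.

    If such a path exists, any topological order lists its tasks in path
    order, so it suffices to sort each cluster by topological rank and
    check that consecutive tasks are connected by an edge.
    """
    rank = {n: i for i, n in enumerate(topological_order(nodes, edges))}
    edge_set = set(edges)
    for cluster in clusters:
        path = sorted(cluster, key=rank.__getitem__)
        if any((a, b) not in edge_set for a, b in zip(path, path[1:])):
            return False
    return True


# Hypothetical fork-join task graph: a -> {b, c} -> d
NODES = ["a", "b", "c", "d"]
EDGES = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]

linear = is_linear_clustering(NODES, EDGES, [["a", "b", "d"], ["c"]])
nonlinear = is_linear_clustering(NODES, EDGES, [["a", "b", "c"], ["d"]])
```

In the example, the second clustering places the independent tasks b and c in the same cluster, so it is nonlinear under this reading of the definition.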
Determining Average Program Execution Times and their Variance
, 1989
Cited by 84 (0 self)
This paper presents a general framework for determining average program execution times and their variance, based on the program's interval structure and control dependence graph. Average execution times and variance values are computed using frequency information from an optimized counter-based execution profile of the program. 1 Introduction It is important for a compiler to obtain estimates of execution times for subcomputations of an input program, if it is to attempt optimizations related to overhead values in the target architecture. In earlier work [SH86a, SH86b, Sar87, Sar89], we used estimates of execution times to facilitate the automatic partitioning and scheduling of programs written in the single-assignment language, Sisal, for parallel execution on multiprocessors. In this paper, we present a general framework for estimating average execution times in a program. This approach is based on the interval structure [ASU86] and the control dependence relation [FOW87], both of w...
PYRROS: Static Task Scheduling and Code Generation for Message Passing Multiprocessors
- The 6th ACM Int'l Conf. on Supercomputing
, 1992
Cited by 81 (21 self)
We describe a parallel programming tool for scheduling static task graphs and generating the appropriate target code for message passing MIMD architectures. The computational complexity of the system is almost linear to the size of the task graph and preliminary experiments show performance comparable to the "best" hand-written programs. 1 Introduction In this paper, we consider static scheduling and code generation for message passing architectures. There are generally three distinct ways in addressing the programming difficulties for distributed memory architectures. The first approach considers the problem of automatic parallelization and scheduling from sequential programs. The emphasis has been in the development of compilers or software tools that will assist in programming parallel architectures [2, 16, 18, 19]. Since message passing architectures require coarse grain parallelism to be efficient, one difficulty is the identification of parallelism especially at the procedural ...
A Comparison of Multiprocessor Scheduling Heuristics
- In Proceedings of the 1994 International Conference on Parallel Processing, volume II
, 1994
Cited by 54 (0 self)
Many algorithms for scheduling DAGs on multiprocessors have been proposed, but there has been little work done to determine their effectiveness. Since multi-processor scheduling is an NP-hard problem, no exact tractable algorithm exists, and no baseline is available from which to compare the resulting schedules. Furthermore, performance guarantees have been found for only a few simple DAGs. This paper is an attempt to quantify the differences in five of the heuristics. Classification criteria are defined for the DAGs, and the differences between the heuristics are noted for various criteria. The comparison is made between a graph based method, two critical path methods, and two list scheduling heuristics. The empirical performance of the five heuristics is compared when they are applied to the randomly generated DAGs. This work is supported by NSF grant number CCR-9203319. 1 Introduction One of the primary problems in executing programs efficiently on multiprocessor systems with ...
List Scheduling with and without Communication Delays
- Parallel Computing
, 1993
Cited by 33 (6 self)
Empirical results have shown that the classical critical path (CP) list scheduling heuristic for task graphs is a fast and practical heuristic when communication cost is zero. In the first part of this paper we study the theoretical properties of the CP heuristic that lead to near optimum performance in practice. In the second part we extend the CP analysis to the problem of ordering the task execution when the processor assignment is given and communication cost is nonzero. We propose two new list scheduling heuristics, the RCP and RCP*, that use critical path information and ready list priority scheduling. We show that the performance properties for RCP and RCP*, when communication is nonzero, are similar to CP when communication is zero. Finally, we present an extensive experimental study and optimality analysis of the heuristics which verifies our theoretical results. 1 Introduction The processor scheduling problem is of considerable importance in parallel processing. Given a...
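For readers unfamiliar with critical path list scheduling in the zero-communication setting mentioned above, the idea can be sketched as follows. This is a simplified static-priority variant written for illustration, not the paper's RCP heuristics: tasks are ranked by the length of the longest path from the task to an exit node, then placed in that order on whichever processor becomes free first. The function name, the input format, and the example graph are assumptions, and task weights are assumed to be positive.

```python
import heapq


def cp_list_schedule(weights, edges, num_procs):
    """Static-priority list scheduling with critical-path priorities.

    weights: {task: positive execution time}; edges: (pred, succ) pairs.
    Communication cost is assumed to be zero.
    Returns ({task: (processor, start_time)}, makespan).
    """
    succ = {t: [] for t in weights}
    pred = {t: [] for t in weights}
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    # Priority of a task: longest path length from the task to an exit
    # node, including its own weight (memoized recursion; fine for small
    # graphs).
    level = {}

    def longest_to_exit(t):
        if t not in level:
            tail = max((longest_to_exit(s) for s in succ[t]), default=0)
            level[t] = weights[t] + tail
        return level[t]

    for t in weights:
        longest_to_exit(t)

    # With positive weights a predecessor always has a strictly larger
    # level than its successors, so descending level is a valid
    # topological order.
    order = sorted(weights, key=lambda t: -level[t])

    procs = [(0, p) for p in range(num_procs)]  # (time free, processor id)
    heapq.heapify(procs)
    finish = {}
    schedule = {}
    for t in order:
        free_at, p = heapq.heappop(procs)
        data_ready = max((finish[u] for u in pred[t]), default=0)
        start = max(free_at, data_ready)
        finish[t] = start + weights[t]
        schedule[t] = (p, start)
        heapq.heappush(procs, (finish[t], p))
    return schedule, max(finish.values())


# Hypothetical fork-join task graph with execution times
WEIGHTS = {"a": 2, "b": 3, "c": 1, "d": 2}
EDGES = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]

schedule, makespan = cp_list_schedule(WEIGHTS, EDGES, num_procs=2)
```

On this example the schedule length equals the critical path a, b, d (2 + 3 + 2), which no schedule can beat.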
A Fast Static Scheduling Algorithm for DAGs on an Unbounded Number of Processors
, 1991
Cited by 26 (2 self)
Scheduling parallel tasks on an unbounded number of completely connected processors when communication overhead is taken into account is NP-complete. Assuming that task duplication is not allowed, we propose a fast heuristic algorithm, called the dominant sequence clustering algorithm (DSC), for this scheduling problem. The DSC algorithm is superior to several other algorithms from the literature in terms of both computational complexity and parallel time. We present experimental results for scheduling general directed acyclic task graphs (DAGs) and compare the performance of several algorithms. Moreover, we show that DSC is optimum for special classes of DAGs such as join, fork and coarse grain tree graphs. 1 Introduction Scheduling parallel tasks with precedence relations over distributed memory multiprocessors has been found to be much more difficult than the classical scheduling problem, see Graham [14] and Lenstra and Kan [15]. This is because data transferring between processor...
Clustering Task Graphs for Message Passing Architectures
- Proceedings of ACM International Conference on Supercomputing
, 1990
Cited by 25 (6 self)
Clustering is a mapping of the nodes of a task graph onto labeled clusters. We present a unified framework for clustering of directed acyclic graphs (DAGs). Several clustering algorithms from the literature are compared using this framework. For coarse grain DAGs two interesting properties are presented. For every nonlinear clustering there exists a linear clustering whose parallel time is less than the nonlinear one. Furthermore, the parallel time of any linear clustering is within a factor of two of the optimal. Two clustering algorithms are presented with near linear time complexity for coarse grain DAGs. The conclusion is that linear clustering is an efficient and accurate operation. 1 Introduction Identification of parallelism, partitioning, clustering and scheduling are some of the major problems in parallel processing. Partitioning is used as a first step to scheduling and is defined as a mapping of the nodes of a data dependence graph (DDG) onto labeled tasks. The definition o...
A Comparison of Heuristics for Scheduling DAGs on Multiprocessors
- in Proceedings of the Eighth International Parallel Processing Symposium
, 1994
Cited by 22 (1 self)
Many algorithms to schedule DAGs on multiprocessors have been proposed, but there has been little work done to determine their effectiveness. Since multi-processor scheduling is an NP-hard problem, no exact tractable algorithm exists, and no baseline is available from which to compare the resulting schedules. This paper is an attempt to quantify the differences in a few of the heuristics. The empirical performance of five heuristics is compared when they are applied to ten specific DAGs which represent program dependence graphs of important applications. The comparison is made between a graph based method, a list scheduling technique and three critical path methods. 1. Introduction One of the primary problems in creating efficient programs for multiprocessor systems with distributed memory is to partition the program into tasks that can be assigned to different processors for parallel execution. If a high degree of parallelism is the objective, a greater amount of communication will b...
Program and Data Transformations for Efficient Execution on Distributed Memory Architectures
, 1993
Cited by 20 (6 self)
This report is concerned with the efficient execution of array computation on Distributed Memory Architectures by applying compiler-directed program and data transformations. By translating a sub-set of a single-assignment language, Sisal, into a linear algebraic framework it is possible to transform a program so as to reduce load imbalance and non-local memory access. A new test is presented which allows the construction of transformations to reduce load imbalance. By a new expression of data alignment, transformations to reduce non-local access are derived. A new pre-fetching procedure, which prevents redundant non-local accesses, is presented and forms the basis of a new data partitioning methodology. By applying these transformations in a straightforward manner to some well known scientific programs, it is shown that this approach is competitive with hand-crafted methods. Preface The author graduated from Aston University in 1987 with an upper second B.Sc.(Hons.) in Computationa...

