Results 1–10 of 106
DSC: Scheduling Parallel Tasks on an Unbounded Number of Processors
 IEEE Transactions on Parallel and Distributed Systems
Abstract

Cited by 166 (9 self)
We present a low complexity heuristic named the Dominant Sequence Clustering algorithm (DSC) for scheduling parallel tasks on an unbounded number of completely connected processors. The performance of DSC is comparable to or even better on average than many other higher complexity algorithms. We assume no task duplication and nonzero communication overhead between processors. Finding the optimum solution for arbitrary directed acyclic task graphs (DAGs) is NP-complete. DSC finds optimal schedules for special classes of DAGs such as fork, join, coarse grain trees, and some fine grain trees. It guarantees a performance within a factor of two of the optimum for general coarse grain DAGs. We compare DSC with three higher complexity general scheduling algorithms: the MD by Wu and Gajski [19], the ETF by Hwang, Chow, Anger and Lee [12], and Sarkar's clustering algorithm [17]. We also give a sample of important practical applications where DSC has been found useful. Index Terms: Clustering, dire...
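The "parallel time" that DSC tries to minimize can be made concrete with a small sketch. The following Python is a hedged illustration, not the paper's algorithm: the task names, weights, and example clustering are invented, and intra-cluster serialization is ignored (a simplification that is exact only for linear clusterings, where each cluster's tasks already lie on one path).

```python
from collections import defaultdict, deque

def parallel_time(tasks, edges, cluster):
    """Longest-path estimate of a clustered DAG's parallel time.
    tasks: {task: compute cost}; edges: {(u, v): communication cost};
    cluster: {task: cluster id}. An edge costs nothing when both of
    its endpoints are mapped to the same cluster (processor)."""
    succ, indeg = defaultdict(list), {t: 0 for t in tasks}
    for (u, v) in edges:
        succ[u].append(v)
        indeg[v] += 1
    # Kahn's algorithm for a topological order of the DAG
    order, q = [], deque(t for t in tasks if indeg[t] == 0)
    while q:
        u = q.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                q.append(v)
    finish = {}
    for v in order:
        start = 0
        for (u, x), w in edges.items():
            if x == v:
                comm = 0 if cluster[u] == cluster[v] else w
                start = max(start, finish[u] + comm)
        finish[v] = start + tasks[v]
    return max(finish.values())
```

For example, on a fork a → b, a → c with unit compute costs and communication cost 5, merging a and b zeroes the a → b message but the parallel time is still dominated by the a → c edge; zeroing an edge only helps when it shortens the dominant sequence, which is the observation DSC is built around.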
On The Granularity And Clustering Of Directed Acyclic Task Graphs
 IEEE Transactions on Parallel and Distributed Systems
, 1990
Abstract

Cited by 99 (20 self)
Clustering has been used as a compile time preprocessing step in the scheduling of task graphs on parallel architectures. A special case of the clustering problem arises in scheduling an unbounded number of completely connected processors. Using a generalization of Stone's granularity definition, the impact of granularity on clustering strategies is analyzed. A clustering is called linear if every cluster is one simple directed path in the task graph; otherwise it is called nonlinear. For coarse grain directed acyclic task graphs (DAGs), a completely connected architecture with an unbounded number of processors, and under the assumption that task duplication is not allowed, the following property is shown: for every nonlinear clustering there exists a linear clustering with less than or equal parallel time. This property, along with a performance bound for linear clustering algorithms, shows that linear clustering is the best choice for coarse grain DAGs. It provides a theoretical justificati...
Determining Average Program Execution Times and their Variance
, 1989
Abstract

Cited by 88 (0 self)
This paper presents a general framework for determining average program execution times and their variance, based on the program's interval structure and control dependence graph. Average execution times and variance values are computed using frequency information from an optimized counter-based execution profile of the program.

1 Introduction

It is important for a compiler to obtain estimates of execution times for subcomputations of an input program, if it is to attempt optimizations related to overhead values in the target architecture. In earlier work [SH86a, SH86b, Sar87, Sar89], we used estimates of execution times to facilitate the automatic partitioning and scheduling of programs written in the single-assignment language Sisal, for parallel execution on multiprocessors. In this paper, we present a general framework for estimating average execution times in a program. This approach is based on the interval structure [ASU86] and the control dependence relation [FOW87], both of w...
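The core computation for a single branching construct can be sketched in a few lines. This is a hedged illustration in the spirit of the paper's framework, not its implementation; the frequencies and costs below are made up.

```python
def branch_stats(freqs, times):
    """Mean and variance of one execution of a conditional, derived from
    counter-based profile data. freqs: execution counts of each alternative;
    times: the (average) cost of each alternative."""
    total = sum(freqs)
    probs = [f / total for f in freqs]          # empirical branch probabilities
    mean = sum(p * t for p, t in zip(probs, times))
    var = sum(p * (t - mean) ** 2 for p, t in zip(probs, times))
    return mean, var
```

For instance, a two-way branch taken 3:1 with alternative costs 2 and 6 has mean time 3 and variance 3; composing such per-construct moments over the interval structure yields whole-program estimates.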
PYRROS: Static Task Scheduling and Code Generation for Message Passing Multiprocessors
 The 6th ACM Int'l Conf. on Supercomputing
, 1992
Abstract

Cited by 85 (21 self)
We describe a parallel programming tool for scheduling static task graphs and generating the appropriate target code for message passing MIMD architectures. The computational complexity of the system is almost linear in the size of the task graph, and preliminary experiments show performance comparable to the "best" handwritten programs.

1 Introduction

In this paper, we consider static scheduling and code generation for message passing architectures. There are generally three distinct ways of addressing the programming difficulties of distributed memory architectures. The first approach considers the problem of automatic parallelization and scheduling from sequential programs. The emphasis has been on the development of compilers or software tools that will assist in programming parallel architectures [2, 16, 18, 19]. Since message passing architectures require coarse grain parallelism to be efficient, one difficulty is the identification of parallelism, especially at the procedural ...
A Comparison of Multiprocessor Scheduling Heuristics
 In Proceedings of the 1994 International Conference on Parallel Processing, volume II
, 1994
Abstract

Cited by 58 (0 self)
Many algorithms for scheduling DAGs on multiprocessors have been proposed, but there has been little work done to determine their effectiveness. Since multiprocessor scheduling is an NP-hard problem, no exact tractable algorithm exists, and no baseline is available from which to compare the resulting schedules. Furthermore, performance guarantees have been found for only a few simple DAGs. This paper is an attempt to quantify the differences in five of the heuristics. Classification criteria are defined for the DAGs, and the differences between the heuristics are noted for various criteria. The comparison is made between a graph based method, two critical path methods, and two list scheduling heuristics. The empirical performance of the five heuristics is compared when they are applied to the randomly generated DAGs. This work is supported by NSF grant number CCR-9203319.

1 Introduction

One of the primary problems in executing programs efficiently on multiprocessor systems with ...
List Scheduling with and without Communication Delays
 Parallel Computing
, 1993
Abstract

Cited by 35 (6 self)
Empirical results have shown that the classical critical path (CP) list scheduling heuristic for task graphs is a fast and practical heuristic when communication cost is zero. In the first part of this paper we study the theoretical properties of the CP heuristic that lead to near optimum performance in practice. In the second part we extend the CP analysis to the problem of ordering the task execution when the processor assignment is given and communication cost is nonzero. We propose two new list scheduling heuristics, RCP and RCP 3, that use critical path information and ready list priority scheduling. We show that the performance properties of RCP and RCP 3, when communication is nonzero, are similar to those of CP when communication is zero. Finally, we present an extensive experimental study and optimality analysis of the heuristics which verifies our theoretical results.

1 Introduction

The processor scheduling problem is of considerable importance in parallel processing. Given a...
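The classical CP heuristic that this abstract refers to admits a compact sketch: rank tasks by bottom level (longest compute path to an exit) and greedily assign ready tasks to the earliest-free processor. This Python is a hedged, zero-communication illustration with invented task names, not code from the paper.

```python
import heapq
from collections import defaultdict

def cp_list_schedule(tasks, edges, m):
    """Critical-path list scheduling, zero communication cost.
    tasks: {task: compute cost}; edges: iterable of (u, v) precedences;
    m: number of processors. Returns the schedule makespan."""
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
    # bottom level = task cost + longest successor path (memoized)
    blevel = {}
    def bl(t):
        if t not in blevel:
            blevel[t] = tasks[t] + max((bl(s) for s in succ[t]), default=0)
        return blevel[t]
    for t in tasks:
        bl(t)
    indeg = {t: len(pred[t]) for t in tasks}
    ready = [(-blevel[t], t) for t in tasks if indeg[t] == 0]
    heapq.heapify(ready)
    proc_free = [0] * m                 # time each processor becomes free
    finish = {}
    while ready:
        _, t = heapq.heappop(ready)     # highest bottom level first
        p = min(range(m), key=lambda i: proc_free[i])
        start = max(proc_free[p], max((finish[u] for u in pred[t]), default=0))
        finish[t] = start + tasks[t]
        proc_free[p] = finish[t]
        for s in succ[t]:               # newly ready successors
            indeg[s] -= 1
            if indeg[s] == 0:
                heapq.heappush(ready, (-blevel[s], s))
    return max(finish.values())
```

On a unit-cost diamond a → {b, c} → d with two processors this yields the optimal makespan of 3; the RCP variants in the paper additionally account for a fixed processor assignment and nonzero communication.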
Communication contention in task scheduling
 IEEE Transactions on Parallel and Distributed Systems
, 2005
Abstract

Cited by 28 (3 self)
Abstract—Task scheduling is an essential aspect of parallel programming. Most heuristics for this NP-hard problem are based on a simple system model that assumes fully connected processors and concurrent interprocessor communication. Hence, contention for communication resources is not considered in task scheduling, yet it has a strong influence on the execution time of a parallel program. This paper investigates the incorporation of contention awareness into task scheduling. A new system model for task scheduling is proposed, allowing us to capture both endpoint and network contention. To achieve this, the communication network is reflected by a topology graph for the representation of arbitrary static and dynamic networks. The contention awareness is accomplished by scheduling the communications, represented by the edges in the task graph, onto the links of the topology graph. Edge scheduling is theoretically analyzed, including aspects like heterogeneity, routing, and causality. The proposed contention-aware scheduling preserves the theoretical basis of task scheduling. It is shown how classic list scheduling is easily extended to this more accurate system model. Experimental results show the significantly improved accuracy and efficiency of the produced schedules. Index Terms—Parallel processing, concurrent programming, scheduling and task partitioning, communication contention, heterogeneous system model.
A Comparison of Heuristics for Scheduling DAGs on Multiprocessors
 in Proceedings of the Eighth International Parallel Processing Symposium
, 1994
Abstract

Cited by 27 (1 self)
Many algorithms to schedule DAGs on multiprocessors have been proposed, but there has been little work done to determine their effectiveness. Since multiprocessor scheduling is an NP-hard problem, no exact tractable algorithm exists, and no baseline is available from which to compare the resulting schedules. This paper is an attempt to quantify the differences in a few of the heuristics. The empirical performance of five heuristics is compared when they are applied to ten specific DAGs which represent program dependence graphs of important applications. The comparison is made between a graph based method, a list scheduling technique, and three critical path methods.

1. Introduction

One of the primary problems in creating efficient programs for multiprocessor systems with distributed memory is to partition the program into tasks that can be assigned to different processors for parallel execution. If a high degree of parallelism is the objective, a greater amount of communication will b...
A Fast Static Scheduling Algorithm for DAGs on an Unbounded Number of Processors
, 1991
Abstract

Cited by 26 (3 self)
Scheduling parallel tasks on an unbounded number of completely connected processors when communication overhead is taken into account is NP-complete. Assuming that task duplication is not allowed, we propose a fast heuristic algorithm, called the dominant sequence clustering algorithm (DSC), for this scheduling problem. The DSC algorithm is superior to several other algorithms from the literature in terms of both computational complexity and parallel time. We present experimental results for scheduling general directed acyclic task graphs (DAGs) and compare the performance of several algorithms. Moreover, we show that DSC is optimum for special classes of DAGs such as join, fork, and coarse grain tree graphs.

1 Introduction

Scheduling parallel tasks with precedence relations over distributed memory multiprocessors has been found to be much more difficult than the classical scheduling problem; see Graham [14] and Lenstra and Kan [15]. This is because data transferring between processor...
Clustering Task Graphs for Message Passing Architectures
 Proceedings of ACM International Conference on Supercomputing
, 1990
Abstract

Cited by 26 (6 self)
Clustering is a mapping of the nodes of a task graph onto labeled clusters. We present a unified framework for clustering of directed acyclic graphs (DAGs). Several clustering algorithms from the literature are compared using this framework. For coarse grain DAGs two interesting properties are presented. For every nonlinear clustering there exists a linear clustering whose parallel time is less than or equal to that of the nonlinear one. Furthermore, the parallel time of any linear clustering is within a factor of two of the optimal. Two clustering algorithms are presented with near linear time complexity for coarse grain DAGs. The conclusion is that linear clustering is an efficient and accurate operation.

1 Introduction

Identification of parallelism, partitioning, clustering and scheduling are some of the major problems in parallel processing. Partitioning is used as a first step to scheduling and is defined as a mapping of the nodes of a data dependence graph (DDG) onto labeled tasks. The definition o...
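The linearity condition at the heart of these results can be checked mechanically. The sketch below takes one literal reading of the definition, where consecutive tasks of a cluster must be joined by direct task-graph edges; the identifiers are invented and the DAG is assumed to be given as an edge list.

```python
from collections import defaultdict

def is_linear_clustering(edges, cluster):
    """True iff every cluster's tasks form one simple directed path
    built from edges of the task graph (direct-edge reading).
    edges: iterable of (u, v) pairs; cluster: {task: cluster id}."""
    members = defaultdict(set)
    for t, c in cluster.items():
        members[c].add(t)
    for nodes in members.values():
        # edges internal to this cluster
        internal = [(u, v) for (u, v) in edges if u in nodes and v in nodes]
        out, inn = defaultdict(int), defaultdict(int)
        for u, v in internal:
            out[u] += 1
            inn[v] += 1
        # k nodes of an acyclic graph with k-1 internal edges and all
        # in/out degrees at most 1 necessarily form one simple path
        if len(internal) != len(nodes) - 1:
            return False
        if any(d > 1 for d in out.values()) or any(d > 1 for d in inn.values()):
            return False
    return True
```

For example, merging the two branches of a fork into one cluster is rejected as nonlinear, which is exactly the kind of clustering the theorem above says can be replaced by a linear one without increasing parallel time.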