Results 1  10
of
101
Static Scheduling Algorithms for Allocating Directed Task Graphs to Multiprocessors
, 1999
"... Devices]: Modes of ComputationParallelism and concurrency General Terms: Algorithms, Design, Performance, Theory Additional Key Words and Phrases: Automatic parallelization, DAG, multiprocessors, parallel processing, software tools, static scheduling, task graphs This research was supported ..."
Abstract

Cited by 204 (4 self)
 Add to MetaCart
Devices]: Modes of ComputationParallelism and concurrency General Terms: Algorithms, Design, Performance, Theory Additional Key Words and Phrases: Automatic parallelization, DAG, multiprocessors, parallel processing, software tools, static scheduling, task graphs This research was supported by the Hong Kong Research Grants Council under contract numbers HKUST 734/96E, HKUST 6076/97E, and HKU 7124/99E. Authors' addresses: Y.K. Kwok, Department of Electrical and Electronic Engineering, The University of Hong Kong, Pokfulam Road, Hong Kong; email: ykwok@eee.hku.hk; I. Ahmad, Department of Computer Science, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong. Permission to make digital / hard copy of part or all of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and / or a fee. 2000 ACM 03600300/99/12000406 $5.00 ACM Computing Surveys, Vol. 31, No. 4, December 1999 1.
DSC: Scheduling Parallel Tasks on an Unbounded Number of Processors
 IEEE Transactions on Parallel and Distributed Systems
"... We present a low complexity heuristic named the Dominant Sequence Clustering algorithm (DSC) for scheduling parallel tasks on an unbounded number of completely connected processors. The performance of DSC is comparable or even better on average than many other higher complexity algorithms. We assume ..."
Abstract

Cited by 165 (9 self)
 Add to MetaCart
We present a low complexity heuristic named the Dominant Sequence Clustering algorithm (DSC) for scheduling parallel tasks on an unbounded number of completely connected processors. The performance of DSC is comparable or even better on average than many other higher complexity algorithms. We assume no task duplication and nonzero communication overhead between processors. Finding the optimum solution for arbitrary directed acyclic task graphs (DAGs) is NPcomplete. DSC finds optimal schedules for special classes of DAGs such as fork, join, coarse grain trees and some fine grain trees. It guarantees a performance within a factor of two of the optimum for general coarse grain DAGs. We compare DSC with three higher complexity general scheduling algorithms, the MD by Wu and Gajski [19], the ETF by Hwang, Chow, Anger and Lee [12] and Sarkar's clustering algorithm [17]. We also give a sample of important practical applications where DSC has been found useful. Index Terms  Clustering, dire...
PerformanceEffective and LowComplexity Task Scheduling for Heterogeneous Computing
 IEEE Transactions on Parallel and Distributed Systems
, 2002
"... AbstractÐEfficient application scheduling is critical for achieving high performance in heterogeneous computing environments. The application scheduling problem has been shown to be NPcomplete in general cases as well as in several restricted cases. Because of its key importance, this problem has b ..."
Abstract

Cited by 133 (0 self)
 Add to MetaCart
AbstractÐEfficient application scheduling is critical for achieving high performance in heterogeneous computing environments. The application scheduling problem has been shown to be NPcomplete in general cases as well as in several restricted cases. Because of its key importance, this problem has been extensively studied and various algorithms have been proposed in the literature which are mainly for systems with homogeneous processors. Although there are a few algorithms in the literature for heterogeneous processors, they usually require significantly high scheduling costs and they may not deliver good quality schedules with lower costs. In this paper, we present two novel scheduling algorithms for a bounded number of heterogeneous processors with an objective to simultaneously meet high performance and fast scheduling time, which are called the Heterogeneous EarliestFinishTime (HEFT) algorithm and the CriticalPathonaProcessor (CPOP) algorithm. The HEFT algorithm selects the task with the highest upward rank value at each step and assigns the selected task to the processor, which minimizes its earliest finish time with an insertionbased approach. On the other hand, the CPOP algorithm uses the summation of upward and downward rank values for prioritizing tasks. Another difference is in the processor selection phase, which schedules the critical tasks onto the processor that minimizes the total execution time of the critical tasks. In order to provide a robust and unbiased comparison with the related work, a parametric graph generator was designed to generate weighted directed acyclic graphs with various characteristics. The comparison study, based on both randomly generated graphs and the graphs of some real applications, shows that our scheduling algorithms significantly surpass previous approaches in terms of both quality and cost of schedules, which are mainly presented with schedule length ratio, speedup, frequency of best results, and average scheduling time metrics. Index TermsÐDAG scheduling, task graphs, heterogeneous systems, list scheduling, mapping. 1
Dynamic CriticalPath Scheduling: An Effective Technique for Allocating Task Graphs to Multiprocessors
 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
, 1996
"... In this paper, we propose a static scheduling algorithm for allocating task graphs to fullyconnected multiprocessors. We discuss six recently reported scheduling algorithms and show that they possess one drawback or the other which can lead to poor performance. The proposed algorithm, which is calle ..."
Abstract

Cited by 121 (17 self)
 Add to MetaCart
In this paper, we propose a static scheduling algorithm for allocating task graphs to fullyconnected multiprocessors. We discuss six recently reported scheduling algorithms and show that they possess one drawback or the other which can lead to poor performance. The proposed algorithm, which is called the Dynamic CriticalPath (DCP) scheduling algorithm, is different from the previously proposed algorithms in a number of ways. First, it determines the critical path of the task graph and selects the next node to be scheduled in a dynamic fashion. Second, it rearranges the schedule on each processor dynamically in the sense that the positions of the nodes in the partial schedules are not fixed until all nodes have been considered. Third, it selects a suitable processor for a node by looking ahead the potential start times of the remaining nodes on that processor, and schedules relatively less important nodes to the processors already in use. A global as well as a pairwise comparison is c...
On The Granularity And Clustering Of Directed Acyclic Task Graphs
 IEEE Transactions on Parallel and Distributed Systems
, 1990
"... Clustering has been used as a compile time preprocessing step in the scheduling of task graphs on parallel architectures. A special case of the clustering problem arises in scheduling an unbounded number of completely connected processors. Using a generalization of Stone's granularity definition, t ..."
Abstract

Cited by 98 (20 self)
 Add to MetaCart
Clustering has been used as a compile time preprocessing step in the scheduling of task graphs on parallel architectures. A special case of the clustering problem arises in scheduling an unbounded number of completely connected processors. Using a generalization of Stone's granularity definition, the impact of the granularity on clustering strategies is analyzed. A clustering is called linear if every cluster is one simple directed path in the task graph; otherwise is called nonlinear. For coarse grain directed acyclic task graphs (DAGs), a completely connected architecture with unbounded number of processors and under the assumption that task duplication is not allowed, the following property is shown: For every nonlinear clustering there exists a linear clustering with less or equal parallel time. This property, along with a performance bound for linear clustering algorithms, shows that linear clustering is the best choice for coarse grain DAGs. It provides a theoretical justificati...
MOGAC: A Multiobjective Genetic Algorithm for HardwareSoftware CoSynthesis of Distributed Embedded Systems
, 1998
"... In this paper, we present a hardwaresoftware cosynthesis system, called MOGAC, that partitions and schedules embedded system specifications consisting of multiple periodic task graphs. MOGAC synthesizes realtime heterogeneous distributed architectures using an adaptive multiobjective genetic algor ..."
Abstract

Cited by 94 (6 self)
 Add to MetaCart
In this paper, we present a hardwaresoftware cosynthesis system, called MOGAC, that partitions and schedules embedded system specifications consisting of multiple periodic task graphs. MOGAC synthesizes realtime heterogeneous distributed architectures using an adaptive multiobjective genetic algorithm that can escape local minima. Price and power consumption are optimized while hard realtime constraints are met. MOGAC places no limit on the number of hardware or software processing elements in the architectures it synthesizes. Our general model for bus and pointtopoint communication links allows a number of link types to be used in an architecture. Applicationspecific integrated circuits consisting of multiple processing elements are modeled. Heuristics are used to tackle multirate systems, as well as systems containing task graphs whose hyperperiods are large relative to their periods. The application of a multiobjective optimization strategy allows a single cosynthesis run to ...
COSYN: HardwareSoftware Cosynthesis of Embedded Systems
, 1997
"... Hardwaresoftware cosynthesis is the process of partitioning an embedded system specification into hardware and software modules to meet performance, power, cost, and reliability goals. In this paper, we present a hardwaresoftware cosynthesis technique for realtime distributed embedded systems. ..."
Abstract

Cited by 92 (9 self)
 Add to MetaCart
Hardwaresoftware cosynthesis is the process of partitioning an embedded system specification into hardware and software modules to meet performance, power, cost, and reliability goals. In this paper, we present a hardwaresoftware cosynthesis technique for realtime distributed embedded systems. Our cosynthesis algorithm has the following features: 1) it allows the use of multiple types of processing elements (PEs) and interPE communication links, where the links can take various forms (pointtopoint, bus, local area network, etc.), 2) it supports both concurrent and sequential modes of communication and computation, 3) it allows both preemptive and nonpreemptive scheduling, 4) it employs the concept of an association array to tackle the problem of multirate systems (which are commonly found in multimedia applications), 5) it uses a scheduler based on dynamic deadlinebased priority levels for an accurate performance estimation of a cosynthesis solution, 6) it uses a new dynamic...
PYRROS: Static Task Scheduling and Code Generation for Message Passing Multiprocessors
 The 6th ACM Int'l Conf. on Supercomputing
, 1992
"... We describe a parallel programming tool for scheduling static task graphs and generating the appropriate target code for message passing MIMD architectures. The computational complexity of the system is almost linear to the size of the task graph and preliminary experiments show performance comparab ..."
Abstract

Cited by 83 (21 self)
 Add to MetaCart
We describe a parallel programming tool for scheduling static task graphs and generating the appropriate target code for message passing MIMD architectures. The computational complexity of the system is almost linear to the size of the task graph and preliminary experiments show performance comparable to the "best" handwritten programs. 1 Introduction In this paper, we consider static scheduling and code generation for message passing architectures. There are generally three distinct ways in addressing the programming difficulties for distributed memory architectures. The first approach considers the problem of automatic parallelization and scheduling from sequential programs. The emphasis has been in the development of compilers or software tools that will assist in programming parallel architectures [2, 16, 18, 19]. Since message passing architectures require coarse grain parallelism to be efficient, one difficulty is the identification of parallelism especially at the procedural ...
Benchmarking and Comparison of the Task Graph Scheduling Algorithms
, 1999
"... The problem of scheduling a parallel program represented by a weighted directed acyclic graph (DAG) to a set of homogeneous processors for minimizing the completion time of the program has been extensively studied. The NPcompleteness of the problem has stimulated researchers to propose a myriad of ..."
Abstract

Cited by 80 (2 self)
 Add to MetaCart
The problem of scheduling a parallel program represented by a weighted directed acyclic graph (DAG) to a set of homogeneous processors for minimizing the completion time of the program has been extensively studied. The NPcompleteness of the problem has stimulated researchers to propose a myriad of heuristic algorithms. While most of these algorithms are reported to be efficient, it is not clear how they compare against each other. A meaningful performance evaluation and comparison of these algorithms is a complex task and it must take into account a number of issues. First, most scheduling algorithms are based upon diverse assumptions, making the performance comparison rather purposeless. Second, there does not exist a standard set of benchmarks to examine these algorithms. Third, most algorithms are evaluated using small problem sizes, and, therefore, their scalability is unknown. In this paper, we first provide a taxonomy for classifying various algorithms into distinct categories a...
Models of Machines and Computation for Mapping in Multicomputers
, 1993
"... It is now more than a quarter of a century since researchers started publishing papers on mapping strategies for distributing computation across the computation resource of multiprocessor systems. There exists a large body of literature on the subject, but there is no commonlyaccepted framework ..."
Abstract

Cited by 79 (1 self)
 Add to MetaCart
It is now more than a quarter of a century since researchers started publishing papers on mapping strategies for distributing computation across the computation resource of multiprocessor systems. There exists a large body of literature on the subject, but there is no commonlyaccepted framework whereby results in the field can be compared. Nor is it always easy to assess the relevance of a new result to a particular problem. Furthermore, changes in parallel computing technology have made some of the earlier work of less relevance to current multiprocessor systems. Versions of the mapping problem are classified, and research in the field is considered in terms of its relevance to the problem of programming currently available hardware in the form of a distributed memory multiple instruction stream multiple data stream computer: a multicomputer.