Results 1  10
of
128
Synchronous data flow
, 1987
"... Data flow is a natural paradigm for describing DSP applications for concurrent implementation on parallel hardware. Data flow programs for signal processing are directed graphs where each node represents a function and each arc represents a signal path. Synchronous data flow (SDF) is a special case ..."
Abstract

Cited by 483 (44 self)
 Add to MetaCart
Data flow is a natural paradigm for describing DSP applications for concurrent implementation on parallel hardware. Data flow programs for signal processing are directed graphs where each node represents a function and each arc represents a signal path. Synchronous data flow (SDF) is a special case of data flow (either atomic or large grain) in which the number of data samples produced or consumed by each node on each invocation is specified a priori. Nodes can be scheduled statically (at compile time) onto single or parallel programmable processors so the runtime overhead usually associated with data flow evaporates. Multiple sample rates within the same system are easily and naturally handled. Conditions for correctness of SDF graph are explained and scheduling algorithms are described for homogeneous parallel processors sharing memory. A preliminary SDF software system for automatically generating assembly language code for DSP microcomputers is described. Two new efficiency techniques are introduced, static buffering and an extension to SDF to efficiently implement conditionals.
Iterative modulo scheduling: An algorithm for software pipelining loops
 In Proceedings of the 27th Annual International Symposium on Microarchitecture
, 1994
"... Modulo scheduling is a framework within which a wide variety of algorithms and heuristics may be defined for software pipelining innermost loops. This paper presents a practical algorithm, iterative modulo scheduling, that is capable of dealing with realistic machine models. This paper also characte ..."
Abstract

Cited by 281 (3 self)
 Add to MetaCart
Modulo scheduling is a framework within which a wide variety of algorithms and heuristics may be defined for software pipelining innermost loops. This paper presents a practical algorithm, iterative modulo scheduling, that is capable of dealing with realistic machine models. This paper also characterizes the algorithm in terms of the quality of the generated schedules as well the computational expense incurred.
Static Scheduling Algorithms for Allocating Directed Task Graphs to Multiprocessors
, 1999
"... Devices]: Modes of ComputationParallelism and concurrency General Terms: Algorithms, Design, Performance, Theory Additional Key Words and Phrases: Automatic parallelization, DAG, multiprocessors, parallel processing, software tools, static scheduling, task graphs This research was supported ..."
Abstract

Cited by 206 (4 self)
 Add to MetaCart
Devices]: Modes of ComputationParallelism and concurrency General Terms: Algorithms, Design, Performance, Theory Additional Key Words and Phrases: Automatic parallelization, DAG, multiprocessors, parallel processing, software tools, static scheduling, task graphs This research was supported by the Hong Kong Research Grants Council under contract numbers HKUST 734/96E, HKUST 6076/97E, and HKU 7124/99E. Authors' addresses: Y.K. Kwok, Department of Electrical and Electronic Engineering, The University of Hong Kong, Pokfulam Road, Hong Kong; email: ykwok@eee.hku.hk; I. Ahmad, Department of Computer Science, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong. Permission to make digital / hard copy of part or all of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and / or a fee. 2000 ACM 03600300/99/12000406 $5.00 ACM Computing Surveys, Vol. 31, No. 4, December 1999 1.
The NPcompleteness column: an ongoing guide
 Journal of Algorithms
, 1985
"... This is the nineteenth edition of a (usually) quarterly column that covers new developments in the theory of NPcompleteness. The presentation is modeled on that used by M. R. Garey and myself in our book ‘‘Computers and Intractability: A Guide to the Theory of NPCompleteness,’ ’ W. H. Freeman & Co ..."
Abstract

Cited by 188 (0 self)
 Add to MetaCart
This is the nineteenth edition of a (usually) quarterly column that covers new developments in the theory of NPcompleteness. The presentation is modeled on that used by M. R. Garey and myself in our book ‘‘Computers and Intractability: A Guide to the Theory of NPCompleteness,’ ’ W. H. Freeman & Co., New York, 1979 (hereinafter referred to as ‘‘[G&J]’’; previous columns will be referred to by their dates). A background equivalent to that provided by [G&J] is assumed, and, when appropriate, crossreferences will be given to that book and the list of problems (NPcomplete and harder) presented there. Readers who have results they would like mentioned (NPhardness, PSPACEhardness, polynomialtimesolvability, etc.) or open problems they would like publicized, should
InstructionLevel Parallel Processing: History, Overview and Perspective
, 1992
"... Instructionlevel Parallelism CILP) is a family of processor and compiler design techniques that speed up execution by causing individual machine operations to execute in parallel. Although ILP has appeared in the highest performance uniprocessors for the past 30 years, the 1980s saw it become a muc ..."
Abstract

Cited by 171 (0 self)
 Add to MetaCart
Instructionlevel Parallelism CILP) is a family of processor and compiler design techniques that speed up execution by causing individual machine operations to execute in parallel. Although ILP has appeared in the highest performance uniprocessors for the past 30 years, the 1980s saw it become a much more significant force in computer design. Several systems were built, and sold commercially, which pushed ILP far beyond where it had been before, both in terms of the amount of ILP offered and in the central role ILP played in the design of the system. By the end of the decade, advanced microprocessor design at all major CPU manufacturers had incorporated ILP, and new techniques for ILP have become a popular topic at academic conferences. This article provides an overview and historical perspective of the field of ILP and its development over the past three decades.
Hypertool: A Programming Aid for MessagePassing Systems
 IEEE TRANS. ON PARALLEL AND DISTRIBUTED SYSTEMS
, 1990
"... As both the number of processors and the complexity of problems to be solved increase, programming multiprocessing systems becomes more difficult and errorprone. This paper discusses programming assistance and automation concepts and their application to a program development tool for messagepass ..."
Abstract

Cited by 166 (17 self)
 Add to MetaCart
As both the number of processors and the complexity of problems to be solved increase, programming multiprocessing systems becomes more difficult and errorprone. This paper discusses programming assistance and automation concepts and their application to a program development tool for messagepassing systems called Hypertool. It performs scheduling and handles the communication primitive insertion automatically. Two algorithms, based on the criticalpath method, are presented for scheduling processes statically. Hypertool also generates the performance estimates and other program quality measures to help programmers in improving their algorithms and programs.
Dynamic CriticalPath Scheduling: An Effective Technique for Allocating Task Graphs to Multiprocessors
 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
, 1996
"... In this paper, we propose a static scheduling algorithm for allocating task graphs to fullyconnected multiprocessors. We discuss six recently reported scheduling algorithms and show that they possess one drawback or the other which can lead to poor performance. The proposed algorithm, which is calle ..."
Abstract

Cited by 121 (17 self)
 Add to MetaCart
In this paper, we propose a static scheduling algorithm for allocating task graphs to fullyconnected multiprocessors. We discuss six recently reported scheduling algorithms and show that they possess one drawback or the other which can lead to poor performance. The proposed algorithm, which is called the Dynamic CriticalPath (DCP) scheduling algorithm, is different from the previously proposed algorithms in a number of ways. First, it determines the critical path of the task graph and selects the next node to be scheduled in a dynamic fashion. Second, it rearranges the schedule on each processor dynamically in the sense that the positions of the nodes in the partial schedules are not fixed until all nodes have been considered. Third, it selects a suitable processor for a node by looking ahead the potential start times of the remaining nodes on that processor, and schedules relatively less important nodes to the processors already in use. A global as well as a pairwise comparison is c...
Iterative Modulo Scheduling
, 1995
"... Modulo scheduling is a framework within which algorithms for the software pipelining of innermost loops may be defined. The framework specifies a set of constraints that must be met in order to achieve a legal modulo schedule. A wide variety of algorithms and heuristics can be defined within this fr ..."
Abstract

Cited by 84 (6 self)
 Add to MetaCart
Modulo scheduling is a framework within which algorithms for the software pipelining of innermost loops may be defined. The framework specifies a set of constraints that must be met in order to achieve a legal modulo schedule. A wide variety of algorithms and heuristics can be defined within this framework. Little work has been done to evaluate and compare alternative algorithms and heuristics for modulo scheduling from the viewpoints of schedule quality as well as computational complexity. This, along with a vague and unfounded perception that modulo scheduling is computationally expensive as well as difficult to implement, have inhibited its incorporation into product compilers. This report presents iterative modulo scheduling, a practical algorithm that is capable of dealing with realistic machine models. The report also characterizes the algorithm in terms of the quality of the generated schedules as well the computational expense incurred.
Approximation Techniques for Average Completion Time Scheduling
, 1997
"... We consider the problem of nonpreemptive scheduling to minimize average (weighted) completion time, allowing for release dates, parallel machines, and precedence constraints. Recent work has led to constantfactor approximations for this problem, based on solving a preemptive or linear programming ..."
Abstract

Cited by 82 (8 self)
 Add to MetaCart
We consider the problem of nonpreemptive scheduling to minimize average (weighted) completion time, allowing for release dates, parallel machines, and precedence constraints. Recent work has led to constantfactor approximations for this problem, based on solving a preemptive or linear programming relaxation and then using the solution to get an ordering on the jobs. We introduce several new techniques which generalize this basic paradigm. We use these ideas to obtain improved approximation algorithms for onemachine scheduling to minimize average completion time with release dates. In the process, we obtain an optimal randomized online algorithm for the same problem that beats a lower bound for deterministic online algorithms. We consider extensions to the case of parallel machine scheduling, and for this we introduce two new ideas: first, we show that a preemptive onemachine relaxation is a powerful tool for designing parallel machine scheduling algorithms that simultaneously pro...
Benchmarking and Comparison of the Task Graph Scheduling Algorithms
, 1999
"... The problem of scheduling a parallel program represented by a weighted directed acyclic graph (DAG) to a set of homogeneous processors for minimizing the completion time of the program has been extensively studied. The NPcompleteness of the problem has stimulated researchers to propose a myriad of ..."
Abstract

Cited by 82 (2 self)
 Add to MetaCart
The problem of scheduling a parallel program represented by a weighted directed acyclic graph (DAG) to a set of homogeneous processors for minimizing the completion time of the program has been extensively studied. The NPcompleteness of the problem has stimulated researchers to propose a myriad of heuristic algorithms. While most of these algorithms are reported to be efficient, it is not clear how they compare against each other. A meaningful performance evaluation and comparison of these algorithms is a complex task and it must take into account a number of issues. First, most scheduling algorithms are based upon diverse assumptions, making the performance comparison rather purposeless. Second, there does not exist a standard set of benchmarks to examine these algorithms. Third, most algorithms are evaluated using small problem sizes, and, therefore, their scalability is unknown. In this paper, we first provide a taxonomy for classifying various algorithms into distinct categories a...