Results 1-10 of 71
Stochastic Scheduling
, 1999
Cited by 82 (13 self)
There is a current need for scheduling policies that can leverage the performance variability of resources on multiuser clusters. We develop one solution to this problem called stochastic scheduling that utilizes a distribution of application execution performance on the target resources to determine a performance-efficient schedule. In this paper, we define a stochastic scheduling policy based on time-balancing for data parallel applications whose execution behavior can be represented as a normal distribution. Using three distributed applications on two contended platforms, we demonstrate that a stochastic scheduling policy can achieve good and predictable performance for the application as evaluated by several performance measures.
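As a rough illustration of time balancing under normally distributed per-unit execution times, the sketch below splits a fixed amount of work so that every machine is predicted to finish at the same time. The function name, the `k` pessimism knob (how many standard deviations to add to each mean), and the sample numbers are assumptions made for this example, not the policy defined in the paper.

```python
def time_balance(total_units, means, stds, k=0.0):
    """Split total_units across machines so predicted finish times are equal.

    means[i], stds[i]: per-unit execution time on machine i, modeled as a
    normal distribution (illustrative inputs).
    k: hypothetical knob; the predicted per-unit time is mean + k * std,
       so k > 0 shifts work away from machines with high variability.
    """
    per_unit = [m + k * s for m, s in zip(means, stds)]
    if any(t <= 0 for t in per_unit):
        raise ValueError("predicted per-unit times must be positive")
    # Equal finish time T means units[i] * per_unit[i] == T for every i,
    # so each share is proportional to the machine's predicted rate.
    rates = [1.0 / t for t in per_unit]
    total_rate = sum(rates)
    return [total_units * r / total_rate for r in rates]


# Two machines with the same mean speed; the second one is noisier.
optimistic = time_balance(30, means=[1.0, 1.0], stds=[0.0, 1.0], k=0.0)
cautious = time_balance(30, means=[1.0, 1.0], stds=[0.0, 1.0], k=1.0)
```

With `k=0` the split ignores variability and is even; with `k=1` the noisier machine is treated as twice as slow and receives half as much work as the stable one.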
A Hierarchical Multiprocessor Scheduling System for DSP Applications
 In Proceedings of the IEEE Asilomar Conference on Signals, Systems, and Computers
, 1995
Cited by 34 (9 self)
This paper discusses a hierarchical scheduling framework which reduces the complexity of scheduling synchronous dataflow (SDF) graphs onto multiple processors. The core of this framework is a clustering algorithm that decreases the number of nodes before expanding the SDF graph into a precedence directed acyclic graph (DAG). The internals of the clusters are then scheduled with uniprocessor SDF schedulers which can optimize for memory usage. The clustering is done in such a manner as to leave ample parallelism exposed for the multiprocessor scheduler. We have developed the SDF composition theorem for testing if a clustering step is valid. The advantages of this framework are demonstrated with several practical, real-time examples.
1 Motivation
Dataflow is a natural representation for signal processing algorithms. One of its strengths is that it exposes parallelism by expressing only the actual data dependencies that exist in an algorithm. Applications are specified by a dataflow grap...
Minimizing Memory Requirements in Rate-Optimal Schedules
, 1994
Cited by 31 (2 self)
In this paper we address the problem of minimizing buffer storage requirement in constructing rate-optimal compile-time schedules for multirate dataflow graphs. We demonstrate that this problem, called the Minimum Buffer Rate-Optimal (MBRO) scheduling problem, can be formulated as a unified linear programming problem. A novel feature of our method is that it tries to minimize the memory requirement while simultaneously maximizing the computation rate. We have constructed an experimental testbed which implements our scheduling algorithm as well as (i) the widely used periodic admissible parallel schedules proposed by Lee and Messerschmitt [12], (ii) the optimal scheduling buffer allocation (OSBA) algorithm of Ning and Gao [15], and (iii) the multirate software pipelining (MRSP) algorithm [7]. The experimental results have demonstrated a significant improvement in buffer requirements for the MBRO schedules compared to the schedules generated by the other three methods. Compared to bloc...
Generating Compact Code From Dataflow Specifications Of Multirate Signal Processing Algorithms
 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS — I: FUNDAMENTAL THEORY AND APPLICATIONS
, 1995
Cited by 30 (16 self)
Synchronous dataflow (SDF) semantics are well-suited to representing and compiling multirate signal processing algorithms. A key to this match is the ability to cleanly express iteration without overspecifying the execution order of computations, thereby allowing efficient schedules to be constructed. Due to limited program memory, it is often desirable to translate the iteration in an SDF graph into groups of repetitive firing patterns so that loops can be constructed in the target code. This paper establishes fundamental topological relationships between iteration and looping in SDF graphs, and presents a scheduling framework that provably synthesizes the most compact looping structures for a large class of practical SDF graphs. By modularizing different components of the scheduling framework, and establishing their independence, we show how other scheduling objectives, such as minimizing data buffering requirements or increasing the number of data transfers that occur in registers, ...
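To make the idea of grouping firings into loops concrete, here is a small sketch that expands a looped schedule into its flat firing sequence and counts how many times actors appear in the schedule text. The nested `(count, body)` encoding, the function names, and the use of the appearance count as a rough stand-in for code size (assuming each appearance is inlined once) are choices made for this example, not notation taken from the paper.

```python
def expand(schedule):
    """Expand a looped schedule into its flat list of actor firings.

    A schedule is a list whose items are either an actor name (str) or a
    (count, body) pair meaning "repeat the schedule `body` count times".
    """
    firings = []
    for item in schedule:
        if isinstance(item, str):
            firings.append(item)
        else:
            count, body = item
            firings.extend(expand(body) * count)
    return firings


def appearances(schedule):
    """Count lexical actor appearances, ignoring loop counts."""
    total = 0
    for item in schedule:
        if isinstance(item, str):
            total += 1
        else:
            _, body = item
            total += appearances(body)
    return total


flat = ["A", "A", "B", "A", "A", "B"]
looped = [(2, [(2, ["A"]), "B"])]  # 2 x ((2 x A) then B)
```

Both schedules fire the same sequence, but the looped form names each actor once, whereas the flat form names actors six times.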
Memory Management for Dataflow Programming of Multirate Signal Processing Algorithms
 IEEE TRANSACTIONS ON SIGNAL PROCESSING
, 1994
Cited by 21 (4 self)
Managing the buffering of data along arcs is a critical part of compiling a synchronous dataflow (SDF) program. This paper shows how dataflow properties can be analyzed at compile-time to make buffering more efficient. Since the target code corresponding to each node of an SDF graph is normally obtained from a hand-optimized library of predefined blocks, the efficiency of data transfer between blocks is often the limiting factor in how closely an SDF compiler can approximate meticulous manual coding. Furthermore, in the presence of large sample-rate changes, straightforward buffering techniques can quickly exhaust limited on-chip data memory, necessitating the use of slower external memory. The techniques presented in this paper address both of these problems in a unified manner.
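The effect of a schedule on arc buffering can be checked with a simple token-counting simulation. The sketch below is only an illustration of that bookkeeping, not the analysis technique of the paper; the edge tuple layout, the function name, and the example rates are assumptions.

```python
def max_buffer_sizes(edges, firings):
    """Return the peak token count on each arc for a given firing sequence.

    edges: list of (src, dst, produced, consumed, initial_tokens) tuples
           (hypothetical layout chosen for this example).
    firings: actor names in execution order.
    The result is a list aligned with `edges`.
    """
    tokens = [initial for _, _, _, _, initial in edges]
    peak = list(tokens)
    for actor in firings:
        # An actor first consumes from its input arcs...
        for i, (_, dst, _, consumed, _) in enumerate(edges):
            if dst == actor:
                if tokens[i] < consumed:
                    raise ValueError(f"{actor} fired without enough input tokens")
                tokens[i] -= consumed
        # ...then produces onto its output arcs.
        for i, (src, _, produced, _, _) in enumerate(edges):
            if src == actor:
                tokens[i] += produced
                peak[i] = max(peak[i], tokens[i])
    return peak


# A produces 3 tokens per firing, B consumes 2; one period fires A twice, B three times.
edges = [("A", "B", 3, 2, 0)]
batched = max_buffer_sizes(edges, list("AABBB"))      # [6]
interleaved = max_buffer_sizes(edges, list("ABABB"))  # [4]
```

Both firing orders complete one period of the graph, yet interleaving the two actors lowers the peak occupancy of the arc from 6 tokens to 4.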
Compile-time scheduling of dynamic constructs in dataflow program graphs
 IEEE Transactions on Computers
, 1997
Looped Schedules for Dataflow Descriptions of Multirate DSP Algorithms
 Journal of Formal Methods in System Design
, 1993
Cited by 20 (9 self)
The synchronous dataflow (SDF) programming paradigm has been used extensively in design environments for multirate signal processing applications. In this paradigm, the repetition of computations is specified by the relative rates at which the computations consume and produce data. This implicit specification of iteration allows a compiler to easily explore alternative nested loop structures for the target code with respect to their effects on code size, buffering requirements and throughput. In this paper, we develop important relationships between the SDF description of an algorithm and the range of looping structures offered by this description, and we discuss how to improve code efficiency by applying these relationships.
1 Introduction
Synchronous dataflow (SDF) is a restricted form of the dataflow model of computation [5]. In the dataflow model, a program is represented as a directed graph. The nodes of the graph, also called actors, represent computations and the arcs represent...
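The way production and consumption rates fix how often each actor repeats can be sketched by solving the SDF balance equations: for every arc, firings of the source times tokens produced must equal firings of the sink times tokens consumed. The code below is a minimal sketch assuming a connected graph; the edge tuple layout and function name are illustrative choices.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def repetitions_vector(actors, edges):
    """Smallest positive integer firing counts that balance every arc.

    edges: list of (src, dst, produced, consumed) tuples (assumed layout).
    Balance condition per arc: q[src] * produced == q[dst] * consumed.
    """
    rate = {actors[0]: Fraction(1)}
    pending = [actors[0]]
    while pending:
        a = pending.pop()
        for src, dst, produced, consumed in edges:
            if src == a:
                other, value = dst, rate[a] * produced / consumed
            elif dst == a:
                other, value = src, rate[a] * consumed / produced
            else:
                continue
            if other not in rate:
                rate[other] = value
                pending.append(other)
            elif rate[other] != value:
                raise ValueError("inconsistent sample rates")
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")
    # Scale the rational solution up to integers, then reduce to lowest terms.
    scale = reduce(lambda x, y: x * y // gcd(x, y),
                   (r.denominator for r in rate.values()), 1)
    counts = {a: int(r * scale) for a, r in rate.items()}
    common = reduce(gcd, counts.values())
    return {a: c // common for a, c in counts.items()}


# A produces 3, B consumes 2; B produces 1, C consumes 3.
q = repetitions_vector(["A", "B", "C"],
                       [("A", "B", 3, 2), ("B", "C", 1, 3)])
```

For this chain the result is two firings of A, three of B, and one of C per period; a graph whose rates cannot be balanced raises `ValueError`.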
Minimizing Buffer Requirements under Rate-Optimal Schedule in Regular Dataflow Networks
 Journal of VLSI Signal Processing
, 1994
Cited by 19 (0 self)
Large-grain synchronous dataflow graphs or multirate graphs have the distinct feature that the nodes of the dataflow graph fire at different rates. Such multirate large-grain dataflow graphs have been widely regarded as a powerful programming model for DSP applications. In this paper we propose a method to minimize buffer storage requirement in constructing rate-optimal compile-time (MBRO) schedules for multirate dataflow graphs. We demonstrate that the constraints to minimize buffer storage while executing at the optimal computation rate (i.e. the maximum possible computation rate without storage constraints) can be formulated as a unified linear programming problem in our framework. A novel feature of our method is that it tries to minimize the memory requirement while simultaneously maximizing the computation rate. We have constructed an experimental testbed which implements our MBRO scheduling algorithm as well as (i) the widely used periodic admissible parallel schedules (also ...
Determining the order of processor transactions in statically scheduled multiprocessors
 VLSI Signal Processing
, 1997
Cited by 16 (0 self)
This paper addresses embedded multiprocessor implementation of iterative, real-time applications, such as digital signal and image processing, that are specified as dataflow graphs. Scheduling dataflow graphs on multiple processors involves assigning tasks to processors (processor assignment), ordering the execution of tasks within each processor (task ordering), and determining when each task must commence execution. We consider three scheduling strategies: fully-static, self-timed and ordered transactions, all of which perform the assignment and ordering steps at compile time. Run time costs are small for the fully-static strategy; however it is not robust with respect to changes or uncertainty in task execution times. The self-timed approach is tolerant of variations in task execution times, but pays the penalty of high run time costs, because processors need to explicitly synchronize whenever they communicate. The ordered transactions approach lies between the fully-static and self-timed strategies; in this approach the order in which processors communicate is determined at compile time and enforced at run time. The ordered transactions strategy retains some of the flexibility of self-timed schedules and at the same time has lower run time costs than the self-timed approach. In this paper we determine an order of processor transactions that is nearly optimal given information about task execution times at compile time, and for a given processor assignment and task ordering. The criterion for optimality is the average throughput achieved by the schedule. Our main result is that it is possible to choose a transaction order such that the resulting ordered transactions schedule incurs no performance penalty compared to the more flexible self-timed strategy, even when the higher run time costs implied by the self-timed strategy are ignored.
Optimizing synchronization in multiprocessor DSP systems
 IEEE TRANSACTIONS ON SIGNAL PROCESSING
, 1997
Cited by 16 (8 self)
This paper is concerned with multiprocessor implementations of embedded applications specified as iterative dataflow programs, in which synchronization overhead can be significant. We develop techniques to alleviate this overhead by determining a minimal set of processor synchronizations that are essential for correct execution. Our study is based in the context of self-timed execution of iterative dataflow programs. An iterative dataflow program consists of a dataflow representation of the body of a loop that is to be iterated an indefinite number of times; dataflow programming in this form has been studied and applied extensively, particularly in the context of signal processing software. Self-timed execution refers to a combined compile-time/run-time scheduling strategy in which processors synchronize with one another only based on interprocessor communication requirements, and thus, synchronization of processors at the end of each loop iteration does not generally occur. We introduce a new graph-theoretic framework, based on a data structure called the synchronization graph, for analyzing and optimizing synchronization overhead in self-timed, iterative dataflow programs. We show that the comprehensive techniques that have been developed for removing redundant synchronizations in noniterative programs can be extended in this framework to optimally remove redundant