Stochastic Scheduling
1999
Cited by 97 (14 self)
There is a current need for scheduling policies that can leverage the performance variability of resources on multiuser clusters. We develop one solution to this problem called stochastic scheduling that utilizes a distribution of application execution performance on the target resources to determine a performance-efficient schedule. In this paper, we define a stochastic scheduling policy based on time-balancing for data parallel applications whose execution behavior can be represented as a normal distribution. Using three distributed applications on two contended platforms, we demonstrate that a stochastic scheduling policy can achieve good and predictable performance for the application as evaluated by several performance measures.
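To make the idea of time balancing over normally distributed execution times concrete, here is a minimal Python sketch. It is not the paper's algorithm: the `Resource` class, the `time_balance` function, and the rule of charging each machine `mean + tuning * sd` per unit of work are assumptions introduced purely for illustration. The sketch splits a data-parallel workload so that every machine has the same predicted finish time under that conservative estimate.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Per-unit execution time on one machine, modelled as Normal(mean, sd)."""
    mean: float
    sd: float


def time_balance(resources, total_units, tuning=1.0):
    """Split total_units so every resource has the same predicted finish time.

    Each resource is charged a conservative per-unit time of
    mean + tuning * sd. This rule is an assumption of the sketch;
    tuning=0 reduces to balancing on the means alone.
    """
    per_unit = [r.mean + tuning * r.sd for r in resources]
    if any(t <= 0 for t in per_unit):
        raise ValueError("effective per-unit times must be positive")
    rates = [1.0 / t for t in per_unit]  # units of work per unit of time
    total_rate = sum(rates)
    # Share of work proportional to rate, so share * per_unit is constant.
    return [total_units * rate / total_rate for rate in rates]


if __name__ == "__main__":
    machines = [Resource(mean=1.0, sd=0.0), Resource(mean=1.0, sd=1.0)]
    print(time_balance(machines, 300, tuning=0.0))  # means only
    print(time_balance(machines, 300, tuning=1.0))  # penalise variability
```

With `tuning=0.0` the two machines look identical and receive equal shares; raising `tuning` shifts work away from the machine whose performance varies more.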
A Hierarchical Multiprocessor Scheduling System for DSP Applications
In Proceedings of the IEEE Asilomar Conference on Signals, Systems, and Computers, 1995
Cited by 40 (11 self)
This paper discusses a hierarchical scheduling framework which reduces the complexity of scheduling synchronous dataflow (SDF) graphs onto multiple processors. The core of this framework is a clustering algorithm that decreases the number of nodes before expanding the SDF graph into a precedence directed acyclic graph (DAG). The internals of the clusters are then scheduled with uniprocessor SDF schedulers which can optimize for memory usage. The clustering is done in such a manner as to leave ample parallelism exposed for the multiprocessor scheduler. We have developed the SDF composition theorem for testing if a clustering step is valid. The advantages of this framework are demonstrated with several practical, real-time examples.
1 Motivation
Dataflow is a natural representation for signal processing algorithms. One of its strengths is that it exposes parallelism by expressing only the actual data dependencies that exist in an algorithm. Applications are specified by a dataflow grap...
Generating Compact Code From Dataflow Specifications Of Multirate Signal Processing Algorithms
IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, 1995
Cited by 37 (18 self)
Synchronous dataflow (SDF) semantics are well-suited to representing and compiling multirate signal processing algorithms. A key to this match is the ability to cleanly express iteration without overspecifying the execution order of computations, thereby allowing efficient schedules to be constructed. Due to limited program memory, it is often desirable to translate the iteration in an SDF graph into groups of repetitive firing patterns so that loops can be constructed in the target code. This paper establishes fundamental topological relationships between iteration and looping in SDF graphs, and presents a scheduling framework that provably synthesizes the most compact looping structures for a large class of practical SDF graphs. By modularizing different components of the scheduling framework, and establishing their independence, we show how other scheduling objectives, such as minimizing data buffering requirements or increasing the number of data transfers that occur in registers, ...
Minimizing Memory Requirements in Rate-Optimal Schedules
1994
Cited by 33 (2 self)
In this paper we address the problem of minimizing buffer storage requirement in constructing rate-optimal compile-time schedules for multirate dataflow graphs. We demonstrate that this problem, called the Minimum Buffer Rate-Optimal (MBRO) scheduling problem, can be formulated as a unified linear programming problem. A novel feature of our method is that it tries to minimize the memory requirement while simultaneously maximizing the computation rate. We have constructed an experimental testbed which implements our scheduling algorithm as well as (i) the widely used periodic admissible parallel schedules proposed by Lee and Messerschmitt [12], (ii) the optimal scheduling buffer allocation (OSBA) algorithm of Ning and Gao [15], and (iii) the multirate software pipelining (MRSP) algorithm [7]. The experimental results have demonstrated a significant improvement in buffer requirements for the MBRO schedules compared to the schedules generated by the other three methods. Compared to bloc...
Determining the order of processor transactions in statically scheduled multiprocessors
VLSI Signal Processing, 1997
Cited by 24 (0 self)
This paper addresses embedded multiprocessor implementation of iterative, real-time applications, such as digital signal and image processing, that are specified as dataflow graphs. Scheduling dataflow graphs on multiple processors involves assigning tasks to processors (processor assignment), ordering the execution of tasks within each processor (task ordering), and determining when each task must commence execution. We consider three scheduling strategies: fully-static, self-timed and ordered transactions, all of which perform the assignment and ordering steps at compile time. Run time costs are small for the fully-static strategy; however, it is not robust with respect to changes or uncertainty in task execution times. The self-timed approach is tolerant of variations in task execution times, but pays the penalty of high run time costs, because processors need to explicitly synchronize whenever they communicate. The ordered transactions approach lies between the fully-static and self-timed strategies; in this approach the order in which processors communicate is determined at compile time and enforced at run time. The ordered transactions strategy retains some of the flexibility of self-timed schedules and at the same time has lower run time costs than the self-timed approach. In this paper we determine an order of processor transactions that is nearly optimal given information about task execution times at compile time, and for a given processor assignment and task ordering. The criterion for optimality is the average throughput achieved by the schedule. Our main result is that it is possible to choose a transaction order such that the resulting ordered transactions schedule incurs no performance penalty compared to the more flexible self-timed strategy, even when the higher run time costs implied by the self-timed strategy are ignored.
Compile-time scheduling of dynamic constructs in dataflow program graphs
IEEE Transactions on Computers, 1997
Looped Schedules for Dataflow Descriptions of Multirate DSP Algorithms
Journal of Formal Methods in System Design, 1993
Cited by 21 (10 self)
The synchronous dataflow (SDF) programming paradigm has been used extensively in design environments for multirate signal processing applications. In this paradigm, the repetition of computations is specified by the relative rates at which the computations consume and produce data. This implicit specification of iteration allows a compiler to easily explore alternative nested loop structures for the target code with respect to their effects on code size, buffering requirements and throughput. In this paper, we develop important relationships between the SDF description of an algorithm and the range of looping structures offered by this description, and we discuss how to improve code efficiency by applying these relationships.
1 Introduction
Synchronous dataflow (SDF) is a restricted form of the dataflow model of computation [5]. In the dataflow model, a program is represented as a directed graph. The nodes of the graph, also called actors, represent computations and the arcs represent...
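The statement that repetition is fixed by the relative production and consumption rates can be illustrated by solving the SDF balance equations: for every arc, (firings of the source) x (tokens produced) must equal (firings of the sink) x (tokens consumed). The following Python sketch computes the smallest positive integer solution for a connected graph. The function name `repetitions_vector`, the arc tuple layout, and the three-actor example graph are assumptions made for illustration and are not taken from the paper.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def repetitions_vector(actors, arcs):
    """Return the minimal firing counts per actor for one schedule period.

    arcs is a list of (src, dst, produced, consumed) tuples. The graph is
    assumed to be connected; a ValueError is raised if it is not, or if
    the rates admit no consistent solution.
    """
    neighbours = {a: [] for a in actors}
    for src, dst, produced, consumed in arcs:
        # Balance equation: q[src] * produced == q[dst] * consumed
        neighbours[src].append((dst, Fraction(produced, consumed)))
        neighbours[dst].append((src, Fraction(consumed, produced)))

    q = {actors[0]: Fraction(1)}  # fix one actor, propagate ratios outward
    stack = [actors[0]]
    while stack:
        a = stack.pop()
        for b, ratio in neighbours[a]:
            rate = q[a] * ratio
            if b not in q:
                q[b] = rate
                stack.append(b)
            elif q[b] != rate:
                raise ValueError("inconsistent sample rates")
    if len(q) != len(actors):
        raise ValueError("graph is not connected")

    # Scale the rational solution to the smallest positive integers.
    scale = reduce(lambda x, y: x * y // gcd(x, y),
                   (f.denominator for f in q.values()), 1)
    counts = {a: int(f * scale) for a, f in q.items()}
    common = reduce(gcd, counts.values())
    return {a: n // common for a, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical chain: A produces 2, B consumes 3; B produces 1, C consumes 2.
    arcs = [("A", "B", 2, 3), ("B", "C", 1, 2)]
    print(repetitions_vector(["A", "B", "C"], arcs))
```

Any valid schedule period for the example graph must fire each actor the number of times returned here (or a common multiple), which is the iteration a compiler is then free to arrange into loops.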
Minimizing Buffer Requirements under Rate-Optimal Schedule in Regular Dataflow Networks
Journal of VLSI Signal Processing, 1994
Cited by 21 (0 self)
Large-grain synchronous dataflow graphs or multirate graphs have the distinct feature that the nodes of the dataflow graph fire at different rates. Such multirate large-grain dataflow graphs have been widely regarded as a powerful programming model for DSP applications. In this paper we propose a method to minimize buffer storage requirement in constructing rate-optimal compile-time (MBRO) schedules for multirate dataflow graphs. We demonstrate that the constraints to minimize buffer storage while executing at the optimal computation rate (i.e. the maximum possible computation rate without storage constraints) can be formulated as a unified linear programming problem in our framework. A novel feature of our method is that it tries to minimize the memory requirement while simultaneously maximizing the computation rate. We have constructed an experimental testbed which implements our MBRO scheduling algorithm as well as (i) the widely used periodic admissible parallel schedules (also ...
Efficient techniques for clustering and scheduling onto embedded multiprocessors
IEEE Transactions on Parallel and Distributed Systems, 2006
Cited by 21 (5 self)
Multiprocessor mapping and scheduling algorithms have been extensively studied over the past few decades and have been tackled from different perspectives. In the late 1980s, the two-step decomposition of scheduling (into clustering and cluster-scheduling) was introduced. Ever since, several clustering and merging algorithms have been proposed and individually reported to be efficient. However, it is not clear how effective they are and how well they compare against single-step scheduling algorithms or other multi-step algorithms. In this paper, we explore the effectiveness of the two-phase decomposition of scheduling and describe efficient and novel techniques that aggressively streamline interprocessor communications and can be tuned to exploit the significantly longer compilation time that is available to embedded system designers. We evaluate a number of leading clustering and merging algorithms using a set of benchmarks with diverse structures. We present an experimental setup for comparing the single-step against the two-step scheduling approach. We determine the importance of different steps in scheduling and the effect of different steps on overall schedule performance and show that the decomposition of the scheduling process indeed improves the overall performance. We also show that the quality of the solutions depends on the quality of the clusters generated in the clustering step. Based on the results, we also discuss why the parallel time metric in the clustering step may not provide an accurate measure for the final performance of cluster-scheduling.
Memory Management for Dataflow Programming of Multirate Signal Processing Algorithms
IEEE Transactions on Signal Processing, 1994
Cited by 20 (4 self)
Managing the buffering of data along arcs is a critical part of compiling a synchronous dataflow (SDF) program. This paper shows how dataflow properties can be analyzed at compile-time to make buffering more efficient. Since the target code corresponding to each node of an SDF graph is normally obtained from a hand-optimized library of predefined blocks, the efficiency of data transfer between blocks is often the limiting factor in how closely an SDF compiler can approximate meticulous manual coding. Furthermore, in the presence of large sample-rate changes, straightforward buffering techniques can quickly exhaust limited on-chip data memory, necessitating the use of slower external memory. The techniques presented in this paper address both of these problems in a unified manner.
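As a rough illustration of why buffering along arcs depends on compile-time decisions, the Python sketch below simulates a schedule and records the peak number of tokens queued on each arc. It does not reproduce the techniques of the paper above; the function names `flatten` and `buffer_requirements`, the nested-tuple encoding of looped schedules, and the example graph are all assumptions. The sketch further assumes no initial tokens on any arc and at most one arc per ordered actor pair.

```python
def flatten(schedule):
    """Expand a looped schedule into a flat sequence of actor firings.

    A schedule is a list whose items are actor names (strings) or
    (count, sub_schedule) tuples representing a loop.
    """
    for item in schedule:
        if isinstance(item, tuple):
            count, body = item
            for _ in range(count):
                yield from flatten(body)
        else:
            yield item


def buffer_requirements(arcs, schedule):
    """Return the peak token count on each arc, keyed by (src, dst).

    arcs is a list of (src, dst, produced, consumed) tuples. Assumes no
    initial tokens and at most one arc per (src, dst) pair.
    """
    tokens = {(src, dst): 0 for src, dst, _, _ in arcs}
    peak = dict(tokens)
    for actor in flatten(schedule):
        # An actor consumes from all of its input arcs before producing.
        for src, dst, _, consumed in arcs:
            if dst == actor:
                if tokens[(src, dst)] < consumed:
                    raise ValueError(f"{actor} fired without enough input")
                tokens[(src, dst)] -= consumed
        for src, dst, produced, _ in arcs:
            if src == actor:
                tokens[(src, dst)] += produced
                peak[(src, dst)] = max(peak[(src, dst)], tokens[(src, dst)])
    return peak


if __name__ == "__main__":
    # Hypothetical chain: A produces 2, B consumes 3; B produces 1, C consumes 2.
    arcs = [("A", "B", 2, 3), ("B", "C", 1, 2)]
    print(buffer_requirements(arcs, [(3, ["A"]), (2, ["B"]), "C"]))
    print(buffer_requirements(arcs, ["A", "A", "B", "A", "B", "C"]))
```

Both example schedules fire each actor the same number of times, yet the looped form needs a larger buffer on the first arc than the interleaved form, which is the kind of trade-off between code compactness and data memory that such analyses weigh.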