Results 1 – 10 of 13
Minimizing Memory Requirements in Rate-Optimal Schedules
, 1994
Abstract

Cited by 31 (2 self)
In this paper we address the problem of minimizing buffer storage requirement in constructing rate-optimal compile-time schedules for multirate dataflow graphs. We demonstrate that this problem, called the Minimum Buffer Rate-Optimal (MBRO) scheduling problem, can be formulated as a unified linear programming problem. A novel feature of our method is that it tries to minimize the memory requirement while simultaneously maximizing the computation rate. We have constructed an experimental testbed which implements our scheduling algorithm as well as (i) the widely used periodic admissible parallel schedules proposed by Lee and Messerschmitt [12], (ii) the optimal scheduling buffer allocation (OSBA) algorithm of Ning and Gao [15], and (iii) the multirate software pipelining (MRSP) algorithm [7]. The experimental results have demonstrated a significant improvement in buffer requirements for the MBRO schedules compared to the schedules generated by the other three methods. Compared to bloc...
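The abstract states the formulation only at a high level; purely as an illustrative sketch in our own notation (not the paper's), a buffer-minimizing rate-optimal schedule has the general shape of a linear program such as:

```latex
\begin{aligned}
\text{minimize}   \quad & \textstyle\sum_{e \in E} b_e \\
\text{subject to} \quad & s_v \;\ge\; s_u + t_u - d_e \, T_{\mathrm{opt}}
                          && \forall\, e = (u, v) \in E, \\
                        & b_e \;\ge\; \text{max.\ live tokens on arc } e
                          && \forall\, e \in E,
\end{aligned}
```

where $s_v$ are actor start times, $t_u$ execution times, $d_e$ initial delays on arc $e$, $T_{\mathrm{opt}}$ the rate-optimal iteration period (the maximum cycle mean), and $b_e$ the buffer bound on arc $e$. Fixing the period at $T_{\mathrm{opt}}$ keeps the computation rate maximal while the objective minimizes total buffer memory; expressing the token bound linearly in the start times is what would allow the whole problem to be solved as a single LP.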
Minimizing Buffer Requirements under Rate-Optimal Schedule in Regular Dataflow Networks
 Journal of VLSI Signal Processing
, 1994
Abstract

Cited by 19 (0 self)
Large-grain synchronous dataflow graphs, or multirate graphs, have the distinct feature that the nodes of the dataflow graph fire at different rates. Such multirate large-grain dataflow graphs have been widely regarded as a powerful programming model for DSP applications. In this paper we propose a method to minimize buffer storage requirement in constructing rate-optimal compile-time (MBRO) schedules for multirate dataflow graphs. We demonstrate that the constraints to minimize buffer storage while executing at the optimal computation rate (i.e. the maximum possible computation rate without storage constraints) can be formulated as a unified linear programming problem in our framework. A novel feature of our method is that it tries to minimize the memory requirement while simultaneously maximizing the computation rate. We have constructed an experimental testbed which implements our MBRO scheduling algorithm as well as (i) the widely used periodic admissible parallel schedules (also ...
Optimizing synchronization in multiprocessor DSP systems
 IEEE TRANSACTIONS ON SIGNAL PROCESSING
, 1997
Abstract

Cited by 16 (8 self)
This paper is concerned with multiprocessor implementations of embedded applications specified as iterative dataflow programs, in which synchronization overhead can be significant. We develop techniques to alleviate this overhead by determining a minimal set of processor synchronizations that are essential for correct execution. Our study is based in the context of self-timed execution of iterative dataflow programs. An iterative dataflow program consists of a dataflow representation of the body of a loop that is to be iterated an indefinite number of times; dataflow programming in this form has been studied and applied extensively, particularly in the context of signal processing software. Self-timed execution refers to a combined compile-time/run-time scheduling strategy in which processors synchronize with one another only based on interprocessor communication requirements, and thus, synchronization of processors at the end of each loop iteration does not generally occur. We introduce a new graph-theoretic framework, based on a data structure called the synchronization graph, for analyzing and optimizing synchronization overhead in self-timed, iterative dataflow programs. We show that the comprehensive techniques that have been developed for removing redundant synchronizations in non-iterative programs can be extended in this framework to optimally remove redundant ...
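The redundancy test underlying this line of work can be illustrated with a minimal sketch. This is our own simplification to the delay-free case (a synchronization edge is redundant if its ordering is enforced by some other path), not the paper's algorithm, and the names are ours:

```python
# Hedged sketch: detecting redundant synchronization edges in a
# synchronization graph, simplified to the zero-delay case.
from collections import defaultdict


def reachable(adj, src, dst, skip_edge):
    """Iterative DFS: is dst reachable from src while ignoring skip_edge?"""
    stack, seen = [src], set()
    while stack:
        u = stack.pop()
        if u == dst:
            return True
        if u in seen:
            continue
        seen.add(u)
        for v in adj[u]:
            if (u, v) != skip_edge:
                stack.append(v)
    return False


def redundant_syncs(sync_edges):
    """A sync edge (u, v) is redundant if v is still reachable from u
    after removing that edge: its sequencing requirement is already
    enforced by the remaining synchronizations."""
    adj = defaultdict(list)
    for u, v in sync_edges:
        adj[u].append(v)
    return [(u, v) for (u, v) in sync_edges
            if reachable(adj, u, v, skip_edge=(u, v))]


# Example: A->B, B->C, plus a direct A->C sync; A->C is redundant.
edges = [("A", "B"), ("B", "C"), ("A", "C")]
print(redundant_syncs(edges))  # → [('A', 'C')]
```

Each redundant edge found this way can be dropped without changing the set of admissible execution orders, which is the source of the run-time savings the abstract describes.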
Resynchronization for Multiprocessor DSP Implementation, Part 1: Maximum Throughput Resynchronization
, 1996
Abstract

Cited by 7 (5 self)
This paper introduces a technique, called resynchronization, for reducing synchronization overhead in multiprocessor implementations of digital signal processing (DSP) systems. The technique applies to arbitrary collections of dedicated, programmable or configurable processors, such as combinations of programmable DSPs, ASICs, and FPGA subsystems. Thus, it is particularly well suited to the evolving trend towards heterogeneous single-chip multiprocessors in DSP systems. Resynchronization exploits the well-known observation [36] that in a given multiprocessor implementation, certain synchronization operations may be redundant in the sense that their associated sequencing requirements are ensured by other synchronizations in the system. The goal of resynchronization is to introduce new synchronizations in such a way that the number of additional synchronizations that become redundant exceeds the number of new synchronizations that are added, and thus the net synchronization cost is reduc...
Resynchronization for Multiprocessor DSP Systems
 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS — I: FUNDAMENTAL THEORY AND APPLICATIONS
, 2000
Abstract

Cited by 6 (3 self)
This paper introduces a technique, called resynchronization, for reducing synchronization overhead in multiprocessor implementations of digital signal processing (DSP) systems. The technique applies to arbitrary collections of dedicated, programmable or configurable processors, such as combinations of programmable DSPs, ASICs, and FPGA subsystems. Thus, it is particularly well suited to the evolving trend toward heterogeneous single-chip multiprocessors in DSP systems. Resynchronization exploits the well-known observation [43] that in a given multiprocessor implementation, certain synchronization operations may be redundant in the sense that their associated sequencing requirements are ensured by other synchronizations in the system. The goal of resynchronization is to introduce new synchronizations in such a way that the number of original synchronizations that become redundant exceeds the number of new synchronizations that are added, and thus, the net synchronization cost is reduced.
Self-Timed Resynchronization: A Post-Optimization for Static Multiprocessor Schedules
 Proceedings of the International Parallel Processing Symposium
, 1996
Abstract

Cited by 4 (2 self)
In a shared-memory multiprocessor system, it is possible that certain synchronization operations are redundant (that is, their corresponding sequencing requirements are enforced completely by other synchronizations in the system) and can be eliminated without compromising correctness. This paper addresses the problem of adding new synchronization operations in a multiprocessor implementation in such a way that the number of original synchronizations that consequently become redundant significantly exceeds the number of new synchronizations. We refer to this approach to reducing synchronization overhead as resynchronization. In this paper we formally define the resynchronization problem, we show that optimal resynchronization is NP-hard, and we propose a family of heuristics for this problem. Finally, we present a practical example where resynchronization is useful. 1. Motivation Resynchronization is based on the concept that there can be redundancy in the synchronizations of a m...
RESYNCHRONIZATION OF MULTIPROCESSOR SCHEDULES, PART 2: LATENCY-CONSTRAINED RESYNCHRONIZATION
, 1996
Abstract

Cited by 3 (0 self)
The companion paper [7] introduced the concept of resynchronization, a post-optimization for static multiprocessor schedules in which extraneous synchronization operations are introduced in such a way that the number of original synchronizations that consequently become redundant significantly exceeds the number of additional synchronizations. Redundant synchronizations are synchronization operations whose corresponding sequencing requirements are enforced completely by other synchronizations in the system. The amount of run-time overhead required for synchronization can be reduced significantly by eliminating redundant synchronizations [5, 32]. Thus, effective resynchronization reduces the net synchronization overhead in the implementation of a multiprocessor schedule, and improves the overall throughput. However, since additional serialization is imposed by the new synchronizations, resynchronization can produce a significant increase in latency. The companion paper [7] develops fundamental properties of resynchronization and studies the problem of optimal resynchronization under the assumption that arbitrary increases in latency can be tolerated (“unbounded-latency resynchronization”). Such an assumption is valid, for example, in a wide variety of simulation applications. This paper addresses the problem of computing an optimal resynchronization among all resynchronizations that do not increase the latency beyond a prespecified upper bound. Our study is based in the context of self-timed execution of iterative dataflow programs, which is an implementation model that has been applied extensively for digital signal processing systems.
Optimizing Synchronization in Multiprocessor Implementations of Iterative Dataflow Programs
, 1995
Abstract

Cited by 2 (2 self)
This paper is concerned with multiprocessor implementations of embedded applications specified as iterative dataflow programs, in which synchronization overhead tends to be significant. We develop techniques to alleviate this overhead by determining a minimal set of processor synchronizations that are essential for correct execution. Our study is based in the context of self-timed execution of iterative dataflow programs. An iterative dataflow program consists of a dataflow representation of the body of a loop that is to be iterated an indefinite number of times; dataflow programming in this form has been studied and applied extensively, particularly in the context of signal processing software. Self-timed execution refers to a combined compile-time/run-time scheduling strategy in which processors synchronize with one another only based on interprocessor communication requirements, and thus, synchronization of processors at the end of each loop iteration does not generally occur. We ...
A Comparison of Clustering and Scheduling Techniques for Embedded Multiprocessor Systems
, 2003
Abstract

Cited by 2 (1 self)
In this paper we extensively explore and illustrate the effectiveness of the two-phase decomposition of scheduling (into clustering and cluster-scheduling, or merging) and mapping task graphs onto embedded multiprocessor systems. We describe efficient and novel partitioning (clustering) and scheduling techniques that aggressively streamline interprocessor communication and can be tuned to exploit the significantly longer compilation time that is available to embedded system designers.
On the Optimal Blocking Factor for Blocked, Non-Overlapped Multiprocessor Schedules
, 1994
Abstract

Cited by 1 (1 self)
This paper addresses the issue of determining the blocked, non-overlapped multiprocessor schedule of optimal blocking factor for signal processing programs expressed as synchronous dataflow (SDF) graphs. The main result of this paper is a graph-theoretic characterization of the behavior of the critical path in the precedence graph of blocking factor J as J is increased. We show that the asymptotic behavior is cyclic in the following sense: there exist constants c and N such that, for every blocking factor J ≥ N, the critical path in the precedence graph of blocking factor J has weight J·λ + c (EQ 1), where λ is the maximum cycle mean in the original graph, and N is an integer computable from the graph. 1. Introduction Synchronous Dataflow (SDF) [2] is a subset of dataflow [6] that has proven to be an elegant model for expressing signal processing programs. In the SDF model, the program is represented as a graph where the nodes represent computations and the arcs represent communication channels and precedence constraints. Each acto...
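The maximum cycle mean that governs the asymptotic slope in (EQ 1) can be computed with Karp's classic algorithm. A minimal sketch, under our own conventions (the example graph is made up, not taken from the paper):

```python
# Hedged sketch (not the paper's code): Karp's algorithm for the maximum
# cycle mean of a weighted directed graph, i.e. the largest value of
# (total cycle weight) / (number of cycle edges) over all cycles.

def max_cycle_mean(n, edges):
    """n: number of vertices (0..n-1); edges: list of (u, v, weight).
    Returns the maximum cycle mean, or -inf if the graph is acyclic."""
    NEG_INF = float("-inf")
    # A super-source vertex n with zero-weight edges into every vertex
    # makes all vertices reachable without introducing new cycles.
    es = list(edges) + [(n, v, 0.0) for v in range(n)]
    N = n + 1  # vertex count including the super-source
    # D[k][v] = maximum weight of a k-edge walk from the source to v.
    D = [[NEG_INF] * N for _ in range(N + 1)]
    D[0][n] = 0.0
    for k in range(1, N + 1):
        for u, v, w in es:
            if D[k - 1][u] != NEG_INF and D[k - 1][u] + w > D[k][v]:
                D[k][v] = D[k - 1][u] + w
    # Karp's theorem: lambda* = max_v min_k (D[N][v] - D[k][v]) / (N - k).
    best = NEG_INF
    for v in range(N):
        if D[N][v] == NEG_INF:
            continue
        best = max(best, min((D[N][v] - D[k][v]) / (N - k)
                             for k in range(N) if D[k][v] != NEG_INF))
    return best


# Example: a two-actor cycle with edge weights 2 and 4; its mean is 3.
print(max_cycle_mean(2, [(0, 1, 2.0), (1, 0, 4.0)]))  # → 3.0
```

With λ in hand, the characterization above says the critical-path weight of the J-unfolded precedence graph eventually grows linearly with slope λ, which is what bounds the achievable throughput of blocked schedules.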