Results 1-10 of 14
Minimizing Memory Requirements in Rate-Optimal Schedules
, 1994
Abstract

Cited by 33 (2 self)
In this paper we address the problem of minimizing buffer storage requirement in constructing rate-optimal compile-time schedules for multirate dataflow graphs. We demonstrate that this problem, called the Minimum Buffer Rate-Optimal (MBRO) scheduling problem, can be formulated as a unified linear programming problem. A novel feature of our method is that it tries to minimize the memory requirement while simultaneously maximizing the computation rate. We have constructed an experimental testbed which implements our scheduling algorithm as well as (i) the widely used periodic admissible parallel schedules proposed by Lee and Messerschmitt [12], (ii) the optimal scheduling buffer allocation (OSBA) algorithm of Ning and Gao [15], and (iii) the multirate software pipelining (MRSP) algorithm [7]. The experimental results have demonstrated a significant improvement in buffer requirements for the MBRO schedules compared to the schedules generated by the other three methods. Compared to bloc...
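The trade-off the abstract describes can be seen on a toy instance. The following sketch is not the paper's LP formulation: it brute-forces start times for a hypothetical three-actor chain executing periodically with an assumed rate-optimal period T, using the common model in which the number of tokens simultaneously live on an edge (u, v) is ceil(lifetime / T), with lifetime = (s_v + t_v) - (s_u + t_u). The actor names, execution times, and period are all illustrative.

```python
# Illustrative only: brute-force what MBRO-style methods formulate as an LP,
# i.e. pick start times that keep the period (rate) fixed while minimizing
# total buffer space.  All numbers below are hypothetical.
from itertools import product
from math import ceil

exec_time = {"A": 1, "B": 2, "C": 1}   # assumed actor execution times
edges = [("A", "B"), ("B", "C")]       # a simple chain A -> B -> C
T = 4                                   # assumed rate-optimal period

def total_buffers(start):
    """Total buffer units for given start times, or None if precedence fails."""
    total = 0
    for u, v in edges:
        if start[v] < start[u] + exec_time[u]:   # consumer before producer done
            return None
        lifetime = (start[v] + exec_time[v]) - (start[u] + exec_time[u])
        total += ceil(lifetime / T)              # tokens live at once on edge
    return total

best = None
for sB, sC in product(range(2 * T), repeat=2):   # A fixed at time 0
    cost = total_buffers({"A": 0, "B": sB, "C": sC})
    if cost is not None and (best is None or cost < best[0]):
        best = (cost, sB, sC)

print(best)   # minimal total buffer units and the start times achieving them
```

For this chain the minimum is one buffer unit per edge, reached by starting each actor as soon as its predecessor finishes; an LP solver finds the same optimum without enumeration, which is the point of the unified formulation.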
Minimizing Buffer Requirements under Rate-Optimal Schedule in Regular Dataflow Networks
 Journal of VLSI Signal Processing
, 1994
Abstract

Cited by 23 (0 self)
Large-grain synchronous dataflow graphs or multirate graphs have the distinct feature that the nodes of the dataflow graph fire at different rates. Such multirate large-grain dataflow graphs have been widely regarded as a powerful programming model for DSP applications. In this paper we propose a method to minimize buffer storage requirement in constructing rate-optimal compile-time (MBRO) schedules for multirate dataflow graphs. We demonstrate that the constraints to minimize buffer storage while executing at the optimal computation rate (i.e. the maximum possible computation rate without storage constraints) can be formulated as a unified linear programming problem in our framework. A novel feature of our method is that it tries to minimize the memory requirement while simultaneously maximizing the computation rate. We have constructed an experimental testbed which implements our MBRO scheduling algorithm as well as (i) the widely used periodic admissible parallel schedules (also ...
Optimizing synchronization in multiprocessor DSP systems
 IEEE TRANSACTIONS ON SIGNAL PROCESSING
, 1997
Abstract

Cited by 16 (8 self)
This paper is concerned with multiprocessor implementations of embedded applications specified as iterative dataflow programs, in which synchronization overhead can be significant. We develop techniques to alleviate this overhead by determining a minimal set of processor synchronizations that are essential for correct execution. Our study is based in the context of self-timed execution of iterative dataflow programs. An iterative dataflow program consists of a dataflow representation of the body of a loop that is to be iterated an indefinite number of times; dataflow programming in this form has been studied and applied extensively, particularly in the context of signal processing software. Self-timed execution refers to a combined compile-time/run-time scheduling strategy in which processors synchronize with one another only based on interprocessor communication requirements, and thus, synchronization of processors at the end of each loop iteration does not generally occur. We introduce a new graph-theoretic framework, based on a data structure called the synchronization graph, for analyzing and optimizing synchronization overhead in self-timed, iterative dataflow programs. We show that the comprehensive techniques that have been developed for removing redundant synchronizations in non-iterative programs can be extended in this framework to optimally remove redundant
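The core test in such frameworks can be sketched in a few lines. The code below is an assumed simplification (zero-delay edges only, ignoring the iteration-distance delays the synchronization graph carries): a synchronization edge (u, v) is redundant when another directed path from u to v already enforces the same ordering.

```python
# Sketch: detect redundant synchronization edges in a zero-delay
# synchronization graph.  An edge (u, v) is redundant if the remaining
# edges still provide a directed path from u to v.
from collections import defaultdict

def reachable(adj, src, dst):
    """Iterative depth-first search: is dst reachable from src?"""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(adj[node])
    return False

def redundant_syncs(sync_edges):
    """Return the sync edges whose ordering is implied by the others."""
    redundant = []
    for u, v in sync_edges:
        adj = defaultdict(list)
        for a, b in sync_edges:
            if (a, b) != (u, v):       # leave the candidate edge out
                adj[a].append(b)
        if reachable(adj, u, v):
            redundant.append((u, v))
    return redundant

# Example: the direct edge (A, C) is implied by the path A -> B -> C.
print(redundant_syncs([("A", "B"), ("B", "C"), ("A", "C")]))
```

Note that each edge is tested against all the others; removing several edges simultaneously requires re-checking, since two edges can each be redundant only while the other is present.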
Resynchronization for Multiprocessor DSP Implementation, Part 1: Maximum Throughput Resynchronization
, 1996
Abstract

Cited by 7 (5 self)
This paper introduces a technique, called resynchronization, for reducing synchronization overhead in multiprocessor implementations of digital signal processing (DSP) systems. The technique applies to arbitrary collections of dedicated, programmable or configurable processors, such as combinations of programmable DSPs, ASICs, and FPGA subsystems. Thus, it is particularly well suited to the evolving trend towards heterogeneous single-chip multiprocessors in DSP systems. Resynchronization exploits the well-known observation [36] that in a given multiprocessor implementation, certain synchronization operations may be redundant in the sense that their associated sequencing requirements are ensured by other synchronizations in the system. The goal of resynchronization is to introduce new synchronizations in such a way that the number of original synchronizations that become redundant exceeds the number of new synchronizations that are added, and thus the net synchronization cost is reduc...
Resynchronization for Multiprocessor DSP Systems, Part 2: Latency-Constrained . . .
, 1996
Self-Timed Resynchronization: A Post-Optimization for Static Multiprocessor Schedules
 Proceedings of the International Parallel Processing Symposium
, 1996
Abstract

Cited by 5 (2 self)
In a shared-memory multiprocessor system, it is possible that certain synchronization operations are redundant, that is, their corresponding sequencing requirements are enforced completely by other synchronizations in the system, and can be eliminated without compromising correctness. This paper addresses the problem of adding new synchronization operations in a multiprocessor implementation in such a way that the number of original synchronizations that consequently become redundant significantly exceeds the number of new synchronizations. We refer to this approach to reducing synchronization overhead as resynchronization. In this paper we formally define the resynchronization problem, we show that optimal resynchronization is NP-hard, and we propose a family of heuristics for this problem. Finally we present a practical example where resynchronization is useful. 1. Motivation: Resynchronization is based on the concept that there can be redundancy in the synchronizations of a m...
RESYNCHRONIZATION OF MULTIPROCESSOR SCHEDULES: PART 2, LATENCY-CONSTRAINED RESYNCHRONIZATION
, 1996
Abstract

Cited by 4 (0 self)
The companion paper [7] introduced the concept of resynchronization, a post-optimization for static multiprocessor schedules in which extraneous synchronization operations are introduced in such a way that the number of original synchronizations that consequently become redundant significantly exceeds the number of additional synchronizations. Redundant synchronizations are synchronization operations whose corresponding sequencing requirements are enforced completely by other synchronizations in the system. The amount of run-time overhead required for synchronization can be reduced significantly by eliminating redundant synchronizations [5, 32]. Thus, effective resynchronization reduces the net synchronization overhead in the implementation of a multiprocessor schedule, and improves the overall throughput. However, since additional serialization is imposed by the new synchronizations, resynchronization can produce a significant increase in latency. The companion paper [7] develops fundamental properties of resynchronization and studies the problem of optimal resynchronization under the assumption that arbitrary increases in latency can be tolerated ("unbounded-latency resynchronization"). Such an assumption is valid, for example, in a wide variety of simulation applications. This paper addresses the problem of computing an optimal resynchronization among all resynchronizations that do not increase the latency beyond a pre-specified upper bound. Our study is based in the context of self-timed execution of iterative dataflow programs, which is an implementation model that has been applied extensively for digital signal processing systems.
Optimizing Synchronization in Multiprocessor Implementations of Iterative Dataflow Programs
, 1995
Abstract

Cited by 2 (2 self)
This paper is concerned with multiprocessor implementations of embedded applications specified as iterative dataflow programs, in which synchronization overhead tends to be significant. We develop techniques to alleviate this overhead by determining a minimal set of processor synchronizations that are essential for correct execution. Our study is based in the context of self-timed execution of iterative dataflow programs. An iterative dataflow program consists of a dataflow representation of the body of a loop that is to be iterated an indefinite number of times; dataflow programming in this form has been studied and applied extensively, particularly in the context of signal processing software. Self-timed execution refers to a combined compile-time/run-time scheduling strategy in which processors synchronize with one another only based on interprocessor communication requirements, and thus, synchronization of processors at the end of each loop iteration does not generally occur. We ...
A Comparison of Clustering and Scheduling Techniques for Embedded Multiprocessor Systems
, 2003
Abstract

Cited by 2 (1 self)
In this paper we extensively explore and illustrate the effectiveness of the two-phase decomposition of scheduling, into clustering and cluster-scheduling or merging, for mapping task graphs onto embedded multiprocessor systems. We describe efficient and novel partitioning (clustering) and scheduling techniques that aggressively streamline interprocessor communication and can be tuned to exploit the significantly longer compilation time that is available to embedded system designers.
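The first phase of such a decomposition can be illustrated with a minimal clustering pass. This is not the paper's algorithm, only a classic edge-zeroing-style greedy heuristic under assumed inputs: repeatedly merge the two clusters joined by the heaviest communication edge (so that edge's cost drops to zero) until a target cluster count is reached; a second phase would then schedule each cluster onto a processor.

```python
# Illustrative greedy clustering (edge-zeroing flavor): merge endpoints of
# the heaviest inter-cluster communication edge until `target` clusters
# remain.  Tasks, edges, and costs are hypothetical.

def cluster(tasks, comm_edges, target):
    """comm_edges: list of (u, v, cost).  Returns task -> cluster id."""
    cluster_of = {t: t for t in tasks}        # start: one cluster per task
    def find(t):                              # follow merge links to the root
        while cluster_of[t] != t:
            t = cluster_of[t]
        return t
    n_clusters = len(tasks)
    for u, v, _cost in sorted(comm_edges, key=lambda e: -e[2]):
        if n_clusters <= target:
            break
        ru, rv = find(u), find(v)
        if ru != rv:                          # zero this edge: merge clusters
            cluster_of[rv] = ru
            n_clusters -= 1
    return {t: find(t) for t in tasks}

tasks = ["t1", "t2", "t3", "t4"]
edges = [("t1", "t2", 10), ("t2", "t3", 1), ("t3", "t4", 8)]
print(cluster(tasks, edges, target=2))   # heavy edges zeroed, light one kept
```

Real clustering heuristics also weigh the schedule-length impact of each merge rather than cost alone, which is where the longer compilation budget the abstract mentions gets spent.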
Resynchronization for Embedded Multiprocessors
 Laboratory, University of California, Berkeley
, 1995
Abstract

Cited by 1 (1 self)
This paper introduces a technique, called resynchronization, for reducing synchronization overhead in embedded multiprocessor implementations. The technique exploits the well-known observation [39] that in a given multiprocessor implementation, certain synchronization operations may be redundant in the sense that their associated sequencing requirements are ensured by other synchronizations in the system. The goal of resynchronization is to introduce new synchronizations in such a way that the number of original synchronizations that become redundant exceeds the number of new synchronizations that are added, and thus the net synchronization cost is reduced. First, we define the general form of our resynchronization problem; we show that it is NP-hard by establishing a correspondence to the set-covering problem; and based on this correspondence, we specify how an arbitrary heuristic for set covering can be applied to yield a heuristic for resynchronization. Next, we show that for a certain class of applications, optimal resynchronizations can be computed efficiently by means of pipelining. These pipelined solutions, however, can suffer from significantly increased latency, and this motivates the latency-constrained resynchronization
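The set-covering correspondence the abstract names can be made concrete: each candidate new synchronization "covers" the set of original synchronizations it would make redundant, and any set-cover heuristic then selects candidates. Below is the standard greedy heuristic (pick the candidate covering the most still-uncovered elements) on hypothetical data; the names s1..s4 and r1..r3 are illustrative, not from the paper.

```python
# Greedy set cover, the textbook heuristic that the correspondence lets us
# reuse for resynchronization.  universe: original syncs we hope to make
# redundant; candidates: candidate new sync -> syncs it would cover.

def greedy_set_cover(universe, candidates):
    """Return candidate names chosen greedily until the universe is covered."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        name, covered = max(candidates.items(),
                            key=lambda kv: len(kv[1] & uncovered))
        if not covered & uncovered:       # no candidate covers anything new
            break
        chosen.append(name)
        uncovered -= covered
    return chosen

universe = {"s1", "s2", "s3", "s4"}
candidates = {"r1": {"s1", "s2", "s3"},   # hypothetical coverage sets
              "r2": {"s3", "s4"},
              "r3": {"s4"}}
print(greedy_set_cover(universe, candidates))   # ['r1', 'r2']
```

Resynchronization is profitable when the chosen candidates are fewer than the original synchronizations they make redundant; here two new synchronizations replace four.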