Results 1–10 of 67
Rotation scheduling: A loop pipelining algorithm
Dept. of Computer Science, Princeton University, 1997
Cited by 112 (53 self)
Abstract — We consider the resource-constrained scheduling of loops with inter-iteration dependencies. A loop is modeled as a data flow graph (DFG), where edges are labeled with the number of iterations between dependencies. We design a novel and flexible technique, called rotation scheduling, for scheduling cyclic DFGs using loop pipelining. The rotation technique repeatedly transforms a schedule into a more compact schedule. We provide a theoretical basis for the operations based on retiming. We propose two heuristics to perform rotation scheduling and give experimental results showing that they perform very well. Index Terms — High-level synthesis, loop pipelining, parallel compilers, retiming, scheduling.
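The retiming operation underlying rotation scheduling can be sketched concretely. A minimal Python illustration follows; the three-node DFG, its node times, and its edge delays are invented for illustration and are not taken from the paper. Retiming a node changes each edge delay to d + r(v) - r(u), and the clock period is the longest chain of nodes connected by zero-delay edges.

```python
# Hypothetical DFG: nodes carry computation times, edges carry
# inter-iteration delay counts. Retiming r maps each edge delay
# d(u, v) to d(u, v) + r(v) - r(u); all new delays must stay >= 0.

times = {"A": 1, "B": 2, "C": 2}                       # node computation times
edges = {("A", "B"): 0, ("B", "C"): 0, ("C", "A"): 2}  # edge -> delay

def clock_period(delays):
    # Clock period = longest path over zero-delay edges only.
    zero = [(u, v) for (u, v), d in delays.items() if d == 0]
    best = dict(times)          # longest zero-delay path ending at each node
    for _ in times:             # |V| relaxation passes suffice for a DAG
        for u, v in zero:
            best[v] = max(best[v], best[u] + times[v])
    return max(best.values())

def retime(delays, r):
    new = {(u, v): d + r[v] - r[u] for (u, v), d in delays.items()}
    assert all(d >= 0 for d in new.values()), "illegal retiming"
    return new

print(clock_period(edges))                                    # 5: chain A->B->C
print(clock_period(retime(edges, {"A": 0, "B": 0, "C": 1})))  # period drops to 3
```

Moving one delay from edge (C, A) onto (B, C) breaks the three-node zero-delay chain, shortening the critical path without changing the loop's behavior.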
Scheduling dataflow graphs via retiming and unfolding
IEEE Trans. on Parallel and Distributed Systems, 1997
Cited by 63 (26 self)
Abstract — Loop scheduling is an important problem in parallel processing. The retiming technique reorganizes an iteration; the unfolding technique schedules several iterations together. We combine these two techniques to obtain a static schedule with a reduced average computation time per iteration. We first prove that the order of retiming and unfolding is immaterial when scheduling a dataflow graph (DFG). Using this property, we present a polynomial-time algorithm on the original DFG, before unfolding, to find the minimum-rate static schedule for a given unfolding factor. For the case of a unit-time DFG, efficient checking and retiming algorithms are presented.
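The unfolding transformation the abstract combines with retiming is standard and compact to state; a small Python sketch follows (the two-node cycle is hypothetical). Unfolding by factor f replicates each node f times, and an edge u→v with delay d becomes the f edges u_i → v_((i+d) mod f) with delay ⌊(i+d)/f⌋, which preserves the total delay on each original edge.

```python
# Unfold a DFG by factor f: node v becomes v0..v(f-1); an edge
# (u, v) with delay d becomes f edges with redistributed delays.

def unfold(edges, f):
    # edges: {(u, v): delay}; returns the unfolded edge set
    out = {}
    for (u, v), d in edges.items():
        for i in range(f):
            out[(f"{u}{i}", f"{v}{(i + d) % f}")] = (i + d) // f
    return out

edges = {("A", "B"): 1, ("B", "A"): 1}   # hypothetical two-node cycle
unfolded = unfold(edges, 2)
print(unfolded)
# Each original edge of delay 1 yields one delay-0 and one delay-1 copy,
# so the total delay per original edge (here 1) is preserved.
```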
Optimizing synchronization in multiprocessor DSP systems
IEEE Transactions on Signal Processing, 1997
Cited by 18 (8 self)
This paper is concerned with multiprocessor implementations of embedded applications specified as iterative dataflow programs, in which synchronization overhead can be significant. We develop techniques to alleviate this overhead by determining a minimal set of processor synchronizations that are essential for correct execution. Our study is based in the context of self-timed execution of iterative dataflow programs. An iterative dataflow program consists of a dataflow representation of the body of a loop that is to be iterated an indefinite number of times; dataflow programming in this form has been studied and applied extensively, particularly in the context of signal processing software. Self-timed execution refers to a combined compile-time/run-time scheduling strategy in which processors synchronize with one another based only on interprocessor communication requirements; thus, synchronization of processors at the end of each loop iteration does not generally occur. We introduce a new graph-theoretic framework, based on a data structure called the synchronization graph, for analyzing and optimizing synchronization overhead in self-timed, iterative dataflow programs. We show that the comprehensive techniques that have been developed for removing redundant synchronizations in non-iterative programs can be extended in this framework to optimally remove redundant synchronizations.
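The redundancy criterion at the heart of such a framework can be sketched in a few lines: a synchronization edge (u, v) with delay d is redundant when some other u-to-v path has total delay at most d, so the ordering it enforces is already guaranteed. The Python sketch below uses a hypothetical three-node synchronization graph and assumes, as in deadlock-free self-timed schedules, that there are no zero-delay cycles.

```python
# Detect redundant synchronization edges in a synchronization graph.
# edges: {(u, v): delay}. An edge is redundant if another u->v route
# has total delay <= its own delay.

INF = float("inf")

def redundant_edges(nodes, edges):
    # all-pairs minimum-delay distances via Floyd-Warshall
    dist = {(u, v): INF for u in nodes for v in nodes}
    for n in nodes:
        dist[(n, n)] = 0
    for (u, v), d in edges.items():
        dist[(u, v)] = min(dist[(u, v)], d)
    for k in nodes:
        for i in nodes:
            for j in nodes:
                dist[(i, j)] = min(dist[(i, j)], dist[(i, k)] + dist[(k, j)])
    out = []
    for (u, v), d in edges.items():
        # an alternative u->v route must pass through some other node k
        alt = min((dist[(u, k)] + dist[(k, v)] for k in nodes
                   if k not in (u, v)), default=INF)
        if alt <= d:
            out.append((u, v))
    return out

nodes = ["A", "B", "C"]
edges = {("A", "B"): 0, ("B", "C"): 0, ("A", "C"): 1}
print(redundant_edges(nodes, edges))
# [('A', 'C')]: the path A->B->C already orders A before C with delay 0
```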
Code size reduction technique and implementation for software-pipelined DSP applications
ACM Transactions on Embedded Computing Systems, 2003
Cited by 14 (8 self)
The software pipelining technique is extensively used to exploit instruction-level parallelism in loops. However, this performance optimization results in code size expansion. For embedded systems with very limited on-chip memory resources, code size becomes one of the most important optimization concerns. This paper presents a fundamental understanding of the relationship between code size expansion and software pipelining based on retiming. We propose a general Code-size REDuction (CRED) technique for software-pipelined loops on various kinds of processors. Our CRED algorithms integrate the code size reduction procedure with software pipelining to produce minimal code size for a target schedule length. Experiments on a set of well-known benchmarks show the effectiveness of the CRED technique both in reducing the code size of software-pipelined loops and in exploring the code size/performance trade-off space.
Cost minimization while satisfying hard/soft timing constraints for heterogeneous embedded systems
ACM Transactions on Design Automation of Electronic Systems (TODAES), 2009
Cited by 14 (6 self)
In high-level synthesis for real-time embedded systems using heterogeneous functional units (FUs), it is critical to select the best FU type for each task. However, some tasks may not have fixed execution times. This article models each varying execution time as a probabilistic random variable and solves the heterogeneous assignment with probability (HAP) problem. The solution of the HAP problem assigns a proper FU type to each task such that the total cost is minimized while the timing constraint is satisfied with a guaranteed confidence probability. Solutions to the HAP problem are useful for both hard real-time and soft real-time systems. Optimal algorithms are proposed to find the optimal solutions for the HAP problem when the input is a tree or a simple path. Two other algorithms, one optimal and one a near-optimal heuristic, are proposed to solve the general problem. The experiments show that our algorithms can effectively reduce the total cost while satisfying timing constraints with guaranteed confidence probabilities. For example, our algorithms achieve an average reduction of 33.0% in total cost while satisfying timing constraints with 0.90 confidence probability, compared with previous work using the worst-case scenario.
Probabilistic loop scheduling for applications with uncertain execution time
IEEE Trans. Computers, 2000
Cited by 12 (5 self)
Abstract — One of the difficulties in high-level synthesis and compiler optimization is obtaining a good schedule without knowing the exact computation times of the tasks involved. Uncertain computation times normally occur when conditional instructions are employed and/or the inputs of the tasks influence the computation time. The relationship between these tasks can be represented as a data-flow graph in which each node models a task associated with a probabilistic computation time, and a set of edges represents the dependencies between tasks. In this research, we study scheduling and optimization algorithms that take these probabilistic execution times into account. Two novel algorithms, called probabilistic retiming and probabilistic rotation scheduling, are developed for solving the underlying non-resource-constrained and resource-constrained scheduling problems, respectively. Experimental results show that probabilistic retiming consistently produces a graph with a smaller longest-path computation time for a given confidence level, as compared with the traditional retiming algorithm, which assumes fixed worst-case or average-case computation times. Furthermore, when considering resource constraints and probabilistic environments, probabilistic rotation scheduling gives a schedule whose length is guaranteed to satisfy a given probability requirement. This schedule is better than schedules produced by other algorithms that consider worst-case and average-case scenarios.
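The idea of judging a schedule against a confidence level can be illustrated with a small Monte Carlo sketch. The three-task graph and its discrete time distributions below are invented for illustration, and the sampling approach is only an evaluation device, not the paper's (analytic) probabilistic retiming algorithm.

```python
# Estimate P(clock period <= target) when task times are discrete
# random variables, by sampling times and taking the longest
# zero-delay path each trial.

import random

times = {"A": [(1, 0.8), (3, 0.2)],   # (value, probability) pairs
         "B": [(2, 1.0)],
         "C": [(1, 0.5), (2, 0.5)]}
zero_delay_edges = [("A", "B"), ("B", "C")]  # topologically ordered DAG

def sample(dist):
    r, acc = random.random(), 0.0
    for value, p in dist:
        acc += p
        if r < acc:
            return value
    return dist[-1][0]

def prob_period_within(target, trials=20000):
    random.seed(0)                       # reproducible estimate
    hits = 0
    for _ in range(trials):
        t = {v: sample(d) for v, d in times.items()}
        longest = dict(t)                # longest path ending at each node
        for u, v in zero_delay_edges:    # DP in topological order
            longest[v] = max(longest[v], longest[u] + t[v])
        if max(longest.values()) <= target:
            hits += 1
    return hits / trials

print(prob_period_within(5))   # close to 0.8 for this example
```

Here the critical path is A→B→C, so the period fits 5 exactly when A draws its fast time; the estimate converges to that 0.8 probability.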
Efficient assignment and scheduling for heterogeneous DSP systems
IEEE Trans. on Parallel and Distributed Systems, 2005
Cited by 11 (6 self)
This paper addresses high-level synthesis for real-time digital signal processing (DSP) architectures using heterogeneous functional units (FUs). For such special-purpose architecture synthesis, an important problem is how to assign a proper FU type to each operation of a DSP application and generate a schedule such that all requirements are met and the total cost is minimized. We propose a two-phase approach to solve this problem. In the first phase, we solve the heterogeneous assignment problem: given the types of heterogeneous FUs, a data-flow graph (DFG) in which each node has different execution times and costs (which may relate to power, reliability, etc.) for different FU types, and a timing constraint, assign a proper FU type to each node such that the total cost is minimized while the timing constraint is satisfied. In the second phase, based on the assignments obtained in the first phase, we propose a minimum-resource scheduling algorithm to generate a schedule and a feasible configuration that uses as few resources as possible. We prove that the heterogeneous assignment problem is NP-complete. Efficient algorithms are proposed to find an optimal solution when the given DFG is a simple path or a tree. Three other algorithms are proposed to solve the general problem. The experiments show that our algorithms can effectively reduce the total cost compared with previous work.
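The simple-path case admits a compact dynamic program, sketched below in Python; the task set, its (time, cost) options, and the timing constraint T = 6 are hypothetical numbers chosen for illustration, not data from the paper. dp[t] holds the minimum cost of finishing the tasks processed so far in exactly t time units.

```python
# DP for assigning FU types along a simple path: minimize total cost
# subject to total execution time <= T.

def min_cost_path(tasks, T):
    # tasks: list of [(time, cost), ...] FU options per task, in path order
    INF = float("inf")
    dp = [0] + [INF] * T          # dp[t] = min cost finishing in exactly t time
    for options in tasks:
        new = [INF] * (T + 1)
        for t in range(T + 1):
            if dp[t] == INF:
                continue
            for time, cost in options:
                if t + time <= T:
                    new[t + time] = min(new[t + time], dp[t] + cost)
        dp = new
    return min(dp)                # cheapest cost over all feasible finish times

tasks = [[(1, 9), (2, 5)],        # task 1: fast/expensive or slow/cheap
         [(1, 8), (3, 2)],
         [(2, 6), (4, 3)]]
print(min_cost_path(tasks, 6))    # -> 17: options (1,9), (3,2), (2,6)
```

Tightening the constraint forces the expensive fast units: with T = 4 only the all-fast assignment fits, at cost 23.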
Optimal Scheduling of DataFlow Graphs Using Extended Retiming
In Proceedings of the ISCA 12th International Conference on Parallel and Distributed Computing Systems, 1999
Cited by 9 (4 self)
Many iterative or recursive applications commonly found in DSP and image processing can be represented by data-flow graphs (DFGs). A great deal of research has attempted to optimize such applications by applying various graph transformation techniques to the DFG in order to minimize the schedule length. One of the most effective of these techniques is retiming. In this paper, we demonstrate that the traditional retiming technique does not always achieve optimal schedules (although it can be used in combination with other techniques to do so) and propose a new graph-transformation technique, extended retiming, which does. Index terms: Scheduling, Data-flow Graphs, Retiming, Graph Transformation, Timing Optimization. 1 Introduction: Many iterative or recursive applications, such as image processing, DSP, and PDE simulations, can be represented by data-flow graphs, or DFGs [4]. The nodes of a DFG represent tasks, while edges between nodes represent data dependencies...
Retiming synchronous dataflow graphs to reduce execution time
IEEE Transactions on Signal Processing, 2001
Cited by 9 (3 self)
Many common iterative or recursive DSP applications can be represented by synchronous data-flow graphs (SDFGs). A great deal of research has attempted to optimize such applications through retiming. However, despite its proven effectiveness in transforming single-rate data-flow graphs into equivalent DFGs with smaller clock periods, the use of retiming to reduce the execution time of synchronous DFGs has never been explored. In this paper, we do just this. We develop the basic definitions and results necessary for expressing and studying SDFGs, review the problems faced when attempting to retime an SDFG in order to minimize its clock period, and then present algorithms for doing so. Finally, we demonstrate the effectiveness of our methods on several examples.
Resynchronization for Multiprocessor DSP Implementation, Part 1: Maximum Throughput Resynchronization, 1996
Cited by 7 (5 self)
This paper introduces a technique, called resynchronization, for reducing synchronization overhead in multiprocessor implementations of digital signal processing (DSP) systems. The technique applies to arbitrary collections of dedicated, programmable, or configurable processors, such as combinations of programmable DSPs, ASICs, and FPGA subsystems. Thus, it is particularly well suited to the evolving trend toward heterogeneous single-chip multiprocessors in DSP systems. Resynchronization exploits the well-known observation [36] that in a given multiprocessor implementation, certain synchronization operations may be redundant in the sense that their associated sequencing requirements are ensured by other synchronizations in the system. The goal of resynchronization is to introduce new synchronizations in such a way that the number of additional synchronizations that become redundant exceeds the number of new synchronizations added, so that the net synchronization cost is reduced...