Results 1 – 10 of 24
Rotation scheduling: A loop pipelining algorithm
Dept. of Computer Science, Princeton University, 1997
Cited by 101 (51 self)
Abstract — We consider the resource-constrained scheduling of loops with inter-iteration dependencies. A loop is modeled as a data flow graph (DFG), where edges are labeled with the number of iterations between dependencies. We design a novel and flexible technique, called rotation scheduling, for scheduling cyclic DFGs using loop pipelining. The rotation technique repeatedly transforms a schedule into a more compact schedule. We provide a theoretical basis for the operations based on retiming. We propose two heuristics to perform rotation scheduling and give experimental results showing that they have very good performance.
Index Terms — High-level synthesis, loop pipelining, parallel compiler, retiming, scheduling.
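The core move the abstract describes is a retiming step that relocates first-row operations to the next iteration, letting a resource-constrained scheduler compact the remaining rows. The following toy sketch (node names, delays, and the greedy list scheduler are illustrative assumptions, not the paper's implementation) shows one rotation shortening a four-node cyclic DFG:

```python
def retime(edges, r):
    """Retime edge delays with convention d'(u -> v) = d(u, v) + r(u) - r(v).
    Retiming a node by +1 moves it one iteration later."""
    return {(u, v): d + r.get(u, 0) - r.get(v, 0) for (u, v), d in edges.items()}

def list_schedule(nodes, edges, n_units):
    """Greedy resource-constrained schedule of one loop iteration.
    Only zero-delay (intra-iteration) edges impose precedence."""
    sched, done, step = {}, set(), 0
    while len(done) < len(nodes):
        ready = [v for v in nodes if v not in done
                 and all(u in done for (u, w), d in edges.items()
                         if w == v and d == 0)]
        for v in ready[:n_units]:
            sched[v] = step
            done.add(v)
        step += 1
    return sched, step  # schedule and its length in control steps

nodes = ["A", "B", "C", "D"]
edges = {("A", "B"): 0, ("B", "C"): 0, ("C", "D"): 0, ("D", "A"): 2}
_, length_before = list_schedule(nodes, edges, n_units=2)   # pure chain: 4 steps

# Rotate A down one iteration: A -> B becomes a cross-iteration edge,
# so A and B can now share a control step on the two units.
rotated = retime(edges, {"A": 1})
_, length_after = list_schedule(nodes, rotated, n_units=2)  # 3 steps
print(length_before, length_after)
```

The heuristics in the paper choose which nodes to rotate and when to stop; this sketch only illustrates why a single rotation can shrink the schedule.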
Scheduling And Behavioral Transformations For Parallel Systems
1993
Cited by 39 (3 self)
In a parallel system, either a VLSI architecture in hardware or a parallel program in software, the quality of the final design depends on the ability of a synthesis system to exploit the parallelism hidden in the input description of applications. Since iterative or recursive algorithms are usually the most time-critical parts of an application, the parallelism embedded in the repetitive pattern of an iterative algorithm needs to be explored. This thesis studies techniques and algorithms to expose the parallelism in an iterative algorithm so that the designer can find an implementation achieving a desired execution rate. In particular, the objective is to find an efficient schedule to be executed iteratively. A form of data flow graphs is used to model the iterative part of an application, e.g. a digital signal filter or the while/for loop of a program. Nodes in the graph represent operations to be performed and edges represent both intra-iteration and inter-iteration precedence relat...
Loop Pipelining for Scheduling Multidimensional Systems via Rotation
In Proceedings of the 31st Design Automation Conference, 1993
Cited by 36 (27 self)
Multidimensional (MD) systems are widely used in scientific applications such as image processing, geophysical signal processing and fluid dynamics. Earlier scheduling methods in synthesizing MD systems do not explore loop pipelining across different dimensions. This paper explores the basic properties of MD loop pipelining and presents an algorithm, called multidimensional rotation scheduling, to find an efficient schedule based on the multidimensional retiming technique we developed. The description and the correctness of our algorithm are presented in the paper. The experiments show that our algorithm can achieve optimal results efficiently.
1 Introduction
Computation-intensive applications usually depend on time-critical sections consisting of a loop of instructions. To optimize the execution rate of such applications, the designer needs to explore the parallelism embedded in repetitive patterns of a loop. However, the existence of resource constraints makes the problem of sche...
Achieving Full Parallelism using Multi-Dimensional Retiming
1996
Cited by 29 (18 self)
Most scientific and Digital Signal Processing (DSP) applications are recursive or iterative. Transformation techniques are usually applied to get optimal execution rates in parallel and/or pipelined systems. The retiming technique is a common and valuable transformation tool in one-dimensional problems, when loops are represented by data flow graphs (DFGs). In this paper, uniform nested loops are modeled as multidimensional data flow graphs (MDFGs). Full parallelism of the loop body, i.e., all nodes in the MDFG executed in parallel, substantially decreases the overall computation time. It is well known that, for one-dimensional DFGs, retiming cannot always achieve full parallelism. Other existing optimization techniques for nested loops also cannot always achieve full parallelism. This paper shows an important and counterintuitive result, which proves that we can always obtain full parallelism for MDFGs with more than one dimension. This result is obtained by transforming the MDFG in...
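The key difference from one-dimensional retiming can be seen in a tiny example. Below is an illustrative sketch (not the paper's algorithm): edge delays are vectors, a retiming vector r(u) updates d'(u -> v) = d(u -> v) + r(u) - r(v), and full parallelism means every edge carries a nonzero delay vector:

```python
def md_retime(edges, r):
    """Apply multi-dimensional retiming r (node -> vector) to vector delays."""
    zero = (0,) * len(next(iter(edges.values())))
    return {(u, v): tuple(d + ru - rv for d, ru, rv in
                          zip(dv, r.get(u, zero), r.get(v, zero)))
            for (u, v), dv in edges.items()}

def fully_parallel(edges):
    """Full parallelism: no intra-iteration dependency remains, i.e.
    every edge has a nonzero delay vector."""
    return all(any(x != 0 for x in d) for d in edges.values())

# 2-D MDFG for a doubly nested loop: A -> B within an iteration,
# B -> A across iterations with delay vector (1, 1).
edges = {("A", "B"): (0, 0), ("B", "A"): (1, 1)}
print(fully_parallel(edges))   # A -> B still blocks parallelism

# Retiming A by the vector (0, 1) pushes a delay onto A -> B while
# keeping B -> A legal; the analogous 1-D cycle (one delay total)
# admits no such retiming.
edges2 = md_retime(edges, {"A": (0, 1)})
print(fully_parallel(edges2))
```

This is the counterintuitive point of the paper in miniature: with two or more dimensions there is always a retiming vector that clears every intra-iteration edge.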
A transformation-based method for loop folding
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 1994
Cited by 20 (2 self)
We propose a transformation-based scheduling algorithm for the following problem: given a loop construct, a target initiation interval, and a set of resource constraints, schedule the loop in a pipelined fashion such that the iteration time of executing one iteration of the loop is minimized. The iteration time is an important quality measure of a data path design because it affects both storage and control costs. Our algorithm first performs an As Soon As Possible Pipelined (ASAPp) scheduling regardless of the resource constraints. It then resolves resource constraint violations by rescheduling some operations. The software system implementing the proposed algorithm, called Theda.Fold, can deal with behavioral loop descriptions that contain chained, multicycle, and/or structurally pipelined operations, as well as those having data dependencies across iteration boundaries. Experiments on a number of benchmarks are reported.
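The two-phase idea the abstract describes (schedule ASAP ignoring resources, then reschedule operations that violate them) can be sketched with a toy modulo-slot placer. The names and the single bump-by-one-cycle conflict rule are assumptions for illustration, not Theda.Fold's algorithm:

```python
def pipeline_schedule(ops, deps, ii, limit):
    """Place each op (ops assumed topologically ordered, one cycle each)
    at its ASAP cycle, then bump it until the (cycle mod ii) slot of the
    pipelined loop has a free functional unit."""
    t, used = {}, {}
    for op in ops:
        c = max((t[u] + 1 for (u, v) in deps if v == op), default=0)  # ASAP
        while used.get(c % ii, 0) >= limit:   # resource violation: reschedule
            c += 1
        t[op] = c
        used[c % ii] = used.get(c % ii, 0) + 1
    return t

ops = ["a", "b", "c", "d"]
deps = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
sched = pipeline_schedule(ops, deps, ii=2, limit=2)
print(sched)                               # {'a': 0, 'b': 1, 'c': 1, 'd': 2}
iteration_time = max(sched.values()) + 1   # 3 cycles at initiation interval 2
```

The quality metric the paper optimizes is exactly `iteration_time` here: with the initiation interval fixed at 2, the placer keeps one iteration's span as short as the dependences and the two units allow.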
CALiBeR: A software pipelining algorithm for clustered embedded VLIW processors
In ICCAD, 2001
Cited by 17 (2 self)
This paper proposes a software pipelining framework, CALiBeR (Cluster Aware Load Balancing Retiming Algorithm), suitable for compilers targeting clustered embedded VLIW processors. CALiBeR can be used by embedded system designers to explore different code optimization alternatives, that is, high-quality customized retiming solutions for desired throughput and program memory size requirements, while minimizing register pressure. An extensive set of experimental results is presented, demonstrating that our algorithm compares favorably with one of the best state-of-the-art algorithms, achieving up to 50% improvement in performance and up to 47% improvement in register requirements. In order to empirically assess the effectiveness of clustering for high-ILP applications, additional experiments are presented contrasting the performance achieved by software-pipelined kernels executing on clustered and on centralized machines.
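The throughput any software-pipelining retimer can reach is bounded below by the loop's recurrences: every dependence cycle forces II >= ceil(total latency / total iteration distance). A brute-force sketch of that classical bound on a toy dependence graph (illustrative only, not CALiBeR itself):

```python
from math import ceil

def rec_mii(nodes, edges):
    """edges: (u, v) -> (latency, distance). Returns the recurrence-
    constrained minimum initiation interval: the max over dependence
    cycles of ceil(cycle latency / cycle distance)."""
    best = 1
    def dfs(start, v, lat, dist, seen):
        nonlocal best
        for (a, b), (l, d) in edges.items():
            if a != v:
                continue
            if b == start:
                if dist + d > 0:  # closed a cycle
                    best = max(best, ceil((lat + l) / (dist + d)))
            elif b not in seen:
                dfs(start, b, lat + l, dist + d, seen | {b})
    for n in nodes:
        dfs(n, n, 0, 0, {n})
    return best

# A <-> B carries 4 cycles of latency and only 1 iteration of distance,
# so no retiming can start iterations faster than every 4 cycles; the
# self-loop on C (distance 2) is a looser recurrence.
edges = {("A", "B"): (2, 0), ("B", "A"): (2, 1), ("C", "C"): (1, 2)}
print(rec_mii(["A", "B", "C"], edges))  # -> 4
```

Retiming redistributes the distances along each cycle but cannot change their sums, which is why this bound holds for clustered and centralized targets alike.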
Hardware-software partitioning and pipelined scheduling of transformative applications
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2002
Cited by 16 (0 self)
Abstract — Transformative applications are computation-intensive applications characterized by iterative data flow behavior. Typical examples are image processing applications like JPEG, MPEG, etc. The performance of embedded hardware–software systems that implement transformative applications can be maximized by obtaining a pipelined design. We present a tool for hardware–software partitioning and pipelined scheduling of transformative applications. The tool uses iterative partitioning and pipelined scheduling to obtain optimal partitions that satisfy the timing and area constraints. The partitioner uses a branch-and-bound approach with a unique objective function that minimizes the initiation interval of the final design. We present techniques for generation of a good initial solution and search-space limitation for the branch-and-bound algorithm. A candidate partition is evaluated by generating its pipelined schedule. The scheduler uses a novel retiming heuristic that optimizes the initiation interval, number of pipeline stages, and memory requirements of the particular design alternative. We evaluate the performance of the retiming heuristic by comparing it with an existing technique. The effectiveness of the entire tool is demonstrated by a case study of the JPEG image compression algorithm. We also evaluate the run time and design quality of the tool by experimentation with synthetic graphs.
Index Terms — Image processing, partitioning, performance trade-offs, pipelining, scheduling, system-level design.
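A miniature of the branch-and-bound loop the abstract describes. The cost model here (initiation interval = max of total software time and the slowest hardware stage) and all numbers are invented for illustration; the paper evaluates each candidate by generating its full pipelined schedule:

```python
def partition(tasks, sw_time, hw_time, hw_area, area_budget):
    """Branch and bound over HW/SW assignments, minimizing a toy
    initiation interval subject to an area constraint."""
    best = {"ii": float("inf"), "assign": None}
    def bound(assign):
        # both terms only grow as more tasks are assigned -> valid lower bound
        sw = sum(sw_time[t] for t, h in assign.items() if not h)
        hw = max((hw_time[t] for t, h in assign.items() if h), default=0)
        return max(sw, hw)
    def bnb(i, assign, area):
        if area > area_budget or bound(assign) >= best["ii"]:
            return  # prune: infeasible, or provably no better than best
        if i == len(tasks):
            best["ii"], best["assign"] = bound(assign), dict(assign)
            return
        t = tasks[i]
        for hw in (True, False):
            assign[t] = hw
            bnb(i + 1, assign, area + (hw_area[t] if hw else 0))
            del assign[t]
    bnb(0, {}, 0)
    return best

result = partition(["a", "b", "c"],
                   sw_time={"a": 3, "b": 4, "c": 5},
                   hw_time={"a": 1, "b": 1, "c": 2},
                   hw_area={"a": 2, "b": 3, "c": 4},
                   area_budget=5)
print(result["ii"], result["assign"])  # -> 5 {'a': True, 'b': True, 'c': False}
```

The area budget forces `c` into software, and the bound prunes every branch whose partial initiation interval already matches or exceeds the incumbent.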
Scheduling of Uniform Multi-Dimensional Systems under Resource Constraints
 IEEE Transactions on VLSI Systems
Cited by 15 (12 self)
Multidimensional (MD) systems are widely used to model scientific applications such as image processing, geophysical signal processing, and fluid dynamics. Such systems usually contain repetitive groups of operations represented by nested loops. The optimization of such loops under processing resource constraints is required in order to improve their computation time. Most of the existing static scheduling mechanisms used in the high-level synthesis of VLSI architectures do not consider the parallelism inherent in the multidimensional characteristics of the problem. This paper explores the basic properties of MD loop pipelining and presents two novel techniques, called multidimensional rotation scheduling and push-up scheduling, able to achieve the shortest possible schedule length. These new techniques transform a multidimensional data flow graph representing the problem while assigning the loop operations to a schedule table. The multidimensional rotation scheduling...
Automata-Based Symbolic Scheduling
2000
Cited by 14 (0 self)
This dissertation presents a set of techniques for representing the high-level behavior of a digital subsystem as a collection of nondeterministic finite automata (NFA). Desired behavioral and implementation dynamics (dependencies, repetition, bounded resources, sequential character, and control state) can also be similarly modeled. All possible system execution sequences obeying the imposed constraints are encapsulated in a composed NFA. Technology similar to that used in symbolic model checking enables implicit exploration and extraction of best-possible execution sequences. This provides a very general, systematic procedure for performing exact high-level synthesis of cyclic, control-dominated behaviors constrained by arbitrary sequential constraints. The dissertation further demonstrates that these techniques scale to practical problem sizes and complexities. Exact scheduling solutions are constructed for a variety of academic and industrial problems, including a pipelined RISC processor. The ability to represent and schedule sequential models with hundreds of tasks and one-half million control cases substantially raises the bar as to what is believed possible for exact scheduling models.
Keywords: Scheduling; Binary Decision Diagrams; High-Level Synthesis; Nondeterminism; Automata; Symbolic Model.
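The composed-automaton idea in miniature, with explicit breadth-first search standing in for the dissertation's BDD-based symbolic exploration: states are sets of completed tasks, a transition executes up to `k` ready tasks in one cycle, and a shortest path to the all-done state is an exact minimum-length schedule. Task names and the resource bound are invented for illustration:

```python
from collections import deque
from itertools import combinations

def exact_schedule_length(tasks, deps, k):
    """deps: set of (pred, succ) pairs; each task takes one cycle;
    at most k tasks execute per cycle. Returns the minimum number of
    cycles over all constraint-obeying execution sequences."""
    goal, start = frozenset(tasks), frozenset()
    queue, dist = deque([start]), {start: 0}
    while queue:
        s = queue.popleft()
        if s == goal:
            return dist[s]
        ready = [t for t in tasks if t not in s
                 and all(p in s for p, q in deps if q == t)]
        for n in range(1, min(k, len(ready)) + 1):
            for chosen in combinations(ready, n):
                ns = s | frozenset(chosen)
                if ns not in dist:
                    dist[ns] = dist[s] + 1
                    queue.append(ns)
    return None  # constraints unsatisfiable

tasks = ["a", "b", "c", "d", "e"]
deps = {("a", "c"), ("b", "c"), ("c", "d"), ("c", "e")}
print(exact_schedule_length(tasks, deps, k=2))  # -> 3
```

The explicit state space here is 2^n, which is exactly why the dissertation encodes state sets as BDDs; the optimality guarantee of exhaustive exploration is the same in both.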
Behavioral optimization using the manipulation of timing constraints
1995
Cited by 6 (0 self)
Abstract — We introduce a transformation, named rephasing, that manipulates the timing parameters in control-data-flow graphs (CDFGs) during the high-level synthesis of datapath-intensive applications. Timing parameters in such CDFGs include the sample period, the latencies between input–output pairs, the relative times at which corresponding samples become available on different inputs, and the relative times at which the corresponding samples become available at the delay nodes. While some of the timing parameters may be constrained by performance requirements or by the interface to the external world, others remain free to be chosen during the process of high-level synthesis. Traditionally, high-level synthesis systems for datapath-intensive applications have either assumed that all the relative times, called phases, at which corresponding samples are available at input and delay nodes are zero (i.e., all input and delay node samples enter at the initial cycle of the schedule), or have automatically assigned values to these phases as part of the datapath allocation/scheduling step, in the case of newer schedulers that use techniques like overlapped scheduling to generate complex time shapes. Rephasing, however, manipulates the values of these phases as an algorithmic transformation before the scheduling/allocation stage. The advantage of this approach is that phase values can be chosen to transform and optimize the algorithm for explicit metrics such as area, throughput, latency, and power. Moreover, the rephasing transformation can be combined with other transformations, such as algebraic transformations. We have developed techniques for using rephasing to optimize a variety of design metrics, and our results show significant improvements in several design metrics. We have also investigated the relationship and interaction of rephasing with other high-level synthesis tasks.
Index Terms — Behavioral synthesis, transformations.