Rotation Scheduling: A Loop Pipelining Algorithm
- In Proceedings of the 30th Design Automation Conference, 1993
- Cited by 86 (42 self)
We consider the resource-constrained scheduling of loops with inter-iteration dependencies. A loop is modeled as a data flow graph (DFG), where edges are labeled with the number of iterations between dependencies. We design a novel and flexible technique, called rotation scheduling, for scheduling cyclic DFGs using loop pipelining. The rotation technique repeatedly transforms a schedule to a more compact schedule. We provide a theoretical basis for the operations based on retiming. We propose two heuristics to perform rotation scheduling, and give experimental results showing that they have very good performance. 1 Introduction For real-time or high-performance computing, a synthesis system needs to have the ability to optimize the execution rate of a design. Since loops are usually the most time-critical parts of an application, the parallelism embedded in the repetitive pattern of a loop needs to be explored. This paper proposes a generic technique for the scheduling of loops when re...
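As a rough illustration of the retiming step that rotation builds on, the sketch below moves one delay from every edge entering a chosen set of nodes onto every edge leaving it, then measures the longest chain of zero-delay dependencies. The example graph, the function names, the unit-time operations and the absence of resource limits are all assumptions made for illustration; the two heuristics proposed in the paper are not reproduced here.

```python
from collections import defaultdict

# Hypothetical example DFG: (source, target, delays). A delay count of d means
# the target uses the value the source produced d iterations earlier.
EDGES = [("A", "B", 0), ("B", "C", 0), ("C", "A", 2)]

def rotate(edges, nodes):
    """Move one delay from each edge entering `nodes` to each edge leaving it."""
    rotated = []
    for u, v, d in edges:
        d_new = d + (u in nodes) - (v in nodes)
        if d_new < 0:
            raise ValueError(f"illegal rotation: {u}->{v} would get {d_new} delays")
        rotated.append((u, v, d_new))
    return rotated

def longest_zero_delay_path(edges):
    """Longest chain of unit-time nodes linked by zero-delay edges.

    Assumes the zero-delay subgraph is acyclic, as in any legal DFG.
    """
    succ = defaultdict(list)
    nodes = set()
    for u, v, d in edges:
        nodes.update((u, v))
        if d == 0:
            succ[u].append(v)
    memo = {}

    def depth(n):
        if n not in memo:
            memo[n] = 1 + max((depth(m) for m in succ[n]), default=0)
        return memo[n]

    return max(depth(n) for n in nodes)

rotated = rotate(EDGES, {"A"})
print(longest_zero_delay_path(EDGES), longest_zero_delay_path(rotated))  # 3 2
```

Rotating node A leaves the total delay count on the cycle unchanged (2) while shortening the intra-iteration dependency chain from three operations to two. Under the unit-time, unlimited-resource assumption, that is the sense in which the transformed schedule is more compact.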
Loop Pipelining for Scheduling Multi-dimensional Systems via Rotation
- In Proceedings of the 31st Design Automation Conference, 1994
- Cited by 32 (24 self)
Multi-dimensional (MD) systems are widely used in scientific applications such as image processing, geophysical signal processing and fluid dynamics. Earlier scheduling methods in synthesizing MD systems do not explore loop pipelining across different dimensions. This paper explores the basic properties of MD loop pipelining and presents an algorithm, called multi-dimensional rotation scheduling, to find an efficient schedule based on the multi-dimensional retiming technique we developed. The description and the correctness of our algorithm are presented in the paper. The experiments show that our algorithm can achieve optimal results efficiently. 1 Introduction Computation-intensive applications usually depend on time-critical sections consisting of a loop of instructions. To optimize the execution rate of such applications, the designer needs to explore the parallelism embedded in repetitive patterns of a loop. However, the existence of resource constraints makes the problem of sche...
Scheduling And Behavioral Transformations For Parallel Systems
- 1993
- Cited by 27 (3 self)
In a parallel system, either a VLSI architecture in hardware or a parallel program in software, the quality of the final design depends on the ability of a synthesis system to exploit the parallelism hidden in the input description of applications. Since iterative or recursive algorithms are usually the most time-critical parts of an application, the parallelism embedded in the repetitive pattern of an iterative algorithm needs to be explored. This thesis studies techniques and algorithms to expose the parallelism in an iterative algorithm so that the designer can find an implementation achieving a desired execution rate. In particular, the objective is to find an efficient schedule to be executed iteratively. A form of data-flow graphs is used to model the iterative part of an application, e.g. a digital signal filter or the while/for loop of a program. Nodes in the graph represent operations to be performed and edges represent both intra-iteration and inter-iteration precedence relat...
Achieving Full Parallelism using Multi-Dimensional Retiming
- 1996
- Cited by 22 (14 self)
Most scientific and Digital Signal Processing (DSP) applications are recursive or iterative. Transformation techniques are usually applied to get optimal execution rates in parallel and/or pipeline systems. The retiming technique is a common and valuable transformation tool in one-dimensional problems, when loops are represented by data flow graphs (DFGs). In this paper, uniform nested loops are modeled as multidimensional data flow graphs (MDFGs). Full parallelism of the loop body, i.e., all nodes in the MDFG executed in parallel, substantially decreases the overall computation time. It is well known that, for one-dimensional DFGs, retiming cannot always achieve full parallelism. Other existing optimization techniques for nested loops also cannot always achieve full parallelism. This paper presents an important and counter-intuitive result: full parallelism can always be obtained for MDFGs with more than one dimension. This result is obtained by transforming the MDFG in...
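To make "full parallelism" concrete, the sketch below retimes a tiny two-dimensional MDFG so that every edge carries a nonzero delay vector, and checks that a candidate schedule vector still orders all dependencies. The graph, the retiming values, the schedule vector, the function names and the sign convention d_r(u->v) = d(u->v) + r(u) - r(v) are assumptions chosen for illustration. The abstract is truncated before it describes the construction, so this is not a reproduction of the paper's algorithm.

```python
# Hypothetical 2-D MDFG: (source, target, delay vector).
EDGES = [("A", "B", (0, 0)), ("B", "A", (0, 1))]

def md_retime(edges, r):
    """Apply d_r(u->v) = d(u->v) + r(u) - r(v) componentwise (assumed convention)."""
    return [
        (u, v, tuple(d_i + ru - rv for d_i, ru, rv in zip(d, r[u], r[v])))
        for u, v, d in edges
    ]

def fully_parallel(edges):
    """True when no edge has an all-zero delay vector (no intra-iteration dependency)."""
    return all(any(c != 0 for c in d) for _, _, d in edges)

def respects(edges, s):
    """True when schedule vector s gives a positive dot product with every nonzero delay."""
    return all(
        sum(a * b for a, b in zip(s, d)) > 0
        for _, _, d in edges
        if any(d)
    )

retimed = md_retime(EDGES, {"A": (1, 0), "B": (0, 0)})
print(retimed)  # [('A', 'B', (1, 0)), ('B', 'A', (-1, 1))]
print(fully_parallel(EDGES), fully_parallel(retimed))  # False True
print(respects(retimed, (1, 2)))  # True
```

In one dimension the cycle above would carry a single scalar delay, so the two edges could not both be made nonzero. With two dimensions, a negative component on one edge is acceptable as long as some schedule vector (here the assumed (1, 2)) keeps every dependency pointing forward in time.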
A transformation-based method for loop folding
- IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 1994
- Cited by 16 (1 self)
We propose a transformation-based scheduling algorithm for the following problem: given a loop construct, a target initiation interval and a set of resource constraints, schedule the loop in a pipelined fashion such that the iteration time, i.e., the time to execute one iteration of the loop, is minimized. The iteration time is an important quality measure of a data path design because it affects both storage and control costs. Our algorithm first performs As Soon As Possible Pipelined (ASAPp) scheduling regardless of the resource constraints. It then resolves resource constraint violations by rescheduling some operations. The software system implementing the proposed algorithm, called Theda.Fold, can deal with behavioral loop descriptions that contain chained, multicycle and/or structural pipelined operations as well as those having data dependencies across iteration boundaries. Experiments on a number of benchmarks are reported.
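The two phases named in the abstract, resource-blind ASAP scheduling followed by resolving resource conflicts, can be sketched as below. This is a simplified greedy stand-in, not the Theda.Fold algorithm: operations are assumed to take one control step, only intra-iteration dependencies are modeled, and all names (`asap`, `repair`, the example operations and limits) are hypothetical. Resource use is counted per modulo slot (`step % ii`) because operations from overlapping iterations share the hardware every `ii` steps.

```python
def asap(ops, deps):
    """Earliest control step per op, ignoring resources; `ops` is in topological order."""
    step = {}
    for op in ops:
        step[op] = max((step[p] + 1 for p in deps.get(op, ())), default=0)
    return step

def repair(ops, deps, kind, limit, ii):
    """Greedy repair: delay each op until its modulo slot has a free unit of its kind."""
    step, used = {}, {}
    for op in ops:
        earliest = max((step[p] + 1 for p in deps.get(op, ())), default=0)
        for s in range(earliest, earliest + ii):  # ii consecutive steps cover every slot
            slot = (kind[op], s % ii)
            if used.get(slot, 0) < limit[kind[op]]:
                used[slot] = used.get(slot, 0) + 1
                step[op] = s
                break
        else:
            raise ValueError(f"no free {kind[op]} slot for {op} at interval {ii}")
    return step

# Hypothetical loop body: two multiplies feed an add, which feeds a second add.
ops = ["m1", "m2", "a1", "a2"]
deps = {"a1": ["m1", "m2"], "a2": ["a1"]}
kind = {"m1": "mul", "m2": "mul", "a1": "add", "a2": "add"}
limit = {"mul": 1, "add": 1}

print(asap(ops, deps))                       # {'m1': 0, 'm2': 0, 'a1': 1, 'a2': 2}
print(repair(ops, deps, kind, limit, ii=2))  # {'m1': 0, 'm2': 1, 'a1': 2, 'a2': 3}
```

With one multiplier and an initiation interval of 2, the ASAP schedule places both multiplies in the same slot; the repair pass delays `m2` by one step, lengthening the iteration time from three steps to four. A real scheduler would choose which operations to move so that this growth is minimized.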
CALiBeR: A software pipelining algorithm for clustered embedded VLIW processors
- In ICCAD, 2001
- Cited by 14 (2 self)
This paper proposes a software pipelining framework, CALiBeR (Cluster Aware Load Balancing Retiming Algorithm), suitable for compilers targeting clustered embedded VLIW processors. CALiBeR can be used by embedded system designers to explore different code optimization alternatives, that is, high-quality customized retiming solutions for desired throughput and program memory size requirements, while minimizing register pressure. An extensive set of experimental results is presented, demonstrating that our algorithm compares favorably with one of the best state-of-the-art algorithms, achieving up to 50% improvement in performance and up to 47% improvement in register requirements. In order to empirically assess the effectiveness of clustering for high-ILP applications, additional experiments are presented contrasting the performance achieved by software pipelined kernels executing on clustered and on centralized machines.
Scheduling of Uniform Multi-Dimensional Systems under Resource Constraints
- IEEE Transactions on VLSI Systems
- Cited by 12 (10 self)
Multi-dimensional (MD) systems are widely used to model scientific applications such as image processing, geophysical signal processing and fluid dynamics. Such systems usually contain repetitive groups of operations represented by nested loops. Optimizing such loops under processing resource constraints is required in order to reduce their computation time. Most of the existing static scheduling mechanisms used in the high-level synthesis of VLSI architectures do not consider the parallelism inherent in the multi-dimensional characteristics of the problem. This paper explores the basic properties of MD loop pipelining and presents two novel techniques, called multi-dimensional rotation scheduling and push-up scheduling, able to achieve the shortest possible schedule length. These new techniques transform a multi-dimensional data flow graph representing the problem, while assigning the loop operations to a schedule table. The multi-dimensional rotation scheduling...
Hardware-software partitioning and pipelined scheduling of transformative applications
- IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2002
- Cited by 12 (0 self)
Transformative applications are computation-intensive applications characterized by iterative dataflow behavior. Typical examples are image processing applications like JPEG, MPEG, etc. The performance of embedded hardware–software systems that implement transformative applications can be maximized by obtaining a pipelined design. We present a tool for hardware–software partitioning and pipelined scheduling of transformative applications. The tool uses iterative partitioning and pipelined scheduling to obtain optimal partitions that satisfy the timing and area constraints. The partitioner uses a branch and bound approach with a unique objective function that minimizes the initiation interval of the final design. We present techniques for generating a good initial solution and for limiting the search space of the branch and bound algorithm. A candidate partition is evaluated by generating its pipelined schedule. The scheduler uses a novel retiming heuristic that optimizes the initiation interval, number of pipeline stages, and memory requirements of the particular design alternative. We evaluate the performance of the retiming heuristic by comparing it with an existing technique. The effectiveness of the entire tool is demonstrated by a case study of the JPEG image compression algorithm. We also evaluate the run time and design quality of the tool by experimentation with synthetic graphs. Index Terms: Image processing, partitioning, performance tradeoffs, pipelining, scheduling, system-level design.
Automata-Based Symbolic Scheduling
- 2000
- Cited by 11 (0 self)
This dissertation presents a set of techniques for representing the high-level behavior of a digital subsystem as a collection of nondeterministic finite automata (NFA). Desired behavioral and implementation dynamics (dependencies, repetition, bounded resources, sequential character, and control state) can be modeled similarly. All possible system execution sequences, obeying imposed constraints, are encapsulated in a composed NFA. Technology similar to that used in symbolic model checking enables implicit exploration and extraction of best-possible execution sequences. This provides a very general, systematic procedure to perform exact high-level synthesis of cyclic, control-dominated behaviors constrained by arbitrary sequential constraints. This dissertation further demonstrates that these techniques are scalable to practical problem sizes and complexities. Exact scheduling solutions are constructed for a variety of academic and industrial problems, including a pipelined RISC processor. The ability to represent and schedule sequential models with hundreds of tasks and one-half million control cases substantially raises the bar as to what is believed possible for exact scheduling models. Keywords: Scheduling; Binary Decision Diagrams; High-Level Synthesis; Nondeterminism; Automata; Symbolic Model.
Behavioral optimization using the manipulation of timing constraints
- 1995
- Cited by 6 (0 self)
We introduce a transformation, named rephasing, that manipulates the timing parameters in control-data-flow graphs (CDFGs) during the high-level synthesis of data-path-intensive applications. Timing parameters in such CDFGs include the sample period, the latencies between input–output pairs, the relative times at which corresponding samples become available on different inputs, and the relative times at which the corresponding samples become available at the delay nodes. While some of the timing parameters may be constrained by performance requirements, or by the interface to the external world, others remain free to be chosen during the process of high-level synthesis. Traditionally, high-level synthesis systems for data-path-intensive applications either have assumed that all the relative times, called phases, when corresponding samples are available at input and delay nodes are zero (i.e., all input and delay node samples enter at the initial cycle of the schedule) or have automatically assigned values to these phases as part of the data-path allocation/scheduling step in the case of newer schedulers that use techniques like overlapped scheduling to generate complex time shapes. Rephasing, however, manipulates the values of these phases as an algorithm transformation before the scheduling/allocation stage. The advantage of this approach is that phase values can be chosen to transform and optimize the algorithm for explicit metrics such as area, throughput, latency, and power. Moreover, the rephasing transformation can be combined with other transformations such as algebraic transformations. We have developed techniques for using rephasing to optimize a variety of design metrics, and our results show significant improvements in several design metrics. We have also investigated the relationship and interaction of rephasing with other high-level synthesis tasks. Index Terms: Behavioral synthesis, transformations.

