Results 1-10 of 51
Iterative Modulo Scheduling
, 1995
Cited by 84 (6 self)
Modulo scheduling is a framework within which algorithms for the software pipelining of innermost loops may be defined. The framework specifies a set of constraints that must be met in order to achieve a legal modulo schedule. A wide variety of algorithms and heuristics can be defined within this framework. Little work has been done to evaluate and compare alternative algorithms and heuristics for modulo scheduling from the viewpoints of schedule quality as well as computational complexity. This, along with a vague and unfounded perception that modulo scheduling is computationally expensive as well as difficult to implement, has inhibited its incorporation into product compilers. This report presents iterative modulo scheduling, a practical algorithm that is capable of dealing with realistic machine models. The report also characterizes the algorithm in terms of the quality of the generated schedules as well as the computational expense incurred.
Minimizing Register Requirements under Resource-Constrained Rate-Optimal Software Pipelining
, 1995
Cited by 75 (12 self)
The rapid advances in high-performance computer architecture and compilation techniques provide both challenges and opportunities to exploit the rich solution space of software pipelined loop schedules. In this paper, we develop a framework to construct a software pipelined loop schedule which runs on the given architecture (with a fixed number of processor resources) at the maximum possible iteration rate (à la rate-optimal) while minimizing the number of buffers, a close approximation to minimizing the number of registers. The main contributions of this paper are: First, we demonstrate that such a problem can be described by a simple mathematical formulation with precise optimization objectives under a periodic linear scheduling framework. The mathematical formulation provides a clear picture which permits one to visualize the overall solution space (for rate-optimal schedules) under different sets of constraints. Secondly, we show that a precise mathematical formulation...
Stage Scheduling: A Technique to Reduce the Register Requirements of a Modulo Schedule
 IN PROC. OF THE 28TH ANNUAL INT. SYMP. ON MICROARCHITECTURE (MICRO-28)
, 1995
Cited by 58 (5 self)
Modulo scheduling is an efficient technique for exploiting instruction level parallelism in a variety of loops, resulting in high performance code but increased register requirements. We present a set of low computational complexity stage-scheduling heuristics that reduce the register requirements of a given modulo schedule by shifting operations by multiples of II cycles. Measurements on a benchmark suite of 1289 loops from the Perfect Club, SPEC89, and the Livermore Fortran Kernels show that our best heuristic achieves on average 99% of the decrease in register requirements obtained by an optimal stage scheduler.
A Practical Data Flow Framework for Array Reference Analysis and its Use in Optimizations
 In ACM SIGPLAN'93 Conf. on Prog. Lang. Design and Implementation
, 1993
Cited by 57 (2 self)
Data flow analysis techniques have traditionally been restricted to the analysis of scalar variables. This restriction, however, imposes a limitation on the kinds of optimizations that can be performed in loops containing array references. We present a data flow framework for array reference analysis that provides the information needed in various optimizations targeted at sequential or fine-grained parallel architectures. The framework extends the traditional scalar framework by incorporating iteration distance values into the analysis to qualify the computed data flow solution during the fixed point iteration. Analyses phrased in this framework are capable of discovering recurrent access patterns among array references that evolve during the execution of a loop. The framework is practical in that the fixed point solution requires at most three passes over the body of structured loops. Applications of our framework are discussed for register allocation, load/store optimizations, and controlled loop unrolling.
A Register Allocation Framework Based on Hierarchical Cyclic Interval Graphs
 In International Workshop on Compiler Construction, Paderborn
, 1993
Cited by 56 (12 self)
In this paper, we propose the use of cyclic interval graphs as an alternative representation for register allocation. The "thickness" of the cyclic interval graph captures the notion of overlap between live ranges of variables relative to each particular point of time in the program execution. We demonstrate that cyclic interval graphs provide a feasible and effective representation that accurately captures the periodic nature of live ranges found in loops. A new heuristic algorithm for minimum register allocation, the fat cover algorithm, has been developed and implemented to exploit such program structure. In addition, a new spilling algorithm is proposed that makes use of the extra information available in the interval graph representation. These two algorithms work together to provide a two-phase register allocation process that does not require iteration of the spilling or coloring phases. We extend the notion of cyclic interval graphs to hierarchical cyclic interval graphs and we...
Hypernode Reduction Modulo Scheduling
 IN PROC. OF THE 28TH ANNUAL INT. SYMP. ON MICROARCHITECTURE (MICRO-28)
, 1995
Cited by 52 (21 self)
Software pipelining is a loop scheduling technique that extracts parallelism from loops by overlapping the execution of several consecutive iterations. Most prior scheduling research has focused on achieving minimum execution time, without regard to register requirements. Most strategies tend to stretch operand lifetimes because they schedule some operations too early or too late. The paper presents a novel strategy that simultaneously schedules some operations late and other operations early, minimizing all the stretchable dependencies and therefore reducing the registers required by the loop. The key to this strategy is a preordering phase that selects the order in which the operations will be scheduled. The results show that the method described in this paper performs better than other heuristic methods and almost as well as a linear programming method, while requiring much less time to produce the schedules.
Exploring tradeoffs in buffer requirements and throughput constraints for synchronous dataflow graphs
 DESIGN AUTOMATION CONFERENCE, PROC. ACM
, 2006
Cited by 36 (8 self)
Multimedia applications usually have throughput constraints. An implementation must meet these constraints while minimizing resource usage and energy consumption. The compute-intensive kernels of these applications are often specified as Synchronous Dataflow Graphs. Communication between nodes in these graphs requires storage space, which influences throughput. We present exact techniques to chart the Pareto space of throughput and storage tradeoffs, which can be used to determine the minimal storage space needed to execute a graph under a given throughput constraint. The feasibility of the approach is demonstrated with a number of examples.
The Meeting Graph: A New Model for Loop Cyclic Register Allocation
 In Proc. of the Fifth Workshop on Compilers for Parallel Computers (CPC'95)
, 1995
Cited by 31 (11 self)
Register allocation is a compiler phase in which the gains can be essential in achieving performance on new architectures exploiting instruction level parallelism. We focus our attention on loops and improve the existing methods by introducing a new kind of graph. We model loop unrolling and register allocation together in a common framework, called the meeting graph. We expect our results to significantly improve loop register allocation while keeping the amount of code replication low. As a byproduct, we present an optimal algorithm for allocating loop variables to a rotating register file, as well as a new heuristic for loop variable spilling.
1 Introduction
The efficiency of register allocation is a crucial problem in modern microprocessors, where the increasing gap between the internal clock cycle and memory latency exacerbates the need to keep the variables in registers and to avoid spill code. In this paper, we address the important problem of loop register allocation and spi...
Minimizing Memory Requirements in Rate-Optimal Schedules
, 1994
Cited by 31 (2 self)
In this paper we address the problem of minimizing buffer storage requirement in constructing rate-optimal compile-time schedules for multi-rate dataflow graphs. We demonstrate that this problem, called the Minimum Buffer Rate-Optimal (MBRO) scheduling problem, can be formulated as a unified linear programming problem. A novel feature of our method is that it tries to minimize the memory requirement while simultaneously maximizing the computation rate. We have constructed an experimental testbed which implements our scheduling algorithm as well as (i) the widely used periodic admissible parallel schedules proposed by Lee and Messerschmitt [12], (ii) the optimal scheduling buffer allocation (OSBA) algorithm of Ning and Gao [15], and (iii) the multi-rate software pipelining (MRSP) algorithm [7]. The experimental results have demonstrated a significant improvement in buffer requirements for the MBRO schedules compared to the schedules generated by the other three methods. Compared to bloc...
Optimum modulo schedules for minimum register requirements
 In Proc., Internat. Conf. On Supercomputing
, 1995
Cited by 29 (6 self)
Modulo scheduling is an efficient technique for exploiting instruction level parallelism in a variety of loops, resulting in high performance code but increased register requirements. We present a combined approach that schedules the loop operations for the highest steady state throughput and minimum register requirements. Our method determines optimal register requirements for machines with finite resources and for general dependence graphs. We compare the performance of this and other modulo schedulers for a benchmark of 629 loops from the Perfect Club, SPEC89, and the Livermore Fortran Kernels. Measurements demonstrate the potential of register-sensitive modulo schedulers, which will be useful in evaluating the performance of register-sensitive modulo scheduling heuristics. Keywords: Register-sensitive modulo scheduling, software pipelining, loop scheduling, instruction level parallelism,