Results 1–10 of 18
Optimal Spilling for CISC Machines with Few Registers
 In Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
, 2001
Abstract
Cited by 62 (1 self)
Register allocation based on graph coloring performs poorly for machines with few registers, if each temporary is held either in machine registers or memory over its entire lifetime. With the exception of short-lived temporaries, most temporaries must spill, including long-lived temporaries that are used within inner loops. Live-range splitting before or during register allocation helps to alleviate the problem, but prior techniques are sometimes complex, make no guarantees about subsequent colorability and thus require further iterations of splitting, pay no attention to addressing modes, and make no claim to optimality. We formulate the register allocation problem for CISC architectures with few registers in two parts: an integer linear program that determines the optimal location to break up the implementation of a live range between registers and memory, and a register assignment phase that we guarantee to complete without further spill code insertion. Our linear programming model ...
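The core decision this paper encodes as an integer linear program can be illustrated with a toy brute-force sketch (hypothetical code, not the paper's formulation, which also models addressing modes and split points): choose which live ranges stay in registers so that register pressure never exceeds the register count, minimizing total spill cost.

```python
from itertools import product

def min_spill_cost(live_ranges, costs, num_regs):
    """Brute-force the spill decision: pick which live ranges stay in
    registers so that at every program point at most num_regs ranges
    are register-resident, minimizing total spill cost.
    live_ranges maps name -> (start, end); costs maps name -> spill cost.
    For interval live ranges, peak pressure always occurs at some
    range start point, so only those points need checking."""
    names = list(live_ranges)
    best = None
    for keep in product([False, True], repeat=len(names)):
        cost = sum(costs[n] for n, k in zip(names, keep) if not k)
        ok = True
        for point in {s for s, _ in live_ranges.values()}:
            pressure = sum(
                1 for n, k in zip(names, keep)
                if k and live_ranges[n][0] <= point <= live_ranges[n][1])
            if pressure > num_regs:
                ok = False
                break
        if ok and (best is None or cost < best):
            best = cost
    return best
```

With ranges a=(0,5), b=(1,4), c=(2,3) and two registers, all three overlap at point 2, so the cheapest range must spill; a real ILP solver replaces this exponential enumeration.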
Data-dependency graph transformations for instruction scheduling
 Journal of Scheduling
, 2006
Abstract
Cited by 10 (1 self)
This paper presents a set of efficient graph transformations for local instruction scheduling. These transformations to the data-dependency graph prune redundant and inferior schedules from the solution space of the problem. Optimally scheduling the transformed problems using an enumerative scheduler is faster, and the number of problems solved to optimality within a bounded time is increased. Furthermore, heuristic scheduling of the transformed problems often yields improved schedules for hard problems. The basic node-based transformation runs in O(ne) time, where n is the number of nodes and e is the number of edges in the graph. A generalized subgraph-based transformation runs in O(n^2 e) time. The transformations are implemented within the GNU Compiler Collection (GCC) and are evaluated experimentally using the SPEC CPU2000 floating-point benchmarks targeted to various processor models. The results show that the transformations are fast and improve the results of both heuristic and optimal scheduling. KEY WORDS: instruction scheduling, graph transformation, optimal scheduling, compiler scheduling
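A minimal sketch of the flavor of a node-based transformation (using a deliberately simplified criterion with unit latencies, not the paper's exact condition): if two independent nodes a and b satisfy preds(a) ⊆ preds(b) and succs(b) ⊆ succs(a), then scheduling a before b loses nothing, so the edge a → b can be added to prune the schedule space.

```python
def add_superiority_edges(nodes, edges):
    """Simplified node-based DDG transformation: for independent nodes
    a, b with preds(a) <= preds(b) and succs(b) <= succs(a), add the
    edge a -> b. Any schedule with b before a can be reordered to put
    a first without lengthening it (under unit latencies), so the
    added edge only prunes redundant schedules. Mutates `edges`."""
    preds = {n: set() for n in nodes}
    succs = {n: set() for n in nodes}
    for u, v in edges:
        succs[u].add(v)
        preds[v].add(u)

    def reachable(a, b):  # DFS: is b reachable from a?
        stack, seen = [a], set()
        while stack:
            n = stack.pop()
            if n == b:
                return True
            if n in seen:
                continue
            seen.add(n)
            stack.extend(succs[n])
        return False

    added = []
    for a in nodes:
        for b in nodes:
            if a != b and not reachable(a, b) and not reachable(b, a):
                if preds[a] <= preds[b] and succs[b] <= succs[a]:
                    edges.append((a, b))
                    succs[a].add(b)
                    preds[b].add(a)
                    added.append((a, b))
    return added
```

On the two-producer DAG {a → c, b → c}, the nodes a and b are symmetric, so one ordering edge between them is added and the other direction is then blocked by the reachability check.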
Minimum Register Instruction Sequencing to Reduce Register Spills in Out-of-Order Issue Superscalar Architectures
 IEEE Transactions on Computers
Minimum Register Instruction Sequence Problem: Revisiting Optimal Code Generation for DAGs
 Lab., University of Delaware
, 2001
Abstract
Cited by 5 (1 self)
We revisit the optimal code generation, or evaluation order determination, problem: the problem of generating an instruction sequence from a data dependence graph (DDG). In particular, we are interested in generating an instruction sequence S that is optimal in terms of the number of registers used by the sequence S. We call this the MRIS (Minimum Register Instruction Sequence) problem.
Integrated Prepass Scheduling for a Java Just-in-Time Compiler on the IA-64 Architecture
 in Proceedings of the International Symposium on Code Generation and Optimization
, 2003
Abstract
Cited by 4 (0 self)
We present a new integrated prepass scheduling (IPS) algorithm for a Java Just-In-Time (JIT) compiler, which integrates register minimization into list scheduling. We use backtracking in the list scheduling when we have used up all the available registers. To reduce the overhead of backtracking, we incrementally maintain a set of candidate instructions for undoing scheduling. To maximize the ILP after undoing scheduling, we select an instruction chain with the smallest increase in the total execution time. We implemented our new algorithm in a production-level Java JIT compiler for the Intel Itanium processor. The experiment showed that, compared to the best known algorithm by Govindarajan et al., our IPS algorithm improved the performance by up to +1.8% while it reduced the compilation time for IPS by 58% on average.
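The backtracking idea can be sketched with a toy scheduler (a hypothetical simplification, far removed from IPS itself: no latencies, no incremental candidate sets, no undo-chain selection): issue ready instructions in priority order, and when every remaining choice would exceed the register budget, undo the most recent decision and try the next candidate.

```python
def schedule_with_backtracking(nodes, edges, num_regs):
    """Toy list scheduler with backtracking on register exhaustion.
    A value is live from its definition to its last use; sink nodes
    are assumed to need no result register (e.g. stores). Returns a
    schedule whose peak live count fits in num_regs, or None."""
    preds = {n: set() for n in nodes}
    succs = {n: set() for n in nodes}
    for u, v in edges:
        preds[v].add(u)
        succs[u].add(v)

    def search(order, uses, live):
        if len(order) == len(nodes):
            return order
        done = set(order)
        for n in nodes:  # list order stands in for the scheduling priority
            if n in done or not preds[n] <= done:
                continue
            new_uses = dict(uses)
            new_live = set(live)
            for p in preds[n]:  # the last use kills a value, freeing a register
                new_uses[p] -= 1
                if new_uses[p] == 0:
                    new_live.discard(p)
            if succs[n]:
                new_live.add(n)
            if len(new_live) <= num_regs:
                result = search(order + [n], new_uses, new_live)
                if result is not None:
                    return result
            # else: budget exceeded -> backtrack, try the next candidate
        return None

    return search([], {n: len(succs[n]) for n in nodes}, set())
```

On the diamond DAG a → {b, c} → d, two registers suffice but one does not, so the search succeeds with a budget of 2 and exhausts all orderings (returning None) with a budget of 1.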
Effective Instruction Scheduling with Limited Registers
, 2001
Abstract
Cited by 3 (0 self)
Effective global instruction scheduling techniques have become an important component in modern compilers for exposing more instruction-level parallelism (ILP) and exploiting the ever-increasing number of parallel function units. Effective register allocation has long been an essential component of a good compiler for reducing memory references. While instruction scheduling and register allocation are both essential compiler optimizations for fully exploiting the capability of modern high-performance microprocessors, there is a phase-ordering problem when we perform these two optimizations separately: instruction scheduling before register allocation may create insatiable demands for registers; register allocation before instruction scheduling may reduce the amount of parallelism that instruction scheduling can exploit. In this thesis, we propose to solve this phase-ordering problem by inserting a moderating optimization called code reorganization between prepass instruction scheduling and register allocation. Code reorganization adjusts the prepass scheduling results to make them demand fewer registers (i.e. exhibit lower register pressure) and guides register allocation to insert spill code that has less impact on schedule length. Our new approach avoids the complexity of simultaneous instruction scheduling and register allocation algorithms. In fact, it does not modify either instruction scheduling or register allocation algorithms. Therefore instruction scheduling can focus on maximizing instruction-level parallelism, and register allocation can focus on minimizing the cost of spill code. We compare the performance of our approach with a particular successful register-pressure-sensitive scheduling algorithm, and show an average of 18% improvement in speedup for an 8...
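The shape of such a moderating pass can be sketched as follows (hypothetical code; the thesis' pass also guides spill placement, which is omitted here): repeatedly swap adjacent, independent instructions in the prepass schedule whenever the swap strictly lowers the peak number of simultaneously live values.

```python
def peak_pressure(schedule, edges):
    """Peak count of simultaneously live values in a given order;
    a value is live from its definition to its last use."""
    pos = {n: i for i, n in enumerate(schedule)}
    pressure = [0] * len(schedule)
    for u in schedule:
        uses = [pos[v] for (src, v) in edges if src == u]
        for p in range(pos[u], max(uses, default=pos[u])):
            pressure[p] += 1
    return max(pressure, default=0)

def reorganize(schedule, edges):
    """Greedy code reorganization: accept any adjacent swap of
    independent instructions that reduces peak register pressure."""
    sched = list(schedule)
    improved = True
    while improved:
        improved = False
        for i in range(len(sched) - 1):
            a, b = sched[i], sched[i + 1]
            if (a, b) in edges:  # dependent pair, cannot swap
                continue
            trial = sched[:i] + [b, a] + sched[i + 2:]
            if peak_pressure(trial, edges) < peak_pressure(sched, edges):
                sched, improved = trial, True
    return sched
```

For example, with dependences a → b and c → d, the interleaved order [a, c, b, d] keeps two values live at once, while the reorganized order [a, b, c, d] never keeps more than one.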
Optimal Global Instruction Scheduling Using Enumeration
 University of California
, 1991
"... of the ..."
Minimum Register Instruction Scheduling: A New Approach for Dynamic Instruction Issue Processors
 In Proc. of the Twelfth International Workshop on Languages and Compilers for Parallel Computing
, 1999
Abstract
Cited by 2 (2 self)
Modern superscalar architectures with dynamic scheduling and register renaming capabilities have introduced subtle but important changes into the tradeoffs between compile-time register allocation and instruction scheduling. In particular, it is perhaps not wise to increase the degree of parallelism of the static instruction schedule at the expense of excessive register pressure, which may result in additional spill code. On the contrary, it may even be beneficial to reduce the register pressure at the expense of constraining the degree of parallelism of the static instruction schedule. This leads to the following interesting problem: given a data dependence graph (DDG) G, can we derive a schedule S for G that uses the least number of registers? In this paper, we present a heuristic approach to compute the near-optimal number of registers required for a DDG G (under all possible legal schedules). We propose an extended list-scheduling algorithm which uses the above number...
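For tiny DDGs, the quantity the paper targets — the minimum over all legal schedules of the peak number of live values — can be computed by exhaustive search. The sketch below is a hypothetical brute-force baseline, not the paper's heuristic, which is what makes the problem tractable at scale.

```python
def min_registers(nodes, edges):
    """Brute-force the minimum, over all legal schedules of a DDG, of
    the peak number of simultaneously live values. Sink nodes are
    assumed to need no result register. Exponential; illustration only."""
    preds = {n: set() for n in nodes}
    succs = {n: set() for n in nodes}
    for u, v in edges:
        preds[v].add(u)
        succs[u].add(v)
    best = [len(nodes)]

    def search(done, uses, live, peak):
        if peak >= best[0]:
            return  # cannot beat the best schedule already found
        if len(done) == len(nodes):
            best[0] = peak
            return
        for n in nodes:
            if n in done or not preds[n] <= done:
                continue
            new_uses = dict(uses)
            new_live = set(live)
            for p in preds[n]:  # the last use frees the register
                new_uses[p] -= 1
                if new_uses[p] == 0:
                    new_live.discard(p)
            if succs[n]:
                new_live.add(n)
            search(done | {n}, new_uses, new_live,
                   max(peak, len(new_live)))

    search(set(), {n: len(succs[n]) for n in nodes}, set(), 0)
    return best[0]
```

A three-node chain needs only one register under this model, while the diamond a → {b, c} → d needs two in every legal schedule.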
Spatial Complexity of Reversibly Computable DAG
 in Proceedings of the International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES)
, 2009
Abstract
In this paper we address the issue of making a program reversible in terms of spatial complexity. Spatial complexity is the amount of memory/register locations required for performing the computation in both forward and backward directions. Spatial complexity has an important relationship with the intrinsic power consumption required at run time; this was our primary motivation. But it also has an important relationship with the trade-off between storing or recomputing reused intermediate values, also known as the rematerialization problem in the context of compiler register allocation, or the checkpointing issue in the general case. We present a lower bound on the spatial complexity of a DAG (directed acyclic graph) with reversible operations, as well as a heuristic aimed at finding the minimum number of registers required for a forward and backward execution of a DAG. We define energetic garbage as the additional number of registers needed for the reversible computation with respect to the original computation. We have run experiments that suggest that the garbage size is never more than 50% of the DAG size for DAGs with unary/binary operations.
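The forward/backward execution requirement can be illustrated with a toy straight-line model (hypothetical and much simpler than the paper's DAG setting): when every update is invertible, no value is destroyed, so the computation runs backward exactly, with no extra "garbage" storage.

```python
def run_forward(prog, state):
    """Execute a straight-line program of reversible updates.
    Each step is ('+=', dst, src) or ('-=', dst, src); both are
    invertible, so no information is lost along the way."""
    for op, d, s in prog:
        state[d] = state[d] + state[s] if op == '+=' else state[d] - state[s]
    return state

def run_backward(prog, state):
    """Undo run_forward by applying the inverse of each step
    in reverse order."""
    for op, d, s in reversed(prog):
        state[d] = state[d] - state[s] if op == '+=' else state[d] + state[s]
    return state
```

Running a program forward and then backward restores the initial state; an irreversible step like x = y + z would instead force an extra register to remember the overwritten value, which is the kind of cost the paper's "energetic garbage" measures.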
Cooperative instruction scheduling with linear scan
, 2005
Abstract
Linear scan register allocation is an attractive register allocation algorithm because of its simplicity and fast running time. However, it is generally felt that linear scan register allocation yields poorer code than allocation schemes based on graph coloring. In this paper, we propose a prepass instruction scheduling algorithm that improves the code quality of linear scan allocators. Our implementation in the Trimaran compiler-simulator infrastructure shows that our scheduler can reduce the number of active live ranges that the linear scan allocator has to deal with. As a result, fewer spills are needed and the quality of the generated code is improved. Furthermore, compared to the default scheduling and graph-coloring allocator schemes found in the IMPACT and Elcor components of Trimaran, our implementation with our prepass scheduler and linear scan register allocator significantly reduced compilation times.
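For reference, the allocator such a prepass scheduler cooperates with can be sketched in its textbook form (Poletto–Sarkar linear scan with a furthest-end spill heuristic; details such as fixed registers and interval holes are simplified away):

```python
def linear_scan(intervals, num_regs):
    """Textbook linear-scan allocation: walk intervals in order of
    start point, expire finished intervals, and when no register is
    free spill the active interval whose end point is furthest away.
    intervals maps name -> (start, end); returns (assignment, spilled)."""
    free = list(range(num_regs))
    active = []          # allocated names, kept sorted by interval end
    assign, spilled = {}, []
    for name in sorted(intervals, key=lambda n: intervals[n][0]):
        start, end = intervals[name]
        # expire intervals that ended before this one starts
        for a in [a for a in active if intervals[a][1] < start]:
            free.append(assign[a])
        active = [a for a in active if intervals[a][1] >= start]
        if free:
            assign[name] = free.pop()
            active.append(name)
        else:
            victim = active[-1]          # furthest end among active
            if intervals[victim][1] > end:
                assign[name] = assign.pop(victim)  # steal its register
                spilled.append(victim)
                active[-1] = name
            else:
                spilled.append(name)
        active.sort(key=lambda n: intervals[n][1])
    return assign, spilled
```

Fewer simultaneously active live ranges, which is exactly what the proposed prepass scheduler aims for, means the `free` list empties less often and the spill branch is taken less frequently.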