Load/Store Range Analysis for Global Register Allocation
- Proc. of the SIGPLAN Conference on Programming Language Design and Implementation, 1994
Cited by 18 (0 self)
Live range splitting techniques improve global register allocation by splitting the live ranges of variables into segments that are individually allocated registers. Load/store range analysis is a new technique for live range splitting that is based on reaching definition and live variable analyses. Our analysis localizes the profits and the register requirements of every access to every variable to provide a fine granularity of candidates for register allocation. Experiments on a suite of C and FORTRAN benchmark programs show that a graph coloring register allocator operating on load/store ranges often provides better allocations than the same allocator operating on live ranges. Experimental results also show that the computational cost of using load/store ranges for register allocation is moderately more than the cost of using live ranges.
1 Introduction
Register allocation maps variables in an intermediate language program to either registers or memory locations in order to minimiz...
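The abstract above says load/store range analysis builds on reaching definition and live variable analyses. As a rough illustration of the second of these only, here is a minimal sketch of the textbook backward live-variable dataflow over a control-flow graph. It is not the paper's load/store range construction; the function name `live_variables`, the block names, and the tiny example CFG are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def live_variables(
    succ: Dict[str, List[str]],
    use: Dict[str, Set[str]],
    defs: Dict[str, Set[str]],
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Standard backward dataflow, iterated to a fixed point.

    live_out[b] = union of live_in[s] over successors s of b
    live_in[b]  = use[b] | (live_out[b] - defs[b])
    """
    live_in: Dict[str, Set[str]] = {b: set() for b in succ}
    live_out: Dict[str, Set[str]] = {b: set() for b in succ}
    changed = True
    while changed:
        changed = False
        for b in succ:
            out_b: Set[str] = set()
            for s in succ[b]:
                out_b |= live_in[s]
            in_b = use[b] | (out_b - defs[b])
            if out_b != live_out[b] or in_b != live_in[b]:
                live_out[b], live_in[b] = out_b, in_b
                changed = True
    return live_in, live_out


# Hypothetical CFG: entry -> loop, loop -> loop, loop -> exit.
#   entry: i = 0; s = 0; n = ...
#   loop:  s = s + i; i = i + 1; branch on i < n
#   exit:  return s
succ = {"entry": ["loop"], "loop": ["loop", "exit"], "exit": []}
use = {"entry": set(), "loop": {"s", "i", "n"}, "exit": {"s"}}
defs = {"entry": {"i", "s", "n"}, "loop": {"s", "i"}, "exit": set()}

live_in, live_out = live_variables(succ, use, defs)
```

In this example all three variables are live around the loop, so a live-range-based allocator would treat each as a single candidate spanning the whole loop; a splitting technique would subdivide such ranges further.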
Improved Spill Code Generation for Software Pipelined Loops
- 1999
Cited by 16 (5 self)
Software pipelining is a loop scheduling technique that extracts parallelism out of loops by overlapping the execution of several consecutive iterations. Due to the overlapping of iterations, schedules impose high register requirements during their execution. A schedule is valid if it requires at most the number of registers available in the target architecture. If not, its register requirements have to be reduced either by decreasing the iteration overlapping or by spilling registers to memory. In this paper we describe a set of heuristics to increase the quality of register-constrained modulo schedules. The heuristics decide between the two previous alternatives and define criteria for effectively selecting spilling candidates. The heuristics proposed for reducing the register pressure can be applied to any software pipelining technique. The proposals are evaluated using a register-conscious software pipeliner on a workbench composed of a large set of loops from the Perfect Club benchmark and a set of processor configurations. Proposals in this paper are compared against a previous proposal already described in the literature. For one of these processor configurations and the set of loops that do not fit in the available registers (32), a speed-up of 1.68 and a reduction of the memory traffic by a factor of 0.57 are achieved with an affordable increase in compilation time. For all the loops, this represents a speed-up of 1.38 and a reduction of the memory traffic by a factor of 0.7.
Simple Register Spilling in a Retargetable Compiler
- 1995
Cited by 15 (3 self)
This paper describes the management of register spills in a retargetable C compiler. Spills are rare, which means that testing is a bigger problem than performance. The trade-offs have been arranged so that the common case (no spills) generates respectable code quickly and the uncommon case (spills) is less efficient but as simple as possible. The technique has proven practical and is in production use on VAX, Motorola 68020, SPARC and MIPS machines.
KEY WORDS: ANSI C, code generation, compilers, register allocation, register spilling
INTRODUCTION
When register allocators run out of registers, they generate code to spill one or more busy registers into temporaries and code to reload those values when they are needed again. The trend in compiling research is increasing the sophistication --- and the implementation and execution costs --- of the techniques that avoid spills.
Reducing Memory Traffic with CRegs
- 1994
Cited by 11 (0 self)
Array and pointer references are often ambiguous in that compile time analysis cannot always determine if distinct references are to the same object. Ambiguously aliased objects are not allocated to registers by conventional compilers due to the cost of the loads and stores required to keep register copies consistent with memory and each other. These instructions affect performance by generating memory traffic. There are several hardware and software strategies that can be used to solve the ambiguous alias problem; we have implemented one such scheme called CRegs in a compiler and instruction level simulator. We present a modification to Briggs' optimistic coloring algorithm that allows us to allocate local and parameter arrays to CRegs. The CRegs register file operation and instruction set modifications required to implement this scheme are discussed. Underlying hardware issues such as pipeline impact and chip area are briefly discussed. Several benchmarks are compared in ...
Code Reuse in an Optimizing Compiler
Cited by 8 (4 self)
This paper describes how the cmcc compiler reuses code --- both internally (reuse between different modules) and externally (reuse between versions for different target machines). The key to reuse are the application frameworks developed for global data-flow analysis, code generation, instruction scheduling, and register allocation. The code produced by cmcc is as good as the code produced by the native compilers for the MIPS and SPARC, although significantly less resources have been spent on cmcc (overall, about 6 man years by 2.5 persons). cmcc is implemented in C++, which allowed for a compact expression of the frameworks as class hierarchies. The results support the claim that suitable frameworks facilitate reuse and thereby significantly improve developer effectiveness.
1 Introduction
A well-chosen set of application-specific frameworks results in significant code reuse, which is a requirement for concise code. In this paper, we report on our experience with using this software...
A Progressive Register Allocator for Irregular Architectures
Cited by 7 (4 self)
... a compiler performs. Conventional graph-coloring based register allocators are fast and do well on regular, RISC-like, architectures, but perform poorly on irregular, CISC-like, architectures with few registers and nonorthogonal instruction sets. At the other extreme, optimal register allocators based on integer linear programming are capable of fully modeling and exploiting the peculiarities of irregular architectures but do not scale well. We introduce the idea of a progressive allocator. A progressive allocator finds an initial allocation of quality comparable to a conventional allocator, but as more time is allowed for computation the quality of the allocation approaches optimal. This paper presents a progressive register allocator which uses a multi-commodity network flow model to elegantly represent the intricacies of irregular architectures. We evaluate our allocator as a substitute for gcc's local register allocation pass.
Fusion-Based Register Allocation
- ACM Transactions on Programming Languages and Systems, 1997
Cited by 7 (0 self)
This paper describes fusion based register allocation in detail and compares it with other approaches to register allocation. For programs from the SPEC92 suite, fusion based register allocation can improve the execution time (of optimized programs, for the MIPS architecture) by up to 8.4% over Chaitin-style register allocation.
Register Saturation in Superscalar and VLIW Codes
- In Proceedings of the International Conference on Compiler Construction, Lecture Notes in Computer Science, 2001
Cited by 6 (2 self)
Register constraints can be taken into account during the scheduling phase of an acyclic data dependence graph (DAG): any schedule must minimize the register requirement. In this work, we mathematically study and extend the approach which consists of computing the exact upper-bound of the register need for all the valid schedules, independently of the functional unit constraints. A previous work (URSA) was presented in [5, 4]. Its aim was to add some serial arcs to the original DAG such that the worst register need does not exceed the number of available registers. We write an appropriate mathematical formalism for this problem and extend the DAG model to take into account delayed read from and write into registers with multiple register types. This ...
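To make the notion of an upper bound on the register need over all valid schedules concrete, the following is a brute-force sketch on a toy DAG. It is not the formalism of the paper: it enumerates every sequential topological order, ignores functional units, latencies, delayed reads and writes, and register types, and uses the simplifying assumption that a value is live from just after its defining node until its last consumer is scheduled (values without consumers are not counted). The names `register_need`, `valid_schedules`, `register_saturation`, and the example graph are hypothetical.

```python
from itertools import permutations
from typing import Dict, Iterator, List, Sequence, Tuple


def register_need(order: Sequence[str], consumers: Dict[str, List[str]]) -> int:
    """Maximum number of simultaneously live values under one schedule."""
    pos = {v: i for i, v in enumerate(order)}
    need = 0
    for t in range(len(order)):
        # Values already defined whose last consumer has not yet run.
        live = sum(
            1
            for v in order[: t + 1]
            if any(pos[c] > t for c in consumers[v])
        )
        need = max(need, live)
    return need


def valid_schedules(consumers: Dict[str, List[str]]) -> Iterator[Tuple[str, ...]]:
    """Yield every topological order of the DAG (exponential; toy sizes only)."""
    nodes = list(consumers)
    for order in permutations(nodes):
        pos = {v: i for i, v in enumerate(order)}
        if all(pos[v] < pos[c] for v in nodes for c in consumers[v]):
            yield order


def register_saturation(consumers: Dict[str, List[str]]) -> int:
    """Worst-case register need over all valid sequential schedules."""
    return max(register_need(o, consumers) for o in valid_schedules(consumers))


# Hypothetical DAG: a, b, c are loads; d = a + b; e = c + d.
dag = {"a": ["d"], "b": ["d"], "c": ["e"], "d": ["e"], "e": []}

best_case = register_need(("a", "b", "d", "c", "e"), dag)
worst_case = register_saturation(dag)
```

If the worst case stays within the number of available registers, no schedule of this DAG can force a spill under the stated assumptions, which is why an upper bound is useful independently of the scheduler.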
Hybrid Optimizations: Which Optimization Algorithm to Use?
- In 15th International Conference on Compiler Construction (CC 2006), 2006
Cited by 5 (2 self)
We introduce a new class of compiler heuristics: hybrid optimizations. Hybrid
The RTL System
- 1990
Cited by 4 (0 self)
Assignment ImplicitAssignment Assignment PhiAssignment Jump CondJump Return The subclasses play the following roles: EmptyRegisterTransfer: used in the instruction builder to represent transfers that only set flags. See Chapter 8. RegisterTransferSet: represents sets of RegisterTransfers to be performed concurrently. Theoretically, this is a recursive structure, since it could contain an instance of itself; however, this is never allowed. Its only instance variable is transfers, an OrderedCollection of the component transfers. Call: represents procedure calls. Its instance variables are method, the Register containing the callee's address; returnValueRegister, the Register in which the result of the call will be found; and argumentLogicalRegisters, an OrderedCollection of the Registers that contain the receiver and the first two arguments. AbstractAssignment: the abstract superclass of all classes representing assignments to some storage. Its only instance variable is destinatio...

