Improvements to Graph Coloring Register Allocation
- ACM Transactions on Programming Languages and Systems
, 1994
Cited by 158 (8 self)
This paper describes both the techniques themselves and our experience building and using register allocators that incorporate them. It provides a detailed description of optimistic coloring and rematerialization. It presents experimental data to show the performance of several versions of the register allocator on a suite of FORTRAN programs. It discusses several insights that we discovered only after repeated implementation of these allocators. Categories and Subject Descriptors: D.3.4 [Programming Languages]: Processors---compilers, optimization. General terms: Languages. Additional Key Words and Phrases: Register allocation, code generation, graph coloring. 1. INTRODUCTION The relationship between run-time performance and effective use of a machine's register set is well understood. In a compiler, the process of deciding which values to keep in registers at each point in the generated code is called register allocation. Value
Memory-System Design Considerations For Dynamically-Scheduled Microprocessors
, 1997
Cited by 66 (4 self)
Keith Istvan Farkas, Doctor of Philosophy, Graduate Department of Electrical and Computer Engineering, University of Toronto, 1997. Dynamically-scheduled processors challenge hardware and software architects to develop designs that balance hardware complexity and compiler technology against performance targets. This dissertation presents a first thorough look at some of the issues introduced by this hardware complexity. The focus of the investigation of these issues is the register file and the other components of the data memory system. These components are: the lockup-free data cache, the stream buffers, and the interface to the lower levels of the memory system. The investigation is based on software models. These models incorporate the features of a dynamically-scheduled processor that affect the design of the data-memory components. The models represent a balance between accuracy and generality, and ar...
Data Flow Frequency Analysis
- In Proc. of the ACM SIGPLAN '96 Conference on Programming Language Design and Implementation (PLDI'96)
, 1996
Cited by 40 (0 self)
Conventional dataflow analysis computes information about what facts may or will not hold during the execution of a program. Sometimes it is useful, for program optimization, to know how often or with what probability a fact holds true during program execution. In this paper, we provide a precise formulation of this problem for a large class of dataflow problems --- the class of finite bi-distributive subset problems. We show how it can be reduced to a generalization of the standard dataflow analysis problem, one that requires a sum-over-all-paths quantity instead of the usual meet-over-all-paths quantity. We show that Kildall's result expressing the meet-over-all-paths value as a maximal-fixed-point carries over to the generalized setting. We then outline ways to adapt the standard dataflow analysis algorithms to solve this generalized problem, both in the intraprocedural and the interprocedural case. 1 Introduction Conventional dataflow analysis computes information about what facts...
Register Allocation over the Program Dependence Graph
- IN PROC. OF THE ACM SIGPLAN '94 CONF. ON PROGRAMMING LANGUAGE DESIGN AND IMPLEMENTATION
, 1994
Cited by 31 (5 self)
This paper describes RAP, a Register Allocator that allocates registers over the Program Dependence Graph (PDG) representation of a program in a hierarchical manner. The PDG program representation has been used successfully for scalar optimizations, the detection and improvement of parallelism for vector machines, multiple processor machines, and machines that exhibit instruction level parallelism, as well as debugging, the integration of different versions of a program, and translation of imperative programs for data flow machines. By basing register allocation on the PDG, the register allocation phase may be more easily integrated and intertwined with other optimization analyses and transformations. In addition, the advantages of a hierarchical approach to global register allocation can be attained without constructing an additional structure used solely for register allocation. Our experimental results have shown that on average, code allocated registers via RAP executed 2.7% faster...
A Scheduler-Sensitive Global Register Allocator
- IN SUPERCOMPUTING '93 PROCEEDINGS
, 1993
Cited by 28 (4 self)
Compile-time reordering of machine-level instructions has been very successful at achieving large increases in performance of programs on machines offering fine-grained parallelism. However, because of the interdependences between instruction scheduling and register allocation, it is not clear which of these two phases of the compiler should run first to generate the most efficient final code. In this paper, we describe our investigation into slight modifications to key phases of a successful global register allocator to create a scheduler-sensitive register allocator, which is then followed by an "off-the-shelf" instruction scheduler. Our experimental studies reveal that this approach achieves speedups comparable and increasingly better than previous cooperative approaches with an increasing number of available registers without the complexities of the previous approaches.
An Experimental Study of Several Cooperative Register Allocation and Instruction Scheduling Strategies
- In Proceedings of the Twenty-eighth International Symposium on Microarchitecture, Ann Arbor
, 1995
Cited by 20 (1 self)
Compile-time reordering of low level instructions is successful in achieving large increases in performance of programs on fine-grain parallel machines. However, because of the interdependences between instruction scheduling and register allocation, a lack of cooperation between the scheduler and register allocator can result in generating code that contains excess register spills and/or a lower degree of parallelism than actually achievable. This paper describes a strategy for providing cooperation between register allocation and both global and local instruction scheduling. We experimentally compare this strategy with other cooperative and uncooperative scenarios. Our experiments indicate that the greatest speedups are obtained by performing either cooperative or uncooperative global instruction scheduling with cooperative register allocation and local instruction scheduling. 1 Introduction The major focus of optimizing compilers for architectures supporting instruction level paral...
Load/Store Range Analysis for Global Register Allocation
- Proc. of the SIGPLAN Conference on Programming Language Design and Implementation
, 1994
Cited by 18 (0 self)
Live range splitting techniques improve global register allocation by splitting the live ranges of variables into segments that are individually allocated registers. Load/store range analysis is a new technique for live range splitting that is based on reaching definition and live variable analyses. Our analysis localizes the profits and the register requirements of every access to every variable to provide a fine granularity of candidates for register allocation. Experiments on a suite of C and FORTRAN benchmark programs show that a graph coloring register allocator operating on load/store ranges often provides better allocations than the same allocator operating on live ranges. Experimental results also show that the computational cost of using load/store ranges for register allocation is moderately more than the cost of using live ranges. 1 Introduction Register allocation maps variables in an intermediate language program to either registers or memory locations in order to minimiz...
Register Allocation Sensitive Region Scheduling
- In PACT '95: International Conference on Parallel Architectures and Compilation Techniques, Limassol
, 1995
Cited by 13 (4 self)
Because of the interdependences between instruction scheduling and register allocation, it is not clear which of these two phases should run first. In this paper, we describe how we modified a global instruction scheduling technique to make it cooperate with a subsequent register allocation phase. In particular, our cooperative global instruction scheduler performs region scheduling transformations on the program dependence graph representation of a program while attempting to prevent an increase in the amount of spill code which will be introduced in the subsequent register allocation phase. Our experimental findings indicate that the cooperative technique does indeed produce more efficient code than noncooperative global instruction scheduling in programs in which an allocation cannot be performed without the insertion of spill code. 1 Introduction In order to effectively exploit the fine-grained parallelism in pipelined, superscalar and VLIW machines, various strategies for careful...
Minimum Cost Interprocedural Register Allocation
- In Proceedings of the 23rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages
, 1996
Cited by 13 (1 self)
Past register allocators have applied heuristics to allocate registers at the local, global, and interprocedural levels. This paper presents a polynomial time interprocedural register allocator that models the cost of allocating registers to procedures and spilling registers across calls. To find the minimum cost allocation, our allocator maps solutions from a dual network flow problem that can be solved in polynomial time. Experiments show that our interprocedural register allocator can yield significant improvements in execution time. 1 Introduction Effectively using registers can significantly decrease the execution time of a program. Common policy in current compilers using only intraprocedural register allocation is to spill at call sites registers that might be used by both the caller and callee [CHKW86]. The goal of interprocedural register allocation is to minimize execution time given the register requirements of individual procedures in a program. Based on these requirements...
Probabilistic Data Flow System with Two-Edge Profiling
- WORKSHOP ON DYNAMIC AND ADAPTIVE COMPILATION AND OPTIMIZATION (DYNAMO'00). ACM SIGPLAN NOTICES, 35(7):65 -- 72
, 2000
Cited by 9 (5 self)
Traditionally, optimization is done statically, independent of actual execution environments. For generating highly optimized code, however, runtime information can be used to adapt a program to different environments. In probabilistic data flow systems, runtime information on representative input data is exploited to compute the probability with which data flow facts may hold. Probabilistic data flow analysis can guide advanced optimizing transformations in order to improve the running time of programs. In comparison, classical data flow analysis does not take runtime information into account. All paths are equally weighted, irrespective of whether they are never, heavily, or rarely executed. In this paper we present the best solution that we can theoretically obtain for probabilistic data flow problems and compare it with the state-of-the-art one-edge approach. We show that the differences can be considerable and improvements are crucial. However, the theoretically best solution is too expe...

