Register Allocation via Graph Coloring
1992
Cited by 135 (4 self)
Chaitin and his colleagues at IBM in Yorktown Heights built the first global register allocator based on graph coloring. This thesis describes a series of improvements and extensions to the Yorktown allocator. There are four primary results:
Optimistic coloring. Chaitin's coloring heuristic pessimistically assumes any node of high degree will not be colored and must therefore be spilled. By optimistically assuming that nodes of high degree will receive colors, I often achieve lower spill costs and faster code; my results are never worse.
Coloring pairs. The pessimism of Chaitin's coloring heuristic is emphasized when trying to color register pairs. My heuristic handles pairs as a natural consequence of its optimism.
Rematerialization. Chaitin et al. introduced the idea of rematerialization to avoid the expense of spilling and reloading certain simple values. By propagating rematerialization information around the SSA graph using a simple variation of Wegman and Zadeck's constant propag...
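The optimistic-coloring idea in the abstract above can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function name `optimistic_color`, the adjacency-set graph encoding, and the use of plain degree as the spill-candidate metric (a real allocator would weigh spill costs) are all assumptions made for illustration. The point it shows is that a node with degree >= k is pushed on the stack anyway, and is spilled only if no color turns out to be free when it is popped.

```python
def optimistic_color(graph, k):
    """Color an interference graph {node: set(neighbors)} with k colors.

    Returns (coloring, spilled). Illustrative sketch only.
    """
    degree = {n: len(adj) for n, adj in graph.items()}
    remaining = set(graph)
    stack = []

    # Simplify phase: remove nodes one at a time and push them on a stack.
    while remaining:
        easy = [n for n in remaining if degree[n] < k]
        if easy:
            # Trivially colorable: fewer than k neighbors remain.
            n = min(easy, key=lambda x: (degree[x], str(x)))
        else:
            # A pessimistic allocator would mark a node for spilling here.
            # The optimistic variant defers that decision to the select phase.
            # Assumption: highest degree stands in for a real spill metric.
            n = max(remaining, key=lambda x: (degree[x], str(x)))
        remaining.remove(n)
        stack.append(n)
        for m in graph[n]:
            if m in remaining:
                degree[m] -= 1

    # Select phase: pop nodes and give each the lowest color not used
    # by an already-colored neighbor; spill only if none is free.
    coloring, spilled = {}, []
    while stack:
        n = stack.pop()
        used = {coloring[m] for m in graph[n] if m in coloring}
        free = [c for c in range(k) if c not in used]
        if free:
            coloring[n] = free[0]
        else:
            spilled.append(n)
    return coloring, spilled


# A 4-cycle with k = 2: every node has degree 2, so no node is trivially
# colorable, yet the graph is 2-colorable and nothing needs to be spilled.
square = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
coloring, spilled = optimistic_color(square, 2)
```

On the 4-cycle, a heuristic that spills every node of degree >= k would give up immediately, while the deferred decision finds a valid 2-coloring.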
A Simple, Fast Dominance Algorithm
Cited by 30 (0 self)
The problem of finding the dominators in a control-flow graph has a long history in the literature. The original algorithms suffered from a large asymptotic complexity but were easy to understand. Subsequent work improved the time bound, but generally sacrificed both simplicity and ease of implementation. This paper returns to a simple formulation of dominance as a global data-flow problem. Some insights into the nature of dominance lead to an implementation of an O(N^2) algorithm that runs faster, in practice, than the classic Lengauer-Tarjan algorithm, which has a time bound of O(E log(N)). We compare the algorithm to Lengauer-Tarjan because it is the best known and most widely used of the fast algorithms for dominance. Working from the same implementation insights, we also rederive (from earlier work on control dependence by Ferrante, et al.) a method for calculating dominance frontiers that we show is faster than the original algorithm by Cytron, et al. The aim of this paper is not to present a new algorithm, but, rather, to make an argument based on empirical evidence that algorithms with discouraging asymptotic complexities can be faster in practice than those more commonly employed. We show that, in some cases, careful engineering of simple algorithms can overcome theoretical advantages, even when problems grow beyond realistic sizes. Further, we argue that the algorithms presented herein are intuitive and easily implemented, making them excellent teaching tools.
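As a rough illustration of the iterative data-flow formulation that the abstract above describes, the following Python sketch computes immediate dominators by repeatedly intersecting predecessor information in reverse postorder. It is a sketch under stated assumptions, not the paper's code: the function name `compute_idoms`, the `{node: [successors]}` graph encoding, and the recursive depth-first search (suitable only for small graphs) are choices made here for brevity.

```python
def compute_idoms(succ, entry):
    """Return {node: immediate dominator} for nodes reachable from entry.

    succ maps each node to a list of successors. The entry node is
    recorded as its own immediate dominator. Illustrative sketch only.
    """
    # Depth-first search to obtain a postorder numbering.
    order, seen = [], set()

    def dfs(n):
        seen.add(n)
        for s in succ.get(n, []):
            if s not in seen:
                dfs(s)
        order.append(n)

    dfs(entry)
    po = {n: i for i, n in enumerate(order)}

    # Predecessor lists, restricted to reachable nodes.
    preds = {n: [] for n in order}
    for n in order:
        for s in succ.get(n, []):
            preds[s].append(n)

    idom = {entry: entry}

    def intersect(a, b):
        # Walk both nodes up the current dominator tree until they meet.
        while a != b:
            while po[a] < po[b]:
                a = idom[a]
            while po[b] < po[a]:
                b = idom[b]
        return a

    changed = True
    while changed:
        changed = False
        for n in reversed(order):  # reverse postorder
            if n == entry:
                continue
            new = None
            for p in preds[n]:
                if p in idom:  # only predecessors processed so far
                    new = p if new is None else intersect(p, new)
            if idom.get(n) != new:
                idom[n] = new
                changed = True
    return idom


# Diamond-shaped CFG: A branches to B and C, which both join at D.
diamond = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
idoms = compute_idoms(diamond, "A")
```

In the diamond, neither B nor C dominates D, so the intersection walks both up to A, which becomes D's immediate dominator.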
Operator Strength Reduction
1995
Cited by 29 (9 self)
This paper presents a new algorithm for operator strength reduction, called OSR. OSR improves upon an earlier algorithm due to Allen, Cocke, and Kennedy [Allen et al. 1981]. OSR operates on the static single assignment (SSA) form of a procedure [Cytron et al. 1991]. By taking advantage of the properties of SSA form, we have derived an algorithm that is simple to understand, quick to implement, and, in practice, fast to run. Its asymptotic complexity is, in the worst case, the same as the Allen, Cocke, and Kennedy algorithm (ACK). OSR achieves optimization results that are equivalent to those obtained with the ACK algorithm. OSR has been implemented in several research and production compilers.
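For readers unfamiliar with the transformation itself, the Python sketch below shows the effect that operator strength reduction aims for, written by hand on a single loop. It does not implement OSR or its SSA-based analysis; the function names and the address-computation example are hypothetical and chosen only to show a per-iteration multiply being replaced by an addition on a new induction variable.

```python
def addresses_with_multiply(base, stride, n):
    # Original form: one multiplication per iteration.
    return [base + i * stride for i in range(n)]


def addresses_strength_reduced(base, stride, n):
    # Reduced form: a new induction variable tracks base + i * stride,
    # so each iteration needs only an addition.
    out = []
    addr = base  # value of base + 0 * stride
    for _ in range(n):
        out.append(addr)
        addr += stride
    return out


before = addresses_with_multiply(1000, 8, 5)
after = addresses_strength_reduced(1000, 8, 5)
```

Both functions produce the same sequence of addresses; the second simply computes it with cheaper operations inside the loop.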
Fast copy coalescing and live-range identification
In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’02)
2002
Cited by 27 (1 self)
This paper presents a fast new algorithm for modeling and reasoning about interferences for variables in a program without constructing an interference graph. It then describes how to use this information to minimize copy insertion for φ-node instantiation during the conversion of the static single assignment (SSA) form into the control-flow graph (CFG), effectively yielding a new, very fast copy coalescing and live-range identification algorithm. This paper proves some properties of the SSA form that enable construction of data structures to compute interference information for variables that are considered for folding. The asymptotic complexity of our SSA-to-CFG conversion algorithm is O(nα(n)), where n is the number of instructions in the program. Performing copy folding during the SSA-to-CFG conversion eliminates the need for a separate coalescing phase while simplifying the intermediate code. This may make graph-coloring register allocation more practical in just-in-time (JIT) and other time-critical compilers. For example, Sun’s Hotspot Server Compiler already employs a graph-coloring register allocator [10]. This paper also presents an improvement to the classical interference-graph-based coalescing optimization that shows a decrease in memory usage of up to three orders of magnitude and a decrease of a factor of two in compilation time, while providing the exact same results. We present experimental results that demonstrate that our algorithm is almost as precise (within one percent on average) as the improved interference-graph-based coalescing algorithm, while requiring one third of the compilation time.