Results 1-10 of 27
Register allocation: what does the NP-completeness proof of Chaitin et al. really prove?
In Proc. of the 19th International Workshop on Languages and Compilers for Parallel Computing (LCPC ’06), 2006
Cited by 10 (4 self)
Register allocation is one of the most studied problems in compilation. It is considered as an NP-complete problem since Chaitin et al., in 1981, modeled the problem of assigning temporary variables to k machine registers as the problem of coloring, with k colors, the interference graph associated to the variables. The fact that the interference graph can be arbitrary proves the NP-completeness of this formulation. However, this original proof does not really show where the complexity of register allocation comes from. Recently, the rediscovery that interference graphs of SSA programs can be colored in polynomial time raised the question: Can we exploit SSA form to perform register allocation in polynomial time, without contradicting Chaitin et al.’s NP-completeness result? To address such a question and, more generally, the complexity of register allocation, we revisit Chaitin et al.’s proof to better identify the interactions between spilling (load/store insertion), coalescing/splitting (removal/insertion of moves between registers), critical edges (a property of the control-flow graph), and coloring (assignment to registers). In particular, we show that, in general (we will make clear when), it is easy to decide if temporary variables can be assigned to k registers or if some spilling is necessary. In other words, the real complexity does not come from the coloring itself (as a wrong interpretation of the proof of Chaitin et al. may suggest) but comes from the presence of critical edges and from the optimizations of spilling and coalescing.
A fast cutting-plane algorithm for optimal coalescing
In Proc. of the 16th International Conference on Compiler Construction (CC ’07)
Cited by 9 (1 self)
Recent work has shown that the subtasks of register allocation (spilling, register assignment, and coalescing) can be completely separated. This work presents an algorithm for the coalescing subproblem that relies on this separation. The algorithm uses 0/1 Linear Programming (ILP), a general-purpose optimization technique, to derive optimal solutions. We provide the first optimal solutions for a benchmark called “Optimal Coalescing Challenge”, i.e., our ILP model outperforms previous approaches. Additionally, we use these optimal solutions to assess the quality of well-known heuristics. A second benchmark on SPEC CPU2000 programs emphasizes the practicality of our algorithm.
On the complexity of register coalescing
In Proc. of the International Symposium on Code Generation and Optimization (CGO ’07), 2006
Cited by 9 (2 self)
Memory transfers are becoming more important to optimize, for both performance and power consumption. With this goal in mind, new register allocation schemes are developed, which revisit not only the spilling problem but also the coalescing problem. Indeed, a more aggressive strategy to avoid load/store instructions may increase the constraints to suppress (coalesce) move instructions. This paper is devoted to the complexity of the coalescing phase, in particular in the light of recent developments on the SSA form. We distinguish several optimizations that occur in coalescing heuristics: a) aggressive coalescing removes as many moves as possible, regardless of the colorability of the resulting interference graph; b) conservative coalescing removes as many moves as possible while keeping the colorability of the graph; c) incremental conservative coalescing removes one particular move while keeping the colorability of the graph; d) optimistic coalescing coalesces moves aggressively, then gives up about as few moves as possible so that the graph becomes colorable again. We almost completely classify the NP-completeness of these problems, discussing also on the structure of the interference graph: arbitrary, chordal, or k-colorable in a greedy fashion. We believe that such a study is a necessary step for designing new coalescing strategies.
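The notion of a graph being "k-colorable in a greedy fashion" can be illustrated with a minimal Python sketch. It assumes the usual simplification scheme: repeatedly remove any vertex with fewer than k neighbors, and accept the graph if it can be emptied this way. The function name and the adjacency-dictionary representation are hypothetical choices for this illustration, not taken from the paper.

```python
def greedy_k_colorable(graph, k):
    """Return True if the graph can be emptied by repeatedly removing
    a vertex that currently has fewer than k neighbors.

    graph: dict mapping each vertex to the set of its neighbors
    (an illustrative representation, assumed for this sketch).
    """
    # Work on a copy so the caller's graph is left untouched.
    remaining = {v: set(ns) for v, ns in graph.items()}
    while remaining:
        # Look for any vertex of degree < k in the remaining graph.
        v = next((u for u in remaining if len(remaining[u]) < k), None)
        if v is None:
            return False  # every remaining vertex has degree >= k
        # Remove v and detach it from its neighbors.
        for u in remaining.pop(v):
            remaining[u].discard(v)
    return True


# A 4-cycle: 2-colorable, but not greedy-2-colorable under this scheme.
cycle4 = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "a"}}
print(greedy_k_colorable(cycle4, 3))
print(greedy_k_colorable(cycle4, 2))
```

The 4-cycle shows that, under this assumed definition, greedy k-colorability is a stronger property than k-colorability: the cycle has a valid 2-coloring, yet no vertex has degree below 2, so the simplification gets stuck.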
Scratchpad Allocation for Data Aggregates in Superperfect Graphs
, 2007
Cited by 6 (5 self)
Existing methods place data or code in scratchpad memory, i.e., SPM by either relying on heuristics or resorting to integer programming or mapping it to a graph coloring problem. In this work, the SPM allocation problem is formulated as an interval coloring problem. The key observation is that in many embedded applications, arrays (including structs as a special case) are often related in the following way: For any two arrays, their live ranges are often such that one is either disjoint from or contains the other. As a result, array interference graphs are often superperfect graphs and optimal interval colorings for such array interference graphs are possible. This has led to the development of two new SPM allocation algorithms. While differing in whether live range splits and spills are done sequentially or together, both algorithms place arrays in SPM based on examining the cliques in an interference graph. In both cases, we guarantee optimally that all arrays in an interference graph can be placed in SPM if its size is no smaller than the clique number of the graph. In the case that the SPM is not large enough, we rely on heuristics to split or spill a live range until the graph is colorable. Our experiment results using embedded benchmarks show that our algorithms can outperform graph coloring when their interference graphs are superperfect or nearly so although graph coloring is admittedly more general and may also be effective to applications with arbitrary interference graphs.
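The key observation in this abstract, that any two array live ranges are either disjoint or nested, can be checked with a short Python sketch. It assumes each live range is modeled as a single half-open interval (start, end) over program points; the function names and the sample data are hypothetical and serve only as an illustration.

```python
from itertools import combinations


def disjoint_or_nested(a, b):
    """True if half-open intervals a and b are disjoint or one contains the other."""
    (s1, e1), (s2, e2) = a, b
    disjoint = e1 <= s2 or e2 <= s1
    nested = (s1 <= s2 and e2 <= e1) or (s2 <= s1 and e1 <= e2)
    return disjoint or nested


def has_containment_structure(live_ranges):
    """Check the disjoint-or-nested property for every pair of live ranges.

    live_ranges: dict mapping an array name to its (start, end) interval
    (an assumed, simplified model of a live range).
    """
    return all(
        disjoint_or_nested(a, b)
        for a, b in combinations(live_ranges.values(), 2)
    )


ranges = {"A": (0, 10), "B": (2, 5), "C": (6, 9)}
print(has_containment_structure(ranges))                      # B and C nest inside A
print(has_containment_structure({**ranges, "D": (4, 7)}))     # D partially overlaps B
```

Real live ranges may consist of several disconnected pieces, so a single interval per array is a deliberate simplification here.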
Fast Liveness Checking for SSA-Form Programs
Cited by 6 (1 self)
Liveness analysis is an important analysis in optimizing compilers. Liveness information is used in several optimizations and is mandatory during the code-generation phase. Two drawbacks of conventional liveness analyses are that their computations are fairly expensive and their results are easily invalidated by program transformations. We present a method to check liveness of variables that overcomes both obstacles. The major advantage of the proposed method is that the analysis result survives all program transformations except for changes in the control-flow graph. For common program sizes our technique is faster and consumes less memory than conventional dataflow approaches. Thereby, we heavily make use of SSA-form properties, which allow us to completely circumvent dataflow equation solving. We evaluate the competitiveness of our approach in an industrial strength compiler. Our measurements use the integer part of the SPEC2000 benchmarks and investigate the liveness analysis used by the SSA destruction pass. We compare the net time spent in liveness computations of our implementation against the one provided by that compiler. The results show that in the vast majority of cases our algorithm, while providing the same quality of information, needs less time: an average speedup of 16%.
On the complexity of spill everywhere under SSA form
 LCTES’07
, 2007
Cited by 5 (0 self)
Compilation for embedded processors can be either aggressive (time-consuming cross-compilation) or just-in-time (embedded and usually dynamic). The heuristics used in dynamic compilation are highly constrained by limited resources, time and memory in particular. Recent results on the SSA form open promising directions for the design of new register allocation heuristics for embedded systems and especially for embedded compilation. In particular, heuristics based on tree scan with two separated phases (one for spilling, then one for coloring/coalescing) seem good candidates for designing memory-friendly, fast, and competitive register allocators. Still, also because of the side effect on power consumption, the minimization of loads and stores overhead (spilling problem) is an important issue. This paper provides an exhaustive study of the complexity of the “spill everywhere” problem in the context of the SSA form. Unfortunately, conversely to our initial hopes, many of the questions we raised lead to NP-completeness results. We identify some polynomial cases but that are impractical in JIT context. Nevertheless, they can give hints to simplify formulations for the design of aggressive allocators.
Register Allocation after Classical SSA Elimination is NP-complete
Cited by 5 (0 self)
Chaitin proved that register allocation is equivalent to graph coloring and hence NP-complete. Recently, Bouchez, Brisk, and Hack have proved independently that the interference graph of a program in static single assignment (SSA) form is chordal and therefore colorable in linear time. Can we use the result of Bouchez et al. to do register allocation in polynomial time by first transforming the program to SSA form, then performing register allocation, and finally doing the classical SSA elimination that replaces φ-functions with copy instructions? In this paper we show that the answer is no, unless P = NP: register allocation after classical SSA elimination is NP-complete. Chaitin’s proof technique does not work for programs after classical SSA elimination; instead we use a reduction from the graph coloring problem for circular arc graphs.
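Why chordal interference graphs are easy to color can be sketched in a few lines of Python. The sketch below assumes the standard textbook approach: compute a vertex order with maximum cardinality search (MCS), then color greedily in that order. On a chordal graph this order is the reverse of a perfect elimination ordering, so the greedy pass uses the minimum number of colors. The function names and the sample graph are hypothetical, and this simple version is quadratic rather than linear time.

```python
def mcs_order(graph):
    """Maximum cardinality search: repeatedly visit the unvisited vertex
    with the most already-visited neighbors.

    graph: dict mapping each vertex to the set of its neighbors.
    """
    weight = {v: 0 for v in graph}  # visited-neighbor count per unvisited vertex
    order = []
    while weight:
        v = max(weight, key=weight.get)
        order.append(v)
        del weight[v]
        for u in graph[v]:
            if u in weight:
                weight[u] += 1
    return order


def greedy_color(graph, order):
    """Give each vertex, in the given order, the smallest color unused by its neighbors."""
    color = {}
    for v in order:
        used = {color[u] for u in graph[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


# Two triangles sharing the edge b-c, plus a pendant vertex e: a chordal graph
# whose largest clique has 3 vertices.
g = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}
coloring = greedy_color(g, mcs_order(g))
print(len(set(coloring.values())))  # number of colors (registers) used
```

The number of colors equals the size of the largest clique in the example, which is the best possible for any graph.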
SSA Elimination after Register Allocation
Cited by 4 (0 self)
Compilers such as gcc use static-single-assignment (SSA) form as an intermediate representation and usually perform SSA elimination before register allocation. But the order could as well be the opposite: the recent approach of SSA-based register allocation performs SSA elimination after register allocation. SSA elimination before register allocation is straightforward and standard, while previously described approaches to SSA elimination after register allocation have shortcomings; in particular, they have problems with implementing copies between memory locations. We present spill-free SSA elimination, a simple and efficient algorithm for SSA elimination after register allocation that avoids increasing the number of spilled variables. We also present three optimizations of the core algorithm. Our experiments show that spill-free SSA elimination takes less than five percent of the total compilation time of a JIT compiler. Our optimizations reduce the number of memory accesses by more than 9% and improve the program execution time by more than 1.8%.
Register spilling and live-range splitting for SSA-form programs
In: Proceedings of the International Conference on Compiler Construction. Lecture Notes in Computer Science
, 2009
Cited by 4 (0 self)
Register allocation decides which parts of a variable’s live range are held in registers and which in memory. The compiler inserts spill code to move the values of variables between registers and memory. Since fetching data from memory is much slower than reading directly from a register, careful spill code insertion is critical for the performance of the compiled program. In this paper, we present a spilling algorithm for programs in SSA form. Our algorithm generalizes the well-known furthest-first algorithm, which is known to work well on straight-line code, to control-flow graphs. We evaluate our technique by counting the executed spilling instructions in the CINT2000 benchmark on an x86 machine. The number of executed load (store) instructions was reduced by 54.5% (61.5%) compared to a state-of-the-art linear scan allocator and reduced by 58.2% (41.9%) compared to a standard graph-coloring allocator. The runtime of our algorithm is competitive with standard linear-scan allocators.
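The furthest-first rule on straight-line code, which this paper generalizes, can be sketched in Python as follows. This is only an illustration of the classic rule, not the paper's algorithm for control-flow graphs. It assumes a straight-line program reduced to a sequence of variable uses, k registers, and a cost of one load each time a used variable is not in a register; stores and definitions are ignored. The function name and the sample sequence are hypothetical.

```python
def furthest_first_loads(uses, k):
    """Simulate k registers over a straight-line sequence of variable uses.

    When a register must be freed, evict the variable whose next use lies
    furthest in the future (or that is never used again).
    Returns the number of loads performed.
    """
    in_regs = set()
    loads = 0
    for i, v in enumerate(uses):
        if v in in_regs:
            continue  # value already in a register, no load needed
        loads += 1
        if len(in_regs) == k:
            def next_use(x):
                # Position of the next use of x after point i, or infinity.
                for j in range(i + 1, len(uses)):
                    if uses[j] == x:
                        return j
                return float("inf")

            victim = max(in_regs, key=next_use)
            in_regs.remove(victim)
        in_regs.add(v)
    return loads


uses = ["a", "b", "c", "a", "b", "d", "a"]
print(furthest_first_loads(uses, 2))
print(furthest_first_loads(uses, 3))
```

With two registers, the rule keeps `a` resident because it is reused soonest, and evicts values whose next use is further away or absent.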
Scratchpad Memory Allocation for Data Aggregates via Interval Coloring in Superperfect Graphs
Cited by 2 (2 self)
Existing methods place data or code in scratchpad memory, i.e., SPM by relying on heuristics or resorting to integer programming or mapping it to a graph coloring problem. In this paper, the SPM allocation problem for arrays is formulated as an interval coloring problem. The key observation is that in many embedded C programs, two arrays can be modeled such that either their live ranges do not interfere or one contains the other (with good accuracy). As a result, array interference graphs often form a special class of superperfect graphs (known as comparability graphs) and their optimal interval colorings become efficiently solvable. This insight has led to the development of an SPM allocation algorithm that places arrays in an interference graph in SPM by examining its maximal cliques. If the SPM is no smaller than the clique number of an interference graph, then all arrays in the graph can be placed in SPM optimally. Otherwise, we rely on containment-motivated heuristics to split or spill array live ranges until the resulting graph is optimally colorable. We have implemented our algorithm in SUIF/machSUIF and evaluated it using a set of embedded C benchmarks from MediaBench and MiBench. Compared to a graph coloring algorithm and an optimal ILP algorithm (when it runs to completion), our algorithm achieves close-to-optimal results and is superior to graph coloring for the benchmarks tested.