Results 1 - 10
of
21
Register allocation for programs in ssa-form
- In Compiler Construction 2006, volume 3923 of LNCS
, 2006
"... In this technical report, we present an architecture for register allocation on the SSA-form. We show, how the properties of SSA-form programs and their interference graphs can be exploited to develop new methods for spilling, coloring and coalescing. We present heuristic and optimal solution method ..."
Abstract
-
Cited by 20 (3 self)
- Add to MetaCart
In this technical report, we present an architecture for register allocation on the SSA-form. We show, how the properties of SSA-form programs and their interference graphs can be exploited to develop new methods for spilling, coloring and coalescing. We present heuristic and optimal solution methods for these three subtasks. 1
Virtual machine showdown: stack versus registers
- In VEE ’05: Proceedings of the 1st ACM/USENIX international conference on Virtual execution environments
, 2005
"... Virtual machines (VMs) are commonly used to distribute programs in an architecture-neutral format, which can easily be interpreted or compiled. A long-running question in the design of VMs is whether stack architecture or register architecture can be implemented more efficiently with an interpreter. ..."
Abstract
-
Cited by 19 (0 self)
- Add to MetaCart
Virtual machines (VMs) are commonly used to distribute programs in an architecture-neutral format, which can easily be interpreted or compiled. A long-running question in the design of VMs is whether stack architecture or register architecture can be implemented more efficiently with an interpreter. We extend existing work on comparing virtual stack and virtual register architectures in two ways. Firstly, our translation from stack to register code is much more sophisticated. The result is that we eliminate an average of more than 47 % of executed VM instructions, with the register machine bytecode size only 25 % larger than that of the corresponding stack bytecode. Secondly we present an implementation of a register machine in a fully standardcompliant implementation of the Java VM. We find that, on the Pentium 4, the register architecture requires an average of 32.3 % less time to execute standard benchmarks if dispatch is performed using a C switch statement. Even if more efficient threaded dispatch is available (which requires labels as first class values), the reduction in running time is still approximately 26.5 % for the register architecture.
Extended Linear Scan: an Alternate Foundation for Global Register Allocation
"... Abstract. In this paper, we extend past work on Linear Scan register allocation, and propose two Extended Linear Scan (ELS) algorithms that retain the compiletime efficiency of past Linear Scan algorithms while delivering performance that can match or surpass that of Graph Coloring. Specifically, th ..."
Abstract
-
Cited by 9 (0 self)
- Add to MetaCart
Abstract. In this paper, we extend past work on Linear Scan register allocation, and propose two Extended Linear Scan (ELS) algorithms that retain the compiletime efficiency of past Linear Scan algorithms while delivering performance that can match or surpass that of Graph Coloring. Specifically, this paper makes the following contributions: – We highlight three fundamental theoretical limitations in using Graph Coloring as a foundation for global register allocation, and introduce a basic Extended Linear Scan algorithm, ELS0, which addresses all three limitations for the problem of Spill-Free Register Allocation. – We introduce the ELS1 algorithm which extends ELS0 to obtain a greedy algorithm for the problem of Register Allocation with Total Spills. – Finally, we present experimental results to compare the Graph Coloring and Extended Linear Scan algorithms. Our results show that the compile-time speedups for ELS1 relative to GC were significant, and varied from 15 × to 68×. In addition, the resulting execution time improved by up to 5.8%, with an average improvement of 2.3%. Together, these results show that Extended Linear Scan is promising as an alternate foundation for global register allocation, compared to Graph Coloring, due to its compile-time scalability without loss of execution time performance. 1
On the complexity of register coalescing
- In Proc. of the International Symposium on Code Generation and Optimization (CGO ’07
, 2006
"... Memory transfers are becoming more important to optimize, for both performance and power consumption. With this goal in mind, new register allocation schemes are developed, which revisit not only the spilling problem but also the coalescing problem. Indeed, a more aggressive strategy to avoid load/s ..."
Abstract
-
Cited by 8 (2 self)
- Add to MetaCart
Memory transfers are becoming more important to optimize, for both performance and power consumption. With this goal in mind, new register allocation schemes are developed, which revisit not only the spilling problem but also the coalescing problem. Indeed, a more aggressive strategy to avoid load/store instructions may increase the constraints to suppress (coalesce) move instructions. This paper is devoted to the complexity of the coalescing phase, in particular in the light of recent developments on the SSA form. We distinguish several optimizations that occur in coalescing heuristics: a) aggressive coalescing removes as many moves as possible, regardless of the colorability of the resulting interference graph; b) conservative coalescing removes as many moves as possible while keeping the colorability of the graph; c) incremental conservative coalescing removes one particular move while keeping the colorability of the graph; d) optimistic coalescing coalesces moves aggressively, then gives up about as few moves as possible so that the graph becomes colorable again. We almost completely classify the NP-completeness of these problems, discussing also on the structure of the interference graph: arbitrary, chordal, or k-colorable in a greedy fashion. We believe that such a study is a necessary step for designing new coalescing strategies. 1
Graph coloring vs. optimal register allocation for optimizing compilers
- In Joint Modular Languages Conference
, 2003
"... Abstract. Optimizing compilers play an important role for the efficient execution of programs written in high level programming languages. Current microprocessors impose the problem that the gap between processor cycle time and memory latency increases. In order to fully exploit the potential of pro ..."
Abstract
-
Cited by 7 (1 self)
- Add to MetaCart
Abstract. Optimizing compilers play an important role for the efficient execution of programs written in high level programming languages. Current microprocessors impose the problem that the gap between processor cycle time and memory latency increases. In order to fully exploit the potential of processors, nearly optimal register allocation is of paramount importance. In the predominance of the x86 architecture and in the increased usage of high-level programming languages for embedded systems peculiarities and irregularities in their register sets have to be handled. These irregularities makes the task of register allocation for optimizing compilers more difficult than for regular architectures and register files. In this article we show how optimistic graph coloring register allocation can be extended to handle these irregularities. Additionally we present an exponential algorithm which in most cases can compute an optimal solution for register allocation and copy elimination. These algorithms are evaluated on a virtual processor architecture modeling two and three operand architectures with different register file sizes. The evaluation demonstrates that the heuristic graph coloring register allocator comes close to the optimal solution for large register files, but performs badly on small register files. For small register files the optimal algorithm is fast enough to replace a heuristic algorithm. 1
A fast cutting-plane algorithm for optimal coalescing
- In Proc. of the 16 th International Conference on Compiler Construction (CC ’07
"... Abstract. Recent work has shown that the subtasks of register allocation (spilling, register assignment, and coalescing) can be completely separated. This work presents an algorithm for the coalescing subproblem that relies on this separation. The algorithm uses 0/1 Linear Programming (ILP), a gener ..."
Abstract
-
Cited by 7 (1 self)
- Add to MetaCart
Abstract. Recent work has shown that the subtasks of register allocation (spilling, register assignment, and coalescing) can be completely separated. This work presents an algorithm for the coalescing subproblem that relies on this separation. The algorithm uses 0/1 Linear Programming (ILP), a general-purpose optimization technique, to derive optimal solutions. We provide the first optimal solutions for a benchmark called “Optimal Coalescing Challenge”, i.e., our ILP model outperforms previous approaches. Additionally, we use these optimal solutions to assess the quality of well-known heuristics. A second benchmark on SPEC CPU2000 programs emphasizes the practicality of our algorithm. 1
Optimizing Scientific Application Loops on Stream Processors
"... This paper describes a graph coloring compiler framework to allocate on-chip SRF (Stream Register File) storage for optimizing scientific applications on stream processors. Our framework consists of first applying enabling optimizations such as loop unrolling to expose stream reuse and opportunities ..."
Abstract
-
Cited by 2 (2 self)
- Add to MetaCart
This paper describes a graph coloring compiler framework to allocate on-chip SRF (Stream Register File) storage for optimizing scientific applications on stream processors. Our framework consists of first applying enabling optimizations such as loop unrolling to expose stream reuse and opportunities for maximizing parallelism, i.e., overlapping kernel execution and memory transfers. Then the three SRF management tasks are solved in a unified manner via graph coloring: (1) placing streams in the SRF, (2) exploiting stream use, and (3) maximizing parallelism. We evaluate the performance of our compiler framework by actually running nine representative scientific computing kernels on our FT64 stream processor. Our preliminary results show that compiler management achieves an average speedup of 2.3x compared to First-Fit allocation. In comparison with the performance results obtained from running these benchmarks on Itanium 2, an average speedup of 2.1x is observed. Categories and Subject Descriptors D.3.4 [Programming Languages]:
Unroll-based copy elimination for enhanced pipeline scheduling
, 1997
"... Abstract Enhanced pipeline scheduling (EPS) is a software pipelining technique which can achieve a variable initiation interval (II) for loops with control flows via its code motion pipelining. EPS, however, leaves behind many renaming copy instructions that cannot be coalesced due to interferences. ..."
Abstract
-
Cited by 2 (2 self)
- Add to MetaCart
Abstract Enhanced pipeline scheduling (EPS) is a software pipelining technique which can achieve a variable initiation interval (II) for loops with control flows via its code motion pipelining. EPS, however, leaves behind many renaming copy instructions that cannot be coalesced due to interferences. These copies take resources, and more seriously, they may cause a stall if they rename a multi-latency instruction whose latency is longer than the II aimed for by EPS.
Live-range Unsplitting for Faster Optimal Coalescing
"... Register allocation is often a two-phase approach: spilling of registers to memory, followed by coalescing of registers. Extreme liverange splitting (i.e. live-range splitting after each statement) enables optimal solutions based on ILP, for both spilling and coalescing. However, while the solutions ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
Register allocation is often a two-phase approach: spilling of registers to memory, followed by coalescing of registers. Extreme liverange splitting (i.e. live-range splitting after each statement) enables optimal solutions based on ILP, for both spilling and coalescing. However, while the solutions are easily found for spilling, for coalescing they are more elusive. This difficulty stems from the huge size of interference graphs resulting from live-range splitting. This report focuses on optimal coalescing in the context of extreme liverange splitting. We present some theoretical properties that give rise to an algorithm for reducing interference graphs, while preserving optimality. This reduction consists mainly in finding and removing useless splitting points. It is followed by a graph decomposition based on clique separators. The last optimization consists in two preprocessing rules. Any coalescing technique can be applied after these optimizations. Our optimizations have been tested on a standard benchmark, the optimal coalescing challenge. For this benchmark, the cutting-plane algorithm for optimal coalescing (the only optimal algorithm for coalescing) runs 300 times faster when combined with our optimizations. Moreover, we provide all the solutions of the optimal coalescing challenge, including the 3 instances that were previously unsolved.
Optimal bitwise register allocation using integer linear programming
- In International Workshop on Languages and Compilers for Parallel Computing (LCPC’06), LNCS
, 2006
"... Abstract. This paper addresses the problem of optimal global register allocation. The register allocation problem is expressed as an integer linear programming problem and solved optimally. The model is more flexible than previous graphcoloring based methods and thus allows for register allocations ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
Abstract. This paper addresses the problem of optimal global register allocation. The register allocation problem is expressed as an integer linear programming problem and solved optimally. The model is more flexible than previous graphcoloring based methods and thus allows for register allocations with significantly fewer moves and spills. The formulation can also model complex architectural features, such as bit-wise access to registers. With bit-wise access to registers, multiple subword temporaries can be stored in a single register and accessed efficiently, resulting in a register allocation problem that cannot be addressed effectively with simple graph coloring. The paper describes techniques that can help reduce the problem size of the ILP formulation, making the algorithm feasible in practice. Preliminary empirical results from an implementation prototype are reported. 1

