Register spilling and live-range splitting for SSA-form programs
 IN: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMPILER CONSTRUCTION. LECTURE NOTES IN COMPUTER SCIENCE
, 2009
Cited by 13 (2 self)
Register allocation decides which parts of a variable’s live range are held in registers and which in memory. The compiler inserts spill code to move the values of variables between registers and memory. Since fetching data from memory is much slower than reading directly from a register, careful spill code insertion is critical for the performance of the compiled program. In this paper, we present a spilling algorithm for programs in SSA form. Our algorithm generalizes the well-known furthest-first algorithm, which is known to work well on straight-line code, to control-flow graphs. We evaluate our technique by counting the executed spilling instructions in the CINT2000 benchmark on an x86 machine. The number of executed load (store) instructions was reduced by 54.5% (61.5%) compared to a state-of-the-art linear-scan allocator and reduced by 58.2% (41.9%) compared to a standard graph-coloring allocator. The runtime of our algorithm is competitive with standard linear-scan allocators.
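For readers unfamiliar with the furthest-first rule on straight-line code, here is a minimal Python sketch. It is not the paper's algorithm (which extends the idea to control-flow graphs); it only illustrates the classic eviction rule: when a value must be loaded and all registers are full, evict the value whose next use lies furthest in the future. The function name `furthest_first_loads` and the representation of the program as a flat list of variable uses are assumptions made for illustration.

```python
def furthest_first_loads(uses, k):
    """Count loads for a straight-line sequence of variable uses with k registers.

    Illustrative only: every first use and every use after an eviction
    counts as one load; stores are ignored.
    """
    in_regs = set()
    loads = 0
    for i, v in enumerate(uses):
        if v in in_regs:
            continue  # value already in a register, no spill code needed
        loads += 1
        if len(in_regs) == k:
            def next_use(x):
                # Position of the next use of x after i, or infinity if none.
                for j in range(i + 1, len(uses)):
                    if uses[j] == x:
                        return j
                return float("inf")
            # Furthest-first: evict the value that is needed last.
            victim = max(in_regs, key=next_use)
            in_regs.remove(victim)
        in_regs.add(v)
    return loads


if __name__ == "__main__":
    print(furthest_first_loads(["a", "b", "c", "a", "b", "d", "a"], 2))
```

The quadratic next-use search keeps the sketch short; a production allocator would precompute next-use distances in a single backward pass.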
Preference-Guided Register Assignment
 In Compiler Construction 2010
, 2010
Cited by 7 (0 self)
This paper deals with coalescing in SSA-based register allocation. Current coalescing techniques all require the interference graph to be built. This is generally considered to be too compile-time intensive for just-in-time compilation. In this paper, we present a biased coloring approach that gives results similar to standalone coalescers while significantly reducing compile time.
SSA Elimination after Register Allocation
Cited by 7 (0 self)
Compilers such as gcc use static-single-assignment (SSA) form as an intermediate representation and usually perform SSA elimination before register allocation. But the order could as well be the opposite: the recent approach of SSA-based register allocation performs SSA elimination after register allocation. SSA elimination before register allocation is straightforward and standard, while previously described approaches to SSA elimination after register allocation have shortcomings; in particular, they have problems with implementing copies between memory locations. We present spill-free SSA elimination, a simple and efficient algorithm for SSA elimination after register allocation that avoids increasing the number of spilled variables. We also present three optimizations of the core algorithm. Our experiments show that spill-free SSA elimination takes less than five percent of the total compilation time of a JIT compiler. Our optimizations reduce the number of memory accesses by more than 9% and improve the program execution time by more than 1.8%.
Linear scan register allocation on SSA form
 In Proceedings of the International Symposium on Code Generation and Optimization
, 2010
Cited by 7 (0 self)
The linear scan algorithm for register allocation provides a good register assignment with a low compilation overhead and is thus frequently used for just-in-time compilers. Although most of these compilers use static single assignment (SSA) form, the algorithm has not yet been applied on SSA form, i.e., SSA form is usually deconstructed before register allocation. However, the structural properties of SSA form can be used to simplify the algorithm. With only one definition per variable, lifetime intervals (the main data structure) can be constructed without data flow analysis. During allocation, some tests of interval intersection can be skipped because SSA form guarantees non-intersection. Finally, deconstruction of SSA form after register allocation can be integrated into the resolution phase of the register allocator without much additional code. We modified the linear scan register allocator of the Java HotSpot™ client compiler so that it operates on SSA form. The evaluation shows that our simpler and faster version generates equally good or slightly better machine code.
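As background for the abstract above, the following Python sketch shows the classic linear scan idea over lifetime intervals. It is not the SSA-based variant described in the paper, and it omits interval splitting and lifetime holes. The function name `linear_scan`, the `(start, end)` interval representation, and the spill heuristic (spill whichever interval ends last) are assumptions chosen for a compact illustration; `num_regs` is assumed to be at least 1.

```python
def linear_scan(intervals, num_regs):
    """Assign registers to lifetime intervals in order of increasing start.

    intervals: dict mapping a variable name to (start, end), both inclusive.
    Returns (assignment, spilled): a name -> register map and a set of names.
    """
    free = list(range(num_regs))
    active = []        # (end, name) pairs currently holding a register
    assignment = {}
    spilled = set()
    for name, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1][0]):
        # Expire intervals that ended before this one starts.
        for e, n in list(active):
            if e < start:
                active.remove((e, n))
                free.append(assignment[n])
        if free:
            assignment[name] = free.pop()
            active.append((end, name))
            continue
        # No free register: spill the interval that ends last.
        last_end, last_name = max(active)
        if last_end > end:
            # The current interval takes over the register of the spilled one.
            assignment[name] = assignment.pop(last_name)
            spilled.add(last_name)
            active.remove((last_end, last_name))
            active.append((end, name))
        else:
            spilled.add(name)
    return assignment, spilled


if __name__ == "__main__":
    demo = {"a": (0, 10), "b": (1, 3), "c": (2, 8), "d": (4, 6)}
    print(linear_scan(demo, 2))
```

In the demo, `a` has the longest lifetime and is the one spilled when `c` needs a register; `b` and `d` can share a register because their intervals do not overlap.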
An Optimistic and Conservative Register Assignment Heuristic for Chordal Graphs
, 2007
Cited by 6 (0 self)
This paper presents a new register assignment heuristic for procedures in SSA Form, whose interference graphs are chordal; the heuristic is called optimistic chordal coloring (OCC). Previous register assignment heuristics eliminate copy instructions via coalescing, in other words, merging nodes in the interference graph. Node merging, however, cannot preserve the chordal graph property, making it unappealing for SSA-based register allocation. OCC is based on graph coloring, but does not employ coalescing, and, consequently, preserves graph chordality, and does not increase its chromatic number; in this sense, OCC is conservative as well as optimistic. OCC is observed to eliminate at least as many dynamically executed copy instructions as iterated register coalescing (IRC) for a set of chordal interference graphs generated from several Mediabench and MiBench applications. In many cases, OCC and IRC were able to find optimal or near-optimal solutions for these graphs. OCC ran 1.89x faster than IRC, on average.
Optimal polynomial-time interprocedural register allocation for high-level synthesis and ASIP design
 IN PROC. INT. CONF. COMPUT.-AIDED DESIGN, 2007
, 2007
Cited by 3 (1 self)
Register allocation, in high-level synthesis and ASIP design, is the process of determining the number of registers to include in the resulting circuit or processor. The goal is to allocate the minimum number of registers such that no scalar variable is spilled to memory. Previously, an optimal polynomial-time algorithm for this problem has been presented for individual procedures represented in Static Single Assignment (SSA) Form. This result is now extended to complete programs (or subprograms), as long as: (1) each procedure is represented in SSA Form; and (2) at every procedure call, all live variables are split at the call point. With this representation, it is possible to ensure that the interprocedural interference graph (IIG) is chordal, and can therefore be colored optimally in polynomial time. An optimal coloring of the IIG can be achieved by allocating registers for each procedure individually. Previous work has shown that optimal register allocation in SSA Form does not require an interference graph. Optimal interprocedural register allocation, therefore, is achieved without constructing an interference graph, giving the optimal algorithm a significant runtime advantage over prior suboptimal heuristics.
Punctual coalescing
 In Proceedings of the International Conference on Compiler Construction, CC'10
, 2010
Cited by 3 (0 self)
Compilers use register coalescing to avoid generating code for copy instructions. For architectures with register aliasing such as x86, Smith, Ramsey, and Holloway (2004) presented a polynomial-time approach, while Scholz and Eckstein (2002) presented an optimal, exponential-time approach together with a near-optimal, quadratic-time heuristic. Both methods scale poorly after aggressive live range splitting, especially for programs in elementary form where live ranges are split at every program point. In contrast, we mentioned in a previous paper (2008), without giving details, that we have a scalable, linear-time heuristic for programs in elementary form. In an effort to formalize that heuristic, we discovered an even better algorithm, called Punctual Coalescing, which we present here. Punctual Coalescing is scalable, linear time, locally optimal in general, close to globally optimal for straight-line code, and proven correct with the Twelf theorem prover. We define global optimality with an ILP formulation and we show via experiments that Punctual Coalescing compares well to this and two other approaches.
Coordinated Resource Optimization in Behavioral Synthesis
Cited by 2 (1 self)
Reducing resource usage is one of the most important optimization objectives in behavioral synthesis due to its direct impact on power, performance and cost. The datapath in a typical design is composed of different kinds of components, including functional units, registers and multiplexers. To optimize the overall resource usage, a behavioral synthesis tool should consider all kinds of components at the same time. However, most previous work on behavioral synthesis has the limitations of (i) not being able to consider all kinds of resources globally, and/or (ii) separating the synthesis process into a sequence of optimization steps without a consistent optimization objective. In this paper we present a behavioral synthesis flow in which all types of components in the datapath are modeled and optimized consistently. The key idea is to feed to the scheduler the intentions for sharing functional units and registers in favor of the global optimization goal (such as total area), so that the scheduler could generate a schedule that makes the sharing intentions feasible. Experiments show that compared to the solution of minimizing functional unit requirements in scheduling and using the least number of functional units and registers in binding, our solution achieves a 24% reduction in total area; compared to the online tool provided by ctoverilog.com, our solution achieves a 30% reduction on average.
Program Interpolation
Cited by 2 (0 self)
Program interpolation is a new type of transformation that, given an input program written in a specially constructed Domain Specific Language (DSL), produces a family of functionally equivalent instruction sequences as output. Each sequence is an “interpolation” between the control flows of implementation strategies supplied in the input program. The purpose of the transformation is to expose behavioural differences (e.g. performance) within the sequences, and thus allow automated optimisation with respect to architectural trade-offs that are difficult to quantify and model. We present results from a prototype compiler that demonstrate a 63% speedup in the domain of multi-precision integer arithmetic.
An Optimal Linear-Time Algorithm for Interprocedural Register Allocation in High Level Synthesis Using SSA Form
, 2010
An optimal linear-time algorithm for interprocedural register allocation in high level synthesis is presented. Historically, register allocation has been modeled as a graph coloring problem, which is NP-complete in general; however, converting each procedure to static single assignment (SSA) form ensures a chordal interference graph, which can be colored in O(V+E) time; the interprocedural interference graph (IIG) is not guaranteed to be chordal after this transformation. An extension to SSA form is introduced which ensures that the IIG is chordal, and the conversion process does not increase its chromatic number. The resulting IIG can then be colored in linear time.
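To illustrate why chordality makes optimal coloring easy, here is a small Python sketch of the standard textbook technique, not the paper's interprocedural algorithm: maximum cardinality search (MCS) visits vertices in the reverse of a perfect elimination order, and greedy coloring along that visit order uses the minimum number of colors on a chordal graph. The function names `mcs_order` and `color_chordal` and the adjacency-dict representation are assumptions for illustration. For brevity this version selects the next vertex with a linear scan, so it runs in O(V²) rather than the O(V+E) achievable with bucketed weights.

```python
def mcs_order(adj):
    """Maximum cardinality search: repeatedly visit the unvisited vertex
    that has the most already-visited neighbours."""
    weight = {v: 0 for v in adj}   # unvisited vertices and their counts
    order = []
    while weight:
        v = max(weight, key=weight.get)
        order.append(v)
        del weight[v]
        for u in adj[v]:
            if u in weight:
                weight[u] += 1
    return order


def color_chordal(adj):
    """Greedy coloring along the MCS visit order.

    On a chordal graph the already-colored neighbours of each vertex form
    a clique, so the number of colors used equals the largest clique size.
    """
    color = {}
    for v in mcs_order(adj):
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


if __name__ == "__main__":
    # Two triangles sharing the edge b-c, plus a pendant vertex e.
    graph = {
        "a": {"b", "c"},
        "b": {"a", "c", "d"},
        "c": {"a", "b", "d"},
        "d": {"b", "c", "e"},
        "e": {"d"},
    }
    print(color_chordal(graph))
```

In register-allocation terms, vertices are variables, edges are interferences, and colors are registers; the example graph has a largest clique of size 3 and is colored with 3 registers.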