Spill Code Minimization via Interference Region Spilling
In SIGPLAN Conference on Programming Language Design and Implementation, 1997
Cited by 37 (1 self)
Many optimizing compilers perform global register allocation using a Chaitin-style graph coloring algorithm. Live ranges that cannot be allocated to registers are spilled to memory. The amount of code required to spill the live range depends on the spilling heuristic used. Chaitin's spilling heuristic offers some guidance in reducing the amount of spill code produced. However, this heuristic does not allow the partial spilling of live ranges, and the reduction in spill code is limited to a local level. In this paper, we present a global technique called interference region spilling that improves the spilling granularity of any local spilling heuristic. Our technique works above the local spilling heuristic, limiting the normal insertion of spill code to a portion of each spilled live range. By partially spilling live ranges, we can achieve large reductions in dynamically executed spill code: up to 75% in some cases and an average of 33.6% across the benchmarks tested.
An Efficient Representation for Sparse Sets
ACM Letters on Programming Languages and Systems, 1993
Cited by 30 (4 self)
In this paper, we have described a representation suitable for sets with a fixed-size universe. The representation supports constant-time implementations of clear-set, member, add-member, delete-member, cardinality, and choose-one. Based on the efficiency of these operations, the new representation will often be superior to alternatives such as bit vectors, balanced binary trees, hash tables, and linked lists. Additionally, the new representation supports enumeration of the members in O(n) time, making it a competitive choice for relatively sparse sets requiring operations like forall, set-copy, set-union, and set-difference.
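One well-known way to obtain these bounds is to pair a dense array of members with a sparse array of positions. The Python sketch below illustrates that idea only; the class name, method names, and field names are assumptions made for this example and are not taken from the paper. Python also initializes both arrays, so the sketch does not show the saving from skipping initialization.

```python
class SparseSet:
    """Set of integers from the fixed universe {0, ..., universe_size - 1}.

    Illustrative sketch: names are hypothetical, not the paper's.
    """

    def __init__(self, universe_size):
        self.dense = [0] * universe_size   # members, packed in positions 0..n-1
        self.sparse = [0] * universe_size  # sparse[k] = position of k in dense
        self.n = 0                         # current number of members

    def member(self, k):
        # k is a member only if its recorded position is live and points back at k.
        i = self.sparse[k]
        return i < self.n and self.dense[i] == k

    def add_member(self, k):
        if not self.member(k):
            self.dense[self.n] = k
            self.sparse[k] = self.n
            self.n += 1

    def delete_member(self, k):
        if self.member(k):
            # Move the last member into the slot freed by k, then shrink.
            last = self.dense[self.n - 1]
            i = self.sparse[k]
            self.dense[i] = last
            self.sparse[last] = i
            self.n -= 1

    def clear_set(self):
        # Constant time: stale entries are ignored by the member test.
        self.n = 0

    def cardinality(self):
        return self.n

    def choose_one(self):
        if self.n == 0:
            raise KeyError("choose_one on an empty set")
        return self.dense[self.n - 1]

    def __iter__(self):
        # Enumeration touches only the n live entries.
        return iter(self.dense[:self.n])
```

Every operation except enumeration does a fixed amount of array indexing, and enumeration visits only the current members, which matches the constant-time and O(n) claims in the abstract.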
A Generalized Algorithm for Graph-Coloring Register Allocation
2004
Cited by 30 (5 self)
Graph-coloring register allocation is an elegant and extremely popular optimization for modern machines. But as currently formulated, it does not handle two characteristics commonly found in commercial architectures. First, a single register name may appear in multiple register classes, where a class is a set of register names that are interchangeable in a particular role. Second, multiple register names may be aliases for a single hardware register. We present a generalization of graph-coloring register allocation that handles these problematic characteristics while preserving the elegance and practicality of traditional graph coloring. Our generalization adapts easily to a new target machine, requiring only the sets of names in the register classes and a map of the register aliases. It also drops easily into a well-known graph-coloring allocator, is efficient at compile time, and produces high-quality code.
Fast copy coalescing and live-range identification
In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’02), 2002
Cited by 27 (1 self)
This paper presents a fast new algorithm for modeling and reasoning about interferences for variables in a program without constructing an interference graph. It then describes how to use this information to minimize copy insertion for φ-node instantiation during the conversion of the static single assignment (SSA) form into the control-flow graph (CFG), effectively yielding a new, very fast copy coalescing and live-range identification algorithm. This paper proves some properties of the SSA form that enable construction of data structures to compute interference information for variables that are considered for folding. The asymptotic complexity of our SSA-to-CFG conversion algorithm is O(nα(n)), where n is the number of instructions in the program. Performing copy folding during the SSA-to-CFG conversion eliminates the need for a separate coalescing phase while simplifying the intermediate code. This may make graph-coloring register allocation more practical in just-in-time (JIT) and other time-critical compilers. For example, Sun’s HotSpot Server Compiler already employs a graph-coloring register allocator [10]. This paper also presents an improvement to the classical interference-graph-based coalescing optimization that shows a decrease in memory usage of up to three orders of magnitude and a decrease of a factor of two in compilation time, while providing the exact same results. We present experimental results that demonstrate that our algorithm is almost as precise (within one percent on average) as the improved interference-graph-based coalescing algorithm, while requiring three times less compilation time.
Live Range Splitting in a Graph Coloring Register Allocator
1998
Cited by 24 (3 self)
Graph coloring is the dominant paradigm for global register allocation [8, 7, 4]. Coloring allocators use an interference graph to model the conflicts that prevent two values from sharing a register. Nodes in the graph represent live ranges, or values. An edge between two nodes indicates that they are simultaneously live and, thus, cannot share a register.
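The interference graph described above can be sketched in a few lines of Python. This is a simplified illustration, not the paper's allocator: it assumes each live range is a single half-open interval of instruction indices, whereas a real allocator derives liveness from dataflow analysis over the control-flow graph. The function names and the interval encoding are assumptions made for this example.

```python
from itertools import combinations


def build_interference_graph(live_ranges):
    """Build an undirected interference graph.

    live_ranges maps a name to a half-open interval (start, end),
    a simplifying assumption for this sketch.
    Returns a dict mapping each name to the set of names it interferes with.
    """
    graph = {name: set() for name in live_ranges}
    for a, b in combinations(live_ranges, 2):
        a_start, a_end = live_ranges[a]
        b_start, b_end = live_ranges[b]
        # Two half-open intervals overlap when each starts before the other ends.
        if a_start < b_end and b_start < a_end:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def can_share_register(graph, a, b):
    """Two live ranges may share a register only if no edge joins them."""
    return b not in graph[a]


# x is dead by the time z becomes live, so only y conflicts with both.
ranges = {"x": (0, 4), "y": (2, 6), "z": (4, 8)}
graph = build_interference_graph(ranges)
```

In this example `x` and `z` are never simultaneously live, so they can share a register, while `y` needs a register of its own.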
Minimizing Buffer Requirements under Rate-Optimal Schedule in Regular Dataflow Networks
Journal of VLSI Signal Processing, 1994
Cited by 19 (0 self)
Large-grain synchronous dataflow graphs or multi-rate graphs have the distinct feature that the nodes of the dataflow graph fire at different rates. Such multi-rate large-grain dataflow graphs have been widely regarded as a powerful programming model for DSP applications. In this paper we propose a method to minimize buffer storage requirement in constructing rate-optimal compile-time (MBRO) schedules for multi-rate dataflow graphs. We demonstrate that the constraints to minimize buffer storage while executing at the optimal computation rate (i.e. the maximum possible computation rate without storage constraints) can be formulated as a unified linear programming problem in our framework. A novel feature of our method is that it tries to minimize the memory requirement while simultaneously maximizing the computation rate. We have constructed an experimental testbed which implements our MBRO scheduling algorithm as well as (i) the widely used periodic admissible parallel schedules (also ...
Aligning parallel arrays to reduce communication
In Frontiers '95: The 5th Symp. on the Frontiers of Massively Parallel Computation, 1995
Cited by 16 (1 self)
Axis and stride alignment is an important optimization in compiling data-parallel programs for distributed-memory machines. We previously developed an optimal algorithm for aligning array expressions. Here, we examine alignment for more general program graphs. We show that optimal alignment is NP-complete in this setting, so we study heuristic methods. This paper makes two contributions. First, we show how local graph transformations can reduce the size of the problem significantly without changing the best solution. This allows more complex and effective heuristics to be used. Second, we give a heuristic that can explore the space of possible solutions in a number of ways. We show that some of these strategies can give better solutions than a simple greedy approach proposed earlier. Our algorithms have been implemented; we present experimental results showing their effect on the performance of some example programs running on the CM-5.
Scalable Certification for Typed Assembly Language
2000
Cited by 14 (1 self)
A type-based certifying compiler maps source code to machine code and target-level type annotations. The target-level annotations make it possible to prove easily that the machine code is type-safe, independent of the source code or compiler. To be useful across a range of source languages and compilers, the target-language type system should provide powerful type constructors for encoding higher-level invariants. Unfortunately, it is difficult...
Approximating Maximum Stable Set and Minimum Graph Coloring Problems with the Positive Semidefinite Relaxation
Cited by 9 (1 self)
We compute approximate solutions to the maximum stable set problem and the minimum graph coloring problem using a positive semidefinite relaxation. The positive semidefinite programs are solved using an implementation of the dual scaling algorithm that takes advantage of the sparsity inherent in most graphs and the structure inherent in the problem formulation. From the solution to the relaxation, we apply a randomized algorithm to find approximate maximum stable sets and a modification of a popular heuristic to find graph colorings. We obtained high quality answers for graphs with over 1000 vertices and almost 7000 edges.
Graph Coloring on Coarse Grained Multicomputers
2002
Cited by 6 (1 self)
We present an efficient and scalable Coarse Grained Multicomputer (CGM) coloring algorithm that colors a graph G with at most D + 1 colors, where D is the maximum degree in G. This algorithm is given in two variants: randomized and deterministic. We show that on a p-processor CGM model the proposed algorithms require a parallel time of O(|G|/p) and a total work and overall communication cost of O(|G|). These bounds correspond to the average case for the randomized version and to the worst case for the deterministic variant. Key words: graph algorithms, parallel algorithms, graph coloring, Coarse Grained Multicomputers
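The D + 1 bound itself is easy to see in a sequential setting. The Python sketch below is the standard sequential greedy coloring, not the CGM algorithm from the paper; the function name and the adjacency-dict encoding are assumptions made for this example. Each vertex has at most D neighbors, so at most D colors are ever blocked and one of the colors 0..D is always free.

```python
def greedy_coloring(adjacency):
    """Color vertices one at a time with the smallest color not used by a neighbor.

    adjacency maps each vertex to the set of its neighbors (undirected graph).
    Returns a dict mapping each vertex to a color in 0..D, where D is the maximum degree.
    This is a sequential sketch, not the parallel CGM algorithm.
    """
    color = {}
    for v in adjacency:
        # Colors already taken by neighbors that have been colored so far.
        used = {color[u] for u in adjacency[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


# A 5-cycle has maximum degree D = 2, so at most 3 colors are needed.
cycle = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
coloring = greedy_coloring(cycle)
```

The parallel variants in the paper have to coordinate color choices across processors, which this sequential loop sidesteps entirely.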