Spill code minimization techniques for optimizing compilers
, 1989
Cited by 77 (0 self)
Global register allocation and spilling is commonly performed by solving a graph coloring problem. In this paper we present a new coherent set of heuristic methods for reducing the amount of spill code generated. This results in more efficient (and shorter) compiled code. Our approach has been compared to both standard and priority-based coloring algorithms, universally outperforming them. In our approach, we extend the capability of the existing algorithms in several ways. First, we use multiple heuristic functions to increase the likelihood that less spill code will be inserted. We have found three complementary heuristic functions which together appear to span a large proportion of good spill decisions. Second, we use a specially tuned greedy heuristic for determining the order of deleting (and hence coloring) the unconstrained vertices. Third, we have developed a "cleaning" technique which avoids some of the insertion of spill code in non-busy regions. (Currently with the IBM T.J. Watson Research Center.)
Spill Code Minimization via Interference Region Spilling
 in SIGPLAN Conference on Programming Language Design and Implementation
, 1997
Cited by 42 (1 self)
Many optimizing compilers perform global register allocation using a Chaitin-style graph coloring algorithm. Live ranges that cannot be allocated to registers are spilled to memory. The amount of code required to spill the live range depends on the spilling heuristic used. Chaitin's spilling heuristic offers some guidance in reducing the amount of spill code produced. However, this heuristic does not allow the partial spilling of live ranges and the reduction in spill code is limited to a local level. In this paper, we present a global technique called interference region spilling that improves the spilling granularity of any local spilling heuristic. Our technique works above the local spilling heuristic, limiting the normal insertion of spill code to a portion of each spilled live range. By partially spilling live ranges, we can achieve large reductions in dynamically executed spill code: up to 75% in some cases and an average of 33.6% across the benchmarks tested.
An Efficient Representation for Sparse Sets
 ACM LETTERS ON PROGRAMMING LANGUAGES AND SYSTEMS
, 1993
Register Promotion by Sparse Partial Redundancy Elimination of Loads and Stores
 In Proceedings of the ACM SIGPLAN 1998 Conference on Programming Language Design and Implementation
, 1998
Cited by 41 (2 self)
An algorithm for register promotion is presented based on the observation that the circumstances for promoting a memory location's value to register coincide with situations where the program exhibits partial redundancy between accesses to the memory location. The recent SSAPRE algorithm for eliminating partial redundancy using a sparse SSA representation forms the foundation for the present algorithm to eliminate redundancy among memory accesses, enabling us to achieve both computational and live-range optimality in our register promotion results. We discuss how to effect speculative code motion in the SSAPRE framework. We present two different algorithms for performing speculative code motion: the conservative speculation algorithm used in the absence of profile data, and the profile-driven speculation algorithm used when profile data are available. We define the static single use (SSU) form and develop the dual of the SSAPRE algorithm, called SSUPRE, to perform the partial redun...
Fast copy coalescing and live-range identification
 In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’02)
, 2002
Cited by 40 (3 self)
This paper presents a fast new algorithm for modeling and reasoning about interferences for variables in a program without constructing an interference graph. It then describes how to use this information to minimize copy insertion for φ-node instantiation during the conversion of the static single assignment (SSA) form into the control-flow graph (CFG), effectively yielding a new, very fast copy coalescing and live-range identification algorithm. This paper proves some properties of the SSA form that enable construction of data structures to compute interference information for variables that are considered for folding. The asymptotic complexity of our SSA-to-CFG conversion algorithm is O(nα(n)), where n is the number of instructions in the program. Performing copy folding during the SSA-to-CFG conversion eliminates the need for a separate coalescing phase while simplifying the intermediate code. This may make graph-coloring register allocation more practical in just-in-time (JIT) and other time-critical compilers. For example, Sun’s Hotspot Server Compiler already employs a graph-coloring register allocator [10]. This paper also presents an improvement to the classical interference-graph-based coalescing optimization that shows a decrease in memory usage of up to three orders of magnitude and a decrease of a factor of two in compilation time, while providing the exact same results. We present experimental results that demonstrate that our algorithm is almost as precise (within one percent on average) as the improved interference-graph-based coalescing algorithm, while requiring three times less compilation time.
A Generalized Algorithm for Graph-Coloring Register Allocation
, 2004
Cited by 39 (4 self)
Graph-coloring register allocation is an elegant and extremely popular optimization for modern machines. But as currently formulated, it does not handle two characteristics commonly found in commercial architectures. First, a single register name may appear in multiple register classes, where a class is a set of register names that are interchangeable in a particular role. Second, multiple register names may be aliases for a single hardware register. We present a generalization of graph-coloring register allocation that handles these problematic characteristics while preserving the elegance and practicality of traditional graph coloring. Our generalization adapts easily to a new target machine, requiring only the sets of names in the register classes and a map of the register aliases. It also drops easily into a well-known graph-coloring allocator, is efficient at compile time, and produces high-quality code.
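As a rough illustration of the two inputs this abstract names (the sets of names in each register class and a map of register aliases), the sketch below models a few x86 byte and word registers. It is not the paper's algorithm. The `UNITS` and `CLASSES` tables, the unit labels, and the helper names `aliases` and `available` are all hypothetical choices made for this example.

```python
# Hypothetical alias map: each register name lists the hardware units it occupies.
# Two names alias each other when they share at least one unit.
UNITS = {
    "AL": {"a_lo"}, "AH": {"a_hi"}, "AX": {"a_lo", "a_hi"},
    "BL": {"b_lo"}, "BH": {"b_hi"}, "BX": {"b_lo", "b_hi"},
}

# Hypothetical register classes: names interchangeable in a particular role.
CLASSES = {
    "byte": {"AL", "AH", "BL", "BH"},
    "word": {"AX", "BX"},
}


def aliases(r1, r2):
    """Return True if the two register names overlap in hardware."""
    return bool(UNITS[r1] & UNITS[r2])


def available(reg_class, taken):
    """Names in reg_class that do not alias any register already taken."""
    return sorted(
        r for r in CLASSES[reg_class]
        if not any(aliases(r, t) for t in taken)
    )


byte_after_ax = available("byte", {"AX"})   # AX blocks both AL and AH
word_after_al = available("word", {"AL"})   # AL blocks AX only
byte_after_al = available("byte", {"AL"})   # AH does not overlap AL
```

In this toy model, assigning `AX` to one live range removes two names from the byte class, which is the kind of cross-class interaction that a plain "one node, one color" formulation does not capture.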
Live Range Splitting in a Graph Coloring Register Allocator
, 1998
Cited by 29 (4 self)
Graph coloring is the dominant paradigm for global register allocation [8, 7, 4]. Coloring allocators use an interference graph to model the conflicts that prevent two values from sharing a register. Nodes in the graph represent live ranges, or values. An edge between two nodes indicates that they are simultaneously live and, thus, cannot share a register.
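The interference-graph model described in this abstract can be sketched in a few lines. This is a minimal sketch under simplifying assumptions, not the allocator from the paper: each live range is a single half-open interval of instruction indices (real live ranges can have holes and span branches), and coloring uses a simple highest-degree-first greedy order. The function names and the sample data are hypothetical.

```python
from itertools import combinations


def build_interference_graph(live_ranges):
    """Build an undirected graph from {name: (start, end)} half-open intervals."""
    graph = {name: set() for name in live_ranges}
    for a, b in combinations(live_ranges, 2):
        (s1, e1), (s2, e2) = live_ranges[a], live_ranges[b]
        if s1 < e2 and s2 < e1:  # intervals overlap: simultaneously live
            graph[a].add(b)
            graph[b].add(a)
    return graph


def greedy_color(graph, k):
    """Give each node one of k registers; None marks a spill candidate."""
    colors = {}
    # Visit higher-degree nodes first (assumed ordering, ties keep input order).
    for node in sorted(graph, key=lambda n: len(graph[n]), reverse=True):
        used = {colors[n] for n in graph[node] if colors.get(n) is not None}
        free = [r for r in range(k) if r not in used]
        colors[node] = free[0] if free else None
    return colors


# Hypothetical live ranges: a-b, b-c, and c-d overlap, forming a path.
live_ranges = {"a": (0, 4), "b": (2, 6), "c": (5, 8), "d": (7, 9)}
graph = build_interference_graph(live_ranges)
two_regs = greedy_color(graph, k=2)
one_reg = greedy_color(graph, k=1)
```

With two registers every live range in this example receives a register, and neighbors never share one. With a single register some live ranges come back as `None`, which is where a real allocator would insert spill code or split the live range.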
Minimizing Buffer Requirements under Rate-Optimal Schedule in Regular Dataflow Networks
 Journal of VLSI Signal Processing
, 1994
Cited by 21 (0 self)
Large-grain synchronous dataflow graphs or multi-rate graphs have the distinct feature that the nodes of the dataflow graph fire at different rates. Such multi-rate large-grain dataflow graphs have been widely regarded as a powerful programming model for DSP applications. In this paper we propose a method to minimize buffer storage requirement in constructing rate-optimal compile-time (MBRO) schedules for multi-rate dataflow graphs. We demonstrate that the constraints to minimize buffer storage while executing at the optimal computation rate (i.e. the maximum possible computation rate without storage constraints) can be formulated as a unified linear programming problem in our framework. A novel feature of our method is that it tries to minimize the memory requirement while simultaneously maximizing the computation rate. We have constructed an experimental testbed which implements our MBRO scheduling algorithm as well as (i) the widely used periodic admissible parallel schedules (also ...
Aligning parallel arrays to reduce communication
 IN FRONTIERS '95: THE 5TH SYMP. ON THE FRONTIERS OF MASSIVELY PARALLEL COMPUTATION
, 1995
Cited by 19 (1 self)
Axis and stride alignment is an important optimization in compiling data-parallel programs for distributed-memory machines. We previously developed an optimal algorithm for aligning array expressions. Here, we examine alignment for more general program graphs. We show that optimal alignment is NP-complete in this setting, so we study heuristic methods. This paper makes two contributions. First, we show how local graph transformations can reduce the size of the problem significantly without changing the best solution. This allows more complex and effective heuristics to be used. Second, we give a heuristic that can explore the space of possible solutions in a number of ways. We show that some of these strategies can give better solutions than a simple greedy approach proposed earlier. Our algorithms have been implemented; we present experimental results showing their effect on the performance of some example programs running on the CM-5.