Results 11 - 20 of 46
Rematerialization, 1992
Cited by 19 (2 self)
This paper examines a problem that arises during global register allocation -- rematerialization. If a value cannot be kept in a register, the allocator should recognize when it is cheaper to recompute the value (rematerialize it) than to store and reload it. Chaitin's original graph-coloring allocator handled simple instances of this problem correctly. This paper details a general solution to the problem and presents experimental evidence that shows its importance. Our approach is to tag individual values in the procedure's SSA graph with information specifying how each should be spilled. We use a variant of Wegman and Zadeck's sparse simple constant algorithm to propagate tags throughout the graph. The allocator then splits live ranges into values with different tags. This isolates those values that can be easily rematerialized from values that require general spilling. We modify the base allocator to use this information when estimating spill costs and introducing spill code. Our p...
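The tag propagation described in this abstract can be illustrated with a small worklist pass. This is a simplified sketch, not the paper's algorithm: the three-level lattice (`TOP`, an instruction tag, `BOTTOM`), the `defs` encoding, and the function names are all assumptions made for illustration.

```python
# Hypothetical tag lattice (an assumption for this sketch):
#   TOP            no information yet
#   ("inst", op)   the value can be recomputed by the single instruction `op`
#   BOTTOM         the value needs a general spill (store and reload)
TOP, BOTTOM = "top", "bottom"


def meet(a, b):
    """Combine two tags where values merge, e.g. at a phi node."""
    if a == TOP:
        return b
    if b == TOP:
        return a
    return a if a == b else BOTTOM


def sources(d):
    """SSA names that a definition reads its tag from."""
    if d[0] == "copy":
        return [d[1]]
    if d[0] == "phi":
        return list(d[1])
    return []


def propagate_tags(defs):
    """defs maps an SSA name to one of:
    ("inst", op), ("copy", src), ("phi", [srcs]) or ("other",)."""
    tags = {name: TOP for name in defs}
    users = {name: [] for name in defs}
    for name, d in defs.items():
        for s in sources(d):
            users[s].append(name)

    worklist = list(defs)
    while worklist:
        name = worklist.pop()
        d = defs[name]
        if d[0] == "inst":
            new = ("inst", d[1])
        elif d[0] == "other":
            new = BOTTOM
        else:  # copy or phi: meet over the tags of all sources
            new = TOP
            for s in sources(d):
                new = meet(new, tags[s])
        if new != tags[name]:
            tags[name] = new
            worklist.extend(users[name])  # revisit values that depend on this one
    return tags


example = {
    "a1": ("inst", "loadaddr A"),
    "a2": ("phi", ["a1", "a3"]),   # loop-carried copy of the same address
    "a3": ("copy", "a2"),
    "b1": ("other",),
    "b2": ("phi", ["a1", "b1"]),   # mixes a cheap value with a general one
}
tags = propagate_tags(example)
```

In this sketch, values whose final tag is an instruction are the rematerialization candidates; values tagged `BOTTOM` would fall back to ordinary spill code.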
Load/Store Range Analysis for Global Register Allocation
- Proc. of the SIGPLAN Conference on Programming Language Design and Implementation, 1994
Cited by 18 (0 self)
Live range splitting techniques improve global register allocation by splitting the live ranges of variables into segments that are individually allocated registers. Load/store range analysis is a new technique for live range splitting that is based on reaching definition and live variable analyses. Our analysis localizes the profits and the register requirements of every access to every variable to provide a fine granularity of candidates for register allocation. Experiments on a suite of C and FORTRAN benchmark programs show that a graph coloring register allocator operating on load/store ranges often provides better allocations than the same allocator operating on live ranges. Experimental results also show that the computational cost of using load/store ranges for register allocation is moderately more than the cost of using live ranges.
1 Introduction
Register allocation maps variables in an intermediate language program to either registers or memory locations in order to minimiz...
Dependence Analysis for Java
- In 12th International Workshop on Languages and Compilers for Parallel Computing, 1999
Cited by 18 (3 self)
We describe a novel approach to performing data dependence analysis for Java in the presence of Java's "non-traditional" language features such as exceptions, synchronization, and memory consistency. We introduce new classes of edges in a dependence graph to model code motion constraints arising from these language features. We present a linear-time algorithm for constructing this augmented dependence graph for an extended basic block.
1 Introduction
Data dependence analysis is a fundamental program analysis technique used by optimizing and parallelizing compilers to identify constraints on data flow, code motion, and instruction reordering [11]. It is desirable for dependence analysis to be as precise as possible so as to minimize code motion constraints and maximize opportunities for program transformations and optimizations such as instruction scheduling. Precise dependence analysis for scalar variables is well understood; e.g., an effective solution is to use SSA form [6]...
Run-time Compilation for Parallel Sparse Matrix Computations
- In Proceedings of ACM International Conference on Supercomputing, 1996
Cited by 16 (9 self)
Run-time compilation techniques have been shown effective for automating the parallelization of loops with unstructured indirect data accessing patterns. However, it is still an open problem to efficiently parallelize sparse matrix factorizations commonly used in iterative numerical problems. The difficulty is that a factorization process contains irregularly interleaved communication and computation with varying granularities and it is hard to obtain scalable performance on distributed memory machines. In this paper, we present an inspector/executor approach for parallelizing such applications by embodying automatic graph scheduling techniques to optimize interleaved communication and computation. We describe a run-time system called RAPID that provides a set of library functions for specifying irregular data objects and tasks that access these objects. The system extracts a task dependence graph from data access patterns, and executes tasks efficiently on a distributed memory machine....
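The idea of extracting a task dependence graph from data access patterns can be sketched in a few lines. This is only an illustration under stated assumptions: it is not RAPID's API, the task tuples and function names are hypothetical, and the "executor" is a sequential topological ordering rather than a distributed scheduler.

```python
from collections import defaultdict, deque


def build_task_graph(tasks):
    """Inspector step (sketch). `tasks` is a list of (name, reads, writes)
    in sequential program order. Returns {task: set(successors)} covering
    read-after-write, write-after-read and write-after-write dependences."""
    edges = {name: set() for name, _, _ in tasks}
    last_writer = {}                          # object -> task that last wrote it
    readers_since_write = defaultdict(list)   # object -> readers since that write
    for name, reads, writes in tasks:
        for obj in reads:
            if obj in last_writer:
                edges[last_writer[obj]].add(name)      # read after write
        for obj in writes:
            for reader in readers_since_write[obj]:
                if reader != name:
                    edges[reader].add(name)            # write after read
            if obj in last_writer:
                edges[last_writer[obj]].add(name)      # write after write
        for obj in reads:
            readers_since_write[obj].append(name)
        for obj in writes:
            last_writer[obj] = name
            readers_since_write[obj] = []
    return edges


def execution_order(edges):
    """Executor step (sketch): one dependence-respecting order (Kahn's algorithm)."""
    indegree = {t: 0 for t in edges}
    for successors in edges.values():
        for s in successors:
            indegree[s] += 1
    ready = deque(sorted(t for t, d in indegree.items() if d == 0))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for s in sorted(edges[task]):
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    return order


tasks = [
    ("factor_A", [], ["A"]),
    ("update_B", ["A"], ["B"]),
    ("update_C", ["A"], ["C"]),
    ("combine", ["B", "C"], ["A"]),
]
graph = build_task_graph(tasks)
order = execution_order(graph)
```

In this toy graph, `update_B` and `update_C` have no edge between them, so a real run-time system would be free to execute them in parallel.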
Extended SSA with Factored Use-Def Chains to Support Optimization and Parallelism
- In Proceedings of the 27th Annual Hawaii International Conference on System Sciences
Cited by 15 (5 self)
This paper describes our implementation of the Static Single Assignment (SSA) form of intermediate program representation in our parallelizing Fortran 90 compiler, Nascent. Although the traditional SSA form algorithm renames variables uniquely at every definition point, it is not practical to add new names to the symbol table at all assignments. Thus, most implementations actually provide def-use chains for each definition. In contrast, we provide use-def chains, so that in the intermediate representation the link at each use points to its unique reaching definition. We discuss how our approach improves the implementation and efficiency of optimization and analysis techniques such as induction variable recognition and scalar dependence identification, used in the detection of parallelism. We also support parallelism by extending the traditional SSA form into languages with parallel constructs.
1 Introduction
The Static Single Assignment (SSA) form for intermediate program flow represe...
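The use-def linking described in this abstract, where every use points to a single reaching definition without renaming variables, can be sketched for a toy structured language. This is an assumption-laden illustration rather than Nascent's implementation: the statement encoding and `link_uses` are hypothetical, and only assignments and if/else are handled (no loops).

```python
import itertools


def link_uses(block, current, defs, links, counter):
    """Walk a structured block and record, for every use, the one definition
    that reaches it. Merge points get a phi definition instead of new names.

    block:   list of ("assign", var, [used vars]) or ("if", then_block, else_block)
    current: {var: id of the definition currently reaching this point}
    defs:    {id: description of the definition}, filled in by this function
    links:   list of (using statement id, variable, reaching definition id)
    """
    for stmt in block:
        if stmt[0] == "assign":
            _, var, uses = stmt
            def_id = next(counter)
            for used in uses:
                links.append((def_id, used, current.get(used)))
            defs[def_id] = ("assign", var)
            current[var] = def_id
        else:
            _, then_block, else_block = stmt
            then_defs = link_uses(then_block, dict(current), defs, links, counter)
            else_defs = link_uses(else_block, dict(current), defs, links, counter)
            for var in sorted(set(then_defs) | set(else_defs)):
                a, b = then_defs.get(var), else_defs.get(var)
                if a == b:
                    current[var] = a
                else:
                    # Different definitions arrive from the two branches:
                    # factor them through a single phi definition.
                    phi_id = next(counter)
                    defs[phi_id] = ("phi", var, a, b)
                    current[var] = phi_id
    return current


program = [
    ("assign", "x", []),                      # definition 0
    ("if", [("assign", "x", ["x"])], []),     # definition 1 uses definition 0
    ("assign", "y", ["x"]),                   # uses the phi that merges 1 and 0
]
defs, links = {}, []
link_uses(program, {}, defs, links, itertools.count())
```

Here the variable `x` keeps its single symbol-table name throughout; only the links distinguish which definition a use refers to.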
An Efficient Construction of Parallel Static Single Assignment Form for Structured Parallel Programs, 1991
Cited by 10 (2 self)
This paper describes an efficient method of computing Static Single Assignment form for explicitly parallel programs using parallel sections and wait clauses. Static Single Assignment form [Cytron et al. 89] is an efficient intermediate representation and has been used as a platform for various classical code optimization algorithms [Rosen et al. 88], [Alpern et al. 88], [Wegman & Zadeck 85]. We believe the SSA form will also be useful in performing code optimizations on parallel programs and our techniques will allow us to extend existing algorithms to optimize this class of explicitly parallel programs.
An Evaluation of the Potential Benefits of Register Allocation for Array References
- In Workshop on Interaction between Compilers and Computer Architectures in conjunction with the HPCA-2, 1996
Cited by 10 (1 self)
Array references are common in scientific and engineering programs. However, current compilers for scalar processors do not keep array elements in registers for reuse. This is mainly because it is unclear whether the potential benefits would justify the more difficult array data flow analysis and also because processor architectures have yet to provide efficient mechanisms for addressing array elements in registers. The goal of this paper is to evaluate the potential benefits of register allocation for array elements in scalar processors. Using trace-driven simulations of a set of benchmarking programs, which include nine from the Perfect Club Suite, four from SPEC92, and four from an image processing benchmark suite, we study the potential benefit. We relate the register reference percentage to the data reference characteristics of different programs and examine the effect of varying the number of registers from 16 to 512. We also discuss the effect of code reordering and compiler ana...
Register allocation: what does the NP-Completeness proof of Chaitin et al. really prove? Or revisiting register allocation: why and how
- In Proc. of the 19th International Workshop on Languages and Compilers for Parallel Computing (LCPC ’06), 2006
Cited by 9 (3 self)
Register allocation is one of the most studied problems in compilation. It is considered an NP-complete problem since Chaitin et al., in 1981, modeled the problem of assigning temporary variables to k machine registers as the problem of coloring, with k colors, the interference graph associated with the variables. The fact that the interference graph can be arbitrary proves the NP-completeness of this formulation. However, this original proof does not really show where the complexity of register allocation comes from. Recently, the re-discovery that interference graphs of SSA programs can be colored in polynomial time raised the question: Can we exploit SSA form to perform register allocation in polynomial time, without contradicting Chaitin et al.’s NP-completeness result? To address such a question and, more generally, the complexity of register allocation, we revisit Chaitin et al.’s proof to better identify the interactions between spilling (load/store insertion), coalescing/splitting (removal/insertion of moves between registers), critical edges (a property of the control-flow graph), and coloring (assignment to registers). In particular, we show that, in general (we will make clear when), it is easy to decide if temporary variables can be assigned to k registers or if some spilling is necessary. In other words, the real complexity does not come from the coloring itself (as a wrong interpretation of the proof of Chaitin et al. may suggest) but comes from the presence of critical edges and from the optimizations of spilling and coalescing.
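The k-coloring formulation mentioned in this abstract can be made concrete with a simplify-and-select heuristic in the style of Chaitin's allocator. This is a hedged sketch, not code from the paper: the function name, the graph encoding, and the spill choice (highest remaining degree) are placeholders chosen for illustration.

```python
def color_graph(interference, k):
    """Try to color an interference graph {node: set(neighbors)} with k colors.

    Returns (coloring, spilled). Nodes in `spilled` are left uncolored; a real
    allocator would insert loads/stores for them and rebuild the graph.
    """
    graph = {n: set(adj) for n, adj in interference.items()}
    stack, spilled = [], []
    while graph:
        # Simplify: a node with fewer than k neighbors can always be colored
        # later, whatever colors its remaining neighbors receive.
        node = next((n for n in sorted(graph) if len(graph[n]) < k), None)
        if node is None:
            # Stuck: choose a spill candidate (placeholder heuristic).
            node = max(sorted(graph), key=lambda n: len(graph[n]))
            spilled.append(node)
        else:
            stack.append(node)
        for neighbor in graph.pop(node):
            graph[neighbor].discard(node)

    # Select: pop nodes in reverse removal order and give each a free color.
    coloring = {}
    while stack:
        node = stack.pop()
        used = {coloring[m] for m in interference[node] if m in coloring}
        coloring[node] = next(c for c in range(k) if c not in used)
    return coloring, spilled


# A triangle with one pendant node fits in 3 registers.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
ok_coloring, ok_spills = color_graph(triangle, 3)

# Four mutually interfering values cannot fit in 3 registers.
k4 = {n: set("abcd") - {n} for n in "abcd"}
k4_coloring, k4_spills = color_graph(k4, 3)
```

The heuristic only decides colorability conservatively; the abstract's point is that the hard part lies in choosing spills and coalescing moves, which this sketch deliberately leaves as a placeholder.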
Run-time Techniques for Exploiting Irregular Task Parallelism on Distributed Memory Architectures
- Journal of Parallel and Distributed Computing, 1997
Cited by 8 (3 self)
Automatic scheduling for directed acyclic graphs (DAG) and its applications for coarse-grained irregular problems such as large n-body simulation have been studied in the literature. However, solving irregular problems with mixed granularities such as sparse matrix factorization is challenging since it requires efficient run-time support to execute a DAG schedule. In this paper, we investigate run-time optimization techniques for executing general asynchronous DAG schedules on distributed memory machines and discuss an approach for exploiting parallelism from commuting operations in the DAG model. Our solution tightly integrates the run-time scheme with a fast communication mechanism to eliminate unnecessary overhead in message buffering and copying. We present a consistency model incorporating the above optimizations, and taking advantage of task dependence properties to ensure the correctness of execution. We demonstrate the applications of this scheme in sparse matrix factorizations ...
Parallelizing Compilers: Implementation and Effectiveness, 1993
Cited by 8 (0 self)
An important thank you goes to one of my undergraduate professors, Ken Kennedy. He proposed the project that led to this thesis, and my desire to know the answer gave me the strength to complete this work. I would like to thank the languages group at Kubota Pacific Computers, Inc. for showing me that I could indeed be productive and that all problems in compilers did not take years to solve. My sanity is thanks to all of my friends from dancing, "O" runs, and everything else. They made it possible to return to work each day and eventually to graduate. I owe my parents a great debt for encouraging me to stay in graduate school even when I thought I would never finish. Last, but certainly not least, I would like to thank Don Ramsey for reading many drafts and listening to many dry runs. His input greatly helped the presentation of this thesis in both oral and written forms.

