Results 1 - 10
of
23
Register Allocation via Graph Coloring
, 1992
"... Chaitin and his colleagues at IBM in Yorktown Heights built the first global register allocator based on graph coloring. This thesis describes a series of improvements and extensions to the Yorktown allocator. There are four primary results: Optimistic coloring Chaitin's coloring heuristic pessimis ..."
Abstract
-
Cited by 133 (4 self)
- Add to MetaCart
Chaitin and his colleagues at IBM in Yorktown Heights built the first global register allocator based on graph coloring. This thesis describes a series of improvements and extensions to the Yorktown allocator. There are four primary results: Optimistic coloring Chaitin's coloring heuristic pessimistically assumes any node of high degree will not be colored and must therefore be spilled. By optimistically assuming that nodes of high degree will receive colors, I often achieve lower spill costs and faster code; my results are never worse. Coloring pairs The pessimism of Chaitin's coloring heuristic is emphasized when trying to color register pairs. My heuristic handles pairs as a natural consequence of its optimism. Rematerialization Chaitin et al. introduced the idea of rematerialization to avoid the expense of spilling and reloading certain simple values. By propagating rematerialization information around the SSA graph using a simple variation of Wegman and Zadeck's constant propag...
Compositional Analysis of Modular Logic Programs
- In Proc. Twentieth Annual ACM Symp. on Principles of Programming Languages
, 1993
"... This paper describes a semantic basis for a compositional approach to the analysis of logic programs. A logic program is viewed as consisting of a set of modules, each module defining a subset of the program's predicates. Analyses are constructed by considering abstract interpretations of a compo ..."
Abstract
-
Cited by 52 (10 self)
- Add to MetaCart
This paper describes a semantic basis for a compositional approach to the analysis of logic programs. A logic program is viewed as consisting of a set of modules, each module defining a subset of the program's predicates. Analyses are constructed by considering abstract interpretations of a compositional semantics. The abstract meaning of a module corresponds to its analysis and composition of abstract meanings corresponds to composition of analyses. Such an approach is essential for large program development so that altering one module does not require re-analysis of the entire program. A compositional analysis for ground dependencies is included to illustrate the approach. To the best of our knowledge this is the first account of a compositional framework for the analysis of (logic) programs. 1 Introduction It is widely acknowledged that as the size of a program increases, it becomes impractical to maintain it as a single monolithic structure. Instead, the program has to be...
No Assembly Required: Compiling Standard ML to C
- ACM Letters on Programming Languages and Systems
, 1990
"... C has been used as a portable target language for implementing languages like Standard ML and Scheme. Previous efforts at compiling these languages to C have produced efficient code, but also compromised on portability and proper tail-recursion. We show how to compile Standard ML to C without making ..."
Abstract
-
Cited by 50 (1 self)
- Add to MetaCart
C has been used as a portable target language for implementing languages like Standard ML and Scheme. Previous efforts at compiling these languages to C have produced efficient code, but also compromised on portability and proper tail-recursion. We show how to compile Standard ML to C without making such compromises. The compilation technique is based on converting Standard ML to a continuation-passing style -calculus intermediate language, and then compiling the continuationpassing style -calculus to C. The generated code achieves an execution speed that is about a factor of two slower than a native code compiler. The generated code is highly portable, yet still supports advanced features like garbage-collection and first-class continuations. We analyze aspects of the compilation method that lead to the observed slowdown, and suggest changes to C compilers that would better support the compilation method. Categories and Subject Descriptors: D.3.4 [Programming Languages]: Processors-...
An Experiment with Inline Substitution
, 1991
"... This paper describes an experiment undertaken to evaluate the effectiveness of inline substitution as a method of improving the running time of compiled code. Our particular interests are in the interaction between inline substitution and aggressive code optimization. To understand this relationship ..."
Abstract
-
Cited by 42 (9 self)
- Add to MetaCart
This paper describes an experiment undertaken to evaluate the effectiveness of inline substitution as a method of improving the running time of compiled code. Our particular interests are in the interaction between inline substitution and aggressive code optimization. To understand this relationship, we used commercially available FORTRAN optimizing compilers as the basis for our study. This paper reports on the effectiveness of the various compilers at optimizing the inlined code. We examine both the runtime performance of the resulting code and the compile-time performance of the compilers. This work can be viewed as a study of the effectiveness of inlining in modern optimizers; alternatively, it can be viewed as one data point on the overall effectiveness of modern optimizing compilers. We discovered that, with optimizing FORTRAN compilers, (1) object-code growth from inlining is substantially smaller than source-code growth, (2) compile-time growth from inlining is smaller than source-code growth, and (3) the compilers we tested were not able to capitalize consistently inlining
alto: A Link-Time Optimizer for the Compaq Alpha
- Software - Practice and Experience
, 1999
"... Traditional optimizing compilers are limited in the scope of their optimizations by the fact that only a single function, or possibly a single module, is available for analysis and optimization. In particular, this means that library routines cannot be optimized to specific calling contexts. Other ..."
Abstract
-
Cited by 41 (13 self)
- Add to MetaCart
Traditional optimizing compilers are limited in the scope of their optimizations by the fact that only a single function, or possibly a single module, is available for analysis and optimization. In particular, this means that library routines cannot be optimized to specific calling contexts. Other optimization opportunities, exploiting information not available before linktime such as addresses of variables and the final code layout, are often ignored because linkers are traditionally unsophisticated. A possible solution is to carry out whole-program optimization at link time. This paper describes alto, a link-time optimizer for the Compaq Alpha architecture. It is able to realize significant performance improvements even for programs compiled with a good optimizing compiler with a high level of optimization. The resulting code is considerably faster that that obtained using the OM link-time optimizer, even when the latter is used in conjunction with profile-guided and inter-fi...
Hot Cold Optimization of Large Windows/NT Applications
- Proc. MICRO29
, 1996
"... A dynamic instruction trace often contains many unnecessary instructions that are required only by the unexecuted portion of the program. Hot-cold optimization (HCO) is a technique that realizes this performance opportunity. HCO uses profile information to partition each routine into frequently exec ..."
Abstract
-
Cited by 32 (1 self)
- Add to MetaCart
A dynamic instruction trace often contains many unnecessary instructions that are required only by the unexecuted portion of the program. Hot-cold optimization (HCO) is a technique that realizes this performance opportunity. HCO uses profile information to partition each routine into frequently executed (hot) and infrequently executed (cold) parts. Unnecessary operations in the hot portion are removed, and compensation code is added on transitions from hot to cold as needed. We evaluate HCO on a collection of large Windows NT applications. HCO is most effective on the programs that are call intensive and have flat profiles, providing a 3-8% reduction in path length beyond conventional optimization. 1. Introduction Typically, compiler writers study code listings to look for optimization opportunities. However, we have found that a dynamic view, studying instruction traces, gives a very different perspective on code quality[Sites95]. High quality code often looks unoptimized when viewe...
Unexpected Side Effects of Inline Substitution: A Case Study
- ACM LETTERS ON PROGRAMMING LANGUAGES AND SYSTEMS
, 1992
"... This paper describes the specific problem that we encountered, explores its origins, and examines the ability of several analytical techniques to help the compiler avoid similar problems. ..."
Abstract
-
Cited by 24 (2 self)
- Add to MetaCart
This paper describes the specific problem that we encountered, explores its origins, and examines the ability of several analytical techniques to help the compiler avoid similar problems.
A framework for unrestricted whole-program optimization
- In ACM SIGPLAN 2006 Conference on Programming Language Design and Implementation
, 2006
"... Procedures have long been the basic units of compilation in conventional optimization frameworks. However, procedures are typically formed to serve software engineering rather than optimization goals, arbitrarily constraining code transformations. Techniques, such as aggressive inlining and interpro ..."
Abstract
-
Cited by 24 (8 self)
- Add to MetaCart
Procedures have long been the basic units of compilation in conventional optimization frameworks. However, procedures are typically formed to serve software engineering rather than optimization goals, arbitrarily constraining code transformations. Techniques, such as aggressive inlining and interprocedural optimization, have been developed to alleviate this problem, but, due to code growth and compile time issues, these can be applied only sparingly. This paper introduces the Procedure Boundary Elimination (PBE) compilation framework, which allows unrestricted whole-program optimization. PBE allows all intra-procedural optimizations and analyses to operate on arbitrary subgraphs of the program, regardless of the original procedure boundaries and without resorting to inlining. In order to control compilation time, PBE also introduces novel extensions of region formation and encapsulation. PBE enables targeted code specialization, which recovers the specialization benefits of inlining while keeping code growth in check. This paper shows that PBE attains better performance than inlining with half the code growth.
Minimum Cost Interprocedural Register Allocation
- In Proceedings of the 23rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages
, 1996
"... Past register allocators have applied heuristics to allocate registers at the local, global, and interprocedural levels. This paper presents a polynomial time interprocedural register allocator that models the cost of allocating registers to procedures and spilling registers across calls. To find th ..."
Abstract
-
Cited by 13 (1 self)
- Add to MetaCart
Past register allocators have applied heuristics to allocate registers at the local, global, and interprocedural levels. This paper presents a polynomial time interprocedural register allocator that models the cost of allocating registers to procedures and spilling registers across calls. To find the minimum cost allocation, our allocator maps solutions from a dual network flow problem that can be solved in polynomial time. Experiments show that our interprocedural register allocator can yield significant improvements in execution time. 1 Introduction Effectively using registers can significantly decrease the execution time of a program. Common policy in current compilers using only intraprocedural register allocation is to spill at call sites registers that might be used by both the caller and callee[CHKW86]. The goal of interprocedural register allocation is to minimize execution time given the register requirements of individual procedures in a program. Based on these requirements...

