Results 1 - 10
of
277
Efficiently computing static single assignment form and the control dependence graph
- ACM TRANSACTIONS ON PROGRAMMING LANGUAGES AND SYSTEMS
, 1991
"... In optimizing compilers, data structure choices directly influence the power and efficiency of practical program optimization. A poor choice of data structure can inhibit optimization or slow compilation to the point that advanced optimization features become undesirable. Recently, static single ass ..."
Abstract
-
Cited by 749 (7 self)
- Add to MetaCart
In optimizing compilers, data structure choices directly influence the power and efficiency of practical program optimization. A poor choice of data structure can inhibit optimization or slow compilation to the point that advanced optimization features become undesirable. Recently, static single assignment form and the control dependence graph have been proposed to represent data flow and control flow propertiee of programs. Each of these previously unrelated techniques lends efficiency and power to a useful class of program optimization. Although both of these structures are attractive, the difficulty of their construction and their potential size have discouraged their use. We present new algorithms that efficiently compute these data structures for arbitrary control flow graphs. The algorithms use dominance frontiers, a new concept that may have other applications. We also give analytical and experimental evidence that all of these data structures are usually linear in the size of the original program. This paper thus presents strong evidence that these structures can be of practical use in optimization.
The Superblock: An effective technique for VLIW and superscalar compilation
- THE JOURNAL OF SUPERCOMPUTING
, 1993
"... A compiler for VLIW and superscalar processors must expose sufficient instruction-level parallelism (ILP) to eddectively utilize the parallel hardware. However, ILP within basic blocks is extremely limited for control-intensive programs. We have developed a set of techniques for exploiting ILP acros ..."
Abstract
-
Cited by 249 (26 self)
- Add to MetaCart
A compiler for VLIW and superscalar processors must expose sufficient instruction-level parallelism (ILP) to eddectively utilize the parallel hardware. However, ILP within basic blocks is extremely limited for control-intensive programs. We have developed a set of techniques for exploiting ILP across basic block boundaries. These techniques are based on a novel structure called the superblock. The superblock enables the optimizer and scheduler to extract more ILP along the important execution paths by systematically removing constraints due to the unimportant paths. Superblock optimization and scheduling have been implemented in the IMPACT-I compiler. This implementation gives us a unique opportunity to fully understand the issues involved in incorporating these techniques into a real compiler. Superblock optimizations and scheduling are shown to be useful while taking into account a variety of architectural features.
IMPACT: An architectural framework for multiple-instruction-issue processors
- in Proceedings of the 18th International Symposium on Computer Architecture
, 1991
"... The performance of multiple-instruction-issue processors can be severely limited by the compiler's ability to generate e cient code for concurrent hardware. In the IM-PACT project, we havedeveloped IMPACT-I, a highly optimizing C compiler to exploit instruction level concurrency. The optimization ca ..."
Abstract
-
Cited by 203 (41 self)
- Add to MetaCart
The performance of multiple-instruction-issue processors can be severely limited by the compiler's ability to generate e cient code for concurrent hardware. In the IM-PACT project, we havedeveloped IMPACT-I, a highly optimizing C compiler to exploit instruction level concurrency. The optimization capabilities of the IMPACT-I C compiler are summarized in this paper. Using the IMPACT-I C compiler, we ran experiments to analyze the performance of multiple-instruction-issue processors executing some important non-numerical programs. The multiple-instruction-issue processors achieve solid speedup over high-performance single-instruction-issue processors. We ran experiments to characterize the following architectural design issues: code scheduling model, instruction issue rate, memory load latency, and function unit resource limitations. Based on the experimental results, we propose the IMPACT Architectural Framework, a set of architectural features that best support the IMPACT-I C compiler to generate e cient code for multiple-instructionissue processors. By supporting these architectural features, multiple-instruction-issue implementations of existing and new architectures receive immediate compilation support from the IMPACT-I C compiler. 1
Improving Register Allocation for Subscripted Variables
, 1990
"... INTRODUCTION By the late 1980s, memory system performance and CPU performance had already begun to diverge. This trend made effective use of the register file imperative for excellent performance. Although most compilers at that time allocated scalar variables to registers using graph coloring with ..."
Abstract
-
Cited by 192 (34 self)
- Add to MetaCart
INTRODUCTION By the late 1980s, memory system performance and CPU performance had already begun to diverge. This trend made effective use of the register file imperative for excellent performance. Although most compilers at that time allocated scalar variables to registers using graph coloring with marked success [12, 13, 14, 6], allocation of array values to registers only occurred in rare circumstances because standard data-flow analysis techniques could not uncover the available reuse of array memory locations. This deficiency was especially problematic for scientific codes since a majority of the computation involves array references. Our original paper addressed this problem by presenting an algorithm and experiment for a loop transformation, called scalar replacement, that exposed the reuse available in array references in an innermost loop. It also demonstrated experimentally how another loop transformation, called unroll-and-jam [2], could expose more opportunities for scalar…
Formal certification of a compiler back-end, or: programming a compiler with a proof assistant
- IN PROC. 33RD ACM SYMPOSIUM ON PRINCIPLES OF PROGRAMMING LANGUAGES (POPL ’06
, 2006
"... This paper reports on the development and formal certification (proof of semantic preservation) of a compiler from Cminor (a C-like imperative language) to PowerPC assembly code, using the Coq proof assistant both for programming the compiler and for proving its correctness. Such a certified compile ..."
Abstract
-
Cited by 186 (11 self)
- Add to MetaCart
This paper reports on the development and formal certification (proof of semantic preservation) of a compiler from Cminor (a C-like imperative language) to PowerPC assembly code, using the Coq proof assistant both for programming the compiler and for proving its correctness. Such a certified compiler is useful in the context of formal methods applied to the certification of critical software: the certification of the compiler guarantees that the safety properties proved on the source code hold for the executable compiled code as well.
The Multiflow Trace Scheduling Compiler
- Journal of Supercomputing
, 1993
"... The Multiflow compiler uses the trace scheduling algorithm to find and exploit instruction-level parallelism beyond basic blocks. The compiler generates code for VLIW computers that issue up to 28 operations each cycle and maintain more than 50 operations in flight. At Multiflow the compiler generat ..."
Abstract
-
Cited by 169 (1 self)
- Add to MetaCart
The Multiflow compiler uses the trace scheduling algorithm to find and exploit instruction-level parallelism beyond basic blocks. The compiler generates code for VLIW computers that issue up to 28 operations each cycle and maintain more than 50 operations in flight. At Multiflow the compiler generated code for eight different target machine architectures and compiled over 50 million lines of FORTRAN and C applications and systems code. The requirement of finding large amounts of parallelism in ordinary programs, the trace scheduling algorithm, and the many unique features of the Multiflow hardware placed novel demands on the compiler. New techniques in instruction scheduling, register allocation, memory-bank management, and intermediate-code optimizations were developed, as were refinements to reduce the overhead of trace scheduling. This paper describes the Multiflow compiler and reports on the Multiflow practice and experience with compiling for instruction-level parallelism beyond basic blocks.
Instruction-Level Parallel Processing: History, Overview and Perspective
, 1992
"... Instruction-level Parallelism CILP) is a family of processor and compiler design techniques that speed up execution by causing individual machine operations to execute in parallel. Although ILP has appeared in the highest performance uniprocessors for the past 30 years, the 1980s saw it become a muc ..."
Abstract
-
Cited by 166 (0 self)
- Add to MetaCart
Instruction-level Parallelism CILP) is a family of processor and compiler design techniques that speed up execution by causing individual machine operations to execute in parallel. Although ILP has appeared in the highest performance uniprocessors for the past 30 years, the 1980s saw it become a much more significant force in computer design. Several systems were built, and sold commercially, which pushed ILP far beyond where it had been before, both in terms of the amount of ILP offered and in the central role ILP played in the design of the system. By the end of the decade, advanced microprocessor design at all major CPU manufacturers had incorporated ILP, and new techniques for ILP have become a popular topic at academic conferences. This article provides an overview and historical perspective of the field of ILP and its development over the past three decades.
Improvements to Graph Coloring Register Allocation
- ACM Transactions on Programming Languages and Systems
, 1994
"... This paper describes both the techniques themselves and our experience building and using register allocators that incorporate them. It provides a detailed description of optimistic coloring and rematerialization. It presents experimental data to show the performance of several versions of the regis ..."
Abstract
-
Cited by 158 (8 self)
- Add to MetaCart
This paper describes both the techniques themselves and our experience building and using register allocators that incorporate them. It provides a detailed description of optimistic coloring and rematerialization. It presents experimental data to show the performance of several versions of the register allocator on a suite of FORTRAN programs. It discusses several insights that we discovered only after repeated implementation of these allocators. Categories and Subject Descriptors: D.3.4 [Programming Languages]: Processors---compi l ers , optimization General terms: Languages Additional Key Words and Phrases: Register allocation, code generation, graph coloring 1. INTRODUCTION The relationship between run-time performance and e#ective use of a machine's register set is well understood. In a compiler, the process of deciding which values to keep in registers at each point in the generated code is called register allocation. Value
Approximate graph coloring by semidefinite programming
- Proc. 35 th IEEE FOCS, IEEE
, 1994
"... a coloring is called the chromatic number of�, and is usually denoted by��.Determining the chromatic number of a graph is known to be NP-hard (cf. [19]). Besides its theoretical significance as a canonical NPhard problem, graph coloring arises naturally in a variety of applications such as register ..."
Abstract
-
Cited by 154 (7 self)
- Add to MetaCart
a coloring is called the chromatic number of�, and is usually denoted by��.Determining the chromatic number of a graph is known to be NP-hard (cf. [19]). Besides its theoretical significance as a canonical NPhard problem, graph coloring arises naturally in a variety of applications such as register allocation [11, 12, 13] is the maximum degree of any vertex. Be-and timetable/examination scheduling [8, 40]. In many We consider the problem of coloring�-colorable graphs with the fewest possible colors. We give a randomized polynomial time algorithm which colors a 3-colorable graph on vertices with� � ���� colors where sides giving the best known approximation ratio in terms of, this marks the first non-trivial approximation result as a function of the maximum degree. This result can be generalized to�-colorable graphs to obtain a coloring using�� � ��� � � � �colors. Our results are inspired by the recent work of Goemans and Williamson who used an algorithm for semidefinite optimization problems, which generalize linear programs, to obtain improved approximations for the MAX CUT and MAX 2-SAT problems. An intriguing outcome of our work is a duality relationship established between the value of the optimum solution to our semidefinite program and the Lovász�-function. We show lower bounds on the gap between the optimum solution of our semidefinite program and the actual chromatic number; by duality this also demonstrates interesting new facts about the�-function. 1
Register Allocation via Graph Coloring
, 1992
"... Chaitin and his colleagues at IBM in Yorktown Heights built the first global register allocator based on graph coloring. This thesis describes a series of improvements and extensions to the Yorktown allocator. There are four primary results: Optimistic coloring Chaitin's coloring heuristic pessimis ..."
Abstract
-
Cited by 133 (4 self)
- Add to MetaCart
Chaitin and his colleagues at IBM in Yorktown Heights built the first global register allocator based on graph coloring. This thesis describes a series of improvements and extensions to the Yorktown allocator. There are four primary results: Optimistic coloring Chaitin's coloring heuristic pessimistically assumes any node of high degree will not be colored and must therefore be spilled. By optimistically assuming that nodes of high degree will receive colors, I often achieve lower spill costs and faster code; my results are never worse. Coloring pairs The pessimism of Chaitin's coloring heuristic is emphasized when trying to color register pairs. My heuristic handles pairs as a natural consequence of its optimism. Rematerialization Chaitin et al. introduced the idea of rematerialization to avoid the expense of spilling and reloading certain simple values. By propagating rematerialization information around the SSA graph using a simple variation of Wegman and Zadeck's constant propag...

