Instruction-Level Parallel Processing: History, Overview and Perspective
, 1992
"... Instruction-level Parallelism CILP) is a family of processor and compiler design techniques that speed up execution by causing individual machine operations to execute in parallel. Although ILP has appeared in the highest performance uniprocessors for the past 30 years, the 1980s saw it become a muc ..."
Abstract
-
Cited by 186 (0 self)
- Add to MetaCart
Instruction-level Parallelism (ILP) is a family of processor and compiler design techniques that speed up execution by causing individual machine operations to execute in parallel. Although ILP has appeared in the highest performance uniprocessors for the past 30 years, the 1980s saw it become a much more significant force in computer design. Several systems were built, and sold commercially, which pushed ILP far beyond where it had been before, both in terms of the amount of ILP offered and in the central role ILP played in the design of the system. By the end of the decade, advanced microprocessor design at all major CPU manufacturers had incorporated ILP, and new techniques for ILP had become a popular topic at academic conferences. This article provides an overview and historical perspective of the field of ILP and its development over the past three decades.
Register Allocation via Graph Coloring
, 1992
"... Chaitin and his colleagues at IBM in Yorktown Heights built the first global register allocator based on graph coloring. This thesis describes a series of improvements and extensions to the Yorktown allocator. There are four primary results: Optimistic coloring Chaitin's coloring heuristic pes ..."
Abstract
-
Cited by 155 (4 self)
- Add to MetaCart
(Show Context)
Chaitin and his colleagues at IBM in Yorktown Heights built the first global register allocator based on graph coloring. This thesis describes a series of improvements and extensions to the Yorktown allocator. There are four primary results: Optimistic coloring Chaitin's coloring heuristic pessimistically assumes any node of high degree will not be colored and must therefore be spilled. By optimistically assuming that nodes of high degree will receive colors, I often achieve lower spill costs and faster code; my results are never worse. Coloring pairs The pessimism of Chaitin's coloring heuristic is emphasized when trying to color register pairs. My heuristic handles pairs as a natural consequence of its optimism. Rematerialization Chaitin et al. introduced the idea of rematerialization to avoid the expense of spilling and reloading certain simple values. By propagating rematerialization information around the SSA graph using a simple variation of Wegman and Zadeck's constant propag...
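The optimistic coloring described above can be sketched in a few lines. This is a toy illustration of the Chaitin/Briggs simplify-select scheme, not the thesis's implementation: the interference graph is an adjacency dict, and the only difference from Chaitin's heuristic is that a high-degree node is pushed on the stack rather than spilled immediately.

```python
# Briggs-style "optimistic" coloring sketch over a toy interference graph.
# graph: node -> set of interfering neighbors; k: number of registers.
def optimistic_color(graph, k):
    """Return (coloring, spilled) using optimistic simplify-select."""
    degrees = {n: len(neigh) for n, neigh in graph.items()}
    removed, stack = set(), []

    while len(removed) < len(graph):
        # Chaitin-style simplify: remove a node with degree < k if one exists.
        cand = next((n for n in graph
                     if n not in removed and degrees[n] < k), None)
        if cand is None:
            # Chaitin would spill here; Briggs pushes the node optimistically,
            # hoping its neighbors end up sharing colors during selection.
            cand = max((n for n in graph if n not in removed),
                       key=lambda n: degrees[n])
        removed.add(cand)
        stack.append(cand)
        for m in graph[cand]:
            if m not in removed:
                degrees[m] -= 1

    coloring, spilled = {}, []
    while stack:
        n = stack.pop()
        used = {coloring[m] for m in graph[n] if m in coloring}
        free = [c for c in range(k) if c not in used]
        if free:
            coloring[n] = free[0]
        else:
            spilled.append(n)   # only now is the node actually spilled
    return coloring, spilled
```

A 4-cycle a-b-c-d shows the payoff: every node has degree 2, so with k = 2 Chaitin's pessimistic rule spills immediately, yet the graph is 2-colorable and the optimistic pass finds a coloring with no spills.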
Linear Scan Register Allocation
- ACM TRANSACTIONS ON PROGRAMMING LANGUAGES AND SYSTEMS
, 1999
"... ..."
Coloring heuristics for register allocation
- In PLDI ’89: Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
, 1989
"... We describe an improvement to a heuristic introduced by Chaitin for use in graph coloring register alloca-tion. Our modified heuristic produces better colorings, with less spill code. It has similar compile-time and implementation requirements. We present experimental data to compare the two methods ..."
Abstract
-
Cited by 141 (8 self)
- Add to MetaCart
We describe an improvement to a heuristic introduced by Chaitin for use in graph coloring register allocation. Our modified heuristic produces better colorings, with less spill code. It has similar compile-time and implementation requirements. We present experimental data to compare the two methods.
Profile-guided automatic inline expansion for C programs
- SOFTWARE PRACTICE AND EXPERIENCE
, 1992
"... This paper describes critical implementation issues that must be addressed to develop a fully automatic inliner. These issues are: integration into a compiler, program representation, hazard prevention, expansion sequence control, and program modi cation. An automatic inter- le inliner that uses pro ..."
Abstract
-
Cited by 121 (5 self)
- Add to MetaCart
(Show Context)
This paper describes critical implementation issues that must be addressed to develop a fully automatic inliner. These issues are: integration into a compiler, program representation, hazard prevention, expansion sequence control, and program modification. An automatic inter-file inliner that uses profile information has been implemented and integrated into an optimizing C compiler. The experimental results show that this inliner achieves significant speedups for production C programs.
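The expansion-sequence-control issue above is commonly handled with a greedy, profile-driven policy. The sketch below is a standard hot-first selection under a code-growth budget, offered as an illustration only; the site names and cost model are made up, and this is not the paper's inliner.

```python
# Toy profile-guided call-site selection: inline the hottest call sites
# first, subject to a code-growth budget (both inputs are hypothetical).
def pick_inline_sites(sites, growth_budget):
    """sites: list of (site_name, call_count, callee_size).
    Returns the names of the sites chosen for inlining."""
    chosen, growth = [], 0
    # Visit sites from most to least frequently executed.
    for name, calls, size in sorted(sites, key=lambda s: -s[1]):
        if growth + size <= growth_budget:   # respect the growth budget
            chosen.append(name)
            growth += size
    return chosen
```

With sites `[("f@main", 1000, 40), ("g@loop", 5000, 80), ("h@init", 10, 30)]` and a budget of 100, only the hot `g@loop` site fits; a larger budget admits the cooler sites as well.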
Adaptive Optimizing Compilers for the 21st Century
- Journal of Supercomputing
, 2001
"... Historically, compilers have operated by applying a fixed set of optimizations in a predetermined order. We call such an ordered list of optimizations a compilation sequence. This paper describes a prototype system that uses biased random search to discover a program-specific compilation sequence ..."
Abstract
-
Cited by 99 (9 self)
- Add to MetaCart
(Show Context)
Historically, compilers have operated by applying a fixed set of optimizations in a predetermined order. We call such an ordered list of optimizations a compilation sequence. This paper describes a prototype system that uses biased random search to discover a program-specific compilation sequence that minimizes an explicit, external objective function. The result is a compiler framework that adapts its behavior to the application being compiled, to the pool of available transformations, to the objective function, and to the target machine.
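The search the abstract describes can be sketched as mutation-based hill climbing over pass orderings. Everything below is hypothetical: the pass names, the stand-in objective (a real system would compile the program and measure the result), and the mutation policy are illustrations, not the prototype's design.

```python
# Biased random search over compilation sequences (toy sketch).
import random

PASSES = ["const-prop", "dead-code", "cse", "inline", "licm"]

def objective(seq):
    # Hypothetical stand-in for "compile with this sequence, measure cost":
    # rewards pass diversity and running const-prop before dead-code.
    cost = 100 - 5 * len(set(seq))
    if "const-prop" in seq and "dead-code" in seq and \
       seq.index("const-prop") < seq.index("dead-code"):
        cost -= 10
    return cost

def search(length=5, trials=200, seed=0):
    rng = random.Random(seed)
    best_seq = [rng.choice(PASSES) for _ in range(length)]
    best_cost = objective(best_seq)
    for _ in range(trials):
        # Bias: mutate the current best rather than sampling uniformly.
        seq = list(best_seq)
        seq[rng.randrange(length)] = rng.choice(PASSES)
        cost = objective(seq)
        if cost < best_cost:
            best_seq, best_cost = seq, cost
    return best_seq, best_cost
```

The point of the framework is that only `objective` knows about the program, the metric, and the target machine; swapping it changes what the compiler adapts to.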
Threads Cannot be Implemented as a Library
"... In many environments, multi-threaded code is written in a language that was originally designed without thread support (e.g. C), to which a library of threading primitives was subsequently added. There appears to be a general understanding that this is not the right approach. We provide specific arg ..."
Abstract
-
Cited by 94 (5 self)
- Add to MetaCart
(Show Context)
In many environments, multi-threaded code is written in a language that was originally designed without thread support (e.g. C), to which a library of threading primitives was subsequently added. There appears to be a general understanding that this is not the right approach. We provide specific arguments that a pure library approach, in which the compiler is designed independently of threading issues, cannot guarantee correctness of the resulting code. We first review why the approach almost works, and then examine some of the surprising behavior it may entail. We further illustrate that there are very simple cases in which a pure library-based approach seems incapable of expressing an efficient parallel algorithm. Our discussion takes place in the context of C with Pthreads, since it is commonly used, reasonably well specified, and does not attempt to ensure type-safety, which would entail even stronger constraints. The issues we raise are not specific to that context.
Effective Partial Redundancy Elimination
- Proceedings of the ACM SIGPLAN '94 Conference on Programming Language Design and Implementation
, 1994
"... Partial redundancy elimination is a code optimization with a long history of literature and implementation. In practice, its effectiveness depends on issues of naming and code shape. This paper shows that a combination of global reassociation and global value numbering can increase the effectiveness ..."
Abstract
-
Cited by 92 (13 self)
- Add to MetaCart
(Show Context)
Partial redundancy elimination is a code optimization with a long history of literature and implementation. In practice, its effectiveness depends on issues of naming and code shape. This paper shows that a combination of global reassociation and global value numbering can increase the effectiveness of partial redundancy elimination. By imposing a discipline on the choice of names and the shape of expressions, we are able to expose more redundancies. As part of the work, we introduce a new algorithm for global reassociation of expressions. It uses global information to reorder expressions, creating opportunities for other optimizations. The new algorithm generalizes earlier work that ordered FORTRAN array address expressions to improve optimization [25].
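A tiny example shows why naming discipline matters to redundancy elimination. In the sketch below, hash-based value numbering misses that `b + a` recomputes `a + b` until operand order is canonicalized for commutative operators; this illustrates the general idea of disciplining names and shapes, not the paper's reassociation algorithm.

```python
# Local value numbering over (dest, op, src1, src2) tuples, with optional
# operand-order canonicalization for commutative operators.
import itertools

def value_number(instrs, canonicalize=True):
    """Return the dests whose computation was found redundant."""
    counter = itertools.count()
    vn = {}        # name -> value number
    table = {}     # (op, vn1, vn2) -> value number
    redundant = []
    for dest, op, a, b in instrs:
        va = vn.setdefault(a, next(counter))
        vb = vn.setdefault(b, next(counter))
        if canonicalize and op in ("+", "*"):
            # Impose a canonical operand order: b + a now keys like a + b.
            va, vb = min(va, vb), max(va, vb)
        key = (op, va, vb)
        if key in table:
            redundant.append(dest)
            vn[dest] = table[key]
        else:
            table[key] = vn[dest] = next(counter)
    return redundant
```

Without canonicalization the second instruction in `t1 = a + b; t2 = b + a` hashes to a fresh key and no redundancy is found; with it, `t2` is recognized as a copy of `t1`.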
Code selection through object code optimization
- ACM Transactions on Programming Languages and Systems
, 1984
"... This paper shows how thorough object code optimization has simplified a compiler and made it easy to retarget. The code generator forgoes case analysis and emits naive code that is improved by a retargetable object code optimizer. With this technique, cross-compilers have been built for seven machin ..."
Abstract
-
Cited by 77 (12 self)
- Add to MetaCart
(Show Context)
This paper shows how thorough object code optimization has simplified a compiler and made it easy to retarget. The code generator forgoes case analysis and emits naive code that is improved by a retargetable object code optimizer. With this technique, cross-compilers have been built for seven machines, some in as few as three person-days. These cross-compilers emit code comparable to that of host-specific compilers.
Spill code minimization techniques for optimizing compilers
, 1989
"... Global register allocation and spilling is commonly performed by solving a graph coloring problem. In this paper we present a new coherent set of heuristic methods for reducing the amount of spill code gen-erated. This results in more efficient (and shorter) compiled code. Our approach has been comp ..."
Abstract
-
Cited by 75 (0 self)
- Add to MetaCart
Global register allocation and spilling is commonly performed by solving a graph coloring problem. In this paper we present a new coherent set of heuristic methods for reducing the amount of spill code generated. This results in more efficient (and shorter) compiled code. Our approach has been compared to both standard and priority-based coloring algorithms, universally outperforming them. In our approach, we extend the capability of the existing algorithms in several ways. First, we use multiple heuristic functions to increase the likelihood that less spill code will be inserted. We have found three complementary heuristic functions which together appear to span a large proportion of good spill decisions. Second, we use a specially tuned greedy heuristic for determining the order of deleting (and hence coloring) the unconstrained vertices. Third, we have developed a "cleaning" technique which avoids some of the insertion of spill code in non-busy regions. (Currently with the IBM T.J. Watson Research Center.)
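For contrast with the multiple heuristic functions described above, the baseline spill metric they improve upon can be sketched in one line: Chaitin's classic rule picks the node minimizing estimated spill cost divided by interference degree. This toy reproduces only that baseline, not the paper's three complementary heuristics.

```python
# Baseline spill-candidate choice: minimize spill_cost / interference_degree.
def choose_spill(costs, graph):
    """costs: node -> estimated spill cost (e.g. weighted use count);
    graph: node -> set of interfering neighbors. Returns the candidate."""
    return min(graph, key=lambda n: costs[n] / max(len(graph[n]), 1))
```

The ratio prefers spilling values that are cheap to reload but constrain many neighbors: given `costs = {"x": 10, "y": 2}` with `x` and `y` interfering, the cheap `y` is chosen.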