Lazy Code Motion (1992)
Cited by 154 (20 self)
Abstract
We present a bit-vector algorithm for the optimal and economical placement of computations within flow graphs, which is as efficient as standard uni-directional analyses. The point of our algorithm is the decomposition of the bi-directional structure of the known placement algorithms into a sequence of a backward and a forward analysis, which directly implies the efficiency result. Moreover, the new compositional structure opens the algorithm for modification: two further uni-directional analysis components exclude any unnecessary code motion. This laziness of our algorithm minimizes the register pressure, which has drastic effects on the run-time behaviour of the optimized programs in practice, where an economical use of registers is essential.
Topics: data flow analysis, program optimization, partial redundancy elimination, code motion, bit-vector data flow analyses.
1 Motivation
Code motion is a technique to improve the efficiency of a program by avoiding unnecessary recomputati...
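The backward/forward decomposition can be sketched on a small flow graph. This follows the four-pass textbook formulation of lazy code motion (anticipated, available, postponable, used) from Aho, Lam, Sethi and Ullman's Compilers: Principles, Techniques, and Tools (2nd ed.), not the paper's own notation. Assumptions: each block holds at most one computation, critical edges are already split, `use[b]` and `kill[b]` give the expressions computed and invalidated in block `b`, and the function name and return shape are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Block = str
ExprSet = Set[str]


def lazy_code_motion(
    succ: Dict[Block, List[Block]],
    use: Dict[Block, ExprSet],
    kill: Dict[Block, ExprSet],
    universe: ExprSet,
) -> Tuple[Dict[Block, ExprSet], Dict[Block, ExprSet]]:
    """Return (insert, replace) maps: insert 't = e' at the start of a block,
    and replace the original computation of e in a block by t."""
    blocks = list(succ)
    pred: Dict[Block, List[Block]] = {b: [] for b in blocks}
    for b, ss in succ.items():
        for s in ss:
            pred[s].append(b)

    def meet(sets: Iterable[ExprSet]) -> ExprSet:
        # Intersection; with no inputs (entry/exit boundary) it is the empty set.
        result = None
        for s in sets:
            result = set(s) if result is None else result & s
        return result if result is not None else set()

    def union(sets: Iterable[ExprSet]) -> ExprSet:
        return set().union(*sets)

    def solve(init: ExprSet,
              step: Callable[[Block, Dict[Block, ExprSet]], ExprSet]) -> Dict[Block, ExprSet]:
        # Round-robin iteration until no block's value changes.
        val = {b: set(init) for b in blocks}
        changed = True
        while changed:
            changed = False
            for b in blocks:
                new = step(b, val)
                if new != val[b]:
                    val[b], changed = new, True
        return val

    # Backward: e is anticipated at the entry of b if every path from there computes it.
    ant = solve(universe, lambda b, v: use[b] | (meet(v[s] for s in succ[b]) - kill[b]))
    # Forward: e counts as available if an earlier anticipated point could have computed it.
    avail = solve(universe, lambda b, v: meet((ant[p] | v[p]) - kill[p] for p in pred[b]))
    earliest = {b: ant[b] - avail[b] for b in blocks}
    # Forward: how far the earliest placement can be postponed (the "lazy" part).
    post = solve(universe, lambda b, v: meet((earliest[p] | v[p]) - use[p] for p in pred[b]))
    cand = {b: earliest[b] | post[b] for b in blocks}
    latest = {b: cand[b] & (use[b] | (universe - meet(cand[s] for s in succ[b])))
              for b in blocks}
    # Backward: is the temporary still needed after b? (avoids useless temporaries)
    used_in = solve(set(), lambda b, v: (use[b] | union(v[s] for s in succ[b])) - latest[b])
    used_out = {b: union(used_in[s] for s in succ[b]) for b in blocks}

    insert = {b: latest[b] & used_out[b] for b in blocks}
    replace = {b: use[b] & ((universe - latest[b]) | used_out[b]) for b in blocks}
    return ({b: e for b, e in insert.items() if e},
            {b: e for b, e in replace.items() if e})
```

On a diamond entry → B1 → {B2, B3} → B4 with `a+b` computed in B2 and B4, the sketch inserts `t = a+b` at the start of B2 and B3 and rewrites the computations in B2 and B4 to use `t`, so B4's recomputation disappears. The earliest safe placement would be at entry; the postponable and used passes delay it as far as possible, which is what keeps register lifetimes short.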
Optimal Code Motion: Theory and Practice (1993)
Cited by 111 (18 self)
Abstract
An implementation-oriented algorithm for lazy code motion is presented that minimizes the number of computations in programs while suppressing any unnecessary code motion in order to avoid superfluous register pressure. In particular, this variant of the original algorithm for lazy code motion works on flowgraphs whose nodes are basic blocks rather than single statements, as this format is standard in optimizing compilers. The theoretical foundations of the modified algorithm are given in the first part, where t-refined flowgraphs are introduced for simplifying the treatment of flowgraphs whose nodes are basic blocks. The second part presents the 'basic block' algorithm in standard notation, and gives directions for its implementation in standard compiler environments.
Keywords: Elimination of partial redundancies, code motion, data flow analysis (bit-vector, unidirectional, bidirectional), nondeterministic flowgraphs, t-refined flow graphs, critical edges, lifetimes of registers, com...
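Code motion algorithms of this family generally assume that critical edges (from a block with several successors to a block with several predecessors) have been split, because an insertion on such an edge cannot be placed in either endpoint without affecting other paths. A minimal sketch of that preprocessing step, assuming a successor-map representation in which every block appears as a key; the function name and the `"u->v"` naming scheme for new blocks are hypothetical.

```python
from typing import Dict, List


def split_critical_edges(succ: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return a new successor map in which every critical edge u->v is routed
    through a fresh empty block named "u->v" (hypothetical naming scheme)."""
    npred = {b: 0 for b in succ}
    for ss in succ.values():
        for s in ss:
            npred[s] += 1
    out = {b: list(ss) for b, ss in succ.items()}  # copy; input is not mutated
    for u, ss in succ.items():
        if len(ss) < 2:
            continue  # u has a single successor: its out-edges are not critical
        for i, v in enumerate(ss):
            if npred[v] > 1:
                mid = f"{u}->{v}"  # assumes no parallel edges u->v
                out[u][i] = mid
                out[mid] = [v]
    return out
```

For `{"B1": ["B2", "B4"], "B2": ["B4"], "B4": []}`, only B1→B4 is critical, so a single new block `"B1->B4"` is introduced between B1 and B4.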
Optimizing for Reduced Code Space Using Genetic Algorithms (1999)
Cited by 95 (10 self)
Abstract
Code space is a critical issue facing designers of software for embedded systems. Many traditional compiler optimizations are designed to reduce the execution time of compiled code, but not necessarily the size of the compiled code. Further, different results can be achieved by running some optimizations more than once and changing the order in which optimizations are applied. Register allocation only complicates matters, as the interactions between different optimizations can cause more spill code to be generated. The compiler for embedded systems, then, must take care to use the best sequence of optimizations to minimize code space.
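Treating the pass sequence as the object of search can be sketched with a small genetic algorithm. This is an illustration under stated assumptions, not the paper's setup: the pass names, the selection, crossover and mutation choices, and above all the fitness function `toy_code_size` are hypothetical; a real system would compile the program with each candidate sequence and measure the resulting code size.

```python
import random
from typing import Callable, List, Sequence, Tuple


def evolve(passes: Sequence[str], cost: Callable[[List[str]], int],
           seq_len: int = 8, pop_size: int = 20, generations: int = 30,
           mutation_rate: float = 0.1, seed: int = 0) -> Tuple[List[str], List[int]]:
    """Search for a low-cost pass sequence; returns (best, best cost per generation).
    Requires seq_len >= 2 and pop_size >= 4."""
    rng = random.Random(seed)
    pop = [[rng.choice(passes) for _ in range(seq_len)] for _ in range(pop_size)]
    history: List[int] = []
    for _ in range(generations):
        pop.sort(key=cost)
        history.append(cost(pop[0]))
        parents = pop[: pop_size // 2]       # truncation selection
        children = [list(pop[0])]            # elitism: best survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, seq_len)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [rng.choice(passes) if rng.random() < mutation_rate else g
                     for g in child]         # per-gene mutation
            children.append(child)
        pop = children
    pop.sort(key=cost)
    history.append(cost(pop[0]))
    return pop[0], history


def toy_code_size(seq: List[str]) -> int:
    """Hypothetical size model, NOT a real compiler: order and repetition matter."""
    size, seen = 100, set()
    for p in seq:
        if p == "unroll":
            size += 5                        # unrolling grows code
        elif p == "dce" and "cprop" in seen and "dce" not in seen:
            size -= 6                        # DCE pays off only after constant propagation
        elif p in ("cprop", "cse") and p not in seen:
            size -= 2 if p == "cprop" else 1
        seen.add(p)
    return size
```

Calling `evolve(["cprop", "dce", "cse", "unroll"], toy_code_size)` returns the best sequence found and the per-generation best cost; because the best individual is always carried over, that cost never increases from one generation to the next.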
Effective Partial Redundancy Elimination (1994)
Proceedings of the ACM SIGPLAN '94 Conference on Programming Language Design and Implementation
Cited by 84 (13 self)
Abstract
Partial redundancy elimination is a code optimization with a long history of literature and implementation. In practice, its effectiveness depends on issues of naming and code shape. This paper shows that a combination of global reassociation and global value numbering can increase the effectiveness of partial redundancy elimination. By imposing a discipline on the choice of names and the shape of expressions, we are able to expose more redundancies. As part of the work, we introduce a new algorithm for global reassociation of expressions. It uses global information to reorder expressions, creating opportunities for other optimizations. The new algorithm generalizes earlier work that ordered FORTRAN array address expressions to improve optimization [25].
1 Introduction
Partial redundancy elimination is a powerful optimization that has been discussed in the literature for many years (e.g., [21, 8, 14, 12, 18]). Unfortunately, partial redundancy elimination has two serious limitations...
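The role of naming and expression shape shows up already in a much smaller setting. The sketch below is hash-based local value numbering within one block, with operands of commutative operators put in a canonical order so that `a+b` and `b+a` receive the same value number. It is not the paper's global reassociation or global value numbering; the `(dest, op, lhs, rhs)` statement format and the function name are assumptions, and constants are treated as plain names.

```python
from itertools import count
from typing import Dict, List, Tuple

COMMUTATIVE = {"+", "*"}


def redundant_assignments(block: List[Tuple[str, str, str, str]]) -> List[str]:
    """Local value numbering over (dest, op, lhs, rhs) statements.
    Returns destinations whose value was already computed earlier in the block
    (a real pass would also check that some name still holds that value)."""
    fresh = count()
    vn_of: Dict[str, int] = {}                    # name -> current value number
    table: Dict[Tuple[str, int, int], int] = {}   # (op, vn, vn) -> value number
    redundant: List[str] = []

    def vn(name: str) -> int:
        if name not in vn_of:
            vn_of[name] = next(fresh)             # first sight: new unknown value
        return vn_of[name]

    for dest, op, lhs, rhs in block:
        a, b = vn(lhs), vn(rhs)
        if op in COMMUTATIVE:
            a, b = min(a, b), max(a, b)           # canonical order: a+b == b+a
        key = (op, a, b)
        if key in table:
            redundant.append(dest)
        else:
            table[key] = next(fresh)
        vn_of[dest] = table[key]                  # dest now names this value
    return redundant
```

Because keys are built from value numbers rather than names, reassigning an operand (for example `a = a + 1`) gives it a new value number, so a later `a + b` is correctly not reported as redundant.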
Combining Analyses, Combining Optimizations (1995)
Cited by 67 (4 self)
Abstract
This thesis presents a framework for describing optimizations. It shows how to combine two such frameworks and how to reason about the properties of the resulting framework. The structure of the framework provides insight into when a combination yields better results. Also presented is a simple iterative algorithm for solving these frameworks. A framework is shown that combines Constant Propagation, Unreachable Code Elimination, Global Congruence Finding and Global Value Numbering. For these optimizations, the iterative algorithm runs in O(n^2) time.
This thesis then presents an O(n log n) algorithm for combining the same optimizations. This technique also finds many of the common subexpressions found by Partial Redundancy Elimination. However, it requires a global code motion pass, also presented, to make the optimized code correct. The global code motion algorithm removes some Partially Dead Code as a side-effect. An implementation demonstrates that the algorithm has shorter compile times than repeated passes of the separate optimizations while producing run-time speedups of 4%–7%.
While global analyses are stronger, peephole analyses can be unexpectedly powerful. This thesis demonstrates parse-time peephole optimizations that find more than 95% of the constants and common subexpressions found by the best combined analysis. Finding constants and common subexpressions while parsing reduces peak intermediate representation size. This speeds up the later global analyses, reducing total compilation time by 10%. In conjunction with global code motion, these peephole optimizations generate excellent code very quickly, a useful feature for compilers that stress compilation speed over code quality.
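The benefit of combining can be seen in the classic pairing of constant propagation with unreachable-code elimination. Constant propagation that assumes every branch can execute merges `y = 2` and `y = 3` into "not constant"; tracking which edges are executable at the same time keeps the else branch dead and proves `z = 3`. A minimal sketch in the style of Wegman and Zadeck's conditional constant propagation, not the thesis's O(n log n) algorithm and without congruence finding or value numbering; the block and terminator encoding and the function names are assumptions, and only `+` is modelled.

```python
from typing import Dict, List, Tuple, Union

TOP, BOTTOM = "TOP", "BOTTOM"   # TOP: no value seen yet (optimistic); BOTTOM: not constant
Expr = Union[int, str, Tuple[str, str, str]]  # constant, variable, or ("+", x, y)


def meet(a, b):
    if a == TOP:
        return b
    if b == TOP:
        return a
    return a if a == b else BOTTOM


def evaluate(expr: Expr, env: Dict[str, object]):
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return env[expr]
    _, x, y = expr                  # only "+" is modelled in this sketch
    vx, vy = env[x], env[y]
    if BOTTOM in (vx, vy):
        return BOTTOM
    if TOP in (vx, vy):
        return TOP
    return vx + vy


def conditional_constants(blocks, entry: str, variables: List[str]):
    """blocks: name -> (assignments, terminator); terminator is None,
    ("goto", target) or ("if_eq", var, const, then_target, else_target)."""
    executable = set()              # CFG edges proven executable so far
    env_out: Dict[str, Dict[str, object]] = {}
    changed = True
    while changed:
        changed = False
        for name, (assigns, term) in blocks.items():
            preds = [p for (p, d) in executable if d == name]
            if name != entry and not preds:
                continue            # not (yet) known to be reachable
            env = {v: TOP for v in variables}
            for p in preds:         # meet only over executable incoming edges
                for v in variables:
                    env[v] = meet(env[v], env_out[p][v])
            for dest, expr in assigns:
                env[dest] = evaluate(expr, env)
            targets: List[str] = []
            if term and term[0] == "goto":
                targets = [term[1]]
            elif term and term[0] == "if_eq":
                _, var, const, then_t, else_t = term
                cond = env[var]
                if cond == BOTTOM:
                    targets = [then_t, else_t]
                elif cond != TOP:   # known constant: only one edge can execute
                    targets = [then_t if cond == const else else_t]
            if env_out.get(name) != env:
                env_out[name] = env
                changed = True
            for t in targets:
                if (name, t) not in executable:
                    executable.add((name, t))
                    changed = True
    return env_out, executable
```

For a program `x = 1; if x == 1 then y = 2 else y = 3; z = y + x`, the else block is never marked reachable and the join block ends with `x = 1, y = 2, z = 3`.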
Complete Removal of Redundant Expressions (1998)
Cited by 64 (13 self)
Abstract
Partial redundancy elimination (PRE), the most important component of global optimizers, generalizes the removal of common subexpressions and loop-invariant computations. Because existing PRE implementations are based on code motion, they fail to completely remove the redundancies. In fact, we observed that 73% of loop-invariant statements cannot be eliminated from loops by code motion alone. In dynamic terms, traditional PRE eliminates only half of redundancies that are strictly partial. To achieve a complete PRE, control flow restructuring must be applied. However, the resulting code duplication may cause code size explosion. This paper focuses on achieving a complete PRE while incurring an acceptable code growth. First, we present an algorithm for complete removal of partial redundancies, based on the integration of code motion and control flow restructuring. In contrast to existing complete techniques, we resort to restructuring merely to remove obstacles to code motion, rather th...
Dependence-Based Program Analysis (1993)
In Proceedings of the SIGPLAN '93 Conference on Programming Language Design and Implementation
Cited by 58 (6 self)
Abstract
Program analysis and optimization can be speeded up through the use of the dependence flow graph (DFG), a representation of program dependences which generalizes def-use chains and static single assignment (SSA) form. In this paper, we give a simple graph-theoretic description of the DFG and show how the DFG for a program can be constructed in O(EV) time. We then show how forward and backward dataflow analyses can be performed efficiently on the DFG, using constant propagation and elimination of partial redundancies as examples. These analyses can be framed as solutions of dataflow equations in the DFG. Our construction algorithm is of independent interest because it can be used to construct a program's control dependence graph in O(E) time and its SSA representation in O(EV) time, which are improvements over existing algorithms.
1 Introduction
A number of recent papers have focused attention on the problem of speeding up program optimization [FOW87, BMO90, CCF91, PBJ+91, CFR+...
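For orientation, the conventional route to SSA form places φ-functions at (iterated) dominance frontiers, following Cytron et al. The sketch below computes immediate dominators with a simple set-based dominator pass and then dominance frontiers with the frontier loop from Cooper, Harvey and Kennedy. It is not the paper's DFG construction and does not achieve its stated bounds; it assumes every node is reachable from `entry`, that `entry` has no predecessors, and that every node appears as a key in `succ`. The function name is hypothetical.

```python
from typing import Dict, List, Set, Tuple


def dominance_frontiers(succ: Dict[str, List[str]],
                        entry: str) -> Tuple[Dict[str, str], Dict[str, Set[str]]]:
    """Return (idom, df) for a CFG given as a successor map."""
    nodes = list(succ)
    pred: Dict[str, List[str]] = {n: [] for n in nodes}
    for n, ss in succ.items():
        for s in ss:
            pred[s].append(n)

    # Dom(entry) = {entry}; Dom(n) = {n} | intersection of Dom(p) over preds p.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry:
                continue
            ps = [dom[p] for p in pred[n]]
            new = {n} | (set.intersection(*ps) if ps else set())
            if new != dom[n]:
                dom[n], changed = new, True

    # Strict dominators form a chain; the immediate one has the largest Dom set.
    idom = {n: max(dom[n] - {n}, key=lambda d: len(dom[d]))
            for n in nodes if n != entry}

    # A join node b is in DF(runner) for every runner on the path from each
    # predecessor of b up the dominator tree to (excluding) idom(b).
    df: Dict[str, Set[str]] = {n: set() for n in nodes}
    for b in nodes:
        if len(pred[b]) >= 2:
            for p in pred[b]:
                runner = p
                while runner != idom[b]:
                    df[runner].add(b)
                    runner = idom[runner]
    return idom, df
```

In a diamond E → {L, R} → J, both branches have J in their frontier, which is where a φ for a variable assigned in L or R would go; in a loop, the header appears in its own frontier.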
Scalar Replacement in the Presence of Conditional Control Flow (1992)
Software: Practice and Experience
Cited by 54 (18 self)
Abstract
Most conventional compilers fail to allocate array elements to registers because standard data-flow analysis treats arrays like scalars, making it impossible to analyze the definitions and uses of individual array elements. This deficiency is particularly troublesome for floating-point registers, which are most often used as temporary repositories for subscripted variables. This paper
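Scalar replacement rewrites repeated references to the same array element so that the value lives in a scalar (and hence, after register allocation, in a register) across loop iterations. A minimal before/after sketch, assuming a loop that reads both `a[i]` and `a[i-1]`; Python stands in for compiler output, the function names are hypothetical, and the conditional-control-flow case that is this paper's subject is not handled.

```python
from typing import List


def smooth_original(a: List[float]) -> List[float]:
    b = [0.0] * len(a)
    for i in range(1, len(a)):
        b[i] = a[i] + a[i - 1]      # two array loads per iteration
    return b


def smooth_scalar_replaced(a: List[float]) -> List[float]:
    b = [0.0] * len(a)
    if len(a) < 2:
        return b
    prev = a[0]                     # a[i-1] carried in a scalar
    for i in range(1, len(a)):
        cur = a[i]                  # one array load per iteration
        b[i] = cur + prev
        prev = cur                  # this iteration's a[i] is the next a[i-1]
    return b
```

Both functions compute the same result; the second performs one array load per iteration instead of two, which is the saving scalar replacement targets.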
Memory-Hierarchy Management (1994)
Cited by 50 (14 self)
Abstract
The trend in high-performance microprocessor design is toward increasing computational power on the chip. Microprocessors can now process dramatically more data per machine cycle than previous models. Unfortunately, memory speeds have not kept pace. The result is an imbalance between computation speed and memory speed. This imbalance is leading machine designers to use more complicated memory hierarchies. In turn, programmers are explicitly restructuring codes to perform well on particular memory systems, leading to machine-specific programs. It is our belief that machine-specific programming is a step in the wrong direction. Compilers, not programmers, should handle machine-specific implementation details. To this end, this thesis develops and experiments with compiler algorithms that manage the memory hierarchy of a machine for floating-point intensive numerical codes. Specifically, we address the following issues:
Scalar replacement. Lack of information concerning the flow of arra...

