Results 1  10
of
37
Lazy Code Motion
, 1992
"... We present a bitvector algorithm for the optimal and economical placement of computations within flow graphs, which is as efficient as standard unidirectional analyses. The point of our algorithm is the decomposition of the bidirectional structure of the known placement algorithms into a sequenc ..."
Abstract

Cited by 158 (20 self)
 Add to MetaCart
We present a bitvector algorithm for the optimal and economical placement of computations within flow graphs, which is as efficient as standard unidirectional analyses. The point of our algorithm is the decomposition of the bidirectional structure of the known placement algorithms into a sequence of a backward and a forward analysis, which directly implies the efficiency result. Moreover, the new compositional structure opens the algorithm for modification: two further unidirectional analysis components exclude any unnecessary code motion. This laziness of our algorithm minimizes the register pressure, which has drastic effects on the runtime behaviour of the optimized programs in practice, where an economical use of registers is essential.
Optimizing for Reduced Code Space Using Genetic Algorithms
, 1999
"... Code space is a critical issue facing designers of software for embedded systems. Many traditional compiler optimizations are designed to reduce the execution time of compiled code, but not necessarily the size of the compiled code. Further, di#erent results can be achieved by running some optimizat ..."
Abstract

Cited by 134 (10 self)
 Add to MetaCart
(Show Context)
Code space is a critical issue facing designers of software for embedded systems. Many traditional compiler optimizations are designed to reduce the execution time of compiled code, but not necessarily the size of the compiled code. Further, di#erent results can be achieved by running some optimizations more than once and changing the order in which optimizations are applied. Register allocation only complicates matters, as the interactions between di#erent optimizations can cause more spill code to be generated. The compiler for embedded systems, then, must take care to use the best sequence of optimizations to minimize code space.
Optimal Code Motion: Theory and Practice
, 1993
"... An implementation oriented algorithm for lazy code motion is presented that minimizes the number of computations in programs while suppressing any unnecessary code motion in order to avoid superfluous register pressure. In particular, this variant of the original algorithm for lazy code motion works ..."
Abstract

Cited by 111 (18 self)
 Add to MetaCart
An implementation oriented algorithm for lazy code motion is presented that minimizes the number of computations in programs while suppressing any unnecessary code motion in order to avoid superfluous register pressure. In particular, this variant of the original algorithm for lazy code motion works on flowgraphs whose nodes are basic blocks rather than single statements, as this format is standard in optimizing compilers. The theoretical foundations of the modified algorithm are given in the first part, where trefined flowgraphs are introduced for simplifying the treatment of flowgraphs whose nodes are basic blocks. The second part presents the `basic block' algorithm in standard notation, and gives directions for its implementation in standard compiler environments. Keywords Elimination of partial redundancies, code motion, data flow analysis (bitvector, unidirectional, bidirectional), nondeterministic flowgraphs, trefined flow graphs, critical edges, lifetimes of registers, com...
Effective Partial Redundancy Elimination
 Proceedings of the ACM SIGPLAN '94 Conference on Programming Language Design and Implementation
, 1994
"... Partial redundancy elimination is a code optimization with a long history of literature and implementation. In practice, its effectiveness depends on issues of naming and code shape. This paper shows that a combination of global reassociation and global value numbering can increase the effectiveness ..."
Abstract

Cited by 84 (13 self)
 Add to MetaCart
(Show Context)
Partial redundancy elimination is a code optimization with a long history of literature and implementation. In practice, its effectiveness depends on issues of naming and code shape. This paper shows that a combination of global reassociation and global value numbering can increase the effectiveness of partial redundancy elimination. By imposing a discipline on the choice of names and the shape of expressions, we are able to expose more redundancies. As part of the work, we introduce a new algorithm for global reassociation of expressions. It uses global information to reorder expressions, creating opportunities for other optimizations. The new algorithm generalizes earlier work that ordered FORTRAN array address expressions to improve optimization [25]. 1 Introduction Partial redundancy elimination is a powerful optimization that has been discussed in the literature for many years (e.g., [21, 8, 14, 12, 18]). Unfortunately, partial redundancy elimination has two serious limitations...
Combining Analyses, Combining Optimizations
, 1995
"... This thesis presents a framework for describing optimizations. It shows how to combine two such frameworks and how to reason about the properties of the resulting framework. The structure of the framework provides insight into when a combination yields better results. Also presented is a simple iter ..."
Abstract

Cited by 78 (4 self)
 Add to MetaCart
This thesis presents a framework for describing optimizations. It shows how to combine two such frameworks and how to reason about the properties of the resulting framework. The structure of the framework provides insight into when a combination yields better results. Also presented is a simple iterative algorithm for solving these frameworks. A framework is shown that combines Constant Propagation, Unreachable Code Elimination, Global Congruence Finding and Global Value Numbering. For these optimizations, the iterative algorithm runs in O(n^2) time.
This thesis then presents an O(n log n) algorithm for combining the same optimizations. This technique also finds many of the common subexpressions found by Partial Redundancy Elimination. However, it requires a global code motion pass to make the optimized code correct, also presented. The global code motion algorithm removes some Partially Dead Code as a sideeffect. An implementation demonstrates that the algorithm has shorter compile times than repeated passes of the separate optimizations while producing runtime speedups of 4%–7%.
While global analyses are stronger, peephole analyses can be unexpectedly powerful. This thesis demonstrates parsetime peephole optimizations that find more than 95% of the constants and common subexpressions found by the best combined analysis. Finding constants and common subexpressions while parsing reduces peak intermediate representation size. This speeds up the later global analyses, reducing total compilation time by 10%. In conjunction with global code motion, these peephole optimizations generate excellent code very quickly, a useful feature for compilers that stress compilation speed over code quality.
Complete Removal of Redundant Expressions
"... Partial redundancy elimination (PRE), the most important component of global optimizers, generalizes the removal of common subexpressions and loopinvariant computations. Because existing PRE implementations are based on code motion, they fail to completely remove the redundancies. In fact, we obs ..."
Abstract

Cited by 67 (13 self)
 Add to MetaCart
Partial redundancy elimination (PRE), the most important component of global optimizers, generalizes the removal of common subexpressions and loopinvariant computations. Because existing PRE implementations are based on code motion, they fail to completely remove the redundancies. In fact, we observed that 73 % of loopinvariant statements cannot be eliminated from loops by code motion alone. In dynamic terms, traditional PRE eliminates only half of redundancies that are strictly partial. To achieve a complete PRE, control flow restructuring must be applied. However, the resulting code duplication may cause code size explosion. This paper focuses on achieving a complete PRE while incurring an acceptable code growth. First, we present an algorithm for complete removal of partial redundancies, based on the integration of code motion and control flow restructuring. In contrast to existing complete techniques, we resort to restructuring merely to remove obstacles to code motion, rather than to carry out the actual optimization. Guiding the optimization with a profile enables additional code growth reduction through selecting those duplications whose cost is justified by sufficient executiontime gains. The paper develops two methods for determining the optimization benefit of restructuring a program region, one based on pathprofiles and the other on dataflow frequency analysis. Furthermore, the abstraction underlying the new PRE algorithm enables a simple formulation of speculative code motion guaranteed to have positive dynamic improvements. Finally, we show how to balance the three transformations (code motion, restructuring, and speculation) to achieve a nearcomplete PRE with very little code growth. We also
DependenceBased Program Analysis
 In Proceedings of the SIGPLAN '93 Conference on Programming Language Design and Implementation
, 1993
"... Program analysis and optimization can be speeded up through the use of the dependence flow graph (DFG), a representation of program dependences which generalizes defuse chains and static single assignment (SSA) form. In this paper, we give a simple graphtheoretic description of the DFG and show ho ..."
Abstract

Cited by 60 (6 self)
 Add to MetaCart
Program analysis and optimization can be speeded up through the use of the dependence flow graph (DFG), a representation of program dependences which generalizes defuse chains and static single assignment (SSA) form. In this paper, we give a simple graphtheoretic description of the DFG and show how the DFG for a program can be constructed in O(EV ) time. We then show how forward and backward dataflow analyses can be performed efficiently on the DFG, using constant propagation and elimination of partial redundancies as examples. These analyses can be framed as solutions of dataflow equations in the DFG. Our construction algorithm is of independent interest because it can be used to construct a program's control dependence graph in O(E) time and its SSA representation in O(EV ) time, which are improvements over existing algorithms. 1 Introduction Anumber of recent papers have focused attention on the problem of speeding up program optimization [FOW87, BMO90, CCF91, PBJ + 91, CFR +...
MemoryHierarchy Management
, 1994
"... The trend in highperformance microprocessor design is toward increasing computational power on the chip. Microprocessors can now process dramatically more data per machine cycle than previous models. Unfortunately, memory speeds have not kept pace. The result is an imbalance between computation spe ..."
Abstract

Cited by 55 (14 self)
 Add to MetaCart
The trend in highperformance microprocessor design is toward increasing computational power on the chip. Microprocessors can now process dramatically more data per machine cycle than previous models. Unfortunately, memory speeds have not kept pace. The result is an imbalance between computation speed and memory speed. This imbalance is leading machine designers to use more complicated memory hierarchies. In turn, programmers are explicitly restructuring codes to perform well on particular memory systems, leading to machinespecific programs. It is our belief that machinespecific programming is a step in the wrong direction. Compilers, not programmers, should handle machinespecific implementation details. To this end, this thesis develops and experiments with compiler algorithms that manage the memory hierarchy of a machine for floatingpoint intensive numerical codes. Specifically, we address the following issues: Scalar replacement. Lack of information concerning the flow of arra...
Scalar Replacement in the Presence of Conditional Control Flow
 SOFTWARE PRACTICE AND EXPERIENCE
, 1992
"... Most conventional compilers fail to allocate array elements to registers because standard dataflow analysis treats arrays like scalars, making it impossible to analyze the definitions and uses of individual array elements. This deficiency is particularly troublesome for floatingpoint registers, ..."
Abstract

Cited by 55 (18 self)
 Add to MetaCart
Most conventional compilers fail to allocate array elements to registers because standard dataflow analysis treats arrays like scalars, making it impossible to analyze the definitions and uses of individual array elements. This deficiency is particularly troublesome for floatingpoint registers, which are most often used as temporary repositories for subscripted variables. This paper