Results 1 -
9 of
9
Effective Partial Redundancy Elimination
- Proceedings of the ACM SIGPLAN '94 Conference on Programming Language Design and Implementation
, 1994
"... Partial redundancy elimination is a code optimization with a long history of literature and implementation. In practice, its effectiveness depends on issues of naming and code shape. This paper shows that a combination of global reassociation and global value numbering can increase the effectiveness ..."
Abstract
-
Cited by 84 (13 self)
- Add to MetaCart
Partial redundancy elimination is a code optimization with a long history of literature and implementation. In practice, its effectiveness depends on issues of naming and code shape. This paper shows that a combination of global reassociation and global value numbering can increase the effectiveness of partial redundancy elimination. By imposing a discipline on the choice of names and the shape of expressions, we are able to expose more redundancies. As part of the work, we introduce a new algorithm for global reassociation of expressions. It uses global information to reorder expressions, creating opportunities for other optimizations. The new algorithm generalizes earlier work that ordered FORTRAN array address expressions to improve optimization [25]. 1 Introduction Partial redundancy elimination is a powerful optimization that has been discussed in the literature for many years (e.g., [21, 8, 14, 12, 18]). Unfortunately, partial redundancy elimination has two serious limitations...
Register Promotion by Sparse Partial Redundancy Elimination of Loads and Stores
- In Proceedings of the ACM SIGPLAN 1998 Conference on Programming Language Design and Implementation
, 1998
"... An algorithm for register promotion is presented based on the observation that the circumstances for promoting a memory location's value to register coincide with situations where the program exhibits partial redundancy between accesses to the memory location. The recent SSAPRE algorithm for elimina ..."
Abstract
-
Cited by 38 (2 self)
- Add to MetaCart
An algorithm for register promotion is presented based on the observation that the circumstances for promoting a memory location's value to register coincide with situations where the program exhibits partial redundancy between accesses to the memory location. The recent SSAPRE algorithm for eliminating partial redundancy using a sparse SSA representation forms the foundation for the present algorithm to eliminate redundancy among memory accesses, enabling us to achieve both computational and live range optimality in our register promotion results. We discuss how to effect speculative code motion in the SSAPRE framework. We present two different algorithms for performing speculative code motion: the conservative speculation algorithm used in the absence of profile data, and the the profile-driven speculation algorithm used when profile data are available. We define the static single use (SSU) form and develop the dual of the SSAPRE algorithm, called SSUPRE, to perform the partial redun...
Exploiting Instruction Level Parallelism in the Presence of Conditional Branches
, 1996
"... Wide issue superscalar and VLIW processors utilize instruction-level parallelism (ILP) to achieve high performance. However, if insufficient ILP is found, the performance potential of these processors suffers dramatically. Branch instructions, which are one of the major limitations to exploiting ILP ..."
Abstract
-
Cited by 37 (3 self)
- Add to MetaCart
Wide issue superscalar and VLIW processors utilize instruction-level parallelism (ILP) to achieve high performance. However, if insufficient ILP is found, the performance potential of these processors suffers dramatically. Branch instructions, which are one of the major limitations to exploiting ILP, enforce strict ordering conditions in programs to ensure correct execution. Therefore, it is difficult to achieve the desired overlap of instruction execution with branches in the instruction stream. To effectively exploit ILP in the presence of branches requires efficient handling of branches and the dependences they impose. This dissertation investigates two techniques for exposing and enhancing ILP in the presence of branches, speculative execution and predicated execution. Speculative execution enables an ILP compiler to remove dependences between instructions and prior branches. In this manner, the execution of instructions and predicted future instructions may be overlapped. Compiler-controlled speculative execution is employed using an efficient structure called the superblock. The formation and optimization of superblocks increase ILP along important execution paths by systematically removing constraints due to unimportant paths. In conjunction with superblock optimizations, speculative execution is utilized to remove control dependences in the superblock
Value Dependence Graphs: Representation without Taxation
- IN CONFERENCE RECORD OF THE 21ST ANNUAL ACM SYMPOSIUM ON PRINCIPLES OF PROGRAMMING LANGUAGES. ACM
, 1994
"... The value dependence graph (VDG) is a sparse dataflow-like representation that simplifies program analysis and transformation. It is a functional representation that represents control flow as data flow and makes explicit all machine quantities, such as stores and I/O channels. We are developing a c ..."
Abstract
-
Cited by 31 (0 self)
- Add to MetaCart
The value dependence graph (VDG) is a sparse dataflow-like representation that simplifies program analysis and transformation. It is a functional representation that represents control flow as data flow and makes explicit all machine quantities, such as stores and I/O channels. We are developing a compiler that builds a VDG representing a program, analyzes and transforms the VDG, then produces a control flow graph (CFG) [ASU86] from the optimized VDG. This framework simplifies transformations and improves upon several published results. For example, it enables more powerful code motion than [CLZ86, FOW87], eliminates as many redundancies as [AWZ88, RWZ88] (except for redundant loops), and provides important information to the code scheduler [BR91]. We exhibit a one-pass method for elimination of partial redundancies that never performs redundant code motion [KRS92, DS93] and is simpler than the classical [MR79, Dha91] or SSA [RWZ88] methods. These results accrue from eliminating the CFG from the analysis/transformation phases and using demand dependences in preference to control dependences.
Partial dead code elimination using slicing transformations
- In Proceedings of the ACM SIGPLAN '97 Conference on Programming Language Design and Implementation
, 1997
"... ..."
Optimization of Data Remapping in Data-Parallel Languages
, 1998
"... The user-controlled mapping of data across the local memories of processing nodes is one of the central features of data-parallel languages like High Performance Fortran (HPF). Since many scientific applications typically consist of different computational phases owning each a best data mapping, dyn ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
The user-controlled mapping of data across the local memories of processing nodes is one of the central features of data-parallel languages like High Performance Fortran (HPF). Since many scientific applications typically consist of different computational phases owning each a best data mapping, dynamic remappings have proven useful in maintaining good data locality and workload balance. HPF supports remappings by procedure calls and by executing redistribute/realigndirectives. But remappings can be quite expensive as communication is required to migrate the array elements to their new owning processors and can significantly degrade a program's performance. Hence, elimination of unnecessary remappings is of key importance. It is essential because even well-written HPF programs may result in unnecessary remappings. In this thesis we optimize the overall time spent for dynamic data remappings in a program run by reducing the number of executed remappings. Elimination of redundant and d...
UNIVERSITY OF CALGARY Floey, an Intermediate Language for Optimizing Compilers
, 2008
"... In modern optimizing compilers, linear human-readable text representation of a program is first transformed into an abstract syntax tree that represents the structure of that program. Abstract syntax tree is then transformed into intermediate representation (IR), based on which compiler optimization ..."
Abstract
- Add to MetaCart
In modern optimizing compilers, linear human-readable text representation of a program is first transformed into an abstract syntax tree that represents the structure of that program. Abstract syntax tree is then transformed into intermediate representation (IR), based on which compiler optimizations are accomplished. The optimized IR is sent to the code generator and finally translated into assembly or machine code. Research on IRs has been focused on how they can be designed to facilitate compiler optimizations or more effective code generation on specific architecture. This thesis presents a mid-level intermediate language, called Floey. In a Floey program, control flowgraphs are separated into different tree-like structures called control expressions. Different control expressions are connected by entries. On Floey, a machine independent optimization, called the reduction algorithm, is implemented. By comparing the reduction algorithm to various conventional optimizations, we argue that not only Floey facilities compiler optimization design, it also provides a cleaner and uniform perspective on compiler optimizations in general.

