Results 11 - 20
of
132
Dependence-Based Program Analysis
- In Proceedings of the SIGPLAN '93 Conference on Programming Language Design and Implementation
, 1993
"... Program analysis and optimization can be speeded up through the use of the dependence flow graph (DFG), a representation of program dependences which generalizes def-use chains and static single assignment (SSA) form. In this paper, we give a simple graph-theoretic description of the DFG and show ho ..."
Abstract
-
Cited by 58 (6 self)
- Add to MetaCart
Program analysis and optimization can be speeded up through the use of the dependence flow graph (DFG), a representation of program dependences which generalizes def-use chains and static single assignment (SSA) form. In this paper, we give a simple graph-theoretic description of the DFG and show how the DFG for a program can be constructed in O(EV ) time. We then show how forward and backward dataflow analyses can be performed efficiently on the DFG, using constant propagation and elimination of partial redundancies as examples. These analyses can be framed as solutions of dataflow equations in the DFG. Our construction algorithm is of independent interest because it can be used to construct a program's control dependence graph in O(E) time and its SSA representation in O(EV ) time, which are improvements over existing algorithms. 1 Introduction Anumber of recent papers have focused attention on the problem of speeding up program optimization [FOW87, BMO90, CCF91, PBJ + 91, CFR +...
Scalar Replacement in the Presence of Conditional Control Flow
- SOFTWARE PRACTICE AND EXPERIENCE
, 1992
"... Most conventional compilers fail to allocate array elements to registers because standard data-flow analysis treats arrays like scalars, making it impossible to analyze the definitions and uses of individual array elements. This deficiency is particularly troublesome for floating-point registers, ..."
Abstract
-
Cited by 54 (18 self)
- Add to MetaCart
Most conventional compilers fail to allocate array elements to registers because standard data-flow analysis treats arrays like scalars, making it impossible to analyze the definitions and uses of individual array elements. This deficiency is particularly troublesome for floating-point registers, which are most often used as temporary repositories for subscripted variables. This paper
Value Numbering
, 1997
"... This paper compares hash-based approaches derived from the classic local algorithm ..."
Abstract
-
Cited by 52 (11 self)
- Add to MetaCart
This paper compares hash-based approaches derived from the classic local algorithm
Memory-Hierarchy Management
, 1994
"... The trend in high-performance microprocessor design is toward increasing computational power on the chip. Microprocessors can now process dramatically more data per machine cycle than previous models. Unfortunately, memory speeds have not kept pace. The result is an imbalance between computation spe ..."
Abstract
-
Cited by 50 (14 self)
- Add to MetaCart
The trend in high-performance microprocessor design is toward increasing computational power on the chip. Microprocessors can now process dramatically more data per machine cycle than previous models. Unfortunately, memory speeds have not kept pace. The result is an imbalance between computation speed and memory speed. This imbalance is leading machine designers to use more complicated memory hierarchies. In turn, programmers are explicitly restructuring codes to perform well on particular memory systems, leading to machine-specific programs. It is our belief that machine-specific programming is a step in the wrong direction. Compilers, not programmers, should handle machine-specific implementation details. To this end, this thesis develops and experiments with compiler algorithms that manage the memory hierarchy of a machine for floating-point intensive numerical codes. Specifically, we address the following issues: Scalar replacement. Lack of information concerning the flow of arra...
Transparent dynamic optimization: The design and implementation of Dynamo
, 1999
"... dynamic optimization, compiler, trace selection, binary translation © Copyright Hewlett-Packard Company 1999 Dynamic optimization refers to the runtime optimization of a native program binary. This report describes the design and implementation of Dynamo, a prototype dynamic optimizer that is capabl ..."
Abstract
-
Cited by 49 (4 self)
- Add to MetaCart
dynamic optimization, compiler, trace selection, binary translation © Copyright Hewlett-Packard Company 1999 Dynamic optimization refers to the runtime optimization of a native program binary. This report describes the design and implementation of Dynamo, a prototype dynamic optimizer that is capable of optimizing a native program binary at runtime. Dynamo is a realistic implementation, not a simulation, that is written entirely in user-level software, and runs on a PA-RISC machine under the HPUX operating system. Dynamo does not depend on any special programming language,
Register Promotion in C Programs
- Proc. ACM SIGPLAN Conf. Programming Language Design and Implementation (PLDI-97
, 1997
"... The combination of pointers and pointer arithmetic in C makes the task of improving C programs somewhat more difficult than improving programs written in simpler languages like Fortran. While much work has been published that focuses on the analysis of pointers, little has appeared that uses the res ..."
Abstract
-
Cited by 46 (3 self)
- Add to MetaCart
The combination of pointers and pointer arithmetic in C makes the task of improving C programs somewhat more difficult than improving programs written in simpler languages like Fortran. While much work has been published that focuses on the analysis of pointers, little has appeared that uses the results of such analysis to improve the code compiled for C. This paper examines the problem of register promotion in C and presents experimental results showing that it can have dramatic effects on memory traffic. 1 Introduction The presence of pointer-valued variables in C has long been recognized as an impediment to effective compiletime optimization. Pointers introduce a degree of uncertainty into the results of static analysis. Pointer assignments create multiple names for storage locations, with the result that the compiler must avoid reordering stores to memory. Pointer arithmetic introduces further ambiguity; understanding the results of *(p+8) requires a detailed knowledge, at compil...
Compiling for the Multiscalar Architecture
, 1998
"... High-performance, general-purpose microprocessors serve as compute engines for computers ranging from personal computers to supercomputers. Sequential programs constitute a major portion of real-world software that run on the computers. State-of-the-art microprocessors exploit instruction level para ..."
Abstract
-
Cited by 45 (2 self)
- Add to MetaCart
High-performance, general-purpose microprocessors serve as compute engines for computers ranging from personal computers to supercomputers. Sequential programs constitute a major portion of real-world software that run on the computers. State-of-the-art microprocessors exploit instruction level parallelism (ILP) to achieve high performance on such applications by searching for independent instructions in a dynamic window of instructions and executing them on a wide-issue pipeline. Increasing the window size and the issue width to extract more ILP may hinder achieving high clock speeds, limiting overall performance. The Multiscalar architecture employs multiple small windows and many narrow-issue processing units to exploit ILP at high clock speeds. Sequential programs are partitioned into code fragments called tasks, which are speculatively executed in parallel. Inter-task register dependences are honored via communication and synchronization and inter-task control flow and memory depe...
Parallelism for Free: Efficient and Optimal Bitvector Analyses for Parallel Programs
, 1994
"... In this paper we show how to construct optimal bitvector analysis algorithms for parallel programs with shared memory that are as efficient as their purely sequential counterparts, and which can easily be implemented. Whereas the complexity result is rather obvious, our optimality result is a conseq ..."
Abstract
-
Cited by 43 (3 self)
- Add to MetaCart
In this paper we show how to construct optimal bitvector analysis algorithms for parallel programs with shared memory that are as efficient as their purely sequential counterparts, and which can easily be implemented. Whereas the complexity result is rather obvious, our optimality result is a consequence of a new Kam/Ullman-style Coincidence Theorem. Thus, the important merits of sequential bitvector analyses survive the introduction of parallel statements. Keywords Parallelism, interleaving semantics, synchronization, program optimization, data flow analysis, bitvector problems, definition-use chains, partial redundancy elimination, partial dead code elimination. Contents 1 Motivation 1 2 Sequential Programs 2 2.1 Representation : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 2 2.2 Data Flow Analysis : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 2 2.2.1 The MOP-Solution of a DFA : : : : : : : : : : : : : : : : : : : : : : 2 2.2.2 The MFP-Solution o...
Path Profile Guided Partial Dead Code Elimination Using Predication
, 1997
"... We present a path profile guided partial dead code elimination algorithm that uses predication to enable sinking for the removal of deadness along frequently executed paths at the expense of adding additional instructions along infrequently executed paths. Our approach to optimization is particularl ..."
Abstract
-
Cited by 38 (6 self)
- Add to MetaCart
We present a path profile guided partial dead code elimination algorithm that uses predication to enable sinking for the removal of deadness along frequently executed paths at the expense of adding additional instructions along infrequently executed paths. Our approach to optimization is particularly suitable for VLIW architectures since it directs the efforts of the optimizer towards aggressively enabling generation of fast schedules along frequently executed paths by reducing their critical path lengths. The paper presents a cost-benefit data flow analysis that uses path profiling information to determine the profitability of using predication enabled sinking. The cost of predication enabled sinking of a statement past a merge point is determined by identifying paths along which an additional statement is introduced. The benefit of predication enabled sinking is determined by identifying paths along which additional dead code elimination is achieved due to predication. The results...

