Results 1 - 10
of
42
Complete Removal of Redundant Expressions
, 1998
"... Partial redundancy elimination (PRE), the most important component of global optimizers, generalizes the removal of common subexpressions and loop-invariant computations. Because existing PRE implementations are based on code motion, they fail to completely remove the redundancies. In fact, we obser ..."
Abstract
-
Cited by 64 (13 self)
- Add to MetaCart
Partial redundancy elimination (PRE), the most important component of global optimizers, generalizes the removal of common subexpressions and loop-invariant computations. Because existing PRE implementations are based on code motion, they fail to completely remove the redundancies. In fact, we observed that 73% of loop-invariant statements cannot be eliminated from loops by code motion alone. In dynamic terms, traditional PRE eliminates only half of redundancies that are strictly partial. To achieve a complete PRE, control flow restructuring must be applied. However, the resulting code duplication may cause code size explosion. This paper focuses on achieving a complete PRE while incurring an acceptable code growth. First, we present an algorithm for complete removal of partial redundancies, based on the integration of code motion and control flow restructuring. In contrast to existing complete techniques, we resort to restructuring merely to remove obstacles to code motion, rather th...
On the importance of points-to analysis and other memory disambiguation methods for c programs
- In Proceedings of the 2001 ACM SIGPLAN Conference on Programming Language Design and Implementation
, 2001
"... In this paper, we evaluate the benefits achievable from pointer analysis and other memory disambiguation techniques for C/C++ programs, using the framework of the production compiler for the Intel ® Itanium ™ processor. Most of the prior work on memory disambiguation has primarily focused on pointer ..."
Abstract
-
Cited by 50 (0 self)
- Add to MetaCart
In this paper, we evaluate the benefits achievable from pointer analysis and other memory disambiguation techniques for C/C++ programs, using the framework of the production compiler for the Intel ® Itanium ™ processor. Most of the prior work on memory disambiguation has primarily focused on pointer analysis, and either presents only static estimates of the accuracy of the analysis (such as average points-to set size), or provides performance data in the context of certain individual optimizations. In contrast, our study is based on a complete memory disambiguation framework that uses a whole set of techniques including pointer analysis. Further, it presents how various compiler analyses and optimizations interact with the memory disambiguator, evaluates how much they benefit from disambiguation, and measures the eventual impact on the performance of the program. The paper also analyzes the types of disambiguation queries that are typically received by the disambiguator, which disambiguation techniques prove most effective in resolving them, and what type of queries prove difficult to be resolved. The study is based on empirical data collected for the SPEC CINT2000 C/C++ programs, running on the Itanium processor. 1.
Compiler-Controlled Memory
- IN PROCEEDINGS OF THE 8TH INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS
, 1998
"... Optimizations aimed at reducing the impact of memory operations on execution speed have long concentrated on improving cache performance. These efforts achieve a. reasonable level of success. The primary limit on the compiler’s ability to improve memory behavior is its im-perfect knowledge about the ..."
Abstract
-
Cited by 45 (0 self)
- Add to MetaCart
Optimizations aimed at reducing the impact of memory operations on execution speed have long concentrated on improving cache performance. These efforts achieve a. reasonable level of success. The primary limit on the compiler’s ability to improve memory behavior is its im-perfect knowledge about the run-time behavior of the program. The compiler cannot completely predict run-time access patterns. There is an exception to this rule. During the reg-ister allocation phase, the compiler often must insert substantial amount,s of spill code; that is, instructions that move values from registers to memory and back again. Because the compiler itself inserts these memory instructions, it has more knowledge about them than other memory operations in the program. Spill-code operations are disjoint from the memory manipulations required by the semantics of the program being compiled, and, indeed, the two can interfere in the cache. This paper proposes a hardware solution to the problem of increased spill costs-a small compiler-con-trolled memory (CCM) to hold spilled values. This small random-access memory can (and should) be placed in a distinct address space from the main memory hierar-chy. The compiler can target spill instructions to use the CCM, moving most compiler-inserted memory traf-fic out of the pathway to main memory and eliminating any impact that those spill instructions would have on the state of the main memory hierarchy. Such mem-ories already exist on some DSP microprocessors. Our techniques can be applied directly on those chips. This paper presents two compiler-based methods to exploit such a memory, along with experimental results showing that speedups from using CCM may be sizable. It shows that using the register allocation’s coloring paradigm to assign spilled values to memory can greatly reduce the amount of memory required by a program.
Path Profile Guided Partial Redundancy Elimination Using Speculation
, 1997
"... While programs contain a large number of paths, a very small fraction of these paths are typically exercised during program execution. Thus, optimization algorithms should be designed to trade off the performance of less frequently executed paths in favor of more frequently executed paths. However, ..."
Abstract
-
Cited by 40 (9 self)
- Add to MetaCart
While programs contain a large number of paths, a very small fraction of these paths are typically exercised during program execution. Thus, optimization algorithms should be designed to trade off the performance of less frequently executed paths in favor of more frequently executed paths. However, traditional formulations to code optimizations are incapable of performing such a trade-off. We present a path profile guided partial redundancy elimination algorithm that uses speculation to enable the removal of redundancy along more frequently executed paths at the expense of introducing additional expression evaluations along less frequently executed paths. We describe cost-benefit data flow analysis that uses path profiling information to determine the profitability of using speculation. The cost of enabling speculation of an expression at a conditional is determined by identifying paths along which an additional evaluation of the expression is introduced. The benefit of enabling specul...
LLVM: An Infrastructure for Multi-Stage Optimization
, 2002
"... Modern programming languages and software engineering principles are causing increasing problems for compiler systems. Traditional approaches, which use a simple compile-link-execute model, are unable to provide adequate application performance under the demands of the new conditions. Traditional ap ..."
Abstract
-
Cited by 31 (6 self)
- Add to MetaCart
Modern programming languages and software engineering principles are causing increasing problems for compiler systems. Traditional approaches, which use a simple compile-link-execute model, are unable to provide adequate application performance under the demands of the new conditions. Traditional approaches to interprocedural and profile-driven compilation can provide the application performance needed, but require infeasible amounts of compilation time to build the application. This thesis presents LLVM, a design and implementation of a compiler infrastructure which supports a unique multi-stage optimization system. This system is designed to support extensive interprocedural and profile-driven optimizations, while being efficient enough for use in commercial compiler systems. The LLVM virtual instruction set is the glue that holds the system together. It is a low-level representation, but with high-level type information. This provides the benefits of a low-level representation (compact representation, wide variety of available transformations, etc.) as well as providing high-level information to support aggressive interprocedural optimizations at link- and post-link time. In particular, this system is designed to support optimization in the field, both at run-time and during otherwise unused idle time on the machine. This thesis also describes an implementation of this compiler design, the LLVM compiler infrastructure, proving that the design is feasible. The LLVM compiler infrastructure is a maturing and efficient system, which we show is a good host for a variety of research. More information about LLVM can be found on its web site at: http://llvm.cs.uiuc.edu/
Partial Redundancy Elimination in SSA Form
- ACM Transactions on Programming Languages and Systems
, 1999
"... This paper presents a new approach called SSAPRE [Chow et al. 1997] that shares the optimality properties of the best prior work [Knoop et al. 1992; Knoop et al. 1994; Drechsler and Stadel 1993] and that is based on static single assignment form. Static single assignment form (SSA) is a popular prog ..."
Abstract
-
Cited by 30 (1 self)
- Add to MetaCart
This paper presents a new approach called SSAPRE [Chow et al. 1997] that shares the optimality properties of the best prior work [Knoop et al. 1992; Knoop et al. 1994; Drechsler and Stadel 1993] and that is based on static single assignment form. Static single assignment form (SSA) is a popular program representation in modern optimizing compilers. Its versatility stems from the fact that, in addition to representing the program, it provides accurate use-definition (use-def) relationships among the program variables in a concise form [Cytron et al. 1991; Wolfe 1996; Chow et al. 1996]. Many efficient global optimization algorithms have been developed based on SSA. Among these optimizations are dead store elimination [Cytron et al. 1991], constant propagation [Wegman and Zadeck 1991], value numbering [Alpern et al. 1988; Rosen et al. 1988; Briggs et al. 1997], induction variable analysis [Gerlek et al. 1995; Liu et al. 1996], live range computation [Gerlek et al. 1994] and global code motion [Click 1995]. Until recently, most uses of SSA have been restricted to solving problems based essentially on program variables. SSA could not readily be applied to solving expression-based problems because the concept of use-def for expressions is less obvious than for variables. This difficulty was mentioned by Dhamdhere et al. in the conclusion of [Dhamdhere et al. 1992]. They state, essentially, that there is no clear connection between the use-def information for variables represented by SSA form and the redundancy properties for expressions. By demonstrating such a connection and exploiting it, our work shows that an SSA-based approach to PRE and other expression-based problems is not only plausible, but also enlightening and practical. Although this paper addresses only the PRE ...
Shangri-la: achieving high performance from compiled network applications while enabling ease of programming
- In PLDI ’05
, 2005
"... Programming network processors is challenging. To sustain high line rates, network processors have extremely tight memory access and instruction budgets. Achieving desired performance has traditionally required hand-coded assembly. Researchers have recently proposed high-level programming languages ..."
Abstract
-
Cited by 29 (0 self)
- Add to MetaCart
Programming network processors is challenging. To sustain high line rates, network processors have extremely tight memory access and instruction budgets. Achieving desired performance has traditionally required hand-coded assembly. Researchers have recently proposed high-level programming languages for packet processing, but the challenges of compiling these languages into code that is competitive with hand-tuned assembly remain unanswered. This paper describes the Shangri-La compiler, which accepts a packet program written in a C-like high-level language and applies scalar and specialized optimizations to generate a highly optimized binary. Hot code paths identified by profiling are mapped across processing elements to maximize processor utilization. Since our compilation target has no hardware caches, software-controlled caches are generated for frequently accessed application data structures. Packet handling optimizations significantly reduce perpacket memory access and instruction counts. Finally, a custom stack model maps stack frames to the fastest levels of the target processor’s heterogeneous memory hierarchy. Binaries generated by the compiler were evaluated on the Intel IXP2400 network processor with eight packet processing cores and eight threads per core. Our results show the importance of both traditional and specialized optimization techniques for achieving the maximum forwarding rates on three network applications, L3-
Translating out of static single assignment form
- In Static Analysis Symposium, Italy
, 1999
"... Abstract. Programs represented in Static Single Assignment (SSA) form contain phi instructions (or functions) whose operational semantics are to merge values coming from distinct control flow paths. However, translating phi instructions into native instructions is nontrivial when transformations suc ..."
Abstract
-
Cited by 26 (0 self)
- Add to MetaCart
Abstract. Programs represented in Static Single Assignment (SSA) form contain phi instructions (or functions) whose operational semantics are to merge values coming from distinct control flow paths. However, translating phi instructions into native instructions is nontrivial when transformations such as copy propagation and code motion have been performed. In this paper we present a new framework for translating out of SSA form. By appropriately placing copy instructions, we ensure that none of the resources in a phi congruence class interfere. Within our framework, we propose three methods for copy placement. The first method pessimistically places copies for all operands of phi instructions. The second method uses an interference graph to guide copy placement. The third method uses both data flow liveness sets and an interference graph to guide copy placement. We also present a new SSAbased coalescing method that can selectively remove redundant copy instructions with interfering operands. Our experimental results indicate that the third method results in 35 % fewer copy instructions than the second method. Compared to the first method, the third method, on average, inserts 89.9% fewer copies during copy placement and runs 15 % faster, which are significant reductions in compilation time and space. 1.
Path-Sensitive Value-Flow Analysis
- In Symposium on Principles of Programming Languages
, 1998
"... When analyzing programs for value recomputation, one faces the problem of naming the value that flows between equivalent computations with different lexical names. This paper presents a data-flow analysis framework that overcomes this problem by synthesizing a name space tailored for tracing the val ..."
Abstract
-
Cited by 23 (1 self)
- Add to MetaCart
When analyzing programs for value recomputation, one faces the problem of naming the value that flows between equivalent computations with different lexical names. This paper presents a data-flow analysis framework that overcomes this problem by synthesizing a name space tailored for tracing the values whose flow is of interest to a given data-flow problem. Furthermore, to exploit recomputation of a value with multiple, synonymous names, path-sensitive value numbering on the synthetic name space is developed. Optimizations that rely on value flow to detect redundant computations, such as partial redundancy elimination and constant propagation, become more powerful when phrased in our framework. The framework is built on a new program representation called Value Name Graph (VNG) which gains its power from integrating three orthogonal techniques: symbolic back-substitution, value numbering, and data-flow analysis. Our experiments with the implementation show that analysis on the VNG is p...
A compiler framework for speculative analysis and optimizations
- In Proceedings of the ACM SIGPLAN ’03 Conference on Programming Language Design and Implementation
, 2003
"... Speculative execution, such as control speculation and data speculation, is an effective way to improve program performance. Using edge/path profile information or simple heuristic rules, existing compiler frameworks can adequately incorporate and exploit control speculation. However, very little ha ..."
Abstract
-
Cited by 18 (3 self)
- Add to MetaCart
Speculative execution, such as control speculation and data speculation, is an effective way to improve program performance. Using edge/path profile information or simple heuristic rules, existing compiler frameworks can adequately incorporate and exploit control speculation. However, very little has been done so far to allow existing compiler frameworks to incorporate and exploit data speculation effectively in various program transformations beyond instruction scheduling. This paper proposes a speculative SSA form to incorporate information from alias profiling and/or heuristic rules for data speculation, thus allowing existing program analysis frameworks to be easily extended to support both control and data speculation. Such a general framework is very useful for EPIC architectures that provide checking (such as advanced load address table (ALAT) [10]) on data speculation to guarantee the correctness of program execution. We use SSAPRE [21] as one example to illustrate how to incorporate data speculation in those important compiler optimizations such as partial redundancy elimination (PRE), register promotion, strength reduction and linear function test replacement. Our extended framework allows both control and data speculation to be performed on top of SSAPRE and, thus, enables more aggressive speculative optimizations. The proposed framework has been implemented on Intel's Open Research Compiler (ORC). We present experimental data on some SPEC2000 benchmark programs to demonstrate the usefulness of this framework and how data speculation benefits partial redundancy elimination.

