Results 1 - 10
of
37
Efficiently computing static single assignment form and the control dependence graph
- ACM TRANSACTIONS ON PROGRAMMING LANGUAGES AND SYSTEMS
, 1991
"... In optimizing compilers, data structure choices directly influence the power and efficiency of practical program optimization. A poor choice of data structure can inhibit optimization or slow compilation to the point that advanced optimization features become undesirable. Recently, static single ass ..."
Abstract
-
Cited by 749 (7 self)
- Add to MetaCart
In optimizing compilers, data structure choices directly influence the power and efficiency of practical program optimization. A poor choice of data structure can inhibit optimization or slow compilation to the point that advanced optimization features become undesirable. Recently, static single assignment form and the control dependence graph have been proposed to represent data flow and control flow propertiee of programs. Each of these previously unrelated techniques lends efficiency and power to a useful class of program optimization. Although both of these structures are attractive, the difficulty of their construction and their potential size have discouraged their use. We present new algorithms that efficiently compute these data structures for arbitrary control flow graphs. The algorithms use dominance frontiers, a new concept that may have other applications. We also give analytical and experimental evidence that all of these data structures are usually linear in the size of the original program. This paper thus presents strong evidence that these structures can be of practical use in optimization.
The Design and Implementation of the SELF Compiler, an Optimizing Compiler for Object-Oriented Programming Languages
, 1992
"... Object-oriented programming languages promise to improve programmer productivity by supporting abstract data types, inheritance, and message passing directly within the language. Unfortunately, traditional implementations of object-oriented language features, particularly message passing, have been ..."
Abstract
-
Cited by 120 (15 self)
- Add to MetaCart
Object-oriented programming languages promise to improve programmer productivity by supporting abstract data types, inheritance, and message passing directly within the language. Unfortunately, traditional implementations of object-oriented language features, particularly message passing, have been much slower than traditional implementations of their non-object-oriented counterparts: the fastest existing implementation of Smalltalk-80 runs at only a tenth the speed of an optimizing C implementation. The dearth of suitable implementation technology has forced most object-oriented languages to be designed as hybrids with traditional non-object-oriented languages, complicating the languages and making programs harder to extend and reuse. This dissertation describes a collection of implementation techniques that can improve the run-time performance of object-oriented languages, in hopes of reducing the need for hybrid languages and encouraging wider spread of purely object-oriented langu...
Profile-guided automatic inline expansion for C programs
- SOFTWARE PRACTICE AND EXPERIENCE
, 1992
"... This paper describes critical implementation issues that must be addressed to develop a fully automatic inliner. These issues are: integration into a compiler, program representation, hazard prevention, expansion sequence control, and program modi cation. An automatic inter- le inliner that uses pro ..."
Abstract
-
Cited by 109 (5 self)
- Add to MetaCart
This paper describes critical implementation issues that must be addressed to develop a fully automatic inliner. These issues are: integration into a compiler, program representation, hazard prevention, expansion sequence control, and program modi cation. An automatic inter- le inliner that uses pro le information has been implemented and integrated into an optimizing C compiler. The experimental results show that this inliner achieves signi cant speedups for production C programs.
Modular Interprocedural Pointer Analysis Using Access Paths: Design, Implementation, and Evaluation
, 2000
"... In this paper we present a modular interprocedural pointer analysis algorithm based on access-paths for C programs. We argue that access paths can reduce the overhead of representing context-sensitive transfer functions and effectively distinguish non-recursive heap objects. And when the modular ana ..."
Abstract
-
Cited by 86 (6 self)
- Add to MetaCart
In this paper we present a modular interprocedural pointer analysis algorithm based on access-paths for C programs. We argue that access paths can reduce the overhead of representing context-sensitive transfer functions and effectively distinguish non-recursive heap objects. And when the modular analysis paradigm is used together with other techniques to handle type casts and function pointers, we are able to handle significant programs like those in the SPECcint92 and SPECcint95 suites. We have implemented the algorithm and tested it on a Pentium II 450 PC running Linux. The observed resource consumption and performance improvement are very encouraging.
Towards Better Inlining Decisions Using Inlining Trials
- In Proceedings of the 1994 ACM Conference on LISP and Functional Programming
, 1994
"... Inlining trials are a general mechanism for making better automatic decisions about whether a routine is profitable to inline. Unlike standard source-level inlining heuristics, an inlining trial captures the effects of optimizations applied to the body of the inlined routine when calculating the cos ..."
Abstract
-
Cited by 53 (6 self)
- Add to MetaCart
Inlining trials are a general mechanism for making better automatic decisions about whether a routine is profitable to inline. Unlike standard source-level inlining heuristics, an inlining trial captures the effects of optimizations applied to the body of the inlined routine when calculating the costs and benefits of inlining. The results of inlining trials are stored in a persistent database to be reused when making future inlining decisions at similar call sites. Type group analysis can determine the amount of available static information exploited during compilation, and the results of analyzing the compilation of an inlined routine help decide when a future call site would lead to substantially the same generated code as a given inlining trial. We have implemented inlining trials and type group analysis in an optimizing compiler for SELF, and by making wiser inlining decisions we were able to cut compilation time and compiled code space with virtually no loss of execution speed. We...
Obtaining Sequential Efficiency for Concurrent Object-Oriented Languages
- In Proceedings of the ACM Symposium on the Principles of Programming Languages
, 1995
"... Concurrent object-oriented programming (COOP) languages focus the abstraction and encapsulation power of abstract data types on the problem of concurrency control. In particular, pure fine-grained concurrent object-oriented languages (as opposed to hybrid or data parallel) provides the programmer wi ..."
Abstract
-
Cited by 47 (15 self)
- Add to MetaCart
Concurrent object-oriented programming (COOP) languages focus the abstraction and encapsulation power of abstract data types on the problem of concurrency control. In particular, pure fine-grained concurrent object-oriented languages (as opposed to hybrid or data parallel) provides the programmer with a simple, uniform, and flexible model while exposing maximum concurrency. While such languages promise to greatly reduce the complexity of large-scale concurrent programming, the popularity of these languages has been hampered by efficiency which is often many orders of magnitude less than that of comparable sequential code. We present a sufficient set of techniques which enables the efficiency of fine-grained concurrent object-oriented languages to equal that of traditional sequential languages (like C) when the required data is available. These techniques are empirically validated by the application to a COOP implementation of the Livermore Loops. 1 Introduction The increasing use of ...
Compiling for the Multiscalar Architecture
, 1998
"... High-performance, general-purpose microprocessors serve as compute engines for computers ranging from personal computers to supercomputers. Sequential programs constitute a major portion of real-world software that run on the computers. State-of-the-art microprocessors exploit instruction level para ..."
Abstract
-
Cited by 45 (2 self)
- Add to MetaCart
High-performance, general-purpose microprocessors serve as compute engines for computers ranging from personal computers to supercomputers. Sequential programs constitute a major portion of real-world software that run on the computers. State-of-the-art microprocessors exploit instruction level parallelism (ILP) to achieve high performance on such applications by searching for independent instructions in a dynamic window of instructions and executing them on a wide-issue pipeline. Increasing the window size and the issue width to extract more ILP may hinder achieving high clock speeds, limiting overall performance. The Multiscalar architecture employs multiple small windows and many narrow-issue processing units to exploit ILP at high clock speeds. Sequential programs are partitioned into code fragments called tasks, which are speculatively executed in parallel. Inter-task register dependences are honored via communication and synchronization and inter-task control flow and memory depe...
Interprocedural Transformations for Parallel Code Generation
- IN PROCEEDINGS OF SUPERCOMPUTING '91
, 1991
"... We present a new approach that enables compiler optimization of procedure calls and loop nests containing procedure calls. We introduce two interprocedural transformations that move loops across procedure boundaries, exposing them to traditional optimizations on loop nests. These transformations are ..."
Abstract
-
Cited by 41 (14 self)
- Add to MetaCart
We present a new approach that enables compiler optimization of procedure calls and loop nests containing procedure calls. We introduce two interprocedural transformations that move loops across procedure boundaries, exposing them to traditional optimizations on loop nests. These transformations are incorporated into a code generation algorithm for a shared-memory multiprocessor. The code generator relies on a machine model to estimate the expected benefits of loop parallelization and parallelism-enhancing transformations. Several transformation strategies are explored and one that minimizes total execution time is selected. Efficient support of this strategy is provided by an existing interprocedural compilation system. We demonstrate the potential of these techniques by applying this code generation strategy to two scientific applications programs.
Automatic and Interactive Parallelization
, 1994
"... The goal of this dissertation is to give programmers the ability to achieve high performance by focusing on developing parallel algorithms, rather than on architecturespecific details. The advantages of this approach also include program portability and legibility. To achieve high performance, we pr ..."
Abstract
-
Cited by 38 (8 self)
- Add to MetaCart
The goal of this dissertation is to give programmers the ability to achieve high performance by focusing on developing parallel algorithms, rather than on architecturespecific details. The advantages of this approach also include program portability and legibility. To achieve high performance, we provide automatic compilation techniques that tailor parallel algorithms to shared-memory multiprocessors with local caches and a common bus. In particular, the compiler maps complete applications onto the specifics of a machine, exploiting both parallelism and memory. To optimize complete applications, we develop novel, general algorithms to transform loops that contain arbitrary conditional control flow. In addition, we provide new interprocedural transformations which enable optimization across procedure boundaries. These techniques provide the basis for a robust automatic parallelizing algorithm that is applicable to complete programs. The algorithm for automatic parallel code generation t...

