Reconciling responsiveness with performance in pure object-oriented languages
ACM Transactions on Programming Languages and Systems, 1996
Cited by 55 (0 self)
Dynamically-dispatched calls often limit the performance of object-oriented programs since object-oriented programming encourages factoring code into small, reusable units, thereby increasing the frequency of these expensive operations. Frequent calls not only slow down execution with the dispatch overhead per se, but more importantly they hinder optimization by limiting the range and effectiveness of standard global optimizations. In particular, dynamically-dispatched calls prevent standard interprocedural optimizations that depend on the availability of a static call graph. The SELF implementation described here offers two novel approaches to optimization. Type feedback speculatively inlines dynamically-dispatched calls based on profile information that predicts likely receiver classes. Adaptive optimization reconciles optimizing compilation with interactive performance by incrementally optimizing only the frequently-executed parts of a program. When combined, these two techniques result in a system that can execute programs significantly faster than previous systems while retaining much of the interactiveness of an interpreted system.
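As a rough illustration of the type-feedback idea in this abstract, the Python sketch below profiles the receiver classes seen at one call site and then adds a guarded fast path for the most frequent class. Every name here (`CallSite`, `Circle`, `Square`, `area`) is hypothetical and not taken from the paper, and binding the target method once only stands in for real inlining, which the SELF system performs inside its compiler.

```python
from collections import Counter


class Circle:
    def __init__(self, r):
        self.r = r

    def area(self):
        return 3 * self.r * self.r  # crude integer "pi" keeps results exact


class Square:
    def __init__(self, s):
        self.s = s

    def area(self):
        return self.s * self.s


class CallSite:
    """Hypothetical call site for `receiver.area()` that profiles receiver classes."""

    def __init__(self):
        self.profile = Counter()  # receiver class -> number of calls seen
        self.predicted = None     # class guarded by the speculative fast path
        self.fast_path = None     # target bound once (stand-in for inlining)

    def call(self, receiver):
        cls = type(receiver)
        self.profile[cls] += 1
        if cls is self.predicted:
            # Guard succeeded: call the pre-resolved target directly.
            return self.fast_path(receiver)
        # Guard failed or site not yet optimized: ordinary dynamic dispatch.
        return receiver.area()

    def optimize(self):
        # Use the gathered profile to predict the likely receiver class.
        self.predicted = self.profile.most_common(1)[0][0]
        self.fast_path = self.predicted.area
```

In this sketch, a site that has mostly seen `Square` receivers takes the fast path for later `Square` calls, while a `Circle` receiver still falls back to normal dispatch, so the speculation never changes results.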
Eliminating virtual function calls in C++ programs
1996
Cited by 54 (0 self)
We have designed and implemented an optimizing source-to-source C++ compiler that reduces the frequency of virtual function calls. This technical report describes our preliminary experience with this system. The prototype implementation demonstrates the value of OO-specific optimization of C++. Despite some limitations of our system, and despite the low frequency of virtual function calls in some of the programs, optimization improves the performance of a suite of two small and six large C++ applications totalling over 90,000 lines of code by a median of 20% over the original programs and reduces the number of virtual function calls by a median factor of 5. For more call-intensive versions of the same programs, performance improved by a median of 40% and the number of virtual calls dropped by a factor of 21. Our measurements indicate that inlining does not necessarily lead to large increases in code size, and that for most programs, the instruction cache miss ratio does not increase significantly.
Generating Efficient Protocol Code from an Abstract Specification
1996
Cited by 50 (0 self)
A protocol compiler takes as input an abstract specification of a protocol and generates an implementation of that protocol. Protocol compilers usually produce inefficient code both in terms of code speed and code size. In this paper, we show that by compiling a modular specification into an integrated automaton and by selectively optimizing its different transitions, it is possible to automatically generate efficient protocol code. Our protocol compiler takes as input a protocol specification in the synchronous language Esterel and compiles it into a C implementation. This process is divided into two stages. First, the specification is compiled into an integrated automaton by the Esterel front end. This automaton is then optimized and converted into an efficient C implementation by a protocol code optimizer called HIPPCO. HIPPCO improves performance and reduces code size by simultaneously optimizing the performance of common path whi...
FIAT: A Framework for Interprocedural Analysis and Transformation
1995
Cited by 48 (7 self)
Modern architectures with deep memory hierarchies or parallelism require the use of increasingly sophisticated code analysis and optimization to achieve maximum performance for large, scientific programs. In such
In or out? Putting write barriers in their place
In ACM SIGPLAN International Symposium on Memory Management (ISMM), 2002
Cited by 8 (1 self)
In many garbage-collected systems, the mutator performs a write barrier for every pointer update. Using generational garbage collectors, we study in depth three code placement options for remembered-set write barriers: inlined, out-of-line, and partially inlined (fast path inlined, slow path out-of-line). The fast path determines if the collector needs to remember the pointer update. The slow path records the pointer in a list when necessary. Efficient implementations minimize the instructions on the fast path, and record few pointers (from 0.16 to 3% of pointer stores in our benchmarks). We find the mutator performs best with a partially inlined barrier, by a modest 1.5% on average over full inlining. We also study the compilation cost of write-barrier code placement. We find that partial inlining reduces the compilation cost by 20 to 25% compared to full inlining. In the context of just-in-time compilation, the application is exposed to compiler activity. Regardless of the level of compiler activity, partial inlining consistently gives a total running time performance advantage over full inlining on the SPEC JVM98 benchmarks. When the compiler optimizes all application methods on demand and compiler load is highest, partial inlining improves total performance on average by 10.2%, and up to 18.5%.
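The fast-path/slow-path split described in this abstract can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the `Obj` class, the `old` flag, and the old-to-young filter are hypothetical choices made for illustration, and the abstract does not say which filter the authors' barriers use.

```python
class Obj:
    """Hypothetical heap object: a generation flag plus named pointer fields."""

    def __init__(self, old):
        self.old = old      # True if the object lives in the old generation
        self.fields = {}


remembered = []  # remembered set: (source object, field name) pairs


def remember_slow_path(src, field):
    # Slow path, kept out of line: record the updated slot in a list.
    remembered.append((src, field))


def write_field(src, field, target):
    src.fields[field] = target
    # Fast path, conceptually inlined at every pointer store: a cheap test
    # that decides whether the collector must remember this update.
    # Assumption: only old-to-young pointers need to be remembered.
    if src.old and target is not None and not target.old:
        remember_slow_path(src, field)
```

Keeping the test in `write_field` and the recording in a separate function mirrors the partially inlined placement: the common case pays only for the check, and the rarer recording work stays in one shared routine.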
Recursion Unrolling for Divide and Conquer Programs
2000
Cited by 5 (0 self)
This paper presents recursion unrolling, a technique for improving the performance of recursive computations. Conceptually, recursion unrolling inlines recursive calls to reduce control flow overhead and increase the size of the basic blocks in the computation, which in turn increases the effectiveness of standard compiler optimizations such as register allocation and instruction scheduling. We have identified two transformations that significantly improve the effectiveness of the basic recursion unrolling technique. Conditional fusion merges conditionals with identical expressions, considerably simplifying the control flow in unrolled procedures. Recursion re-rolling rolls back the recursive part of the procedure to ensure that a large unrolled base case is always executed, regardless of the input problem size. We have implemented our techniques and applied them to an important class of recursive programs, divide and conquer programs. Our experimental results show that recursion unrolling can improve the performance of our programs by a factor of between 3.6 and 10.8, depending on the combination of the program and the architecture.
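To make the effect of recursion unrolling concrete, the Python sketch below contrasts a plain divide and conquer sum with a hand-written version whose base case handles up to four elements in straight-line code. It approximates what the transformation produces and is not the paper's algorithm; the function names and the threshold of four are assumptions chosen for illustration.

```python
def sum_recursive(a, lo, hi):
    """Plain divide and conquer: sum a[lo:hi], recursing down to one element."""
    n = hi - lo
    if n == 0:
        return 0
    if n == 1:
        return a[lo]
    mid = (lo + hi) // 2
    return sum_recursive(a, lo, mid) + sum_recursive(a, mid, hi)


def sum_unrolled(a, lo, hi):
    """Same result, with an enlarged base case written by hand."""
    n = hi - lo
    if n <= 4:
        # Enlarged base case: no further calls, one larger block of code.
        total = 0
        if n > 0:
            total += a[lo]
        if n > 1:
            total += a[lo + 1]
        if n > 2:
            total += a[lo + 2]
        if n > 3:
            total += a[lo + 3]
        return total
    mid = (lo + hi) // 2
    return sum_unrolled(a, lo, mid) + sum_unrolled(a, mid, hi)
```

Both functions return the same sums; the second simply makes fewer calls near the leaves of the recursion, which is where the abstract locates the control-flow overhead.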
Practical Techniques For Virtual Call Resolution In Java
McGill University, 1999
Cited by 4 (1 self)
Virtual method calls are a fundamental feature offered by Java, an object-oriented programming language. However, they are also a source of degradation of performance at run time and imprecision in interprocedural analyses. There are several well-known, inexpensive analyses that have been developed for virtual call resolution. However, they have been observed to be effective in resolving method calls in library code, while not being very effective in the benchmark code excluding libraries. We present a new flow-insensitive and context-insensitive analysis called reaching type analysis in this thesis. We present the analysis rules for two variations of this analysis: variable type analysis and a coarser-grained version, declared type analysis. Reaching type analysis is based on an analysis that builds a type propagation graph where nodes represent variables and edges represent the flow of types due to assignments. We have implemented variable type analysis and declared type analysis, and tw...
Using Compiler Technology to Drive Advanced Microprocessors
In DARPA Software Technology Conference, 1992
-
Cited by 1 (0 self)
- Add to MetaCart
Recent years have seen the introduction of a series of ever faster, ever more complex microprocessors. These advanced microprocessors have found widespread application in machines that range from personal computers to engineering workstations to massively parallel multicomputers. Unfortunately, many of the features used to endow these processors with high peak performance numbers are difficult for either human programmers or compilers to manage. This paper looks at broad trends in microprocessor architecture, relates them back to the basic problems that they present to a compiler, and examines the kind of compiler infrastructure that will be required to address them.
1 Overview
Developments in the design of microprocessors will shape tomorrow's computing systems. Microprocessor-based personal computers and workstations dominate the desktop; today's fastest supercomputers are actually large collections of microprocessors linked together in some regular way. Recent years have seen...
Génération Automatique d'Implantations Optimisées de Protocoles [Automatic Generation of Optimized Protocol Implementations]
A protocol compiler takes as input a formal specification of a protocol and automatically generates its implementation. Protocol compilers generally produce implementations that perform very poorly in terms of speed and code size. In this article, we show that combining two approaches makes the automatic generation of efficient implementations possible. These techniques are (i) the use of a synchronous compiler that generates a minimal automaton from the modular specification (instead of several independent automata), and (ii) the use of an optimizer that improves the structure of the automaton and generates an optimized C implementation. We have developed a protocol compiler that combines these two techniques. This compiler takes as input a protocol specification written in Esterel. The specification is compiled into an integrated automaton by the first pass of the Esterel compiler. The automaton is then opti...

