Shade: A Fast Instruction-Set Simulator for Execution Profiling
, 1994
Cited by 315 (2 self)
Tracing tools are used widely to help analyze, design, and tune both hardware and software systems. This paper describes a tool called Shade which combines efficient instruction-set simulation with a flexible, extensible trace generation capability. Efficiency is achieved by dynamically compiling and caching code to simulate and trace the application program. The user may control the extent of tracing in a variety of ways; arbitrarily detailed application state information may be collected during the simulation, but tracing less translates directly into greater efficiency. Current Shade implementations run on SPARC systems and simulate the SPARC (Versions 8 and 9) and MIPS I instruction sets. This paper describes the capabilities, design, implementation, and performance of Shade, and discusses instruction set emulation in general.
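The idea of dynamically compiling and caching simulation code can be illustrated with a minimal sketch. This is not Shade's implementation: the three-opcode toy instruction set, the `ToySimulator` class, and the use of Python closures in place of generated host code are all assumptions made for illustration. Each instruction is translated once, the translation is cached by program counter, and tracing code is only included in the translation when tracing is requested.

```python
from typing import Callable, Dict, List, Tuple

Instr = Tuple[str, int, int]  # (opcode, a, b) in a made-up toy ISA


class ToySimulator:
    """Translate each instruction to a closure once, then reuse it from a cache."""

    def __init__(self, program: List[Instr], trace: bool = False) -> None:
        self.program = program
        self.trace_enabled = trace
        self.regs = [0, 0, 0, 0]
        self.trace: List[int] = []                     # executed program counters
        self.cache: Dict[int, Callable[[], int]] = {}  # pc -> translated code
        self.translations = 0

    def _translate(self, pc: int) -> Callable[[], int]:
        """Build a closure that simulates the instruction at pc and returns the next pc."""
        op, a, b = self.program[pc]
        self.translations += 1
        if op == "addi":    # regs[a] += b
            def run() -> int:
                self.regs[a] += b
                return pc + 1
        elif op == "bnez":  # jump to b if regs[a] is non-zero
            def run() -> int:
                return b if self.regs[a] != 0 else pc + 1
        elif op == "halt":  # a negative pc stops the simulator
            def run() -> int:
                return -1
        else:
            raise ValueError(f"unknown opcode {op!r}")
        if not self.trace_enabled:
            return run      # no tracing work in the translated code

        def traced() -> int:
            self.trace.append(pc)
            return run()
        return traced

    def execute(self) -> None:
        pc = 0
        while pc >= 0:
            code = self.cache.get(pc)
            if code is None:  # first visit: translate and cache
                code = self.cache[pc] = self._translate(pc)
            pc = code()


# Hypothetical program: add 5 to r1 three times, using r0 as a loop counter.
PROGRAM: List[Instr] = [
    ("addi", 0, 3),
    ("addi", 1, 5),
    ("addi", 0, -1),
    ("bnez", 0, 1),
    ("halt", 0, 0),
]
```

Running `ToySimulator(PROGRAM, trace=True).execute()` simulates 11 instructions but performs only 5 translations, because the loop body is served from the cache after its first iteration. With `trace=False` the translated closures carry no tracing work at all, which mirrors the abstract's point that tracing less costs less.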
Adaptive Optimization for SELF: Reconciling High Performance with Exploratory Programming
, 1994
Cited by 95 (6 self)
Object-oriented programming languages confer many benefits, including abstraction, which lets the programmer hide
the details of an object’s implementation from the object’s clients. Unfortunately, crossing abstraction boundaries
often incurs a substantial run-time overhead in the form of frequent procedure calls. Thus, pervasive use of abstraction,
while desirable from a design standpoint, may be impractical when it leads to inefficient programs.
Aggressive compiler optimizations can reduce the overhead of abstraction. However, the long compilation times
introduced by optimizing compilers delay the programming environment’s responses to changes in the program.
Furthermore, optimization also conflicts with source-level debugging. Thus, programmers are caught on the horns of
two dilemmas: they have to choose between abstraction and efficiency, and between responsive programming environments
and efficiency. This dissertation shows how to reconcile these seemingly contradictory goals by performing
optimizations lazily.
Four new techniques work together to achieve high performance and high responsiveness:
• Type feedback achieves high performance by allowing the compiler to inline message sends based on information
extracted from the runtime system. On average, programs run 1.5 times faster than the previous SELF system;
compared to a commercial Smalltalk implementation, two medium-sized benchmarks run about three times faster.
This level of performance is obtained with a compiler that is both simpler and faster than previous SELF compilers.
• Adaptive optimization achieves high responsiveness without sacrificing performance by using a fast nonoptimizing
compiler to generate initial code while automatically recompiling heavily used parts of the program
with an optimizing compiler. On a previous-generation workstation like the SPARCstation-2, fewer than 200
pauses exceeded 200 ms during a 50-minute interaction, and 21 pauses exceeded one second. On a current-generation
workstation, only 13 pauses exceed 400 ms.
• Dynamic deoptimization shields the programmer from the complexity of debugging optimized code by
transparently recreating non-optimized code as needed. No matter whether a program is optimized or not, it can
always be stopped, inspected, and single-stepped. Compared to previous approaches, deoptimization allows more
debugging while placing fewer restrictions on the optimizations that can be performed.
• Polymorphic inline caching generates type-case sequences on-the-fly to speed up messages sent from the same
call site to several different types of object. More significantly, polymorphic inline caches collect concrete type information for the
optimizing compiler.
With better performance yet good interactive behavior, these techniques make exploratory programming possible
both for pure object-oriented languages and for application domains requiring higher ultimate performance, reconciling
exploratory programming, ubiquitous abstraction, and high performance.
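As a rough illustration of polymorphic inline caching, the sketch below models one call site that remembers the receiver types it has seen and the method each resolved to. It is not the SELF system's mechanism: a real implementation patches generated machine code, whereas here the `CallSite` class, its `max_entries` limit, and the `Square` and `Rect` classes are hypothetical stand-ins.

```python
class CallSite:
    """One call site with a small per-site cache of (receiver type, method) pairs."""

    def __init__(self, selector, max_entries=4):
        self.selector = selector
        self.max_entries = max_entries  # assumed cap on cached cases
        self.entries = []               # the "type-case sequence", grown on misses
        self.misses = 0

    def send(self, receiver, *args):
        rtype = type(receiver)
        # Fast path: test the receiver type against each cached case in order.
        for cached_type, method in self.entries:
            if cached_type is rtype:
                return method(receiver, *args)
        # Slow path: full lookup, then extend the cache for next time.
        self.misses += 1
        method = getattr(rtype, self.selector)
        if len(self.entries) < self.max_entries:
            self.entries.append((rtype, method))
        return method(receiver, *args)

    def observed_types(self):
        """Concrete receiver types seen here, usable as input to an optimizer."""
        return [cached_type for cached_type, _ in self.entries]


class Square:
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side


class Rect:
    def __init__(self, width, height):
        self.width, self.height = width, height

    def area(self):
        return self.width * self.height
```

Sending `area` through one `CallSite` to a mix of `Square` and `Rect` receivers takes the slow path once per type. Afterwards `observed_types()` returns the two classes, which is the kind of concrete type information the abstract says is passed on to the optimizing compiler.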
Reconciling responsiveness with performance in pure object-oriented languages
- ACM Transactions on Programming Languages and Systems
, 1996
Cited by 55 (0 self)
Dynamically-dispatched calls often limit the performance of object-oriented programs since object-oriented programming encourages factoring code into small, reusable units, thereby increasing the frequency of these expensive operations. Frequent calls not only slow down execution with the dispatch overhead per se, but more importantly they hinder optimization by limiting the range and effectiveness of standard global optimizations. In particular, dynamically-dispatched calls prevent standard interprocedural optimizations that depend on the availability of a static call graph. The SELF implementation described here offers two novel approaches to optimization. Type feedback speculatively inlines dynamically-dispatched calls based on profile information that predicts likely receiver classes. Adaptive optimization reconciles optimizing compilation with interactive performance by incrementally optimizing only the frequently-executed parts of a program. When combined, these two techniques result in a system that can execute programs significantly faster than previous systems while retaining much of the interactiveness of an interpreted system.
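The adaptive optimization idea, starting with cheap code and recompiling only what runs often, can be sketched with an invocation counter. This is a simplified model under stated assumptions, not the SELF system's policy: the `AdaptiveFunction` wrapper, the fixed `threshold`, and the two hand-written versions of `sum_to` (standing in for unoptimized and optimized compiled code) are all hypothetical.

```python
class AdaptiveFunction:
    """Run baseline code first; switch to an optimized version once the function is hot."""

    def __init__(self, baseline, optimize, threshold=3):
        self.code = baseline        # currently installed version
        self.optimize = optimize    # callable that produces the optimized version
        self.threshold = threshold  # assumed fixed trigger; real systems use richer heuristics
        self.calls = 0
        self.optimized = False

    def __call__(self, *args):
        self.calls += 1
        if not self.optimized and self.calls >= self.threshold:
            # Pay the "recompilation" cost only for frequently executed code.
            self.code = self.optimize()
            self.optimized = True
        return self.code(*args)


def sum_to_baseline(n):
    """Stand-in for quickly generated, unoptimized code."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def sum_to_optimized(n):
    """Stand-in for the optimizing compiler's output (closed form)."""
    return n * (n + 1) // 2


sum_to = AdaptiveFunction(sum_to_baseline, lambda: sum_to_optimized, threshold=3)
```

The first two calls to `sum_to` run the baseline version and the third call installs the optimized one. Functions that are called only once or twice never incur the optimization cost, which is how this style of system keeps pauses short.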
A brief history of just-in-time
- ACM Computing Surveys
, 2003
Cited by 42 (1 self)
Software systems have been using “just-in-time” compilation (JIT) techniques since the 1960s. Broadly, JIT compilation includes any translation performed dynamically, after a program has started execution. We examine the motivation behind JIT compilation and constraints imposed on JIT compilation systems, and present a classification scheme for
Type feedback vs. concrete type inference: A comparison of optimization techniques for object-oriented languages
- In Proceedings of the 1995 ACM Conference on Object Oriented Programming Systems, Languages, and Applications
, 1995
Cited by 29 (2 self)
Two promising optimization techniques for object-oriented languages are type feedback (profile-based receiver class prediction) and concrete type inference (static analysis). We directly compare the two techniques, evaluating their effectiveness on a suite of 23 SELF programs while keeping other factors constant. Our results show that both systems inline over 95% of all sends and deliver similar overall performance with one exception: SELF’s automatic coercion of machine integers to arbitrary-precision integers upon overflow confounds type inference and slows down arithmetic-intensive benchmarks. We discuss several other issues which, given the comparable run-time performance, may influence the
Region Formation Analysis with Demand-driven Inlining for Region-based Optimization
- In Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
, 2000
Cited by 16 (1 self)
Region-based compilation repartitions a program into more desirable compilation units for optimization and scheduling, particularly beneficial for ILP architectures. With region-based compilation, the compiler can control problem size and complexity by controlling region size and contents, expose interprocedural scheduling and optimization opportunities without interprocedural analysis or large function bodies, and create compilation units for program analysis that more accurately reflect the dynamic behavior of the program. This paper presents a region formation algorithm that eliminates the high compile-time memory costs due to an aggressive inlining prepass. Individual subregions are inlined in a demand-driven way during interprocedural region formation. Our experimental results on a subset of the SPEC benchmarks demonstrate a significant reduction in compile-time memory requirements with comparable runtime performance.
Object, Message, and Performance: How they coexist in SELF
Cited by 8 (0 self)
In this paper, we will present the novel implementation techniques that recapture much of the efficiency that would seem to be lost in a pure object-oriented language. For many of the benchmarks we have measured, these techniques have provided a fivefold speedup, enabling SELF programs to come within a factor of two or three of optimized C.
Dynamic vs. static optimization techniques for object-oriented languages
- Theor. Pract. Object Syst.
, 1996
Cited by 7 (1 self)
Object-oriented programs can be optimized either dynamically, i.e., based on run-time information, or statically, i.e., based on program analysis alone. Two promising optimization techniques for object-oriented languages are type feedback (dynamic) and concrete type inference (static). We directly compare the two techniques, evaluating their effectiveness on a suite of 23 SELF programs while keeping other factors constant. Our results show that both systems inline >95% of all sends and deliver similar overall performance with one exception: SELF’s automatic coercion of machine integers to arbitrary-precision integers upon overflow confounds type inference and slows down arithmetic-intensive benchmarks. We also show that a system combining the two optimizations can combine their strengths and outperform each individual optimization. We discuss several other issues which, given the comparable run-time performance, may influence the choice between type feedback and type inference.
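The integer-overflow exception mentioned above can be made concrete with a small model. This is only an illustrative sketch: the `SmallInt` and `BigInt` classes, the `add` function, and the assumed 32-bit machine-integer range are hypothetical and do not reproduce SELF's actual integer representation.

```python
SMALL_MIN, SMALL_MAX = -(2 ** 31), 2 ** 31 - 1  # assumed machine-integer range


class SmallInt:
    """Stand-in for a fixed-width machine integer."""

    def __init__(self, value):
        if not SMALL_MIN <= value <= SMALL_MAX:
            raise OverflowError("value does not fit in a machine integer")
        self.value = value


class BigInt:
    """Stand-in for an arbitrary-precision integer."""

    def __init__(self, value):
        self.value = value


def add(a, b):
    """Add two integers, coercing to BigInt when the machine range is exceeded."""
    total = a.value + b.value
    both_small = isinstance(a, SmallInt) and isinstance(b, SmallInt)
    if both_small and SMALL_MIN <= total <= SMALL_MAX:
        return SmallInt(total)
    return BigInt(total)  # overflow, or an operand was already a BigInt
```

Because `add` can return either class, an analysis that reasons only from the program text has to treat every arithmetic result as possibly a `BigInt`, even in programs where overflow never occurs at run time. A profile-based approach observes that only `SmallInt` actually shows up, which is one way to read the difference the abstract reports for arithmetic-intensive benchmarks.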
Using Path Spectra to Direct Function Cloning
Cited by 5 (2 self)
While function cloning can improve the precision of interprocedural analysis and thus the opportunity for optimization by changing the structure of the call graph, its successful application relies on the cloning decisions. This paper explores the use of program spectra comparisons for guiding cloning decisions. Our hypothesis is that this approach provides a good heuristic for determining which calls contribute different dynamic interprocedural information and thus suggest good candidates for cloning for the purpose of improving optimization.
Scalable Procedure Restructuring for Ambitious Optimization
, 2000
Cited by 5 (0 self)
Compiler optimization of computer programs is necessary to exploit the features of the target architecture while masking the details of the architecture from the programmer. The continuing trends toward instruction-level parallel computers and larger programs mean that scalable optimization techniques which increase available parallelism simultaneously with controlling compilation time and memory usage are required. Well-known solutions include using procedure inlining and cloning to increase the instruction-level parallelism and specificity of analysis in a program, and a region-based compilation framework to improve code quality while bounding some optimization costs. However, these techniques are inherently unscalable because all can lead to excessive compile-time memory usage and code growth. My hypothesis is that procedure boundaries and calling relationships can be restructured to improve optimization opportunities in a scalable way. In particular, I propose to investigate compil...

