Results 1 - 10
of
10
Online performance auditing: using hot optimizations without getting burned
- In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation
, 2006
"... As hardware complexity increases and virtualization is added at more layers of the execution stack, predicting the performance impact of optimizations becomes increasingly difficult. Production compilers and virtual machines invest substantial development effort in performance tuning to achieve good ..."
Abstract
-
Cited by 24 (2 self)
- Add to MetaCart
As hardware complexity increases and virtualization is added at more layers of the execution stack, predicting the performance impact of optimizations becomes increasingly difficult. Production compilers and virtual machines invest substantial development effort in performance tuning to achieve good performance for a range of benchmarks. Although optimizations typically perform well on average, they often have unpredictable impact on running time, sometimes degrading performance significantly. Today’s VMs perform sophisticated feedback-directed optimizations, but these techniques do not address performance degradations, and they actually make the situation worse by making the system more unpredictable. This paper presents an online framework for evaluating the effectiveness of optimizations, enabling an online system to automatically identify and correct performance anomalies that occur at runtime. This work opens the door for a fundamental shift in the way optimizations are developed and tuned for online systems, and may allow the body of work in offline empirical optimization search to be applied automatically at runtime. We present our implementation and evaluation of this system in a product Java VM.
ABSTRACT COLE: Compiler Optimization Level Exploration
"... Modern compilers implement a large number of optimizations which all interact in complex ways, and which all have a different impact on code quality, compilation time, code size, energy consumption, etc. For this reason, compilers typically provide a limited number of standard optimization levels, s ..."
Abstract
-
Cited by 13 (2 self)
- Add to MetaCart
Modern compilers implement a large number of optimizations which all interact in complex ways, and which all have a different impact on code quality, compilation time, code size, energy consumption, etc. For this reason, compilers typically provide a limited number of standard optimization levels, such as-O1,-O2,-O3 and-Os, that combine various optimizations providing a number of trade-offs between multiple objective functions (such as code quality, compilation time and code size). The construction of these optimization levels, i.e., choosing which optimizations to activate at each level, is a manual process typically done using high-level heuristics based on the compiler developer’s experience. This paper proposes COLE, Compiler Optimization Level Exploration, a framework for automatically finding Pareto optimal optimization levels through multi-objective evolutionary searching. Our experimental results using GCC and the SPEC CPU benchmarks show that the automatic construction of optimization levels is feasible in practice, and in addition, yields better optimization levels than GCC’s manually derived (-Os,-O1,-O2 and-O3) optimization levels, as well as the optimization levels obtained through random sampling. We also demonstrate that COLE can be used to gain insight into the effectiveness of compiler optimizations as well as to better understand a benchmark’s sensitivity to compiler optimizations.
Abstract YETI: a graduallY Extensible Trace Interpreter
, 2008
"... The implementation of new programming languages benefits from interpretation because it is simple, flexible and portable. The only downside is speed of execution, as there remains a large performance gap between even efficient interpreters and systems that include a just-in-time (JIT) compiler. Augm ..."
Abstract
-
Cited by 9 (0 self)
- Add to MetaCart
The implementation of new programming languages benefits from interpretation because it is simple, flexible and portable. The only downside is speed of execution, as there remains a large performance gap between even efficient interpreters and systems that include a just-in-time (JIT) compiler. Augmenting an interpreter with a JIT, however, is not a small task. Today, Java JITs are typically method-based. To compile whole methods, the JIT must re-implement much functionality already provided by the interpreter, leading to a “big bang ” development effort before the JIT can be deployed. Adding a JIT to an interpreter would be easier if we could more gradually shift from dispatching virtual instructions bodies implemented for the interpreter to running instructions compiled into native code by the JIT. We show that virtual instructions implemented as lightweight callable routines can form the basis for a very efficient interpreter. Our new technique, interpreted traces, identifies hot paths, or traces, as a virtual program is interpreted. By exploiting the way traces predict branch desti-nations our technique markedly reduces branch mispredictions caused by dispatch. Interpreted traces are a high-performance technique, running about 25 % faster than direct threading. We show that interpreted traces are a good starting point for a trace-based JIT. We extend
Java Performance Evaluation through Rigorous Replay Compilation
- In ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications
, 2008
"... A managed runtime environment, such as the Java virtual machine, is non-trivial to benchmark. Java performance is affected in various complex ways by the application and its input, as well as by the virtual machine (JIT optimizer, garbage collector, thread scheduler, etc.). In addition, nondetermini ..."
Abstract
-
Cited by 6 (0 self)
- Add to MetaCart
A managed runtime environment, such as the Java virtual machine, is non-trivial to benchmark. Java performance is affected in various complex ways by the application and its input, as well as by the virtual machine (JIT optimizer, garbage collector, thread scheduler, etc.). In addition, nondeterminism due to timer-based sampling for JIT optimization, thread scheduling, and various system effects further complicate the Java performance benchmarking process. Replay compilation is a recently introduced Java performance analysis methodology that aims at controlling nondeterminism to improve experimental repeatability. The key idea of replay compilation is to control the compilation load during experimentation by inducing a pre-recorded compilation plan at replay time. Replay compilation also enables teasing apart performance effects of the application versus the virtual machine. This paper argues that in contrast to current practice which uses a single compilation plan at replay time, multiple compilation plans add statistical rigor to the replay compilation methodology. By doing so, replay compilation better accounts for the variability observed in compilation load across compilation plans. In addition, we propose matchedpair comparison for statistical data analysis. Matched-pair comparison considers the performance measurements per compilation plan before and after an innovation of interest as a pair, which enables limiting the number of compilation plans needed for accurate performance analysis compared to statistical analysis assuming unpaired measurements.
A parallel dynamic compiler for cil bytecode
- SIGPLAN Not
, 2008
"... Multi-core technology is being employed in most recent high-performance architectures. Such architectures need specifically designed multi-threaded software to exploit all the potentialities of their hardware parallelism. At the same time, object code virtualization technologies are achieving a grow ..."
Abstract
-
Cited by 5 (1 self)
- Add to MetaCart
Multi-core technology is being employed in most recent high-performance architectures. Such architectures need specifically designed multi-threaded software to exploit all the potentialities of their hardware parallelism. At the same time, object code virtualization technologies are achieving a growing popularity, as they allow higher levels of software portability and reuse. Thus, a virtual execution environment running on a multi-core processor has to run complex, high-level applications and to exploit as much as possible the underlying parallel hardware. We propose an approach that leverages on CMP features to expose a novel pipeline synchronization model for the internal threads of the dynamic compiler. Thanks to compilation latency masking effect of the pipeline organization, our dynamic compiler, ILDJIT, is able to achieve significant speedups (26 % on average) with respect to the baseline, when the underlying hardware exposes at least two cores.
Dynamic Compilation: The Benefits of Early Investing
, 2007
"... Dynamic compilation is typically performed in a separate thread, asynchronously with the remaining application threads. This compilation thread is often scheduled for execution in a simple roundrobin fashion either by the operating system or by the virtual machine itself. Despite the popularity of t ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
Dynamic compilation is typically performed in a separate thread, asynchronously with the remaining application threads. This compilation thread is often scheduled for execution in a simple roundrobin fashion either by the operating system or by the virtual machine itself. Despite the popularity of this approach in production virtual machines, it has a number of shortcomings that can lead to suboptimal performance. This paper explores a number of issues surrounding asynchronous dynamic compilation in a virtual machine. We begin by describing the shortcomings of current approaches and demonstrate their potential to perform poorly under certain conditions. We describe the importance of enforcing a minimum level of utilization for the compilation thread, and evaluate the performance implications of varying the utilization that is enforced. We observed surprisingly large speedups by increasing the priority of the compilation thread, averaging 18.2 % improvement over a large benchmark suite. Finally, we discuss options for implementing these techniques in a VM and address relevant issues when moving from a single-processor to a multiprocessor machine.
Automated Just-In-Time Compiler Tuning
"... Managed runtime systems, such as a Java virtual machine (JVM), are complex pieces of software with many interacting components. The Just-In-Time (JIT) compiler is at the core of the virtual machine, however, tuning the compiler for optimum performance is a challenging task. There are (i) many compil ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
Managed runtime systems, such as a Java virtual machine (JVM), are complex pieces of software with many interacting components. The Just-In-Time (JIT) compiler is at the core of the virtual machine, however, tuning the compiler for optimum performance is a challenging task. There are (i) many compiler optimizations and options, (ii) there may be multiple optimization levels (e.g.,-O0,-O1,-O2), each with a specific optimization plan consisting of a collection of optimizations, (iii) the Adaptive Optimization System (AOS) that decides which method to optimize to which optimization level requires fine-tuning, and (iv) the effectiveness of the optimizations depends on the application as well as on the hardware platform. Current practice is to manually tune the JIT compiler which is both tedious and very time-consuming, and in addition may lead to suboptimal performance. This paper proposes automated tuning of the JIT compiler through multi-objective evolutionary search. The proposed framework (i) identifies optimization plans that are Pareto-optimal in terms of compilation time and code quality, (ii) assigns these plans to optimization levels, and (iii) fine-tunes the AOS accordingly. The key benefit of our framework is that it automates the entire exploration process, which enables tuning the JIT compiler for a given hardware platform and/or application at very low cost. By automatically tuning Jikes RVM using our framework for average performance across the DaCapo and SPECjvm98 benchmark suites, we achieve similar performance to the hand-tuned default Jikes RVM. When optimizing the JIT compiler for individual benchmarks, we achieve statistically significant speedups for most benchmarks, up to 40 % for start-up and up to 19 % for steady-state performance. We also show that tuning the JIT compiler for a new hardware platform can yield significantly better performance compared to using a JIT compiler that was tuned for another platform.
Using HPM-Sampling to Drive Dynamic Compilation
"... All high-performance production JVMs employ an adaptive strategy for program execution. Methods are first executed unoptimized and then an online profiling mechanism is used to find a subset of methods that should be optimized during the same execution. This paper empirically evaluates the design sp ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
All high-performance production JVMs employ an adaptive strategy for program execution. Methods are first executed unoptimized and then an online profiling mechanism is used to find a subset of methods that should be optimized during the same execution. This paper empirically evaluates the design space of several profilers for initiating dynamic compilation and shows that existing online profiling schemes suffer from several limitations. They provide an insufficient number of samples, are untimely, and have limited accuracy at determining the frequently executed methods. We describe and comprehensively evaluate HPM-sampling, a simple but effective profiling scheme for finding optimization candidates using hardware performance monitors (HPMs) that addresses the aforementioned limitations. We show that HPM-sampling is more accurate; has low overhead; and improves performance by 5.7 % on average and up to 18.3% when compared to the default system in Jikes RVM, without changing the compiler.
Language-Independent Sandboxing of Just-In-Time Compilation and Self-Modifying Code
"... When dealing with dynamic, untrusted content, such as on the Web, software behavior must be sandboxed, typically through use of a language like JavaScript. However, even for such speciallydesigned languages, it is difficult to ensure the safety of highlyoptimized, dynamic language runtimes which, fo ..."
Abstract
-
Cited by 2 (2 self)
- Add to MetaCart
When dealing with dynamic, untrusted content, such as on the Web, software behavior must be sandboxed, typically through use of a language like JavaScript. However, even for such speciallydesigned languages, it is difficult to ensure the safety of highlyoptimized, dynamic language runtimes which, for efficiency, rely on advanced techniques such as Just-In-Time (JIT) compilation, large libraries of native-code support routines, and intricate mechanisms for multi-threading and garbage collection. Each new runtime provides a new potential attack surface and this security risk raises a barrier to the adoption of new languages for creating untrusted content. Removing this limitation, this paper introduces general mechanisms for safely and efficiently sandboxing software, such as dynamic language runtimes, that make use of advanced, lowlevel
Drie valkuilen bij de prestatie-evaluatie van Javaprogramma's Three Pitfalls in Java Performance Evaluation
"... For Elias, firstborn, who gave profound meaning to existence For Nathan, who showed that there’s no such thing as too much of something good, For I would not be complete without you. Acknowledgements At the onset of this work, the road was unclear. The general direction was known, but there was no t ..."
Abstract
- Add to MetaCart
For Elias, firstborn, who gave profound meaning to existence For Nathan, who showed that there’s no such thing as too much of something good, For I would not be complete without you. Acknowledgements At the onset of this work, the road was unclear. The general direction was known, but there was no telling where we would end up. If it was, it would not be called research. To walk this path alone is not possible, or to quote Newton ‘If I have been able to see further than others, it is because I have stood on the shoulders of giants’. I am much indebted to my promotors, professor Koen De Bosschere and professor Lieven Eeckhout, for offering me the chance to start working at the ELIS department, and to encourage me to start this research. Thank you for your councel, help, understanding that not all things tried eventually work out, and to keep motivating me to run the race until the finish. Every doctoral dissertation has need of a jury (in no particular order): Prof.

