Results 1 - 10
of
21
Interprocedural query extraction for transparent persistence
- In Proc. of ACM Conf. on Object-Oriented Programming, Systems, Languages and Applications (OOPSLA
, 2008
"... Transparent persistence promises to integrate programming languages and databases by allowing procedural programs to access persistent data with the same ease as non-persistent data. Transparent persistence is more likely to be adopted if it leverages the performance and transaction management of re ..."
Abstract
-
Cited by 11 (3 self)
- Add to MetaCart
Transparent persistence promises to integrate programming languages and databases by allowing procedural programs to access persistent data with the same ease as non-persistent data. Transparent persistence is more likely to be adopted if it leverages the performance and transaction management of relational databases. Since creating good relational queries from procedural programs is hard, most practical systems compromise transparency to achieve performance. In this work we demonstrate the practical feasibility of a technique for extracting relational queries from object-oriented programs. A program analysis derives query structure and conditions across methods that access persistent data. The system combines static analysis and runtime query composition to handle procedures that return persistent values. Our prototype Java compiler implements the analysis, and handles recursion and parameterized queries. We evaluate the effectiveness of the optimization on the 007 and TORPEDO benchmarks, showing that automatic optimizations are in some cases as efficient as hand-tuned code. 1.
Looking back on the language and hardware revolution: Measured power performance, and scaling
- In 16th International Conference on Architectural Support for Programming Languages and Operating Systems
, 2011
"... This paper reports and analyzes measured chip power and performance on five process technology generations executing 61 diverse benchmarks with a rigorous methodology. We measure representative Intel IA32 processors with technologies ranging from 130nm to 32nm while they execute sequential and paral ..."
Abstract
-
Cited by 8 (5 self)
- Add to MetaCart
This paper reports and analyzes measured chip power and performance on five process technology generations executing 61 diverse benchmarks with a rigorous methodology. We measure representative Intel IA32 processors with technologies ranging from 130nm to 32nm while they execute sequential and parallel benchmarks written in native and managed languages. During this period, hardware and software changed substantially: (1) hardware vendors delivered chip multiprocessors instead of uniprocessors, and independently (2) software developers increasingly chose managed languages instead of native languages. This quantitative data reveals the extent of some known and previously unobserved hardware and software trends. Two themes emerge. (I) Workload: The power, performance, and energy trends of native workloads do not approximate managed workloads. For example, (a) the SPEC CPU2006 native benchmarks on the i7 (45) and i5 (32) draw significantly less power than managed or scalable native benchmarks; and (b) managed runtimes exploit parallelism even when running single-threaded applications. The results recommend architects always include native and managed workloads when designing and evaluating energy efficient hardware. (II) Architecture: Clock scaling, microarchitecture, simultaneous multithreading, and chip multiprocessors each elicit a huge variety of power, performance, and energy responses. This variety and the difficulty of obtaining power measurements recommends exposing on-chip power meters and when possible structure specific power meters for cores, caches, and other structures. Just as hardware event counters provide a quantitative grounding for performance innovations, power meters are necessary for optimizing energy.
Java Performance Evaluation through Rigorous Replay Compilation
- In ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications
, 2008
"... A managed runtime environment, such as the Java virtual machine, is non-trivial to benchmark. Java performance is affected in various complex ways by the application and its input, as well as by the virtual machine (JIT optimizer, garbage collector, thread scheduler, etc.). In addition, nondetermini ..."
Abstract
-
Cited by 6 (0 self)
- Add to MetaCart
A managed runtime environment, such as the Java virtual machine, is non-trivial to benchmark. Java performance is affected in various complex ways by the application and its input, as well as by the virtual machine (JIT optimizer, garbage collector, thread scheduler, etc.). In addition, nondeterminism due to timer-based sampling for JIT optimization, thread scheduling, and various system effects further complicate the Java performance benchmarking process. Replay compilation is a recently introduced Java performance analysis methodology that aims at controlling nondeterminism to improve experimental repeatability. The key idea of replay compilation is to control the compilation load during experimentation by inducing a pre-recorded compilation plan at replay time. Replay compilation also enables teasing apart performance effects of the application versus the virtual machine. This paper argues that in contrast to current practice which uses a single compilation plan at replay time, multiple compilation plans add statistical rigor to the replay compilation methodology. By doing so, replay compilation better accounts for the variability observed in compilation load across compilation plans. In addition, we propose matchedpair comparison for statistical data analysis. Matched-pair comparison considers the performance measurements per compilation plan before and after an innovation of interest as a pair, which enables limiting the number of compilation plans needed for accurate performance analysis compared to statistical analysis assuming unpaired measurements.
Transparent Reputation Management for Composite Web Services
"... The dependability of composite services is largely affected by their constituent Web services. Composite services have to operate in an open and dynamically changing environment in order to leverage the best performing services available at the moment. Hence, there is the need for an efficient mecha ..."
Abstract
-
Cited by 5 (2 self)
- Add to MetaCart
The dependability of composite services is largely affected by their constituent Web services. Composite services have to operate in an open and dynamically changing environment in order to leverage the best performing services available at the moment. Hence, there is the need for an efficient mechanism to provide reliable service rankings. In this paper we present a novel, generic, and customizable reputation infrastructure to automatically and transparently monitor the execution of composite services, taking both functional and non-functional properties into account. The experienced Web service Quality-of-Service is communicated to a configurable reputation mechanism that publishes service rankings. Our reputation infrastructure supports notifications upon changes in service reputation, enabling self-tuning and self-healing properties in the execution of composite services. We implemented our architecture using standard technologies, such as BPEL and JavaEE. Performance measurements show that our infrastructure causes only moderate overhead.
Influence of program inputs on the selection of garbage collectors
- In Proceedings of the International Conference on Virtual Execution Environments (VEE
, 2009
"... Many studies have shown that the best performer among a set of garbage collectors tends to be different for different applications. Researchers have proposed application-specific selection of garbage collectors. In this work, we concentrate on a second dimension of the problem: the influence of prog ..."
Abstract
-
Cited by 4 (2 self)
- Add to MetaCart
Many studies have shown that the best performer among a set of garbage collectors tends to be different for different applications. Researchers have proposed application-specific selection of garbage collectors. In this work, we concentrate on a second dimension of the problem: the influence of program inputs on the selection of garbage collectors. We collect tens to hundreds of inputs for a set of Java benchmarks, and measure their performance on Jikes RVM with different heap sizes and garbage collectors. A rigorous statistical analysis produces four-fold insights. First, inputs influence the relative performance of garbage collectors significantly, causing large variations to the top set of garbage collectors across inputs. Profiling one or few runs is thus inadequate for selecting the garbage collector that works well for most inputs. Second, when the heap size ratio is fixed, one or two types of garbage collectors are enough to stimulate the top performance of the program on all inputs. Third, for some programs, the heap size ratio significantly affects the relative performance of different types of garbage collectors. For the selection of garbage collectors on those programs, it is necessary to have a cross-input predictive model that predicts the minimum possible heap size of the execution on an arbitrary input. Finally, based on regression techniques, we demonstrate the predictability of the minimum possible heap size, indicating the potential feasibility of the input-specific selection of garbage collectors.
Automated Just-In-Time Compiler Tuning
"... Managed runtime systems, such as a Java virtual machine (JVM), are complex pieces of software with many interacting components. The Just-In-Time (JIT) compiler is at the core of the virtual machine, however, tuning the compiler for optimum performance is a challenging task. There are (i) many compil ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
Managed runtime systems, such as a Java virtual machine (JVM), are complex pieces of software with many interacting components. The Just-In-Time (JIT) compiler is at the core of the virtual machine, however, tuning the compiler for optimum performance is a challenging task. There are (i) many compiler optimizations and options, (ii) there may be multiple optimization levels (e.g.,-O0,-O1,-O2), each with a specific optimization plan consisting of a collection of optimizations, (iii) the Adaptive Optimization System (AOS) that decides which method to optimize to which optimization level requires fine-tuning, and (iv) the effectiveness of the optimizations depends on the application as well as on the hardware platform. Current practice is to manually tune the JIT compiler which is both tedious and very time-consuming, and in addition may lead to suboptimal performance. This paper proposes automated tuning of the JIT compiler through multi-objective evolutionary search. The proposed framework (i) identifies optimization plans that are Pareto-optimal in terms of compilation time and code quality, (ii) assigns these plans to optimization levels, and (iii) fine-tunes the AOS accordingly. The key benefit of our framework is that it automates the entire exploration process, which enables tuning the JIT compiler for a given hardware platform and/or application at very low cost. By automatically tuning Jikes RVM using our framework for average performance across the DaCapo and SPECjvm98 benchmark suites, we achieve similar performance to the hand-tuned default Jikes RVM. When optimizing the JIT compiler for individual benchmarks, we achieve statistically significant speedups for most benchmarks, up to 40 % for start-up and up to 19 % for steady-state performance. We also show that tuning the JIT compiler for a new hardware platform can yield significantly better performance compared to using a JIT compiler that was tuned for another platform.
Using HPM-Sampling to Drive Dynamic Compilation
"... All high-performance production JVMs employ an adaptive strategy for program execution. Methods are first executed unoptimized and then an online profiling mechanism is used to find a subset of methods that should be optimized during the same execution. This paper empirically evaluates the design sp ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
All high-performance production JVMs employ an adaptive strategy for program execution. Methods are first executed unoptimized and then an online profiling mechanism is used to find a subset of methods that should be optimized during the same execution. This paper empirically evaluates the design space of several profilers for initiating dynamic compilation and shows that existing online profiling schemes suffer from several limitations. They provide an insufficient number of samples, are untimely, and have limited accuracy at determining the frequently executed methods. We describe and comprehensively evaluate HPM-sampling, a simple but effective profiling scheme for finding optimization candidates using hardware performance monitors (HPMs) that addresses the aforementioned limitations. We show that HPM-sampling is more accurate; has low overhead; and improves performance by 5.7 % on average and up to 18.3% when compared to the default system in Jikes RVM, without changing the compiler.
Evaluating the Accuracy of Java Profilers
"... Performance analysts profile their programs to find methods that are worth optimizing: the “hot ” methods. This paper shows that four commonly-used Java profilers (xprof, hprof, jprofile, and yourkit) often disagree on the identity of the hot methods. If two profilers disagree, at least one must be ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
Performance analysts profile their programs to find methods that are worth optimizing: the “hot ” methods. This paper shows that four commonly-used Java profilers (xprof, hprof, jprofile, and yourkit) often disagree on the identity of the hot methods. If two profilers disagree, at least one must be incorrect. Thus, there is a good chance that a profiler will mislead a performance analyst into wasting time optimizing a cold method with little or no performance improvement. This paper uses causality analysis to evaluate profilers and to gain insight into the source of their incorrectness. It shows that these profilers all violate a fundamental requirement for samplingbased profilers: to be correct, a sampling-based profiler must collect samples randomly. We show that a proof-of-concept profiler, which collects samples randomly, does not suffer from the above problems. Specifically, we show, using a number of case studies, that our profiler correctly identifies methods that are important to optimize; in some cases other profilers report that these methods are cold and thus not worth optimizing. C.4 [Measurement tech-
Predicting Performance via Automated Feature-Interaction Detection
"... Abstract—Customizable programs and program families provide user-selectable features to allow users to tailor a program to an application scenario. Knowing in advance which feature selection yields the best performance is difficult because a direct measurement of all possible feature combinations is ..."
Abstract
-
Cited by 2 (2 self)
- Add to MetaCart
Abstract—Customizable programs and program families provide user-selectable features to allow users to tailor a program to an application scenario. Knowing in advance which feature selection yields the best performance is difficult because a direct measurement of all possible feature combinations is infeasible. Our work aims at predicting program performance based on selected features. However, when features interact, accurate predictions are challenging. An interaction occurs when a particular feature combination has an unexpected influence on performance. We present a method that automatically detects performance-relevant feature interactions to improve prediction accuracy. To this end, we propose three heuristics to reduce the number of measurements required to detect interactions. Our evaluation consists of six real-world case studies from varying domains (e.g., databases, encoding libraries, and web servers) using different configuration techniques (e.g., configuration files and preprocessor flags). Results show an average prediction accuracy of 95 %. I.
Power and Performance of Native and Java Benchmarks on 130nm to 32nm Process Technologies
"... Over the past decade, chip fabrication technology shrank from 130nm to 32nm. This reduction was generally considered to provide performance improvements together with chip power reductions. This paper examines how well process technology and microarchitecture delivered on this assumption. This paper ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
Over the past decade, chip fabrication technology shrank from 130nm to 32nm. This reduction was generally considered to provide performance improvements together with chip power reductions. This paper examines how well process technology and microarchitecture delivered on this assumption. This paper evaluates power and performance of native and Java workloads across a selection of IA32 processors from five technology generations (130nm, 90nm, 65nm, 45nm, and 32nm). We use a Hall effect sensor to accurately measure chip power. This paper reports a range findings in three areas. 1) Methodology: TDP is unsurprisingly a poor predictor of application power consumption for a particular processor, but worse, TDP is a poor predictor of relative power consumption between processors. 2) Power-performance trends: Processors appear to have already hit the power wall at 45nm. 3) Native versus Java workloads and their relationship to processor technology: Single threaded Java workloads exploit multiple cores. These results indicate that Java workloads offer different opportunities and challenges compared to native workloads. Our findings challenge prevalent methodologies and offer new insight into how microarchitectures have traded power and performance as process technology shrank. 1.

