Comprehensive Profiling Support in the Java Virtual Machine, 1999
Cited by 11 (0 self)
Existing profilers for Java applications typically rely on custom instrumentation in the Java virtual machine, and measure only limited types of resource consumption. Garbage collection and multi-threading pose additional challenges to profiler design and implementation. In this paper we discuss a general-purpose, portable, and extensible approach for obtaining comprehensive profiling information from the Java virtual machine. Profilers based on this framework can uncover CPU usage hot spots, heavy memory allocation sites, unnecessary object retention, contended monitors, and thread deadlocks. In addition, we discuss a novel algorithm for thread-aware statistical CPU time profiling, a heap profiling technique independent of the garbage collection implementation, and support for interactive profiling with minimum overhead.
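Thread-aware statistical CPU time profiling can be illustrated with a minimal sketch. It assumes that at each sampling tick the profiler can read every thread's current stack and its cumulative CPU time. A tick is charged to a thread's stack only if that thread consumed CPU time since the previous tick, so threads blocked on I/O or on contended monitors do not show up as hot spots. `ThreadSample` and `ThreadAwareSampler` are hypothetical names, and this is one possible realization of the idea, not necessarily the paper's algorithm.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


# Hypothetical sample record; not an API from the paper or the JVM.
@dataclass
class ThreadSample:
    thread_id: int
    cpu_time_ns: int          # cumulative CPU time consumed by the thread so far
    stack: tuple[str, ...]    # call stack, outermost frame first


class ThreadAwareSampler:
    """Charges a tick to a stack only if its thread actually ran since the last tick."""

    def __init__(self) -> None:
        self.last_cpu: dict[int, int] = {}
        self.hits: Counter[tuple[str, ...]] = Counter()

    def record(self, samples: list[ThreadSample]) -> None:
        for s in samples:
            prev = self.last_cpu.get(s.thread_id)
            self.last_cpu[s.thread_id] = s.cpu_time_ns
            # The first observation of a thread only establishes a baseline;
            # afterwards, a thread whose CPU time did not advance was idle or
            # blocked, so its stack is not charged.
            if prev is not None and s.cpu_time_ns > prev:
                self.hits[s.stack] += 1
```

A plain wall-clock sampler would charge every thread at every tick; comparing CPU-time deltas is what makes the sampling thread-aware in this sketch.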
Differential Profiling
In MASCOTS'95, 2002
Cited by 4 (2 self)
Performance can be a critical aspect of software quality; in some systems, poor performance can cause financial loss, physical damage, or even death. In such cases, it is imperative to identify system performance problems before deployment, preferably well before implementation. Unfortunately, ...
Finding Bottlenecks In Large Scale Parallel Programs, 1994
Cited by 4 (0 self)
This thesis addresses the problem of locating the source of performance bottlenecks in large-scale parallel and distributed applications. Performance monitoring creates a dilemma: identifying a bottleneck necessitates collecting detailed information, yet collecting all this data can introduce serious data collection bottlenecks. At the same time, users are being inundated with volumes of complex graphs and tables that require a performance expert to interpret. I have developed a new approach that addresses both of these problems by combining dynamic on-the-fly selection of what performance data to collect with decision support to assist users with the selection and presentation of performance data. The approach is called the W³ Search Model. To make it possible to implement the W³ Search Model, I have developed a new monitoring technique for parallel programs called Dynamic Instrumentation. The premise of my work is that not only is it possible to do on-line performance debu...
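Dynamic on-the-fly selection of what data to collect can be sketched as a top-down search over a resource hierarchy: a resource is measured only if its parent was found to exceed a bottleneck threshold, so most of the program is never instrumented. This is a simplified, hypothetical illustration. `Resource`, `search_bottlenecks`, and the threshold test are assumptions, "measuring" a node stands in for inserting instrumentation at run time, and only refinement along a single hierarchy is shown; it is not the W³ Search Model itself.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Resource:
    """Hypothetical node in a program resource hierarchy (program, module, function)."""
    name: str
    children: list[Resource] = field(default_factory=list)


def search_bottlenecks(
    root: Resource,
    measure: Callable[[Resource], float],
    threshold: float,
) -> tuple[list[str], int]:
    """Return the most specific resources whose metric meets `threshold`,
    plus how many resources had to be measured to find them."""
    found: list[str] = []
    measured = 0

    def refine(node: Resource) -> bool:
        nonlocal measured
        measured += 1  # stands in for enabling instrumentation on `node`
        if measure(node) < threshold:
            return False  # hypothesis rejected: the subtree is never instrumented
        child_hits = [refine(child) for child in node.children]
        if not any(child_hits):
            found.append(node.name)  # no child explains the problem more precisely
        return True

    refine(root)
    return found, measured
```

Because rejected subtrees are pruned, the amount of instrumentation tends to track the number of suspected bottlenecks rather than the size of the program.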
Profiling I/O Interrupts in Modern Architectures
Proc. 8th Int’l Symp. Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS 2000), IEEE CS Press, Los Alamitos, Calif., 2000
Cited by 3 (1 self)
As applications grow increasingly communication-oriented, interrupt performance quickly becomes a crucial component of high-performance I/O system design. At the same time, accurately measuring interrupt handler performance is difficult with the traditional simulation, instrumentation, or statistical sampling approaches. One of the most important components of interrupt performance is cache behavior. This paper presents a portable method for measuring the cache effects of I/O interrupt handling using native hardware performance counters. To provide a portability stress test, the method is demonstrated on two commercial platforms with different architectures, the SGI Origin 200 and the Sun Ultra-1. This case study uses the methodology to measure the overhead of the two most common forms of interrupt traffic: disk and network interrupts. The study demonstrates that the method works well and is reasonably robust. In addition, the results show that disk interrupts behave similarly on both platforms, while differences in OS organization cause network interrupts to behave very differently. Furthermore, network interrupts exhibit significantly larger cache footprints.
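One simple way to turn raw hardware counter readings into a per-interrupt cost is to difference a run with interrupt traffic against a quiet baseline running the same workload, then divide by the number of additional interrupts. The difference captures both the handler's own misses and the misses the interrupted code later takes on lines the handler evicted. The sketch below assumes exactly that setup; `CounterRun` and `per_interrupt_overhead` are hypothetical names, and the paper's actual methodology may differ.

```python
from dataclasses import dataclass


# Hypothetical summary of one measured run (counter totals read before/after).
@dataclass
class CounterRun:
    cache_misses: int  # total cache misses reported by a hardware counter
    cycles: int        # total elapsed cycles
    interrupts: int    # interrupts delivered during the run


def per_interrupt_overhead(loaded: CounterRun, baseline: CounterRun) -> tuple[float, float]:
    """Return (extra misses, extra cycles) per interrupt, attributing the whole
    difference between the loaded run and the quiet baseline to interrupts.
    Assumes both runs execute the same application workload."""
    n = loaded.interrupts - baseline.interrupts
    if n <= 0:
        raise ValueError("loaded run must see more interrupts than the baseline")
    return (
        (loaded.cache_misses - baseline.cache_misses) / n,
        (loaded.cycles - baseline.cycles) / n,
    )
```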
Paul E. McKenney
In MASCOTS'95, 2002
Cited by 1 (0 self)
Performance can be a critical aspect of software quality; in some systems, poor performance can cause financial loss, physical damage, or even death. In such cases, it is imperative to identify system performance problems before deployment, preferably well before implementation.
A Unified, Low-overhead Framework to Support Continuous Profiling and Optimization
We propose a unified, low-overhead framework (ULF) to support continuous system profiling and optimization based on a specially designed embedded board. Instead of building a new profiling tool from scratch, ULF provides a unified interface to integrate various existing profiling tools and optimizers, and makes it easy to build future tools. ULF uses an embedded processor to offload the post-processing of profiling data, which reduces the system overhead caused by profiling tools and makes ULF especially suitable for continuous profiling on production systems. By processing the profiling data in parallel and providing prompt feedback, ULF supports on-line optimization. Our case study on I/O profiling demonstrated that a ULF-enhanced profiling tool dramatically reduces overhead, making continuous profiling on production systems feasible. Key words: Embedded system, continuous profiling, on-line optimization, performance evaluation
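The offloading idea can be illustrated within a single process: the profiled side only enqueues raw records, while a separate worker does the aggregation. Here a Python thread stands in for ULF's embedded processor, an assumption made purely for illustration; in ULF the post-processing runs on a separate processor, so the host's share of the work shrinks to handing records off. `OffloadedAggregator` and its methods are hypothetical names.

```python
import queue
import threading
from collections import Counter


class OffloadedAggregator:
    """Hypothetical sketch: the host enqueues raw profiling records and a
    separate worker (standing in for the embedded board) post-processes them."""

    _STOP = object()  # sentinel telling the worker to finish

    def __init__(self) -> None:
        self._q: queue.Queue = queue.Queue()
        self.totals: Counter = Counter()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, event: str, nbytes: int) -> None:
        # Cheap on the profiled side: no parsing or aggregation happens here.
        self._q.put((event, nbytes))

    def _run(self) -> None:
        while True:
            item = self._q.get()
            if item is self._STOP:
                break
            event, nbytes = item
            self.totals[event] += nbytes  # the "post-processing" step

    def close(self) -> Counter:
        """Flush outstanding records and return the aggregated totals."""
        self._q.put(self._STOP)
        self._worker.join()
        return self.totals
```

For example, submitting I/O records such as `("read", 4096)` and calling `close()` yields per-event byte totals computed entirely on the worker.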
Stack Analysis of x86 Executables
Binary rewriting is becoming increasingly popular for a variety of low-level code manipulation purposes. One of the difficulties encountered in this context is that machine-language programs typically have much less semantic information compared to source code, which makes it harder to reason about the program’s runtime behavior. This problem is especially acute in the widely used Intel x86 architecture, where the paucity of registers often makes it necessary to store values on the runtime stack. The use of memory in this manner affects many analyses and optimizations because of the possibility of indirect memory references, which are difficult to reason about. This paper describes a simple analysis of some basic aspects of the way in which programs manipulate the runtime stack. The information so obtained can be very helpful in enhancing and improving a variety of other dataflow analyses that reason about and manipulate values stored on the runtime stack. Experiments indicate that the analyses are efficient and useful for improving optimizations that need to reason about the runtime stack.
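A basic aspect of stack manipulation that such an analysis can recover is the offset of the stack pointer from its value at function entry. The sketch below handles only straight-line code in Intel operand order with 32-bit pushes and pops, and gives up (returns `None`) on any other write to `esp`, such as an alignment `and`; the control-flow merging a real analysis needs is omitted. The tuple encoding of instructions and the name `stack_offsets` are hypothetical, and this is not the paper's analysis.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical instruction encoding: (mnemonic, dest, src), Intel operand order.
Instr = tuple[str, ...]


def stack_offsets(code: list[Instr]) -> list[Optional[int]]:
    """Offset of esp from its value at function entry, taken *before* each
    instruction; None once the offset can no longer be determined."""
    offsets: list[Optional[int]] = []
    off: Optional[int] = 0
    for ins in code:
        offsets.append(off)
        if off is None:
            continue  # once unknown, stay unknown in this simple sketch
        op = ins[0]
        if op == "push":
            off -= 4
        elif op == "pop":
            off += 4
        elif op in ("sub", "add") and len(ins) == 3 and ins[1] == "esp" and ins[2].isdigit():
            delta = int(ins[2])
            off = off - delta if op == "sub" else off + delta
        elif len(ins) > 1 and ins[1] == "esp":
            off = None  # other writes to esp (and, mov, sub reg) are not tracked
    return offsets
```

With these offsets, a reference such as `[esp+8]` at an instruction whose offset is -20 can be identified as the slot at entry-relative offset -12, regardless of how many pushes preceded it.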
Source Code Instrumentation and its Perturbation Analysis in Pentium II
Microprocessors typically have software-readable counters for events such as instructions executed, cycles, instruction stalls, and cache misses. Besides being useful for counting overall performance metrics, these counters can reveal details about dynamic process behavior and the hardware effects of compiler optimizations, but interference between probe code and code under test makes it difficult to interpret the counts accurately for pure application code. This paper addresses, for the Pentium II, the problems of reducing instrumentation perturbations, modeling and correcting for them, and then quantifying the remaining uncertainties. After eliminating most operating system and interrupt interference by embedding benchmarks into Linux kernel modules, we introduce the "null instrumentation" strategy, in which code under test is displaced but not removed. We report as remaining uncertainty levels the differences between results of "full" (detailed) instrumentation and 1 result...
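The "null instrumentation" idea can be sketched as follows. A full run places the probe pair around the code under test, while a null run reads the probes back-to-back and executes the code under test just afterwards, so the code is displaced rather than removed. Subtracting the median null count estimates the code's own count, and the spread of the null runs gives a crude uncertainty. The function names, the median and spread choices, and the default `time.perf_counter_ns` counter (standing in for the Pentium II event counters) are all assumptions; the counter is passed in so a real event counter can be substituted.

```python
import statistics
import time
from typing import Callable


def full_probe(region: Callable[[], None], read: Callable[[], int]) -> int:
    """Count observed with the probe pair placed around the code under test."""
    start = read()
    region()
    return read() - start


def null_probe(region: Callable[[], None], read: Callable[[], int]) -> int:
    """Probe pair with nothing between the reads; the code under test is
    displaced to run just after the probes instead of being removed."""
    start = read()
    end = read()
    region()
    return end - start


def corrected_count(region: Callable[[], None],
                    read: Callable[[], int] = time.perf_counter_ns,
                    trials: int = 101) -> tuple[float, float]:
    """Hypothetical correction: median full count minus median null count,
    plus the range of the null counts as a rough uncertainty."""
    full = [full_probe(region, read) for _ in range(trials)]
    null = [null_probe(region, read) for _ in range(trials)]
    estimate = statistics.median(full) - statistics.median(null)
    uncertainty = max(null) - min(null)
    return estimate, uncertainty
```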

