Results 1 - 10
of
75
Optimally Profiling and Tracing Programs
- ACM Transactions on Programming Languages and Systems
, 1994
"... copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others ..."
Abstract
-
Cited by 256 (17 self)
- Add to MetaCart
copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications
An API for Runtime Code Patching
- The International Journal of High Performance Computing Applications
, 2000
"... We present a post-compiler program manipulation tool called Dyninst which provides a C++ class library for program instrumentation. Using this library, it is possible to instrument and modify application programs during execution. A unique feature of this library is that it permits machine-independe ..."
Abstract
-
Cited by 194 (10 self)
- Add to MetaCart
We present a post-compiler program manipulation tool called Dyninst which provides a C++ class library for program instrumentation. Using this library, it is possible to instrument and modify application programs during execution. A unique feature of this library is that it permits machine-independent binary instrumentation programs to be written. We describe the interface that a tool sees when using this library. We also discuss three simple tools built using this interface: a utility to count the number of times a function is called, a program to capture the output of an already running program to a file, and an implementation of conditional break points. For the conditional breakpoint example, we show that by using our interface compared with gdb we are able to execute a program with conditional breakpoints up to 900 times faster. 1. Introduction The normal cycle of developing a program is to edit source code, compile it, and then execute the resulting binary. However, sometimes t...
MINT: A Front End for Efficient Simulation of Shared-Memory Multiprocessors
, 1994
"... Mint is a software package designed to ease the process of constructing event-driven memory hierarchy simulators for multiprocessors. It provides a set of simulated processors that run standard Unix executable files compiled for a MIPS R3000 based multiprocessor. These generate multiple streams of m ..."
Abstract
-
Cited by 170 (1 self)
- Add to MetaCart
Mint is a software package designed to ease the process of constructing event-driven memory hierarchy simulators for multiprocessors. It provides a set of simulated processors that run standard Unix executable files compiled for a MIPS R3000 based multiprocessor. These generate multiple streams of memory reference events that drive a user-provided memory system simulator. Mint uses a novel hybrid technique that exploits the best aspects of native execution and software interpretation to minimize the overhead of processor simulation. Combined with related techniques to improve performance, this approach makes simulation on uniprocessor hosts extremely efficient. 1 Introduction The simulation of high-performance computer systems is computationally expensive. Each unit of simulated processor execution time requires many units of simulator time. If the target is a parallel computer, simulator time necessarily grows in proportion to the total amount of work done in the simulated parallel ...
Trace-Driven Memory Simulation: A Survey
- ACM Computing Surveys
, 2004
"... This article surveys and analyzes these developments by establishing criteria for evaluating trace-driven methods, and then applies these criteria to describe, categorize, and compare over 50 trace-driven simulation tools. We discuss the strengths and weaknesses of different approaches and show t ..."
Abstract
-
Cited by 134 (0 self)
- Add to MetaCart
This article surveys and analyzes these developments by establishing criteria for evaluating trace-driven methods, and then applies these criteria to describe, categorize, and compare over 50 trace-driven simulation tools. We discuss the strengths and weaknesses of different approaches and show that no single method is best when all criteria, including accuracy, speed, memory, flexibility, portability, expense, and ease of use are considered. In a concluding section, we examine fundamental limitations to trace-driven simulation, and survey some recent developments in memory simulation that may overcome these bottlenecks
Rewriting Executable Files to Measure Program Behavior
- SOFTWARE PRACTICE & EXPERIENCE
, 1994
"... ..."
Efficient Representation and Abstractions for Quantifying and Exploiting Data Reference Locality
, 2001
"... With the growing processor-memory performance gap, understanding and optimizing a program's reference locality, and consequently, its cache performance, is becoming increasingly important. Unfortunately, current reference locality optimizations rely on heuristics and are fairly ad-hoc. In additi ..."
Abstract
-
Cited by 77 (4 self)
- Add to MetaCart
With the growing processor-memory performance gap, understanding and optimizing a program's reference locality, and consequently, its cache performance, is becoming increasingly important. Unfortunately, current reference locality optimizations rely on heuristics and are fairly ad-hoc. In addition, while optimization technology for improving instruction cache performance is fairly mature (though heuristic-based), data cache optimizations are still at an early stage. We believe the primary reason for this imbalance is the lack of a suitable representation of a program's dynamic data reference behavior and a quantitative basis for understanding this behavior. We address these issues by proposing a quantitative basis for understanding and optimizing reference locality, and by describing efficient data reference representations and an exploitable locality abstraction that support this framework. Our data reference representations (Whole Program Streams and Stream Flow Graphs) are compact---two to four orders of magnitude smaller than the program's data reference trace---and permit efficient analysis---on the order of seconds to a few minutes---even for complex applications. These representations can be used to efficiently compute our exploitable locality abstraction (hot data streams). We demonstrate that these representations and our hot data stream abstraction are useful for quantifying and exploiting data reference locality. We applied our framework to several SPECint 2000 benchmarks, a graphics program, and a commercial Microsoft database application. The results suggest significant opportunity for hot data stream-based locality optimizations. 1.
The Measured Cost of Conservative Garbage Collection
- Software Practice and Experience
, 1993
"... this paper, I evaluate the costs of different dynamic storage management algorithms, including domain-specific allocators, widelyused general-purpose allocators, and a publicly available conservative garbage collection algorithm. Surprisingly, I find that programmer enhancements often have little ef ..."
Abstract
-
Cited by 76 (6 self)
- Add to MetaCart
this paper, I evaluate the costs of different dynamic storage management algorithms, including domain-specific allocators, widelyused general-purpose allocators, and a publicly available conservative garbage collection algorithm. Surprisingly, I find that programmer enhancements often have little effect on program performance. I also find that the true cost of conservative garbage collection is not the CPU overhead, but the memory system overhead of the algorithm. I conclude that conservative garbage collection is a promising alternative to explicit storage management and that the performance of conservative collection is likely to improve in the future. C programmers should now seriously consider using conservative garbage collection instead of explicitly calling free in programs they write
Using Lifetime Predictors to Improve Memory Allocation Performance
, 1993
"... Dynamic storage allocation is used heavily in many application areas including interpreters, simulators, optimizers, and translators. We describe research that can improve all aspects of the performance of dynamic storage allocation by predicting the lifetimes of short-lived objects when they are al ..."
Abstract
-
Cited by 71 (7 self)
- Add to MetaCart
Dynamic storage allocation is used heavily in many application areas including interpreters, simulators, optimizers, and translators. We describe research that can improve all aspects of the performance of dynamic storage allocation by predicting the lifetimes of short-lived objects when they are allocated. Using five significant, allocation-intensive C programs, we show that a great fraction of all bytes allocated are short-lived (? 90% in all cases). Furthermore, we describe an algorithm for lifetime prediction that accurately predicts the lifetimes of 42--99% of all objects allocated. We describe and simulate a storage allocator that takes advantage of lifetime prediction of short-lived objects and show that it can significantly improve a program's memory overhead and reference locality, and even, at times, improve CPU performance as well.
Design tradeoffs for software-managed TLBs
- ACM Transactions on Computer Systems
, 1993
"... An increasing number of architectures provide virtual memory support through software-managed TLBs. However, software management can impose considerable penalties, which are highly dependent on the operating system’s structure and its use of virtual memory. This work explores software-managed TLB de ..."
Abstract
-
Cited by 70 (9 self)
- Add to MetaCart
An increasing number of architectures provide virtual memory support through software-managed TLBs. However, software management can impose considerable penalties, which are highly dependent on the operating system’s structure and its use of virtual memory. This work explores software-managed TLB design tradeoffs and their interaction with a range of operating systems including monolithic and microkernel designs. Through hardware monitoring and simulation, we explore TLB performance for benchmarks running on a MIPS R2000-based workstation running Ultrix, OSF/1, and three versions of Mach 3.0. Results: New operating systems are changing the relative frequency of different types of TLB misses, some of which may not be efficiently handled by current architectures. For the same application binaries, total TLB service time varies by as much as an order of magnitude under different operating systems. Reducing the handling cost for kernel TLB misses reduces total TLB service time up to 40%. For TLBs between 32 and 128 slots, each doubling of the TLB size reduces total TLB service time by as much as
MINT Tutorial and User Manual
, 1994
"... This document describes Mint, a software package designed to ease the process of constructing event-driven memory hierarchy simulators for multiprocessors. It provides a set of simulated processors that run standard Unix executable files compiled for a MIPS R3000 based multiprocessor. These generate ..."
Abstract
-
Cited by 65 (1 self)
- Add to MetaCart
This document describes Mint, a software package designed to ease the process of constructing event-driven memory hierarchy simulators for multiprocessors. It provides a set of simulated processors that run standard Unix executable files compiled for a MIPS R3000 based multiprocessor. These generate multiple streams of memory reference events that drive a user-provided memory system simulator. Mint uses a novel hybrid technique that exploits the best aspects of native execution and software interpretation to minimize the overhead of processor simulation. Combined with related techniques to improve performance, this approach makes simulation on uniprocessor hosts extremely efficient. This material is based upon work supported by the National Science Foundation under Grant number CDA-8822724. The U. S. Government has certain rights in this material. Contents 1 Introduction 3 1.1 Trace-driven simulation : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 3 1.2 Program-driven s...

