METRIC: Tracking Down Inefficiencies in the Memory Hierarchy via Binary Rewriting, 2003
Cited by 22 (12 self)
In this paper, we present METRIC, an environment for determining memory inefficiencies by examining data traces. METRIC is designed to alter the performance behavior of applications that are mostly constrained by their latency to resolve memory references. We make four primary contributions in this paper. First, we present methods to extract partial data traces from running applications by observing their memory behavior via dynamic binary rewriting. Second, we present a methodology to represent partial data traces in constant space for regular references through a novel technique for online compression of reference streams. Third, we employ offline cache simulation to derive indications about memory performance bottlenecks from partial data traces. By exploiting summarized memory metrics, by-reference metrics, and cache evictor information, we can pinpoint the sources of performance problems. Fourth, we demonstrate the ability to derive opportunities for optimizations and assess their benefits in several experiments, resulting in up to 40% lower miss ratios.
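To make the idea of representing regular references in constant space concrete, here is a minimal sketch. It is not METRIC's compression algorithm, which the abstract does not specify; it is a simple single-stride run-length scheme, and the names `compress_strided` and `expand` are hypothetical. Each run of addresses with a fixed stride collapses into one `(start, stride, count)` descriptor, so a regular loop access costs the same space regardless of trip count.

```python
from typing import Iterable, List, Optional, Tuple

Run = Tuple[int, int, int]  # (start address, stride, count)


def compress_strided(addresses: Iterable[int]) -> List[Run]:
    """Collapse an address stream into fixed-stride runs, one pass, online."""
    runs: List[Run] = []
    start = 0
    last = 0
    stride: Optional[int] = None
    count = 0
    for addr in addresses:
        if count == 0:
            # First address of a new run.
            start, last, count = addr, addr, 1
        elif count == 1:
            # Second address fixes the stride of the run.
            stride, last, count = addr - last, addr, 2
        elif addr - last == stride:
            # Address continues the current run.
            last, count = addr, count + 1
        else:
            # Pattern broken: emit the run and start a new one.
            runs.append((start, stride, count))
            start, last, stride, count = addr, addr, None, 1
    if count:
        # A single-address run is recorded with stride 0.
        runs.append((start, stride if stride is not None else 0, count))
    return runs


def expand(runs: List[Run]) -> List[int]:
    """Reconstruct the original address stream from its descriptors."""
    return [start + i * stride for start, stride, count in runs for i in range(count)]
```

A stream of 100 accesses to consecutive 8-byte elements compresses to a single descriptor, and `expand` recovers the original stream exactly. Irregular streams degrade to many short runs, which is why a scheme like this only pays off for regular references.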
SuperPin: Parallelizing Dynamic Instrumentation for Real-Time Performance
Cited by 11 (1 self)
Dynamic instrumentation systems have proven to be extremely valuable for program introspection, architectural simulation, and bug detection. Yet a major drawback of modern instrumentation systems is that the instrumented applications often execute several orders of magnitude slower than native application performance. In this paper, we present a novel approach to dynamic instrumentation where several non-overlapping slices of an application are launched as separate instrumentation threads and executed in parallel in order to approach real-time performance. A direct implementation of our technique in the Pin dynamic instrumentation system results in dramatic speedups for various instrumentation tasks, often yielding order-of-magnitude performance improvements. Our implementation is available as part of the Pin distribution, which has been downloaded over 10,000 times since its release.
Partial Data Traces: Efficient Generation and Representation, 2001
Cited by 7 (2 self)
Binary manipulation techniques are increasing in popularity. They support program transformations tailored toward certain program inputs, and these transformations have been shown to yield performance gains beyond the scope of static code optimizations without profile-directed feedback. They even deliver moderate gains in the presence of profile-guided optimizations. In addition, transformations can be performed on the entire executable, including library routines. This work focuses on program instrumentation, yet another application of binary manipulation. This paper reports preliminary results on generating partial data traces through dynamic binary rewriting. The contributions are threefold. First, a portable method for extracting precise data traces for partial executions of arbitrary applications is developed. Second, a set of hierarchical structures for compactly representing these accesses is developed. Third, an efficient online algorithm to detect regular accesses is introduced. These efforts are part of a larger project to counter the increasing gap between processor and main memory speeds by means of software optimization and hardware enhancements.
Metric: Memory tracing via dynamic binary rewriting to identify cache inefficiencies. ACM Transactions on Programming Languages and Systems, 2007
Cited by 3 (1 self)
With the diverging improvements in CPU speeds and memory access latencies, detecting and removing memory access bottlenecks becomes increasingly important. In this work we present METRIC, a software framework for isolating and understanding such bottlenecks using partial access traces. METRIC extracts access traces from executing programs without special compiler or linker support. We make four primary contributions. First, we present a framework for extracting partial access traces based on dynamic binary rewriting of the executing application. Second, we introduce a novel algorithm for compressing these traces. The algorithm generates constant space representations for regular accesses occurring in nested loop structures. Third, we use these traces for offline incremental memory hierarchy simulation. We extract symbolic information from the application executable and use this to generate detailed source-code correlated statistics including per-reference metrics, cache evictor information and stream metrics. Finally, we demonstrate how this information can be used to isolate and understand memory access inefficiencies. This illustrates a potential advantage of METRIC over compile-time analysis for sample codes, particularly when interprocedural analysis is required. Categories and Subject Descriptors: D.3.4 [Programming Languages]: Processors—compilers; optimization;
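As an illustration of what per-reference metrics and cache evictor information can look like, the sketch below replays a trace through a toy direct-mapped cache. It is an assumption-laden example, not METRIC's simulator: the function name `simulate_direct_mapped`, the `(ref_id, address)` trace format, and the default cache geometry are all hypothetical. For every miss it counts the missing reference, and when a resident line is displaced it records which reference evicted which.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Hashable, Iterable, Tuple


def simulate_direct_mapped(
    trace: Iterable[Tuple[Hashable, int]],
    num_lines: int = 4,
    line_size: int = 32,
) -> Tuple[Counter, DefaultDict[Hashable, Counter]]:
    """Replay (ref_id, address) pairs through a direct-mapped cache.

    Returns (misses, evictors): misses[ref] is the miss count of a reference,
    and evictors[victim][evictor] counts how often `evictor` displaced a line
    that `victim` had loaded.
    """
    lines: Dict[int, Tuple[int, Hashable]] = {}  # index -> (tag, loading ref)
    misses: Counter = Counter()
    evictors: DefaultDict[Hashable, Counter] = defaultdict(Counter)
    for ref_id, addr in trace:
        block = addr // line_size
        index, tag = block % num_lines, block // num_lines
        entry = lines.get(index)
        if entry is not None and entry[0] == tag:
            continue  # hit: the line is already resident
        misses[ref_id] += 1
        if entry is not None:
            # Conflict miss: remember who displaced whom.
            evictors[entry[1]][ref_id] += 1
        lines[index] = (tag, ref_id)
    return misses, evictors
```

With two references `A` and `B` alternating between addresses 0 and 128, both map to the same line in the default geometry, so every access misses and the evictor table shows `A` and `B` displacing each other. A pattern like that is the kind of signal that would suggest padding or reordering the data.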
Code Cache Management in Dynamic Optimization Systems, 2004
Cited by 2 (0 self)
Dynamic optimization systems store optimized or translated code in software-managed code caches in order to maximize reuse of transformed code. Code caches store superblocks that are not fixed in size, may contain links to other superblocks, and carry a high replacement overhead. These additional constraints reduce the effectiveness of conventional cache management policies. This dissertation investigates the code cache management problem in dynamic optimization systems and presents three major advances that cover the design space of cache management decisions. Through code cache simulations, we show that a FIFO replacement policy outperforms other traditional policies, as it enables contiguous cache evictions, allows for a simple circular buffer implementation, and results in comparable cache miss rates to LRU. Furthermore, a pseudo-circular FIFO algorithm is presented, which handles the problem of un-deletable cache blocks. An investigation of cache eviction granularities also reveals that evicting more than the minimum number of superblocks from the code cache at a time results in
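The FIFO policy for variable-size superblocks described in this abstract can be sketched as follows. This is a simplified model under stated assumptions, not the dissertation's implementation: the class name `FifoCodeCache` is hypothetical, a deque stands in for the circular buffer (it tracks FIFO order and sizes rather than real addresses), and links between superblocks and un-deletable blocks are ignored. Inserting a block evicts the oldest blocks, which would be contiguous in a real circular buffer, until the new block fits.

```python
from collections import deque
from typing import Deque, Hashable, List, Set, Tuple


class FifoCodeCache:
    """Toy FIFO code cache holding variable-size superblocks."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.used = 0
        self.blocks: Deque[Tuple[Hashable, int]] = deque()  # oldest on the left
        self.resident: Set[Hashable] = set()

    def __contains__(self, block_id: Hashable) -> bool:
        return block_id in self.resident

    def insert(self, block_id: Hashable, size: int) -> List[Hashable]:
        """Insert a superblock and return the ids evicted to make room."""
        if size > self.capacity:
            raise ValueError("superblock larger than the code cache")
        evicted: List[Hashable] = []
        # Evict in insertion order until the new block fits.
        while self.used + size > self.capacity:
            old_id, old_size = self.blocks.popleft()
            self.resident.discard(old_id)
            self.used -= old_size
            evicted.append(old_id)
        self.blocks.append((block_id, size))
        self.resident.add(block_id)
        self.used += size
        return evicted
```

Because eviction never consults access history, no per-hit bookkeeping is needed, which is the implementation simplicity the abstract attributes to FIFO relative to LRU.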
Atome- Binary Translation for Accurate Simulation
An in-depth study of the field of Virtual Machine optimization is presented. It covers dynamic binary translation, simulation accuracy, and intermediate code representations. It is aimed at readers that have a good knowledge of computer architecture and wish to learn about creating virtual
ISAMAP: Instruction Mapping Driven by Dynamic Binary Translation
Dynamic Binary Translation (DBT) techniques have been largely used in the migration of legacy code and in the transparent execution of programs across different architectures. They have also been used in dynamic optimizing compilers, to collect runtime information so as to improve code quality. In many cases, the DBT translation mechanism misses important low-level mapping opportunities available at the source/target ISAs. Hot code performance has been shown to be central to the overall program performance, as different instruction mappings can account for high performance gains. Hence, DBT techniques that provide efficient instruction mapping at the ISA level have the potential to considerably improve performance. This paper proposes ISAMAP, a flexible instruction mapping driven by dynamic binary translation. Its mapping mechanism provides a fast translation between ISAs, under an easy-to-use description. At its current state, ISAMAP is capable of translating 32-bit PowerPC code to 32-bit x86 and of performing local optimizations on the resulting x86 code. Our experimental results show that ISAMAP is capable of executing PowerPC code on an x86 host faster than the processor emulator QEMU, achieving speedups of up to 3.16x for SPEC CPU2000 programs.
ISAMAP: Instruction Mapping Driven by Dynamic Binary Translation. AMAS-BT: 3rd Workshop on Architectural and Microarchitectural Support for Binary Translation, 2010

