Results 1 - 10
of
184
Pin: building customized program analysis tools with dynamic instrumentation
- In PLDI ’05: Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
, 2005
"... Robust and powerful software instrumentation tools are essential for program analysis tasks such as profiling, performance evaluation, and bug detection. To meet this need, we have developed a new instrumentation system called Pin. Our goals are to provide easy-to-use, portable, transparent, and eff ..."
Abstract
-
Cited by 416 (20 self)
- Add to MetaCart
Robust and powerful software instrumentation tools are essential for program analysis tasks such as profiling, performance evaluation, and bug detection. To meet this need, we have developed a new instrumentation system called Pin. Our goals are to provide easy-to-use, portable, transparent, and efficient instrumentation. Instrumentation tools (called Pintools) are written in C/C++ using Pin’s rich API. Pin follows the model of ATOM, allowing the tool writer to analyze an application at the instruction level without the need for detailed knowledge of the underlying instruction set. The API is designed to be architecture independent whenever possible, making Pintools source compatible across different architectures. However, a Pintool can access architecture-specific details when necessary. Instrumentation with Pin is mostly transparent as the application and Pintool observe the application’s original, uninstrumented behavior. Pin uses dynamic compilation to instrument executables while they are running. For efficiency, Pin uses several techniques, including inlining, register re-allocation, liveness analysis, and instruction scheduling to optimize instrumentation. This fully automated approach delivers significantly better instrumentation performance than similar tools. For example, Pin is 3.3x faster than Valgrind and 2x faster than DynamoRIO for basic-block counting. To illustrate Pin’s versatility, we describe two Pintools in daily use to analyze production software. Pin is publicly available for Linux platforms on four architectures: IA32 (32-bit x86), EM64T (64-bit x86), Itanium R ○ , and ARM. In the ten months since Pin 2 was released in July 2004, there have been over 3000 downloads from its website. Categories and Subject Descriptors D.2.5 [Software Engineering]: Testing and Debugging-code inspections and walk-throughs,
Exploiting hardware performance counters with flow and context sensitive profiling
- ACM Sigplan Notices
, 1997
"... A program pro le attributes run-time costs to portions of a program's execution. Most pro ling systems su er from two major de ciencies: rst, they only apportion simple metrics, such as execution frequency or elapsed time to static, syntactic units, such as procedures or statements; second, they agg ..."
Abstract
-
Cited by 189 (9 self)
- Add to MetaCart
A program pro le attributes run-time costs to portions of a program's execution. Most pro ling systems su er from two major de ciencies: rst, they only apportion simple metrics, such as execution frequency or elapsed time to static, syntactic units, such as procedures or statements; second, they aggressively reduce the volume of information collected and reported, although aggregation can hide striking di erences in program behavior. This paper addresses both concerns by exploiting the hardware counters available in most modern processors and by incorporating two concepts from data ow analysis { ow and context sensitivity{to report more context for measurements. This paper extends our previous work on e cient path pro ling to ow sensitive pro ling, which associates hardware performance metrics with a path through a procedure. In addition, it describes a data structure, the calling context tree, that e ciently captures calling contexts for procedure-level measurements. Our measurements show that the SPEC95 benchmarks execute a small number (3{28) of hot paths that account for 9{98 % of their L1 data cache misses. Moreover, these hot paths are concentrated in a few routines, which have complex dynamic behavior. 1
Fine-grained dynamic instrumentation of commodity operating system kernels
, 1999
"... We have developed a technology, fine-grained dynamic instrumentation of commodity kernels, which can splice (insert) dynamically generated code before almost any machine code instruction of a completely unmodified running commodity operating system kernel. This technology is well-suited to performan ..."
Abstract
-
Cited by 107 (5 self)
- Add to MetaCart
We have developed a technology, fine-grained dynamic instrumentation of commodity kernels, which can splice (insert) dynamically generated code before almost any machine code instruction of a completely unmodified running commodity operating system kernel. This technology is well-suited to performance profiling, debugging, code coverage, security auditing, runtime code optimizations, and kernel extensions. We have designed and implemented a tool called KernInst that performs dynamic instrumentation on a stock production Solaris kernel running on an UltraSPARC. On top of KernInst, we have implemented a kernel performance profiling tool, and used it to understand kernel and application performance under a Web proxy server workload. We used this information to make two changes (one to the kernel, one to the proxy) that cumulatively reduce the percentage of elapsed time that the proxy spends opening disk cache files from 40 % to 7%. 1
Putting Pointer Analysis to Work
, 1998
"... This paper addresses the problem of how to apply pointer analysis to a wide variety of compiler applications. We are not presenting a new pointer analysis. Rather, we focus on putting two existing pointer analyses, points-to analysis and connection analysis, to work. We demonstrate that the fundamen ..."
Abstract
-
Cited by 91 (8 self)
- Add to MetaCart
This paper addresses the problem of how to apply pointer analysis to a wide variety of compiler applications. We are not presenting a new pointer analysis. Rather, we focus on putting two existing pointer analyses, points-to analysis and connection analysis, to work. We demonstrate that the fundamental problem is that one must be able to compare the memory locations read/written via pointer indirections, at different program points, and one must also be able to summarize the effect of pointer references over regions in the program. It is straightforward to compute read/write sets for indirections involving stack-directed pointers using points-to information. However, for heap-directed pointers we show that one needs to introduce the notion of anchor handles into the connection analysis and then express read/write sets to the heap with respect to these anchor handles. Based on the read/write sets we show how to extend traditional optimizations like common subexpression elimination, loop...
Performance Analysis Using the MIPS R10000 Performance Counters
, 1996
"... : Tuning supercomputer application performance often requires analyzing the interaction of the application and the underlying architecture. In this paper, we describe support in the MIPS R10000 for non-intrusively monitoring a variety of processor events -- support that is particularly useful for c ..."
Abstract
-
Cited by 91 (0 self)
- Add to MetaCart
: Tuning supercomputer application performance often requires analyzing the interaction of the application and the underlying architecture. In this paper, we describe support in the MIPS R10000 for non-intrusively monitoring a variety of processor events -- support that is particularly useful for characterizing the dynamic behavior of multi-level memory hierarchies, hardware-based cache coherence, and speculative execution. We first explain how performance data is collected using an integrated set of hardware mechanisms, operating system abstractions, and performance tools. We then describe several examples drawn from scientific applications, which illustrate how the counters and profiling tools provide information that helps developers analyze and tune applications. Keywords: performance analysis, profiling tools, hardware performance counters, MIPS R10000, SGI Power Challenge 1. Introduction A fundamental question asked by HPC application developers is: "Where is the time spent?"...
Analyzing Memory Accesses in x86 Executables
- In CC
, 2004
"... This paper concerns static-analysis algorithms for analyzing x86 executables. ..."
Abstract
-
Cited by 90 (22 self)
- Add to MetaCart
This paper concerns static-analysis algorithms for analyzing x86 executables.
The use of program profiling for software maintenance with applications to the year 2000 problem
- ACM Software Engineering Notes
, 1997
"... This paper describes new techniques to help with testing and debugging, using information obtained from path profiling. Apath profiler instruments a program so that the number of times each different loopfree path executes is accumulated during an execution run. With such an instrumented program, ea ..."
Abstract
-
Cited by 86 (5 self)
- Add to MetaCart
This paper describes new techniques to help with testing and debugging, using information obtained from path profiling. Apath profiler instruments a program so that the number of times each different loopfree path executes is accumulated during an execution run. With such an instrumented program, each run of the program generates a path spectrum for the execution—a distribution of the paths that were executed during that run. Apath spectrum is a finite, easily obtainable characterization of a program’s execution on adataset, and provides a behavior signature for a run of the program. Our techniques are based on the idea of comparing path spectra from different runs of the program. When different runs produce different spectra, the spectral differences can be used to identify paths in the program along which control diverges in the two runs. By choosing input datasets to hold all factors constant except one, the divergence can be attributed to this factor. The point of divergence itself may not be the cause of the underlying problem, but provides a starting place for a programmer to begin his exploration. One application of this technique is in the “Year 2000 Problem ” (i.e., the problem of fixing computer systems that use only 2-digit year fields in date-valued data). In this context, path-spectrum comparison provides a heuristic for identifying paths in a program that are good candidates for being date-dependent computations. The application of path-spectrum comparison to a number of other software-maintenance issues is also discussed.
Cross-Architecture Performance Predictions for Scientific Applications Using Parameterized Models
, 2004
"... This paper describes a toolkit for semi-automatically measuring and modeling static and dynamic characteristics of applications in an architecture-neutral fashion. For predictable applications, models of dynamic characteristics have a convex and diferentiable profile. Our toolkit operates on applica ..."
Abstract
-
Cited by 74 (2 self)
- Add to MetaCart
This paper describes a toolkit for semi-automatically measuring and modeling static and dynamic characteristics of applications in an architecture-neutral fashion. For predictable applications, models of dynamic characteristics have a convex and diferentiable profile. Our toolkit operates on application binaries and succeeds in modeling key application characteristics that determine program performance. We use these characterizations to explore the interactions between an application and a target architecture. We apply our toolkit to SPARC binaries to develop architectureneutral models of computation and memory access patterns of the ASCI Sweep3D and the NAS SP, BT and LU benchmarks. From our models, we predict the L1, L2 and TLB cache miss counts as well as the overall execution time of these applications on an Origin 2000 system. We evaluate our predictions by comparing them against measurements collected using hardware performance counters.
Encoding program executions
- In ICSE
, 2001
"... Dynamic analysis is based on collecting data as the program runs. However, raw traces tend to be too voluminous and too unstructured to be used directly for visualization and understanding. We address this problem in two phases: the first phase selects subsets of the data and then compacts it, while ..."
Abstract
-
Cited by 69 (1 self)
- Add to MetaCart
Dynamic analysis is based on collecting data as the program runs. However, raw traces tend to be too voluminous and too unstructured to be used directly for visualization and understanding. We address this problem in two phases: the first phase selects subsets of the data and then compacts it, while the second phase encodes the data in an attempt to infer its structure. Our major compaction/selection techniques include gprof-style N-depth call sequences, selection based on class, compaction based on time intervals, and encoding the whole execution as a directed acyclic graph. Our structure inference techniques include run-length encoding, contextfree grammar encoding, and the building of finite state automata.

