Results 1 - 10
of
68
Pin: building customized program analysis tools with dynamic instrumentation
- In PLDI ’05: Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
, 2005
"... Robust and powerful software instrumentation tools are essential for program analysis tasks such as profiling, performance evaluation, and bug detection. To meet this need, we have developed a new instrumentation system called Pin. Our goals are to provide easy-to-use, portable, transparent, and eff ..."
Abstract
-
Cited by 416 (20 self)
- Add to MetaCart
Robust and powerful software instrumentation tools are essential for program analysis tasks such as profiling, performance evaluation, and bug detection. To meet this need, we have developed a new instrumentation system called Pin. Our goals are to provide easy-to-use, portable, transparent, and efficient instrumentation. Instrumentation tools (called Pintools) are written in C/C++ using Pin’s rich API. Pin follows the model of ATOM, allowing the tool writer to analyze an application at the instruction level without the need for detailed knowledge of the underlying instruction set. The API is designed to be architecture independent whenever possible, making Pintools source compatible across different architectures. However, a Pintool can access architecture-specific details when necessary. Instrumentation with Pin is mostly transparent as the application and Pintool observe the application’s original, uninstrumented behavior. Pin uses dynamic compilation to instrument executables while they are running. For efficiency, Pin uses several techniques, including inlining, register re-allocation, liveness analysis, and instruction scheduling to optimize instrumentation. This fully automated approach delivers significantly better instrumentation performance than similar tools. For example, Pin is 3.3x faster than Valgrind and 2x faster than DynamoRIO for basic-block counting. To illustrate Pin’s versatility, we describe two Pintools in daily use to analyze production software. Pin is publicly available for Linux platforms on four architectures: IA32 (32-bit x86), EM64T (64-bit x86), Itanium R ○ , and ARM. In the ten months since Pin 2 was released in July 2004, there have been over 3000 downloads from its website. Categories and Subject Descriptors D.2.5 [Software Engineering]: Testing and Debugging-code inspections and walk-throughs,
An Infrastructure for Adaptive Dynamic Optimization
, 2003
"... Dynamic optimization is emerging as a promising approach to overcome many of the obstacles of traditional static compilation. But while there are a number of compiler infrastructures for developing static optimizations, there are very few for developing dynamic optimizations. We present a framework ..."
Abstract
-
Cited by 130 (5 self)
- Add to MetaCart
Dynamic optimization is emerging as a promising approach to overcome many of the obstacles of traditional static compilation. But while there are a number of compiler infrastructures for developing static optimizations, there are very few for developing dynamic optimizations. We present a framework for implementing dynamic analyses and optimizations. We provide an interface for building external modules, or clients, for the DynamoRIO dynamic code modification system. This interface abstracts away many low-level details of the DynamoRIO runtime system while exposing a simple and powerful, yet efficient and lightweight, API. This is achieved by restricting optimization units to linear streams of code and using adaptive levels of detail for representing instructions. The interface is not restricted to optimization and can be used for instrumentation, profiling, dynamic translation, etc.. To demonstrate
Dynamic hot data stream prefetching for general-purpose programs
- InACM SIGPLANConference on Programming Language Designand Implementation
, 2002
"... Prefetching data ahead of use has the potential to tolerate the growing processor-memory performance gap by overlapping long latency memory accesses with useful computation. While sophisticated prefetching techniques have been automated for limited domains, such as scientific codes that access dense ..."
Abstract
-
Cited by 87 (1 self)
- Add to MetaCart
Prefetching data ahead of use has the potential to tolerate the growing processor-memory performance gap by overlapping long latency memory accesses with useful computation. While sophisticated prefetching techniques have been automated for limited domains, such as scientific codes that access dense arrays in loop nests, a similar level of success has eluded general-purpose programs, especially pointer-chasing codes written in languages such as C and C++. We address this problem by describing, implementing and evaluating a dynamic prefetching scheme. Our technique runs on stock hardware, is completely automatic, and works for generalpurpose programs, including pointer-chasing codes written in weakly-typed languages, such as C and C++. It operates in three phases. First, the profiling phase gathers a temporal data reference profile from a running program with low-overhead. Next, the profiling is turned off and a fast analysis algorithm extracts hot data streams, which are data reference sequences that frequently repeat in the same order, from the temporal profile. Then, the system dynamically injects code at appropriate program points to detect and prefetch these hot data streams. Finally, the process enters the hibernation phase where no profiling or analysis is performed, and the program continues to execute with the added prefetch instructions. At the end of the hibernation phase, the program is deoptimized to remove the inserted checks and prefetch instructions, and control returns to the profiling phase. For long-running programs, this profile, analyze and optimize, hibernate, cycle will repeat multiple times. Our initial results from applying dynamic prefetching are promising, indicating overall execution time improvements of 5–19 % for several memory-performance-limited SPECint2000 benchmarks running their largest (ref) inputs.
Perracotta: mining temporal API rules from imperfect traces
- Ohio University
, 2006
"... Dynamic inference techniques have been demonstrated to provide useful support for various software engineering tasks including bug finding, test suite evaluation and improvement, and specification generation. To date, however, dynamic inference has only been used effectively on small programs under ..."
Abstract
-
Cited by 85 (2 self)
- Add to MetaCart
Dynamic inference techniques have been demonstrated to provide useful support for various software engineering tasks including bug finding, test suite evaluation and improvement, and specification generation. To date, however, dynamic inference has only been used effectively on small programs under controlled conditions. In this paper, we identify reasons why scaling dynamic inference techniques has proven difficult, and introduce solutions that enable a dynamic inference technique to scale to large programs and work effectively with the imperfect traces typically available in industrial scenarios. We describe our approximate inference algorithm, present and evaluate heuristics for winnowing the large number of inferred properties to a manageable set of interesting properties, and report on experiments using inferred properties. We evaluate our techniques on JBoss and the Windows kernel. Our tool is able to infer many of the properties checked by the Static Driver Verifier and leads us to discover a previously unknown bug in Windows.
Low-overhead memory leak detection using adaptive statistical profiling
- In Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems
, 2004
"... Sampling has been successfully used to identify performance optimization opportunities. We would like to apply similar techniques to check program correctness. Unfortunately, sampling provides poor coverage of infrequently executed code, where bugs often lurk. We describe an adaptive profiling schem ..."
Abstract
-
Cited by 77 (1 self)
- Add to MetaCart
Sampling has been successfully used to identify performance optimization opportunities. We would like to apply similar techniques to check program correctness. Unfortunately, sampling provides poor coverage of infrequently executed code, where bugs often lurk. We describe an adaptive profiling scheme that addresses this by sampling executions of code segments at a rate inversely proportional to their execution frequency. To validate our ideas, we have implemented SWAT, a novel memory leak detection tool. SWAT traces program allocations/ frees to construct a heap model and uses our adaptive profiling infrastructure to monitor loads/stores to these objects with low overhead. SWAT reports ‘stale ’ objects that have not been accessed for a ‘long ’ time as leaks. This allows it to find all leaks that manifest during the current program execution. Since SWAT has low runtime overhead (< 5%), and low space overhead (< 10 % in most cases and often less than 5%), it can be used to track leaks in production code that take days to manifest. In addition to identifying the allocations that leak memory, SWAT exposes where the program last accessed the leaked data, which facilitates debugging and fixing the leak. SWAT has been used by several product groups at Microsoft for the past 18 months and has proved effective at detecting leaks with a low false positive rate (<10%).
Effectively prioritizing tests in development environment
- In Proceedings of the International Symposium on Software Testing and Analysis
, 2002
"... Software testing helps ensure not only that the software under development has been implemented correctly, but also that further development does not break it. If developers introduce new defects into the software, these should be detected as early and inexpensively as possible in the development cy ..."
Abstract
-
Cited by 55 (0 self)
- Add to MetaCart
Software testing helps ensure not only that the software under development has been implemented correctly, but also that further development does not break it. If developers introduce new defects into the software, these should be detected as early and inexpensively as possible in the development cycle. To help optimize which tests are run at what points in the design cycle, we have built Echelon, a test prioritization system, which prioritizes the application’s given set of tests, based on what changes have been made to the program. Echelon builds on the previous work on test prioritization and proposes a practical binary code based approach that scales well to large systems. Echelon utilizes a binary matching system that can accurately compute the differences at a basic block granularity between two versions of the program in binary form. Echelon utilizes a fast, simple and intuitive heuristic that works well in practice to compute what tests will cover the affected basic blocks in the program. Echelon orders the given tests to maximally cover the affected program so that defects are likely to be found quickly and inexpensively. Although the primary focus in Echelon is on program changes, other criteria can be added in computing the priorities. Echelon is part of a test effectiveness infrastructure that runs under the Windows environment. It is currently being integrated into the Microsoft software development process. Echelon has been tested on large Microsoft product binaries. The results show that Echelon is effective in ordering tests based on changes between two program versions.
Whole program path-based dynamic impact analysis
- In Proceedings of the International Conference on Software Engineering
, 2003
"... Impact analysis, determining when a change in one part of a program affects other parts of the program, is timeconsuming and problematic. Impact analysis is rarely used to predict the effects of a change, leaving maintainers to deal with consequences rather than working to a plan. Previous approache ..."
Abstract
-
Cited by 47 (3 self)
- Add to MetaCart
Impact analysis, determining when a change in one part of a program affects other parts of the program, is timeconsuming and problematic. Impact analysis is rarely used to predict the effects of a change, leaving maintainers to deal with consequences rather than working to a plan. Previous approaches to impact analysis involving analysis of call graphs, and static and dynamic slicing, exhibit several tradeoffs involving computational expense, precision, and safety, require access to source code, and require a relatively large amount of effort to re-apply as software evolves. This paper presents a new technique for impact analysis based on whole path profiling, that provides a different set of cost-benefits tradeoffs – a set which can potentially be beneficial for an important class of predictive impact analysis tasks. The paper presents the results of experiments that show that the technique can predict impact sets that are more accurate than those computed by call graph analysis, and more precise (relative to the behavior expressed in a program’s profile) than those computed by static slicing. 1.
Bursty Tracing: A Framework for Low-Overhead Temporal Profiling
- In 4th ACM Workshop on Feedback-Directed and Dynamic Optimization
, 2001
"... With processor speed increasing much more rapidly than memory access speed, memory system optimizations have the potential to significantly improve program performance. Unfortunately, cache-level optimizations often require detailed temporal information about a program's references to be effective. ..."
Abstract
-
Cited by 46 (9 self)
- Add to MetaCart
With processor speed increasing much more rapidly than memory access speed, memory system optimizations have the potential to significantly improve program performance. Unfortunately, cache-level optimizations often require detailed temporal information about a program's references to be effective. Traditional techniques for obtaining this information are too expensive to be practical in an on-line setting. We address this problem by describing and evaluating a framework for low-overhead temporal profiling. Our framework extends the Arnold-Ryder framework that uses instrumentation and counter-based sampling to collect frequency profiles with low overhead. Our framework samples bursts (sub-sequences) of the trace of all runtime events to construct a temporal program profile. Our bursty tracing profiler is built using Vulcan, an executable-editing tool for x86, and we evaluate it on optimized x86 binaries. Like the Arnold-Ryder framework, we have the advantages of not requiring operating system or hardware support and being deterministic. Unlike them, we are not limited to capturing temporal relationships on intraprocedural acyclic paths since our trace bursts can span procedure boundaries. In addition, our framework does not require access to program source or recompilation. A direct implementation of our extensions to the Arnold-Ryder framework results in profiling overhead of 6-35%. We describe techniques that reduce this overhead to 3-18%, making it suitable for use in an on-line setting.
Online Feedback-Directed Optimization of Java
, 2002
"... This paper describes the implementation of an online feedback-directed optimization system. The system is fully automatic; it requires no prior... ..."
Abstract
-
Cited by 45 (3 self)
- Add to MetaCart
This paper describes the implementation of an online feedback-directed optimization system. The system is fully automatic; it requires no prior...
Design and Implementation of a Lightweight Dynamic Optimization System
- Journal of Instruction-Level Parallelism
, 2004
"... Many opportunities exist to improve micro-architectural performance due to performance events that are di#cult to optimize at static compile time. Cache misses and branch mis-prediction patterns may vary for di#erent micro-architectures using di#erent inputs. ..."
Abstract
-
Cited by 36 (7 self)
- Add to MetaCart
Many opportunities exist to improve micro-architectural performance due to performance events that are di#cult to optimize at static compile time. Cache misses and branch mis-prediction patterns may vary for di#erent micro-architectures using di#erent inputs.

