Results 1 - 10
of
37
Bug Isolation via Remote Program Sampling
- In Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation
, 2003
"... We propose a low-overhead sampling infrastructure for gathering information from the executions experienced by a program 's user community. Several example applications illustrate ways to use sampled instrumentation to isolate bugs. Assertion-dense code can be transformed to share the cost of assert ..."
Abstract
-
Cited by 193 (15 self)
- Add to MetaCart
We propose a low-overhead sampling infrastructure for gathering information from the executions experienced by a program 's user community. Several example applications illustrate ways to use sampled instrumentation to isolate bugs. Assertion-dense code can be transformed to share the cost of assertions among many users. Lacking assertions, broad guesses can be made about predicates that predict program errors and a process of elimination used to whittle these down to the true bug. Finally, even for non-deterministic bugs such as memory corruption, statistical modeling based on logistic regression allows us to identify program behaviors that are strongly correlated with failure and are therefore likely places to look for the error.
Dynamic hot data stream prefetching for general-purpose programs
- InACM SIGPLANConference on Programming Language Designand Implementation
, 2002
"... Prefetching data ahead of use has the potential to tolerate the growing processor-memory performance gap by overlapping long latency memory accesses with useful computation. While sophisticated prefetching techniques have been automated for limited domains, such as scientific codes that access dense ..."
Abstract
-
Cited by 87 (1 self)
- Add to MetaCart
Prefetching data ahead of use has the potential to tolerate the growing processor-memory performance gap by overlapping long latency memory accesses with useful computation. While sophisticated prefetching techniques have been automated for limited domains, such as scientific codes that access dense arrays in loop nests, a similar level of success has eluded general-purpose programs, especially pointer-chasing codes written in languages such as C and C++. We address this problem by describing, implementing and evaluating a dynamic prefetching scheme. Our technique runs on stock hardware, is completely automatic, and works for generalpurpose programs, including pointer-chasing codes written in weakly-typed languages, such as C and C++. It operates in three phases. First, the profiling phase gathers a temporal data reference profile from a running program with low-overhead. Next, the profiling is turned off and a fast analysis algorithm extracts hot data streams, which are data reference sequences that frequently repeat in the same order, from the temporal profile. Then, the system dynamically injects code at appropriate program points to detect and prefetch these hot data streams. Finally, the process enters the hibernation phase where no profiling or analysis is performed, and the program continues to execute with the added prefetch instructions. At the end of the hibernation phase, the program is deoptimized to remove the inserted checks and prefetch instructions, and control returns to the profiling phase. For long-running programs, this profile, analyze and optimize, hibernate, cycle will repeat multiple times. Our initial results from applying dynamic prefetching are promising, indicating overall execution time improvements of 5–19 % for several memory-performance-limited SPECint2000 benchmarks running their largest (ref) inputs.
Low-overhead memory leak detection using adaptive statistical profiling
- In Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems
, 2004
"... Sampling has been successfully used to identify performance optimization opportunities. We would like to apply similar techniques to check program correctness. Unfortunately, sampling provides poor coverage of infrequently executed code, where bugs often lurk. We describe an adaptive profiling schem ..."
Abstract
-
Cited by 77 (1 self)
- Add to MetaCart
Sampling has been successfully used to identify performance optimization opportunities. We would like to apply similar techniques to check program correctness. Unfortunately, sampling provides poor coverage of infrequently executed code, where bugs often lurk. We describe an adaptive profiling scheme that addresses this by sampling executions of code segments at a rate inversely proportional to their execution frequency. To validate our ideas, we have implemented SWAT, a novel memory leak detection tool. SWAT traces program allocations/ frees to construct a heap model and uses our adaptive profiling infrastructure to monitor loads/stores to these objects with low overhead. SWAT reports ‘stale ’ objects that have not been accessed for a ‘long ’ time as leaks. This allows it to find all leaks that manifest during the current program execution. Since SWAT has low runtime overhead (< 5%), and low space overhead (< 10 % in most cases and often less than 5%), it can be used to track leaks in production code that take days to manifest. In addition to identifying the allocations that leak memory, SWAT exposes where the program last accessed the leaked data, which facilitates debugging and fixing the leak. SWAT has been used by several product groups at Microsoft for the past 18 months and has proved effective at detecting leaks with a low false positive rate (<10%).
Connectivity-Based Garbage Collection
, 2003
"... We introduce a new family of connectivity-based garbage collectors (Cbgc) that are based on potential objectconnectivity properties. The key feature of these collectors is that the placement of objects into partitions is determined by performing one of several forms of connectivity analyses on the p ..."
Abstract
-
Cited by 34 (7 self)
- Add to MetaCart
We introduce a new family of connectivity-based garbage collectors (Cbgc) that are based on potential objectconnectivity properties. The key feature of these collectors is that the placement of objects into partitions is determined by performing one of several forms of connectivity analyses on the program. This enables partial garbage collections, as in generational collectors, but without the need for any write barrier.
A Survey of Adaptive Optimization in Virtual Machines
- PROCEEDINGS OF THE IEEE, 93(2), 2005. SPECIAL ISSUE ON PROGRAM GENERATION, OPTIMIZATION, AND ADAPTATION
, 2004
"... Virtual machines face significant performance challenges beyond those confronted by traditional static optimizers. First, portable program representations and dynamic language features, such as dynamic class loading, force the deferral of most optimizations until runtime, inducing runtime optimiza ..."
Abstract
-
Cited by 26 (5 self)
- Add to MetaCart
Virtual machines face significant performance challenges beyond those confronted by traditional static optimizers. First, portable program representations and dynamic language features, such as dynamic class loading, force the deferral of most optimizations until runtime, inducing runtime optimization overhead. Second, modular
Phase-Aware Remote Profiling
- IN CONFERENCE ON CODE GENERATION AND OPTIMIZATION (CGO
, 2005
"... Recent advances in networking and embedded device technology have made the vision of ubiquitous computing a reality; users can access the Internet's vast offerings anytime and anywhere. Moreover, battery-powered devices such as personal digital assistants and web-enabled mobile phones have successfu ..."
Abstract
-
Cited by 21 (6 self)
- Add to MetaCart
Recent advances in networking and embedded device technology have made the vision of ubiquitous computing a reality; users can access the Internet's vast offerings anytime and anywhere. Moreover, battery-powered devices such as personal digital assistants and web-enabled mobile phones have successfully emerged as new access points to the world's digital infrastructure. This ubiquity offers a new opportunity for software developers: users can now participate in the software development, optimization, and evolution process while they use their software. Such participation
PACER: Proportional Detection of Data Races ∗
"... Data races indicate serious concurrency bugs such as order, atomicity, and sequential consistency violations. Races are difficult to find and fix, often manifesting only in deployment. The frequency of these bugs will likely increase as software adds parallelism to exploit multicore hardware. Unfort ..."
Abstract
-
Cited by 18 (4 self)
- Add to MetaCart
Data races indicate serious concurrency bugs such as order, atomicity, and sequential consistency violations. Races are difficult to find and fix, often manifesting only in deployment. The frequency of these bugs will likely increase as software adds parallelism to exploit multicore hardware. Unfortunately, sound and precise race detectors slow programs by factors of eight or more, which is too expensive to deploy. This paper presents a precise, low-overhead sampling-based data race detector called PACER. PACER makes a proportionality guarantee: it detects any race at a rate equal to the sampling rate, by finding races whose first access occurs during a global sampling period. During sampling, PACER tracks all accesses using the sound and precise FASTTRACK algorithm. In non-sampling periods, PACER discards sampled access information that cannot be part of a reported race, and PACER simplifies tracking of the happens-before relationship, yielding near-constant, instead of linear, overheads. Experimental results confirm our design claims: time and space overheads scale with the sampling rate, and sampling rates of 1-3 % yield overheads low enough to consider in production software. Furthermore, our results suggest PACER reports each race at a rate equal to the sampling rate. The resulting system provides a “get what you pay for ” approach to race detection that is suitable for identifying real, hard-to-reproduce races in deployed systems. 1.
Holmes: Effective statistical debugging via efficient path profiling
- IN 31ST INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2009
, 2009
"... Statistical debugging aims to automate the process of isolating bugs by profiling several runs of the program and using statistical analysis to pinpoint the likely causes of failure. In this paper, we investigate the impact of using richer program profiles such as path profiles on the effectiveness ..."
Abstract
-
Cited by 14 (1 self)
- Add to MetaCart
Statistical debugging aims to automate the process of isolating bugs by profiling several runs of the program and using statistical analysis to pinpoint the likely causes of failure. In this paper, we investigate the impact of using richer program profiles such as path profiles on the effectiveness of bug isolation. We describe a statistical debugging tool called HOLMES that isolates bugs by finding paths that correlate with failure. We also present an adaptive version of HOLMES that uses iterative, bug-directed profiling to lower execution time and space overheads. We evaluate HOLMES using programs from the SIR benchmark suite and some large, real-world applications. Our results indicate that path profiles can help isolate bugs more precisely by providing more information about the context in which bugs occur. Moreover, bug-directed profiling can efficiently isolate bugs with low overheads, providing a scalable and accurate alternative to sparse random sampling.
Profile-guided proactive garbage collection for locality optimization
- In Proceedings of ACM SIGPLAN Conference on Programming Languages Design and Implementation
, 2006
"... Many applications written in garbage collected languages have large dynamic working sets and poor data locality. We present a new system for continuously improving program data locality at run time with low overhead. Our system proactively reorganizes the heap by leveraging the garbage collector and ..."
Abstract
-
Cited by 13 (0 self)
- Add to MetaCart
Many applications written in garbage collected languages have large dynamic working sets and poor data locality. We present a new system for continuously improving program data locality at run time with low overhead. Our system proactively reorganizes the heap by leveraging the garbage collector and uses profile information collected through a low-overhead mechanism to guide the reorganization at run time. The key contributions include making a case that garbage collection should be viewed as a proactive technique for improving data locality by triggering garbage collection for locality optimization independently of normal garbage collection for space, combining page and cache locality optimization in the same system, and demonstrating that sampling provides sufficiently detailed data access information to guide both page and cache locality optimization with low runtime overhead. We present experimental results obtained by modifying a commercial, state-of-the-art garbage collector to support our claims. Independently triggering garbage collection for locality optimization significantly improved optimizations benefits. Combining page and cache locality optimizations in the same system provided larger average execution time improvements (17%) than either alone (page 8%, cache 7%). Finally, using sampling limited profiling overhead to less than 3%, on average. Categories and Subject Descriptors D.3.4 [Programming Languages]: Processors – code generation, memory management (garbage collectors), optimization, run-time
A Concurrent Dynamic Analysis Framework for Multicore Hardware
, 2009
"... Software has spent the bounty of Moore’s law by solving harder problems and exploiting abstractions, such as high-level languages, virtual machine technology, binary rewriting, and dynamic analysis. Abstractions make programmers more productive and programs more portable, but usually slow them down. ..."
Abstract
-
Cited by 9 (4 self)
- Add to MetaCart
Software has spent the bounty of Moore’s law by solving harder problems and exploiting abstractions, such as high-level languages, virtual machine technology, binary rewriting, and dynamic analysis. Abstractions make programmers more productive and programs more portable, but usually slow them down. Since Moore’s law is now delivering multiple cores instead of faster processors, future systems must either bear a relatively higher cost for abstractions or use some cores to help tolerate abstraction costs. This paper presents the design, implementation, and evaluation of a novel concurrent, configurable dynamic analysis framework that efficiently utilizes multicore cache architectures. It introduces Cache-friendly Asymmetric Buffering (CAB), a lock-free ring-buffer that implements efficient communication between application and analysis threads. We guide the design and implementation of our framework with a model of dynamic analysis overheads. The framework implements exhaustive and sampling event processing and is analysis-neutral. We evaluate the framework with five popular and diverse analyses, and show performance improvements even for lightweight, low-overhead analyses. Efficient inter-core communication is central to high performance parallel systems and we believe the CAB design gives insight to the subtleties and difficulties of attaining it for dynamic analysis and other parallel software.

