Results 1 - 10 of 96
BugNet: Continuously recording program execution for deterministic replay debugging
- In ISCA, 2005
Abstract - Cited by 182 (16 self)
Significant time is spent by companies trying to reproduce and fix the bugs that occur in released code. To assist developers, we propose the BugNet architecture to continuously record information on production runs. The information collected before the crash of a program can be used by developers, working in their own execution environment, to deterministically replay the last several million instructions executed before the crash. BugNet is based on the insight that recording the register file contents at any point in time, and then recording the load values that occur after that point, can enable deterministic replaying of a program's execution. BugNet focuses on being able to replay the application's execution and the libraries it uses, but not the operating system. However, our approach provides the ability to replay an application's execution across context switches and interrupts. Hence, BugNet obviates the need for tracking program I/O, interrupts, and DMA transfers, which would otherwise have required more complex hardware support. In addition, BugNet does not require a final core dump of the system state for replaying, which significantly reduces the amount of data that must be sent back to the developer.
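BugNet's core insight, that a register snapshot plus a log of subsequent load values suffices for deterministic replay, can be sketched with a toy interpreter. Everything below (the instruction format, the function names) is illustrative, not BugNet's actual hardware interface:

```python
# Toy illustration of the BugNet insight: once the register state at a
# checkpoint and the values of all subsequent loads are logged, execution
# can be replayed deterministically without the original memory, I/O,
# interrupts, or DMA. Names and encoding are illustrative.

def run(program, regs, memory, load_log=None, replay_log=None):
    """Interpret a tiny register program; record or replay load values."""
    for op, *args in program:
        if op == "load":                    # reg <- memory[addr]
            reg, addr = args
            if replay_log is not None:
                value = replay_log.pop(0)   # replay: take value from the log
            else:
                value = memory[addr]        # record: read live memory
                load_log.append(value)
            regs[reg] = value
        elif op == "add":                   # reg <- reg + src
            reg, src = args
            regs[reg] += regs[src]
    return regs

program = [("load", "r1", 0), ("load", "r2", 1), ("add", "r1", "r2")]

# Record: snapshot the registers, run against live memory, log every load.
snapshot, log = {"r1": 0, "r2": 0}, []
recorded = run(program, dict(snapshot), {0: 7, 1: 35}, load_log=log)

# Replay: start from the snapshot and feed back the logged load values;
# the original memory contents are no longer needed.
replayed = run(program, dict(snapshot), {}, replay_log=list(log))
assert replayed == recorded
```

The replay pass passes an empty memory on purpose: all nondeterministic inputs reach the program through loads, so the load log alone closes the execution.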
Deterministic Replay of Java Multithreaded Applications
- In Proceedings of the SIGMETRICS Symposium on Parallel and Distributed Tools, 1998
Abstract - Cited by 157 (6 self)
Threads and concurrency constructs in Java introduce nondeterminism to a program's execution, which makes it hard to understand and analyze the execution behavior. Nondeterminism in execution behavior also makes it impossible to use execution replay for debugging, performance monitoring, or visualization. This paper discusses a record/replay tool for Java, DejaVu, that provides deterministic replay of a program's execution. In particular, this paper describes the idea of the logical thread schedule, which makes DejaVu efficient and independent of the underlying thread scheduler. The paper also discusses how to handle the various Java synchronization operations for record and replay. DejaVu has been implemented by modifying Sun Microsystems' Java Virtual Machine.
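The logical thread schedule can be sketched as interval compression over a global clock of critical events: instead of logging every event, the recorder stores, per thread, the interval of clock values during which that thread ran. This is a toy model with illustrative names, not DejaVu's JVM implementation:

```python
# Toy sketch of a "logical thread schedule": a global clock ticks at each
# critical event, and the log stores (thread, first_tick, last_tick)
# intervals instead of one entry per event. Names are illustrative.

def record_schedule(events):
    """events: sequence of thread ids, one per critical event."""
    intervals = []
    for tick, thread in enumerate(events):
        if intervals and intervals[-1][0] == thread:
            t, first, _ = intervals[-1]
            intervals[-1] = (t, first, tick)     # extend the open interval
        else:
            intervals.append((thread, tick, tick))
    return intervals

def replay_schedule(intervals):
    """Expand intervals back into the exact per-event thread order."""
    return [t for (t, first, last) in intervals
              for _ in range(first, last + 1)]

events = ["T1", "T1", "T2", "T1", "T1", "T1", "T2"]
intervals = record_schedule(events)
assert intervals == [("T1", 0, 1), ("T2", 2, 2), ("T1", 3, 5), ("T2", 6, 6)]
assert replay_schedule(intervals) == events
```

Long runs of a single thread collapse into one log entry, which is why the interval representation stays small and independent of the underlying scheduler.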
Flashback: A Lightweight Extension for Rollback and Deterministic Replay for Software Debugging
- In USENIX Annual Technical Conference, General Track, 2004
Abstract - Cited by 155 (7 self)
Unfortunately, finding software bugs is a very challenging task because many bugs are hard to reproduce. While debugging a program, it would be very useful to roll back a crashed program to a previous execution point and deterministically re-execute the "buggy" code region. However, most previous work on rollback and replay support was designed to survive hardware or operating system failures, and is therefore too heavyweight for the fine-grained rollback and replay needed for software debugging. This paper presents Flashback, a lightweight OS extension that provides fine-grained rollback and replay to help debug software. Flashback uses shadow processes to efficiently roll back the in-memory state of a process, and logs a process's interactions with the system to support deterministic replay. Both shadow processes and logging of system calls are implemented in a lightweight fashion specifically designed for the purpose of software debugging. We have implemented a prototype of Flashback in the Linux operating system. Our experimental results with micro-benchmarks and real applications show that Flashback adds little overhead and can quickly roll back a debugged program to a previous execution point and deterministically replay from that point.
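The two ingredients, shadow-process rollback and system-call logging, can be mimicked in a single process. Real Flashback forks shadow processes in the kernel; this is only a behavioral sketch, and every name in it is illustrative:

```python
import copy

# Behavioral sketch of the Flashback pair of mechanisms: a "shadow process"
# (simulated here as a deep copy of in-memory state) enables rollback, and a
# log of system-call results enables deterministic replay past the rollback
# point. Illustrative names only; not Flashback's kernel interface.

class DebuggedProcess:
    def __init__(self):
        self.memory = {"count": 0}
        self.shadow = None            # stands in for the forked shadow process
        self.syscall_log = []

    def checkpoint(self):
        self.shadow = copy.deepcopy(self.memory)

    def rollback(self):
        self.memory = copy.deepcopy(self.shadow)   # log survives rollback

    def syscall(self, result=None):
        """Record mode logs the kernel's answer; replay mode returns it."""
        if result is not None:
            self.syscall_log.append(result)
            return result
        return self.syscall_log.pop(0)

p = DebuggedProcess()
p.checkpoint()
p.memory["count"] += p.syscall(result=5)   # record: the kernel returned 5
p.rollback()                               # back to the checkpoint
p.memory["count"] += p.syscall()           # replay: same answer, no kernel
assert p.memory["count"] == 5
```

The key detail the sketch preserves: rollback restores memory but not the syscall log, so re-execution sees exactly the answers the kernel gave the first time.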
Automatically classifying benign and harmful data races using replay analysis
- In PLDI, 2007
Abstract - Cited by 95 (7 self)
Many concurrency bugs in multi-threaded programs are due to data races. There have been many efforts to develop static and dynamic mechanisms to automatically find data races. Most of the prior work has focused on finding the data races and eliminating the false positives. In this paper, we instead focus on a dynamic analysis technique to automatically classify the data races into two categories: those that are potentially benign and those that are potentially harmful. A harmful data race is a real bug that needs to be fixed. This classification is needed to focus the triaging effort on the data races that are potentially harmful; without prioritizing them, we have found that there are too many data races to triage. Our second focus is to automatically provide the developer with a reproducible scenario of the data race, which allows the developer to understand the different effects of a harmful data race on a program's execution. To achieve the above, we record a multi-threaded program's execution in a replay log. The replay log is used to replay the multi-threaded program, and during replay we find the data races using a happens-before-based algorithm. To automatically classify whether a data race we find is potentially benign or potentially harmful, we replay the execution twice for a given data race, once for each possible order between the conflicting memory operations. If the two replays for the two orders produce the same result, then we classify the data race as potentially benign. We discuss our experiences in using our replay-based dynamic data race checker on several Microsoft applications.
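The two-replay classification test can be written down directly: replay once per order of the conflicting pair and compare final states. The racy program below is a made-up example; the function names are illustrative, not the paper's tooling:

```python
# Toy version of replay-based race classification: for a racy pair of
# memory operations, replay once per possible order and compare the final
# state. Identical results => potentially benign; different results =>
# potentially harmful. Illustrative example, not the paper's checker.

def replay(order):
    """Replay two conflicting operations on shared 'x' in the given order."""
    state = {"x": 0, "done": False}
    ops = {
        "writer": lambda s: s.__setitem__("x", 1),
        "reader": lambda s: s.__setitem__("done", s["x"] == 1),
    }
    for thread in order:
        ops[thread](state)
    return state

def classify(a, b):
    r1 = replay([a, b])       # first possible order of the conflicting pair
    r2 = replay([b, a])       # second possible order
    return "potentially benign" if r1 == r2 else "potentially harmful"

# The read observes a different value depending on the order: harmful.
assert classify("writer", "reader") == "potentially harmful"
```

A race on, say, a redundant statistics counter would produce identical states under both orders and land in the potentially benign bucket, which is exactly the prioritization the paper is after.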
DeLorean: Recording and Deterministically Replaying Shared-Memory Multiprocessor Execution Efficiently
Abstract - Cited by 91 (20 self)
Support for deterministic replay of multithreaded execution can greatly help in finding concurrency bugs. For highest effectiveness, replay schemes should (i) record at production-run speed, (ii) keep their logging requirements minute, and (iii) replay at a speed similar to that of the initial execution. In this paper, we propose a new substrate for deterministic replay that provides substantial advances along these axes. In our proposal, processors execute blocks of instructions atomically, as in transactional memory or speculative multithreading, and the system only needs to record the commit order of these blocks. We call our scheme DeLorean. Our results show that DeLorean records execution at a speed similar to that of Release Consistency (RC) execution and replays at about 82% of its speed. In contrast, most current schemes only record at the speed of Sequential Consistency (SC) execution. Moreover, DeLorean only needs 7.5% of the log size needed by a state-of-the-art scheme. Finally, DeLorean can be configured to need only 0.6% of the log size of the state-of-the-art scheme at the cost of recording at 86% of RC's execution speed, still faster than SC. In this configuration, the log of an 8-processor 5-GHz machine is estimated to be only about 20 GB per day.
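The chunk-based scheme reduces the replay log to a commit order. A toy model of that reduction, with illustrative names rather than the actual hardware design:

```python
# Toy model of the DeLorean idea: threads execute blocks (chunks) of
# operations atomically, and the recorder logs only the order in which
# chunks commit. Replaying the chunks in the logged commit order
# reproduces the same memory state. Illustrative names and encoding.

def execute(chunks, commit_order):
    """Apply whole chunks of memory updates in the given commit order."""
    memory = {}
    for cid in commit_order:
        for addr, value in chunks[cid]:   # a chunk commits atomically
            memory[addr] = value
    return memory

chunks = {
    "P0.c0": [("a", 1), ("b", 1)],   # processor P0, first chunk
    "P1.c0": [("b", 2)],             # processor P1 races on address "b"
    "P0.c1": [("c", 3)],
}

# Record: the observed commit order is the entire interleaving log.
log = ["P0.c0", "P1.c0", "P0.c1"]
recorded = execute(chunks, log)

# Replay: the same tiny log reproduces the same final state.
assert execute(chunks, log) == recorded == {"a": 1, "b": 2, "c": 3}
```

The point of the model is the log granularity: one entry per chunk rather than one entry per shared-memory access, which is where the order-of-magnitude log-size reduction comes from.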
Recording Shared Memory Dependencies Using Strata
- Proceedings of the 12th international conference on Architectural, 2006
Abstract - Cited by 86 (9 self)
Significant time is spent by companies trying to reproduce and fix bugs. BugNet and FDR are recent architecture proposals that provide architecture support for deterministic replay debugging. They focus on continuously recording information about the program’s execution, which can be communicated back to the developer. Using that information, the developer can deterministically replay the program’s execution to reproduce and fix the bugs. In this paper, we propose using Strata to efficiently capture the shared memory dependencies. A stratum creates a time layer across all the logs for the running threads, which separates all the memory operations executed before and after the stratum. A strata log allows us to determine all the shared memory dependencies during replay and thereby supports deterministic replay debugging for multi-threaded programs.
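The stratum idea can be illustrated with a small simulation: each stratum records how many memory operations each thread had completed when it was logged, and replay drains every thread's log up to each stratum boundary in turn. The encoding and names below are illustrative, not the paper's actual log format:

```python
# Toy sketch of a strata log: a stratum is a cut across all per-thread
# operation logs, recorded as per-thread operation counts. During replay,
# everything before a stratum must be applied before anything after it;
# operations between two strata are unordered across threads.
# Illustrative names and encoding only.

def replay_order(per_thread_ops, strata):
    """Interleave per-thread operation lists, honoring stratum boundaries."""
    done = {t: 0 for t in per_thread_ops}
    final_cut = {t: len(ops) for t, ops in per_thread_ops.items()}
    order = []
    for stratum in strata + [final_cut]:
        for t, count in stratum.items():   # drain each thread up to the cut
            order.extend(per_thread_ops[t][done[t]:count])
            done[t] = count
    return order

ops = {"T1": ["st A", "ld B"], "T2": ["st B", "ld A"]}
# One stratum, logged after T1 had done 1 operation and T2 had done 0:
order = replay_order(ops, [{"T1": 1, "T2": 0}])
# T1's store is guaranteed to precede every operation after the stratum.
assert order[0] == "st A"
assert set(order) == {"st A", "ld B", "st B", "ld A"}
```

One stratum thus stands in for many pairwise dependence log entries, which is the compression the paper exploits.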
Jockey: A user-space library for record-replay debugging
- In AADEBUG’05: Proceedings of the sixth international symposium on Automated analysis-driven debugging, 2005
Abstract - Cited by 77 (0 self)
Jockey is an execution record/replay tool for debugging Linux programs. It records invocations of system calls and CPU instructions with timing-dependent effects and later replays them deterministically. It supports process checkpointing to diagnose long-running programs efficiently. Jockey is implemented as a shared-object file that runs as a part of the target process. While this design is the key to achieving Jockey's goals of safety and ease of use, it also poses challenges. This paper discusses some of the practical issues we needed to overcome in such an environment, including low-overhead system-call interception, techniques for segregating resource usage between Jockey and the target process, and an interface for fine-grain control of Jockey's behavior.
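The record/replay treatment of a nondeterministic system call can be sketched with a wrapped clock. Real Jockey intercepts the calls inside the target process by rewriting them; this is only a shape sketch with illustrative names:

```python
import time

# Sketch of the record/replay pattern applied to a nondeterministic call
# such as gettimeofday: record mode logs the kernel's answers, replay mode
# returns them verbatim so the re-execution sees identical timing values.
# Illustrative wrapper, not Jockey's in-process interception machinery.

class RecordReplayClock:
    def __init__(self):
        self.log = []
        self.replaying = False

    def gettime(self):
        if self.replaying:
            return self.log.pop(0)   # replay: logged value, bit for bit
        now = time.time()            # record: real nondeterministic call
        self.log.append(now)
        return now

clock = RecordReplayClock()
recorded = [clock.gettime() for _ in range(3)]

clock.replaying = True               # switch to replay mode
replayed = [clock.gettime() for _ in range(3)]
assert replayed == recorded
```

Any code whose control flow branches on these timestamps takes the same path during replay, which is the property the debugger relies on.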
A Case for an Interleaving Constrained Shared-Memory Multi-Processor
- In ISCA, 2009
Abstract - Cited by 63 (3 self)
Shared-memory multi-threaded programming is inherently more difficult than single-threaded programming. The main source of complexity is that the threads of an application can interleave in a vast number of different ways. To ensure correctness, a programmer has to test all possible thread interleavings, which is impractical. Many rare thread interleavings remain untested in production systems, and they are the root cause of a majority of concurrency bugs. We propose a shared-memory multiprocessor design that avoids untested interleavings to improve the correctness of multi-threaded programs. Since untested interleavings tend to occur infrequently at runtime, the performance cost of avoiding them is not high. We propose to encode the set of tested correct interleavings in a program's binary executable using Predecessor Set (PSet) constraints. These constraints are efficiently enforced at runtime using processor support, which ensures that the runtime follows a tested interleaving. We analyze several bugs in open-source applications such as MySQL, Apache, and Mozilla, and show that, by enforcing PSet constraints, we can avoid not only data races and atomicity violations but also other forms of concurrency bugs.
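A PSet check can be approximated in a few lines: each memory instruction carries the set of remote operations permitted as its immediate predecessor on a location, learned from tested runs, and the runtime flags any dynamic predecessor outside that set. The encoding below is an illustrative software simplification of the hardware check:

```python
# Toy sketch of PSet constraint checking. psets maps a static instruction
# to the remote instructions allowed to immediately precede it on the same
# memory location, as learned from tested runs. An execution whose actual
# predecessor falls outside the set is an untested interleaving.
# Illustrative names; real PSets are enforced by processor support.

# Learned constraint (made-up example): the read R1 was only ever
# immediately preceded by the remote write W1 during testing, never by W2.
psets = {"R1": {"W1"}}

def check(execution):
    """execution: list of (instruction, location) in dynamic order.
    Returns the instructions whose predecessor violates their PSet."""
    last_writer = {}
    violations = []
    for instr, loc in execution:
        pred = last_writer.get(loc)
        if instr in psets and pred is not None and pred not in psets[instr]:
            violations.append(instr)   # untested interleaving detected
        last_writer[loc] = instr
    return violations

tested = [("W1", "x"), ("R1", "x")]     # interleaving seen during testing
untested = [("W2", "x"), ("R1", "x")]   # W2 slips in before the read
assert check(tested) == []
assert check(untested) == ["R1"]
```

In the proposed hardware, a violation would trigger avoidance (for example, stalling or re-executing) rather than just reporting, steering execution back onto a tested interleaving.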
LiteRace: effective sampling for lightweight data-race detection
- In PLDI, 2009
Abstract - Cited by 62 (3 self)
Data races are one of the most common and subtle causes of pernicious concurrency bugs. Static techniques for preventing data races are overly conservative and do not scale well to large programs. Past research has produced several dynamic data race detectors that can be applied to large programs and are precise in the sense that they only report actual data races. However, these dynamic data race detectors incur a high performance overhead, slowing down a program's execution by an order of magnitude. In this paper we present LiteRace, a very lightweight data race detector that samples and analyzes only selected portions of a program's execution. We show that it is possible to sample a multi-threaded program at a low frequency and yet find infrequently occurring data races. We implemented LiteRace using Microsoft's Phoenix compiler. Our experiments with several Microsoft programs show that LiteRace is able to find more than 75% of data races by sampling less than 5% of memory accesses in a given program execution.
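One plausible way to sample at low frequency yet still catch races in rarely executed code is a per-function sampling rate that starts high and decays as the function becomes hot. The policy, rates, and names below are assumptions for illustration, not LiteRace's actual sampling algorithm:

```python
import random

# Illustrative adaptive sampler (an assumption, not LiteRace's design):
# each function starts at a 100% sampling rate that halves after every
# sample, down to a floor, so cold code is analyzed almost always and hot
# code almost never. Rates and names are made up for this sketch.

class AdaptiveSampler:
    START_RATE, FLOOR, DECAY = 1.0, 0.001, 0.5

    def __init__(self):
        self.rate = {}

    def should_sample(self, function):
        r = self.rate.get(function, self.START_RATE)
        sampled = random.random() < r
        if sampled:   # back off only after actually taking a sample
            self.rate[function] = max(self.FLOOR, r * self.DECAY)
        return sampled

sampler = AdaptiveSampler()
hot = sum(sampler.should_sample("hot_fn") for _ in range(10_000))
cold = sum(sampler.should_sample("cold_fn") for _ in range(3))
assert cold >= 1     # a cold function's first call is always sampled
assert hot < 1_000   # the hot function quickly decays toward the floor
```

Under such a policy the overall fraction of sampled memory accesses stays tiny while every cold region, where an untested race is most likely to hide, gets analyzed on its first executions.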
DoublePlay: Parallelizing Sequential Logging and Replay
Abstract - Cited by 62 (11 self)
Deterministic replay systems record and reproduce the execution of a hardware or software system. In contrast to replaying execution on uniprocessors, deterministic replay on multiprocessors is very challenging to implement efficiently because of the need to reproduce the order of, or the values read by, shared-memory operations performed by multiple threads. In this paper, we present DoublePlay, a new way to efficiently guarantee replay on commodity multiprocessors. Our key insight is that one can use the simpler and faster mechanisms of single-processor record and replay, yet still achieve the scalability offered by multiple cores, by using an additional execution to parallelize the record and replay of an application. DoublePlay timeslices multiple threads on a single processor, then runs multiple time intervals (epochs) of the program concurrently on separate processors. This strategy, which we call uniparallelism, makes logging much easier because each epoch runs on a single processor (so threads in an epoch never simultaneously access the same memory) and different epochs operate on different copies of the memory. Thus, rather than logging the order of shared-memory accesses, we need only log the order in which threads in an epoch are timesliced on the processor. DoublePlay runs an additional execution of the program on multiple processors to generate checkpoints so that epochs can run in parallel. We evaluate DoublePlay on a variety of client, server, and scientific parallel benchmarks; with spare cores, DoublePlay reduces logging overhead to an average of 15% with two worker threads and 28% with four threads.
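Uniparallelism's logging claim, that per-epoch timeslice order plus epoch checkpoints suffices for replay, can be checked in a small simulation. Real epochs run as processes on separate cores with copy-on-write memory; everything here is an in-process stand-in with illustrative names:

```python
# Toy simulation of uniparallelism: within an epoch, threads are
# timesliced on one processor, so the per-epoch log is just the timeslice
# order, not an order over individual shared-memory accesses. Each epoch
# starts from its own checkpoint, so epochs could run concurrently.
# Illustrative names; not DoublePlay's actual runtime.

def run_epoch(checkpoint, timeslices, work):
    """Run one epoch from its checkpoint, executing timeslices in order."""
    state = dict(checkpoint)
    for thread in timeslices:   # the entire log for this epoch
        work[thread](state)
    return state

work = {
    "T1": lambda s: s.__setitem__("x", s["x"] + 1),
    "T2": lambda s: s.__setitem__("x", s["x"] * 2),
}

# Record: checkpoints from the forward execution, plus per-epoch logs.
epoch_logs = [("T1", "T2"), ("T2", "T1")]
checkpoints = [{"x": 1}]
for log in epoch_logs:
    checkpoints.append(run_epoch(checkpoints[-1], log, work))

# Replaying epoch 1 alone, from its checkpoint, matches the recorded run;
# that independence is what lets epochs execute in parallel.
assert run_epoch(checkpoints[1], epoch_logs[1], work) == checkpoints[2]
assert checkpoints[2] == {"x": 9}
```

Because each epoch depends only on its starting checkpoint and its own timeslice log, epochs can be recorded or replayed out of order on separate processors and stitched together afterward.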