Results 1 - 10 of 77
Output-deterministic replay for multicore debugging
2009
"... Reproducing bugs is hard. Deterministic replay systems address this problem by providing a high-fidelity replica of an original program run that can be repeatedly executed to zero-in on bugs. Unfortunately, existing replay systems for multiprocessor programs fall short. These systems either incur hi ..."
Abstract
-
Cited by 127 (6 self)
Reproducing bugs is hard. Deterministic replay systems address this problem by providing a high-fidelity replica of an original program run that can be repeatedly executed to zero in on bugs. Unfortunately, existing replay systems for multiprocessor programs fall short. These systems either incur high overheads, rely on non-standard multiprocessor hardware, or fail to reliably reproduce executions. Their primary stumbling block is data races, a source of nondeterminism that must be captured if executions are to be faithfully reproduced. In this paper, we present ODR, a software-only replay system that reproduces bugs and provides low-overhead multiprocessor recording. The key observation behind ODR is that, for debugging purposes, a replay system does not need to generate a high-fidelity replica of the original execution. Instead, it suffices to produce any execution that exhibits the same outputs as the original. Guided by this observation, ODR relaxes its fidelity guarantees to avoid the problem of reproducing data races altogether. The result is a system that replays real multiprocessor applications, such as Apache, MySQL, and the Java Virtual Machine, and provides low record-mode overhead. Categories and Subject Descriptors: D.2.5 [Testing and Debugging]: Debugging aids
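As a rough illustration of the output-determinism idea described in this abstract, the sketch below records only the output of a toy racy program and then searches for any thread interleaving that reproduces it. The program, its two "threads", and all names are invented for the example; this is not ODR's actual machinery.

```python
from itertools import permutations

def run(schedule, ops):
    """Execute per-thread operation lists in the order given by `schedule`
    (a sequence of thread ids) over a single shared variable."""
    shared = 0
    cursors = {tid: 0 for tid in ops}
    for tid in schedule:
        shared = ops[tid][cursors[tid]](shared)
        cursors[tid] += 1
    return shared

# Toy racy program: thread 0 doubles the shared value, thread 1 adds 3.
ops = {0: [lambda x: x * 2], 1: [lambda x: x + 3]}

# "Record" phase: the original run used some schedule, but only the
# program's output was logged: no race outcomes, no exact ordering.
recorded_output = run([1, 0], ops)           # (0 + 3) * 2 == 6

# "Replay" phase: search for *any* schedule whose output matches the log.
def infer_schedule(ops, target):
    tids = [tid for tid, seq in ops.items() for _ in seq]
    for candidate in set(permutations(tids)):
        if run(list(candidate), ops) == target:
            return candidate                 # an output-deterministic replay
    return None

print(infer_schedule(ops, recorded_output))  # prints a schedule producing 6
```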
R2: An application-level kernel for record and replay.
In Proc. of OSDI, 2008
"... ABSTRACT Library-based record and replay tools aim to reproduce an application's execution by recording the results of selected functions in a log and during replay returning the results from the log rather than executing the functions. These tools must ensure that a replay run is identical to ..."
Abstract
-
Cited by 71 (6 self)
Library-based record and replay tools aim to reproduce an application's execution by recording the results of selected functions in a log and, during replay, returning the results from the log rather than executing the functions. These tools must ensure that a replay run is identical to the record run. The challenge in doing so is that only invocations of a function made by the application should be recorded; recording the side effects of a function call can be difficult; and not executing function calls during replay, multithreading, and the presence of the tool itself may all change the application's behavior between recording and replay. These problems have limited the use of such tools. R2 allows developers to choose functions that can be recorded and replayed correctly. Developers annotate the chosen functions with simple keywords so that R2 can handle calls with side effects and multithreading. R2 generates code for record and replay from templates, allowing developers to avoid implementing stubs for hundreds of functions manually. To track whether an invocation is on behalf of the application or the implementation of a selected function, R2 maintains a mode bit, which stubs save and restore. We have implemented R2 on Windows and annotated large parts (1,300 functions) of the Win32 API, and two higher-level interfaces (MPI and SQLite). R2 can replay multithreaded web and database servers that previous library-based tools cannot replay. By allowing developers to choose high-level interfaces, R2 can also keep recording overhead small; experiments show that its recording overhead for Apache is approximately 10%, that recording and replaying at the SQLite interface can reduce the log size by up to 99% (compared to doing so at the Win32 API), and that using optimization annotations for BitTorrent and MPI applications achieves log size reductions ranging from 13.7% to 99.4%.
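The stub-and-mode-bit pattern the abstract describes can be pictured with a small sketch. The decorator, log format, and `read_clock` function below are invented for illustration and are not R2's generated code.

```python
import functools
import threading
import time

_state = threading.local()   # per-thread flag: inside a chosen function's implementation?
LOG = []                     # record log (appended during record, consumed during replay)
MODE = "record"              # switch to "replay" to return results from the log

def interposed(fn):
    """Wrap a chosen function with a record/replay stub."""
    @functools.wraps(fn)
    def stub(*args, **kwargs):
        if getattr(_state, "in_impl", False):
            # Call made on behalf of another stub's implementation:
            # pass through without logging or replaying.
            return fn(*args, **kwargs)
        _state.in_impl = True                # save-and-set the mode bit
        try:
            if MODE == "record":
                result = fn(*args, **kwargs)
                LOG.append((fn.__name__, result))
            else:
                name, result = LOG.pop(0)    # replay: take the logged result instead
                assert name == fn.__name__, "replay diverged from record"
            return result
        finally:
            _state.in_impl = False           # restore the mode bit
    return stub

@interposed
def read_clock():                            # a hypothetical nondeterministic call
    return time.time()

recorded = read_clock()                      # record run: executes and logs
MODE = "replay"
assert read_clock() == recorded              # replay run: same result, no real call
```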
Friday: Global comprehension for distributed replay
In Proceedings of the Fourth Symposium on Networked Systems Design and Implementation (NSDI '07), 2007
"... Debugging and profiling large-scale distributed applications is a daunting task. We present Friday, a system for debugging distributed applications that combines deterministic replay of components with the power of symbolic, low-level debugging and a simple language for expressing higher-level distr ..."
Abstract
-
Cited by 56 (5 self)
Debugging and profiling large-scale distributed applications is a daunting task. We present Friday, a system for debugging distributed applications that combines deterministic replay of components with the power of symbolic, low-level debugging and a simple language for expressing higher-level distributed conditions and actions. Friday allows the programmer to understand the collective state and dynamics of a distributed collection of coordinated application components. To evaluate Friday, we consider several distributed problems, including routing consistency in overlay networks, and temporal state abnormalities caused by route flaps. We show via micro-benchmarks and larger-scale application measurement that Friday can be used interactively to debug large distributed applications under replay on common hardware.
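To make the idea of evaluating a distributed condition over the collective state of replayed components concrete, here is a toy sketch. The `ReplayedNode` class, the lockstep driver, and the routing-consistency predicate are illustrative assumptions, not Friday's actual command language or debugger interface.

```python
class ReplayedNode:
    """Stands in for one deterministically replayed component; `table` is the
    node's current view of which node owns each key in an overlay."""
    def __init__(self, name, trace):
        self.name, self.trace, self.table = name, iter(trace), {}

    def step(self):
        """Replay the next recorded routing-table update, if any."""
        try:
            key, owner = next(self.trace)
            self.table[key] = owner
            return True
        except StopIteration:
            return False

def routing_consistent(nodes):
    """Global predicate: no two nodes disagree on the owner of a key."""
    seen = {}
    for node in nodes:
        for key, owner in node.table.items():
            if seen.setdefault(key, owner) != owner:
                return False, (key, node.name)
    return True, None

nodes = [ReplayedNode("a", [("k1", "a"), ("k2", "b")]),
         ReplayedNode("b", [("k1", "a"), ("k2", "c")])]  # disagree later on k2

while True:
    progressed = [node.step() for node in nodes]   # advance every replica once
    if not any(progressed):
        break
    ok, witness = routing_consistent(nodes)
    if not ok:
        print("routing consistency violated at:", witness)
        break
```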
Transparent, Lightweight Application Execution Replay on Commodity Multiprocessor Operating Systems
"... We present Scribe, the first system to provide transparent, lowoverhead application record-replay and the ability to go live from replayed execution. Scribe introduces new lightweight operating system mechanisms, rendezvous and sync points, to efficiently record nondeterministic interactions such as ..."
Abstract
-
Cited by 49 (6 self)
We present Scribe, the first system to provide transparent, low-overhead application record-replay and the ability to go live from replayed execution. Scribe introduces new lightweight operating system mechanisms, rendezvous and sync points, to efficiently record nondeterministic interactions such as related system calls, signals, and shared memory accesses. Rendezvous points make a partial ordering of execution based on system call dependencies sufficient for replay, avoiding the recording overhead of maintaining an exact execution ordering. Sync points convert asynchronous interactions that can occur at arbitrary times into synchronous events that are much easier to record and replay. We have implemented Scribe without changing, relinking, or recompiling applications, libraries, or operating system kernels, and without any specialized hardware support such as hardware performance counters. It works on commodity Linux operating systems and commodity multi-core and multiprocessor hardware. Our results show for the first time that an operating system mechanism can correctly and transparently record and replay multi-process and multi-threaded applications on commodity multiprocessors. Scribe recording overhead is less than 2.5% for server applications including Apache and MySQL, and less than 15% for desktop applications including Firefox, Acrobat, OpenOffice, parallel kernel compilation, and movie playback.
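A minimal sketch of the sync-point idea, assuming (purely for illustration) that a per-thread system-call count serves as the deterministic reference point; this conveys the concept of turning asynchronous events into synchronous ones and is not Scribe's kernel mechanism.

```python
class SyncPointLog:
    """Log asynchronous events against a deterministic per-thread counter
    (here, the number of system calls the thread has made) so that replay
    can deliver them at exactly the same point."""
    def __init__(self):
        self.pending = {}                      # (thread, counter) -> event

    def record(self, thread, syscall_count, event):
        # During recording, defer the asynchronous event to the next sync
        # point and remember the counter value at which it was delivered.
        self.pending[(thread, syscall_count)] = event

    def deliver_on_replay(self, thread, syscall_count):
        # During replay, check at every sync point whether an event is due.
        return self.pending.get((thread, syscall_count))

log = SyncPointLog()
log.record("T1", syscall_count=42, event="SIGUSR1")   # observed in the record run

for count in range(40, 44):                           # replayed thread's sync points
    event = log.deliver_on_replay("T1", count)
    if event:
        print(f"deliver {event} after syscall {count}")   # fires only at count 42
```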
Deterministic process groups in dOS
In Proc. 9th USENIX Symposium on Operating Systems Design and Implementation, 2010
"... Current multiprocessor systems execute parallel and concurrent software nondeterministically: even when given precisely the same input, two executions of the same program may produce different output. This severely complicates debugging, testing, and automatic replication for fault-tolerance. Previo ..."
Abstract
-
Cited by 45 (1 self)
Current multiprocessor systems execute parallel and concurrent software nondeterministically: even when given precisely the same input, two executions of the same program may produce different output. This severely complicates debugging, testing, and automatic replication for fault tolerance. Previous efforts to address this issue have focused primarily on record and replay, but making execution actually deterministic would address the problem at the root. Our goals in this work are twofold: (1) to provide fully deterministic execution of arbitrary, unmodified, multithreaded programs as an OS service; and (2) to make all sources of intentional nondeterminism, such as network I/O, explicit and controllable. To this end we propose a new OS abstraction, the Deterministic Process Group (DPG). All communication between threads and processes internal to a DPG happens deterministically, including implicit communication via shared-memory accesses as well as communication via OS channels such as pipes, signals, and the filesystem. To deal with fundamentally nondeterministic external events, our abstraction includes the shim layer, a programmable interface that interposes on all interaction between a DPG and the external world, making determinism useful even for reactive applications. We implemented the DPG abstraction as an extension to Linux and demonstrate its benefits with three use cases: plain deterministic execution; replicated execution; and record and replay by logging just external input. We evaluated our implementation on both parallel and reactive workloads, including Apache, Chromium, and PARSEC.
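The shim-layer idea can be pictured as a single interposition point through which all external, nondeterministic input flows; the `Shim` class and its policies below are invented for illustration and are not dOS's actual interface.

```python
class Shim:
    """A single interposition point between a deterministic group and the
    outside world: record external input, replay it, or just pass it through."""
    def __init__(self, policy="record", log=None):
        self.policy = policy            # "record", "replay", or "passthrough"
        self.log = log if log is not None else []

    def external_input(self, fetch):
        """`fetch` performs the real nondeterministic interaction
        (e.g., a socket read); it is skipped entirely during replay."""
        if self.policy == "replay":
            return self.log.pop(0)      # feed back the logged input
        data = fetch()
        if self.policy == "record":
            self.log.append(data)       # record/replay needs only this external log
        return data

recorder = Shim("record")
request = recorder.external_input(lambda: "GET /index.html")     # touches the outside world

replayer = Shim("replay", log=list(recorder.log))
assert replayer.external_input(lambda: "NOT CALLED") == request  # deterministic re-run
```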
Dynamic slicing long running programs through execution fast forwarding
In FSE, 2006
"... Fixing runtime bugs in long running programs using tracing based analyses such as dynamic slicing was believed to be prohibitively expensive. In this paper, we present a novel execution fast forward-ing technique that makes it feasible. While a naive solution is to divide the entire execution by che ..."
Abstract
-
Cited by 35 (9 self)
Fixing runtime bugs in long running programs using tracing-based analyses such as dynamic slicing was believed to be prohibitively expensive. In this paper, we present a novel execution fast forwarding technique that makes it feasible. While a naive solution is to divide the entire execution by checkpoints and then apply dynamic slicing enabled by tracing on one checkpoint interval at a time, it is still too costly even with state-of-the-art tracing techniques. Our technique is derived from two key observations. The first is that long running programs are usually driven by events, which has been exploited by checkpointing/replaying techniques to deterministically replay an execution from the event log. The second observation is that not all events are relevant to replaying the particular part of the execution in which the programmer suspects an error happened. We develop a slicing-like technique on the event log such that many irrelevant events are successfully pruned. Driven by the reduced log, the replayed execution is then traced for fault location. This replayed execution has the effect of fast forwarding, i.e., the number of executed instructions is significantly reduced without losing the accuracy of reproducing the failure. We describe how execution fast forwarding is combined with checkpointing and tracing-based dynamic slicing, which we believe is the first attempt to integrate these two techniques. The dynamic slices of a set of reported bugs for long running programs are studied to show the effectiveness of dynamic slicing, which is a significant step forward compared to our prior work.
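A condensed sketch of the event-log slicing step described above, assuming a simple explicit dependence map between logged events; the event names and dependences are invented for the example.

```python
def prune_event_log(events, deps, suspects):
    """events: ordered list of event ids; deps: event -> set of events it
    depends on; suspects: events in the region the programmer cares about.
    Returns a reduced log sufficient to replay the suspect region."""
    needed, frontier = set(), list(suspects)
    while frontier:
        event = frontier.pop()
        if event in needed:
            continue
        needed.add(event)
        frontier.extend(deps.get(event, ()))        # backward transitive closure
    return [e for e in events if e in needed]       # keep the original log order

events = ["req1", "timer", "req2", "req3_failing"]
deps = {"req3_failing": {"req2"}, "req2": {"req1"}}
print(prune_event_log(events, deps, {"req3_failing"}))
# ['req1', 'req2', 'req3_failing']  ('timer' is pruned before replay)
```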
Staged deployment in Mirage, an integrated software upgrade testing and distribution system
In Proc. of the 21st ACM Symposium on Operating Systems Principles (SOSP), 2007
"... ABSTRACT Despite major advances in the engineering of maintainable and robust software over the years, upgrading software remains a primitive and error-prone activity. In this paper, we argue that several problems with upgrading software are caused by a poor integration between upgrade deployment, ..."
Abstract
-
Cited by 34 (4 self)
Despite major advances in the engineering of maintainable and robust software over the years, upgrading software remains a primitive and error-prone activity. In this paper, we argue that several problems with upgrading software are caused by a poor integration between upgrade deployment, user-machine testing, and problem reporting. To support this argument, we present a characterization of software upgrades resulting from a survey we conducted of 50 system administrators. Motivated by the survey results, we present Mirage, a distributed framework for integrating upgrade deployment, user-machine testing, and problem reporting into the overall upgrade development process. Our evaluation focuses on the most novel aspect of Mirage, namely its staged upgrade deployment based on the clustering of user machines according to their environments and configurations. Our results suggest that Mirage's staged deployment is effective for real upgrade problems.
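As a rough illustration of clustering-driven staged deployment, the sketch below groups machines by an environment signature and advances the upgrade one cluster at a time while no problems are reported. The signature format, the deployment hook, and the reporting hook are assumptions for the example, not Mirage's implementation.

```python
from collections import defaultdict

def cluster_machines(machines):
    """machines: dict name -> environment dict; returns signature -> [names]."""
    clusters = defaultdict(list)
    for name, env in machines.items():
        signature = tuple(sorted(env.items()))      # identical environments cluster together
        clusters[signature].append(name)
    return clusters

def staged_deploy(clusters, deploy, has_problem_reports):
    for signature, names in clusters.items():
        for name in names:
            deploy(name)                            # push the upgrade to this cluster
        if has_problem_reports(signature):          # user-machine testing feedback
            return f"halted after cluster {signature}"
    return "deployed everywhere"

machines = {"m1": {"os": "xp", "libssl": "0.9.8"},
            "m2": {"os": "xp", "libssl": "0.9.8"},
            "m3": {"os": "vista", "libssl": "0.9.8"}}
print(staged_deploy(cluster_machines(machines),
                    deploy=lambda m: print("upgrading", m),
                    has_problem_reports=lambda sig: False))
```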
Efficient Detection of Split Personalities in Malware
"... Malware is the root cause of many security threats on the Internet. To cope with the thousands of new malware samples that are discovered every day, security companies and analysts rely on automated tools to extract the runtime behavior of malicious programs. Of course, malware authors are aware of ..."
Abstract
-
Cited by 32 (5 self)
Malware is the root cause of many security threats on the Internet. To cope with the thousands of new malware samples that are discovered every day, security companies and analysts rely on automated tools to extract the runtime behavior of malicious programs. Of course, malware authors are aware of these tools and increasingly try to thwart their analysis techniques. To this end, malware code is often equipped with checks that look for evidence of emulated or virtualized analysis environments. When such evidence is found, the malware program behaves differently or crashes, thus showing a different “personality” than on a real system. Recent work has introduced transparent analysis platforms (such as Ether or Cobra) that make it significantly more difficult for malware programs to detect their presence. Others have proposed techniques to identify and bypass checks introduced by malware authors. Both approaches are often successful in exposing the runtime behavior of malware even when the malicious code attempts to thwart analysis efforts. However, these techniques induce significant performance overhead, especially for fine-grained analysis. Unfortunately, this makes them unsuitable for the analysis of current high-volume malware feeds. In this paper, we present a technique that efficiently detects when a malware program behaves differently in an emulated analysis environment and on an uninstrumented reference host. The basic idea is simple: we just compare the runtime behavior of a sample in our analysis system and on a reference machine. However, obtaining a robust and efficient comparison is very difficult. In particular, our approach consists of recording the interactions of the malware with the operating system in one run and using this information to deterministically replay the program in our analysis environment. Our experiments demonstrate that, by using our approach, one can efficiently detect malware samples that use a variety of techniques to identify emulated analysis environments.
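The comparison step can be pictured as diffing two behavior traces of the same sample, one captured on the reference host and one from the replayed run in the analysis environment. The trace format below is an invented simplification of the OS-level interactions the paper records and replays.

```python
def first_divergence(reference_trace, analysis_trace):
    """Return the index and the differing entries at the first point where
    the two behavior traces disagree, or None if they match."""
    for i, (ref, ana) in enumerate(zip(reference_trace, analysis_trace)):
        if ref != ana:
            return i, ref, ana
    if len(reference_trace) != len(analysis_trace):
        shorter = min(len(reference_trace), len(analysis_trace))
        return shorter, None, None      # one run stopped early (e.g., crash or exit)
    return None                         # behavior matched: no split personality seen

reference = [("open", "config.ini"), ("connect", "evil.example.com")]
analysis  = [("open", "config.ini"), ("exit", 0)]   # hides its behavior when emulated
print(first_divergence(reference, analysis))
# (1, ('connect', 'evil.example.com'), ('exit', 0))
```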
Analyzing multicore dumps to facilitate concurrency bug reproduction
In ASPLOS, 2010
"... Debugging concurrent programs is difficult. This is primarily because the inherent non-determinism that arises because of scheduler interleavings makes it hard to easily reproduce bugs that may manifest only under certain interleavings. The problem is exacerbated in multi-core environments where the ..."
Abstract
-
Cited by 29 (2 self)
Debugging concurrent programs is difficult. This is primarily because the inherent non-determinism that arises from scheduler interleavings makes it hard to reproduce bugs that may manifest only under certain interleavings. The problem is exacerbated in multi-core environments, where there are multiple schedulers, one for each core. In this paper, we propose a reproduction technique for concurrent programs that execute on multi-core platforms. Our technique performs a lightweight analysis of a failing execution that occurs in a multi-core environment and uses the result of the analysis to enable reproduction of the bug in a single-core system, under the control of a deterministic scheduler. More specifically, our approach automatically identifies the execution point in the re-execution that corresponds to the failure point. It does so by analyzing the failure core dump and leveraging a technique called execution indexing that identifies a related point in the re-execution. By generating a core dump at this point and comparing the differences between the two dumps, we are able to guide a search algorithm to efficiently generate a failure-inducing schedule. Our experiments show that our technique is highly effective and has reasonable overhead.
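A toy sketch of the overall loop, assuming dumps can be modeled as variable-to-value maps and schedules as thread-id sequences (none of this is the paper's execution-indexing machinery): diff the failing dump against a dump taken from a re-execution and use the size of the diff to guide a search over single-core schedules.

```python
from itertools import permutations

def execute(schedule, thread_ops):
    """Run per-thread operation lists under a deterministic single-core
    scheduler that follows `schedule`; return the final memory state."""
    state = {"x": 0, "y": 0}
    for tid in schedule:
        thread_ops[tid].pop(0)(state)
    return state

def dump_diff(failing, candidate):
    """Locations whose values differ between the failing dump and a candidate."""
    return {k for k in failing if failing[k] != candidate.get(k)}

def search_schedule(thread_ops, failing_dump):
    tids = [t for t, ops in thread_ops.items() for _ in ops]
    best = None
    for sched in set(permutations(tids)):
        ops_copy = {t: list(ops) for t, ops in thread_ops.items()}
        diff = dump_diff(failing_dump, execute(sched, ops_copy))
        if not diff:
            return sched                    # failure-inducing schedule reproduced the state
        if best is None or len(diff) < len(best[1]):
            best = (sched, diff)            # smaller diff: warmer, keep searching
    return best

thread_ops = {0: [lambda s: s.update(x=s["x"] + 1)],
              1: [lambda s: s.update(x=s["x"] * 2, y=1)]}
failing_dump = {"x": 1, "y": 1}             # only reached when thread 1 runs first
print(search_schedule(thread_ops, failing_dump))   # prints (1, 0)
```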