Jockey: A userspace library for record-replay debugging. (2005)

by Y Saito
Venue: AADEBUG
Results 1 - 10 of 77

Output-deterministic replay for multicore debugging

by Gautam Altekar, Ion Stoica , 2009
Abstract - Cited by 127 (6 self)
Reproducing bugs is hard. Deterministic replay systems address this problem by providing a high-fidelity replica of an original program run that can be repeatedly executed to zero in on bugs. Unfortunately, existing replay systems for multiprocessor programs fall short. These systems either incur high overheads, rely on non-standard multiprocessor hardware, or fail to reliably reproduce executions. Their primary stumbling block is data races – a source of nondeterminism that must be captured if executions are to be faithfully reproduced. In this paper, we present ODR, a software-only replay system that reproduces bugs and provides low-overhead multiprocessor recording. The key observation behind ODR is that, for debugging purposes, a replay system does not need to generate a high-fidelity replica of the original execution. Instead, it suffices to produce any execution that exhibits the same outputs as the original. Guided by this observation, ODR relaxes its fidelity guarantees to avoid the problem of reproducing data races altogether. The result is a system that replays real multiprocessor applications, such as Apache, MySQL, and the Java Virtual Machine, and provides low record-mode overhead. Categories and Subject Descriptors: D.2.5 [Testing and Debugging]: Debugging aids
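The fidelity relaxation this abstract describes can be illustrated with a toy sketch (all names hypothetical; this is not ODR's actual schedule-inference machinery): record only the program's output, then search for any thread interleaving that reproduces it.

```python
from itertools import permutations

def run(schedule):
    # Toy racy program: two "threads" append to shared state.
    # The final output depends on the interleaving (a data race).
    out = []
    ops = {"A": lambda: out.append(1), "B": lambda: out.append(2)}
    for tid in schedule:
        ops[tid]()
    return out

def output_deterministic_replay(recorded_output, thread_ids):
    # ODR's key idea, in miniature: any schedule whose output matches
    # the recording is an acceptable replay; it need not be the original
    # interleaving, so data races never have to be logged.
    for schedule in permutations(thread_ids):
        if run(schedule) == recorded_output:
            return schedule
    return None

# Suppose the original (recorded) run produced [2, 1].
sched = output_deterministic_replay([2, 1], "AB")
```

Real systems replace the brute-force search with constraint solving over recorded inputs and outputs; the point here is only that output equivalence, not instruction-level fidelity, is the target.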

Citation Context

... can invert hash functions in a feasible amount of time, and STP cannot handle floating-point arithmetic.

  System         | Data races | Multiple CPUs | Efficient and scalable recording | Software-only | Determinism
  Jockey [18]    | Yes        | No            | Yes                              | Yes           | Value
  RecPlay [17]   | No         | Yes           | Yes                              | Yes           | Value
  SMP-ReVirt [4] | Yes        | Yes           | No                               | Yes           | Value
  iDNA [2]       | Yes        | Yes           | No                               | Yes           | Value
  DeLorean [14]  | Yes        | Yes           | Yes                              | No            | Value
  ODR            | Yes        | Yes           | Yes                              | Yes           | Output

Table 1: Summary...

Replay debugging for distributed applications

by Dennis Michael Geels , 2006
Abstract - Cited by 87 (5 self)
Abstract not found

R2: An application-level kernel for record and replay.

by Zhenyu Guo, Xi Wang, Jian Tang, Xuezheng Liu, Zhilei Xu, Ming Wu, M. Frans Kaashoek, Zheng Zhang (Microsoft Research Asia) - In Proc. of OSDI, 2008
Abstract - Cited by 71 (6 self)
ABSTRACT Library-based record and replay tools aim to reproduce an application's execution by recording the results of selected functions in a log and during replay returning the results from the log rather than executing the functions. These tools must ensure that a replay run is identical to the record run. The challenge in doing so is that only invocations of a function by the application should be recorded, recording the side effects of a function call can be difficult, and not executing function calls during replay, multithreading, and the presence of the tool may change the application's behavior from recording to replay. These problems have limited the use of such tools. R2 allows developers to choose functions that can be recorded and replayed correctly. Developers annotate the chosen functions with simple keywords so that R2 can handle calls with side effects and multithreading. R2 generates code for record and replay from templates, allowing developers to avoid implementing stubs for hundreds of functions manually. To track whether an invocation is on behalf of the application or the implementation of a selected function, R2 maintains a mode bit, which stubs save and restore. We have implemented R2 on Windows and annotated large parts (1,300 functions) of the Win32 API, and two higher-level interfaces (MPI and SQLite). R2 can replay multithreaded web and database servers that previous library-based tools cannot replay. By allowing developers to choose high-level interfaces, R2 can also keep recording overhead small; experiments show that its recording overhead for Apache is approximately 10%, that recording and replaying at the SQLite interface can reduce the log size up to 99% (compared to doing so at the Win32 API), and that using optimization annotations for BitTorrent and MPI applications achieves log size reduction ranging from 13.7% to 99.4%.
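The mode-bit mechanism this abstract describes can be sketched in a few lines of Python (hypothetical names; R2 itself generates C stubs from annotations on Win32, MPI, and SQLite interfaces):

```python
import functools
import time

MODE = {"recording": True}   # record run vs. replay run
IN_TOOL = {"flag": False}    # the mode bit: are we already inside a stub?
LOG = []

def stubbed(fn):
    # Stub generator: log results during record, serve them during replay.
    # The IN_TOOL bit distinguishes calls made by the application (logged)
    # from calls made internally by another stubbed function (not logged).
    @functools.wraps(fn)
    def stub(*args, **kwargs):
        if IN_TOOL["flag"]:
            return fn(*args, **kwargs)   # on behalf of the tool: run directly
        IN_TOOL["flag"] = True
        try:
            if MODE["recording"]:
                result = fn(*args, **kwargs)
                LOG.append(result)
                return result
            return LOG.pop(0)            # replay: skip execution, return log entry
        finally:
            IN_TOOL["flag"] = False
    return stub

@stubbed
def current_time():                      # hypothetical nondeterministic call
    return time.time()

t_recorded = current_time()              # record mode: executes and logs
MODE["recording"] = False
t_replayed = current_time()              # replay mode: identical value from the log
```

The stub's save/restore of the mode bit is what lets one annotated function call another without double-logging, which is the reentrancy problem the abstract alludes to.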

Citation Context

..., multithreading, the presence of the tool in the application’s address space may cause the application to behave differently during replay. Previous library-based tools (e.g., liblog [10] and Jockey [28]) interpose a fixed low-level interface and omit calls that are difficult to make replay faithful, and thus limit the applications they can replay. Consider recording and replaying at the system call ...

Friday: Global comprehension for distributed replay

by Dennis Geels, Gautam Altekar, Petros Maniatis, Timothy Roscoe, Ion Stoica - In Proceedings of the Fourth Symposium on Networked Systems Design and Implementation (NSDI ’07), 2007
Abstract - Cited by 56 (5 self)
Debugging and profiling large-scale distributed applications is a daunting task. We present Friday, a system for debugging distributed applications that combines deterministic replay of components with the power of symbolic, low-level debugging and a simple language for expressing higher-level distributed conditions and actions. Friday allows the programmer to understand the collective state and dynamics of a distributed collection of coordinated application components. To evaluate Friday, we consider several distributed problems, including routing consistency in overlay networks, and temporal state abnormalities caused by route flaps. We show via micro-benchmarks and larger-scale application measurement that Friday can be used interactively to debug large distributed applications under replay on common hardware.

Citation Context

...gle-machine-instruction granularity, the WiDS Checker is applicable only to applications developed with the WiDS toolkit and checks predicates at event-handler granularity. Similarly to Friday, Jockey [20] and Flashback [23] use system call interposition, binary rewriting, and operating system modifications to capture deterministic replayable traces, but only for a single node. DejaVu [15] targets dist...

Transparent, Lightweight Application Execution Replay on Commodity Multiprocessor Operating Systems

by Oren Laadan, Nicolas Viennot, Jason Nieh
Abstract - Cited by 49 (6 self)
We present Scribe, the first system to provide transparent, low-overhead application record-replay and the ability to go live from replayed execution. Scribe introduces new lightweight operating system mechanisms, rendezvous and sync points, to efficiently record nondeterministic interactions such as related system calls, signals, and shared memory accesses. Rendezvous points make a partial ordering of execution based on system call dependencies sufficient for replay, avoiding the recording overhead of maintaining an exact execution ordering. Sync points convert asynchronous interactions that can occur at arbitrary times into synchronous events that are much easier to record and replay. We have implemented Scribe without changing, relinking, or recompiling applications, libraries, or operating system kernels, and without any specialized hardware support such as hardware performance counters. It works on commodity Linux operating systems, and commodity multi-core and multiprocessor hardware. Our results show for the first time that an operating system mechanism can correctly and transparently record and replay multi-process and multi-threaded applications on commodity multiprocessors. Scribe recording overhead is less than 2.5% for server applications including Apache and MySQL, and less than 15% for desktop applications including Firefox, Acrobat, OpenOffice, parallel kernel compilation, and movie playback.
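The rendezvous-point idea above, a partial order keyed by per-resource sequence numbers instead of one global total order, can be sketched as follows (a simplified single-threaded simulation, not Scribe's in-kernel mechanism; names are hypothetical):

```python
from collections import defaultdict

def record(trace):
    # Rendezvous-point sketch: instead of logging one global order of
    # system calls, tag each call with a sequence number scoped to the
    # resource it touches. Calls on different resources stay unordered
    # with respect to each other -- a partial order suffices for replay.
    counters = defaultdict(int)
    log = []
    for tid, resource in trace:
        log.append((tid, resource, counters[resource]))
        counters[resource] += 1
    return log

# Two threads touching two files: only same-resource calls get ordered.
trace = [(1, "fileA"), (2, "fileB"), (1, "fileB"), (2, "fileA")]
log = record(trace)
```

Replay then only has to make each call wait until its resource's counter reaches the logged value, rather than reconstructing a single global schedule, which is the recording-overhead saving the abstract claims.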

Citation Context

...predominant form for processes to interact with the environment and with other processes. System call interposition is used to record and replay the execution of system calls. Unlike other approaches [11, 26, 28], Scribe does not simply feed processes with logged data to simulate the effect of system calls. This is not sufficient to enable replayed execution to go live. Instead, Scribe re-executes system call...

Deterministic process groups in dOS

by Tom Bergan, Nicholas Hunt, Luis Ceze, Steven D. Gribble - In Proc. 9th USENIX Symposium on Operating System Design and Implementation, 2010
Abstract - Cited by 45 (1 self)
Current multiprocessor systems execute parallel and concurrent software nondeterministically: even when given precisely the same input, two executions of the same program may produce different output. This severely complicates debugging, testing, and automatic replication for fault-tolerance. Previous efforts to address this issue have focused primarily on record and replay, but making execution actually deterministic would address the problem at the root. Our goals in this work are twofold: (1) to provide fully deterministic execution of arbitrary, unmodified, multithreaded programs as an OS service; and (2) to make all sources of intentional nondeterminism, such as network I/O, be explicit and controllable. To this end we propose a new OS abstraction, the Deterministic Process Group (DPG). All communication between threads and processes internal to a DPG happens deterministically, including implicit communication via shared-memory accesses, as well as communication via OS channels such as pipes, signals, and the filesystem. To deal with fundamentally nondeterministic external events, our abstraction includes the shim layer, a programmable interface that interposes on all interaction between a DPG and the external world, making determinism useful even for reactive applications. We implemented the DPG abstraction as an extension to Linux and demonstrate its benefits with three use cases: plain deterministic execution; replicated execution; and record and replay by logging just external input. We evaluated our implementation on both parallel and reactive workloads, including Apache, Chromium, and PARSEC.

Citation Context

...l the hooks necessary to implement a replay component. The major challenges in faithfully replaying system call traces are orthogonal to the main body of our work and have been explored by prior work [19, 36, 37]. 5.3 Replicated Execution REPLICASHIM supports replication of a multithreaded webserver running inside a DPG by guaranteeing that the order of messages and their logical arrival time is kept consiste...

Dynamic slicing long running programs through execution fast forwarding

by Xiangyu Zhang, Sriraman Tallam, Rajiv Gupta - In FSE, 2006
Abstract - Cited by 35 (9 self)
Fixing runtime bugs in long running programs using tracing based analyses such as dynamic slicing was believed to be prohibitively expensive. In this paper, we present a novel execution fast forwarding technique that makes it feasible. While a naive solution is to divide the entire execution by checkpoints, and then apply dynamic slicing enabled by tracing on one checkpoint interval at a time, it is still too costly even with state-of-the-art tracing techniques. Our technique is derived from two key observations. The first one is that long running programs are usually driven by events, which has been taken advantage of by checkpointing/replaying techniques to deterministically replay an execution from the event log. The second observation is that not all the events are relevant to replaying a particular part of the execution, in which the programmer suspects an error happened. We develop a slicing-like technique on the event log such that many irrelevant events are successfully pruned. Driven by the reduced log, the replayed execution is now traced for fault location. This replayed execution has the effect of fast forwarding, i.e. the amount of executed instructions is significantly reduced without losing the accuracy of reproducing the failure. We describe how execution fast forwarding is combined with checkpointing and tracing based dynamic slicing, which we believe is the first attempt to integrate these two techniques. The dynamic slices of a set of reported bugs for long running programs are studied to show the effectiveness of dynamic slicing, which is a significant step forward compared to our prior work.
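The event-log pruning described above can be sketched as a reachability pass over a dependence graph (the event names and the `deps` map are hypothetical illustrations, not the paper's actual slicing algorithm):

```python
def prune_log(events, deps, target):
    # Slicing-like pruning: keep only the events that the suspect event
    # transitively depends on, so replay can "fast forward" past the rest.
    keep = set()
    stack = [target]
    while stack:
        e = stack.pop()
        if e in keep:
            continue
        keep.add(e)
        stack.extend(deps.get(e, []))
    # Preserve the original log order among the kept events.
    return [e for e in events if e in keep]

events = ["open_cfg", "req1", "req2", "req3_buggy", "timer"]
deps = {"req3_buggy": ["open_cfg"]}    # hypothetical dependence information
reduced = prune_log(events, deps, "req3_buggy")
```

Replaying only the reduced log reproduces the failing request while skipping the unrelated requests and timer events, which is where the speedup comes from.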

Citation Context

...stributed programs [8, 16]. It quickly gained popularity in general application debugging [11, 12]. A lot of research has been carried out on how to reduce its cost [15, 7] and improve its usability [13]. Most of the existing checkpointing techniques focus on how to faithfully replay an execution. They rarely discuss what to do with the replayed execution or simply suggest that the replayed execution...

Staged deployment in Mirage, an integrated software upgrade testing and distribution system

by O. Crameri, N. Knezevic, D. Kostic, R. Bianchini, W. Zwaenepoel - In Proc. of the 21st ACM Symp. on Operating Systems Principles, 2007
Abstract - Cited by 34 (4 self)
ABSTRACT Despite major advances in the engineering of maintainable and robust software over the years, upgrading software remains a primitive and error-prone activity. In this paper, we argue that several problems with upgrading software are caused by a poor integration between upgrade deployment, user-machine testing, and problem reporting. To support this argument, we present a characterization of software upgrades resulting from a survey we conducted of 50 system administrators. Motivated by the survey results, we present Mirage, a distributed framework for integrating upgrade deployment, user-machine testing, and problem reporting into the overall upgrade development process. Our evaluation focuses on the most novel aspect of Mirage, namely its staged upgrade deployment based on the clustering of user machines according to their environments and configurations. Our results suggest that Mirage's staged deployment is effective for real upgrade problems.

Citation Context

...e the application continues to execute [2]. We have not considered distributed applications so far. Other work has considered application tracing and deterministic replay for debugging purposes, e.g. [10, 29, 32]. These systems are substantially more complex than Mirage, as they have to checkpoint internal states and replay the details of an execution exactly. Relative to approaches in comparing descriptions ...

Efficient Detection of Split Personalities in Malware

by Davide Balzarotti, Marco Cova, Christoph Karlberger, Christopher Kruegel, Engin Kirda, Giovanni Vigna
Abstract - Cited by 32 (5 self)
Malware is the root cause of many security threats on the Internet. To cope with the thousands of new malware samples that are discovered every day, security companies and analysts rely on automated tools to extract the runtime behavior of malicious programs. Of course, malware authors are aware of these tools and increasingly try to thwart their analysis techniques. To this end, malware code is often equipped with checks that look for evidence of emulated or virtualized analysis environments. When such evidence is found, the malware program behaves differently or crashes, thus showing a different “personality” than on a real system. Recent work has introduced transparent analysis platforms (such as Ether or Cobra) that make it significantly more difficult for malware programs to detect their presence. Others have proposed techniques to identify and bypass checks introduced by malware authors. Both approaches are often successful in exposing the runtime behavior of malware even when the malicious code attempts to thwart analysis efforts. However, these techniques induce significant performance overhead, especially for finegrained analysis. Unfortunately, this makes them unsuitable for the analysis of current high-volume malware feeds. In this paper, we present a technique that efficiently detects when a malware program behaves differently in an emulated analysis environment and on an uninstrumented reference host. The basic idea is simple: we just compare the runtime behavior of a sample in our analysis system and on a reference machine. However, obtaining a robust and efficient comparison is very difficult. In particular, our approach consists of recording the interactions of the malware with the operating system in one run and using this information to deterministically replay the program in our analysis environment. 
Our experiments demonstrate that, by using our approach, one can efficiently detect malware samples that use a variety of techniques to identify emulated analysis environments.

Citation Context

...es its own system calls to enable user programs to programmatically create snapshots at certain checkpoints. For this reason, it needs to modify the operating system. Another replaying tool is Jockey [34], which inserts trampoline functions into system call code to direct the program flow to its own code where system call parameter values are recorded. These logs are then used in replay mode to reprod...

Analyzing multicore dumps to facilitate concurrency bug reproduction

by Dasarath Weeratunge, Xiangyu Zhang, Suresh Jagannathan - In ASPLOS, 2010
Abstract - Cited by 29 (2 self)
Debugging concurrent programs is difficult. This is primarily because the inherent non-determinism that arises because of scheduler interleavings makes it hard to easily reproduce bugs that may manifest only under certain interleavings. The problem is exacerbated in multi-core environments where there are multiple schedulers, one for each core. In this paper, we propose a reproduction technique for concurrent programs that execute on multi-core platforms. Our technique performs a lightweight analysis of a failing execution that occurs in a multi-core environment, and uses the result of the analysis to enable reproduction of the bug in a single-core system, under the control of a deterministic scheduler. More specifically, our approach automatically identifies the execution point in the re-execution that corresponds to the failure point. It does so by analyzing the failure core dump and leveraging a technique called execution indexing that identifies a related point in the re-execution. By generating a core dump at this point, and comparing the differences between the two dumps, we are able to guide a search algorithm to efficiently generate a failure inducing schedule. Our experiments show that our technique is highly effective and has reasonable overhead.

Citation Context

... inducing input can be acquired and used in re-executions, which may not be true if the servers have been running for a long time. A potential solution is to use a lightweight checkpointing technique [23, 25, 28] to avoid the need to re-collect all inputs from the beginning of the execution. It would then only be necessary to reconstruct execution from the closest checkpoint and consider the inputs processed ...
