Results 1 - 10
of
202
ReVirt: Enabling Intrusion Analysis through Virtual-Machine Logging and Replay
- In Proceedings of the 2002 Symposium on Operating Systems Design and Implementation (OSDI
, 2002
"... Rights to individual papers remain with the author or the author's employer. Permission is granted for noncommercial reproduction of the work for educational or research purposes. This copyright notice must be included in the reproduced paper. USENIX acknowledges all trademarks herein. Current syste ..."
Abstract
-
Cited by 277 (18 self)
- Add to MetaCart
Rights to individual papers remain with the author or the author's employer. Permission is granted for noncommercial reproduction of the work for educational or research purposes. This copyright notice must be included in the reproduced paper. USENIX acknowledges all trademarks herein. Current system loggers have two problems: they depend on the integrity of the operating system being logged, and they do not save sufficient information to replay and analyze attacks that include any non-deterministic events. ReVirt removes the dependency on the target operating system by moving it into a virtual machine and logging below the virtual machine. This allows ReVirt to replay the system’s execution before, during, and after an intruder compromises the system, even if the intruder replaces the target operating system. ReVirt logs enough information to replay a long-term execution of the virtual machine instruction-by-instruction. This enables it to provide arbitrarily detailed observations about what transpired on the system, even in the presence of non-deterministic attacks and executions. ReVirt adds reasonable time and space overhead. Overheads due to virtualization are imperceptible for interactive use and CPU-bound workloads, and 13-58 % for kernel-intensive workloads. Logging adds 0-8 % overhead, and logging
Detecting Causal Relationships in Distributed Computations: In Search of the Holy Grail
- In search of the holy grail. Distributed Computing
, 1994
"... : The paper shows that characterizing the causal relationship between significant events is an important but non-trivial aspect for understanding the behavior of distributed programs. An introduction to the notion of causality and its relation to logical time is given; some fundamental results conce ..."
Abstract
-
Cited by 187 (4 self)
- Add to MetaCart
: The paper shows that characterizing the causal relationship between significant events is an important but non-trivial aspect for understanding the behavior of distributed programs. An introduction to the notion of causality and its relation to logical time is given; some fundamental results concerning the characterization of causality are presented. Recent work on the detection of causal relationships in distributed computations is surveyed. The issue of observing distributed computations in a causally consistent way and the basic problems of detecting global predicates are discussed. To illustrate the major difficulties, some typical monitoring and debugging approaches are assessed, and it is demonstrated how their feasibility is severely limited by the fundamental problem to master the complexity of causal relationships. Keywords: Distributed Computation, Causality, Distributed System, Causal Ordering, Logical Time, Vector Time, Global Predicate Detection, Distributed Debugging, ...
Debugging concurrent programs
- ACM Computing Surveys
, 1989
"... The main problems associated with debugging concurrent programs are increased complexity, the “probe effect, ” nonrepeatability, and the lack of a synchronized global clock. The probe effect refers to the fact that any attempt to observe the behavior of a distributed system may change the behavior o ..."
Abstract
-
Cited by 164 (1 self)
- Add to MetaCart
The main problems associated with debugging concurrent programs are increased complexity, the “probe effect, ” nonrepeatability, and the lack of a synchronized global clock. The probe effect refers to the fact that any attempt to observe the behavior of a distributed system may change the behavior of that system. For some parallel programs,
A Design Framework for Internet-Scale Event Observation and Notification
- In Proc. of the 6 th European Software Engineering Conf. held jointly with the 5 th ACM SIGSOFT Symp. on the Foundations of Software Engineering (ESEC/FSE97), number 1301 in LNCS
, 1997
"... There is increasing interest in having software systems execute and interoperate over the Internet. Execution and interoperation at this scale imply a degree of loose coupling and heterogeneity among the components from which such systems will be built. One common architectural style for distributed ..."
Abstract
-
Cited by 138 (9 self)
- Add to MetaCart
There is increasing interest in having software systems execute and interoperate over the Internet. Execution and interoperation at this scale imply a degree of loose coupling and heterogeneity among the components from which such systems will be built. One common architectural style for distributed; loosely-coupled, heterogeneous software systems is a structure based on event generation, observation and notification. The technology to support this approach is well-developed for local area networks, but it is illsuited to networks on the scale of the Internet. Hence, new technologies are needed to support the construction of large-scale, event-based software systems for the Internet. We have begun to design a new facility for event observation and notification that better serves the needs of Internet-scale applications. In this paper we present results from our first step in this design process, in which we defined a framework that captures many of the relevant design dimensions. Our framework comprises seven models-an object model, an event model, a naming model, an observation model, a time model, a notification model, and a resource model. The paper discusses each of these models in detail and illustrates them using an example involving an update to a Web page. The paper also evaluates three existing technologies with respect to the seven models.
Debugging operating systems with time-traveling virtual machines
, 2005
"... Operating systems are difficult to debug with traditional cyclic debugging. They are non-deterministic; they run for long periods of time; they interact directly with hardware devices; and their state is easily perturbed by the act of debugging. This paper describes a time-traveling virtual machine ..."
Abstract
-
Cited by 114 (7 self)
- Add to MetaCart
Operating systems are difficult to debug with traditional cyclic debugging. They are non-deterministic; they run for long periods of time; they interact directly with hardware devices; and their state is easily perturbed by the act of debugging. This paper describes a time-traveling virtual machine that overcomes many of the difficulties associated with debugging operating systems. Time travel enables a programmer to navigate backward and forward arbitrarily through the execution history of a particular run and to replay arbitrary segments of the past execution. We integrate time travel into a general-purpose debugger to enable a programmer to debug an OS in reverse, implementing commands such as reverse breakpoint, reverse watchpoint, and reverse single step. The space and time overheads needed to support time travel are reasonable for debugging, and movements in time are fast enough to support interactive debugging. We demonstrate the value of our time-traveling virtual machine by using it to understand and fix several OS bugs that are difficult to find with standard debugging tools. Reverse debugging is especially helpful in finding bugs that are fragile due to non-determinism, bugs in device drivers, bugs that require long runs to trigger, bugs that corrupt the stack, and bugs that are detected after the relevant stack frame is popped. 1
Deterministic Replay of Java Multithreaded Applications
- In Proceedings of the SIGMETRICS Symposium on Parallel and Distributed Tools
, 1998
"... Threads and concurrency constructs in Java introduce nondeterminism to a program's execution, which makes it hard to understand and analyze the execution behavior. Nondeterminism in execution behavior also makes it impossible to use execution replay for debugging, performance monitoring, or visualiz ..."
Abstract
-
Cited by 102 (6 self)
- Add to MetaCart
Threads and concurrency constructs in Java introduce nondeterminism to a program's execution, which makes it hard to understand and analyze the execution behavior. Nondeterminism in execution behavior also makes it impossible to use execution replay for debugging, performance monitoring, or visualization. This paper discusses a record/replay tool for Java, DejaVu, that provides deterministic replay of a program's execution. In particular, this paper describes the idea of the logical thread schedule, which makes DejaVu efficient and independent of the underlying thread scheduler. The paper also discusses how to handle the various Java synchronization operations for record and replay. DejaVu has been implemented by modifying the Sun Microsystems' Java Virtual Machine. 1 Introduction The ubiquity of the Java programming language in current software systems has made development of advanced programming environment tools for writing efficient and correct Java programs very important. Build...
BugNet: Continuously recording program execution for deterministic replay debugging
- In ISCA
, 2005
"... Significant time is spent by companies trying to reproduce and fix the bugs that occur for released code. To assist developers, we propose the BugNet architecture to continuously record information on production runs. The information collected before the crash of a program can be used by the develop ..."
Abstract
-
Cited by 102 (10 self)
- Add to MetaCart
Significant time is spent by companies trying to reproduce and fix the bugs that occur for released code. To assist developers, we propose the BugNet architecture to continuously record information on production runs. The information collected before the crash of a program can be used by the developers working in their execution environment to deterministically replay the last several million instructions executed before the crash. BugNet is based on the insight that recording the register file contents at any point in time, and then recording the load values that occur after that point can enable deterministic replaying of a program’s execution. BugNet focuses on being able to replay the application’s execution and the libraries it uses, but not the operating system. But our approach provides the ability to replay an application’s execution across context switches and interrupts. Hence, BugNet obviates the need for tracking program I/O, interrupts and DMA transfers, which would have otherwise required more complex hardware support. In addition, BugNet does not require a final core dump of the system state for replaying, which significantly reduces the amount of data that must be sent back to the developer. 1
Causal Distributed Breakpoints
- In Proceedings of the 10th International Conference on Distributed Computing Systems
, 1990
"... A causal distributed breakpoint is initiated by a sequential breakpoint in one process of a distributed computation, and restores each process in the computation to its earliest state that reflects all events that "happened before" the breakpoint. A causal distributed breakpoint is the natural exten ..."
Abstract
-
Cited by 55 (0 self)
- Add to MetaCart
A causal distributed breakpoint is initiated by a sequential breakpoint in one process of a distributed computation, and restores each process in the computation to its earliest state that reflects all events that "happened before" the breakpoint. A causal distributed breakpoint is the natural extension for distributed programs of the conventional notion of a breakpoint in a sequential program. We present an algorithm for finding the causal distributed breakpoint given a sequential breakpoint in one of the processes. Approximately consistent checkpoint sets are used for efficiently restoring each process to its state in a causal distributed breakpoint. Causal distributed breakpoints assume deterministic processes that communicate solely by messages. The dependencies that arise from communication between processes are logged. Dependency logging and approximately consistent checkpoints sets have been implemented on a network of SUN workstations running the V-System. Overhead on the mess...
A Methodology for Testing Intrusion Detection Systems
- IEEE TRANSACTIONS ON SOFTWARE ENGINEERING
, 1996
"... Intrusion Detection Systems (IDSs) attempt to identify unauthorized use, misuse, and abuse of computer systems. In response to the growth in the use and development of IDSs, we have developed a methodology for testing IDSs. The methodology consists of techniques from the field of software testing wh ..."
Abstract
-
Cited by 54 (0 self)
- Add to MetaCart
Intrusion Detection Systems (IDSs) attempt to identify unauthorized use, misuse, and abuse of computer systems. In response to the growth in the use and development of IDSs, we have developed a methodology for testing IDSs. The methodology consists of techniques from the field of software testing which we have adapted for the specific purpose of testing IDSs. In this paper, we identify a set of general IDS performance objectives which is the basis for the methodology. We present the details of the methodology, including strategies for test-case selection and specific testing procedures. We include quantitative results from testing experiments on the Network Security Monitor (NSM), an IDS developed at UC Davis. We present an overview of the software platform that we have used to create user-simulation scripts for testing experiments. The platform consists of the UNIX tool expect and enhancements that we have developed, including mechanisms for concurrent scripts and a record-and-replay ...
Optimal Tracing and Replay for Debugging Shared-Memory Parallel Programs
- In Proceedings of the ACM/ONR Workshop on Parallel and Distributed Debugging (PADD
, 1993
"... Execution replay is a crucial part of debugging. Because explicitly parallel shared-memory programs can be nondeterministic, a tool is required that traces executions so they can be replayed for debugging. We present an adaptive tracing strategy that is optimal and records the minimal number of shar ..."
Abstract
-
Cited by 53 (0 self)
- Add to MetaCart
Execution replay is a crucial part of debugging. Because explicitly parallel shared-memory programs can be nondeterministic, a tool is required that traces executions so they can be replayed for debugging. We present an adaptive tracing strategy that is optimal and records the minimal number of shared-memory references required to exactly replay executions. Our algorithm makes runtime tracing decisions by detecting and tracing a certain type of race condition on-the-fly. Unlike past schemes, we make no assumptions about the execution'scorrectness (it need not be race free). Experiments show that only 0.01-2% of the shared-memory references are usually traced, a 2-4 order of magnitude reduction over past techniques which trace every access.

