Results 1 - 10
of
124
Finding and Reproducing Heisenbugs in Concurrent Programs
"... Concurrency is pervasive in large systems. Unexpected interference among threads often results in “Heisenbugs” that are extremely difficult to reproduce and eliminate. We have implemented a tool called CHESS for finding and reproducing such bugs. When attached to a program, CHESS takes control of th ..."
Abstract
-
Cited by 154 (12 self)
- Add to MetaCart
Concurrency is pervasive in large systems. Unexpected interference among threads often results in “Heisenbugs” that are extremely difficult to reproduce and eliminate. We have implemented a tool called CHESS for finding and reproducing such bugs. When attached to a program, CHESS takes control of thread scheduling and uses efficient search techniques to drive the program through possible thread interleavings. This systematic exploration of program behavior enables CHESS to quickly uncover bugs that might otherwise have remained hidden for a long time. For each bug, CHESS consistently reproduces an erroneous execution manifesting the bug, thereby making it significantly easier to debug the problem. CHESS scales to large concurrent programs and has found numerous bugs in existing systems that had been tested extensively prior to being tested by CHESS. CHESS has been integrated into the test frameworks of many code bases inside Microsoft and is used by testers on a daily basis. 1
Output-deterministic replay for multicore debugging
, 2009
"... Reproducing bugs is hard. Deterministic replay systems address this problem by providing a high-fidelity replica of an original program run that can be repeatedly executed to zero-in on bugs. Unfortunately, existing replay systems for multiprocessor programs fall short. These systems either incur hi ..."
Abstract
-
Cited by 127 (6 self)
- Add to MetaCart
(Show Context)
Reproducing bugs is hard. Deterministic replay systems address this problem by providing a high-fidelity replica of an original program run that can be repeatedly executed to zero-in on bugs. Unfortunately, existing replay systems for multiprocessor programs fall short. These systems either incur high overheads, rely on non-standard multiprocessor hardware, or fail to reliably reproduce executions. Their primary stumbling block is data races – a source of nondeterminism that must be captured if executions are to be faithfully reproduced. In this paper, we present ODR–a software-only replay system that reproduces bugs and provides low-overhead multiprocessor recording. The key observation behind ODR is that, for debugging purposes, a replay system does not need to generate a high-fidelity replica of the original execution. Instead, it suffices to produce any execution that exhibits the same outputs as the original. Guided by this observation, ODR relaxes its fidelity guarantees to avoid the problem of reproducing data-races altogether. The result is a system that replays real multiprocessor applications, such as Apache, MySQL, and the Java Virtual Machine, and provides low record-mode overhead. Categories andSubjectDescriptors D.2.5 [Testing and Debugging]: Debugging aids
Kendo: Efficient Deterministic Multithreading in Software
- In ASPLOS
, 2009
"... Although chip-multiprocessors have become the industry standard, developing parallel applications that target them remains a daunting task. Non-determinism, inherent in threaded applications, causes significant challenges for par-allel programmers by hindering their ability to create parallel applic ..."
Abstract
-
Cited by 116 (0 self)
- Add to MetaCart
(Show Context)
Although chip-multiprocessors have become the industry standard, developing parallel applications that target them remains a daunting task. Non-determinism, inherent in threaded applications, causes significant challenges for par-allel programmers by hindering their ability to create parallel applications with repeatable results. As a consequence, par-allel applications are significantly harder to debug, test, and maintain than sequential programs. This paper introduces Kendo: a new software-only sys-tem that provides deterministic multithreading of parallel applications. Kendo enforces a deterministic interleaving of lock acquisitions and specially declared non-protected reads through a novel dynamically load-balanced deterministic scheduling algorithm. The algorithm tracks the progress of each thread using performance counters to construct a deterministic logical time that is used to compute an inter-leaving of shared data accesses that is both deterministic and provides good load balancing. Kendo can run on to-day’s commodity hardware while incurring only a modest performance cost. Experimental results on the SPLASH-2 applications yield a geometric mean overhead of only 16% when running on 4 processors. This low overhead makes it possible to benefit from Kendo even after an application is deployed. Programmers can start using Kendo today to pro-gram parallel applications that are easier to develop, debug, and test.
PRES: Probabilistic Replay with Execution Sketching on Multiprocessors
"... Bug reproduction is critically important for diagnosing a production-run failure. Unfortunately, reproducing a concurrency bug on multi-processors (e.g., multi-core) is challenging. Previous techniques either incur large overhead or require new non-trivial hardware extensions. This paper proposes a ..."
Abstract
-
Cited by 104 (7 self)
- Add to MetaCart
(Show Context)
Bug reproduction is critically important for diagnosing a production-run failure. Unfortunately, reproducing a concurrency bug on multi-processors (e.g., multi-core) is challenging. Previous techniques either incur large overhead or require new non-trivial hardware extensions. This paper proposes a novel technique called PRES (probabilistic replay via execution sketching) to help reproduce concurrency bugs on multi-processors. It relaxes the past (perhaps idealistic) objective of “reproducing the bug on the first replay attempt ” to significantly lower production-run recording overhead. This is achieved by (1) recording only partial execution information (referred to as “sketches”) during the production run, and (2) relying on an intelligent replayer during diagnosis time (when performance is less critical) to systematically explore the unrecorded non-deterministic
Paranoid android: versatile protection for smartphones.
- In Proc. ACSAC,
, 2010
"... ABSTRACT Smartphone usage has been continuously increasing in recent years. Moreover, smartphones are often used for privacysensitive tasks, becoming highly valuable targets for attackers. They are also quite different from PCs, so that PCoriented solutions are not always applicable, or do not offe ..."
Abstract
-
Cited by 86 (7 self)
- Add to MetaCart
(Show Context)
ABSTRACT Smartphone usage has been continuously increasing in recent years. Moreover, smartphones are often used for privacysensitive tasks, becoming highly valuable targets for attackers. They are also quite different from PCs, so that PCoriented solutions are not always applicable, or do not offer comprehensive security. We propose an alternative solution, where security checks are applied on remote security servers that host exact replicas of the phones in virtual environments. The servers are not subject to the same constraints, allowing us to apply multiple detection techniques simultaneously. We implemented a prototype of this security model for Android phones, and show that it is both practical and scalable: we generate no more than 2KiB/s and 64B/s of trace data for high-loads and idle operation respectively, and are able to support more than a hundred replicas running on a single server.
Rerun: Exploiting Episodes for Lightweight Memory Race Recording
"... Multiprocessor deterministic replay has many potential uses in the era of multicore computing, including enhanced debugging, fault tolerance, and intrusion detection. While sources of nondeterminism in a uniprocessor can be recorded efficiently in software, it seems likely that hardware support will ..."
Abstract
-
Cited by 81 (0 self)
- Add to MetaCart
(Show Context)
Multiprocessor deterministic replay has many potential uses in the era of multicore computing, including enhanced debugging, fault tolerance, and intrusion detection. While sources of nondeterminism in a uniprocessor can be recorded efficiently in software, it seems likely that hardware support will be needed in a multiprocessor environment where the outcome of memory races must also be recorded. We develop a memory race recording mechanism, called Rerun, that uses small hardware state (~166 bytes/core), writes a small race log (~4 bytes/kiloinstruction), and operates well as the number of cores per system scales (e.g., to 16 cores). Rerun exploits the dual of conventional wisdom in race recording: Rather than record information about individual memory accesses that conflict, we record how long a thread executes without conflicting with other threads. In particular, Rerun passively creates atomic episodes. Each episode is a dynamic instruction sequence that a thread happens to execute without interacting with other threads. Rerun uses Lamport Clocks to order episodes and enable replay of an equivalent execution. 1.
Execution Synthesis: A Technique for Automated Software Debugging
- In Proceedings of Eurosys
, 2010
"... Debugging real systems is hard, requires deep knowledge of the code, and is time-consuming. Bug reports rarely provide sufficient information, thus forcing developers to turn into detectives searching for an explanation of how the program could have arrived at the reported failure point. Execution s ..."
Abstract
-
Cited by 64 (0 self)
- Add to MetaCart
(Show Context)
Debugging real systems is hard, requires deep knowledge of the code, and is time-consuming. Bug reports rarely provide sufficient information, thus forcing developers to turn into detectives searching for an explanation of how the program could have arrived at the reported failure point. Execution synthesis is a technique for automating this de-tective work: given a program and a bug report, it automat-ically produces an execution of the program that leads to the reported bug symptoms. Using a combination of static analysis and symbolic execution, it “synthesizes ” a thread schedule and various required program inputs that cause the bug to manifest. The synthesized execution can be played back deterministically in a regular debugger, like gdb. This is particularly useful in debugging concurrency bugs. Our technique requires no runtime tracing or program modifications, thus incurring no runtime overhead and being practical for use in production systems. We evaluate ESD— a debugger based on execution synthesis—on popular soft-ware (e.g., the SQLite database,
SherLog: Error Diagnosis by Connecting Clues from Run-time Logs
"... Computer systems often fail due to many factors such as software bugs or administrator errors. Diagnosing such production run failures is an important but challenging task since it is difficult to reproduce them in house due to various reasons: (1) unavailability of users ’ inputs and file content d ..."
Abstract
-
Cited by 63 (8 self)
- Add to MetaCart
(Show Context)
Computer systems often fail due to many factors such as software bugs or administrator errors. Diagnosing such production run failures is an important but challenging task since it is difficult to reproduce them in house due to various reasons: (1) unavailability of users ’ inputs and file content due to privacy concerns; (2) difficulty in building the exact same execution environment; and (3) non-determinism of concurrent executions on multi-processors. Therefore, programmers often have to diagnose a production run failure based on logs collected back from customers and the corresponding source code. Such diagnosis requires expert knowledge and is also too time-consuming, tedious to narrow down root causes. To address this problem, we propose a tool, called Sher-Log, that analyzes source code by leveraging information provided by run-time logs to infer what must or may have happened during the failed production run. It requires neither re-execution of the program nor knowledge on the log’s semantics. It infers both control and data value information regarding to the failed execution. We evaluate SherLog with 8 representative real world software failures (6 software bugs and 2 configuration errors) from 7 applications including 3 servers. Information inferred by SherLog are very useful for programmers to diagnose these evaluated failures. Our results also show that SherLog can analyze large server applications such as Apache with thousands of logging messages within only 40 minutes.
DoublePlay: Parallelizing Sequential Logging and Replay
"... Deterministic replay systems record and reproduce the execution of a hardware or software system. In contrast to replaying execution on uniprocessors, deterministic replay on multiprocessors is very challenging to implement efficiently because of the need to reproduce the order or values read by sha ..."
Abstract
-
Cited by 62 (11 self)
- Add to MetaCart
(Show Context)
Deterministic replay systems record and reproduce the execution of a hardware or software system. In contrast to replaying execution on uniprocessors, deterministic replay on multiprocessors is very challenging to implement efficiently because of the need to reproduce the order or values read by shared memory operations performed by multiple threads. In this paper, we present DoublePlay, a new way to efficiently guarantee replay on commodity multiprocessors. Our key insight is that one can use the simpler and faster mechanisms of single-processor record and replay, yet still achieve the scalability offered by multiple cores, by using an additional execution to parallelize the record and replay of an application. DoublePlay timeslices multiple threads on a single processor, then runs multiple time intervals (epochs) of the program concurrently on separate processors. This strategy, which we call uniparallelism, makes logging much easier because each epoch runs on a single processor (so threads in an epoch never simultaneously access the same memory) and different epochs operate on different copies of the memory. Thus, rather than logging the order of shared-memory accesses, we need only log the order in which threads in an epoch are timesliced on the processor. DoublePlay runs an additional execution of the program on multiple processors to generate checkpoints so that epochs run in parallel. We evaluate DoublePlay on a variety of client, server, and scientific parallel benchmarks; with spare cores, DoublePlay reduces logging overhead to an average of 15 % with two worker threads and 28 % with four threads.
Respec: Efficient Online Multiprocessor Replay via Speculation and External Determinism
"... Deterministic replay systems record and reproduce the execution of a hardware or software system. While it is well known how to replay uniprocessor systems, replaying shared memory multiprocessor systems at low overhead on commodity hardware is still an open problem. This paper presents Respec, a ne ..."
Abstract
-
Cited by 53 (10 self)
- Add to MetaCart
(Show Context)
Deterministic replay systems record and reproduce the execution of a hardware or software system. While it is well known how to replay uniprocessor systems, replaying shared memory multiprocessor systems at low overhead on commodity hardware is still an open problem. This paper presents Respec, a new way to support deterministic replay of shared memory multithreaded programs on commodity multiprocessor hardware. Respec targets online replay in which the recorded and replayed processes execute concurrently. Respec uses two strategies to reduce overhead while still ensuring correctness: speculative logging and externally deterministic replay. Speculative logging optimistically logs less information about shared memory dependencies than is needed to guarantee deterministic replay, then recovers and retries if the replayed process diverges from the recorded process. Externally deterministic replay relaxes the degree to which the two executions must match by requiring only their system output and final program states match. We show that the combination of these two techniques results in low recording and replay overhead for the common case of datarace-free execution intervals and still ensures correct replay for execution intervals that have data races. We modified the Linux kernel to implement our techniques. Our software system adds on average about 18 % overhead to the execution time for recording and replaying programs with two threads and 55 % overhead for programs with four threads.