Results 1 - 10 of 17
Efficient Deterministic Multithreading through Schedule Relaxation
"... Deterministic multithreading (DMT) eliminates many pernicious software problems caused by nondeterminism. It works by constraining a program to repeat the same thread interleavings, or schedules, when given same input. Despite much recent research, it remains an open challenge to build both determin ..."
Abstract
-
Cited by 36 (6 self)
- Add to MetaCart
Deterministic multithreading (DMT) eliminates many pernicious software problems caused by nondeterminism. It works by constraining a program to repeat the same thread interleavings, or schedules, when given the same input. Despite much recent research, it remains an open challenge to build both deterministic and efficient DMT systems for general programs on commodity hardware. To deterministically resolve a data race, a DMT system must enforce a deterministic schedule of shared memory accesses, or mem-schedule, which can incur prohibitive overhead. By using schedules consisting only of synchronization operations, or sync-schedules, this overhead can be avoided. However, a sync-schedule is deterministic only for race-free programs, but most programs have races. Our key insight is that races tend to occur only within minor portions of an execution, and a dominant majority of the execution …
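Illustration (not from the paper): a minimal pthreads sketch of the kind of data race that a sync-schedule alone cannot make deterministic, because neither access to the shared variable is ordered by any synchronization operation.

    #include <pthread.h>
    #include <stdio.h>

    int flag = 0;                       /* shared, accessed without any lock */

    static void *worker(void *arg) {
        flag = 1;                       /* racy write */
        return NULL;
    }

    int main(void) {
        pthread_t t;
        pthread_create(&t, NULL, worker, NULL);
        if (flag == 0)                  /* racy read: outcome depends on timing */
            printf("read ran first\n");
        else
            printf("write ran first\n");
        pthread_join(t, NULL);
        return 0;
    }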
Automated Concurrency-Bug Fixing
"... Concurrency bugs are widespread in multithreaded programs. Fixing them is time-consuming and error-prone. We present CFix, a system that automates the repair of concurrency bugs. CFix works with a wide variety of concurrency-bug detectors. For each failure-inducing interleaving reported by a bug det ..."
Abstract
-
Cited by 26 (1 self)
- Add to MetaCart
(Show Context)
Concurrency bugs are widespread in multithreaded programs. Fixing them is time-consuming and error-prone. We present CFix, a system that automates the repair of concurrency bugs. CFix works with a wide variety of concurrency-bug detectors. For each failure-inducing interleaving reported by a bug detector, CFix first determines a combination of mutual-exclusion and order relationships that, once enforced, can prevent the buggy interleaving. CFix then uses static analysis and testing to determine which synchronization operations to insert, and where, to force the desired mutual-exclusion and order relationships, making a best effort to avoid deadlocks and excessive performance loss. CFix also simplifies its own patches by merging fixes for related bugs. Evaluation using four different types of bug detectors and thirteen real-world concurrency-bug cases shows that CFix can successfully patch these cases without causing deadlocks or excessive performance degradation. Patches automatically generated by CFix are of similar quality to those manually written by developers.
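Illustration (a hypothetical sketch, not CFix output): one way to enforce an order relationship of the kind described above is an inserted semaphore that forces the consumer's dereference to run only after the producer's initialization; the variable names are illustrative only.

    #include <pthread.h>
    #include <semaphore.h>
    #include <string.h>

    char *msg = NULL;                   /* shared state involved in the bug     */
    sem_t order;                        /* synchronization added by the patch   */

    static void *producer(void *arg) {
        msg = "hello";                  /* statement A                          */
        sem_post(&order);               /* patch: signal that A has completed   */
        return NULL;
    }

    static void *consumer(void *arg) {
        sem_wait(&order);               /* patch: block until A has run         */
        size_t n = strlen(msg);         /* statement B: crashed when it ran
                                           before A in the buggy interleaving   */
        (void)n;
        return NULL;
    }

    int main(void) {
        pthread_t p, c;
        sem_init(&order, 0, 0);
        pthread_create(&c, NULL, consumer, NULL);
        pthread_create(&p, NULL, producer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        return 0;
    }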
Sound and Precise Analysis of Parallel Programs through Schedule Specialization
"... Parallel programs are known to be difficult to analyze. A key reason is that they typically have an enormous number of execution interleavings, or schedules. Static analysis over all schedules requires over-approximations, resulting in poor precision; dynamic analysis rarely covers more than a tiny ..."
Abstract
-
Cited by 16 (6 self)
- Add to MetaCart
(Show Context)
Parallel programs are known to be difficult to analyze. A key reason is that they typically have an enormous number of execution interleavings, or schedules. Static analysis over all schedules requires over-approximations, resulting in poor precision; dynamic analysis rarely covers more than a tiny fraction of all schedules. We propose an approach called schedule specialization to analyze a parallel program over only a small set of schedules for precision, and then enforce these schedules at runtime for soundness of the static analysis results. We build a schedule specialization framework for C/C++ multithreaded programs that use Pthreads. Our framework avoids the need to modify every analysis to be schedule-aware by specializing a program into a simpler program based on a schedule, so that the resultant program can be analyzed with stock analyses for improved precision. Moreover, our framework provides a precise schedule-aware def-use analysis on memory locations, enabling us to build three highly precise analyses: an alias analyzer, a data-race detector, and a path slicer. Evaluation on 17 programs, including 2 real-world programs and 15 popular benchmarks, shows that analyses using our framework reduced may-aliases by 61.9%, false race reports by 69%, and path slices by 48.7%; and detected 7 unknown bugs in well-checked programs.
Multicore Acceleration of Priority-Based Schedulers for Concurrency Bug Detection
"... Testing multithreaded programs is difficult as threads can interleave in a nondeterministic fashion. Untested interleavings can cause failures, but testing all interleavings is infeasible. Many interleaving exploration strategies for bug detection have been proposed, but their relative effectiveness ..."
Abstract
-
Cited by 14 (2 self)
- Add to MetaCart
(Show Context)
Testing multithreaded programs is difficult as threads can interleave in a nondeterministic fashion. Untested interleavings can cause failures, but testing all interleavings is infeasible. Many interleaving exploration strategies for bug detection have been proposed, but their relative effectiveness and performance remain unclear as they often lack publicly available implementations and have not been evaluated using common benchmarks. We describe NeedlePoint, an open-source framework that allows selection and comparison of a wide range of interleaving exploration policies for bug detection proposed by prior work. Our experience with NeedlePoint indicates that priority-based probabilistic concurrency testing (the PCT algorithm) finds bugs quickly, but it runs only one thread at a time, which destroys parallelism by serializing executions. To address this problem we propose a parallel version of the PCT algorithm (PPCT). We show that the new algorithm outperforms the original by a factor of 5x when testing parallel programs on an eight-core machine. We formally prove that parallel PCT provides the same probabilistic coverage guarantees as PCT. Moreover, PPCT is the first algorithm that runs multiple threads while providing coverage guarantees.
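Illustration (a hypothetical sketch of PCT's randomized priority assignment, not the NeedlePoint implementation): for n threads, k scheduling steps, and bug depth d, each thread receives a distinct random initial priority of at least d, and d-1 random change points lower the running thread's priority during execution.

    #include <stdlib.h>

    typedef struct {
        int *initial_priority;   /* [n], values d .. d+n-1, randomly permuted */
        int *change_point_step;  /* [d-1], random steps in 1 .. k             */
    } pct_schedule;

    pct_schedule pct_init(int n, int k, int d) {
        pct_schedule s;
        s.initial_priority  = malloc(n * sizeof(int));
        s.change_point_step = malloc((d - 1) * sizeof(int));
        for (int i = 0; i < n; i++)              /* start with d, d+1, ...    */
            s.initial_priority[i] = d + i;
        for (int i = n - 1; i > 0; i--) {        /* Fisher-Yates shuffle      */
            int j = rand() % (i + 1);
            int tmp = s.initial_priority[i];
            s.initial_priority[i] = s.initial_priority[j];
            s.initial_priority[j] = tmp;
        }
        for (int i = 0; i < d - 1; i++)          /* priority change points    */
            s.change_point_step[i] = 1 + rand() % k;
        return s;
    }
    /* At runtime, the scheduler always runs the highest-priority enabled
       thread; when execution reaches the i-th change point, the currently
       running thread's priority is lowered to i. */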
Pervasive Detection of Process Races in Deployed Systems
"... Process races occur when multiple processes access shared operating system resources, such as files, without proper synchronization. We present the first study of real process races and the first system designed to detect them. Our study of hundreds of applications shows that process races are numer ..."
Abstract
-
Cited by 6 (2 self)
- Add to MetaCart
(Show Context)
Process races occur when multiple processes access shared operating system resources, such as files, without proper synchronization. We present the first study of real process races and the first system designed to detect them. Our study of hundreds of applications shows that process races are numerous, difficult to debug, and a real threat to reliability. To address this problem, we created RACEPRO, a system for automatically detecting these races. RACEPRO checks deployed systems in vivo by recording live executions and then deterministically replaying and checking them later. This approach increases checking coverage beyond the configurations or executions covered by software vendors or beta testing sites. RACEPRO records multiple processes, detects races in the recording among system calls that may concurrently access shared kernel objects, then tries different execution orderings of such system calls to determine which races are harmful and result in failures. To simplify race detection, RACEPRO models under-specified system calls based on load and store micro-operations. To reduce false positives and false negatives, RACEPRO uses a replay and go-live mechanism to distill harmful races from benign ones. We have implemented RACEPRO in Linux, shown that it imposes only modest recording overhead, and used it to detect a number of previously unknown bugs in real applications caused by process races.
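Illustration (a minimal hypothetical example, not from the paper): two processes race on the same file through unsynchronized system calls, so the final contents depend on how the kernel interleaves the open, truncate, and write operations.

    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void) {
        if (fork() == 0) {                         /* child process */
            int fd = open("shared.cfg", O_WRONLY | O_CREAT | O_TRUNC, 0644);
            write(fd, "child\n", 6);
            close(fd);
            _exit(0);
        }
        /* parent: races with the child on the same path, no locking */
        int fd = open("shared.cfg", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        write(fd, "parent\n", 7);
        close(fd);
        wait(NULL);
        return 0;
    }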
Fully Automatic and Precise Detection of Thread Safety Violations
"... Concurrent, object-oriented programs often use thread-safe library classes. Existing techniques for testing a thread-safe class either rely on tests using the class, on formal specifications, or on both. Unfortunately, these techniques often are not fully automatic as they involve the user in analyz ..."
Abstract
-
Cited by 5 (3 self)
- Add to MetaCart
(Show Context)
Concurrent, object-oriented programs often use thread-safe library classes. Existing techniques for testing a thread-safe class either rely on tests using the class, on formal specifications, or on both. Unfortunately, these techniques often are not fully automatic as they involve the user in analyzing the output. This paper presents an automatic testing technique that reveals concurrency bugs in supposedly thread-safe classes. The analysis requires as input only the class under test and reports only true positives. The key idea is to generate tests in which multiple threads call methods on a shared instance of the tested class. If a concurrent test exhibits an exception or a deadlock that cannot be triggered in any linearized execution of the test, the analysis reports a thread safety violation. The approach is easily applicable, because it is independent of handwritten tests and explicit specifications. The analysis finds 15 concurrency bugs in popular Java libraries, including two previously unknown bugs in the Java standard library.
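Illustration (a self-contained C/pthreads sketch; the paper targets Java classes and the names here are illustrative): two threads call a method on a shared, non-thread-safe object, and a failure counts as a thread safety violation only if the same calls cannot fail in any sequential order.

    #include <pthread.h>
    #include <stdio.h>

    typedef struct { int value; } counter_t;      /* not thread-safe          */

    static void increment(counter_t *c) { c->value = c->value + 1; }

    static counter_t shared;

    static void *call_increment(void *arg) { increment(&shared); return NULL; }

    static int concurrent_test(void) {            /* 1 = invariant violated   */
        shared.value = 0;
        pthread_t a, b;
        pthread_create(&a, NULL, call_increment, NULL);
        pthread_create(&b, NULL, call_increment, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return shared.value != 2;                 /* lost update is possible  */
    }

    static int linearized_test(void) {            /* same calls, one at a time */
        shared.value = 0;
        increment(&shared);
        increment(&shared);
        return shared.value != 2;                 /* never fails sequentially */
    }

    int main(void) {
        if (concurrent_test() && !linearized_test())
            printf("thread safety violation\n");
        return 0;
    }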
ConAir: Featherweight Concurrency Bug Recovery via Single-Threaded Idempotent Execution
- In ASPLOS ’13
"... Many concurrency bugs are hidden in deployed software and cause severe failures for end-users. When they finally manifest and become known by developers, they are difficult to fix correctly. To support end-users, we need techniques that help software survive hidden concurrency bugs during production ..."
Abstract
-
Cited by 4 (2 self)
- Add to MetaCart
(Show Context)
Many concurrency bugs are hidden in deployed software and cause severe failures for end-users. When they finally manifest and become known to developers, they are difficult to fix correctly. To support end-users, we need techniques that help software survive hidden concurrency bugs during production runs. To help developers, we need techniques that fix exposed concurrency bugs. State-of-the-art techniques for concurrency-bug fixing and survival satisfy only a subset of four important properties: compatibility, correctness, generality, and performance. We aim to develop a system that satisfies all four of these properties. To achieve this goal, we leverage two observations: (1) rolling back a single thread is sufficient to recover from most concurrency-bug failures; (2) re-executing an idempotent region, which requires no memory-state checkpoint, is sufficient to recover from many concurrency-bug failures. Our system, ConAir, includes a static analysis component that automatically identifies potential failure sites, a static analysis component that automatically identifies the idempotent code regions around every failure site, and a code-transformation component that inserts rollback-recovery code around the identified idempotent regions. We evaluated ConAir on 10 real-world concurrency bugs in widely used C/C++ open-source applications. These bugs cover different types of failure symptoms and root causes. Quantitatively, ConAir helps software survive failures caused by all of these bugs with negligible run-time overhead (<1%) and short recovery time. Qualitatively, ConAir can help recover from failures caused by unknown bugs. It guarantees that program semantics remain unchanged and requires no change to operating systems or hardware.
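Illustration (a hypothetical sketch of rollback-recovery around an idempotent region, not ConAir's actual transformation): because the region only reads shared state, it can be re-executed from a setjmp checkpoint when the failure-site check trips, with no memory-state checkpoint and a bounded retry count.

    #include <setjmp.h>
    #include <stdio.h>
    #include <stdlib.h>

    char *shared_buf;                   /* set by another thread in the real bug   */

    void consume(void) {
        jmp_buf retry_point;
        volatile int retries = 0;       /* volatile: value survives the longjmp    */
        setjmp(retry_point);            /* checkpoint: start of idempotent region  */
        char *local = shared_buf;       /* the region only reads shared state      */
        if (local == NULL) {            /* failure site: original code crashed here */
            if (retries++ < 1000)
                longjmp(retry_point, 1);/* roll back this thread and re-execute    */
            abort();                    /* give up after bounded retries           */
        }
        printf("%s\n", local);
    }

    int main(void) {
        shared_buf = "ready";           /* stands in for the racing writer thread  */
        consume();
        return 0;
    }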
SyncChecker: Detecting Synchronization Errors between MPI Applications and Libraries
"... Abstract—While improving the performance, nonblocking communication is prone to synchronization errors between MPI applications and the underlying MPI libraries. Such synchronization error occurs in the following way. After initiating nonblocking communication and performing overlapped computation, ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
(Show Context)
While improving performance, nonblocking communication is prone to synchronization errors between MPI applications and the underlying MPI libraries. Such a synchronization error occurs in the following way: after initiating nonblocking communication and performing overlapped computation, the MPI application reuses the message buffer before the MPI library has finished using the same buffer, which may lead to sending corrupted message data or reading undefined message data. This paper presents a new method called SyncChecker to detect synchronization errors in MPI nonblocking communication. To examine whether the use of message buffers is well synchronized between the MPI application and the MPI library, SyncChecker first tracks relevant memory accesses in the MPI application and corresponding message send/receive operations in the MPI library. It then checks whether the correct execution order between the MPI application and the MPI library is enforced by the MPI completion-check routines. If not, SyncChecker reports the error with diagnostic information. To reduce runtime overhead, we propose three dynamic optimizations. We have implemented a prototype of SyncChecker on Linux and evaluated it with seven bug cases, i.e., five introduced by the original developers and two injected, in four different MPI applications. Our experiments show that SyncChecker detects all the evaluated synchronization errors and provides helpful diagnostic information. Moreover, our experiments with seven NAS Parallel Benchmarks demonstrate that SyncChecker incurs moderate runtime overhead, 1.3-9.5 times with an average of 5.2 times, making it suitable for software testing.
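Illustration (a minimal MPI program, not from the paper, containing the kind of error described above): rank 0 reuses the send buffer before MPI_Wait confirms that the library is done with it, so corrupted data may be sent.

    #include <mpi.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        int buf = rank;
        MPI_Request req;
        if (rank == 0) {
            MPI_Isend(&buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
            buf = 99;                   /* BUG: send buffer reused before MPI_Wait
                                           confirms the library is done with it   */
            MPI_Wait(&req, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
        MPI_Finalize();
        return 0;
    }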
Bita: Coverage-Guided, Automatic Testing of Actor Programs
"... Abstract—Actor programs are concurrent programs where concurrent entities communicate asynchronously by exchanging messages. Testing actor programs is challenging because the order of message receives depends on the non-deterministic scheduler and because exploring all schedules does not scale to la ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
(Show Context)
Actor programs are concurrent programs where concurrent entities communicate asynchronously by exchanging messages. Testing actor programs is challenging because the order of message receives depends on the non-deterministic scheduler and because exploring all schedules does not scale to large programs. This paper presents Bita, a scalable, automatic approach for testing non-deterministic behavior of actor programs. The key idea is to generate and explore schedules that are likely to reveal concurrency bugs because these schedules increase schedule coverage. We present three schedule coverage criteria for actor programs, an algorithm to generate feasible schedules that increase coverage, and a technique to force a program to comply with a schedule. Applying Bita to real-world actor programs implemented in Scala reveals eight previously unknown concurrency bugs, of which six have already been fixed by the developers. Furthermore, we show that our approach finds bugs 122x faster than random scheduling, on average.
Understanding the Interleaving-Space Overlap across Inputs and Software Versions
"... In the multi-core era, it is critical to effectively test multithreaded software and expose concurrency bugs before software release. Previous work has made a lot of progress in exercising the interleaving space and detecting concurrency bugs under a given input. Unfortunately, since software often ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
(Show Context)
In the multi-core era, it is critical to effectively test multithreaded software and expose concurrency bugs before software release. Previous work has made much progress in exercising the interleaving space and detecting concurrency bugs under a given input. Unfortunately, since software often has many test inputs and constant pressure to release new versions, existing techniques are still too expensive in practice. In this position paper, we use open-source software to study how interleavings, particularly data races and atomicity violations, overlap across test inputs and software versions. We also conduct preliminary explorations to improve the testing efficiency of multithreaded software by avoiding redundant analysis across inputs and software versions.