Results 1 - 10 of 36
Automated atomicity-violation fixing
- In PLDI, 2011
Cited by 51 (5 self)
Fixing software bugs has always been an important and time-consuming process in software development. Fixing concurrency bugs has become especially critical in the multicore era. However, fixing concurrency bugs is challenging, in part due to nondeterministic failures and tricky parallel reasoning. Beyond correctly fixing the original problem in the software, a good patch should also avoid introducing new bugs, degrading performance unnecessarily, or damaging software readability. Existing tools cannot automate the whole fixing process and provide good-quality patches. We present AFix, a tool that automates the whole process of fixing one common type of concurrency bug: single-variable atomicity violations. AFix starts from the bug reports of existing bug-detection tools. It augments these with static analysis to construct a suitable patch for each bug report. It further tries to combine the patches of multiple bugs for better performance and code readability. Finally, AFix's run-time component provides testing customized for each patch. Our evaluation shows that patches automatically generated by AFix correctly eliminate six out of eight real-world bugs and significantly decrease the failure probability in the other two cases. AFix patches never introduce new bugs and usually have similar performance to manually designed patches.
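The bug class AFix targets can be shown in miniature. The sketch below is our own illustration, not AFix's analysis or patch machinery: a single-variable atomicity violation (an unprotected read-modify-write on a shared counter) and the mutual-exclusion shape such a patch takes.

```python
import threading

counter = 0
lock = threading.Lock()

def buggy_increment(n):
    # Atomicity violation: the read and the write of `counter` are two
    # separate accesses, so another thread can interleave between them
    # and updates can be lost.
    global counter
    for _ in range(n):
        tmp = counter      # read
        counter = tmp + 1  # write

def patched_increment(n):
    # Patch shape: protect the whole read-modify-write of the single
    # shared variable with one mutual-exclusion region.
    global counter
    for _ in range(n):
        with lock:
            tmp = counter
            counter = tmp + 1

def run(worker, n=100_000, threads=4):
    global counter
    counter = 0
    ts = [threading.Thread(target=worker, args=(n,)) for _ in range(threads)]
    for t in ts:
        t.start()
    for t in ts:
        t.join()
    return counter

print(run(patched_increment))  # always 400000
```

With the lock, the four threads always produce the full count; the buggy version may or may not lose updates on any given run, which is exactly the nondeterminism that makes these bugs hard to fix by hand.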
Automated Concurrency-Bug Fixing
Cited by 26 (1 self)
Concurrency bugs are widespread in multithreaded programs. Fixing them is time-consuming and error-prone. We present CFix, a system that automates the repair of concurrency bugs. CFix works with a wide variety of concurrency-bug detectors. For each failure-inducing interleaving reported by a bug detector, CFix first determines a combination of mutual-exclusion and order relationships that, once enforced, can prevent the buggy interleaving. CFix then uses static analysis and testing to determine where to insert what synchronization operations to force the desired mutual-exclusion and order relationships, with a best effort to avoid deadlocks and excessive performance losses. CFix also simplifies its own patches by merging fixes for related bugs. Evaluation using four different types of bug detectors and thirteen real-world concurrency-bug cases shows that CFix can successfully patch these cases without causing deadlocks or excessive performance degradation. Patches automatically generated by CFix are of similar quality to those manually written by developers.
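CFix's two repair ingredients, mutual exclusion and ordering, map onto standard synchronization primitives. The toy below (our illustration, not CFix output) enforces an order relationship, "statement A must run before statement B", using an event, which is the general shape an order-enforcing fix takes.

```python
import threading

results = []
a_done = threading.Event()  # encodes the order relationship A-before-B

def thread_a():
    results.append("init")   # statement A: must happen first
    a_done.set()             # signal that the order constraint is satisfied

def thread_b():
    a_done.wait()            # block until statement A has executed
    results.append("use")    # statement B: must happen second

tb = threading.Thread(target=thread_b)
ta = threading.Thread(target=thread_a)
tb.start(); ta.start()
ta.join(); tb.join()
print(results)  # ['init', 'use'] regardless of scheduling
```

Even though thread_b is started first, the wait/set pair forces the desired order; a mutual-exclusion fix (as in the AFix entry above) uses a lock instead.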
Carat: Collaborative Energy Diagnosis for Mobile Devices
, 2013
Cited by 19 (3 self)
CLAP: Recording Local Executions to Reproduce Concurrency Failures
Cited by 14 (3 self)
We present CLAP, a new technique to reproduce concurrency bugs. CLAP has two key steps. First, it logs thread-local execution paths at runtime. Second, offline, it computes memory dependencies that accord with the logged execution and are able to reproduce the observed bug. The second step works by combining constraints from the thread paths with constraints based on a memory model, and computing an execution with a constraint solver. CLAP has four major advantages. First, logging purely local execution of each thread is substantially cheaper than logging memory interactions, which makes CLAP efficient compared to previous approaches. Second, its logging requires no synchronization and hence adds no memory barriers or fences; this minimizes perturbation and avoids missing bugs whose racy behaviors extra synchronization would foreclose. Third, because it uses no synchronization, CLAP extends to a range of relaxed memory models, such as TSO and PSO, in addition to sequential consistency. Fourth, CLAP can compute a much simpler execution than the original, one that reveals the bug with minimal thread context switches. To mitigate scalability issues, we also present an approach to parallelizing constraint solving, which theoretically scales the technique to programs of arbitrary execution length. Experimental results on a variety of multithreaded benchmarks and real-world concurrent applications validate these advantages, showing that the technique is effective in reproducing concurrency bugs even under relaxed memory models; furthermore, it is significantly more efficient than a state-of-the-art technique that records shared-memory dependencies, reducing execution-time overhead by 45% and log size by 88% on average.
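CLAP's offline step, finding a global schedule that is consistent with the per-thread logs and reproduces the failure, can be miniaturized as a brute-force search in place of a constraint solver. The example below is a hypothetical two-thread lost-update; only the thread-local operation orders are "logged", as in CLAP.

```python
from itertools import combinations

# Thread-local logs: each thread's own sequence of operations on a shared
# variable x. Both threads do a read followed by a write (x = x + 1).
T1 = [("r", "t1"), ("w", "t1")]
T2 = [("r", "t2"), ("w", "t2")]

def execute(schedule):
    """Interpret one global interleaving and return the final value of x."""
    x, tmp = 0, {}
    for op, tid in schedule:
        if op == "r":
            tmp[tid] = x
        else:
            x = tmp[tid] + 1
    return x

def interleavings(p, q):
    """All global schedules that preserve each thread's local order."""
    n = len(p) + len(q)
    for slots in combinations(range(n), len(p)):
        s, pi, qi = set(slots), iter(p), iter(q)
        yield [next(pi) if i in s else next(qi) for i in range(n)]

# Offline search: find a schedule consistent with both local logs whose
# final state matches the observed failure (x == 1, i.e. a lost update).
witness = next(s for s in interleavings(T1, T2) if execute(s) == 1)
print(witness)  # the classic lost update: both reads before both writes
```

A real solver replaces this enumeration with path and memory-model constraints, which is what lets CLAP scale beyond toy traces.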
Isolating and Understanding Concurrency Errors Using Reconstructed Execution Fragments
Cited by 11 (4 self)
In this paper we propose Recon, a new general approach to concurrency debugging. Recon goes beyond just detecting bugs: it also presents to the programmer short fragments of buggy execution schedules that illustrate how and why bugs happened. These fragments, called reconstructions, are inferred from inter-thread communication surrounding the root cause of a bug and significantly simplify the process of understanding bugs. The key idea in Recon is to monitor executions and build graphs that encode inter-thread communication with enough context information to build reconstructions. Recon leverages reconstructions built from multiple application executions and uses machine learning to identify which ones illustrate the root cause of a bug. Recon's approach is general because it does not rely on heuristics specific to any type of bug, application, or programming model. Therefore, it is able to deal with single- and multiple-variable concurrency bugs regardless of their type (e.g., atomicity violation, ordering, etc.). To make graph collection efficient, Recon employs selective monitoring and allows metadata information to be imprecise without compromising accuracy. With these optimizations, Recon's graph collection imposes overheads typically between 5x and 20x for both C/C++ and Java programs, with overheads as low as 13% in our experiments. We evaluate Recon with buggy applications, and show that it produces reconstructions that include all code points involved in bugs' causes and presents them in an accurate order. We include a case study of understanding and fixing a previously unresolved bug to showcase Recon's effectiveness.
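The communication graphs at the core of Recon can be sketched as edges from the last writer of a shared address to each later reader in another thread, labeled with code points. The event names below are hypothetical stand-ins for the instruction addresses a real tool would record.

```python
from collections import defaultdict

# A recorded event stream: (thread, op, address, code_point).
events = [
    ("T1", "w", "x", "init:12"),
    ("T2", "r", "x", "check:40"),
    ("T1", "w", "x", "free:19"),
    ("T2", "r", "x", "use:44"),
]

def communication_graph(events):
    """Edges from the last writer of each address to every subsequent
    reader in a different thread, labeled with the two code points."""
    last_writer = {}
    edges = defaultdict(int)
    for thread, op, addr, point in events:
        if op == "w":
            last_writer[addr] = (thread, point)
        elif addr in last_writer:
            src_thread, src_point = last_writer[addr]
            if src_thread != thread:  # keep inter-thread communication only
                edges[(src_point, point)] += 1
    return dict(edges)

print(communication_graph(events))
```

Recon then mines graphs like this across many executions to reconstruct the buggy schedule fragment; the sketch stops at graph construction.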
Carat: Collaborative energy debugging for mobile devices
- In HotDep, 2012
Cited by 8 (3 self)
We aim to detect and diagnose code misbehavior that wastes energy, which we call energy bugs. This paper describes a method and implementation, called Carat, for performing such diagnosis on mobile devices. Carat takes a collaborative, black-box approach. A noninvasive client app sends intermittent, coarse-grained measurements to a server, which identifies correlations between higher expected energy use and client properties like the running apps, device model, and operating system. Carat successfully detected all energy bugs in a controlled experiment and, during a deployment to 883 users, identified 5434 instances of apps exhibiting buggy behavior in the wild.
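The black-box correlation Carat relies on can be illustrated with a toy comparison: the average battery drain rate of samples taken while a given app was running versus samples where it was not. The data and app names below are fabricated for shape only.

```python
# Each sample: (drain rate in % battery per hour, set of running apps).
samples = [
    (4.0, {"mail"}), (4.5, {"mail", "maps"}),
    (9.0, {"mail", "buggyapp"}), (8.5, {"buggyapp"}),
    (3.8, {"maps"}), (9.5, {"buggyapp", "maps"}),
]

def expected_drain(app, samples):
    """Average drain rate with and without the app running."""
    with_app = [r for r, apps in samples if app in apps]
    without = [r for r, apps in samples if app not in apps]
    return sum(with_app) / len(with_app), sum(without) / len(without)

with_rate, without_rate = expected_drain("buggyapp", samples)
# Flag an app as an energy-bug suspect when drain with the app clearly
# exceeds drain without it.
print(with_rate > without_rate)  # True for this data
```

The real system works on intermittent, coarse measurements from hundreds of thousands of devices and conditions on device model and OS as well, but the core signal is this kind of with/without contrast.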
Cooperative Empirical Failure Avoidance for Multithreaded Programs
Cited by 8 (2 self)
Concurrency errors in multithreaded programs are difficult to find and fix. We propose Aviso, a system for avoiding schedule-dependent failures. Aviso monitors events during a program's execution and, when a failure occurs, records a history of events from the failing execution. It uses this history to generate schedule constraints that perturb the order of events in the execution and thereby avoid schedules that lead to failures in future program executions. Aviso leverages scenarios where many instances of the same software run, using a statistical model of program behavior and experimentation to determine which constraints most effectively avoid failures. After implementing Aviso, we showed that it decreased failure rates for a variety of important desktop, server, and cloud applications by orders of magnitude, with an average overhead of less than 20% and, in some cases, as low as 5%.
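The "experimentation" step, deciding across many deployed instances which schedule constraint best avoids failure, resembles a bandit problem. The sketch below is our simplification, not Aviso's actual statistical model, and the failure probabilities are fabricated.

```python
import random

random.seed(0)
# Each candidate schedule constraint has some (unknown to the selector)
# probability of still failing; "delay-A-after-B" is the one that helps.
FAILURE_RATE = {"no-constraint": 0.30,
                "delay-A-after-B": 0.02,
                "delay-C-after-D": 0.25}

stats = {c: {"runs": 0, "ok": 0} for c in FAILURE_RATE}

def survival(c):
    # Observed failure-avoidance rate, with a +1/+2 prior so untried
    # constraints still look worth exploring.
    return (stats[c]["ok"] + 1) / (stats[c]["runs"] + 2)

def pick_constraint(eps=0.1):
    # Epsilon-greedy: mostly exploit the best-looking constraint,
    # occasionally explore so rarely tried constraints get discovered.
    if random.random() < eps:
        return random.choice(sorted(stats))
    return max(sorted(stats), key=survival)

for _ in range(2000):
    c = pick_constraint()
    stats[c]["runs"] += 1
    stats[c]["ok"] += random.random() >= FAILURE_RATE[c]
```

After enough simulated runs, the selector concentrates on the constraint with the best observed survival rate, which is the cooperative, empirical flavor of the paper's approach.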
RaceMob: Crowdsourced Data Race Detection
- In Proc. 24th ACM Symp. on Operating Systems Principles (SOSP)
Cited by 8 (1 self)
Some of the worst concurrency problems in multi-threaded systems today are due to data races: these bugs can have messy consequences, and they are hard to diagnose and fix. To avoid introducing such bugs, system developers need discipline and good data race detectors; today, even if they have the former, they lack the latter. We present RaceMob, a new data race detector that has both low overhead and good accuracy. RaceMob starts by detecting potential races statically (hence it has few false negatives), and then dynamically validates whether these are true races (hence it has few false positives). It achieves low runtime overhead and a high degree of realism by combining real-user crowdsourcing with a new on-demand dynamic data race validation technique. We evaluated RaceMob on ten systems, including Apache, SQLite, and Memcached: it detects data races with higher accuracy than state-of-the-art detectors (both static and dynamic), and RaceMob users experience an average runtime overhead of about 2%, which is orders of magnitude less than the overhead of modern dynamic data race detectors. To the best of our knowledge, RaceMob is the first data race detector that can both be used always-on in production and provide good accuracy.
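The two-phase idea, static candidates pruned by dynamic validation, can be miniaturized. Below, validation is a simple Eraser-style lockset check over a trace; this is a stand-in for RaceMob's on-demand validation, not its actual algorithm, and all names are hypothetical.

```python
from collections import defaultdict

# Phase 1 (static): candidate racy accesses, (variable, thread, thread).
candidates = {("x", "T1", "T2"), ("y", "T1", "T2")}

# Phase 2 input: trace events (thread, op, var, locks held at the access).
trace = [
    ("T1", "w", "x", frozenset()),        # unprotected write to x
    ("T2", "r", "x", frozenset({"L"})),   # read of x under lock L
    ("T1", "w", "y", frozenset({"L"})),
    ("T2", "w", "y", frozenset({"L"})),   # y consistently protected by L
]

def validate(candidates, trace):
    """Confirm a candidate when the two threads' locksets at accesses to
    the variable share no common lock (so nothing orders the accesses)."""
    locksets = defaultdict(list)
    for thread, _op, var, locks in trace:
        locksets[var].append((thread, locks))
    confirmed = set()
    for var, t1, t2 in candidates:
        held = [locks for th, locks in locksets[var] if th in (t1, t2)]
        if len(held) >= 2 and frozenset.intersection(*held) == frozenset():
            confirmed.add(var)
    return confirmed

print(validate(candidates, trace))  # {'x'}
```

The static phase keeps false negatives low (every suspicious pair is a candidate), while the dynamic check discards candidates like `y` that are in fact consistently locked.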
Production-Run Software Failure Diagnosis via Hardware Performance Counters
Cited by 7 (2 self)
Sequential and concurrency bugs are widespread in deployed software. They cause severe failures and huge financial losses during production runs. Tools that diagnose production-run failures with low overhead are needed. The state-of-the-art diagnosis techniques use software instrumentation to sample program properties at run time and use off-line statistical analysis to identify the properties most correlated with failures. Although promising, these techniques suffer from high run-time overhead, sometimes over 100% for concurrency-bug failure diagnosis, and hence are not suitable for production-run usage. We present PBI, a system that uses existing hardware performance counters to diagnose production-run failures caused by sequential and concurrency bugs with low overhead. PBI is designed based on several key observations. First, a few widely supported performance-counter events can reflect a wide variety of common software bugs and can be monitored by hardware with almost no overhead. Second, the counter-overflow interrupt supported by existing hardware and operating systems provides a natural and effective mechanism for conducting event sampling at user level. Third, the noise and non-determinism in interrupt delivery complement statistical processing well. We evaluate PBI using 13 real-world concurrency and sequential bugs from representative open-source server, client, and utility programs, and 10 bugs from a widely used software-testing benchmark. Quantitatively, PBI can effectively diagnose failures caused by these bugs with a small overhead that is never higher than 10%. Qualitatively, PBI does not require any change to software and presents a novel use of existing hardware performance counters.
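The off-line statistical half of this pipeline can be sketched without any hardware: given per-run samples of a counter event at each instruction address, rank addresses by how strongly the event correlates with failing runs. The counts and addresses below are fabricated, and this is only the ranking step, not PBI's sampling machinery.

```python
# Per-run sample counts of a performance-counter event, keyed by the
# instruction address where the counter-overflow interrupt landed.
passing = [{"0x40a1": 5, "0x40b2": 9}, {"0x40a1": 4, "0x40b2": 11}]
failing = [{"0x40a1": 6, "0x40b2": 10, "0x40c3": 7},
           {"0x40b2": 9, "0x40c3": 9}]

def score(insn):
    """Fraction of failing runs where the event fired at insn, minus the
    fraction of passing runs: a simple failure-correlation score."""
    f = sum(insn in run for run in failing) / len(failing)
    p = sum(insn in run for run in passing) / len(passing)
    return f - p

insns = {i for run in passing + failing for i in run}
ranked = sorted(insns, key=score, reverse=True)
print(ranked[0])  # '0x40c3': the event fires only in failing runs
```

Instructions whose events appear in both run classes score near zero and drop down the ranking, leaving the failure-correlated instruction at the top as the diagnosis candidate.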
Statistical debugging for real-world performance problems
- In ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages and Applications (OOPSLA), 2014
Cited by 7 (1 self)
Design and implementation defects that lead to inefficient computation widely exist in software. These defects are difficult to avoid and discover. They lead to severe performance degradation and energy waste during production runs, and are becoming increasingly critical with the meager increase of single-core hardware performance and the increasing concerns about energy constraints. Effective tools that diagnose performance problems and point out the inefficiency root cause are sorely needed. The state of the art of performance diagnosis is preliminary. Profiling can identify the functions that consume the most computation resources, but can neither identify the ones that waste the most resources nor explain why. Performance-bug detectors can identify specific type of in-