Speculative execution in a distributed file system
- ACM Trans. Comput. Syst., 2006
- Cited by 49 (13 self)
Speculator provides Linux kernel support for speculative execution. It allows multiple processes to share speculative state by tracking causal dependencies propagated through interprocess communication. It guarantees correct execution by preventing speculative processes from externalizing output, e.g., sending a network message or writing to the screen, until the speculations on which that output depends have proven to be correct. Speculator improves the performance of distributed file systems by masking I/O latency and increasing I/O throughput. Rather than block during a remote operation, a file system predicts the operation’s result, then uses Speculator to checkpoint the state of the calling process and speculatively continue its execution based on the predicted result. If the prediction is correct, the checkpoint is discarded; if it is incorrect, the calling process is restored to the checkpoint, and the operation is retried. We have modified the client, server, and network protocol of two distributed file systems to use Speculator. For PostMark and Andrew-style benchmarks, speculative execution results in a factor of 2 performance improvement for NFS over local-area networks and an order of magnitude improvement over wide-area networks. For the same benchmarks, Speculator enables the Blue File System to provide the consistency of single-copy file semantics and the safety of synchronous I/O, yet still outperform current distributed file systems with weaker consistency and safety.
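The checkpoint, predict, and roll-back cycle described in this abstract can be illustrated with a minimal user-level sketch. It is not Speculator's kernel implementation: the function `run_speculatively`, its parameters, and the dictionary used as process state are all hypothetical, and the remote operation is run sequentially here, whereas a real system would overlap it with the speculative work.

```python
import copy

def run_speculatively(state, predict, remote_op, continuation):
    """Continue on a predicted result; roll back and retry if the prediction was wrong.

    All names here are illustrative assumptions, not part of Speculator.
    """
    checkpoint = copy.deepcopy(state)    # checkpoint the caller's state
    predicted = predict()
    pending_output = []                  # output is buffered, never externalized early
    continuation(state, predicted, pending_output)

    actual = remote_op()                 # sequential here; overlapped in a real system
    if actual == predicted:
        return state, pending_output     # commit: drop the checkpoint, release output

    # Misprediction: discard speculative state and output, then retry from the checkpoint.
    state = checkpoint
    pending_output = []
    continuation(state, actual, pending_output)
    return state, pending_output

def use_size(state, result, out):
    # Example continuation: record a file size and produce one line of output.
    state["size"] = result
    out.append(f"size={result}")

# The prediction (10) is wrong, so the work is redone with the real result (12).
state, output = run_speculatively({"size": None}, lambda: 10, lambda: 12, use_size)
```

On a misprediction the returned state is the restored checkpoint rather than the mutated original object, which mirrors the abstract's "restored to the checkpoint, and the operation is retried".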
Rethink the sync
- In Proc. OSDI, 2006
- Cited by 32 (6 self)
We introduce external synchrony, a new model for local file I/O that provides the reliability and simplicity of synchronous I/O, yet also closely approximates the performance of asynchronous I/O. An external observer cannot distinguish the output of a computer with an externally synchronous file system from the output of a computer with a synchronous file system. No application modification is required to use an externally synchronous file system: in fact, application developers can program to the simpler synchronous I/O abstraction and still receive excellent performance. We have implemented an externally synchronous file system for Linux, called xsyncfs. Xsyncfs provides the same durability and ordering guarantees as those provided by a synchronously mounted ext3 file system. Yet, even for I/O-intensive benchmarks, xsyncfs performance is within 7% of ext3 mounted asynchronously. Compared to ext3 mounted synchronously, xsyncfs is up to two orders of magnitude faster.
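One way to read the "external observer" guarantee is that writes may return immediately as long as any output that depends on them is held back until they are durable. The toy model below illustrates that reading only. The class `ExternallySynchronousLog` and its methods are hypothetical names, and the abstract does not describe how xsyncfs tracks dependencies, so the rule used here (hold all output while any write is uncommitted) is an assumption.

```python
class ExternallySynchronousLog:
    """Toy model: writes return at once, but dependent output waits for the commit."""

    def __init__(self):
        self.durable = []       # data that has reached stable storage
        self.dirty = []         # buffered writes that are not yet durable
        self.held_output = []   # external output waiting on dirty writes
        self.visible = []       # what an external observer has seen so far

    def write(self, data):
        self.dirty.append(data)               # no blocking, as with asynchronous I/O

    def emit(self, message):
        if self.dirty:
            self.held_output.append(message)  # assumed to depend on uncommitted writes
        else:
            self.visible.append(message)

    def commit(self):
        self.durable.extend(self.dirty)       # writes become durable first
        self.dirty.clear()
        self.visible.extend(self.held_output) # only then is the output released
        self.held_output.clear()

log = ExternallySynchronousLog()
log.write("record-1")
log.emit("saved record-1")
seen_before_commit = list(log.visible)        # nothing is visible yet
log.commit()
```

After `commit()`, the observer sees the message only once the data it refers to is durable, which is the same ordering a synchronous file system would expose.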
Exterminator: Automatically correcting memory errors with high probability
- In Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation, ACM, 2007
- Cited by 31 (3 self)
Programs written in C and C++ are susceptible to memory errors, including buffer overflows and dangling pointers. These errors, which can lead to crashes, erroneous execution, and security vulnerabilities, are notoriously costly to repair. Tracking down their location in the source code is difficult, even when the full memory state of the program is available. Once the errors are finally found, fixing them remains challenging: even for critical security-sensitive bugs, the average time between initial reports and the issuance of a patch is nearly 1 month. We present Exterminator, a system that automatically corrects heap-based memory errors without programmer intervention. Exterminator exploits randomization to pinpoint errors with high precision. From this information, Exterminator derives runtime patches that fix these errors both in current and subsequent executions. In addition, Exterminator enables collaborative bug correction by merging patches generated by multiple users. We present analytical and empirical results that demonstrate Exterminator’s effectiveness at detecting and correcting both injected and real faults.
Cooperative Bug Isolation
- 2004
- Cited by 27 (3 self)
Statistical debugging uses lightweight instrumentation and statistical models to identify program behaviors that are strongly predictive of failure. However, most software is mostly correct; nearly all monitored behaviors are poor predictors of failure. We propose an adaptive monitoring strategy that mitigates the overhead associated with monitoring poor failure predictors. We begin by monitoring a small portion of the program, then automatically refine instrumentation over time to zero in on bugs. We formulate this approach as a search on the control-dependence graph of the program. We present and evaluate various heuristics that can be used for this search. We also discuss the construction of a binary instrumentor for incorporating the feedback loop into post-deployment monitoring. Performance measurements show that adaptive bug isolation yields an average performance overhead of 1% for a class of large applications, as opposed to 87% for realistic sampling-based instrumentation and 300% for complete binary instrumentation.
Sweeper: A lightweight end-to-end system for defending against fast worms
- In Proceedings of 2007 EuroSys Conference
- Cited by 26 (3 self)
The vulnerabilities that plague computers cause endless grief to users. Slammer compromised millions of hosts in minutes; a hit-list worm would take under a second. Recently proposed techniques respond better than manual approaches, but require expensive instrumentation, which limits deployment. Although spreading “antibodies” (e.g., signatures) ameliorates this limitation, hosts depending on antibodies are defenseless until inoculation; to the fastest hit-list worms this delay is crucial. Additionally, most recently proposed techniques cannot recover to provide continuous service after an attack. We propose a novel solution called Sweeper that provides both fast and accurate post-attack analysis and efficient recovery with low normal execution overhead. Sweeper innovatively combines several techniques: (1) Sweeper uses lightweight monitoring techniques to detect a wide array of suspicious requests, providing a first level of defense. (2) By cleverly leveraging lightweight checkpointing, Sweeper postpones heavyweight monitoring until absolutely necessary: after an attack is detected. Sweeper rolls back and re-executes multiple times to dynamically apply heavyweight analysis techniques via dynamic binary instrumentation. Since only the execution involved in the attack is analyzed, the analysis is efficient, yet thorough. (3) Based on the analysis results, Sweeper automatically generates low-overhead antibodies to prevent future attacks of the same vulnerability. (4) Finally, Sweeper again re-executes to perform fast recovery for continuous service. We implement Sweeper in a real system. Our experimental results with three real-world servers and four real security vulnerabilities show that Sweeper can detect an attack and generate antibodies in under 60 milliseconds. Our results also show that Sweeper imposes under 1% overhead during normal execution, clearly suitable for widespread production deployment (especially since Sweeper also allows partial deployment). Finally, we analytically show that, for a
A Case for an Interleaving Constrained Shared-Memory Multi-Processor
- In ISCA, 2009
- Cited by 25 (2 self)
Shared-memory multi-threaded programming is inherently more difficult than single-threaded programming. The main source of complexity is that the threads of an application can interleave in so many different ways. To ensure correctness, a programmer has to test all possible thread interleavings, which, however, is impractical. Many rare thread interleavings remain untested in production systems, and they are the root cause of a majority of concurrency bugs. We propose a shared-memory multiprocessor design that avoids untested interleavings to improve the correctness of a multi-threaded program. Since untested interleavings tend to occur infrequently at runtime, the performance cost of avoiding them is not high. We propose to encode the set of tested correct interleavings in a program’s binary executable using Predecessor Set (PSet) constraints. These constraints are efficiently enforced at runtime using processor support, which ensures that the runtime follows a tested interleaving. We analyze several bugs in open source applications such as MySQL, Apache, Mozilla, etc., and show that, by enforcing PSet constraints, we can avoid not only data races and atomicity violations, but also other forms of concurrency bugs.
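The idea of recording tested interleavings as per-instruction predecessor sets can be sketched in software. This is a simplified reading, not the hardware design in the paper: the trace format, the function names `learn_psets` and `first_violation`, and the rule that every access to the same address counts as a dependence (reads included) are assumptions made for illustration.

```python
from collections import defaultdict

def learn_psets(tested_traces):
    """Build predecessor sets from tested runs.

    Each trace is a list of (thread, instruction, address) accesses. For each
    instruction we record which instructions in OTHER threads were the last to
    touch the same address immediately before it.
    """
    psets = defaultdict(set)
    for trace in tested_traces:
        last = {}  # address -> (thread, instruction) of the most recent access
        for thread, instr, addr in trace:
            prev = last.get(addr)
            if prev is not None and prev[0] != thread:
                psets[instr].add(prev[1])
            last[addr] = (thread, instr)
    return psets

def first_violation(trace, psets):
    """Return the index of the first access whose remote predecessor was never tested."""
    last = {}
    for i, (thread, instr, addr) in enumerate(trace):
        prev = last.get(addr)
        if (prev is not None and prev[0] != thread
                and prev[1] not in psets.get(instr, set())):
            return i
        last[addr] = (thread, instr)
    return None

# Tested run: T1 writes x (w1), then T2 reads x (r2).
tested = [[("T1", "w1", "x"), ("T2", "r2", "x")]]
psets = learn_psets(tested)

# Untested order: T2 reads before T1 writes, so w1 sees an unexpected predecessor.
untested = [("T2", "r2", "x"), ("T1", "w1", "x")]
violation_index = first_violation(untested, psets)
```

A runtime enforcing such constraints would stall or reorder the offending access rather than merely report it; the sketch stops at detection.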
CrystalBall: Predicting and Preventing Inconsistencies in Deployed Distributed Systems
- Cited by 22 (3 self)
We propose a new approach for developing and deploying distributed systems, in which nodes predict distributed consequences of their actions, and use this information to detect and avoid errors. Each node continuously runs a state exploration algorithm on a recent consistent snapshot of its neighborhood and predicts possible future violations of specified safety properties. We describe a new state exploration algorithm, consequence prediction, which explores causally related chains of events that lead to property violation. This paper describes the design and implementation of this approach, termed CrystalBall. We evaluate CrystalBall on RandTree, BulletPrime, Paxos, and Chord distributed system implementations. We identified new bugs in mature Mace implementations of three systems. Furthermore, we show that if the bug is not corrected during system development, CrystalBall is effective in steering the execution away from inconsistent states at runtime.
Debugging in the (Very) Large: Ten Years of Implementation and Experience
- Cited by 19 (1 self)
Windows Error Reporting (WER) is a distributed system that automates the processing of error reports coming from an installed base of a billion machines. WER has collected billions of error reports in ten years of operation. It collects error data automatically and classifies errors into buckets, which are used to prioritize developer effort and report fixes to users. WER uses a progressive approach to data collection, which minimizes overhead for most reports yet allows developers to collect detailed information when needed. WER takes advantage of its scale to use error statistics as a tool in debugging; this allows developers to isolate bugs that could not be found at smaller scale. WER has been designed for large scale: one pair of database servers can record all the errors that occur on all Windows computers worldwide.
Automatically Patching Errors in Deployed Software
- 2009
- Cited by 18 (4 self)
We present ClearView, a system for automatically patching errors in deployed software. ClearView works on stripped Windows x86 binaries without any need for source code, debugging information, or other external information, and without human intervention. ClearView (1) observes normal executions to learn invariants that characterize the application’s normal behavior, (2) uses error detectors to monitor the execution to detect failures, (3) identifies violations of learned invariants that occur during failed executions, (4) generates candidate repair patches that enforce selected invariants by changing the state or the flow of control to make the invariant true, and (5) observes the continued execution of patched applications to select the most successful patch. ClearView is designed to correct errors in software with high availability requirements. Aspects of ClearView that make it particularly
A Randomized Dynamic Program Analysis Technique for Detecting Real Deadlocks
- Cited by 18 (5 self)
We present a novel dynamic analysis technique that finds real deadlocks in multi-threaded programs. Our technique runs in two stages. In the first stage, we use an imprecise dynamic analysis technique to find potential deadlocks in a multi-threaded program by observing an execution of the program. In the second stage, we control a random thread scheduler to create the potential deadlocks with high probability. Unlike other dynamic analysis techniques, our approach has the advantage that it does not give any false warnings. We have implemented the technique in a prototype tool for Java, and have experimented on a number of large multi-threaded Java programs. We report a number of previously known and unknown real deadlocks that were found in these benchmarks.
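The first stage, an imprecise analysis over one observed execution, can be illustrated with a lock-order check. The abstract does not say which analysis the authors use, so this is an assumed stand-in: it records which lock each thread acquires while holding another, and reports lock pairs that two different threads take in opposite orders. The function name `potential_deadlocks` and the event format are hypothetical, and the second stage (the controlled random scheduler) is not sketched.

```python
def potential_deadlocks(events):
    """Report lock pairs acquired in opposite orders by different threads.

    events: list of (thread, "acquire" | "release", lock) from one observed run.
    The result may contain false positives, which is why a confirming second
    stage is needed.
    """
    held = {}      # thread -> locks currently held, in acquisition order
    edges = set()  # (thread, held_lock, newly_acquired_lock)
    for thread, action, lock in events:
        stack = held.setdefault(thread, [])
        if action == "acquire":
            for h in stack:
                edges.add((thread, h, lock))
            stack.append(lock)
        else:
            stack.remove(lock)

    reports = set()
    for t1, a, b in edges:
        for t2, c, d in edges:
            if t1 != t2 and a == d and b == c:
                reports.add(frozenset((a, b)))
    return reports

# T1 takes A then B; T2 takes B then A. The run itself completes without deadlock.
events = [
    ("T1", "acquire", "A"), ("T1", "acquire", "B"),
    ("T1", "release", "B"), ("T1", "release", "A"),
    ("T2", "acquire", "B"), ("T2", "acquire", "A"),
    ("T2", "release", "A"), ("T2", "release", "B"),
]
reports = potential_deadlocks(events)
```

The observed run finishes normally, yet the opposite acquisition orders are flagged as a candidate that a scheduler could then try to turn into a real deadlock.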

