Results 1 - 10
of
20
Making Byzantine Fault Tolerant Systems Tolerate Byzantine Faults
"... This paper is motivated by a simple observation: although recently developed BFT state machine replication protocols are quite fast, they don’t actually tolerate Byznatine faults very well. In particular a single faulty client or server in PBFT, Q/U, HQ, and Zyzzyva can render each of these systems ..."
Abstract
-
Cited by 26 (5 self)
- Add to MetaCart
This paper is motivated by a simple observation: although recently developed BFT state machine replication protocols are quite fast, they don’t actually tolerate Byznatine faults very well. In particular a single faulty client or server in PBFT, Q/U, HQ, and Zyzzyva can render each of these systems effectively unusable for many applications by reducing their throughput by two orders of magnitude or more, from thousands of requests per second to fewer than 10 requests per second. The problem comes not because these systems fail to meet the guarantees they promise, but because the guarantees they promise are insufficient for the high assurance systems for which BFT techniques are likely to be of most interest. In this paper, we describe Aardvark, a new BFT replication protocol that guarantees good performance during uncivil periods, when the network is reliable but when up to f servers and any number of clients are faulty. Aardvark gives up some performance compared to protocols that focus on optimizing for the best case, but Aardvark’s peak throughput of 40527 requests per second seems sufficient for many applications. Because Aardvark is less aggressively tuned for the fault free case, it is guaranteed to remain within a constant factor of 40527 when faults occur. We observe throughputs of between 11706 and 40527 for a broad range of injected faults.
BFT Protocols Under Fire
"... Much recent work on Byzantine state machine replication focuses on protocols with improved performance under benign conditions (LANs, homogeneous replicas, limited crash faults), with relatively little evaluation under typical, practical conditions (WAN delays, packet loss, transient disconnection, ..."
Abstract
-
Cited by 16 (0 self)
- Add to MetaCart
Much recent work on Byzantine state machine replication focuses on protocols with improved performance under benign conditions (LANs, homogeneous replicas, limited crash faults), with relatively little evaluation under typical, practical conditions (WAN delays, packet loss, transient disconnection, shared resources). This makes it difficult for system designers to choose the appropriate protocol for a real target deployment. Moreover, most protocol implementations differ in their choice of runtime environment, crypto library, and transport, hindering direct protocol comparisons even under similar conditions. We present a simulation environment for such protocols that combines a declarative networking system with a robust network simulator. Protocols can be rapidly implemented from pseudocode in the high-level declarative language of the former, while network conditions and (measured) costs of communication packages and crypto primitives can be plugged into the latter. We show that the resulting simulator faithfully predicts the performance of native protocol implementations, both as published and as measured in our local network. We use the simulator to compare representative protocols under identical conditions and rapidly explore the effects of changes in the costs of crypto operations, workloads, network conditions and faults. For example, we show that Zyzzyva outperforms protocols like PBFT and Q/U under most but not all conditions, indicating that one-size-fits-all protocols may be hard if not impossible to design in practice. 1
Refined quorum systems
- In Proceedings of the 26th annual ACM symposium on Principles of distributed computing
, 2007
"... Abstract. It is considered good distributed computing practice to devise object implementations that tolerate contention, periods of asynchrony and a large number of failures, but perform fast if few failures occur, the system is synchronous and there is no contention. This paper initiates the first ..."
Abstract
-
Cited by 12 (4 self)
- Add to MetaCart
Abstract. It is considered good distributed computing practice to devise object implementations that tolerate contention, periods of asynchrony and a large number of failures, but perform fast if few failures occur, the system is synchronous and there is no contention. This paper initiates the first study of quorum systems that help design such implementations by encompassing, at the same time, optimal resilience, as well as optimal best-case complexity. We introduce the notion of a refined quorum system (RQS) of some set S as a set of three classes of subsets (quorums) of S: first class quorums are also second class quorums, themselves being also third class quorums. First class quorums have large intersections with all other quorums, second class quorums typically have smaller intersections with those of the third class, the latter simply correspond to traditional quorums. Intuitively, under uncontended and synchronous conditions, a distributed object implementation would expedite an operation if a quorum of the first class is accessed, then degrade gracefully depending on whether a quorum of the second or the third class is accessed. Our notion of refined quorum system is devised assuming a general adversary structure, and this basically allows algorithms relying on refined quorum systems to relax the assumption of independent process failures, often questioned in practice.
Trusting the Cloud
"... More and more users store data in “clouds ” that are accessed remotely over the Internet. We survey well-known cryptographic tools for providing integrity and consistency for data stored in clouds and discuss recent research in cryptography and distributed computing addressing these problems. Storin ..."
Abstract
-
Cited by 10 (1 self)
- Add to MetaCart
More and more users store data in “clouds ” that are accessed remotely over the Internet. We survey well-known cryptographic tools for providing integrity and consistency for data stored in clouds and discuss recent research in cryptography and distributed computing addressing these problems. Storing data in clouds Many providers now offer a wide variety of flexible online data storage services, ranging from passive ones, such as online archiving, to active ones, such as collaboration and social networking. They have become known as computing and storage “clouds. ” Such clouds allow users to abandon local storage and use online alternatives, such as Amazon S3, Nirvanix CloudNAS, or Microsoft SkyDrive. Some cloud providers utilize the fact that online storage can be accessed from any location connected to the Internet, and offer additional functionality; for example, Apple MobileMe allows users to synchronize common applications that run on multiples devices. Clouds also offer computation resources, such as Amazon EC2, which can significantly reduce the cost of maintaining such resources locally. Finally, online collaboration tools, such as Google Apps or versioning repositories for source code, make it easy to collaborate with colleagues across organizations and countries, as practiced by the authors of this paper. What can go wrong? Although the advantages of using clouds are unarguable, there are many risks involved with releasing control over your data. One concern that many users are aware of is loss of privacy. Nevertheless, the popularity of social networks and online data sharing repositories suggests that many users are willing to forfeit privacy,
Depot: Cloud storage with minimal trust
"... Abstract: We describe the design, implementation, and evaluation of Depot, a cloud storage system that minimizes trust assumptions. Depot assumes less than any prior system about the correct operation of participating hosts—Depot tolerates Byzantine failures, including malicious or buggy behavior, b ..."
Abstract
-
Cited by 7 (0 self)
- Add to MetaCart
Abstract: We describe the design, implementation, and evaluation of Depot, a cloud storage system that minimizes trust assumptions. Depot assumes less than any prior system about the correct operation of participating hosts—Depot tolerates Byzantine failures, including malicious or buggy behavior, by any number of clients or servers—yet provides safety and availability guarantees (on consistency, staleness, durability, and recovery) that are useful. The key to safeguarding safety without sacrificing availability (and vice versa) in this environment is to join forks: participants (clients and servers) that observe inconsistent behaviors by other participants can join their forked view into a single view that is consistent with what each individually observed. Our experimental evaluation suggests that the costs of protecting the system are modest. Depot adds a few hundred bytes of metadata to each update and each stored object, and requires hashing and signing each update. 1
Venus: Verification for untrusted cloud storage
- In Proc. ACM CCSW
, 2010
"... This paper presents Venus, a service for securing user interaction with untrusted cloud storage. Specifically, Venus guarantees integrity and consistency for applications accessing a key-based object store service, without requiring trusted components or changes to the storage provider. Venus comple ..."
Abstract
-
Cited by 6 (1 self)
- Add to MetaCart
This paper presents Venus, a service for securing user interaction with untrusted cloud storage. Specifically, Venus guarantees integrity and consistency for applications accessing a key-based object store service, without requiring trusted components or changes to the storage provider. Venus completes all operations optimistically, guaranteeing data integrity. It then verifies operation consistency and notifies the application. Whenever either integrity or consistency is violated, Venus alerts the application. We implemented Venus and evaluated it with Amazon S3 commodity storage service. The evaluation shows that it adds no noticeable overhead to storage operations.
Tolerating latency in replicated state machines through client speculation
"... Replicated state machines are an important and widelystudied methodology for tolerating a wide range of faults. Unfortunately, while replicas should be distributed geographically for maximum fault tolerance, current replicated state machine protocols tend to magnify the effects of high network laten ..."
Abstract
-
Cited by 6 (0 self)
- Add to MetaCart
Replicated state machines are an important and widelystudied methodology for tolerating a wide range of faults. Unfortunately, while replicas should be distributed geographically for maximum fault tolerance, current replicated state machine protocols tend to magnify the effects of high network latencies caused by geographic distribution. In this paper, we examine how to use speculative execution at the clients of a replicated service to reduce the impact of network and protocol latency. We first give design principles for using client speculation with replicated services, such as generating early replies and prioritizing throughput over latency. We then describe a mechanism that allows speculative clients to make new requests through replica-resolved speculation and predicated writes. We implement a detailed case study that applies this approach to a standard Byzantine fault tolerant protocol (PBFT) for replicated NFS and counter services. Client speculation trades in 18 % maximum throughput to decrease the effective latency under light workloads, letting us speed up run time on singleclient micro-benchmarks 1.08–19 × when the client is co-located with the primary. On a macro-benchmark, reduced latency gives the client a speedup of up to 5×. 1
Tiered Fault Tolerance for Long-Term Integrity
"... Fault-tolerant services typically make assumptions about the type and maximum number of faults that they can tolerate while providing their correctness guarantees; when such a fault threshold is violated, correctness is lost. We revisit the notion of fault thresholds in the context of long-term arch ..."
Abstract
-
Cited by 4 (2 self)
- Add to MetaCart
Fault-tolerant services typically make assumptions about the type and maximum number of faults that they can tolerate while providing their correctness guarantees; when such a fault threshold is violated, correctness is lost. We revisit the notion of fault thresholds in the context of long-term archival storage. We observe that fault thresholds are inevitably violated in longterm services, making traditional fault tolerance inapplicable to the long-term. In this work, we undertake a “reallocation of the fault-tolerance budget ” of a long-term service. We split the service into service pieces, each of which can tolerate a different number of faults without failing (and without causing the whole service to fail): each piece can be either in a critical trusted fault tier, which must never fail, or an untrusted fault tier, which can fail massively and often, or other fault tiers in between. By carefully engineering the split of a long-term service into pieces that must obey distinct fault thresholds, we can prolong its inevitable demise. We demonstrate this approach with Bonafide, a long-term key-value store that, unlike all similar systems proposed in the literature, maintains integrity in the face of Byzantine faults without requiring self-certified data. We describe the notion of tiered fault tolerance, the design, implementation, and experimental evaluation of Bonafide, and argue that our approach is a practical yet significant improvement over the state of the art for long-term services. 1
Making Byzantine Fault Tolerant Systems Tolerate Byzantine Faults
"... This paper argues for a new approach to building Byzantine fault tolerant systems. We observe that although recently developed BFT state machine replication protocols are quite fast, they don’t actually tolerate Byzantine faults very well: a single faulty client or server is capable of rendering PB ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
This paper argues for a new approach to building Byzantine fault tolerant systems. We observe that although recently developed BFT state machine replication protocols are quite fast, they don’t actually tolerate Byzantine faults very well: a single faulty client or server is capable of rendering PBFT, Q/U, HQ, and Zyzzyva virtually unusable. In this paper, we (1) demonstrate that existing protocols are dangerously fragile, (2) define a set of principles for constructing BFT services that remain useful even when Byzantine faults occur, and (3) apply these new principles to construct a new protocol, Aardvark, which can achieve peak performance within 25 % of that of the best existing protocol in our tests and which provides a significant fraction of that performance when the network is well behaved and up to f servers and any number of clients are faulty. We observe useful throughputs between 11706 and 38667 for a broad range of injected faults.
Prophecy: Using History for High-Throughput Fault Tolerance
- 7TH USENIX SYMPOSIUM ON NETWORK DESIGN AND IMPLEMENTATION (NSDI ’10)
, 2010
"... Byzantine fault-tolerant (BFT) replication has enjoyed a series of performance improvements, but remains costly due to its replicated work. We eliminate this cost for read-mostly workloads through Prophecy, a system that interposes itself between clients and any replicated service. At Prophecy’s cor ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
Byzantine fault-tolerant (BFT) replication has enjoyed a series of performance improvements, but remains costly due to its replicated work. We eliminate this cost for read-mostly workloads through Prophecy, a system that interposes itself between clients and any replicated service. At Prophecy’s core is a trusted sketcher component, designed to extend the semi-trusted load balancer that mediates access to an Internet service. The sketcher performs fast, load-balanced reads when results are historically consistent, and slow, replicated reads otherwise. Despite its simplicity, Prophecy provides a new form of consistency called delay-once consistency. Along the way, we derive a distributed variant of Prophecy that achieves the same consistency but without any trusted components. A prototype implementation demonstrates Prophecy’s high throughput compared to BFT systems. We also describe and evaluate Prophecy’s ability to scale-out to support large replica groups or multiple replica groups. As Prophecy is most effective when state updates are rare, we finally present a measurement study of popular websites that demonstrates a large proportion of static data.

