Results 1 - 10
of
85
Zyzzyva: Speculative byzantine fault tolerance
- In Symposium on Operating Systems Principles (SOSP
, 2007
"... We present Zyzzyva, a protocol that uses speculation to reduce the cost and simplify the design of Byzantine fault tolerant state machine replication. In Zyzzyva, replicas respond to a client’s request without first running an expensive three-phase commit protocol to reach agreement on the order in ..."
Abstract
-
Cited by 78 (10 self)
- Add to MetaCart
We present Zyzzyva, a protocol that uses speculation to reduce the cost and simplify the design of Byzantine fault tolerant state machine replication. In Zyzzyva, replicas respond to a client’s request without first running an expensive three-phase commit protocol to reach agreement on the order in which the request must be processed. Instead, they optimistically adopt the order proposed by the primary and respond immediately to the client. Replicas can thus become temporarily inconsistent with one another, but clients detect inconsistencies, help correct replicas converge on a single total ordering of requests, and only rely on responses that are consistent with this total order. This approach allows Zyzzyva to reduce replication overheads to near their theoretical minima.
Building a Reactive Immune System for Software Services
- In Proceedings of the USENIX Annual Technical Conference
, 2004
"... We propose a new approach for reacting to a wide variety of software failures, ranging from remotely exploitable vulnerabilities to more mundane bugs that cause abnormal program termination (e.g., illegal memory dereference). Our emphasis is in creating "self-healing" software that can protect itsel ..."
Abstract
-
Cited by 76 (25 self)
- Add to MetaCart
We propose a new approach for reacting to a wide variety of software failures, ranging from remotely exploitable vulnerabilities to more mundane bugs that cause abnormal program termination (e.g., illegal memory dereference). Our emphasis is in creating "self-healing" software that can protect itself against a recurring fault until a more comprehensive fix is applied.
HQ replication: A hybrid quorum protocol for byzantine fault tolerance
- In Proc. OSDI
, 2006
"... There are currently two approaches to providing Byzantine-fault-tolerant state machine replication: a replica-based approach, e.g., BFT, that uses communication between replicas to agree on a proposed ordering of requests, and a quorum-based approach, such as Q/U, in which clients contact replicas d ..."
Abstract
-
Cited by 69 (7 self)
- Add to MetaCart
There are currently two approaches to providing Byzantine-fault-tolerant state machine replication: a replica-based approach, e.g., BFT, that uses communication between replicas to agree on a proposed ordering of requests, and a quorum-based approach, such as Q/U, in which clients contact replicas directly to optimistically execute operations. Both approaches have shortcomings: the quadratic cost of inter-replica communication is unnecessary when there is no contention, and Q/U requires a large number of replicas and performs poorly under contention. We present HQ, a hybrid Byzantine-fault-tolerant state machine replication protocol that overcomes these problems. HQ employs a lightweight quorum-based protocol when there is no contention, but uses BFT to resolve contention when it arises. Furthermore, HQ uses only 3f +1 replicas to tolerate f faults, providing optimal resilience to node failures. We implemented a prototype of HQ, and we compare its performance to BFT and Q/U analytically and experimentally. Additionally, in this work we use a new implementation of BFT designed to scale as the number of faults increases. Our results show that both HQ and our new implementation of BFT scale as f increases; additionally our hybrid approach of using BFT to handle contention works well. 1
PeerReview: Practical accountability for distributed systems
"... We describe PeerReview, a system that provides accountability in distributed systems. PeerReview ensures that Byzantine faults whose effects are observed by a correct node are eventually detected and irrefutably linked to a faulty node. At the same time, PeerReview ensures that a correct node can al ..."
Abstract
-
Cited by 62 (8 self)
- Add to MetaCart
We describe PeerReview, a system that provides accountability in distributed systems. PeerReview ensures that Byzantine faults whose effects are observed by a correct node are eventually detected and irrefutably linked to a faulty node. At the same time, PeerReview ensures that a correct node can always defend itself against false accusations. These guarantees are particularly important for systems that span multiple administrative domains, which may not trust each other. PeerReview works by maintaining a secure record of the messages sent and received by each node. The record is used to automatically detect when a node’s behavior deviates from that of a given reference implementation, thus exposing faulty nodes. PeerReview is widely applicable: it only requires that a correct node’s actions are deterministic, that nodes can sign messages, and that each node is periodically checked by a correct node. We demonstrate that Peer-Review is practical by applying it to three different types of distributed systems: a network filesystem, a peer-to-peer system, and an overlay multicast system.
Attested append-only memory: Making adversaries stick to their word
- In Proc. of SOSP
, 2007
"... Researchers have made great strides in improving the fault tolerance of both centralized and replicated systems against arbitrary (Byzantine) faults. However, there are hard limits to how much can be done with entirely untrusted components; for example, replicated state machines cannot tolerate more ..."
Abstract
-
Cited by 45 (7 self)
- Add to MetaCart
Researchers have made great strides in improving the fault tolerance of both centralized and replicated systems against arbitrary (Byzantine) faults. However, there are hard limits to how much can be done with entirely untrusted components; for example, replicated state machines cannot tolerate more than a third of their replica population being Byzantine. In this paper, we investigate how minimal trusted abstractions can push through these hard limits in practical ways. We propose Attested Append-Only Memory (A2M), a trusted system facility that is small, easy to implement and easy to verify formally. A2M provides the programming abstraction of a trusted log, which leads to protocol designs immune to equivocation – the ability of a faulty host to lie in different ways to different clients or servers – which is a common source of Byzantine headaches. Using A2M, we improve upon the state of the art in Byzantine-fault tolerant replicated state machines, producing A2M-enabled protocols (variants of Castro and Liskov’s PBFT) that remain correct (linearizable) and keep making progress (live) even when half the replicas are faulty, in contrast to the previous upper bound. We also present an A2M-enabled single-server shared storage protocol that guarantees linearizability despite server faults. We implement A2M and our protocols, evaluate them experimentally through micro- and macro-benchmarks, and argue that the improved fault tolerance is cost-effective for a broad range of uses, opening up new avenues for practical, more reliable services.
Making Byzantine Fault Tolerant Systems Tolerate Byzantine Faults
"... This paper is motivated by a simple observation: although recently developed BFT state machine replication protocols are quite fast, they don’t actually tolerate Byznatine faults very well. In particular a single faulty client or server in PBFT, Q/U, HQ, and Zyzzyva can render each of these systems ..."
Abstract
-
Cited by 26 (5 self)
- Add to MetaCart
This paper is motivated by a simple observation: although recently developed BFT state machine replication protocols are quite fast, they don’t actually tolerate Byznatine faults very well. In particular a single faulty client or server in PBFT, Q/U, HQ, and Zyzzyva can render each of these systems effectively unusable for many applications by reducing their throughput by two orders of magnitude or more, from thousands of requests per second to fewer than 10 requests per second. The problem comes not because these systems fail to meet the guarantees they promise, but because the guarantees they promise are insufficient for the high assurance systems for which BFT techniques are likely to be of most interest. In this paper, we describe Aardvark, a new BFT replication protocol that guarantees good performance during uncivil periods, when the network is reliable but when up to f servers and any number of clients are faulty. Aardvark gives up some performance compared to protocols that focus on optimizing for the best case, but Aardvark’s peak throughput of 40527 requests per second seems sufficient for many applications. Because Aardvark is less aggressively tuned for the fault free case, it is guaranteed to remain within a constant factor of 40527 when faults occur. We observe throughputs of between 11706 and 40527 for a broad range of injected faults.
High throughput Byzantine fault tolerance
- In DSN
, 2004
"... We argue for a simple change to Byzantine Fault Tolerant state machine replication libraries in order to provide high throughput. Traditional state machine replication based Byzantine fault tolerant (BFT) techniques provide high availability and security but fail to provide high throughput. This lim ..."
Abstract
-
Cited by 23 (9 self)
- Add to MetaCart
We argue for a simple change to Byzantine Fault Tolerant state machine replication libraries in order to provide high throughput. Traditional state machine replication based Byzantine fault tolerant (BFT) techniques provide high availability and security but fail to provide high throughput. This limitation stems from the fundamental assumption of generalized state machine replication techniques that all replicas execute requests sequentially in the same total order to ensure consistency across replicas. We propose a high throughput Byzantine fault tolerant architecture that uses application-specific information to identify and concurrently execute independent requests. Our architecture thus provides a general way to exploit application parallelism in order to provide high throughput without compromising correctness. Although this approach is extremely simple, it yields dramatic practical benefits. When sufficient application concurrency and hardware resources exist, CBASE, our system prototype, provides orders of magnitude improvements in throughput over BASE, a traditional BFT architecture. CBASE-FS, a Byzantine fault tolerant file system that uses CBASE, achieves twice the throughput of BASE-FS for the IOZone micro-benchmarks even in a configuration with modest available hardware parallelism. 1
Thema: Byzantine-fault-tolerant middleware for web services applications
- In Proceedings of the IEEE Symposium on Reliable Distributed Systems
, 2005
"... Distributed applications composed of collections of Web Services may call for diverse levels of reliability in different parts of the system. Byzantine fault tolerance (BFT) is a general strategy that has recently been shown to be practical for the development of certain classes of survivable, clien ..."
Abstract
-
Cited by 22 (1 self)
- Add to MetaCart
Distributed applications composed of collections of Web Services may call for diverse levels of reliability in different parts of the system. Byzantine fault tolerance (BFT) is a general strategy that has recently been shown to be practical for the development of certain classes of survivable, client–server, distributed applications; however, little research has been done on incorporating it into selective parts of multi-tier, distributed applications like Web Services that have heterogeneous reliability requirements. To understand the impacts of combining BFT and Web Services, we have created Thema, a new BFT middleware system that extends the BFT and Web Services technologies to provide a structured way to build Byzantine-fault-tolerant, survivable Web Services that application developers can use like other Web Services. From a reliability perspective, our enhancements are also novel in that they allow Byzantine-fault-tolerant services: (1) to support the multi-tiered requirements of Web Services, and (2) to provide standardized Web Services support for their own clients (through WSDL interfaces and SOAP communication). In this paper we study key architectural implications of combining BFT with Web services and provide a performance evaluation of Thema using the TPC-W benchmark. 1
Scaling byzantine fault-tolerant replication to wide area networks
- In DSN ’06: Proceedings of the International Conference on Dependable Systems and Networks (DSN’06
, 2006
"... Abstract — This paper presents the first hierarchical Byzantine tolerant replication architecture suitable to systems that span multiple wide area sites. The architecture confines the effects of any malicious replica to its local site, reduces message complexity of wide area communication, and allow ..."
Abstract
-
Cited by 21 (10 self)
- Add to MetaCart
Abstract — This paper presents the first hierarchical Byzantine tolerant replication architecture suitable to systems that span multiple wide area sites. The architecture confines the effects of any malicious replica to its local site, reduces message complexity of wide area communication, and allows read-only queries to be performed locally within a site for the price of additional hardware. A prototype implementation is evaluated over several network topologies and is compared with a flat Byzantine tolerant approach. I.
Tolerating Byzantine Faults in Transaction Processing Systems using Commit Barrier Scheduling ABSTRACT
"... This paper describes the design, implementation, and evaluation of a replication scheme to handle Byzantine faults in transaction processing database systems. The scheme compares answers from queries and updates on multiple replicas which are unmodified, off-the-shelf systems, to provide a single da ..."
Abstract
-
Cited by 17 (0 self)
- Add to MetaCart
This paper describes the design, implementation, and evaluation of a replication scheme to handle Byzantine faults in transaction processing database systems. The scheme compares answers from queries and updates on multiple replicas which are unmodified, off-the-shelf systems, to provide a single database that is Byzantine fault tolerant. The scheme works when the replicas are homogeneous, but it also allows heterogeneous replication in which replicas come from different vendors. Heterogeneous replicas reduce the impact of bugs and security compromises because they are implemented independently and are thus less likely to suffer correlated failures. The main challenge in designing a replication scheme for transaction processing systems is ensuring that the different replicas execute transactions in equivalent serial orders while allowing a high degree of concurrency. Our scheme meets this goal using a novel concurrency control protocol, commit barrier scheduling (CBS). We have implemented CBS in the context of a replicated SQL database, HRDB (Heterogeneous Replicated DB), which has been tested with unmodified production versions of several commercial and open source databases as replicas. Our experiments show an HRDB configuration that can tolerate one faulty replica has only a modest performance overhead (about 17 % for the TPC-C benchmark). HRDB successfully masks several Byzantine faults observed in practice and we have used it to find a new bug in MySQL.

