Results 1 - 10 of 40
Small Byzantine Quorum Systems
- Distributed Computing, 2001
Cited by 366 (48 self)
In this paper we present two protocols for asynchronous Byzantine Quorum Systems (BQS) built on top of reliable channels, one for self-verifying data and the other for any data. Our protocols tolerate Byzantine failures with fewer servers than existing solutions by eliminating nonessential work in the write protocol and by using read and write quorums of different sizes. Since engineering a reliable network layer on an unreliable network is difficult, two other possibilities must be explored. The first is to strengthen the model by allowing synchronous networks that use time-outs to identify failed links or machines. We consider running synchronous and asynchronous Byzantine Quorum protocols over synchronous networks and conclude that, surprisingly, "self-timing" asynchronous Byzantine protocols may offer significant advantages for many synchronous networks when network time-outs are long. We show how to extend an existing Byzantine Quorum protocol to eliminate its dependency on reliable networking and to handle message loss and retransmission explicitly.
Fault-scalable Byzantine fault-tolerant services
- In Proceedings of the 20th ACM Symposium on Operating Systems Principles, 2005
Cited by 92 (6 self)
A fault-scalable service can be configured to tolerate increasing numbers of faults without significant decreases in performance. The Query/Update (Q/U) protocol is a new tool that enables construction of fault-scalable Byzantine fault-tolerant services. The optimistic quorum-based nature of the Q/U protocol allows it to provide better throughput and fault-scalability than replicated state machines using agreement-based protocols. A prototype service built using the Q/U protocol outperforms the same service built using a popular replicated state machine implementation at all system sizes in experiments that permit an optimistic execution. Moreover, the performance of the Q/U protocol decreases by only 36% as the number of Byzantine faults tolerated increases from one to five, whereas the performance of the replicated state machine decreases by 83%.
Secure and Scalable Replication in Phalanx
- In Proceedings of the 17th IEEE Symposium on Reliable Distributed Systems, 1998
Cited by 83 (8 self)
Dahlia Malkhi, Michael K. Reiter. AT&T Labs Research, Florham Park, NJ, USA. {dalia,reiter}@research.att.com
Phalanx is a software system for building a persistent, survivable data repository that supports shared data abstractions (e.g., variables, mutual exclusion) for clients. Phalanx implements data abstractions that ensure useful properties without trusting the servers supporting these abstractions or the clients accessing them, i.e., Phalanx can survive even the arbitrarily malicious corruption of clients and (some number of) servers. At the core of the system are survivable replication techniques that enable efficient scaling to hundreds of Phalanx servers. In this paper we describe the implementation of some of the data abstractions provided by Phalanx, discuss their ability to scale to large systems, and describe an example application.
Efficient Byzantine-Tolerant Erasure-Coded Storage
- Proceedings of the International Conference on Dependable Systems and Networks, June 2004
Cited by 73 (12 self)
This paper describes a decentralized consistency protocol for survivable storage that exploits local data versioning within each storage-node. Such versioning enables the protocol to efficiently provide linearizability and wait-freedom of read and write operations to erasure-coded data in asynchronous environments with Byzantine failures of clients and servers. By exploiting versioning storage-nodes, the protocol shifts most work to clients and allows highly optimistic operation: reads occur in a single round-trip unless clients observe concurrency or write failures. Measurements of a storage system prototype using this protocol show that it scales well with the number of failures tolerated, and its performance compares favorably with an efficient implementation of Byzantine-tolerant state machine replication.
Fault Detection for Byzantine Quorum Systems
- 1999
Cited by 32 (12 self)
In this paper we explore techniques to detect Byzantine server failures in asynchronous replicated data services. Our goal is to detect arbitrary failures of data servers in a system where each client accesses the replicated data at only a subset (quorum) of servers in each operation. In such a system, some correct servers can be out of date after a write and can therefore return values other than the most up-to-date value in response to a client's read request, thus complicating the task of determining the number of faulty servers in the system at any point in time. We initiate the study of detecting server failures in this context, and propose two statistical approaches for estimating the risk posed by faulty servers based on responses to read requests.
Persistent objects in the Fleet system
- In DISCEX II, 2001
Cited by 30 (7 self)
Fleet is a middleware system implementing a distributed repository for persistent Java objects. Fleet is primarily targeted for supporting highly critical applications: in particular, the objects it stores maintain correct semantics despite the arbitrary failure (including hostile corruption) of a limited number of Fleet servers and, for some object types, of clients allowed to invoke methods on those objects. Fleet is designed to be highly available, dynamically extensible with new object types, and scalable to large numbers of servers and clients. Previous papers described the replication technology underlying Fleet; in this paper we describe the design of Fleet objects, including how new objects are introduced into the system, how they are named, and their default semantics.
Quorum Systems in Replicated Databases: Science or Fiction?
- Bull. IEEE Technical Committee on Data Engineering, 1998
Cited by 23 (1 self)
A quorum system is a collection of subsets of servers, every two of which intersect. Quorum systems have been suggested as a tool for concurrency control in replicated databases almost twenty years ago. They promised to guarantee strict consistency and to provide high availability and fault-tolerance in the face of server crashes and network partitions. Despite these promises, current commercial replicated databases typically do not use quorum systems. Instead they use mechanisms which guarantee much weaker consistency, if any. Moreover, the interest in quorum systems seems to be waning even in the database research community. This paper...
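The definition at the start of this abstract (a collection of subsets of servers, every two of which intersect) can be illustrated with a small sketch. This is not taken from the paper: the function names `is_quorum_system` and `majority_quorums` and the five-server example are hypothetical, and majority quorums are used only because they are a standard construction that satisfies the definition.

```python
from itertools import combinations


def is_quorum_system(quorums):
    """Return True if every two quorums share at least one server."""
    sets = [frozenset(q) for q in quorums]
    return all(a & b for a, b in combinations(sets, 2))


def majority_quorums(servers):
    """Build all subsets of size floor(n/2) + 1 (hypothetical helper).

    Two subsets of that size cannot be disjoint, because together they
    would need more than n distinct servers.
    """
    size = len(servers) // 2 + 1
    return [frozenset(c) for c in combinations(servers, size)]


# Illustrative server names only.
servers = ["s1", "s2", "s3", "s4", "s5"]
majorities = majority_quorums(servers)

# Two disjoint subsets violate the pairwise-intersection property.
disjoint = [{"s1", "s2"}, {"s3", "s4"}]
```

Here `is_quorum_system(majorities)` holds, while `is_quorum_system(disjoint)` does not, since the two subsets in `disjoint` have no server in common.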
A Dynamic Primary Configuration Group Communication Service
- 1999
Cited by 21 (8 self)
Quorum-based methods for managing replicated data are popular because they provide availability of both reads and writes in the presence of faulty behavior by some sites or communication links. Over a very long time, it may be...
On Correlated Failures in Survivable Storage Systems
- 2002
Cited by 20 (0 self)
The design of survivable storage systems involves inherent trade-offs among properties such as performance, security, and availability. A toolbox of simple and accurate models of these properties allows a designer to make informed decisions. This report focuses on availability modeling. We describe two ways of extending the classic model of availability with a single "correlation parameter" to accommodate correlated failures. We evaluate the efficacy of the models by comparing their results with real measurements. We also show the use of the models as design decision tools: we analyze the effects of availability and correlation on the ordering of data distribution schemes and we investigate the placement of related files.

