Results 1 - 10
of
76
Small Byzantine Quorum Systems
- DISTRIBUTED COMPUTING
, 2001
"... In this paper we present two protocols for asynchronous Byzantine Quorum Systems (BQS) built on top of reliable channels---one for self-verifying data and the other for any data. Our protocols tolerate Byzantine failures with fewer servers than existing solutions by eliminating nonessential work in ..."
Abstract
-
Cited by 366 (48 self)
- Add to MetaCart
In this paper we present two protocols for asynchronous Byzantine Quorum Systems (BQS) built on top of reliable channels---one for self-verifying data and the other for any data. Our protocols tolerate Byzantine failures with fewer servers than existing solutions by eliminating nonessential work in the write protocol and by using read and write quorums of different sizes. Since engineering a reliable network layer on an unreliable network is difficult, two other possibilities must be explored. The first is to strengthen the model by allowing synchronous networks that use time-outs to identify failed links or machines. We consider running synchronous and asynchronous Byzantine Quorum protocols over synchronous networks and conclude that, surprisingly, "self-timing" asynchronous Byzantine protocols may offer significant advantages for many synchronous networks when network time-outs are long. We show how to extend an existing Byzantine Quorum protocol to eliminate its dependency on reliable networking and to handle message loss and retransmission explicitly.
Atomic Snapshots of Shared Memory
, 1993
"... . This paper introduces a general formulation of atomic snapshot memory, a shared memory partitioned into words written (updated) by individual processes, or instantaneously read (scanned) in its entirety. This paper presents three wait-free implementations of atomic snapshot memory. The first imple ..."
Abstract
-
Cited by 148 (42 self)
- Add to MetaCart
. This paper introduces a general formulation of atomic snapshot memory, a shared memory partitioned into words written (updated) by individual processes, or instantaneously read (scanned) in its entirety. This paper presents three wait-free implementations of atomic snapshot memory. The first implementation in this paper uses unbounded (integer) fields in these registers, and is particularly easy to understand. The second implementation uses bounded registers. Its correctness proof follows the ideas of the unbounded implementation. Both constructions implement a single-writer snapshot memory, in which each word may be updated by only one process, from single-writer, n-reader registers. The third algorithm implements a multi-writer snapshot memory from atomic n-writer, n-reader registers, again echoing key ideas from the earlier constructions. All operations require \Theta(n 2 ) reads and writes to the component shared registers in the worst case. Categories and Subject Discriptors:...
The asynchronous computability theorem for tresilient tasks
- In Proceedings of the 1993 ACM Symposium on Theory of Computing
, 1993
"... We give necessary and sufficient combinatorial conditions characterizing the computational tasks that can be solved by N asynchronous processes, up to t of which can fail by halting. The range of possible input and output values for an asynchronous task can be associated with a high-dimensional geom ..."
Abstract
-
Cited by 95 (14 self)
- Add to MetaCart
We give necessary and sufficient combinatorial conditions characterizing the computational tasks that can be solved by N asynchronous processes, up to t of which can fail by halting. The range of possible input and output values for an asynchronous task can be associated with a high-dimensional geometric structure called a simplicial complex. Our main theorem characterizes computability y in terms of the topological properties of this complex. Most notably, a given task is computable only if it can be associated with a complex that is simply connected with trivial homology groups. In other words, the complex has “no holes!” Applications of this characterization include the first impossibility results for several long-standing open problems in distributed computing, such as the “renaming ” problem of Attiya et. al., the “k-set agreement ” problem of Chaudhuri, and a generalization of the approximate agreement problem. 1
FAB: Building Distributed Enterprise Disk Arrays from Commodity Components
, 2004
"... This paper describes the design, implementation, and evaluation of a Federated Array of Bricks (FAB), a distributed disk array that provides the reliability of traditional enterprise arrays with lower cost and better scalability. FAB is built from a collection of bricks, small storage appliances con ..."
Abstract
-
Cited by 92 (7 self)
- Add to MetaCart
This paper describes the design, implementation, and evaluation of a Federated Array of Bricks (FAB), a distributed disk array that provides the reliability of traditional enterprise arrays with lower cost and better scalability. FAB is built from a collection of bricks, small storage appliances containing commodity disks, CPU, NVRAM, and network interface cards. FAB deploys a new majority-votingbased algorithm to replicate or erasure-code logical blocks across bricks and a reconfiguration algorithm to move data in the background when bricks are added or decommissioned. We argue that voting is practical and necessary for reliable, high-throughput storage systems such as FAB. We have implemented a FAB prototype on a 22-node Linux cluster. This prototype sustains 85MB/second of throughput for a database workload, and 270MB/second for a bulk-read workload. In addition, it can outperform traditional masterslave replication through performance decoupling and can handle brick failures and recoveries smoothly without disturbing client requests.
Disk Paxos
"... We present an algorithm, called Disk Paxos, for implementing a reliable distributed system with a network of processors and disks. Like the original Paxos algorithm, Disk Paxos maintains consistency in the presence of arbitrary non-Byzantine faults. Progress can be guaranteed as long as a majori ..."
Abstract
-
Cited by 50 (1 self)
- Add to MetaCart
We present an algorithm, called Disk Paxos, for implementing a reliable distributed system with a network of processors and disks. Like the original Paxos algorithm, Disk Paxos maintains consistency in the presence of arbitrary non-Byzantine faults. Progress can be guaranteed as long as a majority of the disks are available, even if all processors but one have failed.
An Architecture for Survivable Coordination in Large Distributed Systems
- IEEE Transactions on Knowledge and Data Engineering
, 2000
"... Coordination among processes in a distributed system can be rendered very complex in a large-scale system where messages may be delayed or lost, and when processes may participate only transiently or behave arbitrarily, e.g., after suffering a security breach. In this paper, we propose a scalable ar ..."
Abstract
-
Cited by 49 (15 self)
- Add to MetaCart
Coordination among processes in a distributed system can be rendered very complex in a large-scale system where messages may be delayed or lost, and when processes may participate only transiently or behave arbitrarily, e.g., after suffering a security breach. In this paper, we propose a scalable architecture to support coordination in such extreme conditions. Our architecture consists of a collection of persistent data servers that implement simple shared data abstractions for clients, without trusting the clients or even the servers themselves. We show that by interacting with these untrusted servers, clients can solve distributed consensus, a powerful and fundamental coordination primitive. Our architecture is very practical, and we describe the implementation of its main components in a system called Phalanx. 1 Introduction In this paper we propose a system architecture that supports efficient and scalable coordination among distributed clients. Our architecture strives to support...
Round-by-round fault detectors: Unifying synchrony and asynchrony
- In Proc of the 17th ACM Symp. Principles of Distributed Computing (PODC
, 1998
"... and insights. 1 Introduction For many years, researchers studying synchronous message-passing systems have considered algorithms composed of rounds of computation. In each round, a process sends a message to the others and then waits to receive messages from the other processes. The synchronous natu ..."
Abstract
-
Cited by 43 (7 self)
- Add to MetaCart
and insights. 1 Introduction For many years, researchers studying synchronous message-passing systems have considered algorithms composed of rounds of computation. In each round, a process sends a message to the others and then waits to receive messages from the other processes. The synchronous nature of the system ensures that, by the end of the round, each process receives all messages sent to it in that round by correct processes. In the parlance of Elrad and Frances [1] then, each round of a synchronous system is a communication-closed-layer.
Performing Work Efficiently In The Presence Of Faults
- SIAM Journal on Computing
, 1992
"... . We consider a system of t synchronous processes that communicate only by sending messages to one another, and that together must perform n independent units of work. Processes may fail by crashing; we want to guarantee that in every execution of the protocol in which at least one process survives, ..."
Abstract
-
Cited by 42 (0 self)
- Add to MetaCart
. We consider a system of t synchronous processes that communicate only by sending messages to one another, and that together must perform n independent units of work. Processes may fail by crashing; we want to guarantee that in every execution of the protocol in which at least one process survives, all n units of work will be performed. We consider three parameters: the number of messages sent, the total number of units of work performed (including multiplicities), and time. We present three protocols for solving the problem. All three are work-optimal, doing O(n+ t) work. The first has moderate costs in the remaining two parameters, sending O(t p t) messages, and taking O(n + t) time. This protocol can be easily modified to run in any completely asynchronous system equipped with a failure detection mechanism. The second sends only O(t log t) messages, but its running time is large (O(t 2 (n+ t)2 n+t )). The third is essentially time-optimal in the (usual) case in which there a...
Are Wait-Free Algorithms Fast?
, 1991
"... The time complexity of wait-free algorithms in "normal" executions, where no failures occur and processes operate at approximately the same speed, is considered. A lower bound of log n on the time complexity of any wait-free algorithm that achieves approximate agreement among n processes is proved. ..."
Abstract
-
Cited by 42 (12 self)
- Add to MetaCart
The time complexity of wait-free algorithms in "normal" executions, where no failures occur and processes operate at approximately the same speed, is considered. A lower bound of log n on the time complexity of any wait-free algorithm that achieves approximate agreement among n processes is proved. In contrast, there exists a non-wait-free algorithm that solves this problem in constant time. This implies an (log n) time separation between the wait-free and non-wait-free computation models. On the positive side, we present an O(log n) time wait-free approximate agreement algorithm; the complexity of this algorithm is within a small constant of the lower bound.
GeoQuorums: Implementing Atomic Memory in Mobile Ad Hoc Networks
, 2004
"... We present a new approach, the GeoQuorums approach, for implementing atomic read/write shared memory in mobile ad hoc networks. Our approach is based on associating abstract atomic objects with certain geographic locations. We assume the existence of focal points, geographic areas that are normall ..."
Abstract
-
Cited by 41 (10 self)
- Add to MetaCart
We present a new approach, the GeoQuorums approach, for implementing atomic read/write shared memory in mobile ad hoc networks. Our approach is based on associating abstract atomic objects with certain geographic locations. We assume the existence of focal points, geographic areas that are normally "populated" by mobile nodes.

