Unreliable Failure Detectors for Reliable Distributed Systems
- Journal of the ACM, 1996
Cited by 807 (17 self)
We introduce the concept of unreliable failure detectors and study how they can be used to solve Consensus in asynchronous systems with crash failures. We characterise unreliable failure detectors in terms of two properties — completeness and accuracy. We show that Consensus can be solved even with unreliable failure detectors that make an infinite number of mistakes, and determine which ones can be used to solve Consensus despite any number of crashes, and which ones require a majority of correct processes. We prove that Consensus and Atomic Broadcast are reducible to each other in asynchronous systems with crash failures; thus the above results also apply to Atomic Broadcast. A companion paper shows that one of the failure detectors introduced here is the weakest failure detector for solving Consensus [Chandra et al. 1992].
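To make the idea of an unreliable failure detector concrete, here is a minimal sketch of a timeout-based detector. It is not the algorithm from the paper: the class name, the heartbeat interface, and the timeout-doubling rule are all assumptions chosen for illustration. It shows a detector that may wrongly suspect a slow process, retracts the suspicion when a heartbeat arrives, and keeps suspecting a process that has stopped sending heartbeats.

```python
class TimeoutFailureDetector:
    """Hypothetical heartbeat/timeout detector (illustrative only)."""

    def __init__(self, processes, initial_timeout=1.0):
        # Per-process timeout, time of last heartbeat, and current suspects.
        self.timeout = {p: initial_timeout for p in processes}
        self.last_heartbeat = {p: 0.0 for p in processes}
        self.suspected = set()

    def on_heartbeat(self, process, now):
        """Record a heartbeat; retract a wrong suspicion if there was one."""
        self.last_heartbeat[process] = now
        if process in self.suspected:
            # The earlier suspicion was a mistake: un-suspect the process
            # and double its timeout so the same mistake is less likely.
            self.suspected.discard(process)
            self.timeout[process] *= 2

    def check(self, now):
        """Suspect every process whose last heartbeat is older than its timeout."""
        for process, last in self.last_heartbeat.items():
            if now - last > self.timeout[process]:
                self.suspected.add(process)
        return set(self.suspected)


if __name__ == "__main__":
    fd = TimeoutFailureDetector(["p", "q"])
    fd.on_heartbeat("p", 1.5)
    print(fd.check(2.0))       # q is slow, so it is (wrongly) suspected
    fd.on_heartbeat("q", 2.5)  # the late heartbeat retracts the suspicion
    print(fd.check(2.6))
```

In this sketch a crashed process is eventually suspected forever because its heartbeats stop, while a correct but slow process can be suspected and later cleared. That is the kind of mistake the abstract refers to.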
Secure agreement protocols: Reliable and atomic group multicast in Rampart
- In Proceedings of the 2nd ACM Conference on Computer and Communications Security, 1994
Cited by 162 (17 self)
Reliable and atomic group multicast have been proposed as fundamental communication paradigms to support secure distributed computing in systems in which processes may behave maliciously. These protocols enable messages to be multicast to a group of processes, while ensuring that all honest group members deliver the same messages and, in the case of atomic multicast, deliver these messages in the same order. We present new reliable and atomic group multicast protocols for asynchronous distributed systems. We also describe their implementation as part of Rampart, a toolkit for building high-integrity distributed services, i.e., services that remain correct and available despite the corruption of some component servers by an attacker. To our knowledge, Rampart is the first system to demonstrate reliable and atomic group multicast in asynchronous systems subject to process corruptions.
Replication Management Using the State Machine Approach
- 1993
Cited by 101 (0 self)
This paper is a tutorial on the state machine approach. It describes the approach and its implementation for two representative environments. Small examples suffice to illustrate the points. However, the approach has been successfully applied to larger examples; some of these are mentioned in Section 9. Section 2 describes how a system can be viewed in terms of a state machine, clients, and output devices. Coping with failures is the subject of Sections 3 through 6. An important class of optimizations, based on the use of time, is discussed in Section 7. Section 8 describes dynamic reconfiguration. The history of the approach and related work is discussed in Section 9.
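The core idea of the state machine approach can be illustrated with a small sketch: replicas that start in the same state and deterministically apply the same commands in the same order end in the same state. The key-value command format and the names `KeyValueStateMachine` and `replay` below are assumptions made for this example and do not come from the tutorial.

```python
class KeyValueStateMachine:
    """Hypothetical deterministic state machine over a key-value store."""

    def __init__(self):
        self.state = {}

    def apply(self, command):
        # A command is a tuple (operation, key, value); value is ignored for deletes.
        op, key, value = command
        if op == "put":
            self.state[key] = value
        elif op == "delete":
            self.state.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op!r}")


def replay(log):
    """Apply an ordered command log to a fresh replica and return its state."""
    replica = KeyValueStateMachine()
    for command in log:
        replica.apply(command)
    return replica.state


if __name__ == "__main__":
    log = [("put", "x", 1), ("put", "x", 2), ("delete", "y", None)]
    # Two replicas fed the same log agree on the final state.
    print(replay(log) == replay(log))
```

The sketch deliberately omits agreement on the order of commands, which is the hard part in the presence of failures; it only shows why a common order plus determinism yields identical replicas.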
Database Replication Techniques: a Three Parameter Classification
- In SRDS, 2000
Cited by 76 (8 self)
Data replication is an increasingly important topic as databases are more and more deployed over clusters of workstations. One of the challenges in database replication is to introduce replication without severely affecting performance. Because of this difficulty, current database products use lazy replication, which is very efficient but can compromise consistency. As an alternative, eager replication guarantees consistency but most existing protocols have a prohibitive cost. In order to clarify the current state of the art and open up new avenues for research, this paper analyses existing eager techniques using three key parameters. In our analysis, we distinguish eight classes of eager replication protocols and, for each category, discuss its requirements, capabilities, and cost. The contribution lies in showing when eager replication is feasible and in spelling out the different aspects a database replication protocol must account for.
Efficient Byzantine-Tolerant Erasure-Coded Storage
- Proceedings of the International Conference on Dependable Systems and Networks, June 2004
Cited by 73 (12 self)
This paper describes a decentralized consistency protocol for survivable storage that exploits local data versioning within each storage-node. Such versioning enables the protocol to efficiently provide linearizability and wait-freedom of read and write operations to erasure-coded data in asynchronous environments with Byzantine failures of clients and servers. By exploiting versioning storage-nodes, the protocol shifts most work to clients and allows highly optimistic operation: reads occur in a single round-trip unless clients observe concurrency or write failures. Measurements of a storage system prototype using this protocol show that it scales well with the number of failures tolerated, and its performance compares favorably with an efficient implementation of Byzantine-tolerant state machine replication.
Secure communication in minimal connectivity models
- Journal of Cryptology, 1998
Cited by 41 (1 self)
Problems of secure communication and computation have been studied extensively in network models. In this work, we ask what is possible in the information-theoretic setting when the adversary is very strong (Byzantine) and the network connectivity is very low (minimum needed for crash-tolerance). For some natural models, our results imply a sizable gap between the connectivity required for perfect security and for probabilistic security. Our results also have implications to the commonly studied simple channel model and to general secure multiparty computation.
Fault Detection for Byzantine Quorum Systems
- 1999
Cited by 32 (12 self)
In this paper we explore techniques to detect Byzantine server failures in asynchronous replicated data services. Our goal is to detect arbitrary failures of data servers in a system where each client accesses the replicated data at only a subset (quorum) of servers in each operation. In such a system, some correct servers can be out of date after a write and can therefore return values other than the most up-to-date value in response to a client's read request, thus complicating the task of determining the number of faulty servers in the system at any point in time. We initiate the study of detecting server failures in this context, and propose two statistical approaches for estimating the risk posed by faulty servers based on responses to read requests.
Adversarial contention resolution for simple channels
- In 17th Annual Symposium on Parallelism in Algorithms and Architectures, 2005
Cited by 23 (1 self)
This paper analyzes the worst-case performance of randomized backoff on simple multiple-access channels. Most previous analysis of backoff has assumed a statistical arrival model. For batched arrivals, in which all $n$ packets arrive at time 0, we show the following tight high-probability bounds. Randomized binary exponential backoff has makespan $\Theta(n \lg n)$, and more generally, for any constant $r$, $r$-exponential backoff has makespan $\Theta(n \log^{\lg r} n)$. Quadratic backoff has makespan $\Theta((n/\lg n)^{3/2})$, and more generally, for $r > 1$, $r$-polynomial backoff has makespan $\Theta((n/\lg n)^{1+1/r})$. Thus, for batched inputs, both exponential and polynomial backoff are highly sensitive to backoff constants. We exhibit a monotone superpolynomial subexponential backoff algorithm, called loglog-iterated backoff, that achieves makespan $\Theta(n \lg\lg n / \lg\lg\lg n)$. We provide a matching lower bound showing that this strategy is optimal among all monotone backoff algorithms. Of independent interest is that this lower bound was proved with a delay sequence argument. In the adversarial-queuing model, we present the following stability and instability results for exponential backoff and loglog-iterated backoff. Given a $(\lambda, T)$-stream, in which at most $n = \lambda T$ packets arrive in any interval of size $T$, exponential backoff is stable for arrival rates of $\lambda = O(1/\lg n)$ and unstable for arrival rates of $\lambda = \Omega(\lg\lg n / \lg n)$; loglog-iterated backoff is stable for arrival rates of $\lambda = O(1/(\lg\lg n \, \lg n))$ and unstable for arrival rates of $\lambda = \Omega(1/\lg n)$. Our instability results show that bursty input is close to being worst-case for exponential backoff and variants and that even small bursts can create instabilities in the channel.
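The batched-arrival setting can be explored with a small simulation. The sketch below is a simplified model, not the paper's exact one: it assumes a slotted channel where a slot succeeds only if exactly one packet transmits, and after a collision each colliding packet multiplies its window by `factor` and retries in a uniformly random slot of its new window. The function name and parameters are hypothetical.

```python
import random


def backoff_makespan(n, seed=0, factor=2):
    """Simulate n packets arriving at time 0; return the number of slots used."""
    rng = random.Random(seed)
    window = [1] * n        # current backoff window of each packet
    next_slot = [0] * n     # slot in which each packet transmits next
    pending = set(range(n))
    slot = 0
    while pending:
        # Packets scheduled for this slot (sorted for reproducible RNG use).
        senders = [i for i in sorted(pending) if next_slot[i] == slot]
        if len(senders) == 1:
            # Exactly one transmitter: the packet gets through.
            pending.discard(senders[0])
        else:
            # Collision (or idle slot): each collider grows its window
            # and picks a random slot within the new window.
            for i in senders:
                window[i] *= factor
                next_slot[i] = slot + 1 + rng.randrange(window[i])
        slot += 1
    return slot


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, backoff_makespan(n, seed=1))
```

Since at most one packet succeeds per slot, the makespan is never below `n`. Running the simulation over several seeds and values of `n` gives a rough feel for the superlinear growth in the batched case, though a simulation of this simplified model is no substitute for the high-probability bounds stated in the abstract.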
Objects Shared by Byzantine Processes
- 2003
Cited by 13 (2 self)
Work to date on algorithms for message-passing systems has explored a wide variety of types of faults, but corresponding work on shared memory systems has usually assumed that only crash faults are possible. In this work, we explore situations in which processes accessing shared objects can fail arbitrarily (Byzantine faults).
Unreliable Failure Detectors For Asynchronous Distributed Systems
- In Proceedings of the 10th Annual ACM Symposium on Principles of Distributed Computing, 1993
Cited by 7 (1 self)
equivalent in asynchronous systems. Thus all our results regarding the solvability of Consensus using failure detectors apply to Atomic Broadcast as well. The work in this thesis was funded by an IBM graduate fellowship and grants from NSF, DARPA/NASA, the IBM Endicott Programming Laboratory, Siemens Corp and the Natural Sciences and Engineering Research Council of Canada.
Biographical Sketch. Tushar Deepak Chandra was born in New Delhi, India on November 13, 1966. He spent his childhood in various cities in India: Bombay, Calcutta and finally Kanpur. After completing high school at the Doon School, he went on to do a Bachelor of Technology in Computer Science at the Indian Institute of Technology at Kanpur. He joined the graduate program in Computer Science at Cornell University in August 1988.
This thesis is dedicated to my parents who taught me how to think.
Acknowledgements. A large number of people contributed either directly or i

