Unreliable Failure Detectors for Reliable Distributed Systems
Journal of the ACM, 1996
Cited by 807 (17 self)
We introduce the concept of unreliable failure detectors and study how they can be used to solve Consensus in asynchronous systems with crash failures. We characterise unreliable failure detectors in terms of two properties — completeness and accuracy. We show that Consensus can be solved even with unreliable failure detectors that make an infinite number of mistakes, and determine which ones can be used to solve Consensus despite any number of crashes, and which ones require a majority of correct processes. We prove that Consensus and Atomic Broadcast are reducible to each other in asynchronous systems with crash failures; thus the above results also apply to Atomic Broadcast. A companion paper shows that one of the failure detectors introduced here is the weakest failure detector for solving Consensus [Chandra et al. 1992].
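The paper characterises failure detectors abstractly, by completeness and accuracy, rather than by a concrete mechanism. As an illustration only, here is one common way such a detector is approximated in practice, assuming processes send periodic heartbeats: a crashed process stops sending them and is eventually suspected for good (completeness), while every false suspicion doubles that process's timeout, so if delays eventually become bounded, correct processes eventually stop being suspected (accuracy). The heartbeat scheme and the names `AdaptiveTimeoutDetector`, `on_heartbeat`, `check` and `initial_timeout` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class AdaptiveTimeoutDetector:
    """Hypothetical heartbeat-based failure detector (illustrative sketch only)."""
    processes: list
    initial_timeout: float = 1.0
    last_heartbeat: dict = field(default_factory=dict)
    timeout: dict = field(default_factory=dict)
    suspected: set = field(default_factory=set)

    def __post_init__(self):
        for p in self.processes:
            self.last_heartbeat[p] = 0.0
            self.timeout[p] = self.initial_timeout

    def on_heartbeat(self, p, now):
        """Record a heartbeat from p received at time `now`."""
        self.last_heartbeat[p] = now
        if p in self.suspected:
            # p is alive, so suspecting it was a mistake: stop suspecting it
            # and double its timeout to make future mistakes less likely.
            self.suspected.discard(p)
            self.timeout[p] *= 2

    def check(self, now):
        """Suspect every process whose last heartbeat is older than its timeout."""
        for p in self.processes:
            if now - self.last_heartbeat[p] > self.timeout[p]:
                self.suspected.add(p)
        return set(self.suspected)


fd = AdaptiveTimeoutDetector(["p1", "p2"])
fd.on_heartbeat("p1", 0.5)
fd.on_heartbeat("p2", 0.5)
print(sorted(fd.check(1.6)))  # ['p1', 'p2']: both are late, both suspected
fd.on_heartbeat("p1", 1.7)    # p1 was wrongly suspected; its timeout doubles
print(sorted(fd.check(3.0)))  # ['p2']: p2 (crashed) stays suspected
```

In a purely asynchronous system no finite timeout is ever guaranteed to be large enough, which is exactly why the paper treats detectors as unreliable and asks which accuracy guarantees suffice for Consensus.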
The Consensus Problem in Unreliable Distributed Systems (A Brief Survey)
2000
Cited by 102 (2 self)
Agreement problems involve a system of processes, some of which may be faulty. A fundamental problem of fault-tolerant distributed computing is for the reliable processes to reach a consensus. We survey the considerable literature on this problem that has developed over the past few years and give an informal overview of the major theoretical results in the area.
Studies in Secure Multiparty Computation and Applications
1996
Cited by 72 (6 self)
Consider a set of parties who do not trust each other, nor the channels by which they communicate. Still, the parties wish to correctly compute some common function of their local inputs, while keeping their local data as private as possible. This, in a nutshell, is the problem of secure multiparty computation. This problem is fundamental in cryptography and in the study of distributed computations. It takes many different forms, depending on the underlying network, on the function to be computed, and on the amount of distrust the parties have in each other and in the network. We study several aspects of secure multiparty computation. We first present new definitions of this problem in various settings. Our definitions draw from previous ideas and formalizations, and incorporate aspects that were previously overlooked. Next we study the problem of dealing with adaptive adversaries. (Adaptive adversaries are adversaries that corrupt parties during the course of the computation, based on...
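To make "compute a common function of local inputs while keeping local data private" concrete, here is the textbook toy case of a secure sum using additive secret sharing. This is a deliberately simplified sketch, not one of the constructions studied in this work: it assumes honest-but-curious parties and private point-to-point channels (the abstract's setting, where the channels themselves are untrusted, is stronger), and the modulus `MODULUS` and the function names are hypothetical.

```python
import secrets

# Hypothetical public modulus; all inputs and their sum must be smaller than it.
MODULUS = 2**61 - 1


def share(value, n):
    """Split `value` into n additive shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(inputs):
    """Simulate n parties jointly computing sum(inputs) via additive sharing."""
    n = len(inputs)
    # Row i holds the shares produced by party i; share j is sent to party j.
    dealt = [share(x, n) for x in inputs]
    # Party j adds the shares it received. Any n - 1 shares of a value are
    # uniformly random, so no single party learns another party's input.
    partial = [sum(dealt[i][j] for i in range(n)) % MODULUS for j in range(n)]
    # Publishing the partial sums reveals only the total.
    return sum(partial) % MODULUS


print(secure_sum([12, 30, 5]))  # 47
```

The hard parts the abstract alludes to, such as malicious or adaptively corrupted parties, are exactly what this toy version does not handle.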
Unreliable Failure Detectors For Asynchronous Distributed Systems
In Proceedings of the 10th Annual ACM Symposium on Principles of Distributed Computing, 1993
Cited by 7 (1 self)
… equivalent in asynchronous systems. Thus all our results regarding the solvability of Consensus using failure detectors apply to Atomic Broadcast as well. The work in this thesis was funded by an IBM graduate fellowship and grants from NSF, DARPA/NASA, the IBM Endicott Programming Laboratory, Siemens Corp and the Natural Sciences and Engineering Research Council of Canada. Biographical Sketch: Tushar Deepak Chandra was born in New Delhi, India on November 13, 1966. He spent his childhood in various cities in India: Bombay, Calcutta and finally Kanpur. After completing high school at the Doon School, he went on to do a Bachelor of Technology in Computer Science at the Indian Institute of Technology at Kanpur. He joined the graduate program in Computer Science at Cornell University in August 1988. This thesis is dedicated to my parents who taught me how to think. Acknowledgements: A large number of people contributed either directly or i…
Reaching (and Maintaining) Agreement in the Presence of Mobile Faults (Extended Abstract)
Proc. 8th International Workshop on Distributed Algorithms, LNCS 857, 1994
Cited by 6 (1 self)
Juan A. Garay, IBM T. J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY 10598, USA. Abstract. In this paper we consider a model where malicious agents can corrupt hosts and move around in a network of processors. We investigate the issue of fault mobility and the faults' power of disruption as a function of the fundamental parameter in such systems: the faults' speed. We do so by evaluating in a mobile-fault environment a classical testbed problem for fault-tolerant distributed computing: Byzantine agreement. We present a family of mobile-fault models MF(t/(n-1), ρ). In MF(t/(n-1), ρ) there are a total of n processors, the maximum number of mobile faults is t, and their roaming pace is ρ (for example, ρ = 3 means that it takes a virus at least 3 rounds to "hop" to the next host). We define MBA, a version of the agreement problem that is adequate for such environments, and give a series of impossibility results. We show, in particular, that under no other a...
Clock Synchronization with Faults and Recoveries (Extended Abstract)
In Proc. 19th ACM Symposium on Principles of Distributed Computing (PODC), 2000
Cited by 5 (0 self)
We present a convergence-function-based clock synchronization algorithm that is simple, efficient and fault-tolerant. The algorithm tolerates failures and allows recoveries, as long as fewer than a third of the processors are faulty 'at the same time'. Arbitrary (Byzantine) faults are tolerated without requiring awareness of failure or recovery. In contrast, previous clock synchronization algorithms either limited the total number of faults throughout the execution, which is not realistic, or assumed fault detection.
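The abstract does not state which convergence function the algorithm uses. For intuition, here is the classic fault-tolerant midpoint, a well-known convergence function and not necessarily the one in this paper: each processor collects clock readings, discards the f smallest and f largest, and moves toward the midpoint of what remains. With at most f faulty readings, both remaining extremes lie within the range of the correct readings, so a Byzantine processor cannot drag the result outside it. The function name and the integer millisecond readings are illustrative assumptions; recovery handling is not modelled.

```python
def fault_tolerant_midpoint(readings, f):
    """Drop the f smallest and f largest readings, return the midpoint of the rest.

    Requires len(readings) >= 3f + 1, i.e. fewer than a third of the readings
    may come from faulty processors.
    """
    if len(readings) < 3 * f + 1:
        raise ValueError("need at least 3f + 1 readings to tolerate f faults")
    trimmed = sorted(readings)[f:len(readings) - f]
    return (trimmed[0] + trimmed[-1]) / 2


# Four clocks in milliseconds; one Byzantine clock reports a wild value.
print(fault_tolerant_midpoint([1000, 1020, 990, 99999], f=1))  # 1010.0
```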
Practical Impact of Group Communication Theory
2003
Cited by 3 (1 self)
… this paper … in the context of the V System [1.17]. In the paper, Cheriton and Zwaenepoel mention operations such as join group, leave group, and send message to group. Interestingly, the paper also mentions the publish-subscribe paradigm. The concept of virtual synchrony was introduced by Birman and Joseph as a specification that encompasses atomic broadcast (abcast), causal broadcast (cbcast) and group broadcast (gbcast) [1.8]. Their paper does not give a precise specification of these group communication primitives, but stresses the benefit of using these abstractions to develop fault-tolerant software: "We argue that this approach to building distributed and fault-tolerant software is more straightforward, more flexible and more likely to yield correct solutions than alternative approaches…"
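Since cbcast is named here without a precise specification, the sketch below shows one common way causal broadcast delivery is implemented with vector clocks: a message from sender j is delivered only when it is the next message from j and every message it causally depends on has already been delivered; otherwise it is held back. This is an illustrative sketch, not the ISIS implementation. It assumes a fixed group of n processes (no view changes, so no virtual synchrony) and reliable channels, and the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: int
    clock: tuple  # sender's vector clock, with its own entry already incremented
    payload: str


class CausalReceiver:
    """Delivers broadcast messages in causal order (hypothetical sketch)."""

    def __init__(self, n):
        self.clock = [0] * n  # clock[k] = number of messages delivered from k
        self.pending = []
        self.delivered = []

    def can_deliver(self, m):
        j = m.sender
        if m.clock[j] != self.clock[j] + 1:
            return False  # not the next message from j
        # Every message m depends on from other senders must be delivered first.
        return all(m.clock[k] <= self.clock[k]
                   for k in range(len(self.clock)) if k != j)

    def receive(self, m):
        self.pending.append(m)
        progress = True
        while progress:  # delivering one message may unblock others
            progress = False
            for pm in list(self.pending):
                if self.can_deliver(pm):
                    self.pending.remove(pm)
                    self.clock[pm.sender] += 1
                    self.delivered.append(pm.payload)
                    progress = True


# P0 broadcasts "a"; P1 delivers it and then broadcasts "b".
# P2 receives "b" before "a" but must deliver "a" first.
r = CausalReceiver(3)
r.receive(Message(1, (1, 1, 0), "b"))  # held back: depends on "a"
r.receive(Message(0, (1, 0, 0), "a"))
print(r.delivered)  # ['a', 'b']
```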
Optimal Resiliency against Mobile Faults
1995
Cited by 2 (0 self)
In this paper we consider a model where malicious agents can corrupt hosts and move around in a network of processors. We consider a family of mobile-fault models MF(t/(n-1), ρ). In MF(t/(n-1), ρ) there are a total of n processors, the maximum number of mobile faults is t, and their roaming pace is ρ (for example, ρ = 3 means that it takes an agent at least 3 rounds to "hop" to the next host). We study in these models the classical testbed problem for fault-tolerant distributed computing: Byzantine agreement.
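The abstract defines the mobile-fault model only informally. The sketch below encodes one reading of it as a checker for an adversary's schedule: at most t agents, every agent sits on a host in 0..n-1, and an agent must stay on a host for at least ρ consecutive rounds before it may hop. That interpretation of "roaming pace", the schedule encoding, and the function name `is_admissible` are assumptions for illustration; the paper's formal definition may differ, and the checker says nothing about the agreement protocol itself.

```python
def is_admissible(positions, n, t, rho):
    """Check an adversary schedule against a hypothetical reading of MF(t/(n-1), rho).

    positions[r][a] is the host (0..n-1) occupied by mobile fault a in round r;
    every round is assumed to list the same agents in the same order.
    """
    if not positions:
        return True
    num_agents = len(positions[0])
    if num_agents > t:
        return False  # more mobile faults than the model allows
    for a in range(num_agents):
        stay = 1  # rounds spent so far on the current host
        for r in range(1, len(positions)):
            if positions[r][a] == positions[r - 1][a]:
                stay += 1
            elif stay < rho:
                return False  # hopped before spending rho rounds on its host
            else:
                stay = 1
    return all(0 <= h < n for rnd in positions for h in rnd)


# n = 4 processors, one agent, roaming pace 3.
print(is_admissible([(0,), (0,), (0,), (2,), (2,), (2,)], n=4, t=1, rho=3))  # True
print(is_admissible([(0,), (0,), (1,)], n=4, t=1, rho=3))  # False: hopped too early
```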
Non-Atomic Commitment Problem: A comparative study between the 2PC and a new protocol based on the consensus paradigm
The atomic commitment problem is of primary importance in distributed systems; it becomes difficult to solve if some of the participants involved in committing a transaction fail. Several protocols have been implemented to allow participants to terminate the commitment of transactions. In this paper we give a comparative study of the two-phase commit protocol (which is blocking) and a new protocol that solves the non-blocking atomic commitment problem using the consensus paradigm. The results of our comparison are based on a simulation of a set of sites that try to execute the commitment phase of a transaction in an asynchronous distributed system. In our implementation of the new protocol, which we call the Consensus Commitment Protocol (CCP), we use the solution to the consensus problem introduced by Chandra and Toueg, which is based on the concept of unreliable failure detectors. The CCP is based on the algorithm that uses the eventually strong failure detector, denoted ◇S. We also present basic ideas for implementing a failure detector of class ◇S, which, when certain properties hold, allows the consensus problem to be solved, and we present a new failure-detection property that is simpler than the Eventual Weak Accuracy property. In addition, we describe some basic concepts necessary to simulate an asynchronous distributed system. The simulated system makes it easy to test distributed applications with any number of sites and under any desired failure scenario.
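The abstract does not spell out what each site proposes to consensus in the CCP. A common way to build non-blocking atomic commitment on top of consensus, shown here as a sketch rather than the paper's code, is: each participant broadcasts its vote; a site proposes COMMIT only after receiving YES from every participant, proposes ABORT on any NO vote or on suspecting a participant whose vote is missing, and otherwise keeps waiting. Each site then decides whatever the consensus algorithm (for example Chandra and Toueg's ◇S-based one) outputs. The consensus step is not implemented, and `Vote` and `proposal` are hypothetical names.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


def proposal(votes_received, participants, suspected):
    """Value this site proposes to consensus, or None if it must keep waiting.

    votes_received maps participant -> Vote for the votes received so far;
    suspected is the current output of this site's failure detector.
    """
    if any(v is Vote.NO for v in votes_received.values()):
        return "ABORT"  # someone refused to commit
    if any(p in suspected and p not in votes_received for p in participants):
        return "ABORT"  # a participant whose vote is missing may have crashed
    if all(votes_received.get(p) is Vote.YES for p in participants):
        return "COMMIT"  # every participant voted YES
    return None  # not enough information yet


print(proposal({"a": Vote.YES, "b": Vote.YES}, ["a", "b", "c"], {"c"}))  # ABORT
```

Because consensus only ever decides a proposed value, COMMIT can be decided only if some site proposed it, which in turn requires that all participants voted YES. Unlike two-phase commit, no site waits on a single coordinator that might have crashed, which is where the non-blocking property comes from.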
Consensus in the Presence of Partial Synchrony (Cynthia Dwork, Nancy Lynch and Larry Stockmeyer)
Journal of the ACM, 1988
The concept of partial synchrony in a distributed system is introduced. Partial synchrony lies between the cases of a synchronous system and an asynchronous system. In a synchronous system, there is a known fixed upper bound Δ on the time required for a message to be sent from one processor to another and a known fixed upper bound Φ on the relative speeds of different processors. In an asynchronous system no fixed upper bounds Δ and Φ exist. In one version of partial synchrony, fixed bounds Δ and Φ exist, but they are not known a priori. The problem is to design protocols that work correctly in the partially synchronous system regardless of the actual values of the bounds Δ and Φ. In another version of partial synchrony, the bounds are known, but are only guaranteed to hold starting at some unknown time T, and protocols must be designed to work correctly regardless of when time T occurs. Fault-tolerant consensus protocols are given for various cases of partial synchrony and various fault models. Lower bounds that show in most cases that our protocols are optimal with respect to the number of faults tolerated are also given. Our consensus protocols for partially synchronous processors use new protocols for fault-tolerant "distributed clocks" that allow partially synchronous processors to reach some approximately common notion of time.
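In the first version of partial synchrony the bound Δ exists but is unknown, so a protocol cannot simply wait Δ time units. A standard way to cope, not necessarily the construction used in this paper, is to keep enlarging the round timeout, here by doubling, so that after finitely many rounds it exceeds the true bound and every later round is long enough. The simulation below just counts those rounds; `actual_delta` stands in for the unknown Δ and is visible only to the simulator, and the function name is hypothetical.

```python
def rounds_until_timely(initial_timeout, actual_delta):
    """Count the doubling steps needed before the round timeout covers actual_delta.

    A real protocol never sees actual_delta; it simply doubles its timeout after
    each round that fails to complete, so the count below is an upper bound on
    how many rounds can time out because the timeout was too short.
    """
    if initial_timeout <= 0:
        raise ValueError("initial_timeout must be positive")
    timeout, rounds = initial_timeout, 0
    while timeout < actual_delta:
        timeout *= 2
        rounds += 1
    return rounds, timeout


print(rounds_until_timely(1.0, 10.0))  # (4, 16.0)
```

Doubling keeps the number of wasted rounds logarithmic in Δ divided by the initial guess, at the cost of overshooting the true bound by at most a factor of two.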

