Unreliable Failure Detectors for Reliable Distributed Systems
 Journal of the ACM
, 1996
Cited by 977 (19 self)
We introduce the concept of unreliable failure detectors and study how they can be used to solve Consensus in asynchronous systems with crash failures. We characterise unreliable failure detectors in terms of two properties — completeness and accuracy. We show that Consensus can be solved even with unreliable failure detectors that make an infinite number of mistakes, and determine which ones can be used to solve Consensus despite any number of crashes, and which ones require a majority of correct processes. We prove that Consensus and Atomic Broadcast are reducible to each other in asynchronous systems with crash failures; thus the above results also apply to Atomic Broadcast. A companion paper shows that one of the failure detectors introduced here is the weakest failure detector for solving Consensus [Chandra et al. 1992].
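As a rough illustration of what "unreliable" means here, the following is a minimal sketch of a timeout-based detector that may wrongly suspect a slow process and later retract the suspicion. It is not the construction from the paper: the class name `TimeoutDetector`, its methods, and the timeout-doubling rule are assumptions made for this example. Completeness corresponds to a crashed peer eventually staying in the suspected set; accuracy corresponds to live peers eventually no longer being suspected once the timeout has grown large enough.

```python
class TimeoutDetector:
    """Hypothetical heartbeat-based failure detector (illustrative only)."""

    def __init__(self, peers, initial_timeout=1.0):
        # Per-peer timeout, so a slow peer does not penalise the others.
        self.timeout = {p: initial_timeout for p in peers}
        self.last_heard = {p: 0.0 for p in peers}
        self.suspected = set()

    def on_heartbeat(self, peer, now):
        """Record a heartbeat; retract a wrong suspicion if there was one."""
        self.last_heard[peer] = now
        if peer in self.suspected:
            # The earlier suspicion was a mistake: trust the peer again
            # and wait longer before suspecting it next time.
            self.suspected.discard(peer)
            self.timeout[peer] *= 2

    def tick(self, now):
        """Suspect every peer that has been silent for longer than its timeout."""
        for peer, heard_at in self.last_heard.items():
            if now - heard_at > self.timeout[peer]:
                self.suspected.add(peer)
        return set(self.suspected)
```

A peer that has really crashed never sends another heartbeat, so it stays suspected; a peer that was merely slow is un-suspected on its next heartbeat, which is one of the mistakes an unreliable detector is allowed to make.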
The Weakest Failure Detector for Solving Consensus
, 1996
Cited by 435 (21 self)
We determine what information about failures is necessary and sufficient to solve Consensus in asynchronous distributed systems subject to crash failures. In [CT91], it is shown that ◇W, a failure detector that provides surprisingly little information about which processes have crashed, is sufficient to solve Consensus in asynchronous systems with a majority of correct processes. In this paper, we prove that to solve Consensus, any failure detector has to provide at least as much information as ◇W. Thus, ◇W is indeed the weakest failure detector for solving Consensus in asynchronous systems with a majority of correct processes.
More Choices Allow More Faults: Set Consensus Problems In Totally Asynchronous Systems
 Information and Computation
, 1992
Cited by 102 (4 self)
We define the k-set consensus problem as an extension of the consensus problem, where each processor decides on a single value such that the set of decided values in any run is of size at most k. We require the agreement condition that all values decided upon are initial values of some processor. We show that the problem has a simple (k − 1)-resilient protocol in a totally asynchronous system. In an attempt to come up with a matching lower bound on the number of failures, we study the uncertainty condition, which requires that there must be some initial configuration from which all possible input values can be decided. We prove using a combinatorial argument that any k-resilient protocol for the k-set agreement problem would satisfy the uncertainty condition, while this is not true for any (k − 1)-resilient protocol.
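The abstract does not spell out the simple protocol. One commonly described protocol of this kind has each processor broadcast its input, wait until it holds n − k + 1 values (its own included), and decide the minimum of those. The sketch below simulates that rule under this assumption; the function name `k_set_decide` and the use of a random number generator to model message arrival order are choices made for the example, not part of the paper.

```python
import random


def k_set_decide(inputs, k, rng):
    """Simulate one run: processor p decides the minimum of the first
    n - k + 1 input values it holds (its own plus n - k received ones).

    Assumes 1 <= k <= len(inputs). `rng` models the arbitrary order in
    which messages arrive in an asynchronous system.
    """
    n = len(inputs)
    quorum = n - k + 1  # with at most k - 1 crashes, this many values always arrive
    decisions = []
    for p in range(n):
        others = [q for q in range(n) if q != p]
        rng.shuffle(others)                   # arbitrary delivery order
        heard = [p] + others[:quorum - 1]     # own value plus the first arrivals
        decisions.append(min(inputs[q] for q in heard))
    return decisions


if __name__ == "__main__":
    values = [7, 3, 9, 1, 5]
    print(sorted(set(k_set_decide(values, 2, random.Random(0)))))
```

The bound on the number of decided values follows from a counting argument: any n − k + 1 of the n inputs must include one of the k smallest, so every decision is one of the k smallest inputs. Every decision is also some processor's initial value, as the stated condition requires.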
Performing work efficiently in the presence of faults
 in the Proceedings of the 11th ACM Symposium on Principles of Distributed Computing (PODC)
, 1998
Cited by 46 (0 self)
We consider a system of t synchronous processes that communicate only by sending messages to one another, and that together must perform n independent units of work. Processes may fail by crashing; we want to guarantee that in every execution of the protocol in which at least one process survives, all n units of work will be performed. We consider three parameters: the number of messages sent, the total number of units of work performed (including multiplicities), and time. We present three protocols for solving the problem. All three are work-optimal, doing $O(n+t)$ work. The first has moderate costs in the remaining two parameters, sending $O(t\sqrt{t})$ messages, and taking $O(n+t)$ time. This protocol can be easily modified to run in any completely asynchronous system equipped with a failure detection mechanism. The second sends only $O(t \log t)$ messages, but its running time is large ($O(t^2 (n+t) 2^{n+t})$). The third is essentially time-optimal in the (usual) case in which there are no failures, and its time complexity degrades gracefully as the number of failures increases.
Hundreds of Impossibility Results for Distributed Computing
 Distributed Computing
, 2003
Cited by 43 (5 self)
We survey results from distributed computing that show tasks to be impossible, either outright or within given resource bounds, in various models. The parameters of the models considered include synchrony, fault-tolerance, different communication media, and randomization. The resource bounds refer to time, space and message complexity. These results are useful in understanding the inherent difficulty of individual problems and in studying the power of different models of distributed computing.
Possibility and impossibility results in a shared memory environment
 Acta Inf
, 1996
Cited by 16 (9 self)
We focus on an unreliable asynchronous shared memory model which supports only atomic read and write operations. For such a model we provide a necessary condition for the solvability of problems in the presence of multiple undetectable crash failures. Also, by using game-theoretical notions, a necessary and sufficient condition is provided for the solvability of problems in the presence of multiple undetectable initial failures (i.e., processes may fail only prior to the execution). Our results imply that many problems such as consensus, choosing a leader, ranking, matching and sorting are unsolvable in the presence of a single crash failure, and that variants of these problems are solvable in the presence of t − 1 crash failures but not in the presence of t crash failures. We show that a shared memory model can simulate various message passing models, and hence our impossibility results hold also for those message passing models. Our results extend and generalize previously known impossibility results for various asynchronous models. Key words: asynchronous protocols, impossibility, shared memory, atomic read and write operations, crash failures, initial failures, winning strategy.
Initial failures in distributed computations
 International Journal of Parallel Programming
, 1989
Cited by 11 (4 self)
We investigate the possibility of solving problems in completely asynchronous message passing systems where a number of processes may fail prior to execution. By using game-theoretical notions, necessary and sufficient conditions are provided for solving problems in such a model with and without a termination requirement. An upper bound on the message complexity for solving any problem in the model is given, as well as a simple design concept for constructing a solution to any solvable problem. KEY WORDS: Asynchronous protocols; crash failures; initial failures; winning strategy.
Impossibility results in the presence of multiple faulty processes
 Information and Computation
, 1994
Cited by 11 (7 self)
We investigate the impossibility of solving certain problems in an unreliable distributed system where multiple processes may fail. We assume undetectable crash failures, which means that a process may become faulty at any time during an execution and that no event can happen on a process after it fails. A sufficient condition is provided for the unsolvability of problems in the presence of multiple faulty processes. Several problems are shown to be solvable in the presence of t − 1 faulty processes but not in the presence of t faulty processes for any t. These problems are variants of problems which are unsolvable in the presence of a single faulty process (such as consensus, choosing a leader, ranking, matching). In order to prove the impossibility result a contradiction is shown among a set of axioms which characterize any fault-tolerant protocol solving the problems we treat. In the course of the proof, we present two results that appear to be of independent interest: first, we show that for any protocol there is a computation in which some process is a splitter. This process can split the possible outputs of the protocol into two disjoint sets. If the protocol is also fault-tolerant, then this splitter must be a decider, which can split its own output values into two different singletons. These results generalize and expand known results for asynchronous systems.