Results 1-10 of 23
Unreliable Failure Detectors for Reliable Distributed Systems
Journal of the ACM, 1996
Cited by 898 (18 self)
We introduce the concept of unreliable failure detectors and study how they can be used to solve Consensus in asynchronous systems with crash failures. We characterise unreliable failure detectors in terms of two properties — completeness and accuracy. We show that Consensus can be solved even with unreliable failure detectors that make an infinite number of mistakes, and determine which ones can be used to solve Consensus despite any number of crashes, and which ones require a majority of correct processes. We prove that Consensus and Atomic Broadcast are reducible to each other in asynchronous systems with crash failures; thus the above results also apply to Atomic Broadcast. A companion paper shows that one of the failure detectors introduced here is the weakest failure detector for solving Consensus [Chandra et al. 1992].
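The two properties named in this abstract can be illustrated with a small timeout-based detector. This is a minimal sketch of a common heartbeat construction, not the algorithm from the paper: the class name `HeartbeatDetector`, its methods, and the timeout-doubling rule are all assumptions made for illustration. It suspects a peer whose heartbeat is overdue (aiming at completeness) and, whenever a suspicion turns out to be wrong, retracts it and lengthens that peer's timeout (aiming at eventual accuracy when message delays are eventually bounded).

```python
class HeartbeatDetector:
    """Illustrative unreliable failure detector driven by heartbeats (hypothetical API)."""

    def __init__(self, peers, initial_timeout=1.0):
        self.timeout = {p: initial_timeout for p in peers}
        self.last_seen = {p: 0.0 for p in peers}
        self.suspected = set()

    def on_heartbeat(self, peer, now):
        """Record a heartbeat; a heartbeat from a suspected peer exposes a mistake."""
        self.last_seen[peer] = now
        if peer in self.suspected:
            self.suspected.discard(peer)   # retract the false suspicion
            self.timeout[peer] *= 2        # be more patient with this peer from now on

    def check(self, now):
        """Suspect every peer whose last heartbeat is older than its timeout."""
        for peer, seen in self.last_seen.items():
            if now - seen > self.timeout[peer]:
                self.suspected.add(peer)
        return set(self.suspected)


# Peer "a" is slow once; peer "b" crashes after its first heartbeat.
detector = HeartbeatDetector(["a", "b"], initial_timeout=1.0)
detector.on_heartbeat("a", 1.0)
detector.on_heartbeat("b", 1.0)
detector.on_heartbeat("a", 2.0)
mistaken = detector.check(3.2)    # both overdue: "a" is suspected by mistake
detector.on_heartbeat("a", 3.5)   # mistake detected, timeout for "a" doubles
final = detector.check(5.0)       # only the crashed peer stays suspected
```

In this run the detector makes one mistake about "a" and then corrects it, while "b" remains suspected for good.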
Round-by-round fault detectors: Unifying synchrony and asynchrony
In Proc. of the 17th ACM Symp. on Principles of Distributed Computing (PODC), 1998
Cited by 48 (7 self)
For many years, researchers studying synchronous message-passing systems have considered algorithms composed of rounds of computation. In each round, a process sends a message to the others and then waits to receive messages from the other processes. The synchronous nature of the system ensures that, by the end of the round, each process receives all messages sent to it in that round by correct processes. In the parlance of Elrad and Francez [1], then, each round of a synchronous system is a communication-closed layer.
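The round structure described above can be simulated in a few lines. The sketch below is illustrative only: the function names `synchronous_round` and `flood` are hypothetical, and it makes the simplifying assumption that a faulty process crashes before the run starts and therefore sends nothing. Every message sent in a round is delivered within that same round, which is what makes each round a closed layer.

```python
def synchronous_round(outgoing, correct):
    """Deliver, within the round, every message sent by a correct process.

    outgoing maps sender -> message; the result maps receiver -> {sender: message}.
    """
    return {
        receiver: {
            sender: message
            for sender, message in outgoing.items()
            if sender in correct and sender != receiver
        }
        for receiver in correct
    }


def flood(initial, correct, rounds):
    """Each round, every correct process sends all values it knows to the others."""
    known = {p: {v} for p, v in initial.items() if p in correct}
    for _ in range(rounds):
        outgoing = {p: frozenset(known[p]) for p in correct}
        inbox = synchronous_round(outgoing, correct)
        for p in correct:
            for message in inbox[p].values():
                known[p] |= message
    return known


all_correct = flood({"p1": 3, "p2": 1, "p3": 2}, {"p1", "p2", "p3"}, rounds=1)
one_crashed = flood({"p1": 3, "p2": 1, "p3": 2}, {"p1", "p2"}, rounds=1)
```

With all three processes correct, one round is enough for everyone to learn all three values; with "p3" crashed from the start, its value is never seen.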
Shared Memory vs Message Passing
2004
Cited by 15 (10 self)
This paper determines the computational strength of the shared memory abstraction (a register) emulated over a message passing system, and compares it with fundamental message passing abstractions like consensus and various forms of reliable broadcast. We introduce
In search of the holy grail: Looking for the weakest failure detector for wait-free set agreement
2006
On the Weakest Failure Detector Ever
PODC'07, 2007
Cited by 10 (3 self)
Many problems in distributed computing are impossible when no information about process failures is available. It is common to ask what information about failures is necessary and sufficient to circumvent some specific impossibility, e.g., consensus, atomic commit, mutual exclusion, etc. This paper asks what information about failures is needed to circumvent any impossibility and sufficient to circumvent some impossibility. In other words, what is the minimal yet nontrivial failure information. We present an abstraction, denoted Υ, that provides very little failure information. In every run of the distributed system, Υ eventually informs the processes that some set of processes in the system cannot be the set of correct processes in that run. Although seemingly weak, for it might provide random information for an arbitrarily long period
On failure detectors and type boosters
In Proceedings of the 17th International Symposium on Distributed Computing (DISC’03), 2003
Cited by 9 (4 self)
The power of an object type T can be measured as the maximum number n of processes that can solve consensus using only objects of T and registers. This number, denoted cons(T), is called the consensus power of T. This paper addresses the question of the weakest failure detector to solve consensus among a number k > n of processes that communicate using shared objects of a type T with consensus power n. In other words, we seek a failure detector that is sufficient and necessary to “boost” the consensus power of a type T from n to k. It was shown in [24] that a certain failure detector, denoted Ωn, is sufficient to boost the power of a type T from n to k, and it was conjectured that Ωn was also necessary. In this paper, we prove this conjecture for one-shot deterministic types. We first show that, for any one-shot deterministic type T with cons(T) ≤ n, Ωn is necessary to boost the power of T from n to n + 1. Then we go a step further and show that Ωn is also the weakest to boost the power of (n + 1)-ported one-shot deterministic types from n to any k > n. Our result generalizes, in a precise sense, the result of the weakest failure detector to solve consensus in asynchronous message-passing systems [7]. As a corollary, we show that Ωt is the weakest failure detector to boost the resilience level of a distributed shared memory system, i.e., to solve consensus among n > t processes using (t − 1)-resilient objects of consensus power t.
Looking for the Weakest Failure Detector for k-Set Agreement in Message-passing Systems: Is Πk the End of the Road?
INRIA, 2009, http://hal.inria.fr/inria00384993/en/, PI 1929
Cited by 9 (3 self)
In the k-set agreement problem, each process (in a set of n processes) proposes a value and has to decide a proposed value in such a way that at most k different values are decided. While this problem can easily be solved in asynchronous systems prone to t process crashes when k > t, it cannot be solved when k ≤ t. For several years, the failure detector-based approach has been investigated to circumvent this impossibility. While the weakest failure detector class to solve the k-set agreement problem in read/write shared-memory systems has recently been discovered (PODC 2009), the situation is different in message-passing systems, where the weakest failure detector classes are known only for the extreme cases k = 1 (consensus) and k = n − 1 (set agreement). This paper introduces a candidate for the general case. It presents a new failure detector class, denoted Πk, and shows Π1 = Σ × Ω (the weakest class for k = 1), and Πn−1 = L (the weakest class for k = n − 1). Then, the paper investigates the structure of Πk and shows it is the combination of two failure detector classes denoted Σk and Ωk (that generalize the previous “quorum” and “eventual leader” failure detector classes). Finally, the paper proves that Σk is a necessary requirement (as far as information on failures is concerned) to solve the k-set agreement problem in message-passing systems. The paper also presents a Πn−1-based algorithm that solves the (n − 1)-set agreement problem. This algorithm provides a new algorithmic insight into the way the (n − 1)-set agreement problem can be solved in asynchronous message-passing systems (insight from the point of view of the non-partitioning constraint defined by Σn−1).
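The safety part of the problem statement above (a decided value must be a proposed value, and at most k different values are decided) can be written down as a small checker. This is a minimal sketch; the function name `check_k_set_agreement` and the dictionary representation are assumptions for illustration, and termination, being a liveness property, is not checked.

```python
def check_k_set_agreement(proposed, decided, k):
    """Check the two safety properties of k-set agreement on one finished run.

    proposed maps process -> proposed value.
    decided maps process -> decided value (only processes that decided appear).
    """
    proposals = set(proposed.values())
    validity = all(value in proposals for value in decided.values())
    agreement = len(set(decided.values())) <= k
    return validity and agreement


proposed = {"p1": "a", "p2": "b", "p3": "c", "p4": "d"}
decided = {"p1": "a", "p2": "a", "p3": "c"}   # p4 crashed before deciding

ok_for_k2 = check_k_set_agreement(proposed, decided, k=2)   # two distinct values decided
ok_for_k1 = check_k_set_agreement(proposed, decided, k=1)   # k = 1 is consensus
invalid = check_k_set_agreement(proposed, {"p1": "z"}, k=2) # "z" was never proposed
```

The same run satisfies 2-set agreement but not consensus, which matches the abstract's remark that k = 1 is the consensus case.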
(Almost) all objects are universal in message passing systems (Extended Abstract)
In International Symposium on Distributed Computing, 2005
Cited by 4 (3 self)
This paper shows that all shared atomic object types that can solve consensus among k > 1 processes have the same weakest failure detector in a message passing system with process crash failures. In such a system, object types such as test-and-set, fetch-and-add, and queue, known to have weak synchronization power in a shared memory system, are thus, in a precise sense, equivalent to universal types like compare-and-swap, known to have the strongest synchronization power. In the particular case of a message passing system of two processes, we show that, interestingly, even a register is in that sense universal.
Sharing is harder than agreeing
In: PODC 2008: Proceedings of the Twenty-Seventh Annual ACM Symposium on Principles of Distributed Computing, 2008
Cited by 4 (1 self)
One of the most celebrated results of the theory of distributed computing is the impossibility, in an asynchronous system of n processes that communicate through shared memory registers, to solve the set agreement problem where the processes need to decide on up to n − 1 among their n initial values. In short, the result indicates that the register abstraction is too weak to implement the set agreement one. This paper explores the relation between these abstractions in a message passing system where a register is not a given physical device but is rather itself implemented by processes communicating through message passing. We show that, maybe surprisingly, the information about process failures that is necessary and sufficient to implement a register shared by two particular processes is sufficient but not necessary to implement set agreement. We later generalize this result by considering k-set agreement, where the processes can decide on up to k values, and comparing it with a register shared by any particular subset of 2k processes. We prove that, for 1 ≤ k ≤ n/2, (a) any failure information that is sufficient to implement a register shared by 2k processes is sufficient to implement (n − k)-set agreement but (b) a failure information that is sufficient for (n − k)-set agreement is not sufficient for a register shared by 2k processes. We also prove that (c) a failure information that is sufficient for a register shared by 2k processes is not sufficient for ((n − k) − 1)-set agreement.
Failure Detectors to Solve Asynchronous k-Set Agreement: a Glimpse of Recent Results
Cited by 2 (1 self)
In the k-set agreement problem, each process proposes a value and has to decide a value in such a way that a decided value is a proposed value and at most k different values are decided. This problem can easily be solved in synchronous systems or in asynchronous systems prone to t process crashes when t < k. In contrast, it has been shown that k-set agreement cannot be solved in asynchronous systems when k ≤ t. Hence, for several years, the failure detector-based approach has been investigated to circumvent this impossibility. This approach consists of enriching the underlying asynchronous system with an additional module per process that provides it with information on failures. Hence, without becoming synchronous, the enriched system is no longer fully asynchronous. This paper surveys this approach in both asynchronous shared memory systems and asynchronous message passing systems. It presents and discusses recent results and associated k-set agreement algorithms.
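The easy case mentioned above (asynchronous system, t < k) is usually handled by a folklore algorithm: t + 1 designated processes broadcast their proposals, and every process decides the first such value it receives. The sketch below simulates one run under stated assumptions and is not taken from the surveyed paper: the function name `first_received_k_set_agreement` is hypothetical, asynchrony is modelled by a random choice among the messages that arrive, and a crashed sender's message reaches each receiver with probability 1/2.

```python
import random


def first_received_k_set_agreement(proposals, t, crashed, rng):
    """Simulate one run; processes are numbered 0..n-1 and 0..t are the senders.

    Returns a dict mapping each correct process to its decided value.
    """
    assert len(crashed) <= t, "at most t processes may crash"
    senders = range(t + 1)  # at least one sender is correct, so something always arrives
    decided = {}
    for process in range(len(proposals)):
        if process in crashed:
            continue
        # Messages from correct senders always arrive; a crashed sender's may not.
        arrivals = [s for s in senders if s not in crashed or rng.random() < 0.5]
        # The "first received" message is arbitrary under asynchrony.
        decided[process] = proposals[rng.choice(arrivals)]
    return decided


proposals = [10, 11, 12, 13, 14, 15]   # six processes
decided = first_received_k_set_agreement(
    proposals, t=2, crashed={0, 4}, rng=random.Random(0)
)
distinct = set(decided.values())
```

Because only t + 1 processes ever broadcast, at most t + 1 ≤ k distinct values can be decided, whatever the delivery order.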