Unreliable Failure Detectors for Reliable Distributed Systems
- Journal of the ACM, 1996
Cited by 807 (17 self)
We introduce the concept of unreliable failure detectors and study how they can be used to solve Consensus in asynchronous systems with crash failures. We characterise unreliable failure detectors in terms of two properties — completeness and accuracy. We show that Consensus can be solved even with unreliable failure detectors that make an infinite number of mistakes, and determine which ones can be used to solve Consensus despite any number of crashes, and which ones require a majority of correct processes. We prove that Consensus and Atomic Broadcast are reducible to each other in asynchronous systems with crash failures; thus the above results also apply to Atomic Broadcast. A companion paper shows that one of the failure detectors introduced here is the weakest failure detector for solving Consensus [Chandra et al. 1992].
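The idea of a detector that may wrongly suspect a process and later change its mind can be illustrated with heartbeats and per-process timeouts. The following is a minimal sketch under stated assumptions, not the algorithm from the paper: the class name `TimeoutFailureDetector`, its methods, and the doubling rule are all hypothetical, and time is passed in explicitly so the example stays deterministic.

```python
class TimeoutFailureDetector:
    """Heartbeat-based detector that may wrongly suspect slow processes.

    Illustrative only: names and the doubling policy are assumptions.
    """

    def __init__(self, processes, initial_timeout=1.0):
        self.timeout = {p: initial_timeout for p in processes}
        self.last_heard = {p: 0.0 for p in processes}
        self.suspected = set()

    def heartbeat(self, p, now):
        """Record a heartbeat from process p received at time `now`."""
        self.last_heard[p] = now
        if p in self.suspected:
            # The earlier suspicion was a mistake: retract it and
            # wait longer for this process from now on.
            self.suspected.discard(p)
            self.timeout[p] *= 2

    def check(self, now):
        """Suspect every process that has been silent past its timeout."""
        for p, heard in self.last_heard.items():
            if now - heard > self.timeout[p]:
                self.suspected.add(p)
        return set(self.suspected)


fd = TimeoutFailureDetector(["p", "q"], initial_timeout=1.0)
fd.heartbeat("p", 0.5)
fd.heartbeat("q", 0.5)
first = fd.check(2.0)    # both silent for 1.5 > 1.0, so both are suspected
fd.heartbeat("p", 2.1)   # p was only slow: suspicion retracted, timeout doubled
second = fd.check(3.0)   # q stays suspected, p is trusted again
```

A process that has crashed never sends another heartbeat, so it stays suspected, which is the flavor of the completeness property. The retraction step shows the kind of mistake the accuracy property bounds. Whether such mistakes eventually stop depends on timing assumptions that a purely asynchronous system does not provide.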
Round-by-round fault detectors: Unifying synchrony and asynchrony
- In Proc. of the 17th ACM Symp. on Principles of Distributed Computing (PODC), 1998
Cited by 43 (7 self)
... and insights. For many years, researchers studying synchronous message-passing systems have considered algorithms composed of rounds of computation. In each round, a process sends a message to the others and then waits to receive messages from the other processes. The synchronous nature of the system ensures that, by the end of the round, each process receives all messages sent to it in that round by correct processes. In the parlance of Elrad and Francez [1], then, each round of a synchronous system is a communication-closed layer.
Shared Memory vs Message Passing
- 2004
Cited by 15 (10 self)
This paper determines the computational strength of the shared memory abstraction (a register) emulated over a message passing system, and compares it with fundamental message passing abstractions like consensus and various forms of reliable broadcast. We introduce
On the Weakest Failure Detector Ever
- PODC'07, 2007
Cited by 9 (3 self)
Many problems in distributed computing are impossible when no information about process failures is available. It is common to ask what information about failures is necessary and sufficient to circumvent some specific impossibility, e.g., consensus, atomic commit, mutual exclusion, etc. This paper asks what information about failures is needed to circumvent any impossibility and sufficient to circumvent some impossibility. In other words, what is the minimal yet non-trivial failure information? We present an abstraction, denoted Υ, that provides very little failure information. In every run of the distributed system, Υ eventually informs the processes that some set of processes in the system cannot be the set of correct processes in that run. Although seemingly weak, for it might provide random information for an arbitrarily long period
On failure detectors and type boosters
- In Proceedings of the 17th International Symposium on Distributed Computing (DISC’03), 2003
Cited by 9 (4 self)
The power of an object type T can be measured as the maximum number n of processes that can solve consensus using only objects of T and registers. This number, denoted cons(T), is called the consensus power of T. This paper addresses the question of the weakest failure detector to solve consensus among a number k > n of processes that communicate using shared objects of a type T with consensus power n. In other words, we seek a failure detector that is sufficient and necessary to “boost” the consensus power of a type T from n to k. It was shown in [24] that a certain failure detector, denoted Ωn, is sufficient to boost the power of a type T from n to k, and it was conjectured that Ωn was also necessary. In this paper, we prove this conjecture for one-shot deterministic types. We first show that, for any one-shot deterministic type T with cons(T) ≤ n, Ωn is necessary to boost the power of T from n to n + 1. Then we go a step further and show that Ωn is also the weakest to boost the power of (n + 1)-ported one-shot deterministic types from n to any k > n. Our result generalizes, in a precise sense, the result of the weakest failure detector to solve consensus in asynchronous message-passing systems [7]. As a corollary, we show that Ωt is the weakest failure detector to boost the resilience level of a distributed shared memory system, i.e., to solve consensus among n > t processes using (t − 1)-resilient objects of consensus power t.
Looking for the Weakest Failure Detector for k-Set Agreement in Message-passing Systems
- Is Πk the End of the Road?, INRIA, 2009, http://hal.inria.fr/inria-00384993/en/, PI 1929
Cited by 7 (1 self)
In the k-set agreement problem, each process (in a set of n processes) proposes a value and has to decide a proposed value in such a way that at most k different values are decided. While this problem can easily be solved in asynchronous systems prone to t process crashes when k > t, it cannot be solved when k ≤ t. For several years, the failure detector-based approach has been investigated to circumvent this impossibility. While the weakest failure detector class to solve the k-set agreement problem in read/write shared-memory systems has recently been discovered (PODC 2009), the situation is different in message-passing systems, where the weakest failure detector classes are known only for the extreme cases k = 1 (consensus) and k = n − 1 (set agreement). This paper introduces a candidate for the general case. It presents a new failure detector class, denoted Πk, and shows Π1 = Σ × Ω (the weakest class for k = 1), and Πn−1 = L (the weakest class for k = n − 1). Then, the paper investigates the structure of Πk and shows it is the combination of two failure detector classes denoted Σk and Ωk (that generalize the previous “quorums” and “eventual leaders” failure detector classes). Finally, the paper proves that Σk is a necessary requirement (as far as information on failure is concerned) to solve the k-set agreement problem in message-passing systems. The paper also presents a Πn−1-based algorithm that solves the (n − 1)-set agreement problem. This algorithm provides us with a new algorithmic insight on the way the (n − 1)-set agreement problem can be solved in asynchronous message-passing systems (insight from the point of view of the non-partitioning constraint defined by Σn−1).
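The two safety conditions in the problem statement above (every decided value was proposed, and at most k different values are decided) can be checked mechanically on a finished run. The following is a minimal sketch, assuming a run is summarized by two dictionaries mapping process identifiers to values. The function name `check_k_set_agreement` and this representation are illustrative choices, not taken from the paper, and termination (a liveness property) is not checked.

```python
def check_k_set_agreement(proposed, decided, k):
    """Check the safety part of k-set agreement on one finished run.

    proposed: dict mapping process id -> proposed value
    decided:  dict mapping process id -> decided value (crashed processes absent)
    k:        maximum number of distinct decided values allowed
    """
    proposed_values = set(proposed.values())
    decided_values = set(decided.values())
    # Validity: every decided value must have been proposed by some process.
    validity = decided_values <= proposed_values
    # Agreement: at most k different values are decided.
    agreement = len(decided_values) <= k
    return validity and agreement


proposals = {1: "a", 2: "b", 3: "c"}
decisions = {1: "a", 2: "a", 3: "b"}
ok_for_two = check_k_set_agreement(proposals, decisions, k=2)   # two distinct values decided
ok_for_one = check_k_set_agreement(proposals, decisions, k=1)   # k = 1 is consensus
```

With k = 1 the check reduces to the agreement and validity conditions of consensus, which matches the extreme case named in the abstract.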
(Almost) all objects are universal in message passing systems (Extended Abstract)
- In International Symposium on Distributed Computing, 2005
Cited by 4 (3 self)
This paper shows that all shared atomic object types that can solve consensus among k > 1 processes have the same weakest failure detector in a message passing system with process crash failures. In such a system, object types such as test-and-set, fetch-and-add, and queue, known to have weak synchronization power in a shared memory system, are thus, in a precise sense, equivalent to universal types like compare-and-swap, known to have the strongest synchronization power. In the particular case of a message passing system of two processes, we show that, interestingly, even a register is in that sense universal.
Sharing is harder than agreeing
- In PODC 2008: Proceedings of the Twenty-Seventh Annual ACM Symposium on Principles of Distributed Computing, 2008
Cited by 4 (1 self)
One of the most celebrated results of the theory of distributed computing is the impossibility, in an asynchronous system of n processes that communicate through shared memory registers, to solve the set agreement problem where the processes need to decide on up to n − 1 among their n initial values. In short, the result indicates that the register abstraction is too weak to implement the set agreement one. This paper explores the relation between these abstractions in a message passing system where a register is not a given physical device but is rather itself implemented by processes communicating through message passing. We show that, maybe surprisingly, the information about process failures that is necessary and sufficient to implement a register shared by two particular processes is sufficient but not necessary to implement set agreement. We later generalize this result by considering k-set agreement, where the processes can decide on up to k values, and comparing it with a register shared by any particular subset of 2k processes. We prove that, for 1 ≤ k ≤ n/2, (a) any failure information that is sufficient to implement a register shared by 2k processes is sufficient to implement (n − k)-set agreement but (b) a failure information that is sufficient for (n − k)-set agreement is not sufficient for a register shared by 2k processes. We also prove that (c) a failure information that is sufficient for a register shared by 2k processes is not sufficient for ((n-k)-1)-set agreement.
Failure detectors and extended Paxos for k-set agreement
- 2007
Cited by 1 (1 self)
Failure detector class Ωk has been defined in [17] as an extension to failure detector Ω, and an algorithm has been given in [15] to solve k-set agreement using Ωk in asynchronous message-passing systems. In this paper, we extend these previous works in two directions. First, we define two new classes of failure detectors, Ω′k and Ω′′k, which are new ways of extending Ω, and show that they are equivalent to Ωk. Class Ω′k is more flexible than Ωk in that it does not require the outputs to stabilize eventually, while class Ω′′k does not refer to other processes in its outputs and thus serves as a good basis for the partitioned failure detectors we introduce in [6]. Second, we present a new algorithm that solves k-set agreement using Ω′′k when a majority of processes do not crash. The algorithm is a faithful extension of the Paxos algorithm [11], and thus it inherits the efficiency, flexibility, and robustness of the Paxos algorithm. In particular, it has better message complexity than the algorithm in [15]. Both the new failure detectors and the new algorithm enrich our understanding of the k-set agreement problem. In particular, they serve as the basis of our study on partitioned failure detectors for k-set agreement [6].
The Multiplicative Power of Consensus Numbers
- 2010
Cited by 1 (0 self)
The Borowsky-Gafni (BG) simulation algorithm is a powerful reduction algorithm that shows that t-resilience of decision tasks can be fully characterized in terms of wait-freedom. Said in another way, the BG simulation shows that the crucial parameter is not the number n of processes but the upper bound t on the number of processes that are allowed to crash. The BG algorithm considers colorless decision tasks in the base read/write shared memory model. (Colorless means that if a process decides a value, any other process is allowed to decide the very same value.) This paper considers system models made up of n processes prone to up to t crashes, and where the processes communicate by accessing read/write atomic registers (as assumed by the BG) and (differently from the BG) objects with consensus number x, accessible by at most x processes (with x ≤ t < n). Let ASM(n, t, x) denote such a system model. While the BG simulation has shown that the models ASM(n, t, 1) and ASM(t + 1, t, 1) are equivalent, this paper focuses on the pair (t, x) of parameters of a system model. Its main result is the following: the system models ASM(n1, t1, x1) and ASM(n2, t2, x2) have the same computational power for colorless decision tasks if and only if ⌊t1/x1⌋ = ⌊t2/x2⌋. As can be seen, this contribution complements and extends the BG simulation. It shows that consensus numbers have a multiplicative power with respect to failures, namely the system models ASM(n, t′, x) and ASM(n, t, 1) are equivalent for colorless decision tasks iff (t × x) ≤ t′ ≤ (t × x) + (x − 1).
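The closing condition of this abstract, (t × x) ≤ t′ ≤ (t × x) + (x − 1), is the same as saying that the integer quotient of t′ by x equals t. The following is a small arithmetic sketch, assuming the main result compares the integer quotients t1 div x1 and t2 div x2. The function names are hypothetical, and the code only checks the arithmetic; it says nothing about the simulation itself.

```python
def floor_ratio(t, x):
    """Integer quotient of t by x for t >= 0 crashes and consensus number x >= 1."""
    if t < 0 or x < 1:
        raise ValueError("expected t >= 0 and x >= 1")
    return t // x


def same_power(t1, x1, t2, x2):
    """Assumed reading of the main result: equal quotients, equal power."""
    return floor_ratio(t1, x1) == floor_ratio(t2, x2)


def equivalent_t_primes(t, x):
    """All t' for which ASM(n, t', x) matches ASM(n, t, 1) under that reading."""
    return [tp for tp in range((t + 2) * x) if same_power(tp, x, t, 1)]


# With t = 2 and x = 3 the stated bounds are 6 <= t' <= 8.
window = equivalent_t_primes(2, 3)
matches_bounds = window == list(range(2 * 3, 2 * 3 + 3))
```

For t = 2 and x = 3 the window is t′ ∈ {6, 7, 8}, which agrees with the bounds (t × x) and (t × x) + (x − 1) given in the abstract.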

