On the Weakest Failure Detector Ever
- PODC'07, 2007
Cited by 9 (3 self)
Many problems in distributed computing are impossible when no information about process failures is available. It is common to ask what information about failures is necessary and sufficient to circumvent some specific impossibility, e.g., consensus, atomic commit, mutual exclusion, etc. This paper asks what information about failures is needed to circumvent any impossibility and sufficient to circumvent some impossibility. In other words, what is the minimal yet non-trivial failure information? We present an abstraction, denoted Υ, that provides very little failure information. In every run of the distributed system, Υ eventually informs the processes that some set of processes in the system cannot be the set of correct processes in that run. Although seemingly weak, for it might provide random information for an arbitrarily long period
J.L.: Failure detectors encapsulate fairness
- In: International Conference on Principles of Distributed Systems. LNCS
Cited by 3 (1 self)
Failure detectors have long been viewed as abstractions for the synchronism present in distributed system models. However, investigations into the exact amount of synchronism encapsulated by a given failure detector have met with limited success. The reason for this is that traditionally, models of partial synchrony are specified with respect to real time, but failure detectors do not encapsulate real time. Instead, we argue that failure detectors encapsulate the fairness in computation and communication. Fairness is a measure of the number of steps executed by one process relative either to the number of steps taken by another process or relative to the duration for which a message is in transit. We argue that failure detectors are substitutable for the fairness properties (rather than real-time properties) of partially synchronous systems. We propose four fairness-based models of partial synchrony and demonstrate that they are, in fact, the ‘weakest system models’ to implement the canonical failure detectors from the Chandra-Toueg hierarchy. We also propose a set of fairness-based models which encapsulate the Gc parametric failure detectors which eventually and permanently suspect crashed processes, and eventually and permanently trust some fixed set of c correct processes.
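As a rough illustration of the step-counting notion of fairness described in this abstract, the sketch below measures how many steps one process takes while another takes none. The function name, the list-of-process-ids schedule format, and the sample schedule are all assumptions made for this example; they are not taken from the paper, and the paper's actual models may bound relative steps differently.

```python
def max_relative_steps(schedule, p, q):
    """Largest number of steps q takes in any interval with no step of p.

    `schedule` is a finite list of process ids, one per step (hypothetical
    format chosen for this sketch).
    """
    worst = 0
    run = 0
    for proc in schedule:
        if proc == q:
            run += 1                # q takes another step while p is idle
            worst = max(worst, run)
        elif proc == p:
            run = 0                 # p took a step, so the interval ends
        # steps of any other process do not affect the count
    return worst


# Hypothetical schedule over three processes.
schedule = ["p", "q", "q", "p", "q", "q", "q", "r", "q", "p"]
print(max_relative_steps(schedule, "p", "q"))  # steps of q relative to p
print(max_relative_steps(schedule, "q", "p"))  # steps of p relative to q
```

A schedule in which this value stays below some bound for every pair of processes is fair in the relative-step sense sketched here; no reference to real time is needed to state the bound.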
The Weak Mutual Exclusion Problem
Cited by 2 (1 self)
In this paper we define the Weak Mutual Exclusion (WME) problem. Analogously to classical Distributed Mutual Exclusion (DME), WME serializes the accesses to a shared resource. Differently from DME, however, the WME abstraction regulates the access to a replicated shared resource, whose copies are locally maintained by every participating process. Also, in WME, processes suspected to have crashed are possibly ejected from the critical section. We prove that, unlike DME, WME is solvable in a partially synchronous model, i.e. a system where the bounds on communication latency and on relative process speeds are not known in advance, or are known but only hold after an unknown time. Finally we demonstrate that ◇P is the weakest failure detector for solving WME, and present an algorithm that solves WME using ◇P with a majority of correct processes.
With Finite Memory Consensus is Easier Than Reliable Broadcast
- 2008
Cited by 2 (0 self)
We consider asynchronous distributed systems with message losses and process crashes. We study the impact of finite process memory on the solution to consensus, repeated consensus and reliable broadcast. With finite process memory, we show that in some sense consensus is easier to solve than reliable broadcast, and that reliable broadcast is as difficult to solve as repeated consensus. More precisely, with finite memory, consensus can be solved with failure detector S, and P− (a variant of the perfect failure detector which is stronger than S) is necessary and sufficient to solve reliable broadcast and repeated consensus. Keywords: distributed algorithms, failure detectors, reliable broadcast, consensus, repeated consensus.
K.: Wait-free dining under eventual weak exclusion
- 2006
Cited by 1 (1 self)
We present a wait-free solution to the generalized dining philosophers problem under eventual weak exclusion in environments subject to crash faults. Wait-free dining guarantees that every correct hungry process eventually eats, regardless of process crashes. Eventual weak exclusion (◇WX) actually allows scheduling mistakes, whereby mutual exclusion may be violated finitely many times; for each run, however, there must exist a convergence point after which live neighbors never eat simultaneously. Wait-free dining under ◇WX is particularly useful for synchronization tasks where eventual safety is sufficient for correctness (e.g., duty-cycle scheduling, self-stabilizing daemons, and contention managers). Unfortunately, wait-free dining is unsolvable in asynchronous systems. As such, we characterize sufficient conditions for solvability under partial synchrony by presenting a wait-free dining algorithm for ◇WX using a local refinement of the eventually perfect failure detector ◇P₁.
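The convergence-point idea behind eventual weak exclusion can be illustrated on a recorded trace. The sketch below is not the paper's algorithm; it only locates the last step at which two neighbors eat together in a finite trace. The trace format, the conflict graph, and the function name are assumptions for this example, every process in the trace is assumed to be live, and the neighbor relation is assumed to be symmetric. Since eventual weak exclusion is a property of infinite runs, a finite trace can show where violations were last observed but cannot prove that none will follow.

```python
from itertools import combinations


def last_violation(trace, neighbors):
    """Index of the last step in which two neighbors eat simultaneously.

    `trace` is a list of sets, one per step, holding the processes eating at
    that step. `neighbors` maps each process to the set of its neighbors in
    the conflict graph. Returns -1 if no violation is observed.
    """
    last = -1
    for step, eating in enumerate(trace):
        for p, q in combinations(sorted(eating), 2):
            if q in neighbors.get(p, set()):
                last = step         # p and q conflict and both eat here
    return last


# Hypothetical conflict graph: a path a - b - c.
neighbors = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
# Hypothetical trace: one scheduling mistake at step 0, none afterwards.
trace = [{"a", "b"}, {"a", "c"}, {"b"}, {"a", "c"}]

print(last_violation(trace, neighbors))
```

In this trace, every step after the returned index respects exclusion, so any later index is a candidate convergence point for the observed prefix.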
Sigma: A Fault-Tolerant Mutual Exclusion Algorithm in Dynamic Distributed Systems Subject to Process Crashes and Memory Losses
Cited by 1 (0 self)
This paper introduces the Sigma algorithm that solves the fault-tolerant mutual exclusion problem in dynamic systems where the set of processes may be large and change dynamically, processes may crash, and the recovery or replacement of crashed processes may lose all state information (memory losses). The Sigma algorithm includes new messaging mechanisms to tolerate process crashes and memory losses. It does not require any extra cost for process recovery. The paper also shows that the threshold used by the Sigma algorithm is necessary for systems with process crashes and memory losses.
An Efficient Weak Mutual Exclusion Algorithm
- 2009
Weak Mutual Exclusion (WME) is a recently proposed abstraction which, analogously to classical Distributed Mutual Exclusion (DME), serializes concurrent accesses to a shared resource. Unlike DME, however, the WME abstraction regulates the access to a replicated shared resource and is solvable in the presence of less restrictive synchrony assumptions, i.e. in an asynchronous system augmented with an eventually perfect failure detector. This paper presents an efficient WME algorithm which outperforms previous solutions in terms of both communication latency and message complexity, while relying on minimal synchrony assumptions.
Technical Report: TAMU-CS-TR-2010-7-1 Failure Detectors Encapsulate Fairness
Failure detectors have long been viewed as abstractions for the synchronism present in distributed system models. However, investigations into the exact amount of synchronism encapsulated by a given failure detector have met with limited success. The reason for this is that traditionally, models of partial synchrony are specified with respect to real time, but failure detectors do not encapsulate real time. Instead, we argue that failure detectors encapsulate the fairness in computation and communication. Fairness is a measure of the number of steps executed by one process relative either to the number of steps taken by another process or relative to the duration for which a message is in transit. We argue that partially synchronous systems are perhaps better specified with fairness constraints (rather than real-time constraints) on computation and communication. We demonstrate the utility of this approach by specifying the weakest system models to implement failure detectors in the Chandra-Toueg hierarchy.
Abstracting out Byzantine Behavior
Many distributed systems are designed to tolerate the presence of Byzantine failures: an individual process may arbitrarily deviate from the algorithm assigned to it. Depending on the application requirements, systems enjoy various levels of fault-tolerance. Systems based on state machine replication are able to mask failures so that their effect is not visible by the application. In contrast, cooperative peer-to-peer systems can tolerate bounded deviant behavior to some extent and therefore do not require masking, as long as each faulty node is exposed eventually. Finding an abstract way to reason about the levels of fault-tolerance is thus of immanent importance. In this paper, we discuss how the information of deviant behavior can be abstracted out in the form of a Byzantine failure detector (BFD). We formally define a BFD abstraction, and we discuss two ways of using the abstraction: (1) monitoring systems in order to retroactively detect Byzantine failures and (2) enforcing systems in order to boost their level of fault-tolerance. Interestingly, the BFD formalism allowed us to determine the relative hardness of implementing two popular abstractions in distributed computing: state machine replication and weak interactive consistency.

