Results 1 - 10 of 38
Unreliable Failure Detectors for Reliable Distributed Systems
 Journal of the ACM
, 1996
Abstract

Cited by 1070 (19 self)
We introduce the concept of unreliable failure detectors and study how they can be used to solve Consensus in asynchronous systems with crash failures. We characterise unreliable failure detectors in terms of two properties — completeness and accuracy. We show that Consensus can be solved even with unreliable failure detectors that make an infinite number of mistakes, and determine which ones can be used to solve Consensus despite any number of crashes, and which ones require a majority of correct processes. We prove that Consensus and Atomic Broadcast are reducible to each other in asynchronous systems with crash failures; thus the above results also apply to Atomic Broadcast. A companion paper shows that one of the failure detectors introduced here is the weakest failure detector for solving Consensus [Chandra et al. 1992].
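To make the idea of a detector that errs and later corrects itself concrete, here is a minimal sketch of a timeout-based detector. It is not the construction from the paper: the class name, the heartbeat interface, and the timeout-doubling rule are all assumptions chosen for illustration. A process that stops sending heartbeats is eventually suspected (a completeness-style behaviour), and each false suspicion makes the detector more patient (an accuracy-style behaviour).

```python
class TimeoutFailureDetector:
    """Hypothetical heartbeat-based detector; suspects silent processes."""

    def __init__(self, processes, initial_timeout=1.0):
        # Per-process timeout, grown whenever a suspicion proves wrong.
        self.timeout = {p: initial_timeout for p in processes}
        self.last_heartbeat = {p: 0.0 for p in processes}
        self.suspected = set()

    def heartbeat(self, p, now):
        """Record a heartbeat from p received at time `now`."""
        self.last_heartbeat[p] = now
        if p in self.suspected:
            # The earlier suspicion was a mistake: retract it and
            # double the timeout so the same mistake is less likely.
            self.suspected.discard(p)
            self.timeout[p] *= 2

    def check(self, now):
        """Return the set of processes suspected at time `now`."""
        for p, last in self.last_heartbeat.items():
            if now - last > self.timeout[p]:
                self.suspected.add(p)
        return set(self.suspected)
```

In this sketch a slow but live process can be suspected many times, which is the kind of mistake the abstract says Consensus can tolerate.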
The Weakest Failure Detector for Solving Consensus
, 1996
Abstract

Cited by 471 (20 self)
We determine what information about failures is necessary and sufficient to solve Consensus in asynchronous distributed systems subject to crash failures. In [CT91], it is shown that ◇W, a failure detector that provides surprisingly little information about which processes have crashed, is sufficient to solve Consensus in asynchronous systems with a majority of correct processes. In this paper, we prove that to solve Consensus, any failure detector has to provide at least as much information as ◇W. Thus, ◇W is indeed the weakest failure detector for solving Consensus in asynchronous systems with a majority of correct processes.
Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services
 In ACM SIGACT News
, 2002
Abstract

Cited by 243 (3 self)
When designing distributed web services, there are three properties that are commonly desired: consistency, availability, and partition tolerance. It is impossible to achieve all three. In this note, we prove this conjecture in the asynchronous network model, and then discuss solutions to this dilemma in the partially synchronous model.
More Choices Allow More Faults: Set Consensus Problems In Totally Asynchronous Systems
 Information and Computation
, 1992
Abstract

Cited by 110 (4 self)
We define the k-set consensus problem as an extension of the consensus problem, where each processor decides on a single value such that the set of decided values in any run is of size at most k. We require the agreement condition that all values decided upon are initial values of some processor. We show that the problem has a simple (k - 1)-resilient protocol in a totally asynchronous system. In an attempt to come up with a matching lower bound on the number of failures, we study the uncertainty condition, which requires that there must be some initial configuration from which all possible input values can be decided. We prove using a combinatorial argument that any k-resilient protocol for the k-set agreement problem would satisfy the uncertainty condition, while this is not true for any (k - 1)-resilient protocol.
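One commonly described way to tolerate k - 1 crashes is for each processor to broadcast its input, wait for n - (k - 1) values, and decide the minimum it has seen. The sketch below assumes that rule (it may differ from the paper's exact protocol) and enumerates every set of values a processor could be waiting on. The function names are hypothetical.

```python
from itertools import combinations


def decide(received_values):
    """Assumed decision rule: pick the smallest value received."""
    return min(received_values)


def possible_decisions(inputs, k):
    """All values some processor could decide, given n inputs and the
    assumption that each processor hears from exactly n - (k - 1) of them."""
    n = len(inputs)
    quorum = n - (k - 1)
    return {decide(subset) for subset in combinations(inputs, quorum)}
```

Because any quorum misses at most k - 1 inputs, its minimum is always one of the k smallest inputs, so at most k distinct values are decided and every decided value is some processor's input.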
The asynchronous computability theorem for t-resilient tasks
 In Proceedings of the 1993 ACM Symposium on Theory of Computing
, 1993
Abstract

Cited by 99 (15 self)
We give necessary and sufficient combinatorial conditions characterizing the computational tasks that can be solved by N asynchronous processes, up to t of which can fail by halting. The range of possible input and output values for an asynchronous task can be associated with a high-dimensional geometric structure called a simplicial complex. Our main theorem characterizes computability in terms of the topological properties of this complex. Most notably, a given task is computable only if it can be associated with a complex that is simply connected with trivial homology groups. In other words, the complex has “no holes!” Applications of this characterization include the first impossibility results for several long-standing open problems in distributed computing, such as the “renaming” problem of Attiya et al., the “k-set agreement” problem of Chaudhuri, and a generalization of the approximate agreement problem.
Wait-Free Algorithms for Fast, Long-Lived Renaming
 Science of Computer Programming
, 1995
Abstract

Cited by 80 (14 self)
We consider wait-free solutions to the renaming problem for shared-memory multiprocessing systems [3, 5]. In the renaming problem, processes are required to choose new names in order to reduce the size of their name space. Previous solutions to the renaming problem have time complexity that is dependent on the size of the original name space, and allow processes to acquire names only once. In this paper, we present several new renaming algorithms. Most of our algorithms have time complexity that is independent of the size of the original name space, and some of our algorithms solve a new, more general version of the renaming problem called long-lived renaming. In long-lived renaming algorithms, processes may repeatedly acquire and release names. 1 Introduction. In the M-renaming problem [2], each of k processes is required to choose a distinct value, called a name, that ranges over {0, ..., M - 1}. Each process is assumed to have a unique process identifier ranging over {0..N ...
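The acquire/release interface of long-lived renaming can be illustrated with a small sketch. It assumes a test-and-set primitive, simulated here with a lock, and a pool of M names scanned in order. It is not one of the paper's algorithms and makes no wait-freedom claim. All class and method names are hypothetical.

```python
import threading


class TestAndSetBit:
    """Simulated test-and-set bit (the atomic primitive is an assumption)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = False

    def test_and_set(self):
        # Atomically set the bit and return its previous value.
        with self._lock:
            old = self._value
            self._value = True
            return old

    def reset(self):
        with self._lock:
            self._value = False


class LongLivedRenaming:
    """Names are indices 0..M-1; a name is held while its bit is set."""

    def __init__(self, num_names):
        self.bits = [TestAndSetBit() for _ in range(num_names)]

    def acquire(self):
        # Scan until some bit is won; the cost depends on the number of
        # names, not on the size of the original identifier space.
        while True:
            for name, bit in enumerate(self.bits):
                if not bit.test_and_set():
                    return name

    def release(self, name):
        # Make the name available to later acquire() calls.
        self.bits[name].reset()
```

A released name can be handed out again, which is what distinguishes the long-lived variant from one-shot renaming.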
Self-stabilizing population protocols
 In Ninth International Conference on Principles of Distributed Systems
Abstract

Cited by 44 (9 self)
This paper studies self-stabilization in networks of anonymous, asynchronously interacting nodes where the size of the network is unknown. Constant-space protocols are given for Dijkstra-style round-robin token circulation, leader election in rings, 2-hop coloring in degree-bounded graphs, and establishing consistent global orientation in an undirected ring. A protocol to construct a spanning tree in regular graphs using O(log D) memory is also given, where D is the diameter of the graph. A general method for eliminating nondeterministic transitions from the self-stabilizing implementation of a large family of behaviors is used to simplify the constructions, and general conditions under which protocol composition preserves behavior are used in proving their correctness.
Unifying Synchronous and Asynchronous Message-Passing Models
 In Proceedings of the 17th Annual ACM Symposium on Principles of Distributed Computing
, 1998
Abstract

Cited by 25 (11 self)
We take a significant step toward unifying the synchronous, semi-synchronous, and asynchronous message-passing models of distributed computation. The key idea is the concept of a pseudosphere, a new combinatorial structure in which each process from a set of processes is independently assigned a value from a set of values. Pseudospheres have a number of nice combinatorial properties, but their principal interest lies in the observation that the behavior of protocols in the three models can be characterized as simple unions of pseudospheres, where the exact structure of these unions is determined by the timing properties of the model. We use this pseudosphere construction to derive new and remarkably succinct proofs of bounds on consensus and k-set agreement in the asynchronous and synchronous models, as well as the first lower bound on wait-free k-set agreement in the semi-synchronous model. To appear in the 16th Annual ACM Symposium on Principles of Distributed Computing (PODC98), Puer...
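The pseudosphere described above can be enumerated directly. This sketch assumes a simplified case in which every process draws from the same value set, and it lists only the maximal simplices (one complete assignment each). The function name is hypothetical.

```python
from itertools import product


def pseudosphere_facets(processes, values):
    """Return each complete assignment as a set of (process, value) pairs.

    Every process independently receives one value, so there are
    len(values) ** len(processes) maximal simplices.
    """
    return [
        frozenset(zip(processes, assignment))
        for assignment in product(values, repeat=len(processes))
    ]
```

For example, three processes with binary values yield 2 ** 3 = 8 maximal simplices, each containing exactly one vertex per process.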
Using k-Exclusion to Implement Resilient, Scalable Shared Objects (Extended Abstract)
 in Proceedings of the 13th Annual ACM Symposium on Principles of Distributed Computing
, 1994
Abstract

Cited by 18 (5 self)
James H. Anderson and Mark Moir, Department of Computer Science, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599-3175, USA. Abstract: We present a methodology for the implementation of resilient shared objects that allows the desired level of resiliency to be selected based on performance concerns. This methodology is based on the k-exclusion and renaming problems. To make this methodology practical, we present a number of fast k-exclusion algorithms that employ "local spin" techniques to minimize the impact of the processor-to-memory bottleneck. We also present a new "long-lived" renaming algorithm. Our k-exclusion algorithms are based on commonly-available synchronization primitives, are fast in the absence of contention, and have scalable performance when contention exceeds expected thresholds. By contrast, all prior k-exclusion algorithms either require unrealistic atomic operations or perform badly. Our k-exclusion algorithms are also the first ...