Unreliable Failure Detectors for Reliable Distributed Systems - Journal of the ACM, 1996
Cited by 807 (17 self)
We introduce the concept of unreliable failure detectors and study how they can be used to solve Consensus in asynchronous systems with crash failures. We characterise unreliable failure detectors in terms of two properties — completeness and accuracy. We show that Consensus can be solved even with unreliable failure detectors that make an infinite number of mistakes, and determine which ones can be used to solve Consensus despite any number of crashes, and which ones require a majority of correct processes. We prove that Consensus and Atomic Broadcast are reducible to each other in asynchronous systems with crash failures; thus the above results also apply to Atomic Broadcast. A companion paper shows that one of the failure detectors introduced here is the weakest failure detector for solving Consensus [Chandra et al. 1992].
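As a rough illustration of what "unreliable" means here, the sketch below shows a timeout-based detector that can wrongly suspect a slow process and later retract the suspicion. It is a minimal sketch under stated assumptions, not the construction from the paper: the class name `HeartbeatFailureDetector`, the heartbeat mechanism, the doubling of the timeout, and the explicit `now` argument (used in place of a real clock) are all hypothetical choices made for illustration. Informally, completeness concerns crashed processes eventually being suspected, and accuracy restricts the mistakes made about processes that have not crashed.

```python
class HeartbeatFailureDetector:
    """Timeout-based suspicion list (illustrative assumption, not the paper's
    algorithm). It may suspect a live process and later revise that mistake."""

    def __init__(self, peers, initial_timeout=1.0):
        self.timeout = {p: initial_timeout for p in peers}
        self.last_heard = {p: 0.0 for p in peers}
        self.suspected = set()

    def on_heartbeat(self, peer, now):
        """Record a heartbeat from `peer` at time `now`."""
        self.last_heard[peer] = now
        if peer in self.suspected:
            # A suspected process turned out to be alive: retract the
            # suspicion and be more patient with this peer from now on.
            self.suspected.discard(peer)
            self.timeout[peer] *= 2

    def check(self, now):
        """Suspect every peer that has been silent longer than its timeout."""
        for peer, heard in self.last_heard.items():
            if now - heard > self.timeout[peer]:
                self.suspected.add(peer)
        return set(self.suspected)


fd = HeartbeatFailureDetector(["p", "q"], initial_timeout=1.0)
fd.on_heartbeat("p", now=0.5)
fd.on_heartbeat("q", now=0.5)
first = fd.check(now=2.0)      # both have been silent for 1.5 > 1.0
fd.on_heartbeat("p", now=2.1)  # p was only slow; the mistake is corrected
second = fd.check(now=3.0)     # q stays silent and remains suspected
```

In this run `p` is suspected by mistake and then cleared, while `q`, which never sends another heartbeat, stays on the suspicion list.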
On the Formal Specification of Group Membership Services, 1995
Cited by 47 (2 self)
The problem of group membership has been the focus of much theoretical and experimental work on fault-tolerant distributed systems. This has resulted in a voluminous literature and several formal specifications of this problem have been given. In this paper, we examine the two most referenced formal specifications of group membership and show that they are unsatisfactory: One has flaws in the formalism and allows undesirable executions, and the other can be satisfied by useless protocols.
1 Introduction Group membership is an important component of several experimental or commercial fault-tolerant distributed systems such as the Highly Available System [Cri87], Isis [Bir93], Horus [vRBC+93], Transis [ADKM92a], Amoeba [KT91], Newtop [EMS95], and Relacs [BDGB94]. Roughly speaking, a group membership protocol manages the formation and maintenance of a set of processes called a group. For example, a group may be a set of processes that are cooperating towards a common task (e.g., th...
Unreliable Failure Detectors For Asynchronous Distributed Systems - in the Proceedings of the 10th Annual ACM Symposium on Principles of Distributed Computing, 1993
Cited by 7 (1 self)
equivalent in asynchronous systems. Thus all our results regarding the solvability of Consensus using failure detectors apply to Atomic Broadcast as well. The work in this thesis was funded by an IBM graduate fellowship and grants from NSF, DARPA/NASA, the IBM Endicott Programming Laboratory, Siemens Corp and the Natural Sciences and Engineering Research Council of Canada.
Biographical Sketch Tushar Deepak Chandra was born in New Delhi, India on November 13, 1966. He spent his childhood in various cities in India: Bombay, Calcutta and finally Kanpur. After completing high school at the Doon School, he went on to do a Bachelor of Technology in Computer Science at the Indian Institute of Technology at Kanpur. He joined the graduate program in Computer Science at Cornell University in August 1988.
This thesis is dedicated to my parents who taught me how to think.
Acknowledgements A large number of people contributed either directly or i
A Lightweight Solution to Uniform Atomic Broadcast for Asynchronous Systems: Proofs, 1996
Cited by 7 (0 self)
Chandra and Toueg proposed in [CT93] a new approach to overcome the impossibility of deterministically reaching Consensus -- and, as a corollary, Atomic Broadcast -- in asynchronous systems subject to crash failures. They augment the asynchronous system with a possibly Unreliable Failure Detector which provides some information about the operational state of processes. In this report, we present an extension of the Consensus problem that we call Uniform Prefix Agreement. This extension enables all the processes to propose a flow of messages during an execution -- instead of one as in the Consensus problem -- and uses all these proposed messages to compose its decision value. Prefix Agreement is based on an Unreliable Failure Detector. We use repeated executions of Prefix Agreement to build an efficient and lightweight Uniform Atomic Broadcast algorithm. This report describes the Uniform Prefix Agreement and Uniform Atomic Broadcast algorithms, and provides proofs of their correctnes...
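The idea of building a total order from repeated agreement can be sketched as follows. This is not the report's Prefix Agreement algorithm, whose details are not given in the abstract. It swaps in the simpler, classical pattern of running one plain Consensus instance per round on batches of undelivered messages. The function names, the single-process simulation, and `stub_consensus` (which just picks the proposal of the lowest process id and tolerates no failures) are all assumptions made for illustration.

```python
def stub_consensus(proposals):
    """Stand-in for a real Consensus instance (assumption): every process
    decides on the proposal of the lowest process id."""
    return proposals[min(proposals)]


def atomic_broadcast_rounds(pending):
    """Simulate repeated agreement rounds.

    `pending` maps a process id to the set of messages it has received but
    not yet delivered. Returns the delivery sequence of each process.
    """
    delivered = {p: [] for p in pending}
    while any(pending.values()):
        # Each process with undelivered messages proposes its current batch.
        proposals = {p: frozenset(m) for p, m in pending.items() if m}
        batch = stub_consensus(proposals)
        for p in pending:
            # Everyone delivers the decided batch in the same deterministic
            # (sorted) order, then drops those messages from its backlog.
            delivered[p].extend(sorted(batch))
            pending[p] -= batch
    return delivered


order = atomic_broadcast_rounds({
    1: {"a", "b"},
    2: {"b", "c"},
    3: {"c"},
})
```

Because every process appends the same decided batch in the same order each round, all delivery sequences are identical even though the processes started with different sets of pending messages.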

