Unreliable Failure Detectors for Reliable Distributed Systems (1995) [661 citations — 12 self]

Abstract:

We introduce the concept of unreliable failure detectors and study how they can be used to solve Consensus in asynchronous systems with crash failures. We characterise unreliable failure detectors in terms of two properties: completeness and accuracy. We show that Consensus can be solved even with unreliable failure detectors that make an infinite number of mistakes, and determine which ones can be used to solve Consensus despite any number of crashes, and which ones require a majority of correct processes. We prove that Consensus and Atomic Broadcast are reducible to each other in asynchronous systems with crash failures; thus the above results also apply to Atomic Broadcast. A companion paper shows that one of the failure detectors introduced here is the weakest failure detector for solving Consensus [Chandra et al. 1992].
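
The two properties can be made concrete with a small checker over a recorded run. This is an illustrative sketch, not code from the paper. It follows the definitions given in the full paper (strong and weak completeness; strong, weak, eventual strong, and eventual weak accuracy), but the trace format, the helper names, and the `stable_from` parameter are hypothetical. Time is assumed to be discrete, and because a finite trace cannot show what happens "forever", "eventually permanently" is approximated as "at every step from `stable_from` to the end of the trace".

```python
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical trace format (an assumption, not from the paper):
# trace[t][p] is the set of processes that process p suspects at discrete step t.
Trace = List[Dict[int, Set[int]]]
# crash_time[p] is the step at which p crashes, or None if p is correct.
Crashes = Dict[int, Optional[int]]


def alive(p: int, t: int, crash_time: Crashes) -> bool:
    """True if process p has not crashed by step t."""
    ct = crash_time[p]
    return ct is None or t < ct


def correct(crash_time: Crashes) -> Set[int]:
    return {p for p, ct in crash_time.items() if ct is None}


def faulty(crash_time: Crashes) -> Set[int]:
    return {p for p, ct in crash_time.items() if ct is not None}


def suspected_by(trace: Trace, t: int, p: int) -> Set[int]:
    # A crashed process may have no entry; treat it as suspecting nobody.
    return trace[t].get(p, set())


def is_suspected(trace: Trace, crash_time: Crashes, q: int, t: int,
                 observers: Iterable[int]) -> bool:
    """True if some observer that is alive at step t suspects q at step t."""
    return any(q in suspected_by(trace, t, p)
               for p in observers if alive(p, t, crash_time))


def strong_completeness(trace: Trace, crash_time: Crashes, stable_from: int) -> bool:
    # Every crashed process is suspected by EVERY correct process at each t >= stable_from.
    return all(q in suspected_by(trace, t, p)
               for q in faulty(crash_time)
               for p in correct(crash_time)
               for t in range(stable_from, len(trace)))


def weak_completeness(trace: Trace, crash_time: Crashes, stable_from: int) -> bool:
    # Every crashed process is suspected by SOME correct process at each t >= stable_from.
    return all(any(all(q in suspected_by(trace, t, p)
                       for t in range(stable_from, len(trace)))
                   for p in correct(crash_time))
               for q in faulty(crash_time))


def strong_accuracy(trace: Trace, crash_time: Crashes) -> bool:
    # No process is suspected (by a live process) before it crashes.
    everyone = set(crash_time)
    return not any(is_suspected(trace, crash_time, q, t, everyone)
                   for t in range(len(trace))
                   for q in everyone if alive(q, t, crash_time))


def weak_accuracy(trace: Trace, crash_time: Crashes) -> bool:
    # Some correct process is never suspected by any live process.
    everyone = set(crash_time)
    return any(not any(is_suspected(trace, crash_time, q, t, everyone)
                       for t in range(len(trace)))
               for q in correct(crash_time))


def eventual_strong_accuracy(trace: Trace, crash_time: Crashes, stable_from: int) -> bool:
    # From stable_from on, no correct process is suspected by any correct process.
    good = correct(crash_time)
    return not any(is_suspected(trace, crash_time, q, t, good)
                   for t in range(stable_from, len(trace)) for q in good)


def eventual_weak_accuracy(trace: Trace, crash_time: Crashes, stable_from: int) -> bool:
    # From stable_from on, some correct process is suspected by no correct process.
    good = correct(crash_time)
    return any(not any(is_suspected(trace, crash_time, q, t, good)
                       for t in range(stable_from, len(trace)))
               for q in good)


if __name__ == "__main__":
    crash_time = {0: None, 1: None, 2: 1}   # process 2 crashes at step 1
    trace = [
        {0: set(), 1: {0}, 2: set()},        # t=0: process 1 wrongly suspects 0
        {0: {2}, 1: {0, 2}},                 # t=1: process 2 has crashed
        {0: {2}, 1: {2}},                    # t=2: the mistake about 0 is gone
        {0: {2}, 1: {2}},                    # t=3
    ]
    print(strong_completeness(trace, crash_time, stable_from=1))       # True
    print(strong_accuracy(trace, crash_time))                          # False
    print(weak_accuracy(trace, crash_time))                            # True
    print(eventual_strong_accuracy(trace, crash_time, stable_from=2))  # True
```

A finite trace can only refute a property or be consistent with it; it cannot prove that a failure detector satisfies it. The example run is consistent with strong completeness plus weak accuracy (what the full paper calls S) even though process 1 makes a mistake at step 0, which illustrates why an "unreliable" detector can still be useful.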

Citations

1089 Impossibility of Distributed Consensus with One Faulty Process – Fischer, Lynch, et al. - 1985
867 The Byzantine Generals Problem – Lamport, Shostak, et al. - 1982
589 Implementing fault-tolerant services using the state machine approach: a tutorial – Schneider - 1990
455 Reliable Communication in the Presence of Failures – Birman, Joseph - 1987
397 Knowledge and common knowledge in a distributed environment – Halpern, Moses - 1990
374 Reaching agreement in the presence of faults – Pease, Shostak, et al. - 1980
343 Reliable broadcast protocols – Chang, Maxemchuk - 1984
326 Transis: A communication subsystem for high availability – Amir, Dolev, et al. - 1992
307 Consensus in the Presence of Partial Synchrony – Dwork, Lynch, et al. - 1988
298 The weakest failure detector for solving Consensus – Chandra, Hadzilacos, et al. - 1996
265 Fault-tolerant broadcasts and related problems – Hadzilacos, Toueg - 1993
212 Preserving and using context information in interprocess communication – Peterson, Buchholz, et al. - 1989
206 Atomic broadcast: From simple message diffusion to Byzantine agreement – Cristian, Aghili, et al. - 1985
194 On the minimal synchronism needed for distributed consensus – Dolev, Dwork, et al. - 1987
153 Another advantage of free choice: completely asynchronous agreement protocols (extended abstract) – Ben-Or - 1983
153 Using process groups to implement failure detection in asynchronous environments – Ricciardi, Birman - 1991
150 Delta-4: A Generic Architecture for Dependable Distributed Computing – Powell - 1991
132 Memory requirements for agreement among unreliable asynchronous processes – Loui, Abu-Amara - 1987
122 A modular approach to fault-tolerant broadcasts and related problems – Hadzilacos, Toueg - 1994
116 Asynchronous consensus and broadcast protocols – Bracha, Toueg - 1985
94 The Consensus Problem in Unreliable Distributed Systems (A Brief Survey) – Fischer - 1983
78 Reaching approximate agreement in the presence of faults – Dolev, Lynch, et al. - 1986
72 SIFT: Design and analysis of a fault-tolerant computer for aircraft control – Wensley
69 Revisiting the relationship between non-blocking atomic commitment and consensus – Guerraoui - 1995
59 The implementation of reliable distributed multiprocess systems – Lamport - 1978
58 Automatically increasing the fault-tolerance of distributed algorithms – Neiger, Toueg - 1990
47 Randomization in Byzantine Agreement – Chor, Dwork - 1989
45 Fault-tolerance in the advanced automation system – Cristian, Dancey, et al. - 1990
40 Bounds on the time to reach agreement in the presence of timing uncertainty – Attiya, Dwork, et al. - 1991
34 Achievable cases in an asynchronous environment – Attiya, Bar-Noy, et al. - 1987
34 Using failure detectors to solve consensus in asynchronous shared-memory systems – Lo, Hadzilacos - 1994
29 A combinatorial characterization of the distributed tasks that are solvable in the presence of one faulty processor – Biran, Moran, et al. - 1988
26 Towards Optimal Distributed Consensus – Berman, Garay, et al. - 1989
25 Election vs. consensus in asynchronous systems – Sabel, Marzullo - 1995
24 Cheating husbands and other stories: a case study of knowledge, action, and communication – Moses, Dolev, et al. - 1986
19 Fault-tolerant decision making in totally asynchronous distributed systems – Bridgland, Watro - 1987
18 Time and message efficient reliable broadcasts – Chandra, Toueg - 1990
17 Reliable scheduling in a TMR database system – Pittelli, Garcia-Molina - 1989
16 A new solution for the Byzantine generals problem – Reischuk - 1982
15 Impossibility of group membership in asynchronous systems – Chandra, Hadzilacos, et al. - 1995
12 Early-delivery atomic broadcast – Gopal, Strong, et al. - 1990
12 Failure detectors and the wait-free hierarchy – Neiger - 1995
9 The Amoeba Distributed operating system: Selected papers – Mullender - 1987
8 Issues in the design of highly available computing services – Cristian - 1987
5 Isis - A Distributed Programming Environment – Birman - 1990
3 Early-stopping distributed bidding and applications – Budhiraja, Gopal, et al. - 1990
1 E-mail correspondence. Showed that ◇W cannot be used to solve non-blocking atomic commit – Chandra, Larrea - 1994