Distributed Fault Tolerance - Lessons Learnt from Delta-4 (1994)

by David Powell

Abstract:

Software-implemented approaches to fault tolerance are very resilient to change since evolution in hardware technology does not require extensive re-design of specialized hardware. This paper argues the case for implementing fault tolerance in a distributed fashion and reports the approach adopted in the European Delta-4 project. Fault tolerance is achieved by replicating capsules (the run-time representation of application objects) on distributed nodes interconnected by a local area network. Capsule groups can be configured to tolerate either stopping failures or arbitrary failures. Multipoint protocols are used for coordinating capsule groups and for error processing and fault treatment. The paper concludes with a critical analysis of the project's results.

1. Introduction

Many, if not most, modern computing systems are distributed systems. Distribution is often motivated by organizational reasons (e.g., sharing of data in integrated information systems) or physical constraints (e.g...
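The abstract's distinction between capsule groups that tolerate stopping failures and groups that tolerate arbitrary failures can be illustrated with a minimal sketch. It is not the Delta-4 protocol: the function names, the use of `None` for a silent replica, and the `max_faulty` parameter are all assumptions made for illustration, and the sketch assumes correct replicas are deterministic and receive the same inputs.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def resolve_fail_silent(replies: Sequence[Optional[Hashable]]) -> Hashable:
    """Stopping failures: a replica either answers correctly or stays silent.

    Any reply that arrives can be trusted, so the first one is returned.
    `None` is a hypothetical marker for a replica that has stopped.
    """
    for reply in replies:
        if reply is not None:
            return reply
    raise RuntimeError("all replicas have stopped")


def resolve_arbitrary(replies: Sequence[Optional[Hashable]], max_faulty: int) -> Hashable:
    """Arbitrary failures: a faulty replica may return a wrong value.

    A value is accepted only if at least max_faulty + 1 replicas report it,
    which guarantees that at least one correct replica stands behind it.
    """
    counts = Counter(r for r in replies if r is not None)
    if counts:
        value, votes = counts.most_common(1)[0]
        if votes >= max_faulty + 1:
            return value
    raise RuntimeError("no value is backed by enough replicas")
```

Under these assumptions, one surviving replica is enough in the stopping-failure case, whereas masking `max_faulty` arbitrary failures by voting needs at least `2 * max_faulty + 1` replicas so that the correct replicas alone can reach the threshold.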

Citations

589 Implementing fault-tolerant services using the state machine approach: a tutorial – Schneider - 1990
206 Atomic broadcast: From simple message diffusion to Byzantine agreement – Cristian, Aghili, et al. - 1985
136 Why do computers stop and what can be done about it – Gray - 1985
108 Distributed Systems – Mullender - 1993
103 Failure mode assumptions and assumption coverage – Powell - 1992
51 Amp: A highly parallel atomic multicast protocol – Verissimo, Rodrigues, et al. - 1989
48 Exploiting Replication in Distributed Systems – Birman, Joseph - 1989
45 Fault-tolerance in the advanced automation system – Cristian, Dancey, et al. - 1990
44 Reliable multicast between micro-kernels – Renesse, Birman, et al.
34 The DELTA-4 extra performance architecture (XPA) – Barrett, Hilborne, et al. - 1990
28 Replicated procedure call – Cooper - 1984
22 Delta-4: A Generic Architecture for Dependable – Powell - 1991
21 Experimental evaluation of the fault tolerance of an atomic multicast system – Arlat, Aguera, et al. - 1990
15 Dependability: Basic Concepts and Terminology, Dependable Computing and Fault-Tolerance – Laprie - 1992
14 Active replication in delta-4 – Chérèque, Powell, et al. - 1992
13 Using Passive Replicates in Delta-4 to provide Dependable Distributed Computing – Speirs, Barrett - 1989
9 A Theoretician's View of Fault Tolerant Distributed Computing – Fischer - 1990
8 Formal Specification and Mechanical Verification of SIFT: A Fault-Tolerant Flight Control System – Melliar-Smith, Schwartz - 1982
3 Dependability Evaluation of Bus and Ring Communication Topologies for the Delta-4 Distributed Fault-Tolerant Architecture – Kanoun, Powell - 1991
2 Dependability Testing Report – Arlat, Crouzet, et al. - 1991