Results 1 -
2 of
2
Understanding Fault-Tolerant Distributed Systems
- Communications of the ACM
, 1993
"... We propose a small number of basic concepts that can be used to explain the architecture of fault-tolerant distributed systems and we discuss a list of architectural issues that we find useful to consider when designing or examining such systems. For each issue we present known solutions and design ..."
Abstract
-
Cited by 296 (23 self)
- Add to MetaCart
We propose a small number of basic concepts that can be used to explain the architecture of fault-tolerant distributed systems and we discuss a list of architectural issues that we find useful to consider when designing or examining such systems. For each issue we present known solutions and design alternatives, we discuss their relative merits and we give examples of systems which adopt one approach or the other. The aim is to introduce some order in the complex discipline of designing and understanding fault-tolerant distributed systems. 1 Introduction Computing systems consist of a multitude of hardware and software components that are bound to fail eventually. In many systems, such component failures can lead to unanticipated, potentially disruptive failure behavior and to service unavailability. Some systems are designed to be fault-tolerant: they either exhibit a well-defined failure behavior when components fail or mask component failures to users, that is, continue to provid...
Structural Analysis of Explicit FaultTolerant Programs
- Proc of 8 th IEEE International Symposium on High-Assurance Systems engineering (HASE’04
, 2004
"... Explicit fault tolerant programs are characterized by proactive efforts to ensure robustness and ability of fault correction. A fault tolerant application is usually realized conforming to one of a collection of standard techniques. Graph based methods can be used to examine existing applications to ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
Explicit fault tolerant programs are characterized by proactive efforts to ensure robustness and ability of fault correction. A fault tolerant application is usually realized conforming to one of a collection of standard techniques. Graph based methods can be used to examine existing applications to derive a control flow abstraction with respect to the fault-tolerance architecture. This abstraction, which we call the fault tolerance behavioural type, can be used as basis of structural analysis of the implemented architecture. This paper outlines the basic ideas and demonstrates their application using CTL model checking to verify fault tolerance properties of explicit fault-tolerant programs. Key Words: fault tolerant programming, analysis of fault tolerance features, explicit fault tolerance, control flow abstraction, model checking 1.

