Closure and Convergence: A Foundation of Fault-Tolerant Computing
IEEE Transactions on Software Engineering, 1993. Cited by 103 (28 self)
We give a formal definition of what it means for a system to "tolerate" a class of "faults". The definition consists of two conditions: One, if a fault occurs when the system state is within a set of "legal" states, the resulting state is within some larger set and, if faults continue occurring, the system state remains within that larger set (Closure). And two, if faults stop occurring, the system eventually reaches a state within the legal set (Convergence). We demonstrate the applicability of our definition for specifying and verifying the fault-tolerance properties of a variety of digital and computer systems. Further, using the definition, we obtain a simple classification of fault-tolerant systems and discuss methods for their systematic design.
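The two conditions in this abstract can be made concrete on a small finite system. The following is a minimal sketch, not the paper's formalism: it assumes a deterministic program, a fault modelled as a function from a state to a set of possible corrupted states, and a hypothetical toy counter (`drain`, `corrupt`, `LEGAL`, `SPAN` are illustrative names, not from the source).

```python
from typing import Callable, List, Set

State = int
Action = Callable[[State], Set[State]]  # maps a state to its possible successors


def is_closed(states: Set[State], actions: List[Action]) -> bool:
    """Return True if no action can move a state in `states` outside of it."""
    return all(nxt in states for s in states for act in actions for nxt in act(s))


def converges(span: Set[State], legal: Set[State],
              program: Callable[[State], State]) -> bool:
    """Return True if every fault-free run starting in `span` reaches `legal`.

    Assumes a deterministic program, so a run can only fail to converge
    by revisiting a state outside `legal`.
    """
    for start in span:
        state, seen = start, set()
        while state not in legal:
            if state in seen:  # cycle that never enters the legal set
                return False
            seen.add(state)
            state = program(state)
    return True


def tolerates(legal: Set[State], span: Set[State],
              program: Callable[[State], State], fault: Action) -> bool:
    """Check Closure (of both sets) and Convergence for a toy system."""
    program_action: Action = lambda s: {program(s)}
    closure = (is_closed(legal, [program_action])
               and is_closed(span, [program_action, fault]))
    # Closure is checked first, so `converges` only runs inside a closed span.
    return closure and legal <= span and converges(span, legal, program)


# Hypothetical toy system: a counter that the program drains toward 0.
LEGAL = {0}              # the "legal" states
SPAN = set(range(6))     # the larger set that faults may reach


def drain(x: State) -> State:
    return max(x - 1, 0)


def corrupt(x: State) -> Set[State]:
    return set(range(6))  # a fault may overwrite the counter with any value 0..5


print(tolerates(LEGAL, SPAN, drain, corrupt))        # True
print(tolerates(LEGAL, SPAN, lambda x: x, corrupt))  # False: never converges
```

With the identity program the system still satisfies Closure, but a corrupted counter never returns to the legal set, so Convergence fails.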
A Comparison of Bus Architectures for Safety-Critical Embedded Systems
2001. Cited by 78 (4 self)
Embedded systems for safety-critical applications often integrate multiple “functions” and must generally be fault-tolerant. These requirements lead to a need for mechanisms and services that provide protection against fault propagation and ease the construction of distributed fault-tolerant applications. A number of bus architectures have been developed to satisfy this need. This paper reviews the requirements on these architectures, the mechanisms employed, and the services provided. Four representative architectures (SAFEbus™, SPIDER, TTA, and FlexRay) are briefly described.
An Overview of Formal Verification for the Time-Triggered Architecture
2002. Cited by 22 (3 self)
We describe formal verification of some of the key algorithms in the Time-Triggered Architecture (TTA) for real-time safety-critical control applications.
Maximizable routing metrics
In Proc. IEEE ICNP, 1998. Cited by 16 (2 self)
We present a simple theory for maximizable routing metrics. First, we give a formal definition of routing metrics and identify two important properties: boundedness and monotonicity. We show that these two properties are both necessary and sufficient for a routing metric to be maximizable in any network. We show how to combine two (or more) routing metrics into a single composite metric such that if the original metrics are both bounded and monotonic (and, hence, maximizable), then the composite metric is also bounded and monotonic (and, hence, maximizable). We present several applications of our theory. We show that the composite routing metric used in the Inter-Gateway Routing Protocol (IGRP) is not maximizable and we show that Enhanced IGRP (EIGRP) does not behave as expected for nonmonotonic metrics. We also show that a technique for scalable link-state routing does not work correctly when applied to composite metrics. A common theme throughout our paper is that the intuitions generated by using distance metrics to produce shortest paths do not carry over to other routing metrics. Index Terms—Communication system routing, communication system signaling, computer networks, distance vector, distributed
Agreement On A Common X-Y Coordinate System By A Group Of Mobile Robots
In Proc. Dagstuhl Seminar on Modeling and Planning for Sensor-Based Intelligent Robots, Dagstuhl, 1996. Cited by 12 (0 self)
In this paper we discuss the agreement problem on a common ...
Self-Stabilizing Depth-First Token Passing on Rooted Networks
1997. Cited by 10 (2 self)
We present a deterministic distributed depth-first token passing protocol on a rooted network.
Fault-containing Self-stabilizing Distributed Protocols
Distributed Computing, 2000. Cited by 9 (2 self)
Self-stabilization is an elegant approach for designing a class of fault-tolerant distributed protocols. A self-stabilizing protocol is guaranteed to eventually converge to a legitimate state after a transient fault. However, even a minor transient fault can cause vast disruption in the system before legitimacy is reached. This paper introduces the notion of fault-containment to address this particular weakness of self-stabilizing systems. Informally, a fault-containing self-stabilizing protocol, in addition to providing self-stabilization, contains the effects of faults. This ensures that disruption during recovery from faults is proportional to the extent of the faults. The paper begins with a formal framework for specifying and evaluating fault-containing self-stabilizing protocols. The main result of the paper is a transformer that converts any non-reactive self-stabilizing protocol into an equivalent fault-containing self-stabilizing protocol that can repair any single fault in t...
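To illustrate what "eventually converge to a legitimate state after a transient fault" means, here is a minimal sketch of a well-known self-stabilizing protocol, Dijkstra's K-state token ring. It is swapped in purely as an illustration: it is not the transformer described in this abstract and it provides no fault containment. The function names, the ring size, the choice K = n + 1, and the random scheduler are assumptions made for this example.

```python
import random
from typing import List


def privileged(x: List[int], i: int) -> bool:
    """Machine 0 is privileged when it equals its left neighbour (the last
    machine); every other machine is privileged when it differs from its
    left neighbour."""
    if i == 0:
        return x[0] == x[-1]
    return x[i] != x[i - 1]


def legitimate(x: List[int]) -> bool:
    """A state is legitimate when exactly one machine holds the privilege."""
    return sum(privileged(x, i) for i in range(len(x))) == 1


def step(x: List[int], k: int, rng: random.Random) -> None:
    """Let one privileged machine, picked by the scheduler `rng`, make its move."""
    # At least one machine is always privileged: if no machine i > 0 differs
    # from its left neighbour, all values are equal and machine 0 is privileged.
    enabled = [i for i in range(len(x)) if privileged(x, i)]
    i = rng.choice(enabled)
    if i == 0:
        x[0] = (x[0] + 1) % k
    else:
        x[i] = x[i - 1]


def stabilize(x: List[int], k: int, rng: random.Random,
              max_steps: int = 10_000) -> int:
    """Run until the state is legitimate; return the number of moves taken."""
    steps = 0
    while not legitimate(x) and steps < max_steps:
        step(x, k, rng)
        steps += 1
    return steps


rng = random.Random(0)
n = 5
k = n + 1                                  # assumption: K larger than the ring size
x = [rng.randrange(k) for _ in range(n)]   # arbitrary state, e.g. after a transient fault
moves = stabilize(x, k, rng)
print(legitimate(x), moves)
```

Counting `moves` after corrupting a single variable of a legitimate state gives a rough feel for the disruption that fault containment aims to bound.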
Stabilization of Maximal Metric Trees
Workshop on Self-Stabilizing Systems ’99, 1999. Cited by 9 (0 self)
We present a formal definition of routing metrics and provide the necessary and sufficient conditions for a routing metric to be optimizable along a tree. Based upon these conditions we present a generalization of the shortest path tree which we call the "maximal metric tree". We present a stabilizing protocol for constructing maximal metric trees. Our protocol demonstrates that the distance-vector routing paradigm may be extended to any metric that is optimizable along a tree and in a self-stabilizing manner. Examples of maximal metric trees include shortest path trees (distance-vector), depth first search trees, maximum flow trees, and reliability trees. 1. Introduction A number of papers have addressed stabilizing spanning tree construction, and self-stabilizing shortest path tree protocols may be found in [DIM93, AKY90, AKM93, AG94]. Although not always explicit about this, most of the stabilizing tree protocols in the literature are based upon a distance-vector approach. In the di...
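As a rough illustration of extending the distance-vector idea to a metric other than distance, the sketch below builds a maximum flow (bottleneck capacity) tree by repeated neighbour relaxation. It is not the paper's stabilizing protocol: it assumes a clean initial state and a centralized loop, and the graph, node names, and `max_flow_tree` function are hypothetical.

```python
from typing import Dict, Optional, Tuple

Graph = Dict[str, Dict[str, float]]  # undirected: graph[u][v] == graph[v][u] == capacity


def max_flow_tree(graph: Graph, root: str) -> Tuple[Dict[str, float], Dict[str, Optional[str]]]:
    """Distance-vector style relaxation with the flow metric.

    A path's metric is the minimum edge capacity along it, and each node
    keeps the neighbour that offers the largest such value toward the root.
    """
    metric: Dict[str, float] = {v: 0.0 for v in graph}  # 0 is the worst metric
    metric[root] = float("inf")                          # the root has the best metric
    parent: Dict[str, Optional[str]] = {v: None for v in graph}

    changed = True
    while changed:  # metrics only increase over a finite value set, so this terminates
        changed = False
        for v in graph:
            if v == root:
                continue
            for u, capacity in graph[v].items():
                candidate = min(metric[u], capacity)  # extend u's path by edge u-v
                if candidate > metric[v]:
                    metric[v], parent[v] = candidate, u
                    changed = True
    return metric, parent


# Hypothetical network rooted at "r".
network: Graph = {
    "r": {"a": 3, "b": 10},
    "a": {"r": 3, "b": 8, "c": 4},
    "b": {"r": 10, "a": 8, "c": 5},
    "c": {"a": 4, "b": 5},
}
metric, parent = max_flow_tree(network, "r")
print(metric)  # {'r': inf, 'a': 8, 'b': 10, 'c': 5}
print(parent)  # {'r': None, 'a': 'b', 'b': 'r', 'c': 'b'}
```

Node `a` prefers the two-hop path through `b` (bottleneck 8) over its direct edge to the root (capacity 3), which a shortest-path tree would not do. Starting this loop from arbitrary metric values can leave inflated values circulating on a cycle, which is one reason a stabilizing protocol needs more than this naive relaxation.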
Self-Stabilizing Mutual Exclusion in the Presence of Faulty Nodes
1995. Cited by 7 (0 self)
This paper presents the RatchetFT distributed fault-tolerant mutual exclusion algorithm for processor rings. RatchetFT is self-stabilizing, in that if mutual exclusion is lost due to any sequence of on-line failures and repairs of processors, mutual exclusion will eventually be regained. This research demonstrates that self-stabilization can be achieved in the presence of faulty processors, provided that these faulty processors always appear to behave incorrectly. Self-stabilization is achievable even if faulty processor behavior is not restricted to transient failures or other simple failure models. The key results of the paper include the specification of RatchetFT and a detailed sketch of its correctness proof. 1 Introduction Self-stabilizing systems have primarily been studied for fault-free environments [8,10,5,14] in which, given any arbitrary initial state, a self-stabilizing system will eventually reach, and remain in, a legitimate state. Transient failures affecting either sy...

