Distributed Reset
IEEE Transactions on Computers, 1990
Cited by 137 (23 self)
We design a reset subsystem that can be embedded in an arbitrary distributed system in order to allow the system processes to reset the system when necessary. Our design is layered, and comprises three main components: a leader election, a spanning tree construction, and a diffusing computation. Each of these components is self-stabilizing in the following sense. If the coordination between the up processes in the system is ever lost (due to failures or repairs of processes and channels) then each component eventually reaches a state where coordination is regained. This capability makes our reset subsystem very robust: it can tolerate fail-stop failures and repairs of processes and channels even when a reset is in progress.
Categories and Subject Descriptors: C.2.4 [Computer Communication Systems]: Distributed Systems -- distributed applications, network operating systems; D.1.3 [Programming Techniques]: Concurrent Programming; D.4.5 [Operating Systems]: Reliability -- verification, fa...
Self-Stabilization by Local Checking and Correction
1997
Cited by 113 (29 self)
This paper appeared in the Proceedings of the 32nd IEEE Foundations of Computer Science (FOCS) Conference, 1991.
Closure and Convergence: A Foundation of Fault-Tolerant Computing
IEEE Transactions on Software Engineering, 1993
Cited by 103 (28 self)
We give a formal definition of what it means for a system to "tolerate" a class of "faults". The definition consists of two conditions: One, if a fault occurs when the system state is within a set of "legal" states, the resulting state is within some larger set and, if faults continue occurring, the system state remains within that larger set (Closure). And two, if faults stop occurring, the system eventually reaches a state within the legal set (Convergence). We demonstrate the applicability of our definition for specifying and verifying the fault-tolerance properties of a variety of digital and computer systems. Further, using the definition, we obtain a simple classification of fault-tolerant systems and discuss methods for their systematic design. as traditionally been studied in the context of specifi...
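The two conditions in this abstract can be checked mechanically on a small finite-state model. The sketch below is only an illustration under stated assumptions: the states, the `program` and `fault` transition relations, and the helper names `reachable`, `is_closed`, and `converges` are all hypothetical and do not come from the paper. It computes the larger set as everything reachable from the legal states under program and fault steps, then checks that fault-free executions from that set must reach the legal set.

```python
from typing import Dict, Set

State = int
Transitions = Dict[State, Set[State]]  # state -> set of possible next states


def reachable(start: Set[State], *relations: Transitions) -> Set[State]:
    """All states reachable from `start` using any of the given relations."""
    seen = set(start)
    frontier = list(start)
    while frontier:
        s = frontier.pop()
        for rel in relations:
            for t in rel.get(s, set()):
                if t not in seen:
                    seen.add(t)
                    frontier.append(t)
    return seen


def is_closed(states: Set[State], *relations: Transitions) -> bool:
    """Closure: no transition in the given relations leaves `states`."""
    return all(
        t in states
        for rel in relations
        for s in states
        for t in rel.get(s, set())
    )


def converges(span: Set[State], legal: Set[State], program: Transitions) -> bool:
    """Convergence: every fault-free execution from `span` must reach `legal`.

    Grows the set of states whose program successors all lead to `legal`.
    A state with no program successor outside `legal` counts as stuck.
    """
    good = set(legal)
    changed = True
    while changed:
        changed = False
        for s in span - good:
            succ = program.get(s, set())
            if succ and succ <= good:
                good.add(s)
                changed = True
    return span <= good


# Hypothetical toy system: the program counts down to 0, a fault bumps 0 to 3.
program: Transitions = {0: {0}, 1: {0}, 2: {1}, 3: {2}, 4: {3}}
fault: Transitions = {0: {3}}
legal = {0}

span = reachable(legal, program, fault)   # the "larger set" of the definition
closure_ok = is_closed(span, program, fault)
convergence_ok = converges(span, legal, program)
```

In this toy system the larger set is {0, 1, 2, 3}: state 4 is never entered because the assumed fault only moves 0 to 3. A program that cycles between two illegal states would fail the convergence check.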
SuperStabilizing Protocols for Dynamic Distributed Systems
Chicago Journal of Theoretical Computer Science, 1995
Cited by 75 (13 self)
Two aspects of reliability of distributed protocols are a protocol's ability to recover from transient faults and a protocol's ability to function in a dynamic environment. Approaches for both of these aspects have been separately developed, but have drawbacks when applied to an environment that has both transient faults and dynamic changes. This paper introduces definitions and methods for addressing both concerns in the design of systems. A protocol is superstabilizing if it is (i) self-stabilizing, meaning that it is guaranteed to respond to an arbitrary transient fault by eventually satisfying and maintaining a legitimacy predicate, and (ii) it is guaranteed to satisfy a passage predicate at all times when the system undergoes topology changes starting from a legitimate state. The passage predicate is typically a safety property that should hold while the protocol makes progress towards re-establishing legitimacy following a topology change. Specific contributions of the paper inc...
Resource bounds for self stabilizing message driven protocols
Proc. of the Tenth Annual ACM Symposium on Principles of Distributed Computing, 1991
Cited by 34 (10 self)
Self-stabilizing message-driven protocols are defined and discussed. The class weak-exclusion, which contains many natural tasks such as ℓ-exclusion and token-passing, is defined, and it is shown that in any execution of any self-stabilizing protocol for a task in this class, the configuration size must grow at least at a logarithmic rate. This lower bound is valid even if the system is supported by a time-out mechanism that prevents communication deadlocks. We then present three self-stabilizing message-driven protocols for token-passing. The rate of growth of configuration size for all three protocols matches this lower bound. Our protocols are presented for two-processor systems but can be easily adapted to rings of arbitrary size. Our results have an interesting interpretation in terms of automata theory.
Stabilization-preserving atomicity refinement
Distributed Computing, 13th International Symposium (DISC '99), 1999
Cited by 27 (4 self)
Program refinements from an abstract to a concrete model empower designers to reason effectively in the abstract and architects to implement effectively in the concrete. For refinements to be useful, they must not only preserve functionality properties but also dependability properties. In this paper, we focus our attention on refinements that preserve the property of stabilization. We distinguish between two types of stabilization-preserving refinements -- atomicity refinement and semantics refinement -- and study the former. Specifically, we present a stabilization-preserving atomicity refinement from a model where a process can atomically access the state of all its neighbors and update its own state, to a model where a process can only atomically access the state of any one of its neighbors or atomically update its own state. (Of course, correctness properties, including termination and fairness, are also preserved.) Our refinement is based on a low-atomicity, bounded-space, stabilizing solution to the dining philosophers problem. It is readily extended to: (a) solve stabilization-preserving semantics refinement, (b) solve the drinking philosophers problem, and (c) allow further refinement into a message-passing model.
Self-Stabilizing End-to-End Communication
1996
Cited by 27 (2 self)
Self-stabilizing protocols must begin operating correctly even when started from an arbitrary state. The end-to-end problem is to ensure reliable message delivery across an unreliable network under the weakest possible guarantee from the network -- that the sender and receiver are never separated by a cut of permanently failed links. In this paper we present the first self-stabilizing end-to-end protocol. Our solution has message complexities comparable with the best known non-stabilizing solutions. Our solution also has good stabilization time complexity: the time for the protocol to stabilize has the same complexity as the time the protocol takes to deliver a message.
1 Introduction
Informally, a protocol is self-stabilizing if when started from an arbitrary global state it exhibits "correct" behavior after finite time. While typical protocols are designed to cope with a specified set of failure modes (e.g., message loss, link failures), a self-stabilizing protocol essentially copes...
Constraint Satisfaction as a Basis for Designing Nonmasking Fault-Tolerance
1996
Cited by 23 (9 self)
We present a method for the design of nonmasking fault-tolerant programs. In our method, a set of constraints is associated with each program. As long as faults do not occur, the constraints are continually satisfied under the execution of program actions. Whenever some of the constraints are violated, due to certain faults, all constraints are eventually reestablished by subsequent execution of the program actions. To design programs thus, two types of program actions are distinguished: "closure" actions and "convergence" actions. Closure actions are the actions that perform the intended computation of the program when all of the constraints are satisfied. Convergence actions are the actions that reestablish the constraints when they have been violated. Sufficient conditions for the validation of closure and convergence actions are formalized in terms of a "constraint graph". These conditions are illustrated by designing nonmasking fault-tolerant programs for diffusing computations, ...
The Local Detection Paradigm and its Applications to Self-Stabilization
Cited by 19 (8 self)
A new paradigm for the design of self-stabilizing distributed algorithms, called local detection, is introduced. The essence of the paradigm is in defining a local condition based on the state of a processor and its immediate neighborhood, such that the system is in a globally legal state if and only if the local condition is satisfied at all the nodes. In this work we also extend the model of self-stabilizing networks, which traditionally assumes memory failures, to include the model of dynamic networks (assuming edge failures and recoveries). We apply the paradigm to the extended model, which we call "dynamic self-stabilizing networks." Without loss of generality, we present the results in the least restrictive shared memory model of read/write atomicity, to which end we construct basic information transfer primitives. Using local detection, we develop deterministic and randomized self-stabilizing algorithms that maintain a rooted spanning tree in a general network whose topology changes dynamically. The deterministic algorithm assumes unique identities while the randomized one assumes an anonymous network. The algorithms use a constant number of memory words per edge in each node, and the size of both memory words and messages is the number of bits necessary to represent a node identity (typically O(log n) bits, where n is the size of the network). These algorithms provide for the easy construction of self-stabilizing protocols for numerous tasks: reset, routing, topology-update, and self-stabilization transformers that automatically self-stabilize existing protocols for which local detection conditions can be defined.
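The idea of a local condition that holds at every node exactly when the global state is legal can be illustrated with a rooted spanning tree. The sketch below is not the paper's algorithm. It assumes a known distinguished root and per-node `parent` and `dist` variables, all of which are hypothetical names chosen for illustration. Each node inspects only its own state and that of its claimed parent.

```python
from typing import Dict, List, Optional

Node = int


def locally_legal(
    v: Node,
    root: Node,
    neighbors: Dict[Node, List[Node]],
    parent: Dict[Node, Optional[Node]],
    dist: Dict[Node, int],
) -> bool:
    """Local condition at node v, using only v and its claimed parent."""
    if v == root:
        # The root has no parent and sits at depth 0.
        return parent[v] is None and dist[v] == 0
    p = parent[v]
    # A non-root node must point at a real neighbor exactly one level above it.
    return p is not None and p in neighbors[v] and dist[v] == dist[p] + 1


def globally_legal(
    root: Node,
    neighbors: Dict[Node, List[Node]],
    parent: Dict[Node, Optional[Node]],
    dist: Dict[Node, int],
) -> bool:
    """The state is a rooted spanning tree iff every node passes its local check."""
    return all(locally_legal(v, root, neighbors, parent, dist) for v in neighbors)


# Hypothetical 3-node path 0 - 1 - 2 with root 0.
neighbors = {0: [1], 1: [0, 2], 2: [1]}

good_parent = {0: None, 1: 0, 2: 1}
good_dist = {0: 0, 1: 1, 2: 2}

# Corrupted state: nodes 1 and 2 point at each other, forming a cycle.
bad_parent = {0: None, 1: 2, 2: 1}
bad_dist = {0: 0, 1: 5, 2: 6}

tree_ok = globally_legal(0, neighbors, good_parent, good_dist)
cycle_ok = globally_legal(0, neighbors, bad_parent, bad_dist)
violators = [v for v in neighbors
             if not locally_legal(v, 0, neighbors, bad_parent, bad_dist)]
```

Because `dist` must drop by exactly one along every parent pointer, no cycle of parent pointers can satisfy the condition at all of its nodes, so at least one node on any cycle detects the fault locally.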
Self-Stabilizing Depth-First Token Circulation In Arbitrary Rooted Networks
Distributed Computing, 1998
Cited by 19 (6 self)
We present a deterministic distributed depth-first token passing protocol on a rooted network. This protocol uses neither the processor identifiers nor the size of the network, but assumes the existence of a distinguished processor, called the root of the network. The protocol is self-stabilizing, meaning that starting from an arbitrary state (in response to an arbitrary perturbation modifying the memory state), it is guaranteed to reach a state with no more than one token in the network. Our protocol implements a strictly fair token circulation scheme. The proposed protocol has an extremely small state requirement: only 3(Δ + 1) states per processor, i.e., O(log Δ) bits per processor, where Δ is the degree of the network. The protocol can be used to implement a strictly fair distributed mutual exclusion in any rooted network. This protocol can also be used to construct a DFS spanning tree.
Keywords: Distributed mutual exclusion, self-stabilization, spanning tree, token passing.

