Distributed Reset
- IEEE Transactions on Computers
, 1990
Cited by 137 (23 self)
We design a reset subsystem that can be embedded in an arbitrary distributed system in order to allow the system processes to reset the system when necessary. Our design is layered, and comprises three main components: a leader election, a spanning tree construction, and a diffusing computation. Each of these components is self-stabilizing in the following sense. If the coordination between the up processes in the system is ever lost (due to failures or repairs of processes and channels) then each component eventually reaches a state where coordination is regained. This capability makes our reset subsystem very robust: it can tolerate fail-stop failures and repairs of processes and channels even when a reset is in progress.
Categories and Subject Descriptors: C.2.4 [Computer Communication Systems]: Distributed Systems--distributed applications, network operating systems; D.1.3 [Programming Techniques]: Concurrent Programming; D.4.5 [Operating Systems]: Reliability--verification, fa...
Self-Stabilization by Local Checking and Correction
, 1997
Cited by 113 (29 self)
This paper appeared in the 32nd Proceedings of the IEEE Foundations of Computer Science (FOCS) Conference, 1991.
Closure and Convergence: A Foundation of Fault-Tolerant Computing
- IEEE Transactions on Software Engineering
, 1993
Cited by 103 (28 self)
We give a formal definition of what it means for a system to "tolerate" a class of "faults". The definition consists of two conditions. One, if a fault occurs when the system state is within a set of "legal" states, the resulting state is within some larger set and, if faults continue occurring, the system state remains within that larger set (Closure). Two, if faults stop occurring, the system eventually reaches a state within the legal set (Convergence). We demonstrate the applicability of our definition for specifying and verifying the fault-tolerance properties of a variety of digital and computer systems. Further, using the definition, we obtain a simple classification of fault-tolerant systems and discuss methods for their systematic design.
... as traditionally been studied in the context of specifi...
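The two conditions can be checked mechanically on a small finite-state model. The following is a minimal sketch, not the paper's formalism: the function names (`is_closed`, `converges`, `tolerates`), the representation of program and fault actions as successor functions, and the toy countdown system are all assumptions made for illustration. Convergence is checked without any fairness assumption, which is stricter than many treatments require.

```python
def is_closed(region, *steps):
    """True if none of the given transition relations can leave `region`."""
    return all(nxt in region
               for state in region
               for step in steps
               for nxt in step(state))


def converges(span, legal, program):
    """True if every fault-free run that starts in `span` reaches `legal`.

    Computes the least set containing `legal` such that a state is added
    once it has at least one successor and all its successors are in the set.
    """
    reaching = set(legal)
    changed = True
    while changed:
        changed = False
        for state in span - reaching:
            successors = list(program(state))
            if successors and all(n in reaching for n in successors):
                reaching.add(state)
                changed = True
    return span <= reaching


def tolerates(legal, span, program, fault):
    """Closure of both sets plus convergence from the larger set."""
    return (legal <= span
            and is_closed(legal, program)        # legal states stay legal
            and is_closed(span, program, fault)  # faults cannot leave the span
            and converges(span, legal, program))


# Hypothetical toy system: a counter that the program drives down to 0,
# and that a fault may bump up by one while it is below 3.
def program(s):
    return [max(s - 1, 0)]


def fault(s):
    return [s + 1] if s < 3 else []
```

In this toy model, `tolerates({0}, {0, 1, 2, 3}, program, fault)` holds, while choosing the smaller set `{0, 1, 2}` as the fault span fails the closure check because a fault can move state 2 to state 3.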
SuperStabilizing Protocols for Dynamic Distributed Systems
- Chicago Journal of Theoretical Computer Science
, 1995
Cited by 75 (13 self)
Two aspects of reliability of distributed protocols are a protocol's ability to recover from transient faults and a protocol's ability to function in a dynamic environment. Approaches for both of these aspects have been separately developed, but have drawbacks when applied to an environment that has both transient faults and dynamic changes. This paper introduces definitions and methods for addressing both concerns in the design of systems. A protocol is superstabilizing if (i) it is self-stabilizing, meaning that it is guaranteed to respond to an arbitrary transient fault by eventually satisfying and maintaining a legitimacy predicate, and (ii) it is guaranteed to satisfy a passage predicate at all times when the system undergoes topology changes starting from a legitimate state. The passage predicate is typically a safety property that should hold while the protocol makes progress towards re-establishing legitimacy following a topology change. Specific contributions of the paper inc...
Self-Stabilization Over Unreliable Communication Media
- Distributed Computing
, 1993
Cited by 38 (4 self)
A self-stabilizing system has the property that it will converge to a desirable state when started from any state. Most previous researchers assumed that processes in self-stabilizing systems may communicate through shared variables, while those who studied message-passing systems allowed messages of unbounded size. This paper discusses the development of self-stabilizing systems which communicate through message passing, and in which messages may be lost in transit. The systems presented all use fixed-size message headers. First, a self-stabilizing version of the Alternating Bit Protocol, a fundamental communication protocol for transmitting data across an unreliable communication medium, is presented. Secondly, the alternating-bit protocol is used to construct a self-stabilizing token ring.
1 Introduction
Since the development of the first self-stabilizing systems by Dijkstra in the early 1970s [Dij73, Dij82] most researchers have considered systems in which the processes communi...
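For orientation, the shared-variable setting that this abstract contrasts itself with can be illustrated by Dijkstra's classic K-state token ring. The sketch below is that classic algorithm under a central daemon, not the message-passing protocol of the paper above; the function names and the random scheduler are assumptions made for illustration. Machine 0 holds the token when its value equals its predecessor's, and every other machine holds it when its value differs from its predecessor's.

```python
import random


def privileged(x):
    """Indices of machines that currently hold a privilege (token)."""
    n = len(x)
    holders = []
    if x[0] == x[n - 1]:          # machine 0 compares with the last machine
        holders.append(0)
    holders.extend(i for i in range(1, n) if x[i] != x[i - 1])
    return holders


def step(x, k, rng):
    """Central daemon: let one privileged machine make its move."""
    i = rng.choice(privileged(x))  # at least one machine is always privileged
    if i == 0:
        x[0] = (x[0] + 1) % k      # machine 0 advances its counter modulo k
    else:
        x[i] = x[i - 1]            # other machines copy their predecessor


def run(x, k, steps, seed=0):
    """Run `steps` daemon moves on state `x` (mutated in place) with k >= len(x)."""
    rng = random.Random(seed)
    for _ in range(steps):
        step(x, k, rng)
    return x
```

Starting from any assignment of values in `range(k)`, with `k` at least the number of machines, repeated calls to `step` are expected to leave exactly one privileged machine, after which the single token circulates around the ring.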
Self-Stabilization by Local Checking and Global Reset (Extended Abstract)
, 1994
Cited by 37 (12 self)
Baruch Awerbuch (1, 2), Boaz Patt-Shamir (2), George Varghese (3) and Shlomi Dolev (4, 5)
1 Dept. of Computer Science, Johns Hopkins University
2 Lab. for Computer Science, MIT
3 Dept. of Computer Science, Washington University
4 Dept. of Computer Science, Texas A&M University
5 School of Computer Science, Carleton University
Abstract. We describe a method for transforming asynchronous network protocols into protocols that can sustain any transient fault, i.e., become self-stabilizing. We combine the known notion of local checking with a new notion of internal reset, and prove that given any self-stabilizing internal reset protocol, any locally-checkable protocol can be made self-stabilizing. Our proof is constructive in the sense that we provide explicit code. The method applies to many practical network problems, including spanning tree construction, topology update, and virtual circuit setup.
1 Introduction
A network protocol is called self-stabilizing (or stabilizing for sho...
Time-Adaptive Self Stabilization
, 1997
Cited by 36 (6 self)
We study the scenario where a transient fault hits f of the n nodes of a distributed system by corrupting their state. We consider the basic persistent bit problem, where the system is required to maintain a 0/1 value in the face of transient failures by means of replication. We give an algorithm to recover the value quickly: the value of the bit is recovered at all nodes in O(f) time units for an unknown f < n/2. Moreover, complete state quiescence occurs in O(diam) time units, where diam denotes the actual diameter of the network. This means that the value persists indefinitely so long as any f < n/2 faults are followed by Ω(diam) fault-free time units. We prove matching lower bounds on both the output stabilization time and the state quiescence time. Using our persistent bit algorithm, we present a general transformer which takes a distributed non-reactive non-stabilizing protocol P, and produces a self-stabilizing protocol P' which solves the problem P solv...
Self-stabilization by Counter Flushing
- In PODC '94: Proceedings of the Thirteenth Annual ACM Symposium on Principles of Distributed Computing
, 1994
Cited by 36 (5 self)
A useful way to design simple and robust protocols is to make them self-stabilizing. A protocol
Resource bounds for self stabilizing message driven protocols
- Proc. of the Tenth Annual ACM Symposium on Principles of Distributed Computing
, 1991
Cited by 34 (10 self)
Self-stabilizing message-driven protocols are defined and discussed. The class weak-exclusion, which contains many natural tasks such as ℓ-exclusion and token-passing, is defined, and it is shown that in any execution of any self-stabilizing protocol for a task in this class, the configuration size must grow at least at a logarithmic rate. This lower bound is valid even if the system is supported by a time-out mechanism that prevents communication deadlocks. Then we present three self-stabilizing message-driven protocols for token-passing. The rate of growth of configuration size for all three protocols matches the aforementioned lower bound. Our protocols are presented for two-processor systems but can be easily adapted to rings of arbitrary size. Our results have an interesting interpretation in terms of automata theory.
Local Stabilizer
- In Proceedings of the 5th Israel Symposium on Theory of Computing and Systems
, 1997
Cited by 33 (1 self)
A local stabilizer protocol that takes any on-line or off-line distributed algorithm and converts it into a synchronous self-stabilizing algorithm with local monitoring and repairing properties is presented. Whenever the self-stabilizing version enters an inconsistent state, the inconsistency is detected in O(1) time, and the system state is repaired in a local manner. The expected computation time that is lost during the repair process is proportional to the largest diameter of a faulty region.
An extended abstract of this paper appeared in the Proc. of the 5th Israeli Symposium on Theory of Computing and Systems, June 1997, and a brief announcement in Proc. of the 16th Annual ACM Symp. on Principles of Distributed Computing, August 1997.
† Computer Science Department, Tel-Aviv University, Tel-Aviv, 69978, Israel. Email: afek@math.tau.ac.il.
‡ Department of Mathematics and Computer Science, Ben-Gurion University, Beer-Sheva, 84105, Israel. Partially supported by the Israeli m...

