Synthesis of fault-tolerant concurrent programs
- Proceedings of the 17th ACM Symposium on Principles of Distributed Computing (PODC), 1998
Cited by 34 (5 self)
Methods for mechanically synthesizing concurrent programs from temporal logic specifications obviate the need to manually construct a program and compose a proof of its correctness. A serious drawback of extant synthesis methods, however, is that they produce concurrent programs for models of computation that are often unrealistic. In particular, these methods assume completely fault-free operation, i.e., the programs they produce are fault-intolerant. In this paper, we show how to mechanically synthesize fault-tolerant concurrent programs for various fault classes. We illustrate our method by synthesizing fault-tolerant solutions to the mutual exclusion and barrier synchronization problems. Categories and Subject Descriptors: F.3.1 [Logics and Meanings of Programs]: Specifying and Verifying and Reasoning about Programs—logics of programs, mechanical verification, specification
Hundreds of Impossibility Results for Distributed Computing
- Distributed Computing, 2003
Cited by 32 (4 self)
We survey results from distributed computing that show tasks to be impossible, either outright or within given resource bounds, in various models. The parameters of the models considered include synchrony, fault-tolerance, different communication media, and randomization. The resource bounds refer to time, space and message complexity. These results are useful in understanding the inherent difficulty of individual problems and in studying the power of different models of distributed computing.
A Pursuer-Evader Game for Sensor Networks
- Sixth Symposium on Self-Stabilizing Systems (SSS’03), 2003
Cited by 27 (7 self)
In this paper we present a self-stabilizing program for solving a pursuer-evader problem in sensor networks. The program can be tuned for tracking speed or energy efficiency. In the program, sensor motes close to the evader dynamically maintain a "tracking" tree of depth that is always rooted at the evader.
Resettable Vector Clocks
- Proceedings of the 19th ACM Symposium on Principles of Distributed Computing (PODC), 2000
Cited by 16 (6 self)
Vector clocks (VC) are an inherent component of a rich class of distributed applications. In this paper, we consider the problem of realistic -- more specifically, bounded-space and fault-tolerant -- implementation of these client applications. To this end, we generalize the notion of VC to resettable vector clocks (RVC), and provide a realistic implementation of RVC. Further, we identify an interface contract under which our RVC implementation can be substituted for VC in client applications, without affecting the client's correctness. Based on such substitution, we show how to transform the client so that it is itself realistically implemented; we demonstrate our method in the context of Ricart-Agrawala's mutual exclusion program.
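For readers unfamiliar with the structure this abstract generalizes, here is a minimal sketch of a classic, unbounded vector clock. It is not the paper's resettable variant; the class name `VectorClock` and the helpers `happened_before` and `concurrent` are hypothetical names chosen for illustration.

```python
from typing import List


class VectorClock:
    """Classic unbounded vector clock for process `pid` among `n` processes.

    Illustrative only: this is NOT the resettable (bounded-space) variant
    described in the abstract above.
    """

    def __init__(self, pid: int, n: int) -> None:
        self.pid = pid
        self.clock: List[int] = [0] * n

    def tick(self) -> None:
        # A local event advances only this process's own component.
        self.clock[self.pid] += 1

    def send(self) -> List[int]:
        # Sending counts as an event; the timestamp travels with the message.
        self.tick()
        return list(self.clock)

    def receive(self, stamp: List[int]) -> None:
        # Merge by component-wise maximum, then count the receive event.
        self.clock = [max(a, b) for a, b in zip(self.clock, stamp)]
        self.tick()


def happened_before(a: List[int], b: List[int]) -> bool:
    # a -> b iff a <= b in every component and a differs from b.
    return all(x <= y for x, y in zip(a, b)) and a != b


def concurrent(a: List[int], b: List[int]) -> bool:
    # Two distinct timestamps are concurrent if neither precedes the other.
    return a != b and not happened_before(a, b) and not happened_before(b, a)


if __name__ == "__main__":
    p0, p1 = VectorClock(0, 2), VectorClock(1, 2)
    msg = p0.send()          # event on p0
    p1.tick()                # independent event on p1
    local = list(p1.clock)
    p1.receive(msg)          # p1 learns about p0's event
    print(concurrent(msg, local), happened_before(msg, p1.clock))
```

Each component grows without bound here, which is the space problem that motivates a bounded, resettable design.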
Solving Problems in the Presence of Process Crashes and Lossy Links
, 1996
Cited by 13 (1 self)
We study the effect of link failures on the solvability of problems in asynchronous systems that are subject to process crashes: given a problem that can be solved in a system with process crashes and reliable links, is the problem solvable even if links are lossy? We answer this question for two types of lossy links, and show that the answer depends on the maximum number of processes that may crash and the nature of the problem to be solved. In particular, we prove that the answer is positive if fewer than half of the processes may crash or if the problem specification does not refer to the state of processes that crash. However, in general, the answer is negative even if each link can lose only a finite number of messages.
1 Introduction We study the effect of link failures on the solvability of problems in distributed systems. In particular, we address the following question: given a problem that can be solved in a system where the only possible failures are process crashes, is th...
LSRP: Local stabilization in shortest path routing
- In IEEE-IFIP DSN, 2003
Cited by 13 (4 self)
We formulate a notion of local stabilization, by which a system self-stabilizes in time proportional to the size of any perturbation that changes the network topology or the state of nodes. The notion implies that the part of the network involved in the stabilization includes at most the nodes whose distance from the perturbed nodes is proportional to the perturbation size. Also, we present LSRP, a protocol for local stabilization in shortest path routing. LSRP achieves local stabilization via two techniques. First, it layers system computation into three diffusing waves, each having a different propagation speed: the “stabilization wave” with the lowest speed, the “containment wave” with intermediate speed, and the “super-containment wave” with the highest speed. The containment wave contains the mistakenly initiated stabilization wave, the super-containment wave contains the mistakenly initiated containment wave, and the super-containment wave self-stabilizes locally. Second, LSRP avoids forming loops during stabilization, and it removes all transient loops within small constant time. To the best of our knowledge, LSRP is the first protocol that achieves local stabilization in shortest path routing.
Distributed Verification of Minimum Spanning Trees
- Proc. 25th Annual Symposium on Principles of Distributed Computing, 2006
Cited by 12 (11 self)
The problem of verifying a Minimum Spanning Tree (MST) was introduced by Tarjan in a sequential setting. Given a graph and a tree that spans it, the algorithm is required to check whether this tree is an MST. This paper investigates the problem in the distributed setting, where the input is given in a distributed manner, i.e., every node “knows” which of its own emanating edges belong to the tree. Informally, the distributed MST verification problem is the following. Label the vertices of the graph in such a way that for every node, given (its own label and) the labels of its neighbors only, the node can detect whether these edges are indeed its MST edges. In this paper we present such a verification scheme with a maximum label size of O(log n log W), where n is the number of nodes and W is the largest weight of an edge. We also give a matching lower bound of Ω(log n log W) (except when W ≤ log n). Both our bounds improve previously known bounds for the problem. Our techniques (both for the lower bound and for the upper bound) may indicate a strong relation between the fields of proof labeling schemes and implicit labeling schemes. For the related problem of tree sensitivity also presented by Tarjan, our method yields rather efficient schemes for both the distributed and the sequential settings.
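The sequential version of the problem stated at the start of this abstract can be illustrated with the standard cycle property: a spanning tree is an MST exactly when no non-tree edge is lighter than the heaviest tree edge on the path between its endpoints. The sketch below is a naive check under that property, not Tarjan's algorithm and not the paper's labeling scheme. The function names are hypothetical, and it assumes a simple undirected graph (no self-loops or parallel edges) whose `tree_edges` really form a spanning tree.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable, float]  # (u, v, weight)


def max_on_tree_path(adj: Dict[Hashable, List[Tuple[Hashable, float]]],
                     src: Hashable, dst: Hashable) -> float:
    """Heaviest edge weight on the unique tree path from src to dst."""
    stack = [(src, None, float("-inf"))]
    while stack:
        node, parent, best = stack.pop()
        if node == dst:
            return best
        for nxt, w in adj[node]:
            if nxt != parent:  # a tree has no cycles, so this suffices
                stack.append((nxt, node, max(best, w)))
    raise ValueError("tree_edges do not connect the two endpoints")


def is_mst(edges: List[Edge], tree_edges: List[Edge]) -> bool:
    """Naive check: one tree traversal per non-tree edge."""
    adj = defaultdict(list)
    in_tree = set()
    for u, v, w in tree_edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
        in_tree.add(frozenset((u, v)))
    for u, v, w in edges:
        if frozenset((u, v)) in in_tree:
            continue
        # Cycle property: a strictly lighter non-tree edge could replace
        # the heaviest edge on the tree path and reduce the total weight.
        if w < max_on_tree_path(adj, u, v):
            return False
    return True


if __name__ == "__main__":
    graph = [("a", "b", 1), ("b", "c", 2), ("a", "c", 3)]
    print(is_mst(graph, [("a", "b", 1), ("b", "c", 2)]))
    print(is_mst(graph, [("a", "b", 1), ("a", "c", 3)]))
```

This check is centralized and sees the whole graph. The distributed setting in the abstract instead asks each node to decide from its own label and its neighbors' labels alone.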
LOCI: Local Clustering Service for Large Scale Wireless Sensor Networks
Cited by 7 (5 self)
We present a local, scalable, and fault-tolerant distributed clustering service, LOCI, that partitions a multi-hop wireless network into clusters of bounded physical radius [R, mR], where m is a constant greater than or equal to 2. That is, each cluster has a leader node such that all nodes within distance R of the leader belong to the cluster but no node beyond distance mR from the leader belongs to the cluster. LOCI is local in that each node only needs information about nodes that are at most distance 2R away. It is scalable in that each node maintains only a constant amount of state and completes its role in clustering in O(R^4) time (O(R^2) for a stronger system model), independent of the network size. It is fault-tolerant in that it is self-stabilizing in the presence of state corruption, node and link fail-stops, and node joins. The network partitions generated by LOCI form a Voronoi tessellation, as each non-clusterhead node joins the cluster of the nearest clusterhead, and the number of clusters LOCI yields is within a constant factor of the minimum number of clusters theoretically feasible. Our simulations demonstrate that, for m = 2, the number of clusters constructed by LOCI exceeds the minimum number of clusters theoretically feasible only by a factor of 1.5 for a 1-D network and 2.3 for a 2-D network. As hierarchical clustering is readily achieved by instantiating LOCI at multiple levels, LOCI provides a framework for the scalable and fault-tolerant distributed tracking structures for pursuer-evader scenarios that have arisen in our recent work in sensor networks. Furthermore, as part of our efforts towards developing sensor network services in the DARPA Network Embedded Software Technology (NEST) program, we have implemented LOCI in TinyOS on the Mica2 mote platform.
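The Voronoi property mentioned in this abstract (each non-clusterhead node joins the nearest clusterhead) can be illustrated with a small centralized sketch. This is not the LOCI protocol, which is distributed and self-stabilizing. The function `assign_clusters`, the 2-D coordinates, and the use of Euclidean distance are assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def assign_clusters(nodes: List[Point], heads: List[Point]) -> Dict[Point, Point]:
    """Map every node to its nearest clusterhead (ties go to the earlier head).

    Centralized illustration only: a real deployment would have each node
    decide from information about nearby nodes, as the abstract describes.
    """
    if not heads:
        raise ValueError("at least one clusterhead is required")
    return {node: min(heads, key=lambda h: math.dist(node, h)) for node in nodes}


def max_cluster_radius(assignment: Dict[Point, Point]) -> float:
    """Largest node-to-head distance; useful for checking an upper bound such as mR."""
    return max((math.dist(n, h) for n, h in assignment.items()), default=0.0)


if __name__ == "__main__":
    heads = [(0.0, 0.0), (10.0, 0.0)]
    nodes = [(1.0, 0.0), (9.0, 1.0), (4.0, 0.0)]
    clusters = assign_clusters(nodes, heads)
    print(clusters)
    print(max_cluster_radius(clusters))
```

Comparing `max_cluster_radius` against mR gives a quick sanity check that a candidate set of clusterheads respects the outer radius bound.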
Resettable Vector Clocks: A Case Study in Designing Graybox Fault-Tolerance
, 2000
Cited by 4 (3 self)
The task of designing fault-tolerance for large-scale applications (applications that inevitably contain multiple components) can be significantly simplified by designing fault-tolerance at the component level. In contrast to the traditional whitebox and blackbox methods, a graybox method for designing fault-tolerance to components allows the design of scalable and low-cost fault-tolerance by exploiting the contracts of the components. In this

