Component Based Design of Multitolerant Systems
IEEE Transactions on Software Engineering, 1998
Abstract

Cited by 61 (12 self)
The concept of multitolerance abstracts problems in system dependability and provides a basis for improved design of dependable systems. In the abstraction, each source of undependability in the system is represented as a class of faults, and the corresponding ability of the system to deal with that undependability source is represented as a type of tolerance. Multitolerance thus refers to the ability of the system to tolerate multiple fault-classes, each in a possibly different way. In this paper, we present a component-based method for designing multitolerance. Two types of components are employed by the method, namely detectors and correctors. A theory of detectors, correctors, and their interference-free composition with intolerant programs is developed that enables the stepwise addition of components to provide tolerance to a new fault-class while preserving the tolerances to the previously added fault-classes. We illustrate the method by designing a fully distributed, mul...
Designing Masking Fault-Tolerance via Nonmasking Fault-Tolerance (Extended Abstract)
IEEE Transactions on Software Engineering, 1998
Abstract

Cited by 38 (12 self)
Masking fault-tolerance guarantees that programs continually satisfy their specification in the presence of faults. By way of contrast, nonmasking fault-tolerance does not guarantee as much: it merely guarantees that when faults stop occurring, program executions converge to states from where programs continually (re)satisfy their specification. In this paper, we show that a practical method to design masking fault-tolerance is to first design nonmasking fault-tolerance and then transform the nonmasking fault-tolerant program minimally so as to achieve masking fault-tolerance. We demonstrate this method by designing novel fully distributed programs for termination detection, mutual exclusion, and leader election that are masking tolerant of any finite number of process fail-stops and/or repairs.
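The difference between the two tolerance levels can be seen in a toy Python sketch. It is not the paper's method or any of its programs. It assumes a hypothetical value kept in three replicas, with a fault that corrupts at most one replica before correction runs. The nonmasking version relies only on a corrector (a majority vote), so a corrupted value can be output before the state converges. Guarding the same output action with a detector that waits until all replicas agree gives masking behavior. All function names are illustrative.

```python
from collections import Counter
from typing import List, Optional

def detector(replicas: List[int]) -> bool:
    # Hypothetical detector: all replicas agree. Under the assumed
    # single-corruption fault model, agreement implies the value is correct.
    return len(set(replicas)) == 1

def corrector(replicas: List[int]) -> List[int]:
    # Hypothetical corrector: reset every replica to the majority value
    # (correct while at most one replica is corrupted).
    majority, _ = Counter(replicas).most_common(1)[0]
    return [majority] * len(replicas)

def nonmasking_output(replicas: List[int]) -> int:
    # Output is always enabled, so it may expose a corrupted value
    # before the corrector has run.
    return replicas[0]

def masking_output(replicas: List[int]) -> Optional[int]:
    # The same output action, now guarded by the detector; None means the
    # action is disabled and the program waits for the corrector.
    return replicas[0] if detector(replicas) else None

state = [7, 7, 7]
state[0] = 9                      # fault corrupts one replica
print(nonmasking_output(state))   # 9: specification violated until correction
print(masking_output(state))      # None: output withheld
state = corrector(state)
print(masking_output(state))      # 7
```

The masking program here is obtained from the nonmasking one only by restricting when an existing action may fire. This mirrors, in miniature, the idea of a minimal transformation.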
A Time-Optimal Self-Stabilizing Synchronizer Using a Phase Clock
2006
Abstract

Cited by 8 (1 self)
A synchronizer with a phase counter (sometimes called an asynchronous phase clock) is an asynchronous distributed algorithm in which each node maintains a local ‘pulse counter’ that simulates the global clock in a synchronous network. In this paper we present a time-optimal self-stabilizing scheme for such a synchronizer, assuming unbounded counters. We give a simple rule by which each node can compute its pulse number as a function of its neighbors' pulse numbers. We also show that some of the popular correction functions for phase clock synchronization are not self-stabilizing in asynchronous networks. Using our rule, the counters stabilize in time bounded by the diameter of the network, without invoking global operations. We argue that the use of unbounded counters can be justified by the availability of memory for counters that are large enough to be practically unbounded, and by the existence of reset protocols that can be used to restart the counters in the rare cases where faults make this necessary.
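The safety condition on pulse counters can be sketched in Python. This is an illustrative assumption, not the paper's correction rule. A configuration is treated as legal when neighboring counters differ by at most 1, and a node may advance only when no neighbor lags behind it. From a legal start, this guard preserves legality under any interleaving of single-node steps. It does not by itself recover from arbitrarily corrupted counters, which is the problem the paper's self-stabilizing rule addresses. The names `is_legal`, `can_advance`, and `step` are hypothetical.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]  # adjacency lists of an undirected graph

def is_legal(graph: Graph, pulse: Dict[int, int]) -> bool:
    # Assumed legality predicate: neighboring pulse counters differ by at most 1.
    return all(abs(pulse[u] - pulse[v]) <= 1 for u in graph for v in graph[u])

def can_advance(graph: Graph, pulse: Dict[int, int], v: int) -> bool:
    # Hypothetical guard: v may generate its next pulse only if no neighbor is behind it.
    return all(pulse[u] >= pulse[v] for u in graph[v])

def step(graph: Graph, pulse: Dict[int, int], v: int) -> None:
    # One atomic action of node v under an asynchronous scheduler.
    if can_advance(graph, pulse, v):
        pulse[v] += 1

path = {0: [1], 1: [0, 2], 2: [1]}
pulse = {0: 0, 1: 0, 2: 0}
for _ in range(100):              # round-robin schedule
    for v in path:
        step(path, pulse, v)
        assert is_legal(path, pulse)
print(pulse)                      # {0: 100, 1: 100, 2: 100}
```

If a single node is scheduled repeatedly, it advances once and then waits for its neighbors. This waiting is what keeps neighboring counters within one pulse of each other.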
A design for node coloring and 1-fair alternator on de Bruijn networks
In Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2003
Abstract

Cited by 1 (1 self)
In this paper, we study the coloring problem for the undirected binary de Bruijn interconnection network. The coloring scheme is simple and fast. Our coloring algorithm uses the pseudo shortest-path spanning tree rooted at 0⋯00. Each processor can find its color number from its own identity. Then, based on our coloring algorithm, we propose a 1-fair alternator. Our design is optimal. In our design, each processor can execute the critical step once in every 3 steps.
Key words: alternator; coloring; de Bruijn graph; phase synchronization
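A proper coloring can drive an alternator-style schedule, as the Python sketch below shows. It is not the paper's algorithm. The coloring is a plain first-fit greedy coloring, which may use more colors than the paper's spanning-tree-based scheme. The schedule lets each node of color c execute its critical step at steps t with t mod k = c, where k is the number of colors. Every node therefore executes once every k steps, while the paper's design attains once every 3 steps. The undirected binary de Bruijn graph is built from the usual left-shift edges, with the self-loops at 0⋯0 and 1⋯1 dropped. All function names are hypothetical.

```python
from typing import Dict, Set

def de_bruijn_undirected(n: int) -> Dict[int, Set[int]]:
    """Undirected binary de Bruijn graph on n-bit node labels, self-loops removed."""
    size, mask = 1 << n, (1 << n) - 1
    adj: Dict[int, Set[int]] = {x: set() for x in range(size)}
    for x in range(size):
        for b in (0, 1):
            # Directed edge x -> (x shifted left by one bit, new low bit b).
            y = ((x << 1) & mask) | b
            if y != x:  # drops the self-loops at 0...0 and 1...1
                adj[x].add(y)
                adj[y].add(x)
    return adj

def greedy_coloring(adj: Dict[int, Set[int]]) -> Dict[int, int]:
    """Proper coloring by first fit (NOT the paper's spanning-tree-based scheme)."""
    color: Dict[int, int] = {}
    for v in sorted(adj):
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

def active_at(color: Dict[int, int], t: int) -> Set[int]:
    """Nodes allowed to execute their critical step at step t (one color class per step)."""
    k = max(color.values()) + 1
    return {v for v, c in color.items() if c == t % k}

adj = de_bruijn_undirected(4)
color = greedy_coloring(adj)
k = max(color.values()) + 1
for t in range(k):
    active = active_at(color, t)
    # Safety: no two neighbors execute the critical step in the same step.
    assert all(not (adj[v] & active) for v in active)
print("schedule period:", k)
```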
Statement of Research, Teaching and Service Contributions
2005
Abstract
As computing systems become an increasingly integral part of our lives, the need for fault-tolerance and security in these systems is constantly growing. These computing systems include telecommunication, power systems, collaborative group-oriented systems, sensor networks and electronic commerce. Also, the fault-tolerance requirements of a system tend to evolve with new technology. Hence, one needs to add new reliability requirements to such systems while preserving existing ones. Moreover, these systems need to be adaptive so that the approach used for fault-tolerance and security can be modified based on environmental conditions. With this motivation, our work has focused on the design of fault-tolerant and secure systems. In this document, we identify our research, teaching and service activities in the context of fault-tolerant and secure systems.
1 Research Accomplishments
The initial work on fault-tolerance began during undergraduate studies, where we designed a fault-tolerant mutual exclusion algorithm [1]. Subsequently, as a graduate student at Ohio State University, we focused on identifying the foundations of fault-tolerant systems. This work (cf. Section 1.1) has resulted in the development of general-purpose methods for designing fault-tolerance that have been extensively used in the design of multitolerant systems, which tolerate multiple classes of faults while providing a different level of fault-tolerance to each. Subsequently, in the last 6 years, as an assistant professor at Michigan State University, our work has focused
The Computational Power of Collision Detection
Abstract
We show that packet collisions, often considered a nuisance in wireless sensor networks, can be used to reduce the cost of computing aggregate functions of distributed data. Formally, we consider a model of computation in which a central controller polls a collection of $n$ sensor nodes, each of which possesses an input bit. The controller chooses subsets of the nodes that will respond if they have a 0 or 1 bit. If no nodes respond, the controller detects an empty radio channel; if one node does, the controller detects the response; if two or more nodes respond, the controller detects a collision. The goal of the controller is to compute some function $f$ with the fewest queries in the worst case. By representing this situation using decision trees, we show that (a) any function can be computed in $n - 1$ queries using a deterministic protocol and $5n/6 + \lg n$ queries using randomization; (b) random functions have query complexity at least $n/\lg 3 - O(\log n)$; (c) $t$-threshold functions have query complexity between $\log_3(t+2)$ and $2.4t + 1.8(t+1)\lg(n/t) + O(\log t)$ for deterministic protocols and $O(t)$ for randomized protocols; (d) the majority function in particular has query complexity at most $11n/24$; and (e) graph connectivity (where the input to each node is the status of one edge in the graph) has query complexity at most $O(\sqrt{n}\log n)$, which is within a logarithmic factor of optimal. Some additional results relate the complexity of computing with collisions to the minimum degree of a polynomial whose sign represents the target function, and give lower bounds for a class of banded functions related to threshold functions.
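The query model can be sketched in Python. The function names are hypothetical, and the sketch assumes a response carries no identity, so each query has exactly three outcomes: "empty", "single" and "collision". Even under this assumption, one query suffices for OR, AND, and the 2-threshold function (at least two 1 bits). These cases illustrate why collisions are informative rather than merely a nuisance.

```python
from typing import Iterable, List

def query(bits: List[int], subset: Iterable[int], respond_on: int) -> str:
    """One controller query: nodes in `subset` whose bit equals `respond_on` transmit."""
    count = sum(1 for i in subset if bits[i] == respond_on)
    if count == 0:
        return "empty"        # silent channel
    if count == 1:
        return "single"       # exactly one response heard
    return "collision"        # two or more responses collide

def or_all(bits: List[int]) -> int:
    # Some node has a 1 iff the channel is not empty when all 1-nodes respond.
    return int(query(bits, range(len(bits)), 1) != "empty")

def and_all(bits: List[int]) -> int:
    # All bits are 1 iff no node responds when all 0-nodes are asked to respond.
    return int(query(bits, range(len(bits)), 0) == "empty")

def at_least_two_ones(bits: List[int]) -> int:
    # The 2-threshold function: a collision means at least two 1 bits.
    return int(query(bits, range(len(bits)), 1) == "collision")

x = [0, 1, 0, 1]
print(or_all(x), and_all(x), at_least_two_ones(x))  # 1 0 1
```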
Low-Cost Fault-Tolerance in Barrier Synchronizations
International Conference on Parallel Processing, 1998
Abstract
In this paper, we show how tolerance to several types of faults can be effectively added to program computations that use barrier synchronization. We divide the faults that occur in practice into two classes, detectable and undetectable, and design a fully distributed program that tolerates the faults in both classes. Our program guarantees that every barrier is executed correctly even if detectable faults occur, and that eventually every barrier is executed correctly even if undetectable faults occur. Via analytical as well as simulation results, we show that the cost of adding fault-tolerance is low, in part by comparing the times required by our program with those required by its fault-intolerant counterpart.
Keywords: fault-tolerance, multitolerance, detectable and undetectable faults, synchronization, concurrency.
Action-Level Addition of Leads-To Properties to Shared-Memory Parallel Programs
2008
Abstract
We present a method for large-scale automatic addition of Leads-To properties to shared-memory parallel programs. Such automated addition of Leads-To properties is highly desirable in facilitating the design of multicore programs by average programmers. Our approach is action-level in that we separately analyze and revise the actions of an existing program in order to accommodate a new Leads-To property while preserving already existing properties. Based on our method, we have developed a software framework that exploits the computational resources of geographically distributed machines to add Leads-To properties to parallel programs. We demonstrate our approach in the context of a token-passing program and a barrier synchronization program.
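The property being added, "P leads-to Q", means that from every P-state all executions eventually reach a Q-state. For a finite transition graph this can be checked with the Python sketch below. It covers only the check, not the paper's action-level revision method. Three simplifying assumptions apply: there is no fairness (every path counts), a deadlocked non-Q state counts as a violation, and all states of the given graph are examined rather than only reachable ones. The function name is hypothetical.

```python
from typing import Dict, Set

def leads_to_violations(succ: Dict[int, Set[int]], p: Set[int], q: Set[int]) -> Set[int]:
    """Return the states in p from which some execution avoids q forever."""
    # Start from every non-q state; iteratively discard states whose every
    # path is forced into q (greatest-fixpoint computation).
    bad = {s for s in succ if s not in q}
    changed = True
    while changed:
        changed = False
        for s in list(bad):
            nxt = succ[s]
            # A state with successors, none of which can still avoid q, must reach q.
            if nxt and not (nxt & bad):
                bad.discard(s)
                changed = True
    # Remaining states can stay outside q forever (via a cycle or a deadlock).
    return {s for s in p if s in bad}

succ = {0: {1}, 1: {2}, 2: {2}}
print(leads_to_violations(succ, {0}, {2}))  # set(): 0 leads-to 2 holds
succ[1] = {0, 2}                            # add a cycle 0 -> 1 -> 0 avoiding 2
print(leads_to_violations(succ, {0}, {2}))  # {0}
```

Adding a Leads-To property then amounts to revising the program's transitions until this set becomes empty while the existing properties still hold.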
Size-Independent Self-Stabilizing Asynchronous Phase Synchronization in General Graphs
Abstract
In this paper, we design a self-stabilizing phase synchronizer for distributed systems. The synchronizer enables a node to transfer from one phase to the next, subject to the condition that at most two consecutive phases appear among all nodes. It does not rely on any system parameter such as the number of nodes, and thus suits dynamic systems where nodes can freely join or leave. Each node maintains only a few variables related to its neighborhood, and all operations are decided based on local rather than global information. The memory usage of the proposed algorithm is low: each node has only O(ΔK) states, where Δ is the maximum degree of nodes and K > 1 is the number of phases. To the best of our knowledge, there are no other such size-independent self-stabilizing algorithms for systems of general graph topologies.
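The global condition "at most two consecutive phases appear among all nodes" can be written as a small Python predicate over a snapshot of node phases. This is a hypothetical sketch, not the paper's synchronizer. It assumes the K phases are numbered 0 to K-1 and wrap around from K-1 to 0, which the abstract does not state explicitly.

```python
from typing import Iterable

def at_most_two_consecutive(phases: Iterable[int], k: int) -> bool:
    """Check the synchronizer's global condition on a snapshot of all node phases.

    Assumes (hypothetically) that phases are numbered 0..k-1 and that phase k-1
    is followed by phase 0.
    """
    present = set(phases)
    if len(present) <= 1:
        return True
    if len(present) > 2:
        return False
    a, b = present
    # The two phases must be neighbors on the cycle 0 -> 1 -> ... -> k-1 -> 0.
    return (a - b) % k in (1, k - 1)

print(at_most_two_consecutive([3, 3, 4, 4, 3], k=5))  # True
print(at_most_two_consecutive([4, 0, 0], k=5))        # True (wrap-around)
print(at_most_two_consecutive([1, 3], k=5))           # False
```

The synchronizer itself enforces this condition using only neighborhood information. The predicate is the global invariant such a local rule is meant to establish and preserve.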