Results 1  10
of
29
WaitFree Data Structures in the Asynchronous PRAM Model
 In Proceedings of the 2nd Annual Symposium on Parallel Algorithms and Architectures
, 2000
"... In the asynchronous PRAM model, processes communicate by atomically reading and writing shared memory locations. This paper investigates the extent to which asynchronous PRAM permits longlived, highly concurrent data structures. An implementation of a concurrent object is waitfree if every operati ..."
Abstract

Cited by 65 (13 self)
 Add to MetaCart
In the asynchronous PRAM model, processes communicate by atomically reading and writing shared memory locations. This paper investigates the extent to which asynchronous PRAM permits longlived, highly concurrent data structures. An implementation of a concurrent object is waitfree if every operation will complete in a finite number of steps, and it is kbounded waitfree, for some k > 0, if every operation will complete within k steps. In the first part of this paper, we show that there are objects with waitfree implementations but no kbounded waitfree implementations for any k, and that there is an infinite hierarchy of objects with implementations that are kbounded waitfree but not Kbounded waitfree for some K > k. In the second part of the paper, we give an algebraic characterization of a large class of objects that do have waitfree implementations in asynchronous PRAM, as well as a general algorithm for implementing them. Our tools include simple iterative algorithms for waitfree approximate agreement and atomic snapshot.
Performing work efficiently in the presence of faults
 in the Proceedings of the 11 th ACM Symposium on Principles of Distributed Computing (PODC
, 1998
"... Abstract. We consider a system of t synchronous processes that communicate only by sending messages to one another, and that together must perform n independent units of work. Processes may fail by crashing; we want to guarantee that in every execution of the protocol in which at least one process s ..."
Abstract

Cited by 44 (0 self)
 Add to MetaCart
Abstract. We consider a system of t synchronous processes that communicate only by sending messages to one another, and that together must perform n independent units of work. Processes may fail by crashing; we want to guarantee that in every execution of the protocol in which at least one process survives, all n units of work will be performed. We consider three parameters: the number of messages sent, the total number of units of work performed (including multiplicities), and time. We present three protocols for solving the problem. All three are workoptimal, doing O(n+t) work. The first has moderate costs in the remaining two parameters, sending O(t √ t) messages, and taking O(n + t) time. This protocol can be easily modified to run in any completely asynchronous system equipped with a failure detection mechanism. The second sends only O(tlog t) messages, but its running time is large (O(t 2 (n+t)2 n+t)). The third is essentially timeoptimal in the (usual) case in which there are no failures, and its time complexity degrades gracefully as the number of failures increases.
Hundreds of Impossibility Results for Distributed Computing
 Distributed Computing
, 2003
"... We survey results from distributed computing that show tasks to be impossible, either outright or within given resource bounds, in various models. The parameters of the models considered include synchrony, faulttolerance, different communication media, and randomization. The resource bounds refe ..."
Abstract

Cited by 40 (4 self)
 Add to MetaCart
We survey results from distributed computing that show tasks to be impossible, either outright or within given resource bounds, in various models. The parameters of the models considered include synchrony, faulttolerance, different communication media, and randomization. The resource bounds refer to time, space and message complexity. These results are useful in understanding the inherent difficulty of individual problems and in studying the power of different models of distributed computing.
An Algorithm for the Asynchronous WriteAll problem based on process collision
, 2001
"... this paper we present a rather straightforward algorithm to solve the WriteAll problem on an asynchronous PRAM, i.e. a machine on which the processes can be stopped and restarted at will. This means that it is also suitable for all other fault models as mentioned in Kanellakis and Shvartsman, page ..."
Abstract

Cited by 26 (3 self)
 Add to MetaCart
this paper we present a rather straightforward algorithm to solve the WriteAll problem on an asynchronous PRAM, i.e. a machine on which the processes can be stopped and restarted at will. This means that it is also suitable for all other fault models as mentioned in Kanellakis and Shvartsman, page 13 [9]. Using dierent terminology we can say that our algorithm is waitfree, which means that each nonfaulty process will be able to nish the whole task, within a predetermined amount of steps, independent of the actions (or failures) of other processes.
Performing Tasks on Synchronous Restartable MessagePassing Processors
 Distributed Computing
, 2000
"... We consider the problem of performing t tasks in a distributed system of p faultprone processors. This problem, called doall herein, was introduced by Dwork, Halpern and Waarts. Our work deals with a synchronous messagepassing distributed system with processor stopfailures and restarts. We presen ..."
Abstract

Cited by 12 (3 self)
 Add to MetaCart
We consider the problem of performing t tasks in a distributed system of p faultprone processors. This problem, called doall herein, was introduced by Dwork, Halpern and Waarts. Our work deals with a synchronous messagepassing distributed system with processor stopfailures and restarts. We present two new algorithms based on a new aggressive coordination paradigm by which multiple coordinators may be active as the result of failures. The first algorithm is tolerant of f < p stopfailures and it does not allow restarts. It has available processor steps (work) complexity S = O((t + p log p= log log p) log f) and message complexity M = O(t + p log p= log log p + fp). Unlike prior solutions, our algorithm uses redundant broadcasts when encountering failures and, for p = t and large f , it achieves better work complexity. This algorithm is used as the basis for another algorithm that tolerates stopfailures and restarts. This new algorithm is the first solution for the doall problem that efficiently deals with processor restarts. Its available processor steps complexity is S = O((t + p log p + f) minflog p; log fg), and its message complexity is M = O(t+p log p+fp), where f is the total number of failures.
Towards Practical Deterministic WriteAll Algorithms
 IN PROC., 13TH ACM SYMP. ON PARALLEL ALGORITHMS AND ARCHITECTURES, 2001
, 2001
"... The problem of performing t tasks on n asynchronous or undependable processors is a basic problem in parallel and distributed computing. We consider an abstraction of this problem called the WriteAl l problemusing n processors write 1's into all locations of an array of size t. The most e#cient ..."
Abstract

Cited by 9 (4 self)
 Add to MetaCart
The problem of performing t tasks on n asynchronous or undependable processors is a basic problem in parallel and distributed computing. We consider an abstraction of this problem called the WriteAl l problemusing n processors write 1's into all locations of an array of size t. The most e#cient known deterministic asynchronous algorithms for this problem are due to Anderson and Woll. The first class of algorithms has work complexity of O(t ), for n t and any #>0, and they are the best known for the full range of processors (n = t). To schedule the work of the processors, the algorithms use lists of q permutations on [q](q n) that have certain combinatorial properties. Instantiating such an algorithm for a specific # either requires substantial preprocessing (exponential in 1/# )to find the requisite permutations, or imposes a prohibitive constant (exponential in 1/# ) hidden by the asymptotic analysis. The second class deals with the specific case of t = n 2, and these algorithms have work complexity of O(t log t). They also use lists of permutations with the same combinatorial properties. However instantiating these algorithms requires exponential in n preprocessing to find the permutations. To alleviate this costly instantiation Kanellakis and Shvartsman proposed a simple way of computing the permutations. They conjectured that their construction has the desired properties but they provided no analysis. In this paper
The Complexity of Synchronous Iterative DoAll with Crashes
, 2001
"... DoAll is the problem of performing N tasks in a distributed system of P failureprone processors [9]. Many distributed and parallel algorithms have been developed for this basic problem and several algorithm simulations have been developed by iterating DoAll algorithms. The eciency of the solut ..."
Abstract

Cited by 9 (2 self)
 Add to MetaCart
DoAll is the problem of performing N tasks in a distributed system of P failureprone processors [9]. Many distributed and parallel algorithms have been developed for this basic problem and several algorithm simulations have been developed by iterating DoAll algorithms. The eciency of the solutions for DoAll is measured in terms of work complexity where all processing steps taken by the processors are counted. Work is ideally expressed as a function of N , P , and f , the number of processor crashes. However the known lower bounds and the upper bounds for extant algorithms do not adequately show how work depends on f . We present the rst nontrivial lower bounds for DoAll that capture the dependence of work on N , P and f . For the model of computation where processors are able to make perfect loadbalancing decisions locally, we also present matching upper bounds. Thus we give the rst complete analysis of DoAll for this model. We dene the riterative DoAll problem that abstracts the repeated use of DoAll such as found in algorithm simulations. Our fsensitive analysis enables us to derive a tight bound for riterative DoAll work (that is stronger than the rfold work complexity of a single DoAll). Our approach that models perfect loadbalancing allows for the analysis of specic algorithms to be divided into two parts: (i) the analysis of the cost of tolerating failures while performing work, and (ii) the analysis of the cost of implementing loadbalancing. We demonstrate the utility and generality of this approach by improving the analysis of two known ecient algorithms. We give an improved analysis of an ecient messagepassing algorithm (algorithm AN [5]). We also derive a new and complete analysis of the best known DoAll algorithm for...
Optimal Scheduling for Disconnected Cooperation
, 2001
"... We consider a distributed environment consisting of n processors that need to perform t tasks. We assume that communication is initially unavailable and that processors begin work in isolation. At some unknown point of time an unknown collection of processors may establish communication. Before proc ..."
Abstract

Cited by 8 (3 self)
 Add to MetaCart
We consider a distributed environment consisting of n processors that need to perform t tasks. We assume that communication is initially unavailable and that processors begin work in isolation. At some unknown point of time an unknown collection of processors may establish communication. Before processors begin communication they execute tasks in the order given by their schedules. Our goal is to schedule work of isolated processors so that when communication is established for the rst time, the number of redundantly executed tasks is controlled. We quantify worst case redundancy as a function of processor advancements through their schedules. In this work we rene and simplify an extant deterministic construction for schedules with n t, and we develop a new analysis of its waste. The new analysis shows that for any pair of schedules, the number of redundant tasks can be controlled for the entire range of t tasks. Our new result is asymptotically optimal: the tails of these schedules are within a 1 +O(n 1 4 ) factor of the lower bound. We also present two new deterministic constructions one for t n, and the other for t n 3=2 , which substantially improve pairwise waste for all prexes of length t= p n, and oer near optimal waste for the tails of the schedules. Finally, we present bounds for waste of any collection of k 2 processors for both deterministic and randomized constructions. 1
Lower Bounds in Distributed Computing
, 2000
"... This paper discusses results that say what cannot be computed in certain environments or when insucient resources are available. A comprehensive survey would require an entire book. As in Nancy Lynch's excellent 1989 paper, \A Hundred Impossibility Proofs for Distributed Computing" [86], we shall re ..."
Abstract

Cited by 8 (2 self)
 Add to MetaCart
This paper discusses results that say what cannot be computed in certain environments or when insucient resources are available. A comprehensive survey would require an entire book. As in Nancy Lynch's excellent 1989 paper, \A Hundred Impossibility Proofs for Distributed Computing" [86], we shall restrict ourselves to some of the results we like best or think are most important. Our aim is to give you the avour of the results and some of the techniques that have been used. We shall also mention some interesting open problems and provide an extensive list of references. The focus will be on results from the past decade.
The DoAll Problem with Byzantine Processor Failures
 Theoretical Computer Science
, 2003
"... DoAll is the abstract problem of using n processors to cooperatively perform m independent tasks in the presence of failures. This problem can be used as the cornerstone in identifying aspects of the tradeoff between efficiency and faulttolerance in cooperative computing and in developing efficie ..."
Abstract

Cited by 7 (3 self)
 Add to MetaCart
DoAll is the abstract problem of using n processors to cooperatively perform m independent tasks in the presence of failures. This problem can be used as the cornerstone in identifying aspects of the tradeoff between efficiency and faulttolerance in cooperative computing and in developing efficient and faulttolerant algorithms for distributed cooperative applications. Many algorithms have been developed for DoAll in various models of computation, including messagepassing, partitionable networks, and sharedmemory models and under various failure models. However, to the best of our knowledge, DoAll has not been studied under Byzantine processor failures, where a faulty processor may exhibit completely unconstrained behavior. Byzantine failures model any arbitrary type of processor malfunction, including for example, failures of individual components within the processors.