Results 1-10 of 49
LogP: Towards a Realistic Model of Parallel Computation
1993
Cited by 497 (14 self)
A vast body of theoretical research has focused either on overly simplistic models of parallel computation, notably the PRAM, or overly specific models that have few representatives in the real world. Both kinds of models encourage exploitation of formal loopholes, rather than rewarding development of techniques that yield performance across a range of current and future parallel machines. This paper offers a new parallel machine model, called LogP, that reflects the critical technology trends underlying parallel computers. It is intended to serve as a basis for developing fast, portable parallel algorithms and to offer guidelines to machine designers. Such a model must strike a balance between detail and simplicity in order to reveal important bottlenecks without making analysis of interesting problems intractable. The model is based on four parameters that specify abstractly the computing bandwidth, the communication bandwidth, the communication delay, and the efficiency of coupling communication and computation. Portable parallel algorithms typically adapt to the machine configuration in terms of these parameters. The utility of the model is demonstrated through examples that are implemented on the CM-5.
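To make the role of the four parameters concrete, here is a minimal sketch of a broadcast-time estimate. The abstract does not name the parameters; the sketch assumes the conventional reading of them as latency L, per-message processor overhead o, gap g between consecutive sends, and processor count P. The function name `logp_broadcast_time` and the greedy schedule are illustrative assumptions, not code from the paper.

```python
import heapq


def logp_broadcast_time(P, L, o, g):
    """Estimate the time for one value to reach all P processors.

    Assumed cost model (hypothetical reading of the four parameters):
      - a message sent at time s is usable by the receiver at s + o + L + o
        (send overhead, network latency, receive overhead);
      - a processor can start a new send every max(g, o) time units.
    Greedy schedule: the informed processor that can send earliest
    always sends next, to a processor that is not yet informed.
    """
    if P < 1:
        raise ValueError("P must be at least 1")
    send_interval = max(g, o)
    # Min-heap of times at which some informed processor can start a send.
    ready = [0]
    informed = 1
    finish = 0
    while informed < P:
        start = heapq.heappop(ready)
        arrival = start + o + L + o
        informed += 1
        finish = max(finish, arrival)
        heapq.heappush(ready, start + send_interval)  # sender sends again later
        heapq.heappush(ready, arrival)                # receiver becomes a sender
    return finish


if __name__ == "__main__":
    print(logp_broadcast_time(P=8, L=6, o=2, g=4))
```

Under these assumptions, P=8, L=6, o=2, g=4 gives a broadcast time of 24. Changing any one parameter changes the shape of the best broadcast tree, which illustrates how a portable algorithm can adapt to the machine configuration.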
Wait-free Parallel Algorithms for the Union-Find Problem
In Proc. 23rd ACM Symposium on Theory of Computing, 1994
Cited by 50 (0 self)
We are interested in designing efficient data structures for a shared memory multiprocessor. In this paper we focus on the Union-Find data structure. We consider a fully asynchronous model of computation where arbitrary delays are possible. Thus we require that our solutions to the data structure problem have the wait-free property, meaning that each thread continues to make progress on its operations, independent of the speeds of the other threads. In this model efficiency is best measured in terms of the total number of instructions used to perform a sequence of data structure operations, that is, the work performed by the processors. We give a wait-free implementation of an efficient algorithm for Union-Find. In addition we show that the worst-case performance of the algorithm can be improved by simulating a synchronized algorithm, or by simulating a larger machine if the data structure requests support sufficient parallelism. Our solutions apply to a much more general adversary model than has be...
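For readers unfamiliar with the underlying data structure, the following is a minimal sequential sketch of Union-Find with union by rank and path compression. It is only a single-threaded baseline for the operations the abstract refers to; it is not wait-free and is not the paper's algorithm. The class and method names are illustrative assumptions.

```python
class UnionFind:
    """Sequential Union-Find (disjoint sets); a baseline, not a wait-free version."""

    def __init__(self, n):
        # Each element starts as the root of its own one-element set.
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        """Return the representative of x's set, compressing the path to it."""
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: point every node on the path directly at the root.
        while self.parent[x] != root:
            next_node = self.parent[x]
            self.parent[x] = root
            x = next_node
        return root

    def union(self, a, b):
        """Merge the sets containing a and b; return False if already merged."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        # Union by rank: attach the shallower tree under the deeper one.
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True


if __name__ == "__main__":
    uf = UnionFind(5)
    uf.union(0, 1)
    uf.union(3, 4)
    print(uf.find(0) == uf.find(1), uf.find(1) == uf.find(3))
```

In a shared-memory setting, the difficulty the abstract addresses is that several threads may update `parent` at the same time with arbitrary delays, which this sequential sketch does not handle.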
Hundreds of Impossibility Results for Distributed Computing
Distributed Computing, 2003
Cited by 44 (4 self)
We survey results from distributed computing that show tasks to be impossible, either outright or within given resource bounds, in various models. The parameters of the models considered include synchrony, fault-tolerance, different communication media, and randomization. The resource bounds refer to time, space and message complexity. These results are useful in understanding the inherent difficulty of individual problems and in studying the power of different models of distributed computing.
Performing work efficiently in the presence of faults
in the Proceedings of the 11th ACM Symposium on Principles of Distributed Computing (PODC), 1998
Cited by 44 (0 self)
We consider a system of t synchronous processes that communicate only by sending messages to one another, and that together must perform n independent units of work. Processes may fail by crashing; we want to guarantee that in every execution of the protocol in which at least one process survives, all n units of work will be performed. We consider three parameters: the number of messages sent, the total number of units of work performed (including multiplicities), and time. We present three protocols for solving the problem. All three are work-optimal, doing O(n + t) work. The first has moderate costs in the remaining two parameters, sending O(t√t) messages and taking O(n + t) time. This protocol can be easily modified to run in any completely asynchronous system equipped with a failure detection mechanism. The second sends only O(t log t) messages, but its running time is large (O(t^2 (n + t) 2^(n+t))). The third is essentially time-optimal in the (usual) case in which there are no failures, and its time complexity degrades gracefully as the number of failures increases.
Are Wait-Free Algorithms Fast?
1991
Cited by 42 (12 self)
The time complexity of wait-free algorithms in "normal" executions, where no failures occur and processes operate at approximately the same speed, is considered. A lower bound of log n on the time complexity of any wait-free algorithm that achieves approximate agreement among n processes is proved. In contrast, there exists a non-wait-free algorithm that solves this problem in constant time. This implies an Ω(log n) time separation between the wait-free and non-wait-free computation models. On the positive side, we present an O(log n) time wait-free approximate agreement algorithm; the complexity of this algorithm is within a small constant of the lower bound.
Parallel Algorithms with Processor Failures and Delays
1995
Cited by 41 (7 self)
We study efficient deterministic parallel algorithms on two models: restartable fail-stop CRCW PRAMs and asynchronous PRAMs. In the first model, synchronous processors are subject to arbitrary stop failures and restarts determined by an on-line adversary and involving loss of private but not shared memory; the complexity measures are completed work (where processors are charged for completed fixed-size update cycles) and overhead ratio (completed work amortized over necessary work and failures). In the second model, the result of the computation is a serialization of the actions of the processors determined by an on-line adversary; the complexity measure is total work (number of steps taken by all processors). Despite their differences the two models share key algorithmic techniques. We present new algorithms for the Write-All problem (in which P processors write ones into an array of size N) for the two models. These algorithms can be used to implement a simulation strategy for any N ...
Fault-tolerant data structures
In Proceedings of 37th IEEE FOCS, 1996
Cited by 38 (1 self)
We consider the tolerance of data structures to memory faults. We observe that many pointer-based data structures (e.g. linked lists, trees, etc.) are highly non-resilient to faults. A single fault in a linked list or tree may result in the loss of the entire set of data. In this paper we present a formal framework for studying the fault tolerance properties of pointer-based data structures, and we provide fault-tolerant versions of the stack, the linked list, and the dictionary tree.
Time-Optimal Message-Efficient Work Performance in the Presence of Faults
1994
Cited by 36 (5 self)
Performing work in parallel by a multitude of processes in a distributed environment is currently a fast-growing area of computer applications (due to its cost-effectiveness). Adaptation of such applications to changes in the system's parallelism (i.e., the availability of processes) is essential for improved performance and reliability. In this work we consider one aspect of coping with dynamic process failures in such a setting, namely the following scenario formulated by Dwork, Halpern and Waarts [DHW92]: a system of n synchronous processes that communicate only by sending messages to one another. These processes must perform m independent units of work. Processes may fail by crashing and wait-freeness is required, i.e., whenever at least one process survives, all m units of work will be performed. We consider the notion of fast algorithms in this setting, yet we are not willing to trade improved time for a high cost in communication. Thus, we require message efficiency as well. ...
Resolving Message Complexity of Byzantine Agreement and Beyond
in Proc. 36th IEEE Symposium on Foundations of Computer Science, 1995
Cited by 36 (3 self)
Byzantine Agreement among processors is a basic primitive in distributed computing. It comes in a number of basic fault models: "Crash", "Omission" and "Malicious" adversarial behaviors. The message complexity of the primitive has been known for the strong failure models of Malicious and Omission adversary since the early 80's, while the question for the more benign Crash failure model has been open. In this paper we show how to solve agreement in the presence of crash failures using O(n) messages, which is optimal, thus settling a thirteen-year-old open problem. Our solution has almost linear time and our new algorithmic techniques have further implications: a family of "early stopping" agreement protocols with improved message complexity, and a new solution to "Checkpoint" yielding a substantial improvement of the protocol for distributed work performance under adaptive parallelism in a network of workstations. Columbia University and Tel-Aviv University. galil@cs.columbia.edu...
Highly Efficient Asynchronous Execution of Large-Grained Parallel Programs
1993
Cited by 31 (10 self)
An n-thread parallel program P is large-grained if in every parallel step the computations on each of the threads are complex procedures requiring numerous processor instructions. This practically relevant style of programs differs from PRAM programs in its large granularity and the possibility that within a parallel step the computations on different threads may vary considerably in size. Let M be an n-processor asynchronous parallel system, with no restriction on the degree of asynchrony and without any specialized synchronization mechanisms. It is a challenging theoretical as well as practically important problem to ensure correct execution of P on such a parallel machine. Let P be a large-grained program requiring total work W for its execution on a synchronous n-processor parallel system. We present a transformation (compilation) of P into a program C(P) which correctly and efficiently effects the computation of P on the asynchronous machine M. Under moderate assumptions on the granularity of threads and the size of the program variables, execution of C(P) requires just O(W log* n) expected total work, and the memory space overhead is a small multiplicative constant. This result is the first of its kind. The solution involves a number of new concepts and methods. These include methods for storing program and control variables, employing a combination