Results 1  10
of
76
LogP: Towards a Realistic Model of Parallel Computation
, 1993
"... A vast body of theoretical research has focused either on overly simplistic models of parallel computation, notably the PRAM, or overly specific models that have few representatives in the real world. Both kinds of models encourage exploitation of formal loopholes, rather than rewarding developme ..."
Abstract

Cited by 562 (15 self)
 Add to MetaCart
(Show Context)
A vast body of theoretical research has focused either on overly simplistic models of parallel computation, notably the PRAM, or overly specific models that have few representatives in the real world. Both kinds of models encourage exploitation of formal loopholes, rather than rewarding development of techniques that yield performance across a range of current and future parallel machines. This paper offers a new parallel machine model, called LogP, that reflects the critical technology trends underlying parallel computers. It is intended to serve as a basis for developing fast, portable parallel algorithms and to offer guidelines to machine designers. Such a model must strike a balance between detail and simplicity in order to reveal important bottlenecks without making analysis of interesting problems intractable. The model is based on four parameters that specify abstractly the computing bandwidth, the communication bandwidth, the communication delay, and the efficiency of coupling communication and computation. Portable parallel algorithms typically adapt to the machine configuration, in terms of these parameters. The utility of the model is demonstrated through examples that are implemented on the CM5.
A method for implementing lockfree shared data structures
 In Proceedings of the 5th ACM Symposium on Parallel Algorithms and Architectures
, 1993
"... barnesQmpisb.mpg.de ..."
Waitfree Parallel Algorithms for the UnionFind Problem
 In Proc. 23rd ACM Symposium on Theory of Computing
, 1994
"... We are interested in designing efficient data structures for a shared memory multiprocessor. In this paper we focus on the UnionFind data structure. We consider a fully asynchronous model of computation where arbitrary delays are possible. Thus we require our solutions to the data structure problem ..."
Abstract

Cited by 54 (0 self)
 Add to MetaCart
(Show Context)
We are interested in designing efficient data structures for a shared memory multiprocessor. In this paper we focus on the UnionFind data structure. We consider a fully asynchronous model of computation where arbitrary delays are possible. Thus we require our solutions to the data structure problem have the waitfree property, meaning that each thread continues to make progress on its operations, independent of the speeds of the other threads. In this model efficiency is best measured in terms of the total number of instructions used to perform a sequence of data structure operations, the work performed by the processors. We give a waitfree implementation of an efficient algorithm for UnionFind. In addition we show that the worst case performance of the algorithm can be improved by simulating a synchronized algorithm, or by simulating a larger machine if the data structure requests support sufficient parallelism. Our solutions apply to a much more general adversary model than has be...
Parallel Algorithms with Processor Failures and Delays
, 1995
"... We study efficient deterministic parallel algorithms on two models: restartable failstop CRCW PRAMs and asynchronous PRAMs. In the first model, synchronous processors are subject to arbitrary stop failures and restarts determined by an online adversary and involving loss of private but not shared ..."
Abstract

Cited by 53 (12 self)
 Add to MetaCart
We study efficient deterministic parallel algorithms on two models: restartable failstop CRCW PRAMs and asynchronous PRAMs. In the first model, synchronous processors are subject to arbitrary stop failures and restarts determined by an online adversary and involving loss of private but not shared memory; the complexity measures are completed work (where processors are charged for completed fixedsize update cycles) and overhead ratio (completed work amortized over necessary work and failures). In the second model, the result of the computation is a serializaton of the actions of the processors determined by an online adversary; the complexity measure is total work (number of steps taken by all processors). Despite their differences the two models share key algorithmic techniques. We present new algorithms for the WriteAll problem (in which P processors write ones into an array of size N ) for the two models. These algorithms can be used to implement a simulation strategy for any N ...
Performing work efficiently in the presence of faults
 in the Proceedings of the 11 th ACM Symposium on Principles of Distributed Computing (PODC
, 1998
"... Abstract. We consider a system of t synchronous processes that communicate only by sending messages to one another, and that together must perform n independent units of work. Processes may fail by crashing; we want to guarantee that in every execution of the protocol in which at least one process s ..."
Abstract

Cited by 50 (0 self)
 Add to MetaCart
(Show Context)
Abstract. We consider a system of t synchronous processes that communicate only by sending messages to one another, and that together must perform n independent units of work. Processes may fail by crashing; we want to guarantee that in every execution of the protocol in which at least one process survives, all n units of work will be performed. We consider three parameters: the number of messages sent, the total number of units of work performed (including multiplicities), and time. We present three protocols for solving the problem. All three are workoptimal, doing O(n+t) work. The first has moderate costs in the remaining two parameters, sending O(t √ t) messages, and taking O(n + t) time. This protocol can be easily modified to run in any completely asynchronous system equipped with a failure detection mechanism. The second sends only O(tlog t) messages, but its running time is large (O(t 2 (n+t)2 n+t)). The third is essentially timeoptimal in the (usual) case in which there are no failures, and its time complexity degrades gracefully as the number of failures increases.
Parallel processing on networks of workstations: A faulttolerant, high performance approach
 In Proc. 15th IEEE Intl. Conf. on Distributed Computing Systems
, 1995
"... One of the most sought after software innovation of this decade is the construction of systems using offtheshelf workstations that actually deliver, and even surpass, the power and reliability of supercomputers. Using completely novel techniques: eager scheduling, evasive memory layouts and disper ..."
Abstract

Cited by 46 (15 self)
 Add to MetaCart
One of the most sought after software innovation of this decade is the construction of systems using offtheshelf workstations that actually deliver, and even surpass, the power and reliability of supercomputers. Using completely novel techniques: eager scheduling, evasive memory layouts and dispersed data management, it is possible to build a execution environment for parallel programs on workstation networks. These techniques were originally developed in a theoretical framework for an abstract machine which models a shared memory asynchronous multiprocessor. The network of workstations platform presents an inherently asynchronous environment for the execution of our parallel program. This gives rise to substantial problems of correctness of the computation and of proper automatic load balancing of the work amongst the processors, so that a slow processor will not hold up the total computation. A limiting case of asynchrony is when a processor becomes infinitely slow, i.e fails. Our methodology copes with all these problems, as well as with memory failures. An interesting feature of this system is that it is neither a faulttolerant system extended for parallel processing nor is it parallel processing system extended for fault tolerance. The same novel mechanisms ensure both properties. 1.
Are WaitFree Algorithms Fast?
, 1991
"... The time complexity of waitfree algorithms in "normal" executions, where no failures occur and processes operate at approximately the same speed, is considered. A lower bound of log n on the time complexity of any waitfree algorithm that achieves approximate agreement among n processes i ..."
Abstract

Cited by 39 (11 self)
 Add to MetaCart
(Show Context)
The time complexity of waitfree algorithms in "normal" executions, where no failures occur and processes operate at approximately the same speed, is considered. A lower bound of log n on the time complexity of any waitfree algorithm that achieves approximate agreement among n processes is proved. In contrast, there exists a nonwaitfree algorithm that solves this problem in constant time. This implies an (log n) time separation between the waitfree and nonwaitfree computation models. On the positive side, we present an O(log n) time waitfree approximate agreement algorithm; the complexity of this algorithm is within a small constant of the lower bound.
Highly Efficient Asynchronous Execution of LargeGrained Parallel Programs
, 1993
"... An nthread parallel program P is largegrained if in every parallel step the computations on each of the threads are complex procedures requiring numerous processor instructions. This practically relevant style of programs differs from PRAM programs in its large granularity and the possibility that ..."
Abstract

Cited by 32 (10 self)
 Add to MetaCart
An nthread parallel program P is largegrained if in every parallel step the computations on each of the threads are complex procedures requiring numerous processor instructions. This practically relevant style of programs differs from PRAM programs in its large granularity and the possibility that within a parallel step the computations on different threads may considerably vary in size. Let M be an nprocessor asynchronous parallel system, with no restriction on the. degree of asynchrony and without any specialized synchronization mechanisms. It is a challenging theoretical as well as practically important problem to ensure correct execution of P on such a parallel machine. Let P be a largegrained program requiring total work W for its execution on a synchronous nprocessor parallel system. We present a transformation (compilation) of P into a program C(P) which correctly and efficiently effects the computation of P on the asynchronous machine M. Under moderate assumptions on the granularity of threads and the size of the program variables, execution of C(P) requires just O(Wlog * n) expected total work, and the memory space overhead is a small multiplicative constant. This result is the first of its kind. The solution involves a number of new concepts and methods. These include methods for storing program and control variables, employing a combination
A Theory of Competitive Analysis for Distributed Algorithms
, 1994
"... We introduce a theory of competitive analysis for distributed algorithms. The first steps in this direction were made in the seminal papers of Bartal, Fiat, and Rabani [l'?], and of Awerbuch, Kutten, and Peleg [15], in the context of data management and job scheduling. In these papers, as well ..."
Abstract

Cited by 32 (5 self)
 Add to MetaCart
We introduce a theory of competitive analysis for distributed algorithms. The first steps in this direction were made in the seminal papers of Bartal, Fiat, and Rabani [l'?], and of Awerbuch, Kutten, and Peleg [15], in the context of data management and job scheduling. In these papers, as well as in other subsequent work [l4, 4, 181, the cost of a distributed algorithm as compared to the cost of an optimal globalcontrol algorithm. Here we introduce a more refined notion of competitiveness for distributed algorithms, one that reflects the performance of distributed algorithms more accurately. In particular, our theory allows one to compare the cost of a distributed online algorithm to the cost of an optimal distributed algorithm. We demonstrate our method by studying the
Combining Tentative and Definite Executions for Very Fast Dependable Parallel Computing (Extended Abstract)
, 1991
"... We present a general and efficient strategy for computing robustly on unreliable parallel machines. The model of a parallel machine that we use is a CRCW PRAM with dynamic resource fluctuations: processors can fail during the computation, and may possibly be restored later. We first introduce the no ..."
Abstract

Cited by 26 (3 self)
 Add to MetaCart
We present a general and efficient strategy for computing robustly on unreliable parallel machines. The model of a parallel machine that we use is a CRCW PRAM with dynamic resource fluctuations: processors can fail during the computation, and may possibly be restored later. We first introduce the notions of definite and tentative algorithms for executing a single parallel step of an ideal parallel machine on the unreliable machine. A definite algorithm is one that guarantees a correct execution of a