Results 1–10 of 70
LogP: Towards a Realistic Model of Parallel Computation
, 1993
Cited by 547 (15 self)
Abstract
A vast body of theoretical research has focused either on overly simplistic models of parallel computation, notably the PRAM, or on overly specific models that have few representatives in the real world. Both kinds of models encourage exploitation of formal loopholes, rather than rewarding development of techniques that yield performance across a range of current and future parallel machines. This paper offers a new parallel machine model, called LogP, that reflects the critical technology trends underlying parallel computers. It is intended to serve as a basis for developing fast, portable parallel algorithms and to offer guidelines to machine designers. Such a model must strike a balance between detail and simplicity in order to reveal important bottlenecks without making analysis of interesting problems intractable. The model is based on four parameters that specify abstractly the computing bandwidth, the communication bandwidth, the communication delay, and the efficiency of coupling communication and computation. Portable parallel algorithms typically adapt to the machine configuration in terms of these parameters. The utility of the model is demonstrated through examples that are implemented on the CM-5.
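The four LogP parameters (latency L, per-message processor overhead o, gap g, and processor count P) can be exercised in a small cost calculation. The sketch below is a hypothetical greedy single-item broadcast under LogP, not taken from the paper: an informed processor may send again after max(g, o) time, and a message reaches its receiver o + L + o after the send begins.

```python
import heapq

def logp_broadcast_time(P, L, o, g):
    """Greedy single-item broadcast under the LogP cost model.

    Informed processors repeatedly forward the item; each send costs the
    sender o overhead (and it must wait max(g, o) before sending again),
    the message is in flight for L, and the receiver pays o to absorb it.
    Returns the time at which all P processors are informed.
    """
    free = [0]              # times at which informed processors can next send
    finish = 0
    for _ in range(P - 1):  # inform the remaining P - 1 processors
        t = heapq.heappop(free)
        arrival = t + o + L + o              # send overhead + latency + receive overhead
        finish = max(finish, arrival)
        heapq.heappush(free, t + max(g, o))  # sender may transmit again
        heapq.heappush(free, arrival)        # new receiver can forward
    return finish
```

With L = o = g = 1 this schedule informs four processors by time 5; varying the parameters shows whether a given machine is latency- or bandwidth-bound for this pattern.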
Wait-free Parallel Algorithms for the Union-Find Problem
 In Proc. 23rd ACM Symposium on Theory of Computing
, 1994
Cited by 52 (0 self)
Abstract
We are interested in designing efficient data structures for a shared-memory multiprocessor. In this paper we focus on the Union-Find data structure. We consider a fully asynchronous model of computation where arbitrary delays are possible. Thus we require our solutions to the data structure problem to have the wait-free property, meaning that each thread continues to make progress on its operations, independent of the speeds of the other threads. In this model efficiency is best measured in terms of the total number of instructions used to perform a sequence of data structure operations: the work performed by the processors. We give a wait-free implementation of an efficient algorithm for Union-Find. In addition we show that the worst-case performance of the algorithm can be improved by simulating a synchronized algorithm, or by simulating a larger machine if the data structure requests support sufficient parallelism. Our solutions apply to a much more general adversary model than has be...
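For orientation, here is the standard sequential Union-Find with union by rank and path halving. This is background only: the paper's contribution is a wait-free concurrent version, which would replace these plain pointer writes with atomic operations such as compare-and-swap.

```python
class UnionFind:
    """Sequential Union-Find with union by rank and path halving."""

    def __init__(self, n):
        self.parent = list(range(n))  # each element starts as its own root
        self.rank = [0] * n

    def find(self, x):
        # Path halving: make every other node on the path point to its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the shallower tree under the deeper one.
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
```

In a concurrent setting even this simple structure is delicate: two threads halving the same path or linking two roots simultaneously can interfere, which is what the wait-free construction must handle.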
Performing work efficiently in the presence of faults
 in the Proceedings of the 11th ACM Symposium on Principles of Distributed Computing (PODC)
, 1998
Cited by 49 (0 self)
Abstract
Abstract. We consider a system of t synchronous processes that communicate only by sending messages to one another, and that together must perform n independent units of work. Processes may fail by crashing; we want to guarantee that in every execution of the protocol in which at least one process survives, all n units of work will be performed. We consider three parameters: the number of messages sent, the total number of units of work performed (including multiplicities), and time. We present three protocols for solving the problem. All three are work-optimal, doing O(n+t) work. The first has moderate costs in the remaining two parameters, sending O(t√t) messages and taking O(n+t) time. This protocol can be easily modified to run in any completely asynchronous system equipped with a failure detection mechanism. The second sends only O(t log t) messages, but its running time is large (O(t^2 (n+t) 2^(n+t))). The third is essentially time-optimal in the (usual) case in which there are no failures, and its time complexity degrades gracefully as the number of failures increases.
Parallel Algorithms with Processor Failures and Delays
, 1995
Cited by 47 (9 self)
Abstract
We study efficient deterministic parallel algorithms on two models: restartable fail-stop CRCW PRAMs and asynchronous PRAMs. In the first model, synchronous processors are subject to arbitrary stop failures and restarts determined by an online adversary and involving loss of private but not shared memory; the complexity measures are completed work (where processors are charged for completed fixed-size update cycles) and overhead ratio (completed work amortized over necessary work and failures). In the second model, the result of the computation is a serialization of the actions of the processors determined by an online adversary; the complexity measure is total work (the number of steps taken by all processors). Despite their differences, the two models share key algorithmic techniques. We present new algorithms for the Write-All problem (in which P processors write ones into an array of size N) for the two models. These algorithms can be used to implement a simulation strategy for any N ...
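A toy simulation (not from the paper) illustrates the Write-All problem and the total-work measure: each of P asynchronous processors sweeps the N-cell array in its own random order, a random scheduler interleaves their single steps, and every step is charged as one unit of work.

```python
import random

def write_all_work(N, P, seed=0):
    """Toy Write-All simulation: P asynchronous processors must set
    every cell of an N-cell array to 1.  Each processor sweeps the
    array in its own random order; a random scheduler (standing in
    for the adversary) interleaves single steps.  Returns total work,
    i.e. the number of steps taken by all processors."""
    rng = random.Random(seed)
    array = [0] * N
    orders = [rng.sample(range(N), N) for _ in range(P)]  # per-processor sweep order
    pos = [0] * P          # next position in each processor's sweep
    work = 0
    remaining = N
    while remaining:
        p = rng.randrange(P)            # scheduler picks who steps next
        cell = orders[p][pos[p]]
        pos[p] += 1
        work += 1                       # every step counts, even redundant ones
        if array[cell] == 0:
            array[cell] = 1
            remaining -= 1
    return work
```

A single processor does exactly N steps; with several processors, redundant visits to already-written cells push total work above N, which is exactly the overhead the paper's algorithms are designed to bound.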
Parallel processing on networks of workstations: A fault-tolerant, high-performance approach
 In Proc. 15th IEEE Intl. Conf. on Distributed Computing Systems
, 1995
Cited by 45 (15 self)
Abstract
One of the most sought-after software innovations of this decade is the construction of systems using off-the-shelf workstations that actually deliver, and even surpass, the power and reliability of supercomputers. Using completely novel techniques (eager scheduling, evasive memory layouts, and dispersed data management), it is possible to build an execution environment for parallel programs on workstation networks. These techniques were originally developed in a theoretical framework for an abstract machine which models a shared-memory asynchronous multiprocessor. The network-of-workstations platform presents an inherently asynchronous environment for the execution of our parallel programs. This gives rise to substantial problems of correctness of the computation and of proper automatic load balancing of the work amongst the processors, so that a slow processor will not hold up the total computation. A limiting case of asynchrony is when a processor becomes infinitely slow, i.e., fails. Our methodology copes with all these problems, as well as with memory failures. An interesting feature of this system is that it is neither a fault-tolerant system extended for parallel processing nor a parallel processing system extended for fault tolerance: the same novel mechanisms ensure both properties.
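Eager scheduling can be sketched as follows (a hypothetical minimal version, not the system's actual implementation): any idle worker is handed the unfinished task that has been assigned the fewest times, so work stuck on a slow or failed worker is eventually re-executed elsewhere, and the computation completes as long as at least one worker keeps making progress.

```python
class EagerScheduler:
    """Toy eager scheduler: hand an idle worker the unfinished task
    assigned the fewest times so far.  Tasks stalled on slow or failed
    workers are thereby duplicated rather than waited for."""

    def __init__(self, num_tasks):
        self.assigned = [0] * num_tasks   # how many times each task was handed out
        self.done = [False] * num_tasks

    def next_task(self):
        pending = [i for i, d in enumerate(self.done) if not d]
        if not pending:
            return None                   # everything is finished
        task = min(pending, key=lambda i: self.assigned[i])
        self.assigned[task] += 1
        return task

    def mark_done(self, task):
        self.done[task] = True
```

Because a task may execute more than once, this scheme presumes tasks are idempotent or that duplicate results are reconciled, which is where the system's other mechanisms (evasive memory layouts, dispersed data management) come in.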
Are Wait-Free Algorithms Fast?
, 1991
Cited by 39 (11 self)
Abstract
The time complexity of wait-free algorithms in "normal" executions, where no failures occur and processes operate at approximately the same speed, is considered. A lower bound of Ω(log n) on the time complexity of any wait-free algorithm that achieves approximate agreement among n processes is proved. In contrast, there exists a non-wait-free algorithm that solves this problem in constant time. This implies an Ω(log n) time separation between the wait-free and non-wait-free computation models. On the positive side, we present an O(log n) time wait-free approximate agreement algorithm; the complexity of this algorithm is within a small constant of the lower bound.
A Theory of Competitive Analysis for Distributed Algorithms
, 1994
Cited by 32 (5 self)
Abstract
We introduce a theory of competitive analysis for distributed algorithms. The first steps in this direction were made in the seminal papers of Bartal, Fiat, and Rabani [17], and of Awerbuch, Kutten, and Peleg [15], in the context of data management and job scheduling. In these papers, as well as in other subsequent work [14, 4, 18], the cost of a distributed algorithm is compared to the cost of an optimal global-control algorithm. Here we introduce a more refined notion of competitiveness for distributed algorithms, one that reflects the performance of distributed algorithms more accurately. In particular, our theory allows one to compare the cost of a distributed online algorithm to the cost of an optimal distributed algorithm. We demonstrate our method by studying the
Highly Efficient Asynchronous Execution of Large-Grained Parallel Programs
, 1993
Cited by 31 (10 self)
Abstract
An n-thread parallel program P is large-grained if in every parallel step the computations on each of the threads are complex procedures requiring numerous processor instructions. This practically relevant style of programs differs from PRAM programs in its large granularity and the possibility that within a parallel step the computations on different threads may vary considerably in size. Let M be an n-processor asynchronous parallel system, with no restriction on the degree of asynchrony and without any specialized synchronization mechanisms. It is a challenging theoretical as well as practically important problem to ensure correct execution of P on such a parallel machine. Let P be a large-grained program requiring total work W for its execution on a synchronous n-processor parallel system. We present a transformation (compilation) of P into a program C(P) which correctly and efficiently effects the computation of P on the asynchronous machine M. Under moderate assumptions on the granularity of threads and the size of the program variables, execution of C(P) requires just O(W log* n) expected total work, and the memory space overhead is a small multiplicative constant. This result is the first of its kind. The solution involves a number of new concepts and methods. These include methods for storing program and control variables, employing a combination
Combining Tentative and Definite Executions for Very Fast Dependable Parallel Computing (Extended Abstract)
, 1991
Cited by 25 (3 self)
Abstract
We present a general and efficient strategy for computing robustly on unreliable parallel machines. The model of a parallel machine that we use is a CRCW PRAM with dynamic resource fluctuations: processors can fail during the computation, and may possibly be restored later. We first introduce the notions of definite and tentative algorithms for executing a single parallel step of an ideal parallel machine on the unreliable machine. A definite algorithm is one that guarantees a correct execution of a
Online Scheduling of Parallel Programs on Heterogeneous Systems with Applications to Cilk
 Theory of Computing Systems Special Issue on SPAA
, 2002
Cited by 25 (2 self)
Abstract
We study the problem of executing parallel programs, in particular Cilk programs, on a collection of processors of different speeds. We consider a model in which each processor maintains an estimate of its own speed, where communication between processors has a cost, and where all scheduling must be online. This problem has been considered previously in the fields of asynchronous parallel computing and scheduling theory. Our model is a bridge between the assumptions in these fields. We provide a new, more accurate analysis of an old scheduling algorithm called the maximum utilization scheduler. Based on this analysis, we generalize this scheduling policy and define the high utilization scheduler. We next focus on the Cilk platform and introduce a new algorithm for scheduling Cilk multithreaded parallel programs on heterogeneous processors. This scheduler is inspired by the high utilization scheduler and is modified to fit in a Cilk context. A crucial aspect of our algorithm is that it keeps the original spirit of the Cilk scheduler. In fact, when our new algorithm runs on homogeneous processors, it exactly mimics the dynamics of the original Cilk scheduler.
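As a point of comparison only, here is a classical greedy list scheduler for independent tasks on processors of known, fixed speeds. This is an LPT-style sketch, far simpler than the paper's high utilization scheduler (which handles dependent Cilk threads, communication costs, and changing speed estimates): the longest remaining task always goes to the processor that becomes free earliest.

```python
import heapq

def greedy_makespan(task_work, speeds):
    """LPT-style greedy scheduling of independent tasks on
    heterogeneous processors: sort tasks by decreasing work and
    always assign the next task to the earliest-free processor.
    A processor of speed s finishes work w in time w / s.
    Returns the makespan of the resulting schedule."""
    # (next-free time, processor index) min-heap
    free = [(0.0, i) for i in range(len(speeds))]
    heapq.heapify(free)
    makespan = 0.0
    for w in sorted(task_work, reverse=True):
        t, i = heapq.heappop(free)
        t += w / speeds[i]              # time for this processor to finish the task
        makespan = max(makespan, t)
        heapq.heappush(free, (t, i))
    return makespan
```

Note one simplification this sketch makes: it assigns to the earliest-free processor rather than the one that would finish soonest, so a fast processor that frees slightly later can be passed over; speed-aware schedulers like the one in the paper avoid exactly this kind of lost utilization.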