Results 1 - 10
of
50
LogP: Towards a Realistic Model of Parallel Computation
, 1993
"... A vast body of theoretical research has focused either on overly simplistic models of parallel computation, notably the PRAM, or overly specific models that have few representatives in the real world. Both kinds of models encourage exploitation of formal loopholes, rather than rewarding developme ..."
Abstract
-
Cited by 471 (14 self)
- Add to MetaCart
A vast body of theoretical research has focused either on overly simplistic models of parallel computation, notably the PRAM, or overly specific models that have few representatives in the real world. Both kinds of models encourage exploitation of formal loopholes, rather than rewarding development of techniques that yield performance across a range of current and future parallel machines. This paper offers a new parallel machine model, called LogP, that reflects the critical technology trends underlying parallel computers. It is intended to serve as a basis for developing fast, portable parallel algorithms and to offer guidelines to machine designers. Such a model must strike a balance between detail and simplicity in order to reveal important bottlenecks without making analysis of interesting problems intractable. The model is based on four parameters that specify abstractly the computing bandwidth, the communication bandwidth, the communication delay, and the efficiency of coupling communication and computation. Portable parallel algorithms typically adapt to the machine configuration, in terms of these parameters. The utility of the model is demonstrated through examples that are implemented on the CM-5.
Wait-free Parallel Algorithms for the Union-Find Problem
- In Proc. 23rd ACM Symposium on Theory of Computing
, 1994
"... We are interested in designing efficient data structures for a shared memory multiprocessor. In this paper we focus on the Union-Find data structure. We consider a fully asynchronous model of computation where arbitrary delays are possible. Thus we require our solutions to the data structure problem ..."
Abstract
-
Cited by 45 (0 self)
- Add to MetaCart
We are interested in designing efficient data structures for a shared memory multiprocessor. In this paper we focus on the Union-Find data structure. We consider a fully asynchronous model of computation where arbitrary delays are possible. Thus we require our solutions to the data structure problem have the wait-free property, meaning that each thread continues to make progress on its operations, independent of the speeds of the other threads. In this model efficiency is best measured in terms of the total number of instructions used to perform a sequence of data structure operations, the work performed by the processors. We give a wait-free implementation of an efficient algorithm for Union-Find. In addition we show that the worst case performance of the algorithm can be improved by simulating a synchronized algorithm, or by simulating a larger machine if the data structure requests support sufficient parallelism. Our solutions apply to a much more general adversary model than has be...
Performing Work Efficiently In The Presence Of Faults
- SIAM Journal on Computing
, 1992
"... . We consider a system of t synchronous processes that communicate only by sending messages to one another, and that together must perform n independent units of work. Processes may fail by crashing; we want to guarantee that in every execution of the protocol in which at least one process survives, ..."
Abstract
-
Cited by 42 (0 self)
- Add to MetaCart
. We consider a system of t synchronous processes that communicate only by sending messages to one another, and that together must perform n independent units of work. Processes may fail by crashing; we want to guarantee that in every execution of the protocol in which at least one process survives, all n units of work will be performed. We consider three parameters: the number of messages sent, the total number of units of work performed (including multiplicities), and time. We present three protocols for solving the problem. All three are work-optimal, doing O(n+ t) work. The first has moderate costs in the remaining two parameters, sending O(t p t) messages, and taking O(n + t) time. This protocol can be easily modified to run in any completely asynchronous system equipped with a failure detection mechanism. The second sends only O(t log t) messages, but its running time is large (O(t 2 (n+ t)2 n+t )). The third is essentially time-optimal in the (usual) case in which there a...
Parallel processing on networks of workstations: A fault-tolerant, high performance approach
- In Proc. 15th IEEE Intl. Conf. on Distributed Computing Systems
, 1995
"... One of the most sought after software innovation of this decade is the construction of systems using off-the-shelf workstations that actually deliver, and even surpass, the power and reliability of supercomputers. Using completely novel techniques: eager scheduling, evasive memory layouts and disper ..."
Abstract
-
Cited by 42 (15 self)
- Add to MetaCart
One of the most sought after software innovation of this decade is the construction of systems using off-the-shelf workstations that actually deliver, and even surpass, the power and reliability of supercomputers. Using completely novel techniques: eager scheduling, evasive memory layouts and dispersed data management, it is possible to build a execution environment for parallel programs on workstation networks. These techniques were originally developed in a theoretical framework for an abstract machine which models a shared memory asynchronous multiprocessor. The network of workstations platform presents an inherently asynchronous environment for the execution of our parallel program. This gives rise to substantial problems of correctness of the computation and of proper automatic load balancing of the work amongst the processors, so that a slow processor will not hold up the total computation. A limiting case of asynchrony is when a processor becomes infinitely slow, i.e fails. Our methodology copes with all these problems, as well as with memory failures. An interesting feature of this system is that it is neither a fault-tolerant system extended for parallel processing nor is it parallel processing system extended for fault tolerance. The same novel mechanisms ensure both properties. 1.
Are Wait-Free Algorithms Fast?
, 1991
"... The time complexity of wait-free algorithms in "normal" executions, where no failures occur and processes operate at approximately the same speed, is considered. A lower bound of log n on the time complexity of any wait-free algorithm that achieves approximate agreement among n processes is proved. ..."
Abstract
-
Cited by 42 (12 self)
- Add to MetaCart
The time complexity of wait-free algorithms in "normal" executions, where no failures occur and processes operate at approximately the same speed, is considered. A lower bound of log n on the time complexity of any wait-free algorithm that achieves approximate agreement among n processes is proved. In contrast, there exists a non-wait-free algorithm that solves this problem in constant time. This implies an (log n) time separation between the wait-free and non-wait-free computation models. On the positive side, we present an O(log n) time wait-free approximate agreement algorithm; the complexity of this algorithm is within a small constant of the lower bound.
Parallel Algorithms with Processor Failures and Delays
, 1995
"... We study efficient deterministic parallel algorithms on two models: restartable fail-stop CRCW PRAMs and asynchronous PRAMs. In the first model, synchronous processors are subject to arbitrary stop failures and restarts determined by an on-line adversary and involving loss of private but not shared ..."
Abstract
-
Cited by 40 (8 self)
- Add to MetaCart
We study efficient deterministic parallel algorithms on two models: restartable fail-stop CRCW PRAMs and asynchronous PRAMs. In the first model, synchronous processors are subject to arbitrary stop failures and restarts determined by an on-line adversary and involving loss of private but not shared memory; the complexity measures are completed work (where processors are charged for completed fixed-size update cycles) and overhead ratio (completed work amortized over necessary work and failures). In the second model, the result of the computation is a serializaton of the actions of the processors determined by an on-line adversary; the complexity measure is total work (number of steps taken by all processors). Despite their differences the two models share key algorithmic techniques. We present new algorithms for the Write-All problem (in which P processors write ones into an array of size N ) for the two models. These algorithms can be used to implement a simulation strategy for any N ...
Highly Efficient Asynchronous Execution of Large-Grained Parallel Programs
, 1993
"... An n-thread parallel program P is large-grained if in every parallel step the computations on each of the threads are complex procedures requiring numerous processor instructions. This practically relevant style of programs differs from PRAM programs in its large granularity and the possibility that ..."
Abstract
-
Cited by 31 (10 self)
- Add to MetaCart
An n-thread parallel program P is large-grained if in every parallel step the computations on each of the threads are complex procedures requiring numerous processor instructions. This practically relevant style of programs differs from PRAM programs in its large granularity and the possibility that within a parallel step the computations on different threads may considerably vary in size. Let M be an n-processor asynchronous parallel system, with no restriction on the. degree of asynchrony and without any specialized synchronization mechanisms. It is a challenging theoretical as well as practically important problem to ensure correct execution of P on such a parallel machine. Let P be a large-grained program requiring total work W for its execution on a synchronous n-processor parallel system. We present a transformation (compilation) of P into a program C(P) which correctly and efficiently effects the computation of P on the asynchronous machine M. Under moderate assumptions on the granularity of threads and the size of the program variables, execution of C(P) requires just O(Wlog * n) expected total work, and the memory space overhead is a small multiplicative constant. This result is the first of its kind. The solution involves a number of new concepts and methods. These include methods for storing program and control variables, employing a combination
Combining Tentative and Definite Executions for Very Fast Dependable Parallel Computing (Extended Abstract)
, 1991
"... We present a general and efficient strategy for computing robustly on unreliable parallel machines. The model of a parallel machine that we use is a CRCW PRAM with dynamic resource fluctuations: processors can fail during the computation, and may possibly be restored later. We first introduce the no ..."
Abstract
-
Cited by 21 (3 self)
- Add to MetaCart
We present a general and efficient strategy for computing robustly on unreliable parallel machines. The model of a parallel machine that we use is a CRCW PRAM with dynamic resource fluctuations: processors can fail during the computation, and may possibly be restored later. We first introduce the notions of definite and tentative algorithms for executing a single parallel step of an ideal parallel machine on the unreliable machine. A definite algorithm is one that guarantees a correct execution of a
PRAM Computations Resilient to Memory Faults
- 2nd European Symposium on Algorithms ESA’94
"... : PRAMs with faults in their shared memory are investigated. Efficient general simulations on such machines of algorithms designed for fully reliable PRAMs are developed. The PRAM we work with is the Concurrent-Read Concurrent-Write (CRCW) variant. Two possible settings for error occurrence are cons ..."
Abstract
-
Cited by 15 (6 self)
- Add to MetaCart
: PRAMs with faults in their shared memory are investigated. Efficient general simulations on such machines of algorithms designed for fully reliable PRAMs are developed. The PRAM we work with is the Concurrent-Read Concurrent-Write (CRCW) variant. Two possible settings for error occurrence are considered: the errors may be either static (once a memory cell is checked to be operational it remains so during the computation) or dynamic (a potentially faulty cell may crash at any time, the total number of such cells being bounded). A simulation consists of two phases: memory formatting and the proper part done in a step-by-step way. For each error setting (static or dynamic), two simulations are presented: one with a O(1)-time per-step cost, the other with a O(log n)-time per-step cost. The other parameters of these simulations (number of processors, memory size, formatting time) are shown in table 1 in section 6. The simulations are randomized and Monte Carlo: they always operate within ...
Work-competitive scheduling for cooperative computing with dynamic groups
- SIAM JOURNAL ON COMPUTING
, 2005
"... The problem of cooperatively performing a set of t tasks in a decentralized computing environment subject to failures is one of the fundamental problems in distributed computing. The setting with partitionable networks is especially challenging, as algorithmic solutions must accommodate the possib ..."
Abstract
-
Cited by 14 (4 self)
- Add to MetaCart
The problem of cooperatively performing a set of t tasks in a decentralized computing environment subject to failures is one of the fundamental problems in distributed computing. The setting with partitionable networks is especially challenging, as algorithmic solutions must accommodate the possibility that groups of processors become disconnected (and, perhaps, reconnected) during the computation. The efficiency of task-performing algorithms is often assessed in terms of work: the total number of tasks, counting multiplicities, performed by all of the processors during the computation. In general, the scenario where the processors are partitioned into g disconnected components causes any task-performing algorithm to have work Ω(t · g) even if each group of processors performs no more than the optimal number of Θ(t) tasks. Given that such pessimistic lower bounds apply to any scheduling algorithm, we pursue a competitive analysis. Specifically, this paper studies a simple randomized scheduling algorithm for p asynchronous processors, connected by a dynamically changing communication medium, to complete t known tasks. The performance of this algorithm is compared against that of an omniscient off-line algorithm with full knowledge of the future changes in the communication medium. The paper describes a notion of computation width, which associates a natural number with a history of changes in the communication medium, and shows both upper and lower bounds on work-competitiveness in terms of this quantity. Specifically, it is shown that the simple randomized algorithm obtains the competitive ratio (1 + cw/e), where cw is the computation width and e is the base of the natural logarithm (e =2.7182...); this competitive ratio is then shown to be tight.

