Results 1-10 of 24
Performing work efficiently in the presence of faults
 in the Proceedings of the 11th ACM Symposium on Principles of Distributed Computing (PODC)
, 1998
Cited by 44 (0 self)
Abstract. We consider a system of t synchronous processes that communicate only by sending messages to one another, and that together must perform n independent units of work. Processes may fail by crashing; we want to guarantee that in every execution of the protocol in which at least one process survives, all n units of work will be performed. We consider three parameters: the number of messages sent, the total number of units of work performed (including multiplicities), and time. We present three protocols for solving the problem. All three are work-optimal, doing O(n+t) work. The first has moderate costs in the remaining two parameters, sending O(t√t) messages and taking O(n+t) time. This protocol can be easily modified to run in any completely asynchronous system equipped with a failure detection mechanism. The second sends only O(t log t) messages, but its running time is large (O(t^2 (n+t) 2^(n+t))). The third is essentially time-optimal in the (usual) case in which there are no failures, and its time complexity degrades gracefully as the number of failures increases.
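To make the three cost parameters concrete, here is a minimal Python sketch of a naive sequential takeover baseline. It is not one of the paper's three protocols, which the abstract does not describe; the function name `run_takeover`, the `crash_points` parameter, and the crash model (a crashing process performs its last unit but never reports it) are all assumptions made for illustration. The baseline keeps work within n + t - 1 units but can send on the order of n·t messages, which is the kind of cost the abstract's message bounds improve on.

```python
def run_takeover(n, t, crash_points):
    """Simulate a naive takeover baseline (illustrative, not from the paper).

    Processes 0..t-1 take turns being the single active worker. The active
    process performs one unit, then reports it to every later process.
    crash_points maps a process index to the number of units it performs
    before crashing; its final unit is performed but never reported.
    Processes absent from crash_points never crash.
    Returns (total units of work performed, total messages sent).
    """
    reported = 0   # units that every surviving process knows are done
    work = 0       # units performed, counting repeats
    messages = 0
    for proc in range(t):
        budget = crash_points.get(proc)  # None means the process never crashes
        performed = 0
        while reported < n:
            if budget is not None and performed == budget:
                break                    # crashed before starting the next unit
            work += 1
            performed += 1
            if budget is not None and performed == budget:
                break                    # crashed after working, before reporting
            messages += t - 1 - proc     # one report to each later process
            reported += 1
        if reported == n:
            return work, messages
    raise RuntimeError("every process crashed before the work was finished")


work, messages = run_takeover(10, 4, {0: 3, 1: 0, 2: 5})
print(work, messages)
```

Each crash wastes at most one unit in this model, so the work stays within n + t - 1, while a failure-free run already sends n·(t - 1) reports.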
Hundreds of Impossibility Results for Distributed Computing
 Distributed Computing
, 2003
Cited by 41 (5 self)
We survey results from distributed computing that show tasks to be impossible, either outright or within given resource bounds, in various models. The parameters of the models considered include synchrony, fault tolerance, different communication media, and randomization. The resource bounds refer to time, space, and message complexity. These results are useful in understanding the inherent difficulty of individual problems and in studying the power of different models of distributed computing.
Fault-tolerant data structures
 In Proceedings of 37th IEEE FOCS
, 1996
Cited by 39 (1 self)
We consider the tolerance of data structures to memory faults. We observe that many pointer-based data structures (e.g. linked lists, trees, etc.) are highly non-resilient to faults. A single fault in a linked list or tree may result in the loss of the entire set of data. In this paper we present a formal framework for studying the fault tolerance properties of pointer-based data structures, and we provide fault-tolerant versions of the stack, the linked list, and the dictionary tree.
Highly Efficient Asynchronous Execution of Large-Grained Parallel Programs
, 1993
Cited by 31 (10 self)
An n-thread parallel program P is large-grained if in every parallel step the computations on each of the threads are complex procedures requiring numerous processor instructions. This practically relevant style of programs differs from PRAM programs in its large granularity and in the possibility that within a parallel step the computations on different threads may vary considerably in size. Let M be an n-processor asynchronous parallel system, with no restriction on the degree of asynchrony and without any specialized synchronization mechanisms. It is a challenging theoretical as well as practically important problem to ensure correct execution of P on such a parallel machine. Let P be a large-grained program requiring total work W for its execution on a synchronous n-processor parallel system. We present a transformation (compilation) of P into a program C(P) which correctly and efficiently effects the computation of P on the asynchronous machine M. Under moderate assumptions on the granularity of threads and the size of the program variables, execution of C(P) requires just O(W log* n) expected total work, and the memory space overhead is a small multiplicative constant. This result is the first of its kind. The solution involves a number of new concepts and methods. These include methods for storing program and control variables, employing a combination ...
Online Scheduling of Parallel Programs on Heterogeneous Systems with Applications to Cilk
 Theory of Computing Systems Special Issue on SPAA
, 2002
Cited by 14 (2 self)
We study the problem of executing parallel programs, in particular Cilk programs, on a collection of processors of different speeds. We consider a model in which each processor maintains an estimate of its own speed, where communication between processors has a cost, and where all scheduling must be online. This problem has been considered previously in the fields of asynchronous parallel computing and scheduling theory. Our model is a bridge between the assumptions in these fields. We provide a new, more accurate analysis of an old scheduling algorithm called the maximum utilization scheduler. Based on this analysis, we generalize this scheduling policy and define the high utilization scheduler. We next focus on the Cilk platform and introduce a new algorithm for scheduling Cilk multithreaded parallel programs on heterogeneous processors. This scheduler is inspired by the high utilization scheduler and is modified to fit in a Cilk context. A crucial aspect of our algorithm is that it keeps the original spirit of the Cilk scheduler. In fact, when our new algorithm runs on homogeneous processors, it exactly mimics the dynamics of the original Cilk scheduler.
Efficient Execution of Nondeterministic Parallel Programs on Asynchronous Systems
, 1996
Cited by 11 (6 self)
We consider the problem of asynchronous execution of parallel programs. We assume that the original program is designed for a synchronous system, whereas the actual system may be asynchronous. We seek an automatic execution scheme, which allows the asynchronous system to execute the synchronous program. Previous execution schemes provide solutions only for the case where the original program is deterministic. Here, we provide the first solution for the more general case where the original program can be nondeterministic (e.g. randomized). Our scheme is based on a novel agreement protocol for the asynchronous parallel setting. Our protocol allows n asynchronous processors to agree on n word-sized values in O(n log n log log n) total work. Total work is defined to be the sum of the number of steps performed by all processors (including steps spent busy waiting). 1 Introduction. Motivation. Parallel programs are frequently designed assuming tightly coupled processors, operating in ...
Scheduling DAGs on asynchronous processors
 19th ACM Symp. on Parallel Algorithms and Architectures
, 2007
Cited by 9 (1 self)
This paper addresses the problem of scheduling a DAG of unit-length tasks on asynchronous processors, that is, processors having different and changing speeds. The objective is to minimize the makespan, that is, the time to execute the entire DAG. Asynchrony is modeled by an oblivious adversary, which is assumed to determine the processor speeds at each point in time. The oblivious adversary may change processor speeds arbitrarily and arbitrarily often, but makes speed decisions independently of any random choices of the scheduling algorithm. This paper gives bounds on the makespan of two randomized online firing-squad scheduling algorithms, All and Level. These two schedulers are shown to have good makespan even when asynchrony is arbitrarily extreme. Let W and D denote, respectively, the number of tasks and the longest path in the DAG, and let π_ave denote the average speed of the p processors during the execution. In All, each processor repeatedly chooses a random task to execute from among all ready tasks (tasks whose predecessors have been executed). Scheduler All is shown to have a makespan T_p = W ...
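The selection rule of All described above can be sketched as a small round-based simulation in Python. This is only an illustration under simplifying assumptions: time advances in synchronous rounds, and the changing speeds chosen by the adversary are replaced by a single hypothetical `active_prob` parameter giving the chance that a processor completes a task in a round. The function name `schedule_all` and the example DAG are likewise made up; the paper's model and makespan bound are not reproduced here.

```python
import random


def schedule_all(preds, p, seed=0, active_prob=1.0):
    """Round-based sketch of the 'All' rule (simplified, illustrative model).

    preds maps each task to the list of its predecessors (a DAG).
    In every round each of the p processors, if active, picks a uniformly
    random ready task; several processors may pick the same task.
    Returns (number of rounds, tasks in completion order).
    active_prob must be greater than 0, or the loop never finishes.
    """
    rng = random.Random(seed)
    done = set()
    order = []
    rounds = 0
    while len(done) < len(preds):
        # A task is ready when it is not done and all predecessors are done.
        ready = sorted(
            task for task in preds
            if task not in done and all(q in done for q in preds[task])
        )
        finished = set()
        for _ in range(p):
            if rng.random() < active_prob:
                finished.add(rng.choice(ready))
        order.extend(sorted(finished))
        done |= finished
        rounds += 1
    return rounds, order


# Diamond-shaped DAG: a precedes b and c, which both precede d.
dag = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
rounds, order = schedule_all(dag, p=2, seed=1)
print(rounds, order)
```

Because processors choose independently, two of them may execute the same ready task in one round, so the number of rounds can exceed the longest path even when every processor is active.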
Fast Deterministic Simulation of Computations on Faulty Parallel Machines
 in Proc. of the 3rd Ann. European Symp. on Algorithms, 1995, Springer Verlag LNCS 979
, 1995
Cited by 9 (4 self)
A method of deterministic simulation of fully operational parallel machines on the analogous machines prone to errors is developed. The simulation is presented for the exclusive-read exclusive-write (EREW) PRAM and the Optical Communication Parallel Computer (OCPC), but it applies to a large class of parallel computers. It is shown that simulations of operational multiprocessor machines on faulty ones can be performed with logarithmic slowdown in the worst case. More precisely, we prove that both a PRAM with a bounded fraction of faulty processors and memory cells and an OCPC with a bounded fraction of faulty processors can deterministically simulate their fault-free counterparts with O(log n) slowdown and preprocessing done in time O(log^2 n). The fault model is as follows. The faults are deterministic (worst-case distribution) and static (do not change in the course of a computation). If a processor attempts to communicate with some other processor (in the case of an OCPC) or re...
Shared-Memory Simulations on a Faulty-Memory DMM
, 1996
Cited by 9 (1 self)
... this paper are synchronous, and the time performance is our major efficiency criterion. We consider a DMM with faulty memory words; otherwise, everything is assumed to be operational. In particular, the communication between the processors and the MUs is reliable, and a processor may always attempt to obtain access to any MU and, having been granted it, may access any memory word in it, even if all of them are faulty. The only restriction on the distribution of faults among memory words is that their total number is bounded from above by a fraction of the total number of memory words in all the MUs. In particular, some MUs may contain only operational cells, some only faulty cells, and some a mixture of both. This report presents fast simulations of the PRAM on a DMM with faulty memory.
Optimal Scheduling for Disconnected Cooperation
, 2001
Cited by 8 (3 self)
We consider a distributed environment consisting of n processors that need to perform t tasks. We assume that communication is initially unavailable and that processors begin work in isolation. At some unknown point in time an unknown collection of processors may establish communication. Before processors begin communication they execute tasks in the order given by their schedules. Our goal is to schedule the work of isolated processors so that when communication is established for the first time, the number of redundantly executed tasks is controlled. We quantify worst-case redundancy as a function of processor advancements through their schedules. In this work we refine and simplify an extant deterministic construction for schedules with n ≤ t, and we develop a new analysis of its waste. The new analysis shows that for any pair of schedules, the number of redundant tasks can be controlled for the entire range of t tasks. Our new result is asymptotically optimal: the tails of these schedules are within a 1 + O(n^(-1/4)) factor of the lower bound. We also present two new deterministic constructions, one for t ≥ n and the other for t ≥ n^(3/2), which substantially improve pairwise waste for all prefixes of length t/√n and offer near-optimal waste for the tails of the schedules. Finally, we present bounds for the waste of any collection of k ≥ 2 processors for both deterministic and randomized constructions.
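The notion of pairwise waste can be illustrated with a short Python sketch. It assumes, for illustration only, that a schedule is a permutation of the t task identifiers and that waste is the number of tasks both processors have already executed when they first communicate. The function name `pairwise_waste` and the forward/backward pair of schedules are hypothetical examples for two processors, not the constructions analyzed in the paper.

```python
def pairwise_waste(schedule_a, schedule_b, a, b):
    """Count tasks executed by both processors (illustrative definition).

    schedule_a and schedule_b are orderings of the same task identifiers.
    a and b are how many tasks each processor has finished in isolation.
    """
    return len(set(schedule_a[:a]) & set(schedule_b[:b]))


t = 8
forward = list(range(t))   # processor 1 works through tasks 0, 1, 2, ...
backward = forward[::-1]   # processor 2 works through tasks 7, 6, 5, ...

for a, b in [(3, 3), (4, 4), (5, 5), (8, 8)]:
    print(a, b, pairwise_waste(forward, backward, a, b))
```

For this pair the waste is max(0, a + b - t): no task is repeated until the two prefixes together cover more than t tasks. With more than two processors no such simple pairing exists, which is where schedule design becomes non-trivial.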