Results 11-20 of 76
Online Scheduling of Parallel Programs on Heterogeneous Systems with Applications to Cilk
 Theory of Computing Systems Special Issue on SPAA
, 2002
Cited by 26 (2 self)
We study the problem of executing parallel programs, in particular Cilk programs, on a collection of processors of different speeds. We consider a model in which each processor maintains an estimate of its own speed, where communication between processors has a cost, and where all scheduling must be online. This problem has been considered previously in the fields of asynchronous parallel computing and scheduling theory. Our model is a bridge between the assumptions in these fields. We provide a new more accurate analysis of an old scheduling algorithm called the maximum utilization scheduler. Based on this analysis, we generalize this scheduling policy and define the high utilization scheduler. We next focus on the Cilk platform and introduce a new algorithm for scheduling Cilk multithreaded parallel programs on heterogeneous processors. This scheduler is inspired by the high utilization scheduler and is modified to fit in a Cilk context. A crucial aspect of our algorithm is that it keeps the original spirit of the Cilk scheduler. In fact, when our new algorithm runs on homogeneous processors, it exactly mimics the dynamics of the original Cilk scheduler.
Work-competitive scheduling for cooperative computing with dynamic groups
 SIAM JOURNAL ON COMPUTING
, 2005
Cited by 18 (5 self)
The problem of cooperatively performing a set of t tasks in a decentralized computing environment subject to failures is one of the fundamental problems in distributed computing. The setting with partitionable networks is especially challenging, as algorithmic solutions must accommodate the possibility that groups of processors become disconnected (and, perhaps, reconnected) during the computation. The efficiency of task-performing algorithms is often assessed in terms of work: the total number of tasks, counting multiplicities, performed by all of the processors during the computation. In general, the scenario where the processors are partitioned into g disconnected components causes any task-performing algorithm to have work Ω(t · g) even if each group of processors performs no more than the optimal number of Θ(t) tasks. Given that such pessimistic lower bounds apply to any scheduling algorithm, we pursue a competitive analysis. Specifically, this paper studies a simple randomized scheduling algorithm for p asynchronous processors, connected by a dynamically changing communication medium, to complete t known tasks. The performance of this algorithm is compared against that of an omniscient offline algorithm with full knowledge of the future changes in the communication medium. The paper describes a notion of computation width, which associates a natural number with a history of changes in the communication medium, and shows both upper and lower bounds on work-competitiveness in terms of this quantity. Specifically, it is shown that the simple randomized algorithm obtains the competitive ratio (1 + cw/e), where cw is the computation width and e is the base of the natural logarithm (e = 2.7182...); this competitive ratio is then shown to be tight.
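As a quick numeric illustration of the competitive ratio quoted in this abstract, the sketch below evaluates 1 + cw/e for a few computation widths. The function name `competitive_ratio` is hypothetical and the sample values of cw are chosen only for illustration; neither comes from the paper.

```python
import math


def competitive_ratio(cw: int) -> float:
    """Return 1 + cw/e, the competitive ratio stated in the abstract.

    cw is the computation width, assumed here to be a non-negative integer.
    """
    if cw < 0:
        raise ValueError("computation width must be a natural number")
    return 1.0 + cw / math.e


if __name__ == "__main__":
    # Illustrative widths only; they are not taken from the paper.
    for cw in (1, 2, 5, 10):
        print(f"cw = {cw:2d}  ratio = {competitive_ratio(cw):.4f}")
```

The ratio grows linearly in cw with slope 1/e, so under this bound each extra unit of computation width adds roughly 0.37 to the factor by which the online algorithm's work may exceed the offline optimum.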
PRAM Computations Resilient to Memory Faults
 2nd European Symposium on Algorithms ESA’94
Cited by 17 (6 self)
PRAMs with faults in their shared memory are investigated. Efficient general simulations on such machines of algorithms designed for fully reliable PRAMs are developed. The PRAM we work with is the Concurrent-Read Concurrent-Write (CRCW) variant. Two possible settings for error occurrence are considered: the errors may be either static (once a memory cell is checked to be operational it remains so during the computation) or dynamic (a potentially faulty cell may crash at any time, the total number of such cells being bounded). A simulation consists of two phases: memory formatting and the proper part done in a step-by-step way. For each error setting (static or dynamic), two simulations are presented: one with an O(1)-time per-step cost, the other with an O(log n)-time per-step cost. The other parameters of these simulations (number of processors, memory size, formatting time) are shown in table 1 in section 6. The simulations are randomized and Monte Carlo: they always operate within ...
Performing Tasks on Synchronous Restartable Message-Passing Processors
 Distributed Computing
, 2000
Cited by 15 (4 self)
We consider the problem of performing t tasks in a distributed system of p fault-prone processors. This problem, called do-all herein, was introduced by Dwork, Halpern and Waarts. Our work deals with a synchronous message-passing distributed system with processor stop-failures and restarts. We present two new algorithms based on a new aggressive coordination paradigm by which multiple coordinators may be active as the result of failures. The first algorithm is tolerant of f < p stop-failures and it does not allow restarts. It has available processor steps (work) complexity S = O((t + p log p / log log p) log f) and message complexity M = O(t + p log p / log log p + fp). Unlike prior solutions, our algorithm uses redundant broadcasts when encountering failures and, for p = t and large f, it achieves better work complexity. This algorithm is used as the basis for another algorithm that tolerates stop-failures and restarts. This new algorithm is the first solution for the do-all problem that efficiently deals with processor restarts. Its available processor steps complexity is S = O((t + p log p + f) min{log p, log f}), and its message complexity is M = O(t + p log p + fp), where f is the total number of failures.
Performing tasks on restartable message-passing processors
 in Proc. of the 11th Intl Workshop on Distr. Alg. (WDAG’97)
, 1997
Cited by 15 (9 self)
This work presents new algorithms for the "Do-All" problem that consists of performing t tasks reliably in a message-passing synchronous system of p fault-prone processors. The algorithms are based on an aggressive coordination paradigm in which multiple coordinators may be active as the result of failures. The first algorithm is tolerant of f < p stop-failures and it does not allow restarts. It has the available processor steps complexity S = O((t + p log p / log log p) log f) and the message complexity M = O(t + p log p / log log p + fp). Unlike prior solutions, our algorithm uses redundant broadcasts when encountering failures and, for large f, it has better S complexity. This algorithm is used as the basis for another algorithm which tolerates any pattern of stop-failures and restarts. This new algorithm is the first solution for the Do-All problem that efficiently deals with processor restarts. Its available processor steps complexity is S = O((t + p log p + f) min{log p, log f}), and its message complexity is M = O(t + p log p + fp), where f is the number of failures.
Three non Conventional Paradigms of Parallel Computation
, 1992
Cited by 14 (0 self)
We consider three paradigms of computation where the benefits of a parallel solution are greater than usual. Paradigm 1 works on a time-varying input data set, whose size increases with time. In paradigm 2 the data set is fixed, but the processors may fail at any time with a given constant probability. In paradigm 3, the execution of a single operation may require more than one processor, for security or reliability reasons. We discuss the organization of PRAM algorithms for these paradigms, and prove new bounds on parallel speedup.
1 Introduction
The theory of parallel algorithms has a well known body, developed on the PRAM model [5]. Some folklore principles are at the base of this theory, in particular the ones that express upper and lower bounds on the processing time. Let Π be a problem of size N, and let T_s(N) be the time required by the best known sequential algorithm A_s to solve Π. Any parallel algorithm A_p that solves Π with a number P of PRAM processors...
The Complexity of Synchronous Iterative Do-All with Crashes
, 2001
Cited by 14 (5 self)
Do-All is the problem of performing N tasks in a distributed system of P failure-prone processors [9]. Many distributed and parallel algorithms have been developed for this basic problem and several algorithm simulations have been developed by iterating Do-All algorithms. The efficiency of the solutions for Do-All is measured in terms of work complexity where all processing steps taken by the processors are counted. Work is ideally expressed as a function of N, P, and f, the number of processor crashes. However, the known lower bounds and the upper bounds for extant algorithms do not adequately show how work depends on f. We present the first non-trivial lower bounds for Do-All that capture the dependence of work on N, P and f. For the model of computation where processors are able to make perfect load-balancing decisions locally, we also present matching upper bounds. Thus we give the first complete analysis of Do-All for this model. We define the r-iterative Do-All problem that abstracts the repeated use of Do-All such as found in algorithm simulations. Our f-sensitive analysis enables us to derive a tight bound for r-iterative Do-All work (that is stronger than the r-fold work complexity of a single Do-All). Our approach that models perfect load-balancing allows for the analysis of specific algorithms to be divided into two parts: (i) the analysis of the cost of tolerating failures while performing work, and (ii) the analysis of the cost of implementing load-balancing. We demonstrate the utility and generality of this approach by improving the analysis of two known efficient algorithms. We give an improved analysis of an efficient message-passing algorithm (algorithm AN [5]). We also derive a new and complete analysis of the best known Do-All algorithm for...
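To make the work measure in this abstract concrete, here is a toy simulation of synchronous Do-All with idealized load balancing. It is only a sketch under simplifying assumptions that are not taken from the paper: crashes happen at the start of a round, at least one processor always survives, live processors always pick distinct undone tasks, and work is charged as one step per live processor per round. The function name `do_all_work` and the crash schedule format are hypothetical.

```python
from typing import List, Tuple


def do_all_work(n_tasks: int, n_procs: int,
                crashes_per_round: List[int]) -> Tuple[int, int]:
    """Simulate synchronous Do-All under idealized load balancing.

    crashes_per_round[i] is the number of processors the (hypothetical)
    adversary crashes at the start of round i; later rounds have no crashes.
    Returns (work, rounds), where work counts one step for every live
    processor in every round, including steps with nothing left to do.
    """
    remaining = n_tasks
    live = n_procs
    work = 0
    rounds = 0
    while remaining > 0:
        crash = crashes_per_round[rounds] if rounds < len(crashes_per_round) else 0
        live -= min(crash, live - 1)       # assumption: one processor survives
        work += live                       # every live processor takes a step
        remaining -= min(live, remaining)  # distinct tasks, no duplication
        rounds += 1
    return work, rounds


if __name__ == "__main__":
    print(do_all_work(10, 4, []))      # failure-free run
    print(do_all_work(10, 4, [1, 1]))  # one crash in each of the first two rounds
```

In this simplified model crashes mostly lengthen the computation; capturing how work itself depends on f requires the adversary to crash processors after tasks have been assigned, which this sketch deliberately leaves out.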
A Framework for Automatic Adaptation of Tunable Distributed Applications
 Cluster Computing 4(1
, 2001
Scheduling Cilk Multithreaded Parallel Programs on Processors of Different Speeds
 In Proceedings of the 12th Annual Symposium on Parallel Algorithms and Architectures
, 2000
Cited by 12 (0 self)
We study the problem of executing parallel programs, in particular Cilk programs, on a collection of processors of different speeds. We consider a model in which each processor maintains an estimate of its own speed, where communication between processors has a cost, and where all scheduling must be online. This problem has been considered previously in the fields of asynchronous parallel computing and scheduling theory. Our model is a bridge between the assumptions in these fields. We provide a new more accurate analysis of an old scheduling algorithm called the maximum utilization scheduler. Based on this analysis, we generalize this scheduling policy and define the high utilization scheduler. We next focus on the Cilk platform and introduce a new algorithm for scheduling Cilk multithreaded parallel programs on heterogeneous processors. This scheduler is inspired by the high utilization scheduler and is modified to fit in a Cilk context. A crucial aspect of our algorithm is that it kee...
Towards Practical Deterministic Write-All Algorithms
 IN PROC., 13TH ACM SYMP. ON PARALLEL ALGORITHMS AND ARCHITECTURES
, 2001
Cited by 12 (7 self)
The problem of performing t tasks on n asynchronous or undependable processors is a basic problem in parallel and distributed computing. We consider an abstraction of this problem called the Write-All problem: using n processors, write 1's into all locations of an array of size t. The most efficient known deterministic asynchronous algorithms for this problem are due to Anderson and Woll. The first class of algorithms has work complexity of O(t · n^ε), for n ≤ t and any ε > 0, and they are the best known for the full range of processors (n = t). To schedule the work of the processors, the algorithms use lists of q permutations on [q] (q ≤ n) that have certain combinatorial properties. Instantiating such an algorithm for a specific ε either requires substantial preprocessing (exponential in 1/ε) to find the requisite permutations, or imposes a prohibitive constant (exponential in 1/ε) hidden by the asymptotic analysis. The second class deals with the specific case of t = n^2, and these algorithms have work complexity of O(t log t). They also use lists of permutations with the same combinatorial properties. However, instantiating these algorithms requires preprocessing exponential in n to find the permutations. To alleviate this costly instantiation, Kanellakis and Shvartsman proposed a simple way of computing the permutations. They conjectured that their construction has the desired properties, but they provided no analysis. In this paper...
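The Write-All problem described in this abstract can be illustrated with a small simulation. The sketch below is a toy model, not the Anderson and Woll algorithms: each processor scans the array in a fixed order, and the orders used here are plain rotations picked for illustration, not permutations with the combinatorial properties the abstract refers to. The function name `write_all` and the `schedule` parameter, which stands in for an asynchronous adversary, are hypothetical.

```python
from itertools import cycle
from typing import Iterable, List, Optional, Tuple


def write_all(t: int, n: int,
              schedule: Optional[Iterable[int]] = None) -> Tuple[List[int], int]:
    """Toy asynchronous Write-All: n processors write 1 into all t cells.

    schedule lists which processor takes the next step; it defaults to
    round-robin. A custom schedule must be long enough to finish the array.
    Returns (cells, work), where work counts every cell visit.
    """
    cells = [0] * t
    # Assumption: processor p visits cells in a rotated order starting at p*t//n.
    orders = [[(p * t // n + i) % t for i in range(t)] for p in range(n)]
    position = [0] * n  # next index into each processor's order
    written = 0
    work = 0
    steps = iter(schedule) if schedule is not None else cycle(range(n))
    while written < t:
        p = next(steps)
        if position[p] >= t:
            continue            # this processor has already scanned everything
        cell = orders[p][position[p]]
        position[p] += 1
        work += 1               # a visit costs one unit even if the cell is set
        if cells[cell] == 0:
            cells[cell] = 1
            written += 1
    return cells, work


if __name__ == "__main__":
    print(write_all(8, 4)[1])                       # round-robin schedule
    print(write_all(8, 4, [0] * 4 + [1] * 6)[1])    # delayed second processor
```

With a round-robin schedule the rotations never collide, so work equals t. When one processor is delayed and then revisits cells that are already set, work exceeds t; keeping that redundancy low under every schedule is what the permutation lists in the abstract are meant to achieve.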