Results 1–10 of 32
Online Scheduling of Parallel Programs on Heterogeneous Systems with Applications to Cilk
 Theory of Computing Systems, Special Issue on SPAA
, 2002
Cited by 27 (2 self)
We study the problem of executing parallel programs, in particular Cilk programs, on a collection of processors of different speeds. We consider a model in which each processor maintains an estimate of its own speed, where communication between processors has a cost, and where all scheduling must be online. This problem has been considered previously in the fields of asynchronous parallel computing and scheduling theory. Our model is a bridge between the assumptions in these fields. We provide a new, more accurate analysis of an old scheduling algorithm called the maximum utilization scheduler. Based on this analysis, we generalize this scheduling policy and define the high utilization scheduler. We next focus on the Cilk platform and introduce a new algorithm for scheduling Cilk multithreaded parallel programs on heterogeneous processors. This scheduler is inspired by the high utilization scheduler and is modified to fit in a Cilk context. A crucial aspect of our algorithm is that it keeps the original spirit of the Cilk scheduler. In fact, when our new algorithm runs on homogeneous processors, it exactly mimics the dynamics of the original Cilk scheduler.
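The basic tension the abstract describes, keeping processors of different speeds usefully busy, can be illustrated with a small generic sketch. This is not the paper's maximum- or high-utilization scheduler; the earliest-free-processor rule and the speed values below are illustrative assumptions only:

```python
import heapq

def greedy_makespan(task_sizes, speeds):
    """Greedy online assignment on processors of different speeds:
    each arriving task goes to the processor that becomes free
    earliest, and runs there in time size/speed.  Returns the makespan."""
    # min-heap of (time this processor becomes free, processor id)
    free = [(0.0, i) for i in range(len(speeds))]
    heapq.heapify(free)
    for size in task_sizes:
        t, i = heapq.heappop(free)          # earliest-free processor
        heapq.heappush(free, (t + size / speeds[i], i))
    return max(t for t, _ in free)
```

For example, with speeds `[2, 1]` the speed-2 processor absorbs two of three unit tasks, so `greedy_makespan([1, 1, 1], [2, 1])` yields a makespan of 1.0, whereas two equal-speed processors need 2.0 steps for four unit tasks.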
Global Computing Systems
 In LSSC '01: Proceedings of the Third International Conference on Large-Scale Scientific Computing, Revised Papers
, 2001
Cited by 17 (0 self)
Global Computing harvests the idle time of Internet-connected computers to run very large distributed applications. The unprecedented scale of the GCS paradigm requires revisiting the basic issues of distributed systems: performance models, security, fault-tolerance and scalability. The first parts of this paper review recent work in Global Computing, with particular interest in Peer-to-Peer systems. In the last section, we present XtremWeb, the Global Computing System we are currently developing.
PRAM Computations Resilient to Memory Faults
 2nd European Symposium on Algorithms (ESA '94)
Cited by 17 (6 self)
PRAMs with faults in their shared memory are investigated. Efficient general simulations on such machines of algorithms designed for fully reliable PRAMs are developed. The PRAM we work with is the Concurrent-Read Concurrent-Write (CRCW) variant. Two possible settings for error occurrence are considered: the errors may be either static (once a memory cell is checked to be operational, it remains so during the computation) or dynamic (a potentially faulty cell may crash at any time, the total number of such cells being bounded). A simulation consists of two phases: memory formatting, and the proper part done in a step-by-step way. For each error setting (static or dynamic), two simulations are presented: one with an O(1)-time per-step cost, the other with an O(log n)-time per-step cost. The other parameters of these simulations (number of processors, memory size, formatting time) are shown in Table 1 in Section 6. The simulations are randomized and Monte Carlo: they always operate within ...
A Tutorial on Algebraic Topology and Distributed Computation
 Lecture Notes in Computer Science
, 1994
Cited by 16 (2 self)
This document is a set of course notes from an informal tutorial to be presented at UCLA in August 1994. These notes are intended to be informative, even provocative, but are not intended to be balanced, comprehensive, or authoritative. All reasonable suggestions for improvement will be enthusiastically acknowledged in future revisions.
Scheduling Cilk Multithreaded Parallel Programs on Processors of Different Speeds
 In Proceedings of the 12th Annual Symposium on Parallel Algorithms and Architectures
, 2000
Cited by 12 (0 self)
We study the problem of executing parallel programs, in particular Cilk programs, on a collection of processors of different speeds. We consider a model in which each processor maintains an estimate of its own speed, where communication between processors has a cost, and where all scheduling must be online. This problem has been considered previously in the fields of asynchronous parallel computing and scheduling theory. Our model is a bridge between the assumptions in these fields. We provide a new, more accurate analysis of an old scheduling algorithm called the maximum utilization scheduler. Based on this analysis, we generalize this scheduling policy and define the high utilization scheduler. We next focus on the Cilk platform and introduce a new algorithm for scheduling Cilk multithreaded parallel programs on heterogeneous processors. This scheduler is inspired by the high utilization scheduler and is modified to fit in a Cilk context. A crucial aspect of our algorithm is that it keeps the original spirit of the Cilk scheduler.
Towards Practical Deterministic WriteAll Algorithms
 In Proc. 13th ACM Symposium on Parallel Algorithms and Architectures
, 2001
Cited by 12 (7 self)
The problem of performing t tasks on n asynchronous or undependable processors is a basic problem in parallel and distributed computing. We consider an abstraction of this problem called the Write-All problem: using n processors, write 1's into all locations of an array of size t. The most efficient known deterministic asynchronous algorithms for this problem are due to Anderson and Woll. The first class of algorithms has work complexity of O(t · n^ε), for n ≤ t and any ε > 0, and they are the best known for the full range of processors (n = t). To schedule the work of the processors, the algorithms use lists of q permutations on [q] (q ≤ n) that have certain combinatorial properties. Instantiating such an algorithm for a specific ε either requires substantial preprocessing (exponential in 1/ε) to find the requisite permutations, or imposes a prohibitive constant (exponential in 1/ε) hidden by the asymptotic analysis. The second class deals with the specific case of t = n², and these algorithms have work complexity of O(t log t). They also use lists of permutations with the same combinatorial properties. However, instantiating these algorithms requires preprocessing exponential in n to find the permutations. To alleviate this costly instantiation, Kanellakis and Shvartsman proposed a simple way of computing the permutations. They conjectured that their construction has the desired properties, but they provided no analysis. In this paper ...
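The Write-All problem itself is easy to state in code. The toy simulation below has each processor sweep the array along its own permutation under a simple round-robin interleaving; the permutations and the interleaving are illustrative assumptions, not the combinatorial permutation schedules analyzed in the paper:

```python
def write_all(t, perms):
    """Toy Write-All: each processor follows its own permutation of the
    t array cells, writing 1 into each cell it visits (rewriting a 1 is
    idempotent and harmless).  Asynchrony is modelled crudely as a
    round-robin interleaving of one step per processor.
    Returns the array and the total work (number of cell visits)."""
    a = [0] * t
    pos = [0] * len(perms)   # next index into each processor's permutation
    work = 0
    while sum(a) < t:
        for p, perm in enumerate(perms):
            if pos[p] < t:            # this processor still has cells left
                cell = perm[pos[p]]
                pos[p] += 1
                work += 1             # every visit costs one unit of work
                a[cell] = 1
    return a, work
```

With two processors sweeping in opposite orders, `write_all(4, [[0, 1, 2, 3], [3, 2, 1, 0]])` fills the array in two rounds with total work 4; with bad (identical) permutations the same cells are visited twice, which is exactly the waste the paper's permutation properties are designed to bound.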
Metacomputing with MILAN
 In Proceedings of the 8th Heterogeneous Computing Workshop
, 1999
Cited by 12 (2 self)
The MILAN project, a joint effort involving Arizona State University and New York University, has produced and validated fundamental techniques for the realization of efficient, reliable, predictable virtual machines on top of metacomputing environments that consist of an unreliable and dynamically changing set of machines. In addition to the techniques, the principal outcomes of the project include three parallel programming systems, Calypso, Chime, and Charlotte, which enable applications developed for ideal, shared-memory parallel machines to execute on distributed platforms that are subject to failures, slowdowns, and changing resource availability. The lessons learnt from the MILAN project are being used to design Computing Communities, a metacomputing framework for general computations.
1. Motivation
MILAN (Metacomputing In Large Asynchronous Networks) is a joint project of Arizona State University and New York University. The primary objective of the MILAN project is to p...
Efficient Execution of Nondeterministic Parallel Programs on Asynchronous Systems
, 1996
Cited by 11 (6 self)
We consider the problem of asynchronous execution of parallel programs. We assume that the original program is designed for a synchronous system, whereas the actual system may be asynchronous. We seek an automatic execution scheme which allows the asynchronous system to execute the synchronous program. Previous execution schemes provide solutions only for the case where the original program is deterministic. Here, we provide the first solution for the more general case where the original program can be nondeterministic (e.g. randomized). Our scheme is based on a novel agreement protocol for the asynchronous parallel setting. Our protocol allows n asynchronous processors to agree on n word-sized values in O(n log n log log n) total work. Total work is defined to be the sum of the number of steps performed by all processors (including steps from busy-waiting).
1 Introduction
Motivation. Parallel programs are frequently designed assuming tightly-coupled processors, operating in ...
Fast FaultTolerant Concurrent Access to Shared Objects
, 1996
Cited by 10 (1 self)
We consider a synchronous model of distributed computation in which n nodes communicate via point-to-point messages, subject to the following constraints: (i) in a single "step", a node can only send or receive O(log n) words, and (ii) communication is unreliable in that a constant fraction of all messages are lost at each step due to node and/or link failures. We design and analyze a simple local protocol for providing fast concurrent access to shared objects in this faulty network environment. In our protocol, clients use a hashing-based method to access shared objects. When a large number of clients attempt to read a given object at the same time, the object is rapidly replicated to an appropriate number of servers. Once the necessary level of replication has been achieved, each remaining request for the object is serviced within O(1) expected steps. Our protocol has practical potential for supporting high levels of concurrency in distributed file systems over wide-area networks.
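The flavor of hashing-based placement with demand-driven replication can be sketched in a few lines. This is a generic illustration, not the paper's protocol: the function names `server_for` and `replica_set`, the SHA-256 hash, and the "replicas grow linearly with demand" rule are all assumptions made for the example:

```python
import hashlib

def server_for(obj_id, replica, n_servers):
    """Map an (object, replica-index) pair to a server id by hashing."""
    digest = hashlib.sha256(f"{obj_id}:{replica}".encode()).hexdigest()
    return int(digest, 16) % n_servers

def replica_set(obj_id, demand, n_servers):
    """Let the number of replicas grow with observed demand, so a hot
    object spreads across many servers while a cold one lives on one.
    Hash collisions may make the returned set slightly smaller than k."""
    k = max(1, min(n_servers, demand))       # illustrative scaling rule
    return {server_for(obj_id, r, n_servers) for r in range(k)}
```

A client reading a cold object (`demand = 1`) always lands on the same single server, while under heavy demand the same object resolves to a larger, deterministic set of servers, spreading the read load.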
Scheduling DAGs on asynchronous processors
 In 19th ACM Symposium on Parallel Algorithms and Architectures
, 2007
Cited by 10 (1 self)
This paper addresses the problem of scheduling a DAG of unit-length tasks on asynchronous processors, that is, processors having different and changing speeds. The objective is to minimize the makespan, that is, the time to execute the entire DAG. Asynchrony is modeled by an oblivious adversary, which is assumed to determine the processor speeds at each point in time. The oblivious adversary may change processor speeds arbitrarily and arbitrarily often, but makes speed decisions independently of any random choices of the scheduling algorithm. This paper gives bounds on the makespan of two randomized online firing-squad scheduling algorithms, All and Level. These two schedulers are shown to have good makespan even when asynchrony is arbitrarily extreme. Let W and D denote, respectively, the number of tasks and the longest path in the DAG, and let π_ave denote the average speed of the p processors during the execution. In All, each processor repeatedly chooses a random task to execute from among all ready tasks (tasks whose predecessors have been executed). Scheduler All is shown to have a makespan T_p = W ...
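The rule the abstract describes for All, "each processor repeatedly chooses a random ready task", is simple enough to simulate. The sketch below is an illustration in that spirit, not the paper's analyzed algorithm: the adversary is modeled by 0/1 step flags, and, as a simplification, two processors never pick the same task in the same step:

```python
import random

def run_all_scheduler(preds, num_procs, speed_flags):
    """Simulate a scheduler in the spirit of All: at each time step,
    every processor the (oblivious) adversary grants a step executes a
    uniformly random ready task.  Tasks are unit length; preds maps
    each task to its set of predecessor tasks.  Returns the makespan
    in steps, or None if the flags run out before the DAG finishes."""
    tasks, done = set(preds), set()
    for step, flags in enumerate(speed_flags, start=1):
        # ready = not yet executed, all predecessors already executed
        ready = [v for v in tasks - done if preds[v] <= done]
        for proc in range(num_procs):
            if flags[proc] and ready:
                task = random.choice(ready)   # All: pick a random ready task
                ready.remove(task)            # simplification: no duplicates
                done.add(task)
        if done == tasks:
            return step
    return None
```

On a three-task chain the second processor can never help, so the makespan equals the depth D = 3 regardless of the random choices, while two independent tasks finish in a single step with two always-on processors.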