Results 1 - 10 of 56
Scheduling Strategies for Master-Slave Tasking on Heterogeneous Processor Grids
2002
Cited by 72 (34 self)
In this paper, we consider the problem of allocating a large number of independent, equal-sized tasks to a heterogeneous "grid" computing platform. We use a non-oriented graph to model a grid, where resources can have different speeds of computation and communication, as well as different overlap capabilities. We show how to determine the optimal steady-state scheduling strategy for each processor (the fraction of time spent computing and the fraction of time spent communicating with each neighbor). This result holds for a quite general framework, allowing for cycles and multiple paths in the interconnection graph, and allowing for several masters. Because
Broadcast Scheduling Optimization for Heterogeneous Cluster Systems
2000
Cited by 33 (1 self)
Bandwidth-efficient Collective Communication for Clustered Wide Area Systems
In Proc. International Parallel and Distributed Processing Symposium (IPDPS 2000), Cancun, 1999
Cited by 24 (3 self)
Metacomputing infrastructures couple multiple clusters (or MPPs) via wide-area networks and thus allow parallel programs to run on geographically distributed resources. A major problem in programming such wide-area parallel applications is the difference in communication costs inside and between clusters. Latency and bandwidth of WANs often are orders of magnitude worse than those of local networks. Our MagPIe library eases wide-area parallel programming by providing an efficient implementation of MPI's collective communication operations. MagPIe exploits the hierarchical structure of clustered wide-area systems and minimizes the communication overhead over the WAN links. In this paper, we present improved algorithms for collective communication that achieve shorter completion times by simultaneously using the aggregate bandwidth of the available wide-area links. Our new algorithms split messages into multiple segments that are sent in parallel over different WAN links, thus resulting ...
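The segment-splitting idea in the abstract above can be illustrated with a small sketch. It is not MagPIe's implementation: the equal-sized split, the `latency + bytes / bandwidth` cost model, and the names `split_segments`, `transfer_time`, and `parallel_time` are all assumptions made for illustration.

```python
def split_segments(message: bytes, n_links: int) -> list[bytes]:
    """Cut a message into n_links contiguous, near-equal segments."""
    base, extra = divmod(len(message), n_links)
    segments, start = [], 0
    for k in range(n_links):
        # The first `extra` segments carry one additional byte.
        size = base + (1 if k < extra else 0)
        segments.append(message[start:start + size])
        start += size
    return segments


def transfer_time(n_bytes: int, latency: float, bandwidth: float) -> float:
    """Assumed linear cost model for one WAN link."""
    return latency + n_bytes / bandwidth


def parallel_time(message: bytes, links: list[tuple[float, float]]) -> float:
    """Completion time when one segment travels over each (latency, bandwidth) link."""
    segments = split_segments(message, len(links))
    return max(
        transfer_time(len(seg), lat, bw) for seg, (lat, bw) in zip(segments, links)
    )
```

Under this model, sending 100 bytes over two identical links halves the bandwidth term while the latency term is paid once per link in parallel, which is the effect the abstract attributes to using the aggregate bandwidth.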
Centralized Versus Distributed Schedulers for Multiple Bag-of-Task Applications
In International Parallel and Distributed Processing Symposium IPDPS'2006. IEEE Computer, 2006
Cited by 23 (10 self)
Multiple applications that execute concurrently on heterogeneous platforms compete for CPU and network resources. In this paper we consider the problem of scheduling applications to ensure fair and efficient execution on a distributed network of processors. We limit our study to the case where communication is restricted to a tree embedded in the network, and the applications consist of a large number of independent tasks that originate at the tree's root. The tasks of a given application all have the same computation and communication requirements, but these requirements can vary for different applications. Each application is given a weight that quantifies its relative value. The goal of scheduling is to maximize throughput while executing tasks from each application in the same ratio as their weights. We can
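The weighted objective stated in the abstract above can be made concrete on a deliberately reduced case. The sketch below assumes a single worker and ignores communication and the tree topology entirely, so it is not the paper's scheduler; `weighted_fair_rates` and its parameters are hypothetical names.

```python
from fractions import Fraction


def weighted_fair_rates(weights, costs):
    """Tasks per time unit for each application on one fully busy worker.

    Assumptions: costs[k] is the compute time of one task of application k,
    and rates must be proportional to weights. The worker is saturated when
    sum(rate_k * cost_k) == 1, which fixes the common scale factor.
    """
    if len(weights) != len(costs):
        raise ValueError("weights and costs must have the same length")
    load = sum(Fraction(w) * Fraction(c) for w, c in zip(weights, costs))
    scale = 1 / load
    return [Fraction(w) * scale for w in weights]
```

For example, with weights `[1, 2]` and per-task costs `[2, 1]`, the second application completes tasks at twice the rate of the first while the worker stays fully used.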
Assessing the impact and limits of steady-state scheduling for mixed task and data parallelism on heterogeneous platforms
2004
Techniques for Mapping Tasks to Machines in Heterogeneous Computing Systems
2004 International Conference on Parallel Processing (ICPP 2004), 2004
Cited by 18 (1 self)
Heterogeneous computing (HC) is the coordinated use of different types of machines, networks, and interfaces to maximize their combined performance and/or cost-effectiveness. HC systems are becoming a plausible technique for efficiently solving computationally intensive problems. The applicability and strength of HC systems are derived from their ability to match computing needs to appropriate resources. In an HC system, tasks need to be matched to machines, and the execution of the tasks must be scheduled. The goal of this invited keynote paper is to: (1) introduce the reader to some of the different distributed and parallel types of HC environments
Complexity Results for Throughput and Latency Optimization of Replicated and Data-parallel Workflow
Algorithmica, 2007
Cited by 15 (12 self)
Mapping applications onto parallel platforms is a challenging problem, even for simple application patterns such as pipeline or fork graphs. Several antagonistic criteria should be optimized for workflow applications, such as throughput and latency (or a combination). In this paper, we consider a simplified model with no communication cost, and we provide an exhaustive list of complexity results for different problem instances. Pipeline or fork stages can be replicated in order to increase the throughput by sending consecutive data sets onto different processors. In some cases, stages can also be data-parallelized, i.e. the computation of one single data set is shared between several processors. This leads to a decrease of the latency and an increase of the throughput. Some instances of this simple model are shown to be NP-hard, thereby exposing the inherent complexity of the mapping problem. We provide polynomial algorithms for other problem instances. Altogether, we provide solid theoretical foundations for the study of mono-criterion or bi-criteria mapping optimization problems.
A Polynomial-Time Algorithm for Allocating Independent Tasks on Heterogeneous Fork-Graphs
2002
Cited by 14 (8 self)
In this paper, we consider the problem of allocating a large number of independent, equal-sized tasks to a heterogeneous processor farm. The master processor $P_0$ can process a task within $w_0$ time-units; it communicates a task in $d_i$ time-units to the $i$-th slave $P_i$, $1 \le i \le p$, which requires $w_i$ time-units to process it. We assume communication-computation overlap capabilities for each slave (and for the master), but the communication medium is exclusive: the master can only communicate with a single slave at each time-step. We give a
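The platform model in the abstract above admits a simple steady-state throughput bound, sketched below. This is not the polynomial-time algorithm the paper announces (the abstract is cut off before describing it). It only solves the relaxed problem of maximizing the long-run task rate when the master's single port is shared among slaves. The function name `steady_state_throughput` and the greedy ordering by communication time are assumptions of this sketch; the greedy choice is the standard fractional-knapsack argument.

```python
from fractions import Fraction


def steady_state_throughput(w0, slaves):
    """Upper bound on tasks completed per time unit.

    w0     : master's time to process one task
    slaves : list of (d_i, w_i) pairs, communication and compute time per task

    Constraints of the relaxation: slave i receives at most 1 / w_i tasks per
    time unit, and the exclusive port gives sum(rate_i * d_i) <= 1.
    """
    port = Fraction(1)  # share of the master's sending time still unused
    rates = [Fraction(0)] * len(slaves)
    # Serve the cheapest links first: each task sent costs d_i of port time.
    for i, (d, w) in sorted(enumerate(slaves), key=lambda item: item[1][0]):
        if port <= 0:
            break
        rate = min(1 / Fraction(w), port / Fraction(d))
        rates[i] = rate
        port -= rate * Fraction(d)
    # With overlap, the master also computes at its own rate 1 / w0.
    return 1 / Fraction(w0) + sum(rates), rates
```

With `w0 = 4` and slaves `[(1, 2), (2, 2), (3, 1)]`, the port is exhausted after the two cheapest links, so the third slave receives no work even though it computes fastest.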
Broadcast trees for heterogeneous platforms
19th International Parallel and Distributed Processing Symposium (IPDPS'05), 2005
Cited by 13 (2 self)
Laboratoire de l'Informatique du Parallélisme, École Normale Supérieure de Lyon, Unité Mixte de Recherche CNRS-INRIA-ENS LYON-UCBL no 5668

