Results 1  10
of
25
BandwidthCentric Allocation of Independent Tasks on Heterogeneous Platforms
 In International Parallel and Distributed Processing Symposium (IPDPS’2002). IEEE Computer
, 2001
"... In this paper, we consider the problem of allocating a large number of independent, equalsized tasks to a heterogenerous "grid" computing platform. Such problems arise in collaborative computing eorts like SETI@home. We use a tree to model a grid, where resources can have dierent speeds of comput ..."
Abstract

Cited by 76 (28 self)
 Add to MetaCart
In this paper, we consider the problem of allocating a large number of independent, equalsized tasks to a heterogenerous "grid" computing platform. Such problems arise in collaborative computing eorts like SETI@home. We use a tree to model a grid, where resources can have dierent speeds of computation and communication, as well as dierent overlap capabilities. We dene a base model, and show how to determine the maximum steadystate throughput of a node in the base model, assuming we already know the throughput of the subtrees rooted at the node's children. Thus, a bottomup traversal of the tree determines the rate at which tasks can be processed in the full tree. The best allocation is bandwidthcentric: if enough bandwidth is available, then all nodes are kept busy; if bandwidth is limited, then tasks should be allocated only to the children which have suciently small communication times, regardless of their computation power. We then show how nodes with other capabilities ones that allow more or less overlapping of computation and communication than the base model can be transformed to equivalent nodes in the base model. We also show how to handle a more general communication model. Finally, we present simulation results of several demanddriven task allocation policies that show that our bandwidthcentric method obtains better results than allocating tasks to all processors on a rstcome, rst serve basis. Key words: heterogeneous computer, allocation, scheduling, grid, metacomputing. Corresponding author: Jeanne Ferrante The work of Larry Carter and Jeanne Ferrante was performed while visiting LIP. 1 1
A proposal for a heterogeneous cluster ScaLAPACK (dense linear solvers)
, 2001
"... In this paper, we study the implementation of dense linear algebra kernels, such as matrix multiplication or linear system solvers, on heterogeneous networks of workstations. The uniform blockcyclic data distribution scheme commonly used for homogeneous collections of processors limits the perform ..."
Abstract

Cited by 50 (25 self)
 Add to MetaCart
In this paper, we study the implementation of dense linear algebra kernels, such as matrix multiplication or linear system solvers, on heterogeneous networks of workstations. The uniform blockcyclic data distribution scheme commonly used for homogeneous collections of processors limits the performance of these linear algebra kernels on heterogeneous grids to the speed of the slowest processor. We present and study more sophisticated data allocation strategies that balance the load on heterogeneous platforms with respect to the performance of the processors. When targeting unidimensional grids, the loadbalancing problem can be solved rather easily. When targeting twodimensional grids, which are the key to scalability and efficiency for numerical kernels, the problem turns out to be surprisingly difficult. We formally state the 2D loadbalancing problem and prove its NPcompleteness. Next, we introduce a data allocation heuristic, which turns out to be very satisfactory: Its practical usefulness is demonstrated by MPI experiments conducted with a heterogeneous network of workstations.
Matrix Multiplication on Heterogeneous Platforms
, 2001
"... this paper, we address the issue of implementing matrix multiplication on heterogeneous platforms. We target two different classes of heterogeneous computing resources: heterogeneous networks of workstations and collections of heterogeneous clusters. Intuitively, the problem is to load balance the ..."
Abstract

Cited by 37 (17 self)
 Add to MetaCart
this paper, we address the issue of implementing matrix multiplication on heterogeneous platforms. We target two different classes of heterogeneous computing resources: heterogeneous networks of workstations and collections of heterogeneous clusters. Intuitively, the problem is to load balance the work with different speed resources while minimizing the communication volume. We formally state this problem in a geometric framework and prove its NPcompleteness. Next, we introduce a (polynomial) columnbased heuristic, which turns out to be very satisfactory: We derive a theoretical performance guarantee for the heuristic and we assess its practical usefulness through MPI experiments
Array Decompositions for Nonuniform Computational Environments
, 1996
"... Twodimensional arrays are useful in a large variety of scientific and engineering applications. Parallelization of these applications requires the decomposition of array elements among different machines. Several datadecomposition techniques have been studied in the literature for machines with un ..."
Abstract

Cited by 25 (0 self)
 Add to MetaCart
Twodimensional arrays are useful in a large variety of scientific and engineering applications. Parallelization of these applications requires the decomposition of array elements among different machines. Several datadecomposition techniques have been studied in the literature for machines with uniform computational power. In this paper we develop new methods for decomposing arrays into a cluster of machines with nonuniform computational power. Simulation results show that our methods provide superior decomposition over naive schemes. 1 Introduction Dataparallel applications requires the partitioning of data among processors in a way that the computation load on each node is proportional to its computational power, while minimizing communication. Twodimensional arrays are widely used in scientific and engineering problems such as weather prediction and image processing. In this paper we discuss the decomposition of twodimensional arrays for a nonuniform computational environment ...
A partitioning advisory system for networked dataparallel processing
 Concurrency: Practice and Experience
, 1995
"... With the increased performance capabilities of desktop computers, networked computing has become a popular vehicle for using parallelism to solve a variety of computationally intense problems. However, node heterogeneity and high communication costs may limit performance unless the problem space is ..."
Abstract

Cited by 20 (1 self)
 Add to MetaCart
With the increased performance capabilities of desktop computers, networked computing has become a popular vehicle for using parallelism to solve a variety of computationally intense problems. However, node heterogeneity and high communication costs may limit performance unless the problem space is carefully partitioned across the network in a way that considers both the capabilities of the machines and the high network communication costs. We describe an advisory system that is designed to help the programmer, compiler, or runtime environment choose the best decomposition strategy for partitioning specific dataparallel applications across a given collection of machines. The system includes provisions for assessing the capabilities of the participating machines and the network in light of the current workload. Given information about the problem space, the machine speeds, and the network, the system provides a ranking of three standard partitioning methods. We test the validity of our system by comparing the observed relative performance with predicted relative performance of different data decompositions on a program with a variable number of floating point operations and a 5point stencil communication pattern.
Data partitioning with a realistic performance model of networks of heterogeneous computers
 In International Parallel and Distributed Processing Symposium IPDPS’2004. IEEE Computer
, 2004
"... The paper presents a performance model that can be used to optimally schedule arbitrary tasks on a network of heterogeneous computers when there is an upper bound on the size of the task that can be solved by each computer. We formulate a problem of partitioning of an nelement set over p heterogene ..."
Abstract

Cited by 19 (9 self)
 Add to MetaCart
The paper presents a performance model that can be used to optimally schedule arbitrary tasks on a network of heterogeneous computers when there is an upper bound on the size of the task that can be solved by each computer. We formulate a problem of partitioning of an nelement set over p heterogeneous processors using this advanced performance model and give its efficient solution of the complexity O(p 3 ×log 2 n).
Data Partitioning for Networked Parallel Processing
, 1993
"... The workstation model of parallel processing presents specific challenges caused by the latency of the communications network and the workload imbalance that arises from the heterogeneity of the nodes. Data partitioning is critically important for parallel processing in this environment. We mathemat ..."
Abstract

Cited by 10 (5 self)
 Add to MetaCart
The workstation model of parallel processing presents specific challenges caused by the latency of the communications network and the workload imbalance that arises from the heterogeneity of the nodes. Data partitioning is critically important for parallel processing in this environment. We mathematically characterize the communication costs for four data decomposition schemes: scatter, contiguous point, contiguous row, and block. These methods are analyzed in terms of problem size, number of processors, network speed, and communication pattern. Bounds are established for the performance of these decomposition schemes that can be used to make betterinformed data partitioning decisions. 1 Introduction The wide availability and improving speed of workstations make clusters of these machines an attractive alternative to traditional parallel computers. However, the virtual multicomputer formed by such a network presents challenges unique to this architecture. While data partitioning is a...
LoadBalancing Iterative Computations on Heterogeneous Clusters
"... We focus on mapping iterative algorithms onto heterogeneous clusters. The application data is partitioned over the processors, which are arranged along a virtual ring. At each iteration, independent calculations are carried out in parallel, and some communications take place between consecutive p ..."
Abstract

Cited by 9 (2 self)
 Add to MetaCart
We focus on mapping iterative algorithms onto heterogeneous clusters. The application data is partitioned over the processors, which are arranged along a virtual ring. At each iteration, independent calculations are carried out in parallel, and some communications take place between consecutive processors in the ring. The question is to determine how to slice the application data into chunks, and assign these chunks to the processors, so that the total execution time is minimized. A major
Supporting Dynamic Spacesharing on Clusters of Nondedicated Workstations
 In Proceedings of the 17th International conference on distributed computing
, 1997
"... Clusters of workstations are increasingly being viewed as a costeffective alternative to parallel supercomputers. However, resource management and scheduling on workstations clusters is complicated by the fact that the number of idle workstations available for executing parallel applications is con ..."
Abstract

Cited by 7 (4 self)
 Add to MetaCart
Clusters of workstations are increasingly being viewed as a costeffective alternative to parallel supercomputers. However, resource management and scheduling on workstations clusters is complicated by the fact that the number of idle workstations available for executing parallel applications is constantly fluctuating. In this paper, we present a case for scheduling parallel applications on nondedicated workstation clusters using dynamic spacesharing, a policy under which the number of processors allocated to an application can be changed during its execution. We describe an approach that uses applicationlevel checkpointing and data repartitioning for supporting dynamic spacesharing and for handling the dynamic reconfiguration triggered when failure or owner activity is detected on a workstation being used by a parallel application. The performance advantages of dynamic spacesharing are quantified through a simulation study, and experimental results are presented for the overhead o...
Data Redistribution Algorithms For Heterogeneous Processor Rings
, 2004
"... We consider the problem of redistributing data on homogeneous and heterogeneous ring of processors. The problem arises in several applications, each time after that a loadbalancing mechanism is invoked (but we do not discuss the loadbalancing mechanism itself). We provide algorithms that aim at op ..."
Abstract

Cited by 7 (5 self)
 Add to MetaCart
We consider the problem of redistributing data on homogeneous and heterogeneous ring of processors. The problem arises in several applications, each time after that a loadbalancing mechanism is invoked (but we do not discuss the loadbalancing mechanism itself). We provide algorithms that aim at optimizing the data redistribution, both for unidirectional and bidirectional rings, and we give complete proofs of correctness. One major contribution of the paper is that we are able to prove the optimality of the proposed algorithms in all cases except that of a bidirectional heterogeneous ring, for which the problem remains open.