Results 1-10 of 15
Bandwidth-Centric Allocation of Independent Tasks on Heterogeneous Platforms
 In International Parallel and Distributed Processing Symposium (IPDPS’2002). IEEE Computer
, 2001
Cited by 76 (28 self)
In this paper, we consider the problem of allocating a large number of independent, equal-sized tasks to a heterogeneous "grid" computing platform. Such problems arise in collaborative computing efforts like SETI@home. We use a tree to model a grid, where resources can have different speeds of computation and communication, as well as different overlap capabilities. We define a base model, and show how to determine the maximum steady-state throughput of a node in the base model, assuming we already know the throughput of the subtrees rooted at the node's children. Thus, a bottom-up traversal of the tree determines the rate at which tasks can be processed in the full tree. The best allocation is bandwidth-centric: if enough bandwidth is available, then all nodes are kept busy; if bandwidth is limited, then tasks should be allocated only to the children which have sufficiently small communication times, regardless of their computation power. We then show how nodes with other capabilities (ones that allow more or less overlapping of computation and communication than the base model) can be transformed to equivalent nodes in the base model. We also show how to handle a more general communication model. Finally, we present simulation results of several demand-driven task allocation policies that show that our bandwidth-centric method obtains better results than allocating tasks to all processors on a first-come, first-served basis. Key words: heterogeneous computer, allocation, scheduling, grid, metacomputing. Corresponding author: Jeanne Ferrante. The work of Larry Carter and Jeanne Ferrante was performed while visiting LIP.
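The bottom-up, bandwidth-centric idea described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's own formulation: it assumes a parent can send to only one child at a time, can compute while it sends, and that `compute_rate` (tasks per time unit) and `comm_time` (time to send one task to a child) are the only parameters. The names `Node` and `throughput` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    compute_rate: float              # tasks per time unit the node itself processes
    comm_time: float = 0.0           # time its parent needs to send it one task
    children: list = field(default_factory=list)


def throughput(node):
    """Steady-state task rate of the subtree rooted at `node` (assumed model)."""
    rate = node.compute_rate
    budget = 1.0                     # fraction of the parent's link time still free
    # Serve children in order of increasing communication time,
    # ignoring how fast each child can compute.
    for child in sorted(node.children, key=lambda c: c.comm_time):
        child_rate = throughput(child)          # bottom-up: solve the subtree first
        cost = child.comm_time * child_rate     # link time needed to saturate child
        if cost <= budget:
            rate += child_rate                  # enough bandwidth: keep child busy
            budget -= cost
        else:
            rate += budget / child.comm_time    # partial feed, then link is full
            break
    return rate
```

For example, a root that computes nothing, with one very fast child behind a slow link (`comm_time=2`) and one slow child behind a fast link (`comm_time=1`, `compute_rate=0.5`), saturates the slow child first and gives only the leftover link time to the fast one.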
A proposal for a heterogeneous cluster ScaLAPACK (dense linear solvers)
, 2001
Cited by 50 (25 self)
In this paper, we study the implementation of dense linear algebra kernels, such as matrix multiplication or linear system solvers, on heterogeneous networks of workstations. The uniform block-cyclic data distribution scheme commonly used for homogeneous collections of processors limits the performance of these linear algebra kernels on heterogeneous grids to the speed of the slowest processor. We present and study more sophisticated data allocation strategies that balance the load on heterogeneous platforms with respect to the performance of the processors. When targeting unidimensional grids, the load-balancing problem can be solved rather easily. When targeting two-dimensional grids, which are the key to scalability and efficiency for numerical kernels, the problem turns out to be surprisingly difficult. We formally state the 2D load-balancing problem and prove its NP-completeness. Next, we introduce a data allocation heuristic, which turns out to be very satisfactory: its practical usefulness is demonstrated by MPI experiments conducted with a heterogeneous network of workstations.
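To see why the unidimensional case is comparatively easy, here is a small greedy sketch. It is an illustration under stated assumptions rather than the paper's algorithm: each processor `i` needs `cycle_times[i]` time units per block, blocks are identical, and the goal is to keep the slowest finish time low. The function name `distribute_blocks` is hypothetical.

```python
import heapq


def distribute_blocks(cycle_times, n_blocks):
    """Assign identical blocks one at a time to whichever processor
    would finish earliest if it received one more block."""
    counts = [0] * len(cycle_times)
    # Heap key: finish time of processor i if it takes one more block.
    heap = [(t, i) for i, t in enumerate(cycle_times)]
    heapq.heapify(heap)
    for _ in range(n_blocks):
        finish, i = heapq.heappop(heap)
        counts[i] += 1
        heapq.heappush(heap, (finish + cycle_times[i], i))
    return counts
```

With cycle times 1, 2 and 4, seven blocks are split 4, 2 and 1, so every processor finishes after 4 time units instead of waiting on the slowest one.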
On Rectangular Partitionings in Two Dimensions: Algorithms, Complexity, and Applications
 In Proceedings of the 7th International Conference on Database Theory
, 1999
Cited by 44 (7 self)
Partitioning a multidimensional data set into rectangular partitions subject to certain constraints is an important problem that arises in many database applications, including histogram-based selectivity estimation, load-balancing, and construction of index structures. While provably optimal and efficient algorithms exist for partitioning one-dimensional data, the multidimensional problem has received less attention, except for a few special cases. As a result, the heuristic partitioning techniques that are used in practice are not well understood, and come with no guarantees on the quality of the solution. In this paper, we present algorithmic and complexity-theoretic results for the fundamental problem of partitioning a two-dimensional array into rectangular tiles of arbitrary size in a way that minimizes the number of tiles required to satisfy a given constraint. Our main results are approximation algorithms for several partitioning problems that provably approxima...
Efficient Runtime Support for Irregular Block-Structured Applications
, 1998
Cited by 44 (17 self)
Parallel implementations of scientific applications often rely on elaborate dynamic data structures with complicated communication patterns. We describe a set of intuitive geometric programming abstractions that simplify coordination of irregular block-structured scientific calculations without sacrificing performance. We have implemented these abstractions in KeLP, a C++ runtime library. KeLP's abstractions enable the programmer to express complicated communication patterns for dynamic applications, and to tune communication activity with a high-level, abstract interface. We show that KeLP's flexible communication model effectively manages elaborate data motion patterns arising in structured adaptive mesh refinement, and achieves performance comparable to hand-coded message-passing on several structured numerical kernels. To appear in J. Parallel and Distributed Computing. 1 Introduction. Many scientific numerical methods employ structured irregular representations to improve accura...
Efficient Array Partitioning
, 1997
Cited by 23 (3 self)
We consider the problem of partitioning an array of n items into p intervals so that the maximum weight of the intervals is minimized. The currently best known bound for this problem is O(np) [MS95]. In this paper, we present two improved algorithms for this problem: one runs in time O(n + p²(log n)²) and the other runs in time O(n log n). The former is optimal whenever p ≤ √n / log n, and the latter is near-optimal for arbitrary p. We consider the natural generalization of this partitioning to two dimensions, where an n × n array of items is to be partitioned into p² blocks by partitioning the rows and columns into p intervals each and considering the blocks induced by this partition. The problem is to find that partition which minimizes the maximum weight among the resulting blocks. This problem is known to be NP-hard [GM96]. Independently, Charikar et al. have given a simple proof that shows that the problem is in fact NP-hard to approximate within a factor of t...
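As a concrete reference point for the one-dimensional problem stated above, the sketch below uses a simple binary search over the answer with a greedy feasibility check. This is a well-known textbook approach, not either of the algorithms from the paper, and it assumes a non-empty list of non-negative integer weights. The function name `min_max_partition` is hypothetical.

```python
def min_max_partition(weights, p):
    """Smallest possible maximum interval weight when `weights`
    is cut into at most p contiguous intervals."""

    def fits(limit):
        # Greedily pack items left to right; open a new interval
        # whenever the current one would exceed `limit`.
        parts, current = 1, 0
        for w in weights:
            if current + w > limit:
                parts += 1
                current = w
            else:
                current += w
        return parts <= p

    lo, hi = max(weights), sum(weights)   # answer lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if fits(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo
```

For the weights 1 through 9 and p = 3, the best split is [1..5], [6, 7], [8, 9], whose heaviest interval weighs 17.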
Approximation Algorithms for Array Partitioning Problems
 JOURNAL OF ALGORITHMS
, 2005
Cited by 15 (0 self)
We study the problem of optimally partitioning a two-dimensional array of elements by cutting each coordinate axis into p (resp., q) intervals, resulting in p x q rectangular regions. This problem arises in several applications in databases, parallel computation, and image processing. Our main contributions are new approximation algorithms for these NP-complete problems that improve significantly over previously known bounds. The algorithms are fast and simple, work for a variety of measures of partitioning quality, generalize to dimensions d > 2, and achieve almost optimal approximation ratios. We also extend previous NP-completeness results for this class of problems.
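To make the objective concrete, the following sketch evaluates one candidate p x q partition by computing the weight of its heaviest block. It only scores a given set of cuts and does not search for good ones, and it takes "maximum block sum" as the quality measure, which is one assumption among the several measures the abstract mentions. The name `max_block_weight` and the cut representation are hypothetical.

```python
def max_block_weight(grid, row_cuts, col_cuts):
    """Heaviest block induced by cutting rows and columns.

    `row_cuts` and `col_cuts` are sorted indices at which a new
    interval starts (index 0 is implicit and must not be listed).
    """
    row_bounds = [0, *row_cuts, len(grid)]
    col_bounds = [0, *col_cuts, len(grid[0])]
    block_weights = []
    for r0, r1 in zip(row_bounds, row_bounds[1:]):
        for c0, c1 in zip(col_bounds, col_bounds[1:]):
            block_weights.append(
                sum(grid[r][c] for r in range(r0, r1) for c in range(c0, c1))
            )
    return max(block_weights)
```

On the 3 x 3 grid with entries 1 through 9 in row order, cutting after the first row and after the second column leaves a heaviest block of weight 24, while cutting after the second row instead lowers it to 15.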
Load-Balancing Iterative Computations on Heterogeneous Clusters
Cited by 9 (2 self)
We focus on mapping iterative algorithms onto heterogeneous clusters. The application data is partitioned over the processors, which are arranged along a virtual ring. At each iteration, independent calculations are carried out in parallel, and some communications take place between consecutive processors in the ring. The question is to determine how to slice the application data into chunks, and assign these chunks to the processors, so that the total execution time is minimized. A major
Data Redistribution Algorithms For Heterogeneous Processor Rings
, 2004
Cited by 7 (5 self)
We consider the problem of redistributing data on homogeneous and heterogeneous rings of processors. The problem arises in several applications, each time a load-balancing mechanism is invoked (but we do not discuss the load-balancing mechanism itself). We provide algorithms that aim at optimizing the data redistribution, both for unidirectional and bidirectional rings, and we give complete proofs of correctness. One major contribution of the paper is that we are able to prove the optimality of the proposed algorithms in all cases except that of a bidirectional heterogeneous ring, for which the problem remains open.
An Overview of Heterogeneous High Performance and Grid Computing
 In Engineering the Grid
, 2006
Cited by 5 (1 self)
This paper is an overview of the ongoing academic research, development, and uses of heterogeneous parallel and distributed computing. This work is placed in the context of scientific computing. The simulation of very large systems often requires computational capabilities which cannot be satisfied by a single processing system. A possible way to solve this problem is to couple different computational resources, perhaps distributed geographically. Heterogeneous distributed computing is a means to overcome the limitations of single computing systems.
On performance analysis of heterogeneous parallel algorithms
 Parallel Computing
, 2004
Cited by 4 (3 self)
The paper presents an approach to performance analysis of heterogeneous parallel algorithms. As a typical heterogeneous parallel algorithm is just a modification of some homogeneous one, the idea is to compare the heterogeneous algorithm with its homogeneous prototype, and to assess the heterogeneous modification rather than analyse the algorithm as an isolated entity. A criterion of optimality of heterogeneous parallel algorithms is suggested. A parallel algorithm of matrix multiplication on heterogeneous clusters is used to illustrate the proposed approach. 1. Introduction. Heterogeneous networks of computers are a promising distributed-memory parallel architecture. In the most general case, a heterogeneous network includes PCs, workstations, multiprocessor servers, clusters of workstations, and even supercomputers. Unlike traditional homogeneous parallel platforms, the heterogeneous parallel