Bandwidth-Centric Allocation of Independent Tasks on Heterogeneous Platforms
- In International Parallel and Distributed Processing Symposium (IPDPS’2002). IEEE Computer
, 2001
Cited by 71 (26 self)
In this paper, we consider the problem of allocating a large number of independent, equal-sized tasks to a heterogeneous "grid" computing platform. Such problems arise in collaborative computing efforts like SETI@home. We use a tree to model a grid, where resources can have different speeds of computation and communication, as well as different overlap capabilities. We define a base model, and show how to determine the maximum steady-state throughput of a node in the base model, assuming we already know the throughput of the subtrees rooted at the node's children. Thus, a bottom-up traversal of the tree determines the rate at which tasks can be processed in the full tree. The best allocation is bandwidth-centric: if enough bandwidth is available, then all nodes are kept busy; if bandwidth is limited, then tasks should be allocated only to the children which have sufficiently small communication times, regardless of their computation power. We then show how nodes with other capabilities (ones that allow more or less overlapping of computation and communication than the base model) can be transformed to equivalent nodes in the base model. We also show how to handle a more general communication model. Finally, we present simulation results of several demand-driven task allocation policies, showing that our bandwidth-centric method obtains better results than allocating tasks to all processors on a first-come, first-served basis. Key words: heterogeneous computer, allocation, scheduling, grid, metacomputing. Corresponding author: Jeanne Ferrante. The work of Larry Carter and Jeanne Ferrante was performed while visiting LIP.
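The bandwidth-centric rule can be sketched in Python for a single fork node. This is a minimal sketch under our own assumptions. The node can compute, receive, and send (to one child at a time) concurrently. A child, or a whole subtree reduced to one equivalent node, processes one task every `w` time units, and sending it a task occupies the parent for `c` time units. Children are served in order of increasing communication time until the parent's send time is used up. The first child that does not fit receives only the leftover bandwidth. The names `steady_state_throughput` and `tree_throughput` are hypothetical, and the paper's own formulation and its other overlap models are not reproduced here.

```python
def steady_state_throughput(w0, children):
    """Tasks per time unit processed by a node plus its children (sketch).

    w0: time the node needs to compute one task itself.
    children: list of (c, w) pairs; c is the time to send one task to the
    child, w is the child's time per task (for a subtree: 1 / its throughput).
    """
    rate = 1.0 / w0          # the node computes at full speed while sending
    bandwidth = 1.0          # fraction of the node's send time still unused
    for c, w in sorted(children):  # cheapest communication first
        needed = c / w       # send time per time unit that keeps this child busy
        if needed <= bandwidth:
            rate += 1.0 / w
            bandwidth -= needed
        else:
            rate += bandwidth / c  # leftover bandwidth feeds this child partially
            break
    return rate


def tree_throughput(node):
    """Bottom-up traversal; node = (w, [(c, child_node), ...])."""
    w, kids = node
    reduced = [(c, 1.0 / tree_throughput(child)) for c, child in kids]
    return steady_state_throughput(w, reduced)


# The child with w = 1 computes fastest, but its link (c = 3) is the slowest,
# and the parent's bandwidth is exhausted before that child is reached.
print(steady_state_throughput(3, [(1, 2), (2, 4), (3, 1)]))
```

In the example call, the first two children use exactly all of the parent's send time. The total is therefore 1/3 + 1/2 + 1/4 = 13/12 tasks per time unit. The fastest computer receives nothing because its link is the slowest.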
Matrix Multiplication on Heterogeneous Platforms
, 2001
Cited by 35 (19 self)
In this paper, we address the issue of implementing matrix multiplication on heterogeneous platforms. We target two different classes of heterogeneous computing resources: heterogeneous networks of workstations and collections of heterogeneous clusters. Intuitively, the problem is to load balance the work with different speed resources while minimizing the communication volume. We formally state this problem in a geometric framework and prove its NP-completeness. Next, we introduce a (polynomial) column-based heuristic, which turns out to be very satisfactory: we derive a theoretical performance guarantee for the heuristic and we assess its practical usefulness through MPI experiments.
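To make the geometric framework concrete, here is a sketch of a column-based partition of the unit square. Each processor gets a rectangle whose area equals its relative speed, and the objective is the sum of rectangle half-perimeters, a proxy for communication volume. The sketch assumes that columns contain consecutive processors in order of increasing speed, and it finds the best grouping by dynamic programming. A column of `k` processors with total width `W` contributes `k * W + 1` to the cost. The function name is hypothetical, and this is not claimed to be the paper's exact heuristic or its performance guarantee.

```python
from itertools import accumulate


def column_partition(speeds):
    """Group processors into columns minimizing the sum of half-perimeters.

    Returns (cost, columns); each column lists normalized speeds. A column's
    width is the sum of its speeds, and each rectangle's height is speed / width.
    """
    total = sum(speeds)
    s = sorted(x / total for x in speeds)  # normalized, ascending
    p = len(s)
    prefix = [0.0] + list(accumulate(s))
    best = [0.0] + [float("inf")] * p      # best[i]: cost for the first i processors
    cut = [0] * (p + 1)                    # cut[i]: start of the last column
    for i in range(1, p + 1):
        for j in range(i):
            # processors j..i-1 form one column of width prefix[i] - prefix[j];
            # their half-perimeters sum to (i - j) * width + 1
            cost = best[j] + 1 + (i - j) * (prefix[i] - prefix[j])
            if cost < best[i]:
                best[i], cut[i] = cost, j
    columns, i = [], p
    while i > 0:                           # walk back through the chosen cuts
        columns.append(s[cut[i]:i])
        i = cut[i]
    return best[p], columns[::-1]


print(column_partition([1, 1, 1, 1]))
```

For four equal processors the call returns a cost of 4.0 with two columns of two processors each. This meets the lower bound 2 Σ √s_i = 4, which holds because each rectangle's half-perimeter is at least 2√area.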
Array Decompositions for Nonuniform Computational Environments
, 1996
Cited by 25 (0 self)
Two-dimensional arrays are useful in a large variety of scientific and engineering applications. Parallelization of these applications requires the decomposition of array elements among different machines. Several data-decomposition techniques have been studied in the literature for machines with uniform computational power. In this paper we develop new methods for decomposing arrays across a cluster of machines with nonuniform computational power. Simulation results show that our methods provide better decompositions than naive schemes. 1 Introduction Data-parallel applications require partitioning the data among processors so that the computational load on each node is proportional to its computational power, while communication is minimized. Two-dimensional arrays are widely used in scientific and engineering problems such as weather prediction and image processing. In this paper we discuss the decomposition of two-dimensional arrays for a nonuniform computational environment ...
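A simple one-dimensional baseline for nonuniform machines is to split the array into contiguous row strips whose sizes are proportional to each machine's power. The sketch below uses largest-remainder rounding so that the row counts sum exactly to the array height. The function name is ours, as is the assumption that powers are given as integers (or `fractions.Fraction` values). The paper's two-dimensional decomposition methods are not reproduced.

```python
def proportional_rows(n_rows, powers):
    """Assign contiguous row ranges [start, end) in proportion to machine power."""
    total = sum(powers)
    counts = [n_rows * p // total for p in powers]     # floor of each quota
    remainders = [n_rows * p % total for p in powers]  # fractional parts, scaled by total
    leftover = n_rows - sum(counts)
    # hand the remaining rows to the largest remainders (ties: lower index first)
    order = sorted(range(len(powers)), key=lambda i: -remainders[i])
    for i in order[:leftover]:
        counts[i] += 1
    ranges, start = [], 0
    for c in counts:
        ranges.append((start, start + c))
        start += c
    return ranges


print(proportional_rows(10, [1, 2, 2]))
```

Here the quotas are exact, so the three machines receive 2, 4, and 4 rows.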
A partitioning advisory system for networked data-parallel processing
- Concurrency: Practice and Experience
, 1995
Cited by 18 (1 self)
With the increased performance capabilities of desktop computers, networked computing has become a popular vehicle for using parallelism to solve a variety of computationally intense problems. However, node heterogeneity and high communication costs may limit performance unless the problem space is carefully partitioned across the network in a way that considers both the capabilities of the machines and the high network communication costs. We describe an advisory system that is designed to help the programmer, compiler, or run-time environment choose the best decomposition strategy for partitioning specific data-parallel applications across a given collection of machines. The system includes provisions for assessing the capabilities of the participating machines and the network in light of the current workload. Given information about the problem space, the machine speeds, and the network, the system provides a ranking of three standard partitioning methods. We test the validity of our system by comparing the observed relative performance with predicted relative performance of different data decompositions on a program with a variable number of floating point operations and a 5-point stencil communication pattern.
Data Partitioning for Networked Parallel Processing
, 1993
Cited by 10 (5 self)
The workstation model of parallel processing presents specific challenges caused by the latency of the communications network and the workload imbalance that arises from the heterogeneity of the nodes. Data partitioning is critically important for parallel processing in this environment. We mathematically characterize the communication costs for four data decomposition schemes: scatter, contiguous point, contiguous row, and block. These methods are analyzed in terms of problem size, number of processors, network speed, and communication pattern. Bounds are established for the performance of these decomposition schemes that can be used to make better-informed data partitioning decisions. 1 Introduction The wide availability and improving speed of workstations make clusters of these machines an attractive alternative to traditional parallel computers. However, the virtual multicomputer formed by such a network presents challenges unique to this architecture. While data partitioning is a...
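The communication side of these four schemes can be compared by counting the grid edges whose two endpoints belong to different processors, for a 5-point stencil on an `n × n` grid. Each such edge means one value crosses the network in each direction per iteration. This sketch makes three assumptions: processors are homogeneous with equal-sized pieces, the scatter and contiguous-point schemes use row-major ordering, and the processor count is a perfect square for the block scheme. It counts edges by brute force and does not reproduce the paper's analytical bounds. The helper names are hypothetical.

```python
import math


def cut_edges(n, owner):
    """Count stencil edges (right and down neighbors) that cross processors."""
    cut = 0
    for i in range(n):
        for j in range(n):
            if j + 1 < n and owner(i, j) != owner(i, j + 1):
                cut += 1
            if i + 1 < n and owner(i, j) != owner(i + 1, j):
                cut += 1
    return cut


def compare_schemes(n, p):
    q = math.isqrt(p)
    if q * q != p:
        raise ValueError("block scheme here assumes p is a perfect square")
    schemes = {
        # point k = i*n + j goes to processor k mod p
        "scatter": lambda i, j: (i * n + j) % p,
        # equal consecutive runs of points in row-major order
        "contiguous point": lambda i, j: (i * n + j) * p // (n * n),
        # equal consecutive runs of whole rows
        "contiguous row": lambda i, j: i * p // n,
        # q x q grid of square blocks
        "block": lambda i, j: (i * q // n) * q + (j * q // n),
    }
    return {name: cut_edges(n, f) for name, f in schemes.items()}


print(compare_schemes(8, 4))
```

For `compare_schemes(8, 4)` the counts are 56 (scatter), 24 (contiguous point and contiguous row, which coincide here because each piece is exactly two rows), and 16 (block).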
Load-Balancing Iterative Computations on Heterogeneous Clusters
Cited by 9 (2 self)
We focus on mapping iterative algorithms onto heterogeneous clusters. The application data is partitioned over the processors, which are arranged along a virtual ring. At each iteration, independent calculations are carried out in parallel, and some communications take place between consecutive processors in the ring. The question is to determine how to slice the application data into chunks, and assign these chunks to the processors, so that the total execution time is minimized. A major
Supporting Dynamic Space-sharing on Clusters of Non-dedicated Workstations
- In Proceedings of the 17th International conference on distributed computing
, 1997
Cited by 7 (4 self)
Clusters of workstations are increasingly being viewed as a cost-effective alternative to parallel supercomputers. However, resource management and scheduling on workstation clusters is complicated by the fact that the number of idle workstations available for executing parallel applications is constantly fluctuating. In this paper, we present a case for scheduling parallel applications on non-dedicated workstation clusters using dynamic space-sharing, a policy under which the number of processors allocated to an application can be changed during its execution. We describe an approach that uses application-level checkpointing and data repartitioning for supporting dynamic space-sharing and for handling the dynamic reconfiguration triggered when failure or owner activity is detected on a workstation being used by a parallel application. The performance advantages of dynamic space-sharing are quantified through a simulation study, and experimental results are presented for the overhead o...
A Decomposition Advisory System for Heterogeneous Data-Parallel Processing
, 1994
Cited by 6 (1 self)
Networked computing has become a popular method for using parallelism to solve a variety of computationally intense problems. However, high communication costs and processor heterogeneity may limit performance unless the problem space is carefully partitioned. We propose a decomposition advisory system that is designed to help choose the best data partitioning strategy. The goal of this research is to determine the partitioning scheme(s) expected to yield the best performance for a particular data-parallel problem with known regular communication patterns on a collection of heterogeneous processors. Given information about the problem space and the network, the system provides a ranking of standard partitioning methods. 1 Introduction High performance computing, once only within the scope of supercomputers and expensive parallel machines, has become attainable through the use of networks of independent, possibly heterogeneous, computers. However, heterogeneous processing presents a n...
Data Redistribution Algorithms For Heterogeneous Processor Rings
, 2004
Cited by 6 (4 self)
We consider the problem of redistributing data on homogeneous and heterogeneous rings of processors. The problem arises in several applications, each time after a load-balancing mechanism is invoked (but we do not discuss the load-balancing mechanism itself). We provide algorithms that aim at optimizing the data redistribution, both for unidirectional and bidirectional rings, and we give complete proofs of correctness. One major contribution of the paper is that we are able to prove the optimality of the proposed algorithms in all cases except that of a bidirectional heterogeneous ring, for which the problem remains open.
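On a unidirectional ring, the amount each processor must forward to its successor follows from flow conservation. If processor `i` receives `f[i-1]` and sends `f[i]`, then `f[i] = f[i-1] + surplus[i]`. Adding a constant to every flow keeps the balance, so the elementwise smallest valid flows are the prefix sums of the surpluses, shifted so that their minimum is zero. This is a minimal sketch of that volume computation only, and the function name is hypothetical. It ignores link heterogeneity, bidirectional rings, and the step-by-step scheduling that the paper's algorithms address.

```python
from itertools import accumulate


def ring_flows(current, target):
    """Amount processor i forwards to processor (i + 1) % p on a one-way ring."""
    if sum(current) != sum(target):
        raise ValueError("redistribution must conserve the total load")
    surplus = [c - t for c, t in zip(current, target)]
    prefix = list(accumulate(surplus))  # f[i] = constant + prefix[i]
    # the smallest constant keeping every flow nonnegative makes min(f) == 0
    shift = -min(prefix)
    return [x + shift for x in prefix]


print(ring_flows([6, 2, 2, 2], [3, 3, 3, 3]))
```

For example, moving from loads `[6, 2, 2, 2]` to `[3, 3, 3, 3]` gives flows `[3, 2, 1, 0]`. Processor 0 forwards three units, and processor 3 forwards nothing back to processor 0.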
Non-Uniform 2-D Grid Partitioning for Heterogeneous Parallel Architectures
- In Proceedings of the 9th International Parallel Processing Symposium
, 1995
Cited by 6 (0 self)
Numerous applications in science and engineering have a problem space that can be represented as a 2-dimensional grid. While some of these problems exhibit uniform computational requirements over all regions of the grid, others are non-uniform: that is, some regions of the grid have more data points than others. We introduce a new block decomposition method, Fair Binary Recursive Decomposition (FBRD), which is suitable for a collection of heterogeneous processors, and extend it to accommodate nonuniform problems (NUFBRD). Mathematical comparisons of the NUFBRD method and other common partitioning schemes are presented to show the expected performance level of this new decomposition technique. 1 Introduction Numerous problems that arise in science and engineering can be visualized as a 2-dimensional grid where the values of the individual elements vary over time in response to the values of some subset of nearest neighbors. Examples include thermal conduction, fluid dynamics, oceanogra...
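The general idea of binary recursive decomposition for heterogeneous processors can be sketched as weighted recursive bisection. Split the processor list in two, cut the current rectangle across its longer side in proportion to the two groups' total power, and recurse. This is an illustration in the spirit of FBRD, not the paper's algorithm. How FBRD groups processors, rounds to whole grid points, and weights nonuniform regions (NUFBRD) may differ, and the function name is hypothetical.

```python
def recursive_bisection(rect, powers):
    """Return one (x, y, w, h) rectangle per processor, area proportional to power."""
    if not powers:
        return []
    x, y, w, h = rect
    if len(powers) == 1:
        return [rect]
    half = len(powers) // 2            # split the processor list by count
    left, right = powers[:half], powers[half:]
    frac = sum(left) / sum(powers)     # share of the area for the left group
    if w >= h:                         # cut the longer side to keep pieces compact
        wl = w * frac
        return (recursive_bisection((x, y, wl, h), left)
                + recursive_bisection((x + wl, y, w - wl, h), right))
    hl = h * frac
    return (recursive_bisection((x, y, w, hl), left)
            + recursive_bisection((x, y + hl, w, h - hl), right))


print(recursive_bisection((0, 0, 1, 1), [1, 1, 2]))
```

With powers `[1, 1, 2]` on the unit square, the resulting rectangles have areas 0.25, 0.25, and 0.5.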

