Bandwidth-Centric Allocation of Independent Tasks on Heterogeneous Platforms
 In International Parallel and Distributed Processing Symposium (IPDPS’2002). IEEE Computer
, 2001
Cited by 79 (29 self)
In this paper, we consider the problem of allocating a large number of independent, equal-sized tasks to a heterogeneous "grid" computing platform. Such problems arise in collaborative computing efforts like SETI@home. We use a tree to model a grid, where resources can have different speeds of computation and communication, as well as different overlap capabilities. We define a base model, and show how to determine the maximum steady-state throughput of a node in the base model, assuming we already know the throughput of the subtrees rooted at the node's children. Thus, a bottom-up traversal of the tree determines the rate at which tasks can be processed in the full tree. The best allocation is bandwidth-centric: if enough bandwidth is available, then all nodes are kept busy; if bandwidth is limited, then tasks should be allocated only to the children which have sufficiently small communication times, regardless of their computation power. We then show how nodes with other capabilities (ones that allow more or less overlapping of computation and communication than the base model) can be transformed to equivalent nodes in the base model. We also show how to handle a more general communication model. Finally, we present simulation results of several demand-driven task allocation policies that show that our bandwidth-centric method obtains better results than allocating tasks to all processors on a first-come, first-served basis. Key words: heterogeneous computer, allocation, scheduling, grid, metacomputing. Corresponding author: Jeanne Ferrante. The work of Larry Carter and Jeanne Ferrante was performed while visiting LIP.
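The bandwidth-centric rule described in this abstract can be illustrated with a small sketch. It is not the authors' code. It assumes a single-level tree in which the parent computes and communicates at the same time but sends to only one child at a time; `w0` is the parent's compute time per task, and each child is a hypothetical pair `(c, w)` giving the time to send it one task and its time to compute one task. Under those assumptions, serving children in order of increasing communication time until the parent's link is saturated maximizes the steady-state task rate. The function name and the data layout are illustrative choices.

```python
from fractions import Fraction


def fork_throughput(w0, children):
    """Steady-state tasks per time unit for a parent with one level of children.

    w0       -- parent's compute time per task (assumed parameter)
    children -- list of (c, w): send time per task, compute time per task
    Returns (total_rate, per_child_rates), children ordered by increasing c.
    """
    total = Fraction(1) / w0          # the parent keeps computing on its own
    link = Fraction(1)                # fraction of the parent's link still free
    rates = []
    # Bandwidth-centric order: cheapest communication first, whatever the speed.
    for c, w in sorted(children, key=lambda cw: cw[0]):
        # A child is limited by its own speed and by the remaining link time.
        rate = min(Fraction(1) / w, link / c)
        rates.append(((c, w), rate))
        total += rate
        link -= rate * c              # link time consumed by this child
    return total, rates


total, rates = fork_throughput(2, [(1, 3), (2, 1), (1, 4)])
for child, rate in rates:
    print(child, rate)
print("total rate:", total)
```

In this example the fastest child, `(2, 1)`, is served last and only partially, because its link is the most expensive one. That is the sense in which the allocation ignores computation power when bandwidth is the limiting resource.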
Parallel data mining for association rules on shared-memory multiprocessors
 In Proc. Supercomputing’96
, 1996
Cited by 75 (19 self)
In this paper we present a new parallel algorithm for data mining of association rules on shared-memory multiprocessors. We study the degree of parallelism, synchronization, and data locality issues, and present optimizations for fast frequency computation. Experiments show that a significant improvement of performance is achieved using our proposed optimizations. We also achieved good speedup for the parallel algorithm. A lot of data-mining tasks (e.g. association rules, sequential patterns) use complex pointer-based data structures (e.g. hash trees) that typically suffer from suboptimal data locality. In the multiprocessor case, shared access to these data structures may also result in false sharing. For these tasks it is commonly observed that the recursive data structure is built once and accessed multiple times during each iteration. Furthermore, the access patterns after the build phase are highly ordered. In such cases, locality- and false-sharing-sensitive memory placement of these structures can enhance performance significantly. We evaluate a set of placement policies for parallel association discovery, and show that simple placement schemes can improve execution time by more than a factor of two. More complex schemes yield additional gains.
A proposal for a heterogeneous cluster ScaLAPACK (dense linear solvers)
, 2001
Cited by 54 (25 self)
In this paper, we study the implementation of dense linear algebra kernels, such as matrix multiplication or linear system solvers, on heterogeneous networks of workstations. The uniform block-cyclic data distribution scheme commonly used for homogeneous collections of processors limits the performance of these linear algebra kernels on heterogeneous grids to the speed of the slowest processor. We present and study more sophisticated data allocation strategies that balance the load on heterogeneous platforms with respect to the performance of the processors. When targeting unidimensional grids, the load-balancing problem can be solved rather easily. When targeting two-dimensional grids, which are the key to scalability and efficiency for numerical kernels, the problem turns out to be surprisingly difficult. We formally state the 2D load-balancing problem and prove its NP-completeness. Next, we introduce a data allocation heuristic, which turns out to be very satisfactory: its practical usefulness is demonstrated by MPI experiments conducted with a heterogeneous network of workstations.
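To make the unidimensional case concrete, here is a minimal sketch of one simple way to balance blocks over processors of different speeds. It is a generic greedy scheme and not necessarily the strategy used in the paper. It assumes each processor has a known, fixed cycle time (time to process one block); the name `distribute_blocks` and its parameters are hypothetical. Each block goes to the processor that would finish earliest after receiving it, which keeps the slowest finishing time as small as possible under these assumptions.

```python
import heapq


def distribute_blocks(cycle_times, n_blocks):
    """Assign n_blocks to processors with the given per-block cycle times.

    Returns a list of block counts, one per processor, in input order.
    """
    counts = [0] * len(cycle_times)
    # Heap key: finishing time of a processor if it received one more block.
    heap = [(t, i) for i, t in enumerate(cycle_times)]
    heapq.heapify(heap)
    for _ in range(n_blocks):
        _, i = heapq.heappop(heap)
        counts[i] += 1
        heapq.heappush(heap, (cycle_times[i] * (counts[i] + 1), i))
    return counts


counts = distribute_blocks([3, 5, 8], 10)
print(counts)  # [5, 3, 2]
print(max(c * t for c, t in zip(counts, [3, 5, 8])))  # 16
```

A uniform distribution of the same 10 blocks would be bounded by the slowest processor (for instance 4 blocks at cycle time 8, finishing at time 32), which is the limitation the abstract attributes to uniform block-cyclic schemes.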
Matrix Multiplication on Heterogeneous Platforms
, 2001
Cited by 45 (17 self)
In this paper, we address the issue of implementing matrix multiplication on heterogeneous platforms. We target two different classes of heterogeneous computing resources: heterogeneous networks of workstations and collections of heterogeneous clusters. Intuitively, the problem is to load balance the work across resources of different speeds while minimizing the communication volume. We formally state this problem in a geometric framework and prove its NP-completeness. Next, we introduce a (polynomial) column-based heuristic, which turns out to be very satisfactory: we derive a theoretical performance guarantee for the heuristic and we assess its practical usefulness through MPI experiments.
Adaptive parallel computing on heterogeneous networks with mpC
 Parallel Computing
, 2002
Cited by 37 (21 self)
The paper presents a new advanced version of the mpC parallel language. The language was designed specially for programming high-performance parallel computations on heterogeneous networks of computers. The advanced version allows the programmer to define at runtime all the main features of the underlying parallel algorithm, which have an impact on the application execution performance. The mpC programming system uses this information along with the information about the performance of the executing network to map the processes of the parallel program to this network so as to achieve better execution time.
Algorithmic Issues on Heterogeneous Computing Platforms
, 1998
Cited by 17 (10 self)
This paper discusses some algorithmic issues when computing with a heterogeneous network of workstations (the typical poor man's parallel computer). Dealing with processors of different speeds requires more involved strategies than block-cyclic data distributions. Dynamic data distribution is a first possibility but may prove impractical and not scalable due to communication and control overhead. Static data distributions tuned to balance execution times constitute another possibility but may prove inefficient due to variations in the processor speeds (e.g. because of different workloads during the computation). We introduce a static distribution strategy that can be refined on the fly, and we show that it is well suited to parallelizing scientific computing applications such as finite-difference stencils or LU decomposition.
Static Tiling for Heterogeneous Computing Platforms
, 1999
Cited by 15 (7 self)
In the framework of fully permutable loops, tiling has been extensively studied as a source-to-source program transformation. However, little work has been devoted to the mapping and scheduling of the tiles on physical processors. Moreover, targeting heterogeneous computing platforms has, to the best of our knowledge, never been considered. In this paper we extend static tiling techniques to the context of limited computational resources with different-speed processors. In particular, we present efficient scheduling and mapping strategies that are asymptotically optimal. The practical usefulness of these strategies is fully demonstrated by MPI experiments on a heterogeneous network of workstations. Key words: tiling, communication-computation overlap, mapping, limited resources, different-speed processors, heterogeneous networks. Corresponding author: Yves Robert, LIP, Ecole Normale Superieure de Lyon, 69364 Lyon Cedex 07, France. Phone: + 33 4 72 72 80 37, Fax: + 33 4 72 72 80 80, Email: Y...
Load-Balancing Scatter Operations for Grid Computing
 IN 12TH HETEROGENEOUS COMPUTING WORKSHOP (HCW’2003). IEEE CS
, 2003
Cited by 11 (2 self)
We present solutions to statically load-balance scatter operations in parallel codes run on Grids. Our load-balancing strategy is based on the modification of the data distributions used in scatter operations. We need to modify the user source code, but we want to keep the code as close as possible to the original. Hence, we study the replacement of scatter operations with a parameterized scatter, allowing a custom distribution of data. The paper presents: 1) a general algorithm which finds an optimal distribution of data across processors; 2) a quicker guaranteed heuristic relying on hypotheses on communications and computations; 3) a policy on the ordering of the processors. Experimental results with an MPI scientific code of seismic tomography illustrate the benefits obtained from our load-balancing.
Static Scheduling Strategies for Heterogeneous Systems
, 2002
Cited by 10 (1 self)
In this paper, we consider static scheduling techniques for heterogeneous systems, such as clusters and grids. We successively deal with minimum makespan scheduling, divisible load scheduling and steady-state scheduling. Finally,
Algorithmic Issues for (Distributed) Heterogeneous Computing Platforms
 In Rajkumar Buyya and Toni Cortes, editors, Cluster Computing Technologies, Environments, and Applications (CCTEA'99). CSREA
, 1999
Cited by 9 (7 self)
Future computing platforms will be distributed and heterogeneous. Such platforms range from heterogeneous networks of workstations (NOWs) to collections of NOWs and parallel servers scattered throughout the world and linked through high-speed networks. Implementing tightly coupled algorithms on such platforms raises several challenging issues. New data distribution and load balancing strategies are required to squeeze the most out of heterogeneous platforms. In this paper, we first summarize previous results obtained for heterogeneous NOWs, dealing with the implementation of standard numerical kernels such as finite-difference stencils or dense linear solvers. Next we target distributed collections of heterogeneous NOWs, and we discuss data allocation strategies for dense linear solvers on top of such platforms. These results indicate that a major algorithmic and software effort is needed to come up with efficient numerical libraries on the computational grid. Keywords: metacomputing, heter...