Results 1  10
of
22
BandwidthCentric Allocation of Independent Tasks on Heterogeneous Platforms
 In International Parallel and Distributed Processing Symposium (IPDPS’2002). IEEE Computer
, 2001
"... In this paper, we consider the problem of allocating a large number of independent, equalsized tasks to a heterogenerous "grid" computing platform. Such problems arise in collaborative computing eorts like SETI@home. We use a tree to model a grid, where resources can have dierent speeds of comput ..."
Abstract

Cited by 76 (28 self)
 Add to MetaCart
In this paper, we consider the problem of allocating a large number of independent, equalsized tasks to a heterogenerous "grid" computing platform. Such problems arise in collaborative computing eorts like SETI@home. We use a tree to model a grid, where resources can have dierent speeds of computation and communication, as well as dierent overlap capabilities. We dene a base model, and show how to determine the maximum steadystate throughput of a node in the base model, assuming we already know the throughput of the subtrees rooted at the node's children. Thus, a bottomup traversal of the tree determines the rate at which tasks can be processed in the full tree. The best allocation is bandwidthcentric: if enough bandwidth is available, then all nodes are kept busy; if bandwidth is limited, then tasks should be allocated only to the children which have suciently small communication times, regardless of their computation power. We then show how nodes with other capabilities ones that allow more or less overlapping of computation and communication than the base model can be transformed to equivalent nodes in the base model. We also show how to handle a more general communication model. Finally, we present simulation results of several demanddriven task allocation policies that show that our bandwidthcentric method obtains better results than allocating tasks to all processors on a rstcome, rst serve basis. Key words: heterogeneous computer, allocation, scheduling, grid, metacomputing. Corresponding author: Jeanne Ferrante The work of Larry Carter and Jeanne Ferrante was performed while visiting LIP. 1 1
Parallel data mining for association rules on sharedmemory multiprocessors
 In Proc. Supercomputing’96
, 1996
"... Abstract. In this paper we present a new parallel algorithm for data mining of association rules on sharedmemory multiprocessors. We study the degree of parallelism, synchronization, and data locality issues, and present optimizations for fast frequency computation. Experiments show that a signific ..."
Abstract

Cited by 72 (19 self)
 Add to MetaCart
Abstract. In this paper we present a new parallel algorithm for data mining of association rules on sharedmemory multiprocessors. We study the degree of parallelism, synchronization, and data locality issues, and present optimizations for fast frequency computation. Experiments show that a significant improvement of performance is achieved using our proposed optimizations. We also achieved good speedup for the parallel algorithm. A lot of datamining tasks (e.g. association rules, sequential patterns) use complex pointerbased data structures (e.g. hash trees) that typically suffer from suboptimal data locality. In the multiprocessor case shared access to these data structures may also result in false sharing. For these tasks it is commonly observed that the recursive data structure is built once and accessed multiple times during each iteration. Furthermore, the access patterns after the build phase are highly ordered. In such cases locality and false sharing sensitive memory placement of these structures can enhance performance significantly. We evaluate a set of placement policies for parallel association discovery, and show that simple placement schemes can improve execution time by more than a factor of two. More complex schemes yield additional gains.
A proposal for a heterogeneous cluster ScaLAPACK (dense linear solvers)
, 2001
"... In this paper, we study the implementation of dense linear algebra kernels, such as matrix multiplication or linear system solvers, on heterogeneous networks of workstations. The uniform blockcyclic data distribution scheme commonly used for homogeneous collections of processors limits the perform ..."
Abstract

Cited by 50 (25 self)
 Add to MetaCart
In this paper, we study the implementation of dense linear algebra kernels, such as matrix multiplication or linear system solvers, on heterogeneous networks of workstations. The uniform blockcyclic data distribution scheme commonly used for homogeneous collections of processors limits the performance of these linear algebra kernels on heterogeneous grids to the speed of the slowest processor. We present and study more sophisticated data allocation strategies that balance the load on heterogeneous platforms with respect to the performance of the processors. When targeting unidimensional grids, the loadbalancing problem can be solved rather easily. When targeting twodimensional grids, which are the key to scalability and efficiency for numerical kernels, the problem turns out to be surprisingly difficult. We formally state the 2D loadbalancing problem and prove its NPcompleteness. Next, we introduce a data allocation heuristic, which turns out to be very satisfactory: Its practical usefulness is demonstrated by MPI experiments conducted with a heterogeneous network of workstations.
Matrix Multiplication on Heterogeneous Platforms
, 2001
"... this paper, we address the issue of implementing matrix multiplication on heterogeneous platforms. We target two different classes of heterogeneous computing resources: heterogeneous networks of workstations and collections of heterogeneous clusters. Intuitively, the problem is to load balance the ..."
Abstract

Cited by 37 (17 self)
 Add to MetaCart
this paper, we address the issue of implementing matrix multiplication on heterogeneous platforms. We target two different classes of heterogeneous computing resources: heterogeneous networks of workstations and collections of heterogeneous clusters. Intuitively, the problem is to load balance the work with different speed resources while minimizing the communication volume. We formally state this problem in a geometric framework and prove its NPcompleteness. Next, we introduce a (polynomial) columnbased heuristic, which turns out to be very satisfactory: We derive a theoretical performance guarantee for the heuristic and we assess its practical usefulness through MPI experiments
Adaptive parallel computing on heterogeneous networks with mpC
 Parallel Computing
, 2002
"... The paper presents a new advanced version of the mpC parallel language. The language was designed specially for programming highperformance parallel computations on heterogeneous networks of computers. The advanced version allows the programmer to define at runtime all the main features of the unde ..."
Abstract

Cited by 16 (11 self)
 Add to MetaCart
The paper presents a new advanced version of the mpC parallel language. The language was designed specially for programming highperformance parallel computations on heterogeneous networks of computers. The advanced version allows the programmer to define at runtime all the main features of the underlying parallel algorithm, which have an impact on the application execution performance. The mpC programming system uses this information along with the information about the performance of the executing network to map the processes of the parallel program to this network so as to achieve better execution time.
Algorithmic Issues on Heterogeneous Computing Platforms
, 1998
"... This paper discusses some algorithmic issues when computing with a heterogeneous network of workstations (the typical poor man's parallel computer). Dealing with processors of dierent speeds requires to use more involved strategies than blockcyclic data distributions. Dynamic data distribution is ..."
Abstract

Cited by 14 (9 self)
 Add to MetaCart
This paper discusses some algorithmic issues when computing with a heterogeneous network of workstations (the typical poor man's parallel computer). Dealing with processors of dierent speeds requires to use more involved strategies than blockcyclic data distributions. Dynamic data distribution is a rst possibility but may prove impractical and not scalable due to communication and control overhead.
Static Tiling for Heterogeneous Computing Platforms
, 1999
"... In the framework of fully permutable loops, tiling has been extensively studied as a sourceto source program transformation. However, little work has been devoted to the mapping and scheduling of the tiles on physical processors. Moreover, targeting heterogeneous computing platforms has, to the bes ..."
Abstract

Cited by 13 (6 self)
 Add to MetaCart
In the framework of fully permutable loops, tiling has been extensively studied as a sourceto source program transformation. However, little work has been devoted to the mapping and scheduling of the tiles on physical processors. Moreover, targeting heterogeneous computing platforms has, to the best of our knowledge, never been considered. In this paper we extend static tiling techniques to the context of limited computational resources with dierentspeed processors. In particular, we present eÆcient scheduling and mapping strategies that are asymptotically optimal. The practical usefulness of these strategies is fully demonstrated by MPI experiments on a heterogeneous network of workstations. Key words: tiling, communicationcomputation overlap, mapping, limited resources, dierentspeed processors, heterogeneous networks Corresponding author: Yves Robert LIP, Ecole Normale Superieure de Lyon, 69364 Lyon Cedex 07, France Phone: + 33 4 72 72 80 37, Fax: + 33 4 72 72 80 80 Email: Y...
LoadBalancing Scatter Operations for Grid Computing
 IN 12TH HETEROGENEOUS COMPUTING WORKSHOP (HCW’2003). IEEE CS
, 2003
"... We present solutions to statically loadbalance scatter operations in parallel codes run on Grids. Our loadbalancing strategy is based on the modification of the data distributions used in scatter operations. We need to modify the user source code, but we want to keep the code as close as possible t ..."
Abstract

Cited by 10 (2 self)
 Add to MetaCart
We present solutions to statically loadbalance scatter operations in parallel codes run on Grids. Our loadbalancing strategy is based on the modification of the data distributions used in scatter operations. We need to modify the user source code, but we want to keep the code as close as possible to the original. Hence, we study the replacement of scatter operations with a parameterized scatter, allowing a custom distribution of data. The paper presents: 1) a general algorithm which finds an optimal distribution of data across processors; 2) a quicker guaranteed heuristic relying on hypotheses on communications and computations; 3) a policy on the ordering of the processors. Experimental results with an MPI scientific code of seismic tomography illustrate the benefits obtained from our loadbalancing.
LoadBalancing Iterative Computations on Heterogeneous Clusters
"... We focus on mapping iterative algorithms onto heterogeneous clusters. The application data is partitioned over the processors, which are arranged along a virtual ring. At each iteration, independent calculations are carried out in parallel, and some communications take place between consecutive p ..."
Abstract

Cited by 9 (2 self)
 Add to MetaCart
We focus on mapping iterative algorithms onto heterogeneous clusters. The application data is partitioned over the processors, which are arranged along a virtual ring. At each iteration, independent calculations are carried out in parallel, and some communications take place between consecutive processors in the ring. The question is to determine how to slice the application data into chunks, and assign these chunks to the processors, so that the total execution time is minimized. A major
Algorithms and Tools for (Distributed) Heterogeneous Computing: A Prospective Report
, 1999
"... We discuss algorithms and tools to help program and use metacomputing resources in the forthcoming years. Metacomputing with highly distributed heterogeneous environments stands to become a major, if not dominant, method to implement all kinds of parallel applications. In this report, we survey some ..."
Abstract

Cited by 6 (1 self)
 Add to MetaCart
We discuss algorithms and tools to help program and use metacomputing resources in the forthcoming years. Metacomputing with highly distributed heterogeneous environments stands to become a major, if not dominant, method to implement all kinds of parallel applications. In this report, we survey some general aspects of metacomputing (hardware, system and administration issues, as well as the application eld). Next we identify some algorithmic issues and software challenges that must be solved to eÆciently program and/or transparently use such platforms: Data decomposition techniques for cluster computing, Granularity issues for metacomputing, Scheduling and loadbalancing methods, Programming models. We illustrate each of these issues and challenges by the analysis of several case studies: Cluster ScaLAPACK, AppLeS, Globus, Legion, Albatross and Netsolve. We conclude this report by stating some nal remarks and recommendations. mbox Acknowledgments: This research report is...