Results 1 - 5 of 5
On Supernode Transformation with Minimized Total Running Time
"... With the objective of minimizing the total execution time of a parallel program on a distributed memory parallel computer, this paper discusses how to find an optimal supernode size and optimal supernode relative side lengths of a supernode transformation (also known as tiling). We identify three ..."
Abstract

Cited by 26 (3 self)
With the objective of minimizing the total execution time of a parallel program on a distributed memory parallel computer, this paper discusses how to find an optimal supernode size and optimal supernode relative side lengths of a supernode transformation (also known as tiling). We identify three parameters of a supernode transformation: supernode size, relative side lengths, and cutting hyperplane directions. For algorithms with perfectly nested loops and uniform dependencies, for sufficiently large supernodes and numbers of processors, and for the case where multiple supernodes are mapped to a single processor, we give an order-n polynomial whose real positive roots include the optimal supernode size. For two special cases: (1) two dimensional algorithm problems and (2) n-dimensional algorithm problems where the communication cost is dominated by the startup penalty and can therefore be approximated by a constant, we give a closed form expression for the optimal supernode s...
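The size/startup tradeoff behind this abstract can be illustrated with a toy one-dimensional cost model. This is a sketch only, not the paper's order-n polynomial: the wavefront-style formula and the parameters `N`, `P`, `t_comp`, and `t_startup` are assumptions made here for illustration.

```python
# Toy 1-D tiling cost model: N iterations tiled with grain size g, executed
# by P processors in a wavefront. Each wavefront step computes one tile
# (g * t_comp) and pays a constant per-message startup penalty t_startup.
def total_time(g, N=10_000, P=16, t_comp=1.0, t_startup=200.0):
    steps = N / g + (P - 1)          # tiles per processor plus pipeline fill
    return steps * (g * t_comp + t_startup)

# Brute-force search for the grain size minimizing total running time:
# small tiles pay the startup penalty too often, large tiles serialize.
best_g = min(range(1, 1001), key=total_time)
print(best_g, total_time(best_g))
```

Even in this crude model the optimum is interior: the analytic minimizer of `(N/g + P - 1)(g*t_comp + t_startup)` lies at `g = sqrt(N*t_startup / ((P-1)*t_comp))`, which the scan recovers.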
Optimal Grain Size Computation for Pipelined Algorithms
 In Europar'96 Parallel Processing
, 1996
"... . In this paper, we present a method for overlapping communications on parallel computers for pipelined algorithms. We first introduce a general theoretical model which leads to a generic computation scheme for the optimal packet size. Then, we use the OPIUM 3 library, which provides an easytous ..."
Abstract

Cited by 19 (3 self)
In this paper, we present a method for overlapping communications on parallel computers for pipelined algorithms. We first introduce a general theoretical model which leads to a generic computation scheme for the optimal packet size. Then, we use the OPIUM library, which provides an easy-to-use and efficient way to compute this optimal packet size in the general case, on the column LU factorization; the implementation and performance measurements are made on an Intel Paragon.

Keywords: Communications overlap, pipelined algorithms, optimal packet size computation.

1 Introduction

Parallel distributed memory machines improve performance and memory capacity, but their use adds an overhead due to communications. To obtain programs that perform and scale well, this overhead must be hidden. Several solutions exist. The choice of a good data distribution is the first step that can be done to lower the number and the size of communications. Depending on the dependences within the cod...
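A standard pipelined-communication model gives a feel for the optimal packet size the abstract refers to. The formula below is a textbook assumption, not OPIUM's actual scheme: a message of `M` words is relayed along a pipeline of `P - 1` links in packets of `b` words, each packet paying a startup `beta` plus `b * tau` transfer time.

```python
import math

# Assumed pipelined-communication cost model (not OPIUM's exact formula):
# M words relayed through a P-stage pipeline in packets of b words.
def pipeline_time(b, M=4096, P=8, beta=100.0, tau=1.0):
    steps = M / b + (P - 2)          # number of packets plus pipeline fill
    return steps * (beta + b * tau)

# Closed-form minimizer, obtained by setting dT/db = 0:
# small packets multiply startups, large packets delay the pipeline.
def optimal_packet(M=4096, P=8, beta=100.0, tau=1.0):
    return math.sqrt(M * beta / ((P - 2) * tau))

# Cross-check the closed form against a brute-force scan.
best_b = min(range(1, 4097), key=pipeline_time)
print(best_b, optimal_packet())
```

The closed form and the exhaustive scan agree to within rounding, which is the kind of "generic computation scheme" such models yield.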
On Optimal Size And Shape Of Supernode Transformations
"... Supernode transformation has been proposed to reduce the communication startup cost by grouping a number of iterations in a perfectly nested loop with uniform dependencies as a supernode which is assigned to a processor as a single unit. A supernode transformation is specified by n families of hyper ..."
Abstract

Cited by 5 (4 self)
Supernode transformation has been proposed to reduce the communication startup cost by grouping a number of iterations in a perfectly nested loop with uniform dependencies into a supernode which is assigned to a processor as a single unit. A supernode transformation is specified by n families of hyperplanes which slice the iteration space into parallelepiped supernodes, the grain size of a supernode, and the relative side lengths of the parallelepiped supernode. The total running time is affected by these three factors. This paper addresses how to find an optimal grain size and an optimal relative side length vector, with the goal of minimizing the total running time. Our results show that the optimal grain size is proportional to the ratio of the communication startup cost and the computation speed of the processor, and that the optimal supernode shape is similar to the shape of the index space, in the case of hypercube index spaces and supernodes.

1 INTRODUCTION

Supernode partition...
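The qualitative part of this result, that the optimal grain grows with the startup-cost-to-compute-speed ratio, can be checked numerically in a toy wavefront model. The model below is an assumed stand-in for the paper's cost function, not the paper's own expression, so it only illustrates the direction of the effect.

```python
# Toy wavefront model (an assumption, not the paper's cost function):
# optimal grain size for startup cost c and per-iteration compute time t.
def opt_grain(c, t, N=10_000, P=16):
    def total(g):
        return (N / g + (P - 1)) * (g * t + c)
    return min(range(1, 2001), key=total)

# Sweep the startup cost while holding compute time fixed: the optimal
# grain size grows monotonically with the startup/compute ratio c/t.
grains = [opt_grain(c, 1.0) for c in (50.0, 200.0, 800.0)]
print(grains)
```

In this particular model the growth is as the square root of c/t rather than linear; the exact dependence is model-specific, which is precisely what the paper pins down for its cost function.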
Time Optimal Supernode Shape For Algorithms With n Extreme Dependence Directions
, 1998
"... With the objective of minimizing the total execution time of a parallel program on a distributed memory parallel computer, this paper discusses how to find an optimal supernode shape of a supernode transformation (also known as tiling). We assume the communication cost to be dominated by the startup ..."
Abstract

Cited by 1 (1 self)
With the objective of minimizing the total execution time of a parallel program on a distributed memory parallel computer, this paper discusses how to find an optimal supernode shape of a supernode transformation (also known as tiling). We assume the communication cost is dominated by the startup penalty and can therefore be approximated by a constant. We identify three parameters of a supernode transformation: supernode size, relative side lengths, and cutting hyperplane directions. For algorithms with perfectly nested loops and uniform dependencies, we give a closed form for an optimal linear schedule vector and a necessary and sufficient condition for optimal relative side lengths, and for dependence cones with n extreme directions, we prove that the total running time is minimized for the cutting hyperplane directions corresponding to the n extreme directions of the dependence cone. The results are derived in continuous space and should for that reason be considered approximate. ...
On Supernode Partitioning Hyperplanes for Two Dimensional Algorithms
 Proceedings IASTED
, 1997
"... Supernode transformation has been proposed to reduce the communication startup cost by grouping a number of iterations in a loop as a supernode which is assigned to a processor as a single unit. A supernode transformation is specified by n families of hyperplanes which slice the iteration space into ..."
Abstract

Cited by 1 (1 self)
Supernode transformation has been proposed to reduce the communication startup cost by grouping a number of iterations in a loop into a supernode which is assigned to a processor as a single unit. A supernode transformation is specified by n families of hyperplanes which slice the iteration space into parallelepiped supernodes, the grain size of a supernode, and the relative side lengths of the parallelepiped supernode. The total running time is affected by these three factors. In this paper, we discuss how to find supernode partitioning hyperplanes assuming a given grain size. We prove that for two dimensional algorithms, the supernode partitioning with the two extreme dependence vectors as its hyperplane directions is optimal with respect to the total execution time.

Keywords: Supernode partitioning, tiling, parallelizing compilers, distributed memory multicomputer, minimizing running time.

1 INTRODUCTION

A problem in distributed memory parallel systems is the communication startu...
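In two dimensions, the "extreme dependence vectors" are the edges of the cone spanned by all dependence vectors, so they can be found by sorting the vectors by polar angle. A minimal sketch (the dependence vectors below are hypothetical, and the sort assumes the usual case of lexicographically positive dependences spanning less than a half-plane):

```python
import math

# Find the two extreme directions of a 2-D dependence cone: the dependence
# vectors with the smallest and largest polar angle span the cone, and by
# the paper's result they give the optimal tiling hyperplane directions.
def extreme_directions(deps):
    by_angle = sorted(deps, key=lambda d: math.atan2(d[1], d[0]))
    return by_angle[0], by_angle[-1]

# Hypothetical uniform dependence vectors of a doubly nested loop.
deps = [(1, 0), (1, 1), (2, 1), (0, 1)]
lo, hi = extreme_directions(deps)
print(lo, hi)
```

Here (1, 1) and (2, 1) lie strictly inside the cone spanned by (1, 0) and (0, 1), so only the two extreme vectors matter for choosing the partitioning hyperplanes.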