Results 1 - 9 of 9
Generalized multipartitioning for multidimensional arrays
In Proceedings of the International Parallel and Distributed Processing Symposium, Fort Lauderdale, FL, 2002
Cited by 16 (2 self)
Multipartitioning is a strategy for parallelizing computations that require solving 1D recurrences along each dimension of a multidimensional array. Previous techniques for multipartitioning yield efficient parallelizations over 3D domains only when the number of processors is a perfect square. This paper considers the general problem of computing multipartitionings for d-dimensional data volumes on an arbitrary number of processors. We describe an algorithm that computes an optimal multipartitioning onto all of the processors for this general case. Finally, we describe how we extended the Rice dHPF compiler for High Performance Fortran to generate code that exploits generalized multipartitioning and show that the compiler’s generated code for the NAS SP computational fluid dynamics benchmark achieves scalable high performance.
Data-Parallel Compiler Support for Multipartitioning
2001
Cited by 8 (5 self)
Multipartitioning is a skewed-cyclic block distribution that yields better parallel efficiency and scalability for line-sweep computations than traditional block partitionings. This paper describes extensions to the Rice dHPF compiler for High Performance Fortran that enable it to support multipartitioned data distributions and optimizations that enable dHPF to generate efficient multipartitioned code. We describe experiments applying these techniques to parallelize serial versions of the NAS SP and BT application benchmarks and show that the performance of the code generated by dHPF is approaching that of hand-coded parallelizations based on multipartitioning.
On Efficient Parallelization of Line-Sweep Computations
In 9th Workshop on Compilers for Parallel Computers, 2001
Cited by 6 (5 self)
Multipartitioning is a strategy for partitioning multidimensional arrays among a collection of processors so that line-sweep computations can be performed efficiently. The principal property of a multipartitioned array is that for a line sweep along any array dimension, all processors have the same number of tiles to compute at each step in the sweep. This property results in full, balanced parallelism. A secondary benefit of multipartitionings is that they induce only coarse-grain communication. Previously, computing a d-dimensional multipartitioning required that p^(1/(d−1)) be integral, where p is the number of processors. Here, we describe an algorithm to compute a d-dimensional multipartitioning of an array of ρ dimensions for an arbitrary number of processors, for any d, 2 ≤ d ≤ ρ. When using a multipartitioning to parallelize a line-sweep computation, the best partitioning is the one that exploits all of the processors and has the smallest communication volume. To compute the best multipartitioning of a ρ-dimensional array, we describe a cost model for selecting d, the dimensionality of the best partitioning, and the number of cuts along each partitioned dimension. In practice, our technique will choose a 3-dimensional multipartitioning for a 3-dimensional line-sweep computation, except when p is a prime; previously, a 3-dimensional multipartitioning could be applied only when √p is integral. We describe an implementation of multipartitioning in the Rice dHPF compiler and performance results obtained to parallelize a line-sweep computation on a range of different numbers of processors. ∗This work performed while a visiting scholar at Rice University.
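The balance property stated in this abstract can be illustrated with a small sketch. It is not the algorithm from the paper: it assumes the simplest textbook case, a p x p tile grid on p processors with a diagonal (skewed-cyclic) assignment, and the function names `diagonal_2d`, `block_rows`, and `is_balanced` are hypothetical. The check confirms that every step of a sweep along either dimension gives each processor the same number of tiles, whereas a plain row-block partitioning does not.

```python
from collections import Counter
from itertools import product


def diagonal_2d(p):
    """Assumed diagonal mapping: tile (i, j) of a p x p grid -> processor (j - i) mod p."""
    return {(i, j): (j - i) % p for i, j in product(range(p), repeat=2)}


def block_rows(p):
    """Traditional block partitioning for comparison: processor i owns tile row i."""
    return {(i, j): i for i, j in product(range(p), repeat=2)}


def is_balanced(owner, p, ndim):
    """True if, for a sweep along any dimension, every step (slab of tiles
    sharing one index along that dimension) gives all p processors the
    same number of tiles."""
    for dim in range(ndim):
        slabs = {}
        for tile, proc in owner.items():
            slabs.setdefault(tile[dim], Counter())[proc] += 1
        for counts in slabs.values():
            # Every processor must appear, and all counts must be equal.
            if len(counts) != p or len(set(counts.values())) != 1:
                return False
    return True


if __name__ == "__main__":
    print(is_balanced(diagonal_2d(4), 4, 2))  # diagonal mapping
    print(is_balanced(block_rows(4), 4, 2))   # row-block mapping
```

With the row-block mapping, a sweep along the first dimension leaves only one processor busy at each step, which is the serialization that multipartitioning is meant to avoid.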
A Framework for Integrating Data Alignment, Distribution, and Redistribution in Distributed Memory Multiprocessors
IEEE Trans. Parallel Distributed Systems, 1995
Cited by 6 (0 self)
Parallel architectures with physically distributed memory provide a cost-effective scalability to solve many large-scale scientific problems; however, these systems are very difficult to program and tune. In these systems, the choice of a good data mapping and parallelization strategy can dramatically improve the efficiency of the resulting program. In this paper we present a framework for automatic data mapping in the context of distributed memory multiprocessor systems. The framework is based on a new approach that allows the alignment, distribution and redistribution problems to be solved together using a single graph representation. The Communication-Parallelism Graph (CPG) is the structure that holds symbolic information about the potential data movement and parallelism inherent to the whole program. The CPG is then particularized for a given problem size and target system and used to find a minimal cost path through the graph using a general-purpose linear 0-1 integer programming solver. The data layout strategy generated is optimal according to our current cost and compilation models.
An Evaluation of Data-Parallel Compiler Support for Line-Sweep Applications
Journal of Instruction-Level Parallelism, 2003
Cited by 5 (0 self)
Data-parallel compilers have long aimed to equal the performance of carefully hand-optimized parallel codes. For tightly-coupled applications based on line sweeps, this goal has been particularly elusive. In the Rice dHPF compiler, we have developed a wide spectrum of optimizations that enable us to closely approach hand-coded performance for tightly-coupled line-sweep applications including the NAS SP and BT benchmark codes. From lightly-modified copies of standard serial versions of these benchmarks, dHPF generates MPI-based parallel code that is within 4% of the performance of the hand-crafted MPI implementations of these codes for a 102³ problem size (Class B) on 64 processors. We describe and quantitatively evaluate the impact of partitioning, communication and memory hierarchy optimizations implemented by dHPF that enable us to approach hand-coded performance with compiler-generated parallel code.
Generalized multipartitioning
In Second Annual Los Alamos Computer Science Institute (LACSI) Symposium, 2001
Cited by 2 (0 self)
Multipartitioning is a strategy for partitioning multidimensional arrays among a collection of processors. With multipartitioning, computations that require solving one-dimensional recurrences along each dimension of a multidimensional array can be parallelized effectively. Previous techniques for multipartitioning yield efficient parallelizations over three-dimensional domains only when the number of processors is a perfect square. This paper considers the general problem of computing optimal multipartitionings for d-dimensional data volumes on an arbitrary number of processors. We describe an algorithm that computes an optimal multipartitioning for this general case, which enables multipartitioning to be used for performing efficient parallelizations of line-sweep computations under arbitrary conditions. Finally, we describe a prototype implementation of generalized multipartitioning in the Rice dHPF compiler and performance results obtained when using it to parallelize a line-sweep computation for different numbers of processors.
Efficient parallelization of line-sweep computations
2001
Multipartitioning is a strategy for partitioning multidimensional arrays among a collection of processors so that line-sweep computations can be performed efficiently. The principal property of a multipartitioned array is that for a line sweep along any array dimension, all processors have the same number of tiles to compute at each step in the sweep. This property results in full, balanced parallelism. A secondary benefit of multipartitionings is that they induce only coarse-grain communication. Previously, computing a d-dimensional multipartitioning required that p^(1/(d−1)) be integral, where p is the number of processors. Here, we describe an algorithm to compute a d-dimensional multipartitioning of an array of ρ dimensions for an arbitrary number of processors, for any d, 2 ≤ d ≤ ρ. When using a multipartitioning to parallelize a line-sweep computation, the best partitioning is the one that exploits all of the processors and has the smallest communication volume. To compute the best multipartitioning of a ρ-dimensional array, we describe a cost model for selecting d, the dimensionality of the best partitioning, and the number of cuts along each partitioned dimension. In practice, our technique will choose a 3-dimensional multipartitioning for a 3-dimensional line-sweep computation, except when p is a prime; previously, a 3-dimensional multipartitioning could be applied only when √p is integral. We describe an implementation of multipartitioning in the Rice dHPF compiler and performance results obtained to parallelize a line-sweep computation on a range of different numbers of processors. ∗This work performed while a visiting scholar at Rice University.
Latin Hyper-Rectangles for Efficient Parallelization of Line-Sweep Computations
Multipartitioning is a strategy for partitioning multidimensional arrays among a collection of processors so that line-sweep computations can be performed efficiently. The principal property of a multipartitioned array is that for a line sweep along any array dimension, all processors have the same number of tiles to compute at each step in the sweep; in other words, it describes a latin hyper-rectangle, a natural extension of the notion of latin squares. This property results in full, balanced parallelism. A secondary benefit of multipartitionings is that they induce only coarse-grain communication. All of the multipartitionings described in the literature to date assign only one tile per processor per hyperplane of a multipartitioning (latin hypercube). While this class of multipartitionings is optimal for two dimensions, in three dimensions it requires the number of processors to be a perfect square. This paper considers the general problem of computing optimal multipartitionings for multidimensional data volumes on an arbitrary number of processors. We describe an algorithm to compute a d-dimensional multipartitioning of a multidimensional array for an arbitrary number of processors. When using a multipartitioning to parallelize a line-sweep computation, the best partitioning is the one that exploits all of the processors and has the smallest communication volume. To compute the best multipartitioning of a multidimensional array, we describe a cost model for selecting d, the dimensionality of the best partitioning, and the number of cuts along each partitioned dimension. In practice, our technique will choose a 3-dimensional multipartitioning for a 3-dimensional line-sweep computation, except when p is a prime; previously, a 3-dimensional multipartitioning could be a...
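The "one tile per processor per hyperplane" case in three dimensions can be sketched as follows. This is only an illustration of the latin-hypercube class the abstract refers to, not the generalized algorithm the paper proposes: the diagonal formula in `diagonal_3d` is an assumed standard construction, and both function names are hypothetical. It assumes p = q² processors and a q x q x q tile grid, which is why a perfect square is required.

```python
from itertools import product
from math import isqrt


def diagonal_3d(p):
    """Assumed diagonal mapping for p = q*q processors on a q x q x q tile grid:
    tile (i, j, k) -> processor ((i - k) mod q) * q + ((j - k) mod q)."""
    q = isqrt(p)
    if q * q != p:
        raise ValueError("this construction needs p to be a perfect square")
    return {
        (i, j, k): ((i - k) % q) * q + (j - k) % q
        for i, j, k in product(range(q), repeat=3)
    }


def is_latin(owner, p):
    """True if every axis-aligned hyperplane of tiles contains each of the
    p processors exactly once."""
    for dim in range(3):
        slabs = {}
        for tile, proc in owner.items():
            slabs.setdefault(tile[dim], []).append(proc)
        if any(sorted(procs) != list(range(p)) for procs in slabs.values()):
            return False
    return True


if __name__ == "__main__":
    print(is_latin(diagonal_3d(9), 9))  # 3 x 3 x 3 tiles on 9 processors
```

Calling `diagonal_3d(8)` raises `ValueError`, which mirrors the restriction that the generalized multipartitionings in this listing are meant to lift.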
Scientific Computing Research Environments for the Mathematical Sciences
2001
This report describes the research projects and accomplishments made possible through the availability of the sixteen-processor SGI Origin 2000, purchased in part with funds from NSF SCREMS grant NSF 9872009. To date the SGI Origin 2000 has served as the main computing facility in many interdisciplinary projects involving 48 faculty, research scientists, postdocs, graduate and undergraduate students from six departments at Rice University, as well as several visiting scholars and collaborators from other universities. Computations performed on the SGI Origin 2000 have led to 44 journal articles, proceedings articles, and technical reports. Availability of the SGI Origin 2000 on campus has led to a significant increase in the complexity of the problems we are able to tackle, and it has served as the catalyst for several of the research projects described in the report. The sixteen-processor SGI Origin 2000 continues to be a widely used and important computing resource on campus.