Generalized multipartitioning for multi-dimensional arrays
In Proceedings of the International Parallel and Distributed Processing Symposium, Fort Lauderdale, FL, 2002
Cited by 14 (2 self)
Multipartitioning is a strategy for parallelizing computations that require solving 1D recurrences along each dimension of a multi-dimensional array. Previous techniques for multipartitioning yield efficient parallelizations over 3D domains only when the number of processors is a perfect square. This paper considers the general problem of computing multipartitionings for d-dimensional data volumes on an arbitrary number of processors. We describe an algorithm that computes an optimal multipartitioning onto all of the processors for this general case. Finally, we describe how we extended the Rice dHPF compiler for High Performance Fortran to generate code that exploits generalized multipartitioning and show that the compiler’s generated code for the NAS SP computational fluid dynamics benchmark achieves scalable high performance.
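The phrase "1D recurrences along each dimension" can be made concrete with a minimal Python sketch. It assumes a 2D grid and uses the prefix recurrence x[k] += x[k-1] as a simplified stand-in for the banded linear solves in codes such as NAS SP. The function name `line_sweep` is hypothetical. Each element depends on its predecessor along the sweep axis, so the work along one line is sequential, while separate lines are independent. Under a plain block partitioning along the sweep axis, the processor holding block k must therefore wait for block k-1. Multipartitioning is designed to avoid this serialization.

```python
from typing import List

Grid = List[List[float]]


def line_sweep(grid: Grid, axis: int) -> Grid:
    """Apply the first-order recurrence x[k] += x[k-1] along one axis of a 2D grid.

    Simplified stand-in (an assumption) for the recurrences solved by real
    line-sweep codes: each element depends on its predecessor along `axis`.
    """
    out = [row[:] for row in grid]  # copy so the input is left untouched
    rows, cols = len(out), len(out[0])
    if axis == 0:
        for i in range(1, rows):      # sequential: step i needs step i-1
            for j in range(cols):     # independent lines (columns)
                out[i][j] += out[i - 1][j]
    elif axis == 1:
        for i in range(rows):         # independent lines (rows)
            for j in range(1, cols):  # sequential: step j needs step j-1
                out[i][j] += out[i][j - 1]
    else:
        raise ValueError("axis must be 0 or 1")
    return out
```

For example, `line_sweep([[1, 2], [3, 4]], 0)` gives `[[1, 2], [4, 6]]`, and sweeping along axis 1 gives `[[1, 3], [3, 7]]`.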
Data-Parallel Compiler Support for Multipartitioning
2001
Cited by 9 (6 self)
Multipartitioning is a skewed-cyclic block distribution that yields better parallel efficiency and scalability for line-sweep computations than traditional block partitionings. This paper describes extensions to the Rice dHPF compiler for High Performance Fortran that enable it to support multipartitioned data distributions, and optimizations that enable dHPF to generate efficient multipartitioned code. We describe experiments applying these techniques to parallelize serial versions of the NAS SP and BT application benchmarks and show that the performance of the code generated by dHPF is approaching that of hand-coded parallelizations based on multipartitioning.
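The following minimal sketch illustrates the skewed-cyclic (diagonal) idea. It assumes q tiles per dimension and p = q^(d-1) processors, which is the case where p^(1/(d-1)) is integral. The owner formula is a standard diagonal construction chosen for illustration, not dHPF's code, and the names `diagonal_owner` and `is_balanced` are hypothetical. The check confirms that every slab perpendicular to every axis holds exactly one tile per processor; each such slab is one step of a line sweep.

```python
from collections import Counter
from itertools import product
from typing import Tuple


def diagonal_owner(tile: Tuple[int, ...], q: int) -> int:
    """Owner of a tile in a d-dimensional diagonal multipartitioning.

    Assumes q tiles per dimension and p = q**(d-1) processors. The offsets
    (x_m - x_last) mod q for every coordinate but the last are read as the
    base-q digits of the processor id.
    """
    last = tile[-1]
    pid = 0
    for x in tile[:-1]:
        pid = pid * q + (x - last) % q
    return pid


def is_balanced(d: int, q: int) -> bool:
    """True if every slab perpendicular to every axis has exactly one tile per processor."""
    p = q ** (d - 1)
    for axis in range(d):
        for s in range(q):  # slab s along `axis`: one step of a sweep along that axis
            owners = Counter(
                diagonal_owner(t, q)
                for t in product(range(q), repeat=d)
                if t[axis] == s
            )
            if len(owners) != p or any(n != 1 for n in owners.values()):
                return False
    return True


# Example: 3D, q = 3 tiles per dimension, p = 9 processors (a perfect square).
print(is_balanced(3, 3))  # True
```

With d = 2 the formula reduces to (x_0 - x_1) mod q, which is a Latin square. With d = 3 it needs q^2 = p, which is the perfect-square restriction the generalized work removes.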
On Efficient Parallelization of Line-Sweep Computations
In 9th Workshop on Compilers for Parallel Computers, 2001
Cited by 7 (6 self)
Multipartitioning is a strategy for partitioning multidimensional arrays among a collection of processors so that line-sweep computations can be performed efficiently. The principal property of a multipartitioned array is that for a line sweep along any array dimension, all processors have the same number of tiles to compute at each step in the sweep. This property results in full, balanced parallelism. A secondary benefit of multipartitionings is that they induce only coarse-grain communication. Previously, computing a d-dimensional multipartitioning required that $p^{1/(d-1)}$ be integral, where p is the number of processors. Here, we describe an algorithm to compute a d-dimensional multipartitioning of an array of ρ dimensions for an arbitrary number of processors, for any d, 2 ≤ d ≤ ρ. When using a multipartitioning to parallelize a line-sweep computation, the best partitioning is the one that exploits all of the processors and has the smallest communication volume. To compute the best multipartitioning of a ρ-dimensional array, we describe a cost model for selecting d, the dimensionality of the best partitioning, and the number of cuts along each partitioned dimension. In practice, our technique will choose a 3-dimensional multipartitioning for a 3-dimensional line-sweep computation, except when p is a prime; previously, a 3-dimensional multipartitioning could be applied only when √p is integral. We describe an implementation of multipartitioning in the Rice dHPF compiler and performance results obtained when using it to parallelize a line-sweep computation on a range of different numbers of processors. ∗This work was performed while a visiting scholar at Rice University.
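The balance property stated above gives a simple necessary condition on a tile shape b = (b_1, b_2, b_3). A slab perpendicular to axis i contains the product of the other b_j tiles, and those tiles must split evenly among p processors. So p must divide every such product. The sketch below brute-forces 3D shapes under that condition and keeps the one with the smallest sum(b). That cost is a hypothetical proxy, not the papers' cost model, and the search bound b_i ≤ p is an assumption. The sketch also does not construct the tile-to-processor assignment. The names `slabs_divisible` and `best_shape` are hypothetical.

```python
from itertools import product
from math import prod
from typing import Optional, Tuple


def slabs_divisible(shape: Tuple[int, ...], p: int) -> bool:
    """Necessary balance condition: every slab's tile count is a multiple of p."""
    total = prod(shape)
    # total // b is the number of tiles in a slab perpendicular to that axis.
    return all((total // b) % p == 0 for b in shape)


def best_shape(p: int, rho: int = 3) -> Optional[Tuple[int, ...]]:
    """Brute-force the shape with the smallest sum(b) that passes the condition.

    sum(b) is a hypothetical stand-in for communication cost; ties go to the
    lexicographically first shape. The bound b_i <= p is an assumption.
    """
    candidates = (
        s for s in product(range(1, p + 1), repeat=rho) if slabs_divisible(s, p)
    )
    return min(candidates, key=sum, default=None)


for p in (4, 6, 7):
    b = best_shape(p)
    print(p, b, "partitioned dims:", sum(1 for x in b if x > 1))
```

With this proxy, p = 4 yields the perfect-square shape (2, 2, 2) and p = 6 yields a genuinely 3D shape (2, 3, 6). The prime p = 7 degenerates to (1, 7, 7), which cuts only two dimensions. This is consistent with the prime-p exception described in the abstract.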
A Framework for Integrating Data Alignment, Distribution, and Redistribution in Distributed Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems, 1995
Cited by 6 (0 self)
Parallel architectures with physically distributed memory offer cost-effective scalability for solving many large-scale scientific problems; however, these systems are very difficult to program and tune. In these systems, the choice of a good data mapping and parallelization strategy can dramatically improve the efficiency of the resulting program. In this paper we present a framework for automatic data mapping in the context of distributed memory multiprocessor systems. The framework is based on a new approach that allows the alignment, distribution and redistribution problems to be solved together using a single graph representation. The Communication-Parallelism Graph (CPG) is the structure that holds symbolic information about the potential data movement and parallelism inherent to the whole program. The CPG is then particularized for a given problem size and target system and used to find a minimal-cost path through the graph using a general-purpose linear 0-1 integer programming solver. The data layout strategy generated is optimal according to our current cost and compilation models.
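The "minimal-cost path" idea can be illustrated with a deliberately simplified model. Each program phase picks one candidate layout and pays an execution cost for it. Changing layout between consecutive phases pays a redistribution cost. The abstract solves its richer CPG formulation with a 0-1 integer programming solver. This sketch swaps in a different, simpler technique: a dynamic-programming shortest path over a linear chain of phases. That simplification is valid only when phases form a chain with no extra alignment coupling. The function name, layout names and cost values are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def min_cost_layouts(
    exec_cost: Sequence[Dict[str, float]],
    redist_cost: Dict[Tuple[str, str], float],
) -> Tuple[float, List[str]]:
    """Pick one layout per phase, minimizing execution plus redistribution cost.

    exec_cost[k][L]: hypothetical cost of running phase k with layout L.
    redist_cost[(A, B)]: hypothetical cost of switching from layout A to B.
    Solved as a shortest path through a layered graph (one layer per phase).
    """
    # best[L] = (cheapest total cost of a plan ending in layout L, that plan)
    best = {L: (c, [L]) for L, c in exec_cost[0].items()}
    for phase in exec_cost[1:]:
        nxt = {}
        for L, c in phase.items():
            nxt[L] = min(
                (prev + (0.0 if P == L else redist_cost[(P, L)]) + c, plan + [L])
                for P, (prev, plan) in best.items()
            )
        best = nxt
    return min(best.values())


# Phases 0 and 2 favour row blocks, phase 1 favours column blocks.
phases = [{"row": 1, "col": 5}, {"row": 6, "col": 1}, {"row": 1, "col": 5}]
cheap = {("row", "col"): 1, ("col", "row"): 1}
print(min_cost_layouts(phases, cheap))   # (5.0, ['row', 'col', 'row'])
costly = {("row", "col"): 3, ("col", "row"): 3}
print(min_cost_layouts(phases, costly))  # (8.0, ['row', 'row', 'row'])
```

The two runs show the trade-off the framework targets. When redistribution is cheap, the best plan changes layout per phase. When it is expensive, one static layout wins.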
An Evaluation of Data-Parallel Compiler Support for Line-Sweep Applications
Journal of Instruction-Level Parallelism, 2003
Cited by 5 (0 self)
Data-parallel compilers have long aimed to equal the performance of carefully hand-optimized parallel codes. For tightly-coupled applications based on line sweeps, this goal has been particularly elusive. In the Rice dHPF compiler, we have developed a wide spectrum of optimizations that enable us to closely approach hand-coded performance for tightly-coupled line-sweep applications including the NAS SP and BT benchmark codes. From lightly-modified copies of standard serial versions of these benchmarks, dHPF generates MPI-based parallel code that is within 4% of the performance of the hand-crafted MPI implementations of these codes for a 102³ problem size (Class B) on 64 processors. We describe and quantitatively evaluate the impact of partitioning, communication and memory hierarchy optimizations implemented by dHPF that enable us to approach hand-coded performance with compiler-generated parallel code.
Generalized multipartitioning
In Second Annual Los Alamos Computer Science Institute (LACSI) Symposium, 2001
Cited by 2 (0 self)
Multipartitioning is a strategy for partitioning multidimensional arrays among a collection of processors. With multipartitioning, computations that require solving one-dimensional recurrences along each dimension of a multi-dimensional array can be parallelized effectively. Previous techniques for multipartitioning yield efficient parallelizations over three-dimensional domains only when the number of processors is a perfect square. This paper considers the general problem of computing optimal multipartitionings for d-dimensional data volumes on an arbitrary number of processors. We describe an algorithm that computes an optimal multipartitioning for this general case, which enables multipartitioning to be used for efficient parallelizations of line-sweep computations under arbitrary conditions. Finally, we describe a prototype implementation of generalized multipartitioning in the Rice dHPF compiler and performance results obtained when using it to parallelize a line-sweep computation for different numbers of processors.
Efficient parallelization of line-sweep computations
2001
Multipartitioning is a strategy for partitioning multidimensional arrays among a collection of processors so that line-sweep computations can be performed efficiently. The principal property of a multipartitioned array is that for a line sweep along any array dimension, all processors have the same number of tiles to compute at each step in the sweep. This property results in full, balanced parallelism. A secondary benefit of multipartitionings is that they induce only coarse-grain communication. Previously, computing a d-dimensional multipartitioning required that $p^{1/(d-1)}$ be integral, where p is the number of processors. Here, we describe an algorithm to compute a d-dimensional multipartitioning of an array of ρ dimensions for an arbitrary number of processors, for any d, 2 ≤ d ≤ ρ. When using a multipartitioning to parallelize a line-sweep computation, the best partitioning is the one that exploits all of the processors and has the smallest communication volume. To compute the best multipartitioning of a ρ-dimensional array, we describe a cost model for selecting d, the dimensionality of the best partitioning, and the number of cuts along each partitioned dimension. In practice, our technique will choose a 3-dimensional multipartitioning for a 3-dimensional line-sweep computation, except when p is a prime; previously, a 3-dimensional multipartitioning could be applied only when √p is integral. We describe an implementation of multipartitioning in the Rice dHPF compiler and performance results obtained when using it to parallelize a line-sweep computation on a range of different numbers of processors. ∗This work was performed while a visiting scholar at Rice University.
Latin Hyper-Rectangles for Efficient Parallelization of Line-Sweep Computations
Multipartitioning is a strategy for partitioning multi-dimensional arrays among a collection of processors so that line-sweep computations can be performed efficiently. The principal property of a multipartitioned array is that for a line sweep along any array dimension, all processors have the same number of tiles to compute at each step in the sweep; in other words, it describes a Latin hyper-rectangle, a natural extension of the notion of Latin squares. This property results in full, balanced parallelism. A secondary benefit of multipartitionings is that they induce only coarse-grain communication. All of the multipartitionings described in the literature to date assign only one tile per processor per hyperplane of a multipartitioning (Latin hypercube). While this class of multipartitionings is optimal for two dimensions, in three dimensions it requires the number of processors to be a perfect square. This paper considers the general problem of computing optimal multipartitionings for multi-dimensional data volumes on an arbitrary number of processors. We describe an algorithm to compute a d-dimensional multipartitioning of a multi-dimensional array for an arbitrary number of processors. When using a multipartitioning to parallelize a line-sweep computation, the best partitioning is the one that exploits all of the processors and has the smallest communication volume. To compute the best multipartitioning of a multi-dimensional array, we describe a cost model for selecting d, the dimensionality of the best partitioning, and the number of cuts along each partitioned dimension. In practice, our technique will choose a 3-dimensional multipartitioning for a 3-dimensional line-sweep computation, except when p is a prime; previously, a 3-dimensional multipartitioning could be a...
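The Latin hyper-rectangle property can be checked directly for tile shapes that give a processor more than one tile per hyperplane. The minimal Python checker below does this. `example_owner` is a hypothetical mapping built by hand for shape (2, 3, 6) on p = 6 processors. Six is not a perfect square, and with this shape slabs along the three axes hold 3, 2 and 1 tiles per processor. The mapping illustrates the property only; it is not the paper's algorithm. The function names are likewise hypothetical.

```python
from collections import Counter
from itertools import product
from math import prod
from typing import Callable, Tuple

Tile = Tuple[int, ...]


def is_latin_hyper_rectangle(
    shape: Tuple[int, ...], p: int, owner: Callable[[Tile], int]
) -> bool:
    """True if, in every slab perpendicular to every axis, each of the p
    processors owns the same number of tiles (every sweep step is balanced)."""
    total = prod(shape)
    for axis, extent in enumerate(shape):
        per_slab = total // extent  # tiles in one slab perpendicular to `axis`
        if per_slab % p:
            return False
        for s in range(extent):
            counts = Counter(
                owner(t)
                for t in product(*(range(b) for b in shape))
                if t[axis] == s
            )
            if any(counts[q] != per_slab // p for q in range(p)):
                return False
    return True


def example_owner(tile: Tile) -> int:
    """Hypothetical hand-built mapping for shape (2, 3, 6) on p = 6 processors."""
    i, j, k = tile
    return (3 * i + j + k) % 6


print(is_latin_hyper_rectangle((2, 3, 6), 6, example_owner))  # True
print(is_latin_hyper_rectangle((2, 3, 6), 6, lambda t: 0))    # False
```

For a fixed k, the values 3i + j cover 0..5 once, so each slab perpendicular to the last axis has one tile per processor. For a fixed i or j, k sweeps through all six residues, which gives the 3 and 2 tiles per processor required on the other two axes.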
Scientific Computing Research Environments for the Mathematical Sciences
2001
This report describes the research projects and accomplishments made possible through the availability of the sixteen-processor SGI Origin 2000, purchased in part with funds from NSF SCREMS grant NSF 98-72009. To date, the SGI Origin 2000 has served as the main computing facility in many interdisciplinary projects involving 48 faculty, research scientists, postdocs, graduate and undergraduate students from six departments at Rice University, as well as several visiting scholars and collaborators from other universities. Computations performed on the SGI Origin 2000 have led to 44 journal articles, proceedings articles, and technical reports. Availability of the SGI Origin 2000 on campus has led to a significant increase in the complexity of the problems we are able to tackle, and it has served as the catalyst for several of the research projects described in the report. The sixteen-processor SGI Origin 2000 continues to be a widely used and important computing resource on campus.

