Results 1 – 7 of 7
Localizing Nonaffine Array References
, 1999
Abstract

Cited by 48 (9 self)
Existing techniques can enhance the locality of arrays indexed by affine functions of induction variables. This paper presents a technique to localize nonaffine array references, such as the indirect memory references common in sparse-matrix computations. Our optimization combines elements of tiling, data-centric tiling, data remapping and inspector-executor parallelization. We describe our technique, bucket tiling, which includes the tasks of permutation generation, data remapping, and loop regeneration. We show that profitability cannot generally be determined at compile-time, but requires an extension to run-time. We demonstrate our technique on three codes: integer sort, conjugate gradient, and a kernel used in simulating a beating heart. We observe speedups of 1.91 on integer sort, 1.57 on conjugate gradient, and 2.69 on the heart kernel. 1. Introduction Researchers have long sought to increase data locality and exploit parallelism in loop nests [34, 32, 16, 5, 33, 18]. These wor...
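The core idea behind bucket tiling — grouping iterations of an indirect reference like `y[i] = x[idx[i]]` by which block of `x` they touch, so each block is reused before eviction — can be sketched as follows. This is a minimal illustration only; the names (`bucket_tile`, `BLOCK`) are not from the paper, and the real technique also remaps data and regenerates the loop at run time.

```python
def bucket_tile(idx, n_blocks, block):
    """Inspector: group iteration numbers by the block of x they touch."""
    buckets = [[] for _ in range(n_blocks)]
    for i, j in enumerate(idx):
        buckets[j // block].append(i)
    return buckets

def run_bucketed(x, idx, buckets):
    """Executor: run the reordered loop y[i] = x[idx[i]], one bucket at
    a time, so every i in a bucket hits the same cache block of x."""
    y = [0.0] * len(idx)
    for bucket in buckets:
        for i in bucket:
            y[i] = x[idx[i]]
    return y
```

Reordering is legal here because the iterations are independent; the result matches the original loop order.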
Renumbering Unstructured Grids to Improve the Performance of Codes on Hierarchical Memory Machines
, 1995
Abstract

Cited by 16 (1 self)
The performance of unstructured grid codes on workstations and distributed memory parallel computers is substantially affected by the efficiency of the memory hierarchy. This efficiency essentially depends on the order of computation and numbering of the grid. Most grid generators do not take into account the effect of the memory hierarchy when producing grids so application programmers must renumber grids to improve the performance of their codes. To design a good renumbering scheme a detailed runtime analysis of the data movement in an application code is needed. Thus, a memory hierarchy simulator has been developed to analyse the effect of existing renumbering schemes such as bandwidth reduction, the Greedy method, colouring, random numbering, and the original numbering produced by the grid generator. The renumbering is applied to either vertices, edges, faces or cells and two algorithms are proposed to consistently renumber the other entities used in the solver. The simulated and a...
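The consistency requirement the abstract mentions — once vertices are renumbered, the other solver entities must be renumbered to match — can be sketched minimally. The permutation convention `perm[old] = new` and the function names are assumptions for illustration; a real scheme would also reorder coordinate and solution arrays.

```python
def renumber_vertices(data, perm):
    """Move per-vertex data so that new id perm[old] holds the old data."""
    out = [None] * len(data)
    for old, new in enumerate(perm):
        out[new] = data[old]
    return out

def renumber_edges(edges, perm):
    """Rewrite each edge's endpoint ids under the same permutation and
    sort edges so traversal follows the improved vertex order."""
    relabeled = [tuple(sorted((perm[a], perm[b]))) for a, b in edges]
    return sorted(relabeled)
```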
Compiler and Run-Time Support for Irregular Computations
, 1995
Abstract

Cited by 7 (1 self)
There are many important applications in computational fluid dynamics, circuit simulation and structural analysis that can be more accurately modeled using iterations on unstructured grids. In these problems, regular compiler analysis for Massively Parallel Processors (MPP) with distributed address space fails because communication can only be determined at runtime. However, in many of these applications the communication pattern repeats for every iteration. Therefore, equivalent optimizations to the regular case can be achieved with a combination of runtime support (RTS) and compiler analysis.
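The inspector-executor pattern the abstract relies on can be sketched as follows: the inspector scans the indirection array once to derive the set of off-processor elements needed (the communication schedule), and every subsequent iteration reuses that schedule instead of recomputing it. The block partition `[my_lo, my_hi)` is an assumption for illustration.

```python
def inspector(idx, my_lo, my_hi):
    """One-time pass: which referenced indices live off-processor?
    This set is the communication schedule reused each iteration."""
    return sorted({j for j in idx if not (my_lo <= j < my_hi)})

def executor(x_local, ghost, idx, my_lo):
    """Per-iteration pass: gather values from local data plus the
    ghost copies fetched according to the inspector's schedule."""
    return [ghost[j] if j in ghost else x_local[j - my_lo] for j in idx]
```

Because the communication pattern repeats across iterations, the inspector's cost is amortized over the whole computation.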
Automatic Parallelization of the Conjugate Gradient Algorithm
 In The Eighth International Workshop on Languages and Compilers for Parallel Computing, LNCS #1033
, 1995
Abstract

Cited by 1 (1 self)
The conjugate gradient (CG) method is a popular Krylov space method for solving systems of linear equations of the form Ax = b, where A is a symmetric positive-definite matrix. This method can be applied regardless of whether A is dense or sparse. In this paper, we show how restructuring compiler technology can be applied to transform a sequential, dense matrix CG program into a parallel, sparse matrix CG program. On the IBM SP2, the performance of our compiled code is comparable to that of handwritten code from the PETSc library at Argonne.
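To make the abstract's starting point concrete, here is a minimal dense CG sketch for Ax = b with A symmetric positive-definite. The paper's contribution is compiling such a dense program into a parallel sparse one, which this sketch does not attempt.

```python
def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cg(A, b, tol=1e-10, max_iter=100):
    """Solve A x = b for symmetric positive-definite A."""
    x = [0.0] * len(b)
    r = b[:]                      # residual b - A x (x starts at 0)
    p = r[:]                      # search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)   # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new < tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x
```

In exact arithmetic CG converges in at most n iterations for an n-by-n system, which is why it is attractive for the large sparse systems the abstract targets.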
On the Applicability of Program Comprehension Techniques to the Automatic Parallelization of Sparse Matrix Computations
Abstract
Space-efficient data structures for sparse matrices are an important concept in numerical programming because they allow for considerable savings in space and time compared with common two-dimensional arrays. Unfortunately, for such programs it is usually impossible to statically determine all data dependencies. Thus, automatic parallelization of such codes is usually done at run time by applying the inspector-executor technique, incurring tremendous overhead. Program comprehension techniques exploit knowledge of frequently occurring implementation variations of important computations. They have been shown to improve many important fields of automatic parallelization of dense matrix computations, such as automatic program transformation and local algorithm replacement, data flow analysis, array distribution, and performance prediction. In this study we investigate to what degree this approach could be generalized to sequential codes implementing sparse matrix computations, and h...
unknown title
, 1996
Abstract
Renumbering unstructured grids to improve the performance of codes on hierarchical memory machines