## Vienna-Fortran/HPF Extensions for Sparse and Irregular Problems and Their Compilation (1997)

Venue: IEEE Transactions on Parallel and Distributed Systems

Citations: 32 (11 self)

### BibTeX

@ARTICLE{Ujaldon97vienna-fortran/hpfextensions,

author = {Manuel Ujaldon and Emilio L. Zapata and Barbara M. Chapman and Hans P. Zima},

title = {Vienna-Fortran/HPF Extensions for Sparse and Irregular Problems and Their Compilation},

journal = {IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS},

year = {1997},

volume = {8},

pages = {1068--1083}

}

### Abstract

Vienna Fortran, High Performance Fortran (HPF) and other data parallel languages have been introduced to allow the programming of massively parallel distributed-memory machines (DMMP) at a relatively high level of abstraction based on the SPMD paradigm. Their main features include directives to express the distribution of data and computations across the processors of a machine. In this paper, we use Vienna-Fortran as a general framework for dealing with sparse data structures. We describe new methods for the representation and distribution of such data on DMMPs, and propose simple language features that permit the user to characterize a matrix as "sparse" and specify the associated representation. Together with the data distribution for the matrix, this enables the compiler and runtime system to translate sequential sparse code into explicitly parallel message-passing code. We develop new compilation and runtime techniques, which focus on achieving storage economy and reducing communi...

### Citations

534 |
Computer Solution of Large Sparse Positive Definite Matrices
- George, Liu
- 1981
Citation Context: ...a small number of its elements are non-zero. A range of methods have been developed which enable sparse computations to be performed with considerable savings in terms of both memory and computation [16]. Solution schemes are often optimized to take advantage of the structure within the matrix. This has consequences for parallelization. Firstly, we want to retain as much of these savings as possible ...

529 |
Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, Society for Industrial and Applied Mathematics
- Barrett, Berry, et al.
- 1994
Citation Context: ...There is a wide range of techniques to solve linear systems. Among them, iterative methods use successive approximations to obtain more accurate solutions at each step. The Conjugate Gradient (CG) [3] is the oldest, best known, and most effective of the nonstationary iterative methods for symmetric positive definite systems. The convergence process can be sped up by using a preconditioner bef...
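The nonstationary CG iteration mentioned in this context can be sketched in a few lines. The sketch below is textbook unpreconditioned CG on a small dense symmetric positive definite system; the function name and the toy matrix are illustrative, and this is not the paper's sparse, distributed implementation.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Plain (unpreconditioned) CG for a symmetric positive definite A."""
    x = np.zeros_like(b)
    r = b - A @ x          # initial residual
    p = r.copy()           # initial search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)   # step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p  # new A-conjugate direction
        rs_old = rs_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])  # small SPD example
b = np.array([1.0, 2.0])
x = conjugate_gradient(A, b)
print(np.allclose(A @ x, b))  # True
```

A preconditioner, as the snippet notes, would be applied to the residual each iteration to improve the convergence rate; it is omitted here for brevity.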

292 | The Fortran D Language Specification
- Fox, Hiranandani, et al.
- 1990
Citation Context: ...cular Fortran, to facilitate data parallel programming on a wide range of parallel architectures without sacrificing performance. Important results of this work are Vienna Fortran [10, 28], Fortran D [15] and High Performance Fortran (HPF) [18], which is intended to become a de-facto standard. These languages extend Fortran 77 and Fortran 90 with directives for specifying alignment and distribution of...

244 |
A partitioning strategy for nonuniform problems on multiprocessors
- Berger, Bokhari
- 1987
Citation Context: ...tion significantly tends to reduce the time to compute the partition at the expense of a slight increase in communication time. Binary Recursive Decomposition (BRD), as proposed by Berger and Bokhari [4], belongs to the last of these categories. BRD specifies a distribution algorithm where the sparse matrix A is recursively bisected, alternating vertical and horizontal partitioning steps until there ...
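As a rough illustration of the recursive-bisection idea in this context, the sketch below splits a coordinate list of nonzeros into 2^levels parts of near-equal nonzero count, alternating the cut axis at each level. It is a simplified reading of BRD, not Berger and Bokhari's algorithm itself (which bisects the matrix region, not merely the coordinate list).

```python
def brd(nonzeros, levels, axis=0):
    """Recursively bisect the nonzeros of a sparse matrix into 2**levels
    partitions of roughly equal nonzero count, alternating axes (sketch)."""
    if levels == 0 or len(nonzeros) <= 1:
        return [nonzeros]
    nz = sorted(nonzeros, key=lambda rc: rc[axis])  # order along the cut axis
    mid = len(nz) // 2                              # median split: equal load
    nxt = 1 - axis                                  # alternate row/column cuts
    return brd(nz[:mid], levels - 1, nxt) + brd(nz[mid:], levels - 1, nxt)

# Toy 4x4 sparse matrix given as (row, col) coordinates of its nonzeros.
nnz = [(0, 0), (0, 3), (1, 1), (2, 2), (3, 0), (3, 3), (2, 0), (1, 3)]
parts = brd(nnz, levels=2)
print([len(p) for p in parts])  # [2, 2, 2, 2]
```

Each leaf would then be assigned to one processor, balancing the nonzero count (and hence the arithmetic load) across the machine.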

112 | Run-time scheduling and execution of loops on message passing machines
- Saltz, Crowley, et al.
- 1990
Citation Context: ...his obviously enforces the global-to-local index translation to be also performed at runtime. To parallelize codes that use indirect addressing, compilers typically use an inspector-executor strategy [22], where each loop accessing distributed variables is transformed by inserting an additional preprocessing loop, called an inspector. The inspector translates the global addresses accessed by the ind...
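A minimal single-process simulation of the inspector-executor split described above, assuming a block distribution and one level of indirection. The function names and the two-"processor" setup are hypothetical and do not reflect the CHAOS/PARTI API; the point is only the two phases: an inspector that translates global indices and builds a communication schedule, then an executor that gathers remote data once and runs the loop locally.

```python
def owner(g, block):
    """Block distribution: which processor owns global index g."""
    return g // block

def inspector(idx, me, block):
    """Preprocessing loop: translate global indices to local offsets and
    record which remote elements must be fetched (hypothetical sketch)."""
    schedule, local = [], []
    for g in idx:
        p = owner(g, block)
        if p == me:
            local.append(('local', g - me * block))
        else:
            local.append(('remote', len(schedule)))   # slot in gather buffer
            schedule.append((p, g - p * block))       # (owner, local offset)
    return local, schedule

def executor(x_parts, local, schedule, me):
    """Gather remote data in one communication phase, then run the loop."""
    buf = [x_parts[p][off] for p, off in schedule]
    return [x_parts[me][k] if kind == 'local' else buf[k]
            for kind, k in local]

# x of length 8, block-distributed over 2 "processors" (block size 4).
x_parts = [[10, 11, 12, 13], [14, 15, 16, 17]]
idx = [0, 5, 2, 7]   # indirection array mixing local and remote references
local, sched = inspector(idx, me=0, block=4)
print(executor(x_parts, local, sched, me=0))  # [10, 15, 12, 17]
```

The payoff is that the gather schedule can be reused across iterations as long as the indirection array does not change, amortizing the inspector's cost.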

58 |
Vienna Fortran: A Language Specification (Version 1.1)
- Zima, Brezany, et al.
- 1991
Citation Context: ...languages, in particular Fortran, to facilitate data parallel programming on a wide range of parallel architectures without sacrificing performance. Important results of this work are Vienna Fortran [10, 28], Fortran D [15] and High Performance Fortran (HPF) [18], which is intended to become a de-facto standard. These languages extend Fortran 77 and Fortran 90 with directives for specifying alignment and...

48 | Extending HPF for Advanced Data Parallel Applications
- Chapman, Mehrotra, et al.
- 1994
Citation Context: ...plicit by appropriate new language elements, which can be seen as providing a special syntax for an important special case of a user-defined distribution function as defined in Vienna Fortran or HPF+ [11, 12]. The new language features provide the following information to the compiler and the runtime system: - The name, index domain, and element type of the sparse matrix are declared. This is done using...

47 | A MIMD implementation of a parallel Euler solver for unstructured grids - Venkatakrishnan, Simon, et al. - 1992

43 |
The structure of Parafrase2: An advanced parallelizing compiler for C and Fortran
- Polychronopoulos, Girkar, et al.
- 1990
Citation Context: ...l class of user-defined data distributions, as proposed in Vienna Fortran and HPF+ [12]. On the other hand, in the area of automatic parallelization, the most outstanding tools we know (Parafrase [20], Polaris [6]) are not intended to be a framework for the parallelization of sparse algorithms such as those addressed in our present work. The methods proposed by Saltz et al. for handling irregular ...

43 | Data Distributions for Sparse Matrix Vector Multiplication, University - Romero, Zapata - 1999

35 |
Automatic data structure selection and transformation for sparse matrix computations
- Bik, Wijshoff
- 1996
Citation Context: ...a distributed data addressing table, and its associated overhead of memory [17]. In order to enable the compiler to apply more optimizations and simplify the task of the programmer, Bik and Wijshoff [5] have implemented a restructuring compiler which automatically converts programs operating on dense matrices into sparse code. This method postpones the selection of a data structure until the compila...

24 | Parametric Binary Dissection
- Bokhari, Crockett, et al.
- 1993
Citation Context: ...s mapped to a unique processor. A more flexible variant of this algorithm produces partitions in which the shapes of the individual rectangles are optimized with respect to a user-determined function [7]. In this section, we define Multiple Recursive Decomposition (MRD), a generalization of the BRD method, which also improves the communication structure of the code. Definition 3 MRD Distribution We a...

17 |
Sparse Block and Cyclic Data Distributions for Matrix Computations
- Asenjo, Romero, et al.
- 1994
Citation Context: ...ons and preserve a compact representation of matrices and vectors, thereby obtaining an efficient workload balance and minimizing communication. Some experiments in parallelizing sparse codes by hand [2] not only confirmed the suitability of these distributions, but also the excessive amount of time spent during the development and debugging stages of manual parallelization. This encouraged us to bu...

15 | Index array flattening through program transformation
- Das, Havlak, et al.
- 1995
Citation Context: ...s lookup. In general, application codes in irregular problems normally have code segments and loops with more complex access functions. The most advanced analysis technique, known as slicing analysis [13], deals with multiple levels of indirection by transforming code that contains such references to code that contains only a single level of indirection. However, the multiple communication phases still...

13 | Efficient Resolution of Sparse Indirections in Data-Parallel Compilers - Ujaldon, Zapata - 1995

12 |
Users' Guide for the Harwell-Boeing Sparse
- Duff, Grimes, et al.
- 1992
Citation Context: ...ase of the sparse loops in the Conjugate Gradient algorithm. To account for the effects of different sparsity structures we chose two very different matrices coming from the Harwell-Boeing collection [14], where they are identified as PSMIGR1 and BCSSTK29. The former contains population migration data and is relatively dense, whereas the latter is a very sparse matrix used in large eigenvalue problems...

9 |
Numerical experiences with partitioning of unstructured meshes
- Hu, Blake
- 1992
Citation Context: ...Recursive Decomposition (MRD) Common approaches for partitioning unstructured meshes while keeping neighborhood properties are based upon coordinate bisection, graph bisection and spectral bisection [8, 19]. Spectral bisection minimizes communication, but requires huge tables to store the boundaries of each local region and an expensive algorithm to compute it. Graph bisection is algorithmically less ex...

8 |
Vienna Fortran Compilation System, User's Guide
- Chapman, Benkner, et al.
- 1993
Citation Context: ...optimizations described in this paper if efficient target code is to be generated for a parallel system. There are a variety of languages and compilers targeted at distributed memory multiprocessors ([28, 9, 15, 18]). Some of them do not attempt to deal with loops that arise in sparse or irregular computation. One approach, originating from Fortran D and Vienna Fortran, is based on INDIRECT data distributions an...

7 |
High Performance Fortran Language Specification, Version 1.0

- High Performance Fortran Forum
- 1993
Citation Context: ...el programming on a wide range of parallel architectures without sacrificing performance. Important results of this work are Vienna Fortran [10, 28], Fortran D [15] and High Performance Fortran (HPF) [18], which is intended to become a de-facto standard. These languages extend Fortran 77 and Fortran 90 with directives for specifying alignment and distribution of a program's data among the processors, ...

7 |
A Manual for the CHAOS

- Saltz, Das, Uysal, et al.
Citation Context: ...olution for translating CRS-like sparse indices at runtime within data-parallel compilers significantly reduces both time and memory overhead compared to the standard and general-purpose CHAOS library [23]. This technique, which we have called "Sparse Array Rolling" (SAR), encapsulates in a small descriptor information about how the input matrix is distributed across the processors. This allows us to det...

5 |
Value-Based Distributions in Fortran D
- Hanxleden, Kennedy, et al.
- 1994
Citation Context: ...memory. The major drawback of this approach is the large number of messages that are generated as a consequence of accessing a distributed data addressing table, and its associated overhead of memory [17]. In order to enable the compiler to apply more optimizations and simplify the task of the programmer, Bik and Wijshoff [5] have implemented a restructuring compiler which automatically converts progr...

4 |
The Scheduling of Sparse Matrix-Vector Multiplication on a Massively Parallel DAP
- Andersen, Mitra, et al.
- 1992
Citation Context: ...ically with block length 1 (see Figure 4.b). Several variants for the representation of the distribution segment in this context are described in the literature, including the MM, ESS and BBS methods [1]. Here we consider a CRS sparse format, which results in the BRS (Block Row Scatter) distributed sparse representation. A very similar distributed representation is that of BCS (Block Column Scatter) ...
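For reference, plain CRS (Compressed Row Storage), the per-processor format that the BRS representation builds on, can be sketched as follows. This is the standard textbook format with `data`, column-index, and row-pointer arrays, not the paper's distributed BRS structures; the function names are illustrative.

```python
def to_crs(dense):
    """Compressed Row Storage: nonzero values, their column indices,
    and a row-pointer array delimiting each row's slice of the data."""
    data, col, ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                data.append(v)
                col.append(j)
        ptr.append(len(data))   # end of this row's nonzeros
    return data, col, ptr

def crs_matvec(data, col, ptr, x):
    """y = A @ x touching only the stored nonzeros."""
    return [sum(data[k] * x[col[k]] for k in range(ptr[i], ptr[i + 1]))
            for i in range(len(ptr) - 1)]

A = [[4, 0, 0, 1],
     [0, 3, 0, 0],
     [0, 0, 2, 5],
     [1, 0, 0, 6]]
data, col, ptr = to_crs(A)
print(data)                                       # [4, 1, 3, 2, 5, 1, 6]
print(crs_matvec(data, col, ptr, [1, 1, 1, 1]))   # [5, 3, 7, 7]
```

Under a BRS-style distribution, each processor would hold such a CRS triple for only the nonzeros cyclically scattered to it, which is what makes the global-to-local index translation discussed elsewhere in this paper nontrivial.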

3 |
Massively Parallel Methods for Engineering and
- Camp, Plimpton, et al.
- 1994
Citation Context: ...Recursive Decomposition (MRD) Common approaches for partitioning unstructured meshes while keeping neighborhood properties are based upon coordinate bisection, graph bisection and spectral bisection [8, 19]. Spectral bisection minimizes communication, but requires huge tables to store the boundaries of each local region and an expensive algorithm to compute it. Graph bisection is algorithmically less ex...

3 |
Parallel Algorithms for Eigenvalues Computation with Sparse Matrices
- Trenas
- 1995
Citation Context: ...each iteration, both solution and residual vectors are updated. 4.3 Lanczos Algorithm Figure 6 illustrates an algorithm in extended HPF for the tridiagonalization of a matrix with the Lanczos method [24]. We use a new directive, indicated by !NSD$, to specify the required declarative information. The execution of the DISTRIBUTE directive results in the computation of the distributed sparse representa...

2 |
published by Springer-Verlag, LNCS 892
- Blume, Eigenmann, et al.
- 1994
Citation Context: ...r-defined data distributions, as proposed in Vienna Fortran and HPF+ [12]. On the other hand, in the area of automatic parallelization, the most outstanding tools we know (Parafrase [20], Polaris [6]) are not intended to be a framework for the parallelization of sparse algorithms such as those addressed in our present work. The methods proposed by Saltz et al. for handling irregular problems cons...

2 |
User Defined Mappings in Vienna FORTRAN
- Chapman, Mehrotra, et al.
- 1993
Citation Context: ...plicit by appropriate new language elements, which can be seen as providing a special syntax for an important special case of a user-defined distribution function as defined in Vienna Fortran or HPF+ [11, 12]. The new language features provide the following information to the compiler and the runtime system: - The name, index domain, and element type of the sparse matrix are declared. This is done using...

2 |
On the evaluation of parallelization techniques for sparse applications
- Ujaldon, Sharma, et al.
- 1996
Citation Context: ...Here we consider a CRS sparse format, which results in the BRS (Block Row Scatter) distributed sparse representation. A very similar distributed representation is that of BCS (Block Column Scatter) [26], where the sparse format is compressed by columns, just exchanging rows and columns. The mapping which is established by the BRS choice requires complex auxiliary structures and translati...