## Asynchronous Progressive Irregular Prefix Operation in HPF2

### BibTeX

```bibtex
@misc{Bregier_asynchronousprogressive,
  author = {Frédéric Brégier and Marie-Christine Counilh and Jean Roman},
  title  = {Asynchronous Progressive Irregular Prefix Operation in HPF2},
  year   = {}
}
```

### Abstract

In this paper, we study one kind of irregular computation on distributed arrays, the irregular prefix operation, which is currently not well supported by the standard data-parallel language HPF2. We present a parallel implementation that efficiently exploits the independent computations arising in this irregular operation. Our approach is based on a directive which characterizes an irregular prefix operation, and on inspector/executor support, implemented in the CoLuMBO library, which optimizes the execution by using an asynchronous communication scheme and thus communication/computation overlap. We validate our contribution with results obtained on the IBM SP2 for basic experiments and for a sparse Cholesky factorization algorithm applied to real-size problems.

KEY WORDS: HPF2, irregular application, prefix operation, run-time support, inspection/execution mechanism, loop-carried dependencies
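The operation the abstract describes can be modeled concisely. The following Python sketch is illustrative only (the function name and the `combine` argument are our own, not part of the CoLuMBO API): each element i is combined with an irregular, data-dependent subset of the elements before it, which is what makes the communication pattern differ at every iteration.

```python
# Illustrative model of an irregular prefix operation:
# y[i] is computed from an irregular subset B[i] of the elements
# preceding i (B[i] ⊆ {0, ..., i-1} in 0-based indexing).
# The subsets B are data-dependent (e.g. a sparsity structure),
# so each iteration of the PREFIX loop has its own access pattern.

def irregular_prefix(x, B, combine):
    """Sequential reference semantics of the PREFIX loop."""
    y = list(x)                       # start from the input values
    for i in range(len(x)):
        for j in B[i]:                # irregular set of contributors
            y[i] = combine(y[i], y[j])
    return y

# Example: each B[i] lists some earlier indices.
x = [1, 2, 3, 4]
B = [[], [0], [0], [1, 2]]
print(irregular_prefix(x, B, lambda a, b: a + b))  # -> [1, 3, 4, 11]
```

Note that iteration i may depend on any subset of earlier results, so iterations with disjoint dependence sets are independent and can proceed in parallel, which is exactly what the asynchronous scheme exploits.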

### Citations

534 | Computer Solution of Large Sparse Positive Definite Systems
- George, Liu
- 1981

Citation Context: ...Cholesky Example: We describe here the experimental results we have obtained for an important computation arising in many scientific and engineering applications: the sparse Cholesky block factorization [8, 7, 1]. In this application, each column block is updated by some column blocks to its left in the sparse matrix; the subset Bi ⊆ [1, i] of column blocks depends on the sparsity structure of the matrix...
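To make the dependence structure in this context concrete, here is a toy scalar (non-block) left-looking Cholesky in Python. It is a didactic sketch, not the paper's block algorithm, but it shows how the set Bi of contributing columns is dictated by the sparsity of row i.

```python
import math

def left_looking_cholesky(A):
    """Toy scalar left-looking Cholesky: column i is updated by every
    earlier column j with L[i][j] != 0, i.e. the set B_i of
    contributing columns is given by the sparsity of row i."""
    n = len(A)
    # Copy the lower triangle of A into L.
    L = [[A[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        # B_i: earlier columns that actually touch column i.
        B_i = [j for j in range(i) if L[i][j] != 0.0]
        for j in B_i:                     # left-looking updates (cmod)
            for k in range(i, n):
                L[k][i] -= L[k][j] * L[i][j]
        L[i][i] = math.sqrt(L[i][i])      # column scaling (cdiv)
        for k in range(i + 1, n):
            L[k][i] /= L[i][i]
    return L

L = left_looking_cholesky([[4.0, 2.0], [2.0, 5.0]])
# L satisfies L * L^T = A, here L = [[2, 0], [1, 2]].
```

In the sparse case, columns whose B_i sets are disjoint can be updated independently, which is the parallelism the paper's prefix directive exposes.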

106 | The Paradigm Compiler for Distributed-Memory Multicomputers
- Banerjee, Chandy, et al.
- 1995

Citation Context: ...computation iterations at the executor stage. Major works include the PARTI [16] and CHAOS [11] libraries used in the Vienna Fortran and Fortran90D [15] compilers, and the PILAR library [12] used in the PARADIGM compiler [2]. These libraries are based on a gather/scatter approach and use the same optimized communication scheme on every (or at least many) iteration. So they do not address such asynchronous prefix operation...
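The gather/scatter inspector/executor scheme mentioned in this context can be sketched as follows. The function names are illustrative, not the real PARTI/CHAOS API: the inspector runs once to turn irregular element accesses into a per-owner communication schedule, which the executor then reuses on every iteration.

```python
# Sketch of the classic inspector/executor gather approach (in the
# spirit of PARTI/CHAOS; names here are illustrative, not library API).

def inspector(indices, owner):
    """Inspect the irregular accesses once: group the needed remote
    indices by owning process to build a communication schedule."""
    schedule = {}
    for idx in indices:
        schedule.setdefault(owner(idx), []).append(idx)
    return schedule

def executor(schedule, remote_fetch):
    """Reuse the schedule on each iteration: gather all values from
    each owner in one bulk message, then return a local copy."""
    gathered = {}
    for proc, idxs in schedule.items():
        gathered.update(remote_fetch(proc, idxs))  # one message per owner
    return gathered

# Toy example: 8 array elements block-distributed over 2 processes.
data = list(range(100, 108))
owner = lambda idx: idx // 4                       # block distribution
fetch = lambda proc, idxs: {i: data[i] for i in idxs}
sched = inspector([1, 5, 6, 2], owner)
print(executor(sched, fetch))
```

The snippet's point is the amortization assumption: this pays off only when the same schedule is reused many times, which a prefix loop with a different pattern per iteration does not satisfy.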

88 | Efficient support for irregular applications on distributed-memory machines
- Mukherjee, Sharma, et al.
- 1995

Citation Context: ...MPI communication buffers. 2.5. Related Work: To our knowledge, progressive irregular prefix operations have not been studied in the context of HPF or HPF-like languages such as FortranD [16], Vienna Fortran [14] or HPF+ [3]. The inspector/executor paradigm has been widely used, but for solving iterative irregular problems in which communication and computation phases alternate; indeed, in those kinds of applicati...

67 | Distributed Memory Compiler Design for Sparse Problems
- Saltz, Wu, et al.
- 1991

Citation Context: ...the saturation of the MPI communication buffers. 2.5. Related Work: To our knowledge, progressive irregular prefix operations have not been studied in the context of HPF or HPF-like languages such as FortranD [16], Vienna Fortran [14] or HPF+ [3]. The inspector/executor paradigm has been widely used, but for solving iterative irregular problems in which communication and computation phases alternate; indeed, in tho...

65 | High Performance Fortran Language Specification
- High Performance Fortran Forum
- 1996

Citation Context: ...real-size problems. KEY WORDS: HPF2, irregular application, prefix operation, run-time support, inspection/execution mechanism, loop-carried dependencies. 1. Introduction: High Performance Fortran (HPF2 [10]), the standard language for writing data-parallel programs, is quite efficient for regular applications. Nevertheless, efficiency is still a great challenge when irregular applications are considered...

44 | Runtime and language support for compiling adaptive irregular programs on distributed-memory machines
- Hwang
- 1995

Citation Context: ...those kinds of applications, the cost of optimizations performed at the inspector stage can be amortized over many computation iterations at the executor stage. Major works include the PARTI [16] and CHAOS [11] libraries used in the Vienna Fortran and Fortran90D [15] compilers, and the PILAR library [12] used in the PARADIGM compiler [2]. These libraries are based on a gather/scatter approach and use the same optimized...

32 | ADAPTOR - A Transformation Tool for HPF Programs
- Brandes, Zimmermann
- 1994

Citation Context: ...scheduling [6] and irregular prefix operations. We currently use these libraries by writing HPF2 codes with explicit calls to TriDenT and CoLuMBO primitives. Then, we use the HPF compilation platform ADAPTOR [5] to obtain the final SPMD codes. The snippet continues with the test-matrix table:

| Matrix | C.B. | Col. | NNZ | NOp | Av. Coeff. |
|---|---|---|---|---|---|
| BCSSTK32 | 4286 | 44609 | 5.5 M | 1.3 GFlop | 0.368% |
| GRID 511 | 8216 | 261121 | 12 M | 2.5 GFlop | 0.121% |
| OILPAN | 8024 | 73752 | 9.5 M | 3.35 GFlop | 0.129% |
| CUB... (truncated) | | | | | |

25 | Exploiting Spatial Regularity in Irregular Iterative Applications
- Lain, Banerjee
- 1995

Citation Context: ...communication scheme on every (or at least many) iteration. So they do not address such an asynchronous prefix operation, since each iteration of the PREFIX loop has its own communication scheme. The PILAR [13] library uses sections as minimal inspected elements, as our CoLuMBO library does. This is necessary in order to produce an efficient inspector in most cases, since the minimal element considered in ...
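Section-based inspection, as attributed here to PILAR and CoLuMBO, amounts to coalescing individual element indices into contiguous ranges so the inspector manipulates a few sections rather than many elements. A minimal illustrative sketch (our own code, not either library's API):

```python
def coalesce_into_sections(indices):
    """Coalesce element indices into contiguous half-open sections
    (start, stop), so the inspector handles O(#sections) descriptors
    instead of O(#elements) individual accesses."""
    sections = []
    for i in sorted(set(indices)):
        if sections and sections[-1][1] == i:
            sections[-1][1] = i + 1      # extend the current section
        else:
            sections.append([i, i + 1])  # open a new section
    return [tuple(s) for s in sections]

print(coalesce_into_sections([7, 2, 3, 4, 9, 8]))  # -> [(2, 5), (7, 10)]
```

Fewer, larger descriptors cut both inspection cost and the number of messages needed to move the corresponding data.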

18 | A comparison of three column based distributed sparse factorization schemes
- Ashcraft, Eisenstat, et al.
- 1991

Citation Context: ...Cholesky Example: We describe here the experimental results we have obtained for an important computation arising in many scientific and engineering applications: the sparse Cholesky block factorization [8, 7, 1]. In this application, each column block is updated by some column blocks to its left in the sparse matrix; the subset Bi ⊆ [1, i] of column blocks depends on the sparsity structure of the matrix...

12 | A Mapping and Scheduling Algorithm for Parallel Sparse Fan-In Numerical Factorization
- Hénon, Ramet, et al.
- 1999

Citation Context: ...version (Program 7) is close to Program 5, using a specific inspection phase and an asynchronous communication scheme; • the MPI version refers to a hand-made, highly optimized MPI version presented in [9]. The FIR and FIP versions use other optimizations, not presented in this paper, for handling irregular applications in HPF2. In both versions, the sparse matrix data structure is represented by using t...

8 | Compiler and Run-time Support for Irregular Computations
- Lain
- 1997

Citation Context: ...can be amortized over many computation iterations at the executor stage. Major works include the PARTI [16] and CHAOS [11] libraries used in the Vienna Fortran and Fortran90D [15] compilers, and the PILAR library [12] used in the PARADIGM compiler [2]. These libraries are based on a gather/scatter approach and use the same optimized communication scheme on every (or at least many) iteration. So they do not address such...

7 | HPF+: High Performance Fortran for advanced scientific and engineering applications. Future Generation Computer Systems
- Benkner
- 1999

Citation Context: ...communication buffers. 2.5. Related Work: To our knowledge, progressive irregular prefix operations have not been studied in the context of HPF or HPF-like languages such as FortranD [16], Vienna Fortran [14] or HPF+ [3]. The inspector/executor paradigm has been widely used, but for solving iterative irregular problems in which communication and computation phases alternate; indeed, in those kinds of applications, the cos...

3 | Contribution to better handling of irregular problems in HPF2
- Brandes, Bregier, et al.
- 1998

Citation Context: ...other optimizations, not presented in this paper, for handling irregular applications in HPF2. In both versions, the sparse matrix data structure is represented by using the tree notation, introduced in [4], based on the derived data types of Fortran 90. This notation avoids indirect data accesses coming from the standard irregular programming style, so that both compile-time and run-time techniques can...

1 | Scheduling loops with partial loop-carried dependencies
- Brégier, Counilh, et al.
- 2000

Citation Context: ...and can be executed in parallel (or in any order). The goal of our task scheduler is to order the task execution so as to minimize the global execution time while respecting the dependence constraints [6].

```fortran
!HPF$ SCHEDULE (J = 1:K-1, bj = 1:NB(J), &
!HPF$   ANY(A(J)%BCOL(bj)%BLOC) in A(K)%BCOL(1)%BLOC)
      DO K = 1, NB_BLOC_COL
!HPF$ ON HOME (A(J), J = 1:K-1, &
!HPF$   ANY(A(J)%BCOL(:)%BLOC) in A(K)%BCOL(1)%BLOC) &
!HPF$   , BEGI...
```

(code fragment truncated in the original snippet)
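The scheduling problem described in this context, ordering tasks under partial loop-carried dependencies, can be illustrated with a plain topological sort. This is a generic sketch (Kahn's algorithm), not the TriDenT scheduler, which additionally tries to minimize global execution time:

```python
from collections import deque

def schedule_tasks(deps):
    """Order tasks so every task runs after all tasks it depends on
    (Kahn's topological sort). Tasks with no dependence between them
    may run in any order, or in parallel."""
    indeg = {t: len(d) for t, d in deps.items()}   # unmet dependencies
    users = {t: [] for t in deps}                  # reverse edges
    for t, d in deps.items():
        for u in d:
            users[u].append(t)
    ready = deque(t for t, n in indeg.items() if n == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for u in users[t]:                         # release dependents
            indeg[u] -= 1
            if indeg[u] == 0:
                ready.append(u)
    return order

# Example: B and C both depend on A and are mutually independent.
print(schedule_tasks({"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}))
```

Any order respecting the dependence constraints is legal; a real scheduler picks among them to overlap communication with computation.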

1 | Parallel Algorithms for Matrix Computations
- G, et al.
- 1990

Citation Context: ...Cholesky Example: We describe here the experimental results we have obtained for an important computation arising in many scientific and engineering applications: the sparse Cholesky block factorization [8, 7, 1]. In this application, each column block is updated by some column blocks to its left in the sparse matrix; the subset Bi ⊆ [1, i] of column blocks depends on the sparsity structure of the matrix...