Results 1 - 4 of 4
Using Integer Sets for Data-Parallel Program Analysis and Optimization
- In Proceedings of the SIGPLAN '98 Conference on Programming Language Design and Implementation, 1998
Abstract
Cited by 54 (29 self)
In this paper, we describe our experience with using an abstract integer-set framework to develop the Rice dHPF compiler, a compiler for High Performance Fortran. We present simple, yet general formulations of the major computation partitioning and communication analysis tasks as well as a number of important optimizations in terms of abstract operations on sets of integer tuples. This approach has made it possible to implement a comprehensive collection of advanced optimizations in dHPF, and to do so in the context of a more general computation partitioning model than previous compilers. One potential limitation of the approach is that the underlying class of integer set problems is fundamentally unable to represent HPF data distributions on a symbolic number of processors. We describe how we extend the approach to compile codes for a symbolic number of processors, without requiring any changes to the set formulations for the above optimizations. We show experimentally that the set re...
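As a rough illustration of what "abstract operations on sets of integer tuples" can look like, the sketch below uses explicit Python sets in place of the symbolic integer-set framework the abstract refers to. The BLOCK distribution, the example loop `a(i) = b(i-1)`, and every function name are assumptions made for illustration; none of them are taken from the paper.

```python
# Illustrative sketch only: explicit Python sets stand in for symbolic
# integer sets. The loop, the distribution and all names are hypothetical.

def block_owned(p, n, nprocs):
    """Indices 1..n owned by processor p (0-based) under a BLOCK distribution."""
    block = -(-n // nprocs)  # ceiling division
    lo = p * block + 1
    hi = min((p + 1) * block, n)
    return set(range(lo, hi + 1))

def nonlocal_reads(p, n, nprocs):
    """Elements of b that p reads but does not own, for the loop
    do i = 2, n: a(i) = b(i-1), with a and b distributed identically."""
    loop = set(range(2, n + 1))
    # Iterations executed by p: those whose written element a(i) is owned by p.
    iters = loop & block_owned(p, n, nprocs)
    # Data read by those iterations: image of the iteration set under i -> i-1.
    reads = {i - 1 for i in iters}
    # Non-local data: everything read minus everything owned.
    return reads - block_owned(p, n, nprocs)

if __name__ == "__main__":
    for p in range(4):
        print(p, sorted(nonlocal_reads(p, 16, 4)))
```

In this toy setting the data a processor must receive falls out of an intersection, an image under the subscript mapping, and a set difference, which is the flavor of formulation the abstract describes.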
Advanced Code Generation for High Performance Fortran
- In Languages, Compilation Techniques and Run Time Systems for Scalable Parallel Systems, Lecture Notes in Computer Science Series
Abstract
Cited by 13 (2 self)
In this paper, we describe techniques developed in the Rice dHPF compiler to address key code generation challenges that arise in achieving high performance for regular applications on message-passing systems. We focus on techniques required to implement advanced optimizations and to achieve consistently high performance with existing optimizations. Many of the core communication analysis and code generation algorithms in dHPF are expressed in terms of abstract equations manipulating integer sets. This approach enables general and yet simple implementations of sophisticated optimizations, making it more practical to include a comprehensive set of optimizations in data-parallel compilers. It also enables the compiler to support much more aggressive computation partitioning algorithms than in previous compilers. We therefore believe this approach can provide higher and more consistent levels of performance than are available today.
Semi-Lagrangian formulations with automatic code generation for environmental modeling
- In Proceedings of the 19th ACM Symposium on Applied Computing (SAC’04), 2004
Abstract
Cited by 4 (1 self)
An important issue for numerical weather prediction (NWP) models is the time it takes to produce a valid forecast. One factor that greatly influences this simulation time is the size of the time step. However, time step size is often limited by the numerical stability of the advection schemes used. Available schemes include semi-implicit Eulerian and semi-Lagrangian schemes. In principle, semi-Lagrangian formulations result in irregular communications on parallel architectures. In this paper we describe automatic code generation for a semi-implicit scheme with a semi-Lagrangian formulation. We describe how code can be generated from a mathematical specification of the advection model and how the formulations are embedded in the CTADEL code generation tool, and we show the parallelization of the code. Finally, we show results from preliminary experiments we have conducted with the generated code and the reference code from a production NWP on a number of different architectures.
Design and Evaluation of a Computation Partitioning Framework for Data-Parallel Compilers
- 2001
Abstract
Cited by 1 (1 self)
In this paper, we present the design and evaluation of a flexible computation partitioning framework used in the Rice dHPF compiler for High Performance Fortran. Our CP framework supports a more general class of static computation partitionings than previous data-parallel compilers, enables sophisticated partitionings that maximize parallelism in the presence of arbitrary control flow, and supports several novel optimizations that have proven essential for obtaining high overall performance when parallelizing scientific programs. In earlier work, we have shown that the dHPF compiler is able to effectively parallelize HPF versions of existing Fortran codes and achieve speedups that are comparable with hand-coded parallel performance [2, 1]. For example, code generated by dHPF for the NAS application benchmarks SP and BT is within 0-21% of the performance of sophisticated hand-coded message-passing versions of the codes, and these results are achieved with HPF versions that require changes to fewer than 6% of the lines of the original serial codes. Three new CP-based optimizations in the dHPF compiler were key to achieving this level of performance. Two of these three optimizations, along with another new algorithm presented in this paper, require the full generality of our CP framework and could not be implemented in any other existing compiler that we are aware of. In data distribution-based languages such as Fortran D [13], Vienna Fortran [11], and High Performance Fortran (HPF) [20], it is natural to represent a CP for a statement instance as the set of processor(s) that "own" a particular array element or scalar variable (by virtue of a data distribution). For example, the widely used owner-computes rule [25] (a simple heuristic for computation partitioning selection) ...
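The owner-computes rule named above can be sketched in a few lines. This is a minimal illustration, assuming a one-dimensional BLOCK distribution and 0-based processor numbers; the function names are hypothetical and are not part of dHPF or of the paper's CP framework.

```python
# Illustrative sketch only: the distribution, numbering and names are assumptions.

def owners(i, n, nprocs):
    """Processors owning element i (1-based) of a BLOCK-distributed array of size n."""
    block = -(-n // nprocs)  # ceiling division
    return {(i - 1) // block}

def replicated_owners(nprocs):
    """A replicated scalar is owned by every processor."""
    return set(range(nprocs))

def owner_computes_cp(lhs_index, n, nprocs):
    """CP of the statement instance a(lhs_index) = ...: the owners of the
    element written on the left-hand side."""
    return owners(lhs_index, n, nprocs)

if __name__ == "__main__":
    # With 8 elements on 2 processors, instances 1..4 run on processor 0
    # and instances 5..8 run on processor 1.
    for i in range(1, 9):
        print(i, owner_computes_cp(i, 8, 2))
```

Representing the CP as a set of processors, rather than a single processor number, lets the same representation cover replicated data, where the set contains every processor.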

