A Data Parallel Finite Element Method for Computational Fluid Dynamics on the Connection Machine System
- Comput. Methods
, 1992
Cited by 21 (9 self)
A finite element method for computational fluid dynamics has been implemented on the Connection Machine systems CM-2 and CM-200. An implicit iterative solution strategy, based on the preconditioned matrix-free GMRES algorithm, is employed. Parallel data structures built on both nodal and elemental sets are used to achieve maximum parallelization. Communication primitives provided through the Connection Machine Scientific Software Library substantially improved the overall performance of the program. Computations of three-dimensional compressible flows using unstructured meshes having close to one million elements, such as a complete airplane, demonstrate that the Connection Machine systems are suitable for these applications. Performance comparisons are also carried out with the vector computers Cray Y-MP and Convex C-1.
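Illustrative sketch (not taken from the paper): one common way to keep an iterative solver "matrix-free" on nodal and elemental data sets is to compute the matrix-vector product element by element. Nodal values are gathered to each element, multiplied by the small element matrix, and scatter-added back to the nodes, so no global matrix is ever assembled. The 1D mesh, the element matrix, and the function name `elementwise_matvec` are assumptions chosen for illustration; the paper's actual kernels and its use of the Connection Machine library primitives are not reproduced here.

```python
def elementwise_matvec(connectivity, element_matrices, x):
    """Compute y = K x without assembling the global matrix K.

    connectivity[e] lists the global node numbers of element e;
    element_matrices[e] is that element's small dense matrix.
    """
    y = [0.0] * len(x)
    for nodes, ke in zip(connectivity, element_matrices):
        # Gather: copy nodal values into element-local storage.
        local = [x[n] for n in nodes]
        # Element-level product followed by scatter-add back to the nodes.
        for i, ni in enumerate(nodes):
            y[ni] += sum(ke[i][j] * local[j] for j in range(len(nodes)))
    return y


# Hypothetical 1D mesh: 4 nodes, 3 two-node elements, identical element matrices.
connectivity = [(0, 1), (1, 2), (2, 3)]
ke = [[1.0, -1.0], [-1.0, 1.0]]
element_matrices = [ke] * len(connectivity)

x = [0.0, 1.0, 4.0, 9.0]
y = elementwise_matvec(connectivity, element_matrices, x)
```

A Krylov method such as GMRES needs the operator only through products of this kind, which is why the gather and scatter steps tend to dominate communication cost on a data-parallel machine.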
Loop Fusion in High Performance Fortran
- In Proceedings of the 1998 ACM International Conference on Supercomputing
, 1998
Cited by 8 (2 self)
In this paper we investigate a unique problem associated with fusing loops within a High Performance Fortran (HPF) program. In particular, we discuss the issue of performing loop fusion in an HPF compiler when compiling Fortran90 array assignment statements for execution on a distributed-memory machine. During compilation of an HPF program, Fortran90 array assignment statements must be scalarized into loop nests. We show how a certain class of these loop nests, when fused, can cause problems for the compiler's distributed-memory code generator. We then present an algorithm which not only prevents the fusion of these loops, but also increases the amount of useful fusion that can be performed.
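Illustrative sketch (not the paper's algorithm): the two steps named in the abstract, scalarization and loop fusion, can be shown on a pair of array statements such as `A = B + 1` followed by `C = A * 2`. Scalarization turns each statement into its own loop; fusion merges the two loops into one so that `a[i]` is reused while it is still at hand. The function names are assumptions, and the distributed-memory code-generation problem the paper studies is not modelled here.

```python
def scalarized(b):
    """Each array statement becomes a separate loop."""
    n = len(b)
    a = [0.0] * n
    c = [0.0] * n
    for i in range(n):          # from the array statement A = B + 1
        a[i] = b[i] + 1.0
    for i in range(n):          # from the array statement C = A * 2
        c[i] = a[i] * 2.0
    return a, c


def fused(b):
    """The two loops merged: one pass over the data, a[i] reused immediately."""
    n = len(b)
    a = [0.0] * n
    c = [0.0] * n
    for i in range(n):
        a[i] = b[i] + 1.0
        c[i] = a[i] * 2.0
    return a, c


b = [1.0, 2.0, 3.0]
assert scalarized(b) == fused(b)
```

Fusion is legal in this case because iteration `i` of the second loop reads only the value written by iteration `i` of the first.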
Optimizing Fortran90D/HPF for Distributed-Memory Computers
, 1997
Cited by 4 (4 self)
High Performance Fortran (HPF), as well as its predecessor FortranD, has attracted considerable attention as a promising language for writing portable parallel programs for a wide variety of distributed-memory architectures. Programmers express data parallelism using Fortran90 array operations and use data layout directives to direct the partitioning of the data and computation among the processors of a parallel machine. For HPF to gain acceptance as a vehicle for parallel scientific programming, it must achieve high performance on problems for which it is well suited. To achieve high performance with an HPF program on a distributed-memory parallel machine, an HPF compiler must do a superb job of translating Fortran90 data-parallel array constructs into an efficient sequence of operations that minimize the overhead associated with data movement and also maximize data locality. This dissertation presents and analyzes a set of advanced optimizations designed to improve the execution perf...
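Illustrative sketch (not from the dissertation): a data layout directive such as a BLOCK distribution assigns contiguous chunks of an array to processors, and compilers commonly use an owner-computes rule, under which each processor executes only the iterations that write elements it owns. The helper names `block_owner` and `local_indices` are assumptions, and indices are 0-based here, unlike Fortran.

```python
def block_owner(i, n, p):
    """Processor that owns 0-based index i of an n-element array
    distributed BLOCK-wise over p processors."""
    block = -(-n // p)          # ceil(n / p), the block size
    return i // block


def local_indices(rank, n, p):
    """Global indices owned by processor `rank` (may be empty)."""
    block = -(-n // p)
    lo = rank * block
    hi = min(lo + block, n)
    return range(lo, hi)


# Owner-computes sketch: each "processor" updates only its own block.
n, p = 10, 4
b = list(range(n))
a = [0] * n
for rank in range(p):
    for i in local_indices(rank, n, p):
        a[i] = b[i] + 1
```

With 10 elements on 4 processors the block size is 3, so the last processor holds a single element. Uneven blocks like this are one reason the generated loop bounds need care.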
Advanced Scalarization of Array Syntax
- Proceedings of the 9th International Conference on Compiler Construction (CC’2000)
, 2000
Cited by 3 (1 self)
One task of all Fortran 90 compilers is to scalarize the array syntax statements of a program into equivalent sequential code. Most compilers require multiple passes over the program source to ensure correctness of this translation, since their analysis algorithms only work on the scalarized form. These same compilers then make additional subsequent passes to perform loop optimizations such as loop fusion. In this paper we discuss a strategy that is capable of making advanced scalarization and fusion decisions at the array level. We present an analysis strategy that supports our advanced scalarizer, and we describe the benefits of this methodology compared to the standard practice. Experimental results show that our strategy can significantly improve the runtime performance of compiled code, while at the same time improving the performance of the compiler itself.
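Illustrative sketch (not the paper's scalarizer): the correctness concern in scalarization comes from array-assignment semantics, where the whole right-hand side is evaluated before any element is stored. For a statement like `A(2:N) = A(1:N-1) + 1`, a naive forward loop reads elements it has already overwritten. Reversing the loop, or copying through a temporary, restores the intended result. The function names below are assumptions.

```python
def array_semantics(a):
    """Reference result: evaluate the full right-hand side, then store."""
    out = list(a)
    rhs = [a[i - 1] + 1 for i in range(1, len(a))]
    out[1:] = rhs
    return out


def naive_scalarized(a):
    """Incorrect: iteration i reads out[i - 1], already overwritten."""
    out = list(a)
    for i in range(1, len(out)):
        out[i] = out[i - 1] + 1
    return out


def reversed_scalarized(a):
    """Correct: running the loop backwards reads each value before it is overwritten."""
    out = list(a)
    for i in range(len(out) - 1, 0, -1):
        out[i] = out[i - 1] + 1
    return out


a = [10, 20, 30, 40]
assert reversed_scalarized(a) == array_semantics(a)
assert naive_scalarized(a) != array_semantics(a)
```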
Optimizing Fortran 90D Programs for SIMD Execution
, 1993
SIMD architectures offer an alternative to MIMD architectures for obtaining high performance computation through parallelism. These architectures can offer impressive price/performance ratios for certain classes of problems. However, the effectiveness of such machines is greatly affected by the capabilities of the compilers which produce code for them. Current compilers have many weaknesses that introduce inefficiencies in the code that they produce. It is our thesis that advanced compiler techniques can produce more efficient SIMD code and exploit the massively parallel hardware closer to its full potential. To validate our thesis, we are designing and implementing compiler transformations that optimize computation and communication given the constraint of a single instruction stream.
1 Introduction
Parallel computing has become increasingly popular as a method of obtaining high performance. This trend will continue as parallel computers become less expensive and more readily ...

