Fortran D Language Specification
, 1990
Cited by 278 (47 self)
This paper presents Fortran D, a version of Fortran enhanced with data decomposition specifications. It is designed to support two fundamental stages of writing a data-parallel program: problem mapping using sophisticated array alignments, and machine mapping through a rich set of data distribution functions. We believe that Fortran D provides a simple machine-independent programming model for most numerical computations. We intend to evaluate its usefulness for both programmers and advanced compilers on a variety of parallel architectures.
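To make "data distribution functions" concrete, here is a minimal sketch of two common distribution patterns, block and cyclic, each mapping an array index to an owning processor. The function names and 0-based indexing are assumptions made for illustration; they are not Fortran D syntax and are not taken from the paper.

```python
def block_owner(i, n, p):
    """Owner of 0-based index i when n elements are split into p contiguous blocks."""
    block = -(-n // p)  # ceiling division: elements per block
    return i // block


def cyclic_owner(i, n, p):
    """Owner of 0-based index i when elements are dealt round-robin to p processors."""
    return i % p  # n is unused; kept so both functions share one signature


def layout(n, p, owner):
    """List the owning processor of every index under the given distribution."""
    return [owner(i, n, p) for i in range(n)]


if __name__ == "__main__":
    print("block :", layout(8, 4, block_owner))
    print("cyclic:", layout(8, 4, cyclic_owner))
```

A block layout keeps neighbouring elements on the same processor, which tends to suit stencil-style computations, while a cyclic layout spreads work more evenly when the active region of the array shrinks over time.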
Automatic Data Partitioning on Distributed Memory Multiprocessors
, 1991
Cited by 102 (6 self)
An important problem facing numerous research projects on parallelizing compilers for distributed memory machines is that of automatically determining a suitable data partitioning scheme for a program. Most of the current projects leave this tedious problem almost entirely to the user. In this paper, we present a novel approach to the problem of automatic data partitioning. We introduce the notion of constraints on data distribution, and show how, based on performance considerations, a compiler identifies constraints to be imposed on the distribution of various data structures. These constraints are then combined by the compiler to obtain a complete and consistent picture of the data distribution scheme, one that offers good performance in terms of the overall execution time.
Tiling Multidimensional Iteration Spaces for Multicomputers
, 1992
Cited by 99 (20 self)
This paper addresses the problem of compiling perfectly nested loops for multicomputers (distributed memory machines). The relatively high communication startup costs in these machines render frequent communication very expensive. Motivated by this, we present a method of aggregating a number of loop iterations into tiles where the tiles execute atomically -- a processor executing the iterations belonging to a tile receives all the data it needs before executing any one of the iterations in the tile, executes all the iterations in the tile and then sends the data needed by other processors. Since synchronization is not allowed during the execution of a tile, partitioning the iteration space into tiles must not result in deadlock. We first show the equivalence between the problem of finding partitions and the problem of determining the cone for a given set of dependence vectors. We then present an approach to partitioning the iteration space into deadlock-free tiles so that communicati...
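The idea of atomic tiles can be illustrated with a small sketch. It assumes a two-deep loop nest with dependence vectors (1, 0) and (0, 1) and simple rectangular tiles; the paper treats general dependence cones, so this is only a simplified special case, and the function names are hypothetical.

```python
def tiles(n_rows, n_cols, tile_rows, tile_cols):
    """Yield each rectangular tile as a list of (i, j) iterations."""
    for ti in range(0, n_rows, tile_rows):
        for tj in range(0, n_cols, tile_cols):
            yield [(i, j)
                   for i in range(ti, min(ti + tile_rows, n_rows))
                   for j in range(tj, min(tj + tile_cols, n_cols))]


def run_tiled(n_rows, n_cols, tile_rows, tile_cols):
    """Run a[i][j] = a[i-1][j] + a[i][j-1] tile by tile.

    Returns the grid (with a boundary row and column of ones) and the
    number of tiles executed.
    """
    a = [[1] * (n_cols + 1) for _ in range(n_rows + 1)]
    count = 0
    for tile in tiles(n_rows, n_cols, tile_rows, tile_cols):
        # Atomic tile: every input comes from the boundary or from a tile
        # that already finished, so nothing is exchanged mid-tile.
        for i, j in tile:
            a[i + 1][j + 1] = a[i][j + 1] + a[i + 1][j]
        count += 1
    return a, count


if __name__ == "__main__":
    grid, n_tiles = run_tiled(4, 6, 2, 3)
    print(n_tiles, grid[4][6])
```

With these dependences, a tile only needs results from the tiles above it and to its left, so visiting tiles in row-major order never waits on an unfinished tile. Larger tiles mean fewer, larger data exchanges, which is the trade-off the abstract motivates by high startup costs.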
Compiler Support for Machine-Independent Parallel Programming in Fortran D
, 1991
Cited by 76 (16 self)
Because of the complexity and variety of parallel architectures, an efficient machine-independent parallel programming model is needed to make parallel computing truly usable for scientific programmers. We believe that Fortran D, a version of Fortran enhanced with data decomposition specifications, can provide such a programming model. This paper presents the design of a prototype Fortran D compiler for the iPSC/860, a MIMD distributed-memory machine. Issues addressed include data decomposition analysis, guard introduction, communications generation and optimization, program transformations, and storage assignment. A test suite of scientific programs will be used to evaluate the effectiveness of both the compiler technology and programming model for the Fortran D compiler.
Automatic Data Layout for Distributed Memory Machines
, 1995
Cited by 35 (5 self)
The goal of languages like Fortran D or High Performance Fortran (HPF) is to provide a simple yet efficient machine-independent parallel programming model. Besides the algorithm selection, the data layout choice is the key intellectual challenge in writing an efficient program in such languages. The performance of a data layout depends on the target compilation system, the target machine, the problem size, and the number of available processors. This makes the choice of a good layout extremely difficult for most users of such languages. This thesis discusses the design and implementation of a data layout selection tool that generates Fortran D or HPF style data layout specifications automatically. Because the tool is not embedded in the target compiler and will be run only a few times during the tuning phase of an application, it can use techniques that may be considered too computationally expensive for inclusion in today's compilers. The proposed framework for automatic data layout s...
Tiling Multidimensional Iteration Spaces for Nonshared Memory Machines
- In Supercomputing 91
, 1991
Cited by 34 (3 self)
This paper addresses the problem of compiling multiply nested loops for nonshared memory machines. The relatively high communication startup costs in these machines render frequent communication very expensive. Motivated by this, we present a method of aggregating a number of loop iterations into tiles where the tiles execute atomically -- a processor executing the iterations belonging to a tile receives all the data it needs before executing any one of the iterations in the tile, executes all the iterations in the tile and then sends the data needed by other processors. Since synchronization is not allowed during the execution of a tile, partitioning the iteration space into tiles must not result in deadlock. We first show the equivalence between the problem of finding partitions and the problem of determining the cone for a given set of dependence vectors. We then present an approach to partitioning the iteration space into deadlock-free tiles so that communication volume is minimi...
A Methodology for High-Level Synthesis of Communication on Multicomputers
- In Proc. 6th ACM International Conference on Supercomputing, Washington D.C
, 1992
Cited by 32 (13 self)
Freeing the user from the tedious task of generating explicit communication is one of the primary goals of numerous research projects on compilers for distributed memory machines. In the process of synthesis of communication, the effective use of collective communication routines offers considerable scope for improving program performance. This paper presents a methodology for determining the collective communication primitives that should be used for implementing the data movement at various points in the program. We introduce the notion of certain synchronous properties between array references in statements inside loops, and present tests to determine the presence of these properties. These tests enable the compiler to analyze quite precisely the communication requirements of those statements, and implement communication using appropriate primitives. These results not only lay down a framework for synthesis of communication on multicomputers, they also form the basis of our im...
Compiling for Distributed Memory Architectures
- IEEE Transactions on Parallel and Distributed Systems
, 1992
Cited by 18 (0 self)
In this paper, we report on one such system. The cornerstone of our approach is to let the mapping of data onto processors drive process decomposition. The idea underlying our approach is to enable the programmer to write and debug his program in a high-level language using standard high-level abstractions such as loops and arrays. In addition, he specifies the domain decomposition --- a mapping of the data structures onto the multiprocessor. In most programs we have looked at (such as matrix algorithms and SIMPLE [7]), this is quite straightforward since the programmer thinks naturally in terms of decompositions by columns, rows, blocks, and so on. Given this data decomposition, the compiler performs process decomposition by analyzing the program and specializing it, for each processor, to the data that resides on that processor. Thus, our approach to process decomposition is "data-driven" rather than "program-driven" as are more traditional approaches [1, 19].
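As a rough illustration of "specializing" a program to the data on each processor, the sketch below assumes a decomposition by blocks of columns and a loop that scales every array element. Each simulated processor runs the same loop restricted to the columns it holds. All names here are hypothetical and the processors are simulated sequentially; this is not the system described in the paper.

```python
def owned_columns(proc, n_cols, n_procs):
    """Columns held by `proc` under a block-of-columns decomposition."""
    block = -(-n_cols // n_procs)  # ceiling division: columns per processor
    return range(proc * block, min((proc + 1) * block, n_cols))


def specialized_scale(local, cols, n_rows, factor):
    """Per-processor version of: for each i, j: a[i][j] *= factor.

    Only the columns in `cols` are visited, so every write is local.
    """
    for j in cols:
        for i in range(n_rows):
            local[(i, j)] *= factor


def run_all(n_rows, n_cols, n_procs, factor):
    """Simulate every processor in turn and gather the distributed result."""
    result = {}
    for proc in range(n_procs):
        cols = owned_columns(proc, n_cols, n_procs)
        # Each processor stores only its own columns of the array.
        local = {(i, j): i * n_cols + j for i in range(n_rows) for j in cols}
        specialized_scale(local, cols, n_rows, factor)
        result.update(local)
    return result


if __name__ == "__main__":
    print(sorted(run_all(2, 4, 2, 10).items()))
```

The loop bounds of each specialized copy are derived from the data mapping rather than from a separate partitioning of the work, which is the sense in which the decomposition is driven by the data.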
Alignment Analysis within the VFCS - A Pragmatic Method for Supporting Data Distribution
, 1996
Cited by 3 (2 self)
One of the major tasks in programming distributed memory multiprocessors with state-of-the-art parallel languages is the specification of efficient data distribution schemes. Unfortunately, there are no optimal strategies for generating such data distributions and thus automatic support is very difficult to provide; several heuristics have been proposed to provide some support to the user. In this paper we outline the Alignment Analysis Tool which has been implemented within the Vienna Fortran Compiler VFCS. The tool is able to provide help for finding a good distribution scheme in a pragmatic and general-purpose way. It automatically generates alignment proposals for the arrays accessed in a procedure and thus simplifies the data distribution problem. Based upon the fundamental work by Li & Chen and Manish Gupta, our main contributions to the alignment problem are the extension of the set of alignment preferences detected as well as the simple but powerful heuristic for weighing align...
Optimizing Fortran 90D Programs for SIMD Execution
, 1993
SIMD architectures offer an alternative to MIMD architectures for obtaining high performance computation through parallelism. These architectures can offer impressive price/performance ratios for certain classes of problems. However, the effectiveness of such machines is greatly affected by the capabilities of the compilers which produce code for them. Current compilers have many weaknesses that introduce inefficiencies in the code that they produce. It is our thesis that advanced compiler techniques can produce more efficient SIMD code and exploit the massively parallel hardware closer to its full potential. To validate our thesis, we are designing and implementing compiler transformations that optimize computation and communication given the constraint of a single instruction stream.
1 Introduction
Parallel computing has become increasingly popular as a method of obtaining high performance. This trend will continue as parallel computers become less expensive and more readily ...

