Results 1 - 10 of 89
Fortran D Language Specification
- 1990
Cited by 278 (47 self)
This paper presents Fortran D, a version of Fortran enhanced with data decomposition specifications. It is designed to support two fundamental stages of writing a data-parallel program: problem mapping using sophisticated array alignments, and machine mapping through a rich set of data distribution functions. We believe that Fortran D provides a simple machine-independent programming model for most numerical computations. We intend to evaluate its usefulness for both programmers and advanced compilers on a variety of parallel architectures.
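To make the idea of a data distribution function concrete, the following Python sketch maps array indices to owning processors. It is an illustration only: the block and cyclic mappings, and the names `block_owner` and `cyclic_owner`, are assumptions chosen as common examples of such functions. The abstract does not name specific distributions, and this is not Fortran D syntax.

```python
# Hypothetical illustration of two common data distribution functions.
# The names and the choice of block/cyclic mappings are assumptions,
# not taken from the Fortran D specification.

def block_owner(i: int, n: int, p: int) -> int:
    """Owner of element i when n elements are split into p contiguous blocks."""
    block_size = -(-n // p)  # ceiling division
    return i // block_size


def cyclic_owner(i: int, p: int) -> int:
    """Owner of element i when elements are dealt round-robin to p processors."""
    return i % p


def distribution(n: int, p: int, kind: str) -> list[int]:
    """Return the owning processor of every index 0..n-1 for the given kind."""
    if kind == "block":
        return [block_owner(i, n, p) for i in range(n)]
    if kind == "cyclic":
        return [cyclic_owner(i, p) for i in range(n)]
    raise ValueError(f"unknown distribution kind: {kind}")


if __name__ == "__main__":
    print(distribution(8, 4, "block"))
    print(distribution(8, 4, "cyclic"))
```

A block mapping keeps neighbouring elements on the same processor, which tends to suit nearest-neighbour computations, while a cyclic mapping spreads work evenly when the cost per element varies along the index range.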
Demonstration of Automatic Data Partitioning Techniques for Parallelizing Compilers on Multicomputers
- IEEE Transactions on Parallel and Distributed Systems, 1992
Cited by 145 (17 self)
An important problem facing numerous research projects on parallelizing compilers for distributed memory machines is that of automatically determining a suitable data partitioning scheme for a program. Most of the current projects leave this tedious problem almost entirely to the user. In this paper, we present a novel approach to the problem of automatic data partitioning. We introduce the notion of constraints on data distribution, and show how, based on performance considerations, a compiler identifies constraints to be imposed on the distribution of various data structures. These constraints are then combined by the compiler to obtain a complete and consistent picture of the data distribution scheme, one that offers good performance in terms of the overall execution time. We present results of a study we performed on Fortran programs taken from the Linpack and Eispack libraries and the Perfect Benchmarks to determine the applicability of our approach to real programs. The results a...
Compile-Time Techniques for Data Distribution in Distributed Memory Machines
- IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 1991
Cited by 81 (13 self)
This paper addresses the problem of partitioning data for distributed memory machines (multicomputers). In current day multicomputers, interprocessor communication is more time-consuming than instruction execution. If insufficient attention is paid to the data allocation problem, then the amount of time spent in interprocessor communication might be so high as to seriously undermine the benefits of parallelism. It is therefore worthwhile for a compiler to analyze patterns of data usage to determine allocation, in order to minimize interprocessor communication. We present a machine-independent analysis of communication-free partitions. We present a matrix notation to describe array accesses in fully parallel loops which lets us derive sufficient conditions for communication-free partitioning (decomposition) of arrays. In the case of a commonly occurring class of accesses, we present a problem formulation to minimize communication costs, when communication-free partitioning of arrays is not possible.
Compiler Support for Machine-Independent Parallel Programming in Fortran D
- 1991
Cited by 76 (16 self)
Because of the complexity and variety of parallel architectures, an efficient machine-independent parallel programming model is needed to make parallel computing truly usable for scientific programmers. We believe that Fortran D, a version of Fortran enhanced with data decomposition specifications, can provide such a programming model. This paper presents the design of a prototype Fortran D compiler for the iPSC/860, a MIMD distributed-memory machine. Issues addressed include data decomposition analysis, guard introduction, communications generation and optimization, program transformations, and storage assignment. A test suite of scientific programs will be used to evaluate the effectiveness of both the compiler technology and programming model for the Fortran D compiler.
Nonlinear Array Layouts for Hierarchical Memory Systems
- 1999
Cited by 67 (4 self)
Programming languages that provide multidimensional arrays and a flat linear model of memory must implement a mapping between these two domains to order array elements in memory. This layout function is fixed at language definition time and constitutes an invisible, non-programmable array attribute. In reality, modern memory systems are architecturally hierarchical rather than flat, with substantial differences in performance among different levels of the hierarchy. This mismatch between the model and the true architecture of memory systems can result in low locality of reference and poor performance. Some of this loss in performance can be recovered by re-ordering computations using transformations such as loop tiling. We explore nonlinear array layout functions as an additional means of improving locality of reference. For a benchmark suite composed of dense matrix kernels, we show by timing and simulation that two specific layouts (4D and Morton) have low implementation costs (2-5% of total running time) and high performance benefits (reducing execution time by factors of 1.1-2.5); that they have smooth performance curves, both across a wide range of problem sizes and over representative cache architectures; and that recursion-based control structures may be needed to fully exploit their potential.
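As a rough illustration of what a nonlinear layout function looks like, the Python sketch below contrasts the usual row-major mapping with a Morton (Z-order) mapping that interleaves the bits of the row and column indices. The function names, the bit-ordering convention (row bits in the odd positions), and the 16-bit default are assumptions made for this example; the paper's own implementation is not shown in the abstract.

```python
# Minimal sketch of a Morton (Z-order) layout function.
# Names and bit-ordering convention are illustrative assumptions.

def row_major_index(i: int, j: int, ncols: int) -> int:
    """Conventional linear layout: rows are stored one after another."""
    return i * ncols + j


def morton_index(i: int, j: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of i and j.

    Bits of j go to even positions and bits of i to odd positions,
    so each aligned 2x2 (and 4x4, 8x8, ...) tile is contiguous in memory.
    """
    z = 0
    for b in range(bits):
        z |= ((j >> b) & 1) << (2 * b)
        z |= ((i >> b) & 1) << (2 * b + 1)
    return z


def morton_order(n: int) -> list[tuple[int, int]]:
    """Coordinates of an n x n array listed in Morton order (n a power of two)."""
    cells = [(i, j) for i in range(n) for j in range(n)]
    return sorted(cells, key=lambda c: morton_index(c[0], c[1]))


if __name__ == "__main__":
    # The first four cells visited form the top-left 2x2 tile.
    print(morton_order(4)[:4])
```

Under this mapping, elements that are close in both dimensions stay close in memory at every tile size, which is the locality property the abstract attributes to such layouts; the bit interleaving is also the source of the extra index-computation cost.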
An Overview of the Fortran D Programming System
- IN PROCEEDINGS OF THE FOURTH WORKSHOP ON LANGUAGES AND COMPILERS FOR PARALLEL COMPUTING, 1991
Cited by 66 (16 self)
The success of large-scale parallel architectures is limited by the difficulty of developing machine-independent parallel programs. We have developed Fortran D, a version of Fortran extended with data decomposition specifications, to provide a portable data-parallel programming model. This paper presents the design of two key components of the Fortran D programming system: a prototype compiler and an environment to assist automatic data decomposition. The Fortran D compiler addresses program partitioning, communication generation and optimization, data decomposition analysis, run-time support for unstructured computations, and storage management. The Fortran D programming environment provides a static performance estimator and an automatic data partitioner. We believe that the Fortran D programming system will significantly ease the task of writing machine-independent data-parallel programs.
Automatic Data Layout for High-Performance Fortran
- IN PROCEEDINGS OF SUPERCOMPUTING '95, 1994
Cited by 66 (3 self)
High Performance Fortran (HPF) is rapidly gaining acceptance as a language for parallel programming. The goal of HPF is to provide a simple yet efficient machine-independent parallel programming model. Besides the algorithm selection, the data layout choice is the key intellectual step in writing an efficient HPF program. The developers of HPF did not believe that data layouts can be determined automatically in all cases. Therefore HPF requires the user to specify the data layout. It is the task of the HPF compiler to generate efficient code for the user-supplied data layout. The choice
A Static Parameter based Performance Prediction Tool for Parallel Programs
- 1993
Cited by 61 (10 self)
This paper presents a Parameter based Performance Prediction Tool (P³T) which is part of the Vienna Fortran Compilation System (VFCS), a compiler that automatically translates Fortran programs into message passing programs for massively parallel architectures. The P³T is applied to an explicitly parallel program generated by the VFCS, which may contain synchronous as well as asynchronous communication and is attributed with parameters computed in a previous profiling run. It statically computes a set of optional parameters that characterize the behavior of the parallel program. This includes work distribution, the number of data transfers, the amount of data transferred, transfer times, network contention, and the number of cache misses. These parameters can be selectively determined for statements, loops, procedures, and the entire program; furthermore, their effect with respect to individual processors can be examined. The tool plays an important role in the VFCS by providin...
The Alignment-Distribution Graph
- In Proceedings of the Sixth Workshop on Languages and Compilers for Parallel Computing, 1993
Cited by 61 (3 self)
Implementing a data-parallel language such as Fortran 90 on a distributed-memory parallel computer requires distributing aggregate data objects (such as arrays) among the memory modules attached to the processors. The mapping of objects to the machine determines the amount of residual communication needed to bring operands of parallel operations into alignment with each other. We present a program representation called the alignment-distribution graph that makes these communication requirements explicit. We describe the details of the representation, show how to model communication cost in this framework, and outline several algorithms for determining object mappings that approximately minimize residual communication.
Automatic Data Layout Using 0-1 Integer Programming
- In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT94), 1994
Cited by 59 (5 self)
The goal of languages like Fortran D or High Performance Fortran (HPF) is to provide a simple yet efficient machine-independent parallel programming model. By shifting much of the burden of machine-dependent optimization to the compiler, the programmer is able to write data-parallel programs that can be compiled and executed with good performance on many different architectures. However, the choice of a good data layout is still left to the programmer. Even the most sophisticated compiler may not be able to compensate for a poorly chosen data layout since many compiler decisions are driven by the data layout specified in the program. The choice of a good data layout depends on many factors, including the target machine architecture, the compilation system, the problem size, and the number of processors available. The option of remapping arrays at specific points in the program makes the choice even harder. Current programming tools provide little or no support for this difficult sele...

