Results 1 - 10
of
59
Interprocedural Compilation of Fortran D for MIMD Distributed-Memory Machines
- COMMUNICATIONS OF THE ACM
, 1992
"... Algorithms exist for compiling Fortran D for MIMD distributed-memory machines, but are significantly restricted in the presence of procedure calls. This paper presents interprocedural analysis, optimization, and code generation algorithms for Fortran D that limit compilation to only one pass over ea ..."
Abstract
-
Cited by 300 (46 self)
- Add to MetaCart
Algorithms exist for compiling Fortran D for MIMD distributed-memory machines, but are significantly restricted in the presence of procedure calls. This paper presents interprocedural analysis, optimization, and code generation algorithms for Fortran D that limit compilation to only one pass over each procedure. This is accomplished by collecting summary information after edits, then compiling procedures in reverse topological order to propagate necessary information. Delaying instantiation of the computation partition, communication, and dynamic data decomposition is key to enabling interprocedural optimization. Recompilation analysis preserves the benefits of separate compilation. Empirical results show that interprocedural optimization is crucial in achieving acceptable performance for a common application.
Fortran D Language Specification
, 1990
"... This paper presents Fortran D, a version of Fortran enhanced with data decomposition specifications. It is designed to support two fundamental stages of writing a data-parallel program: problem mapping using sophisticated array alignments, and machine mapping through a rich set of data distribution ..."
Abstract
-
Cited by 278 (47 self)
- Add to MetaCart
This paper presents Fortran D, a version of Fortran enhanced with data decomposition specifications. It is designed to support two fundamental stages of writing a data-parallel program: problem mapping using sophisticated array alignments, and machine mapping through a rich set of data distribution functions. We believe that Fortran D provides a simple machine-independent programming model for most numerical computations. We intend to evaluate its usefulness for both programmers and advanced compilers on a variety of parallel architectures.
Demonstration of Automatic Data Partitioning Techniques for Parallelizing Compilers on Multicomputers
- IEEE Transactions on Parallel and Distributed Systems
, 1992
"... An important problem facing numerous research projects on parallelizing compilers for distributed memory machines is that of automatically determining a suitable data partitioning scheme for a program. Most of the current projects leave this tedious problem almost entirely to the user. In this paper ..."
Abstract
-
Cited by 145 (17 self)
- Add to MetaCart
An important problem facing numerous research projects on parallelizing compilers for distributed memory machines is that of automatically determining a suitable data partitioning scheme for a program. Most of the current projects leave this tedious problem almost entirely to the user. In this paper, we present a novel approach to the problem of automatic data partitioning. We introduce the notion of constraints on data distribution, and show how, based on performance considerations, a compiler identifies constraints to be imposed on the distribution of various data structures. These constraints are then combined by the compiler to obtain a complete and consistent picture of the data distribution scheme, one that offers good performance in terms of the overall execution time. We present results of a study we performed on Fortran programs taken from the Linpack and Eispack libraries and the Perfect Benchmarks to determine the applicability of our approach to real programs. The results a...
Automatic Data Partitioning on Distributed Memory Multiprocessors
, 1991
"... An important problem facing numerous research projects on parallelizing compilers for distributed memory machines is that of automatically determining a suitable data partitioning scheme for a program. Most of the current projects leave this tedious problem almost entirely to the user. In this paper ..."
Abstract
-
Cited by 102 (6 self)
- Add to MetaCart
An important problem facing numerous research projects on parallelizing compilers for distributed memory machines is that of automatically determining a suitable data partitioning scheme for a program. Most of the current projects leave this tedious problem almost entirely to the user. In this paper, we present a novel approach to the problem of automatic data partitioning. We introduce the notion of constraints on data distribution, and show how, based on performance considerations, a compiler identifies constraints to be imposed on the distribution of various data structures. These constraints are then combined by the compiler to obtain a complete and consistent picture of the data distribution scheme, one that offers good performance in terms of the overall execution time.
Object Distribution in Orca using Compile-Time and Run-Time Techniques
, 1993
"... Orca is a language for parallel programming on distributed systems. Communication in Orca is based on shared data-objects, which is a form of distributed shared memory. The performance of Orca programs depends strongly on how shared dataobjects are distributed among the local physical memories of th ..."
Abstract
-
Cited by 77 (20 self)
- Add to MetaCart
Orca is a language for parallel programming on distributed systems. Communication in Orca is based on shared data-objects, which is a form of distributed shared memory. The performance of Orca programs depends strongly on how shared dataobjects are distributed among the local physical memories of the processors. This paper studies a new and efficient solution to this problem, based on an integration of compile-time and run-time techniques. The Orca compiler has been extended to determine the access patterns of processes to shared objects. The compiler passes a summary of this information to the run-time system, which uses it to make good decisions about which objects to replicate and where to store nonreplicated objects. Measurements show that the new system gives better overall performance than any previous implementation of Orca. 3333333333333333 1 This research was supported in part by a PIONIER grant from the Netherlands Organization for Scientific Research (N.W.O.). 2 This re...
Compiler Support for Machine-Independent Parallel Programming in Fortran D
, 1991
"... Because of the complexity and variety of parallel architectures, an efficient machine-independent parallel programming model is needed to make parallel computing truly usable for scientific programmers. We believe that Fortran D, a version of Fortran enhanced with data decomposition specifications, ..."
Abstract
-
Cited by 76 (16 self)
- Add to MetaCart
Because of the complexity and variety of parallel architectures, an efficient machine-independent parallel programming model is needed to make parallel computing truly usable for scientific programmers. We believe that Fortran D, a version of Fortran enhanced with data decomposition specifications, can provide such a programming model. This paper presents the design of a prototype Fortran D compiler for the iPSC/860, a MIMD distributed-memory machine. Issues addressed include data decomposition analysis, guard introduction, communications generation and optimization, program transformations, and storage assignment. A test suite of scientific programs will be used to evaluate the effectiveness of both the compiler technology and programming model for the Fortran D compiler.
Evaluating Compiler Optimizations For Fortran D
, 1994
"... The Fortran D compiler uses data decomposition specifications to automatically translate Fortran programs for execution on MIMD distributed-memory machines. This paper introduces and classifies a number of advanced optimizations needed to achieve acceptable performance; they are analyzed and empiric ..."
Abstract
-
Cited by 68 (4 self)
- Add to MetaCart
The Fortran D compiler uses data decomposition specifications to automatically translate Fortran programs for execution on MIMD distributed-memory machines. This paper introduces and classifies a number of advanced optimizations needed to achieve acceptable performance; they are analyzed and empirically evaluated for stencil computations. Communication optimizations reduce communication overhead by decreasing the number of messages and hide communication overhead by overlapping the cost of remaining messages with local computation. Parallelism optimizations exploit parallel and pipelined computations, and may need to restructure the computation to increase parallelism. Profitability formulas are derived for each optimization. Empirical results show that exploiting parallelism for pipelined computations, reductions, and scans is vital. Message vectorization, collective communication, and efficient coarse-grain pipelining also significantly affect performance. Scalability of communicatio...
Distributed memory compiler design for sparse problems
- IEEE Transactions on Computers
, 1995
"... This paper addresses the issue of compiling concurrent loop nests in the presence of complicated array references and irregularly distributed arrays. Arrays accessed within loops may contain accesses that make it impossible to precisely determine the reference pattern at compile time. This paper pro ..."
Abstract
-
Cited by 66 (10 self)
- Add to MetaCart
This paper addresses the issue of compiling concurrent loop nests in the presence of complicated array references and irregularly distributed arrays. Arrays accessed within loops may contain accesses that make it impossible to precisely determine the reference pattern at compile time. This paper proposes a run time support mechanism that is used e ectively by a compiler to generate e cient code in these situations. The compiler accepts as input aFortran 77 program enhanced with speci cations for distributing data, and outputs a message passing program that runs on the nodes of a distributed memory machine. The runtime support for the compiler consists of a library of primitives designed to support irregular patterns of distributed array accesses and irregularly distributed array partitions. Avariety of performance results on the Intel iPSC/860 are presented.
Runtime support and compilation methods for user-specified irregular data distributions
- IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
, 1995
"... This paper describes two new ideas by which a High Performance Fortran compiler can deal with irregular computa-tions effectively. The first mechanism invokes a user specified mapping procedure via a set of proposed compiler directives. The directives allow use of program arrays to describe graph c ..."
Abstract
-
Cited by 55 (11 self)
- Add to MetaCart
This paper describes two new ideas by which a High Performance Fortran compiler can deal with irregular computa-tions effectively. The first mechanism invokes a user specified mapping procedure via a set of proposed compiler directives. The directives allow use of program arrays to describe graph connec-tivity, spatial location of array elements, and computational load. The second mechanism is a conservative method for compiling irregular loops in which dependence arises only due to reduction operations. This mechanism in many cases enables a compiler to recognize that it is possible to reuse previously computed infor-mation from inspectors (e.g., communication schedules, loop it-eration partitions, and information that associates off-processor data copies with on-processor buffer locations). This paper also presents performance results for these mechanisms from a For-tran 90D compiler implementation.
Run-time and compile-time support for adaptive irregular problems
- SUPERCOMPUTING’94
, 1994
"... In adaptive irregular problems the data arrays are accessed via indirection arrays, and data access patterns change during computation. Implementing such problems on distributed memory machines requires support for dynamic data partitioning, efficient preprocessing and fast data migration. This rese ..."
Abstract
-
Cited by 49 (9 self)
- Add to MetaCart
In adaptive irregular problems the data arrays are accessed via indirection arrays, and data access patterns change during computation. Implementing such problems on distributed memory machines requires support for dynamic data partitioning, efficient preprocessing and fast data migration. This research presents efficient runtime primitives for such problems. This new set of primitives is part of the CHAOS library. It subsumes the previous PARTI library which targeted only static irregular problems. To demonstrate the efficacy of the runtime support, two real adaptive irregular applications have been parallelized using CHAOS primitives: a molecular dynamics code (CHARMM) and a particle-in-cell code (DSMC). The paper also proposes extensions to Fortran D which can allow compilers to generate more efficient code for adaptive problems. These language extensions have been implemented in the Syracuse Fortran 90D/HPF prototype compiler. The performance of the compiler parallelized codes is compared with the hand parallelized versions.

