Interprocedural Compilation of Fortran D for MIMD Distributed-Memory Machines
- COMMUNICATIONS OF THE ACM
, 1992
Cited by 300 (46 self)
Algorithms exist for compiling Fortran D for MIMD distributed-memory machines, but are significantly restricted in the presence of procedure calls. This paper presents interprocedural analysis, optimization, and code generation algorithms for Fortran D that limit compilation to only one pass over each procedure. This is accomplished by collecting summary information after edits, then compiling procedures in reverse topological order to propagate necessary information. Delaying instantiation of the computation partition, communication, and dynamic data decomposition is key to enabling interprocedural optimization. Recompilation analysis preserves the benefits of separate compilation. Empirical results show that interprocedural optimization is crucial in achieving acceptable performance for a common application.
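The single-pass ordering described in this abstract can be illustrated with a minimal Python sketch. It assumes an acyclic call graph (no recursion) and reads "reverse topological order" as compiling every callee before its callers. The function name `compilation_order` and the procedure names in the test are hypothetical and do not come from the Fortran D compiler.

```python
from graphlib import TopologicalSorter


def compilation_order(call_graph):
    """Return procedure names so that every callee precedes its callers.

    call_graph maps each caller to the set of procedures it calls.
    TopologicalSorter treats the mapped values as predecessors, so
    callees are emitted first. Assumption: the call graph is acyclic.
    """
    return list(TopologicalSorter(call_graph).static_order())
```

With this ordering, summary information produced while compiling a callee is already available when each of its callers is compiled, which is what makes one pass per procedure sufficient.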
A Scalable Linear Algebra Library for Distributed Memory Concurrent Computers
, 1992
Cited by 151 (33 self)
This paper describes ScaLAPACK, a distributed memory version of the LAPACK software package for dense and banded matrix computations. Key design features are the use of distributed versions of the Level 3 BLAS as building blocks, and an object-based interface to the library routines. The square block scattered decomposition is described. The implementation of a distributed memory version of the right-looking LU factorization algorithm on the Intel Delta multicomputer is discussed, and performance results are presented that demonstrate the scalability of the algorithm.
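For readers unfamiliar with the right-looking variant, the following is a minimal serial sketch in plain Python. It is not ScaLAPACK code: it is unblocked, runs on one processor, and omits pivoting, all of which are simplifying assumptions made here. The name `lu_right_looking` is hypothetical.

```python
def lu_right_looking(a):
    """Factor a square matrix (list of row lists) as L*U without pivoting.

    Returns a copy holding U on and above the diagonal and the
    multipliers of the unit lower-triangular L below it.
    """
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    for k in range(n):
        pivot = a[k][k]
        if pivot == 0:
            raise ZeroDivisionError("zero pivot; this sketch does no pivoting")
        # Scale column k below the diagonal to form the multipliers.
        for i in range(k + 1, n):
            a[i][k] /= pivot
        # "Right-looking": immediately update the whole trailing submatrix.
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
    return a
```

The trailing-submatrix update is the step that a blocked version would express as a matrix-matrix multiply.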
ADIFOR -- Generating Derivative Codes from Fortran Programs
, 1991
Cited by 135 (53 self)
The numerical methods employed in the solution of many scientific computing problems require the computation of derivatives of a function $f : \mathbb{R}^n \rightarrow \mathbb{R}^m$. Both the accuracy and the computational requirements of the derivative computation are usually of critical importance for the robustness and speed of the numerical solution. ADIFOR (Automatic Differentiation In FORtran) is a source transformation tool that accepts Fortran 77 code for the computation of a function and writes portable Fortran 77 code for the computation of the derivatives. In contrast to previous approaches, ADIFOR views automatic differentiation as a source transformation problem. ADIFOR employs the data analysis capabilities of the ParaScope Parallel Programming Environment, which enable us to handle arbitrary Fortran 77 codes and to exploit the computational context in the computation of derivatives. Experimental results show that ADIFOR can handle real-life codes and that ADIFOR-generated codes are competitive wit...
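The idea of propagating derivatives alongside values can be sketched in a few lines of Python. Note the swapped-in technique: ADIFOR generates new Fortran 77 source, whereas this sketch uses operator overloading on dual numbers, chosen only because it is compact. The names `Dual`, `_lift`, and `derivative` are hypothetical, and only addition and multiplication are supported.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value together with its derivative with respect to one input."""
    val: float
    dot: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (derivative zero)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def derivative(f, x):
    """Evaluate df/dx at x by seeding the input derivative with 1."""
    return f(Dual(x, 1.0)).dot
```

For example, `derivative(lambda x: 3 * x * x + 2 * x + 1, 2.0)` evaluates the derivative 6x + 2 at x = 2.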
Fortran M: A Language for Modular Parallel Programming
- Journal of Parallel and Distributed Computing
, 1992
Cited by 131 (24 self)
Fortran M is a small set of extensions to Fortran 77 that supports a modular approach to the design of message-passing programs. It has the following features. (1) Modularity. Programs are constructed by using explicitly-declared communication channels to plug together program modules called processes. A process can encapsulate common data, subprocesses, and internal communication. (2) Safety. Operations on channels are restricted so as to guarantee deterministic execution, even in dynamic computations that create and delete processes and channels. Channels are typed, so a compiler can check for correct usage. (3) Architecture Independence. The mapping of processes to processors can be specified with respect to a virtual computer with size and shape different from that of the target computer. Mapping is specified by annotations that influence performance but not correctness. (4) Efficiency. Fortran M can be compiled efficiently for uniprocessors, shared-memory computers, distributed-m...
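The "plug processes together with channels" style can be approximated in Python as a rough analogy. This is not Fortran M syntax, and Python does not enforce the typing or the single-writer, single-reader restrictions that give Fortran M its determinism guarantee. Here a `queue.Queue` stands in for a channel and threads stand in for processes; `_END`, `producer`, `consumer`, and `run_pipeline` are all hypothetical names.

```python
import queue
import threading

_END = object()  # hypothetical end-of-channel marker


def producer(out, n):
    """Process that writes 0..n-1 to its output channel, then closes it."""
    for i in range(n):
        out.put(i)
    out.put(_END)


def consumer(inp, results):
    """Process that reads its input channel until closed, squaring each item."""
    while True:
        item = inp.get()
        if item is _END:
            break
        results.append(item * item)


def run_pipeline(n):
    """Plug one producer and one consumer together through one channel."""
    channel = queue.Queue()
    results = []
    workers = [
        threading.Thread(target=producer, args=(channel, n)),
        threading.Thread(target=consumer, args=(channel, results)),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results
```

Because the channel has exactly one writer and one reader and preserves order, the result does not depend on thread scheduling, which mirrors the determinism property on a small scale.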
ParaScope: a parallel programming environment
- PROCEEDINGS OF THE IEEE
, 1993
Cited by 120 (33 self)
The ParaScope parallel programming environment, developed to support scientific programming of shared-memory multiprocessors, includes a collection of tools that use global program analysis to help users develop and debug parallel programs. This paper focuses on ParaScope’s compilation system, its parallel program editor, and its parallel debugging system. The compilation system extends the traditional single-procedure compiler by providing a mechanism for managing the compilation of complete programs. Thus, ParaScope can support both traditional single-procedure optimization and optimization across procedure boundaries. The ParaScope editor brings both compiler analysis and user expertise to bear on program parallelization. It assists the knowledgeable user by displaying and managing analysis and by providing a variety of interactive program transformations that are effective in exposing parallelism. The debugging system detects and reports timing-dependent errors, called data races, in execution of parallel programs. The system combines static analysis, program instrumentation, and run-time reporting to provide a mechanical system for isolating errors in parallel program executions. Finally, we describe a new project to extend ParaScope to support programming in Fortran D, a machine-independent parallel programming language intended for use with both distributed-memory and shared-memory parallel computers.
Compiler Optimizations for Fortran D on MIMD Distributed-Memory Machines
- In Proceedings of the 1992 ACM International Conference on Supercomputing
, 1991
Cited by 96 (13 self)
Massively parallel MIMD distributed-memory machines can provide enormous computation power. However, the difficulty of developing parallel programs for these machines has limited their accessibility. This paper presents compiler algorithms to automatically derive efficient message-passing programs based on data decompositions. Optimizations are presented to minimize load imbalance and communication costs for both loosely synchronous and pipelined loops. These techniques are employed in the compiler being developed at Rice University for Fortran D, a version of Fortran enhanced with data decomposition specifications.
1 Introduction
It is widely recognized that parallel computing represents the only plausible way to continue to increase the computational power available to computational scientists and engineers. However, parallel computers are not likely to be widely successful until they are easy to program. A major component in the success of vector supercomputers is the ability of ...
Object Distribution in Orca using Compile-Time and Run-Time Techniques
, 1993
Cited by 77 (20 self)
Orca is a language for parallel programming on distributed systems. Communication in Orca is based on shared data-objects, which is a form of distributed shared memory. The performance of Orca programs depends strongly on how shared data-objects are distributed among the local physical memories of the processors. This paper studies a new and efficient solution to this problem, based on an integration of compile-time and run-time techniques. The Orca compiler has been extended to determine the access patterns of processes to shared objects. The compiler passes a summary of this information to the run-time system, which uses it to make good decisions about which objects to replicate and where to store nonreplicated objects. Measurements show that the new system gives better overall performance than any previous implementation of Orca.
1 This research was supported in part by a PIONIER grant from the Netherlands Organization for Scientific Research (N.W.O.). 2 This re...
Models of Machines and Computation for Mapping in Multicomputers
, 1993
Cited by 76 (1 self)
It is now more than a quarter of a century since researchers started publishing papers on mapping strategies for distributing computation across the computation resource of multiprocessor systems. There exists a large body of literature on the subject, but there is no commonly-accepted framework whereby results in the field can be compared. Nor is it always easy to assess the relevance of a new result to a particular problem. Furthermore, changes in parallel computing technology have made some of the earlier work of less relevance to current multiprocessor systems. Versions of the mapping problem are classified, and research in the field is considered in terms of its relevance to the problem of programming currently available hardware in the form of a distributed memory multiple instruction stream multiple data stream computer: a multicomputer.
Compiling Array Expressions for Efficient Execution on Distributed-Memory Machines
- JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING
, 1996
Cited by 69 (5 self)
Array statements are often used to express data-parallelism in scientific languages such as Fortran 90 and High Performance Fortran. In compiling array statements for a distributed-memory machine, efficient generation of communication sets and local index sets is important. We show that for arrays distributed block-cyclically on multiple processors, the local memory access sequence and communication sets can be efficiently enumerated as closed forms using regular sections. First, closed form solutions are presented for arrays that are distributed using block or cyclic distributions. These closed forms are then used with a virtual processor approach to give an efficient solution for arrays with block-cyclic distributions. This approach is based on viewing a block-cyclic distribution as a block (or cyclic) distribution on a set of virtual processors, which are cyclically (or block-wise) mapped to physical processors. These views are referred to as virtual-block or virtual-cyclic...
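The virtual-block reading of a block-cyclic distribution can be made concrete with a short Python sketch. It assumes 0-based global indices, a one-dimensional array, and a fixed block size. It shows only the index mapping, not the paper's closed-form enumeration with regular sections. The names `block_cyclic_owner` and `local_indices` are hypothetical.

```python
def block_cyclic_owner(i, block, nprocs):
    """Map global index i to (owning processor, local index).

    Each block of `block` consecutive elements belongs to one virtual
    processor; virtual processors are dealt cyclically to the `nprocs`
    physical processors.
    """
    virtual = i // block                # virtual processor = block number
    owner = virtual % nprocs            # cyclic mapping to physical processors
    local = (virtual // nprocs) * block + i % block
    return owner, local


def local_indices(n, block, nprocs, p):
    """Global indices of an n-element array stored on processor p (brute force)."""
    return [i for i in range(n) if block_cyclic_owner(i, block, nprocs)[0] == p]
```

The brute-force `local_indices` scans every element, which is the cost a closed-form enumeration is meant to avoid.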
Evaluating Compiler Optimizations For Fortran D
, 1994
Cited by 68 (4 self)
The Fortran D compiler uses data decomposition specifications to automatically translate Fortran programs for execution on MIMD distributed-memory machines. This paper introduces and classifies a number of advanced optimizations needed to achieve acceptable performance; they are analyzed and empirically evaluated for stencil computations. Communication optimizations reduce communication overhead by decreasing the number of messages and hide communication overhead by overlapping the cost of remaining messages with local computation. Parallelism optimizations exploit parallel and pipelined computations, and may need to restructure the computation to increase parallelism. Profitability formulas are derived for each optimization. Empirical results show that exploiting parallelism for pipelined computations, reductions, and scans is vital. Message vectorization, collective communication, and efficient coarse-grain pipelining also significantly affect performance. Scalability of communicatio...

