Results 1 - 10 of 175
Interprocedural Compilation of Fortran D for MIMD Distributed-Memory Machines - Communications of the ACM, 1992
"... Algorithms exist for compiling Fortran D for MIMD distributed-memory machines, but are significantly restricted in the presence of procedure calls. This paper presents interprocedural analysis, optimization, and code generation algorithms for Fortran D that limit compilation to only one pass over ea ..."
Abstract - Cited by 333 (49 self)
Algorithms exist for compiling Fortran D for MIMD distributed-memory machines, but are significantly restricted in the presence of procedure calls. This paper presents interprocedural analysis, optimization, and code generation algorithms for Fortran D that limit compilation to only one pass over each procedure. This is accomplished by collecting summary information after edits, then compiling procedures in reverse topological order to propagate necessary information. Delaying instantiation of the computation partition, communication, and dynamic data decomposition is key to enabling interprocedural optimization. Recompilation analysis preserves the benefits of separate compilation. Empirical results show that interprocedural optimization is crucial in achieving acceptable performance for a common application.
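The one-pass strategy can be made concrete with a small sketch. Assumptions: the call graph is acyclic and stored as a caller-to-callee adjacency dict, callees are visited before their callers (one common reading of reverse topological order), and compile_procedure and the summary contents are hypothetical placeholders for the paper's analysis and code generation.

# Sketch: compile each procedure exactly once, visiting callees before their
# callers so that summary information is available when a caller is compiled.
def reverse_topological_order(call_graph):
    order, visited = [], set()
    def visit(proc):
        if proc in visited:
            return
        visited.add(proc)
        for callee in call_graph.get(proc, []):
            visit(callee)
        order.append(proc)                 # appended only after all its callees
    for proc in call_graph:
        visit(proc)
    return order

def compile_program(call_graph, compile_procedure):
    summaries = {}
    for proc in reverse_topological_order(call_graph):
        callee_info = {c: summaries[c] for c in call_graph.get(proc, [])}
        summaries[proc] = compile_procedure(proc, callee_info)   # one pass each
    return summaries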
Fortran D Language Specification, 1990
"... This paper presents Fortran D, a version of Fortran enhanced with data decomposition specifications. It is designed to support two fundamental stages of writing a data-parallel program: problem mapping using sophisticated array alignments, and machine mapping through a rich set of data distribution ..."
Abstract - Cited by 303 (50 self)
This paper presents Fortran D, a version of Fortran enhanced with data decomposition specifications. It is designed to support two fundamental stages of writing a data-parallel program: problem mapping using sophisticated array alignments, and machine mapping through a rich set of data distribution functions. We believe that Fortran D provides a simple machine-independent programming model for most numerical computations. We intend to evaluate its usefulness for both programmers and advanced compilers on a variety of parallel architectures.
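As a rough illustration of what a data distribution function determines, the sketch below maps a one-dimensional global index to an owning processor under BLOCK and CYCLIC distributions; the function names and 0-based conventions are illustrative, not taken from the specification.

# Illustrative owner mappings for two classic distribution functions:
# BLOCK gives each processor one contiguous slice, CYCLIC deals indices
# round-robin. These are assumptions for illustration only.
def block_owner(i, n, p):
    """Processor owning global index i (0-based) of an n-element array on p procs."""
    block = (n + p - 1) // p          # ceiling(n / p) elements per processor
    return i // block

def cyclic_owner(i, p):
    """Round-robin assignment of index i over p processors."""
    return i % p

if __name__ == "__main__":
    n, p = 100, 4
    print([block_owner(i, n, p) for i in (0, 24, 25, 99)])   # [0, 0, 1, 3]
    print([cyclic_owner(i, p) for i in (0, 1, 4, 99)])       # [0, 1, 0, 3]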
Demonstration of Automatic Data Partitioning Techniques for Parallelizing Compilers on Multicomputers - IEEE Transactions on Parallel and Distributed Systems, 1992
"... An important problem facing numerous research projects on parallelizing compilers for distributed memory machines is that of automatically determining a suitable data partitioning scheme for a program. Most of the current projects leave this tedious problem almost entirely to the user. In this paper ..."
Abstract - Cited by 160 (17 self)
An important problem facing numerous research projects on parallelizing compilers for distributed memory machines is that of automatically determining a suitable data partitioning scheme for a program. Most of the current projects leave this tedious problem almost entirely to the user. In this paper, we present a novel approach to the problem of automatic data partitioning. We introduce the notion of constraints on data distribution, and show how, based on performance considerations, a compiler identifies constraints to be imposed on the distribution of various data structures. These constraints are then combined by the compiler to obtain a complete and consistent picture of the data distribution scheme, one that offers good performance in terms of the overall execution time. We present results of a study we performed on Fortran programs taken from the Linpack and Eispack libraries and the Perfect Benchmarks to determine the applicability of our approach to real programs. The results a...
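The constraint idea can be caricatured in a few lines; the data model and cost figures below are invented for illustration and are far simpler than the paper's formulation. Each reference pattern proposes a preferred distribution with a penalty for violating it, and the compiler picks, per array, the candidate with the lowest total penalty.

# Toy constraint combination: pick, for each array, the candidate distribution
# that minimizes the summed penalty of the constraints it violates.
from collections import defaultdict

CANDIDATES = ["block_rows", "block_cols", "cyclic"]

def choose_distributions(constraints):
    """constraints: list of (array_name, preferred_distribution, penalty)."""
    penalty = defaultdict(lambda: defaultdict(float))
    for array, preferred, cost in constraints:
        for d in CANDIDATES:
            if d != preferred:
                penalty[array][d] += cost    # violating this constraint costs `cost`
    return {a: min(CANDIDATES, key=lambda d: penalty[a][d]) for a in penalty}

print(choose_distributions([
    ("A", "block_rows", 10.0),   # e.g. a row-wise sweep favors row blocks
    ("A", "block_cols", 3.0),    # a less important column access
    ("B", "cyclic", 5.0),        # load balance on a triangular loop
]))   # {'A': 'block_rows', 'B': 'cyclic'}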
ParaScope: a parallel programming environment - Proceedings of the IEEE, 1993
"... The ParaScope parallel programming environment developed to support scientific programming of shared-memory multiprocessors, includes a collection of tools that use global program analysis to help users develop and debug parallel programs. This paper focuses on ParaScope’s compilation system, its pa ..."
Abstract - Cited by 134 (34 self)
The ParaScope parallel programming environment, developed to support scientific programming of shared-memory multiprocessors, includes a collection of tools that use global program analysis to help users develop and debug parallel programs. This paper focuses on ParaScope’s compilation system, its parallel program editor, and its parallel debugging system. The compilation system extends the traditional single-procedure compiler by providing a mechanism for managing the compilation of complete programs. Thus, ParaScope can support both traditional single-procedure optimization and optimization across procedure boundaries. The ParaScope editor brings both compiler analysis and user expertise to bear on program parallelization. It assists the knowledgeable user by displaying and managing analysis and by providing a variety of interactive program transformations that are effective in exposing parallelism. The debugging system detects and reports timing-dependent errors, called data races, in executions of parallel programs. The system combines static analysis, program instrumentation, and run-time reporting to provide a mechanical system for isolating errors in parallel program executions. Finally, we describe a new project to extend ParaScope to support programming in Fortran D, a machine-independent parallel programming language intended for use with both distributed-memory and shared-memory parallel computers.
Automatic Array Privatization - In Utpal Banerjee, David Gelernter, Alex Nicolau, and David Padua, editors, Proc. Sixth Workshop on Languages and Compilers for Parallel Computing, 1993
"... Array privatization is one of the most effective transformations for the exploitation of parallelism. In this paper, we present a technique for automatic array privatization. Our algorithm uses data flow analysis of array references to identify privatizable arrays intraprocedurally as well as in ..."
Abstract - Cited by 130 (26 self)
Array privatization is one of the most effective transformations for the exploitation of parallelism. In this paper, we present a technique for automatic array privatization. Our algorithm uses data flow analysis of array references to identify privatizable arrays intraprocedurally as well as interprocedurally. It employs static and dynamic resolution to determine the last value of a live private array. We compare the results of automatic array privatization with those of manual array privatization and identify directions for future improvement. To enhance the effectiveness of our algorithm, we develop a goal-directed technique to analyze symbolic variables in the presence of conditional statements, loops, and index arrays.
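A simplified version of the core test is sketched below (intraprocedural only; the per-iteration access summaries are assumed inputs that a real compiler would derive from its data-flow analysis): an array is privatizable for a loop if every element read in an iteration has already been written earlier in that same iteration, so no values flow between iterations. If the array is live after the loop, its final values must additionally be copied out, which is what static or dynamic last-value resolution decides.

# Sketch of the privatization test: every read must be covered by an
# earlier write within the same iteration.
def is_privatizable(statements):
    """statements: ordered list of (op, region) with op in {'write', 'read'}
    and region a set of symbolic array sections touched in one iteration."""
    written = set()
    for op, region in statements:
        if op == "write":
            written |= region
        elif op == "read" and not region <= written:
            return False          # some element may be read before any write
    return True

# A(1:n) is fully defined, then read: privatizable.
print(is_privatizable([("write", {"A[1:n]"}), ("read", {"A[1:n]"})]))   # True
# A read before being written: values flow between iterations, not private.
print(is_privatizable([("read", {"A[1:n]"}), ("write", {"A[1:n]"})]))   # False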
Tiling Multidimensional Iteration Spaces for Multicomputers, 1992
"... This paper addresses the problem of compiling perfectly nested loops for multicomputers (distributed memory machines). The relatively high communication startup costs in these machines renders frequent communication very expensive. Motivated by this, we present a method of aggregating a number of lo ..."
Abstract - Cited by 109 (21 self)
This paper addresses the problem of compiling perfectly nested loops for multicomputers (distributed-memory machines). The relatively high communication startup costs in these machines render frequent communication very expensive. Motivated by this, we present a method of aggregating a number of loop iterations into tiles, where the tiles execute atomically: a processor executing the iterations belonging to a tile receives all the data it needs before executing any one of the iterations in the tile, executes all the iterations in the tile, and then sends the data needed by other processors. Since synchronization is not allowed during the execution of a tile, partitioning the iteration space into tiles must not result in deadlock. We first show the equivalence between the problem of finding partitions and the problem of determining the cone for a given set of dependence vectors. We then present an approach to partitioning the iteration space into deadlock-free tiles so that communicati...
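A minimal sketch of the aggregation itself is shown below, assuming a rectangular 2-D iteration space and hypothetical recv_tile_inputs / send_tile_outputs hooks standing in for the generated communication; choosing tile shapes that avoid deadlock is the harder problem the paper addresses and is not shown here.

# Sketch: group iterations into TILE x TILE blocks that execute atomically,
# with one aggregated receive before and one aggregated send after each tile.
TILE = 32

def run_tiled(n, m, body, recv_tile_inputs, send_tile_outputs):
    for ti in range(0, n, TILE):             # tile origin in i
        for tj in range(0, m, TILE):         # tile origin in j
            recv_tile_inputs(ti, tj)          # all remote data arrives up front
            for i in range(ti, min(ti + TILE, n)):
                for j in range(tj, min(tj + TILE, m)):
                    body(i, j)                # original loop body, unchanged
            send_tile_outputs(ti, tj)         # results leave only after the tile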
Automatic Data Partitioning on Distributed Memory Multiprocessors, 1991
"... An important problem facing numerous research projects on parallelizing compilers for distributed memory machines is that of automatically determining a suitable data partitioning scheme for a program. Most of the current projects leave this tedious problem almost entirely to the user. In this paper ..."
Abstract - Cited by 108 (6 self)
An important problem facing numerous research projects on parallelizing compilers for distributed memory machines is that of automatically determining a suitable data partitioning scheme for a program. Most of the current projects leave this tedious problem almost entirely to the user. In this paper, we present a novel approach to the problem of automatic data partitioning. We introduce the notion of constraints on data distribution, and show how, based on performance considerations, a compiler identifies constraints to be imposed on the distribution of various data structures. These constraints are then combined by the compiler to obtain a complete and consistent picture of the data distribution scheme, one that offers good performance in terms of the overall execution time.
Exploiting Task and Data Parallelism on a Multicomputer - In ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 1993
"... For many applications, achieving good performance on a private memory parallel computer requires exploiting data parallelism as well as task parallelism. Depending on the size of the input data set and the number of nodes (i.e., processors), different tradeoffs between task and data parallelism are ..."
Abstract - Cited by 103 (23 self)
For many applications, achieving good performance on a private-memory parallel computer requires exploiting data parallelism as well as task parallelism. Depending on the size of the input data set and the number of nodes (i.e., processors), different tradeoffs between task and data parallelism are appropriate for a parallel system. Most existing compilers focus on either data parallelism or task parallelism, but not both; to achieve the desired results, the programmer must therefore program the data and task parallelism separately. We have taken a unified approach to exploiting both kinds of parallelism in a single framework with an existing language. This approach eases the task of programming and exposes the tradeoffs between data and task parallelism to the compiler. We have implemented a parallelizing Fortran compiler for the iWarp system based on this approach. We discuss the design of our compiler and present performance results to validate our approach.
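The tradeoff can be sketched with a toy cost model (entirely an assumption, not the paper's): estimate the time to run all tasks one after another with every node participating (pure data parallelism) versus giving each task an equal share of the nodes and running them concurrently (task parallelism), then pick the cheaper mapping.

# Toy comparison of pure data parallelism vs. a task-parallel split of nodes.
def data_parallel_time(tasks, p):
    # tasks: list of (serial_work, per_node_overhead); tasks run one at a time
    return sum(work / p + ovh * p for work, ovh in tasks)

def task_parallel_time(tasks, p):
    share = max(1, p // len(tasks))           # nodes given to each task
    return max(work / share + ovh * share for work, ovh in tasks)

def best_mapping(tasks, p):
    dp, tp = data_parallel_time(tasks, p), task_parallel_time(tasks, p)
    return ("data-parallel", dp) if dp <= tp else ("task-parallel", tp)

# Two equal tasks on 64 nodes: splitting the nodes wins under this toy model.
print(best_mapping([(1e6, 5.0), (1e6, 5.0)], p=64))   # ('task-parallel', 31410.0)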
Integrating Message-Passing and Shared-Memory: Early Experience, 1993
"... This paper discusses some of the issues involved in implementing a shared-address space programming model on large-scale, distributed-memory multiprocessors. While such a programming model can be implemented on both shared-memory and messagepassing architectures, we argue that the transparent, coher ..."
Abstract - Cited by 100 (15 self)
This paper discusses some of the issues involved in implementing a shared-address space programming model on large-scale, distributed-memory multiprocessors. While such a programming model can be implemented on both shared-memory and message-passing architectures, we argue that the transparent, coherent caching of global data provided by many shared-memory architectures is of crucial importance. Because message-passing mechanisms are much more efficient than shared-memory loads and stores for certain types of interprocessor communication and synchronization operations, however, we argue for building multiprocessors that efficiently support both shared-memory and message-passing mechanisms. We describe an architecture, Alewife, that integrates support for shared-memory and message-passing through a simple interface; we expect the compiler and runtime system to cooperate in using the hardware mechanisms that are most efficient for specific operations. We report on both integrated and exclusively shared-memory implementations of our runtime system and two applications. The integrated runtime system drastically cuts down the cost of communication incurred by the scheduling, load balancing, and certain synchronization operations. We also present preliminary performance results comparing the two systems.