Detecting and Using Affinity in an Automatic Data Distribution Tool
 In Languages and Compilers for Parallel Computing
, 1994
Cited by 18 (4 self)
This paper describes some aspects of the implementation of our Data Distribution Tool (DDT), which accepts programs written in Fortran77 and obtains alignment and distribution HPF directives for the arrays used in the program. In particular, we describe the phases of the tool which analyze reference patterns in loops, record preferences for alignment and obtain the alignment functions. These functions are static in the sense that they do not change within the scope of the code analyzed (routine or loop). We propose the use of a set of well-known techniques to extend the scope of the reference pattern analysis and we evaluate their effectiveness in a set of programs from the Perfect Club and SPEC benchmarks.
EXTENT: A Portable Programming Environment for Designing and Implementing High-Performance Block Recursive Algorithms
, 1994
Cited by 18 (9 self)
EXTENT is an EXpert system for TENsor product formula Translation. In this paper we present a programming environment for automatic generation of parallel/vector programs from tensor product formulas. A tensor (Kronecker) product based programming methodology is used for designing high performance programs on various architectures. In this programming methodology, block recursive algorithms such as the fast Fourier transform and Strassen's matrix multiplication algorithm are expressed as tensor product formulas involving tensor product and other matrix operations. A tensor product formula can be systematically translated to parallel and/or vector code for various parallel architectures. A prototype system which generates programs for the Cray Y-MP, Cray T3D, and Intel Paragon has been developed. Performance results for some generated programs are presented. Keywords: Parallel programming environment, Tensor (Kronecker) product, Block recursive algorithm, Parallel program synthesis.
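To make the idea of a tensor product formula concrete, here is a minimal sketch in Python. The abstract does not quote any formula, so the example assumes the standard Cooley-Tukey factorization F_rs = (F_r ⊗ I_s) T (I_r ⊗ F_s) L, where T is a diagonal twiddle matrix and L is a stride permutation. All function names are hypothetical and are not part of EXTENT; the code builds dense matrices only to check the identity, whereas a code generator would translate each factor into loops.

```python
import cmath

def dft_matrix(n):
    # n-point DFT matrix with entries w^(j*k), w = exp(-2*pi*i/n)
    w = cmath.exp(-2j * cmath.pi / n)
    return [[w ** (j * k) for k in range(n)] for j in range(n)]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def kron(a, b):
    # Tensor (Kronecker) product of two dense matrices
    rb, cb = len(b), len(b[0])
    return [[a[i // rb][j // cb] * b[i % rb][j % cb]
             for j in range(len(a[0]) * cb)]
            for i in range(len(a) * rb)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def stride_permutation(r, s):
    # Output position i*s + j receives input element j*r + i
    n = r * s
    p = [[0] * n for _ in range(n)]
    for i in range(r):
        for j in range(s):
            p[i * s + j][j * r + i] = 1
    return p

def twiddle(r, s):
    # Diagonal matrix with entry w^(i*k) at position i*s + k
    n = r * s
    w = cmath.exp(-2j * cmath.pi / n)
    t = [[0] * n for _ in range(n)]
    for i in range(r):
        for k in range(s):
            t[i * s + k][i * s + k] = w ** (i * k)
    return t

def cooley_tukey(r, s):
    # (F_r kron I_s) * T * (I_r kron F_s) * L
    f = kron(dft_matrix(r), identity(s))
    f = matmul(f, twiddle(r, s))
    f = matmul(f, kron(identity(r), dft_matrix(s)))
    return matmul(f, stride_permutation(r, s))

def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
```

Comparing `cooley_tukey(2, 2)` with `dft_matrix(4)` using `max_abs_diff` should give a difference at the level of floating-point rounding. The factor I_r ⊗ F_s corresponds to r independent s-point transforms, which is the kind of structure that maps naturally to parallel code.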
A Framework for Generating Distributed-Memory Parallel Programs for Block Recursive Algorithms
 Journal of Parallel and Distributed Computing
, 1996
Cited by 15 (4 self)
A framework for synthesizing communication-efficient distributed-memory parallel programs for block recursive algorithms such as the fast Fourier transform (FFT) and Strassen’s matrix multiplication is presented. This framework is based on an algebraic representation of the algorithms, which involves the tensor (Kronecker) product and other matrix operations. This representation is useful in analyzing the communication implications of computation partitioning and data distributions. The programs are synthesized under two different target program models. These two models are based on different ways of managing the distribution of data for optimizing communication. The first model uses point-to-point interprocessor communication primitives, whereas the second model uses data redistribution primitives involving collective all-to-many communication. These two program models are shown to be suitable for different ranges of problem size. The methodology is illustrated by synthesizing communication-efficient programs for the FFT. This framework has been incorporated into the EXTENT system for automatic generation of parallel/vector programs for block recursive algorithms. © 1996 Academic Press, Inc.
DDT: A Research Tool for Automatic Data Distribution in HPF
 Scientific Programming
, 1995
Cited by 8 (1 self)
This paper describes the features and implementation of our automatic data distribution research tool. The tool (DDT) accepts programs written in Fortran77 and generates HPF directives and executable statements. DDT works by identifying a set of computational phases (procedures and loops). The algorithm builds a search space of candidate solutions for these phases, which is explored for the combination that minimizes the overall cost; this cost includes data movement cost and computation cost. The data movement cost includes the cost of executing each phase with a given mapping and the remapping costs that have to be paid in order to execute each phase with the mapping selected. The computation cost includes the cost of executing each phase in parallel according to the mapping selected and the owner-computes rule. Control flow information is used to identify how phases are sequenced during the execution of the application. 1 Introduction Data distribution is one of the topics of ...
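The search described in this abstract can be illustrated with a small sketch. This is not DDT's algorithm: it assumes the phases execute in a fixed linear sequence (the abstract says control flow determines the real sequencing), that each phase has a known per-mapping cost, and that remapping between consecutive phases has a known cost. The function name `select_mappings`, the mapping labels, and all cost figures are hypothetical.

```python
def select_mappings(phase_costs, remap_cost):
    """Pick one mapping per phase so that total cost is minimal.

    phase_costs: list of dicts, one per phase, mapping name -> phase cost
                 (assumed to already combine movement and computation cost)
    remap_cost:  function (previous_mapping, next_mapping) -> remapping cost
    Returns (total_cost, list_of_chosen_mappings).
    """
    # best[m] = (cheapest cost of reaching the current phase in mapping m, path)
    best = {m: (c, [m]) for m, c in phase_costs[0].items()}
    for costs in phase_costs[1:]:
        nxt = {}
        for m, c in costs.items():
            # Cheapest way to arrive at mapping m from any previous mapping
            prev_cost, prev_path = min(
                ((pc + remap_cost(pm, m), pp) for pm, (pc, pp) in best.items()),
                key=lambda t: t[0],
            )
            nxt[m] = (prev_cost + c, prev_path + [m])
        best = nxt
    return min(best.values(), key=lambda t: t[0])


# Hypothetical costs for three phases under row or column distribution
phases = [
    {"row": 10, "col": 40},
    {"row": 50, "col": 5},
    {"row": 12, "col": 30},
]
remap = lambda a, b: 0 if a == b else 20
result = select_mappings(phases, remap)
```

With these made-up numbers, `result` is `(65, ["row", "col", "col"])`: remapping once before the second phase is cheaper than keeping a single static mapping, which is the trade-off between remapping cost and per-phase cost that the abstract describes.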
On the Analysis of PAMELA Models
, 1993
Cited by 4 (3 self)
While last year's report [16] loosely introduced the general concepts behind the Pamela approach toward modeling and analysis of parallel systems, this report exclusively focuses on the calculus of the methodology. In particular, it defines an algorithmic approach toward serialization analysis, which enables (future) mechanization of the analysis. Thus, a technique is developed to automatically compile symbolic performance models in the course of program translation. It is shown that the resulting performance models fundamentally outperform traditional static estimation approaches at a negligible increase in cost. This claim is illustrated by two case studies, i.e., an LU factorization algorithm on a multiprocessor, and a matrix-vector update on a multicomputer. Contents: 1 Introduction; 2 Analysis; 2.1 Mathematical Preliminaries; 2.2 Formalism; 2.3 Homomorphic Mapping ...
Parallel and Distributed Systems Report Series
, 1998
In this paper we describe a compilation scheme to translate implicitly parallel programs in the programming language Spar to efficient code for a distributed-memory parallel computer system. The compilation scheme is formulated as a set of transformation rules. In Spar, the language constructs for parallelization have been designed for comfortable use by the programmer, not for ease of compilation. Nevertheless, it is shown that it is possible to implement a basic translation scheme with a surprisingly small set of translation rules. A number of optimizations on this basic translation scheme are also easily formulated as rules.
Code Generation Techniques for the Task-Parallel . . .
, 1998
In this paper we describe a compilation scheme to translate implicitly parallel programs in the programming language Spar to efficient code for a distributed-memory parallel computer system. The compilation scheme is formulated as a set of transformation rules. In Spar, the language constructs for parallelization have been designed for comfortable use by the programmer, not for ease of compilation. Nevertheless, it is shown that it is possible to implement a basic translation scheme with a surprisingly small set of translation rules. A number of optimizations on this basic translation scheme are also easily formulated as rules.