A Tensor Product Formulation of Strassen's Matrix Multiplication Algorithm
Appl. Math Letters, 1990
Cited by 27 (13 self)
In this paper, we present a program generation strategy of Strassen's matrix multiplication algorithm using a programming methodology based on tensor product formulas. In this methodology, block recursive programs such as the fast Fourier Transforms and Strassen's matrix multiplication algorithm are expressed as algebraic formulas involving tensor products and other matrix operations. Such formulas can be systematically translated to high-performance parallel/vector codes for various architectures. In this paper, we present a nonrecursive implementation of Strassen's algorithm for shared memory vector processors such as the Cray Y-MP. A previous implementation of Strassen's algorithm synthesized from tensor product formulas required working storage of size $O(7^n)$ for multiplying $2^n \times 2^n$ matrices. We present a modified formulation in which the working storage requirement is reduced to $O(4^n)$. The modified formulation exhibits sufficient parallelism for efficient implem...
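For orientation, the block recursion that the abstract refers to can be sketched as follows. This is the textbook recursive form of Strassen's algorithm for $2^n \times 2^n$ matrices, not the nonrecursive tensor product formulation or the storage-reduced variant described in the paper. The helper names (`mat_add`, `mat_sub`, `split`, `strassen`, `naive`) are illustrative assumptions.

```python
def mat_add(A, B):
    """Elementwise sum of two equally sized matrices (lists of rows)."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def mat_sub(A, B):
    """Elementwise difference of two equally sized matrices."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def split(M):
    """Split a 2h x 2h matrix into its four h x h quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen(A, B):
    """Multiply two 2^n x 2^n matrices with 7 recursive products per level."""
    if len(A) == 1:
        return [[A[0][0] * B[0][0]]]
    a11, a12, a21, a22 = split(A)
    b11, b12, b21, b22 = split(B)
    # The seven half-size products of Strassen's scheme.
    m1 = strassen(mat_add(a11, a22), mat_add(b11, b22))
    m2 = strassen(mat_add(a21, a22), b11)
    m3 = strassen(a11, mat_sub(b12, b22))
    m4 = strassen(a22, mat_sub(b21, b11))
    m5 = strassen(mat_add(a11, a12), b22)
    m6 = strassen(mat_sub(a21, a11), mat_add(b11, b12))
    m7 = strassen(mat_sub(a12, a22), mat_add(b21, b22))
    # Recombine the products into the four quadrants of the result.
    c11 = mat_add(mat_sub(mat_add(m1, m4), m5), m7)
    c12 = mat_add(m3, m5)
    c21 = mat_add(m2, m4)
    c22 = mat_add(mat_add(mat_sub(m1, m2), m3), m6)
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom


def naive(A, B):
    """Reference O(n^3) product used only to check the sketch."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]
```

The recursion makes seven half-size products at each of the $n$ levels. This is where the $7^n$ term in the storage bound of the earlier formulation plausibly comes from, although the abstract itself does not spell out the derivation.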
EXTENT: A Portable Programming Environment for Designing and Implementing High-Performance Block Recursive Algorithms
1994
Cited by 18 (9 self)
EXTENT is an EXpert system for TENsor product formula Translation. In this paper we present a programming environment for automatic generation of parallel/vector programs from tensor product formulas. A tensor (Kronecker) product based programming methodology is used for designing high performance programs on various architectures. In this programming methodology, block recursive algorithms such as the fast Fourier transform and Strassen's matrix multiplication algorithm are expressed as tensor product formulas involving tensor product and other matrix operations. A tensor product formula can be systematically translated to parallel and/or vector code for various parallel architectures. A prototype system which generates programs for the Cray Y-MP, Cray T3D, and Intel Paragon has been developed. Performance results for some generated programs are presented. Keywords: Parallel programming environment, Tensor (Kronecker) product, Block recursive algorithm, Parallel program synthesis. 1...
A Framework for Generating Distributed-Memory Parallel Programs for Block Recursive Algorithms
Journal of Parallel and Distributed Computing, 1996
Cited by 15 (4 self)
A framework for synthesizing communication-efficient distributed-memory parallel programs for block recursive algorithms such as the fast Fourier transform (FFT) and Strassen’s matrix multiplication is presented. This framework is based on an algebraic representation of the algorithms, which involves the tensor (Kronecker) product and other matrix operations. This representation is useful in analyzing the communication implications of computation partitioning and data distributions. The programs are synthesized under two different target program models. These two models are based on different ways of managing the distribution of data for optimizing communication. The first model uses point-to-point interprocessor communication primitives, whereas the second model uses data redistribution primitives involving collective all-to-many communication. These two program models are shown to be suitable for different ranges of problem size. The methodology is illustrated by synthesizing communication-efficient programs for the FFT. This framework has been incorporated into the EXTENT system for automatic generation of parallel/vector programs for block recursive algorithms. © 1996 Academic Press, Inc.
An Algebraic Theory for Modeling Multistage Interconnection Networks
Journal of Information Science and Engineering, 1993
Cited by 14 (11 self)
We use an algebraic theory based on tensor products to model multistage interconnection networks. This algebraic theory has been used for designing and implementing block recursive numerical algorithms on shared-memory vector multiprocessors. In this paper, we focus on the modeling of multistage interconnection networks. The tensor product representations of the baseline network, the reverse baseline network, the indirect binary n-cube network, the generalized cube network, the omega network, and the flip network are given. We present the use of this theory for specifying and verifying network properties such as network partitioning and topological equivalence. Algorithm mapping using tensor product formulation is demonstrated by mapping the matrix transposition algorithm onto multistage interconnection networks. Keywords: Tensor product, parallel architecture, multistage interconnection network, partitionability, topological equivalence, algorithm mapping. 1 Introduction Tensor prod...
Parallelization of Divide-and-Conquer by Translation to Nested Loops
J. Functional Programming, 1997
Cited by 12 (6 self)
We propose a sequence of equational transformations and specializations which turns a divide-and-conquer skeleton in Haskell into a parallel loop nest in C. Our initial skeleton is often viewed as general divide-and-conquer. The specializations impose a balanced call tree, a fixed degree of the problem division, and elementwise operations. Our goal is to select parallel implementations of divide-and-conquer via a space-time mapping, which can be determined at compile time. The correctness of our transformations is proved by equational reasoning in Haskell; recursion and iteration are handled by induction. Finally, we demonstrate the practicality of the skeleton by expressing Strassen's matrix multiplication in it.
An Algebraic Theory for Modeling Direct Interconnection Networks
1992
Cited by 9 (9 self)
The theory of tensor products has been used for designing and implementing block recursive numerical algorithms on shared-memory vector multiprocessors such as the Cray Y-MP. In this paper, we present an algebraic theory based on tensor products for modeling direct interconnection networks. The development of this model is expected to facilitate the development of a methodology for mapping algorithms expressed in tensor product form onto distributed-memory architectures. A network is defined as a tuple that includes a set of processors and a set of permutations expressed in tensor product notation which collectively represent the network topology. The tensor product of networks is defined to facilitate the recursive construction of complex networks from simple networks. Using the tensor product of networks, properties of the simple networks, such as network embedding, can be easily extended to the complex networks. We start with a simple ring network and recursively construct two-dimens...
A technique for overlapping computation and communication for block recursive algorithms
Concurrency: Pract. Exper., Vol. 10(2), 73–90, 1998
Cited by 5 (1 self)
This paper presents a design methodology for developing efficient distributed-memory parallel programs for block recursive algorithms such as the fast Fourier transform (FFT) and bitonic sort. This design methodology is specifically suited for most modern supercomputers having a distributed-memory architecture with a circuit-switched or wormhole-routed mesh or a hypercube interconnection network. A mathematical framework based on the tensor product and other matrix operations is used for representing algorithms. Communication-efficient implementations with effectively overlapped computation and communication are achieved by manipulating the mathematical representation using the tensor product algebra. Performance results for FFT programs on the Intel Paragon are presented.
A Methodology for Generating Efficient Disk-Based Algorithms from Tensor Product Formulas
1993
Cited by 4 (0 self)
In this paper, we address the issue of automatic generation of disk-based algorithms from tensor product formulas. Disk-based algorithms are required in scientific applications which work with large data sets that do not fit entirely into main memory. Tensor products have been used for designing and implementing block recursive algorithms on shared-memory, vector and distributed-memory multiprocessors. We extend this theory to generate disk-based code from tensor product formulas. The methodology is based on generating algebraically equivalent tensor product formulas which have better disk performance. We demonstrate this methodology by generating disk-based code for the fast Fourier transform. Keywords: Tensor product, stride permutation, disk-based algorithm, fast Fourier transform. 1 Introduction During the last decade, processor speeds have increased significantly and several strides in the development of high performance architectures have been made. While tremendous progress ...
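To make the keywords "stride permutation" and "tensor product formula" concrete, here is a minimal sketch of one step of the standard Cooley-Tukey factorization $F_{rs} = (F_r \otimes I_s)\, T^{rs}_s\, (I_r \otimes F_s)\, L^{rs}_r$, checked against a direct DFT. It assumes the common convention in which $L^{rs}_r$ gathers elements at stride $r$ and $T^{rs}_s$ is the diagonal twiddle-factor matrix. The abstract does not state the paper's own disk-oriented formulas, so nothing here should be read as that method. The function names are illustrative.

```python
import cmath


def dft(x):
    """Direct O(n^2) discrete Fourier transform, used as F_n."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def stride_perm(x, r):
    """L^{rs}_r: gather the elements of x at stride r (output[i*s + j] = x[j*r + i])."""
    s = len(x) // r
    return [x[j * r + i] for i in range(r) for j in range(s)]


def cooley_tukey_step(x, r, s):
    """Apply (F_r kron I_s) T (I_r kron F_s) L^{rs}_r to a vector of length r*s."""
    n = r * s
    y = stride_perm(x, r)
    # I_r kron F_s: r independent DFTs on contiguous blocks of length s.
    z = []
    for i in range(r):
        z.extend(dft(y[i * s:(i + 1) * s]))
    # T^{rs}_s: scale entry (i, k) by the twiddle factor w_n^(i*k).
    z = [z[i * s + k] * cmath.exp(-2j * cmath.pi * i * k / n)
         for i in range(r) for k in range(s)]
    # F_r kron I_s: s independent DFTs of length r on elements at stride s.
    out = [0j] * n
    for k in range(s):
        col = dft([z[i * s + k] for i in range(r)])
        for l in range(r):
            out[l * s + k] = col[l]
    return out
```

In this reading, `I_r kron F_s` touches contiguous blocks while `F_r kron I_s` touches strided data. Algebraically equivalent formulas that change which factor is applied where therefore change the access pattern without changing the result.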
A Kronecker Compiler for fast transform algorithms
In 8th SIAM Conf. Parallel Proc. for Sci. Comp., 1997
Cited by 3 (0 self)
We present a source-to-source compiler that processes matrix formulae in the form of Kronecker product factorizations. The Kronecker product notation allows for simple expressions of algorithms such as Walsh-Hadamard, Haar, Slant, Hartley, and FFTs as well as transpositions and wavelet transforms. The compiler is based on a set of term rewriting rules that translate high level matrix descriptions into parallel and sequential loops and assignment statements. We provide backend translators for FORTRAN, FORTRAN90, C and Matlab.
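As an illustration of how a Kronecker product factorization maps to loops, the following sketch evaluates the Walsh-Hadamard transform through the well-known factorization $H_{2^n} = \prod_{i=0}^{n-1} (I_{2^i} \otimes H_2 \otimes I_{2^{n-1-i}})$ and compares it with the dense Kronecker product. The abstract does not list the compiler's rewriting rules, so this is a hand-written example of the kind of loop nest such a formula corresponds to, not the compiler's output. The names `kron`, `wht_dense` and `wht_loops` are assumptions.

```python
def kron(A, B):
    """Dense Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]


def wht_dense(x):
    """Reference: build H_2 kron ... kron H_2 explicitly and multiply."""
    H2 = [[1, 1], [1, -1]]
    W = [[1]]
    size = 1
    while size < len(x):
        W = kron(W, H2)
        size *= 2
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def wht_loops(x):
    """Loop form: one stage per factor I_a kron H_2 kron I_h."""
    n = len(x)  # assumed to be a power of two
    y = list(x)
    h = n // 2
    while h >= 1:
        # Stage I_{n/(2h)} kron H_2 kron I_h: a butterfly on pairs h apart.
        for base in range(0, n, 2 * h):
            for j in range(h):
                a, b = y[base + j], y[base + j + h]
                y[base + j], y[base + j + h] = a + b, a - b
        h //= 2
    return y
```

Each factor becomes a doubly nested loop of independent butterflies, so the dense $2^n \times 2^n$ matrix is never formed.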
Computational Models And Program Synthesis For Parallel Out-Of-Core Computation
1996
Cited by 3 (1 self)
As the performance gap between processors and memory systems continues to increase, memory and I/O subsystems are increasingly becoming the major bottleneck for many I/O-intensive out-of-core applications. To address this problem, new models of parallel computation and new methods of program synthesis for out-of-core computation are needed. This thesis presents our contributions in these two areas. We first introduce the concept of resource metrics to characterize various models of parallel compu...