Results 1 - 10 of 16
A Tensor Product Formulation of Strassen's Matrix Multiplication Algorithm
- Appl. Math Letters, 1990
Cited by 24 (13 self)
In this paper, we present a program generation strategy of Strassen's matrix multiplication algorithm using a programming methodology based on tensor product formulas. In this methodology, block recursive programs such as the fast Fourier transform and Strassen's matrix multiplication algorithm are expressed as algebraic formulas involving tensor products and other matrix operations. Such formulas can be systematically translated to high-performance parallel/vector codes for various architectures. In this paper, we present a non-recursive implementation of Strassen's algorithm for shared memory vector processors such as the Cray Y-MP. A previous implementation of Strassen's algorithm synthesized from tensor product formulas required working storage of size O(7^n) for multiplying 2^n × 2^n matrices. We present a modified formulation in which the working storage requirement is reduced to O(4^n). The modified formulation exhibits sufficient parallelism for efficient implem...
EXTENT: A Portable Programming Environment for Designing and Implementing High-Performance Block Recursive Algorithms
- 1994
Cited by 18 (9 self)
EXTENT is an EXpert system for TENsor product formula Translation. In this paper we present a programming environment for automatic generation of parallel/vector programs from tensor product formulas. A tensor (Kronecker) product based programming methodology is used for designing high performance programs on various architectures. In this programming methodology, block recursive algorithms such as the fast Fourier transform and Strassen's matrix multiplication algorithm are expressed as tensor product formulas involving tensor product and other matrix operations. A tensor product formula can be systematically translated to parallel and/or vector code for various parallel architectures. A prototype system which generates programs for the Cray Y-MP, Cray T3D, and Intel Paragon has been developed. Performance results for some generated programs are presented. Keywords: Parallel programming environment, Tensor (Kronecker) product, Block recursive algorithm, Parallel program synthesis. 1...
A Framework for Generating Distributed-Memory Parallel Programs for Block Recursive Algorithms
- Journal of Parallel and Distributed Computing, 1996
Cited by 15 (4 self)
A framework for synthesizing communication-efficient distributed-memory parallel programs for block recursive algorithms such as the fast Fourier transform (FFT) and Strassen’s matrix multiplication is presented. This framework is based on an algebraic representation of the algorithms, which involves the tensor (Kronecker) product and other matrix operations. This representation is useful in analyzing the communication implications of computation partitioning and data distributions. The programs are synthesized under two different target program models. These two models are based on different ways of managing the distribution of data for optimizing communication. The first model uses point-to-point interprocessor communication primitives, whereas the second model uses data redistribution primitives involving collective all-to-many communication. These two program models are shown to be suitable for different ranges of problem size. The methodology is illustrated by synthesizing communication-efficient programs for the FFT. This framework has been incorporated into the EXTENT system for automatic generation of parallel/vector programs for block recursive algorithms. © 1996 Academic Press, Inc.
An Algebraic Theory for Modeling Multistage Interconnection Networks
- Journal of Information Science and Engineering, 1993
Cited by 14 (11 self)
We use an algebraic theory based on tensor products to model multistage interconnection networks. This algebraic theory has been used for designing and implementing block recursive numerical algorithms on shared-memory vector multiprocessors. In this paper, we focus on the modeling of multistage interconnection networks. The tensor product representations of the baseline network, the reverse baseline network, the indirect binary n-cube network, the generalized cube network, the omega network, and the flip network are given. We present the use of this theory for specifying and verifying network properties such as network partitioning and topological equivalence. Algorithm mapping using tensor product formulation is demonstrated by mapping the matrix transposition algorithm onto multistage interconnection networks. Keywords: Tensor product, parallel architecture, multistage interconnection network, partitionability, topological equivalence, algorithm mapping. 1 Introduction Tensor prod...
Parallelization of Divide-and-Conquer by Translation to Nested Loops
- J. Functional Programming, 1997
Cited by 12 (6 self)
We propose a sequence of equational transformations and specializations which turns a divide-and-conquer skeleton in Haskell into a parallel loop nest in C. Our initial skeleton is often viewed as general divide-and-conquer. The specializations impose a balanced call tree, a fixed degree of the problem division, and elementwise operations. Our goal is to select parallel implementations of divide-and-conquer via a space-time mapping, which can be determined at compile time. The correctness of our transformations is proved by equational reasoning in Haskell; recursion and iteration are handled by induction. Finally, we demonstrate the practicality of the skeleton by expressing Strassen's matrix multiplication in it.
An Algebraic Theory for Modeling Direct Interconnection Networks
- 1992
Cited by 9 (9 self)
The theory of tensor products has been used for designing and implementing block recursive numerical algorithms on shared-memory vector multiprocessors such as the Cray Y-MP. In this paper, we present an algebraic theory based on tensor products for modeling direct interconnection networks. The development of this model is expected to facilitate the development of a methodology for mapping algorithms expressed in tensor product form onto distributed-memory architectures. A network is defined as a tuple that includes a set of processors and a set of permutations expressed in tensor product notation which collectively represent the network topology. The tensor product of networks is defined to facilitate the recursive construction of complex networks from simple networks. Using the tensor product of networks, properties of the simple networks, such as network embedding, can be easily extended to the complex networks. We start with a simple ring network and recursively construct two-dimens...
A technique for overlapping computation and communication for block recursive algorithms
- Concurrency: Practice and Experience, Vol. 10(2), 73–90 (1998)
Cited by 4 (1 self)
This paper presents a design methodology for developing efficient distributed-memory parallel programs for block recursive algorithms such as the fast Fourier transform (FFT) and bitonic sort. This design methodology is specifically suited for most modern supercomputers having a distributed-memory architecture with a circuit-switched or wormhole routed mesh or a hypercube interconnection network. A mathematical framework based on the tensor product and other matrix operations is used for representing algorithms. Communication-efficient implementations with effectively overlapped computation and communication are achieved by manipulating the mathematical representation using the tensor product algebra. Performance results for FFT programs on the Intel Paragon are presented.
A Methodology for Generating Efficient Disk-Based Algorithms from Tensor Product Formulas
- 1993
Cited by 3 (0 self)
In this paper, we address the issue of automatic generation of disk-based algorithms from tensor product formulas. Disk-based algorithms are required in scientific applications which work with large data sets that do not fit entirely into main memory. Tensor products have been used for designing and implementing block recursive algorithms on shared-memory, vector and distributed-memory multiprocessors. We extend this theory to generate disk-based code from tensor product formulas. The methodology is based on generating algebraically equivalent tensor product formulas which have better disk performance. We demonstrate this methodology by generating disk-based code for the fast Fourier transform. Keywords: Tensor product, stride permutation, disk-based algorithm, fast Fourier transform. 1 Introduction During the last decade, processor speeds have increased significantly and several strides in the development of high performance architectures have been made. While tremendous progress ...
A Kronecker Compiler for fast transform algorithms
- In 8th SIAM Conf. Parallel Proc. for Sci. Comp., 1997
Cited by 3 (0 self)
We present a source-to-source compiler that processes matrix formulae in the form of Kronecker product factorizations. The Kronecker product notation allows for simple expressions of algorithms such as Walsh-Hadamard, Haar, Slant, Hartley, and FFTs as well as transpositions and wavelet transforms. The compiler is based on a set of term rewriting rules that translate high level matrix descriptions into parallel and sequential loops and assignment statements. We provide back-end translators for FORTRAN, FORTRAN-90, C and Matlab.
Computational Models And Program Synthesis For Parallel Out-Of-Core Computation
- 1996
Cited by 2 (1 self)
As the performance gap between processors and memory systems continues to increase, memory and I/O subsystems are increasingly becoming the major bottleneck for many I/O-intensive out-of-core applications. To address this problem, new models of parallel computation and new methods of program synthesis for out-of-core computation are needed. This thesis presents our contributions in these two areas. We first introduce the concept of resource metrics to characterize various models of parallel compu...
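Several of the abstracts listed here describe block recursive algorithms as tensor (Kronecker) product formulas that are then translated into loops. The following is a minimal sketch of that idea for the Walsh-Hadamard transform, using the standard factorization WHT(2^n) = product over i of (I(2^i) ⊗ H2 ⊗ I(2^(n-1-i))). It is not taken from any of the listed papers or from the EXTENT system; the function names (kron, wht_dense, wht_loops) are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: names and structure are assumptions, not the
# code generated by any of the systems described in the abstracts.

H2 = [[1, 1], [1, -1]]  # 2 x 2 Walsh-Hadamard butterfly matrix


def kron(a, b):
    """Kronecker (tensor) product of two matrices given as lists of rows."""
    return [
        [x * y for x in row_a for y in row_b]
        for row_a in a
        for row_b in b
    ]


def matvec(m, x):
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def wht_dense(n):
    """Build the 2^n x 2^n transform as an n-fold Kronecker power of H2."""
    m = [[1]]
    for _ in range(n):
        m = kron(m, H2)
    return m


def wht_loops(x):
    """Apply the same transform as a loop nest, one stage per factor.

    Each stage applies (I_m (x) H2 (x) I_k) without forming the matrix:
    the outer loop walks the m blocks, the inner loop walks the k
    butterflies inside a block. len(x) is assumed to be a power of two.
    """
    x = list(x)
    size = len(x)
    k = size // 2
    while k >= 1:
        m = size // (2 * k)
        for b in range(m):
            for j in range(k):
                p = b * 2 * k + j   # index with middle coordinate 0
                q = p + k           # index with middle coordinate 1
                x[p], x[q] = x[p] + x[q], x[p] - x[q]
        k //= 2
    return x
```

The dense version needs storage for the full 2^n x 2^n matrix, while the loop version works in place on the input vector; comparing wht_loops(x) against matvec(wht_dense(n), x) for a small n is a quick way to check that the two formulations agree.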

