Auto-Blocking Matrix-Multiplication or Tracking BLAS3 Performance from Source Code
 In Proceedings of the Sixth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
, 1997
Cited by 85 (6 self)
An elementary, machine-independent, recursive algorithm for matrix multiplication C+=A*B provides implicit blocking at every level of the memory hierarchy and tests out faster than classically optimal code, tracking hand-coded BLAS3 routines. "Proof of concept" is demonstrated by racing the in-place algorithm against manufacturer's hand-tuned BLAS3 routines; it can win. The recursive code bifurcates naturally at the top level into independent block-oriented processes, each of which writes to a disjoint and contiguous region of memory. Experience has shown that the indexing vastly improves the patterns of memory access at all levels of the memory hierarchy, independently of the sizes of caches or pages and without ad hoc programming. It also exposed a weakness in SGI's C compilers that merrily unroll loops for the superscalar R8000 processor, but do not analogously unfold the base cases of the most elementary recursions. Such deficiencies might deter programmers from using this rich class of recursive algorithms.
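The recursion described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `matmul_add`, the nested-list storage, the `base` cutoff, and the restriction to square matrices whose order is a power of two are all assumptions made for illustration. The paper's special indexing scheme is not reproduced here; only the quadrant recursion for C += A*B is shown.

```python
def matmul_add(C, A, B, n, ci=0, cj=0, ai=0, aj=0, bi=0, bj=0, base=2):
    """In place: C[ci:ci+n, cj:cj+n] += A[ai:ai+n, aj:aj+n] * B[bi:bi+n, bj:bj+n].

    Assumption: n is a power of two and base >= 1. Matrices are lists of rows.
    """
    if n <= base:
        # Base case: ordinary triple loop on a small block.
        for i in range(n):
            for k in range(n):
                a = A[ai + i][aj + k]
                for j in range(n):
                    C[ci + i][cj + j] += a * B[bi + k][bj + j]
        return
    h = n // 2
    # Eight half-size products: each quadrant of C receives two of them.
    for di in (0, h):
        for dj in (0, h):
            for dk in (0, h):
                matmul_add(C, A, B, h,
                           ci + di, cj + dj,
                           ai + di, aj + dk,
                           bi + dk, bj + dj,
                           base)


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = [[0, 0], [0, 0]]
matmul_add(C, A, B, 2, base=1)
print(C)  # [[19, 22], [43, 50]]
```

Because every recursive call touches one quadrant of each operand, the working set shrinks by half at each level regardless of cache or page size, which is the implicit blocking the abstract refers to.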
Single Assignment C: efficient support for high-level array operations in a functional setting
, 2003
Matrix Algebra and Applicative Programming
 Functional Programming Languages and Computer Architecture (Proceedings
, 1987
Cited by 14 (1 self)
General Term: Algorithms. The broad problem of matrix algebra is taken up from the perspective of functional programming. A key question is how arrays should be represented in order to admit good implementations of well-known efficient algorithms, and whether functional architecture sheds any new light on these or other solutions. It relates directly to disarming the "aggregate update" problem. The major thesis is that 2^d-ary trees should be used to represent d-dimensional arrays; examples are matrix operations (d = 2), and a particularly interesting vector (d = 1) algorithm. Sparse and dense matrices are represented homogeneously, but at some overhead that appears tolerable; encouraging results are reviewed and extended. A Pivot Step algorithm is described which offers optimal stability at no extra cost for searching. The new results include proposed sparseness measures for matrices, improved performance of stable matrix inversion through repeated pivoting while deep within a matrix-tree (extendible to solving linear systems), and a clean matrix derivation of the vector algorithm for the fast Fourier transform. Running code is offered in the appendices.
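For d = 2, the tree representation mentioned in this abstract might look like the following Python sketch. The encoding is an assumption chosen for illustration, not taken from the paper: a tree is `None` for an all-zero block, a bare number for a 1-by-1 block, or a 4-tuple of subtrees in the order (northwest, northeast, southwest, southeast). The helper names `to_quadtree` and `from_quadtree` are hypothetical, and the matrix order is assumed to be a power of two.

```python
def to_quadtree(M, r=0, c=0, n=None):
    """Build a quadtree for the n-by-n block of M whose top-left corner is (r, c).

    Assumption: n is a power of two. All-zero blocks collapse to None, so
    sparse and dense matrices share one representation.
    """
    if n is None:
        n = len(M)
    if n == 1:
        return M[r][c] if M[r][c] != 0 else None
    h = n // 2
    # Quadrant order: northwest, northeast, southwest, southeast.
    kids = tuple(to_quadtree(M, r + dr, c + dc, h)
                 for dr in (0, h) for dc in (0, h))
    return None if all(k is None for k in kids) else kids


def from_quadtree(t, n):
    """Expand a quadtree back into an n-by-n list of rows."""
    if t is None:
        return [[0] * n for _ in range(n)]
    if n == 1:
        return [[t]]
    h = n // 2
    nw, ne, sw, se = (from_quadtree(k, h) for k in t)
    top = [a + b for a, b in zip(nw, ne)]
    bottom = [a + b for a, b in zip(sw, se)]
    return top + bottom


M = [[1, 0, 0, 0],
     [0, 2, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 3]]
tree = to_quadtree(M)
print(tree)  # ((1, None, None, 2), None, None, (None, None, None, 3))
```

The two off-diagonal quadrants cost a single `None` each, which is where the space saving for sparse matrices comes from in this encoding.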
Matrix Algorithms using Quadtrees
 In Proc. ATABLE-92
, 1992
Cited by 5 (0 self)
Many scheduling and synchronization problems for large-scale multiprocessing can be overcome using functional (or applicative) programming. With this observation, it is strange that so much attention within the functional programming community has focused on the "aggregate update problem" [10]: essentially how to implement FORTRAN arrays. This situation is strange because in-place updating of aggregates belongs more to uniprocessing than to mathematics. Several years ago functional style drew me to treatment of d-dimensional arrays as 2^d-ary trees; in particular, matrices become quaternary trees or quadtrees. This convention yields efficient recopying-cum-update of any array; recursive, algebraic decomposition of conventional arithmetic algorithms; and uniform representations and algorithms for both dense and sparse matrices. For instance, any nonsingular subtree is a candidate as the pivot block for Gaussian elimination; the restriction actually helps identification of pivot b...
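The "recursive, algebraic decomposition" of arithmetic over quadtrees can be sketched in Python as below. This is an illustration under stated assumptions rather than the paper's code: a tree is taken to be `None` (a zero block), a number (a 1-by-1 block), or a 4-tuple of subtrees ordered (northwest, northeast, southwest, southeast); both operands are assumed to have the same order; and the names `qadd` and `qmul` are hypothetical.

```python
def qadd(a, b):
    """Sum of two quadtrees of equal order; None stands for a zero block."""
    if a is None:
        return b
    if b is None:
        return a
    if not isinstance(a, tuple):
        s = a + b
        return s if s != 0 else None
    kids = tuple(qadd(x, y) for x, y in zip(a, b))
    return None if all(k is None for k in kids) else kids


def qmul(a, b):
    """Product of two quadtrees of equal order; zero blocks cost nothing."""
    if a is None or b is None:
        return None  # anything times a zero block is a zero block
    if not isinstance(a, tuple):
        p = a * b
        return p if p != 0 else None
    a11, a12, a21, a22 = a
    b11, b12, b21, b22 = b
    # Ordinary 2-by-2 block multiplication, applied recursively.
    kids = (qadd(qmul(a11, b11), qmul(a12, b21)),
            qadd(qmul(a11, b12), qmul(a12, b22)),
            qadd(qmul(a21, b11), qmul(a22, b21)),
            qadd(qmul(a21, b12), qmul(a22, b22)))
    return None if all(k is None for k in kids) else kids


A = (1, 2, 3, 4)            # the matrix [[1, 2], [3, 4]]
I = (1, None, None, 1)      # the 2-by-2 identity
print(qmul(A, A))           # (7, 10, 15, 22)
print(qmul(I, A) == A)      # True
```

The same two functions serve dense and sparse operands: a `None` subtree short-circuits the recursion, so no separate sparse code path is needed.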
A Compositional Framework for Developing Parallel Programs on Two Dimensional Arrays
, 2005
Adapting linear algebra codes to the memory hierarchy using a hypermatrix scheme
 In: Int. Conf. on Parallel Processing and Applied Mathematics. LNCS
, 2005
Cited by 3 (2 self)
We present the way in which we adapt data and computations to the underlying memory hierarchy by means of a hierarchical data structure known as hypermatrix. The application of orthogonal block forms produced the best performance for the platforms used.
Using randomization to make recursive matrix algorithms practical
Cited by 2 (0 self)
Recursive block decomposition algorithms (also known as quadtree algorithms when the blocks are all square) have been proposed to solve well-known problems such as matrix addition, multiplication, inversion, determinant computation, block LDU decomposition, and Cholesky and QR factorization. Until now, such algorithms have been seen as impractical, since they require leading submatrices of the input matrix to be invertible (which is rarely guaranteed). We show how to randomize an input matrix to guarantee that submatrices meet these requirements, and to make recursive block decomposition methods practical on well-conditioned input matrices. The resulting algorithms are elegant, and we show the recursive programs can perform well for both dense and sparse matrices, although with randomization dense computations seem most practical. By 'homogenizing' the input, randomization provides a way to avoid degeneracy in numerical problems that permits simple recursive quadtree algorithms to solve these problems.
1 Introduction
We have been investigating alternative computation schemes for large-scale matrix computations. A natural functional programming approach called recursive block decomposition (or quadtree decomposition when the blocks are all square) operates via divide-and-conquer recursion.
1.1 Recursive Block Decomposition Algorithms
The basic idea here is that when a matrix is decomposed into smaller blocks, many useful functions of the matrix can be computed recursively. A natural question is whether recursive programming can play a practical role in numerical computation, although today most numerical algorithms are programmed iteratively.
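The requirement that leading submatrices be invertible, and the way randomization can restore it, can be illustrated with a small Python sketch. The abstract does not say how the input is randomized, so premultiplying by a random integer matrix is an assumption made here purely for illustration; the helper names (`det`, `leading_minors`, `matmul`, `random_premultiplier`) are hypothetical. Exact rational arithmetic from the standard `fractions` module is used so that "zero" means exactly zero.

```python
import random
from fractions import Fraction


def det(M):
    """Exact determinant by Gaussian elimination with row swaps."""
    M = [[Fraction(x) for x in row] for row in M]
    n = len(M)
    d = Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            d = -d
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= f * M[c][k]
    return d


def leading_minors(M):
    """Determinants of the k-by-k top-left submatrices, k = 1..n."""
    return [det([row[:k] for row in M[:k]]) for k in range(1, len(M) + 1)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def random_premultiplier(n, rng):
    """Hypothetical randomizer: an n-by-n matrix with entries in 1..9."""
    return [[rng.randint(1, 9) for _ in range(n)] for _ in range(n)]


A = [[0, 1], [1, 0]]       # invertible, but its 1-by-1 leading block is zero
R = [[2, 1], [1, 3]]       # a fixed stand-in for one random draw
print(leading_minors(A))             # first minor is 0: block LDU breaks down
print(leading_minors(matmul(R, A)))  # both minors are nonzero
```

Since R is invertible, a system A x = b has the same solution as (R A) x = R b, so a recursive block method that fails on A can be run on R A instead. In practice R would come from `random_premultiplier(n, random.Random(seed))`, and a draw that still leaves a zero minor would simply be repeated.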
J.J.: A study on load imbalance in parallel hypermatrix multiplication using OpenMP
 In: Int. Conf. on Parallel Processing and Applied Mathematics. LNCS
, 2005
Cited by 1 (1 self)
In this paper we present our work on the parallelization of a matrix multiplication code based on the hypermatrix data structure. We have used OpenMP for the parallelization. We have added OpenMP directives to a few loops and experimented with several features available with OpenMP in the Intel Fortran Compiler: scheduling algorithms, chunk sizes and nested parallelism. We found that the load imbalance introduced by the hypermatrix structure could not be solved by any of those OpenMP features.