Results 1-10 of 11
Auto-Blocking Matrix-Multiplication or Tracking BLAS3 Performance from Source Code
In Proceedings of the Sixth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 1997
Abstract

Cited by 76 (6 self)
An elementary, machine-independent, recursive algorithm for matrix multiplication C += A*B provides implicit blocking at every level of the memory hierarchy and tests out faster than classically optimal code, tracking hand-coded BLAS3 routines. "Proof of concept" is demonstrated by racing the in-place algorithm against the manufacturer's hand-tuned BLAS3 routines; it can win. The recursive code bifurcates naturally at the top level into independent block-oriented processes, each of which writes to a disjoint and contiguous region of memory. Experience has shown that the indexing vastly improves the patterns of memory access at all levels of the memory hierarchy, independently of the sizes of caches or pages and without ad hoc programming. It also exposed a weakness in SGI's C compilers, which merrily unroll loops for the superscalar R8000 processor but do not analogously unfold the base cases of the most elementary recursions. Such deficiencies might deter programmers from using this rich class of recursive algorithms.
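The recursive C += A*B scheme the abstract describes can be sketched, under simplifying assumptions (square power-of-two matrices, lists of lists standing in for real storage), roughly as follows; this is an illustration of the implicit-blocking idea, not the paper's code:

```python
def rec_matmul_add(C, A, B, ri, ci, rk, n):
    """Accumulate the n x n product A[ri:, rk:] * B[rk:, ci:] into C[ri:, ci:].

    Splitting each operand into quadrants and recursing means blocking
    emerges at every level of the memory hierarchy with no cache-size
    parameter anywhere in the code. n is assumed to be a power of two.
    """
    if n == 1:
        C[ri][ci] += A[ri][rk] * B[rk][ci]
        return
    h = n // 2
    # Eight quadrant sub-products; any order is correct, and good orders
    # keep recently touched blocks hot in cache.
    for di in (0, h):
        for dj in (0, h):
            for dk in (0, h):
                rec_matmul_add(C, A, B, ri + di, ci + dj, rk + dk, h)
```

The base-case unfolding the abstract faults SGI's compilers for missing would, in practice, replace `n == 1` with a small unrolled kernel at some larger cutoff.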
Single Assignment C: efficient support for high-level array operations in a functional setting, 2003
A New Array Operation
Graph Reduction: Proceedings of a Workshop, 1986
Abstract

Cited by 15 (0 self)
This paper proposes a new solution, which is a variant on the "monolithic" approach to array operations. The new solution is also not completely satisfactory, but does have advantages complementary to other proposals. It makes a class of programs easy to express, notably those involving the construction of histograms. It also allows for parallel implementations without the need to introduce nondeterminism. The work reported here was motivated by discussions at the Workshop on Graph Reduction, of which this is the proceedings. In particular, it was motivated by the contributions of Arvind and Paul Hudak (see this volume), and especially Arvind's observation that the histogram problem is difficult to solve in a satisfactory way. After the first draft of this paper was written, I discovered that Simon Peyton Jones independently suggested the same idea, also prompted by one of Arvind's talks. After the second draft was written, I discovered that Guy Steele and Daniel Hillis use a very similar notion in Connection Machine Lisp [SH86]. Apparently the simple innovation described in this paper is an idea whose time has come. This paper is organized as follows. Section 1 briefly surveys previously proposed array operations. Section 2 introduces the new operation. Section 3 gives a small example of its use. Section 4 discusses two variations on the array operation. Section 5 concludes.
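A hedged sketch of the monolithic operation the abstract argues for: the whole array is built in one step from (index, value) pairs, with a supplied operator combining collisions. This mirrors the `accumArray` operation that later appeared in Haskell; the names below are illustrative, not the paper's.

```python
def accum_array(combine, init, size, pairs):
    """Build a size-element array: every slot starts at init, then each
    (index, value) pair is folded in with combine. No element-at-a-time
    update is exposed, so parallel implementations need no nondeterminism
    beyond requiring combine to be associative and commutative."""
    arr = [init] * size
    for i, v in pairs:
        arr[i] = combine(arr[i], v)
    return arr

def histogram(size, indices):
    # The histogram problem Arvind observed to be hard with incremental
    # updates becomes a one-liner.
    return accum_array(lambda a, b: a + b, 0, size, [(i, 1) for i in indices])
```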
Matrix Algebra and Applicative Programming
Functional Programming Languages and Computer Architecture (Proceedings), 1987
Abstract

Cited by 12 (1 self)
General Term: Algorithms. The broad problem of matrix algebra is taken up from the perspective of functional programming. A key question is how arrays should be represented in order to admit good implementations of well-known efficient algorithms, and whether functional architecture sheds any new light on these or other solutions. It relates directly to disarming the "aggregate update" problem. The major thesis is that 2^d-ary trees should be used to represent d-dimensional arrays; examples are matrix operations (d = 2), and a particularly interesting vector (d = 1) algorithm. Sparse and dense matrices are represented homogeneously, but at some overhead that appears tolerable; encouraging results are reviewed and extended. A Pivot Step algorithm is described which offers optimal stability at no extra cost for searching. The new results include proposed sparseness measures for matrices, improved performance of stable matrix inversion through repeated pivoting while deep within a matrix-tree (extendible to solving linear systems), and a clean matrix derivation of the vector algorithm for the fast Fourier transform. Running code is offered in the appendices.
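A minimal quadtree-matrix sketch in the spirit of the 2^d-ary-tree thesis (d = 2): a matrix is either `None` (an all-zero block), a scalar (a 1x1 block), or four quadrants. The encoding is illustrative, not the paper's; it assumes both operands share the same shape. It does show how sparse and dense matrices get one homogeneous representation, since zero blocks are pruned to `None` and cost nothing:

```python
from dataclasses import dataclass

@dataclass
class Quad:
    # Each quadrant is another Quad, a scalar leaf, or None (zero block).
    nw: object
    ne: object
    sw: object
    se: object

def add(a, b):
    """Structural addition of two same-shaped quadtree matrices.

    Zero blocks are skipped entirely, and sums that cancel to zero are
    re-pruned, so sparsity is preserved without any separate sparse format.
    """
    if a is None:
        return b
    if b is None:
        return a
    if isinstance(a, Quad):
        return Quad(add(a.nw, b.nw), add(a.ne, b.ne),
                    add(a.sw, b.sw), add(a.se, b.se))
    s = a + b
    return s if s != 0 else None
```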
A Compositional Framework for Developing Parallel Programs on Two Dimensional Arrays, 2005
Abstract

Cited by 3 (2 self)
J.J.: Adapting linear algebra codes to the memory hierarchy using a hypermatrix scheme
In: Int. Conf. on Parallel Processing and Applied Mathematics, LNCS, 2005
Abstract

Cited by 2 (1 self)
We present the way in which we adapt data and computations to the underlying memory hierarchy by means of a hierarchical data structure known as a hypermatrix. The application of orthogonal block forms produced the best performance for the platforms used.
Using randomization to make recursive matrix algorithms practical
Abstract

Cited by 2 (0 self)
Recursive block decomposition algorithms (also known as quadtree algorithms when the blocks are all square) have been proposed to solve well-known problems such as matrix addition, multiplication, inversion, determinant computation, block LDU decomposition, and Cholesky and QR factorization. Until now, such algorithms have been seen as impractical, since they require leading submatrices of the input matrix to be invertible (which is rarely guaranteed). We show how to randomize an input matrix to guarantee that submatrices meet these requirements, and to make recursive block decomposition methods practical on well-conditioned input matrices. The resulting algorithms are elegant, and we show the recursive programs can perform well for both dense and sparse matrices, although with randomization dense computations seem most practical. By "homogenizing" the input, randomization provides a way to avoid degeneracy in numerical problems that permits simple recursive quadtree algorithms to solve these problems. We have been investigating alternative computation schemes for large-scale matrix computations. A natural functional programming approach called recursive block decomposition (or quadtree decomposition when the blocks are all square) operates via divide-and-conquer recursion. The basic idea is that when a matrix is decomposed into smaller blocks, many useful functions of the matrix can be computed recursively. A natural question is whether recursive programming can play a practical role in numerical computation, although today most numerical algorithms are programmed iteratively.
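The randomization idea can be sketched as pre- and post-multiplying the input by random matrices, so that with probability 1 every leading submatrix of the transformed matrix is invertible and a recursive block algorithm can proceed without pivoting. The specific transform below (dense Gaussian random factors over plain lists) is an illustrative choice, not necessarily the paper's:

```python
import random

def matmul(A, B):
    """Naive dense product of two conforming list-of-lists matrices."""
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

def randomize(A, seed=0):
    """Return (U, B, V) with B = U * A * V for random dense U, V.

    A is recovered from the transformed result as U^-1 * B * V^-1; the
    random factors "homogenize" A so that leading submatrices of B are
    almost surely nonsingular even when those of A are not.
    """
    rng = random.Random(seed)
    n = len(A)
    U = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n)]
    V = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n)]
    return U, matmul(matmul(U, A), V), V
```

For example, the permutation matrix [[0, 1], [1, 0]] has a singular 1x1 leading block, yet after `randomize` the transformed matrix almost surely does not.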
Matrix Algorithms using Quadtrees (Invited Talk)
In Proc. ATABLE-92, 1992
Abstract

Cited by 1 (0 self)
Many scheduling and synchronization problems for large-scale multiprocessing can be overcome using functional (or applicative) programming. With this observation, it is strange that so much attention within the functional programming community has focused on the "aggregate update problem" [10]: essentially, how to implement FORTRAN arrays. This situation is strange because in-place updating of aggregates belongs more to uniprocessing than to mathematics. Several years ago functional style drew me to treatment of d-dimensional arrays as 2^d-ary trees; in particular, matrices become quaternary trees or quadtrees. This convention yields efficient recopying-cum-update of any array; recursive, algebraic decomposition of conventional arithmetic algorithms; and uniform representations and algorithms for both dense and sparse matrices. For instance, any nonsingular subtree is a candidate as the pivot block for Gaussian elimination; the restriction actually helps identification of pivot b...
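The "recopying-cum-update" claim can be sketched as follows: updating one element of a quadtree matrix rebuilds only the O(log n) nodes on the path to it and shares every other subtree with the old version. The representation here (nested 4-tuples `(nw, ne, sw, se)` with scalars at the leaves) is purely illustrative:

```python
def update(tree, n, i, j, v):
    """Return a new n x n quadtree with element (i, j) set to v.

    The old tree is untouched; this is a purely functional update, so
    the aggregate-update problem never arises. n must be a power of two.
    """
    if n == 1:
        return v
    h = n // 2
    nw, ne, sw, se = tree
    if i < h and j < h:
        return (update(nw, h, i, j, v), ne, sw, se)
    if i < h:
        return (nw, update(ne, h, i, j - h, v), sw, se)
    if j < h:
        return (nw, ne, update(sw, h, i - h, j, v), se)
    return (nw, ne, sw, update(se, h, i - h, j - h, v))
```

Three untouched quadrants per level are shared by reference, which is what makes the recopying cheap.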
Daisy, DSI and LiMP: Issues in Architecture for Suspending Construction
Abstract
Abstract
This report briefly describes the functional programming language Daisy, its underlying computational model, DSI, and a hypothetical architecture, LiMP, for their implementation. Daisy is a simple list processing language, derived from Pure Lisp, which inherits a call-by-need semantics through its use of a suspending constructor. DSI is the heterogeneous "data space" of suspensions and manifest graphs, modeling concurrent task execution on a parallel virtual machine. LiMP stands for List Multi-Processor, an MIMD architecture for parallel graph processing. The primitive mechanisms of LiMP are explained, and the evolution of the machine model from language-oriented research is discussed. Research reported herein was supported, in part, by the National Science Foundation, under grants numbered MCS 8203978, DCR 8405241, and DCR 8521497. The article "CONS should not Evaluate its Arguments" [FrWi76a] was one of several papers to appear i...
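A hedged sketch of a suspending constructor: `cons` does not evaluate its arguments; each field is a thunk that is forced, and its result cached, only on first access. This is how Daisy-style call-by-need semantics falls out of the data constructor itself. The names below are illustrative, not Daisy's:

```python
class Suspension:
    """A memoized thunk: computed at most once, on first force."""
    def __init__(self, thunk):
        self.thunk = thunk
        self.forced = False
    def force(self):
        if not self.forced:
            self.value = self.thunk()
            self.forced = True
        return self.value

def cons(head_thunk, tail_thunk):
    # Neither argument is evaluated here; CONS does not evaluate its arguments.
    return (Suspension(head_thunk), Suspension(tail_thunk))

def integers_from(n):
    # Infinite structures are fine: nothing runs until forced.
    return cons(lambda: n, lambda: integers_from(n + 1))

def take(lst, k):
    """Force just the first k elements of a suspended list."""
    out = []
    while k > 0:
        head, tail = lst
        out.append(head.force())
        lst, k = tail.force(), k - 1
    return out
```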