Optimizing the performance of sparse matrix-vector multiplication
, 2000
"... Copyright 2000 by Eun-Jin Im ..."
Applying recursion to serial and parallel QR factorization leads to better performance
Cited by 52 (4 self)
... this paper may be copied or distributed royalty free without further permission by computer-based and other information-service systems. Permission to republish any other portion of this paper must be obtained from the Editor.
The Design of a Parallel Dense Linear Algebra Software Library: Reduction to Hessenberg, Tridiagonal, and Bidiagonal Form
, 1995
Fast polar decomposition of an arbitrary matrix
 SIAM J. Sci. Stat. Comput.
, 1990
Cited by 33 (9 self)
Abstract. The polar decomposition of an m × n matrix A of full rank, where m ≥ n, can be computed using a quadratically convergent algorithm of Higham [SIAM J. Sci. Statist. Comput., 7 (1986), pp. 1160–1174]. The algorithm is based on a Newton iteration involving a matrix inverse. It is shown how, with the use of a preliminary complete orthogonal decomposition, the algorithm can be extended to arbitrary A. The use of the algorithm to compute the positive semidefinite square root of a Hermitian positive semidefinite matrix is also described. A hybrid algorithm that adaptively switches from the matrix inversion based iteration to a matrix multiplication based iteration due to Kovarik, and to Björck and Bowie, is formulated. The decision when to switch is made using a condition estimator. This "matrix multiplication rich" algorithm is shown to be more efficient on machines for which matrix multiplication can be executed 1.5 times faster than matrix inversion.
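A minimal pure-Python sketch of the hybrid idea, restricted to a real square nonsingular A (no preliminary complete orthogonal decomposition, so not the arbitrary-A case). Assumptions: the Newton step is the unscaled X_{k+1} = (X_k + X_k^{-T})/2 (Higham's algorithm also uses acceleration scaling, omitted here); the multiplication-only step is the Newton-Schulz form X_{k+1} = X_k(3I - X_k^T X_k)/2; and the switch happens when ||X_k^T X_k - I||_F < 0.5, a hypothetical threshold standing in for the paper's condition estimator. Newton-Schulz converges when ||I - X^T X||_2 < 1, which that threshold guarantees. All function names are illustrative.

```python
def matmul(A, B):
    """Product of two dense matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def inverse(A):
    """Gauss-Jordan elimination with partial pivoting; A must be nonsingular."""
    n = len(A)
    M = [[float(x) for x in row] + e for row, e in zip(A, identity(n))]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]


def orth_residual(X):
    """Frobenius norm of X^T X - I, i.e. how far X is from orthogonal."""
    G = matmul(transpose(X), X)
    n = len(G)
    return sum((G[i][j] - (1.0 if i == j else 0.0)) ** 2
               for i in range(n) for j in range(n)) ** 0.5


def polar_hybrid(A, switch_tol=0.5, tol=1e-12, max_iter=100):
    """Return (U, H) with A = U H, U orthogonal, H symmetric positive definite."""
    n = len(A)
    X = [[float(x) for x in row] for row in A]
    for _ in range(max_iter):
        r = orth_residual(X)
        if r < tol:
            break
        if r < switch_tol:
            # Multiplication-only Newton-Schulz step: X <- X (3I - X^T X) / 2
            G = matmul(transpose(X), X)
            T = [[((3.0 if i == j else 0.0) - G[i][j]) / 2 for j in range(n)]
                 for i in range(n)]
            X = matmul(X, T)
        else:
            # Inverse-based Newton step: X <- (X + X^{-T}) / 2
            Xit = transpose(inverse(X))
            X = [[(x + y) / 2 for x, y in zip(rx, ry)] for rx, ry in zip(X, Xit)]
    H = matmul(transpose(X), A)
    # Symmetrize H to remove rounding asymmetry
    H = [[(H[i][j] + H[j][i]) / 2 for j in range(n)] for i in range(n)]
    return X, H
```

For example, `polar_hybrid([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 2.0]])` takes two Newton steps, then finishes with Newton-Schulz steps, and returns a permutation matrix for U and diag(1, 1, 2) for H. Once the iterate is close to orthogonal, each remaining step costs only two matrix products and no inversion, which is the trade-off the abstract describes.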
New Serial and Parallel Recursive QR Factorization Algorithms for SMP Systems
, 1998
Cited by 32 (6 self)
We present a new recursive algorithm for the QR factorization of an m by n matrix A. The recursion leads to an automatic variable blocking that allows us to replace a level 2 part in a standard block algorithm by level 3 operations. However, there are some additional costs for performing the updates, which prohibit the efficient use of the recursion for large n. This obstacle is overcome by using a hybrid recursive algorithm that outperforms the LAPACK algorithm DGEQRF by 78% to 21% as m = n increases from 100 to 1000. A successful parallel implementation on a PowerPC 604-based IBM SMP node based on dynamic load balancing is presented. For 2, 3, and 4 processors and m = n = 2000, it shows speedups of 1.96, 2.99, and 3.92 compared to our uniprocessor algorithm.
1 Introduction
LAPACK algorithm DGEQRF requires more floating point operations than LAPACK algorithm DGEQR2, see [1]. Yet, DGEQRF outperforms DGEQR2 on an RS/6000 workstation by nearly a factor of 3 on large matrices. Dongarra, Kaufm...
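A minimal pure-Python sketch of the column-splitting recursion, assuming m ≥ n: factor the left half of the columns, apply the transposed orthogonal factor to the right half, then factor the trailing block recursively. Unlike the algorithm in the abstract, which keeps the Householder reflectors in compact form and applies them with level 3 operations, this sketch forms Q explicitly as an m × m matrix and recurses all the way down to single columns (no hybrid cutoff), so it shows the shape of the recursion, not its performance. The helper names are hypothetical.

```python
import math


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def householder_column(a):
    """Reflector H (m x m, symmetric and orthogonal) with H a = [alpha, 0, ..., 0]."""
    m = len(a)
    norm = math.sqrt(sum(x * x for x in a))
    alpha = -norm if a[0] >= 0 else norm  # sign opposite to a[0] avoids cancellation
    v = list(a)
    v[0] -= alpha
    vv = sum(x * x for x in v)
    scale = 2.0 / vv if vv > 0 else 0.0  # zero column: H = I
    H = [[(1.0 if i == j else 0.0) - scale * v[i] * v[j] for j in range(m)]
         for i in range(m)]
    return H, alpha


def recursive_qr(A):
    """A = Q R for an m x n matrix A with m >= n; Q is m x m, R is m x n."""
    m, n = len(A), len(A[0])
    if n == 1:
        H, alpha = householder_column([row[0] for row in A])
        return H, [[alpha]] + [[0.0] for _ in range(m - 1)]
    k = n // 2
    Q1, R1 = recursive_qr([row[:k] for row in A])      # factor left block column
    B = matmul(transpose(Q1), [row[k:] for row in A])  # update right block: Q1^T A2
    Q2, R2 = recursive_qr(B[k:])                       # factor trailing block
    # Q = Q1 * diag(I_k, Q2)
    D = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for i in range(m - k):
        D[k + i][k:] = Q2[i]
    Q = matmul(Q1, D)
    # R = [[R11, top rows of B], [0, R2]]
    R = [R1[i] + B[i] for i in range(k)] + [[0.0] * k + R2[i] for i in range(m - k)]
    return Q, R
```

The block sizes k and n - k are chosen by the recursion itself, which is the "automatic variable blocking" the abstract refers to; the update `matmul(transpose(Q1), ...)` is where a real implementation would use level 3 operations.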
ScaLAPACK: A Linear Algebra Library for Message-Passing Computers
 In SIAM Conference on Parallel Processing
, 1997
Cited by 30 (3 self)
This article outlines the content and performance of some of the ScaLAPACK software. ScaLAPACK is a collection of mathematical software for linear algebra computations on distributed-memory computers. The importance of developing standards for computational and message-passing interfaces is discussed. We present the different components and building blocks of ScaLAPACK and provide initial performance results for selected PBLAS routines and a subset of ScaLAPACK driver routines.
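ScaLAPACK's building blocks operate on matrices spread over a two-dimensional process grid in a block-cyclic layout (a property of ScaLAPACK not described in the abstract above). A minimal sketch of the index arithmetic for one dimension, assuming 0-based indices and that the first block lives on process 0; the function names are hypothetical, and `local_count` is meant to mirror what ScaLAPACK's NUMROC computes under that assumption. The two-dimensional owner is obtained by applying the same mapping to rows and columns independently.

```python
def owner_and_local(g, nb, nprocs):
    """Map 0-based global index g to (owning process, 0-based local index)."""
    block = g // nb                       # which block of size nb holds g
    return block % nprocs, (block // nprocs) * nb + g % nb


def local_count(n, nb, p, nprocs):
    """How many of the n global indices process p owns (first block on process 0)."""
    nblocks, rem = divmod(n, nb)          # full blocks and size of the partial block
    count = (nblocks // nprocs) * nb      # full rounds of the cyclic deal
    extra = nblocks % nprocs              # processes receiving one more full block
    if p < extra:
        count += nb
    elif p == extra:
        count += rem                      # the partial block, if any, lands here
    return count


def owner_2d(i, j, mb, nb, prow, pcol):
    """Process-grid coordinates owning global entry (i, j), with mb x nb blocks."""
    return owner_and_local(i, mb, prow)[0], owner_and_local(j, nb, pcol)[0]
```

For instance, with block size 2 on 3 processes, global index 7 sits in block 3, which wraps back to process 0 as its second block, so `owner_and_local(7, 2, 3)` returns `(0, 3)`.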
Fast linear algebra is stable
 In preparation
, 2006
Cited by 28 (15 self)
In [23] we showed that a large class of fast recursive matrix multiplication algorithms is stable in a normwise sense, and that in fact if multiplication of n-by-n matrices can be done by any algorithm in $O(n^{\omega+\eta})$ operations for any $\eta > 0$, then it can be done stably in $O(n^{\omega+\eta})$ operations for any $\eta > 0$. Here we extend this result to show that essentially all standard linear algebra operations, including LU decomposition, QR decomposition, linear equation solving, matrix inversion, solving least squares problems, (generalized) eigenvalue problems and the singular value decomposition, can also be done stably (in a normwise sense) in $O(n^{\omega+\eta})$ operations.
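Strassen's algorithm is the classic example of a fast recursive matrix multiplication algorithm: it forms 7 half-size products instead of 8, giving $O(n^{\log_2 7}) \approx O(n^{2.81})$ operations. A minimal pure-Python sketch, assuming square matrices whose order is a power of two; the `cutoff` parameter is an illustrative choice that falls back to classical multiplication for small blocks. It shows the recursion only and says nothing about the stability analysis.

```python
def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def classical(A, B):
    """Ordinary O(n^3) product of matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def strassen(A, B, cutoff=2):
    """Multiply n x n matrices (n a power of two) with 7 recursive products per level."""
    n = len(A)
    if n <= cutoff:
        return classical(A, B)
    h = n // 2

    def quad(M):
        """Split M into its four h x h quadrants."""
        return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
                [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])

    A11, A12, A21, A22 = quad(A)
    B11, B12, B21, B22 = quad(B)
    M1 = strassen(add(A11, A22), add(B11, B22), cutoff)
    M2 = strassen(add(A21, A22), B11, cutoff)
    M3 = strassen(A11, sub(B12, B22), cutoff)
    M4 = strassen(A22, sub(B21, B11), cutoff)
    M5 = strassen(add(A11, A12), B22, cutoff)
    M6 = strassen(sub(A21, A11), add(B11, B12), cutoff)
    M7 = strassen(sub(A12, A22), add(B21, B22), cutoff)
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bottom = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bottom
```

With integer entries the result matches `classical(A, B)` exactly; with floating-point entries the two differ by rounding, and it is the size of that difference that a normwise stability bound controls.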
Computing Rank-Revealing QR Factorizations of Dense Matrices
 Argonne Preprint ANLMCSP5590196, Argonne National Laboratory
, 1996
Cited by 28 (1 self)
... this paper, and we give only a brief synopsis here. For details, the reader is referred to the code. Test matrices 1 through 5 were designed to exercise column pivoting. Matrix 6 was designed to test the behavior of the condition estimation in the presence of clusters for the smallest singular value. For the other cases, we employed the LAPACK matrix generator xLATMS, which generates random symmetric matrices by multiplying a diagonal matrix with prescribed singular values by random orthogonal matrices from the left and right. For the break1 distribution, all singular values are 1.0 except for one. In the arithmetic and geometric distributions, they decay from 1.0 to a specified smallest singular value in an arithmetic and geometric fashion, respectively. In the "reversed" distributions, the order of the diagonal entries was reversed. For test cases 7 through 12, we used xLATMS to generate a matrix of order ...
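The break1, arithmetic, and geometric singular value distributions described for the test matrices can be written down directly. A minimal sketch, assuming n ≥ 2, a largest singular value of 1.0, values listed in descending order before any reversal, and, for break1, that the single exceptional value is the smallest one (the text says only that all values are 1.0 except for one). The formulas are modeled on the "one small", geometric, and arithmetic modes of LAPACK's xLATMS with COND = 1/sigma_min; the function names are hypothetical, and this produces only the diagonal, not the full xLATMS matrix.

```python
def singular_values(n, kind, sigma_min):
    """Prescribed singular values, descending from 1.0 to sigma_min (n >= 2)."""
    if kind == "break1":
        # all ones except a single small value (assumed to be the last one)
        return [1.0] * (n - 1) + [sigma_min]
    if kind == "arithmetic":
        # equal steps from 1.0 down to sigma_min
        return [1.0 - (i / (n - 1)) * (1.0 - sigma_min) for i in range(n)]
    if kind == "geometric":
        # equal ratios from 1.0 down to sigma_min
        return [sigma_min ** (i / (n - 1)) for i in range(n)]
    raise ValueError(f"unknown distribution: {kind}")


def reversed_distribution(n, kind, sigma_min):
    """The 'reversed' variant: same diagonal entries in the opposite order."""
    return singular_values(n, kind, sigma_min)[::-1]
```

For n = 5 and sigma_min = 0.2, the arithmetic distribution is approximately [1.0, 0.8, 0.6, 0.4, 0.2], while the geometric one shrinks by the same factor 0.2^(1/4) at each step.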