Results 1  10
of
11
Fast linear algebra is stable
 In preparation
, 2006
"... In [23] we showed that a large class of fast recursive matrix multiplication algorithms is stable in a normwise sense, and that in fact if multiplication of nbyn matrices can be done by any algorithm in O(n ω+η) operations for any η> 0, then it can be done stably in O(n ω+η) operations for any η> ..."
Abstract

Cited by 25 (15 self)
 Add to MetaCart
In [23] we showed that a large class of fast recursive matrix multiplication algorithms is stable in a normwise sense, and that in fact if multiplication of nbyn matrices can be done by any algorithm in O(n ω+η) operations for any η> 0, then it can be done stably in O(n ω+η) operations for any η> 0. Here we extend this result to show that essentially all standard linear algebra operations, including LU decomposition, QR decomposition, linear equation solving, matrix inversion, solving least squares problems, (generalized) eigenvalue problems and the singular value decomposition can also be done stably (in a normwise sense) in O(n ω+η) operations. 1
Parallel Performance of a Symmetric Eigensolver based on the Invariant Subspace Decomposition Approach
, 1994
"... In this paper, we discuss work in progress on a complete eigensolver based on the Invariant Subspace Decomposition Algorithm for dense symmetric matrices (SYISDA). We describe a recently developed acceleration technique that substantially reduces the overall work required by this algorithm and revie ..."
Abstract

Cited by 15 (0 self)
 Add to MetaCart
In this paper, we discuss work in progress on a complete eigensolver based on the Invariant Subspace Decomposition Algorithm for dense symmetric matrices (SYISDA). We describe a recently developed acceleration technique that substantially reduces the overall work required by this algorithm and review the algorithmic highlights of a distributedmemory implementation of this approach. These include a fast matrixmatrix multiplication algorithm, a new approach to parallel band reduction and tridiagonalization, and a harness for coordinating the divideandconquer parallelism in the problem. We present performance results for the dominant kernel, dense matrix multiplication, as well as for the overall SYISDA implementation on the Intel Touchstone Delta and the Intel Paragon. 1. Introduction Computation of eigenvalues and eigenvectors is an essential kernel in many applications, and several promising parallel algorithms have been investigated [26, 3, 28, 22, 25, 6]. The work presented in t...
A Study of the Invariant Subspace Decomposition Algorithm for Banded Symmetric Matrices
 in Proceedings of the Fifth SIAM Conference on Applied Linear Algebra
, 1994
"... In this paper, we give an overview of the Invariant Subspace Decomposition Algorithm for banded symmetric matrices and describe a sequential implementation of this algorithm. Our implementation uses a specialized routine for performing banded matrix multiplication together with successive band reduc ..."
Abstract

Cited by 4 (2 self)
 Add to MetaCart
In this paper, we give an overview of the Invariant Subspace Decomposition Algorithm for banded symmetric matrices and describe a sequential implementation of this algorithm. Our implementation uses a specialized routine for performing banded matrix multiplication together with successive band reduction, yielding a sequential algorithm that is competitive for large problems with the LAPACK QR code in computing all of the eigenvalues and eigenvectors of a dense symmetric matrix. Performance results are given on a variety of machines. 1 Introduction Computation of eigenvalues and eigenvectors is an essential kernel in many applications, and several promising parallel algorithms have been investigated [8, 11, 7]. The work presented in this paper is part of the PRISM (Parallel Research on Invariant Subspace Methods) Project, which involves researchers from Argonne National Laboratory, the Supercomputing Research Center, the University of California at Berkeley, and the University of Kent...
On Tridiagonalizing and Diagonalizing Symmetric Matrices with Repeated Eigenvalues
 PREPRINT ANL/MCSP54541095, MATHEMATICS AND COMPUTER SCIENCE DIVISION, ARGONNE NATIONAL LABORATORY
, 1995
"... We describe a divideandconquer tridiagonalization approach for matrices with repeated eigenvalues. Our algorithm hinges on the fact that, under easily constructively verifiable conditions, a symmetric matrix with bandwidth b and k distinct eigenvalues must be block diagonal with diagonal blocks ..."
Abstract

Cited by 2 (2 self)
 Add to MetaCart
We describe a divideandconquer tridiagonalization approach for matrices with repeated eigenvalues. Our algorithm hinges on the fact that, under easily constructively verifiable conditions, a symmetric matrix with bandwidth b and k distinct eigenvalues must be block diagonal with diagonal blocks of size at most bk. A slight modification of the usual orthogonal bandreduction algorithm allows us to reveal this structure, which then leads to potential parallelism in the form of independent diagonal blocks. Compared with the usual Householder reduction algorithm, the new approach exhibits improved data locality, significantly more scope for parallelism, and the potential to reduce arithmetic complexity by close to 50% for matrices that have only two numerically distinct eigenvalues. The actual improvement depends to a large extent on the number of distinct eigenvalues and a good estimate thereof. However, at worst the algorithm behaves like a successive bandreduction approach to tridia...
Parallel Studies of the Invariant Subspace Decomposition Approach for Banded Symmetric Matrices
, 1995
"... We present an overview of the banded Invariant Subspace Decomposition Algorithm for symmetric matrices and describe a parallel implementation of this algorithm. The algorithm described here is a promising variant of the Invariant Subspace Decomposition Algorithm for dense symmetric matrices (SYISDA) ..."
Abstract

Cited by 2 (1 self)
 Add to MetaCart
We present an overview of the banded Invariant Subspace Decomposition Algorithm for symmetric matrices and describe a parallel implementation of this algorithm. The algorithm described here is a promising variant of the Invariant Subspace Decomposition Algorithm for dense symmetric matrices (SYISDA) that retains the property of using scalable primitives, while requiring significantly less overall computation than SYISDA. 1 Introduction Computation of eigenvalues and eigenvectors is an essential kernel in many applications, and several promising parallel algorithms have been investigated. The work presented in this paper is part of the PRISM (Parallel Research on Invariant Subspace Methods) Project, which involves researchers from Argonne National Laboratory, the Supercomputing Research Center, the University of California at Berkeley, and the University of Kentucky. The goal of the PRISM project is the development of algorithms and software for solving largescale eigenvalue problems ...
A Case Study of MPI: Portable and Efficient Libraries*
, 1995
"... In this paper, we discuss the performance achieved by several implementations of the recently defined Message Passing Interface (MPI) standard. In particular, performance results for different implementations of the broadcast operation are analyzed and compared on the Delta, Paragon, SP1 and CM5. 1 ..."
Abstract

Cited by 1 (0 self)
 Add to MetaCart
In this paper, we discuss the performance achieved by several implementations of the recently defined Message Passing Interface (MPI) standard. In particular, performance results for different implementations of the broadcast operation are analyzed and compared on the Delta, Paragon, SP1 and CM5. 1 Introduction For the past several years, members of the Parallel Research on Invariant Subspace Methods (PRISM) project have been investigating scalable parallel eigensolvers for distributed memory systems [1, 3]. The ultimate objective of this research is the development of portable and efficient libraries for this fundamental numerical linear algebra kernel. In the course of our work, we, like many other library developers, have been faced with many issues relating to portable programming. Previously, a notable obstacle to library development was the lack of standardization in message passing, from both a programming and a functional point of view. This lack of standardization made it dif...
The CCLRC HPCI Centre at Daresbury Laboratory
, 1996
"... Parallel software packages which may be of use in scientific and engineering applications of the type carried out on the parallel computing facilities at EPCC and Daresbury Laboratory are surveyed. For each package, a brief description is given along with other useful information such as availabilit ..."
Abstract
 Add to MetaCart
Parallel software packages which may be of use in scientific and engineering applications of the type carried out on the parallel computing facilities at EPCC and Daresbury Laboratory are surveyed. For each package, a brief description is given along with other useful information such as availability, contact addresses and systems supported. keywords: parallel computing, software packages, scientific applications. This report is available from http://www.dl.ac.uk/TCSC/HPCI/ c fl1996, Daresbury Laboratory. We do not accept any responsibility for loss or damage arising from the use of information contained in any of our reports or in any communication about our tests or investigations. ii CONTENTS iii Contents 1 Introduction 1 1.1 Criteria for inclusion : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 1 1.2 Package areas : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 1 1.3 Individual entries : : : : : : : : : : : : : : : : : : : : : : : : : : : : : :...
Minimizing Communication for Eigenproblems and the Singular Value Decomposition
, 2010
"... Algorithms have two costs: arithmetic and communication. The latter represents the cost of moving data, either between levels of a memory hierarchy, or between processors over a network. Communication often dominates arithmetic and represents a rapidly increasing proportion of the total cost, so we ..."
Abstract
 Add to MetaCart
Algorithms have two costs: arithmetic and communication. The latter represents the cost of moving data, either between levels of a memory hierarchy, or between processors over a network. Communication often dominates arithmetic and represents a rapidly increasing proportion of the total cost, so we seek algorithms that minimize communication. In [4] lower bounds were presented on the amount of communication required for essentially all O(n 3)like algorithms for linear algebra, including eigenvalue problems and the SVD. Conventional algorithms, including those currently implemented in (Sca)LAPACK, perform asymptotically more communication than these lower bounds require. In this paper we present parallel and sequential eigenvalue algorithms (for pencils, nonsymmetric matrices, and symmetric matrices) and SVD algorithms that do attain these lower bounds, and analyze their convergence and communication costs. 1
and Integrable Systems. Minimizing Communication for Eigenproblems and the Singular Value
, 2011
"... personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires pri ..."
Abstract
 Add to MetaCart
personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission. Acknowledgement This research is supported by Microsoft (Award #024263) and Intel (Award #024894) funding and by matching funding by U.C. Discovery (Award #DIG0710227). Additional support comes from Par Lab
algorithm for nonsymmetric eigenproblems
, 1994
"... Summary. We discuss an inversefree, highly parallel, spectral divide and conquer algorithm. It can compute either an invariant subspace of a nonsymmetric matrix A, or a pair of left and right deflating subspaces of a regular matrix pencil A − λB. This algorithm is based on earlier ones of Bulgakov, ..."
Abstract
 Add to MetaCart
Summary. We discuss an inversefree, highly parallel, spectral divide and conquer algorithm. It can compute either an invariant subspace of a nonsymmetric matrix A, or a pair of left and right deflating subspaces of a regular matrix pencil A − λB. This algorithm is based on earlier ones of Bulgakov, Godunov and Malyshev, but improves on them in several ways. This algorithm only uses easily parallelizable linear algebra building blocks: matrix multiplication and QR decomposition, but not matrix inversion. Similar parallel algorithms for the nonsymmetric eigenproblem use the matrix sign function, which requires matrix inversion and is faster but can be less stable than the new algorithm. Mathematics Subject Classification (1991): 65F15 1.