Results 1–10 of 14
An inverse free parallel spectral divide and conquer algorithm for nonsymmetric eigenproblems
, 1997
Abstract

Cited by 70 (11 self)
We discuss an inverse-free, highly parallel, spectral divide and conquer algorithm. It can compute either an invariant subspace of a nonsymmetric matrix A, or a pair of left and right deflating subspaces of a regular matrix pencil A − λB. This algorithm is based on earlier ones of Bulgakov, Godunov and Malyshev, but improves on them in several ways. It uses only easily parallelizable linear algebra building blocks: matrix multiplication and QR decomposition, but not matrix inversion. Similar parallel algorithms for the nonsymmetric eigenproblem use the matrix sign function, which requires matrix inversion and is faster but can be less stable than the new algorithm.
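As a rough sketch (not the authors' exact formulation), the inverse-free building block can be illustrated in NumPy: the hypothetical `inverse_free_iteration` below performs the Malyshev-style implicit repeated squaring of B⁻¹A using only the two kernels the abstract names, QR decomposition and matrix multiplication:

```python
import numpy as np

def inverse_free_iteration(A, B, steps=1):
    """Implicit repeated squaring of inv(B) @ A without any inversion.

    Each step takes a full QR factorization of the stacked 2n-by-n
    matrix [B; -A]; the top-right and bottom-right n-by-n blocks of Q
    then update A and B so that inv(B_new) @ A_new = (inv(B) @ A) ** 2.
    (Sketch only; the published algorithm adds deflation and stopping
    criteria on top of this kernel.)
    """
    n = A.shape[0]
    for _ in range(steps):
        Q, _ = np.linalg.qr(np.vstack([B, -A]), mode="complete")
        Q12 = Q[:n, n:]   # top-right n-by-n block of Q
        Q22 = Q[n:, n:]   # bottom-right n-by-n block of Q
        A = Q12.conj().T @ A
        B = Q22.conj().T @ B
    return A, B
```

Iterating this squaring separates the eigenvalues of B⁻¹A inside and outside the unit circle, which is what lets such methods split the spectrum without ever forming an inverse.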
Fast linear algebra is stable
 In preparation
, 2006
Abstract

Cited by 35 (15 self)
In [23] we showed that a large class of fast recursive matrix multiplication algorithms is stable in a normwise sense, and that in fact if multiplication of n-by-n matrices can be done by any algorithm in O(n^(ω+η)) operations for any η > 0, then it can be done stably in O(n^(ω+η)) operations for any η > 0. Here we extend this result to show that essentially all standard linear algebra operations, including LU decomposition, QR decomposition, linear equation solving, matrix inversion, solving least squares problems, (generalized) eigenvalue problems and the singular value decomposition, can also be done stably (in a normwise sense) in O(n^(ω+η)) operations.
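As a concrete example of the kind of algorithm this result covers, here is a minimal sketch of Strassen's classical O(n^2.81) recursive multiplication (chosen purely as an illustration; it assumes square matrices whose dimension is a power of two, with a cutoff below which it falls back to ordinary multiplication):

```python
import numpy as np

def strassen(A, B, cutoff=64):
    """Strassen's recursive matrix multiply: 7 half-size products per
    level instead of 8. Assumes n-by-n inputs with n a power of two."""
    n = A.shape[0]
    if n <= cutoff:
        return A @ B            # base case: ordinary multiplication
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # Strassen's seven recursive products
    M1 = strassen(A11 + A22, B11 + B22, cutoff)
    M2 = strassen(A21 + A22, B11, cutoff)
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)
    # Recombine into the four blocks of the product
    C11 = M1 + M4 - M5 + M7
    C12 = M3 + M5
    C21 = M2 + M4
    C22 = M1 - M2 + M3 + M6
    return np.block([[C11, C12], [C21, C22]])
```

Each recursion level trades 8 half-size multiplications for 7 at the cost of extra additions; this is exactly the class of fast recursive scheme whose normwise stability the result above addresses.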
Parallel performance of a symmetric eigensolver based on the invariant subspace decomposition approach
 In Proceedings of the 1994 Scalable High Performance Computing Conference
, 1994
A Study of the Invariant Subspace Decomposition Algorithm for Banded Symmetric Matrices
 in Proceedings of the Fifth SIAM Conference on Applied Linear Algebra
, 1994
Abstract

Cited by 4 (2 self)
In this paper, we give an overview of the Invariant Subspace Decomposition Algorithm for banded symmetric matrices and describe a sequential implementation of this algorithm. Our implementation uses a specialized routine for performing banded matrix multiplication together with successive band reduction, yielding a sequential algorithm that is competitive for large problems with the LAPACK QR code in computing all of the eigenvalues and eigenvectors of a dense symmetric matrix. Performance results are given on a variety of machines.

1 Introduction. Computation of eigenvalues and eigenvectors is an essential kernel in many applications, and several promising parallel algorithms have been investigated [8, 11, 7]. The work presented in this paper is part of the PRISM (Parallel Research on Invariant Subspace Methods) Project, which involves researchers from Argonne National Laboratory, the Supercomputing Research Center, the University of California at Berkeley, and the University of Kent...
Minimizing Communication for Eigenproblems and the Singular Value Decomposition
, 2010
Abstract

Cited by 3 (1 self)
Algorithms have two costs: arithmetic and communication. The latter represents the cost of moving data, either between levels of a memory hierarchy or between processors over a network. Communication often dominates arithmetic and represents a rapidly increasing proportion of the total cost, so we seek algorithms that minimize communication. In [4], lower bounds were presented on the amount of communication required for essentially all O(n^3)-like algorithms for linear algebra, including eigenvalue problems and the SVD. Conventional algorithms, including those currently implemented in (Sca)LAPACK, perform asymptotically more communication than these lower bounds require. In this paper we present parallel and sequential eigenvalue algorithms (for pencils, nonsymmetric matrices, and symmetric matrices) and SVD algorithms that do attain these lower bounds, and analyze their convergence and communication costs.
Parallel Studies of the Invariant Subspace Decomposition Approach for Banded Symmetric Matrices
, 1995
Abstract

Cited by 2 (1 self)
We present an overview of the banded Invariant Subspace Decomposition Algorithm for symmetric matrices and describe a parallel implementation of this algorithm. The algorithm described here is a promising variant of the Invariant Subspace Decomposition Algorithm for dense symmetric matrices (SYISDA) that retains the property of using scalable primitives, while requiring significantly less overall computation than SYISDA.

1 Introduction. Computation of eigenvalues and eigenvectors is an essential kernel in many applications, and several promising parallel algorithms have been investigated. The work presented in this paper is part of the PRISM (Parallel Research on Invariant Subspace Methods) Project, which involves researchers from Argonne National Laboratory, the Supercomputing Research Center, the University of California at Berkeley, and the University of Kentucky. The goal of the PRISM project is the development of algorithms and software for solving large-scale eigenvalue problems ...
On Tridiagonalizing and Diagonalizing Symmetric Matrices with Repeated Eigenvalues
 Preprint ANL/MCSP54541095, Mathematics and Computer Science Division, Argonne National Laboratory
, 1995
Abstract

Cited by 2 (2 self)
We describe a divide-and-conquer tridiagonalization approach for matrices with repeated eigenvalues. Our algorithm hinges on the fact that, under conditions that are easily verified constructively, a symmetric matrix with bandwidth b and k distinct eigenvalues must be block diagonal with diagonal blocks of size at most bk. A slight modification of the usual orthogonal band-reduction algorithm allows us to reveal this structure, which then leads to potential parallelism in the form of independent diagonal blocks. Compared with the usual Householder reduction algorithm, the new approach exhibits improved data locality, significantly more scope for parallelism, and the potential to reduce arithmetic complexity by close to 50% for matrices that have only two numerically distinct eigenvalues. The actual improvement depends to a large extent on the number of distinct eigenvalues and a good estimate thereof. At worst, however, the algorithm behaves like a successive band-reduction approach to tridia...
A Case Study of MPI: Portable and Efficient Libraries
, 1995
Abstract

Cited by 1 (0 self)
In this paper, we discuss the performance achieved by several implementations of the recently defined Message Passing Interface (MPI) standard. In particular, performance results for different implementations of the broadcast operation are analyzed and compared on the Delta, Paragon, SP-1, and CM-5.

1 Introduction. For the past several years, members of the Parallel Research on Invariant Subspace Methods (PRISM) project have been investigating scalable parallel eigensolvers for distributed memory systems [1, 3]. The ultimate objective of this research is the development of portable and efficient libraries for this fundamental numerical linear algebra kernel. In the course of our work, we, like many other library developers, have been faced with many issues relating to portable programming. Previously, a notable obstacle to library development was the lack of standardization in message passing, from both a programming and a functional point of view. This lack of standardization made it dif...
Minimizing Communication for Eigenproblems and the Singular Value Decomposition
, 2011