Results 1–9 of 9
A Framework for Symmetric Band Reduction
, 1999
Abstract

Cited by 28 (6 self)
In this paper, we generalize the ideas behind the RS algorithms and the MHL algorithm. We develop a band reduction algorithm that eliminates d subdiagonals of a symmetric banded matrix with semibandwidth b (d < b), in a fashion akin to the MHL tridiagonalization algorithm. Then, like the Rutishauser algorithm, the band reduction algorithm is applied repeatedly until the reduced matrix is tridiagonal. If d = b − 1, it is the MHL algorithm; and if d = 1 is used for each reduction step, it results in the Rutishauser algorithm. However, d need not be chosen this way; indeed, exploiting the freedom we have in choosing d leads to a class of algorithms for band reduction and tridiagonalization with favorable computational properties. In particular, we can derive algorithms with
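The reduction strategy described in this abstract amounts to a bandwidth schedule: starting from semibandwidth b, each sweep removes d subdiagonals until the matrix is tridiagonal (semibandwidth 1). A minimal sketch of just this bookkeeping, with the orthogonal transformations themselves omitted and the function name `band_reduction_schedule` a hypothetical label:

```python
def band_reduction_schedule(b, choose_d):
    """Return the sequence of semibandwidths visited when repeatedly
    eliminating d subdiagonals, until the matrix is tridiagonal (b == 1).
    `choose_d` maps the current semibandwidth to a d with 1 <= d < b."""
    schedule = [b]
    while b > 1:
        d = choose_d(b)
        assert 1 <= d < b, "must eliminate fewer subdiagonals than remain"
        b -= d                      # each sweep shrinks the semibandwidth by d
        schedule.append(b)
    return schedule

# d = b - 1 reaches tridiagonal in one sweep (the MHL algorithm);
# d = 1 peels one subdiagonal per sweep (the Rutishauser algorithm).
print(band_reduction_schedule(8, lambda b: b - 1))  # [8, 1]
print(band_reduction_schedule(8, lambda b: 1))      # [8, 7, 6, 5, 4, 3, 2, 1]
```

Intermediate choices of d interpolate between the two extremes, which is exactly the freedom the abstract says leads to algorithms with favorable computational properties.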
Parallel Block Tridiagonalization of Real Symmetric Matrices
, 2006
Abstract

Cited by 2 (0 self)
Two parallel block tridiagonalization algorithms and implementations for dense real symmetric matrices are presented. Block tridiagonalization is a critical preprocessing step for the block-tridiagonal divide-and-conquer algorithm for computing eigensystems and is useful for many algorithms desiring the efficiencies of block structure in matrices. For an "effectively" sparse matrix, which frequently results from applications with strong locality properties, a heuristic parallel algorithm is used to transform it into a block tridiagonal matrix such that the eigenvalue errors remain bounded by some prescribed accuracy tolerance. For a dense matrix without any usable structure, orthogonal transformations are used to reduce it to block tridiagonal form using mostly level 3 BLAS operations. Numerical experiments show that the block-tridiagonal structure obtained from this algorithm directly affects the computational complexity of the parallel block-tridiagonal divide-and-conquer eigensolver.
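The "effectively sparse" case can be illustrated with a toy version of the criterion the heuristic must satisfy: a candidate block partition is acceptable when every matrix entry outside the block-tridiagonal pattern is below the accuracy tolerance. A hypothetical sketch (the actual algorithm is parallel and far more sophisticated; `is_block_tridiagonal` and its arguments are illustrative names):

```python
def is_block_tridiagonal(A, boundaries, tol):
    """Check whether all entries of the symmetric matrix A lying outside
    the block-tridiagonal pattern defined by `boundaries` are <= tol.
    `boundaries` lists the row/column index where each block starts."""
    n = len(A)
    edges = boundaries + [n]
    block_of = [0] * n              # block index of each row/column
    for k in range(len(boundaries)):
        for i in range(edges[k], edges[k + 1]):
            block_of[i] = k
    for i in range(n):
        for j in range(n):
            # entries more than one block away from the diagonal must be tiny
            if abs(block_of[i] - block_of[j]) > 1 and abs(A[i][j]) > tol:
                return False
    return True

# A 4x4 matrix that is block tridiagonal with 2x2 blocks up to tol = 1e-8,
# but not scalar (1x1 block) tridiagonal, because A[0][2] = 0.5.
A = [[4.0, 1.0, 0.5, 0.0],
     [1.0, 3.0, 2.0, 0.5],
     [0.5, 2.0, 5.0, 1.0],
     [0.0, 0.5, 1.0, 6.0]]
print(is_block_tridiagonal(A, [0, 2], 1e-8))        # True
print(is_block_tridiagonal(A, [0, 1, 2, 3], 1e-8))  # False
```

Loosening the tolerance admits coarser (or finer) partitions, which is the trade-off between block size and eigenvalue error the abstract alludes to.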
A Divide-and-Conquer Method for Symmetric Banded Eigenproblems, Part I: Theoretical Results
, 1998
Abstract

Cited by 1 (1 self)
The two currently most interesting methods for solving symmetric tridiagonal eigenproblems on parallel computers are (i) divide-and-conquer methods, and (ii) the so-called "Holy Grail" method (Dhillon [10]). Neither of these methods has been generalized for non-tridiagonal banded eigenproblems, i.e., problems with band matrices having more than one lower and upper side diagonal, in a stable and efficient way. The highly accurate eigenvalue calculation for tridiagonal matrices utilized in the "Holy Grail" method cannot be generalized directly to band matrices with more than three diagonals (Demmel and Gragg [9]). Thus, it is not clear at the moment whether the "Holy Grail" method can be adapted for non-tridiagonal banded eigenproblems. Generalizations of divide-and-conquer methods for banded eigenproblems have been investigated to some extent (Arbenz [2], Arbenz and Golub [4], Arbenz et al. [3]). However, until now no methods have been published which are efficient and numerically stable as well...
TOWARD HIGH PERFORMANCE TILE DIVIDE AND CONQUER ALGORITHM FOR THE DENSE SYMMETRIC EIGENVALUE PROBLEM
, 2011
Abstract
Classical solvers for the dense symmetric eigenvalue problem suffer from the first step involving a reduction to tridiagonal form that is dominated by the cost of accessing memory during the panel factorization. The solution is to reduce the matrix to a banded form, which then requires the eigenvalues of the banded matrix to be computed. The standard D&C algorithm can be modified for this purpose. The paper combines this insight with tile algorithms that can be scheduled via a dynamic runtime system on multicore architectures. A detailed analysis of performance and accuracy is included. Performance improvements of 14-fold and 4-fold speedups are reported relative to LAPACK and Intel's MKL library.
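The "dynamic runtime system" mentioned above resolves tile tasks in dependency order: a task may run as soon as all tasks it depends on have completed. A minimal stand-in for such a scheduler, here just a topological ordering over a hypothetical four-task tile pattern (the task names are illustrative, not from the paper):

```python
from collections import deque

def schedule(tasks, deps):
    """Topologically order tile tasks so no task runs before its
    dependencies complete (a toy stand-in for a dynamic runtime).
    `deps` maps each task to the set of tasks it depends on."""
    indeg = {t: len(deps.get(t, ())) for t in tasks}
    succ = {t: [] for t in tasks}
    for t, ds in deps.items():
        for d in ds:
            succ[d].append(t)
    ready = deque(t for t in tasks if indeg[t] == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for s in succ[t]:           # retire t, releasing its successors
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    return order

# Hypothetical tile tasks: a panel factorization, two independent
# trailing updates, and a merge that must wait for both updates.
tasks = ["panel", "update_left", "update_right", "merge"]
deps = {"update_left": {"panel"}, "update_right": {"panel"},
        "merge": {"update_left", "update_right"}}
print(schedule(tasks, deps))  # ['panel', 'update_left', 'update_right', 'merge']
```

A real runtime additionally runs independent ready tasks concurrently and prioritizes the critical path; the ordering constraint sketched here is the invariant it must preserve.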
TOWARD HIGH PERFORMANCE DIVIDE AND CONQUER EIGENSOLVER FOR DENSE SYMMETRIC MATRICES
Abstract
This paper presents a high performance eigensolver for dense symmetric matrices on multicore architectures. Based on the well-known divide and conquer (D&C) methodology introduced by Cuppen, this algorithm computes all the eigenvalues of the symmetric matrix. The general D&C can be expressed in three stages: (1) partitioning into subproblems, (2) computing the solution of the subproblems, and (3) merging the subproblems. It is therefore well-suited for data-parallel algorithmic techniques due to the number of independent computational tasks which can potentially run concurrently. In particular, tile algorithms have recently shown very promising performance results for solving linear systems of equations. The idea consists of splitting the input matrix into small square tiles and reorganizing the data within each tile to be contiguous in memory for efficient cache reuse. The authors propose to extend this idea to the D&C eigensolver algorithm. The tile D&C (TD&C) eigensolver algorithm described in this paper takes a dense symmetric matrix in tile layout as input, reduces it to symmetric band form by applying orthogonal transformations and, finally, applies the D&C approach on the symmetric band matrix to calculate all eigenvalues. The whole execution flow can then be represented as a directed acyclic graph where nodes are tasks and edges represent dependencies between them. A lightweight runtime system is used to dynamically schedule the different tasks in order to ensure the data dependencies are not violated. The tasks are scheduled in an out-of-order fashion with a special emphasis on data locality and the pursuit of the critical path.
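Stage (1) of Cuppen's method, for the tridiagonal case, is a rank-one tearing: T = diag(T1, T2) + β v vᵀ, where β is the coupling entry at the split point and v = e_m + e_{m+1}, so β must be subtracted from the two adjacent diagonal entries. A minimal sketch of just this partitioning stage, on matrices stored as (diagonal, off-diagonal) lists (`cuppen_partition` is an illustrative name, not from the paper):

```python
def cuppen_partition(diag, off, m):
    """Tear a symmetric tridiagonal matrix T (main diagonal `diag`,
    off-diagonal `off`, where off[i] couples rows i and i+1) at index m:
    T = diag(T1, T2) + beta * v v^T with v = e_m + e_{m+1}, beta = off[m]."""
    beta = off[m]
    d = list(diag)
    d[m] -= beta        # subtract the coupling from both adjacent diagonal
    d[m + 1] -= beta    # entries so the correction term is exactly rank one
    T1 = (d[:m + 1], off[:m])
    T2 = (d[m + 1:], off[m + 1:])
    return T1, T2, beta

diag = [2.0, 3.0, 4.0, 5.0]
off = [1.0, 0.5, 1.0]
T1, T2, beta = cuppen_partition(diag, off, 1)
print(T1)    # ([2.0, 2.5], [1.0])
print(T2)    # ([3.5, 5.0], [1.0])
print(beta)  # 0.5
```

Stages (2) and (3), solving the subproblems recursively and merging their spectra through the rank-one update (the secular equation), are where the bulk of the work and the parallelism lie; they are omitted here.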
Minimizing Communication for Eigenproblems and the Singular Value Decomposition
, 2011
Abstract
personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission. Acknowledgement: This research is supported by Microsoft (Award #024263) and Intel (Award #024894) funding and by matching funding by U.C. Discovery (Award #DIG0710227). Additional support comes from Par Lab
Minimizing Communication for Eigenproblems and the Singular Value Decomposition
, 2010
Abstract
Algorithms have two costs: arithmetic and communication. The latter represents the cost of moving data, either between levels of a memory hierarchy, or between processors over a network. Communication often dominates arithmetic and represents a rapidly increasing proportion of the total cost, so we seek algorithms that minimize communication. In [4], lower bounds were presented on the amount of communication required for essentially all O(n^3)-like algorithms for linear algebra, including eigenvalue problems and the SVD. Conventional algorithms, including those currently implemented in (Sca)LAPACK, perform asymptotically more communication than these lower bounds require. In this paper we present parallel and sequential eigenvalue algorithms (for pencils, nonsymmetric matrices, and symmetric matrices) and SVD algorithms that do attain these lower bounds, and analyze their convergence and communication costs.
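The lower bound cited as [4] is usually stated as Ω(n³/√M) words moved for an O(n³) dense linear algebra algorithm with a fast memory of M words. A small sketch, assuming only this asymptotic form (constants are omitted, and the function name is illustrative):

```python
import math

def comm_lower_bound(n, M):
    """Asymptotic lower bound (up to constant factors) on the number of
    words moved by an O(n^3)-like dense linear algebra algorithm with
    a fast memory holding M words: Omega(n^3 / sqrt(M))."""
    return n ** 3 / math.sqrt(M)

# Doubling the fast memory only reduces the required data traffic by a
# factor of sqrt(2), which is why algorithmic reorganization (blocking,
# communication-avoiding variants) matters more than bigger caches.
n = 4096
ratio = comm_lower_bound(n, 2 ** 20) / comm_lower_bound(n, 2 ** 21)
print(ratio)  # ≈ 1.414 (sqrt(2))
```

Algorithms that attain this bound, as the paper claims for its eigenvalue and SVD algorithms, move asymptotically no more data than any O(n³)-like algorithm must.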
A Parallel Implementation of Symmetric Band Reduction Using PLAPACK
, 1996
Abstract
Successive band reduction (SBR) is a two-phase approach for reducing a full symmetric matrix to tridiagonal (or narrow banded) form. In its simplest case, it consists of a full-to-band reduction followed by a band-to-tridiagonal reduction. Its richness in BLAS-3 operations makes it potentially more efficient on high-performance architectures than the traditional tridiagonalization method. However, a scalable, portable, general-purpose parallel implementation of SBR is still not available. In this article, we review some existing parallel tridiagonalization routines and describe the implementation of a full-to-band reduction routine using PLAPACK as a first step toward a parallel SBR toolbox. The PLAPACK-based routine turns out to be simple and efficient and, unlike the other existing packages, does not suffer restrictions on physical data layout or algorithmic block size. 1 Introduction Reducing a full, dense symmetric matrix to tridiagonal form is one of the key steps in computing eig...
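The two-phase structure can be written as a bandwidth schedule: a full matrix of order n has semibandwidth n − 1, the first phase brings it to an intermediate semibandwidth b, and the second phase brings b down to 1 (tridiagonal). A minimal sketch, where b is a hypothetical tuning parameter and the function name is illustrative:

```python
def sbr_schedule(n, b):
    """Semibandwidths visited by the simplest SBR scheme: a full
    symmetric matrix of order n (semibandwidth n - 1) is reduced to
    band form with semibandwidth b, then from b to tridiagonal (1).
    The full-to-band phase is the one rich in BLAS-3 operations."""
    assert 1 < b < n - 1, "intermediate bandwidth must lie strictly between"
    return [n - 1, b, 1]

print(sbr_schedule(1000, 48))  # [999, 48, 1]
```

Choosing b trades off BLAS-3 richness in the first phase against the cost of the band-to-tridiagonal phase, which is the design space the SBR framework explores.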