A Framework for Symmetric Band Reduction
, 1999
Cited by 35 (7 self)
... this paper, we generalize the ideas behind the RS-algorithms and the MHL-algorithm. We develop a band reduction algorithm that eliminates d subdiagonals of a symmetric banded matrix with semibandwidth b (d < b), in a fashion akin to the MHL tridiagonalization algorithm. Then, like the Rutishauser algorithm, the band reduction algorithm is repeatedly used until the reduced matrix is tridiagonal. If d = b − 1, it is the MHL-algorithm; and if d = 1 is used for each reduction step, it results in the Rutishauser algorithm. However, d need not be chosen this way; indeed, exploiting the freedom we have in choosing d leads to a class of algorithms for banded reduction and tridiagonalization with favorable computational properties. In particular, we can derive algorithms with ...
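The role of d in the abstract above can be illustrated with a minimal sketch that tracks only the semibandwidth, not the matrix itself. It assumes that each reduction step removes d subdiagonals (so b becomes b − d) and that a tridiagonal matrix has semibandwidth 1. The function name `bandwidth_schedule` and the `choose_d` callback are hypothetical names introduced for illustration and are not taken from the paper.

```python
def bandwidth_schedule(b, choose_d):
    """Return the semibandwidths visited while reducing to tridiagonal form.

    b        -- initial semibandwidth (1 means the matrix is already tridiagonal)
    choose_d -- hypothetical policy: maps the current semibandwidth to the
                number d of subdiagonals eliminated in the next step (1 <= d < b)
    """
    if b < 1:
        raise ValueError("semibandwidth must be at least 1")
    widths = [b]
    while b > 1:
        d = choose_d(b)
        if not 1 <= d < b:
            raise ValueError("d must satisfy 1 <= d < b")
        b -= d  # one band reduction step removes d subdiagonals
        widths.append(b)
    return widths


# d = b - 1: a single step straight to tridiagonal form (the MHL-style choice).
mhl_like = bandwidth_schedule(8, lambda b: b - 1)

# d = 1 in every step: peel off one subdiagonal at a time (the Rutishauser-style choice).
rutishauser_like = bandwidth_schedule(8, lambda b: 1)

# An intermediate policy, here halving the band, chosen purely as an example.
halving = bandwidth_schedule(8, lambda b: b // 2)

print(mhl_like, rutishauser_like, halving)
```

The sketch only shows how the choice of d shapes the sequence of intermediate bandwidths; the cost and data-access trade-offs between such schedules are what the paper itself analyzes.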
Parallel performance of a symmetric eigensolver based on the invariant subspace decomposition approach
 In Scalable High Performance Computing Conference
, 1994
Parallel Reduction to Condensed Forms for Symmetric Eigenvalue Problems using Aggregated Fine-Grained and Memory-Aware Kernels
Cited by 12 (8 self)
This paper introduces a novel implementation for reducing a symmetric dense matrix to tridiagonal form, which is the preprocessing step toward solving symmetric eigenvalue problems. Based on tile algorithms, the reduction follows a two-stage approach, where the tile matrix is first reduced to symmetric band form prior to the final condensed structure. The challenging trade-off between algorithmic performance and task granularity has been tackled through a grouping technique, which consists of aggregating fine-grained and memory-aware computational tasks during both stages, while sustaining the application's overall high performance. A dynamic runtime environment system then schedules the different tasks in an out-of-order fashion. The performance for the tridiagonal reduction reported in this paper is unprecedented. Our implementation results in up to 50-fold and 12-fold improvement (130 Gflop/s) compared to the equivalent routines from LAPACK V3.2 and Intel MKL V10.3, respectively, on an eight-socket hexa-core AMD Opteron multicore shared-memory system with a matrix size of 24000 × 24000.
The SBR Toolbox  Software for Successive Band Reduction
, 1996
Cited by 9 (3 self)
... this paper. Their single-precision twins are identical except for a leading "S" instead of "D" in the routine's name and REAL instead of DOUBLE PRECISION scalars and arrays in the parameter list.
Efficient Eigenvalue and Singular Value Computations on Shared Memory Machines
, 1998
Cited by 8 (0 self)
We describe two techniques for speeding up eigenvalue and singular value computations on shared memory parallel computers. Depending on the information that is required, different steps in the overall process can be made more efficient. If only the eigenvalues or singular values are sought, then the reduction to condensed form may be done in two or more steps to make best use of optimized level-3 BLAS. If eigenvectors and/or singular vectors are required as well, then their accumulation can be sped up by another blocking technique. The efficiency of the blocked algorithms depends heavily on the values of certain control parameters. We also present a very simple performance model that allows selecting these parameters automatically.
Keywords: Linear algebra; Eigenvalues and singular values; Reduction to condensed form; Hessenberg QR iteration; Blocked algorithms.
1 Introduction
The problem of determining eigenvalues and associated eigenvectors (or singular values and vectors) of a matrix ...
Direct Solvers for Symmetric Eigenvalue Problems
 In Modern Methods and Algorithms of Quantum Chemistry, J. Grotendorst (editor), Proceedings, NIC Series Volume
, 2000
A Study of the Invariant Subspace Decomposition Algorithm for Banded Symmetric Matrices
 in Proceedings of the Fifth SIAM Conference on Applied Linear Algebra
, 1994
Cited by 4 (2 self)
In this paper, we give an overview of the Invariant Subspace Decomposition Algorithm for banded symmetric matrices and describe a sequential implementation of this algorithm. Our implementation uses a specialized routine for performing banded matrix multiplication together with successive band reduction, yielding a sequential algorithm that is competitive for large problems with the LAPACK QR code in computing all of the eigenvalues and eigenvectors of a dense symmetric matrix. Performance results are given on a variety of machines.
1 Introduction
Computation of eigenvalues and eigenvectors is an essential kernel in many applications, and several promising parallel algorithms have been investigated [8, 11, 7]. The work presented in this paper is part of the PRISM (Parallel Research on Invariant Subspace Methods) Project, which involves researchers from Argonne National Laboratory, the Supercomputing Research Center, the University of California at Berkeley, and the University of Kent...
High-Performance Computing for Exact Numerical Approaches to Quantum Many-Body Problems on the Earth Simulator
 SC2006
, 2006
Cited by 4 (3 self)
In order to study intriguing features of quantum many-body problems, we develop two matrix diagonalization codes, one of which solves only the ground state including a few excitation states, and another of which does all quantum states. The target model in both codes is the Hubbard model with confinement potential, which describes an atomic Fermi gas loaded on an optical lattice and partly High-Tc cuprate superconductors. For the former code, we make a parallel tuning to attain the best performance and to expand the matrix size limitation on the Earth Simulator. Consequently, we obtain 18.692 TFlops (57% of the peak) as the best performance when calculating the ground state of a 100-billion-dimensional matrix. From these large-scale calculations, we find that the confinement effect leads to atomic-scale inhomogeneous superfluidity, which is a new challenging subject for physicists. For the latter code, we develop or install the best three routines on three calculation stages and succeed in solving the matrix whose dimension is 375,000 with 18.396 TFlops (locally 24.613 TFlops and 75% of the peak). The numerical calculations reveal a novel quantum feature, i.e., a change from Schrödinger's cat to a classical one can be controlled by tuning the interaction. This is in marked contrast to the general concept that the change occurs with increasing the system size.
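The two efficiency figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes that "of the peak" refers to the same machine peak in both cases and that the 75% figure belongs to the local 24.613 TFlops value; neither assumption is stated explicitly in the excerpt, and the helper name `implied_peak` is introduced here only for illustration.

```python
def implied_peak(sustained_tflops, fraction_of_peak):
    """Peak rate implied by a sustained rate and its stated fraction of peak."""
    return sustained_tflops / fraction_of_peak


# Ground-state code: 18.692 TFlops reported as 57% of the peak.
ground_state_peak = implied_peak(18.692, 0.57)

# Full diagonalization code: 24.613 TFlops (local) reported as 75% of the peak.
full_diag_peak = implied_peak(24.613, 0.75)

print(f"{ground_state_peak:.2f} TFlops, {full_diag_peak:.2f} TFlops")
```

Under these assumptions both figures point to roughly the same peak, a little under 33 TFlops. The percentages in the abstract are rounded, so the agreement is only approximate.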
Parallel Block Tridiagonalization of Real Symmetric Matrices
, 2006
Cited by 4 (0 self)
Two parallel block tridiagonalization algorithms and implementations for dense real symmetric matrices are presented. Block tridiagonalization is a critical preprocessing step for the block-tridiagonal divide-and-conquer algorithm for computing eigensystems and is useful for many algorithms desiring the efficiencies of block structure in matrices. For an "effectively" sparse matrix, which frequently results from applications with strong locality properties, a heuristic parallel algorithm is used to transform it into a block tridiagonal matrix such that the eigenvalue errors remain bounded by some prescribed accuracy tolerance. For a dense matrix without any usable structure, orthogonal transformations are used to reduce it to block tridiagonal form using mostly level 3 BLAS operations. Numerical experiments show that block-tridiagonal structure obtained from this algorithm directly affects the computational complexity of the parallel block-tridiagonal divide-and-conquer eigensolver.