A Framework for Symmetric Band Reduction
1999
Cited by 28 (6 self)
... this paper, we generalize the ideas behind the RS-algorithms and the MHL-algorithm. We develop a band reduction algorithm that eliminates d subdiagonals of a symmetric banded matrix with semibandwidth b (d < b), in a fashion akin to the MHL tridiagonalization algorithm. Then, like the Rutishauser algorithm, the band reduction algorithm is repeatedly used until the reduced matrix is tridiagonal. If d = b - 1, it is the MHL-algorithm; and if d = 1 is used for each reduction step, it results in the Rutishauser algorithm. However, d need not be chosen this way; indeed, exploiting the freedom we have in choosing d leads to a class of algorithms for banded reduction and tridiagonalization with favorable computational properties. In particular, we can derive algorithms with ...
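The role of d in the abstract above can be illustrated with a small sketch. It only tracks how the semibandwidth shrinks from b to 1 (tridiagonal) under different choices of d; it performs no numerical work and is not the authors' code. The function names and the "halving" strategy are assumptions introduced here for illustration.

```python
from typing import Callable, List


def bandwidth_schedule(b: int, choose_d: Callable[[int], int]) -> List[int]:
    """Return the semibandwidths visited while reducing b down to 1.

    choose_d(b) picks how many subdiagonals to eliminate in the next
    reduction step; it must satisfy 1 <= d < b.
    """
    if b < 1:
        raise ValueError("semibandwidth must be at least 1")
    schedule = [b]
    while b > 1:
        d = choose_d(b)
        if not 1 <= d < b:
            raise ValueError(f"need 1 <= d < b, got d={d}, b={b}")
        b -= d
        schedule.append(b)
    return schedule


def mhl(b: int) -> int:
    """d = b - 1: a single step straight to tridiagonal form."""
    return b - 1


def rutishauser(b: int) -> int:
    """d = 1: peel off one subdiagonal per reduction step."""
    return 1


def halving(b: int) -> int:
    """Hypothetical intermediate choice: remove half of the band each step."""
    return b // 2


if __name__ == "__main__":
    print(bandwidth_schedule(8, mhl))          # [8, 1]
    print(bandwidth_schedule(8, rutishauser))  # [8, 7, 6, 5, 4, 3, 2, 1]
    print(bandwidth_schedule(8, halving))      # [8, 4, 2, 1]
```

The two extremes correspond to the MHL and Rutishauser algorithms named in the abstract; anything in between is one member of the class of algorithms the framework allows.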
The PRISM Project: Infrastructure and Algorithms for Parallel Eigensolvers
1994
Cited by 12 (6 self)
The goal of the PRISM project is the development of infrastructure and algorithms for the parallel solution of eigenvalue problems. We are currently investigating a complete eigensolver based on the Invariant Subspace Decomposition Algorithm for dense symmetric matrices (SYISDA). After briefly reviewing SYISDA, we discuss the algorithmic highlights of a distributed-memory implementation of this approach. These include a fast matrix-matrix multiplication algorithm, a new approach to parallel band reduction and tridiagonalization, and a harness for coordinating the divide-and-conquer parallelism in the problem. We also present performance results of these kernels as well as the overall SYISDA implementation on the Intel Touchstone Delta prototype. 1. Introduction Computation of eigenvalues and eigenvectors is an essential kernel in many applications, and several promising parallel algorithms have been investigated [29, 24, 3, 27, 21]. The work presented in this paper is part of the PRI...
The SBR Toolbox: Software for Successive Band Reduction
1996
Cited by 9 (3 self)
... this paper. Their single-precision twins are identical except for a leading "S" instead of "D" in the routine's name and REAL instead of DOUBLE PRECISION scalars and arrays in the parameter list.
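The naming rule in this excerpt can be written down as a short sketch. The helper names and the routine name `DROUTN` are hypothetical placeholders introduced here, not actual toolbox identifiers; only the "D" to "S" and DOUBLE PRECISION to REAL substitutions come from the text.

```python
def single_precision_name(double_name: str) -> str:
    """Map a double-precision routine name to its single-precision twin.

    Follows the rule quoted above: a leading "S" replaces the leading "D".
    """
    if not double_name.startswith("D"):
        raise ValueError(f"expected a name starting with 'D': {double_name!r}")
    return "S" + double_name[1:]


def single_precision_declaration(declaration: str) -> str:
    """Rewrite a Fortran type declaration from DOUBLE PRECISION to REAL."""
    return declaration.replace("DOUBLE PRECISION", "REAL")


if __name__ == "__main__":
    # "DROUTN" is a made-up routine name used purely for illustration.
    print(single_precision_name("DROUTN"))
    print(single_precision_declaration("DOUBLE PRECISION A( LDA, * )"))
```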
Efficient Eigenvalue and Singular Value Computations on Shared Memory Machines
1998
Cited by 8 (0 self)
We describe two techniques for speeding up eigenvalue and singular value computations on shared memory parallel computers. Depending on the information that is required, different steps in the overall process can be made more efficient. If only the eigenvalues or singular values are sought, then the reduction to condensed form may be done in two or more steps to make best use of optimized level-3 BLAS. If eigenvectors and/or singular vectors are required, too, then their accumulation can be sped up by another blocking technique. The efficiency of the blocked algorithms depends heavily on the values of certain control parameters. We also present a very simple performance model that allows selecting these parameters automatically. Keywords: Linear algebra; Eigenvalues and singular values; Reduction to condensed form; Hessenberg QR iteration; Blocked algorithms. 1 Introduction The problem of determining eigenvalues and associated eigenvectors (or singular values and vectors) of a matrix ...
Direct Solvers for Symmetric Eigenvalue Problems
In Modern Methods and Algorithms of Quantum Chemistry, J. Grotendorst (editor), Proceedings, NIC Series Volume, 2000
A Study of the Invariant Subspace Decomposition Algorithm for Banded Symmetric Matrices
In Proceedings of the Fifth SIAM Conference on Applied Linear Algebra, 1994
Cited by 4 (2 self)
In this paper, we give an overview of the Invariant Subspace Decomposition Algorithm for banded symmetric matrices and describe a sequential implementation of this algorithm. Our implementation uses a specialized routine for performing banded matrix multiplication together with successive band reduction, yielding a sequential algorithm that is competitive for large problems with the LAPACK QR code in computing all of the eigenvalues and eigenvectors of a dense symmetric matrix. Performance results are given on a variety of machines. 1 Introduction Computation of eigenvalues and eigenvectors is an essential kernel in many applications, and several promising parallel algorithms have been investigated [8, 11, 7]. The work presented in this paper is part of the PRISM (Parallel Research on Invariant Subspace Methods) Project, which involves researchers from Argonne National Laboratory, the Supercomputing Research Center, the University of California at Berkeley, and the University of Kent...
DIVIDE & CONQUER ON HYBRID GPU-ACCELERATED MULTICORE SYSTEMS
Cited by 3 (2 self)
With the raw compute power of GPUs being more widely available in commodity multicore systems, there is an imminent need to harness their power for important numerical libraries such as LAPACK. In this paper, we consider the solution of dense symmetric and Hermitian eigenproblems by LAPACK’s Divide & Conquer algorithm on such modern heterogeneous systems. We focus on how to make the best use of the individual strengths of the massively parallel manycore GPUs and multicore CPUs. The resulting algorithm overcomes performance bottlenecks that current implementations, optimized for a homogeneous multicore, face. On a dual-socket quad-core Intel Xeon 2.33 GHz with an NVIDIA GTX 280 GPU, we typically obtain up to about 10-fold improvement in performance for the complete dense problem. The techniques described here thus represent an example of how to develop numerical software to efficiently use heterogeneous architectures. As heterogeneity becomes common in architecture design, the significance of and need for this work are expected to grow.
DIVIDE & CONQUER ON HYBRID GPU-ACCELERATED MULTICORE SYSTEMS
2012
Cited by 2 (1 self)
With the raw compute power of GPUs being more widely available in commodity multicore systems, there is an imminent need to harness their power for important numerical libraries such as LAPACK. In this paper, we consider the solution of dense symmetric and Hermitian eigenproblems by LAPACK’s Divide & Conquer algorithm on such modern heterogeneous systems. We focus on how to make the best use of the individual strengths of the massively parallel manycore GPUs and multicore CPUs. The resulting algorithm overcomes performance bottlenecks that current implementations, optimized for a homogeneous multicore, face. On a dual-socket quad-core Intel Xeon 2.33 GHz with an NVIDIA GTX 280 GPU, we typically obtain up to about 10-fold improvement in performance for the complete dense problem. The techniques described here thus represent an example of how to develop numerical software to efficiently use heterogeneous architectures. As heterogeneity becomes common in architecture design, the significance of and need for this work are expected to grow.
A Framework for Symmetric Band Reduction
... this paper, we generalize the ideas behind the RS-algorithms and the MHL-algorithm. We develop a band reduction algorithm that eliminates d subdiagonals of a symmetric banded matrix with semibandwidth b (d < b), in a fashion akin to the MHL tridiagonalization algorithm. Then, like the Rutishauser algorithm, the band reduction algorithm is repeatedly used until the reduced matrix is tridiagonal. If d = b - 1, it is the MHL-algorithm; and if d = 1 is used for each reduction step, it results in the Rutishauser algorithm. However, d need not be chosen this way; indeed, exploiting the freedom we have in choosing d leads to a class of algorithms for banded reduction and tridiagonalization with favorable computational properties. In particular, we can derive algorithms with ...