Results 11-20 of 20
Parallel Block Tridiagonalization of Real Symmetric Matrices
, 2006
Cited by 2 (0 self)
Abstract
Two parallel block tridiagonalization algorithms and implementations for dense real symmetric matrices are presented. Block tridiagonalization is a critical preprocessing step for the block tridiagonal divide-and-conquer algorithm for computing eigensystems and is useful for many algorithms desiring the efficiencies of block structure in matrices. For an "effectively" sparse matrix, which frequently results from applications with strong locality properties, a heuristic parallel algorithm is used to transform it into a block tridiagonal matrix such that the eigenvalue errors remain bounded by some prescribed accuracy tolerance. For a dense matrix without any usable structure, orthogonal transformations are used to reduce it to block tridiagonal form using mostly level 3 BLAS operations. Numerical experiments show that the block tridiagonal structure obtained from this algorithm directly affects the computational complexity of the parallel block tridiagonal divide-and-conquer eigensolver.
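The idea of dropping small entries while keeping eigenvalue errors below a tolerance can be illustrated with a minimal sketch. This is not the heuristic from the paper. It assumes a fixed, uniform block size and uses a standard bound: if a symmetric matrix A is split as A = T + E, Weyl's inequality limits every eigenvalue shift by the 2-norm of E, which for symmetric E is at most its maximum absolute row sum. The function names are hypothetical.

```python
def split_block_tridiagonal(a, block_size):
    """Split symmetric a into (t, e): t holds the block tridiagonal part, e = a - t."""
    n = len(a)
    t = [[0.0] * n for _ in range(n)]
    e = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Entry (i, j) belongs to block (i // block_size, j // block_size).
            if abs(i // block_size - j // block_size) <= 1:
                t[i][j] = a[i][j]
            else:
                e[i][j] = a[i][j]
    return t, e


def eigenvalue_error_bound(e):
    """Maximum absolute row sum of e.

    For symmetric e this is an upper bound on the 2-norm of e, and by
    Weyl's inequality on the shift of every eigenvalue when e is dropped.
    """
    return max(sum(abs(x) for x in row) for row in e)


def fits_tolerance(a, block_size, tol):
    """True if dropping everything outside the block tridiagonal part is within tol."""
    _, e = split_block_tridiagonal(a, block_size)
    return eigenvalue_error_bound(e) <= tol


# A small "effectively" sparse example: far off-diagonal entries are tiny.
example = [
    [4.0, 1.0, 1e-9, 0.0],
    [1.0, 4.0, 1.0, 2e-9],
    [1e-9, 1.0, 4.0, 1.0],
    [0.0, 2e-9, 1.0, 4.0],
]
print(fits_tolerance(example, block_size=1, tol=1e-8))
```

The row-sum bound is cheap but pessimistic; a real implementation would also have to choose the block sizes, which this sketch takes as given.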
Efficiency Enhancement by Blocking
, 1998
Abstract
When developing high-performance algorithms, blocking is a standard procedure to increase the locality of reference. Conflicting factors which influence the choice of blocking parameters are described in this paper. These factors include cache size, load balancing, memory overhead, algorithmic issues, and others. Optimal block sizes can be determined with respect to each of these factors. The resulting block sizes are independent of each other and can be implemented in several levels of blocking within a program. A tridiagonalization algorithm serves as an example to illustrate several blocking techniques. The work described in this paper was supported by the Special Research Program SFB F011 "AURORA" of the Austrian Science Fund.
1 Introduction
In modern computer systems the gap between raw computing power and memory throughput increases tremendously. This leads to the effect that for many algorithms the processors are not running at full speed since they have to w...
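As a rough illustration of blocking for locality of reference, the sketch below tiles a matrix product so that each inner triple loop touches only small sub-blocks of the operands. It uses matrix multiplication rather than the paper's tridiagonalization example, and the function name and the single `block` parameter are assumptions made for illustration. In pure Python the cache benefit is not measurable; only the loop structure is the point.

```python
def matmul_blocked(a, b, block):
    """Compute c = a * b with the loops tiled into block x block sub-problems."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    # Outer loops walk over tiles; inner loops stay inside one tile, so the
    # working set is roughly three block x block sub-matrices at a time.
    for ii in range(0, n, block):
        for kk in range(0, m, block):
            for jj in range(0, p, block):
                for i in range(ii, min(ii + block, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + block, m)):
                        a_ik = row_a[k]
                        row_b = b[k]
                        for j in range(jj, min(jj + block, p)):
                            row_c[j] += a_ik * row_b[j]
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
print(matmul_blocked(a, b, block=2))
```

A multi-level scheme in the spirit of the abstract would nest this pattern, with one block size per factor (for example one for cache and one for load balancing).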
A Framework for Symmetric Band Reduction
Abstract
In this paper, we generalize the ideas behind the RS-algorithms and the MHL-algorithm. We develop a band reduction algorithm that eliminates d subdiagonals of a symmetric banded matrix with semibandwidth b (d < b), in a fashion akin to the MHL tridiagonalization algorithm. Then, like the Rutishauser algorithm, the band reduction algorithm is repeatedly used until the reduced matrix is tridiagonal. If d = b - 1, it is the MHL-algorithm; and if d = 1 is used for each reduction step, it results in the Rutishauser algorithm. However, d need not be chosen this way; indeed, exploiting the freedom we have in choosing d leads to a class of algorithms for banded reduction and tridiagonalization with favorable computational properties. In particular, we can derive algorithms with
Out-of-Core Solution of Large Symmetric Eigenproblems
, 1999
Abstract
This paper describes a prototype implementation of a solver for dense symmetric eigenproblems, which are too large to fit into the main memory.
High-Performance Computing for Exact Numerical Approaches to Quantum Many-Body Problems on the Earth Simulator
 SC2006
, 2006
Abstract
In order to study intriguing features of quantum many-body problems, we develop two matrix diagonalization codes, one of which solves only the ground state including a few excitation states, and another of which does all quantum states. The target model in both codes is the Hubbard model with confinement potential which describes an atomic Fermi gas loaded on an optical lattice and partly High-Tc cuprate superconductors. For the former code, we make a parallel tuning to attain the best performance and to expand the matrix size limitation on the Earth Simulator. Consequently, we obtain 18.692 TFlops (57% of the peak) as the best performance when calculating the ground state of 100-billion-dimensional matrix. From these large-scale calculations, we find that the confinement effect leads to atomic-scale inhomogeneous superfluidity which is a new challenging subject for physicists. For the latter code, we develop or install the best three routines on three calculation stages and succeed in solving the matrix whose dimension is 375,000 with 18.396 TFlops (locally 24.613 TFlops and 75% of the peak). The numerical calculations reveal a novel quantum feature, i.e., a change from Schrödinger's cat to classical one can be controlled by tuning the interaction. This is a marked contrast to the general concept that the change occurs with increasing the system size.
Minimizing Communication for Eigenproblems and the Singular Value Decomposition
, 2010
Abstract
Algorithms have two costs: arithmetic and communication. The latter represents the cost of moving data, either between levels of a memory hierarchy, or between processors over a network. Communication often dominates arithmetic and represents a rapidly increasing proportion of the total cost, so we seek algorithms that minimize communication. In [4] lower bounds were presented on the amount of communication required for essentially all O(n^3)-like algorithms for linear algebra, including eigenvalue problems and the SVD. Conventional algorithms, including those currently implemented in (Sca)LAPACK, perform asymptotically more communication than these lower bounds require. In this paper we present parallel and sequential eigenvalue algorithms (for pencils, nonsymmetric matrices, and symmetric matrices) and SVD algorithms that do attain these lower bounds, and analyze their convergence and communication costs.
Parallel block tridiagonalization of real symmetric matrices (www.elsevier.com/locate/jpdc)
, 2007
Abstract
Two parallel block tridiagonalization algorithms and implementations for dense real symmetric matrices are presented. Block tridiagonalization is a critical preprocessing step for the block tridiagonal divide-and-conquer algorithm for computing eigensystems and is useful for many algorithms desiring the efficiencies of block structure in matrices. For an “effectively” sparse matrix, which frequently results from applications with strong locality properties, a heuristic parallel algorithm is used to transform it into a block tridiagonal matrix such that the eigenvalue errors remain bounded by some prescribed accuracy tolerance. For a dense matrix without any usable structure, orthogonal transformations are used to reduce it to block tridiagonal form using mostly level 3 BLAS operations. Numerical experiments show that the block tridiagonal structure obtained from this algorithm directly affects the computational complexity of the parallel block tridiagonal divide-and-conquer eigensolver. Reduction to block tridiagonal form provides significantly lower execution times, as well as memory traffic and communication cost, over the traditional reduction to tridiagonal form for eigensystem computations.
and Integrable Systems. Minimizing Communication for Eigenproblems and the Singular Value
, 2011
Abstract
personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission. Acknowledgement This research is supported by Microsoft (Award #024263) and Intel (Award #024894) funding and by matching funding by U.C. Discovery (Award #DIG0710227). Additional support comes from Par Lab
Families of Algorithms for Reducing a Matrix to Condensed Form
Abstract
In a recent paper it was shown how memory traffic can be diminished by reformulating the classic algorithm for reducing a matrix to bidiagonal form, a preprocess when computing the singular values of a dense matrix. The key is a reordering of the computation so that the most memory-intensive operations can be “fused”. In this paper, we show that other operations that reduce matrices to condensed form (reduction to upper Hessenberg form and reduction to tridiagonal form) can be similarly reorganized, yielding different sets of operations that can be fused. By developing the algorithms with a common framework and notation, we facilitate the comparing and contrasting of the different algorithms and opportunities for optimization on sequential architectures. We discuss the algorithms, develop a simple model to estimate the speedup potential from fusing, and showcase performance improvements consistent with what the model predicts.
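For orientation, reduction to tridiagonal form can be sketched with the classic unblocked Householder method. This is the textbook baseline, not the fused variant the abstract describes: each step applies the reflector from the left and then from the right as two separate sweeps over the matrix, which is the kind of repeated memory traffic that fusing aims to reduce. The function name is hypothetical.

```python
import math


def tridiagonalize(a):
    """Return a tridiagonal matrix orthogonally similar to the symmetric matrix a."""
    n = len(a)
    a = [[float(x) for x in row] for row in a]  # work on a copy
    for k in range(n - 2):
        # Column k below the diagonal is the vector to be collapsed onto e_1.
        x = [a[i][k] for i in range(k + 1, n)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue  # column is already in the required form
        # Pick the sign that avoids cancellation in v[0].
        alpha = -math.copysign(norm_x, x[0])
        v = x[:]
        v[0] -= alpha
        norm_v = math.sqrt(sum(t * t for t in v))
        v = [t / norm_v for t in v]
        m = len(v)
        # First sweep: a <- H a with H = I - 2 v v^T acting on rows k+1..n-1.
        for j in range(n):
            s = sum(v[i] * a[k + 1 + i][j] for i in range(m))
            for i in range(m):
                a[k + 1 + i][j] -= 2.0 * v[i] * s
        # Second sweep: a <- a H acting on columns k+1..n-1.
        for i in range(n):
            s = sum(a[i][k + 1 + j] * v[j] for j in range(m))
            for j in range(m):
                a[i][k + 1 + j] -= 2.0 * s * v[j]
    return a


sym = [
    [4.0, 1.0, 2.0, 2.0],
    [1.0, 3.0, 0.0, 1.0],
    [2.0, 0.0, 2.0, 1.0],
    [2.0, 1.0, 1.0, 1.0],
]
for row in tridiagonalize(sym):
    print([round(x, 6) for x in row])
```

Because the transformation is an orthogonal similarity, the trace and the eigenvalues of the input are preserved up to rounding, which gives a simple sanity check on the output.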