Results 1-10 of 53
Design of a Parallel Nonsymmetric Eigenroutine Toolbox, Part I
, 1993
Abstract

Cited by 63 (14 self)
The dense nonsymmetric eigenproblem is one of the hardest linear algebra problems to solve effectively on massively parallel machines. Rather than trying to design a "black box" eigenroutine in the spirit of EISPACK or LAPACK, we propose building a toolbox for this problem. The tools are meant to be used in different combinations on different problems and architectures. In this paper, we will describe these tools, which include basic block matrix computations, the matrix sign function, 2-dimensional bisection, and spectral divide and conquer using the matrix sign function to find selected eigenvalues. We also outline how we deal with ill-conditioning and potential instability. Numerical examples are included. A future paper will discuss error analysis in detail and extensions to the generalized eigenproblem.
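The matrix sign function named in this abstract is commonly computed with a Newton iteration. The sketch below is an editorial illustration of that standard iteration, not code from the paper; the function name `matrix_sign` and the 2x2 test matrix are made up for demonstration.

```python
import numpy as np

def matrix_sign(A, tol=1e-12, max_iter=100):
    """Newton iteration for the matrix sign function:
    X_{k+1} = (X_k + X_k^{-1}) / 2, starting from X_0 = A.
    Converges quadratically when A has no purely imaginary eigenvalues."""
    X = A.astype(float)
    for _ in range(max_iter):
        X_new = 0.5 * (X + np.linalg.inv(X))
        if np.linalg.norm(X_new - X, 1) <= tol * np.linalg.norm(X_new, 1):
            return X_new
        X = X_new
    return X

# Example: eigenvalues -2 and 3, so sign(A) has eigenvalues -1 and +1
# and satisfies sign(A)^2 = I.
A = np.array([[3.0, 1.0], [0.0, -2.0]])
S = matrix_sign(A)
```

The iteration is attractive on parallel machines because each step is a single matrix inversion, which itself reduces to highly parallel building blocks.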
Inverse free parallel spectral divide and conquer algorithms for nonsymmetric eigenproblems
 Numer. Math
, 1994
Abstract

Cited by 61 (12 self)
We discuss two inverse free, highly parallel, spectral divide and conquer algorithms: one for computing an invariant subspace of a nonsymmetric matrix and another one for computing left and right deflating subspaces of a regular matrix pencil (A, B). These two closely related algorithms are based on earlier ones of Bulgakov, Godunov and Malyshev, but improve on them in several ways. These algorithms only use easily parallelizable linear algebra building blocks: matrix multiplication and QR decomposition. The existing parallel algorithms for the nonsymmetric eigenproblem use the matrix sign function, which is faster but can be less stable than the new algorithm.
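The core of the inverse-free approach is a repeated-squaring step built only from QR decomposition and matrix multiplication. The sketch below illustrates one such step for a real pencil (an editorial reconstruction of the general idea, not the authors' code); the key invariant it preserves is B1^{-1} A1 = (B^{-1} A)^2, i.e. the pencil's eigenvalues are squared without ever forming an inverse.

```python
import numpy as np

def inverse_free_step(A, B):
    """One squaring step in the style of Malyshev and Bai/Demmel/Gu:
    QR-factor the stacked matrix [B; -A] with a full (complete) Q,
    then form A1 = Q12^T A and B1 = Q22^T B.  From the QR relation
    Q12^T B = Q22^T A one gets B1^{-1} A1 = (B^{-1} A)^2."""
    n = A.shape[0]
    Q, _ = np.linalg.qr(np.vstack([B, -A]), mode='complete')
    A1 = Q[:n, n:].T @ A          # Q12 block, transposed
    B1 = Q[n:, n:].T @ B          # Q22 block, transposed
    return A1, B1

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4)) + 4 * np.eye(4)   # keep B well conditioned
A1, B1 = inverse_free_step(A, B)
```

Iterating this step drives the pencil's eigenvalues toward 0 or infinity depending on which side of the unit circle they start on, which is what enables the divide-and-conquer splitting.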
Algorithms for Intersecting Parametric and Algebraic Curves I: Simple Intersections
 ACM Transactions on Graphics
, 1995
Abstract

Cited by 59 (18 self)
The problem of computing the intersection of parametric and algebraic curves arises in many applications of computer graphics and geometric and solid modeling. Previous algorithms are based on techniques from elimination theory or subdivision and iteration. The former, however, is restricted to low-degree curves. This is mainly due to issues of efficiency and numerical stability. In this paper we use elimination theory and express the resultant of the equations of intersection as a matrix determinant. The matrix itself, rather than its symbolic determinant, a polynomial, is used as the representation. The problem of intersection is reduced to computing the eigenvalues and eigenvectors of a numeric matrix. The main advantage of this approach lies in its efficiency and robustness. Moreover, the numerical accuracy of these operations is well understood. For almost all cases we are able to compute accurate answers in 64 bit IEEE floating point arithmetic. Keywords: Intersection, curves, a...
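The eigenvalue-based strategy can be shown in miniature: substituting a polynomial parametric curve into an algebraic curve's equation yields a univariate polynomial whose roots are the intersection parameters, and `numpy.roots` finds those roots as eigenvalues of the companion matrix. This toy example (a parabola against a circle, chosen for illustration; not from the paper) captures the idea of trading symbolic root-finding for a numeric eigenproblem.

```python
import numpy as np

# Intersect the parametric parabola (x, y) = (t, t^2) with the algebraic
# circle f(x, y) = x^2 + y^2 - 2 = 0.  Substitution gives
# p(t) = t^4 + t^2 - 2; its real roots are the intersection parameters.
p = np.array([1.0, 0.0, 1.0, 0.0, -2.0])         # t^4 + t^2 - 2
roots = np.roots(p)                               # companion-matrix eigenvalues
t_real = np.sort(roots[np.abs(roots.imag) < 1e-9].real)
points = [(t, t**2) for t in t_real]              # map parameters back to (x, y)
```

Here the real roots are t = -1 and t = 1, giving the intersection points (-1, 1) and (1, 1); the complex pair is discarded.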
A Three-Dimensional Approach to Parallel Matrix Multiplication
 IBM Journal of Research and Development
, 1995
Abstract

Cited by 39 (0 self)
A three-dimensional (3D) matrix multiplication algorithm for massively parallel processing systems is presented. The P processors are configured as a "virtual" processing cube with dimensions p1, p2, and p3 proportional to the matrices' dimensions M, N, and K. Each processor performs a single local matrix multiplication of size M/p1 × N/p2 × K/p3. Before the local computation can be carried out, each subcube must receive a single submatrix of A and B. After the single matrix multiplication has completed, K/p3 submatrices of this product must be sent to their respective destination processors and then summed together with the resulting matrix C. The 3D parallel matrix multiplication approach has a factor P^(1/6) less communication than the 2D parallel algorithms. This algorithm has been implemented on IBM POWERparallel SP2 systems (up to 216 nodes) and has yielded close to the peak performance of the machine. The algorithm has been combined with Winog...
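The blocking scheme can be simulated serially: each "processor" (i, j, k) on a p1 x p2 x p3 grid multiplies one (M/p1) x (K/p3) block of A by one (K/p3) x (N/p2) block of B, and the p3 partial products for each (i, j) are summed. This is a hypothetical sketch of the data decomposition only, with none of the communication the paper actually analyzes.

```python
import numpy as np

def matmul_3d(A, B, p1, p2, p3):
    # Serial simulation of the 3D decomposition; assumes the grid
    # dimensions divide the matrix dimensions evenly.
    M, K = A.shape
    _, N = B.shape
    mb, nb, kb = M // p1, N // p2, K // p3
    C = np.zeros((M, N))
    for i in range(p1):
        for j in range(p2):
            for k in range(p3):        # one local multiply per grid point
                C[i*mb:(i+1)*mb, j*nb:(j+1)*nb] += (
                    A[i*mb:(i+1)*mb, k*kb:(k+1)*kb]
                    @ B[k*kb:(k+1)*kb, j*nb:(j+1)*nb])
    return C

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 8))
B = rng.standard_normal((8, 4))
C = matmul_3d(A, B, p1=3, p2=2, p3=4)
```

The accumulation over k is the step that, on a real machine, becomes the reduction across the third grid dimension.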
Large Dense Numerical Linear Algebra in 1993: The Parallel Computing Influence
 International Journal Supercomputer Applications
, 1994
Abstract

Cited by 35 (2 self)
This paper surveys the current state of applications of large dense numerical linear algebra and the influence of parallel computing. Furthermore, we attempt to crystallize many important ideas that we feel have sometimes been misunderstood in the rush to write fast programs.

1 Introduction

This paper represents my continuing efforts to track the status of large dense linear algebra problems. The goal is to shatter the barriers that separate the various interested communities while commenting on the influence of parallel computing. A secondary goal is to crystallize the most important ideas that have all too often been obscured by the details of machines and algorithms. Parallel supercomputing is in the spotlight. In the race towards the proliferation of papers on person X's experiences with machine Y (and why his algorithm runs faster than person Z's), sometimes we have lost sight of the applications for which these algorithms are meant to be useful. This paper concentrates on la...
The spectral decomposition of nonsymmetric matrices on distributed memory parallel computers
 SIAM J. Sci. Comput
, 1997
Abstract

Cited by 31 (11 self)
The implementation and performance of a class of divide-and-conquer algorithms for computing the spectral decomposition of nonsymmetric matrices on distributed memory parallel computers are studied in this paper. After presenting a general framework, we focus on a spectral divide-and-conquer (SDC) algorithm with Newton iteration. Although the algorithm requires several times as many floating point operations as the best serial QR algorithm, it can be simply constructed from a small set of highly parallelizable matrix building blocks within Level 3 basic linear algebra subroutines (BLAS). Efficient implementations of these building blocks are available on a wide range of machines. In some ill-conditioned cases, the algorithm may lose numerical stability, but this can easily be detected and compensated for. The algorithm reached 31% efficiency with respect to the underlying PUMMA matrix multiplication and 82% efficiency with respect to the underlying ScaLAPACK matrix inversion on a 256-processor Intel Touchstone Delta system, and 41% efficiency with respect to the matrix multiplication in CMSSL on a 32-node Thinking Machines CM-5 with vector units. Our performance model predicts the performance reasonably accurately. To take advantage of the geometric nature of SDC algorithms, we have designed a graphical user interface to let the user choose the spectral decomposition according to specified regions in the complex plane.
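The SDC idea itself is compact: the sign function of A, computed by Newton iteration, yields a spectral projector P = (I + sign(A))/2 onto the invariant subspace for eigenvalues with positive real part, and an orthonormal basis for range(P) block-triangularizes A. The sketch below is an editorial illustration of that splitting step (function names and the test matrix are made up; it is not the paper's implementation).

```python
import numpy as np

def matrix_sign(A, iters=60):
    # Newton iteration X <- (X + X^{-1}) / 2; converges to sign(A)
    # when A has no eigenvalues on the imaginary axis.
    X = A.astype(float)
    for _ in range(iters):
        X = 0.5 * (X + np.linalg.inv(X))
    return X

def split_spectrum(A):
    # P is the spectral projector for Re(lambda) > 0; its rank r is the
    # subspace dimension, and an orthonormal basis for range(P)
    # block-triangularizes A with the split at row/column r.
    n = A.shape[0]
    P = 0.5 * (np.eye(n) + matrix_sign(A))
    U, s, _ = np.linalg.svd(P)
    r = int(np.sum(s > 0.5))
    Q, _ = np.linalg.qr(U[:, :r], mode='complete')
    return Q.T @ A @ Q, r

# Made-up test matrix with eigenvalues {2, 1, -1, -3}: two per half-plane.
rng = np.random.default_rng(2)
V = rng.standard_normal((4, 4))
A = V @ np.diag([2.0, 1.0, -1.0, -3.0]) @ np.linalg.inv(V)
T, r = split_spectrum(A)
```

Each half of the block-triangular result can then be recursed on, which is the divide-and-conquer structure the paper parallelizes.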
The Multicomputer Toolbox Approach to Concurrent BLAS
 Proc. Scalable High Performance Computing Conf. (SHPCC
, 1993
Abstract

Cited by 28 (8 self)
Concurrent Basic Linear Algebra Subprograms (CBLAS) are a sensible approach to extending the successful Basic Linear Algebra Subprograms (BLAS) to multicomputers. We describe many of the issues involved in general-purpose CBLAS. Algorithms for dense matrix-vector and matrix-matrix multiplication on general P × Q logical process grids are presented, along with experiments demonstrating their performance characteristics. This work was supported in part by the Applied Mathematical Sciences subprogram of the Office of Energy Research, U.S. Department of Energy. Work performed under the auspices of the U.S. Department of Energy by the Lawrence Livermore National Laboratory under contract No. W-7405-ENG-48. Submitted to Concurrency: Practice & Experience. Address correspondence to: Mississippi State University, Engineering Research Center, PO Box 6176, Mississippi State, MS 39762. 601-325-8435. tony@cs.msstate.edu.
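A matrix-vector multiply on a P × Q logical process grid can be simulated serially: process (p, q) owns block A_pq and the piece of x matching its column block, computes a local product, and the partial results are summed across each process row. This is a hypothetical sketch of the data decomposition, not the Multicomputer Toolbox code.

```python
import numpy as np

def grid_matvec(A, x, P, Q):
    # Serial simulation of y = A @ x distributed over a P x Q grid;
    # assumes P and Q divide the matrix dimensions evenly.
    m, n = A.shape
    mb, nb = m // P, n // Q
    y = np.zeros(m)
    for p in range(P):
        partial = np.zeros(mb)
        for q in range(Q):                 # local product on process (p, q)
            partial += A[p*mb:(p+1)*mb, q*nb:(q+1)*nb] @ x[q*nb:(q+1)*nb]
        y[p*mb:(p+1)*mb] = partial         # reduction across the process row
    return y

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 8))
x = rng.standard_normal(8)
y = grid_matvec(A, x, P=2, Q=4)
```

On a real multicomputer the inner accumulation becomes a row-wise reduction, and the choice of P versus Q trades off the costs of that reduction against the distribution of x.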
A Serial Implementation of Cuppen's Divide and Conquer Algorithm for the Symmetric Eigenvalue Problem
, 1994
Abstract

Cited by 24 (0 self)
This report discusses a serial implementation of Cuppen's divide and conquer algorithm for computing all eigenvalues and eigenvectors of a real symmetric matrix T = QΛQ^T. This method is compared with the LAPACK implementations of QR, bisection/inverse iteration, and root-free QR/inverse iteration to find all of the eigenvalues and eigenvectors. On a DEC Alpha using optimized Basic Linear Algebra Subroutines (BLAS), divide and conquer was uniformly the fastest algorithm by a large margin for large tridiagonal eigenproblems. When Fortran BLAS were used, bisection/inverse iteration was somewhat faster (up to a factor of 2) for very large matrices (n ≥ 500) without clustered eigenvalues. When eigenvalues were clustered, divide and conquer was up to 80 times faster. The speedups over QR were so large in the tridiagonal case that the overall problem, including reduction to tridiagonal form, sped up by a factor of 2.5 over QR for n ≥ 500. Nearly universally, the matrix of eigenvectors generated by divide and con...
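The tearing step at the heart of Cuppen's method writes a symmetric tridiagonal T as a block-diagonal matrix plus a rank-one update, T = diag(T1, T2) + beta·vv^T with v = e_k + e_{k+1}; the subproblems are solved recursively and glued back together via the secular equation. The sketch below shows only the tearing (a made-up 4x4 example; the secular-equation merge is omitted).

```python
import numpy as np

def tear(T, k):
    # Remove the coupling element T[k, k+1] and compensate the two
    # adjacent diagonal entries so that T = diag(T1, T2) + beta * v v^T.
    n = T.shape[0]
    beta = T[k, k + 1]
    T1 = T[:k + 1, :k + 1].copy()
    T2 = T[k + 1:, k + 1:].copy()
    T1[-1, -1] -= beta
    T2[0, 0] -= beta
    v = np.zeros(n)
    v[k] = v[k + 1] = 1.0
    return T1, T2, beta, v

def tridiag(d, e):
    return np.diag(d) + np.diag(e, 1) + np.diag(e, -1)

T = tridiag(np.array([4.0, 3.0, 2.0, 5.0]), np.array([1.0, -1.0, 2.0]))
T1, T2, beta, v = tear(T, 1)
block = np.block([[T1, np.zeros((2, 2))], [np.zeros((2, 2)), T2]])
```

Because the update has rank one, the eigenvalues of T interlace those of diag(T1, T2), which is what makes the subsequent secular-equation solve well behaved.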
The performance of finding eigenvalues and eigenvectors of dense symmetric matrices on distributed memory computers
 In Proceedings of the Seventh SIAM Conference on Parallel Proceesing for Scientific Computing. SIAM
, 1994
Abstract

Cited by 20 (3 self)
We discuss timing and performance modeling of a routine to find all the eigenvalues and eigenvectors of a dense symmetric matrix on distributed memory computers. The routine, PDSYEVX, is part of the ScaLAPACK library. It is based on bisection and inverse iteration, but is not designed to guarantee orthogonality of eigenvectors in the presence of clustered eigenvalues. We use our validated performance model to conclude that PDSYEVX is very efficient for large enough problem sizes, nearly independently of input and output data layouts. However, efficiency will be low if interprocessor communication is too slow, such as on conventional workstation networks, or if per-processor memory is too small, such as on the Intel Gamma. Modeling also helps us choose the appropriate algorithm to deal with clusters.
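Bisection for a symmetric tridiagonal matrix rests on the Sturm-sequence count: the signs of the LDL^T pivots of T - xI give the number of eigenvalues below x, and bisection on that count isolates any single eigenvalue. The sketch below illustrates the serial kernel (made-up function names and test matrix; PDSYEVX's parallel version distributes the eigenvalue intervals across processors).

```python
import numpy as np

def count_below(d, e, x):
    # Number of eigenvalues of the symmetric tridiagonal matrix
    # (diagonal d, off-diagonal e) strictly below x, counted via the
    # signs of the pivots of T - x*I.  No guard against a zero pivot:
    # this is a sketch, not a production kernel.
    count, q = 0, d[0] - x
    if q < 0:
        count += 1
    for i in range(1, len(d)):
        q = d[i] - x - e[i - 1] ** 2 / q
        if q < 0:
            count += 1
    return count

def bisect_eig(d, e, k, lo, hi, tol=1e-12):
    # Find the k-th smallest eigenvalue (0-based) inside [lo, hi].
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if count_below(d, e, mid) > k:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

d = np.array([2.0, 2.0, 2.0])
e = np.array([1.0, 1.0])
lam0 = bisect_eig(d, e, 0, -10.0, 10.0)   # smallest eigenvalue, 2 - sqrt(2)
```

Because each eigenvalue can be isolated independently, the method parallelizes naturally, which is why the abstract reports near layout-independent efficiency for large problems.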
A Parallel Version of the Unsymmetric Lanczos Algorithm and its Application to QMR
, 1996
Abstract

Cited by 16 (3 self)
A new version of the unsymmetric Lanczos algorithm without look-ahead is described combining elements of numerical stability and parallel algorithm design. Firstly, stability is obtained by a coupled two-term procedure that generates Lanczos vectors scaled to unit length. Secondly, the algorithm is derived by making all inner products of a single iteration step independent such that global synchronization on parallel distributed memory computers is reduced. Among the algorithms using the Lanczos process as a major component, the quasi-minimal residual (QMR) method for the solution of systems of linear equations is illustrated by an elegant derivation. The resulting QMR algorithm maintains the favorable properties of the Lanczos algorithm while not increasing computational costs as compared with its corresponding original version.
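For context, the textbook three-term unsymmetric Lanczos recurrence (without look-ahead) is sketched below; it builds biorthogonal bases V and W with W^T V = I and W^T A V tridiagonal. This is the classical variant the paper improves on, not the coupled two-term, unit-length-scaled procedure the abstract describes, and the test matrix and starting vectors are made up.

```python
import numpy as np

def unsym_lanczos(A, v1, w1, m):
    # Classical biorthogonalization (Saad-style): each step needs the
    # inner products w.(A v) and vhat.what, which is the serial
    # synchronization bottleneck the paper's variant reduces.
    n = A.shape[0]
    s = w1 @ v1
    v = v1 / np.sqrt(abs(s))
    w = w1 * np.sign(s) / np.sqrt(abs(s))           # now w @ v == 1
    V, W = [v], [w]
    v_old = np.zeros(n); w_old = np.zeros(n)
    beta = delta = 0.0
    for _ in range(m - 1):
        alpha = w @ (A @ v)
        vhat = A @ v - alpha * v - beta * v_old
        what = A.T @ w - alpha * w - delta * w_old
        d = vhat @ what
        delta_new = np.sqrt(abs(d))                  # breakdown if d == 0
        beta_new = d / delta_new
        v_old, w_old = v, w
        v, w = vhat / delta_new, what / beta_new
        beta, delta = beta_new, delta_new
        V.append(v); W.append(w)
    return np.column_stack(V), np.column_stack(W)

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 6))
Vm, Wm = unsym_lanczos(A, rng.standard_normal(6), rng.standard_normal(6), 4)
```

QMR then uses the tridiagonal projection W^T A V to build its quasi-minimal residual iterates for linear systems.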