## Parallel Tridiagonalization through Two-Step Band Reduction (1994)

Venue: Proceedings of the Scalable High-Performance Computing Conference

Citations: 22 (12 self)

### BibTeX

@INPROCEEDINGS{Bischof94paralleltridiagonalization,
  author    = {Christian Bischof and Bruno Lang and Xiaobai Sun},
  title     = {Parallel Tridiagonalization through Two-Step Band Reduction},
  booktitle = {Proceedings of the Scalable High-Performance Computing Conference},
  year      = {1994},
  pages     = {23--27},
  publisher = {IEEE Computer Society Press}
}


### Abstract

We present a two-step variant of the "successive band reduction" paradigm for the tridiagonalization of symmetric matrices. Here we reduce a full matrix first to narrow-banded form and then to tridiagonal form. The first step allows easy exploitation of block orthogonal transformations. In the second step, we employ a new blocked version of a banded matrix tridiagonalization algorithm by Lang. In particular, we are able to express the update of the orthogonal transformation matrix in terms of block transformations. This expression leads to an algorithm that is almost entirely based on BLAS-3 kernels and has greatly improved data movement and communication characteristics. We also present some performance results on the Intel Touchstone DELTA and the IBM SP1.
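For contrast with the paper's two-step scheme, the conventional one-step Householder tridiagonalization that the abstract refers to can be sketched in NumPy. This is an illustrative unblocked version under our own naming, not the authors' code:

```python
import numpy as np

def tridiagonalize(A):
    """Reduce a symmetric matrix to tridiagonal form by Householder
    similarity transforms. Unblocked one-step reduction, shown only to
    contrast with the paper's two-step (full -> banded -> tridiagonal)
    scheme; not the authors' implementation."""
    T = np.array(A, dtype=float)
    n = T.shape[0]
    for k in range(n - 2):
        x = T[k + 1:, k].copy()
        # Choose the reflection that maps x onto (alpha, 0, ..., 0)
        alpha = -np.linalg.norm(x) if x[0] == 0 else -np.sign(x[0]) * np.linalg.norm(x)
        v = x
        v[0] -= alpha
        vnorm = np.linalg.norm(v)
        if vnorm == 0:
            continue  # column already in tridiagonal form
        v /= vnorm
        H = np.eye(n)
        H[k + 1:, k + 1:] -= 2.0 * np.outer(v, v)
        T = H @ T @ H  # orthogonal similarity: eigenvalues are preserved
    return T

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
A = (A + A.T) / 2  # symmetrize
T = tridiagonalize(A)
```

Each step touches the whole trailing submatrix with BLAS-2-like rank-two updates; the paper's point is that going through an intermediate banded form lets most of this work be reorganized into BLAS-3 block transformations.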

### Citations

421 | LAPACK Users' Guide - Anderson - 1999

Citation Context: ...ransformation matrix Q is stored in the same 2-D block torus wrap mapping as before. We also mention that, unlike the original code used in [13], we employ the same packed storage scheme as in LAPACK [1], which allows us to formulate the packed storage block algorithm more succinctly, and naturally employs BLAS-2 kernels. We also mention that in addition to increasing the computational efficiency on ...

107 | The WY representation for products of Householder matrices - Bischof, Van Loan - 1987

Citation Context: ...uccessive band reduction" framework suggested by Bischof and Sun [4]. We first reduce the dense matrix to bandwidth nb using block orthogonal transformations employing the so-called WY representation [5]; this is described in Section 2. The remaining narrow-banded matrix is then reduced to tridiagonal form using a new variant of an algorithm originally suggested in [13]. In particular, we have devise...

78 | Block reduction of matrices to condensed forms for eigenvalue computation - Dongarra, Hammarling, et al. - 1989

Citation Context: ...diagonal form is a major step in eigenvalue computations for symmetric matrices. If the matrix is full, the conventional Householder tridiagonalization approach [9, p. 276] or a block variant thereof [8] is the method of choice. These two ap... This work was supported by the Applied and Computational Mathematics Program, Advanced Research Projects Agency, under contract DM28E04120, and by the Office of S...

60 | A storage efficient WY representation for products of Householder transformations - Schreiber, Van Loan - 1989

Citation Context: ...ansformations express the product $H_1 \cdots H_k$ of $k$ Householder transformations $H_i = I - u_i u_i^T$, $u_i \in \mathbb{R}^n$, as a rank-$k$ update of the form $I - WY^T$ [5] or $I - WSW^T$ [14], where $W, Y \in \mathbb{R}^{n \times k}$ and $S \in \mathbb{R}^{k \times k}$. As a result, the application of these transformations now involves BLAS-3 operations, such as matrix-matrix multiplication, which perform efficiently on c...
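The compact WY identity quoted in this context can be checked numerically. Below is a minimal sketch (our own helper, not the paper's or LAPACK's implementation) using the standard update $W \leftarrow [W,\, Q_{j-1} u_j]$, $Y \leftarrow [Y,\, u_j]$:

```python
import numpy as np

def wy_accumulate(us):
    """Accumulate Householder reflectors H_i = I - u_i u_i^T (scaled so
    ||u_i||^2 = 2, making each H_i orthogonal) into the compact WY form
    Q = H_1 ... H_k = I - W Y^T. Illustrative helper, not the paper's code."""
    n = us[0].shape[0]
    W = us[0].reshape(n, 1).copy()
    Y = us[0].reshape(n, 1).copy()
    for u in us[1:]:
        # Q_{j-1} u_j = u_j - W (Y^T u_j) is the next column of W;
        # Y simply collects the u_j themselves.
        z = u - W @ (Y.T @ u)
        W = np.hstack([W, z.reshape(n, 1)])
        Y = np.hstack([Y, u.reshape(n, 1)])
    return W, Y

rng = np.random.default_rng(1)
n, k = 8, 3
us = []
for _ in range(k):
    v = rng.standard_normal(n)
    us.append(np.sqrt(2.0) * v / np.linalg.norm(v))  # ||u||^2 = 2

W, Y = wy_accumulate(us)
Q_wy = np.eye(n) - W @ Y.T

# Compare against the explicit product of reflectors
Q_explicit = np.eye(n)
for u in us:
    Q_explicit = Q_explicit @ (np.eye(n) - np.outer(u, u))
```

Applying $I - WY^T$ to a matrix costs two matrix-matrix products (BLAS-3), which is exactly why the blocked algorithms in the paper prefer it over applying $k$ reflectors one at a time.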

43 | Chameleon parallel programming tools users manual - Gropp, Smith - 1993

Citation Context: ...h are expressed with the WY representation [5]. To develop a portable code, and to allow a maintainable implementation, we chose to base our implementation on the Chameleon parallel programming tools [11]. The performance of this code is promising. Its performance on the reduction of a full random matrix to bandwidth 10, including the accumulation of orthogonal transformations, on the Intel Touchstone...

30 | Reduction to condensed form for the eigenvalue problem on distributed memory architectures - Dongarra, van de Geijn - 1992

Citation Context: ...28E04120, and by the Office of Scientific Computing, U.S. Department of Energy, under Contract W-31-109-Eng-38. ...proaches also underlie the parallel implementations described, for example, in [12] and [6]. The approach described in this paper, on the other hand, is a two-step instantiation of the "successive band reduction" framework suggested by Bischof and Sun [4]. We first reduce the dense matrix t...

18 | LAPACK for distributed memory architecture: progress report - Anderson, Benzoni, et al. - 1991

Citation Context: ...ons in a block orthogonal transformation, and only then access the remainder of A. This approach reduces data accesses by a factor of nb and has been shown to perform efficiently on parallel machines [3, 2] and in out-of-core factorization approaches [10]. We have implemented this algorithm using a two-dimensional block torus wrapping. The block size nb of the block torus wrapping is also the block size ...

10 | Solution of large, dense symmetric generalized eigenvalue problems using secondary storage - Grimes, Simon - 1988

Citation Context: ...then access the remainder of A. This approach reduces data accesses by a factor of nb and has been shown to perform efficiently on parallel machines [3, 2] and in out-of-core factorization approaches [10]. We have implemented this algorithm using a two-dimensional block torus wrapping. The block size nb of the block torus wrapping is also the block size used for orthogonal transformations, which are ex...
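The 2-D block torus-wrap mapping recurring in these contexts simply deals matrix blocks out round-robin along each dimension of the process grid (block-cyclic distribution). A sketch of the ownership rule, as our own illustration rather than the paper's code:

```python
def torus_wrap_owner(bi, bj, pr, pc):
    """Process coordinates that own matrix block (bi, bj) under a 2-D
    block torus-wrap (block-cyclic) mapping on a pr x pc process grid:
    blocks are dealt out round-robin along each dimension."""
    return (bi % pr, bj % pc)

# A 6x6 grid of blocks on a 2x3 process grid: the 6 processes each own
# 6 blocks, and neighbouring blocks always live on different processes.
owners = [[torus_wrap_owner(i, j, 2, 3) for j in range(6)] for i in range(6)]
```

The wrap keeps the load balanced as the reduction shrinks the active trailing submatrix, since every process keeps owning blocks from whatever part of the matrix remains active.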

6 | The torus-wrap mapping for dense matrix calculations on massively parallel computers - Hendrickson, Womble - 1992

Citation Context: ...ntract DM28E04120, and by the Office of Scientific Computing, U.S. Department of Energy, under Contract W-31-109-Eng-38. ...proaches also underlie the parallel implementations described, for example, in [12] and [6]. The approach described in this paper, on the other hand, is a two-step instantiation of the "successive band reduction" framework suggested by Bischof and Sun [4]. We first reduce the dense ...

5 | A framework for band reduction and tridiagonalization of symmetric matrices - Bischof, Sun - 1992

Citation Context: ...described, for example, in [12] and [6]. The approach described in this paper, on the other hand, is a two-step instantiation of the "successive band reduction" framework suggested by Bischof and Sun [4]. We first reduce the dense matrix to bandwidth nb using block orthogonal transformations employing the so-called WY representation [5]; this is described in Section 2. The remaining narrow-banded mat...

2 | A parallel algorithm for reducing symmetric banded matrices to tridiagonal form - Lang - 1993

Citation Context: ...the so-called WY representation [5]; this is described in Section 2. The remaining narrow-banded matrix is then reduced to tridiagonal form using a new variant of an algorithm originally suggested in [13]. In particular, we have devised a way of blocking the orthogonal transformations. The new algorithm is described in Section 3. The reason for considering this two-step approach is that in our experie...

1 | A pipelined block QR decomposition algorithm - Bischof - 1989

Citation Context: ...ons in a block orthogonal transformation, and only then access the remainder of A. This approach reduces data accesses by a factor of nb and has been shown to perform efficiently on parallel machines [3, 2] and in out-of-core factorization approaches [10]. We have implemented this algorithm using a two-dimensional block torus wrapping. The block size nb of the block torus wrapping is also the block size ...