## Minimizing Communication for Eigenproblems and the Singular Value Decomposition (2010)

### BibTeX

```
@MISC{Ballard10minimizingcommunication,
  author = {Grey Ballard and James Demmel and Ioana Dumitriu},
  title  = {Minimizing Communication for Eigenproblems and the Singular Value Decomposition},
  year   = {2010}
}
```

### Abstract

Algorithms have two costs: arithmetic and communication. The latter represents the cost of moving data, either between levels of a memory hierarchy, or between processors over a network. Communication often dominates arithmetic and represents a rapidly increasing proportion of the total cost, so we seek algorithms that minimize communication. In [4] lower bounds were presented on the amount of communication required for essentially all O(n³)-like algorithms for linear algebra, including eigenvalue problems and the SVD. Conventional algorithms, including those currently implemented in (Sca)LAPACK, perform asymptotically more communication than these lower bounds require. In this paper we present parallel and sequential eigenvalue algorithms (for pencils, nonsymmetric matrices, and symmetric matrices) and SVD algorithms that do attain these lower bounds, and analyze their convergence and communication costs.

### Citations

8779 |
Introduction to Algorithms
- Cormen, Leiserson, et al.
- 1990
Citation Context: ...ncy cost, in both the two-level sequential memory model and parallel model (asymptotically, up to polylog factors). Because there exist cache-oblivious sequential algorithms for matrix multiplication [24] and QR decomposition [23], we conjecture that it is possible to devise cache-oblivious versions of the randomized spectral divide-and-conquer algorithms, thereby minimizing communication costs betwee...
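The cache-oblivious matrix multiplication cited here ([24]) rests on recursive halving of the largest dimension, so that at every level of every memory hierarchy the working set eventually fits in cache without the algorithm knowing the cache size. The sketch below is an illustrative Python rendering of that technique (function name and cutoff are mine, not from the cited works):

```python
import numpy as np

def co_matmul(A, B, C, cutoff=16):
    """Cache-oblivious multiply: accumulates C += A @ B.

    Recursively splits the largest of the three dimensions in half;
    once every dimension is below the cutoff, does a base-case multiply.
    The recursion pattern, not the cutoff, is what makes it oblivious.
    """
    m, k = A.shape
    _, n = B.shape
    if max(m, k, n) <= cutoff:
        C += A @ B
        return
    if m >= k and m >= n:            # split rows of A and C
        h = m // 2
        co_matmul(A[:h], B, C[:h], cutoff)
        co_matmul(A[h:], B, C[h:], cutoff)
    elif n >= k:                     # split columns of B and C
        h = n // 2
        co_matmul(A, B[:, :h], C[:, :h], cutoff)
        co_matmul(A, B[:, h:], C[:, h:], cutoff)
    else:                            # split the shared dimension k
        h = k // 2
        co_matmul(A[:, :h], B[:h], C, cutoff)
        co_matmul(A[:, h:], B[h:], C, cutoff)
```

Because NumPy slices are views, the recursive calls accumulate directly into the caller's block of C with no copying.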

173 |
I/O complexity: The red-blue pebble game
- Hong, Kung
- 1981
Citation Context: ...processor has room in memory for 1/P-th of the input matrix, the number of words moved between one processor and the others is Ω(n²/√P). These lower bounds were originally proven for sequential [30] and parallel [32] matrix multiplication, and extended to many other linear algebra algorithms in [4], including the first phase of conventional eigenvalue/SVD algorithms: reduction to Hessenberg, tri...

153 | ScaLAPACK: A portable linear algebra library for distributed memory computers - design issues and performance - Choi, Demmel, et al. |

115 | Spectra and pseudospectra: The behavior of nonnormal matrices and operators
- Trefethen, Embree
- 2005
Citation Context: ...o avoid not just “hitting” the eigenvalues, but also “hitting” some appropriate ε-pseudospectrum of the matrix, defined below (the right choice for ε will be discussed later). Definition 3.1. (Following [46].) The ε-pseudospectrum of a matrix A is defined as Λ_ε(A) = {z ∈ C | z is an eigenvalue of A + E for some E with ||E||₂ ≤ ε}. We measure progress in two ways: one is when a split occurs, i.e., we separa...
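Definition 3.1 is equivalent to the well-known characterization Λ_ε(A) = {z : σ_min(zI − A) ≤ ε}, which makes the set computable on a grid of sample points. The helper below is a hypothetical illustration of that equivalence (function and parameter names are mine, not from [46] or the paper):

```python
import numpy as np

def pseudospectrum_mask(A, eps, re, im):
    """Boolean grid mask of the eps-pseudospectrum of A.

    Uses Lambda_eps(A) = { z : sigma_min(z*I - A) <= eps },
    evaluated at every point z = x + iy on the grid re x im.
    """
    n = A.shape[0]
    I = np.eye(n)
    mask = np.zeros((len(im), len(re)), dtype=bool)
    for i, y in enumerate(im):
        for j, x in enumerate(re):
            z = complex(x, y)
            smin = np.linalg.svd(z * I - A, compute_uv=False)[-1]
            mask[i, j] = smin <= eps
    return mask
```

For a normal matrix the mask simply marks disks of radius ε around the eigenvalues; for nonnormal matrices it can be far larger, which is the phenomenon [46] studies.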

96 |
Efficient algorithms for computing a strong rank-revealing QR factorization
- Gu, Eisenstat
- 1996
Citation Context: ...tor of a low-degree polynomial in n away from it); (4) In addition, if ||R₁₁⁻¹R₁₂||₂ is small (at most a low-degree polynomial in n), then the rank-revealing factorization is called strong (as per [28]). Rank-revealing decompositions are used in rank determination [44], least squares computations [13], condition estimation [5], etc., as well as in divide-and-conquer algorithms for eigenproblems. For...
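The quantity ||R₁₁⁻¹R₁₂||₂ in this context can be computed directly from a column-pivoted QR factorization. The sketch below uses SciPy's Businger–Golub pivoting; note that plain pivoting does not guarantee this quantity is small in the worst case (that guarantee is what the strong RRQR of [28] adds), so this is only an illustrative diagnostic, not the cited algorithm:

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

def rrqr_gap(A, r):
    """Column-pivoted QR of A and the strong-RRQR diagnostic ||R11^{-1} R12||_2.

    Partitions R after a presumed numerical rank r; a modest value of the
    returned norm is the 'strong' condition discussed in the text.
    """
    Q, R, piv = qr(A, pivoting=True)
    R11, R12 = R[:r, :r], R[:r, r:]
    X = solve_triangular(R11, R12)   # R11^{-1} R12 (R11 is upper triangular)
    return np.linalg.norm(X, 2)
```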

80 | Block reduction of matrices to condensed forms for eigenvalue computations - Dongarra, Hammarling, et al. - 1989 |

74 |
Linear model reduction and solution of the algebraic Riccati equation by use of the sign function
- Roberts
- 1980
Citation Context: ...er algorithms: the PRISM project algorithm [2, 6], the reduction of the symmetric eigenproblem to matrix multiplication by Yau and Lu [35], the matrix sign function algorithm originally formulated in [31, 40], and the inverse-free approach originally formulated in [26, 12, 36, 37, 38], which led to the version [3, 14] from which we start here. Our algorithms will minimize communication, both bandwidth cos...

60 | Inverse free parallel spectral divide and conquer algorithms for nonsymmetric eigenproblems
- Bai, Gu
- 1994
Citation Context: ...ication by Yau and Lu [35], the matrix sign function algorithm originally formulated in [31, 40], and the inverse-free approach originally formulated in [26, 12, 36, 37, 38], which led to the version [3, 14] from which we start here. Our algorithms will minimize communication, both bandwidth cost and latency cost, in both the two-level sequential memory model and parallel model (asymptotically, up to pol...

59 | Communication-optimal parallel and sequential QR and LU factorizations. LAPACK Working Note 204
- Demmel, Grigori, et al.
- 2008
Citation Context: ...algorithms in [14] and were modified to take advantage of the communication-optimal O(n³) algorithms for matrix multiplication and dense QR decomposition (the latter of which was given in [16]). We also make here several contributions to the numerical analysis of the methods themselves. First, they depend on a rank-revealing generalized QR decomposition of the product of two matrices A⁻¹B,...

58 |
Eigenvalues and Pseudo-Eigenvalues of Toeplitz Matrices
- Reichel, Trefethen
- 1992
Citation Context: ...; this is an alternative to the output that classical (or currently implemented) algorithms yield, which samples the pseudospectrum. Since in general pseudospectra can be arbitrarily complicated (see [39]), the amount of information gained may be limited; for example, the genus of the pseudospectrum will not be specified (i.e., the output will ignore the presence of “holes”, as in the third picture on...

55 |
The generalized Schur decomposition of an arbitrary pencil A−λB
- Demmel, Kågström
- 1991
Citation Context: ...we like.) The mathematically correct approach in this case is to compute the Kronecker Canonical Form (or rather the variant that only uses orthogonal transformations to maintain numerical stability [17, 18]), a problem that is beyond the scope of this paper. The challenge of singular pencils is shared by any algorithm for the generalized eigenproblem. The conventional QZ iteration may converge, but tiny...

51 | Communication lower bounds for distributedmemory matrix multiplication
- Irony, Toledo, et al.
Citation Context: ...in memory for 1/P-th of the input matrix, the number of words moved between one processor and the others is Ω(n²/√P). These lower bounds were originally proven for sequential [30] and parallel [32] matrix multiplication, and extended to many other linear algebra algorithms in [4], including the first phase of conventional eigenvalue/SVD algorithms: reduction to Hessenberg, tridiagonal and bidia...

41 | Robust incremental condition estimation - Bischof, Tang - 1991 |

38 | Orthogonal eigenvectors and relative gaps
- Dhillon, Parlett
Citation Context: ...N(e^{−iλ}B)v₀ for λ = ikπ/N completely addresses the case of tightly clustered eigenvalues. Clustering is a well-known challenge for attaining orthogonality with reasonable complexity, as highlighted in [19, 20]. By contrast, in the symmetric case, for highly clustered eigenvalues, our strategy would produce small bounding intervals and output them together with the number of eigenvalues to be found in them,...

30 | On parallelizable eigensolvers
- Auslander, Tsao
- 1992
Citation Context: ...rix it computes the Schur form. For a symmetric problem or SVD, it computes the full decomposition. There is an extensive literature on such divide-and-conquer algorithms: the PRISM project algorithm [2, 6], the reduction of the symmetric eigenproblem to matrix multiplication by Yau and Lu [35], the matrix sign function algorithm originally formulated in [31, 40], and the inverse-free approach originall...

30 |
Parallel algorithm for solving some spectral problems of linear algebra
- Malyshev
- 1993
Citation Context: ...of the symmetric eigenproblem to matrix multiplication by Yau and Lu [35], the matrix sign function algorithm originally formulated in [31, 40], and the inverse-free approach originally formulated in [26, 12, 36, 37, 38], which led to the version [3, 14] from which we start here. Our algorithms will minimize communication, both bandwidth cost and latency cost, in both the two-level sequential memory model and paralle...

27 | The Dimension of Matrices (Matrix Pencils) with Given Jordan (Kronecker) Canonical Forms
- Demmel, Edelman
- 1995
Citation Context: ...eated failures occur, the algorithm should simply stop and alert the user of the likelihood of near-singularity. Fortunately, the set of singular pencils forms an algebraic variety of high codimension [15], i.e. a set of measure zero, so it is unlikely that a randomly chosen pencil will be close to singular. Henceforth we assume that we are not near a singular pencil. Even in the case of a regular penc...

26 |
Some applications of the rank revealing QR factorization
- Chan, Hansen
- 1992
Citation Context: ...most a low-degree polynomial in n), then the rank-revealing factorization is called strong (as per [28]). Rank-revealing decompositions are used in rank determination [44], least squares computations [13], condition estimation [5], etc., as well as in divide-and-conquer algorithms for eigenproblems. For a good survey paper, we recommend [28]. In the paper [14], we have proposed a randomized rank revea...

26 | Fast linear algebra is stable
- Demmel, Dumitriu, et al.
Citation Context: ...ication by Yau and Lu [35], the matrix sign function algorithm originally formulated in [31, 40], and the inverse-free approach originally formulated in [26, 12, 36, 37, 38], which led to the version [3, 14] from which we start here. Our algorithms will minimize communication, both bandwidth cost and latency cost, in both the two-level sequential memory model and parallel model (asymptotically, up to pol...

24 |
On Jacobi rotation patterns
- Rutishauser
- 1963
Citation Context: ...r the symmetric eigenvalue problem) or bidiagonal form (for the SVD). These algorithms for the symmetric case were discussed at length in [7, 8], which in turn refer back to algorithms originating in [41, 42]. Our SVD algorithms are natural variations of these. These algorithms can minimize the number of words moved in an asymptotic sense on a sequential machine with two levels of memory hierarchy (say ma...

23 |
Problem of the dichotomy of the spectrum of a matrix
- Godunov
- 1986
Citation Context: ...of the symmetric eigenproblem to matrix multiplication by Yau and Lu [35], the matrix sign function algorithm originally formulated in [31, 40], and the inverse-free approach originally formulated in [26, 12, 36, 37, 38], which led to the version [3, 14] from which we start here. Our algorithms will minimize communication, both bandwidth cost and latency cost, in both the two-level sequential memory model and paralle...

22 | Parallel tridiagonalization through two-step band reduction - Bischof, Lang, et al. - 1984 |

22 |
SUMMA: Scalable universal matrix multiplication algorithm
- van de Geijn, Watts
- 1997
Citation Context: ...nalysis. After analysis of one step of the divide-and-conquer algorithms, we will describe how the subproblems can be solved independently. 4.3.1 Subroutines The SUMMA matrix multiplication algorithm [25] for multiplying square matrices of size n × n, with blocksize b, has a total cost of C_MM(n, P) = α · O((n/b) · log P) + β · O((n²/√P) · log P) + γ · O(n³/P). From [16] (equation (12) on page 62), the total cost ...
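SUMMA builds C as a sum of rank-b updates: at each step a block column of A and the matching block row of B are broadcast and their outer product accumulated. The sketch below reproduces only that blocking structure serially (no MPI, no broadcasts), as a hypothetical illustration of where the communication terms in the cost above come from:

```python
import numpy as np

def summa_serial(A, B, b):
    """Serial sketch of SUMMA's block outer-product structure.

    In the parallel algorithm each step broadcasts one block column of A
    and one block row of B across the processor grid; here we simply
    accumulate the same rank-b updates in order.
    """
    k_total = A.shape[1]
    C = np.zeros((A.shape[0], B.shape[1]))
    for k in range(0, k_total, b):
        C += A[:, k:k+b] @ B[k:k+b, :]   # one rank-b outer-product update
    return C
```

Each of the n/b steps moves O(n²/√P) words per broadcast in the parallel setting, which is the β term in the cost expression.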

21 |
Rank degeneracy
- Stewart
- 1984
Citation Context: ...if ||R₁₁⁻¹R₁₂||₂ is small (at most a low-degree polynomial in n), then the rank-revealing factorization is called strong (as per [28]). Rank-revealing decompositions are used in rank determination [44], least squares computations [13], condition estimation [5], etc., as well as in divide-and-conquer algorithms for eigenproblems. For a good survey paper, we recommend [28]. In the paper [14], we have ...

19 |
A parallel algorithm for reducing symmetric banded matrices to tridiagonal form
- Lang
Citation Context: ...scuss two different schemes (i.e. choices of {cᵢ} and {dᵢ}) for reducing the banded matrix to tridiagonal form. 1. Consider choosing d₁ = b−1 and c₁ = 1, so that s = 1, as in the parallel algorithm of [33]. Then the communication cost of the banded-to-tridiagonal reduction is O(bn²). If we choose b = α₁√M for some constant α₁ < 1/2, then four b × b blocks can fit into fast memory at a time, and by r...

17 | Minimizing communication in linear algebra
- Ballard, Demmel, et al.
- 2011
Citation Context: ...the respective decomposition. In addition, TRSM stands for “triangular solve with multiple right-hand sides”, which costs O(m³) flops, and for which communication-avoiding implementations are known [4]. Algorithm 7 Function [Q̂, R̂] = S2QR([R; X]), where R is n × n upper triangular and X is m × n. 1: Form B = [R, 0; X, 0]. 2: Compute [Q̂, T] = Schur(B). 3: Q11 = Q̂(1:m, 1:m); T11 = T(1:m, 1:m). 4: ...
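The TRSM kernel named in this context, a triangular solve with multiple right-hand sides, can be demonstrated with SciPy. This is a minimal sketch of the operation itself, not of the communication-avoiding implementations cited in [4] (matrix sizes and conditioning shift are my choices):

```python
import numpy as np
from scipy.linalg import solve_triangular

# TRSM: solve R @ X = B for upper-triangular R and a block of right-hand sides B.
rng = np.random.default_rng(1)
R = np.triu(rng.standard_normal((5, 5))) + 5.0 * np.eye(5)  # diagonal shift keeps R well conditioned
B = rng.standard_normal((5, 3))                              # 3 right-hand sides solved at once
X = solve_triangular(R, B)                                   # upper-triangular solve (the default)
```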

17 | The design and implementation of the MRRR algorithm
- Dhillon, Parlett, et al.
Citation Context: ...N(e^{−iλ}B)v₀ for λ = ikπ/N completely addresses the case of tightly clustered eigenvalues. Clustering is a well-known challenge for attaining orthogonality with reasonable complexity, as highlighted in [19, 20]. By contrast, in the symmetric case, for highly clustered eigenvalues, our strategy would produce small bounding intervals and output them together with the number of eigenvalues to be found in them,...

15 |
The sign matrix and the separation of matrix eigenvalues
- Howland
- 1983
Citation Context: ...er algorithms: the PRISM project algorithm [2, 6], the reduction of the symmetric eigenproblem to matrix multiplication by Yau and Lu [35], the matrix sign function algorithm originally formulated in [31, 40], and the inverse-free approach originally formulated in [26, 12, 36, 37, 38], which led to the version [3, 14] from which we start here. Our algorithms will minimize communication, both bandwidth cos...

12 | The PRISM project: Infrastructure and algorithms for parallel eigensolvers
- Bischof, Huss-Lederman, et al.
- 1993
Citation Context: ...rix it computes the Schur form. For a symmetric problem or SVD, it computes the full decomposition. There is an extensive literature on such divide-and-conquer algorithms: the PRISM project algorithm [2, 6], the reduction of the symmetric eigenproblem to matrix multiplication by Yau and Lu [35], the matrix sign function algorithm originally formulated in [31, 40], and the inverse-free approach originall...

12 |
Computing invariant subspaces of a regular linear pencil of matrices
- Malyshev
- 1989
Citation Context: ...of the symmetric eigenproblem to matrix multiplication by Yau and Lu [35], the matrix sign function algorithm originally formulated in [31, 40], and the inverse-free approach originally formulated in [26, 12, 36, 37, 38], which led to the version [3, 14] from which we start here. Our algorithms will minimize communication, both bandwidth cost and latency cost, in both the two-level sequential memory model and paralle...

11 | QR factorization with Morton-ordered quadtree matrices for memory re-use and parallelism
- Frens, Wise
Citation Context: ...level sequential memory model and parallel model (asymptotically, up to polylog factors). Because there exist cache-oblivious sequential algorithms for matrix multiplication [24] and QR decomposition [23], we conjecture that it is possible to devise cache-oblivious versions of the randomized spectral divide-and-conquer algorithms, thereby minimizing communication costs between multiple levels of the m...

10 |
Guaranteed accuracy in spectral problems of linear algebra
- Malyshev
- 1992

10 | On graded QR decompositions of products of matrices, Elect
- Stewart
- 1995
Citation Context: ...gonal/unitary matrix to the front of the product, while computing factor matrices from which (if desired) the upper triangular R matrix can be obtained. A similar idea was explored by G. W. Stewart in [45] to perform graded QR; although it was suggested that such techniques can also be applied to algorithms like URV, no randomization was used. The algorithm is presented in pseudocode below. Lemma 2.2. ...

9 | The SBR Toolbox: Software for Successive Band Reduction
- Bischof, Lang, et al.
- 2000
Citation Context: ...f these algorithms in the parallel case and minimizing latency in either case are open problems. These algorithms are a special case of the class of successive band reduction algorithms introduced in [7, 8]. The rest of this paper is organized as follows. Section 2 discusses randomized rank-revealing decompositions. Section 3 uses these decompositions to implement randomized spectral divide-and-conquer ...

6 | The shifted Hessenberg system solve computation
- HENRY
- 1994
Citation Context: ...esented in Algorithm 8. The problem of computing the eigenvectors of a triangular matrix, as well as the formulation of a blocked algorithm, appears as a special case of the computations presented in [29]; however, [29] does not try to pick a blocksize in order to minimize communication, nor does it analyze the communication complexity of the algorithm. For simplicity we assume the blocksize b divides...
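The computation discussed here, an eigenvector of an upper-triangular matrix, reduces to one triangular back-substitution per eigenvalue. The unblocked sketch below is my illustration of that step; [29] and the paper's Algorithm 8 block this loop to reduce communication, which the sketch does not attempt:

```python
import numpy as np

def triangular_eigvec(T, k):
    """Eigenvector of upper-triangular T for the eigenvalue T[k, k].

    Assumes T[k, k] differs from T[0, 0], ..., T[k-1, k-1].  Sets x[k] = 1,
    x[k+1:] = 0, and back-substitutes the leading k x k system
    (T[:k, :k] - T[k, k] * I) x[:k] = -T[:k, k].
    """
    n = T.shape[0]
    lam = T[k, k]
    x = np.zeros(n, dtype=complex)
    x[k] = 1.0
    for i in range(k - 1, -1, -1):
        x[i] = -(T[i, k] + T[i, i+1:k] @ x[i+1:k]) / (T[i, i] - lam)
    return x
```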

5 | Scheduling two-sided transformations using tile algorithms on multicore architectures - Ltaief, Kurzak, et al. |

4 |
Eigenvalues of the Laplacian through boundary integral equations
- Lu, Yau
- 1991
Citation Context: ...osition. There is an extensive literature on such divide-and-conquer algorithms: the PRISM project algorithm [2, 6], the reduction of the symmetric eigenproblem to matrix multiplication by Yau and Lu [35], the matrix sign function algorithm originally formulated in [31, 40], and the inverse-free approach originally formulated in [26, 12, 36, 37, 38], which led to the version [3, 14] from which we star...

2 |
A Framework for Successive Band Reduction
- Bischof, Lang, et al.
- 2000
Citation Context: ...f these algorithms in the parallel case and minimizing latency in either case are open problems. These algorithms are a special case of the class of successive band reduction algorithms introduced in [7, 8]. The rest of this paper is organized as follows. Section 2 discusses randomized rank-revealing decompositions. Section 3 uses these decompositions to implement randomized spectral divide-and-conquer ...

2 |
Communication-avoiding rank-revealing QR decomposition. in preparation
- Branescu, Demmel, et al.
- 2010
Citation Context: ...one could potentially use a deterministic rank-revealing QR algorithm instead of our randomized one. There has been recent progress in developing such a QR algorithm that also minimizes communication [11]. The other approach is to use the sign function algorithm introduced by [40]; this works for both the symmetric and the non-symmetric cases, but requires taking inverses, which leads to a potential l...
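The sign function approach of [40] is the Newton iteration X ← ½(X + X⁻¹), whose limit is the matrix sign function when no eigenvalue lies on the imaginary axis. The sketch below is a textbook rendering of that iteration, included only to make the "requires taking inverses" drawback concrete; the paper's own algorithms use an inverse-free iteration instead:

```python
import numpy as np

def matrix_sign(A, iters=50):
    """Matrix sign function via the Newton iteration X <- (X + X^{-1}) / 2.

    Converges quadratically when A has no purely imaginary eigenvalues.
    Each step needs an explicit inverse -- the drawback noted in the text.
    """
    X = np.array(A, dtype=float)
    for _ in range(iters):
        X = 0.5 * (X + np.linalg.inv(X))
    return X
```

sign(A) projects the spectrum onto ±1, splitting it across the imaginary axis, which is what makes it usable for spectral divide-and-conquer.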

2 |
Smallest eigenvalue distributions for β-Jacobi ensembles
- Dumitriu
- 2010
Citation Context: ...submatrix of a Haar-distributed orthogonal n×n matrix. All the results of [14] can be extended verbatim to Haar-distributed unitary matrices; however, the analysis employed in [14] is not optimal. In [22], the asymptotically exact scaling and distribution of f_{r,n} were computed for all regimes of growth (r, n) (naturally, 0 < r < n). Essentially, the bounds state that r(n − r)f_{r,n} converges in law to a gi...

2 |
Tridiagonalization of a symmetric band matrix
- Schwarz
- 1968
Citation Context: ...r the symmetric eigenvalue problem) or bidiagonal form (for the SVD). These algorithms for the symmetric case were discussed at length in [7, 8], which in turn refer back to algorithms originating in [41, 42]. Our SVD algorithms are natural variations of these. These algorithms can minimize the number of words moved in an asymptotic sense on a sequential machine with two levels of memory hierarchy (say ma...

2 |
Gershgorin theory for the generalized eigenproblem Ax = λBx
- Stewart
- 1975
Citation Context: ...es that are arbitrarily large, or even equal to infinity, if B is singular. So we cannot expect to find a single finite bounding circle holding all the eigenvalues, for example using Gershgorin disks [43], to which we could then apply the techniques of the last section, because many or all of the eigenvalues could lie outside any finite circle. Indeed, a Gershgorin disk in [43] has infinite radius as ...
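For the standard case B = I, the Gershgorin disks referred to here are finite and trivial to compute; the sketch below shows that baseline (my helper, not [43]'s generalized construction, whose disks for a pencil (A, B) can have infinite radius when B is singular, which is exactly the obstruction the text describes):

```python
import numpy as np

def gershgorin_disks(A):
    """Centers and radii of the Gershgorin disks of A (standard case B = I).

    Every eigenvalue of A lies in the union of the disks
    |z - a_ii| <= r_i, where r_i = sum_{j != i} |a_ij|.
    """
    centers = np.diag(A).astype(complex)
    radii = np.sum(np.abs(A), axis=1) - np.abs(np.diag(A))
    return centers, radii
```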

2 | On MRRR-type algorithms for the tridiagonal symmetric eigenproblem and the bidiagonal SVD - Willems - 2009 |

1 | Performing the following three computations overwrites the matrix A with (I − τuuᵀ)A(I − τuuᵀ)ᵀ: y ← τAu; v ← y − ½τ(yᵀu)u; A ← A − uvᵀ − vuᵀ. The first computation costs 2hn flops, and the second line costs 4h + 1. Since each of the matrices uvᵀ an... |
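The three-step two-sided Householder update in the entry above can be checked numerically. The script below is my verification sketch, assuming the standard reading in which y carries the factor τ (the extracted text drops some τ's), and choosing τ = 2/‖u‖² so that I − τuuᵀ is an actual reflector:

```python
import numpy as np

# Verify: with H = I - tau*u*u^T and symmetric A, the sequence
#   y = tau * A @ u
#   v = y - (tau/2) * (y^T u) * u
#   A' = A - u v^T - v u^T
# produces A' = H A H^T.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
A = A + A.T                           # make A symmetric
u = rng.standard_normal(5)
tau = 2.0 / (u @ u)                   # makes H an orthogonal reflector

y = tau * (A @ u)
v = y - 0.5 * tau * (y @ u) * u
A_upd = A - np.outer(u, v) - np.outer(v, u)

H = np.eye(5) - tau * np.outer(u, u)
```

The point of the three-step form is that it touches A only through one matrix-vector product and two rank-one updates, which is where the flop counts quoted in the entry come from.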