## The geometry of algorithms with orthogonality constraints (1998)


Venue: SIAM J. MATRIX ANAL. APPL.

Citations: 422 (1 self)

### BibTeX

```bibtex
@ARTICLE{Edelman98thegeometry,
  author  = {Alan Edelman and Tomás A. Arias and Steven T. Smith},
  title   = {The geometry of algorithms with orthogonality constraints},
  journal = {SIAM J. MATRIX ANAL. APPL},
  year    = {1998},
  volume  = {20},
  number  = {2},
  pages   = {303--353}
}
```


### Abstract

In this paper we develop new Newton and conjugate gradient algorithms on the Grassmann and Stiefel manifolds. These manifolds represent the constraints that arise in such areas as the symmetric eigenvalue problem, nonlinear eigenvalue problems, electronic structure computations, and signal processing. In addition to the new algorithms, we show how the geometrical framework yields penetrating new insights, allowing us to create, understand, and compare algorithms. The theory proposed here provides a taxonomy for numerical linear algebra algorithms that gives a top-level mathematical view of previously unrelated algorithms. It is our hope that developers of new algorithms and perturbation theories will benefit from the theory, methods, and examples in this paper.

### Citations

1518 | Practical Optimization - Murray, Wright - 1981
Citation Context: ...from the optimization point and the role that geodesics play “behind-the-scenes”) in the optimization references that we have consulted. Numerical Lagrange multiplier issues are discussed in [40] and [41], for example. In this paper, we give the new interpretation that the Hessian of the Lagrangian is the correct matrix for computing second derivatives along geodesics at every point, not only as an ap...

1214 | Differential Geometry, Lie Groups, and Symmetric Spaces - Helgason - 1978
Citation Context: ...used in the algorithms presented in the following section. Because we focus on computations, our approach differs from the more general (and powerful) coordinate-free methods used by modern geometers [18, 47, 54, 62, 79, 87]. Boothby [8] provides an undergraduate level introduction to the coordinate-free approach. For readers with a background in differential geometry, we wish to point out how we use extrinsic coordinate...

1173 | The Algebraic Eigenvalue Problem - Wilkinson - 1965
Citation Context: ...tion verifies that M(t) satisfies Eq. (2.44). By separately considering Yᵀ Y(t) and (I − Y Yᵀ)Y(t), we may derive Eq. (2.43). The solution of the differential equation Eq. (2.44) may be obtained [25, 88] by solving the quadratic eigenvalue problem (λ²I − Aλ + C)x = 0. Such problems are typically solved in one of three ways: (1) by solving the generalized eigenvalue problem ...
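The quadratic eigenvalue problem (λ²I − Aλ + C)x = 0 quoted in this context can be reduced to an ordinary eigenvalue problem of twice the size via a companion linearization. A minimal NumPy sketch, using random symmetric stand-ins for A and C (not data from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n)); A = A + A.T   # symmetric stand-in
C = rng.standard_normal((n, n)); C = C + C.T

# Companion linearization: (lam^2 I - lam A + C) x = 0 becomes an ordinary
# eigenvalue problem for the 2n-by-2n block matrix [[A, -C], [I, 0]],
# whose eigenvectors stack lam*x on top of x.
L = np.block([[A, -C], [np.eye(n), np.zeros((n, n))]])
lams, V = np.linalg.eig(L)

# Check each eigenpair against the original quadratic
for lam, v in zip(lams, V.T):
    x = v[n:]
    r = lam**2 * x - lam * (A @ x) + C @ x
    assert np.linalg.norm(r) < 1e-8
```

This is one common linearization; the snippet's truncated "generalized eigenvalue problem" form is an equivalent pencil formulation of the same 2n-by-2n reduction.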

808 | The Symmetric Eigenvalue Problem - Parlett - 1998

796 | A Comprehensive Introduction to Differential Geometry, Publish or Perish - Spivak - 1979
Citation Context: ...used in the algorithms presented in the following section. Because we focus on computations, our approach differs from the more general (and powerful) coordinate-free methods used by modern geometers [18, 47, 54, 62, 79, 87]. Boothby [8] provides an undergraduate level introduction to the coordinate-free approach. For readers with a background in differential geometry, we wish to point out how we use extrinsic coordinate...

772 | Methods of conjugate gradients for solving linear systems - Hestenes, Stiefel - 1952
Citation Context: ...hich is named for Eduard Stiefel, who considered its topology in the 1930s [82]. This is the same Stiefel who in collaboration with Magnus Hestenes in 1952 originated the conjugate gradient algorithm [49]. Both Stiefel’s manifold and his conjugate gradient algorithm play an important role in this paper. The geometry of the Stiefel manifold in the context of...
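The Hestenes–Stiefel conjugate gradient algorithm cited here is for linear systems Ax = b with A symmetric positive definite. A self-contained sketch of the classical (flat-space) method, with random stand-in data:

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, maxiter=None):
    """Hestenes-Stiefel CG for A x = b, A symmetric positive definite."""
    n = len(b)
    maxiter = maxiter or n
    x = np.zeros(n)
    r = b - A @ x          # residual
    p = r.copy()           # first search direction = steepest descent
    rs = r @ r
    for _ in range(maxiter):
        Ap = A @ p
        alpha = rs / (p @ Ap)      # exact line search along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p  # A-conjugate update of the direction
        rs = rs_new
    return x

# Usage: a small SPD system
rng = np.random.default_rng(1)
M = rng.standard_normal((6, 6))
A = M @ M.T + 6 * np.eye(6)
b = rng.standard_normal(6)
x = conjugate_gradient(A, b)
assert np.linalg.norm(A @ x - b) < 1e-8
```

In exact arithmetic CG converges in at most n steps; the paper's contribution is transplanting this iteration onto the Stiefel and Grassmann manifolds.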

537 | Multiple emitter location and signal parameter estimation - Schmidt - 1986
Citation Context: ...ing. The problem of computing the principal invariant subspace of a symmetric or Hermitian matrix arises frequently in signal processing applications, such as adaptive filtering and direction finding [64, 72, 6, 73, 68]. Frequently, there is some time-varying aspect to the signal processing problem, and a family of time-varying principal invariant subspaces must be tracked. The variations may be due to either the ad...

414 | ESPRIT - Estimation of Signal Parameters via Rotational Invariance Techniques - Roy, Kailath - 1989
Citation Context: ...ing. The problem of computing the principal invariant subspace of a symmetric or Hermitian matrix arises frequently in signal processing applications, such as adaptive filtering and direction finding [64, 72, 6, 73, 68]. Frequently, there is some time-varying aspect to the signal processing problem, and a family of time-varying principal invariant subspaces must be tracked. The variations may be due to either the ad...

405 | An introduction to differentiable manifolds and Riemannian geometry - Boothby - 1975
Citation Context: ...n the following section. Because we focus on computations, our approach differs from the more general (and powerful) coordinate-free methods used by modern geometers [18, 47, 54, 62, 79, 87]. Boothby [8] provides an undergraduate level introduction to the coordinate-free approach. For readers with a background in differential geometry, we wish to point out how we use extrinsic coordinates in a somewh...

320 | Semi-Riemannian geometry with applications to relativity, Pure and Applied Mathematics, 103 - O’Neill - 1983
Citation Context: ...used in the algorithms presented in the following section. Because we focus on computations, our approach differs from the more general (and powerful) coordinate-free methods used by modern geometers [18, 47, 54, 62, 79, 87]. Boothby [8] provides an undergraduate level introduction to the coordinate-free approach. For readers with a background in differential geometry, we wish to point out how we use extrinsic coordinate...

295 | Foundations of differentiable manifolds and Lie groups - Warner - 1983

265 | A multilinear singular value decomposition - Lathauwer, Moor, et al.
Citation Context: ...nding to the non-zero eigenvalues. There is also a version involving the two matrices [0 0 ∆; B 0 0; 0 ∆ᵀ 0] and [0 0 B; ∆ᵀ 0 0; 0 ∆ 0]. This SVD may be expressed in terms of the quotient SVD [45, 27]. Given the SVD, we may follow geodesics by computing Y(t) = (Y V  U) [C; S] Vᵀ. All the Y along this curve have the property that Yᵀ B Y = I. For the problem of minimizing (1/2) tr YᵀAY, line mi...

256 | Self consistent equations including exchange and correlation effects - Kohn, Sham - 1965
Citation Context: ...hods, a direct approach is simply infeasible. The fundamental theorems which make the ab initio approach tractable come from the density functional theory of Hohenberg and Kohn [50] and Kohn and Sham [55]. Density functional theory states that the ground-state energy of a quantum mechanical system of interacting electrons and ions is equal to the solution of the problem of minimizing an energy functi...

210 | A Jacobi-Davidson iteration method for linear eigenvalue problems - Sleijpen, van der Vorst - 1996

139 | The Iterative Calculation of a Few of the Lowest Eigenvalues and Corresponding Eigenvectors of Large Real-Symmetric Matrices - Davidson - 1975
Citation Context: ...erse iteration process (almost always) allowing the underlying mechanism to drive the algorithm. One trivial example where these issues arise is the generalization and derivation of Davidson’s method [74, 26, 22]. In this context there is some question as to the interpretation of D − λI as a preconditioner. One interpretation is that it preconditions the eigenproblem by creating better eigenvalue spacings. We...

136 | Inhomogeneous electron gas - Hohenberg, Kohn - 1964
Citation Context: ...dense versus sparse methods, a direct approach is simply infeasible. The fundamental theorems which make the ab initio approach tractable come from the density functional theory of Hohenberg and Kohn [50] and Kohn and Sham [55]. Density functional theory states that the ground-state energy of a quantum mechanical system of interacting electrons and ions is equal to the solution of the problem of mini...

130 | Eigenvalues of Matrices - Chatelin - 1993

121 | Sequential quadratic programming - Boggs, Tolle - 1995

120 | Space-time adaptive processing for airborne radar - Ward - 1994
Citation Context: ...lly arises from updating the sample covariance matrix estimate; Eq. (4.12), the more general case, arises from a time-varying interference scenario, e.g., interference for airborne surveillance radar [85, 77]. Solving this eigenvalue problem via the eigenvalue or singular value decompositions requires a large computational effort. Furthermore, only the span of the first few principal eigenvectors may be r...

118 | Unified approach for molecular dynamics and density-functional theory - Car, Parrinello - 1985
Citation Context: ...re and dynamics of surfaces [51, 61], the nature of point defects in crystals [60], and the diffusion and interaction of impurities in bulk materials [84]. Less than ten years ago, Car and Parrinello [13] in a watershed paper proposed minimization through simulated annealing. Teter and Gillan [42, 83] later introduced conjugate gradient based schemes and demonstrated an order of magnitude increase in...

116 | Error and perturbation bounds for subspaces associated with certain eigenvalue problems - Stewart - 1973
Citation Context: ...ethod for Invariant Subspace Computations. Methods for refining estimates for invariant subspace computations have been proposed by Chatelin [15, 16], Dongarra, Moler, and Wilkinson [29], and Stewart [80]. Demmel [28, §3] proposes a unified approach by showing that they are all solutions to a Riccati equation. These algorithms, when applied to symmetric matrices, are all variations on our geometrical...

99 | An updating algorithm for subspace tracking - Stewart - 1992

81 | Tracking a few extreme singular values and vectors in signal processing - Comon, Golub - 1990
Citation Context: ...putations. Approaches to this problem may be classified as standard iterative methods [44], methods exploiting rank 1 updates [64, 53, 73, 94, 58, 81, 14, 57], i.e., Eq. (4.11), Lanczos based methods [20, 91, 90], gradient based methods [64, 92, 10], conjugate gradient based methods [38, 19, 71, 93, 75, 36, 78], which are surveyed by Edelman and Smith [31], Rayleigh-Ritz based methods [37, 20], and methods th...

79 | Iterative minimization techniques for ab initio total-energy calculations: molecular dynamics and conjugate gradients - Payne, Teter, et al. - 1992
Citation Context: ...is a function which we leave unspecified in this discussion. In full generality the X are complex, but the real case applies for physical systems of large extent that we envisage for this application [66], and we, accordingly, take X to be real in this discussion. Recent advances in computers have enabled such calculations on systems with several hundreds of atoms [4, 11]. Further improvements in memo...

68 | The Davidson method - Crouzeix, Philippe, et al. - 1994
Citation Context: ...erse iteration process (almost always) allowing the underlying mechanism to drive the algorithm. One trivial example where these issues arise is the generalization and derivation of Davidson’s method [74, 26, 22]. In this context there is some question as to the interpretation of D − λI as a preconditioner. One interpretation is that it preconditions the eigenproblem by creating better eigenvalue spacings. We...

66 | Numerical Linear Algebra and Applications - Datta - 1995
Citation Context: ...tion verifies that M(t) satisfies Eq. (2.44). By separately considering Yᵀ Y(t) and (I − Y Yᵀ)Y(t), we may derive Eq. (2.43). The solution of the differential equation Eq. (2.44) may be obtained [25, 88] by solving the quadratic eigenvalue problem (λ²I − Aλ + C)x = 0. Such problems are typically solved in one of three ways: (1) by solving the generalized eigenvalue problem ...


53 | Algorithms for the regularization of ill-conditioned least squares problems - Eldén - 1977
Citation Context: ...iag(p, p − 1, . . . , 1), then the optimum solution to maximizing F over the Stiefel manifold yields the eigenvectors corresponding to the p largest eigenvalues. For the orthogonal Procrustes problem [32], F(Y) = (1/2)‖AY − B‖²_F (A m-by-n, B m-by-p, both arbitrary), F_Y = AᵀAY − AᵀB and F_YY(∆) = AᵀA∆ − Y∆ᵀAᵀAY. Note that Yᵀ F_YY(∆) is skew-symmetric...
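The Procrustes derivative quoted in this context, F_Y = AᵀAY − AᵀB for F(Y) = (1/2)‖AY − B‖²_F, is easy to sanity-check numerically. A sketch with random stand-in matrices (the dimensions m, n, p are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, p = 5, 4, 3
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, p))
Y = np.linalg.qr(rng.standard_normal((n, p)))[0]   # a point on the Stiefel manifold

F = lambda Z: 0.5 * np.linalg.norm(A @ Z - B, "fro") ** 2
FY = A.T @ A @ Y - A.T @ B                         # unconstrained derivative from the snippet

# Central finite difference of F in a random direction D should match <FY, D>
D = rng.standard_normal((n, p))
h = 1e-6
fd = (F(Y + h * D) - F(Y - h * D)) / (2 * h)
assert abs(fd - np.sum(FY * D)) < 1e-6
```

Since F is quadratic in Y, the central difference is exact up to rounding, so the agreement is tight. The snippet's skew-symmetry remark concerns the second-derivative term F_YY(∆), which enters the manifold Hessian, not this first-order check.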

51 | Some history of the conjugate gradient and Lanczos algorithms: 1949-1976 - Golub, O'Leary - 1989
Citation Context: ...required, whereas decomposition techniques compute all eigenvectors and eigenvalues, resulting in superfluous computations. Approaches to this problem may be classified as standard iterative methods [44], methods exploiting rank 1 updates [64, 53, 73, 94, 58, 81, 14, 57], i.e., Eq. (4.11), Lanczos based methods [20, 91, 90], gradient based methods [64, 92, 10], conjugate gradient based methods [38, 1...

48 | Optimization techniques on Riemannian manifolds - Smith - 1994

47 | Riemannian Geometry: A Modern Introduction - Chavel - 1993

45 | A trace minimization algorithm for the generalized eigenvalue problem - Sameh, Wisniewski - 1982
Citation Context: ...ric) on the sphere (p = 1) is easy and has been proposed in many sources. The correct model algorithm for p > 1 presented in this paper is new. We were at first bewildered by the number of variations [2, 9, 33, 34, 3, 39, 35, 36, 69, 70, 38, 67, 19, 46, 93] most of which propose “new” algorithms for conjugate gradient for the eigenvalue problem. Most of these algorithms are for computing extreme eigenvalues and corresponding eigenvectors. It is importan...

44 | Algorithms for Large Symmetric - Cullum, Willoughby - 1985
Citation Context: ...solved more completely by solving the 2p-by-2p eigenvalue problem. This does not follow the geodesic directly, but captures the main idea of the block Lanczos algorithm which in some sense is optimal [23, 24]. If one is really considering the pure linear symmetric eigenvalue problem then pure conjugate gradient style procedures must be inferior to Lanczos. Every step of all proposed non-preconditioned con...

42 | Geometric Optimization Methods for Adaptive Filtering - Smith - 1993
Citation Context: ...optimization problems and subspace tracking was explored by Smith [75]. In this paper we use numerical linear algebra techniques to simplify the ideas and algorithms presented there so that the differential geometric ideas seem natural and illuminating to the numerical...

37 | Optimality of high resolution array processing using the eigensystem approach - Bienvenu, Kopp - 1983
Citation Context: ...ing. The problem of computing the principal invariant subspace of a symmetric or Hermitian matrix arises frequently in signal processing applications, such as adaptive filtering and direction finding [64, 72, 6, 73, 68]. Frequently, there is some time-varying aspect to the signal processing problem, and a family of time-varying principal invariant subspaces must be tracked. The variations may be due to either the ad...

37 | Differential geometry of Grassmann manifolds - Wong - 1967
Citation Context: ...red in the same way. This amounts to post-multiplying everything by V, or for that matter, any p-by-p orthogonal matrix. The path length between Y0 and Y(t) (distance between subspaces) is given by [89] (Eq. 2.67) as d(Y(t), Y0) = t‖H‖_F = t(σ₁² + · · · + σₚ²)^(1/2), where σᵢ are the diagonal elements of Σ. (Actually, this is only true for t small enough to avoid the issue of conjugate points, e.g., lo...
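The subspace distance in this context is the 2-norm of the principal angles between the two subspaces, which can be computed from an SVD. A sketch (the rotation-by-0.3 example is illustrative, not from the paper):

```python
import numpy as np

def grassmann_distance(Y0, Y1):
    """Arc-length distance between the subspaces spanned by the orthonormal
    columns of Y0 and Y1: the 2-norm of the principal angles between them."""
    s = np.linalg.svd(Y0.T @ Y1, compute_uv=False)
    theta = np.arccos(np.clip(s, -1.0, 1.0))   # principal angles
    return np.linalg.norm(theta)

# Usage: a 2-plane in R^4 versus a copy rotated by 0.3 rad in the (e1, e3) plane
Y0 = np.eye(4)[:, :2]
c, s = np.cos(0.3), np.sin(0.3)
R = np.array([[c, 0, -s, 0],
              [0, 1,  0, 0],
              [s, 0,  c, 0],
              [0, 0,  0, 1]])
d = grassmann_distance(Y0, R @ Y0)
assert abs(d - 0.3) < 1e-12
```

Along the geodesic Y(t) of the snippet, the principal angles grow as tσᵢ, so this formula reproduces d = t(σ₁² + · · · + σₚ²)^(1/2) for small t.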

36 | Three methods for refining estimates of invariant subspaces - Demmel - 1987

35 | Large dense numerical linear algebra in 1993: The parallel computing influence - Edelman - 1993
Citation Context: ...natural and illuminating to the numerical linear algebra and optimization communities. The first author’s original motivation for studying this problem came from a response to a linear algebra survey [30], which claimed to be using conjugate gradient to solve large dense eigenvalue problems. The second and third authors were motivated by two distinct engineering and physics applications. The salient q...

33 | Richtungsfelder und Fernparallelismus in Mannigfaltigkeiten - Stiefel - 1936
Citation Context: ...ion problem is defined on the set of n-by-p orthonormal matrices. This constraint surface is known as the Stiefel manifold, which is named for Eduard Stiefel, who considered its topology in the 1930s [82]. This is the same Stiefel who in collaboration with Magnus Hestenes in 1952 originated the conjugate gradient algorithm [49]. Both Stiefel’s manifold and his conjugate gradient algorithm play an impo...

31 | Improving the accuracy of computed Eigenvalues and Eigenvectors - Dongarra, Moler, et al. - 1983
Citation Context: ...4.8. Newton’s Method for Invariant Subspace Computations. Methods for refining estimates for invariant subspace computations have been proposed by Chatelin [15, 16], Dongarra, Moler, and Wilkinson [29], and Stewart [80]. Demmel [28, §3] proposes a unified approach by showing that they are all solutions to a Riccati equation. These algorithms, when applied to symmetric matrices, are all variations o...

30 | A survey of conjugate gradient algorithms for solution of extreme eigen-problem of a symmetric matrix - Yang, Sarkar, et al. - 1989
Citation Context: ...ric) on the sphere (p = 1) is easy and has been proposed in many sources. The correct model algorithm for p > 1 presented in this paper is new. We were at first bewildered by the number of variations [2, 9, 33, 34, 3, 39, 35, 36, 69, 70, 38, 67, 19, 46, 93] most of which propose “new” algorithms for conjugate gradient for the eigenvalue problem. Most of these algorithms are for computing extreme eigenvalues and corresponding eigenvectors. It is importan...

28 | Second derivatives for optimizing eigenvalues of symmetric matrices - Overton, Womersley - 1995
Citation Context: ...ng the largest eigenvalue of A(x), an n-by-n real symmetric matrix-valued function of x ∈ Rᵐ when it is known that at the minimum, exactly p of the largest eigenvalues coalesce. Overton and Womersley [63] formulated SQPs for this problem using Lagrange multipliers and sophisticated perturbation theory. The constraint in their SQP was that the p largest eigenvalues were identical. We will here consider...

26 | A singular value decomposition updating algorithm for subspace tracking - Moonen, Dooren, et al. - 1992
Citation Context: ...ues compute all eigenvectors and eigenvalues, resulting in superfluous computations. Approaches to this problem may be classified as standard iterative methods [44], methods exploiting rank 1 updates [64, 53, 73, 94, 58, 81, 14, 57], i.e., Eq. (4.11), Lanczos based methods [20, 91, 90], gradient based methods [64, 92, 10], conjugate gradient based methods [38, 19, 71, 93, 75, 36, 78], which are surveyed by Edelman and Smith [31]...

24 | Adaptive eigendecomposition of data covariance matrices based on first-order perturbations - Champagne - 1994
Citation Context: ...ues compute all eigenvectors and eigenvalues, resulting in superfluous computations. Approaches to this problem may be classified as standard iterative methods [44], methods exploiting rank 1 updates [64, 53, 73, 94, 58, 81, 14, 57], i.e., Eq. (4.11), Lanczos based methods [20, 91, 90], gradient based methods [64, 92, 10], conjugate gradient based methods [38, 19, 71, 93, 75, 36, 78], which are surveyed by Edelman and Smith [31]...

20 | Fast subspace decomposition - Xu, Kailath - 1994

19 | Simultaneous Newton’s iteration for the eigenproblem. In Defect correction methods (Oberwolfach - Chatelin - 1983
Citation Context: ...ubspace track. 4.8. Newton’s Method for Invariant Subspace Computations. Methods for refining estimates for invariant subspace computations have been proposed by Chatelin [15, 16], Dongarra, Moler, and Wilkinson [29], and Stewart [80]. Demmel [28, §3] proposes a unified approach by showing that they are all solutions to a Riccati equation. These algorithms, when applied to sym...

18 | Recursive updating of the eigenvalue decomposition of a covariance matrix - Yu - 1991

17 | New iterative methods for solution of the eigenproblem - Bradbury, Fletcher - 1966
Citation Context: ...herefore, a flat space CG method modified by projecting search directions to the constraint’s tangent space will converge superlinearly. This is basically the method proposed by Bradbury and Fletcher [9] and others for the single eigenvector case. For the Grassmann (invariant subspace) case, we have performed line searches of the function φ(t) = tr Q(t)ᵀAQ(t), where Q(t)R(t) := Y + t∆ is the compact...
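The line-search objective φ(t) = tr Q(t)ᵀAQ(t) quoted in this context, with Q(t)R(t) the compact QR factorization of Y + t∆, is straightforward to evaluate. A sketch with random stand-in data and a crude grid search in place of a proper one-dimensional minimizer:

```python
import numpy as np

def phi(t, Y, Delta, A):
    """Line-search objective from the snippet: orthonormalize Y + t*Delta
    by compact QR, then evaluate the Rayleigh quotient trace."""
    Q, _ = np.linalg.qr(Y + t * Delta)
    return np.trace(Q.T @ A @ Q)

# Stand-in problem data (illustrative only)
rng = np.random.default_rng(3)
n, p = 6, 2
A = rng.standard_normal((n, n)); A = A + A.T
Y = np.linalg.qr(rng.standard_normal((n, p)))[0]
Delta = rng.standard_normal((n, p))

# Grid search over step lengths; t = 0 is included, so the best value
# can be no worse than the starting point
ts = np.linspace(0.0, 1.0, 101)
t_best = min(ts, key=lambda t: phi(t, Y, Delta, A))
assert phi(t_best, Y, Delta, A) <= phi(0.0, Y, Delta, A)
```

In practice one would use a derivative-based univariate minimizer rather than a grid, but the QR re-orthonormalization at each trial step is the essential ingredient.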

13 | On conjugate gradient-like methods for eigen-like problems - Edelman, Smith - 1996
Citation Context: ...for the eigenvalue problem. Most of these algorithms are for computing extreme eigenvalues and corresponding eigenvectors. It is important to note that none of these methods are equivalent to Lanczos [31]. It seems that the correct approach to the conjugate gradient algorithm for invariant subspaces (p > 1) has been more elusive. We are only aware of three papers [2, 70, 36] that directly consider con...

12 | Calculation of vacancy formation energy in aluminum - Gillan - 1989
Citation Context: ...ffusion and interaction of impurities in bulk materials [84]. Less than ten years ago, Car and Parrinello [13] in a watershed paper proposed minimization through simulated annealing. Teter and Gillan [42, 83] later introduced conjugate gradient based schemes and demonstrated an order of magnitude increase in the convergence rate. These initial approaches, however, ignored entirely the effects of curvature...
Citation Context ...ffusion and interaction of impurities in bulk materials [84]. Less than ten years ago, Car and Parrinello [13] in a watershed paper proposed minimization through simulated annealing. Teter and Gillan =-=[42, 83]-=- later introduced conjugate gradient based schemes and demonstrated an order of magnitude increase in the convergence rate. These initial approaches, however, ignored entirely the effects of curvature... |