## Robust principal component analysis: Exact recovery of corrupted low-rank matrices via convex optimization (2009)

Venue: Advances in Neural Information Processing Systems 22

Citations: 43 (3 self)

### BibTeX

```bibtex
@inproceedings{Wright09robustprincipal,
  author    = {John Wright},
  title     = {Robust principal component analysis: Exact recovery of corrupted low-rank matrices via convex optimization},
  booktitle = {Advances in Neural Information Processing Systems 22},
  year      = {2009}
}
```

### Abstract

The supplementary material to the NIPS version of this paper [4] contains a critical error, which was discovered several days before the conference. Unfortunately, it was too late to withdraw the paper from the proceedings. Fortunately, since that time, a correct analysis of the proposed convex programming relaxation has been developed by Emmanuel Candes of Stanford University. That analysis is reported in a joint paper, "Robust Principal Component Analysis?" by Emmanuel Candes, Xiaodong Li, Yi Ma and John Wright.

### Citations

3661 |
Convex optimization
- Boyd, Vandenberghe
Citation Context: ... for α ∈ (0, 1), ‖S^{Ω^c}_γ[A + B]‖_F ≤ ‖S^{Ω^c}_{αγ}[A]‖_F + ‖S^{Ω^c}_{(1−α)γ}[B]‖_F. (88) Proof. Notice that for scalars x, |S_γ[x]| is a convex nonnegative function, so for X ∈ R^{m×m}, ‖S^{Ω^c}_γ[X]‖_F is again convex (see, e.g., [3], Example 3.14), and so ‖S^{Ω^c}_γ[A + B]‖_F = ‖S^{Ω^c}_γ[α(A/α) + (1 − α)(B/(1 − α))]‖_F ≤ α‖S^{Ω^c}_γ[A/α]‖_F + (1 − α)‖S^{Ω^c}_γ[B/(1 − α)]‖_F = ‖S^{Ω^c}_{αγ}[A]‖_F + ‖S^{Ω^c}_{(1−α)γ}[B]‖_F. Hence, for mat...
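The soft-thresholding operator S_γ quoted in this snippet is simple to state in code. A minimal NumPy sketch (the function name `soft_threshold` is mine, not the paper's), which also checks the scaling identity α·S_γ[x/α] = S_{αγ}[x] that produces the two right-hand terms of inequality (88):

```python
import numpy as np

def soft_threshold(x, gamma):
    """Elementwise soft-thresholding: S_gamma[x] = sign(x) * max(|x| - gamma, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

# Check the scaling identity alpha * S_gamma[x/alpha] = S_{alpha*gamma}[x]
# on random data; it holds entrywise, hence also in Frobenius norm.
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
gamma, alpha = 0.5, 0.3
lhs = alpha * soft_threshold(x / alpha, gamma)
rhs = soft_threshold(x, alpha * gamma)
assert np.allclose(lhs, rhs)
```

The identity holds because scaling the input by 1/α scales both the magnitude and the effective threshold, so α·S_γ[x/α] shrinks x by exactly αγ.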

2447 |
Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography
- Fischler, Bolles
- 1981
Citation Context: ...ifying PCA have been explored in the literature. These approaches include influence function techniques [6, 7], multivariate trimming [8], alternating minimization [9], and random sampling techniques [10]. Unfortunately, none of these existing approaches yields a polynomial-time algorithm with strong performance guarantees. In this paper, we consider an idealization of the robust PCA problem, in whi...

2008 |
Principal Component Analysis
- Jolliffe
- 2002
Citation Context: ... low-dimensional subspace. In other words, if we stack all the observations as column vectors of a matrix M ∈ R^{m×n}, the matrix should be (approximately) low rank. Principal component analysis (PCA) [1, 5] seeks the best (in an ℓ2 sense) such low-rank representation of the given data matrix. It enjoys a number of optimality properties when the data are only mildly corrupted by small noise, and can be...

1823 |
Robust Statistics
- Huber
- 1981
Citation Context: ... irrelevant to the structure we are trying to identify. A number of natural approaches to robustifying PCA have been explored in the literature. These approaches include influence function techniques [6, 7], multivariate trimming [8], alternating minimization [9], and random sampling techniques [10]. Unfortunately, none of these existing approaches yields a polynomial-time algorithm with strong performa...

1682 |
A Global Geometric Framework for Nonlinear Dimensionality Reduction
- Tenenbaum, Silva, et al.
- 2000
Citation Context: ...imited number of observations requires some assumption that the data have low intrinsic complexity, e.g., that they are low-rank [1], sparse in some basis [2], or lie on some low-dimensional manifold [3, 4]. Perhaps the simplest useful assumption is that the observations all lie near some low-dimensional subspace. In other words, if we stack all the observations as column vectors of a matrix M ∈ R^{m×n},...

1651 | Atomic decomposition by basis pursuit
- Chen, Donoho, et al.
- 2001
Citation Context: ... full play here: meaningful inference with limited number of observations requires some assumption that the data have low intrinsic complexity, e.g., that they are low-rank [15], sparse in some basis [10], or lie on some low-dimensional manifold [27, 2]. Perhaps the simplest useful assumption is that the observations all lie near some low-dimensional subspace. In other words, if we stack all the obser...

729 | Laplacian eigenmaps for dimensionality reduction and data representation
- Belkin, Niyogi
- 2003
Citation Context: ...imited number of observations requires some assumption that the data have low intrinsic complexity, e.g., that they are low-rank [1], sparse in some basis [2], or lie on some low-dimensional manifold [3, 4]. Perhaps the simplest useful assumption is that the observations all lie near some low-dimensional subspace. In other words, if we stack all the observations as column vectors of a matrix M ∈ R^{m×n},...

652 | Decoding by linear programming
- Candes, Tao
- 2005
Citation Context: ...ations in the presence of arbitrary, but sparse errors. These results imply that for generic systems of equations, it is possible to correct a constant fraction of arbitrary errors in polynomial time [11]. This is achieved by employing the ℓ1-norm as a convex surrogate for the highly-nonconvex ℓ0-norm. A parallel (and still emerging) line of work concerns the problem of computing low-rank matrix s...

432 | From few to many: illumination cone models for face recognition under variable lighting and pose
- Georghiades, Belhumeur, et al.
- 2001
Citation Context: ...bust PCA algorithm offers a principled way of removing such spatially localized artifacts. Figure 3 illustrates the results of our algorithm on images from subsets 1-3 of the Extended Yale B database [31]. The proposed algorithm removes the specularities in the eyes and the shadows around the nose region. This technique is potentially useful for pre-processing training images in face recogni...

363 | A fast iterative shrinkage-thresholding algorithm for linear inverse problems
- Beck, Teboulle
- 2009
Citation Context: ... without scalable algorithms for solving the associated convex program. To this end, we discuss how a near-solution to this convex program can be obtained relatively efficiently via proximal gradient [18, 19] and iterative thresholding techniques, similar to those proposed for matrix completion in [20, 21]. For large matrices, these algorithms are significantly faster and more scalable than general-purpos...

344 | CoSaMP: Iterative signal recovery from incomplete and inaccurate samples
- Needell, Tropp
- 2008
Citation Context: ...is operator. Parallel implementation on GPUs or distributed implementation across multiple machines are other interesting avenues for future improvements. Finally, in the spirit of the recent work of [25] on sparse reconstruction and [23] on rank minimization, it may be possible to find efficient greedy algorithms for robust PCA whose error correction capability approaches that of the semidefinite pro...

343 | For most large underdetermined systems of linear equations, the minimal l1 norm solution is also the sparsest solution
- Donoho
Citation Context: ...er, recent advances in our understanding of the nuclear norm heuristic for low-rank solutions to matrix equations [12, 13] and the ℓ1 heuristic for sparse solutions to underdetermined linear systems [11, 24] suggest that there might be circumstances under which solving the tractable problem (2) perfectly recovers the low-rank matrix A0. The main result of this paper will be to show that this is indeed t...

336 | Lambertian reflectance and linear subspaces
- Basri, Jacobs
- 2003
Citation Context: ...hadows and specularities from face images. Face recognition is another domain in computer vision where low-dimensional linear models have received a great deal of attention, mostly due to the work of [30]. The key observation is that under certain idealized circumstances, images of the same face under varying illumination lie near an approximately nine-dimensional linear subspace known as the harmonic ...

314 | Exact matrix completion via convex optimization
- Candès, Recht
Citation Context: ...in [1] remove the necessity to assume any random model on the matrix to be recovered. All that needs to be assumed is that the singular vectors are incoherent with the standard basis, in the sense of [2, 3]. The correct result states that rank-r matrices with incoherence parameter µ can be recovered from ρm² errors, as long as r < Cm/(µ log² m). Please see [1] for more discussion of this result. [1] als...

251 | Smooth minimization of nonsmooth functions
- Nesterov
Citation Context: ... without scalable algorithms for solving the associated convex program. To this end, we discuss how a near-solution to this convex program can be obtained relatively efficiently via proximal gradient [18, 19] and iterative thresholding techniques, similar to those proposed for matrix completion in [20, 21]. For large matrices, these algorithms are significantly faster and more scalable than general-purpos...

226 |
The approximation of one matrix by another of lower rank
- Eckart, Young
- 1936
Citation Context: ...rse of dimensionality is in full play here: meaningful inference with limited number of observations requires some assumption that the data have low intrinsic complexity, e.g., that they are low-rank [1], sparse in some basis [2], or lie on some low-dimensional manifold [3, 4]. Perhaps the simplest useful assumption is that the observations all lie near some low-dimensional subspace. In other words, ...

195 | A Singular Value Thresholding Algorithm for Matrix Completion
- Cai, Candès, et al.
Citation Context: ... a near-solution to this convex program can be obtained relatively efficiently via proximal gradient [18, 19] and iterative thresholding techniques, similar to those proposed for matrix completion in [20, 21]. For large matrices, these algorithms are significantly faster and more scalable than general-purpose convex program solvers. Our analysis also implies an extension of existing results for the low-ra...

168 |
Asymptotic Theory of Finite Dimensional Normed Spaces
- Milman, Schechtman
Citation Context: ... defined as α_{X,d,µ}(t) = sup { 1 − µ(A_t) : A ⊂ X, µ(A) ≥ 1/2 }, (47) where A_t = {x | d(x, A) < t} is a t-neighborhood of A. The concentration functions for W^m_r and SO(r) are well known: Fact 4.4 ([24], Theorems 6.5.1 and 6.7.1). For r < m, the manifold W^m_r with distance d(X, Y) = ‖X − Y‖_F and Haar measure µ has concentration function α_{W,d,µ}(t) ≤ √(π/8) exp(−mt²/8). (48) Similarly, on S...

130 | The power of convex relaxation: Near-optimal matrix completion
- Candès, Tao
Citation Context: ...in [1] remove the necessity to assume any random model on the matrix to be recovered. All that needs to be assumed is that the singular vectors are incoherent with the standard basis, in the sense of [2, 3]. The correct result states that rank-r matrices with incoherence parameter µ can be recovered from ρm² errors, as long as r < Cm/(µ log² m). Please see [1] for more discussion of this result. [1] als...

93 | A framework for robust subspace learning
- Torre, Black
- 2003
Citation Context: ... irrelevant to the structure we are trying to identify. A number of natural approaches to robustifying PCA have been explored in the literature. These approaches include influence function techniques [6, 7], multivariate trimming [8], alternating minimization [9], and random sampling techniques [10]. Unfortunately, none of these existing approaches yields a polynomial-time algorithm with strong performa...

89 |
High-dimensional data analysis: The curses and blessings of dimensionality (aide-memoire)
- Donoho
- 2000
Citation Context: ...lying that in high-dimensional observation spaces, sparse and low-rank structures can be efficiently and exactly separated. This behavior is an example of the so-called blessing of dimensionality [17]. However, this result would remain a theoretical curiosity without scalable algorithms for solving the associated convex program. To this end, we discuss how a near-solution to this convex program ca...

75 | Matrix completion with noise
- Candès, Plan
Citation Context: ...low-rank matrix solutions to underdetermined linear equations [12, 13]. One of the most striking results concerns the exact completion of low-rank matrices from only a small fraction of their entries [13, 14, 15, 16]. There, a similar convex relaxation is employed, replacing the highly non-convex matrix rank with the nuclear norm (or sum of singular values). The robust PCA problem outlined above combines aspect...

74 |
Robust estimates, residuals and outlier detection with multi-response data
- Gnanadesikan, Kettenring
- 1972
Citation Context: ...e are trying to identify. A number of natural approaches to robustifying PCA have been explored in the literature. These approaches include influence function techniques [6, 7], multivariate trimming [8], alternating minimization [9], and random sampling techniques [10]. Unfortunately, none of these existing approaches yields a polynomial-time algorithm with strong performance guarantees. In this p...

69 | Matrix completion from a few entries
- Keshavan, Montanari, et al.
Citation Context: ...low-rank matrix solutions to underdetermined linear equations [12, 13]. One of the most striking results concerns the exact completion of low-rank matrices from only a small fraction of their entries [13, 14, 15, 16]. There, a similar convex relaxation is employed, replacing the highly non-convex matrix rank with the nuclear norm (or sum of singular values). The robust PCA problem outlined above combines aspect...

69 | On the Approximability of Minimizing Nonzero Variables or Unsatisfied Relations
- Amaldi, Kann
- 1998
Citation Context: ...nding them to non-square matrices. In a sense, this problem subsumes both the low rank matrix completion problem and the ℓ0-minimization problem, both of which are NP-hard and hard to approximate [23]. Notice that this is not an "equivalence" result for (1) and (2) – rather than asserting that the solutions of these two problems are equal with high probability, we directly prove that the convex ...

67 | An accelerated proximal gradient algorithm for nuclear norm regularized least squares Problems
- Toh
Citation Context: ... a near-solution to this convex program can be obtained relatively efficiently via proximal gradient [18, 19] and iterative thresholding techniques, similar to those proposed for matrix completion in [20, 21]. For large matrices, these algorithms are significantly faster and more scalable than general-purpose convex program solvers. Our analysis also implies an extension of existing results for the low-ra...

60 | Statistical modeling of complex backgrounds for foreground object detection
- Li, Huang, et al.
- 2004
Citation Context: ...f a matrix D, this matrix can be expressed as the sum of a low-rank background matrix and a sparse error matrix representing the activity in the scene. We illustrate this idea using two examples from [29] (see Figure 2). In Figure 2(a)-(c), the video sequence consists of 200 frames of a scene in an airport. There is no significant change in illumination in the video, but a lot of activity in the fore...

45 | Fixed-point continuation for ℓ1-minimization: Methodology and convergence
- Hale, Yin, et al.
Citation Context: ... A0 = arg min ‖A‖∗ subject to A(i, j) = A0(i, j) ∀ (i, j) ∈ Υ. (7) Finally, in Section 6 we extend existing iterative thresholding techniques for solving equality-constrained ℓ1-norm minimization problems [31] and nuclear norm minimization problems [4] to give an algorithm that produces a near-solution to (2) more efficiently and scalably than off-the-shelf interior point methods. 2.1 Relationship to existi...

41 |
Guaranteed minimum rank solutions of matrix equations via nuclear norm minimization
- Recht, Fazel, et al.
Citation Context: ... as a convex surrogate for the highly-nonconvex ℓ0-norm. A parallel (and still emerging) line of work concerns the problem of computing low-rank matrix solutions to underdetermined linear equations [12, 13]. One of the most striking results concerns the exact completion of low-rank matrices from only a small fraction of their entries [13, 14, 15, 16]. There, a similar convex relaxation is employed, re...

39 | Robust L1 norm factorization in the presence of outliers and missing data by alternative convex programming
- Ke, Kanade
Citation Context: ...mber of natural approaches to robustifying PCA have been explored in the literature. These approaches include influence function techniques [6, 7], multivariate trimming [8], alternating minimization [9], and random sampling techniques [10]. Unfortunately, none of these existing approaches yields a polynomial-time algorithm with strong performance guarantees. In this paper, we consider an idealizat...

34 |
Counting Faces of Randomly Projected Polytopes When the Projection Radically Lowers Dimension,” preprint
- Donoho, Tanner
- 2007
Citation Context: ...e used such a geometric interpretation to give a sharp characterization of the phase transition between correct recovery of sparse solutions, building on a long line of work in discrete geometry (see [14] and references therein). For our problem, either by leveraging results in geometry or by more careful probabilistic analysis, it may be possible to better understand the boundary of C, and hence bett...

33 | Fast convex optimization algorithms for exact recovery of a corrupted low-rank matrix
- Lin, Ganesh, et al.
- 2009
Citation Context: ...ve first-order method based on the proximal gradient approach of [18], which we briefly introduce here. For further discussion of this approach, as well as alternatives based on duality, please see [27]. This algorithm solves a slightly relaxed version of (2), in which the equality constraint is replaced with a penalty term: min µ‖A‖∗ + λµ‖E‖1 + (1/2)‖D − A − E‖²_F. (9) Here, µ is a small constant; as...
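The penalized program quoted in this snippet, min µ‖A‖∗ + λµ‖E‖1 + (1/2)‖D − A − E‖²_F, admits a simple proximal-gradient iteration: take a gradient step on the quadratic term, then apply singular value thresholding to A and soft-thresholding to E. A rough NumPy sketch under my own naming (not the authors' released code, and without the continuation and acceleration refinements discussed in [18, 27]):

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: the prox operator of tau * ||.||_* (nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ (np.maximum(s - tau, 0.0)[:, None] * Vt)

def soft(X, tau):
    """Elementwise soft-thresholding: the prox operator of tau * ||.||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def rpca_prox(D, mu=1e-2, lam=None, iters=300, step=0.5):
    """Proximal gradient on mu*||A||_* + lam*mu*||E||_1 + 0.5*||D - A - E||_F^2.

    A sketch of the penalized formulation; step = 0.5 matches 1/L for the
    gradient of the quadratic term in the joint variable (A, E).
    """
    m, n = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))  # a common heuristic weighting, an assumption here
    A = np.zeros((m, n))
    E = np.zeros((m, n))
    for _ in range(iters):
        R = A + E - D                       # gradient of 0.5*||D - A - E||_F^2 w.r.t. A and E
        A = svt(A - step * R, step * mu)    # prox step on the nuclear-norm term
        E = soft(E - step * R, step * lam * mu)  # prox step on the l1 term
    return A, E
```

With µ small the penalty tightly enforces A + E ≈ D, approximating the equality-constrained program (2); driving µ toward zero along a continuation path is the refinement those papers describe.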

30 |
Statistics on Special Manifolds
- Chikuse
- 2003
Citation Context: ...ogonal matrix, then the joint distribution of Ũ, U, and Û is invariant under the map Ũ ↦ Ũ diag(I, R). (67) This follows from the right orthogonal invariance of the Haar measure on W^m_{m−1} (see e.g., [11], Section 1.4.3). Since this map preserves U and V, it also preserves Θ. Hence, the term of interest, ‖∑_k (π_Ω π_Θ π_Ω)^k [Û V̂*]‖_{2,2}, is equal in distribution to ‖∑_k (π_Ω π_Θ π_Ω)^k [Û R V̂*]‖_{2,2}...

17 | Dense error correction via ℓ1 minimization
- Wright, Ma
Citation Context: ...s improved result is a probabilistic analysis of an iterative refinement technique for producing a dual vector that certifies optimality of the pair (A0, E0). This approach extends techniques used in [11, 26], with additional care required to handle an operator norm constraint arising from the presence of the nuclear norm in (2). For further details we refer the interested reader to [22]. Finally, while T...

16 |
Sparse and low-rank matrix decompositions
- Chandrasekaran, Sanghavi, et al.
- 2009
Citation Context: ...th probability at least 1 − exp(−Cm), A0 = arg min ‖A‖∗ subject to A(i, j) = A0(i, j) ∀ (i, j) ∈ Υ, (6) and the minimizer is uniquely defined. Relationship to existing work. Contemporaneous results due to [25] show that for A0 distributed according to the random orthogonal model, and E0 with Bernoulli support, correct recovery occurs with high probability provided ‖E0‖0 ≤ C m^{1.5} log(m)^{−1} max(r, log m)^{−1}...

11 | Spectral norm of products of random and deterministic matrices
- Vershynin
Citation Context: ... 3.18, there is such a net with size at most exp(4m). Moving from ‖A‖_{2,2} = sup_{x,y∈S^{m−1}} x*Ay to our product of nets loses at most a constant factor in the estimate: ‖A‖_{2,2} ≤ 4 sup_{x,y∈N} x*Ay (see e.g., [29], Proposition 2.6). We will argue that for our A of interest, f(A) = x*Ay concentrates, and union bound over all exp(8m) pairs in N × N. 4.2.2 Representation and uniqueness of W0. Lemma 4.6 (Representa...

9 |
Sparse signal recovery using Markov random fields
- Cevher, Duarte, et al.
Citation Context: ...mple, general tool to effectively separate low-dimensional and sparse structures occurring in real visual data. Appropriately harnessing additional structure (e.g., the spatial coherence of the error [28]) may yield even more effective algorithms. A recovery is declared correct if the recovered Â satisfies ‖Â − A0‖_F / ‖A0‖_F < 0.01. Figure 1 (left) plots the fraction of correct recoveries. White denotes perfect recovery in all experim...

2 |
Robust principal component analysis,” http://arxiv.org/abs/0912.3599
- Candes, Li, et al.
- 2009
Citation Context: ...trengthened. The paper asserted that the convex programming heuristic successfully recovers certain random matrices of rank r < Cm/log m from errors affecting ρm² of the m² entries. New results in [1] remove the necessity to assume any random model on the matrix to be recovered. All that needs to be assumed is that the singular vectors are incoherent with the standard basis, in the sense of [2, 3]...

2 |
Dense error correction via ℓ 1 -minimization. submitted to
- Wright, Ma
- 2008
Citation Context: ...sult is a probabilistic analysis of an iterative refinement technique for producing a dual vector that certifies optimality of the pair (A0, E0). This proof technique is related to techniques used in [7, 30], with additional care required to handle an operator norm constraint arising from the presence of the nuclear norm in (2). Finally, while Theorem 2.5 is merely a byproduct of our analysis and not the...

1 |
Efficient and guaranteed rank minimization by atomic decomposition
- Lee, Bresler
- 2009
Citation Context: ...on on GPUs or distributed implementation across multiple machines are other interesting avenues for future improvements. Finally, in the spirit of the recent work of [25] on sparse reconstruction and [23] on rank minimization, it may be possible to find efficient greedy algorithms for robust PCA whose error correction capability approaches that of the semidefinite program (2). Acknowledgement The auth...