Results 1 - 10 of 16
The noisy power method: A meta algorithm with applications
In NIPS, 2014
"... We provide a new robust convergence analysis of the well-known power method for computing the dominant singular vectors of a matrix that we call the noisy power method. Our result characterizes the convergence behavior of the algorithm when a significant amount noise is introduced after each matrix- ..."
Cited by 11 (0 self)
We provide a new robust convergence analysis of the well-known power method for computing the dominant singular vectors of a matrix; we call the resulting algorithm the noisy power method. Our result characterizes the convergence behavior of the algorithm when a significant amount of noise is introduced after each matrix-vector multiplication. The noisy power method can be seen as a meta-algorithm that has recently found a number of important applications in a broad range of machine learning problems, including alternating minimization for matrix completion, streaming principal component analysis (PCA), and privacy-preserving spectral analysis. Our general analysis subsumes several existing ad-hoc convergence bounds and resolves a number of open problems in multiple applications. Streaming PCA: a recent work of Mitliagkas et al. (NIPS 2013) gives a space-efficient algorithm for PCA in a streaming model where samples are drawn from a Gaussian spiked covariance model. We give a simpler and more general analysis that applies to arbitrary distributions, confirming experimental evidence of Mitliagkas et al. Moreover, even in the spiked covariance model our result gives quantitative improvements in a natural parameter regime. It is also notably simpler and follows easily from our general convergence analysis of the noisy power method together with a matrix Chernoff bound. Private PCA: we provide the first nearly-linear-time algorithm for differentially private principal component analysis that achieves nearly tight worst-case error bounds. Complementing our worst-case bounds, we show that the error dependence of our algorithm on the matrix dimension can be replaced by an essentially tight dependence on the coherence of the matrix. This result resolves the main problem left open by Hardt and Roth (STOC 2013). The coherence is always bounded by the matrix dimension but is often substantially smaller, leading to strong average-case improvements over the optimal worst-case bound.
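As a rough illustration of the iteration analyzed here, the sketch below runs a block power method and injects a perturbation after every matrix multiplication, standing in for the per-step noise term; the function name, the Gaussian noise model, and parameters such as noise_scale are illustrative assumptions, not the authors' exact setup.

```python
import numpy as np

def noisy_power_method(A, k, iters=50, noise_scale=1e-3, seed=0):
    """Block power method for the top-k invariant subspace of a symmetric
    matrix A, with a perturbation injected after each multiplication."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    X, _ = np.linalg.qr(rng.standard_normal((n, k)))      # random orthonormal start
    for _ in range(iters):
        Y = A @ X                                          # exact matrix product
        Y += noise_scale * rng.standard_normal(Y.shape)    # stand-in for the noise term
        X, _ = np.linalg.qr(Y)                             # re-orthonormalize
    return X                                               # n x k orthonormal basis
```

The applications listed in the abstract (streaming PCA, private PCA) correspond to different sources of the per-iteration noise.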
Matrix completion and low-rank SVD via fast alternating least squares
arXiv preprint arXiv:1410.2596, 2014
"... Abstract The matrix-completion problem has attracted a lot of attention, largely as a result of the celebrated Netflix competition. Two popular approaches for solving the problem are nuclear-norm-regularized matrix approximation ..."
Cited by 6 (2 self)
The matrix-completion problem has attracted a lot of attention, largely as a result of the celebrated Netflix competition. Two popular approaches for solving the problem are nuclear-norm-regularized matrix approximation ...
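One of the two approaches named above, nuclear-norm-regularized approximation, is often computed by an iterative soft-thresholded SVD ("soft-impute" style). The sketch below is a generic illustration of that idea under assumed parameter names; it is not the fast alternating-least-squares algorithm this paper develops.

```python
import numpy as np

def soft_impute(X, mask, lam, iters=100):
    """Nuclear-norm-regularized matrix completion by iterative soft-thresholded
    SVD: repeatedly impute the missing entries with the current estimate and
    shrink the singular values by lam."""
    Z = np.where(mask, X, 0.0)                 # start from the zero-filled data
    for _ in range(iters):
        filled = np.where(mask, X, Z)          # keep observed entries, impute the rest
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        s = np.maximum(s - lam, 0.0)           # soft-threshold the singular values
        Z = (U * s) @ Vt
    return Z
```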
Global convergence of stochastic gradient descent for some non-convex matrix problems
arXiv preprint arXiv:1411.1134, 2014
"... Abstract Stochastic gradient descent (SGD) on a low-rank factorization ..."
Cited by 5 (0 self)
Stochastic gradient descent (SGD) on a low-rank factorization ...
On the power of adaptivity in matrix completion and approximation
2014
"... We consider the related tasks of matrix completion and matrix approximation from missing data and propose adaptive sampling procedures for both problems. We show that adaptive sampling allows one to eliminate standard incoherence assumptions on the matrix row space that are necessary for passive sam ..."
Cited by 4 (3 self)
We consider the related tasks of matrix completion and matrix approximation from missing data and propose adaptive sampling procedures for both problems. We show that adaptive sampling allows one to eliminate standard incoherence assumptions on the matrix row space that are necessary for passive sampling procedures. For exact recovery of a low-rank matrix, our algorithm judiciously selects a few columns to observe in full and, with few additional measurements, projects the remaining columns onto their span. This algorithm exactly recovers an n × n rank-r matrix using O(nr µ_0 log^2(r)) observations, where µ_0 is a coherence parameter on the column space of the matrix. In addition to completely eliminating any row space assumptions that have pervaded the literature, this algorithm enjoys a better sample complexity than any existing matrix completion algorithm. To certify that this improvement is due to adaptive sampling, we establish that row space coherence is necessary for passive sampling algorithms to achieve non-trivial sample complexity bounds. For constructing a low-rank approximation to a high-rank input matrix, we propose a simple algorithm that thresholds the singular values of a zero-filled version of the input matrix. The algorithm computes an approximation that is nearly as good as the best rank-r approximation using O(nr µ log^2(n)) samples, where µ is a slightly different coherence parameter on the matrix columns. Again we eliminate assumptions on the row space.
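The second procedure, thresholding the singular values of a zero-filled matrix, is simple enough to sketch. The rescaling by the empirical sampling rate and the threshold parameter tau below are illustrative assumptions, not the paper's exact constants.

```python
import numpy as np

def zero_fill_svt(X_obs, mask, tau):
    """Low-rank approximation from incomplete observations: zero-fill the
    unobserved entries, rescale by the inverse sampling rate, and keep only
    the singular components whose singular value exceeds tau."""
    p = mask.mean()                          # empirical sampling probability
    Y = np.where(mask, X_obs, 0.0) / p       # rescaled zero-filled matrix
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    keep = s > tau                           # hard-threshold the singular values
    return (U[:, keep] * s[keep]) @ Vt[keep]
```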
Low-rank Solutions of Linear Matrix Equations via Procrustes Flow
2015
"... In this paper we study the problem of recovering an low-rank positive semidefinite matrix from linear measurements. Our algorithm, which we call Procrustes Flow, starts from an ini-tial estimate obtained by a thresholding scheme followed by gradient descent on a non-convex objective. We show that as ..."
Cited by 4 (0 self)
In this paper we study the problem of recovering a low-rank positive semidefinite matrix from linear measurements. Our algorithm, which we call Procrustes Flow, starts from an initial estimate obtained by a thresholding scheme, followed by gradient descent on a non-convex objective. We show that as long as the measurements obey a standard restricted isometry property, our algorithm converges to the unknown matrix at a geometric rate. In the case of Gaussian measurements, such convergence occurs for an n × n matrix of rank r when the number of measurements exceeds a constant times nr.
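A minimal sketch of this two-stage recipe for measurements y_i = <A_i, U U^T> is given below, assuming a back-projection-based spectral initialization and a heuristic step size; neither choice is claimed to match the paper's exact initialization or constants.

```python
import numpy as np

def procrustes_flow_sketch(As, y, r, steps=200, lr=None):
    """Recover a rank-r PSD matrix X = U U^T from linear measurements
    y_i = <A_i, X>.  Stage 1: spectral initialization from the back-projected
    measurements; Stage 2: gradient descent on the factored least-squares
    objective f(U) = (1/4m) * sum_i (<A_i, U U^T> - y_i)^2."""
    m = len(y)
    M = sum(yi * Ai for yi, Ai in zip(y, As)) / m        # back-projection (1/m) sum y_i A_i
    M = (M + M.T) / 2                                     # symmetrize
    vals, vecs = np.linalg.eigh(M)
    idx = np.argsort(vals)[-r:]                           # top-r eigenpairs
    U = vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0.0, None))
    if lr is None:
        lr = 0.25 / max(np.linalg.norm(U) ** 2, 1e-12)    # heuristic step size (assumed)
    for _ in range(steps):
        G = np.zeros_like(U)
        for Ai, yi in zip(As, y):
            resid = np.sum(Ai * (U @ U.T)) - yi
            G += resid * ((Ai + Ai.T) @ U)                # gradient of the residual term
        U -= lr * G / m
    return U @ U.T
```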
Matrix completion from fewer entries: Spectral detectability and rank estimation
In Advances in Neural Information Processing Systems 28, 2015
"... Abstract The completion of low rank matrices from few entries is a task with many practical applications. We consider here two aspects of this problem: detectability, i.e. the ability to estimate the rank r reliably from the fewest possible random entries, and performance in achieving small reconst ..."
Cited by 1 (0 self)
The completion of low-rank matrices from few entries is a task with many practical applications. We consider here two aspects of this problem: detectability, i.e. the ability to estimate the rank r reliably from the fewest possible random entries, and performance in achieving small reconstruction error. We propose a spectral algorithm for these two tasks called MaCBetH (for Matrix Completion with the Bethe Hessian). The rank is estimated as the number of negative eigenvalues of the Bethe Hessian matrix, and the corresponding eigenvectors are used as the initial condition for the minimization of the discrepancy between the estimated matrix and the revealed entries. We analyze the performance in a random matrix setting using results from the statistical mechanics of the Hopfield neural network, and show in particular that MaCBetH efficiently detects the rank r of a large n × m matrix from C(r) r √(nm) entries, where C(r) is a constant close to 1. We also evaluate the corresponding root-mean-square error empirically and show that MaCBetH compares favorably to other existing approaches.

Matrix completion is the task of inferring the missing entries of a matrix given a subset of known entries. Typically, this is possible because the matrix to be completed has (at least approximately) low rank r, and the problem has witnessed a burst of activity. The first question we address is detectability: how many random entries do we need to reveal in order to estimate the rank r reliably? This is motivated by the more generic problem of detecting structure (in our case, low rank) hidden in partially observed data. It is reasonable to expect the existence of a region where exact completion is hard or even impossible yet rank estimation is tractable. A second question we address is the minimum achievable root-mean-square error (RMSE) in estimating the unknown elements of the matrix. In practice, even if exact reconstruction is not possible, having a procedure that provides a very small RMSE might be quite sufficient. In this paper we propose an algorithm called MaCBetH that gives the best known empirical performance for the two tasks above when the rank r is small. The rank in our algorithm is estimated as the number of negative eigenvalues of an associated Bethe Hessian matrix.
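The rank-estimation step itself reduces to counting negative eigenvalues of a symmetric matrix. The construction of the Bethe Hessian from the revealed entries follows the paper and is not reproduced here, so the sketch below assumes H is already given.

```python
import numpy as np

def estimate_rank(H):
    """Given the (symmetric) Bethe Hessian H built from the revealed entries,
    estimate the rank as the number of negative eigenvalues and return the
    corresponding eigenvectors as the initial condition for local refinement."""
    vals, vecs = np.linalg.eigh(H)
    neg = vals < 0
    return int(neg.sum()), vecs[:, neg]
```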
Provable Efficient Online Matrix Completion via Non-convex Stochastic Gradient Descent
"... Abstract Matrix completion, where we wish to recover a low rank matrix by observing a few entries from it, is a widely studied problem in both theory and practice with wide applications. Most of the provable algorithms so far on this problem have been restricted to the offline setting where they pr ..."
Matrix completion, where we wish to recover a low-rank matrix by observing a few of its entries, is a widely studied problem in both theory and practice, with many applications. Most provable algorithms for this problem so far have been restricted to the offline setting, where they provide an estimate of the unknown matrix using all observations simultaneously. However, in many applications the online version, where we observe one entry at a time and dynamically update our estimate, is more appealing. While existing algorithms are efficient for the offline setting, they can be highly inefficient for the online setting. In this paper, we propose the first provable, efficient online algorithm for matrix completion. Our algorithm starts from an initial estimate of the matrix and then performs non-convex stochastic gradient descent (SGD). After every observation, it performs a fast update involving only one row of two tall matrices, giving near-linear total runtime. Our algorithm can be naturally used in the offline setting as well, where it gives sample complexity and runtime competitive with state-of-the-art algorithms. Our proofs introduce a general framework to show that SGD updates tend to stay away from saddle surfaces, which could be of broader interest for other non-convex problems.
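The per-observation update described here touches only one row of each factor matrix. The class below is a hedged sketch of such a streaming update; the learning rate, initialization scale, and names are assumptions, not the paper's tuned choices.

```python
import numpy as np

class OnlineCompletion:
    """Streaming matrix completion: keep two tall factor matrices U, V and,
    after observing a single entry (i, j, x), update only row U[i] and row V[j]
    by a stochastic gradient step on (x - U[i] . V[j])^2."""

    def __init__(self, m, n, rank, lr=0.05, seed=0):
        g = np.random.default_rng(seed)
        self.U = 0.1 * g.standard_normal((m, rank))
        self.V = 0.1 * g.standard_normal((n, rank))
        self.lr = lr

    def observe(self, i, j, x):
        ui, vj = self.U[i].copy(), self.V[j].copy()
        err = x - ui @ vj
        self.U[i] += self.lr * err * vj     # touch only one row of U
        self.V[j] += self.lr * err * ui     # touch only one row of V
        return err

    def predict(self, i, j):
        return self.U[i] @ self.V[j]
```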
Dynamic matrix recovery from incomplete observations under an exact low-rank constraint
"... Abstract Low-rank matrix factorizations arise in a wide variety of applications -including recommendation systems, topic models, and source separation, to name just a few. In these and many other applications, it has been widely noted that by incorporating temporal information and allowing for the ..."
Low-rank matrix factorizations arise in a wide variety of applications, including recommendation systems, topic models, and source separation, to name just a few. In these and many other applications, it has been widely noted that incorporating temporal information and allowing for time-varying models can yield significant improvements in practice. However, despite the reported superior empirical performance of these dynamic models over their static counterparts, there is limited theoretical justification for introducing these more complex models. In this paper we aim to address this gap by studying the problem of recovering a dynamically evolving low-rank matrix from incomplete observations. First, we propose the locally weighted matrix smoothing (LOWEMS) framework as one possible approach to dynamic matrix recovery. We then establish error bounds for LOWEMS in both the matrix sensing and matrix completion observation models. Our results quantify the potential benefits of exploiting dynamic constraints both in terms of recovery accuracy and sample complexity. To illustrate these benefits we provide both synthetic and real-world experimental results.
Recovery guarantee of weighted low-rank approximation via alternating minimization
"... Abstract Many applications require recovering a ground truth low-rank matrix from noisy observations of the entries, which in practice is typically formulated as a weighted low-rank approximation problem and solved by non-convex optimization heuristics such as alternating minimization. In this pape ..."
Many applications require recovering a ground-truth low-rank matrix from noisy observations of its entries, which in practice is typically formulated as a weighted low-rank approximation problem and solved by non-convex optimization heuristics such as alternating minimization. In this paper, we provide a provable recovery guarantee for weighted low-rank approximation via a simple alternating minimization algorithm. In particular, for a natural class of matrices and weights, and without any assumption on the noise, we bound the spectral norm of the difference between the recovered matrix and the ground truth by the spectral norm of the weighted noise plus an additive error term that decreases exponentially with the number of rounds of alternating minimization, starting from either SVD initialization or, more importantly, random initialization. These are the first theoretical results for weighted low-rank approximation via alternating minimization with non-binary deterministic weights, and they significantly generalize those for matrix completion, the special case with binary weights, since our assumptions are similar to or weaker than those made in existing works. Furthermore, this is achieved by a very simple algorithm that improves vanilla alternating minimization with a simple clipping step.
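A hedged sketch of weighted alternating minimization with an optional clipping step follows. The exact form of clipping used in the paper is not reproduced here; the row-norm cap shown, the ridge term, and the parameter names are assumptions for illustration.

```python
import numpy as np

def weighted_als(X, W, rank, iters=30, clip=None, seed=0):
    """Alternating minimization for weighted low-rank approximation:
    minimize sum_ij W_ij * (X_ij - (U V^T)_ij)^2 over rank-`rank` factors U, V."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.standard_normal((m, rank))
    V = rng.standard_normal((n, rank))
    for _ in range(iters):
        for i in range(m):                    # weighted least-squares update of row U[i]
            Wi = W[i][:, None]
            G = (V * Wi).T @ V + 1e-9 * np.eye(rank)
            U[i] = np.linalg.solve(G, (V * Wi).T @ X[i])
        if clip is not None:                  # cap large factor rows (assumed clipping form)
            norms = np.linalg.norm(U, axis=1, keepdims=True)
            U *= np.minimum(1.0, clip / np.maximum(norms, 1e-12))
        for j in range(n):                    # weighted least-squares update of row V[j]
            Wj = W[:, j][:, None]
            G = (U * Wj).T @ U + 1e-9 * np.eye(rank)
            V[j] = np.linalg.solve(G, (U * Wj).T @ X[:, j])
        if clip is not None:
            norms = np.linalg.norm(V, axis=1, keepdims=True)
            V *= np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    return U, V
```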
Noisy Tensor Completion via the Sum-of-Squares Hierarchy
2016
"... Abstract In the noisy tensor completion problem we observe m entries (whose location is chosen uniformly at random) from an unknown n 1 × n 2 × n 3 tensor T . We assume that T is entry-wise close to being rank r. Our goal is to fill in its missing entries using as few observations as possible. Let ..."
In the noisy tensor completion problem we observe m entries (whose locations are chosen uniformly at random) from an unknown n_1 × n_2 × n_3 tensor T. We assume that T is entry-wise close to being rank r. Our goal is to fill in its missing entries using as few observations as possible. Let n = max(n_1, n_2, n_3). We show that if m = n^{3/2} r then there is a polynomial-time algorithm based on the sixth level of the sum-of-squares hierarchy for completing it. Our estimate agrees with almost all of T's entries almost exactly and works even when our observations are corrupted by noise. This is also the first algorithm for tensor completion that works in the overcomplete case when r > n, and in fact it works all the way up to r = n^{3/2 − ε}. Our proofs are short and simple and are based on establishing a new connection between noisy tensor completion (through the language of Rademacher complexity) and the task of refuting random constraint satisfaction problems. This connection seems to have gone unnoticed even in the context of matrix completion. Furthermore, we use this connection to show matching lower bounds. Our main technical result characterizes the Rademacher complexity of the sequence of norms that arise in the sum-of-squares relaxations to the tensor nuclear norm. These results point to an interesting new direction: can we explore computational vs. sample complexity tradeoffs through the sum-of-squares hierarchy?