Results 1–10 of 16
The noisy power method: A meta algorithm with applications
In NIPS, 2014
We provide a new robust convergence analysis of the well-known power method for computing the dominant singular vectors of a matrix, which we call the noisy power method. Our result characterizes the convergence behavior of the algorithm when a significant amount of noise is introduced after each matrix-vector multiplication. The noisy power method can be seen as a meta-algorithm that has recently found a number of important applications in a broad range of machine learning problems, including alternating minimization for matrix completion, streaming principal component analysis (PCA), and privacy-preserving spectral analysis. Our general analysis subsumes several existing ad-hoc convergence bounds and resolves a number of open problems in multiple applications.

Streaming PCA. A recent work of Mitliagkas et al. (NIPS 2013) gives a space-efficient algorithm for PCA in a streaming model where samples are drawn from a Gaussian spiked covariance model. We give a simpler and more general analysis that applies to arbitrary distributions, confirming experimental evidence of Mitliagkas et al. Moreover, even in the spiked covariance model our result gives quantitative improvements in a natural parameter regime. It is also notably simpler and follows easily from our general convergence analysis of the noisy power method together with a matrix Chernoff bound.

Private PCA. We provide the first nearly-linear time algorithm for the problem of differentially private principal component analysis that achieves nearly tight worst-case error bounds. Complementing our worst-case bounds, we show that the error dependence of our algorithm on the matrix dimension can be replaced by an essentially tight dependence on the coherence of the matrix. This result resolves the main problem left open by Hardt and Roth (STOC 2013). The coherence is always bounded by the matrix dimension but is often substantially smaller, thus leading to strong average-case improvements over the optimal worst-case bound.
Matrix completion and low-rank SVD via fast alternating least squares. arXiv preprint arXiv:1410.2596, 2014
The matrix-completion problem has attracted a lot of attention, largely as a result of the celebrated Netflix competition. Two popular approaches for solving the problem are nuclear-norm-regularized matrix approximation …
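As a rough illustration of the alternating-least-squares approach to matrix completion referenced in this entry's title (a generic sketch, not the paper's softImpute-ALS algorithm; `lam` and the loop counts are arbitrary choices):

```python
import numpy as np

def als_complete(M, mask, r, lam=0.1, iters=20, rng=0):
    """Alternating ridge-regularized least squares on the observed
    entries (mask == True), fitting M ~ U @ V.T of rank r."""
    rng = np.random.default_rng(rng)
    m, n = M.shape
    U = rng.standard_normal((m, r))
    V = rng.standard_normal((n, r))
    for _ in range(iters):
        for i in range(m):                      # update each row of U
            idx = mask[i]
            if idx.any():
                Vi = V[idx]
                U[i] = np.linalg.solve(Vi.T @ Vi + lam * np.eye(r),
                                       Vi.T @ M[i, idx])
        for j in range(n):                      # update each row of V
            idx = mask[:, j]
            if idx.any():
                Uj = U[idx]
                V[j] = np.linalg.solve(Uj.T @ Uj + lam * np.eye(r),
                                       Uj.T @ M[idx, j])
    return U @ V.T
```

Each inner step is a small ridge regression, which is what makes alternating schemes fast in practice.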
Global convergence of stochastic gradient descent for some nonconvex matrix problems. arXiv preprint arXiv:1411.1134, 2014
Stochastic gradient descent (SGD) on a low-rank factorization …
On the power of adaptivity in matrix completion and approximation, 2014
We consider the related tasks of matrix completion and matrix approximation from missing data and propose adaptive sampling procedures for both problems. We show that adaptive sampling allows one to eliminate standard incoherence assumptions on the matrix row space that are necessary for passive sampling procedures. For exact recovery of a low-rank matrix, our algorithm judiciously selects a few columns to observe in full and, with few additional measurements, projects the remaining columns onto their span. This algorithm exactly recovers an n × n rank-r matrix using O(nrµ₀ log²(r)) observations, where µ₀ is a coherence parameter on the column space of the matrix. In addition to completely eliminating any row-space assumptions that have pervaded the literature, this algorithm enjoys a better sample complexity than any existing matrix completion algorithm. To certify that this improvement is due to adaptive sampling, we establish that row-space coherence is necessary for passive sampling algorithms to achieve nontrivial sample complexity bounds. For constructing a low-rank approximation to a high-rank input matrix, we propose a simple algorithm that thresholds the singular values of a zero-filled version of the input matrix. The algorithm computes an approximation that is nearly as good as the best rank-r approximation using O(nrµ log²(n)) samples, where µ is a slightly different coherence parameter on the matrix columns. Again we eliminate assumptions on the row space.
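In the noiseless exact-recovery case, the column-sampling idea above reduces to simple linear algebra: once a set of fully observed columns spans the column space, every other column can be recovered from just a few of its entries by solving a small least-squares system. A toy sketch (the observed indices are fixed by hand here, whereas the paper selects them adaptively):

```python
import numpy as np

def recover_from_columns(M, col_idx, row_idx):
    """Noiseless sketch: columns in col_idx are observed in full; for
    every other column only the entries in row_idx are observed, and
    the column is reconstructed as a combination of the full columns."""
    C = M[:, col_idx]                     # fully observed columns
    Mhat = np.zeros_like(M)
    Mhat[:, col_idx] = C
    for j in range(M.shape[1]):
        if j in col_idx:
            continue
        # coefficients x solving C[row_idx] @ x = M[row_idx, j]
        x, *_ = np.linalg.lstsq(C[row_idx], M[row_idx, j], rcond=None)
        Mhat[:, j] = C @ x                # project onto the column span
    return Mhat
```

For a rank-r matrix whose chosen columns span the column space, r + 1 sampled rows already determine the remaining columns exactly.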
Low-rank Solutions of Linear Matrix Equations via Procrustes Flow, 2015
In this paper we study the problem of recovering a low-rank positive semidefinite matrix from linear measurements. Our algorithm, which we call Procrustes Flow, starts from an initial estimate obtained by a thresholding scheme, followed by gradient descent on a nonconvex objective. We show that as long as the measurements obey a standard restricted isometry property, our algorithm converges to the unknown matrix at a geometric rate. In the case of Gaussian measurements, such convergence occurs for an n × n matrix of rank r when the number of measurements exceeds a constant times nr.
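A schematic version of the two-stage recipe (spectral initialization, then gradient descent on a factored objective) can be written down directly; the step size, iteration count, and initialization scaling below are illustrative guesses, not the paper's prescriptions.

```python
import numpy as np

def procrustes_flow_sketch(As, b, r, step=0.2, iters=300):
    """Recover a PSD matrix X = U @ U.T from measurements b_i = <A_i, X>.
    Stage 1: spectral initialization from the back-projection sum_i b_i A_i.
    Stage 2: gradient descent on f(U) = (1/2m) sum_i (<A_i, U U^T> - b_i)^2."""
    m = len(b)
    S = sum(bi * Ai for Ai, bi in zip(As, b)) / m
    w, V = np.linalg.eigh((S + S.T) / 2)          # symmetrize, eigendecompose
    U = V[:, -r:] * np.sqrt(np.maximum(w[-r:], 0.0))  # top-r spectral init
    for _ in range(iters):
        X = U @ U.T
        resid = [np.sum(Ai * X) - bi for Ai, bi in zip(As, b)]
        # gradient of f w.r.t. U; <A_i, U U^T> has gradient (A_i + A_i^T) U
        G = sum(ri * (Ai + Ai.T) for ri, Ai in zip(resid, As)) @ U / m
        U = U - step * G
    return U @ U.T
```

Working in the n × r factor U (rather than the full n × n matrix) is what makes this kind of method scale, at the cost of a nonconvex landscape.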
Matrix completion from fewer entries: Spectral detectability and rank estimation. In Advances in Neural Information Processing Systems 28, 2015
The completion of low-rank matrices from few entries is a task with many practical applications. We consider here two aspects of this problem: detectability, i.e. the ability to estimate the rank r reliably from the fewest possible random entries, and performance in achieving small reconstruction error. We propose a spectral algorithm for these two tasks called MaCBetH (for Matrix Completion with the Bethe Hessian). The rank is estimated as the number of negative eigenvalues of the Bethe Hessian matrix, and the corresponding eigenvectors are used as the initial condition for the minimization of the discrepancy between the estimated matrix and the revealed entries. We analyze the performance in a random matrix setting using results from the statistical mechanics of the Hopfield neural network, and show in particular that MaCBetH efficiently detects the rank r of a large n × m matrix from C(r) r√(nm) entries, where C(r) is a constant close to 1. We also evaluate the corresponding root-mean-square error empirically and show that MaCBetH compares favorably to other existing approaches.

Matrix completion is the task of inferring the missing entries of a matrix given a subset of known entries. Typically, this is possible because the matrix to be completed has (at least approximately) low rank r. This problem has recently witnessed a burst of activity. The first question we address is detectability: how many random entries do we need to reveal in order to be able to estimate the rank r reliably? This is motivated by the more generic problem of detecting structure (in our case, low rank) hidden in partially observed data. It is reasonable to expect the existence of a region where exact completion is hard or even impossible, yet rank estimation is tractable. A second question we address is the minimum achievable root-mean-square error (RMSE) in estimating the unknown elements of the matrix. In practice, even if exact reconstruction is not possible, having a procedure that provides a very small RMSE might be quite sufficient. In this paper we propose an algorithm called MaCBetH that gives the best known empirical performance for the two tasks above when the rank r is small. The rank in our algorithm is estimated as the number of negative eigenvalues of an associated Bethe Hessian matrix.
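The rank-estimation step stated in the abstract is a one-liner once the Bethe Hessian has been built; constructing the Bethe Hessian itself follows the paper and is not reproduced here, so this fragment shows only the eigenvalue count.

```python
import numpy as np

def estimate_rank(bethe_hessian):
    """Estimated rank = number of negative eigenvalues of the
    (symmetric) Bethe Hessian matrix, per the abstract above."""
    return int(np.sum(np.linalg.eigvalsh(bethe_hessian) < 0))
```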
Provable Efficient Online Matrix Completion via Nonconvex Stochastic Gradient Descent
Matrix completion, where we wish to recover a low-rank matrix by observing a few of its entries, is a widely studied problem in both theory and practice, with many applications. Most of the provable algorithms for this problem so far have been restricted to the offline setting, where they provide an estimate of the unknown matrix using all observations simultaneously. However, in many applications the online version, where we observe one entry at a time and dynamically update our estimate, is more appealing. While existing algorithms are efficient for the offline setting, they could be highly inefficient in the online setting. In this paper, we propose the first provable, efficient online algorithm for matrix completion. Our algorithm starts from an initial estimate of the matrix and then performs nonconvex stochastic gradient descent (SGD). After every observation, it performs a fast update involving only one row of two tall matrices, giving near-linear total runtime. Our algorithm can be naturally used in the offline setting as well, where it gives competitive sample complexity and runtime compared to state-of-the-art algorithms. Our proofs introduce a general framework to show that SGD updates tend to stay away from saddle surfaces, and could be of broader interest for other nonconvex problems.
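The per-observation update is the key structural point: each revealed entry (i, j) touches only row i of one factor and row j of the other. A hedged sketch of that update rule (the step size and initialization scale are arbitrary choices, and the paper's initialization procedure and analysis are not reproduced):

```python
import numpy as np

def online_mc_sgd(stream, m, n, r, eta=0.05, rng=0):
    """Online matrix-completion sketch: for each observed entry
    (i, j, val) in the stream, take one SGD step on the squared error
    (U[i] . V[j] - val)^2, updating only the two touched rows."""
    rng = np.random.default_rng(rng)
    U = 0.1 * rng.standard_normal((m, r))
    V = 0.1 * rng.standard_normal((n, r))
    for i, j, val in stream:
        err = U[i] @ V[j] - val
        # simultaneous update of the two rows from the same residual
        U[i], V[j] = U[i] - eta * err * V[j], V[j] - eta * err * U[i]
    return U, V
```

Since each step costs O(r), a stream of T observations costs O(Tr) total, which is the "near-linear runtime" flavor of result the abstract describes.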
Dynamic matrix recovery from incomplete observations under an exact low-rank constraint
Low-rank matrix factorizations arise in a wide variety of applications, including recommendation systems, topic models, and source separation, to name just a few. In these and many other applications, it has been widely noted that by incorporating temporal information and allowing for the possibility of time-varying models, significant improvements are possible in practice. However, despite the reported superior empirical performance of these dynamic models over their static counterparts, there is limited theoretical justification for introducing these more complex models. In this paper we aim to address this gap by studying the problem of recovering a dynamically evolving low-rank matrix from incomplete observations. First, we propose the locally weighted matrix smoothing (LOWEMS) framework as one possible approach to dynamic matrix recovery. We then establish error bounds for LOWEMS in both the matrix sensing and matrix completion observation models. Our results quantify the potential benefits of exploiting dynamic constraints, both in terms of recovery accuracy and sample complexity. To illustrate these benefits we provide both synthetic and real-world experimental results.
Recovery guarantee of weighted low-rank approximation via alternating minimization
Many applications require recovering a ground-truth low-rank matrix from noisy observations of the entries, which in practice is typically formulated as a weighted low-rank approximation problem and solved by nonconvex optimization heuristics such as alternating minimization. In this paper, we provide a provable recovery guarantee for weighted low-rank approximation via a simple alternating minimization algorithm. In particular, for a natural class of matrices and weights, and without any assumption on the noise, we bound the spectral norm of the difference between the recovered matrix and the ground truth by the spectral norm of the weighted noise plus an additive error term that decreases exponentially with the number of rounds of alternating minimization, from either initialization by SVD or, more importantly, random initialization. These provide the first theoretical results for weighted low-rank approximation via alternating minimization with non-binary deterministic weights, significantly generalizing those for matrix completion (the special case with binary weights), since our assumptions are similar to or weaker than those made in existing works. Furthermore, this is achieved by a very simple algorithm that improves on vanilla alternating minimization with a simple clipping step.
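Even with general non-negative weights, each alternating step still has a closed form per row, which is why the algorithm stays so simple. A bare-bones sketch of weighted alternating minimization (the tiny ridge term and iteration count are illustrative choices, and the paper's clipping step is omitted):

```python
import numpy as np

def weighted_als(M, W, r, iters=30, rng=0):
    """Minimize sum_ij W_ij * (M_ij - (U V^T)_ij)^2 by alternating
    weighted least squares on the rows of U and V."""
    rng = np.random.default_rng(rng)
    m, n = M.shape
    U = rng.standard_normal((m, r))
    V = rng.standard_normal((n, r))
    for _ in range(iters):
        for i in range(m):                       # row i of U, weights W[i, :]
            Vw = V * W[i][:, None]
            U[i] = np.linalg.solve(Vw.T @ V + 1e-9 * np.eye(r), Vw.T @ M[i])
        for j in range(n):                       # row j of V, weights W[:, j]
            Uw = U * W[:, j][:, None]
            V[j] = np.linalg.solve(Uw.T @ U + 1e-9 * np.eye(r), Uw.T @ M[:, j])
    return U @ V.T
```

Setting W to a binary observation mask recovers the matrix-completion special case the abstract mentions.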
Noisy Tensor Completion via the Sum-of-Squares Hierarchy, 2016
In the noisy tensor completion problem we observe m entries (whose locations are chosen uniformly at random) from an unknown n₁ × n₂ × n₃ tensor T. We assume that T is entrywise close to being rank r. Our goal is to fill in its missing entries using as few observations as possible. Let n = max(n₁, n₂, n₃). We show that if m = n^(3/2) r then there is a polynomial-time algorithm based on the sixth level of the sum-of-squares hierarchy for completing it. Our estimate agrees with almost all of T's entries almost exactly and works even when our observations are corrupted by noise. This is also the first algorithm for tensor completion that works in the overcomplete case when r > n, and in fact it works all the way up to r = n^(3/2−ε). Our proofs are short and simple, and are based on establishing a new connection between noisy tensor completion (through the language of Rademacher complexity) and the task of refuting random constraint satisfaction problems. This connection seems to have gone unnoticed even in the context of matrix completion. Furthermore, we use this connection to show matching lower bounds. Our main technical result characterizes the Rademacher complexity of the sequence of norms that arise in the sum-of-squares relaxations of the tensor nuclear norm. These results point to an interesting new direction: can we explore computational vs. sample complexity trade-offs through the sum-of-squares hierarchy?