Results 1–10 of 12
The noisy power method: A meta algorithm with applications
In NIPS, 2014
Abstract

Cited by 11 (0 self)
We provide a new robust convergence analysis of the well-known power method for computing the dominant singular vectors of a matrix, which we call the noisy power method. Our result characterizes the convergence behavior of the algorithm when a significant amount of noise is introduced after each matrix-vector multiplication. The noisy power method can be seen as a meta-algorithm that has recently found a number of important applications in a broad range of machine learning problems including alternating minimization for matrix completion, streaming principal component analysis (PCA), and privacy-preserving spectral analysis. Our general analysis subsumes several existing ad-hoc convergence bounds and resolves a number of open problems in multiple applications: Streaming PCA. A recent work of Mitliagkas et al. (NIPS 2013) gives a space-efficient algorithm for PCA in a streaming model where samples are drawn from a Gaussian spiked covariance model. We give a simpler and more general analysis that applies to arbitrary distributions, confirming experimental evidence of Mitliagkas et al. Moreover, even in the spiked covariance model our result gives quantitative improvements in a natural parameter regime. It is also notably simpler and follows easily from our general convergence analysis of the noisy power method together with a matrix Chernoff bound. Private PCA. We provide the first nearly-linear time algorithm for the problem of differentially private principal component analysis that achieves nearly tight worst-case error bounds. Complementing our worst-case bounds, we show that the error dependence of our algorithm on the matrix dimension can be replaced by an essentially tight dependence on the coherence of the matrix. This result resolves the main problem left open by Hardt and Roth (STOC 2013). The coherence is always bounded by the matrix dimension but often substantially smaller, thus leading to strong average-case improvements over the optimal worst-case bound.
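The iteration the abstract describes, power iteration with noise injected after each matrix product, can be sketched as follows. The Gaussian noise model, noise scale, and QR re-orthonormalization are illustrative choices for this sketch, not details taken from the paper:

```python
import numpy as np

def noisy_power_method(A, k, iters, noise_scale=1e-3, rng=None):
    """Subspace iteration for the top-k singular/eigen space of A,
    with noise added after each matrix multiplication."""
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    X, _ = np.linalg.qr(rng.standard_normal((n, k)))  # random orthonormal start
    for _ in range(iters):
        Y = A @ X + noise_scale * rng.standard_normal((n, k))  # noisy step
        X, _ = np.linalg.qr(Y)  # re-orthonormalize the iterate
    return X
```

With noise small relative to the spectral gap, the returned subspace stays close to the dominant one, which is the regime the paper's analysis characterizes.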
Global convergence of stochastic gradient descent for some nonconvex matrix problems. arXiv preprint arXiv:1411.1134, 2014
Abstract

Cited by 5 (0 self)
Stochastic gradient descent (SGD) on a low-rank factorization ...
A stochastic PCA algorithm with an exponential convergence rate
 CoRR, abs/1409.2848, 2014. URL http://arxiv.org/abs/1409.2848
Abstract

Cited by 4 (0 self)
We describe and analyze a simple algorithm for principal component analysis, VR-PCA, which uses computationally cheap stochastic iterations, yet converges exponentially fast to the optimal solution. In contrast, existing algorithms suffer either from slow convergence or from computationally intensive iterations whose runtime scales with the data size. The algorithm builds on a recent variance-reduced stochastic gradient technique, which was previously analyzed for strongly convex optimization, whereas here we apply it to the non-convex PCA problem, using a very different analysis.
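A minimal sketch of a variance-reduced iteration for the top eigenvector of the sample covariance (1/n) Σᵢ xᵢxᵢᵀ, in the spirit of the abstract. The step size, epoch length, and per-step renormalization here are illustrative assumptions, not the paper's exact scheme:

```python
import numpy as np

def vr_pca(X, epochs=10, inner=None, eta=0.05, rng=None):
    """VR-PCA-style iteration: each epoch takes a snapshot w_snap,
    computes one full-batch product at the snapshot, then runs cheap
    stochastic steps whose variance is reduced by that anchor term."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    inner = inner or n
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    for _ in range(epochs):
        w_snap = w.copy()
        u = X.T @ (X @ w_snap) / n  # full-batch covariance product at snapshot
        for _ in range(inner):
            i = rng.integers(n)
            x = X[i]
            # variance-reduced stochastic update, then renormalize
            w = w + eta * (x * (x @ w - x @ w_snap) + u)
            w /= np.linalg.norm(w)
    return w
```

The anchor term `u` makes the per-step correction vanish near the snapshot, which is what drives the exponential (rather than 1/t) convergence the abstract claims.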
Online Learning of Eigenvectors
Abstract

Cited by 3 (0 self)
Computing the leading eigenvector of a symmetric real matrix is a fundamental primitive of numerical linear algebra with numerous applications. We consider a natural online extension of the leading eigenvector problem: a sequence of matrices is presented and the goal is to predict for each matrix a unit vector, with the overall goal of competing with the leading eigenvector of the cumulative matrix. Existing regret-minimization algorithms for this problem either require computing an eigendecomposition at every iteration, or suffer from a large dependence of the regret bound on the dimension. In both cases the algorithms are not practical for large-scale applications. In this paper we present new algorithms that avoid both issues: on the one hand they do not require any expensive matrix decompositions, and on the other they guarantee regret rates with at most a mild dependence on the dimension. In contrast to previous algorithms, our algorithms also admit implementations that can leverage sparsity in the data to further reduce computation. We extend our results to also handle non-symmetric matrices.
Online principal components analysis
In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015
Abstract

Cited by 3 (1 self)
We consider the online version of the well-known Principal Component Analysis (PCA) problem. In standard PCA, the input to the problem is a set of d-dimensional vectors X = [x_1, ..., x_n] and a target dimension k < d; the output is a set of k-dimensional vectors Y = [y_1, ..., y_n] that minimize the reconstruction error: min_Φ Σ_i ‖x_i − Φy_i‖_2^2. Here, Φ ∈ R^{d×k} is restricted to being isometric. The global minimum of this quantity, OPT_k, is obtainable by offline PCA. In online PCA (OPCA) the setting is identical except for two differences: (i) the vectors x_t are presented to the algorithm one by one, and for every presented x_t the algorithm must output a vector y_t before receiving x_{t+1}; (ii) the output vectors y_t are ℓ-dimensional with ℓ ≥ k, to compensate for the handicap of operating online. To the best of our knowledge, this paper is the first to consider this setting of OPCA. Our algorithm produces y_t ∈ R^ℓ with ℓ = O(k · poly(1/ε)) such that ALG ≤ OPT_k + ε‖X‖_F^2.
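For reference, the offline optimum OPT_k that the online algorithm competes with is directly computable: with X stored column-wise (d × n), the optimal isometric Φ consists of the top-k left singular vectors of X, and the residual equals the discarded tail of the squared spectrum. A minimal sketch:

```python
import numpy as np

def offline_pca(X, k):
    """X is d x n (columns are the x_i). Returns the optimal isometric
    Phi (d x k), the codes Y (k x n), and OPT_k, the minimal
    reconstruction error sum_i ||x_i - Phi y_i||_2^2."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Phi = U[:, :k]                      # top-k left singular vectors
    Y = Phi.T @ X                       # optimal k-dimensional codes
    opt_k = float(np.sum(s[k:] ** 2))   # discarded spectral energy
    return Phi, Y, opt_k
```

The online algorithm must match this baseline up to ε‖X‖_F^2 without ever seeing X in full.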
Scale up nonlinear component analysis with doubly stochastic gradients.
In NIPS, 2015
Abstract

Cited by 1 (0 self)
Nonlinear component analysis methods such as kernel Principal Component Analysis (KPCA) and kernel Canonical Correlation Analysis (KCCA) are widely used in machine learning, statistics and data analysis, but they cannot scale up to big datasets. Recent attempts have employed random feature approximations to convert the problem to the primal form for linear computational complexity. However, to obtain high-quality solutions, the number of random features must be of the same order of magnitude as the number of data points, making such approaches not directly applicable to the regime with millions of data points. We propose a simple, computationally efficient, and memory-friendly algorithm based on "doubly stochastic gradients" to scale up a range of kernel nonlinear component analysis methods, such as kernel PCA, CCA and SVD. Despite the non-convex nature of these problems, our method enjoys theoretical guarantees that it converges at the rate Õ(1/t) to the global optimum, even for the top-k eigen subspace. Unlike many alternatives, our algorithm does not require explicit orthogonalization, which is infeasible on big datasets. We demonstrate the effectiveness and scalability of our algorithm on large-scale synthetic and real-world datasets.
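The random-feature route the abstract contrasts with can be sketched as follows: approximate an RBF kernel with random Fourier features, then run ordinary linear PCA in the primal. This illustrates that baseline approach, not the authors' doubly stochastic algorithm; the kernel bandwidth and feature count are illustrative choices:

```python
import numpy as np

def rff_kpca_scores(X, n_features=200, gamma=0.5, k=2, rng=None):
    """Approximate KPCA for the RBF kernel k(x, y) = exp(-gamma ||x-y||^2):
    map data through random Fourier features z(x), so k(x, y) ~ z(x).z(y),
    then do linear PCA on the (centered) feature matrix."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, n_features))
    b = rng.uniform(0, 2 * np.pi, n_features)
    Z = np.sqrt(2.0 / n_features) * np.cos(X @ W + b)  # random features z(x_i)
    Zc = Z - Z.mean(axis=0)                            # center in feature space
    _, _, Vt = np.linalg.svd(Zc, full_matrices=False)
    return Zc @ Vt[:k].T                               # approximate top-k KPCA scores
```

The bottleneck the paper targets is visible here: a faithful approximation needs `n_features` comparable to `n`, which is exactly what the doubly stochastic gradient method avoids.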
Online PCA with Spectral Bounds
Abstract

Cited by 1 (0 self)
This paper revisits the online PCA problem. Given a stream of n vectors x_t ∈ R^d (columns of X), the algorithm must output y_t ∈ R^ℓ (columns of Y) before receiving x_{t+1}. The goal of online PCA is to simultaneously minimize the target dimension ℓ and the error ‖X − (XY⁺)Y‖_2. We describe two simple and deterministic algorithms. The first receives a parameter Δ and guarantees that ‖X − (XY⁺)Y‖_2 is not significantly larger than Δ. It requires a target dimension of ℓ = O(k/ε) for any k, ε such that Δ ≥ εσ_1^2 + σ_{k+1}^2. The second receives k and ε and guarantees that ‖X − (XY⁺)Y‖_2 ≤ εσ_1^2 + σ_{k+1}^2. It requires a target dimension of O(k log n / ε^2). Different models and algorithms for online PCA were considered in the past. This is the first to achieve a bound on the spectral norm of the residual matrix.
Efficient Second Order Online Learning by Sketching
Abstract
We propose Sketched Online Newton (SON), an online second-order learning algorithm that enjoys substantially improved regret guarantees for ill-conditioned data. SON is an enhanced version of the Online Newton Step which, via sketching techniques, enjoys a running time linear in the dimension and sketch size. We further develop sparse forms of the sketching methods (such as Oja's rule), making the computation linear in the sparsity of features. Together, the algorithm eliminates all computational obstacles in previous second-order online learning approaches.
Streaming PCA: Matching Matrix Bernstein and Near-Optimal Finite Sample Guarantees for Oja's Algorithm
Abstract
This work provides improved guarantees for streaming principal component analysis (PCA). Given A_1, ..., A_n ∈ R^{d×d} sampled independently from distributions satisfying E[A_i] = Σ for Σ ⪰ 0, this work provides an O(d)-space, linear-time, single-pass streaming algorithm for estimating the top eigenvector of Σ. The algorithm nearly matches (and in certain cases improves upon) the accuracy obtained by the standard batch method that computes the top eigenvector of the empirical covariance (1/n) ∑_{i∈[n]} A_i, as analyzed by the matrix Bernstein inequality. Moreover, to achieve constant accuracy, our algorithm improves upon the best previously known sample complexities of streaming algorithms by either a multiplicative factor of O(d) or 1/gap, where gap is the relative distance between the top two eigenvalues of Σ. These results are achieved through a novel analysis of the classic Oja's algorithm, one of the oldest and most popular algorithms for streaming PCA. In particular, this work shows that simply picking a random initial point w_0 and applying the update rule w_{i+1} = w_i + η_i A_i w_i suffices to accurately estimate the top eigenvector, with a suitable choice of η_i. We believe our result sheds light on how to efficiently perform streaming PCA both in theory and in practice, and we hope that our analysis may serve as the basis for analyzing many variants and extensions of streaming PCA.
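The update rule quoted in the abstract is easy to state in code. A minimal sketch, where the 1/i step-size schedule and per-step renormalization are common practical choices rather than the paper's tuned η_i:

```python
import numpy as np

def oja_top_eigenvector(samples, eta0=1.0, rng=None):
    """Oja's algorithm for streaming PCA: random start w_0, then the
    streaming update w_{i+1} = w_i + eta_i * A_i w_i on each sample A_i.
    Uses O(d) space beyond the current sample."""
    rng = np.random.default_rng(rng)
    d = samples[0].shape[0]
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    for i, A in enumerate(samples, start=1):
        w = w + (eta0 / i) * (A @ w)  # single pass, one sample at a time
        w /= np.linalg.norm(w)        # keep the iterate on the unit sphere
    return w
```

Each A_i is typically a rank-one sample x_i x_iᵀ, so `A @ w` reduces to two O(d) inner products in practice.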
Convergence of Stochastic Gradient Descent for PCA
Abstract
We consider the problem of principal component analysis (PCA) in a streaming stochastic setting, where our goal is to find a direction of approximate maximal variance, based on a stream of i.i.d. data points in R^d. A simple and computationally cheap algorithm for this is stochastic gradient descent (SGD), which incrementally updates its estimate based on each new data point. However, due to the non-convex nature of the problem, analyzing its performance has been a challenge. In particular, existing guarantees rely on a non-trivial eigengap assumption on the covariance matrix, which is intuitively unnecessary. In this paper, we provide (to the best of our knowledge) the first eigengap-free convergence guarantees for SGD in the context of PCA. This also partially resolves an open problem posed in ...