
## Provable Efficient Online Matrix Completion via Non-convex Stochastic Gradient Descent

### Citations

868 | Exact matrix completion via convex optimization.
- Candès, Recht
- 2009
Citation Context ...matrix we wish to recover would be the user-item rating matrix, where each row corresponds to a user and each column corresponds to an item. Each entry of the matrix is the rating given by a user to an item. The low rank assumption on the matrix is inspired by the intuition that the rating of an item by a user depends on only a few hidden factors, which are much fewer than the number of users or items. The goal is to estimate the ratings of all items by users given only partial ratings of items by users, which would then be helpful in recommending new items to users. The seminal works of Candès and Recht [4] first identified regularity conditions under which low rank matrix completion can be solved in polynomial time using convex relaxation – low rank matrix completion could be ill-posed and NP-hard in general without such regularity assumptions [9]. Since then, a number of works have studied various algorithms under different settings for matrix completion: weighted and noisy matrix completion, fast convex solvers, fast iterative non-convex solvers, parallel and distributed algorithms, and so on. Most of this work, however, deals only with the offline setting where all the observed entries are reve... |

772 | Amazon.com recommendations: Item-to-item collaborative filtering.
- Linden, Smith, et al.
- 2003
Citation Context ...n – low rank matrix completion could be ill-posed and NP-hard in general without such regularity assumptions [9]. Since then, a number of works have studied various algorithms under different settings for matrix completion: weighted and noisy matrix completion, fast convex solvers, fast iterative non-convex solvers, parallel and distributed algorithms, and so on. Most of this work, however, deals only with the offline setting where all the observed entries are revealed at once and the recovery procedure does computation using all these observations simultaneously. However, in several applications [5, 18], we encounter the online setting where observations are only revealed sequentially and at each step the recovery algorithm is required to maintain an estimate of the low rank matrix based on the observations so far. [30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.] Consider for instance recommendation engines, where the low rank matrix we are interested in is the user-item rating matrix. While we make an observation only when a user rates an item, at any point of time we should have an estimate of the user-item rating matrix based on all prior observatio... |

326 | Online learning for matrix factorization and sparse coding
- Mairal, Bach, et al.
- 2010
Citation Context ...iant of online matrix completion studied in the literature is one where observations are made on a column-by-column basis, e.g., [16, 26]. These models can give improved offline performance in terms of space and could potentially work under relaxed regularity conditions. However, they do not tackle the version where only entries (as opposed to columns) are observed. Non-convex optimization: Over the last few years, there has also been a significant amount of work in designing efficient algorithms for solving other non-convex problems. Examples include eigenvector computation [6, 11], sparse coding [20, 1], etc. For general non-convex optimization, an interesting line of recent work is that of [7], which proves that gradient descent with noise can also escape saddle points, though it provides only a polynomial rate without explicit dependence. Later, [17, 21] show that without noise, the set of points from which gradient descent converges to a saddle point has measure zero. However, they do not provide a rate of convergence. Another piece of work related to ours is [10], which proves global convergence, along with rates of convergence, for the special case of computing the matrix square root. 1.3 Outline The re... |

251 | User-friendly tail bounds for sums of random matrices. - Tropp - 2012 |

157 | A Simpler Approach to Matrix Completion
- Recht
- 2009
Citation Context ...ces. In order to do so, we consider distances from saddle surfaces, show that they behave like sub-martingales under SGD updates, and use martingale convergence techniques to conclude that the iterates stay away from saddle surfaces. While [24] shows that SGD updates stay away from saddle surfaces, the stepsizes they can handle are quite small (scaling as 1/poly(d1, d2)), leading to suboptimal computational complexity.
Table 1: Comparison of sample complexity and runtime of our algorithm with existing algorithms in order to obtain Frobenius norm error ε. Õ(·) hides log d factors. See Section 1.2 for more discussion.
Algorithm | Sample complexity | Total runtime | Online?
Nuclear norm [22] | Õ(µdk) | O(d^3/√ε) | No
Alternating minimization [14] | Õ(µdkκ^8 log(1/ε)) | Õ(µdk^2κ^8 log(1/ε)) | No
Alternating minimization [8] | Õ(µdk^2κ^2(k + log(1/ε))) | Õ(µdk^3κ^2(k + log(1/ε))) | No
Projected gradient descent [12] | Õ(µdk^5) | Õ(µdk^7 log(1/ε)) | No
SGD [24] | Õ(µ^2dk^7κ^6) | poly(µ, d, k, κ) log(1/ε) | Yes
Our result | Õ(µdkκ^4(k + log(1/ε))) | Õ(µdk^4κ^4 log(1/ε)) | Yes
Our framework makes it possible to establish the same statement for much larger step sizes, giving us near-optimal runtime. We believe these techniques ma... |

72 | Parallel stochastic gradient algorithms for large-scale matrix completion. Optimization Online
- Recht, Ré
- 2011
Citation Context ...and V_t, respectively, by
U^(i)_{t+1} = U^(i)_t − 2η d1 d2 (U_t V_t^T − M)_{ij} V^(j)_t, and V^(j)_{t+1} = V^(j)_t − 2η d1 d2 (U_t V_t^T − M)_{ij} U^(i)_t, (2)
where η is an appropriately chosen stepsize and U^(i) denotes the ith row of the matrix U. Note that each update modifies only one row of the factor matrices U and V, and the computation involves only one row of U, one row of V, and the newly observed entry (M)_{ij}; the updates are hence extremely fast. These fast updates make SGD extremely appealing in practice. Moreover, SGD, in the context of matrix completion, is also useful for parallelization and distributed implementation [23]. 1.1 Our Contributions In this work we present the first provably efficient algorithm for online matrix completion, by showing that SGD (2) with a good initialization converges to a true factorization of M at a geometric rate. Our main contributions are as follows. • We provide the first provable, efficient, online algorithm for matrix completion. Starting with a good initialization, after each observation the algorithm makes quick updates, each taking time O(k^3), and requires Õ(µdkκ^4(k + log(‖M‖_F/ε)) log d) observations to reach ε accuracy, where µ is the incoherence parameter, d = max(d1, d2),... |
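The per-entry update (2) quoted in this context is easy to sketch in code. Below is a minimal NumPy toy illustration, not the paper's implementation: the dimensions, stepsize, and random initialization are our own arbitrary choices (the paper uses a careful initialization and theoretically justified stepsizes).

```python
import numpy as np

# Toy sketch of the online SGD update (2) for min ||M - U V^T||_F^2.
# Each observed entry (i, j) touches only row i of U and row j of V.
def sgd_update(U, V, i, j, m_ij, eta, d1, d2):
    residual = U[i] @ V[j] - m_ij       # (U V^T - M)_ij
    u_i = U[i].copy()                   # keep the old row for the V update
    U[i] = U[i] - 2 * eta * d1 * d2 * residual * V[j]
    V[j] = V[j] - 2 * eta * d1 * d2 * residual * u_i

# Rank-1 ground truth; observe uniformly random entries one at a time.
rng = np.random.default_rng(0)
d1, d2, k = 20, 30, 1
M = np.outer(rng.standard_normal(d1), rng.standard_normal(d2))
U = 0.1 * rng.standard_normal((d1, k))
V = 0.1 * rng.standard_normal((d2, k))
eta = 0.002 / (d1 * d2)  # arbitrary small stepsize for this toy run

for _ in range(120_000):
    i, j = rng.integers(d1), rng.integers(d2)
    sgd_update(U, V, i, j, M[i, j], eta, d1, d2)

err = np.linalg.norm(M - U @ V.T) / np.linalg.norm(M)
print(f"relative Frobenius error: {err:.4f}")
```

Note how each step costs O(k) arithmetic on two rows, which is why the paper can afford one update per observation in the online setting.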

68 | Fast online SVD revisions for lightweight recommender systems.
- Brand
- 2003
Citation Context ...ral passes over the entire data for every additional observation. This is simply not feasible in most settings. Another natural approach is to group observations into batches and do an update only once for each batch. This, however, induces a lag between observations and estimates, which is undesirable. To the best of our knowledge, there is no known provable, efficient, online algorithm for matrix completion. On the other hand, in order to deal with the online matrix completion scenario in practical applications, several heuristics (with no convergence guarantees) have been proposed in the literature [2, 19]. Most of these approaches are based on starting with an estimate of the matrix and doing fast updates of this estimate whenever a new observation is presented. One of the update procedures used in this context is stochastic gradient descent (SGD) applied to the following non-convex optimization problem:
min_{U,V} ‖M − U V^T‖_F^2 s.t. U ∈ R^{d1×k}, V ∈ R^{d2×k}, (1)
where M is the unknown matrix of size d1 × d2, k is the rank of M, and U V^T is the low rank factorization of M we wish to obtain. The algorithm starts with some U_0 and V_0, and given a new observation (M)_{ij}, SGD updates the ith row and the jth... |
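To connect the two displays quoted in these contexts: under a uniform-sampling assumption, the per-entry update (2) is exactly one stochastic gradient step on objective (1). The derivation below is our reconstruction of the standard argument, not verbatim from the paper:

```latex
% Sample (i,j) uniformly and define the rescaled single-entry loss
f_{ij}(U,V) = d_1 d_2 \left( (UV^\top)_{ij} - M_{ij} \right)^2,
\qquad
\mathbb{E}_{ij}\left[ f_{ij}(U,V) \right] = \| M - UV^\top \|_F^2 .
% Its gradient is supported on row i of U and row j of V:
\nabla_{U^{(i)}} f_{ij} = 2 d_1 d_2 \left( UV^\top - M \right)_{ij} V^{(j)},
\qquad
\nabla_{V^{(j)}} f_{ij} = 2 d_1 d_2 \left( UV^\top - M \right)_{ij} U^{(i)} .
% The SGD step with stepsize \eta,
% U^{(i)}_{t+1} = U^{(i)}_t - \eta \nabla_{U^{(i)}} f_{ij}(U_t, V_t),
% is precisely update (2).
```

The d_1 d_2 rescaling makes the single-entry loss an unbiased estimator of the full Frobenius objective, which is why the factor 2η d_1 d_2 appears in update (2).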

47 | The YouTube video recommendation system.
- Davidson, Liebald, et al.
- 2010
Citation Context ...n – low rank matrix completion could be ill-posed and NP-hard in general without such regularity assumptions [9]. Since then, a number of works have studied various algorithms under different settings for matrix completion: weighted and noisy matrix completion, fast convex solvers, fast iterative non-convex solvers, parallel and distributed algorithms, and so on. Most of this work, however, deals only with the offline setting where all the observed entries are revealed at once and the recovery procedure does computation using all these observations simultaneously. However, in several applications [5, 18], we encounter the online setting where observations are only revealed sequentially and at each step the recovery algorithm is required to maintain an estimate of the low rank matrix based on the observations so far. Consider for instance recommendation engines, where the low rank matrix we are interested in is the user-item rating matrix. While we make an observation only when a user rates an item, at any point of time we should have an estimate of the user-item rating matrix based on all prior observatio... |

43 | Robust video denoising using low rank matrix completion.
- Ji, Liu, et al.
- 2010
Citation Context ...l matrices, giving near-linear total runtime. Our algorithm can be naturally used in the offline setting as well, where it gives sample complexity and runtime competitive with state-of-the-art algorithms. Our proofs introduce a general framework to show that SGD updates tend to stay away from saddle surfaces, which could be of broader interest for other non-convex problems. 1 Introduction Low rank matrix completion refers to the problem of recovering a low rank matrix by observing the values of only a tiny fraction of its entries. This problem arises in several applications such as video denoising [13], phase retrieval [3], and, most famously, movie recommendation engines [15]. In the context of recommendation engines, for instance, the matrix we wish to recover would be the user-item rating matrix, where each row corresponds to a user and each column corresponds to an item. Each entry of the matrix is the rating given by a user to an item. The low rank assumption on the matrix is inspired by the intuition that the rating of an item by a user depends on only a few hidden factors, which are much fewer than the number of users or items. The goal is to estimate the ratings of all items by users given only p... |

16 | Understanding alternating minimization for matrix completion.
- Hardt
- 2014
Citation Context ...tes and use martingale convergence techniques to conclude that the iterates stay away from saddle surfaces. While [24] shows that SGD updates stay away from saddle surfaces, the stepsizes they can handle are quite small (scaling as 1/poly(d1, d2)), leading to suboptimal computational complexity.
Table 1: Comparison of sample complexity and runtime of our algorithm with existing algorithms in order to obtain Frobenius norm error ε. Õ(·) hides log d factors. See Section 1.2 for more discussion.
Algorithm | Sample complexity | Total runtime | Online?
Nuclear norm [22] | Õ(µdk) | O(d^3/√ε) | No
Alternating minimization [14] | Õ(µdkκ^8 log(1/ε)) | Õ(µdk^2κ^8 log(1/ε)) | No
Alternating minimization [8] | Õ(µdk^2κ^2(k + log(1/ε))) | Õ(µdk^3κ^2(k + log(1/ε))) | No
Projected gradient descent [12] | Õ(µdk^5) | Õ(µdk^7 log(1/ε)) | No
SGD [24] | Õ(µ^2dk^7κ^6) | poly(µ, d, k, κ) log(1/ε) | Yes
Our result | Õ(µdkκ^4(k + log(1/ε))) | Õ(µdk^4κ^4 log(1/ε)) | Yes
Our framework makes it possible to establish the same statement for much larger step sizes, giving us near-optimal runtime. We believe these techniques may be applicable in other non-convex settings as well. 1.2 Related Work In this section we will mention some more related wor... |

14 | Low-rank matrix and tensor completion via adaptive sampling.
- Krishnamurthy, Singh
- 2013
Citation Context ...tors. Our sample complexity is better than that of [14] and is incomparable to those of [8, 12]. To the best of our knowledge, the only provable online algorithm for this problem is that of Sun and Luo [24]. However, the stepsizes they suggest are quite small, leading to computational complexity that is suboptimal by factors of poly(d1, d2). The runtime of our algorithm is linear in d, a poly(d) improvement over theirs. Other models for online matrix completion: Another variant of online matrix completion studied in the literature is one where observations are made on a column-by-column basis, e.g., [16, 26]. These models can give improved offline performance in terms of space and could potentially work under relaxed regularity conditions. However, they do not tackle the version where only entries (as opposed to columns) are observed. Non-convex optimization: Over the last few years, there has also been a significant amount of work in designing efficient algorithms for solving other non-convex problems. Examples include eigenvector computation [6, 11], sparse coding [20, 1], etc. For general non-convex optimization, an interesting line of recent work is that of [7], which proves that gradient descent w... |

11 | Efficient algorithms for collaborative filtering
- Keshavan
- 2012
Citation Context ...ddle surfaces, show that they behave like sub-martingales under SGD updates, and use martingale convergence techniques to conclude that the iterates stay away from saddle surfaces. While [24] shows that SGD updates stay away from saddle surfaces, the stepsizes they can handle are quite small (scaling as 1/poly(d1, d2)), leading to suboptimal computational complexity.
Table 1: Comparison of sample complexity and runtime of our algorithm with existing algorithms in order to obtain Frobenius norm error ε. Õ(·) hides log d factors. See Section 1.2 for more discussion.
Algorithm | Sample complexity | Total runtime | Online?
Nuclear norm [22] | Õ(µdk) | O(d^3/√ε) | No
Alternating minimization [14] | Õ(µdkκ^8 log(1/ε)) | Õ(µdk^2κ^8 log(1/ε)) | No
Alternating minimization [8] | Õ(µdk^2κ^2(k + log(1/ε))) | Õ(µdk^3κ^2(k + log(1/ε))) | No
Projected gradient descent [12] | Õ(µdk^5) | Õ(µdk^7 log(1/ε)) | No
SGD [24] | Õ(µ^2dk^7κ^6) | poly(µ, d, k, κ) log(1/ε) | Yes
Our result | Õ(µdkκ^4(k + log(1/ε))) | Õ(µdk^4κ^4 log(1/ε)) | Yes
Our framework makes it possible to establish the same statement for much larger step sizes, giving us near-optimal runtime. We believe these techniques may be applicable in other non-convex settings as well.... |

10 | Guaranteed matrix completion via nonconvex factorization.
- Sun, Luo
- 2015
Citation Context ...ntime linear in d, and is competitive with even the best existing offline results for matrix completion (it either improves over these results or is incomparable to them, i.e., better in some parameters and worse in others). See Table 1 for the comparison. • To obtain our results, we introduce a general framework to show that SGD updates tend to stay away from saddle surfaces. In order to do so, we consider distances from saddle surfaces, show that they behave like sub-martingales under SGD updates, and use martingale convergence techniques to conclude that the iterates stay away from saddle surfaces. While [24] shows that SGD updates stay away from saddle surfaces, the stepsizes they can handle are
Table 1: Comparison of sample complexity and runtime of our algorithm with existing algorithms in order to obtain Frobenius norm error ε. Õ(·) hides log d factors. See Section 1.2 for more discussion.
Algorithm | Sample complexity | Total runtime | Online?
Nuclear norm [22] | Õ(µdk) | O(d^3/√ε) | No
Alternating minimization [14] | Õ(µdkκ^8 log(1/ε)) | Õ(µdk^2κ^8 log(1/ε)) | No
Alternating minimization [8] | Õ(µdk^2κ^2(k + log(1/ε))) | Õ(µdk^3κ^2(k + log(1/ε))) | No
Projected gradient descent [12] | Õ(µdk^5) | Õ(µdk^7 log(1... |

8 | Escaping from saddle points—online stochastic gradient for tensor decomposition.
- Ge, Huang, et al.
- 2015
Citation Context ...olumn-by-column basis, e.g., [16, 26]. These models can give improved offline performance in terms of space and could potentially work under relaxed regularity conditions. However, they do not tackle the version where only entries (as opposed to columns) are observed. Non-convex optimization: Over the last few years, there has also been a significant amount of work in designing efficient algorithms for solving other non-convex problems. Examples include eigenvector computation [6, 11], sparse coding [20, 1], etc. For general non-convex optimization, an interesting line of recent work is that of [7], which proves that gradient descent with noise can also escape saddle points, though it provides only a polynomial rate without explicit dependence. Later, [17, 21] show that without noise, the set of points from which gradient descent converges to a saddle point has measure zero. However, they do not provide a rate of convergence. Another piece of work related to ours is [10], which proves global convergence, along with rates of convergence, for the special case of computing the matrix square root. 1.3 Outline The rest of the paper is organized as follows. In Section 2 we formally describe the problem and a... |

8 | The BellKor solution to the Netflix Grand Prize. Netflix Prize documentation.
- Koren
- 2009
Citation Context ...used in the offline setting as well, where it gives sample complexity and runtime competitive with state-of-the-art algorithms. Our proofs introduce a general framework to show that SGD updates tend to stay away from saddle surfaces, which could be of broader interest for other non-convex problems. 1 Introduction Low rank matrix completion refers to the problem of recovering a low rank matrix by observing the values of only a tiny fraction of its entries. This problem arises in several applications such as video denoising [13], phase retrieval [3], and, most famously, movie recommendation engines [15]. In the context of recommendation engines, for instance, the matrix we wish to recover would be the user-item rating matrix, where each row corresponds to a user and each column corresponds to an item. Each entry of the matrix is the rating given by a user to an item. The low rank assumption on the matrix is inspired by the intuition that the rating of an item by a user depends on only a few hidden factors, which are much fewer than the number of users or items. The goal is to estimate the ratings of all items by users given only partial ratings of items by users, which would then be helpful in recommendin... |

5 | Simple, efficient, and neural algorithms for sparse coding. arXiv preprint arXiv:1503.00778.
- Arora, Ge, et al.
- 2015
Citation Context ...iant of online matrix completion studied in the literature is one where observations are made on a column-by-column basis, e.g., [16, 26]. These models can give improved offline performance in terms of space and could potentially work under relaxed regularity conditions. However, they do not tackle the version where only entries (as opposed to columns) are observed. Non-convex optimization: Over the last few years, there has also been a significant amount of work in designing efficient algorithms for solving other non-convex problems. Examples include eigenvector computation [6, 11], sparse coding [20, 1], etc. For general non-convex optimization, an interesting line of recent work is that of [7], which proves that gradient descent with noise can also escape saddle points, though it provides only a polynomial rate without explicit dependence. Later, [17, 21] show that without noise, the set of points from which gradient descent converges to a saddle point has measure zero. However, they do not provide a rate of convergence. Another piece of work related to ours is [10], which proves global convergence, along with rates of convergence, for the special case of computing the matrix square root. 1.3 Outline The re... |

5 | Global convergence of stochastic gradient descent for some nonconvex matrix problems.
- De Sa, Olukotun, et al.
- 2015
Citation Context ...completion: Another variant of online matrix completion studied in the literature is one where observations are made on a column-by-column basis, e.g., [16, 26]. These models can give improved offline performance in terms of space and could potentially work under relaxed regularity conditions. However, they do not tackle the version where only entries (as opposed to columns) are observed. Non-convex optimization: Over the last few years, there has also been a significant amount of work in designing efficient algorithms for solving other non-convex problems. Examples include eigenvector computation [6, 11], sparse coding [20, 1], etc. For general non-convex optimization, an interesting line of recent work is that of [7], which proves that gradient descent with noise can also escape saddle points, though it provides only a polynomial rate without explicit dependence. Later, [17, 21] show that without noise, the set of points from which gradient descent converges to a saddle point has measure zero. However, they do not provide a rate of convergence. Another piece of work related to ours is [10], which proves global convergence, along with rates of convergence, for the special case of computing the matrix squarer... |

5 | Fast exact matrix completion with finite samples. arXiv preprint arXiv:1411.1087.
- Jain, Netrapalli
- 2014
Citation Context ...m saddle surfaces. While [24] shows that SGD updates stay away from saddle surfaces, the stepsizes they can handle are quite small (scaling as 1/poly(d1, d2)), leading to suboptimal computational complexity.
Table 1: Comparison of sample complexity and runtime of our algorithm with existing algorithms in order to obtain Frobenius norm error ε. Õ(·) hides log d factors. See Section 1.2 for more discussion.
Algorithm | Sample complexity | Total runtime | Online?
Nuclear norm [22] | Õ(µdk) | O(d^3/√ε) | No
Alternating minimization [14] | Õ(µdkκ^8 log(1/ε)) | Õ(µdk^2κ^8 log(1/ε)) | No
Alternating minimization [8] | Õ(µdk^2κ^2(k + log(1/ε))) | Õ(µdk^3κ^2(k + log(1/ε))) | No
Projected gradient descent [12] | Õ(µdk^5) | Õ(µdk^7 log(1/ε)) | No
SGD [24] | Õ(µ^2dk^7κ^6) | poly(µ, d, k, κ) log(1/ε) | Yes
Our result | Õ(µdkκ^4(k + log(1/ε))) | Õ(µdk^4κ^4 log(1/ε)) | Yes
Our framework makes it possible to establish the same statement for much larger step sizes, giving us near-optimal runtime. We believe these techniques may be applicable in other non-convex settings as well. 1.2 Related Work In this section we will mention some more related work. Offline matrix completion: There has been a lot of work on designing offline algorithms... |

4 | Incremental collaborative filtering recommender based on regularized matrix factorization. Knowledge-Based Systems.
- Luo, Xia, et al.
- 2012
Citation Context ...ral passes over the entire data for every additional observation. This is simply not feasible in most settings. Another natural approach is to group observations into batches and do an update only once for each batch. This, however, induces a lag between observations and estimates, which is undesirable. To the best of our knowledge, there is no known provable, efficient, online algorithm for matrix completion. On the other hand, in order to deal with the online matrix completion scenario in practical applications, several heuristics (with no convergence guarantees) have been proposed in the literature [2, 19]. Most of these approaches are based on starting with an estimate of the matrix and doing fast updates of this estimate whenever a new observation is presented. One of the update procedures used in this context is stochastic gradient descent (SGD) applied to the following non-convex optimization problem:
min_{U,V} ‖M − U V^T‖_F^2 s.t. U ∈ R^{d1×k}, V ∈ R^{d2×k}, (1)
where M is the unknown matrix of size d1 × d2, k is the rank of M, and U V^T is the low rank factorization of M we wish to obtain. The algorithm starts with some U_0 and V_0, and given a new observation (M)_{ij}, SGD updates the ith row and the jth... |

3 | Phase retrieval via matrix completion.
- Candès, Eldar, et al.
- 2015
Citation Context ...r-linear total runtime. Our algorithm can be naturally used in the offline setting as well, where it gives sample complexity and runtime competitive with state-of-the-art algorithms. Our proofs introduce a general framework to show that SGD updates tend to stay away from saddle surfaces, which could be of broader interest for other non-convex problems. 1 Introduction Low rank matrix completion refers to the problem of recovering a low rank matrix by observing the values of only a tiny fraction of its entries. This problem arises in several applications such as video denoising [13], phase retrieval [3], and, most famously, movie recommendation engines [15]. In the context of recommendation engines, for instance, the matrix we wish to recover would be the user-item rating matrix, where each row corresponds to a user and each column corresponds to an item. Each entry of the matrix is the rating given by a user to an item. The low rank assumption on the matrix is inspired by the intuition that the rating of an item by a user depends on only a few hidden factors, which are much fewer than the number of users or items. The goal is to estimate the ratings of all items by users given only partial ratings of ite... |

3 | Matching matrix Bernstein with little memory: Near-optimal finite sample guarantees for Oja's algorithm. arXiv preprint arXiv:1602.06929.
- Jain, Jin, et al.
- 2016
Citation Context ...completion: Another variant of online matrix completion studied in the literature is one where observations are made on a column-by-column basis, e.g., [16, 26]. These models can give improved offline performance in terms of space and could potentially work under relaxed regularity conditions. However, they do not tackle the version where only entries (as opposed to columns) are observed. Non-convex optimization: Over the last few years, there has also been a significant amount of work in designing efficient algorithms for solving other non-convex problems. Examples include eigenvector computation [6, 11], sparse coding [20, 1], etc. For general non-convex optimization, an interesting line of recent work is that of [7], which proves that gradient descent with noise can also escape saddle points, though it provides only a polynomial rate without explicit dependence. Later, [17, 21] show that without noise, the set of points from which gradient descent converges to a saddle point has measure zero. However, they do not provide a rate of convergence. Another piece of work related to ours is [10], which proves global convergence, along with rates of convergence, for the special case of computing the matrix squarer... |

2 | Computational limits for matrix completion.
- Hardt, Meka, et al.
- 2014
Citation Context ...red by the intuition that the rating of an item by a user depends on only a few hidden factors, which are much fewer than the number of users or items. The goal is to estimate the ratings of all items by users given only partial ratings of items by users, which would then be helpful in recommending new items to users. The seminal works of Candès and Recht [4] first identified regularity conditions under which low rank matrix completion can be solved in polynomial time using convex relaxation – low rank matrix completion could be ill-posed and NP-hard in general without such regularity assumptions [9]. Since then, a number of works have studied various algorithms under different settings for matrix completion: weighted and noisy matrix completion, fast convex solvers, fast iterative non-convex solvers, parallel and distributed algorithms, and so on. Most of this work, however, deals only with the offline setting where all the observed entries are revealed at once and the recovery procedure does computation using all these observations simultaneously. However, in several applications [5, 18], we encounter the online setting where observations are... |

2 | Computing matrix squareroot via non-convex local search. arXiv preprint arXiv:1507.05854.
- Jain, Jin, et al.
- 2015
Citation Context ...ning efficient algorithms for solving other non-convex problems. Examples include eigenvector computation [6, 11], sparse coding [20, 1], etc. For general non-convex optimization, an interesting line of recent work is that of [7], which proves that gradient descent with noise can also escape saddle points, though it provides only a polynomial rate without explicit dependence. Later, [17, 21] show that without noise, the set of points from which gradient descent converges to a saddle point has measure zero. However, they do not provide a rate of convergence. Another piece of work related to ours is [10], which proves global convergence, along with rates of convergence, for the special case of computing the matrix square root. 1.3 Outline The rest of the paper is organized as follows. In Section 2 we formally describe the problem and all relevant parameters. In Section 3, we present our algorithms, results, and some of the key intuition behind our results. In Section 4 we give a proof outline for our main results. We conclude in Section 5. All formal proofs are deferred to the Appendix. 2 Preliminaries In this section, we introduce our notation, formally define the matrix completion problem and regularity... |

1 | Gradient descent converges to minimizers.
- Lee, Simchowitz, et al.
- 2016
Citation Context ...egularity conditions. However, they do not tackle the version where only entries (as opposed to columns) are observed. Non-convex optimization: Over the last few years, there has also been a significant amount of work in designing efficient algorithms for solving other non-convex problems. Examples include eigenvector computation [6, 11], sparse coding [20, 1], etc. For general non-convex optimization, an interesting line of recent work is that of [7], which proves that gradient descent with noise can also escape saddle points, though it provides only a polynomial rate without explicit dependence. Later, [17, 21] show that without noise, the set of points from which gradient descent converges to a saddle point has measure zero. However, they do not provide a rate of convergence. Another piece of work related to ours is [10], which proves global convergence, along with rates of convergence, for the special case of computing the matrix square root. 1.3 Outline The rest of the paper is organized as follows. In Section 2 we formally describe the problem and all relevant parameters. In Section 3, we present our algorithms, results, and some of the key intuition behind our results. In Section 4 we give a proof ou... |

1 | Streaming, memory limited matrix completion with noise. arXiv preprint arXiv:1504.03156.
- Yun, Lelarge, et al.
- 2015
Citation Context ...tors. Our sample complexity is better than that of [14] and is incomparable to those of [8, 12]. To the best of our knowledge, the only provable online algorithm for this problem is that of Sun and Luo [24]. However, the stepsizes they suggest are quite small, leading to computational complexity that is suboptimal by factors of poly(d1, d2). The runtime of our algorithm is linear in d, a poly(d) improvement over theirs. Other models for online matrix completion: Another variant of online matrix completion studied in the literature is one where observations are made on a column-by-column basis, e.g., [16, 26]. These models can give improved offline performance in terms of space and could potentially work under relaxed regularity conditions. However, they do not tackle the version where only entries (as opposed to columns) are observed. Non-convex optimization: Over the last few years, there has also been a significant amount of work in designing efficient algorithms for solving other non-convex problems. Examples include eigenvector computation [6, 11], sparse coding [20, 1], etc. For general non-convex optimization, an interesting line of recent work is that of [7], which proves that gradient descent w... |