Results 1 - 5 of 5
Convergence results for projected line-search methods on varieties of low-rank matrices via Łojasiewicz inequality
, 2014
Abstract

Cited by 6 (4 self)
Abstract. The aim of this paper is to derive convergence results for projected line-search methods on the real-algebraic variety M≤k of real m × n matrices of rank at most k. Such methods extend successfully used Riemannian optimization methods on the smooth manifold Mk of rank-k matrices to its closure by taking steps along gradient-related directions in the tangent cone, and afterwards projecting back to M≤k. Considering such a method circumvents the difficulties which arise from the non-closedness and the unbounded curvature of Mk. The pointwise convergence is obtained for real-analytic functions on the basis of a Łojasiewicz inequality for the projection of the antigradient to the tangent cone. If the derived limit point lies on the smooth part of M≤k, i.e. in Mk, this boils down to more or less known results, but with the benefit that asymptotic convergence rate estimates (for specific step-sizes) can be obtained without an a priori curvature bound, simply from the fact that the limit lies on a smooth manifold. At the same time, one can give a convincing justification for assuming critical points to lie in Mk: if X is a critical point of f on M≤k, then either X has rank k, or ∇f(X) = 0. Key words. Convergence analysis, line-search methods, low-rank matrices, Riemannian optimization, steepest descent, Łojasiewicz gradient inequality, tangent cones
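The step-then-project scheme the abstract describes can be sketched in a few lines: take a step along the antigradient and project back onto M≤k via truncated SVD (the metric projection, by the Eckart-Young theorem). This is a minimal numpy illustration under simplifying assumptions (a fixed step size instead of a line search, and stepping along the full antigradient rather than its projection to the tangent cone); the function names are ours, not the paper's.

```python
import numpy as np

def truncated_svd_projection(Y, k):
    # Metric projection onto the variety of matrices of rank at most k:
    # the best rank-k approximation in Frobenius norm (Eckart-Young).
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

def projected_gradient_descent(grad_f, X0, k, step=0.1, iters=500):
    # Simplified projected descent on the rank-<=k variety:
    # antigradient step with a fixed step size, then projection back.
    X = truncated_svd_projection(X0, k)
    for _ in range(iters):
        X = truncated_svd_projection(X - step * grad_f(X), k)
    return X
```

For f(X) = ½‖X − A‖²_F (gradient X − A), the iteration converges to the best rank-k approximation of A, which is the expected critical point on the smooth part of the variety.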
Coordinate descent converges faster with the Gauss-Southwell rule than random selection.
 In Proceedings of the 32nd International Conference on Machine Learning (ICML),
, 2015
Abstract

Cited by 3 (1 self)
Abstract There has been significant recent work on the theory and application of randomized coordinate descent algorithms, beginning with the work of Nesterov [SIAM J. Optim., 22(2), 2012], who showed that a random coordinate selection rule achieves the same convergence rate as the Gauss-Southwell selection rule. This result suggests that we should never use the Gauss-Southwell rule, as it is typically much more expensive than random selection. However, the empirical behaviours of these algorithms contradict this theoretical result: in applications where the computational costs of the selection rules are comparable, the Gauss-Southwell selection rule tends to perform substantially better than random coordinate selection. We give a simple analysis of the Gauss-Southwell rule showing that, except in extreme cases, its convergence rate is faster than choosing random coordinates. Further, in this work we (i) show that exact coordinate optimization improves the convergence rate for certain sparse problems, (ii) propose a Gauss-Southwell-Lipschitz rule that gives an even faster convergence rate given knowledge of the Lipschitz constants of the partial derivatives, (iii) analyze the effect of approximate Gauss-Southwell rules, and (iv) analyze proximal-gradient variants of the Gauss-Southwell rule. Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 2015. JMLR: W&CP volume 37. Copyright 2015 by the author(s).
Coordinate Descent Methods. There has been substantial recent interest in applying coordinate descent methods to solve large-scale optimization problems, starting with the seminal work of Nesterov. After discussing contexts in which it makes sense to use coordinate descent and the GS rule, we answer this theoretical question by giving a tighter analysis of the GS rule (under strong-convexity and standard smoothness assumptions) that yields the same rate as the randomized method for a restricted class of functions, but is otherwise faster (and in some cases substantially faster). We further show that, compared to the usual constant step-size update of the coordinate, the GS method with exact coordinate optimization has a provably faster rate for problems satisfying a certain sparsity constraint (Section 5). We believe that this is the first result showing a theoretical benefit of exact coordinate optimization; all previous analyses show that these
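The contrast between the two selection rules is easy to make concrete on a smooth quadratic. Below is a minimal sketch (our own code, not from the paper): the Gauss-Southwell rule picks the coordinate with the largest-magnitude partial derivative, while the random rule picks one uniformly; for a quadratic, dividing by the diagonal entry of A gives exact coordinate minimization.

```python
import numpy as np

def coordinate_descent(A, b, rule="gauss-southwell", iters=200, rng=None):
    # Minimize f(x) = 0.5 x^T A x - b^T x (A symmetric positive definite)
    # by single-coordinate updates.
    rng = rng or np.random.default_rng(0)
    n = len(b)
    x = np.zeros(n)
    L = np.diag(A)  # coordinate-wise Lipschitz constants of the partials
    for _ in range(iters):
        g = A @ x - b  # full gradient
        if rule == "gauss-southwell":
            i = int(np.argmax(np.abs(g)))  # greedy: steepest partial derivative
        else:
            i = int(rng.integers(n))       # random uniform coordinate
        x[i] -= g[i] / L[i]  # exact coordinate minimization for quadratics
    return x
```

Note that the greedy rule requires the full gradient here; the paper's point is that in many structured applications its cost is comparable to random selection, in which case the faster per-iteration progress wins.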
Tensor product methods and entanglement optimization for ab initio quantum chemistry
, 2014
A NEW CONVERGENCE PROOF FOR THE HIGHER-ORDER POWER METHOD AND GENERALIZATIONS
Abstract
ABSTRACT. A proof for the pointwise convergence of the factors in the higher-order power method for tensors towards a critical point is given. It is obtained by applying established results from the theory of Łojasiewicz inequalities to the equivalent, unconstrained alternating least squares algorithm for best rank-one tensor approximation.
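The algorithm whose convergence the paper analyzes, the higher-order power method (equivalently, alternating least squares for the best rank-one approximation), can be sketched for a third-order tensor as follows. This is our own illustrative numpy code, not the paper's; it seeks λ·a⊗b⊗c ≈ T by cyclically updating each normalized factor with the other two fixed.

```python
import numpy as np

def hopm(T, iters=50, seed=0):
    # Higher-order power method for a 3rd-order tensor T: alternating
    # updates of the normalized factors a, b, c of a rank-one model.
    rng = np.random.default_rng(seed)
    b = rng.standard_normal(T.shape[1]); b /= np.linalg.norm(b)
    c = rng.standard_normal(T.shape[2]); c /= np.linalg.norm(c)
    for _ in range(iters):
        # Each update is the exact least-squares solution for that factor
        # (a contraction of T with the other two factors), renormalized.
        a = np.einsum('ijk,j,k->i', T, b, c); a /= np.linalg.norm(a)
        b = np.einsum('ijk,i,k->j', T, a, c); b /= np.linalg.norm(b)
        c = np.einsum('ijk,i,j->k', T, a, b); c /= np.linalg.norm(c)
    lam = np.einsum('ijk,i,j,k->', T, a, b, c)  # best scale for fixed factors
    return lam, a, b, c
```

On an exactly rank-one tensor the iteration recovers the factors (up to sign) and the scale λ; the paper's contribution is that for general tensors the factor sequence converges pointwise to a critical point, via a Łojasiewicz inequality.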