Results 1–10 of 16
On the complexity analysis of randomized block-coordinate descent methods
Mathematical Programming, 2014
Cited by 31 (2 self)
Abstract: In this paper we analyze the randomized block-coordinate descent (RBCD) methods proposed in ...
Iteration complexity of feasible descent methods for convex optimization
The Journal of Machine Learning Research, 2014
Cited by 18 (2 self)
Abstract: In many machine learning problems, such as the dual form of SVM, the objective function to be minimized is convex but not strongly convex. This fact causes difficulties in obtaining the complexity of some commonly used optimization algorithms. In this paper, we prove global linear convergence for a wide range of algorithms when they are applied to some non-strongly convex problems. In particular, we are the first to prove O(log(1/ε)) time complexity for cyclic coordinate descent methods on the dual problems of support vector classification and regression.
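As a minimal illustration of the kind of method this abstract analyzes (and not the paper's SVM-dual solver itself), cyclic coordinate descent on a convex quadratic performs an exact one-dimensional minimization in each coordinate in turn; on such problems it converges linearly, which is the regime the O(log(1/ε)) bound describes. The matrix `A` and vector `b` below are made-up example data.

```python
import numpy as np

def cyclic_coordinate_descent(A, b, n_epochs=100):
    """Minimize f(x) = 0.5 x^T A x - b^T x (A symmetric positive
    definite) by cycling through coordinates and minimizing exactly
    in each one."""
    n = len(b)
    x = np.zeros(n)
    for _ in range(n_epochs):
        for i in range(n):
            # Exact minimizer along coordinate i: set grad_i = (Ax)_i - b_i to 0.
            x[i] += (b[i] - A[i] @ x) / A[i, i]
    return x

A = np.array([[3.0, 1.0], [1.0, 2.0]])  # symmetric positive definite
b = np.array([1.0, 1.0])
x = cyclic_coordinate_descent(A, b)
# x approaches the solution of A x = b at a linear rate
```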
On optimal probabilities in stochastic coordinate descent methods. arXiv:1310.3438, 2013
Cited by 16 (4 self)
Abstract: We propose and analyze a new parallel coordinate descent method, 'NSync, in which at each iteration a random subset of coordinates is updated, in parallel, allowing for the subsets to be chosen nonuniformly. We derive convergence rates under a strong convexity assumption, and comment on how to assign probabilities to the sets to optimize the bound. The complexity and practical performance of the method can outperform its uniform variant by an order of magnitude. Surprisingly, the strategy of updating a single randomly selected coordinate per iteration, with optimal probabilities, may require fewer iterations, both in theory and in practice, than the strategy of updating all coordinates at every iteration.
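A common nonuniform choice of the kind this abstract alludes to (one natural heuristic, not necessarily the paper's optimal probabilities) is to sample coordinate i with probability proportional to its coordinate-wise Lipschitz constant L_i, here L_i = A[i, i] for a quadratic. The problem data below is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def nonuniform_cd(A, b, n_iters=3000):
    """Randomized coordinate descent for f(x) = 0.5 x^T A x - b^T x,
    sampling coordinate i with probability proportional to its
    coordinate-wise Lipschitz constant L_i = A[i, i]."""
    L = np.diag(A).copy()
    p = L / L.sum()            # nonuniform sampling distribution
    x = np.zeros(len(b))
    for _ in range(n_iters):
        i = rng.choice(len(b), p=p)
        grad_i = A[i] @ x - b[i]
        x[i] -= grad_i / L[i]  # step size 1 / L_i along coordinate i
    return x

A = np.array([[10.0, 1.0], [1.0, 1.0]])  # ill-conditioned SPD matrix
b = np.array([1.0, 1.0])
x = nonuniform_cd(A, b)
```

Sampling proportional to L_i spends more updates on the "stiffer" coordinates, which is where the order-of-magnitude gains over uniform sampling typically come from.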
An Inexact Successive Quadratic Approximation Method for Convex L1 Regularized Optimization. arXiv preprint arXiv:1309.3529, 2013
Cited by 10 (0 self)
Abstract: We study a Newton-like method for the minimization of an objective function φ that is the sum of a smooth convex function and an ℓ1 regularization term. This method, which is sometimes referred to in the literature as a proximal Newton method, computes a step by minimizing a piecewise quadratic model q_k of the objective function φ. In order to make this approach efficient in practice, it is imperative to perform this inner minimization inexactly. In this paper, we give inexactness conditions that guarantee global convergence and that can be used to control the local rate of convergence of the iteration. Our inexactness conditions are based on a semismooth function that represents a (continuous) measure of the optimality conditions of the problem and that embodies the soft-thresholding iteration. We give careful consideration to the algorithm employed for the inner minimization, and report numerical results on two test sets originating in machine learning.
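The soft-thresholding operator the abstract mentions is the proximal operator of the ℓ1 norm and is the basic building block of such methods. The sketch below shows it inside plain proximal gradient (ISTA) on a toy ℓ1-regularized least-squares problem; the proximal Newton method above instead minimizes a Hessian-based model q_k inexactly, which this simple fixed-step version does not attempt.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (component-wise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, b, lam, n_iters=500):
    """Proximal gradient (ISTA) for 0.5 ||Ax - b||^2 + lam ||x||_1."""
    L = np.linalg.norm(A, 2) ** 2       # Lipschitz constant of the smooth gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        grad = A.T @ (A @ x - b)
        x = soft_threshold(x - grad / L, lam / L)
    return x

A = np.eye(3)                            # toy data: identity design matrix
b = np.array([2.0, 0.5, -3.0])
x = ista(A, b, lam=1.0)
# For A = I the minimizer is soft_threshold(b, lam) = [1.0, 0.0, -2.0]
```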
Smooth minimization of nonsmooth functions with parallel coordinate descent methods, 2013
Distributed Block Coordinate Descent for Minimizing Partially Separable Functions
Cited by 6 (0 self)
Abstract: In this work we propose a distributed randomized block coordinate descent method for minimizing a convex function with a huge number of variables/coordinates. We analyze its complexity under the assumption that the smooth part of the objective function is partially block separable, and show that the degree of separability directly influences the complexity. This extends the results in [22] to a distributed environment. We first show that partially block separable functions admit an expected separable overapproximation (ESO) with respect to a distributed sampling, compute the ESO parameters, and then specialize complexity results from recent literature that hold under the generic ESO assumption. We describe several approaches to distributing and synchronizing the computation across a cluster of multicore computers, and provide promising computational results.
Hybrid Random/Deterministic Parallel Algorithms for Convex and Nonconvex Big Data Optimization
Cited by 3 (2 self)
Abstract: We propose a decomposition framework for the parallel optimization of the sum of a differentiable (possibly nonconvex) function and a nonsmooth (possibly nonseparable) convex one. The latter term is usually employed to enforce structure in the solution, typically sparsity. The main contribution of this work is a novel parallel, hybrid random/deterministic decomposition scheme wherein, at each iteration, a subset of (block) variables is updated at the same time by minimizing a convex surrogate of the original nonconvex function. To tackle huge-scale problems, the (block) variables to be updated are chosen according to a mixed random and deterministic procedure, which captures the advantages of both purely deterministic and purely random update-based schemes. Almost sure convergence of the proposed scheme is established. Numerical results show that, on huge-scale problems, the proposed hybrid random/deterministic algorithm compares favorably with random and deterministic schemes on both convex and nonconvex problems.
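One way to picture a mixed random/deterministic selection rule (a toy stand-in, not the paper's scheme) is to pick some coordinates greedily by gradient magnitude and the rest uniformly at random, then update only that subset. The helper names and the test matrix below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def hybrid_select(grad, n_greedy, n_random):
    """Pick n_greedy coordinates with the largest gradient magnitude
    (deterministic part) plus n_random of the remaining coordinates
    uniformly at random (random part)."""
    order = np.argsort(-np.abs(grad))
    greedy, rest = order[:n_greedy], order[n_greedy:]
    random_part = rng.choice(rest, size=n_random, replace=False)
    return np.concatenate([greedy, random_part])

def hybrid_gd(A, b, n_greedy=1, n_random=1, n_iters=2000):
    """Minimize 0.5 x^T A x - b^T x, updating only a hybrid-selected
    subset of coordinates per iteration (Jacobi-style, stale gradient)."""
    x = np.zeros(len(b))
    L = np.diag(A)
    for _ in range(n_iters):
        grad = A @ x - b
        for i in hybrid_select(grad, n_greedy, n_random):
            x[i] -= grad[i] / L[i]
    return x

A = np.diag([4.0, 2.0, 1.0]) + 0.1   # diagonally dominant SPD matrix
b = np.array([1.0, 1.0, 1.0])
x = hybrid_gd(A, b)
```

The greedy part gives fast progress on the currently worst coordinates, while the random part guarantees every coordinate is updated often enough, which is the intuition behind combining the two selection rules.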
Separable Approximations and Decomposition Methods for the Augmented Lagrangian, 2013
Cited by 2 (2 self)
Abstract: In this paper we study decomposition methods based on separable approximations for minimizing the augmented Lagrangian. In particular, we study and compare the Diagonal Quadratic Approximation Method (DQAM) of Mulvey and Ruszczyński [13] and the Parallel Coordinate Descent Method (PCDM) of Richtárik and Takáč [23]. We show that the two methods are equivalent for feasibility problems up to the selection of a single step-size parameter. Furthermore, we prove an improved complexity bound for PCDM under strong convexity, and show that this bound is at least 8(L′/L̄)(ω − 1)² times better than the best known bound for DQAM, where ω is the degree of partial separability and L′ and L̄ are the maximum and average of the block Lipschitz constants of the gradient of the quadratic penalty appearing in the augmented Lagrangian.