Parallel Coordinate Descent Methods for Big Data Optimization
2012
Cited by 74 (4 self)
In this work we show that randomized (block) coordinate descent methods can be accelerated by parallelization when applied to the problem of minimizing the sum of a partially separable smooth convex function and a simple separable convex function. The theoretical speedup, as compared to the serial method, and referring to the number of iterations needed to approximately solve the problem with high probability, is a simple expression depending on the number of parallel processors and a natural and easily computable measure of separability of the smooth component of the objective function. In the worst case, when no degree of separability is present, there may be no speedup; in the best case, when the problem is separable, the speedup is equal to the number of processors. Our analysis also works in the mode when the number of blocks being updated at each iteration is random, which allows for modeling situations with busy or unreliable processors. We show that our algorithm is able to solve a LASSO problem involving a matrix with 20 billion nonzeros in 2 hours on a large memory node with 24 cores.
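The idea of updating several randomly chosen coordinates at once can be illustrated on a small LASSO problem. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, the "parallel" updates are simulated serially with NumPy, and the damping factor `beta = 1 + (omega - 1)(tau - 1) / max(1, n - 1)` is an assumption that is not stated in the abstract above. Here `omega` is taken as the maximum number of nonzeros in a row of `A`, and `A` is assumed to have no all-zero columns.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * |.|, applied elementwise."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_objective(A, b, x, lam):
    """0.5 * ||Ax - b||^2 + lam * ||x||_1."""
    return 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))

def parallel_cd_lasso(A, b, lam, tau=4, iters=2000, seed=0):
    """Update tau random coordinates per iteration, all from the same point."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    L = (A ** 2).sum(axis=0)                 # coordinate-wise Lipschitz constants
    omega = int((A != 0).sum(axis=1).max())  # assumed separability measure
    beta = 1.0 + (omega - 1) * (tau - 1) / max(1, n - 1)  # assumed damping factor
    x = np.zeros(n)
    r = -b.astype(float)                     # residual A @ x - b at x = 0
    for _ in range(iters):
        S = rng.choice(n, size=tau, replace=False)   # coordinates for this round
        g = A[:, S].T @ r                            # partial gradients at the current x
        step = 1.0 / (beta * L[S])
        x_new = soft_threshold(x[S] - step * g, lam * step)
        r += A[:, S] @ (x_new - x[S])                # keep the residual in sync
        x[S] = x_new
    return x
```

With `tau=1` the factor `beta` equals 1 and the loop reduces to ordinary serial randomized coordinate descent; larger `tau` trades a smaller per-coordinate step for more coordinates updated per round.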
Stochastic block-coordinate Frank-Wolfe optimization for structural SVMs. arXiv preprint:1207.4747, 2012
Cited by 52 (4 self)
We propose a randomized block-coordinate variant of the classic Frank-Wolfe algorithm for convex optimization with block-separable constraints. Despite its lower iteration cost, we show that it achieves a similar convergence rate in duality gap as the full Frank-Wolfe algorithm. We also show that, when applied to the dual structural support vector machine (SVM) objective, this yields an online algorithm that has the same low iteration complexity as primal stochastic subgradient methods. However, unlike stochastic subgradient methods, the block-coordinate Frank-Wolfe algorithm allows us to compute the optimal stepsize and yields a computable duality gap guarantee. Our experiments indicate that this simple algorithm outperforms competing structural SVM solvers.
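A block-coordinate Frank-Wolfe step with an exact line search and a computable duality gap can be sketched on a toy problem. The example below is an illustration under stated assumptions, not the structural SVM solver from the paper: it minimizes a least-squares objective over a product of probability simplices, and the names `bcfw` and `fw_duality_gap` are hypothetical.

```python
import numpy as np

def bcfw(M, c, n_blocks, dim, iters=5000, seed=0):
    """Minimize 0.5 * ||M x - c||^2 with each block of x on a simplex."""
    rng = np.random.default_rng(seed)
    x = np.full((n_blocks, dim), 1.0 / dim)   # start at the simplex centers
    r = M @ x.ravel() - c                     # residual M x - c
    for _ in range(iters):
        i = rng.integers(n_blocks)            # pick one block at random
        cols = slice(i * dim, (i + 1) * dim)
        g = M[:, cols].T @ r                  # gradient with respect to block i
        s = np.zeros(dim)
        s[np.argmin(g)] = 1.0                 # linear minimizer over the simplex
        d = s - x[i]
        Md = M[:, cols] @ d
        gap_i = -(g @ d)                      # block duality gap, always >= 0
        denom = Md @ Md
        if denom > 0.0:
            gamma = min(1.0, max(0.0, gap_i / denom))   # exact line search
        else:
            gamma = 1.0 if gap_i > 0.0 else 0.0
        x[i] += gamma * d
        r += gamma * Md
    return x

def fw_duality_gap(M, c, x):
    """Sum over blocks of <x_i - s_i, grad_i>; an upper bound on suboptimality."""
    g = (M.T @ (M @ x.ravel() - c)).reshape(x.shape)
    return float(np.sum(g * x) - np.sum(g.min(axis=1)))
```

Each iteration touches a single block, so its cost is a fraction of a full Frank-Wolfe step, while `fw_duality_gap` gives a stopping criterion that needs no knowledge of the optimum.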
On the complexity analysis of randomized block-coordinate descent methods
2013
Cited by 33 (3 self)
In this paper we analyze the randomized block-coordinate descent (RBCD) methods proposed in [11, 15] for minimizing the sum of a smooth convex function and a block-separable convex function, and derive improved bounds on their convergence rates. In particular, we extend Nesterov’s technique developed in [11] for analyzing the RBCD method for minimizing a smooth convex function over a block-separable closed convex set to the aforementioned more general problem and obtain a sharper expected-value type of convergence rate than the one implied in [15]. As a result, we also obtain a better high-probability type of iteration complexity. In addition, for unconstrained smooth convex minimization, we develop a new technique called randomized estimate sequence to analyze the accelerated RBCD method proposed by Nesterov [11] and establish a sharper expected-value type of convergence rate than the one given in [11]. Key words: Randomized block-coordinate descent, accelerated coordinate descent, iteration complexity, convergence rate, composite minimization.
Recent Advances of Large-scale Linear Classification
Cited by 33 (6 self)
Linear classification is a useful tool in machine learning and data mining. For some data in a rich dimensional space, the performance (i.e., testing accuracy) of linear classifiers has been shown to be close to that of nonlinear classifiers such as kernel methods, but training and testing are much faster. Recently, many research works have developed efficient optimization methods to construct linear classifiers and applied them to some large-scale applications. In this paper, we give a comprehensive survey on the recent development of this active research area.
Minibatch primal and dual methods for SVMs
In 30th International Conference on Machine Learning, 2013
Cited by 28 (5 self)
We address the issue of using minibatches in stochastic optimization of SVMs. We show that the same quantity, the spectral norm of the data, controls the parallelization speedup obtained for both primal stochastic subgradient descent (SGD) and stochastic dual coordinate ascent (SDCA) methods and use it to derive novel variants of minibatched SDCA. Our guarantees for both methods are expressed in terms of the original nonsmooth primal problem based on the hinge loss.
Efficient Serial and Parallel Coordinate Descent Methods for Huge-Scale Truss Topology Design
Cited by 27 (5 self)
In this work we propose solving huge-scale instances of the truss topology design problem with coordinate descent methods. We develop four efficient codes: serial and parallel implementations of randomized and greedy rules for the selection of the variable (potential bar) to be updated in the next iteration. Both serial methods enjoy an O(n/k) iteration complexity guarantee, where n is the number of potential bars and k the iteration counter. Our parallel implementations, written in CUDA and running on a graphical processing unit (GPU), are capable of speedups of up to two orders of magnitude when compared to their serial counterparts. Numerical experiments were performed on instances with up to 30 million potential bars.
Optimization with first-order surrogate functions
In Proceedings of the International Conference on Machine Learning (ICML), 2013
Cited by 26 (4 self)
In this paper, we study optimization methods consisting of iteratively minimizing surrogates of an objective function. By proposing several algorithmic variants and simple convergence analyses, we make two main contributions. First, we provide a unified viewpoint for several first-order optimization techniques such as accelerated proximal gradient, block coordinate descent, or Frank-Wolfe algorithms. Second, we introduce a new incremental scheme that experimentally matches or outperforms state-of-the-art solvers for large-scale optimization problems typically arising in machine learning.
Accelerated minibatch stochastic dual coordinate ascent. arXiv
Cited by 26 (1 self)
Stochastic dual coordinate ascent (SDCA) is an effective technique for solving regularized loss minimization problems in machine learning. This paper considers an extension of SDCA under the minibatch setting that is often used in practice. Our main contribution is to introduce an accelerated minibatch version of SDCA and prove a fast convergence rate for this method. We discuss an implementation of our method over a parallel computing system, and compare the results to both the vanilla stochastic dual coordinate ascent and to the accelerated deterministic gradient descent method of Nesterov [2007].
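For orientation, the vanilla (non-accelerated, single-example) SDCA baseline mentioned above can be sketched for an L2-regularized hinge-loss SVM. This is a minimal illustration, not the accelerated minibatch method of the paper; the function names are hypothetical, and the closed-form coordinate update assumes the primal objective (1/n) Σ max(0, 1 − y_i w·x_i) + (λ/2)‖w‖² with dual variables constrained to [0, 1].

```python
import numpy as np

def sdca_hinge(X, y, lam, epochs=50, seed=0):
    """Single-coordinate SDCA; returns the primal vector w and dual variables alpha."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)
    w = np.zeros(d)                       # kept equal to sum_i alpha_i y_i x_i / (lam n)
    sq_norms = (X ** 2).sum(axis=1)
    for _ in range(epochs * n):
        i = rng.integers(n)
        if sq_norms[i] == 0.0:
            continue                      # an all-zero example never changes w
        margin = y[i] * (X[i] @ w)
        # Exact maximization of the dual over alpha_i, clipped to [0, 1].
        new_alpha = np.clip(alpha[i] + lam * n * (1.0 - margin) / sq_norms[i], 0.0, 1.0)
        w += (new_alpha - alpha[i]) * y[i] * X[i] / (lam * n)
        alpha[i] = new_alpha
    return w, alpha

def svm_duality_gap(X, y, lam, w, alpha):
    """Primal objective minus dual objective; nonnegative when w matches alpha."""
    primal = np.mean(np.maximum(0.0, 1.0 - y * (X @ w))) + 0.5 * lam * (w @ w)
    dual = np.mean(alpha) - 0.5 * lam * (w @ w)
    return float(primal - dual)
```

A minibatch variant would compute several such coordinate updates from the same `w` before applying them, which is where the step-size safeguards discussed in these papers come in.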
Accelerated, parallel and proximal coordinate descent. arXiv preprint arXiv:1312.5799, 2013
Cited by 25 (3 self)
We propose a new stochastic coordinate descent method for minimizing the sum of convex functions each of which depends on a small number of coordinates only. Our method (APPROX) is simultaneously Accelerated, Parallel and PROXimal; this is the first time such a method is proposed. In the special case when the number of processors is equal to the number of coordinates, the method converges at the rate $2\bar{\omega}\bar{L}R^2/(k+1)^2$, where $k$ is the iteration counter, $\bar{\omega}$ is an average degree of separability of the loss function, $\bar{L}$ is the average of Lipschitz constants associated with the coordinates and individual functions in the sum, and $R$ is the distance of the initial point from the minimizer. We show that the method can be implemented without the need to perform full-dimensional vector operations, which is the major bottleneck of accelerated coordinate descent. The fact that the method depends on the average degree of separability, and not on the maximum degree of separability, can be attributed to the use of new safe large stepsizes, leading to improved expected separable overapproximation (ESO). These are of independent interest and can be utilized in all existing parallel stochastic coordinate descent algorithms based on the concept of ESO.
Accelerated Proximal Stochastic Dual Coordinate Ascent for Regularized Loss Minimization
Cited by 23 (1 self)
We introduce a proximal version of the stochastic dual coordinate ascent method and show how to accelerate the method using an inner-outer iteration procedure. We analyze the runtime of the framework and obtain rates that improve state-of-the-art results for various key machine learning optimization problems including SVM, logistic regression, ridge regression, Lasso, and multiclass SVM. Experiments validate our theoretical findings.