Results 1-10 of 24
Accelerated minibatch stochastic dual coordinate ascent
Cited by 24 (3 self)
Stochastic dual coordinate ascent (SDCA) is an effective technique for solving regularized loss minimization problems in machine learning. This paper considers an extension of SDCA under the minibatch setting that is often used in practice. Our main contribution is to introduce an accelerated minibatch version of SDCA and prove a fast convergence rate for this method. We discuss an implementation of our method over a parallel computing system, and compare the results to both the vanilla stochastic dual coordinate ascent and to the accelerated deterministic gradient descent method of Nesterov [2007].
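For orientation, the "vanilla" SDCA baseline mentioned in the abstract above can be sketched for one concrete case. This is a minimal serial sketch, not the accelerated minibatch method of the paper. It assumes a squared loss 0.5*(w.x_i - y_i)^2 with an L2 regularizer (lam/2)*||w||^2, for which the single-coordinate dual update has a closed form. The function name `sdca_ridge`, the toy data, and all parameter values are hypothetical.

```python
import random

def sdca_ridge(X, y, lam, epochs=2000, seed=0):
    """Plain (serial, non-accelerated) SDCA for
    min_w (1/n) * sum_i 0.5*(w.x_i - y_i)^2 + (lam/2)*||w||^2."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    alpha = [0.0] * n   # one dual variable per training example
    w = [0.0] * d       # primal iterate, kept equal to (1/(lam*n)) * sum_i alpha_i * x_i
    for _ in range(epochs * n):
        i = rng.randrange(n)  # pick one dual coordinate uniformly at random
        xi = X[i]
        pred = sum(wj * xj for wj, xj in zip(w, xi))
        sq_norm = sum(xj * xj for xj in xi)
        # Closed-form maximizer of the dual along coordinate i (valid for this squared loss only)
        delta = (y[i] - pred - alpha[i]) / (1.0 + sq_norm / (lam * n))
        alpha[i] += delta
        for j in range(d):
            w[j] += delta * xi[j] / (lam * n)
    return w

# Toy one-dimensional problem (hypothetical data)
X = [[1.0], [2.0], [3.0]]
y = [1.0, 2.0, 3.0]
w = sdca_ridge(X, y, lam=0.1)
```

On this toy problem the ridge solution is available in closed form, w = (sum_i x_i*y_i / n) / (sum_i x_i^2 / n + lam), so the returned iterate can be checked against it directly.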
On optimal probabilities in stochastic coordinate descent methods. arXiv:1310.3438, 2013
Cited by 16 (4 self)
We propose and analyze a new parallel coordinate descent method, 'NSync, in which at each iteration a random subset of coordinates is updated, in parallel, allowing for the subsets to be chosen nonuniformly. We derive convergence rates under a strong convexity assumption, and comment on how to assign probabilities to the sets to optimize the bound. The complexity and practical performance of the method can outperform its uniform variant by an order of magnitude. Surprisingly, the strategy of updating a single randomly selected coordinate per iteration, with optimal probabilities, may require fewer iterations, both in theory and practice, than the strategy of updating all coordinates at every iteration.
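To make the idea of nonuniform coordinate selection concrete, here is a minimal serial sketch on a strongly convex quadratic, f(x) = 0.5*x'Ax - b'x. It is not the 'NSync method itself, which updates random subsets in parallel with its own stepsizes. The function name `coordinate_descent`, the test matrix, and the choice of probabilities proportional to the diagonal of A are illustrative assumptions, not the optimal probabilities derived in the paper.

```python
import random

def coordinate_descent(A, b, probs, iters=5000, seed=0):
    """Minimize 0.5*x'Ax - b'x by updating one randomly chosen coordinate per step.
    Coordinate i is drawn with probability proportional to probs[i]."""
    rng = random.Random(seed)
    n = len(b)
    x = [0.0] * n
    coords = list(range(n))
    for _ in range(iters):
        i = rng.choices(coords, weights=probs, k=1)[0]
        grad_i = sum(A[i][j] * x[j] for j in range(n)) - b[i]
        x[i] -= grad_i / A[i][i]  # exact minimization along coordinate i
    return x

# Small symmetric positive definite system (hypothetical data)
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 100.0]]
b = [1.0, 2.0, 3.0]
diag = [A[i][i] for i in range(3)]

x_nonuniform = coordinate_descent(A, b, probs=diag)          # assumed heuristic weights
x_uniform = coordinate_descent(A, b, probs=[1.0, 1.0, 1.0])  # uniform baseline
```

Both runs approach the solution of Ax = b. Comparing how many iterations each needs to reach a given residual is one simple way to explore the effect of the sampling distribution.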
An Accelerated Proximal Coordinate Gradient Method, 2014
Cited by 11 (2 self)
We develop an accelerated randomized proximal coordinate gradient (APCG) method for solving a broad class of composite convex optimization problems. In particular, our method achieves faster linear convergence rates for minimizing strongly convex functions than existing randomized proximal coordinate gradient methods. We show how to apply the APCG method to solve the dual of the regularized empirical risk minimization (ERM) problem, and devise efficient implementations that avoid full-dimensional vector operations. For ill-conditioned ERM problems, our method obtains better convergence rates than the state-of-the-art stochastic dual coordinate ascent (SDCA) method.
Communication Efficient Distributed Optimization using an Approximate Newton-type Method, 2014
Cited by 10 (3 self)
We present a novel Newton-type method for distributed optimization, which is particularly well suited for stochastic optimization and learning problems. For quadratic objectives, the method enjoys a linear rate of convergence which provably improves with the data size, requiring an essentially constant number of iterations under reasonable assumptions. We provide theoretical and empirical evidence of the advantages of our method compared to other approaches, such as one-shot parameter averaging and ADMM.
FAST DISTRIBUTED COORDINATE DESCENT FOR NON-STRONGLY CONVEX LOSSES
Cited by 10 (2 self)
We propose an efficient distributed randomized coordinate descent method for minimizing regularized non-strongly convex loss functions. The method attains the optimal O(1/k^2) convergence rate, where k is the iteration counter. The core of the work is the theoretical study of stepsize parameters. We have implemented the method on Archer, the largest supercomputer in the UK, and show that the method is capable of solving a (synthetic) LASSO optimization problem with 50 billion variables. Index Terms: Coordinate descent, distributed algorithms, acceleration.
Randomized dual coordinate ascent with arbitrary sampling, 2014
Cited by 7 (4 self)
We study the problem of minimizing the average of a large number of smooth convex functions penalized with a strongly convex regularizer. We propose and analyze a novel primal-dual method (Quartz) which at every iteration samples and updates a random subset of the dual variables, chosen according to an arbitrary distribution. In contrast to typical analysis, we directly bound the decrease of the primal-dual error (in expectation), without the need to first analyze the dual error. Depending on the choice of the sampling, we obtain efficient serial, parallel and distributed variants of the method. In the serial case, our bounds match the best known bounds for SDCA (both with uniform and importance sampling). With standard minibatching, our bounds predict initial data-independent speedup as well as additional data-driven speedup which depends on spectral and sparsity properties of the data. We calculate theoretical speedup factors and find that they are excellent predictors of actual speedup in practice. Moreover, we illustrate that it is possible to design an efficient minibatch importance sampling. The distributed variant of Quartz is the first distributed SDCA-like method with an analysis for non-separable data.
Distributed Block Coordinate Descent for Minimizing Partially Separable Functions
Cited by 6 (0 self)
In this work we propose a distributed randomized block coordinate descent method for minimizing a convex function with a huge number of variables/coordinates. We analyze its complexity under the assumption that the smooth part of the objective function is partially block separable, and show that the degree of separability directly influences the complexity. This extends the results in [22] to a distributed environment. We first show that partially block separable functions admit an expected separable overapproximation (ESO) with respect to a distributed sampling, compute the ESO parameters, and then specialize complexity results from recent literature that hold under the generic ESO assumption. We describe several approaches to distribution and synchronization of the computation across a cluster of multicore computers and provide promising computational results.
Coordinate descent with arbitrary sampling I: Algorithms and complexity, 2014
Cited by 5 (1 self)
The design and complexity analysis of randomized coordinate descent methods, and in particular of variants which update a random subset (sampling) of coordinates in each iteration, depends on the notion of expected separable overapproximation (ESO). This refers to an inequality involving the objective function and the sampling, capturing in a compact way certain smoothness properties of the function in a random subspace spanned by the sampled coordinates. ESO inequalities were previously established for special classes of samplings only, almost invariably for uniform samplings. In this paper we develop a systematic technique for deriving these inequalities for a large class of functions and for arbitrary samplings. We demonstrate that one can recover existing ESO results using our general approach, which is based on the study of eigenvalues associated with samplings and the data describing the function.
ON CONVERGENCE OF THE MAXIMUM BLOCK IMPROVEMENT METHOD
Cited by 5 (3 self)
The MBI (maximum block improvement) method is a greedy approach to solving optimization problems where the decision variables can be grouped into a finite number of blocks. Assuming that optimizing over one block of variables while fixing all others is relatively easy, the MBI method updates the block of variables corresponding to the maximally improving block at each iteration, which is arguably a most natural and simple process to tackle block-structured problems with great potential for engineering applications. In this paper we establish global and local linear convergence results for this method. The global convergence is established under the Lojasiewicz inequality assumption, while the local analysis invokes second-order assumptions. We study in particular the tensor optimization model with spherical constraints. Conditions for linear convergence of the famous power method for computing the maximum eigenvalue of a matrix follow in this framework as a special case. The condition is interpreted in various other forms for the rank-one tensor optimization model under spherical constraints. Numerical experiments are shown to support the convergence property of the MBI method.
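The power method mentioned in the abstract above can be sketched in a few lines. This is the classical textbook iteration, not the MBI method or the paper's analysis. It assumes a symmetric matrix whose largest-magnitude eigenvalue is positive and simple, and a start vector that is not orthogonal to the corresponding eigenvector. The function name `power_method` and the example matrix are hypothetical.

```python
import math

def power_method(M, iters=200):
    """Estimate the dominant eigenvalue of a symmetric matrix M by repeated
    multiplication and renormalization onto the unit sphere."""
    n = len(M)
    v = [1.0] + [0.0] * (n - 1)  # arbitrary unit start vector (assumption)
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]  # keep the iterate on the unit sphere
    Mv = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
    lam = sum(v[i] * Mv[i] for i in range(n))  # Rayleigh quotient for unit v
    return lam, v

# Example: the eigenvalues of this matrix are 3 and 1
M = [[2.0, 1.0],
     [1.0, 2.0]]
lam, v = power_method(M)
```

The iterate stays on the unit sphere throughout, which is the simplest instance of the spherical constraint discussed in the abstract. The convergence speed depends on the ratio between the two largest eigenvalue magnitudes.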