Results 1–10 of 31
Online Alternating Direction Method
 In ICML, 2012
Abstract

Cited by 37 (9 self)
Online optimization has emerged as a powerful tool in large-scale optimization. In this paper, we introduce efficient online algorithms based on the alternating directions method (ADM). We introduce a new proof technique for ADM in the batch setting, which yields the O(1/T) convergence rate of ADM and forms the basis of the regret analysis in the online setting. We consider two scenarios in the online setting, based on whether the solution needs to lie in the feasible set or not. In both settings, we establish regret bounds on both the objective function and the constraint violation, for general as well as strongly convex functions. Preliminary results are presented to illustrate the performance of the proposed algorithms.
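For reference, the batch ADM(M) iteration whose averaged iterates attain the O(1/T) rate above can be sketched on the lasso, a standard test problem (the problem choice and parameter values here are illustrative, not taken from the paper):

```python
import numpy as np

def lasso_admm(A, b, lam=0.1, rho=1.0, T=200):
    """Batch ADMM for min_x 0.5*||Ax - b||^2 + lam*||z||_1  s.t.  x = z."""
    n, d = A.shape
    x, z, u = np.zeros(d), np.zeros(d), np.zeros(d)  # u is the scaled dual variable
    M = np.linalg.inv(A.T @ A + rho * np.eye(d))     # factor once, reuse every step
    Atb = A.T @ b
    for _ in range(T):
        x = M @ (Atb + rho * (z - u))                            # x-minimization
        v = x + u
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)  # soft-threshold (z-step)
        u += x - z                                               # dual ascent
    return z

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
x_true = np.zeros(10)
x_true[:3] = [2.0, -1.5, 1.0]
b = A @ x_true + 0.01 * rng.standard_normal(50)
x_hat = lasso_admm(A, b)
```

The three alternating steps are exactly the structure the paper's new proof technique analyzes; caching the matrix factor makes each iteration cheap after the first.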
Trading computation for communication: Distributed stochastic dual coordinate ascent
 In NIPS, 2013
Abstract

Cited by 16 (2 self)
We present and study a distributed optimization algorithm that employs a stochastic dual coordinate ascent method. Stochastic dual coordinate ascent methods enjoy strong theoretical guarantees and often perform better than stochastic gradient descent methods on regularized loss minimization problems, yet they have received little study in a distributed framework. We make progress along this line by presenting a distributed stochastic dual coordinate ascent algorithm for a star network, with an analysis of the tradeoff between computation and communication. We verify our analysis by experiments on real data sets. Moreover, we compare the proposed algorithm with distributed stochastic gradient descent methods and distributed alternating direction methods of multipliers for optimizing SVMs in the same distributed framework, and observe competitive performance.
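A simulated star-network sketch of the idea (workers run local dual coordinate steps, a server aggregates primal updates) on ridge regression; the K-scaled step denominator and the round/step counts are illustrative stabilizing choices, not the paper's exact rule:

```python
import numpy as np

def distributed_sdca(X, y, lam=0.1, K=4, rounds=50, local_steps=25, seed=0):
    n, d = X.shape
    rng = np.random.default_rng(seed)
    alpha = np.zeros(n)                              # dual variables, one per example
    w = np.zeros(d)
    parts = np.array_split(rng.permutation(n), K)    # partition examples over K machines
    for _ in range(rounds):
        delta_w = np.zeros(d)
        for part in parts:                  # one simulated "machine" per partition
            w_local = w.copy()              # each machine starts from the broadcast w
            for i in rng.choice(part, size=local_steps):
                xi = X[i]
                # closed-form dual coordinate step for the squared loss; the K in the
                # denominator keeps the K summed updates stable (a conservative choice)
                da = (y[i] - xi @ w_local - alpha[i]) / (1.0 + K * (xi @ xi) / (lam * n))
                alpha[i] += da
                step = da * xi / (lam * n)
                w_local += step
                delta_w += step
        w += delta_w                        # server aggregates the primal updates
    return w

def ridge_obj(w, X, y, lam=0.1):
    return 0.5 * np.mean((X @ w - y) ** 2) + 0.5 * lam * np.dot(w, w)

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 10))
w_true = rng.standard_normal(10) / np.sqrt(10)
y = X @ w_true + 0.1 * rng.standard_normal(200)
w_hat = distributed_sdca(X, y)
```

The computation/communication tradeoff shows up directly: raising `local_steps` does more work per round, while `rounds` counts the communication rounds with the server.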
Dual averaging and proximal gradient descent for online alternating direction multiplier method
 In Proceedings of the 30th International Conference on Machine Learning, 2013
Abstract

Cited by 15 (1 self)
We develop new stochastic optimization methods that are applicable to a wide range of structured regularizations. Our methods combine basic stochastic optimization techniques with the Alternating Direction Multiplier Method (ADMM), a general framework for optimizing composite functions that has a wide range of applications. We propose two online variants of ADMM, corresponding to online proximal gradient descent and regularized dual averaging, respectively. The proposed algorithms are computationally efficient and easy to implement. Our methods yield O(1/√T) convergence of the expected risk. Moreover, the online proximal gradient descent type method yields O(log(T)/T) convergence for a strongly convex loss. Numerical experiments show the effectiveness of our methods in learning tasks with structured sparsity, such as the overlapped group lasso.
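The online proximal-gradient-type variant can be sketched for a lasso-style objective: linearize the loss at a single sampled gradient, solve the resulting quadratic w-step, soft-threshold in the z-step, and return the averaged z-iterate (the B = I splitting, O(1/√t) step-size schedule, and constants here are illustrative assumptions, not the paper's exact settings):

```python
import numpy as np

def opg_admm(X, y, lam=0.1, rho=1.0, eta0=0.5, T=5000, seed=0):
    n, d = X.shape
    rng = np.random.default_rng(seed)
    w, z, u = np.zeros(d), np.zeros(d), np.zeros(d)
    z_bar = np.zeros(d)                        # averaged iterate (the rate holds for it)
    for t in range(1, T + 1):
        i = rng.integers(n)
        g = (X[i] @ w - y[i]) * X[i]           # stochastic gradient of the loss
        eta = eta0 / np.sqrt(t)                # O(1/sqrt(t)) step size
        # linearized w-step: argmin g'w + (rho/2)||w-(z-u)||^2 + ||w-w_t||^2/(2*eta)
        w = (w / eta + rho * (z - u) - g) / (1.0 / eta + rho)
        v = w + u
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)  # prox of lam*||.||_1
        u += w - z
        z_bar += (z - z_bar) / t               # running average
    return z_bar

def lasso_obj(z, X, y, lam=0.1):
    return 0.5 * np.mean((X @ z - y) ** 2) + lam * np.sum(np.abs(z))

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 10))
w_true = np.zeros(10)
w_true[:3] = [1.5, -1.0, 0.5]
y = X @ w_true + 0.1 * rng.standard_normal(200)
z_hat = opg_admm(X, y)
```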
Stochastic primal-dual coordinate method for regularized empirical risk minimization, 2014
Abstract

Cited by 12 (2 self)
We consider a generic convex optimization problem associated with regularized empirical risk minimization of linear predictors. The problem structure allows us to reformulate it as a convex-concave saddle point problem. We propose a stochastic primal-dual coordinate (SPDC) method, which alternates between maximizing over a randomly chosen dual variable and minimizing over the primal variable. An extrapolation step on the primal variable is performed to obtain an accelerated convergence rate. We also develop a mini-batch version of the SPDC method, which facilitates parallel computing, and an extension with weighted sampling probabilities on the dual variables, which has better complexity than uniform sampling on unnormalized data. Both theoretically and empirically, we show that the SPDC method has comparable or better performance than several state-of-the-art optimization methods.
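The alternation the abstract describes — a closed-form ascent step on one random dual coordinate, a prox-descent step on the primal, plus extrapolation — can be sketched for ridge regression; the fixed step sizes σ, τ, θ below are illustrative rather than the paper's tuned values (one can verify the fixed point is exactly the ridge solution):

```python
import numpy as np

def spdc_ridge(X, y, lam=0.1, sigma=0.2, tau=0.05, theta=0.95, T=30000, seed=0):
    n, d = X.shape
    rng = np.random.default_rng(seed)
    w = np.zeros(d); w_bar = np.zeros(d)
    alpha = np.zeros(n)                        # dual variables
    u = np.zeros(d)                            # u = (1/n) * sum_i alpha_i * x_i
    for _ in range(T):
        i = rng.integers(n)
        xi = X[i]
        # dual ascent step: closed form for the squared-loss conjugate
        a_new = (sigma * (xi @ w_bar - y[i]) + alpha[i]) / (sigma + 1.0)
        da = a_new - alpha[i]
        alpha[i] = a_new
        # primal prox step with the cheap gradient estimate u + da*xi
        w_new = (w / tau - (u + da * xi)) / (lam + 1.0 / tau)
        w_bar = w_new + theta * (w_new - w)    # extrapolation on the primal variable
        w = w_new
        u += da * xi / n                       # keep the dual average up to date
    return w

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 10))
w_true = rng.standard_normal(10) / np.sqrt(10)
y = X @ w_true + 0.1 * rng.standard_normal(200)
w_hat = spdc_ridge(X, y)
w_star = np.linalg.solve(X.T @ X / 200 + 0.1 * np.eye(10), X.T @ y / 200)

def P(w):  # ridge objective
    return 0.5 * np.mean((X @ w - y) ** 2) + 0.05 * np.dot(w, w)
```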
Stochastic Dual Coordinate Ascent with Alternating Direction Multiplier Method. arXiv e-prints, 2013
Abstract

Cited by 5 (0 self)
We propose a new stochastic dual coordinate ascent technique that can be applied to a wide range of regularized learning problems. Our method is based on the Alternating Direction Method of Multipliers (ADMM) in order to handle complex regularization functions such as structured regularizations. It naturally supports mini-batch updates, which speed up convergence. We show that, under mild assumptions, our method converges exponentially. Numerical experiments show that our method performs efficiently in practice.
Asynchronous Distributed ADMM for Consensus Optimization
Abstract

Cited by 5 (0 self)
Distributed optimization algorithms are highly attractive for solving big data problems. In particular, many machine learning problems can be formulated as a global consensus optimization problem, which can then be solved in a distributed manner by the alternating direction method of multipliers (ADMM). However, ADMM suffers from the straggler problem because its updates have to be synchronized. In this paper, we propose an asynchronous ADMM algorithm that uses two conditions to control the asynchrony: a partial barrier and bounded delay. The proposed algorithm has a simple structure and good convergence guarantees (its convergence rate reduces to that of its synchronous counterpart). Experiments on different distributed ADMM applications show that asynchrony reduces the time spent waiting on the network and achieves faster convergence than the synchronous counterpart in terms of wall-clock time.
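A toy simulation of consensus ADMM with a partial barrier: each round, only a random subset of workers (at least `min_active` of them, standing in for "the first workers to report") perform their local x- and dual updates, and the server averages the latest, possibly stale, values. This is a simplification — real asynchrony, the bounded-delay condition, and the quadratic local objectives are illustrative assumptions:

```python
import numpy as np

def async_consensus_admm(C, rho=1.0, min_active=2, rounds=300, seed=0):
    # C: (K, d) array; worker k holds the local objective f_k(x) = 0.5*||x - C[k]||^2,
    # so the consensus optimum is the mean of the rows of C.
    K, d = C.shape
    rng = np.random.default_rng(seed)
    x = np.zeros((K, d)); yd = np.zeros((K, d)); z = np.zeros(d)
    for _ in range(rounds):
        # "partial barrier": the server proceeds once min_active workers have reported
        active = rng.choice(K, size=rng.integers(min_active, K + 1), replace=False)
        for k in active:
            x[k] = (C[k] + rho * (z - yd[k])) / (1.0 + rho)  # local x-minimization
            yd[k] += x[k] - z                                # local dual update
        z = (x + yd).mean(axis=0)  # server averages latest (possibly stale) values
    return z

rng = np.random.default_rng(4)
C = rng.standard_normal((5, 3))
z_hat = async_consensus_admm(C)
```

Inactive workers contribute stale `x[k] + yd[k]` terms to the average, which is exactly the staleness the bounded-delay condition is there to control.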
approximation and faster algorithm using the proximal average
 In Advances in Neural Information Processing Systems
Abstract

Cited by 5 (0 self)
It is common practice to approximate “complicated” functions with friendlier ones. In large-scale machine learning applications, nonsmooth losses/regularizers that entail great computational challenges are usually approximated by smooth functions. We re-examine this powerful methodology and point out a nonsmooth approximation which simply pretends that the proximal map is linear. The new approximation is justified using a recent convex analysis tool, the proximal average, and yields a novel proximal gradient algorithm that is strictly better than the one based on smoothing, without incurring any extra overhead. Numerical experiments on two important applications, the overlapping group lasso and the graph-guided fused lasso, corroborate the theoretical claims.
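The computational trick behind the proximal average is that its prox equals the average of the individual proxes, which is cheap even when the prox of the plain average is not. A small numerical sketch on a 1-D example with two components, f1 = |.| and f2 = 0.5*(.)^2 (the test point and step sizes are arbitrary illustrative choices):

```python
import numpy as np

def soft(v, t):
    # prox of t*|.|: soft-thresholding
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_of_proximal_average(v, eta):
    # key identity: prox_{eta*PA}(v) = average of the two individual proxes
    return 0.5 * (soft(v, eta) + v / (1.0 + eta))

def prox_of_plain_average(v, eta):
    # closed form for g(x) = 0.5*(|x| + 0.5*x^2), valid when the minimizer is > 0
    return (v - 0.5 * eta) / (1.0 + 0.5 * eta)

# the two proxes disagree by a vanishing amount as the step size eta shrinks
gaps = [abs(prox_of_proximal_average(1.7, eta) - prox_of_plain_average(1.7, eta))
        for eta in (0.5, 0.05)]
```

The gap between the cheap surrogate and the exact prox of the average shrinks with the step size, which is what lets the proximal-average-based gradient algorithm match (and beat) the smoothing approach.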
A stochastic coordinate descent primal-dual algorithm and applications to large-scale composite optimization, 2014
Abstract

Cited by 5 (2 self)
Based on the idea of randomized coordinate descent for α-averaged operators, a randomized primal-dual optimization algorithm is introduced, in which a random subset of coordinates is updated at each iteration. The algorithm builds upon a variant of a recent (deterministic) algorithm proposed by Vũ and Condat that includes the well-known ADMM as a particular case. The obtained algorithm is used to solve, asynchronously, a distributed optimization problem: a network of agents, each having a separate cost function containing a differentiable term, seeks a consensus on the minimizer of the aggregate objective. The method yields an algorithm in which, at each iteration, a random subset of agents wake up, update their local estimates, exchange some data with their neighbors, and go idle. Numerical results demonstrate the attractive performance of the method. The general approach can be naturally adapted to other situations where coordinate descent convex optimization algorithms are used with a random choice of coordinates.
Fast Stochastic Alternating Direction Method of Multipliers
Abstract

Cited by 4 (0 self)
We propose a new stochastic alternating direction method of multipliers (ADMM) algorithm, which incrementally approximates the full gradient in the linearized ADMM formulation. Besides having a per-iteration complexity as low as that of existing stochastic ADMM algorithms, it improves the convergence rate on convex problems from O(1/√T) to O(1/T), where T is the number of iterations. This matches the convergence rate of the batch ADMM algorithm, but without the need to visit all the samples in each iteration. Experiments on the graph-guided fused lasso demonstrate that the new algorithm is significantly faster than state-of-the-art stochastic and batch ADMM algorithms.
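The core idea — replace the single stochastic gradient in linearized ADMM with an incrementally maintained average of per-sample gradients (SAG-style) — can be sketched on a lasso-type problem. The step sizes are illustrative, and for simplicity the gradient table starts at zero, which biases the first sweep (a careful implementation normalizes by the number of samples seen):

```python
import numpy as np

def sag_admm(X, y, lam=0.1, rho=1.0, eta=0.1, T=5000, seed=0):
    n, d = X.shape
    rng = np.random.default_rng(seed)
    w, z, u = np.zeros(d), np.zeros(d), np.zeros(d)
    G = np.zeros((n, d))                 # table of last-seen per-sample gradients
    g_avg = np.zeros(d)                  # running average of the table
    for _ in range(T):
        i = rng.integers(n)
        g_i = (X[i] @ w - y[i]) * X[i]   # fresh gradient for sample i only
        g_avg += (g_i - G[i]) / n        # incremental full-gradient estimate
        G[i] = g_i
        # linearized ADMM w-step, driven by the averaged gradient
        w = (w / eta + rho * (z - u) - g_avg) / (1.0 / eta + rho)
        v = w + u
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)  # soft-threshold
        u += w - z
    return z

def lasso_obj(z, X, y, lam=0.1):
    return 0.5 * np.mean((X @ z - y) ** 2) + lam * np.sum(np.abs(z))

rng = np.random.default_rng(5)
X = rng.standard_normal((200, 10))
w_true = np.zeros(10)
w_true[:3] = [1.0, -2.0, 0.5]
y = X @ w_true + 0.1 * rng.standard_normal(200)
z_hat = sag_admm(X, y)
```

The price of the faster rate is the O(n·d) gradient table; for linear models it can be shrunk to one scalar per sample, since each per-sample gradient is a scalar residual times x_i.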