Results 1-10 of 37
Stochastic Alternating Direction Method of Multipliers
Abstract

Cited by 31 (0 self)
The Alternating Direction Method of Multipliers (ADMM) has received a lot of attention recently due to the tremendous demand from large-scale and data-distributed machine learning applications. In this paper, we present a stochastic setting for optimization problems with nonsmooth composite objective functions. To solve this problem, we propose a stochastic ADMM algorithm. Our algorithm applies to a more general class of convex and nonsmooth objective functions, beyond the smooth and separable least squares loss used in lasso. We also demonstrate the rates of convergence for our algorithm under various structural assumptions on the stochastic function: O(1/√t) for convex functions and O(log t/t) for strongly convex functions. Compared to previous literature, we establish the convergence rate of ADMM for convex problems in terms of both the objective value and the feasibility violation. A novel application named Graph-Guided SVM is proposed to demonstrate the usefulness of our algorithm.
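For orientation, the deterministic ADMM iteration that this stochastic variant generalizes can be sketched on the lasso problem the abstract mentions. This is a minimal illustration of the standard splitting, not the paper's algorithm; the variable names, penalty parameter, and iteration count are illustrative choices.

```python
import numpy as np

def admm_lasso(A, b, lam, rho=1.0, iters=300):
    """Batch ADMM for min 0.5*||Ax - b||^2 + lam*||z||_1,
    split with the constraint x - z = 0 (u is the scaled dual variable)."""
    m, n = A.shape
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
    M = A.T @ A + rho * np.eye(n)   # system matrix reused by every x-update
    Atb = A.T @ b
    for _ in range(iters):
        x = np.linalg.solve(M, Atb + rho * (z - u))   # smooth subproblem
        v = x + u
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)  # soft-threshold
        u = u + x - z                                 # multiplier (dual) update
    return z
```

The stochastic setting in the paper replaces the exact x-subproblem above with one driven by noisy gradient samples, which is what produces the O(1/√t) and O(log t/t) rates quoted in the abstract.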
Local Linear Convergence of the Alternating Direction Method of Multipliers on Quadratic or Linear Programs
Abstract

Cited by 15 (1 self)
We introduce a novel matrix recurrence yielding a new spectral analysis of the local transient convergence behavior of the Alternating Direction Method of Multipliers (ADMM), for the particular case of a quadratic program or a linear program. We identify a particular combination of vector iterates whose convergence can be analyzed via a spectral analysis. The theory predicts that ADMM should go through up to four convergence regimes, such as constant step convergence or linear convergence, ending with the latter when close enough to the optimal solution if the optimal solution is unique and satisfies strict complementarity.
Decomposition methods for large scale LP decoding
 In 49th Annual Allerton Conference on Communication, Control, and Computing
, 2011
Abstract

Cited by 15 (3 self)
When binary linear error-correcting codes are used over symmetric channels, a relaxed version of the maximum likelihood decoding problem can be stated as a linear program (LP). This LP decoder can be used to decode at bit-error-rates comparable to state-of-the-art belief propagation (BP) decoders, but with significantly stronger theoretical guarantees. However, LP decoding, when implemented with standard LP solvers, does not easily scale to the block lengths of modern error-correcting codes. In this paper we draw on decomposition methods from optimization theory, specifically the Alternating Directions Method of Multipliers (ADMM), to develop efficient distributed algorithms for LP decoding. The key enabling technical result is a nearly linear time algorithm for two-norm projection onto the parity polytope. This allows us to use LP decoding, with all its theoretical guarantees, to decode large-scale error-correcting codes efficiently. We present numerical results for two LDPC codes. The first is the rate-0.5 [2640, 1320] "Margulis" code; the second is a rate-0.77 [1057, 244] code. The "waterfall" region of LP decoding is seen to initiate at a slightly higher signal-to-noise ratio than for sum-product BP; however, an error floor is not observed for either code, which is not the case for BP. Our implementation of LP decoding using ADMM executes as quickly as our baseline sum-product BP decoder, is fully parallelizable, and can be seen to implement a type of message-passing with a particularly simple schedule.
Dual averaging and proximal gradient descent for online alternating direction multiplier method
 In Proceedings of the 30th International Conference on Machine Learning
, 2013
Abstract

Cited by 15 (1 self)
We develop new stochastic optimization methods that are applicable to a wide range of structured regularizations. Our methods combine basic stochastic optimization techniques with the Alternating Direction Multiplier Method (ADMM). ADMM is a general framework for optimizing a composite function and has a wide range of applications. We propose two types of online variants of ADMM, which correspond to online proximal gradient descent and regularized dual averaging, respectively. The proposed algorithms are computationally efficient and easy to implement. Our methods yield O(1/√T) convergence of the expected risk. Moreover, the online proximal gradient descent type method yields O(log(T)/T) convergence for a strongly convex loss. Numerical experiments show the effectiveness of our methods in learning tasks with structured sparsity, such as overlapped group lasso.
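The regularized dual averaging building block used by one of these variants has a simple closed form for an ℓ1 regularizer. The sketch below shows the plain RDA step, not the full RDA-ADMM algorithm, and the √t step-size schedule and names are my assumptions.

```python
import numpy as np

def rda_l1_step(gbar, t, lam, gamma=1.0):
    """One regularized dual averaging step with an l1 regularizer:
    x_{t+1} = argmin_x <gbar, x> + lam*||x||_1 + (gamma/sqrt(t)) * ||x||^2 / 2,
    where gbar is the running average of the stochastic (sub)gradients seen so far.
    Componentwise closed form: shrink gbar by lam, then scale by -sqrt(t)/gamma."""
    shrink = np.sign(gbar) * np.maximum(np.abs(gbar) - lam, 0.0)
    return -(np.sqrt(t) / gamma) * shrink
```

Components whose averaged gradient stays below the threshold lam are set exactly to zero, which is how this family of methods produces structured sparsity.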
Stochastic primal-dual coordinate method for regularized empirical risk minimization.
, 2014
Abstract

Cited by 12 (2 self)
We consider a generic convex optimization problem associated with regularized empirical risk minimization of linear predictors. The problem structure allows us to reformulate it as a convex-concave saddle point problem. We propose a stochastic primal-dual coordinate (SPDC) method, which alternates between maximizing over a randomly chosen dual variable and minimizing over the primal variable. An extrapolation step on the primal variable is performed to obtain an accelerated convergence rate. We also develop a mini-batch version of the SPDC method, which facilitates parallel computing, and an extension with weighted sampling probabilities on the dual variables, which has a better complexity than uniform sampling on unnormalized data. Both theoretically and empirically, we show that the SPDC method has comparable or better performance than several state-of-the-art optimization methods.
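The alternation described in the abstract can be sketched for ridge regression with squared loss, where both the dual coordinate step and the primal proximal step have closed forms. This is only an illustration of the dual-maximize / primal-minimize / extrapolate pattern; the step sizes here are ad hoc choices for the sketch, not the theoretically prescribed ones from the paper.

```python
import numpy as np

def spdc_ridge(A, b, lam, tau=0.05, sigma=0.05, theta=1.0, iters=4000, seed=0):
    """Stochastic primal-dual coordinate sketch for
    min_x (1/(2n))*||Ax - b||^2 + (lam/2)*||x||^2, written as the saddle point
    min_x max_y (1/n) * sum_i [ y_i * <a_i, x> - phi_i*(y_i) ] + (lam/2)*||x||^2
    with phi_i*(y) = y^2/2 + b_i*y (convex conjugate of the squared loss)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    xbar = x.copy()          # extrapolated primal point
    y = np.zeros(n)          # dual variables, one per example
    u = np.zeros(d)          # maintains (1/n) * A.T @ y incrementally
    for _ in range(iters):
        i = rng.integers(n)
        # maximize over the randomly chosen dual coordinate (closed form)
        y_new = (sigma * (A[i] @ xbar - b[i]) + y[i]) / (sigma + 1.0)
        u += (y_new - y[i]) * A[i] / n
        y[i] = y_new
        # proximal minimization over the primal variable
        x_new = (x - tau * u) / (1.0 + tau * lam)
        xbar = x_new + theta * (x_new - x)   # extrapolation step
        x = x_new
    return x
```

Maintaining u incrementally is what keeps the per-iteration cost at O(d) rather than O(nd).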
Bregman Alternating Direction Method of Multipliers
Abstract

Cited by 10 (1 self)
The mirror descent algorithm (MDA) generalizes gradient descent by using a Bregman divergence to replace squared Euclidean distance. In this paper, we similarly generalize the alternating direction method of multipliers (ADMM) to Bregman ADMM (BADMM), which allows the choice of different Bregman divergences to exploit the structure of problems. BADMM provides a unified framework for ADMM and its variants, including generalized ADMM, inexact ADMM, and Bethe ADMM. We establish the global convergence and the O(1/T) iteration complexity for BADMM. In some cases, BADMM can be faster than ADMM by a factor of O(n/log(n)). In solving the linear program of the mass transportation problem, BADMM leads to massive parallelism and can easily run on GPUs. BADMM is several times faster than the highly optimized commercial solver Gurobi.
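The Bregman proximal step that BADMM builds on can be illustrated with the KL divergence on the probability simplex, the divergence relevant to the mass transportation LP. This is the generic mirror-descent building block, not BADMM itself, and the names are illustrative.

```python
import numpy as np

def kl_prox_step(x, grad, eta):
    """Bregman proximal (mirror descent) step on the simplex:
    x_plus = argmin_{y in simplex} <grad, y> + (1/eta) * KL(y || x).
    With the KL divergence this reduces to a multiplicative update followed
    by renormalization, avoiding any Euclidean projection onto the simplex."""
    y = x * np.exp(-eta * grad)
    return y / y.sum()
```

The ability to swap the squared Euclidean distance for a divergence matched to the constraint set (here, KL for the simplex) is exactly the structural flexibility the abstract attributes to BADMM.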
Bethe-ADMM for Tree Decomposition based Parallel MAP Inference
Abstract

Cited by 9 (3 self)
We consider the problem of maximum a posteriori (MAP) inference in discrete graphical models. We present a parallel MAP inference algorithm called Bethe-ADMM based on two ideas: tree decomposition of the graph and the alternating direction method of multipliers (ADMM). However, unlike the standard ADMM, we use an inexact ADMM augmented with a Bethe-divergence-based proximal function, which makes each subproblem in ADMM easy to solve in parallel using the sum-product algorithm. We rigorously prove global convergence of Bethe-ADMM. The proposed algorithm is extensively evaluated on both synthetic and real datasets to illustrate its effectiveness. Further, the parallel Bethe-ADMM is shown to scale almost linearly with the number of cores.
Stochastic Dual Coordinate Ascent with Alternating Direction Multiplier Method. ArXiv e-prints,
, 2013
Abstract

Cited by 5 (0 self)
We propose a new stochastic dual coordinate ascent technique that can be applied to a wide range of regularized learning problems. Our method is based on the Alternating Direction Method of Multipliers (ADMM) to deal with complex regularization functions such as structured regularizations. Our method naturally accommodates mini-batch updates, which speed up convergence. We show that, under mild assumptions, our method converges exponentially. Numerical experiments show that our method performs efficiently in practice.
Alternating directions dual decomposition
, 2012
Abstract

Cited by 4 (2 self)
We propose AD³, a new algorithm for approximate maximum a posteriori (MAP) inference on factor graphs based on the alternating directions method of multipliers. Like dual decomposition algorithms, AD³ uses worker nodes to iteratively solve local subproblems and a controller node to combine these local solutions into a global update. The key characteristic of AD³ is that each local subproblem has a quadratic regularizer, leading to faster consensus than subgradient-based dual decomposition, both theoretically and in practice. We provide closed-form solutions for these AD³ subproblems for binary pairwise factors and factors imposing first-order logic constraints. For arbitrary factors (large or combinatorial), we introduce an active set method which requires only an oracle for computing a local MAP configuration, making AD³ applicable to a wide range of problems. Experiments on synthetic and real-world problems show that AD³ compares favorably with the state of the art.
Fast Stochastic Alternating Direction Method of Multipliers
Abstract

Cited by 4 (0 self)
We propose a new stochastic alternating direction method of multipliers (ADMM) algorithm, which incrementally approximates the full gradient in the linearized ADMM formulation. Besides having a per-iteration complexity as low as existing stochastic ADMM algorithms, it improves the convergence rate on convex problems from O(1/√T) to O(1/T), where T is the number of iterations. This matches the convergence rate of the batch ADMM algorithm, but without the need to visit all the samples in each iteration. Experiments on the graph-guided fused lasso demonstrate that the new algorithm is significantly faster than state-of-the-art stochastic and batch ADMM algorithms.
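The "incrementally approximates the full gradient" idea can be sketched as a stored-gradient table in the style of SAG-type estimators: each iteration refreshes the gradient of one randomly visited component and uses the table average as a cheap surrogate for the full gradient. This table-average form is my reading of that family of estimators, not necessarily the paper's exact construction.

```python
import numpy as np

class IncrementalGradient:
    """Keeps the last-seen gradient of each of n component functions and
    returns their average as an approximation to the full gradient."""

    def __init__(self, n, dim):
        self.table = np.zeros((n, dim))

    def update(self, i, grad_i):
        # refresh only component i, then average the whole table;
        # a running sum would make this O(dim) instead of O(n*dim)
        self.table[i] = grad_i
        return self.table.mean(axis=0)
```

Because only one fresh gradient is computed per iteration, the per-iteration cost stays at the stochastic level while the estimate drifts toward the full gradient, which is what enables the O(1/T) rate without visiting all samples each iteration.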