Results 1–10 of 36
Stochastic block-coordinate Frank-Wolfe optimization for structural SVMs. arXiv preprint arXiv:1207.4747
, 2012
"... We propose a randomized blockcoordinate variant of the classic FrankWolfe algorithm for convex optimization with blockseparable constraints. Despite its lower iteration cost, we show that it achieves a similar convergence rate in duality gap as the full FrankWolfe algorithm. We also show that, w ..."
Abstract

Cited by 58 (6 self)
 Add to MetaCart
(Show Context)
We propose a randomized block-coordinate variant of the classic Frank-Wolfe algorithm for convex optimization with block-separable constraints. Despite its lower iteration cost, we show that it achieves a similar convergence rate in duality gap as the full Frank-Wolfe algorithm. We also show that, when applied to the dual structural support vector machine (SVM) objective, this yields an online algorithm that has the same low iteration complexity as primal stochastic subgradient methods. However, unlike stochastic subgradient methods, the block-coordinate Frank-Wolfe algorithm allows us to compute the optimal step size and yields a computable duality-gap guarantee. Our experiments indicate that this simple algorithm outperforms competing structural SVM solvers.
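As a rough sketch of the block-coordinate Frank-Wolfe update this abstract describes, shown on a toy quadratic over a product of probability simplices rather than the structural-SVM dual (the objective, block layout, and the standard decaying step size below are illustrative assumptions):

```python
import numpy as np

def bcfw(grad, blocks, x, steps=200, rng=np.random.default_rng(0)):
    """Randomized block-coordinate Frank-Wolfe over a product of simplices.

    Toy sketch: each block of x is constrained to a probability simplex,
    so the linear minimization oracle over a block is just a vertex
    (one-hot at the coordinate with the smallest partial gradient).
    """
    n_blocks = len(blocks)
    for k in range(steps):
        b = rng.integers(n_blocks)              # pick one block at random
        idx = blocks[b]
        g = grad(x)[idx]                        # partial gradient for that block
        s = np.zeros_like(x[idx])
        s[np.argmin(g)] = 1.0                   # LMO solution: a simplex vertex
        gamma = 2.0 * n_blocks / (k + 2.0 * n_blocks)  # standard decaying step
        x[idx] = (1 - gamma) * x[idx] + gamma * s      # convex combination step
    return x
```

Each iteration touches a single block, and the linear minimization oracle over a simplex reduces to picking one vertex, which is what keeps the per-iteration cost low relative to a full Frank-Wolfe step.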
An Asynchronous Parallel Stochastic Coordinate Descent Algorithm
 JOURNAL OF MACHINE LEARNING RESEARCH
"... We describe an asynchronous parallel stochastic coordinate descent algorithm for minimizing smooth unconstrained or separably constrained functions. The method achieves a linear convergence rate on functions that satisfy an essential strong convexity property and a sublinear rate (1/K) on general c ..."
Abstract

Cited by 27 (6 self)
 Add to MetaCart
We describe an asynchronous parallel stochastic coordinate descent algorithm for minimizing smooth unconstrained or separably constrained functions. The method achieves a linear convergence rate on functions that satisfy an essential strong convexity property and a sublinear rate (1/K) on general convex functions. Near-linear speedup on a multicore system can be expected if the number of processors is O(n^{1/2}) in unconstrained optimization and O(n^{1/4}) in the separably constrained case, where n is the number of variables. We describe results from an implementation on 40-core processors.
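A minimal single-machine sketch of the lock-free update pattern described above. Python threads under the GIL do not give true parallelism, so this only illustrates the algorithmic structure (unsynchronized reads and writes of a shared iterate); the least-squares objective and step size are assumptions:

```python
import numpy as np
import threading

def async_scd(A, b, x, step, iters_per_worker=2000, workers=4):
    """Lock-free asynchronous stochastic coordinate descent on
    f(x) = 0.5 * ||A x - b||^2. Each thread repeatedly picks a random
    coordinate, computes its partial gradient from a possibly stale
    copy of x, and writes the update without synchronization."""
    n = x.size

    def worker(seed):
        rng = np.random.default_rng(seed)
        for _ in range(iters_per_worker):
            i = rng.integers(n)
            g = A[:, i] @ (A @ x - b)   # partial derivative w.r.t. x[i]
            x[i] -= step * g            # unsynchronized shared write

    threads = [threading.Thread(target=worker, args=(s,)) for s in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return x
```

A safe step size here is a fraction of the inverse per-coordinate Lipschitz constant, e.g. `0.5 / max_i ||A[:, i]||^2`.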
Stochastic primal-dual coordinate method for regularized empirical risk minimization.
, 2014
"... Abstract We consider a generic convex optimization problem associated with regularized empirical risk minimization of linear predictors. The problem structure allows us to reformulate it as a convexconcave saddle point problem. We propose a stochastic primaldual coordinate (SPDC) method, which alt ..."
Abstract

Cited by 12 (2 self)
 Add to MetaCart
(Show Context)
We consider a generic convex optimization problem associated with regularized empirical risk minimization of linear predictors. The problem structure allows us to reformulate it as a convex-concave saddle-point problem. We propose a stochastic primal-dual coordinate (SPDC) method, which alternates between maximizing over a randomly chosen dual variable and minimizing over the primal variable. An extrapolation step on the primal variable is performed to obtain an accelerated convergence rate. We also develop a mini-batch version of the SPDC method, which facilitates parallel computing, and an extension with weighted sampling probabilities on the dual variables, which has better complexity than uniform sampling on unnormalized data. Both theoretically and empirically, we show that the SPDC method has comparable or better performance than several state-of-the-art optimization methods.
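A sketch of the SPDC-style alternating updates for the special case of ridge regression, where the dual maximization has a closed form. The parameter names and step-size choices follow the paper only loosely and should be treated as assumptions, not the authors' exact method:

```python
import numpy as np

def spdc(A, b, lam, sigma, tau, theta, epochs=30, seed=0):
    """SPDC-style method for ridge regression
    min_w (1/2n)||A w - b||^2 + (lam/2)||w||^2, via the saddle point
    min_w max_y (1/n) sum_i (y_i a_i.w - phi_i*(y_i)) + (lam/2)||w||^2,
    with phi_i*(y) = 0.5 y^2 + b_i y (conjugate of the squared loss)."""
    n, d = A.shape
    rng = np.random.default_rng(seed)
    w = np.zeros(d)
    wbar = w.copy()                    # extrapolated primal point
    y = np.zeros(n)                    # dual variables
    u = np.zeros(d)                    # running average u = (1/n) A^T y
    for _ in range(epochs * n):
        i = rng.integers(n)
        # dual prox step: closed form for the squared-loss conjugate
        y_new = (sigma * (A[i] @ wbar - b[i]) + y[i]) / (sigma + 1.0)
        # primal prox step with the single-coordinate innovation term
        w_new = (w - tau * (u + (y_new - y[i]) * A[i])) / (1.0 + tau * lam)
        wbar = w_new + theta * (w_new - w)     # extrapolation (acceleration)
        u += (y_new - y[i]) * A[i] / n
        w, y[i] = w_new, y_new
    return w
```

At a fixed point the dual variables equal the residuals (y = A w - b) and lam * w = -u, which recovers the ridge optimality condition.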
Deeply-supervised nets
 In AISTATS
, 2015
"... Our proposed deeplysupervised nets (DSN) method simultaneously minimizes classification error while making the learning process of hidden layers direct and transparent. We make an attempt to boost the classification performance by studying a new formulation in deep networks. Three aspects in convo ..."
Abstract

Cited by 10 (3 self)
 Add to MetaCart
(Show Context)
Our proposed deeply-supervised nets (DSN) method simultaneously minimizes classification error while making the learning process of the hidden layers direct and transparent. We attempt to boost classification performance by studying a new formulation in deep networks. Three aspects of convolutional neural network (CNN)-style architectures are examined: (1) transparency of the intermediate layers to the overall classification; (2) discriminativeness and robustness of learned features, especially in the early layers; (3) effectiveness of training in the presence of exploding and vanishing gradients. We introduce a "companion objective" at the individual hidden layers, in addition to the overall objective at the output layer (a strategy distinct from layer-wise pre-training). We extend techniques from stochastic gradient methods to analyze our algorithm. The advantage of our method is evident, and our experimental results on benchmark datasets show significant performance gains over existing methods (e.g., state-of-the-art results on MNIST, CIFAR-10, CIFAR-100, and SVHN).
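The "companion objective" idea can be illustrated with a forward-pass loss computation: auxiliary softmax classifiers are attached to each hidden layer, and their losses are added, with an assumed weight, to the output-layer loss. A toy numpy sketch with fully connected layers, not the paper's CNN setup:

```python
import numpy as np

def softmax_xent(logits, label):
    """Numerically stable softmax cross-entropy for a single example."""
    z = logits - logits.max()
    return -(z[label] - np.log(np.exp(z).sum()))

def dsn_loss(x, label, Ws, Cs, W_out, alpha=0.3):
    """DSN-style total loss: output classifier loss plus alpha-weighted
    companion losses from an auxiliary classifier C at each hidden layer.
    Ws: hidden-layer weights; Cs: per-layer companion classifiers;
    W_out: final output classifier; alpha: assumed companion weight."""
    h, companion = x, 0.0
    for W, C in zip(Ws, Cs):
        h = np.maximum(W @ h, 0.0)                 # hidden layer (ReLU)
        companion += softmax_xent(C @ h, label)    # companion objective
    return softmax_xent(W_out @ h, label) + alpha * companion
```

Because every hidden layer receives a direct classification signal, gradients reach the early layers without passing through the full depth of the network, which is the motivation the abstract gives for the formulation.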
A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method
, 2014
"... ..."
(Show Context)
Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm
 ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS
, 2014
"... ..."
(Show Context)
Mixed optimization for smooth functions
 In Neural Information Processing Systems (NIPS)
, 2013
"... Abstract It is well known that the optimal convergence rate for stochastic optimization of smooth functions is O(1/ √ T ), which is same as stochastic optimization of Lipschitz continuous convex functions. This is in contrast to optimizing smooth functions using full gradients, which yields a conve ..."
Abstract

Cited by 7 (0 self)
 Add to MetaCart
(Show Context)
It is well known that the optimal convergence rate for stochastic optimization of smooth functions is O(1/√T), which is the same as for stochastic optimization of Lipschitz-continuous convex functions. This is in contrast to optimizing smooth functions using full gradients, which yields a convergence rate of O(1/T^2). In this work, we consider a new setup for optimizing smooth functions, termed Mixed Optimization, which allows access to both a stochastic oracle and a full-gradient oracle. Our goal is to significantly improve the convergence rate of stochastic optimization of smooth functions by allowing an additional small number of accesses to the full-gradient oracle. We show that, with O(ln T) calls to the full-gradient oracle and O(T) calls to the stochastic oracle, the proposed mixed optimization algorithm achieves an optimization error of O(1/T).
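To illustrate the two-oracle access model (occasional full gradients plus cheap stochastic gradients), here is a variance-reduction loop in that spirit on a least-squares objective. Note this is an SVRG-style sketch for illustration of the oracle mix only, not the paper's own mixed optimization algorithm, and the objective and step size are assumptions:

```python
import numpy as np

def mixed_oracle_sgd(A, b, step=0.1, epochs=60, inner=100, seed=0):
    """Mixing a full-gradient oracle with a stochastic oracle on
    f(w) = (1/2n)||A w - b||^2: one full-gradient call per epoch,
    used by the inner stochastic steps to reduce gradient variance."""
    n, d = A.shape
    rng = np.random.default_rng(seed)
    w = np.zeros(d)
    for _ in range(epochs):
        w_ref = w.copy()
        full_g = A.T @ (A @ w_ref - b) / n        # one full-gradient oracle call
        for _ in range(inner):                    # cheap stochastic oracle calls
            i = rng.integers(n)
            gi = A[i] * (A[i] @ w - b[i])         # stochastic gradient at w
            gi_ref = A[i] * (A[i] @ w_ref - b[i]) # same sample at the anchor
            w -= step * (gi - gi_ref + full_g)    # variance-reduced step
    return w
```

The correction term `gi - gi_ref + full_g` is an unbiased gradient estimate whose variance shrinks as the iterate approaches the anchor point, which is what the occasional full-gradient accesses buy.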
Asynchronous stochastic coordinate descent: Parallelism and convergence properties
"... ar ..."
(Show Context)
Distributed stochastic optimization and learning
 IN PROC. ALLERTON CONF. ON COMMUNICATION, CONTROL, AND COMPUTING (MONTICELLO, IL)
, 2014
"... We consider the problem of distributed stochastic optimization, where each of several machines has access to samples from the same source distribution, and the goal is to jointly optimize the expected objective w.r.t. the source distribution, minimizing: (1) overall runtime; (2) communication costs ..."
Abstract

Cited by 6 (2 self)
 Add to MetaCart
(Show Context)
We consider the problem of distributed stochastic optimization, where each of several machines has access to samples from the same source distribution, and the goal is to jointly optimize the expected objective w.r.t. the source distribution, minimizing: (1) overall runtime; (2) communication costs; (3) the number of samples used. We study this problem systematically, highlighting fundamental limitations and differences versus distributed consensus problems, where each machine has a different, independent objective. We show how the best known guarantees are obtained by an accelerated mini-batched SGD approach, and contrast the runtime and sample costs of this approach with those of other distributed optimization algorithms.
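A simulated sketch of the communication pattern: per round, each "machine" computes one stochastic gradient at the shared iterate, and the gradients are averaged before a single update. This is plain mini-batch SGD, without the acceleration the abstract refers to; the least-squares data and step size are assumptions:

```python
import numpy as np

def distributed_minibatch_sgd(A, b, machines=4, rounds=500, step=0.2, seed=0):
    """Simulated distributed mini-batch SGD for f(w) = (1/2n)||A w - b||^2.
    Each round, every machine draws one sample and computes a stochastic
    gradient at the current shared iterate; the gradients are averaged
    (one communication round) and a single update is applied."""
    n, d = A.shape
    rng = np.random.default_rng(seed)
    w = np.zeros(d)
    for _ in range(rounds):
        idx = rng.integers(n, size=machines)            # one sample per machine
        grads = [(A[i] @ w - b[i]) * A[i] for i in idx] # local stochastic gradients
        w -= step * np.mean(grads, axis=0)              # average, then one step
    return w
```

Averaging across machines shrinks the gradient variance by the number of machines, which is the basic trade the abstract weighs against communication cost and sample usage.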