Results 1  10
of
39
Primaldual subgradient methods for convex problems
, 2005
"... (after revision) In this paper we present a new approach for constructing subgradient schemes for different types of nonsmooth problems with convex structure. Our methods are primaldual since they are always able to generate a feasible approximation to the optimum of an appropriately formulated dual ..."
Abstract

Cited by 77 (1 self)
 Add to MetaCart
(Show Context)
(after revision) In this paper we present a new approach for constructing subgradient schemes for different types of nonsmooth problems with convex structure. Our methods are primaldual since they are always able to generate a feasible approximation to the optimum of an appropriately formulated dual problem. Besides other advantages, this useful feature provides the methods with a reliable stopping criterion. The proposed schemes differ from the classical approaches (divergent series methods, mirror descent methods) by presence of two control sequences. The first sequence is responsible for aggregating the support functions in the dual space, and the second one establishes a dynamically updated scale between the primal and dual spaces. This additional flexibility allows to guarantee a boundedness of the sequence of primal test points even in the case of unbounded feasible set. We present the variants of subgradient schemes for nonsmooth convex minimization, minimax problems, saddle point problems, variational inequalities, and stochastic optimization. In all situations our methods are proved to be optimal from the view point of worstcase blackbox lower complexity bounds.
A convergent incremental gradient method with constant step size
 SIAM J. OPTIM
, 2004
"... An incremental gradient method for minimizing a sum of continuously differentiable functions is presented. The method requires a single gradient evaluation per iteration and uses a constant step size. For the case that the gradient is bounded and Lipschitz continuous, we show that the method visits ..."
Abstract

Cited by 30 (2 self)
 Add to MetaCart
(Show Context)
An incremental gradient method for minimizing a sum of continuously differentiable functions is presented. The method requires a single gradient evaluation per iteration and uses a constant step size. For the case that the gradient is bounded and Lipschitz continuous, we show that the method visits regions in which the gradient is small infinitely often. Under certain unimodality assumptions, global convergence is established. In the quadratic case, a global linear rate of convergence is shown. The method is applied to distributed optimization problems arising in wireless sensor networks, and numerical experiments compare the new method with the standard incremental gradient method.
Approximate Primal Solutions and Rate Analysis for Dual Subgradient Methods
, 2007
"... We study primal solutions obtained as a byproduct of subgradient methods when solving the Lagrangian dual of a primal convex constrained optimization problem (possibly nonsmooth). The existing literature on the use of subgradient methods for generating primal optimal solutions is limited to the met ..."
Abstract

Cited by 30 (5 self)
 Add to MetaCart
(Show Context)
We study primal solutions obtained as a byproduct of subgradient methods when solving the Lagrangian dual of a primal convex constrained optimization problem (possibly nonsmooth). The existing literature on the use of subgradient methods for generating primal optimal solutions is limited to the methods producing such solutions only asymptotically (i.e., in the limit as the number of subgradient iterations increases to infinity). Furthermore, no convergence rate results are known for these algorithms. In this paper, we propose and analyze dual subgradient methods using averaging to generate approximate primal optimal solutions. These algorithms use a constant stepsize as opposed to a diminishing stepsize which is dominantly used in the existing primal recovery schemes. We provide estimates on the convergence rate of the primal sequences. In particular, we provide bounds on the amount of feasibility violation of the generated approximate primal solutions. We also provide upper and lower bounds on the primal function values at the approximate solutions. The feasibility violation and primal value estimates are given per iteration, thus providing practical stopping criteria. Our analysis relies on the Slater condition and the inherited boundedness properties of the dual problem under this condition.
Noneuclidean restricted memory level method for largescale convex optimization
 MATH. PROGRAM., SER. A 102: 407–456 (2005)
, 2005
"... ..."
(Show Context)
N.: Recursive Aggregation of Estimators by Mirror Descent Algorithm with averaging. Problems of Information Transmission
"... We consider a recursive algorithm to construct an aggregated estimator from a finite number of base decision rules in the classification problem. The estimator approximately minimizes a convex risk functional under the ℓ 1constraint. It is defined by a stochastic version of the mirror descent algor ..."
Abstract

Cited by 18 (3 self)
 Add to MetaCart
We consider a recursive algorithm to construct an aggregated estimator from a finite number of base decision rules in the classification problem. The estimator approximately minimizes a convex risk functional under the ℓ 1constraint. It is defined by a stochastic version of the mirror descent algorithm (i.e., of the method which performs gradient descent in the dual space) with an additional averaging. The main result of the paper is an upper bound for the expected accuracy 1 of the proposed estimator. This bound is of the order √ (log M)/t with an explicit and small constant factor, where M is the dimension of the problem and t stands for the sample size. Similar bound is proved for a more general setting that covers, in particular, the regression model with squared loss. 1
Condition Number Complexity of an Elementary Algorithm for Resolving a Conic Linear System
, 1997
"... We develop an algorithm for resolving a conic linear system (FP d ), which is a system of the form (FP d ): b Ax 2 C Y x 2 CX ; where CX and C Y are closed convex cones, and the data for the system is d = (A; b). ..."
Abstract

Cited by 16 (4 self)
 Add to MetaCart
We develop an algorithm for resolving a conic linear system (FP d ), which is a system of the form (FP d ): b Ax 2 C Y x 2 CX ; where CX and C Y are closed convex cones, and the data for the system is d = (A; b).
On the algorithmics and applications of a mixednorm based kernel learning formulation
 In Advances in Neural Information Processing Systems
, 2009
"... Motivated from real world problems, like object categorization, we study a particular mixednorm regularization for Multiple Kernel Learning (MKL). It is assumed that the given set of kernels are grouped into distinct components where each component is crucial for the learning task at hand. The form ..."
Abstract

Cited by 15 (1 self)
 Add to MetaCart
(Show Context)
Motivated from real world problems, like object categorization, we study a particular mixednorm regularization for Multiple Kernel Learning (MKL). It is assumed that the given set of kernels are grouped into distinct components where each component is crucial for the learning task at hand. The formulation hence employs l ∞ regularization for promoting combinations at the component level and l1 regularization for promoting sparsity among kernels in each component. While previous attempts have formulated this as a nonconvex problem, the formulation given here is an instance of nonsmooth convex optimization problem which admits an efficient MirrorDescent (MD) based procedure. The MD procedure optimizes over product of simplexes, which is not a wellstudied case in literature. Results on realworld datasets show that the new MKL formulation is wellsuited for object categorization tasks and that the MD based algorithm outperforms stateoftheart MKL solvers like simpleMKL in terms of computational effort. 1
Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization, II: shrinking procedures and optimal algorithms
, 2010
"... In this paper we present a generic algorithmic framework, namely, the accelerated stochastic approximation (ACSA) algorithm, for solving strongly convex stochastic composite optimization (SCO) problems. While the classical stochastic approximation (SA) algorithms are asymptotically optimal for solv ..."
Abstract

Cited by 12 (1 self)
 Add to MetaCart
(Show Context)
In this paper we present a generic algorithmic framework, namely, the accelerated stochastic approximation (ACSA) algorithm, for solving strongly convex stochastic composite optimization (SCO) problems. While the classical stochastic approximation (SA) algorithms are asymptotically optimal for solving differentiable and strongly convex problems, the ACSA algorithm, when employed with proper stepsize policies, can achieve optimal or nearly optimal rates of convergence for solving different classes of SCO problems during a given number of iterations. Moreover, we investigate these ACSA algorithms in more detail, such as, establishing the largedeviation results associated with the convergence rates and introducing efficient validation procedure to check the accuracy of the generated solutions.
Multiple Kernel Learning and the SMO Algorithm
"... Our objective is to trainpnorm Multiple Kernel Learning (MKL) and, more generally, linear MKL regularised by the Bregman divergence, using the Sequential Minimal Optimization (SMO) algorithm. The SMO algorithm is simple, easy to implement and adapt, and efficiently scales to large problems. As a re ..."
Abstract

Cited by 7 (2 self)
 Add to MetaCart
(Show Context)
Our objective is to trainpnorm Multiple Kernel Learning (MKL) and, more generally, linear MKL regularised by the Bregman divergence, using the Sequential Minimal Optimization (SMO) algorithm. The SMO algorithm is simple, easy to implement and adapt, and efficiently scales to large problems. As a result, it has gained widespread acceptance and SVMs are routinely trained using SMO in diverse real world applications. Training using SMO has been a long standing goal in MKL for the very same reasons. Unfortunately, the standard MKL dual is not differentiable, and therefore can not be optimised using SMO style coordinate ascent. In this paper, we demonstrate that linear MKL regularised with the pnorm squared, or with certain Bregman divergences, can indeed be trained using SMO. The resulting algorithm retains both simplicity and efficiency and is significantly faster than stateoftheart specialisedpnorm MKL solvers. We show that we can train on a hundred thousand kernels in approximately seven minutes and on fifty thousand points in less than half an hour on a single core. 1