Results 1–10 of 23
Incremental Subgradient Methods for Nondifferentiable Optimization
, 2001
Abstract (Cited by 64, 12 self):
We consider a class of subgradient methods for minimizing a convex function that consists of the sum of a large number of component functions. This type of minimization arises in a dual context from Lagrangian relaxation of the coupling constraints of large-scale separable problems. The idea is to perform the subgradient iteration incrementally, by sequentially taking steps along the subgradients of the component functions, with intermediate adjustment of the variables after processing each component function. This incremental approach has been very successful in solving large differentiable least squares problems, such as those arising in the training of neural networks, and it has resulted in a much better practical rate of convergence than the steepest descent method. In this paper, we establish the convergence properties of a number of variants of incremental subgradient methods, including some that are stochastic. Based on the analysis and computational experiments, the methods appear very promising and effective for important classes of large problems. A particularly interesting discovery is that by randomizing the order of selection of component functions for iteration, the convergence rate is substantially improved. Research supported by NSF under Grant ACI-9873339.
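The incremental scheme the abstract describes can be sketched in a few lines. The toy problem below (the function names, the median example, and all parameter values are illustrative, not from the paper) minimizes f(x) = Σᵢ |x − aᵢ|, whose minimizer is the median of the aᵢ, by stepping along one component subgradient at a time with a diminishing stepsize; setting `randomize=True` mimics the randomized processing order the authors found beneficial.

```python
import numpy as np

def incremental_subgradient(a, x0, cycles=200, alpha0=2.0, randomize=False, seed=0):
    """Minimize f(x) = sum_i |x - a_i| by stepping along the subgradient of
    one component at a time, adjusting x after each component is processed."""
    rng = np.random.default_rng(seed)
    x, m = x0, len(a)
    for k in range(1, cycles + 1):
        step = alpha0 / k                          # diminishing stepsize
        order = rng.permutation(m) if randomize else range(m)
        for i in order:
            x -= (step / m) * np.sign(x - a[i])    # subgradient of |x - a_i|
    return x

a = np.array([1.0, 2.0, 3.0, 10.0, 11.0])
x = incremental_subgradient(a, x0=0.0)   # minimizer of f is the median, 3.0
```

The variables settle near the median because, once x overshoots, more component subgradients point back toward it than away from it; the diminishing stepsize shrinks the residual oscillation each cycle.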
Dual averaging methods for regularized stochastic learning and online optimization
 In Advances in Neural Information Processing Systems 23
, 2009
Abstract (Cited by 60, 3 self):
We consider regularized stochastic learning and online optimization problems, where the objective function is the sum of two convex terms: one is the loss function of the learning task, and the other is a simple regularization term such as the ℓ1-norm for promoting sparsity. We develop extensions of Nesterov’s dual averaging method that can exploit the regularization structure in an online setting. At each iteration of these methods, the learning variables are adjusted by solving a simple minimization problem that involves the running average of all past subgradients of the loss function and the whole regularization term, not just its subgradient. In the case of ℓ1-regularization, our method is particularly effective in obtaining sparse solutions. We show that these methods achieve the optimal convergence rates or regret bounds that are standard in the literature on stochastic and online convex optimization. For stochastic learning problems in which the loss functions have Lipschitz continuous gradients, we also present an accelerated version of the dual averaging method.
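For ℓ1-regularization, the per-iteration minimization the abstract mentions has a closed form: soft-threshold the running subgradient average. A minimal sketch, assuming an auxiliary strongly convex term β_t·‖w‖²/2 with β_t = γ√t (the toy noisy-regression problem, the helper names, and the parameter values are illustrative, not from the paper):

```python
import numpy as np

def rda_l1(grad_fn, dim, T=1000, lam=0.2, gamma=1.0, seed=0):
    """Regularized dual averaging with l1: each iterate solves
    min_w <gbar, w> + lam*||w||_1 + (gamma / (2*sqrt(t))) * ||w||^2,
    which reduces to soft-thresholding the running subgradient average."""
    rng = np.random.default_rng(seed)
    w, gbar = np.zeros(dim), np.zeros(dim)
    for t in range(1, T + 1):
        g = grad_fn(w, rng)
        gbar += (g - gbar) / t              # average of all past (sub)gradients
        scale = np.sqrt(t) / gamma          # = t / beta_t with beta_t = gamma*sqrt(t)
        w = -scale * np.sign(gbar) * np.maximum(np.abs(gbar) - lam, 0.0)
    return w

# noisy linear model with one relevant feature: the method zeroes the rest
w_true = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
def sg(w, rng):
    x = rng.normal(size=5)
    y = x @ w_true + 0.5 * rng.normal()
    return x * (x @ w - y)                  # stochastic least-squares gradient

w = rda_l1(sg, dim=5)
```

Because the whole regularizer, not its subgradient, enters the minimization, any coordinate whose averaged gradient stays below λ is set exactly to zero, which is why the method is effective at producing sparse solutions.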
Mathematical Programming for Data Mining: Formulations and Challenges
 INFORMS Journal on Computing
, 1998
Abstract (Cited by 47, 0 self):
This paper is intended to serve as an overview of a rapidly emerging research and applications area. In addition to providing a general overview and motivating the importance of data mining problems within the area of knowledge discovery in databases, our aim is to list some of the pressing research challenges and outline opportunities for contributions by the optimization research communities. Towards these goals, we include formulations of the basic categories of data mining methods as optimization problems. We also provide examples of successful mathematical programming approaches to some data mining problems. Keywords: data analysis, data mining, mathematical programming methods, challenges for massive data sets, classification, clustering, prediction, optimization. To appear: INFORMS Journal on Computing, special issue on Data Mining, A. Basu and B. Golden (guest editors). Also appears as Mathematical Programming Technical Report 98-01, Computer Sciences Department, University of Wi...
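One of the basic formulations alluded to here is clustering posed as a mathematical program: minimize Σᵢ minⱼ ‖xᵢ − cⱼ‖² over the centroids. A minimal sketch via Lloyd-style alternating minimization (the data, seeds, and names are illustrative; the paper's own formulations differ in detail):

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Clustering as optimization: alternately minimize
    sum_i min_j ||x_i - c_j||^2 over assignments and over centroids."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)              # assignment step (centroids fixed)
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                C[j] = pts.mean(axis=0)        # centroid step (assignments fixed)
    return C, labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)),   # blob near (0, 0)
               rng.normal(5.0, 0.1, (20, 2))])  # blob near (5, 5)
C, labels = kmeans(X, k=2)
```

Each alternating step decreases the objective, which is the sense in which the clustering heuristic is an optimization method rather than an ad hoc procedure.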
The Ordered Subsets Mirror Descent Optimization Method with Applications to Tomography
 SIAM J. Optim
, 2001
Abstract (Cited by 34, 6 self):
We describe an optimization problem arising in reconstructing 3D medical images from Positron Emission Tomography (PET). A mathematical model of the problem, based on the Maximum Likelihood principle, is posed as a problem of minimizing a convex function of several million variables over the standard simplex. To solve a problem with these characteristics, we develop and implement a new algorithm, Ordered Subsets Mirror Descent, and demonstrate, theoretically and computationally, that it is well suited for solving the PET reconstruction problem. Key words: positron emission tomography, maximum likelihood, image reconstruction, convex optimization, mirror descent.
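The simplex constraint is what makes mirror descent attractive here: with the entropy as the distance-generating function, the projection step reduces to a multiplicative (exponentiated-gradient) update. The sketch below shows only that core step on a toy quadratic, not the ordered-subsets machinery; the target vector and stepsize are illustrative.

```python
import numpy as np

def mirror_descent_simplex(grad, x0, iters=500, step=0.5):
    """Entropic mirror descent on the probability simplex: the Bregman
    projection step is a multiplicative update followed by renormalization."""
    x = x0.copy()
    for _ in range(iters):
        x = x * np.exp(-step * grad(x))   # exponentiated-gradient step
        x /= x.sum()                      # stay on the simplex
    return x

# minimize (1/2)||x - p||^2 over the simplex; p is interior, so x* = p
p = np.array([0.2, 0.3, 0.5])
x = mirror_descent_simplex(lambda x: x - p, np.ones(3) / 3)
```

The full algorithm processes one ordered subset of the likelihood terms per multiplicative update, which is what makes it practical at the several-million-variable scale the abstract describes.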
Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent
 In NIPS
, 2011
Abstract (Cited by 30, 4 self):
Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve state-of-the-art performance on a variety of machine learning tasks. Several researchers have recently proposed schemes to parallelize SGD, but all require performance-destroying memory locking and synchronization. This work aims to show, using novel theoretical analysis, algorithms, and implementation, that SGD can be implemented without any locking. We present an update scheme called Hogwild! which allows processors access to shared memory with the possibility of overwriting each other’s work. We show that when the associated optimization problem is sparse, meaning most gradient updates only modify small parts of the decision variable, then Hogwild! achieves a nearly optimal rate of convergence. We demonstrate experimentally that Hogwild! outperforms alternative schemes that use locking by an order of magnitude.
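The lock-free pattern can be illustrated with threads sharing one weight vector (a sketch only: Python's GIL serializes most of this work, so it shows the unsynchronized-update idea, not real parallel speedup; the least-squares problem and all names are illustrative). Sparsity matters because each sample touches only a few coordinates, so racing threads rarely overwrite each other.

```python
import numpy as np
import threading

def hogwild_sgd(X, y, n_threads=4, epochs=50, lr=0.05):
    """Lock-free parallel SGD sketch: threads update a shared weight vector
    with no synchronization; sparse samples mostly touch disjoint coordinates."""
    w = np.zeros(X.shape[1])                 # shared state, no lock anywhere
    def worker(tid):
        rng = np.random.default_rng(tid)
        for _ in range(epochs):
            for i in rng.permutation(len(X)):
                xi = X[i]
                nz = np.nonzero(xi)[0]       # sparse support of this sample
                g = (xi[nz] @ w[nz] - y[i]) * xi[nz]
                w[nz] -= lr * g              # unsynchronized in-place update
    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads: t.start()
    for t in threads: t.join()
    return w

# 20 one-hot samples with disjoint supports: races never touch the same coord
X = np.vstack([np.eye(4)] * 5)
y = np.tile([1.0, 2.0, 3.0, 4.0], 5)
w = hogwild_sgd(X, y)
```

A faithful implementation would use processes over shared memory (or a language without a GIL), but the essential design choice is the same: accept occasional overwritten updates in exchange for zero locking overhead.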
A convergent incremental gradient method with constant step size
 SIAM J. Optim
, 2004
Abstract (Cited by 26, 2 self):
An incremental gradient method for minimizing a sum of continuously differentiable functions is presented. The method requires a single gradient evaluation per iteration and uses a constant step size. For the case that the gradient is bounded and Lipschitz continuous, we show that the method visits regions in which the gradient is small infinitely often. Under certain unimodality assumptions, global convergence is established. In the quadratic case, a global linear rate of convergence is shown. The method is applied to distributed optimization problems arising in wireless sensor networks, and numerical experiments compare the new method with the standard incremental gradient method.
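A sketch in the spirit of the aggregated-gradient idea: keep the most recent gradient of each component, refresh exactly one per iteration, and step along the running sum with a constant step size (the quadratic test problem, names, and parameter values are illustrative, not the paper's):

```python
def iag(grads, x0, step=0.05, iters=2000):
    """Incremental aggregated gradient sketch: one fresh component gradient
    per iteration, constant step size along the sum of stored gradients."""
    x = x0
    m = len(grads)
    table = [g(x) for g in grads]     # most recent gradient of each component
    total = sum(table)
    for k in range(iters):
        i = k % m                     # refresh exactly one component gradient
        gi = grads[i](x)
        total += gi - table[i]
        table[i] = gi
        x -= step * total / m         # step along the aggregated gradient
    return x

# components f_i(x) = (1/2)(x - a_i)^2; their sum is minimized at mean(a)
a = [1.0, 2.0, 3.0, 4.0]
x = iag([lambda x, ai=ai: x - ai for ai in a], x0=0.0)
```

Unlike plain incremental gradient with a constant step, the aggregated sum cancels the bias from cycling through components, which is why exact convergence is possible without shrinking the stepsize.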
Incremental Gradient Algorithms with Stepsizes Bounded Away From Zero
 Computational Opt. and Appl
, 1998
Abstract (Cited by 25, 2 self):
We consider the class of incremental gradient methods for minimizing a sum of continuously differentiable functions. An important novel feature of our analysis is that the stepsizes are kept bounded away from zero. We derive the first convergence results of any kind for this computationally important case. In particular, we show that a certain ε-approximate solution can be obtained and establish the linear dependence of ε on the stepsize limit. Incremental gradient methods are particularly well suited for large neural network training problems where obtaining an approximate solution is typically sufficient and is often preferable to computing an exact solution. Thus, in the context of neural networks, the approach presented here is related to the principle of tolerant training. Our results justify numerous stepsize rules that were derived on the basis of extensive numerical experimentation but for which no theoretical analysis was previously available. In addition, convergence to (exact) stationary points is established when the gradient satisfies a certain growth property.
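The linear dependence of ε on the stepsize limit is easy to see on a toy problem (illustrative, not from the paper): plain incremental gradient on two quadratics with a constant stepsize settles into a limit cycle whose offset from the true minimizer shrinks in proportion to the stepsize.

```python
def incremental_gradient(a, step, cycles=500):
    """Plain incremental gradient with a constant stepsize on components
    f_i(x) = (1/2)(x - a_i)^2: iterates settle into a limit cycle around
    the mean of a, at a distance proportional to the stepsize."""
    x = 0.0
    for _ in range(cycles):
        for ai in a:
            x -= step * (x - ai)   # gradient of one component at a time
    return x

a = [0.0, 10.0]                    # sum minimized at x* = 5.0
e1 = abs(incremental_gradient(a, step=0.1) - 5.0)
e2 = abs(incremental_gradient(a, step=0.01) - 5.0)
```

Shrinking the stepsize by a factor of ten shrinks the residual error by roughly the same factor, which is exactly the ε-versus-stepsize relationship the abstract establishes in general.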
Error Stability Properties of Generalized Gradient-Type Algorithms
 Journal of Optimization Theory and Applications
, 1998
Abstract (Cited by 21, 0 self):
We present a unified framework for convergence analysis of generalized subgradient-type algorithms in the presence of perturbations. A principal novel feature of our analysis is that perturbations need not tend to zero in the limit. It is established that the iterates of the algorithms are attracted, in a certain sense, to an ε-stationary set of the problem, where ε depends on the magnitude of perturbations. Characterization of the attraction sets is given in the general (nonsmooth and nonconvex) case. The results are further strengthened for convex, weakly sharp, and strongly convex problems. Our analysis extends and unifies previously known results on convergence and stability properties of gradient and subgradient methods, including their incremental, parallel, and heavy ball modifications.
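The attraction to an ε-stationary set under persistent perturbations can be seen on a one-dimensional toy problem (illustrative, not from the paper): a subgradient method for f(x) = |x| whose subgradients carry a bounded error that never vanishes still ends up trapped in a neighborhood of the minimizer whose radius scales with the perturbation and stepsize.

```python
import numpy as np

def perturbed_subgradient(x0, eps=0.2, step=0.01, iters=5000, seed=0):
    """Subgradient method for f(x) = |x| with a persistent bounded
    perturbation of magnitude <= eps added to every subgradient; returns
    the largest |x| observed over the final stretch of iterations."""
    rng = np.random.default_rng(seed)
    x, tail = x0, []
    for k in range(iters):
        g = np.sign(x) + eps * rng.uniform(-1, 1)   # perturbed subgradient
        x -= step * g
        if k >= iters - 1000:
            tail.append(abs(x))
    return max(tail)

radius = perturbed_subgradient(x0=1.0)
```

The restoring subgradient dominates the bounded noise whenever x is away from zero, so the iterates are attracted to, and then remain in, a small ε-dependent neighborhood rather than converging exactly.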
Approximation Accuracy, Gradient Methods, and Error Bound for Structured Convex Optimization
, 2009
Abstract (Cited by 13, 1 self):
Convex optimization problems arising in applications, possibly as approximations of intractable problems, are often structured and large scale. When the data are noisy, it is of interest to bound the solution error relative to the (unknown) solution of the original noiseless problem. Related to this is an error bound for the linear convergence analysis of first-order gradient methods for solving these problems. Example applications include compressed sensing, variable selection in regression, TV-regularized image denoising, and sensor network localization.
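A representative first-order method for the structured problems named here (compressed sensing, variable selection) is proximal gradient for ℓ1-regularized least squares; this sketch is a generic ISTA iteration under standard assumptions, not the paper's specific analysis, and the test problem is illustrative.

```python
import numpy as np

def ista(A, b, lam, iters=500):
    """Proximal gradient (ISTA) for min_x (1/2)||Ax - b||^2 + lam*||x||_1:
    a gradient step on the smooth part, then soft-thresholding for the l1 part."""
    L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        z = x - A.T @ (A @ x - b) / L          # gradient step, stepsize 1/L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # prox of l1
    return x

# with A = I the solution is the soft-thresholding of b
A = np.eye(3)
b = np.array([1.0, 0.05, -2.0])
x = ista(A, b, lam=0.1)
```

Error bounds of the kind the abstract studies are what upgrade the generic sublinear rate of such methods to linear convergence on these structured objectives.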