Mathematical Programming for Data Mining: Formulations and Challenges
- INFORMS Journal on Computing
, 1998
Cited by 40 (0 self)
This paper is intended to serve as an overview of a rapidly emerging research and applications area. In addition to providing a general overview, motivating the importance of data mining problems within the area of knowledge discovery in databases, our aim is to list some of the pressing research challenges, and outline opportunities for contributions by the optimization research communities. Towards these goals, we include formulations of the basic categories of data mining methods as optimization problems. We also provide examples of successful mathematical programming approaches to some data mining problems. Keywords: data analysis, data mining, mathematical programming methods, challenges for massive data sets, classification, clustering, prediction, optimization. To appear: INFORMS Journal on Computing, special issue on Data Mining, A. Basu and B. Golden (guest editors). Also appears as Mathematical Programming Technical Report 98-01, Computer Sciences Department, University of Wi...
Incremental Subgradient Methods For Nondifferentiable Optimization
, 2001
Cited by 35 (3 self)
We consider a class of subgradient methods for minimizing a convex function that consists of the sum of a large number of component functions. This type of minimization arises in a dual context from Lagrangian relaxation of the coupling constraints of large scale separable problems. The idea is to perform the subgradient iteration incrementally, by sequentially taking steps along the subgradients of the component functions, with intermediate adjustment of the variables after processing each component function. This incremental approach has been very successful in solving large differentiable least squares problems, such as those arising in the training of neural networks, and it has resulted in a much better practical rate of convergence than the steepest descent method. In this paper, we establish the convergence properties of a number of variants of incremental subgradient methods, including some that are stochastic. Based on the analysis and computational experiments, the methods appear very promising and effective for important classes of large problems. A particularly interesting discovery is that by randomizing the order of selection of component functions for iteration, the convergence rate is substantially improved. Research supported by NSF under Grant ACI-9873339.
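The incremental iteration described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy objective f(x) = sum_i |x - a_i|, the diminishing stepsize 1/(k+1), and the names `incremental_subgradient` and `targets` are all assumptions chosen for illustration. Each inner step moves along a subgradient of a single component, and the component order is reshuffled on every pass.

```python
import random


def incremental_subgradient(targets, x0=0.0, passes=200, seed=0):
    """Minimize f(x) = sum_i |x - a_i| one component at a time (toy sketch)."""
    rng = random.Random(seed)
    x = x0
    order = list(range(len(targets)))
    for k in range(passes):
        step = 1.0 / (k + 1)   # assumed diminishing stepsize rule
        rng.shuffle(order)     # randomized order of component selection
        for i in order:
            diff = x - targets[i]
            # A subgradient of |x - a_i|: sign(diff), with 0 chosen at the kink.
            g = (diff > 0) - (diff < 0)
            x -= step * g      # adjust x right after processing this component
    return x


# Hypothetical data: the minimizer of the sum of absolute deviations is the median, 3.0.
targets = [1.0, 2.0, 3.0, 10.0, 11.0]
x_hat = incremental_subgradient(targets)
print(x_hat)
```

The variable is updated after every component rather than once per full pass, which is the feature that distinguishes the incremental scheme from an ordinary subgradient step on the whole sum.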
Dual averaging methods for regularized stochastic learning and online optimization
- In Advances in Neural Information Processing Systems 23
, 2009
Cited by 28 (3 self)
We consider regularized stochastic learning and online optimization problems, where the objective function is the sum of two convex terms: one is the loss function of the learning task, and the other is a simple regularization term such as ℓ1-norm for promoting sparsity. We develop extensions of Nesterov’s dual averaging method that can exploit the regularization structure in an online setting. At each iteration of these methods, the learning variables are adjusted by solving a simple minimization problem that involves the running average of all past subgradients of the loss function and the whole regularization term, not just its subgradient. In the case of ℓ1-regularization, our method is particularly effective in obtaining sparse solutions. We show that these methods achieve the optimal convergence rates or regret bounds that are standard in the literature on stochastic and online convex optimization. For stochastic learning problems in which the loss functions have Lipschitz continuous gradients, we also present an accelerated version of the dual averaging method.
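To make the "simple minimization problem" concrete for the ℓ1 case, here is a hedged sketch of one update. The abstract does not state the exact subproblem, so the quadratic auxiliary term with weight gamma / sqrt(t) is an assumption, as are the names `l1_rda_step`, `lam`, and `gamma`. Under that assumption the minimizer has a closed form: coordinates whose averaged subgradient is at most `lam` in magnitude are set exactly to zero, which is where the sparsity comes from.

```python
import math


def l1_rda_step(avg_grad, t, lam, gamma):
    """Closed-form minimizer (per coordinate) of the assumed subproblem

        <avg_grad, x> + lam * ||x||_1 + (gamma / (2 * sqrt(t))) * ||x||_2^2

    where avg_grad is the running average of all past loss subgradients.
    """
    scale = math.sqrt(t) / gamma
    x = []
    for g in avg_grad:
        if abs(g) <= lam:
            x.append(0.0)  # truncated to exactly zero: sparse coordinate
        else:
            # Shrink the averaged subgradient toward zero by lam, then rescale.
            x.append(-scale * (g - math.copysign(lam, g)))
    return x


# Hypothetical numbers: the first coordinate falls below the threshold and is zeroed.
x_next = l1_rda_step([0.05, -0.5, 0.3], t=4, lam=0.1, gamma=2.0)
print(x_next)
```

Because the whole regularization term enters the subproblem, the threshold is applied to the averaged subgradient rather than to a single noisy subgradient, which is why small coordinates stay at exactly zero.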
The Ordered Subsets Mirror Descent Optimization Method with Applications to Tomography
- SIAM J. Optim
, 2001
Cited by 23 (3 self)
Abstract. We describe an optimization problem arising in reconstructing 3D medical images from Positron Emission Tomography (PET). A mathematical model of the problem, based on the Maximum Likelihood principle, is posed as a problem of minimizing a convex function of several million variables over the standard simplex. To solve a problem of these characteristics, we develop and implement a new algorithm, Ordered Subsets Mirror Descent, and demonstrate, theoretically and computationally, that it is well suited for solving the PET reconstruction problem. Key words: positron emission tomography, maximum likelihood, image reconstruction, convex optimization, mirror descent.
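As background for minimization over the standard simplex, the following is a sketch of plain entropic mirror descent, which is a simpler technique than the ordered-subsets variant this paper develops. The toy objective 0.5 * ||x - center||^2, the constant step, and the names `entropic_mirror_descent`, `center`, and `grad` are assumptions for illustration only.

```python
import math


def entropic_mirror_descent(grad, x0, step, iters):
    """Mirror descent with the entropy mirror map on the standard simplex."""
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        # Multiplicative update followed by renormalization keeps x on the simplex.
        w = [xi * math.exp(-step * gi) for xi, gi in zip(x, g)]
        total = sum(w)
        x = [wi / total for wi in w]
    return x


# Hypothetical objective: f(x) = 0.5 * ||x - center||^2 with center inside the simplex,
# so the constrained minimizer is center itself.
center = [0.7, 0.2, 0.1]


def grad(x):
    return [xi - ci for xi, ci in zip(x, center)]


x_md = entropic_mirror_descent(grad, [1 / 3, 1 / 3, 1 / 3], step=0.5, iters=500)
print(x_md)
```

The update never needs an explicit projection: every iterate is positive and sums to one by construction, which is part of what makes this family of methods attractive when the number of variables is very large.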
Error Stability Properties of Generalized Gradient-Type Algorithms
- Journal of Optimization Theory and Applications
, 1998
Cited by 14 (0 self)
Abstract. We present a unified framework for convergence analysis of generalized subgradient-type algorithms in the presence of perturbations. A principal novel feature of our analysis is that perturbations need not tend to zero in the limit. It is established that the iterates of the algorithms are attracted, in a certain sense, to an ε-stationary set of the problem, where ε depends on the magnitude of perturbations. Characterization of the attraction sets is given in the general (nonsmooth and nonconvex) case. The results are further strengthened for convex, weakly sharp, and strongly convex problems. Our analysis extends and unifies previously known results on convergence and stability properties of gradient and subgradient methods, including their incremental, parallel, and heavy ball modifications.
Incremental Gradient Algorithms with Stepsizes Bounded Away From Zero
- Computational Opt. and Appl
, 1998
Cited by 14 (2 self)
Abstract. We consider the class of incremental gradient methods for minimizing a sum of continuously differentiable functions. An important novel feature of our analysis is that the stepsizes are kept bounded away from zero. We derive the first convergence results of any kind for this computationally important case. In particular, we show that a certain ε-approximate solution can be obtained and establish the linear dependence of ε on the stepsize limit. Incremental gradient methods are particularly well-suited for large neural network training problems where obtaining an approximate solution is typically sufficient and is often preferable to computing an exact solution. Thus, in the context of neural networks, the approach presented here is related to the principle of tolerant training. Our results justify numerous stepsize rules that were derived on the basis of extensive numerical experimentation but for which no theoretical analysis was previously available. In addition, convergence to (exact) stationary points is established when the gradient satisfies a certain growth property.
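The dependence of the approximation error on a constant stepsize can be seen on a toy problem. This sketch is an illustration under assumptions, not the paper's analysis: the two-component quadratic f(x) = 0.5 * (x - 0)^2 + 0.5 * (x - 2)^2, the cyclic order, and the names `incremental_gradient`, `targets`, `err_large`, and `err_small` are all hypothetical. For this particular problem the end-of-cycle iterate settles at 2 / (2 - step), so its distance from the minimizer 1 is step / (2 - step), which is roughly proportional to the stepsize when the stepsize is small.

```python
def incremental_gradient(targets, step, x0=0.0, cycles=2000):
    """Cyclic incremental gradient on f(x) = sum_i 0.5 * (x - a_i)^2, constant stepsize."""
    x = x0
    for _ in range(cycles):
        for a in targets:
            x -= step * (x - a)  # gradient step on one component only
    return x


targets = [0.0, 2.0]                       # hypothetical data
minimizer = sum(targets) / len(targets)    # exact minimizer of the full sum

err_large = abs(incremental_gradient(targets, step=0.1) - minimizer)
err_small = abs(incremental_gradient(targets, step=0.05) - minimizer)
print(err_large, err_small, err_large / err_small)
```

Halving the stepsize roughly halves the residual error here, which matches the qualitative picture of an ε-approximate solution whose ε scales linearly with the stepsize limit.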
A convergent incremental gradient method with constant step size
- SIAM J. OPTIM
, 2004
Cited by 12 (2 self)
An incremental gradient method for minimizing a sum of continuously differentiable functions is presented. The method requires a single gradient evaluation per iteration and uses a constant step size. For the case that the gradient is bounded and Lipschitz continuous, we show that the method visits regions in which the gradient is small infinitely often. Under certain unimodality assumptions, global convergence is established. In the quadratic case, a global linear rate of convergence is shown. The method is applied to distributed optimization problems arising in wireless sensor networks, and numerical experiments compare the new method with the standard incremental gradient method.
Convergence analysis of perturbed feasible descent methods
- Journal of Optimization Theory and Applications
, 1996
Cited by 6 (2 self)
Abstract. We develop a general approach to convergence analysis of feasible descent methods in the presence of perturbations. The important novel feature of our analysis is that perturbations need not tend to zero in the limit. In that case, standard convergence analysis techniques are not applicable. Therefore, a new approach is needed. We show that, in the presence of perturbations, a certain ε-approximate solution can be obtained, where ε depends linearly on the level of perturbations. Applications to the gradient projection, proximal minimization, extragradient and incremental gradient algorithms are described. Key Words. Feasible descent methods, perturbation analysis, approximate solutions.
Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent
- In NIPS
, 2011
Cited by 6 (1 self)
Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve state-of-the-art performance on a variety of machine learning tasks. Several researchers have recently proposed schemes to parallelize SGD, but all require performance-destroying memory locking and synchronization. This work aims to show using novel theoretical analysis, algorithms, and implementation that SGD can be implemented without any locking. We present an update scheme called Hogwild! which allows processors access to shared memory with the possibility of overwriting each other’s work. We show that when the associated optimization problem is sparse, meaning most gradient updates only modify small parts of the decision variable, then Hogwild! achieves a nearly optimal rate of convergence. We demonstrate experimentally that Hogwild! outperforms alternative schemes that use locking by an order of magnitude.

