Results 1–10 of 57
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
, 2010
"... Stochastic subgradient methods are widely used, well analyzed, and constitute effective tools for optimization and online learning. Stochastic gradient methods ’ popularity and appeal are largely due to their simplicity, as they largely follow predetermined procedural schemes. However, most common s ..."
Abstract

Cited by 287 (3 self)
Stochastic subgradient methods are widely used, well analyzed, and constitute effective tools for optimization and online learning. Stochastic gradient methods' popularity and appeal are largely due to their simplicity, as they largely follow predetermined procedural schemes. However, most common subgradient approaches are oblivious to the characteristics of the data being observed. We present a new family of subgradient methods that dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning. The adaptation, in essence, allows us to find needles in haystacks in the form of very predictive but rarely seen features. Our paradigm stems from recent advances in stochastic optimization and online learning which employ proximal functions to control the gradient steps of the algorithm. We describe and analyze an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight. In a companion paper, we validate experimentally our theoretical analysis and show that the adaptive subgradient approach outperforms state-of-the-art, but non-adaptive, subgradient algorithms.
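The per-coordinate adaptation described in this abstract (the diagonal variant of what is now commonly called AdaGrad) can be sketched in a few lines; the function names and toy objective below are illustrative, not from the paper:

```python
import math

def adagrad(grad, x0, lr=1.0, eps=1e-8, steps=200):
    """Minimal diagonal-AdaGrad sketch: each coordinate's step size is
    divided by the root of its accumulated squared gradients, so rarely
    active but predictive coordinates keep larger effective rates."""
    x = list(x0)
    g2 = [0.0] * len(x)  # running sum of squared gradients per coordinate
    for _ in range(steps):
        g = grad(x)
        for i in range(len(x)):
            g2[i] += g[i] * g[i]
            x[i] -= lr * g[i] / (math.sqrt(g2[i]) + eps)
    return x

# Toy badly scaled quadratic f(x) = 0.5*(x0**2 + 100*x1**2):
# the adaptive step sizes equalize progress across the two scales.
grad = lambda x: [x[0], 100.0 * x[1]]
x = adagrad(grad, [1.0, 1.0])
```

Note how the update normalizes each coordinate by its own gradient history, which is what makes a hand-tuned global learning rate far less critical.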
Convex Nondifferentiable Optimization: A Survey Focussed On The Analytic Center Cutting Plane Method.
, 1999
"... We present a survey of nondifferentiable optimization problems and methods with special focus on the analytic center cutting plane method. We propose a selfcontained convergence analysis, that uses the formalism of the theory of selfconcordant functions, but for the main results, we give direct pr ..."
Abstract

Cited by 76 (2 self)
We present a survey of nondifferentiable optimization problems and methods with special focus on the analytic center cutting plane method. We propose a self-contained convergence analysis that uses the formalism of the theory of self-concordant functions, but for the main results, we give direct proofs based on the properties of the logarithmic function. We also provide an in-depth analysis of two extensions that are very relevant to practical problems: the case of multiple cuts and the case of deep cuts. We further examine extensions to problems including feasible sets partially described by an explicit barrier function, and to the case of nonlinear cuts. Finally, we review several implementation issues and discuss some applications.
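The general method queries the analytic center of a multidimensional localization polytope; as a deliberately simplified sketch (ours, not the survey's), in one dimension each cut merely shrinks an interval, and the analytic center of an interval is its midpoint, so the scheme reduces to bisection on the subgradient sign:

```python
def cutting_plane_1d(subgrad, lo, hi, iters=50):
    """1D caricature of a center cutting-plane method: the analytic
    center of the localization interval [lo, hi] (the maximizer of
    log(x-lo) + log(hi-x)) is its midpoint; each subgradient query
    adds a cut that discards half the interval."""
    for _ in range(iters):
        c = 0.5 * (lo + hi)   # analytic center of the current interval
        g = subgrad(c)
        if g > 0:
            hi = c            # cut: the minimizer lies to the left of c
        elif g < 0:
            lo = c            # cut: the minimizer lies to the right of c
        else:
            return c
    return 0.5 * (lo + hi)

# Minimize f(x) = (x - 0.3)**2 on [0, 1] via its (sub)gradient.
xmin = cutting_plane_1d(lambda x: 2.0 * (x - 0.3), 0.0, 1.0)
```

The multidimensional method the survey analyzes replaces the midpoint with the analytic center of a polytope of accumulated cuts, which requires a Newton-type computation rather than this trivial formula.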
Boosting as Entropy Projection
, 1999
"... We consider the AdaBoost procedure for boosting weak learners. In AdaBoost, a key step is choosing a new distribution on the training examples based on the old distribution and the mistakes made by the present weak hypothesis. We show how AdaBoost 's choice of the new distribution can be s ..."
Abstract

Cited by 74 (10 self)
We consider the AdaBoost procedure for boosting weak learners. In AdaBoost, a key step is choosing a new distribution on the training examples based on the old distribution and the mistakes made by the present weak hypothesis. We show how AdaBoost's choice of the new distribution can be seen as an approximate solution to the following problem: Find a new distribution that is closest to the old distribution subject to the constraint that the new distribution is orthogonal to the vector of mistakes of the current weak hypothesis. The distance (or divergence) between distributions is measured by the relative entropy. Alternatively, we could say that AdaBoost approximately projects the distribution vector onto a hyperplane defined by the mistake vector. We show that this new view of AdaBoost as an entropy projection is dual to the usual view of AdaBoost as minimizing the normalization factors of the updated distributions.
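The orthogonality constraint in this view can be checked numerically: AdaBoost's exponential reweighting with its usual choice of alpha makes the new distribution exactly orthogonal to the mistake vector. The sketch below is ours (variable names and the toy data are illustrative):

```python
import math

def adaboost_reweight(d, u):
    """AdaBoost's distribution update, seen as an entropy projection.
    u[i] = +1 if example i was classified correctly, -1 otherwise.
    With alpha = 0.5*ln((1-eps)/eps), the reweighted distribution
    satisfies sum_i d_new[i] * u[i] == 0 (orthogonal to the mistakes)."""
    eps = sum(di for di, ui in zip(d, u) if ui < 0)   # weighted error
    alpha = 0.5 * math.log((1.0 - eps) / eps)
    w = [di * math.exp(-alpha * ui) for di, ui in zip(d, u)]
    z = sum(w)                                        # normalization factor
    return [wi / z for wi in w]

d = [0.25, 0.25, 0.25, 0.25]    # uniform old distribution
u = [+1, +1, +1, -1]            # the weak hypothesis makes one mistake
d_new = adaboost_reweight(d, u)
```

The correctly classified examples are down-weighted and the mistake is up-weighted until the inner product with u vanishes, which is exactly the projection onto the hyperplane described in the abstract.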
A stochastic programming approach for supply chain network design under uncertainty
, 2003
"... ..."
Decomposition Algorithms for Stochastic Programming on a Computational Grid
 Computational Optimization and Applications
, 2001
"... . We describe algorithms for twostage stochastic linear programming with recourse and their implementation on a grid computing platform. In particular, we examine serial and asynchronous versions of the Lshaped method and a trustregion method. The parallel platform of choice is the dynamic, heter ..."
Abstract

Cited by 61 (7 self)
We describe algorithms for two-stage stochastic linear programming with recourse and their implementation on a grid computing platform. In particular, we examine serial and asynchronous versions of the L-shaped method and a trust-region method. The parallel platform of choice is the dynamic, heterogeneous, opportunistic platform provided by the Condor system. The algorithms are of master-worker type (with the workers being used to solve second-stage problems), and the MW runtime support library (which supports master-worker computations) is key to the implementation. Computational results are presented on large sample average approximations of problems from the literature.
Surrogate Gradient Algorithm for Lagrangian Relaxation
 Journal of Optimization Theory and Applications
, 1999
"... The subgradient method is frequently used to optimize dual functions in Lagrangian relaxation for separable integer programming problems. In the method, all subproblems must be optimally solved to obtain a subgradient direction. In this paper, the "surrogate subgradient method" is develope ..."
Abstract

Cited by 51 (21 self)
The subgradient method is frequently used to optimize dual functions in Lagrangian relaxation for separable integer programming problems. In the method, all subproblems must be optimally solved to obtain a subgradient direction. In this paper, the "surrogate subgradient method" is developed, where a proper direction can be obtained without optimally solving all the subproblems. In fact, only approximate optimization of one subproblem is needed to get a proper "surrogate subgradient direction," and the directions are smooth for problems of large size. The convergence of the algorithm is proved. Compared with methods that take effort to find better directions, this method can obtain good directions with much less effort, and provides a new approach that is especially powerful for problems of very large size.
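For context, the standard subgradient method this paper improves on works as follows on a toy separable 0-1 program (the example and names are ours, and unlike the paper's surrogate variant, every subproblem is solved exactly at each iteration):

```python
def solve_subproblems(lam):
    """Independent {0,1} subproblems of the Lagrangian relaxation of
    the toy program: max 5*x1 + 4*x2  s.t.  2*x1 + 3*x2 <= 4,
    with the coupling constraint dualized with multiplier lam >= 0."""
    x1 = 1 if 5 - 2 * lam > 0 else 0
    x2 = 1 if 4 - 3 * lam > 0 else 0
    return x1, x2

def subgradient_dual(steps=500, lam=0.0):
    """Standard subgradient method on the dual: the constraint
    residual at the subproblem optimizers is a subgradient of the
    dual function, and a diminishing step with projection onto
    lam >= 0 drives lam toward the dual minimizer."""
    best = float("inf")
    for k in range(1, steps + 1):
        x1, x2 = solve_subproblems(lam)
        q = 5 * x1 + 4 * x2 + lam * (4 - 2 * x1 - 3 * x2)  # dual value
        best = min(best, q)
        g = 4 - 2 * x1 - 3 * x2              # subgradient of the dual
        lam = max(0.0, lam - (1.0 / k) * g)  # diminishing step, project
    return best, lam

best, lam = subgradient_dual()
```

For this instance the dual minimum is 23/3 at lam = 4/3 (the primal optimum is 5, so the integer duality gap is visible). The surrogate subgradient idea in the abstract replaces the exact solve of *all* subproblems per iteration with an approximate solve of just one, while still obtaining a usable direction.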
BundleBased Relaxation Methods For Multicommodity Capacitated Fixed Charge Network Design
, 1999
"... To efficiently derive bounds for largescale instances of the capacitated fixedcharge network design problem, Lagrangian relaxations appear promising. This paper presents the results of comprehensive experiments aimed at calibrating and comparing bundle and subgradient methods applied to the optimi ..."
Abstract

Cited by 50 (26 self)
To efficiently derive bounds for large-scale instances of the capacitated fixed-charge network design problem, Lagrangian relaxations appear promising. This paper presents the results of comprehensive experiments aimed at calibrating and comparing bundle and subgradient methods applied to the optimization of Lagrangian duals arising from two Lagrangian relaxations. This study substantiates the fact that bundle methods appear superior to subgradient approaches because they converge faster and are more robust relative to different relaxations, problem characteristics, and selection of the initial parameter values. It also demonstrates that effective lower bounds may be computed efficiently for large-scale instances of the capacitated fixed-charge network design problem. Indeed, in a fraction of the time required by a standard simplex approach to solve the linear programming relaxation, the methods we present attain very high quality solutions.
Large Margin Training for Hidden Markov Models with Partially Observed States
Trinh-Minh-Tri Do
"... Large margin learning of Continuous Density HMMs with a partially labeled dataset has been extensively studied in the speech and handwriting recognition fields. Yet due to the nonconvexity of the optimization problem, previous works usually rely on severe approximations so that it is still an open ..."
Abstract

Cited by 40 (4 self)
Large margin learning of Continuous Density HMMs with a partially labeled dataset has been extensively studied in the speech and handwriting recognition fields. Yet due to the nonconvexity of the optimization problem, previous works usually rely on severe approximations, so that it is still an open problem. We propose a new learning algorithm that relies on nonconvex optimization and bundle methods and allows tackling the original optimization problem as is. It is proved to converge to a solution with accuracy ɛ with a rate O(1/ɛ). We provide experimental results gained on speech and handwriting recognition that demonstrate the potential of the method.
Optimal Power Generation under Uncertainty via Stochastic Programming
 in: Stochastic Programming Methods and Technical Applications (K. Marti and P. Kall Eds.), Lecture Notes in Economics and Mathematical Systems
, 1997
"... : A power generation system comprising thermal and pumpedstorage hydro plants is considered. Two kinds of models for the costoptimal generation of electric power under uncertain load are introduced: (i) a dynamic model for the shortterm operation and (ii) a power production planning model. In both ..."
Abstract

Cited by 29 (10 self)
A power generation system comprising thermal and pumped-storage hydro plants is considered. Two kinds of models for the cost-optimal generation of electric power under uncertain load are introduced: (i) a dynamic model for the short-term operation and (ii) a power production planning model. In both cases, the presence of stochastic data in the optimization model leads to multi-stage and two-stage stochastic programs, respectively. Both stochastic programming problems involve a large number of mixed-integer (stochastic) decisions, but their constraints are loosely coupled across operating power units. This is used to design Lagrangian relaxation methods for both models, which lead to a decomposition into stochastic single unit subproblems. For the dynamic model a Lagrangian decomposition based algorithm is described in more detail. Special emphasis is put on a discussion of the duality gap, the efficient solution of the multi-stage single unit subproblems and on solving the dual problem...