Boosting as Entropy Projection
- 1999
Cited by 51 (8 self)
We consider the AdaBoost procedure for boosting weak learners. In AdaBoost, a key step is choosing a new distribution on the training examples based on the old distribution and the mistakes made by the present weak hypothesis. We show how AdaBoost's choice of the new distribution can be seen as an approximate solution to the following problem: Find a new distribution that is closest to the old distribution subject to the constraint that the new distribution is orthogonal to the vector of mistakes of the current weak hypothesis. The distance (or divergence) between distributions is measured by the relative entropy. Alternatively, we could say that AdaBoost approximately projects the distribution vector onto a hyperplane defined by the mistake vector. We show that this new view of AdaBoost as an entropy projection is dual to the usual view of AdaBoost as minimizing the normalization factors of the updated distributions.
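The reweighting step described in this abstract can be illustrated with a minimal sketch. It assumes the simplest setting, in which the weak hypothesis is either right (+1) or wrong (-1) on each example; in that special case the standard AdaBoost update makes the new distribution exactly orthogonal to the mistake vector, whereas the abstract speaks of an approximate projection in general. The function names and the four-example data set are hypothetical, chosen only for illustration.

```python
import math

def adaboost_reweight(dist, margins):
    """One AdaBoost reweighting step (illustrative sketch).

    dist    -- current distribution over training examples (sums to 1)
    margins -- +1 where the weak hypothesis is correct, -1 where it errs
    """
    # Weighted error of the weak hypothesis under the current distribution.
    eps = sum(d for d, u in zip(dist, margins) if u < 0)
    if not 0.0 < eps < 1.0:
        raise ValueError("weighted error must lie strictly between 0 and 1")
    alpha = 0.5 * math.log((1.0 - eps) / eps)
    unnormalized = [d * math.exp(-alpha * u) for d, u in zip(dist, margins)]
    z = sum(unnormalized)  # normalization factor of the updated distribution
    return [w / z for w in unnormalized], alpha, z

def relative_entropy(p, q):
    """Relative entropy (KL divergence) of p with respect to q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical toy data: uniform start, one mistake on the last example.
old_dist = [0.25, 0.25, 0.25, 0.25]
margins = [1, 1, 1, -1]

new_dist, alpha, z = adaboost_reweight(old_dist, margins)
inner_product = sum(d * u for d, u in zip(new_dist, margins))
divergence = relative_entropy(new_dist, old_dist)

print("new distribution:", new_dist)
print("inner product with mistake vector:", inner_product)
print("relative entropy to old distribution:", divergence)
print("-log(normalization factor):", -math.log(z))
```

In this toy case the misclassified example ends up carrying half of the total weight, and the relative entropy between the new and old distributions equals the negative logarithm of the normalization factor, which gives a small numerical view of the link between the two readings of AdaBoost mentioned in the abstract.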
Decomposition Algorithms for Stochastic Programming on a Computational Grid
- Computational Optimization and Applications, 2001
Cited by 44 (9 self)
We describe algorithms for two-stage stochastic linear programming with recourse and their implementation on a grid computing platform. In particular, we examine serial and asynchronous versions of the L-shaped method and a trust-region method. The parallel platform of choice is the dynamic, heterogeneous, opportunistic platform provided by the Condor system. The algorithms are of master-worker type (with the workers being used to solve second-stage problems), and the MW runtime support library (which supports master-worker computations) is key to the implementation. Computational results are presented on large sample average approximations of problems from the literature.
Convex Nondifferentiable Optimization: A Survey Focussed On The Analytic Center Cutting Plane Method.
- 1999
Cited by 38 (1 self)
We present a survey of nondifferentiable optimization problems and methods with special focus on the analytic center cutting plane method. We propose a self-contained convergence analysis that uses the formalism of the theory of self-concordant functions, but for the main results, we give direct proofs based on the properties of the logarithmic function. We also provide an in-depth analysis of two extensions that are very relevant to practical problems: the case of multiple cuts and the case of deep cuts. We further examine extensions to problems including feasible sets partially described by an explicit barrier function, and to the case of nonlinear cuts. Finally, we review several implementation issues and discuss some applications.
Surrogate Gradient Algorithm for Lagrangian Relaxation
- Journal of Optimization Theory and Applications, 1999
Cited by 31 (18 self)
The subgradient method is frequently used to optimize dual functions in Lagrangian relaxation for separable integer programming problems. In the method, all subproblems must be optimally solved to obtain a subgradient direction. In this paper, the "surrogate subgradient method" is developed, where a proper direction can be obtained without optimally solving all the subproblems. In fact, only approximate optimization of one subproblem is needed to get a proper "surrogate subgradient direction," and the directions are smooth for problems of large size. The convergence of the algorithm is proved. Compared with methods that take effort to find better directions, this method can obtain good directions with much less effort, and provides a new approach that is especially powerful for problems of very large size.
Optimal Power Generation under Uncertainty via Stochastic Programming
- in: Stochastic Programming Methods and Technical Applications (K. Marti and P. Kall Eds.), Lecture Notes in Economics and Mathematical Systems, 1997
Cited by 19 (8 self)
A power generation system comprising thermal and pumped-storage hydro plants is considered. Two kinds of models for the cost-optimal generation of electric power under uncertain load are introduced: (i) a dynamic model for the short-term operation and (ii) a power production planning model. In both cases, the presence of stochastic data in the optimization model leads to multi-stage and two-stage stochastic programs, respectively. Both stochastic programming problems involve a large number of mixed-integer (stochastic) decisions, but their constraints are loosely coupled across operating power units. This is used to design Lagrangian relaxation methods for both models, which lead to a decomposition into stochastic single unit subproblems. For the dynamic model a Lagrangian decomposition based algorithm is described in more detail. Special emphasis is put on a discussion of the duality gap, the efficient solution of the multi-stage single unit subproblems and on solving the dual problem...
Large Margin Training for Hidden Markov Models with Partially Observed States
Trinh-Minh-Tri Do
Cited by 13 (2 self)
Large margin learning of Continuous Density HMMs with a partially labeled dataset has been extensively studied in the speech and handwriting recognition fields. Yet due to the non-convexity of the optimization problem, previous works usually rely on severe approximations, so that it is still an open problem. We propose a new learning algorithm that relies on non-convex optimization and bundle methods and allows tackling the original optimization problem as is. It is proved to converge to a solution with accuracy ε at a rate O(1/ε). We provide experimental results obtained on speech and handwriting recognition that demonstrate the potential of the method.
Nonsmooth techniques for stabilizing linear systems
- 2007 American Control Conference (2007 ACC), Times Square
Cited by 8 (7 self)
We discuss closed-loop stabilization of linear time-invariant dynamical systems, a problem that frequently arises in controller synthesis, either as a stand-alone task or to initialize algorithms for H∞ synthesis or related problems. Classical stabilization methods based on Lyapunov or Riccati equations appear to be inefficient for large systems. Recently, non-smooth optimization methods like gradient sampling [19] have been successfully used to minimize the spectral abscissa of the closed-loop state matrix (the largest real part of its eigenvalues) to solve the stabilization problem. These methods have to address the non-smooth and even non-Lipschitz character of the spectral abscissa function. In this work, we develop an alternative non-smooth technique for solving similar problems, with the option to incorporate second-order elements to speed up convergence to local minima. Using several case studies, the proposed technique is compared to more conventional approaches including direct search methods and techniques where the spectral abscissa minimization problem is recast as a traditional smooth non-linear mathematical programming problem.
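To make the objective in this abstract concrete, the following sketch evaluates the spectral abscissa (the largest real part of the eigenvalues) of a closed-loop state matrix. It is only an illustration of the quantity being minimized, not of the paper's non-smooth algorithm. Several things are assumptions: the system is restricted to two states so the eigenvalues follow from the trace and determinant, the controller is a static state feedback u = -Kx, and the matrices A, B and gain K are hypothetical values.

```python
import cmath

def eigenvalues_2x2(m):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    (a, b), (c, d) = m
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + root) / 2.0, (trace - root) / 2.0

def spectral_abscissa_2x2(m):
    """Largest real part among the eigenvalues of a 2x2 matrix."""
    return max(ev.real for ev in eigenvalues_2x2(m))

def closed_loop(a, b, k):
    """Closed-loop matrix A - B K for a single-input system.

    a -- 2x2 state matrix, b -- input vector, k -- state-feedback gain row
    (static state feedback is an assumption made for this sketch).
    """
    return [[a[i][j] - b[i] * k[j] for j in range(2)] for i in range(2)]

# Hypothetical unstable plant and a hand-picked stabilizing gain.
A = [[0.0, 1.0], [2.0, 0.0]]
B = [0.0, 1.0]
K = [3.0, 2.0]

open_loop_abscissa = spectral_abscissa_2x2(A)
closed_loop_abscissa = spectral_abscissa_2x2(closed_loop(A, B, K))

print("open-loop spectral abscissa:", open_loop_abscissa)
print("closed-loop spectral abscissa:", closed_loop_abscissa)
```

The closed loop is stable exactly when this value is negative. With the gain chosen here both closed-loop eigenvalues coincide, and a repeated eigenvalue is the kind of point at which the spectral abscissa can fail to be Lipschitz, the difficulty the abstract refers to.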
Distributed Dual Averaging in Networks
Cited by 6 (0 self)
The goal of decentralized optimization over a network is to optimize a global objective formed by a sum of local (possibly nonsmooth) convex functions using only local computation and communication. We develop and analyze distributed algorithms based on dual averaging of subgradients, and provide sharp bounds on their convergence rates as a function of the network size and topology. Our analysis clearly separates the convergence of the optimization algorithm itself from the effects of communication constraints arising from the network structure. We show that the number of iterations required by our algorithm scales inversely in the spectral gap of the network. The sharpness of this prediction is confirmed both by theoretical lower bounds and simulations for various networks.
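A minimal sketch of dual averaging of subgradients over a network, under stated assumptions, may help in reading this abstract. Each node keeps a dual variable, mixes it with its neighbours' values through a doubly stochastic weight matrix, adds a local subgradient, and maps the result back to a primal point. Everything specific here is an assumption for illustration: the scalar problem (each node i holds f_i(x) = |x - c_i|, so the sum is minimized at the median of the c_i), the five-node ring with equal weights, the quadratic proximal function, the step size 1/sqrt(t), and all function names.

```python
import math

def sign(v):
    """A subgradient of |v|: -1, 0 or +1."""
    return (v > 0) - (v < 0)

def ring_weights(n):
    """Doubly stochastic weights on a ring: 1/3 for self and each neighbour (n >= 3)."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            w[i][j % n] = 1.0 / 3.0
    return w

def distributed_dual_averaging(targets, weights, steps, radius=5.0):
    """Return each node's running average of its primal iterates."""
    n = len(targets)
    z = [0.0] * n            # dual variables (mixed, accumulated subgradients)
    x = [0.0] * n            # primal iterates, one per node
    running_sum = [0.0] * n
    for t in range(1, steps + 1):
        # Local subgradient of f_i(x) = |x - c_i| at the node's own iterate.
        g = [sign(x[i] - targets[i]) for i in range(n)]
        # Mix dual variables with neighbours, then add the local subgradient.
        z = [sum(weights[i][j] * z[j] for j in range(n)) + g[i] for i in range(n)]
        alpha = 1.0 / math.sqrt(t)
        # Proximal step with psi(x) = x^2 / 2 on [-radius, radius]: clip(-alpha * z).
        x = [max(-radius, min(radius, -alpha * z[i])) for i in range(n)]
        for i in range(n):
            running_sum[i] += x[i]
    return [s / steps for s in running_sum]

# Hypothetical local data; the median (the minimizer of the sum) is 0.5.
targets = [-2.0, -1.0, 0.5, 3.0, 4.0]
estimates = distributed_dual_averaging(targets, ring_weights(5), steps=20000)
print("per-node running averages:", estimates)
```

Every node uses only its own target and its two neighbours' dual variables, yet all running averages drift toward the same point. How fast they agree depends on how quickly the weight matrix mixes, which is where the spectral gap in the abstract enters.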

