Results 1-10 of 10
Infinite-horizon policy-gradient estimation
Journal of Artificial Intelligence Research, 2001
Abstract

Cited by 153 (5 self)
Gradient-based approaches to direct policy search in reinforcement learning have received much recent attention as a means to solve problems of partial observability and to avoid some of the problems associated with policy degradation in value-function methods. In this paper we introduce GPOMDP, a simulation-based algorithm for generating a biased estimate of the gradient of the average reward in Partially Observable Markov Decision Processes (POMDPs) controlled by parameterized stochastic policies. A similar algorithm was proposed by Kimura, Yamamura, and Kobayashi (1995). The algorithm's chief advantages are that it requires storage of only twice the number of policy parameters, uses one free parameter β ∈ [0, 1) (which has a natural interpretation in terms of the bias-variance trade-off), and requires no knowledge of the underlying state. We prove convergence of GPOMDP and show how the correct choice of the parameter β is related to the mixing time of the controlled POMDP. We briefly describe extensions of GPOMDP to controlled Markov chains, continuous state, observation, and control spaces, multiple agents, higher-order derivatives, and a version for training stochastic policies with internal states. In a companion paper (Baxter, Bartlett, & Weaver, 2001) we show how the gradient estimates generated by GPOMDP can be used in both a traditional stochastic gradient algorithm and a conjugate-gradient procedure to find local optima of the average reward.
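To make the storage claim concrete, here is a minimal Python sketch of a GPOMDP-style estimator. It keeps exactly two vectors the size of θ: an eligibility trace z that discounts past score vectors by β, and a running average Δ of reward times trace. The toy setting is our own assumption, not taken from the paper: a blind softmax policy (it ignores observations) over two actions, with a reward that depends only on the action just taken, so the process mixes immediately and the limit of the estimate equals the true gradient for any β. The names softmax and gpomdp_estimate are hypothetical.

```python
import math
import random


def softmax(logits):
    """Action probabilities of a softmax policy (one logit per action)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gpomdp_estimate(theta, reward, beta, steps, seed=0):
    """Running GPOMDP-style estimate of the average-reward gradient at theta.

    Illustrative assumptions: a blind softmax policy that ignores
    observations, and a reward that depends only on the action just taken.
    Only two vectors of length len(theta) are stored: z and delta.
    """
    rng = random.Random(seed)
    n = len(theta)
    probs = softmax(theta)   # fixed, since theta is not updated here
    z = [0.0] * n            # eligibility trace: beta-discounted sum of scores
    delta = [0.0] * n        # running average of reward * trace
    for t in range(steps):
        action = rng.choices(range(n), weights=probs)[0]
        for i in range(n):
            # Score of the softmax policy: d log p(action) / d theta_i
            score = (1.0 if i == action else 0.0) - probs[i]
            z[i] = beta * z[i] + score
        r = reward(action)
        for i in range(n):
            delta[i] += (r * z[i] - delta[i]) / (t + 1)
    return delta
```

For example, gpomdp_estimate([0.0, 0.0], lambda a: 1.0 if a == 1 else 0.0, beta=0.5, steps=100_000) should return values close to the exact gradient [-0.25, 0.25]. In problems that mix more slowly, a smaller β lowers the variance of the estimate at the cost of more bias.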
Robust Solutions To Uncertain Semidefinite Programs
SIAM Journal on Optimization, 1998
Abstract

Cited by 82 (8 self)
In this paper we consider semidefinite programs (SDPs) whose data depend on some unknown but bounded perturbation parameters. We seek "robust" solutions to such programs, that is, solutions which minimize the (worst-case) objective while satisfying the constraints for every possible value of parameters within the given bounds. Assuming the data matrices are rational functions of the perturbation parameters, we show how to formulate sufficient conditions for a robust solution to exist as SDPs. When the perturbation is "full," our conditions are necessary and sufficient. In this case, we provide sufficient conditions which guarantee that the robust solution is unique and continuous (Hölder-stable) with respect to the unperturbed problem's data. The approach can thus be used to regularize ill-conditioned SDPs. We illustrate our results with examples taken from linear programming, maximum norm minimization, polynomial interpolation, and integer programming.
Robust Solutions To Uncertain Semidefinite Programs
1998
Abstract

Cited by 57 (2 self)
In this paper we consider semidefinite programs (SDPs) whose data depend on some unknown-but-bounded perturbation parameters. We seek "robust" solutions to such programs, that is, solutions which minimize the (worst-case) objective while satisfying the constraints for every possible value of the parameters within the given bounds. Assuming the data matrices are rational functions of the perturbation parameters, we show how to formulate sufficient conditions for a robust solution to exist as SDPs. When the perturbation is "full", our conditions are necessary and sufficient. In this case, we provide sufficient conditions which guarantee that the robust solution is unique and continuous (Hölder-stable) with respect to the unperturbed problem's data. The approach can thus be used to regularize ill-conditioned SDPs. We illustrate our results with examples taken from linear programming, maximum norm minimization, polynomial interpolation and integer programming.
Fast greeks by simulation in forward LIBOR models
Journal of Computational Finance, 1999
Abstract

Cited by 11 (1 self)
This paper develops methods for fast estimation of option price sensitivities in Monte Carlo simulation of term structure models. The models considered are based on discretely compounded forward rates with proportional volatilities. The efficient estimation of option deltas, gammas, and vegas is investigated in this setting. Various general methods are available in the Monte Carlo literature for computing such estimates; here these methods are tailored to the term structure models, and approximations specific to this setting are developed in order either to accelerate the methods or to expand their applicability. The authors provide some theoretical support for the application of the basic methods and evaluate the approximations through numerical experiments. The results indicate that the proposed algorithms can substantially improve on standard finite-difference estimates of sensitivities.
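As a point of reference for the finite-difference baseline, the sketch below contrasts a pathwise delta estimator with a central finite-difference estimator for a European call. It deliberately uses a one-factor Black-Scholes model rather than a forward LIBOR model, so it only illustrates the generic estimators, not the paper's tailored methods or approximations. The name call_delta_estimates and all parameter values are hypothetical.

```python
import math
import random


def call_delta_estimates(s0, strike, rate, vol, maturity, n, h=1.0, seed=0):
    """Pathwise and central finite-difference Monte Carlo estimates of a
    European call's delta under geometric Brownian motion (Black-Scholes).

    Both estimators reuse the same normal draws (common random numbers).
    """
    rng = random.Random(seed)
    disc = math.exp(-rate * maturity)
    drift = (rate - 0.5 * vol * vol) * maturity
    sd = vol * math.sqrt(maturity)
    pathwise = 0.0
    fin_diff = 0.0
    for _ in range(n):
        growth = math.exp(drift + sd * rng.gauss(0.0, 1.0))   # S_T / S_0
        st = s0 * growth
        # Pathwise: d/dS0 of disc * max(S_T - K, 0) = disc * 1{S_T > K} * S_T / S0
        if st > strike:
            pathwise += disc * growth
        # Central difference: bump S0 up and down by h on the same path.
        up = max((s0 + h) * growth - strike, 0.0)
        down = max((s0 - h) * growth - strike, 0.0)
        fin_diff += disc * (up - down) / (2.0 * h)
    return pathwise / n, fin_diff / n
```

With S0 = K = 100, r = 0.05, σ = 0.2 and T = 1, both estimates should be close to the Black-Scholes delta N(0.35) ≈ 0.637. The pathwise estimator needs no bump size h, which is one reason such methods can improve on finite differences.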
Robust Adaptive Importance Sampling for Normal Random Vectors
Preprint hal-00334697, Technical Report no. 389, ENPC/CERMICS, 2008
Abstract

Cited by 6 (2 self)
Adaptive Monte Carlo methods are very efficient techniques designed to tune simulation estimators online. In this work, we present an alternative to stochastic approximation for tuning the optimal change of measure in the context of importance sampling for normal random vectors. Unlike stochastic approximation, which requires very fine tuning in practice, we propose to use sample average approximation and deterministic optimization techniques to devise a robust and fully automatic variance reduction methodology. The same samples are used both in the sample optimization of the importance sampling parameter and in the Monte Carlo computation of the expectation of interest under the optimal measure computed in the previous step. We prove that this highly non-independent Monte Carlo estimator is convergent and satisfies a central limit theorem with the optimal limiting variance. Numerical experiments confirm the performance of this estimator: compared with the crude Monte Carlo method, the computation time needed to achieve a given precision is reduced by a factor of 2 to 10.
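The sample-average idea can be illustrated in one dimension. The sketch below is our own construction, not the authors' code: it estimates E[f(X)] for X ~ N(0, 1) using the mean-shift identity E[f(X)] = E[f(X + θ) exp(-θX - θ²/2)], chooses θ by minimizing the sample average of the second moment E[f(X)² exp(-θX + θ²/2)] with a damped Newton iteration (our choice of optimizer), and then reuses the same draws for the final estimate. The name adaptive_is_mean_shift is hypothetical.

```python
import math
import random


def adaptive_is_mean_shift(f, n, seed=0, max_iter=100):
    """Estimate E[f(X)] for X ~ N(0, 1) with a mean-shift change of measure.

    The shift theta minimises the sample-average second moment
        v_n(theta) = (1/n) * sum f(X_i)^2 * exp(-theta * X_i + theta^2 / 2),
    and the SAME samples X_i are then reused in the weighted estimator.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    sq = [f(x) ** 2 for x in xs]

    def v(theta):
        return sum(s * math.exp(-theta * x + 0.5 * theta * theta)
                   for x, s in zip(xs, sq)) / n

    theta = 0.0
    for _ in range(max_iter):
        w = [s * math.exp(-theta * x + 0.5 * theta * theta) for x, s in zip(xs, sq)]
        grad = sum(wi * (theta - x) for wi, x in zip(w, xs))
        hess = sum(wi * (1.0 + (theta - x) ** 2) for wi, x in zip(w, xs))
        if hess == 0.0:          # f vanished on every sample: keep theta as is
            break
        step = grad / hess
        # Backtracking keeps each damped Newton step a descent step on v_n.
        t, current = 1.0, v(theta)
        while v(theta - t * step) > current and t > 1e-8:
            t *= 0.5
        theta -= t * step
        if abs(t * step) < 1e-10:
            break

    # Importance-sampling estimator with the optimised shift, same samples.
    estimate = sum(f(x + theta) * math.exp(-theta * x - 0.5 * theta * theta)
                   for x in xs) / n
    return theta, estimate
```

With f(x) = max(x - 2, 0) and n = 20 000, the optimized shift lands well above zero, moving samples into the region where the payoff is non-zero; the exact value φ(2) - 2(1 - Φ(2)) ≈ 0.00849 can serve as a check.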
Control of Manufacturing Systems With Re-Entrant Lines
1996
Abstract
In this paper a mathematical framework for the control theory of manufacturing systems is proposed. All possible plant structures are classified. A mathematical dynamical model describing the dynamics of a plant is developed. With the help of this model it is shown how to design a real-time control policy that guarantees stable operation of the plant at a quasi-maximal production rate.
Keywords: graph theory, linear programming, integer programming, manufacturing systems, control theory, combinatorial optimization.
1. Introduction
In this paper a mathematical framework for the control theory of manufacturing systems is proposed. A manufacturing system (or a plant) operates through the occurrence of discrete events and therefore can be considered as a discrete event system. There are many good references (see, e.g., [4] and the bibliography there) dealing with discrete event system analysis. It is known that, for most discrete event systems, analytical methods are not ava...
Asymptotic Theory for the Empirical Haezendonck Risk Measure
Abstract
Haezendonck risk measures are a recently introduced class of risk measures that includes, as its minimal member, the Tail Value-at-Risk (TVaR), arguably the most popular risk measure in global insurance regulation. In applications one often has to estimate the risk measure from a random sample drawn from an unknown distribution. The distribution could either be truly unknown or be the distribution of a complex function of economic and idiosyncratic variables, the complexity of the function rendering its distribution indeterminable. Hence statistical procedures for estimating Haezendonck risk measures are a key requirement for their use in practice. A natural estimator of the Haezendonck risk measure is the Haezendonck risk measure of the empirical distribution, but its statistical properties have not yet been explored in detail. The main goal of this article is both to establish the strong consistency of this estimator and to derive weak convergence limits for it. We also conduct a simulation study to give insight into the sample sizes required for these asymptotic limits to take hold.
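For the minimal member, TVaR, the plug-in estimator can be computed exactly from a sorted sample. The sketch below uses the well-known representation TVaR_p(X) = min_x { x + E[(X - x)+] / (1 - p) } applied to the empirical distribution. It covers only this special case, not the general Haezendonck risk measure studied in the article, and empirical_tvar is a hypothetical name.

```python
import math


def empirical_tvar(sample, p):
    """Empirical Tail Value-at-Risk at level p in [0, 1).

    Uses TVaR_p(X) = min_x { x + E[(X - x)+] / (1 - p) } under the empirical
    distribution. The objective is convex and piecewise linear with kinks at
    the sample points, so it suffices to evaluate it at each sample value.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")
    # suffix[j] = sum of xs[j:], so each excess sum costs O(1).
    suffix = [0.0] * (n + 1)
    for j in range(n - 1, -1, -1):
        suffix[j] = suffix[j + 1] + xs[j]
    best = math.inf
    for j, x in enumerate(xs):
        # Sum of (X_i - x)+ over the sample; values at or below x contribute 0.
        excess = suffix[j + 1] - (n - 1 - j) * x
        best = min(best, x + excess / (n * (1.0 - p)))
    return best
```

For the sample 1, 2, ..., 10 at p = 0.8, the result is 9.5, the average of the largest 20% of the observations; at p = 0 it reduces to the sample mean.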
Simulation Optimization Using Metamodels
Abstract
Many iterative optimization methods are designed to be used in conjunction with deterministic objective functions. These optimization methods can be difficult to apply to an objective generated by a discrete-event simulation, due to the stochastic nature of the response(s) and the potentially extensive run times. A metamodel aids simulation optimization by providing a deterministic objective with run times that are generally much shorter than the original discrete-event simulation. Polynomial metamodels generally provide only local approximations, and so a series of metamodels must be fit as the optimization progresses. Other classes of metamodels can provide a global fit; fitting can be done either by constructing the global model once at the start of the optimization, or by using the optimization results to identify additional discrete-event runs to refine the global model. This tutorial surveys both local and global metamodel-based optimization methods.
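A single local step of polynomial-metamodel optimization can be sketched as follows: fit a quadratic to simulation outputs observed at a few design points by least squares, then move to the fitted minimizer. This is a one-dimensional illustration under our own assumptions; the design, the quadratic form, and the names fit_quadratic and metamodel_step are hypothetical, and the tutorial surveys many other local and global variants.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y ~ b0 + b1*x + b2*x^2 via the 3x3 normal equations."""
    # Normal equations A b = c with A[i][j] = sum x^(i+j), c[i] = sum y * x^i.
    a = [[sum(x ** (i + j) for x in xs) for j in range(3)] for i in range(3)]
    c = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(3)]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    m = [row[:] + [ci] for row, ci in zip(a, c)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if m[col][col] == 0.0:
            raise ValueError("need at least three distinct design points")
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= factor * m[col][k]
    # Back substitution.
    b = [0.0, 0.0, 0.0]
    for i in range(2, -1, -1):
        b[i] = (m[i][3] - sum(m[i][k] * b[k] for k in range(i + 1, 3))) / m[i][i]
    return b


def metamodel_step(xs, ys):
    """Minimiser of the fitted quadratic metamodel, or None if it is not convex."""
    b0, b1, b2 = fit_quadratic(xs, ys)
    if b2 <= 0.0:
        return None
    return -b1 / (2.0 * b2)
```

In practice the ys would be noisy averages of simulation replications, and the step would be repeated around the new point, fitting a fresh local metamodel each time.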
Formulation and Solution Strategies for Nonparametric Nonlinear Stochastic Programs, with an Application in Finance
2007
Abstract
nonparametric nonlinear stochastic programs, with an application in finance.