Policy Gradient in Continuous Time (2006)

by Remi Munos, Michael Littman
Journal of Machine Learning Research
Abstract:

Policy search approximately solves an optimal control problem by optimizing over a given class of parameterized policies. To apply a local optimization technique, such as a gradient method, we need the sensitivity of the performance measure with respect to the policy parameters, the so-called policy gradient. This paper is concerned with estimating the policy gradient for continuous-time, deterministic state dynamics in a reinforcement learning framework, that is, when the decision maker does not have a model of the state dynamics.
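
To make the idea of a policy gradient concrete, here is a minimal Python sketch. It is not the estimator developed in the paper: it uses plain central finite differences, and the toy system (scalar dynamics dx/dt = -x + u, linear policy u = theta * x, running reward -(x^2 + u^2)) is a hypothetical choice made only for illustration. The names rollout_return, finite_difference_gradient, and gradient_ascent are likewise assumptions. The gradient estimate only queries simulated trajectories, which loosely mirrors the setting where the learner has no explicit model of the dynamics.

```python
def rollout_return(theta, x0=1.0, horizon=5.0, dt=0.01):
    """Integrated reward of one trajectory, using an Euler discretization.

    Hypothetical toy problem: dx/dt = -x + u with policy u = theta * x
    and running reward -(x^2 + u^2).
    """
    x = x0
    total = 0.0
    steps = int(round(horizon / dt))
    for _ in range(steps):
        u = theta * x                   # parameterized policy
        total += -(x * x + u * u) * dt  # accumulate running reward
        x += (-x + u) * dt              # Euler step of the dynamics
    return total


def finite_difference_gradient(theta, eps=1e-4):
    """Central-difference estimate of dJ/dtheta from two rollouts."""
    upper = rollout_return(theta + eps)
    lower = rollout_return(theta - eps)
    return (upper - lower) / (2.0 * eps)


def gradient_ascent(theta=0.0, step=0.2, iterations=100):
    """Local policy search: follow the estimated policy gradient."""
    for _ in range(iterations):
        theta += step * finite_difference_gradient(theta)
    return theta


if __name__ == "__main__":
    theta_star = gradient_ascent()
    print("learned feedback gain:", theta_star)
    print("return at theta = 0:  ", rollout_return(0.0))
    print("return at learned gain:", rollout_return(theta_star))
```

For this toy problem the infinite-horizon optimum of the continuous-time system is theta = 1 - sqrt(2), so the learned gain should land near -0.41, up to discretization and finite-horizon effects. Finite differences need two rollouts per parameter and scale poorly with the number of parameters, which is one motivation for more refined gradient estimators.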
