Abstract:
Policy search is a method for approximately solving an optimal control problem by performing a parametric optimization within a given class of parameterized policies. To apply a local optimization technique, such as a gradient method, we need to evaluate the sensitivity of the performance measure with respect to the policy parameters, the so-called policy gradient. This paper is concerned with estimating the policy gradient for continuous-time, deterministic state dynamics in a reinforcement learning framework, that is, when the decision maker does not have a model of the state dynamics.
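To make the notion of a model-free policy gradient concrete, here is a minimal sketch under stated assumptions. It is a generic central finite-difference estimator, not the estimator developed in this paper. The 1-D system dx/dt = u, the linear policy u = -θx, the quadratic cost and the names `rollout_return`, `fd_policy_gradient` and `gradient_ascent` are all hypothetical choices for illustration. The simulator is treated as a black box: the gradient estimate only queries the performance J(θ), never the dynamics themselves.

```python
def rollout_return(theta, x0=1.0, T=2.0, dt=0.01):
    """Black-box performance J(theta) for a hypothetical 1-D example.

    Dynamics dx/dt = u (deterministic, continuous time), linear policy
    u = -theta * x, reward -(x^2 + u^2). The trajectory is simulated with
    explicit Euler steps and the reward is integrated by a Riemann sum.
    """
    x, J = x0, 0.0
    for _ in range(int(round(T / dt))):
        u = -theta * x              # parameterized policy
        J -= (x * x + u * u) * dt   # accumulate reward = -cost
        x += u * dt                 # Euler step of the state dynamics
    return J


def fd_policy_gradient(J, theta, h=1e-4):
    """Central finite-difference estimate of dJ/dtheta.

    Only evaluates J (i.e. runs the simulator); no model of the dynamics
    is used. Deterministic rollouts make the two evaluations repeatable.
    """
    return (J(theta + h) - J(theta - h)) / (2.0 * h)


def gradient_ascent(theta0, lr=0.5, iters=200):
    """Plain gradient ascent on J using the finite-difference estimate."""
    theta = theta0
    for _ in range(iters):
        theta += lr * fd_policy_gradient(rollout_return, theta)
    return theta


theta_star = gradient_ascent(0.2)
print(f"theta after ascent: {theta_star:.3f}")
```

Because the dynamics are deterministic, the two rollouts per estimate are noise-free, but the cost grows with the number of parameters (two rollouts per parameter). For this example the infinite-horizon LQR gain is θ = 1; with the finite horizon T = 2 the ascent settles somewhat below that.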