Policy Gradient Methods for Reinforcement Learning with Function Approximation (2000)

by R S Sutton, D McAllester, S Singh, Y Mansour
Venue: In Advances in Neural Information Processing Systems 12 (NIPS'00)