Policy gradient methods for reinforcement learning with function approximation (2000)

by R Sutton, D McAllester, S Singh, Y Mansour
Venue: Advances in Neural Information Processing Systems 12