Policy gradient methods for reinforcement learning with function approximation (2000)

by R S Sutton, D McAllester, S Singh, Y Mansour
Venue: Advances in Neural Information Processing Systems 12