Policy gradient methods for reinforcement learning with function approximation (1999)

by R S Sutton, D A McAllester, S P Singh, Y Mansour
Venue: Neural Information Processing Systems, pp. 1057–1063