Policy gradient methods for reinforcement learning with function approximation (2000)

by R S Sutton, D McAllester, S Singh, Y Mansour
Venue: In NIPS 1999