Convergence results for single-step on-policy reinforcement-learning algorithms (2000)

by S Singh, T Jaakkola, M L Littman, C Szepesvari
Venue: Mach. Learn.