Infinite-horizon policy-gradient estimation (2001)

by J Baxter, P L Bartlett
Venue: Journal of Artificial Intelligence Research