Infinite-horizon policy-gradient estimation (2001)

by Jonathan Baxter, Peter L. Bartlett
Venue: Journal of Artificial Intelligence Research