Experiments with Infinite-Horizon, Policy-Gradient Estimation (2001)

by J Baxter, P L Bartlett, L Weaver
Venue: Journal of Artificial Intelligence Research