Off-Policy Temporal-Difference Learning with Function Approximation. (2001)

by Doina Precup, Richard S. Sutton, Sanjoy Dasgupta
Venue: In Proceedings of the 18th International Conference on Machine Learning,