Regularized off-policy TD-learning. (2012)

by B Liu, S Mahadevan, J Liu
Venue: In Advances in Neural Information Processing Systems