R-MAX - a general polynomial time algorithm for near-optimal reinforcement learning (2002)

by R I Brafman, M Tennenholtz
Venue: Journal of Machine Learning Research