R-max: A general polynomial time algorithm for near-optimal reinforcement learning (2001)

by R Brafman, M Tennenholtz
Venue: In Proceedings of IJCAI’01