R-max – a general polynomial time algorithm for near-optimal reinforcement learning. (2002)

by R Brafman, M Tennenholtz
Venue: Journal of Machine Learning Research