Online Markov decision processes under bandit feedback. (2010)

by G Neu, A Gyorgy, C Szepesvari, A Antos
Venue: In Advances in Neural Information Processing Systems 23, 2010.