Learning control of finite Markov chains with unknown transition probabilities (1982)

by M Sato, K Abe, H Takeda
Venue: IEEE Trans. on Automatic Control