“Learning algorithms for Markov decision processes with average cost” (2001)

by J Abounadi, D P Bertsekas, V S Borkar
Venue: SIAM J. Control Optim.