Adaptive step-size policy gradients with average reward metric (2010)

by T Matsubara, T Morimura, J Morimoto
Venue: Journal of Machine Learning Research - Proceedings Track, 13:285–298