An online policy gradient algorithm for Markov decision processes with continuous states and actions (2014)

by Y Ma, T Zhao, K Hatano, M Sugiyama
Venue: In Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2014