Internal-state policy-gradient algorithms for partially observable Markov decision processes (2002)

by Douglas Aberdeen, Jonathan Baxter
Citations: 5 (1 self)