Computing a bias-optimal policy in a discrete-time Markov decision problem (1970)

by E V Denardo
Venue: Oper. Res.