Learning to predict by the methods of temporal differences (1988)

by Richard S. Sutton
Venue: Machine Learning