Learning from delayed rewards (1989)

by C J Watkins