Learning from delayed rewards (1989)

by Christopher J C H Watkins