Learning from Delayed Rewards (1989)

by Christopher J C H Watkins