Learning from Delayed Rewards (1989)

by C J C H Watkins