On average versus discounted reward temporal-difference learning. (2002)

by J N Tsitsiklis, B Van Roy
Venue: Machine Learning