On average versus discounted reward temporal-difference learning (2002)

by John N. Tsitsiklis, Benjamin Van Roy, Satinder Singh
Venue: Machine Learning