On average versus discounted reward temporal-difference learning (2002)

by John N. Tsitsiklis, Benjamin Van Roy, Satinder Singh
Venue: Machine Learning
Citations: 13 (2 self)