An analysis of temporal-difference learning with function approximation. (1997)

by John N. Tsitsiklis, Benjamin Van Roy
Venue: IEEE Transactions on Automatic Control