Learning to act using real-time dynamic programming (1993)

by Andrew G. Barto, Steven J. Bradtke, Satinder P. Singh