On-line policy improvement using Monte-Carlo search (1996)

by Gerald Tesauro, Gregory R Galperin
Venue: In Proceedings of Advances in Neural Information Processing Systems. Robert