Prioritized sweeping: Reinforcement learning with less data and less time (1993)

by Andrew W. Moore, Christopher G. Atkeson
Venue: Machine Learning
Citations: 275 (5 self)

Documents Related by Co-Citation

2829 Reinforcement Learning: An Introduction – Richard S. Sutton, Andrew G. Barto - 1998
1137 Learning from delayed rewards – C J C H Watkins - 1989
1965 Dynamic Programming – R Bellman - 1957
427 Dyna, an Integrated Architecture for Learning, Planning, and Reacting – Richard S. Sutton - 1991
1060 Learning to predict by the methods of temporal differences – Richard S. Sutton - 1988
178 Reinforcement learning for robots using neural networks – L-J Lin - 1992
85 Efficient Learning and Planning Within the Dyna Framework – Jing Peng, Ronald J. Williams - 1993
222 Motivated Reinforcement Learning – Peter Dayan - 2001
472 Learning to act using real-time dynamic programming – Andrew G. Barto, Steven J. Bradtke, Satinder P. Singh - 1993
94 Hierarchical Learning in Stochastic Domains: Preliminary Results – Leslie Pack Kaelbling - 1993
434 Dynamic programming and Markov processes – R A Howard - 1960
187 Convergence of Stochastic Iterative Dynamic Programming Algorithms – Tommi Jaakkola, Michael I. Jordan, Satinder P. Singh - 1994
173 Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach – Lonnie Chrisman - 1992
55 TD Models: Modeling the World at a Mixture of Time Scales – Richard S. Sutton - 1995
280 Learning in embedded systems – L P Kaelbling - 1993
226 On-Line Q-Learning Using Connectionist Systems – G. A. Rummery, M. Niranjan - 1994
219 Temporal Credit Assignment in Reinforcement Learning – R S Sutton - 1984
1134 Reinforcement learning: a survey – Leslie Pack Kaelbling, Michael L. Littman, Andrew W. Moore - 1996
331 Dynamic programming: deterministic and stochastic models – D P Bertsekas - 1987