Tight Performance Bounds on Greedy Policies Based on Imperfect Value Functions (1993)

by Ronald Williams, Leemon C. Baird
Citations: 83 (1 self)

Documents Related by Co-Citation

1226 Learning to predict by the methods of temporal differences – Richard S. Sutton - 1988
1309 Learning from Delayed Rewards – C. Watkins - 1989
473 Integrated architectures for learning, planning, and reacting based on approximating dynamic programming – Richard S. Sutton - 1990
2593 On the theory of dynamic programming – Richard E. Bellman - 1952
527 Learning to act using real-time dynamic programming – Andrew G. Barto, Steven J. Bradtke, Satinder P. Singh - 1993
237 Residual Algorithms: Reinforcement Learning with Function Approximation – Leemon Baird - 1995
513 Dynamic Programming and Markov Processes – R. A. Howard - 1960
316 Prioritized sweeping: Reinforcement learning with less data and less time – Andrew W. Moore, Christopher G. Atkeson - 1993
207 Convergence of Stochastic Iterative Dynamic Programming Algorithms – Tommi Jaakkola, Michael I. Jordan, Satinder P. Singh - 1994
208 Stable Function Approximation in Dynamic Programming – Geoffrey J. Gordon - 1995
373 Dynamic Programming: Deterministic and Stochastic Models – D. P. Bertsekas - 1987
274 Acting Optimally in Partially Observable Stochastic Domains – Anthony R. Cassandra, Leslie Pack Kaelbling, Michael L. Littman - 1994
95 Markov Decision Processes: Discrete Stochastic Dynamic Programming – M. L. Puterman - 1994
155 The Complexity of Stochastic Games – Anne Condon - 1992
193 Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach – Lonnie Chrisman - 1992
131 On the complexity of solving Markov decision problems – Michael L. Littman, Thomas L. Dean, Leslie Pack Kaelbling - 1995
616 Parallel and Distributed Computation: Numerical Methods – D. P. Bertsekas, J. N. Tsitsiklis - 1989
151 Asynchronous Stochastic Approximation and Q-Learning – John N. Tsitsiklis - 1994
373 Temporal Difference Learning and TD-Gammon – G. Tesauro - 1995