Simple statistical gradient-following algorithms for connectionist reinforcement learning (1992)

by Ronald J. Williams
Venue: Machine Learning
Citations: 262 - 0 self

Documents Related by Co-Citation

116 Gradient Descent for General Reinforcement Learning – Leemon Baird, Andrew Moore - 1998
262 Policy Gradient Methods for Reinforcement Learning with Function Approximation – Richard S. Sutton, David McAllester, Satinder Singh, Yishay Mansour - 1999
2829 Reinforcement Learning I: Introduction – Richard S. Sutton, Andrew G. Barto - 1998
190 TD-Gammon, a self-teaching backgammon program, achieves master-level play – G Tesauro - 1994
67 Learning finite-state controllers for partially observable environments – Nicolas Meuleau, Leonid Peshkin, Kee-Eung Kim, Leslie Pack Kaelbling - 1999
140 Actor-Critic Algorithms – Vijay R. Konda, John N. Tsitsiklis - 2001
65 Simulation-Based Optimization of Markov Reward Processes – Peter Marbach, John N. Tsitsiklis - 1998
545 Some Studies in Machine Learning using the Game of Checkers – A Samuel - 2000
1060 Learning to predict by the methods of temporal differences – Richard S. Sutton - 1988
13 Reinforcement learning by stochastic hill climbing on discounted reward – Hajime Kimura, Masayuki Yamamura, Shigenobu Kobayashi - 1995
25 Perturbation realization, potentials, and sensitivity analysis of Markov processes – X R Cao, H F Chen - 1997
7 Algorithms for Sensitivity Analysis of Markov Chains Through Potentials and Perturbation Realization – X-R Cao, Y-W Wan - 1998
13 Learning to play chess using temporal-differences – J Baxter, A Tridgell, L Weaver
21 Reinforcement learning in POMDPs with function approximation – Hajime Kimura, Kazuteru Miyazaki, Shigenobu Kobayashi - 1997
111 Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems – Satinder Singh, Dimitri Bertsekas
97 Reinforcement Learning with Soft State Aggregation – Satinder P. Singh, Tommi Jaakkola, Michael I. Jordan - 1995
102 A Reinforcement Learning Approach to Job-shop Scheduling – Wei Zhang, Thomas G. Dietterich - 1995
424 Neuronlike adaptive elements that can solve difficult learning control problems – Andrew G Barto, Richard S Sutton, Charles W Anderson - 1983
427 Dyna, an Integrated Architecture for Learning, Planning, and Reacting – Richard S. Sutton - 1991