Simple statistical gradient-following algorithms for connectionist reinforcement learning (1992)

by Ronald J. Williams
Venue: Machine Learning
Citations: 320 (0 self-citations)

Documents Related by Co-Citation (leading number: citation count of each document)

319 Policy Gradient Methods for Reinforcement Learning with Function Approximation – Richard S. Sutton, David McAllester, Satinder Singh, Yishay Mansour - 1999
127 Gradient Descent for General Reinforcement Learning – Leemon Baird, Andrew Moore - 1998
3760 Reinforcement Learning: An Introduction – Richard S. Sutton, Andrew G. Barto - 1998
224 TD-Gammon, a self-teaching backgammon program, achieves master-level play – G J Tesauro - 1994
174 Actor-Critic Algorithms – Vijay R. Konda, John N. Tsitsiklis - 2001
76 Simulation-Based Optimization of Markov Reward Processes – Peter Marbach, John N. Tsitsiklis - 1998
612 Some studies in machine learning using the game of Checkers – Arthur L. Samuel - 1959
1226 Learning to predict by the methods of temporal differences – Richard S. Sutton - 1988
14 Reinforcement learning by stochastic hillclimbing on discounted reward – H Kimura, M Yamamura, S Kobayashi - 1995
151 Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems – Tommi Jaakkola, Satinder P. Singh, Michael I. Jordan - 1995
39 Perturbation realization, potentials, and sensitivity analysis of Markov processes – Xi-ren Cao, Han-fu Chen - 1997
77 Learning finite-state controllers for partially observable environments – Nicolas Meuleau, Leonid Peshkin, Kee-eung Kim, Leslie Pack Kaelbling - 1999
7 Algorithms for Sensitivity Analysis of Markov Chains Through Potentials and Perturbation Realization – X-R Cao, Y-W Wan - 1998
23 Reinforcement learning in POMDPs with function approximation – H Kimura, K Miyazaki, S Kobayashi - 1997
15 Learning to play chess using temporal differences – J Baxter, A Tridgell, L Weaver - 2000
124 Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems – Satinder Singh, Dimitri Bertsekas
111 Reinforcement Learning with Soft State Aggregation – Satinder P. Singh, Tommi Jaakkola, Michael I. Jordan - 1995
112 A Reinforcement Learning Approach to Job-shop Scheduling – Wei Zhang, Thomas G. Dietterich - 1995
471 Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems – A G Barto, R S Sutton, C W Anderson - 1983