|
116
|
Gradient Descent for General Reinforcement Learning
– Leemon Baird, Andrew Moore
- 1998
|
|
262
|
Policy Gradient Methods for Reinforcement Learning with Function Approximation
– Richard S. Sutton, David McAllester, Satinder Singh, Yishay Mansour
- 1999
|
|
2829
|
Reinforcement Learning I: Introduction
– Richard S. Sutton, Andrew G. Barto
- 1998
|
|
190
|
TD-Gammon, a self-teaching backgammon program, achieves master-level play
– G Tesauro
- 1994
|
|
67
|
Learning finite-state controllers for partially observable environments
– Nicolas Meuleau, Leonid Peshkin, Kee-Eung Kim, Leslie Pack Kaelbling
- 1999
|
|
140
|
Actor-Critic Algorithms
– Vijay R. Konda, John N. Tsitsiklis
- 2001
|
|
65
|
Simulation-Based Optimization of Markov Reward Processes
– Peter Marbach, John N. Tsitsiklis
- 1998
|
|
545
|
Some Studies in Machine Learning using the Game of Checkers
– A Samuel
- 2000
|
|
1060
|
Learning to predict by the methods of temporal differences
– Richard S. Sutton
- 1988
|
|
13
|
Reinforcement learning by stochastic hill climbing on discounted reward
– Hajime Kimura, Masayuki Yamamura, Shigenobu Kobayashi
- 1995
|
|
25
|
Perturbation realization, potentials, and sensitivity analysis of Markov processes
– X R Cao, H F Chen
- 1997
|
|
7
|
Algorithms for Sensitivity Analysis of Markov Chains Through Potentials and Perturbation Realization
– X-R Cao, Y-W Wan
- 1998
|
|
13
|
Learning to play chess using temporal-differences
– J Baxter, A Tridgell, L Weaver
|
|
21
|
Reinforcement learning in POMDPs with function approximation
– Hajime Kimura, Kazuteru Miyazaki, Shigenobu Kobayashi
- 1997
|
|
111
|
Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems
– Satinder Singh, Dimitri Bertsekas
|
|
97
|
Reinforcement Learning with Soft State Aggregation
– Satinder P. Singh, Tommi Jaakkola, Michael I. Jordan
- 1995
|
|
102
|
A Reinforcement Learning Approach to Job-shop Scheduling
– Wei Zhang, Thomas G. Dietterich
- 1995
|
|
424
|
Neuronlike adaptive elements that can solve difficult learning control problems
– Andrew G Barto, Richard S Sutton, Charles W Anderson
- 1983
|
|
427
|
Dyna, an Integrated Architecture for Learning, Planning, and Reacting
– Richard S. Sutton
- 1991
|