## On-Line Q-Learning Using Connectionist Systems (1994)

Citations: 292 (1 self)

### BibTeX

@TECHREPORT{Rummery94on-lineq-learning,
    author = {G. A. Rummery and M. Niranjan},
    title = {On-Line Q-Learning Using Connectionist Systems},
    institution = {},
    year = {1994}
}

### Abstract

Reinforcement learning algorithms are a powerful machine learning technique. However, much of the work on these algorithms has been developed with regard to discrete finite-state Markovian problems, which is too restrictive for many real-world environments. Therefore, it is desirable to extend these methods to high dimensional continuous state-spaces, which requires the use of function approximation to generalise the information learnt by the system. In this report, the use of back-propagation neural networks (Rumelhart, Hinton and Williams 1986) is considered in this context. We consider a number of different algorithms based around Q-Learning (Watkins 1989) combined with the Temporal Difference algorithm (Sutton 1988), including a new algorithm (Modified Connectionist Q-Learning), and Q(λ) (Peng and Williams 1994). In addition, we present algorithms for applying these updates on-line during trials, unlike backward replay used by Lin (1993) that requires waiting until the end of each t...
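As a rough illustration of what applying a Q-Learning update on-line with a function approximator can look like, the sketch below performs one temporal-difference step per transition instead of waiting for the end of a trial. It is not the report's method: the report considers back-propagation neural networks, whereas this sketch assumes a much simpler linear approximator, and the names `features`, `q_value`, `q_update`, the feature map, and the values of `alpha` and `gamma` are all hypothetical choices made for the example.

```python
def features(x):
    """Hypothetical feature map for a scalar state x (bias term plus the state)."""
    return [1.0, x]


def q_value(w, x, a):
    """Linear estimate of Q(x, a); w[a] is the weight vector for action a."""
    return sum(wi * fi for wi, fi in zip(w[a], features(x)))


def q_update(w, x, a, r, x_next, done, alpha=0.1, gamma=0.9):
    """Apply one on-line Q-learning step in place and return the TD error.

    Assumption: a linear approximator stands in for the neural network,
    so the gradient of Q(x, a) with respect to w[a] is just features(x).
    """
    if done:
        target = r  # no bootstrapping from a terminal state
    else:
        # Greedy bootstrap over all actions (one-step Q-learning target).
        best_next = max(q_value(w, x_next, b) for b in range(len(w)))
        target = r + gamma * best_next
    delta = target - q_value(w, x, a)
    for i, f in enumerate(features(x)):
        w[a][i] += alpha * delta * f
    return delta
```

A caller would invoke `q_update` immediately after each observed transition `(x, a, r, x_next)`, which is what makes the scheme on-line; a backward-replay scheme would instead store the transitions and process them once the trial has finished.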

