Making the World Differentiable: On Using Self-Supervised Fully Recurrent Neural Networks for Dynamic Reinforcement Learning and Planning in Non-Stationary Environments (1990)
Citations: 2 (0 self)
BibTeX
@TECHREPORT{Schmidhuber90makingthe,
author = {Jürgen Schmidhuber},
title = {Making the World Differentiable: On Using Self-Supervised Fully Recurrent Neural Networks for Dynamic Reinforcement Learning and Planning in Non-Stationary Environments},
institution = {},
year = {1990}
}
Abstract
First, a brief introduction is given to reinforcement learning and to supervised learning with recurrent networks in non-stationary environments. The introduction also covers the basic principle of 'gradient descent through frozen model networks' as employed by Werbos, Jordan, Munro, Robinson and Fallside, and Nguyen and Widrow. This principle allows supervised learning techniques to be employed for reinforcement learning. Then a general algorithm is described for a reinforcement learning neural network with internal and external feedback in a non-stationary reactive environment. Internal feedback is given by connections that allow cyclic activation flow through the network. External feedback is given by output actions that may change the state of the environment, thus influencing subsequent input activations. The network's main goal is to receive as much reinforcement (or as little 'pain') as possible. In theory, arbitrary time lags between actions and ulterior consequences ar...
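The principle of gradient descent through a frozen model network, as named in the abstract, can be illustrated with a deliberately tiny sketch. It is not the report's algorithm: the scalar environment, the one-weight linear model, the one-weight controller, and every name below (`environment`, `train_model`, `train_controller`, `w`, `theta`) are hypothetical, and no recurrence is involved. The model is first fitted to the environment by ordinary supervised learning, then held fixed while the controller is adjusted along the gradient of the predicted 'pain' computed through the model.

```python
import random


def environment(action):
    """Hypothetical toy environment; the learner treats it as a black box."""
    return 2.0 * action


def train_model(steps=500, lr=0.1, seed=0):
    """Fit a one-weight model y_hat = w * action by supervised learning."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        a = rng.uniform(-1.0, 1.0)        # exploratory action
        err = w * a - environment(a)      # prediction error on observed outcome
        w -= lr * err * a                 # gradient of 0.5 * err**2 w.r.t. w
    return w


def train_controller(w, target=1.0, steps=200, lr=0.05):
    """Adjust the controller weight theta; the model weight w stays frozen."""
    theta = 0.0
    x = 1.0                               # constant controller input (assumed)
    for _ in range(steps):
        a = theta * x                     # controller output (action)
        predicted = w * a                 # model's prediction of the outcome
        # Chain rule through the frozen model:
        # d/dtheta (predicted - target)**2 = 2 * (predicted - target) * w * x
        grad = 2.0 * (predicted - target) * w * x
        theta -= lr * grad                # only theta is updated here
    return theta


w = train_model()
theta = train_controller(w)
pain = (environment(theta * 1.0) - 1.0) ** 2   # squared miss in the real environment
```

In this sketch the controller never differentiates the environment itself; it only needs the model's weight `w`, which is why the model has to be reasonably accurate before the controller is trained.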







