## Unified Formulation for Training Recurrent Networks with Derivative Adaptive Critics

Venue: Proc. International Conference on Neural Networks (ICNN’97)

Citations: 5 (3 self)

### BibTeX

@INPROCEEDINGS{Feldkamp_unifiedformulation,

author = {L. A. Feldkamp and G. V. Puskorius and D. V. Prokhorov},

title = {Unified Formulation for Training Recurrent Networks with Derivative Adaptive Critics},

booktitle = {Proc. International Conference on Neural Networks (ICNN’97)},

year = {1997},

publisher = {IEEE Press}

}


### Abstract

We present a procedure for obtaining derivatives used in training a recurrent network that combines in a unified framework the techniques of backpropagation through time and derivative adaptive critics. The resulting formulation is consistent with previous descriptions, but has the advantage of allowing the mentioned techniques to be used together in a proportion that is appropriate to a given problem.

1 Introduction

Substantial interest has been generated regarding the use of various methods which can be regarded as forms of approximate dynamic programming (ADP). Commonly used terms for such methods include adaptive critics, reinforcement learning, heuristic dynamic programming, neurodynamic programming, and others. By most measures, systems that involve discrete states have dominated research in this area. Recently, however, attention has been directed to systems with continuous state variables. Several examples of the application of ADP to such systems have appeared, but most if no...

### Citations

438 | A Learning Algorithm for Continually Running Fully Recurrent Neural Networks
- Williams, Zipser
- 1989
Citation Context: ...rivative adaptive critics. In using this form, we must give up the property that, in the appropriate limit, the derivatives become identical with those obtained by real time recurrent learning (RTRL) [6]. We recognize that this choice is arbitrary and merely note that similar derivations exist for the other forms. In this form of BPTT, we obtain derivatives of the squared error of each node k of the ...
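The context above notes that this BPTT variant gives up exact agreement with real time recurrent learning (RTRL). As a hypothetical illustration of RTRL's forward-mode style only (the toy model and names are assumptions, not the paper's notation), the sensitivity p_t = dx_t/dw can be carried forward in time alongside the state of a scalar linear node x_t = w·x_{t-1} + u_t:

```python
# Hypothetical sketch of real time recurrent learning (RTRL) for a scalar
# linear recurrent node x_t = w * x_{t-1} + u_t. The sensitivity
# p_t = d x_t / dw is updated forward in time, so no backward pass is needed.
# The toy model is illustrative; the paper's formulation differs.

def rtrl(w, us):
    """Return d x_T / dw after driving the node with inputs us."""
    x, p = 0.0, 0.0
    for u in us:
        p = x + w * p      # chain rule: d x_t/dw = x_{t-1} + w * d x_{t-1}/dw
        x = w * x + u      # state update
    return p

# With x_0 = 0 and inputs all 1, x_3 = w**2 + w + 1, whose exact
# derivative is 2*w + 1.
print(rtrl(0.5, [1.0, 1.0, 1.0]))   # prints 2.0
```

Because the sensitivity is updated at every step, RTRL needs no stored trajectory, which is the property traded away by the truncated backward-pass form discussed in the context.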

294 | Backpropagation Through Time: What It Does and How to Do It
- Werbos
- 1990
Citation Context: ...critic form. A similar procedure can be applied to the standard form. 3 Formulation of the Method In order to compute weight updates with dynamic gradient methods, we must estimate the total (ordered [2, 3]) derivatives of network outputs with respect to the weights of the network. A well established procedure is to use truncated backpropagation through time, denoted as BPTT(h). If the truncation depth ...
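The context above describes truncated backpropagation through time, BPTT(h). As a minimal sketch of that generic idea (the scalar model and function names are assumptions, not the paper's method), a backward pass over only the last h steps accumulates the derivative of the final state with respect to a shared weight:

```python
# Hypothetical sketch of truncated backpropagation through time, BPTT(h),
# for a scalar linear recurrent node x_t = w * x_{t-1} + u_t.
# The toy model is illustrative; the paper's formulation differs.

def bptt_h(w, us, h):
    """Return d x_T / dw accumulated over the last h steps only."""
    # Forward pass: record the trajectory x_0 .. x_T.
    xs = [0.0]
    for u in us:
        xs.append(w * xs[-1] + u)
    # Backward pass, truncated after h steps; lam = d x_T / d x_t.
    T = len(us)
    lam, grad = 1.0, 0.0
    for t in range(T, max(T - h, 0), -1):
        grad += lam * xs[t - 1]   # direct term: d x_t / dw = x_{t-1}
        lam *= w                  # propagate: d x_t / d x_{t-1} = w
    return grad

# With h >= T the result equals the exact gradient: for inputs all 1 and
# x_0 = 0, x_3 = w**2 + w + 1, so d x_3/dw = 2*w + 1.
print(bptt_h(0.5, [1.0, 1.0, 1.0], h=3))   # prints 2.0
print(bptt_h(0.5, [1.0, 1.0, 1.0], h=1))   # prints 1.5 (truncation error)
```

The gap between the truncated and exact derivatives is what a derivative adaptive critic can be asked to estimate, which is the correspondence the paper exploits.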

124 | Gradient-based learning algorithms for recurrent networks and their computational complexity - in Backpropagation: Theory, Architectures and Applications, edited by Yves Chauvin and David E. Rumelhart
- Williams, Zipser
- 1995
Citation Context: ...ary static backpropagation. We build our treatment of derivative adaptive critics on BPTT(h) and use the resulting derivatives in the same way. Several slightly different ways of executing BPTT exist [4]. We here present a form closely related to but slightly different from that we have presented previously [5]. We choose to use this particular form here because it leads to a more natural corresponde...

88 | Approximate dynamic programming for real-time control and neural modeling
- Werbos
- 1992
Citation Context: ...Foremost among these issues has been the question of how to handle all the states of the system. In most discussions, especially of derivative critic methods such as dual heuristic programming (DHP) [1], attention is given primarily to the outputs of the system, generally neglecting the question of how to deal with system states that are not directly measured. In much of our own work we have been co...
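The context refers to dual heuristic programming, in which a critic is trained to estimate derivatives of the cost-to-go with respect to the state. As a heavily hedged sketch of that generic recursion (the scalar setting and all names are assumptions, not the paper's method), the critic's training target follows dJ/dx_t = dU/dx_t + γ·(dJ/dx_{t+1})·(dx_{t+1}/dx_t):

```python
# Hypothetical sketch of a dual heuristic programming (DHP) critic target
# for a scalar system x_{t+1} = f(x_t) with per-step cost U(x_t). The
# critic approximates lambda(x) = dJ/dx; its target comes from
#   dJ/dx_t = dU/dx_t + gamma * dJ/dx_{t+1} * df/dx_t.
# All names here are illustrative, not the paper's notation.

def dhp_target(x_t, dU_dx, df_dx, lam_next, gamma=0.9):
    """Target value for the critic's derivative estimate at state x_t."""
    return dU_dx(x_t) + gamma * lam_next * df_dx(x_t)

# Example: quadratic cost U(x) = x**2, linear dynamics f(x) = 0.5 * x.
target = dhp_target(
    x_t=1.0,
    dU_dx=lambda x: 2.0 * x,   # dU/dx
    df_dx=lambda x: 0.5,       # df/dx (constant for linear f)
    lam_next=1.0,              # critic's estimate of dJ/dx at x_{t+1}
)
print(target)   # 2.0 + 0.9 * 0.5
```

Note that this sketch covers only a fully observed scalar state; the context's point is precisely that unmeasured system states complicate such formulations.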

18 | Dynamic neural network methods applied to on-vehicle idle speed control
- Puskorius, Feldkamp, et al.
- 1996
Citation Context: ...lting derivatives in the same way. Several slightly different ways of executing BPTT exist [4]. We here present a form closely related to but slightly different from that we have presented previously [5]. We choose to use this particular form here because it leads to a more natural correspondence between the derivatives it produces and those produced by derivative adaptive critics. In using this form...

7 | Neural networks, system identification, and control in the chemical process industries
- Werbos
- 1992
Citation Context: ...critic form. A similar procedure can be applied to the standard form. 3 Formulation of the Method In order to compute weight updates with dynamic gradient methods, we must estimate the total (ordered [2, 3]) derivatives of network outputs with respect to the weights of the network. A well established procedure is to use truncated backpropagation through time, denoted as BPTT(h). If the truncation depth ...

3 | Primitive Adaptive Critics
- Prokhorov, Feldkamp
- 1997
Citation Context: ...node does not need a critic. We have observed evidence of this advantage of the new formulation in several simple examples, as well as in a reasonably complex nonlinear control problem (Example 3 of [7]). In the latter case, convergence was at least 50 percent faster than when the usual DHP formulation was used. However, because of the complexity of the dynamics of training recurrent networks, it is...