A Learning Algorithm for Continually Running Fully Recurrent Neural Networks
, 1989
Cited by 368 (4 self)
The exact form of a gradient-following learning algorithm for completely recurrent networks running in continually sampled time is derived and used as the basis for practical algorithms for temporal supervised learning tasks. These algorithms have: (1) the advantage that they do not require a precisely defined training interval, operating while the network runs; and (2) the disadvantage that they require nonlocal communication in the network being trained and are computationally expensive. These algorithms are shown to allow networks having recurrent connections to learn complex tasks requiring the retention of information over time periods having either fixed or indefinite length. 1 Introduction. A major problem in connectionist theory is to develop learning algorithms that can tap the full computational power of neural networks. Much progress has been made with feedforward networks, and attention has recently turned to developing algorithms for networks with recurrent connections, wh...
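The abstract above describes a gradient algorithm that runs while the network runs, at the cost of nonlocal communication and heavy computation. A minimal sketch of that idea follows, using the recursion commonly given for real-time recurrent learning; it is not quoted from the abstract. The names `rtrl_step`, `run_rtrl`, and `weight_update`, the tanh activation, and the zero initial state are all assumptions made for illustration.

```python
import math

def rtrl_step(w, y, p, x):
    """Advance the network one time step and update the sensitivities.

    w: n x (m + n) weight matrix (hypothetical layout: inputs first, then units)
    y: current outputs of the n units
    p: p[k][i][j] = d y[k] / d w[i][j], carried forward in time
    x: the m external inputs at this step
    """
    n, m = len(y), len(x)
    z = list(x) + list(y)  # everything each unit sees at this step
    s = [sum(w[k][j] * z[j] for j in range(m + n)) for k in range(n)]
    y_new = [math.tanh(sk) for sk in s]  # tanh is an assumption
    p_new = [[[0.0] * (m + n) for _ in range(n)] for _ in range(n)]
    for k in range(n):
        fprime = 1.0 - y_new[k] ** 2  # derivative of tanh at s[k]
        for i in range(n):
            for j in range(m + n):
                # Influence of w[i][j] that flows through the recurrent units.
                recur = sum(w[k][m + l] * p[l][i][j] for l in range(n))
                # Direct influence: only unit i uses weight w[i][j].
                direct = z[j] if i == k else 0.0
                p_new[k][i][j] = fprime * (recur + direct)
    return y_new, p_new

def run_rtrl(w, inputs):
    """Run from a zero state and return final outputs and sensitivities."""
    n = len(w)
    m = len(w[0]) - n
    y = [0.0] * n
    p = [[[0.0] * (m + n) for _ in range(n)] for _ in range(n)]
    for x in inputs:
        y, p = rtrl_step(w, y, p, x)
    return y, p

def weight_update(p, errors, lr):
    """Gradient step: delta_w[i][j] = lr * sum_k errors[k] * p[k][i][j]."""
    n, cols = len(p), len(p[0][0])
    return [[lr * sum(errors[k] * p[k][i][j] for k in range(n))
             for j in range(cols)] for i in range(n)]
```

Each unit's sensitivity update reads the sensitivities of every other unit, which is the nonlocal communication the abstract mentions, and the table `p` holds one entry per (unit, weight) pair, which is where the computational expense comes from.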
Gradient calculation for dynamic recurrent neural networks: a survey
- IEEE Transactions on Neural Networks
, 1995
Cited by 119 (1 self)
We survey learning algorithms for recurrent neural networks with hidden units, and put the various techniques into a common framework. We discuss fixed-point learning algorithms, namely recurrent backpropagation and deterministic Boltzmann Machines, and non-fixed-point algorithms, namely backpropagation through time, Elman's history cutoff, and Jordan's output feedback architecture. Forward propagation, an online technique that uses adjoint equations, and variations thereof, are also discussed. In many cases, the unified presentation leads to generalizations of various sorts. We discuss advantages and disadvantages of temporally continuous neural networks in contrast to clocked ones, and continue with some "tricks of the trade" for training, using, and simulating continuous time and recurrent neural networks. We present some simulations, and at the end, address issues of computational complexity and learning speed.
Gradient-Based Learning Algorithms for Recurrent Networks and Their Computational Complexity
, 1995
Cited by 100 (4 self)
Introduction. 1.1 Learning in Recurrent Networks. Connectionist networks having feedback connections are interesting for a number of reasons. Biological neural networks are highly recurrently connected, and many authors have studied recurrent network models of various types of perceptual and memory processes. The general property making such networks interesting and potentially useful is that they manifest highly nonlinear dynamical behavior. One such type of dynamical behavior that has received much attention is that of settling to a fixed stable state, but probably of greater importance both biologically and from an engineering viewpoint are time-varying behaviors. Here we consider algorithms for training recurrent networks to perform temporal supervised learning tasks, in which the specification of desired behavior is in the form of specific examples of input and desired output trajectories. One example of such a task is sequence classification, where
Biologically Plausible Error-driven Learning using Local Activation Differences: The Generalized Recirculation Algorithm
- Neural Computation
, 1996
Cited by 70 (10 self)
The error backpropagation learning algorithm (BP) is generally considered biologically implausible because it does not use locally available, activation-based variables. A version of BP that can be computed locally using bi-directional activation recirculation (Hinton & McClelland, 1988) instead of backpropagated error derivatives is more biologically plausible. This paper presents a generalized version of the recirculation algorithm (GeneRec), which overcomes several limitations of the earlier algorithm by using a generic recurrent network with sigmoidal units that can learn arbitrary input/output mappings. However, the contrastive Hebbian learning algorithm (CHL, a.k.a. DBM or mean field learning) also uses local variables to perform error-driven learning in a sigmoidal recurrent network. CHL was derived in a stochastic framework (the Boltzmann machine), but has been extended to the deterministic case in various ways, all of which rely on problematic approximations and assumptions, le...
Fast Exact Multiplication by the Hessian
- Neural Computation
, 1994
Cited by 54 (3 self)
Just storing the Hessian H (the matrix of second derivatives d^2 E/dw_i dw_j of the error E with respect to each pair of weights) of a large neural network is difficult. Since a common use of a large matrix like H is to compute its product with various vectors, we derive a technique that directly calculates Hv, where v is an arbitrary vector. This allows H to be treated as a generalized sparse matrix. To calculate Hv, we first define a differential operator R{f(w)} = (d/dr)f(w + rv)|_{r=0}, note that R{grad_w} = Hv and R{w} = v, and then apply R{} to the equations used to compute grad_w. The result is an exact and numerically stable procedure for computing Hv, which takes about as much computation, and is about as local, as a gradient evaluation. We then apply the technique to backpropagation networks, recurrent backpropagation, and stochastic Boltzmann Machines. Finally, we show that this technique can be used at the heart of many iterative techniques for computing various properties of H, obviating the need for direct methods.
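The R{} procedure in the abstract above can be worked by hand on a very small model. The sketch below assumes a single tanh unit with squared error, E(w) = 0.5 (tanh(w . x) - t)^2, which is an illustrative choice and not a model taken from the paper. The function names `grad`, `hessian_vector`, and `hessian_vector_fd` are hypothetical. Each line of `hessian_vector` is the corresponding line of `grad` with R{} applied, using R{w} = v.

```python
import math

def grad(w, x, t):
    """Gradient of E(w) = 0.5 * (tanh(w . x) - t)**2 with respect to w."""
    a = sum(wi * xi for wi, xi in zip(w, x))
    y = math.tanh(a)
    delta = (y - t) * (1.0 - y * y)  # dE/da
    return [delta * xi for xi in x]

def hessian_vector(w, x, t, v):
    """Exact Hv, obtained by applying R{} to every step of grad()."""
    a = sum(wi * xi for wi, xi in zip(w, x))
    r_a = sum(vi * xi for vi, xi in zip(v, x))  # R{a}, since R{w} = v
    y = math.tanh(a)
    r_y = (1.0 - y * y) * r_a                   # R{y} by the chain rule
    # R{delta}: product rule on (y - t) * (1 - y^2); t is constant.
    r_delta = r_y * (1.0 - y * y) + (y - t) * (-2.0 * y * r_y)
    return [r_delta * xi for xi in x]           # x is constant, so R{x} = 0

def hessian_vector_fd(w, x, t, v, r=1e-5):
    """Central-difference estimate of Hv, used only as a sanity check."""
    w_plus = [wi + r * vi for wi, vi in zip(w, v)]
    w_minus = [wi - r * vi for wi, vi in zip(w, v)]
    g_plus, g_minus = grad(w_plus, x, t), grad(w_minus, x, t)
    return [(gp - gm) / (2.0 * r) for gp, gm in zip(g_plus, g_minus)]
```

The exact version costs about as much as one extra gradient evaluation and never forms H, while the finite-difference version needs two gradient evaluations and depends on the step size r.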
Learning to Segment Images Using Dynamic Feature Binding
- Neural Computation
, 1991
Cited by 36 (9 self)
Despite the fact that complex visual scenes contain multiple, overlapping objects, people perform object recognition with ease and accuracy. One operation that facilitates recognition is an early segmentation process in which features of objects are grouped and labeled according to which object they belong. Current computational systems that perform this operation are based on predefined grouping heuristics. We describe a system called MAGIC that learns how to group features based on a set of presegmented examples. In many cases, MAGIC discovers grouping heuristics similar to those previously proposed, but it also has the capability of finding nonintuitive structural regularities in images. Grouping is performed by a relaxation network that attempts to dynamically bind related features. Features transmit a complex-valued signal (amplitude and phase) to one another; binding can thus be represented by phase locking related features. MAGIC's training procedure is a generalizatio...
On the Analysis of Pattern Sequences by Self-Organizing Maps
, 1994
Cited by 28 (0 self)
This thesis is organized in three parts. In the first part, the Self-Organizing Map algorithm is introduced. The discussion focuses on the analysis of the Self-Organizing Map algorithm. It is shown that the nonlinear nature of the algorithm makes it difficult to analyze except in some trivial cases. In the second part, the Self-Organizing Map algorithm is applied to several pattern sequence analysis tasks. The first application is a voice quality analysis system. It is shown that the Self-Organizing Map algorithm can be applied to voice analysis by providing a visualization of certain deviations. The key point in the applicability of the Self-Organizing Map algorithm is the topological nature of the mapping; similar voice samples are mapped to nearby locations in the map. The second application is a speech recognition system. Through several experiments it is demonstrated that by collecting some time-dependent features and using them in conjunction with the basic Self-Organ...
Generalization in Interactive Networks: The Benefits of Inhibitory Competition and Hebbian Learning
- Neural Computation
, 2001
Cited by 28 (5 self)
Computational models in cognitive neuroscience should ideally use biological properties and powerful computational principles to produce behavior consistent with psychological findings. Error-driven backpropagation is computationally powerful, and has proven useful for modeling a range of psychological data, but is not biologically plausible. Several approaches to implementing backpropagation in a biologically plausible fashion converge on the idea of using bidirectional activation propagation in interactive networks to convey error signals. This paper demonstrates two main points about these error-driven interactive networks: (a) they generalize poorly due to attractor dynamics that interfere with the network's ability to systematically produce novel combinatorial representations in response to novel inputs; and (b) this generalization problem can be remedied by adding two widely used mechanistic principles, inhibitory competition and Hebbian learning, that can be independent...
New Results on Recurrent Network Training: Unifying the Algorithms and Accelerating Convergence
- IEEE Trans. Neural Networks
, 2000
Cited by 22 (1 self)
How to efficiently train recurrent networks remains a challenging and active research topic. Most of the proposed training approaches are based on computational ways to efficiently obtain the gradient of the error function, and can be generally grouped into five major groups. In this study we present a derivation that unifies these approaches. We demonstrate that the approaches are only five different ways of solving a particular matrix equation. The second goal of this paper is to develop a new algorithm based on the insights gained from the novel formulation. The new algorithm, which is based on approximating the error gradient, has lower computational complexity in computing the weight update than the competing techniques for most typical problems. In addition, it reaches the error minimum in a much smaller number of iterations. A desirable characteristic of recurrent network training algorithms is to be able to update the weights in an on-line fashion. We have also developed an on-line version of the proposed algorithm that is based on updating the error gradient approximation in a recursive manner. Index Terms: backpropagation through time, constrained optimization, gradient approximation, optimal control, real time recurrent learning, recurrent networks.

