Results 1–10 of 12
Learning Long-Term Dependencies with Gradient Descent is Difficult
IEEE Transactions on Neural Networks (special issue on recurrent networks), to appear
"... Recurrent neural networks can be used to map input sequences to output sequences, such as for recognition, production or prediction problems. However, practical difficulties have been reported in training recurrent neural networks to perform tasks in which the temporal contingencies present in th ..."
Cited by 255 (24 self)
Abstract:
Recurrent neural networks can be used to map input sequences to output sequences, such as for recognition, production or prediction problems. However, practical difficulties have been reported in training recurrent neural networks to perform tasks in which the temporal contingencies present in the input/output sequences span long intervals. We show why gradient-based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases. These results expose a trade-off between efficient learning by gradient descent and latching on to information for long periods. Based on an understanding of this problem, alternatives to standard gradient descent are considered.
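The effect this abstract describes can be illustrated numerically. A minimal sketch, assuming a scalar linear recurrence h_t = w * h_{t-1}; the weight 0.9 and the horizons below are arbitrary illustrative choices, not values from the paper:

```python
# Minimal illustration of the vanishing-gradient effect: for the scalar
# linear recurrence h_t = w * h_{t-1}, the sensitivity of the final state
# to the initial state is d h_T / d h_0 = w ** T, which shrinks
# geometrically whenever |w| < 1.
w = 0.9
for T in (1, 10, 100):
    sensitivity = w ** T
    print(f"T={T:3d}  d h_T / d h_0 = {sensitivity:.2e}")
```

When |w| > 1 the same quantity explodes instead; keeping |w| near 1 pits trainability against the network's ability to latch information, which is the trade-off the paper analyzes.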
An Application of Recurrent Nets to Phone Probability Estimation
IEEE Transactions on Neural Networks, 1994
"... This paper presents an application of recurrent networks for phone probability estimation in large vocabulary speech recognition. The need for efficient exploitation of context information is discussed ..."
Cited by 193 (8 self)
Abstract:
This paper presents an application of recurrent networks for phone probability estimation in large vocabulary speech recognition. The need for efficient exploitation of context information is discussed.
Gradient calculation for dynamic recurrent neural networks: a survey
IEEE Transactions on Neural Networks, 1995
"... Abstract  We survey learning algorithms for recurrent neural networks with hidden units, and put the various techniques into a common framework. We discuss xedpoint learning algorithms, namely recurrent backpropagation and deterministic Boltzmann Machines, and non xedpoint algorithms, namely backp ..."
Cited by 136 (3 self)
Abstract:
We survey learning algorithms for recurrent neural networks with hidden units, and put the various techniques into a common framework. We discuss fixed-point learning algorithms, namely recurrent backpropagation and deterministic Boltzmann Machines, and non-fixed-point algorithms, namely backpropagation through time, Elman's history cutoff, and Jordan's output feedback architecture. Forward propagation, an online technique that uses adjoint equations, and variations thereof, are also discussed. In many cases, the unified presentation leads to generalizations of various sorts. We discuss advantages and disadvantages of temporally continuous neural networks in contrast to clocked ones, continue with some “tricks of the trade” for training, using, and simulating continuous time and recurrent neural networks. We present some simulations, and at the end, address issues of computational complexity and learning speed.
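Of the algorithms this survey covers, backpropagation through time is the easiest to sketch. A hedged illustration for a scalar linear recurrence h_t = w * h_{t-1} + x_t with squared loss on the final state; the recurrence and loss are toy assumptions for illustration, not the survey's general formulation:

```python
def bptt_grad(w, xs, y):
    """dL/dw for h_t = w*h_{t-1} + x_t with L = 0.5*(h_T - y)**2, via BPTT."""
    hs = [0.0]
    for x in xs:                      # forward pass: store all activations
        hs.append(w * hs[-1] + x)
    delta = hs[-1] - y                # dL/dh_T
    grad = 0.0
    for t in range(len(xs), 0, -1):   # backward pass through time
        grad += delta * hs[t - 1]     # direct contribution of w at step t
        delta *= w                    # propagate error back to h_{t-1}
    return grad
```

A quick sanity check is to compare the result against a centered finite difference of the loss; the two agree to numerical precision.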
Gradient-Based Learning Algorithms for Recurrent Networks and Their Computational Complexity
1995
"... Introduction 1.1 Learning in Recurrent Networks Connectionist networks having feedback connections are interesting for a number of reasons. Biological neural networks are highly recurrently connected, and many authors have studied recurrent network models of various types of perceptual and memory pr ..."
Cited by 117 (4 self)
Abstract:
Connectionist networks having feedback connections are interesting for a number of reasons. Biological neural networks are highly recurrently connected, and many authors have studied recurrent network models of various types of perceptual and memory processes. The general property making such networks interesting and potentially useful is that they manifest highly nonlinear dynamical behavior. One such type of dynamical behavior that has received much attention is that of settling to a fixed stable state, but probably of greater importance both biologically and from an engineering viewpoint are time-varying behaviors. Here we consider algorithms for training recurrent networks to perform temporal supervised learning tasks, in which the specification of desired behavior is in the form of specific examples of input and desired output trajectories. One example of such a task is sequence classification, where ...
Input/output HMMs for sequence processing
IEEE Transactions on Neural Networks, 1996
"... We consider problems of sequence processing and propose a solution based on a discrete state model in order to represent past context. Weintroduce a recurrent connectionist architecture having a modular structure that associates a subnetwork to each state. The model has a statistical interpretation ..."
Cited by 98 (12 self)
Abstract:
We consider problems of sequence processing and propose a solution based on a discrete state model in order to represent past context. We introduce a recurrent connectionist architecture having a modular structure that associates a subnetwork to each state. The model has a statistical interpretation that we call Input/Output Hidden Markov Model (IOHMM). It can be trained by the EM or GEM algorithms, considering state trajectories as missing data, which decouples temporal credit assignment and actual parameter estimation. The model presents similarities to hidden Markov models (HMMs), but allows us to map input sequences to output sequences, using the same processing style as recurrent neural networks. IOHMMs are trained using a more discriminant learning paradigm than HMMs, while potentially taking advantage of the EM algorithm. We demonstrate that IOHMMs are well suited for solving grammatical inference problems on a benchmark problem. Experimental results are presented for the seven Tomita grammars, showing that these adaptive models can attain excellent generalization.
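The state recurrence underlying the IOHMM can be sketched as a forward pass in which the transition matrix is a function of the current input. The two-state transition function below is a toy stand-in for the paper's per-state subnetworks, not its actual parameterization:

```python
import numpy as np

def forward_state_posterior(inputs, init, trans_fn):
    """Propagate the discrete-state distribution along an input sequence.

    trans_fn(x) returns a row-stochastic transition matrix conditioned on
    input x -- a stand-in for the IOHMM's input-conditioned subnetworks.
    """
    alpha = np.asarray(init, dtype=float)
    for x in inputs:
        alpha = trans_fn(x).T @ alpha   # input-conditioned transition
        alpha /= alpha.sum()            # keep it a probability distribution
    return alpha

# Toy two-state example: the input x in [0, 1] biases the self-transitions.
def trans_fn(x):
    return np.array([[1 - 0.5 * x, 0.5 * x],
                     [0.5 * x, 1 - 0.5 * x]])

posterior = forward_state_posterior([0.2, 0.9, 0.4], [1.0, 0.0], trans_fn)
```

In the full model, an emission subnetwork attached to each state would produce the output, and EM would treat the state trajectory as missing data, as the abstract describes.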
Dynamic Recurrent Neural Networks
1990
"... We survey learning algorithms for recurrent neural networks with hidden units and attempt to put the various techniques into a common framework. We discuss fixpoint learning algorithms, namely recurrent backpropagation and deterministic Boltzmann Machines, and nonfixpoint algorithms, namely backpro ..."
Cited by 28 (3 self)
Abstract:
We survey learning algorithms for recurrent neural networks with hidden units and attempt to put the various techniques into a common framework. We discuss fixpoint learning algorithms, namely recurrent backpropagation and deterministic Boltzmann Machines, and non-fixpoint algorithms, namely backpropagation through time, Elman's history cutoff nets, and Jordan's output feedback architecture. Forward propagation, an online technique that uses adjoint equations, is also discussed. In many cases, the unified presentation leads to generalizations of various sorts. Some simulations are presented, and at the end, issues of computational complexity are addressed. This research was sponsored in part by The Defense Advanced Research Projects Agency, Information Science and Technology Office, under the title "Research on Parallel Computing", ARPA Order No. 7330, issued by DARPA/CMO under Contract MDA972-90-C-0035 and in part by the National Science Foundation under grant number EET-8716324 and i...
An EM Approach to Learning Sequential Behavior
1994
"... We consider problems of sequence processing and we propose a solution based on a discrete state model. We introduce a recurrent architecture having a modular structure that allocates subnetworks to discrete states. Different subnetworks are model the dynamics (state transition) and the output of the ..."
Cited by 11 (5 self)
Abstract:
We consider problems of sequence processing and we propose a solution based on a discrete state model. We introduce a recurrent architecture having a modular structure that allocates subnetworks to discrete states. Different subnetworks model the dynamics (state transitions) and the output of the model, conditional on the previous state and an external input. The model has a statistical interpretation and can be trained by the EM or GEM algorithms, considering state trajectories as missing data. This allows us to decouple temporal credit assignment and actual parameter estimation. The model presents similarities to hidden Markov models, but allows us to map input sequences to output sequences, using the same processing style as recurrent networks. For this reason we call it Input/Output HMM (IOHMM). Another remarkable difference is that IOHMMs are trained using a supervised learning paradigm (while potentially taking advantage of the EM algorithm), whereas standard HMMs are trained by an...
Digital systems for neural networks
Digital Signal Processing Technology, volume CR57 of Critical Reviews Series, pages 31445, SPIE Optical Engineering, 1995
"... Neural networks are nonlinear static or dynamical systems that learn to solve problems from examples. Those learning algorithms that require a lot of computing power could benefit from fast dedicated hardware. This paper presents an overview of digital systems to implement neural networks. We consi ..."
Cited by 5 (2 self)
Abstract:
Neural networks are nonlinear static or dynamical systems that learn to solve problems from examples. Those learning algorithms that require a lot of computing power could benefit from fast dedicated hardware. This paper presents an overview of digital systems to implement neural networks. We consider three options for implementing neural networks in digital systems: serial computers, parallel systems with standard digital components, and parallel systems with special-purpose digital devices. We describe many examples under each option, with an emphasis on commercially available systems. We discuss the trend toward more general architectures, mention a few hybrid and analog systems that can complement digital systems, and try to answer questions that came to our minds as prospective users of these systems. We conclude that support software, and in general system integration, is beginning to reach the level of versatility that many researchers will require. The next step appears ...
Some Observations on the Use of the Extended Kalman Filter as a Recurrent Network Learning Algorithm
1992
"... The extended Kalman filter (EKF) can be used as an online algorithm to determine the weights in a recurrent network given target outputs as it runs. This involves forming an augmented network state vector consisting of all unit activities and weights. This report notes some relationships between th ..."
Cited by 4 (1 self)
Abstract:
The extended Kalman filter (EKF) can be used as an online algorithm to determine the weights in a recurrent network given target outputs as it runs. This involves forming an augmented network state vector consisting of all unit activities and weights. This report notes some relationships between the EKF as applied to recurrent net learning and some simpler techniques that are more widely used. In particular, it is shown that making certain simplifications to the EKF gives rise to an algorithm essentially identical to the real-time recurrent learning (RTRL) algorithm. That is, the resulting algorithm both maintains the RTRL data structure and prescribes identical weight changes. In addition, because the EKF also involves adjusting unit activity in the network, it provides a principled generalization of the useful "teacher forcing" technique. Very preliminary experiments on simple finite-state Boolean tasks indicate that the EKF works well for these, generally giving substantial speedup...
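The weight part of the augmented-state update can be sketched in a few lines. A hedged sketch, assuming a scalar output and treating H as the output Jacobian with respect to the weights (the quantity RTRL maintains); the dimensions and values below are toy choices, not the report's experimental setup:

```python
import numpy as np

def ekf_weight_step(w, P, H, err, R=1.0):
    """One EKF update of a weight vector w given a scalar output error.

    H is the 1 x n Jacobian of the output w.r.t. w; P is the weight
    covariance. Note that fixing P to a scaled identity and skipping the
    covariance update collapses this to a plain gradient step, which is
    the EKF/RTRL relationship the report describes.
    """
    S = float(H @ P @ H.T) + R       # innovation variance
    K = (P @ H.T) / S                # Kalman gain, shape (n, 1)
    w = w + K.ravel() * err          # weight correction
    P = P - np.outer(K, H @ P)       # covariance update
    return w, P

# Toy use: one update step shrinks the output error toward the target.
w, P = np.zeros(2), np.eye(2)
H = np.array([[1.0, 2.0]])
err = 1.0 - float(H @ w)             # target 1.0, current prediction 0.0
w, P = ekf_weight_step(w, P, H, err)
```

The full algorithm in the report also includes unit activities in the augmented state, which is what yields the generalization of teacher forcing.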
Temporally Continuous vs. Clocked Networks
In Neural Networks in Robotics, 1992
"... We discuss advantages and disadvantages of temporally continuous neural networks in contrast to clocked ones, and continue with some “tricks of the trade ” of continuous time and recurrent neural networks. ..."
Cited by 3 (2 self)
Abstract:
We discuss advantages and disadvantages of temporally continuous neural networks in contrast to clocked ones, and continue with some “tricks of the trade” of continuous-time and recurrent neural networks.