Results 1-10 of 39
Input/Output HMMs for Sequence Processing
IEEE Transactions on Neural Networks, 1996
Cited by 98 (12 self)
We consider problems of sequence processing and propose a solution based on a discrete state model in order to represent past context. We introduce a recurrent connectionist architecture having a modular structure that associates a subnetwork to each state. The model has a statistical interpretation we call Input/Output Hidden Markov Model (IOHMM). It can be trained by the EM or GEM algorithms, considering state trajectories as missing data, which decouples temporal credit assignment and actual parameter estimation. The model presents similarities to hidden Markov models (HMMs), but allows us to map input sequences to output sequences, using the same processing style as recurrent neural networks. IOHMMs are trained using a more discriminant learning paradigm than HMMs, while potentially taking advantage of the EM algorithm. We demonstrate that IOHMMs are well suited for solving grammatical inference problems on a benchmark problem. Experimental results are presented for the seven Tomita grammars, showing that these adaptive models can attain excellent generalization.
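As a rough illustration of how an IOHMM differs from a standard HMM, the sketch below computes the likelihood of an output sequence given an input sequence, using a forward recursion in which both the transition and the emission probabilities depend on the current input. It is a minimal sketch and not the authors' implementation: the function name `iohmm_likelihood`, the lookup-table parameterization (the paper associates a subnetwork with each state instead), and the treatment of the initial state distribution are assumptions made for illustration.

```python
def iohmm_likelihood(inputs, outputs, init, trans, emit):
    """Likelihood of `outputs` given `inputs` (both non-empty, equal length).

    Hypothetical table layout, assumed for this sketch:
      init[j]        = P(first state is j)
      trans[u][i][j] = P(state j | previous state i, input u)
      emit[u][j][y]  = P(output y | state j, input u)
    """
    n = len(init)
    alpha = None
    for t, (u, y) in enumerate(zip(inputs, outputs)):
        if t == 0:
            predicted = list(init)
        else:
            # Input-conditioned transition: this is what a plain HMM lacks.
            predicted = [
                sum(alpha[i] * trans[u][i][j] for i in range(n))
                for j in range(n)
            ]
        # Weight each state by how well it explains the observed output.
        alpha = [predicted[j] * emit[u][j][y] for j in range(n)]
    return sum(alpha)


# Toy model: input 0 keeps the state, input 1 swaps it;
# state 0 always emits 0 and state 1 always emits 1.
init = [1.0, 0.0]
trans = {0: [[1.0, 0.0], [0.0, 1.0]], 1: [[0.0, 1.0], [1.0, 0.0]]}
emit = {0: [[1.0, 0.0], [0.0, 1.0]], 1: [[1.0, 0.0], [0.0, 1.0]]}

consistent = iohmm_likelihood([0, 1, 1], [0, 1, 0], init, trans, emit)
inconsistent = iohmm_likelihood([0, 1, 1], [0, 0, 0], init, trans, emit)
```

In this toy model the first output sequence is the only one compatible with the inputs, so it receives all of the probability mass while the second receives none. Training by EM would additionally require a backward pass and parameter re-estimation, which are omitted here.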
Learning to Forget: Continual Prediction with LSTM
NEURAL COMPUTATION, 1999
Cited by 51 (25 self)
Long Short-Term Memory (LSTM, Hochreiter & Schmidhuber, 1997) can solve numerous tasks not solvable by previous learning algorithms for recurrent neural networks (RNNs). We identify a weakness of LSTM networks processing continual input streams that are not a priori segmented into subsequences with explicitly marked ends at which the network's internal state could be reset. Without resets, the state may grow indefinitely and eventually cause the network to break down. Our remedy is a novel, adaptive "forget gate" that enables an LSTM cell to learn to reset itself at appropriate times, thus releasing internal resources. We review illustrative benchmark problems on which standard LSTM outperforms other RNN algorithms. All algorithms (including LSTM) fail to solve continual versions of these problems. LSTM with forget gates, however, easily solves them in an elegant way.
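The effect described in this abstract can be sketched with a single scalar memory cell. The code below is an illustrative simplification, not the paper's network: the names `cell_state_step` and `run_stream` are hypothetical, and the gate pre-activations are fixed constants here, whereas in an actual LSTM they are learned functions of the input and the recurrent state.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def cell_state_step(c_prev, x, input_pre, forget_pre=None):
    """One update of a scalar cell state: c = f * c_prev + i * tanh(x).

    forget_pre=None models a cell without a forget gate (f fixed at 1).
    """
    f = 1.0 if forget_pre is None else sigmoid(forget_pre)
    i = sigmoid(input_pre)
    return f * c_prev + i * math.tanh(x)


def run_stream(stream, forget_pre=None):
    """Feed an unsegmented stream through the cell with no external resets."""
    c = 0.0
    for x in stream:
        c = cell_state_step(c, x, input_pre=2.0, forget_pre=forget_pre)
    return c


stream = [1.0] * 1000
state_without_forget = run_stream(stream)               # grows with stream length
state_with_forget = run_stream(stream, forget_pre=0.0)  # f = 0.5, stays bounded
```

With the forget gate fixed at 1 the state accumulates every input and grows linearly with the length of the stream, whereas any gate value below 1 makes the state decay geometrically and remain bounded.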
Natural language grammatical inference with recurrent neural networks
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 1998
Cited by 45 (1 self)
This paper examines the inductive inference of a complex grammar with neural networks; specifically, the task considered is that of training a network to classify natural language sentences as grammatical or ungrammatical, thereby exhibiting the same kind of discriminatory power provided by the Principles and Parameters linguistic framework, or Government-and-Binding theory. Neural networks are trained, without the division into learned vs. innate components assumed by Chomsky, in an attempt to produce the same judgments as native speakers on sharply grammatical/ungrammatical data. How a recurrent neural network could possess linguistic capability and the properties of various common recurrent neural network architectures are discussed. The problem exhibits training behavior which is often not present with smaller grammars, and training was initially difficult. However, after implementing several techniques aimed at improving the convergence of the gradient descent backpropagation-through-time training algorithm, significant learning was possible. It was found that certain architectures are better able to learn an appropriate grammar. The operation of the networks and their training is analyzed. Finally, the extraction of rules in the form of deterministic finite state automata is investigated.
New Results on Recurrent Network Training: Unifying the Algorithms and Accelerating Convergence
IEEE TRANS. NEURAL NETWORKS, 2000
Cited by 33 (1 self)
How to efficiently train recurrent networks remains a challenging and active research topic. Most of the proposed training approaches are based on computational ways to efficiently obtain the gradient of the error function, and can be generally grouped into five major groups. In this study we present a derivation that unifies these approaches. We demonstrate that the approaches are only five different ways of solving a particular matrix equation. The second goal of this paper is to develop a new algorithm based on the insights gained from the novel formulation. The new algorithm, which is based on approximating the error gradient, has lower computational complexity in computing the weight update than the competing techniques for most typical problems. In addition, it reaches the error minimum in a much smaller number of iterations. A desirable characteristic of recurrent network training algorithms is to be able to update the weights in an online fashion. We have also developed an online version of the proposed algorithm, which is based on updating the error gradient approximation in a recursive manner.
Dynamical Recurrent Neural Networks - Towards Environmental Time Series Prediction
1995
Cited by 26 (8 self)
Dynamical Recurrent Neural Networks (DRNN) (Aussem 1994) are a class of fully recurrent networks obtained by modeling synapses as autoregressive filters. By virtue of their internal dynamics, these networks approximate the underlying law governing the time series by a system of nonlinear difference equations of internal variables. They therefore provide history-sensitive forecasts without having to be explicitly fed with external memory. The model is trained by a local and recursive error propagation algorithm called temporal-recurrent-backpropagation. The efficiency of the procedure benefits from the exponential decay of the gradient terms backpropagated through the adjoint network. We assess the predictive ability of the DRNN model with meteorological and astronomical time series recorded around the candidate observation sites for the future VLT telescope. The hope is that reliable environmental forecasts provided with the model will allow the modern telescopes to be preset, a few hou...
Recurrent SOM with Local Linear Models in Time Series Prediction
In 6th European Symposium on Artificial Neural Networks, 1998
Cited by 23 (0 self)
The Recurrent Self-Organizing Map (RSOM) is studied in three different time series prediction cases. RSOM is used to cluster the series into local data sets, for which corresponding local linear models are estimated. RSOM includes a recurrent difference vector in each unit, which allows storing context from the past input vectors. A multilayer perceptron (MLP) network and an autoregressive (AR) model are used for comparison of the prediction results. In the studied cases RSOM shows promising results.
Learning a Class of Large Finite State Machines with a Recurrent Neural Network
1995
Cited by 20 (11 self)
One of the issues in any learning model is how it scales with problem size. The problem of learning finite state machines (FSMs) from examples with recurrent neural networks has been extensively explored. However, these results are somewhat disappointing in the sense that the machines that can be learned are too small to be competitive with existing grammatical inference algorithms. We show that a type of recurrent neural network (Narendra & Parthasarathy, 1990, IEEE Trans. Neural Networks, 1, 427) which has feedback but no hidden state neurons can learn a special type of FSM called a finite memory machine (FMM) under certain constraints. These machines have a large number of states (simulations are for 256 and 512 state FMMs) but have minimal order, relatively small depth and little logic when the FMM is implemented as a sequential machine,
Context Learning with the Self-Organizing Map
1997
Cited by 18 (6 self)
In this paper a Recurrent Self-Organizing Map (RSOM) algorithm is proposed for temporal sequence processing. The RSOM algorithm is close in nature to Kohonen's Self-Organizing Map, except that in the RSOM the context of the temporal sequence is involved both in finding the best matching unit and in the adaptation of the weight vectors of the map, via a recursive difference equation associated with each unit of the map. The experimental results in the paper demonstrate that the RSOM is able to learn and distinguish temporal sequences, and that the RSOM algorithm can be utilized, for instance, in electroencephalogram (EEG) based epileptic activity detection.
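The abstract does not state the recursive difference equation itself. The sketch below assumes the leaky-integrator form commonly given for the RSOM, y_i(n) = (1 - alpha) * y_i(n-1) + alpha * (x(n) - w_i), with the best matching unit chosen as the unit whose difference vector has the smallest norm. The function name `rsom_step`, the parameter names, and the omission of a neighbourhood function (only the winning unit is adapted) are simplifications made for illustration.

```python
def rsom_step(weights, diffs, x, alpha=0.5, learning_rate=0.1):
    """One RSOM-style update for input vector x.

    weights: list of weight vectors, one per map unit (updated in place)
    diffs:   list of recurrent difference vectors, one per map unit
    Returns (index of best matching unit, new difference vectors).
    """
    # Leaky recursive difference: blends past context with the current error.
    new_diffs = [
        [(1 - alpha) * yk + alpha * (xk - wk) for yk, xk, wk in zip(y, x, w)]
        for w, y in zip(weights, diffs)
    ]
    # Best matching unit: smallest squared norm of the difference vector.
    norms = [sum(v * v for v in y) for y in new_diffs]
    bmu = min(range(len(weights)), key=norms.__getitem__)
    # Move the winner along its difference vector (neighbourhood omitted).
    weights[bmu] = [
        wk + learning_rate * yk for wk, yk in zip(weights[bmu], new_diffs[bmu])
    ]
    return bmu, new_diffs


weights = [[0.0], [1.0]]
diffs = [[0.0], [0.0]]
bmu, diffs = rsom_step(weights, diffs, [0.9])
```

Because the difference vectors are carried from one call to the next, both the winner selection and the weight update depend on earlier inputs as well as the current one; with `alpha` set to 1 the past is discarded and the winner selection reduces to that of an ordinary SOM.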
Time Series Prediction Using Recurrent SOM with Local Linear Models
INTERNATIONAL JOURNAL OF KNOWLEDGE-BASED INTELLIGENT ENGINEERING SYSTEMS, 1997
Cited by 18 (1 self)
A newly proposed Recurrent Self-Organizing Map (RSOM) is studied in time series prediction. In this approach RSOM is used to cluster the data into local data sets, and local linear models corresponding to each of the map units are then estimated based on the local data sets. A traditional way of clustering the data is to use a windowing technique to split it into input vectors of a certain length. In this procedure, the temporal context between the consecutive vectors is lost. In RSOM the map units keep track of the past input vectors with a recurrent difference vector in each unit. The recurrent structure allows the map to store information concerning the change in the magnitude and direction of the input vector. RSOM can thus be used to cluster the temporal context in the time series. This allows a different local model to be selected based on the context and the current input vector of the model. The studied cases show promising results.
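The windowing technique mentioned in this abstract can be sketched as follows. The helper name `sliding_windows` and the default stride of 1 (overlapping windows) are assumptions; the abstract only says that the series is split into input vectors of a certain length.

```python
def sliding_windows(series, length, stride=1):
    """Split a series into fixed-length input vectors."""
    if length <= 0 or stride <= 0:
        raise ValueError("length and stride must be positive")
    last_start = len(series) - length
    return [
        list(series[start:start + length])
        for start in range(0, last_start + 1, stride)
    ]


windows = sliding_windows([1, 2, 3, 4, 5], length=3)
```

A clustering method that treats each resulting vector as an independent sample has no record of which window preceded which, and that ordering is the temporal context the abstract describes as lost.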
On-Line Learning Algorithms for Locally Recurrent Neural Networks
IEEE TRANSACTIONS ON NEURAL NETWORKS, 1999
Cited by 17 (5 self)
This paper focuses on online learning procedures for locally recurrent neural networks, with emphasis on the multilayer perceptron (MLP) with infinite impulse response (IIR) synapses and its variations, which include generalized output and activation feedback multilayer networks (MLN's). We propose a new gradient-based procedure called recursive backpropagation (RBP) whose online version, causal recursive backpropagation (CRBP), presents some advantages with respect to the other online training methods. The new CRBP algorithm includes as particular cases backpropagation (BP), temporal backpropagation (TBP), backpropagation for sequences (BPS), and the Back-Tsoi algorithm, among others, thereby providing a unifying view on gradient calculation techniques for recurrent networks with local feedback. The only learning method that has been proposed for locally recurrent networks with no architectural restriction is the one by Back and Tsoi. The proposed algorithm has better stability and higher speed ...