Results 1 - 10
of
72
Long Short-term Memory
, 1995
"... "Recurrent backprop" for learning to store information over extended time intervals takes too long. The main reason is insufficient, decaying error back flow. We briefly review Hochreiter's 1991 analysis of this problem. Then we overcome it by introducing a novel, efficient method called "Long Sho ..."
Abstract
-
Cited by 178 (51 self)
- Add to MetaCart
"Recurrent backprop" for learning to store information over extended time intervals takes too long. The main reason is insufficient, decaying error back flow. We briefly review Hochreiter's 1991 analysis of this problem. Then we overcome it by introducing a novel, efficient method called "Long Short Term Memory" (LSTM). LSTM can learn to bridge minimal time lags in excess of 1000 time steps by enforcing constant error flow through internal states of special units. Multiplicative gate units learn to open and close access to constant error flow. LSTM's update
Learning Stochastic Regular Grammars by Means of a State Merging Method
, 1994
"... We propose a new Mgorithm which allows for the identification of any stochastic deterministic regular language as well as the determination of the probabilities of the strings in the language. The algorithm builds the prefix tree acceptor from the sample set and merges systematically equivaJent stat ..."
Abstract
-
Cited by 114 (12 self)
- Add to MetaCart
We propose a new Mgorithm which allows for the identification of any stochastic deterministic regular language as well as the determination of the probabilities of the strings in the language. The algorithm builds the prefix tree acceptor from the sample set and merges systematically equivaJent states. Experimentally, it proves very fast a.ad the time needed grows only linearly with the size of the sample set.
Input/output hmms for sequence processing
- IEEE Transactions on Neural Networks
, 1996
"... We consider problems of sequence processing and propose a solution based on a discrete state model in order to represent past context. Weintroduce a recurrent connectionist architecture having a modular structure that associates a subnetwork to each state. The model has a statistical interpretation ..."
Abstract
-
Cited by 82 (12 self)
- Add to MetaCart
We consider problems of sequence processing and propose a solution based on a discrete state model in order to represent past context. Weintroduce a recurrent connectionist architecture having a modular structure that associates a subnetwork to each state. The model has a statistical interpretation we call Input/Output Hidden Markov Model (IOHMM). It can be trained by the EM or GEM algorithms, considering state trajectories as missing data, which decouples temporal credit assignment and actual parameter estimation. The model presents similarities to hidden Markov models (HMMs), but allows us to map input se-quences to output sequences, using the same processing style as recurrent neural networks. IOHMMs are trained using a more discriminant learning paradigm than HMMs, while potentially taking advantage of the EM algorithm. We demonstrate that IOHMMs are well suited for solving grammatical inference problems on a benchmark problem. Experimental results are presented for the seven Tomita grammars, showing that these adaptive models can attain excellent generalization.
Constructing Deterministic Finite-State Automata in Recurrent Neural Networks
- Journal of the ACM
, 1996
"... Recurrent neural networks that are trained to behave like deterministic finite-state automata (DFAs) can show deteriorating performance when tested on long strings. This deteriorating performance can be attributed to the instability of the internal representation of the learned DFA states. The use o ..."
Abstract
-
Cited by 66 (15 self)
- Add to MetaCart
Recurrent neural networks that are trained to behave like deterministic finite-state automata (DFAs) can show deteriorating performance when tested on long strings. This deteriorating performance can be attributed to the instability of the internal representation of the learned DFA states. The use of a sigmoidal discriminant function together with the recurrent structure contribute to this instability. We prove that a simple algorithm can construct second-order recurrent neural networks with a sparse interconnection topology and sigmoidal discriminant function such that the internal DFA state representations are stable, i.e. the constructed network correctly classifies strings of arbitrary length. The algorithm is based on encoding strengths of weights directly into the neural network. We derive a relationship between the weight strength and the number of DFA states for robust string classification. For a DFA with n states and m input alphabet symbols, the constructive algorithm genera...
Continual Learning In Reinforcement Environments
, 1994
"... Continual learning is the constant development of complex behaviors with no final end in mind. It is the process of learning ever more complicated skills by building on those skills already developed. In order for learning at one stage of development to serve as the foundation for later learning, a ..."
Abstract
-
Cited by 63 (7 self)
- Add to MetaCart
Continual learning is the constant development of complex behaviors with no final end in mind. It is the process of learning ever more complicated skills by building on those skills already developed. In order for learning at one stage of development to serve as the foundation for later learning, a continual-learning agent should learn hierarchically. CHILD, an agent capable of Continual, Hierarchical, Incremental Learning and Development is proposed, described, tested, and evaluated in this dissertation. CHILD accumulates useful behaviors in reinforcement environments by using the Temporal Transition Hierarchies learning algorithm, also derived in the dissertation. This constructive algorithm generates a hierarchical, higher-order neural network that can be used for predicting context-dependent temporal sequences and can learn sequential-task benchmarks more than two orders of magnitude faster than competing neural-network systems. Consequently, CHILD can quickly solve complicated non...
Extraction of Rules from Discrete-time Recurrent Neural Networks
, 1996
"... The extraction of symbolic knowledge from trained neural networks and the direct encoding of (partial) knowledge into networks prior to training are important issues. They allow the exchange of information between symbolic and connectionist knowledge representations. The focas of this paper is on t ..."
Abstract
-
Cited by 57 (15 self)
- Add to MetaCart
The extraction of symbolic knowledge from trained neural networks and the direct encoding of (partial) knowledge into networks prior to training are important issues. They allow the exchange of information between symbolic and connectionist knowledge representations. The focas of this paper is on the quality of the rules that are extracted from recurrent neural networks. Discrete-time recurrent neural networks can be trained to correctly classify strings of a regular language. Rules defining the learned grammar can be extracted from networks in the form of deterministic finite-state automata (DFAs) by applying clustering algorithms in the output space of recurrent state neurons. Our algorithm can extract different finite-state automata that are consistent with a training set from the same network. We compare the generalization performances of these different models and the trained network and we introduce a heuristic that permits us to choose among the consistent DFAs the model which best approximates the learned regular grammar.
Noisy Time Series Prediction using a Recurrent Neural Network and Grammatical Inference
- Machine Learning
, 2001
"... Financial forecasting is an example of a signal processing problem which is challenging due to small sample sizes, high noise, non-stationarity, and non-linearity. Neural networks have been very successful in a number of signal processing applications. We discuss fundamental limitations and inherent ..."
Abstract
-
Cited by 40 (0 self)
- Add to MetaCart
Financial forecasting is an example of a signal processing problem which is challenging due to small sample sizes, high noise, non-stationarity, and non-linearity. Neural networks have been very successful in a number of signal processing applications. We discuss fundamental limitations and inherent difficulties when using neural networks for the processing of high noise, small sample size signals. We introduce a new intelligent signal processing method which addresses the difficulties. The method proposed uses conversion into a symbolic representation with a selforganizing map, and grammatical inference with recurrent neural networks. We apply the method to the prediction of daily foreign exchange rates, addressing difficulties with non-stationarity, overfitting, and unequal a priori class probabilities, and we find significant predictability in comprehensive experiments covering 5 different foreign exchange rates. The method correctly predicts the direction of change for th...
Natural language grammatical inference with recurrent neural networks
- IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
, 1998
"... This paper examines the inductive inference of a complex grammar with neural networks -- specifically, the task considered is that of training a network to classify natural language sentences as grammatical or ungrammatical, thereby exhibiting the same kind of discriminatory power provided by the P ..."
Abstract
-
Cited by 40 (1 self)
- Add to MetaCart
This paper examines the inductive inference of a complex grammar with neural networks -- specifically, the task considered is that of training a network to classify natural language sentences as grammatical or ungrammatical, thereby exhibiting the same kind of discriminatory power provided by the Principles and Parameters linguistic framework, or Government-and-Binding theory. Neural networks are trained, without the division into learned vs. innate components assumed by Chomsky, in an attempt to produce the same judgments as native speakers on sharply grammatical/ungrammatical data. How a recurrent neural network could possess linguistic capability and the properties of various common recurrent neural network architectures are discussed. The problem exhibits training behavior which is often not present with smaller grammars and training was initially difficult. However, after implementing several techniques aimed at improving the convergence of the gradient descent backpropagation-through-time training algorithm, significant learning was possible. It was found that certain architectures are better able to learn an appropriate grammar. The operation of the networks and their training is analyzed. Finally, the extraction of rules in the form of deterministic finite state automata is investigated.
LEARNING DETERMINISTIC REGULAR GRAMMARS FROM STOCHASTIC SAMPLES IN POLYNOMIAL TIME
, 1999
"... In this paper, the identification of stochastic regular languages is addressed. For this purpose, we propose a class of algorithms which allow for the identification of the structure of the minimal stochastic automaton generating the language. It is shown that the time needed grows only linearly wi ..."
Abstract
-
Cited by 38 (12 self)
- Add to MetaCart
In this paper, the identification of stochastic regular languages is addressed. For this purpose, we propose a class of algorithms which allow for the identification of the structure of the minimal stochastic automaton generating the language. It is shown that the time needed grows only linearly with the size of the sample set and a measure of the complexity of the task is provided. Experimentally, our implementation proves very fast for application purposes.
Extracting and Learning an Unknown Grammar with Recurrent Neural Networks
, 1992
"... Simple second-order recurrent networks are shown to readily learn small known regular grammars when trained with positive and negative strings examples. We show that similar methods are appropriate for learning unknown grammars from examples of their strings. The training algorithm is an incrementa ..."
Abstract
-
Cited by 37 (12 self)
- Add to MetaCart
Simple second-order recurrent networks are shown to readily learn small known regular grammars when trained with positive and negative strings examples. We show that similar methods are appropriate for learning unknown grammars from examples of their strings. The training algorithm is an incremental real-time, recurrent learning (RTRL) method that computes the complete gradient and updates the weights at the end of each string. After or during training, a dynamic clustering algorithm extracts the production rules that the neural network has learned. The methods are illustrated by extracting rules from unknown deterministic regular grammars. For many cases the extracted grammar outperforms the neural net from which it was extracted in correctly classifying unseen strings. 1 INTRODUCTION For many reasons, there has been a long interest in "language" models of neural networks; see [Elman 1991] for an excellent discussion. The orientation of this work is somewhat different. The focus her...

