Results 1  10
of
85
Constructing Deterministic FiniteState Automata in Recurrent Neural Networks
 Journal of the ACM
, 1996
"... Recurrent neural networks that are trained to behave like deterministic finitestate automata (DFAs) can show deteriorating performance when tested on long strings. This deteriorating performance can be attributed to the instability of the internal representation of the learned DFA states. The use o ..."
Abstract

Cited by 70 (16 self)
 Add to MetaCart
Recurrent neural networks that are trained to behave like deterministic finitestate automata (DFAs) can show deteriorating performance when tested on long strings. This deteriorating performance can be attributed to the instability of the internal representation of the learned DFA states. The use of a sigmoidal discriminant function together with the recurrent structure contribute to this instability. We prove that a simple algorithm can construct secondorder recurrent neural networks with a sparse interconnection topology and sigmoidal discriminant function such that the internal DFA state representations are stable, i.e. the constructed network correctly classifies strings of arbitrary length. The algorithm is based on encoding strengths of weights directly into the neural network. We derive a relationship between the weight strength and the number of DFA states for robust string classification. For a DFA with n states and m input alphabet symbols, the constructive algorithm genera...
Dynamical Recognizers: Realtime Language Recognition by Analog Computers
 Theoretical Computer Science
, 1996
"... We consider a model of analog computation which can recognize various languages in real time. We encode an input word as a point in R d by composing iterated maps, and then apply inequalities to the resulting point to test for membership in the language. Each class of maps and inequalities, suc ..."
Abstract

Cited by 57 (4 self)
 Add to MetaCart
We consider a model of analog computation which can recognize various languages in real time. We encode an input word as a point in R d by composing iterated maps, and then apply inequalities to the resulting point to test for membership in the language. Each class of maps and inequalities, such as quadratic functions with rational coefficients, is capable of recognizing a particular class of languages; for instance, linear and quadratic maps can have both stacklike and queuelike memories. We use methods equivalent to the VapnikChervonenkis dimension to separate some of our classes from each other, e.g. linear maps are less powerful than quadratic or piecewiselinear ones, polynomials are less powerful than elementary (trigonometric and exponential) maps, and deterministic polynomials of each degree are less powerful than their nondeterministic counterparts. Comparing these dynamical classes with various discrete language classes helps illuminate how iterated maps can...
LSTM Recurrent Networks Learn Simple Context Free and Context Sensitive Languages
 IEEE Transactions on Neural Networks
, 2001
"... Previous work on learning regular languages from exemplary training sequences showed that Long Short Term Memory (LSTM) outperforms traditional recurrent neural networks (RNNs). Here we demonstrate LSTM's superior performance on context free language (CFL) benchmarks for recurrent neural networks ..."
Abstract

Cited by 57 (21 self)
 Add to MetaCart
Previous work on learning regular languages from exemplary training sequences showed that Long Short Term Memory (LSTM) outperforms traditional recurrent neural networks (RNNs). Here we demonstrate LSTM's superior performance on context free language (CFL) benchmarks for recurrent neural networks (RNNs), and show that it works even better than previous hardwired or highly specialized architectures.
Approximating the Semantics of Logic Programs by Recurrent Neural Networks
"... In [18] we have shown how to construct a 3layered recurrent neural network that computes the fixed point of the meaning function TP of a given propositional logic program P, which corresponds to the computation of the semantics of P. In this article we consider the first order case. We define a no ..."
Abstract

Cited by 54 (9 self)
 Add to MetaCart
In [18] we have shown how to construct a 3layered recurrent neural network that computes the fixed point of the meaning function TP of a given propositional logic program P, which corresponds to the computation of the semantics of P. In this article we consider the first order case. We define a notion of approximation for interpretations and prove that there exists a 3layered feed forward neural network that approximates the calculation of TP for a given first order acyclic logic program P with an injective level mapping arbitrarily well. Extending the feed forward network by recurrent connections we obtain a recurrent neural network whose iteration approximates the fixed point of TP. This result is proven by taking advantage of the fact that for acyclic logic programs the function TP is a contraction mapping on a complete metric space defined by the interpretations of the program. Mapping this space to the metric space IR with Euclidean distance, a real valued function fP can be defined which corresponds to TP and is continuous as well as a contraction. Consequently it can be approximated by an appropriately chosen class of feed forward neural networks.
Natural language grammatical inference with recurrent neural networks
 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
, 1998
"... This paper examines the inductive inference of a complex grammar with neural networks  specifically, the task considered is that of training a network to classify natural language sentences as grammatical or ungrammatical, thereby exhibiting the same kind of discriminatory power provided by the P ..."
Abstract

Cited by 45 (1 self)
 Add to MetaCart
This paper examines the inductive inference of a complex grammar with neural networks  specifically, the task considered is that of training a network to classify natural language sentences as grammatical or ungrammatical, thereby exhibiting the same kind of discriminatory power provided by the Principles and Parameters linguistic framework, or GovernmentandBinding theory. Neural networks are trained, without the division into learned vs. innate components assumed by Chomsky, in an attempt to produce the same judgments as native speakers on sharply grammatical/ungrammatical data. How a recurrent neural network could possess linguistic capability and the properties of various common recurrent neural network architectures are discussed. The problem exhibits training behavior which is often not present with smaller grammars and training was initially difficult. However, after implementing several techniques aimed at improving the convergence of the gradient descent backpropagationthroughtime training algorithm, significant learning was possible. It was found that certain architectures are better able to learn an appropriate grammar. The operation of the networks and their training is analyzed. Finally, the extraction of rules in the form of deterministic finite state automata is investigated.
Analysis of Dynamical Recognizers
 NEURAL COMPUTATION
, 1996
"... Pollack (1991) demonstrated that secondorder recurrent neural networks can act as dynamical recognizers for formal languages when trained on positive and negative examples, and observed both phase transitions in learning and IFSlike fractal state sets. Followon work focused mainly on the extra ..."
Abstract

Cited by 33 (5 self)
 Add to MetaCart
Pollack (1991) demonstrated that secondorder recurrent neural networks can act as dynamical recognizers for formal languages when trained on positive and negative examples, and observed both phase transitions in learning and IFSlike fractal state sets. Followon work focused mainly on the extraction and minimization of a finite state automaton (FSA) from the trained network. However, such networks are capable of inducing languages which are not regular, and therefore not equivalenttoany FSA. Indeed, it may be simpler for a small network to fit its training data by inducing such a nonregular language. But when is the network's language not regular? In this paper, using a low dimensional network capable of learning all the Tomita data sets, we present an empirical method for testing whether the language induced by the network is regular or not. We also provide a detailed "machine analysis of trained networks for both regular and nonregular languages.
A Survey of ContinuousTime Computation Theory
 Advances in Algorithms, Languages, and Complexity
, 1997
"... Motivated partly by the resurgence of neural computation research, and partly by advances in device technology, there has been a recent increase of interest in analog, continuoustime computation. However, while specialcase algorithms and devices are being developed, relatively little work exists o ..."
Abstract

Cited by 29 (6 self)
 Add to MetaCart
Motivated partly by the resurgence of neural computation research, and partly by advances in device technology, there has been a recent increase of interest in analog, continuoustime computation. However, while specialcase algorithms and devices are being developed, relatively little work exists on the general theory of continuoustime models of computation. In this paper, we survey the existing models and results in this area, and point to some of the open research questions. 1 Introduction After a long period of oblivion, interest in analog computation is again on the rise. The immediate cause for this new wave of activity is surely the success of the neural networks "revolution", which has provided hardware designers with several new numerically based, computationally interesting models that are structurally sufficiently simple to be implemented directly in silicon. (For designs and actual implementations of neural models in VLSI, see e.g. [30, 45]). However, the more fundamental...
Architectural Bias in Recurrent Neural Networks  Fractal Analysis
 IEEE Transactions on Neural Networks
, 1931
"... We have recently shown that when initialized with "small" weights, recurrent neural networks (RNNs) with standard sigmoidtype activation functions are inherently biased towards Markov models, i.e. even prior to any training, RNN dynamics can be readily used to extract finite memory machines (Hammer ..."
Abstract

Cited by 28 (7 self)
 Add to MetaCart
We have recently shown that when initialized with "small" weights, recurrent neural networks (RNNs) with standard sigmoidtype activation functions are inherently biased towards Markov models, i.e. even prior to any training, RNN dynamics can be readily used to extract finite memory machines (Hammer & Tino, 2002; Tino, Cernansky & Benuskova, 2002; Tino, Cernansky & Benuskova, 2002a). Following Christiansen and Chater (1999), we refer to this phenomenon as the architectural bias of RNNs. In this paper we further extend our work on the architectural bias in RNNs by performing a rigorous fractal analysis of recurrent activation patterns. We assume the network is driven by sequences obtained by traversing an underlying finitestate transition diagram  a scenario that has been frequently considered in the past e.g. when studying RNNbased learning and implementation of regular grammars and finitestate transducers. We obtain lower and upper bounds on various types of fractal dimensions, such as boxcounting and Hausdor# dimensions. It turns out that not only can the recurrent activations inside RNNs with small initial weights be explored to build Markovian predictive models, but also the activations form fractal clusters the dimension of which can be bounded by the scaled entropy of the underlying driving source. The scaling factors are fixed and are given by the RNN parameters.
ContextFree and ContextSensitive Dynamics in Recurrent Neural Networks
, 2000
"... Continuousvalued recurrent neural networks can learn mechanisms for processing contextfree languages. The dynamics of such networks is usually based on damped oscillation around fixed points in state space and requires that the dynamical components are arranged in certain ways. It is shown tha ..."
Abstract

Cited by 28 (6 self)
 Add to MetaCart
Continuousvalued recurrent neural networks can learn mechanisms for processing contextfree languages. The dynamics of such networks is usually based on damped oscillation around fixed points in state space and requires that the dynamical components are arranged in certain ways. It is shown that qualitatively similar dynamics with similar constraints hold for a n b n c n , a contextsensitive language. The additional difficulty with a n b n c n , compared with the contextfree language a n b n , consists of "counting up" and "counting down" letters simultaneously. The network solution is to oscillate in two principal dimensions, one for counting up and one for counting down. This study focuses on the dynamics employed by the Sequential Cascaded Network, in contrast with the Simple Recurrent Network, and the use of Backpropagation Through Time. Found solutions generalize well beyond training data, however, learning is not reliable. The contribution of this ...