Results 1-10 of 29
LSTM Recurrent Networks Learn Simple Context Free and Context Sensitive Languages
IEEE Transactions on Neural Networks, 2001
Abstract

Cited by 57 (21 self)
Previous work on learning regular languages from exemplary training sequences showed that Long Short-Term Memory (LSTM) outperforms traditional recurrent neural networks (RNNs). Here we demonstrate LSTM's superior performance on context-free language (CFL) benchmarks for RNNs, and show that it works even better than previous hardwired or highly specialized architectures.
Architectural Bias in Recurrent Neural Networks: Fractal Analysis
IEEE Transactions on Neural Networks, 2003
Abstract

Cited by 28 (7 self)
We have recently shown that when initialized with "small" weights, recurrent neural networks (RNNs) with standard sigmoid-type activation functions are inherently biased towards Markov models, i.e., even prior to any training, RNN dynamics can be readily used to extract finite memory machines (Hammer & Tino, 2002; Tino, Cernansky & Benuskova, 2002; Tino, Cernansky & Benuskova, 2002a). Following Christiansen and Chater (1999), we refer to this phenomenon as the architectural bias of RNNs. In this paper we further extend our work on the architectural bias in RNNs by performing a rigorous fractal analysis of recurrent activation patterns. We assume the network is driven by sequences obtained by traversing an underlying finite-state transition diagram, a scenario that has been frequently considered in the past, e.g., when studying RNN-based learning and implementation of regular grammars and finite-state transducers. We obtain lower and upper bounds on various types of fractal dimensions, such as box-counting and Hausdorff dimensions. It turns out that not only can the recurrent activations inside RNNs with small initial weights be explored to build Markovian predictive models, but the activations also form fractal clusters whose dimension can be bounded by the scaled entropy of the underlying driving source. The scaling factors are fixed and are given by the RNN parameters.
Context-Free and Context-Sensitive Dynamics in Recurrent Neural Networks
2000
Abstract

Cited by 28 (6 self)
Continuous-valued recurrent neural networks can learn mechanisms for processing context-free languages. The dynamics of such networks is usually based on damped oscillation around fixed points in state space and requires that the dynamical components are arranged in certain ways. It is shown that qualitatively similar dynamics with similar constraints hold for a^n b^n c^n, a context-sensitive language. The additional difficulty with a^n b^n c^n, compared with the context-free language a^n b^n, consists of "counting up" and "counting down" letters simultaneously. The network solution is to oscillate in two principal dimensions, one for counting up and one for counting down. This study focuses on the dynamics employed by the Sequential Cascaded Network, in contrast with the Simple Recurrent Network, and the use of Backpropagation Through Time. Found solutions generalize well beyond training data; however, learning is not reliable. The contribution of this ...
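The simultaneous "counting up" and "counting down" mentioned in this abstract can be illustrated with a purely symbolic analogue. The sketch below is not the network's mechanism (the paper describes continuous oscillatory dynamics, not integer counters); it only shows why two quantities must be tracked at once during the b phase. The function name `accepts_anbncn` and the counter names are assumptions made for illustration.

```python
def accepts_anbncn(s: str) -> bool:
    """Accept strings of the form a^n b^n c^n (n >= 1) using two counters.

    Illustrative analogue only: integer counters stand in for the two
    principal dimensions in which the network is said to oscillate.
    """
    down = 0    # a's still waiting to be matched by b's (counted down)
    up = 0      # b's seen so far, to be matched by c's (counted up)
    phase = "a"
    for ch in s:
        if ch == "a" and phase == "a":
            down += 1
        elif ch == "b" and phase in ("a", "b") and down > 0:
            phase = "b"
            down -= 1   # counting down the a's ...
            up += 1     # ... while simultaneously counting up for the c's
        elif ch == "c" and phase in ("b", "c") and down == 0 and up > 0:
            phase = "c"
            up -= 1
        else:
            return False
    return phase == "c" and down == 0 and up == 0
```

For a^n b^n alone, the `down` counter would suffice; the second counter is what the context-sensitive case adds.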
Rule Extraction from Recurrent Neural Networks: a Taxonomy and Review
Neural Computation, 2005
Abstract

Cited by 24 (3 self)
... this paper, the progress of this development is reviewed and analysed in detail. In order to structure the survey and to evaluate the techniques, a taxonomy, specifically designed for this purpose, has been developed. Moreover, important open research issues are identified that, if addressed properly, can possibly give the field a significant push forward.
Inductive Bias in Context-Free Language Learning
In Proceedings of the Ninth Australian Conference on Neural Networks, 1998
Abstract

Cited by 20 (9 self)
Recurrent neural networks are capable of learning context-free tasks; however, learning performance is unsatisfactory. We investigate the effect of biasing learning towards finding a solution to a context-free prediction task. The first series of simulations fixes various sets of weights of the network to values found in a successful network, limiting the search space of the backpropagation through time learning algorithm. We find that fixing similar sets of weights can have very different effects on learning performance. The second series of simulations employs an evolutionary hill-climbing algorithm with an error measure that more closely resembles the performance measure. We find that under these conditions, the network finds different solutions to those found by backpropagation, and is even biased towards finding these solutions. An unexpected result is that the hill-climbing algorithm is capable of generalisation. The two simulations serve to highlight that seemingly similar biases can...
Stable Encoding of Finite-State Machines in Discrete-Time Recurrent Neural Nets with Sigmoid Units
1998
Abstract

Cited by 14 (3 self)
In recent years, there has been a lot of interest in the use of discrete-time recurrent neural nets (DTRNN) to learn finite-state tasks, with interesting results regarding the induction of simple finite-state machines from input-output strings. Parallel work has studied the computational power of DTRNN in connection with finite-state computation. This paper describes a simple strategy to devise stable encodings of finite-state machines in computationally capable discrete-time recurrent neural architectures with sigmoid units, and gives a detailed presentation on how this strategy may be applied to encode a general class of finite-state machines in a variety of commonly used first- and second-order recurrent neural networks. Unlike previous work that either imposed some restrictions to state values, or used a detailed analysis based on fixed-point attractors, the present approach applies to any positive, bounded, strictly growing, continuous activation function, and uses simple bounding criteri...
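To make the idea of encoding a finite-state machine in sigmoid units concrete, here is a minimal sketch of a generic high-gain second-order encoding. It is not the strategy or the bounding criteria of the paper above; the weight scheme (`+H` for the transition target, `-H` otherwise, bias `-H/2`), the gain `H = 8.0`, and all function names are assumptions chosen for illustration.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def encode_dfa(delta, n_states, n_symbols, H=8.0):
    """Build second-order weights W[i][j][k] and biases b from a transition table.

    delta[j][k] is the next state when symbol k is read in state j.
    Assumed scheme: W[i][j][k] = +H if delta[j][k] == i, else -H; bias -H/2.
    """
    W = [[[H if delta[j][k] == i else -H for k in range(n_symbols)]
          for j in range(n_states)]
         for i in range(n_states)]
    b = [-H / 2.0] * n_states
    return W, b

def run(W, b, start, symbols):
    """Drive the network with a symbol sequence; return the final state vector."""
    n = len(b)
    x = [1.0 if i == start else 0.0 for i in range(n)]
    for k in symbols:
        # With a one-hot input, only the weights for symbol k contribute.
        x = [sigmoid(sum(W[i][j][k] * x[j] for j in range(n)) + b[i])
             for i in range(n)]
    return x

# Example machine: parity of the number of 1s (state 0 = even, state 1 = odd).
PARITY = [[0, 1], [1, 0]]
```

With a sufficiently large `H`, the unit for the current state stays close to 1 and the others close to 0 over long input sequences, which is the sense of "stable" in this sketch; how large `H` must be in general is exactly the kind of question the paper's analysis addresses.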
Joint attention and dynamics repertoire in coupled dynamical recognizers
In AISB 03: the Second International Symposium on Imitation in Animals and Artifacts, 2003
Abstract

Cited by 11 (2 self)
A coupled dynamical recognizer is proposed as a model for simulating turn-taking behavior. An agent is modeled as a mobile robot with two wheels. A recurrent neural network is used to produce the motor outputs. By controlling this, agents compete to take turns on a two-dimensional arena. By using the genetic algorithm technique, we show that turn-taking behavior is developed between two agents. It is worth noting that turn-taking is established with a variety of dynamics. A coupling between agents is sensitive to the dynamics and we discuss the sensitivity by referring to Trevarthen’s double monitor experiments.
1 Intersubjectivity and Joint Attention
Here in this paper, we propose a simulation study of joint attention via coupled dynamical recognizers. There are many ways to understand psychological phenomena not directly by studying human behavior but by computer simulations and robot experiments (e.g. B. Scassellati (1999), K. Dautenhahn (1999)). To bridge between simulation studies and psychology, we think it worth discussing
Representation Beyond Finite States: Alternatives to Push-Down Automata
In: Kolen and Kremer, 2001
Abstract

Cited by 11 (3 self)
It has been well established that Dynamical Recurrent Networks (DRNs) can act as deterministic finite-state automata (DFAs; see Chapters 6 and 7). A DRN can reliably represent the states of a DFA as regions in its state space, and the DFA transitions as transitions between these regions. However, as we shall see in this chapter, DRNs can learn to process languages which are non-regular (and therefore cannot be processed by any DFA). Moreover, DRNs are capable of generalizing in ways which go beyond the DFA framework. We will show how DRNs can learn to predict context-free and context-sensitive languages, making use of the transient dynamics as the network activations move towards an attractor or away from a repeller. The resulting trajectory can be thought of as analogous to winding up a spring in one dimension and unwinding it in another. In contrast to push-down automata, which rely on unbounded external memory, DRNs must instead rely on arbi
Dynamical Automata
1998
Abstract

Cited by 10 (5 self)
The recent work on automata whose variables and parameters are real numbers (e.g., Blum, Shub, and Smale, 1989; Koiran, 1993; Bournez and Cosnard, 1996; Siegelmann, 1996; Moore, 1996) has focused largely on questions about computational complexity and tractability. It is also revealing to examine the metric relations that such systems induce on automata via the natural metrics on their parameter spaces. This brings the theory of computational classification closer to theories of learning and statistical modeling which depend on measuring distances between models. With this in mind, I develop a generalized method of identifying push-down automata in one class of real-valued automata. I show how the real-valued automata can be implemented in neural networks. I then explore the metric organization of these automata in a basic example, showing how it fleshes out the skeletal structure of the Chomsky Hierarchy and indicates new approaches to problems in language learning and language typolog...