Results 1-10 of 40
The empirical case for two systems of reasoning
Psychological Bulletin, 1996
Cited by 321 (3 self)
Distinctions have been proposed between systems of reasoning for centuries. This article distills properties shared by many of these distinctions and characterizes the resulting systems in light of recent findings and theoretical developments. One system is associative because its computations reflect similarity structure and relations of temporal contiguity. The other is "rule based" because it operates on symbolic structures that have logical content and variables and because its computations have the properties that are normally assigned to rules. The systems serve complementary functions and can simultaneously generate different solutions to a reasoning problem. The rule-based system can suppress the associative system but not completely inhibit it. The article reviews evidence in favor of the distinction and its characterization. One of the oldest conundrums in psychology is whether people are best conceived as parallel processors of information who operate along diffuse associative links or as analysts who operate by deliberate and sequential manipulation of internal representations. Are inferences drawn through a network of learned associative pathways or through application of a kind of "psychologic"
Learning long-term dependencies in NARX recurrent neural networks
1996
Cited by 46 (5 self)
It has recently been shown that gradient-descent learning algorithms for recurrent neural networks can perform poorly on tasks that involve long-term dependencies, i.e. those problems for which the desired output depends on inputs presented at times far in the past. We show that the long-term dependencies problem is lessened for a class of architectures called NARX recurrent neural networks, which have powerful representational capabilities. We have previously reported that gradient-descent learning can be more effective in NARX networks than in recurrent neural network architectures that have "hidden states" on problems including grammatical inference and nonlinear system identification. Typically, the network converges much faster and generalizes better than other networks. The results in this paper are consistent with this phenomenon. We present some experimental results which show that NARX networks can often retain information for two to three times as long as conventi...
The dynamic universality of sigmoidal neural networks
Inf. Comput., 1996
Cited by 35 (2 self)
We investigate the computational power of recurrent neural networks that apply the sigmoid activation function $\sigma(x) = [2/(1 + e^{-x})] - 1$. These networks are extensively used in automatic learning of nonlinear dynamical behavior. We show that in the noiseless model, there exists a universal architecture that can be used to compute any recursive (Turing) function. This is the first result of its kind for the sigmoid activation function; previous techniques only applied to linearized and truncated versions of this function. The significance of our result, besides the proving technique itself, lies in the popularity of the sigmoidal function both in engineering applications of artificial neural networks and in biological modelling. Our techniques can be applied to a much more general class of "sigmoidal-like" activation functions, suggesting that Turing universality is a relatively common property of recurrent neural network models. © 1996 Academic Press, Inc.
Analysis of Dynamical Recognizers
Neural Computation, 1996
Cited by 33 (5 self)
Pollack (1991) demonstrated that second-order recurrent neural networks can act as dynamical recognizers for formal languages when trained on positive and negative examples, and observed both phase transitions in learning and IFS-like fractal state sets. Follow-on work focused mainly on the extraction and minimization of a finite state automaton (FSA) from the trained network. However, such networks are capable of inducing languages which are not regular, and therefore not equivalent to any FSA. Indeed, it may be simpler for a small network to fit its training data by inducing such a nonregular language. But when is the network's language not regular? In this paper, using a low dimensional network capable of learning all the Tomita data sets, we present an empirical method for testing whether the language induced by the network is regular or not. We also provide a detailed ε-machine analysis of trained networks for both regular and nonregular languages.
Computational Capabilities of Recurrent NARX Neural Networks
IEEE Trans. on Systems, Man and Cybernetics, 1997
Cited by 31 (8 self)
Recently, fully connected recurrent neural networks have been proven to be computationally rich, at least as powerful as Turing machines. This work focuses on another network which is popular in control applications and has been found to be very effective at learning a variety of problems. These networks are based upon Nonlinear AutoRegressive models with eXogenous Inputs (NARX models), and are therefore called NARX networks. As opposed to other recurrent networks, NARX networks have a limited feedback which comes only from the output neuron rather than from hidden states. They are formalized by $y(t) = \Psi(u(t - n_u), \ldots, u(t - 1), u(t), y(t - n_y), \ldots, y(t - 1))$, where $u(t)$ and $y(t)$ represent input and output of the network at time $t$, $n_u$ and $n_y$ are the input and output order, and the function $\Psi$ is the mapping performed by a Multilayer Perceptron. We constructively prove that the NARX networks with a finite number of parameters are computationally as strong as fully connected recurrent networks and thus Turing machines. We conclude that in theory one can use the NARX models, rather than conventional recurrent networks, without any computational loss even though their feedback is limited. Furthermore, these results raise the issue of what amount of feedback or recurrence is necessary for any network to be Turing equivalent and what restrictions on feedback limit computational power.
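The NARX recurrence in this abstract can be illustrated with a short sketch. This is not the paper's construction. The helper names (make_mlp, narx_run), the one-hidden-layer tanh perceptron, the random weights and the zero initial history are all assumptions chosen for illustration. The only feedback is the delayed output y; no hidden state is carried between time steps.

```python
import numpy as np

def make_mlp(n_in, n_hidden, rng):
    """Hypothetical stand-in for the mapping Psi: a one-hidden-layer perceptron."""
    W1 = rng.normal(scale=0.5, size=(n_hidden, n_in))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(scale=0.5, size=n_hidden)

    def psi(z):
        return float(w2 @ np.tanh(W1 @ z + b1))

    return psi

def narx_run(psi, u, n_u, n_y):
    """Iterate y(t) = psi(u(t-n_u), ..., u(t), y(t-n_y), ..., y(t-1)).

    Assumes inputs and outputs before t = 0 are zero.
    """
    u = np.asarray(u, dtype=float)
    u_pad = np.concatenate([np.zeros(n_u), u])  # u_pad[n_u + t] == u(t)
    y_hist = np.zeros(n_y)                      # y(t-n_y), ..., y(t-1)
    outputs = []
    for t in range(len(u)):
        u_window = u_pad[t : t + n_u + 1]       # u(t-n_u), ..., u(t)
        y_t = psi(np.concatenate([u_window, y_hist]))
        outputs.append(y_t)
        # Slide the output window: drop the oldest value, append the newest.
        y_hist = np.append(y_hist, y_t)[1:]
    return np.array(outputs)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_u, n_y = 2, 2
    psi = make_mlp(n_u + 1 + n_y, n_hidden=4, rng=rng)
    print(narx_run(psi, [1.0, 0.0, 1.0, 1.0], n_u, n_y))
```

Passing a different psi (for example, a trained network) leaves the recurrence unchanged; the input order n_u and output order n_y only set the widths of the two delay windows.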
Recurrent Networks: State Machines Or Iterated Function Systems?
Proceedings of the 1993 Connectionist Models Summer School, 1994
Cited by 25 (1 self)
this paper, clustering of hidden unit activations, or recurrent network state space, provides incomplete information regarding the IP state of the network. IP states determine future behavior as well as encapsulate input history. The network's state transformations can exhibit sensitivity to initial conditions and generate disparate futures for state clusters of all sizes. The second part of the paper presents IFS theory and shows how it can explain recurrent network state dynamics. By linking IFSs and recurrent networks, existing constraints on network dynamics independent of network models are now evident. By assuming a finite set of inputs, which is often the case in symbolic domains, one can describe recurrent network models as a finite collection of nonlinear state transformations. The interaction of several transforms produces complex state spaces with recursive structure. The limit behavior of the collection of transformations, and recurrent networks in symbolic applications, is more complex than the union of the individual transformations. An input-driven recurrent network behaves like the random iteration algorithm. An infinite input sequence generates sequences of points dense in the state space attractor when the transformations are contractive. While the demonstration in this paper used the SCN, other models produce similar IFS-like behaviors as long as the network's input selects transformations [19]. The IFS approach also explains the phenomenon of state clustering in recurrent networks. In [20], Servan-Schreiber et al. report significant clustering in simple recurrent networks [21] both before and after training from the Reber grammar prediction task. A set of random transformations will normally reduce the volume of the recurrent network's state space, and plac...
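The link between input-driven state updates and the random iteration algorithm can be sketched as follows. This is an illustration under stated assumptions, not the paper's network: the four affine maps, the CORNERS and RATIO constants and the function names step and drive are hypothetical. Each input symbol selects one contractive transformation, so a symbol sequence plays the role of the random choices in the random iteration algorithm.

```python
import numpy as np

# Hypothetical transformations: one affine contraction per input symbol,
# each pulling the state halfway toward a corner of the unit square.
CORNERS = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
RATIO = 0.5  # contraction ratio, strictly less than 1

def step(state, symbol):
    """Apply the transformation selected by the current input symbol."""
    return RATIO * state + (1.0 - RATIO) * CORNERS[symbol]

def drive(state, symbols):
    """Feed a symbol sequence to the system and record every visited state."""
    trajectory = []
    for s in symbols:
        state = step(state, s)
        trajectory.append(state)
    return np.array(trajectory)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    symbols = rng.integers(0, len(CORNERS), size=30)
    a = drive(np.array([0.0, 0.0]), symbols)
    b = drive(np.array([1.0, 1.0]), symbols)
    # With contractive maps, the same input history erases the initial state.
    print(np.linalg.norm(a[-1] - b[-1]))
```

Because every map contracts by RATIO, the distance between two trajectories driven by the same symbols shrinks by that factor at each step, which is one simple way to see why recent input history dominates the state.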
Recurrent Neural Networks With Small Weights Implement Definite Memory Machines
Neural Computation, 2003
Cited by 21 (5 self)
Recent experimental studies indicate that recurrent neural networks initialized with 'small' weights are inherently biased towards definite memory machines (Tino, Cernansky, Benuskova, 2002a; Tino, Cernansky, Benuskova, 2002b). This paper establishes a theoretical counterpart: the transition function of a recurrent network with small weights and a 'squashing' activation function is a contraction. We prove that recurrent networks with contractive transition function can be approximated arbitrarily well on input sequences of unbounded length by a definite mem
Analog Computation with Dynamical Systems
Physica D, 1997
Cited by 21 (0 self)
This paper presents a theory that makes it possible to interpret natural processes as special-purpose analog computers. Since physical systems are naturally described in continuous time, a definition of computational complexity for continuous time systems is required. In analogy with the classical discrete theory we develop fundamentals of computational complexity for dynamical systems, discrete or continuous in time, on the basis of an intrinsic time scale of the system. Dissipative dynamical systems are classified into the computational complexity classes $P_d$, $CoRP_d$, $NP_d$
Learning a Class of Large Finite State Machines with a Recurrent Neural Network
1995
Cited by 20 (11 self)
One of the issues in any learning model is how it scales with problem size. The problem of learning finite state machines (FSMs) from examples with recurrent neural networks has been extensively explored. However, these results are somewhat disappointing in the sense that the machines that can be learned are too small to be competitive with existing grammatical inference algorithms. We show that a type of recurrent neural network (Narendra & Parthasarathy, 1990, IEEE Trans. Neural Networks, 1, 427) which has feedback but no hidden state neurons can learn a special type of FSM called a finite memory machine (FMM) under certain constraints. These machines have a large number of states (simulations are for 256 and 512 state FMMs) but have minimal order, relatively small depth and little logic when the FMM is implemented as a sequential machine,
Computability with Polynomial Differential Equations
2007
Cited by 20 (13 self)
In this paper, we show that there are Initial Value Problems defined with polynomial ordinary differential equations that can simulate universal Turing machines in the presence of bounded noise. The polynomial ODE defining the IVP is explicitly obtained and the simulation is performed in real time.