Learning the structure of event sequences
Journal of Experimental Psychology: General, 1991
Cited by 218 (28 self)
Abstract
How is complex sequential material acquired, processed, and represented when there is no intention to learn? Two experiments exploring a choice reaction time task are reported. Unknown to Ss, successive stimuli followed a sequence derived from a "noisy" finite-state grammar. After considerable practice (60,000 exposures) in Experiment 1, Ss acquired a complex body of procedural knowledge about the sequential structure of the material. Experiment 2 was an attempt to identify limits on Ss' ability to encode the temporal context by using more distant contingencies that spanned irrelevant material. Taken together, the results indicate that Ss become increasingly sensitive to the temporal context set by previous elements of the sequence, up to 3 elements. Responses are also affected by priming effects from recent trials. A connectionist model that incorporates sensitivity to the sequential structure and to priming effects is shown to capture key aspects of both acquisition and processing and to account for the interaction between attention and sequence structure reported by Cohen, Ivry, and Keele (1990).
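A "noisy" finite-state grammar of the kind described above can be illustrated with a short stimulus generator. This is a minimal sketch under stated assumptions: the three-state grammar, the stimulus labels A-F, the 0.15 noise rate, and the rule that a noise trial leaves the grammar state unchanged are all hypothetical choices, not the design of the 1991 experiments.

```python
import random

# HYPOTHETICAL grammar for illustration only, not the one used in the 1991 study.
# Each state maps to its outgoing arcs as (stimulus, next_state) pairs.
GRAMMAR = {
    0: [("A", 1), ("B", 2)],
    1: [("C", 1), ("D", 2)],
    2: [("E", 0), ("F", 1)],
}
SYMBOLS = sorted({sym for arcs in GRAMMAR.values() for sym, _ in arcs})


def generate(n_trials, noise=0.15, seed=0):
    """Return n_trials stimuli drawn from a noisy finite-state grammar.

    With probability `noise` a trial shows a uniformly random stimulus and
    the grammar state is left unchanged (an assumed noise scheme);
    otherwise a legal arc from the current state is chosen at random.
    """
    rng = random.Random(seed)
    state = 0
    seq = []
    for _ in range(n_trials):
        if rng.random() < noise:
            seq.append(rng.choice(SYMBOLS))              # noise trial
        else:
            symbol, state = rng.choice(GRAMMAR[state])   # grammatical trial
            seq.append(symbol)
    return seq


def is_grammatical(seq, start=0):
    """True if every stimulus follows a legal arc from the current state."""
    state = start
    for sym in seq:
        nxt = [n for s, n in GRAMMAR[state] if s == sym]
        if not nxt:
            return False
        state = nxt[0]   # each symbol labels at most one arc per state here
    return True


print("".join(generate(20)))
```

Because every stimulus in this toy grammar labels at most one arc per state, `is_grammatical` can track the state deterministically; with `noise=0.0` every generated sequence passes it.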
Distributional Information: A Powerful Cue for Acquiring Syntactic Categories
Cognitive Science, 1998
Cited by 201 (9 self)
Abstract
Many theorists have dismissed a priori the idea that distributional information could play a significant role in syntactic category acquisition. We demonstrate empirically that such information provides a powerful cue to syntactic category membership, which can be exploited by a variety of simple, psychologically plausible mechanisms. We present a range of results using a large corpus of child-directed speech and explore their psychological implications. While our results show that a considerable amount of information concerning the syntactic categories can be obtained from distributional information alone, we stress that many other sources of information may also be potential contributors to the identification of syntactic classes.
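The core idea of distributional information, that words of the same category occur in similar contexts, can be sketched by counting each word's immediate left and right neighbours and comparing the resulting vectors. The one-word window, the toy corpus, and the use of cosine similarity are assumptions made for illustration; the abstract does not specify which context window or similarity measure the paper uses.

```python
import math
from collections import Counter


def context_vectors(tokens, targets, contexts):
    """For each target word, count context words seen immediately to its
    left (keys prefixed 'L:') and right (keys prefixed 'R:')."""
    vecs = {t: Counter() for t in targets}
    for i, w in enumerate(tokens):
        if w not in vecs:
            continue
        if i > 0 and tokens[i - 1] in contexts:
            vecs[w]["L:" + tokens[i - 1]] += 1
        if i + 1 < len(tokens) and tokens[i + 1] in contexts:
            vecs[w]["R:" + tokens[i + 1]] += 1
    return vecs


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u)   # Counter returns 0 for missing keys
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# Toy corpus (hypothetical): nouns and verbs appear in different contexts.
toks = "the dog runs . the cat runs . a dog sleeps . a cat sleeps".split()
vecs = context_vectors(toks, {"dog", "cat", "runs", "sleeps"}, set(toks))
print(round(cosine(vecs["dog"], vecs["cat"]), 3), cosine(vecs["dog"], vecs["runs"]))
```

In this toy corpus "dog" and "cat" share identical neighbour counts, while "dog" and "runs" share none, so the two nouns group together without any category labels being supplied.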
Gradient calculation for dynamic recurrent neural networks: a survey
IEEE Transactions on Neural Networks, 1995
Cited by 181 (3 self)
Abstract
We survey learning algorithms for recurrent neural networks with hidden units and put the various techniques into a common framework. We discuss fixed-point learning algorithms, namely recurrent backpropagation and deterministic Boltzmann Machines, and non-fixed-point algorithms, namely backpropagation through time, Elman's history cutoff, and Jordan's output feedback architecture. Forward propagation, an online technique that uses adjoint equations, and variations thereof, are also discussed. In many cases, the unified presentation leads to generalizations of various sorts. We discuss advantages and disadvantages of temporally continuous neural networks in contrast to clocked ones, and continue with some "tricks of the trade" for training, using, and simulating continuous-time and recurrent neural networks. We present some simulations and, at the end, address issues of computational complexity and learning speed.
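Backpropagation through time, one of the non-fixed-point algorithms the survey covers, can be sketched on a single scalar recurrent unit. The model h_t = tanh(w*h_{t-1} + u*x_t) with h_0 = 0 and a squared-error loss on the final state are hypothetical choices for illustration; the analytic gradient is checked against central finite differences.

```python
import math


def forward(w, u, xs):
    """Run h_t = tanh(w * h_{t-1} + u * x_t) from h_0 = 0; return [h_0, ..., h_T]."""
    hs = [0.0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    return hs


def loss(w, u, xs, y):
    """Squared error between the final state h_T and a target y."""
    return 0.5 * (forward(w, u, xs)[-1] - y) ** 2


def bptt_grad(w, u, xs, y):
    """Gradient (dL/dw, dL/du) by backpropagation through time."""
    hs = forward(w, u, xs)
    delta = hs[-1] - y                      # dL/dh_T
    gw = gu = 0.0
    for t in range(len(xs), 0, -1):         # walk back from step T to step 1
        d_pre = delta * (1.0 - hs[t] ** 2)  # through tanh: dL/d(pre-activation_t)
        gw += d_pre * hs[t - 1]             # w multiplies h_{t-1}
        gu += d_pre * xs[t - 1]             # u multiplies x_t (stored at xs[t-1])
        delta = d_pre * w                   # dL/dh_{t-1}
    return gw, gu


def numeric_grad(w, u, xs, y, eps=1e-6):
    """Central finite-difference estimate, used only as a check."""
    gw = (loss(w + eps, u, xs, y) - loss(w - eps, u, xs, y)) / (2 * eps)
    gu = (loss(w, u + eps, xs, y) - loss(w, u - eps, xs, y)) / (2 * eps)
    return gw, gu


xs = [0.5, -1.0, 0.25, 0.8]
print(bptt_grad(0.7, -0.3, xs, 0.2), numeric_grad(0.7, -0.3, xs, 0.2))
```

The backward loop mirrors the forward loop in reverse: `delta` carries dL/dh_t back one step at a time, so the cost of the gradient grows linearly with the sequence length and the stored history `hs`.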
On the Computational Power of Neural Nets
Journal of Computer and System Sciences, 1995
Cited by 179 (23 self)
Abstract
This paper deals with finite-size networks that consist of interconnections of synchronously evolving processors. Each processor updates its state by applying a "sigmoidal" function to a linear combination of the previous states of all units. We prove that one may simulate all Turing machines by such nets. In particular, one can simulate any multi-stack Turing machine in real time, and there is a net made up of 886 processors that computes a universal partial-recursive function. Products (high-order nets) are not required, contrary to what had been stated in the literature. Nondeterministic Turing machines can be simulated by nondeterministic rational nets, also in real time. The simulation result has many consequences regarding the decidability, or more generally the complexity, of questions about recursive nets.
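The update rule in this abstract, and the idea that a single rational-valued unit can hold an unbounded amount of information, can be sketched with exact rational arithmetic. The abstract only says "sigmoidal"; this sketch assumes the saturated-linear function. The base-4 stack encoding (pushing bit b maps q to q/4 + (2b+1)/4) is an illustrative assumption, not a transcription of the paper's construction.

```python
from fractions import Fraction


def sat(x):
    """Saturated-linear activation (assumed): 0 below 0, identity on [0, 1], 1 above 1."""
    return max(Fraction(0), min(Fraction(1), x))


def step(A, c, x):
    """Synchronous update of all units: x_i(t+1) = sat(sum_j A[i][j] * x_j(t) + c[i])."""
    return [sat(sum(a * xj for a, xj in zip(row, x)) + ci) for row, ci in zip(A, c)]


# Illustrative (assumed) base-4 stack held in one rational unit q.
# Every operation is an affine map followed by sat, i.e. one unit's update.
def push(q, bit):
    return sat(q / 4 + Fraction(2 * bit + 1, 4))


def top(q):
    return sat(4 * q - 2)          # 1 if the top bit is 1, else 0


def nonempty(q):
    return sat(4 * q)              # 1 if at least one bit is stored, else 0


def pop(q):
    return sat(4 * q - 2 * top(q) - 1)


q = Fraction(0)
for bit in (1, 0, 1):
    q = push(q, bit)
print(q)  # 55/64
```

Exact `Fraction` arithmetic matters here: the stack contents live in ever-finer base-4 digits of q, which floating-point numbers would eventually round away.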
Toward a connectionist model of recursion in human linguistic performance
Cognitive Science, 1999
Cited by 170 (21 self)
Abstract
Naturally occurring speech contains only a limited amount of complex recursive structure, and this is reflected in the empirically documented difficulties that people experience when processing such structures. We present a connectionist model of human performance in processing recursive language structures. The model is trained on simple artificial languages. We find that the qualitative performance profile of the model matches human behavior, both on the relative difficulty of center-embedding and cross-dependency, and between the processing of these complex recursive structures and right-branching recursive constructions. We analyze how these differences in performance are reflected in the internal representations of the model by performing discriminant analyses on these representations both before and after training. Furthermore, we show how a network trained to process recursive structures can also generate such structures in a probabilistic fashion. This work suggests a novel explanation of people's limited recursive performance, without assuming the existence of a mentally represented competence grammar allowing unbounded recursion.
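The three structure types compared above can be illustrated by generating strings in which each lowercase "noun" must be matched by its uppercase "verb". The symbols a/A and b/B and the generator itself are hypothetical; they only show how the dependencies are arranged, not the artificial languages the model was trained on.

```python
import random

PAIRS = [("a", "A"), ("b", "B")]  # hypothetical noun/verb pairs


def make(depth, kind, rng):
    """Build one string with `depth` noun-verb dependencies of the given kind."""
    pairs = [rng.choice(PAIRS) for _ in range(depth)]
    nouns = [n for n, _ in pairs]
    verbs = [v for _, v in pairs]
    if kind == "center":      # n1 n2 n3 V3 V2 V1: dependencies nest
        return nouns + verbs[::-1]
    if kind == "cross":       # n1 n2 n3 V1 V2 V3: dependencies cross
        return nouns + verbs
    if kind == "right":       # n1 V1 n2 V2 n3 V3: each resolved at once
        return [sym for pair in pairs for sym in pair]
    raise ValueError(f"unknown structure type: {kind}")


rng = random.Random(0)
for kind in ("center", "cross", "right"):
    print(kind, " ".join(make(3, kind, rng)))
```

In the center-embedded and cross-dependency strings every noun stays unresolved until the verb block, whereas right-branching strings close each dependency immediately.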
Graded state machines: The representation of temporal contingencies in simple recurrent networks
Machine Learning, 1991
Cited by 104 (13 self)
Abstract
We explore a network architecture introduced by Elman (1990) for predicting successive elements of a sequence. The network uses the pattern of activation over a set of hidden units from time step t-1, together with element t, to predict element t+1. When the network is trained with strings from a particular finite-state grammar, it can learn to be a perfect finite-state recognizer for the grammar. When the network has a minimal number of hidden units, patterns on the hidden units come to correspond to the nodes of the grammar; however, this correspondence is not necessary for the network to act as a perfect finite-state recognizer. Next, we provide a detailed analysis of how the network acquires its internal representations. Using a probability analysis, we show that the network progressively encodes more and more temporal context. Finally, we explore the conditions under which the network can carry information about distant sequential contingencies across intervening elements to distant elements. Such information is maintained with relative ease if it is relevant at each intermediate step; it tends to be lost when intervening elements do not depend on it. At first glance this may suggest that such networks are not relevant to natural language, in which dependencies may span indefinite distances. However, embeddings in natural language are not completely independent of earlier information. The final simulation shows that long-distance sequential contingencies can be encoded by the network even if only subtle statistical properties of embedded strings depend on the early information. The network encodes long-distance dependencies by shading ...
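The architecture described above (element t plus the hidden pattern from t-1 predicting element t+1) can be sketched as a forward pass. Logistic hidden units, a softmax output, zero initial context, no biases, random untrained weights, and the seven-symbol alphabet are all assumptions for illustration; training is omitted.

```python
import math
import random


def init_weights(n_in, n_hid, n_out, seed=0):
    """Small random weights (biases omitted for brevity)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return mat(n_hid, n_in), mat(n_hid, n_hid), mat(n_out, n_hid)


def dot(row, vec):
    return sum(w * v for w, v in zip(row, vec))


def srn_step(x, context, W_xh, W_hh, W_hy):
    """One step: element t (x) plus the copied hidden state from t-1
    (context) give hidden(t); the output is a distribution over element t+1."""
    h = [1.0 / (1.0 + math.exp(-(dot(wx, x) + dot(wh, context))))
         for wx, wh in zip(W_xh, W_hh)]
    z = [dot(wo, h) for wo in W_hy]
    m = max(z)
    e = [math.exp(v - m) for v in z]   # numerically stable softmax
    total = sum(e)
    return h, [v / total for v in e]


def predict_sequence(seq, alphabet, weights):
    """Return one next-element distribution per element of seq."""
    W_xh, W_hh, W_hy = weights
    context = [0.0] * len(W_hh)        # assumed initial context
    preds = []
    for sym in seq:
        x = [1.0 if a == sym else 0.0 for a in alphabet]  # one-hot input
        context, p = srn_step(x, context, W_xh, W_hh, W_hy)
        preds.append(p)
    return preds


ALPHABET = "BTSXPVE"  # hypothetical symbol set
weights = init_weights(len(ALPHABET), 3, len(ALPHABET))
for sym, p in zip("BTX", predict_sequence("BTX", ALPHABET, weights)):
    print(sym, [round(v, 3) for v in p])
```

Because the hidden state is fed back as context, the same input "T" yields different predictions after "B" than after "X"; this is the channel through which the network can come to encode temporal context.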
Language Acquisition in the Absence of Explicit Negative Evidence: How Important is Starting Small?
Cognition, 1999
Cited by 95 (6 self)
Abstract
It is commonly assumed that innate linguistic constraints are necessary to learn a natural language, based on the apparent lack of explicit negative evidence provided to children and on Gold's proof that, under assumptions of virtually arbitrary positive presentation, most interesting classes of languages are not learnable. However, Gold's results do not apply under the rather common assumption that language presentation may be modeled as a stochastic process. Indeed, Elman (Elman, J.L., 1993. Learning and development in neural networks: the importance of starting small. Cognition 48, 71-99) demonstrated that a simple recurrent connectionist network could learn an artificial grammar with some of the complexities of English, including embedded clauses, based on performing a word prediction task within a stochastic environment. However, the network was successful only when either embedded sentences were initially withheld and only later introduced gradually, or when the network itself was given initially limited memory that only gradually improved. This finding has been taken as support for Newport's 'less is more' proposal, that child language acquisition may be aided rather than hindered by limited cognitive resources. The current article reports on connectionist simulations which indicate, to the contrary, that starting with simplified inputs or limited memory is not necessary in training recurrent networks to learn pseudo-natural languages; in fact, such restrictions hinder acquisition as the languages are made more English-like by the introduction of semantic as well as syntactic constraints. We suggest that, under a statistical model of the language environment, Gold's theorem and the possible lack of explicit negative evidence do not implicate i...
Extracting Comprehensible Models from Trained Neural Networks
1996
Cited by 84 (3 self)
Abstract
To Mom, Dad, and Susan, for their support and encouragement.
Extraction of Rules from Discrete-time Recurrent Neural Networks
1996
Cited by 69 (15 self)
Abstract
The extraction of symbolic knowledge from trained neural networks and the direct encoding of (partial) knowledge into networks prior to training are important issues. They allow the exchange of information between symbolic and connectionist knowledge representations. The focus of this paper is on the quality of the rules that are extracted from recurrent neural networks. Discrete-time recurrent neural networks can be trained to correctly classify strings of a regular language. Rules defining the learned grammar can be extracted from networks in the form of deterministic finite-state automata (DFAs) by applying clustering algorithms in the output space of recurrent state neurons. Our algorithm can extract different finite-state automata that are consistent with a training set from the same network. We compare the generalization performance of these different models and the trained network, and we introduce a heuristic that permits us to choose, among the consistent DFAs, the model that best approximates the learned regular grammar.
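Extracting a DFA from the recurrent state space can be sketched by quantizing each state neuron's activation into q equal-width bins and exploring the reachable bins breadth-first. Equal-width partitioning is a simple stand-in for the clustering algorithms the abstract mentions, and the one-unit "network" below is a hand-written hypothetical transition function for the parity of a's, not a trained net.

```python
from collections import deque


def extract_dfa(step, h0, alphabet, q=2, accept=None, max_states=1000):
    """Breadth-first exploration of the quantized hidden-state space.

    step(h, symbol) -> next hidden-state vector with components in [0, 1].
    Each distinct tuple of bin indices becomes one DFA state; the first
    hidden vector that reaches a bin is kept as its representative.
    """
    def quantize(h):
        return tuple(min(int(v * q), q - 1) for v in h)

    start = quantize(h0)
    reps = {start: h0}
    delta = {}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for a in alphabet:
            nh = step(reps[s], a)
            t = quantize(nh)
            if t not in reps:
                if len(reps) >= max_states:
                    raise RuntimeError("too many states at this quantization level")
                reps[t] = nh
                queue.append(t)
            delta[(s, a)] = t
    accepting = {s for s, h in reps.items() if accept(h)} if accept else set()
    return start, delta, accepting, set(reps)


def accepts(dfa, word):
    """Run the extracted DFA on a string."""
    start, delta, accepting, _ = dfa
    s = start
    for a in word:
        s = delta[(s, a)]
    return s in accepting


# Hypothetical one-unit "network": 'a' flips the activation, 'b' keeps it.
toy_step = lambda h, a: [1.0 - h[0]] if a == "a" else list(h)
dfa = extract_dfa(toy_step, [0.1], "ab", q=2, accept=lambda h: h[0] > 0.5)
print(len(dfa[3]), accepts(dfa, "ab"), accepts(dfa, "aab"))
```

Raising q gives a finer partition and can yield larger automata from the same network, which is one way several different DFAs consistent with the same training data can arise.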