Learning and development in neural networks: The importance of starting small
- Cognition, 1993
Cited by 290 (12 self)
It is a striking fact that in humans the greatest learning occurs precisely at that point in time, childhood, when the most dramatic maturational changes also occur. This report describes possible synergistic interactions between maturational change and the ability to learn a complex domain (language), as investigated in connectionist networks. The networks are trained to process complex sentences involving relative clauses, number agreement, and several types of verb argument structure. Training fails in the case of networks which are fully formed and ‘adultlike’ in their capacity. Training succeeds only when networks begin with limited working memory and gradually ‘mature’ to the adult state. This result suggests that rather than being a limitation, developmental restrictions on resources may constitute a necessary prerequisite for mastering certain complex domains. Specifically, successful learning may depend on starting small.
Linguistic Complexity: Locality of Syntactic Dependencies
- Cognition, 1998
Cited by 163 (10 self)
This paper proposes a new theory of the relationship between the sentence processing mechanism and the available computational resources. This theory -- the Syntactic Prediction Locality Theory (SPLT) -- has two components: an integration cost component and a component for the memory cost associated with keeping track of obligatory syntactic requirements. Memory cost is
Learning at a distance I. Statistical learning of non-adjacent dependencies
- Cognitive Psychology, 2004
Language as a Dynamical System
- In, 1995
Cited by 61 (2 self)
Introduction. Despite considerable diversity among theories about how humans process language, there are a number of fundamental assumptions which are shared by most such theories. This consensus extends to the very basic question about what counts as a cognitive process. So although many cognitive scientists are fond of referring to the brain as a ‘mental organ’ (e.g., Chomsky, 1975), implying a similarity to other organs such as the liver or kidneys, it is also assumed that the brain is an organ with special properties which set it apart. Brains ‘carry out computation’ (it is argued)
Language Acquisition in the Absence of Explicit Negative Evidence: How Important is Starting Small?
- Cognition, 1999
Cited by 59 (5 self)
It is commonly assumed that innate linguistic constraints are necessary to learn a natural language, based on the apparent lack of explicit negative evidence provided to children and on Gold's proof that, under assumptions of virtually arbitrary positive presentation, most interesting classes of languages are not learnable. However, Gold's results do not apply under the rather common assumption that language presentation may be modeled as a stochastic process. Indeed, Elman (Elman, J.L., 1993. Learning and development in neural networks: the importance of starting small. Cognition 48, 71-99) demonstrated that a simple recurrent connectionist network could learn an artificial grammar with some of the complexities of English, including embedded clauses, based on performing a word prediction task within a stochastic environment. However, the network was successful only when either embedded sentences were initially withheld and only later introduced gradually, or when the network itself was given initially limited memory which only gradually improved. This finding has been taken as support for Newport's ‘less is more’ proposal, that child language acquisition may be aided rather than hindered by limited cognitive resources. The current article reports on connectionist simulations which indicate, to the contrary, that starting with simplified inputs or limited memory is not necessary in training recurrent networks to learn pseudonatural languages; in fact, such restrictions hinder acquisition as the languages are made more English-like by the introduction of semantic as well as syntactic constraints. We suggest that, under a statistical model of the language environment, Gold's theorem and the possible lack of explicit negative evidence do not implicate i...
Interference in Short-term Memory: The Magical Number Two (or Three) in Sentence Processing
- 1996
Cited by 41 (7 self)
Many theories have been proposed to explain difficulty with center embedded constructions, most attributing the problem to some kind of limited capacity short-term memory. However, these theories have developed for the most part independently of more traditional memory research, which has focused on uncovering general principles such as chunking and interference. This article attempts to gain some unification with this research by suggesting that an interesting range of core sentence processing phenomena can be explained as interference effects in a sharply limited syntactic working memory. These include difficult and acceptable embeddings, as well as certain limitations on ambiguity resolution, length effects in garden path structures, and the requirement for locality in syntactic structure. The theory takes the form of an architecture for parsing which can index no more than two constituents under the same syntactic relation. A limitation of two or three items shows up in a variety o...
An Activation-Based Model of Sentence Processing as Skilled Memory Retrieval
- 2005
Cited by 41 (6 self)
We present a detailed process theory of the moment-by-moment working-memory retrievals and associated control structure that subserve sentence comprehension. The theory is derived from the application of independently motivated principles of memory and cognitive skill to the specialized task of sentence parsing. The resulting theory construes sentence processing as a series of skilled associative memory retrievals modulated by similarity-based interference and fluctuating activation. The cognitive principles are formalized in computational form in the Adaptive Control of Thought–Rational (ACT–R) architecture, and our process model is realized in ACT–R. We present the results of 6 sets of simulations: 5 simulation sets provide quantitative accounts of the effects of length and structural interference on both unambiguous and garden-path structures. A final simulation set provides a graded taxonomy of double center embeddings ranging from relatively easy to extremely difficult. The explanation of center-embedding difficulty is a novel one that derives from the model’s complete reliance on discriminating retrieval cues in the absence of an explicit representation of serial order information. All fits were obtained with only 1 free scaling parameter fixed across the simulations; all other parameters were ACT–R defaults. The modeling results support the hypothesis that fluctuating activation and similarity-based interference are the key factors shaping working memory in sentence processing. We contrast the theory and empirical predictions with several related accounts of sentence-processing complexity.
Rethinking Eliminative Connectionism
- 1998
Cited by 40 (3 self)
Humans routinely generalize universal relationships to unfamiliar instances. If we are told “if glork then frum,” and “glork,” we can infer “frum”; any name that serves as the subject of a sentence can appear as the object of a sentence. These universals are pervasive in language and reasoning. One account of how they are generalized holds that humans possess mechanisms that manipulate symbols and variables; an alternative account holds that symbol-manipulation can be eliminated from scientific theories in favor of descriptions couched in terms of networks of interconnected nodes. Can these “eliminative” connectionist models offer a genuine alternative? This article shows that eliminative connectionist models cannot account for how we extend universals to arbitrary items. The argument runs as follows. First, if these models, as currently conceived, were to extend universals to arbitrary instances, they would have to generalize outside the space of training examples. Next, it is shown that the class of eliminative connectionist models that is currently popular cannot learn to extend universals outside the training space. This limitation might be avoided through the use of an architecture that implements symbol manipulation.
The finite connectivity of linguistic structure
- In, 1994
Cited by 39 (2 self)
While there is no interesting limitation on the degree of right-embedding in acceptable sentences, center-embedding is quite severely restricted. Similarly, while there is no interesting bound on the number of nouns that can occur in acceptable noun compounds, there is a very low bound on the number of causative morphemes that can occur in the verb compounds of agglutinative languages. Turning to the clause-final verb clusters of West Germanic languages, we find another similar bound. A cluster including verbs from one embedded clause may be acceptable, but clusters formed from the verbs of two or three or even more deeply embedded clauses are much more awkward (regardless of whether the subject-verb dependencies are crossing or nested). And in languages that allow multiple wh-extractions from a single clause, extractions of more than one element with a given case quickly become unacceptable. More careful experimental study of the nature of these limitations is needed, in a range of languages, but here a preliminary attempt is made to subsume them all under a single generalization, a version of the familiar idea that the human parsing
Inheritance and Complementation: A Case Study of Easy Adjectives and Related Nouns
- 1992
Cited by 32 (5 self)
The aim of this paper is to motivate the use of inheritance in lexical specification. To do this, we take a narrowly circumscribed phenomenon in English grammar, that of VP-complement-taking adjectives, as in hard + to deliver, and spell out the lexical specifications a thorough treatment demands. The sheer complexity of these specifications cries out for a redundancy-eliminating approach, and we propose a structured lexicon treatment. The grammatical analysis not only serves to motivate the general approach, it also illustrates several key issues in the design of structured lexicons, such as the use of default inheritance, the need for lexical rules, and the range of phenomena amenable to this sort of treatment.

