Results 1 - 10 of 144
Optimality Theory: Constraint Interaction in Generative Grammar (1993)
Cited by 789 (23 self)
ROA Version, 8/2002. Essentially identical to the Tech Report, with new pagination (but the same footnote and example numbering); correction of typos, oversights & outright errors; improved typography; and occasional small-scale clarificatory rewordings. Citation should include reference to this version.
Parallel Networks that Learn to Pronounce English Text (Complex Systems, 1987)
Cited by 413 (5 self)
This paper describes NETtalk, a class of massively-parallel network systems that learn to convert English text to speech. The memory representations for pronunciations are learned by practice and are shared among many processing units. The performance of NETtalk has some similarities with observed human performance. (i) The learning follows a power law. (ii) The more words the network learns, the better it is at generalizing and correctly pronouncing new words. (iii) The performance of the network degrades very slowly as connections in the network are damaged: no single link or processing unit is essential. (iv) Relearning after damage is much faster than learning during the original training. (v) Distributed or spaced practice is more effective for long-term retention than massed practice. Network models can be constructed that have the same performance and learning characteristics on a particular task, but differ completely at the levels of synaptic strengths and single-unit responses. However, hierarchical clustering techniques applied to NETtalk reveal that these different networks have similar internal representations of letter-to-sound correspondences within groups of processing units. This suggests that invariant internal representations may be found in assemblies of neurons intermediate in size between highly localized and completely distributed representations.
On Language and Connectionism: Analysis of a Parallel Distributed Processing Model of Language Acquisition (Cognition, 1988)
Cited by 217 (5 self)
Does knowledge of language consist of mentally-represented rules? Rumelhart and McClelland have described a connectionist (parallel distributed processing) model of the acquisition of the past tense in English which successfully maps many stems onto their past tense forms, both regular (walk/walked) and irregular (go/went), and which mimics some of the errors and sequences of development of children. Yet the model contains no explicit rules, only a set of neuron-style units which stand for trigrams of phonetic features of the stem, a set of units which stand for trigrams of phonetic features of the past form, and an array of connections between the two sets of units whose strengths are modified during learning. Rumelhart and McClelland conclude that linguistic rules may be merely convenient approximate fictions and that the real causal processes in language use and acquisition must be characterized as the transfer of activation levels among units and the modification of the weights of their connections. We analyze both the linguistic and the developmental assumptions of the model in detail and discover that (1) it cannot represent certain words, (2) it cannot learn many rules, (3) it can learn rules found in no human language, (4) it cannot explain morphological and phonological regularities, (5) it cannot explain the differences between irregular and regular forms, (6) it fails at its assigned task of mastering the past tense of English, (7) it gives an incorrect explanation for two developmental phenomena: stages of overregularization of irregular forms such as bringed, and the appearance of doubly-marked forms such as ated, and (8) it gives accounts of two others (infrequent overregularization of verbs ending in t/d, and the order of acquisition of different irregula...
The induction of dynamical recognizers (Machine Learning, 1991)
Cited by 197 (15 self)
A higher order recurrent neural network architecture learns to recognize and generate languages after being "trained" on categorized exemplars. Studying these networks from the perspective of dynamical systems yields two interesting discoveries. First, a longitudinal examination of the learning process illustrates a new form of mechanical inference: induction by phase transition. A small weight adjustment causes a "bifurcation" in the limit behavior of the network. This phase transition corresponds to the onset of the network’s capacity for generalizing to arbitrary-length strings. Second, a study of the automata resulting from the acquisition of previously published training sets indicates that while the architecture is not guaranteed to find a minimal finite automaton consistent with the given exemplars, which is an NP-Hard problem, the architecture does appear capable of generating non-regular languages by exploiting fractal and chaotic dynamics. I end the paper with a hypothesis relating linguistic generative capacity to the behavioral regimes of non-linear dynamical systems.
The empirical case for two systems of reasoning (Psychological Bulletin, 1996)
Cited by 172 (3 self)
Distinctions have been proposed between systems of reasoning for centuries. This article distills properties shared by many of these distinctions and characterizes the resulting systems in light of recent findings and theoretical developments. One system is associative because its computations reflect similarity structure and relations of temporal contiguity. The other is "rule based" because it operates on symbolic structures that have logical content and variables and because its computations have the properties that are normally assigned to rules. The systems serve complementary functions and can simultaneously generate different solutions to a reasoning problem. The rule-based system can suppress the associative system but not completely inhibit it. The article reviews evidence in favor of the distinction and its characterization. One of the oldest conundrums in psychology is whether people are best conceived as parallel processors of information who operate along diffuse associative links or as analysts who operate by deliberate and sequential manipulation of internal representations. Are inferences drawn through a network of learned associative pathways or through application of a kind of "psychologic"
The Helmholtz Machine (1995)
Cited by 165 (22 self)
Discovering the structure inherent in a set of patterns is a fundamental aim of statistical inference or learning. One fruitful approach is to build a parameterized stochastic generative model, independent draws from which are likely to produce the patterns. For all but the simplest generative models, each pattern can be generated in exponentially many ways. It is thus intractable to adjust the parameters to maximize the probability of the observed patterns. We describe a way of finessing this combinatorial explosion by maximizing an easily computed lower bound on the probability of the observations. Our method can be viewed as a form of hierarchical self-supervised learning that may relate to the function of bottom-up and top-down cortical processing pathways.
The role of knowledge in discourse comprehension: A construction-integration model (Psychological Review, 1988)
Cited by 160 (6 self)
In contrast to expectation-based, predictive views of discourse comprehension, a model is developed in which the initial processing is strictly bottom-up. Word meanings are activated, propositions are formed, and inferences and elaborations are produced without regard to the discourse context. However, a network of interrelated items is created in this manner, which can be integrated into a coherent structure through a spreading activation process. Data concerning the time course of word identification in a discourse context are examined. A simulation of arithmetic word-problem understanding provides a plausible account for some well-known phenomena in this area. Discourse comprehension, from the viewpoint of a computational theory, involves constructing a representation of a discourse upon which various computations can be performed, the outcomes of which are commonly taken as evidence for comprehension. Thus, after comprehending a text, one might reasonably expect to be able to answer questions about it, recall or summarize it, verify statements about it, paraphrase it, and so on.
Optimality Theory (2000)
Cited by 113 (0 self)
René Kager's textbook is one of the first to cover Optimality Theory (OT), a declarative grammar framework that swiftly took over phonology after it was introduced by Prince, Smolensky, and McCarthy in 1993. OT reclaims traditional grammar's ability to express surface generalizations ("syllables have onsets," "no nasal+voiceless obstruent clusters"). Empirically, some surface generalizations are robust within a language, or, perhaps for functionalist reasons, widespread across languages. Derivational theories were forced to posit diverse rules that rescued these robust generalizations from other phonological processes. An OT grammar avoids such "conspiracies" by stating the generalizations directly, as in Two-Level Morphology (Koskenniemi, 1983) or Declarative Phonology (Bird, 1995). In OT, the processes that try but fail to disrupt a robust generalization are described not as rules (cf. Paradis (1988)), but as lower-ranked generalizations. Suc
Some Aspects of Optimality in Natural Language Interpretation (Journal of Semantics, 1999)
Cited by 94 (10 self)
In a series of papers, Petra Hendriks, Helen de Hoop and Henriëtte de Swart have applied optimality theory (OT) to semantics. These authors argue that there is a fundamental difference between the form of OT as used in phonology, morphology and syntax on the one hand and its form as used in semantics on the other hand. Whereas in the first case OT takes the point of view of the speaker, in the second case the point of view of the hearer is taken. The aim of this paper is to argue that the proper treatment of OT in natural language interpretation has to take both perspectives at the same time. A conceptual framework is established that realizes the integration of both perspectives. It will be argued that this framework captures the essence of the Gricean maxims and gives a precise explication of Atlas & Levinson's (1981) idea of balancing between informativeness and efficiency in natural language processing. The ideas are then applied to resolve some puzzles in natural language interpret...
Generalized Alignment (Yearbook of Morphology, 1993)
Cited by 90 (10 self)
Overt or covert reference to the edges of constituents is a commonplace throughout phonology and morphology. Some examples include: • In English, Garawa, Indonesian and a number of other languages, the normal right-to-left

