The induction of dynamical recognizers
Machine Learning, 1991
Cited by 197 (15 self)
A higher order recurrent neural network architecture learns to recognize and generate languages after being "trained" on categorized exemplars. Studying these networks from the perspective of dynamical systems yields two interesting discoveries: First, a longitudinal examination of the learning process illustrates a new form of mechanical inference: Induction by phase transition. A small weight adjustment causes a "bifurcation" in the limit behavior of the network. This phase transition corresponds to the onset of the network’s capacity for generalizing to arbitrary-length strings. Second, a study of the automata resulting from the acquisition of previously published training sets indicates that while the architecture is not guaranteed to find a minimal finite automaton consistent with the given exemplars, which is an NP-hard problem, the architecture does appear capable of generating non-regular languages by exploiting fractal and chaotic dynamics. I end the paper with a hypothesis relating linguistic generative capacity to the behavioral regimes of non-linear dynamical systems.
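To make the idea of a recurrent network acting as a language recognizer concrete, here is a minimal sketch of a generic second-order (multiplicative) recurrent recognizer. It is an illustration under stated assumptions, not the architecture from the paper: the function names, the sigmoid units, the 0.5 acceptance threshold, and the hand-set weights (chosen so the network accepts binary strings containing an even number of 1s) are all hypothetical, and no training is shown.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical weight tensor: W[i][j][k] couples state unit j and input symbol k
# to next-state unit i. Hand-set (not learned) so that unit 0 means "even number
# of 1s so far" and unit 1 means "odd number of 1s so far".
PARITY_WEIGHTS = [
    [[10.0, -10.0], [-10.0, 10.0]],  # next-state unit 0
    [[-10.0, 10.0], [10.0, -10.0]],  # next-state unit 1
]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def second_order_step(w, state: Sequence[float], symbol: Sequence[float]) -> List[float]:
    """One update: z_i(t+1) = sigmoid(sum_j sum_k w[i][j][k] * z_j(t) * x_k(t))."""
    return [
        sigmoid(sum(w[i][j][k] * state[j] * symbol[k]
                    for j in range(len(state))
                    for k in range(len(symbol))))
        for i in range(len(w))
    ]


def recognize(w, start_state: Sequence[float], string: str, alphabet: str,
              threshold: float = 0.5) -> Tuple[bool, List[float]]:
    """Run the network over the string; accept if unit 0 ends above the threshold."""
    state = list(start_state)
    for ch in string:
        one_hot = [1.0 if ch == a else 0.0 for a in alphabet]  # encode the symbol
        state = second_order_step(w, state, one_hot)
    return state[0] > threshold, state
```

With these weights the state stays near the two corners (1, 0) and (0, 1), a crude finite-state regime; the fractal and chaotic regimes the abstract describes arise in trained networks and are not demonstrated by this toy.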
English Relative Clause Constructions
Journal of Linguistics, 1997
Cited by 125 (9 self)
This paper sketches a grammar of English relative clause constructions (including infinitival and reduced relatives) based on the notions of construction type and type constraints. Generalizations about dependency relations and clausal functions are factored into distinct dimensions contributing constraints to specific construction types in a multiple inheritance type hierarchy. The grammar presented here provides an account of extraction, pied piping and relative clause 'stacking' without appeal to transformational operations, transderivational competition, or invisible ('empty') categories of any kind.
Natural Language Grammatical Inference: A Comparison of Recurrent Neural Networks and Machine Learning Methods
Symbolic, Connectionist, and Statistical Approaches to Learning for Natural Language Processing, Lecture Notes in AI, 1996
Cited by 12 (2 self)
We consider the task of training a neural network to classify natural language sentences as grammatical or ungrammatical, thereby exhibiting the same kind of discriminatory power provided by the Principles and Parameters linguistic framework, or Government and Binding theory. We investigate the following models: feed-forward neural networks, Frasconi-Gori-Soda and Back-Tsoi locally recurrent neural networks, Williams & Zipser and Elman recurrent neural networks, Euclidean and edit-distance nearest-neighbors, and decision trees. Non-neural-network machine learning methods are included primarily for comparison. We find that the Elman and Williams & Zipser recurrent neural networks are able to find a representation for the grammar which we believe is more parsimonious. These models exhibit the best performance.
1 Motivation
1.1 Representational Power of Recurrent Neural Networks
Natural language has traditionally been handled using symbolic computation and recursive processes. The most ...
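One of the non-neural baselines listed above is an edit-distance nearest-neighbor classifier. The sketch below shows the general technique under stated assumptions: sentences are given as token sequences, distance is plain Levenshtein distance with unit costs, and a query takes the label of its single closest training example. The toy examples and function names are hypothetical and do not reproduce the paper's data or input encoding.

```python
from typing import Hashable, List, Sequence, Tuple


def edit_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Levenshtein distance with unit insertion, deletion, and substitution costs."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, y in enumerate(b, start=1):
            cur[j] = min(prev[j] + 1,             # delete x
                         cur[j - 1] + 1,          # insert y
                         prev[j - 1] + (x != y))  # substitute (free if tokens match)
        prev = cur
    return prev[-1]


def nearest_neighbor_label(query: Sequence[str],
                           examples: List[Tuple[Sequence[str], bool]]) -> bool:
    """Return the grammaticality label of the closest example (ties: first one wins)."""
    _, label = min(examples, key=lambda ex: edit_distance(query, ex[0]))
    return label


# Hypothetical toy training set: (token sequence, is_grammatical)
EXAMPLES = [
    (["the", "dog", "barks"], True),
    (["dog", "the", "barks"], False),
]
```

Here `["the", "cat", "barks"]` is one substitution away from the grammatical example and two away from the ungrammatical one, so it is labeled grammatical.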
On the Applicability of Neural Network and Machine Learning Methodologies to Natural Language Processing
1995
Cited by 8 (3 self)
We examine the inductive inference of a complex grammar; specifically, we consider the task of training a model to classify natural language sentences as grammatical or ungrammatical, thereby exhibiting the same kind of discriminatory power provided by the Principles and Parameters linguistic framework, or Government-and-Binding theory. We investigate the following models: feed-forward neural networks, Frasconi-Gori-Soda and Back-Tsoi locally recurrent networks, Elman, Narendra & Parthasarathy, and Williams & Zipser recurrent networks, Euclidean and edit-distance nearest-neighbors, simulated annealing, and decision trees. The feed-forward neural networks and non-neural-network machine learning models are included primarily for comparison. We address the question: How can a neural network, with its distributed nature and gradient-descent-based iterative calculations, possess linguistic capability, which is traditionally handled with symbolic computation and recursive processes? Initial...
Bottom-Up Earley Deduction
1994
Cited by 5 (2 self)
We propose a bottom-up variant of Earley deduction. Bottom-up deduction is preferable to top-down deduction because it allows incremental processing (even for head-driven grammars), it is data-driven, no subsumption check is needed, and preference values attached to lexical items can be used to guide best-first search. We discuss the scanning step for bottom-up Earley deduction and indexing schemes that help avoid useless deduction steps.
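The data-driven, best-first flavor of bottom-up deduction can be illustrated, loosely, with a chart parser that starts from the lexical items and always expands the cheapest item on a priority agenda. This is a substitute technique, not the paper's method: it works on binary context-free rules rather than Earley deduction over definite clauses, omits the scanning and indexing schemes the abstract mentions, and treats lexical preference values as hypothetical non-negative costs (lower is better). All names and the toy grammar are illustrative.

```python
import heapq
from typing import Dict, List, Optional, Sequence, Tuple

Lexicon = Dict[str, List[Tuple[str, float]]]  # word -> [(category, cost)]
Rule = Tuple[str, str, str, float]            # (parent, left, right, rule cost)


def best_first_parse(words: Sequence[str], lexicon: Lexicon, rules: List[Rule],
                     goal: str = "S") -> Optional[float]:
    """Return the cost of the cheapest goal item spanning all words, or None."""
    n = len(words)
    agenda: List[Tuple[float, str, int, int]] = []
    # Data-driven start: every lexical reading of every word seeds the agenda.
    for i, w in enumerate(words):
        for cat, cost in lexicon.get(w, []):
            heapq.heappush(agenda, (cost, cat, i, i + 1))
    # Finalized items. With non-negative costs the first pop of an item is its
    # cheapest derivation (as in uniform-cost search), so later pops are skipped.
    best: Dict[Tuple[str, int, int], float] = {}
    while agenda:
        cost, cat, i, j = heapq.heappop(agenda)
        if (cat, i, j) in best:
            continue
        best[(cat, i, j)] = cost
        if cat == goal and i == 0 and j == n:
            return cost
        for parent, left, right, rcost in rules:
            if left == cat:  # new item is a left daughter: find adjacent right daughters
                for (c2, i2, j2), cost2 in best.items():
                    if c2 == right and i2 == j:
                        heapq.heappush(agenda, (cost + cost2 + rcost, parent, i, j2))
            if right == cat:  # new item is a right daughter: find adjacent left daughters
                for (c2, i2, j2), cost2 in best.items():
                    if c2 == left and j2 == i:
                        heapq.heappush(agenda, (cost2 + cost + rcost, parent, i2, j))
    return None


# Hypothetical toy grammar: "barks" is preferably a verb (cost 0.0) over a noun (1.0).
LEXICON: Lexicon = {"the": [("Det", 0.0)], "dog": [("N", 0.0)],
                    "barks": [("V", 0.0), ("N", 1.0)]}
RULES: List[Rule] = [("NP", "Det", "N", 0.0), ("S", "NP", "V", 0.0),
                     ("NP", "NP", "N", 0.5)]
```

Because the cheapest item is always expanded first, the preferred verb reading of "barks" yields the sentence analysis before the dispreferred noun reading is ever combined.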
Can Recurrent Neural Networks Learn Natural Language Grammars?
In Proceedings of the IEEE International Conference on Neural Networks, 1996
Cited by 5 (1 self)
Recurrent neural networks are complex parametric dynamic systems that can exhibit a wide range of behaviors. We consider the task of grammatical inference with recurrent neural networks. Specifically, we consider the task of classifying natural language sentences as grammatical or ungrammatical: can a recurrent neural network be made to exhibit the same kind of discriminatory power provided by the Principles and Parameters linguistic framework, or Government and Binding theory? We attempt to train a network, without the bifurcation into learned vs. innate components assumed by Chomsky, to produce the same judgments as native speakers on sharply grammatical/ungrammatical data. We consider how a recurrent neural network could possess linguistic capability, and investigate the properties of Elman, Narendra & Parthasarathy (N&P) and Williams & Zipser (W&Z) recurrent networks, and Frasconi-Gori-Soda (FGS) locally recurrent networks in this setting. We show that both Elman...
Head Corner Parsing
Constraints, Language and Computation, 1994
Cited by 4 (1 self)
I describe a head-driven parser for a class of grammars that handle discontinuous constituency by a richer notion of string combination than ordinary concatenation. The parser is a generalization of the left-corner parser and can be used for grammars written in powerful formalisms such as non-concatenative versions of UCG and HPSG.
Lexicalized non-local MCTAG with dominance links is NP-complete
Cited by 3 (0 self)
An NP-hardness proof for nonlocal MCTAG by Rambow and Satta (1992), based on Dahlhaus and Warmuth (1986), is extended to some restrictions of that formalism. It is found that there are NP-hard grammars among nonlocal MCTAGs even if the following restrictions are imposed: every tree in every tree set has a lexical anchor; every tree set may contain at most two trees; in every such tree set, there is a dominance link between the foot node of one tree and the root node of the other tree and this dominance link must be obeyed in the derived tree. This is the version of MCTAG used in Becker, Joshi, and Rambow (1991). The lexicalization restriction makes the grammar class NP-complete.
Predictive Head-Corner Chart Parsing
1993
Cited by 1 (0 self)
Head-Corner (HC) parsing emerged in computational linguistics a few years ago, motivated by linguistic arguments. This idea is a heuristic rather than a fail-safe principle, so it is relevant to consider the worst-case behaviour of the HC parser. We define a novel predictive head-corner chart parser of cubic time complexity. We start with a left-corner (LC) chart parser, which is easier to understand, and then generalize it to an HC chart parser. We briefly sketch how the parser can be enhanced with feature structures.
1. Introduction
"Our Latin teachers were apparently right", Martin Kay (1989) remarks. "You should start [parsing] with the main verb. This will tell you what kinds of subjects and objects to look for and what cases they will be in. When you come to look for these, you should also start by trying to find the main word, because this will tell you most about what else to look for". Head-driven or head-corner parsing has been ...
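The left-corner strategy that the abstract starts from can be sketched as a plain backtracking recognizer: each constituent is found bottom-up from its first daughter (its left corner) and then extended rightward until the predicted goal category is reached. This is a minimal illustration under stated assumptions, not the paper's chart parser: it keeps no chart, so it can take exponential time in the worst case rather than cubic time; it has no left-corner link table for top-down filtering; and it assumes no empty categories and no cycles of unary rules. The grammar, lexicon, and function names are hypothetical.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

Rule = Tuple[str, Tuple[str, ...]]  # (parent, daughters); daughters must be non-empty


def lc_recognize(words: Sequence[str], grammar: List[Rule],
                 lexicon: Dict[str, List[str]], start: str = "S") -> bool:
    n = len(words)

    def parse(goal: str, i: int) -> Iterator[int]:
        """Yield every j such that words[i:j] can be analysed as goal."""
        if i < n:
            for cat in lexicon.get(words[i], []):
                # The lexical category is the bottom-most left corner.
                yield from climb(cat, goal, i + 1)

    def parse_seq(cats: Tuple[str, ...], i: int) -> Iterator[int]:
        """Yield every end position for the category sequence cats starting at i."""
        if not cats:
            yield i
        else:
            for j in parse(cats[0], i):
                yield from parse_seq(cats[1:], j)

    def climb(found: str, goal: str, j: int) -> Iterator[int]:
        """A `found` constituent ends at j; project it upward toward goal."""
        if found == goal:
            yield j
        for parent, daughters in grammar:
            if daughters[0] == found:  # found is the left corner of this rule
                for k in parse_seq(daughters[1:], j):
                    yield from climb(parent, goal, k)

    return any(j == n for j in parse(start, 0))


GRAMMAR: List[Rule] = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")),
    ("NP", ("NP", "PP")),  # left-recursive: a naive top-down parser would loop here
    ("PP", ("P", "NP")),
    ("VP", ("V", "NP")),
    ("VP", ("V",)),
]
LEXICON = {"the": ["Det"], "dog": ["N"], "cat": ["N"], "park": ["N"],
           "saw": ["V"], "sleeps": ["V"], "in": ["P"]}
```

Because every rule is entered only after its left corner has consumed input, the left-recursive `NP -> NP PP` rule terminates here; the head-corner generalization replaces "first daughter" with "head daughter", which this sketch does not attempt.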
Remarks on Binding Theory
2005
We propose some reformulations of binding principle A that build on recent work by Pollard and Xue, and by Runner et al. We then turn to the thorny issue of the status of indices, in connection with the seemingly simpler Principle B. We conclude that the notion of index is fundamentally incoherent, and suggest some possible approaches to eliminating indices as theoretical primitives. One possibility is to let logical variables take up the explanatory burden borne by indices, but this turns out to be fraught with difficulties. Another approach, which involves returning to the idea that referentially dependent expressions denote identity functions (as proposed, independently, by Pollard and Sag and by Jacobson), seems to hold more promise.

