## On the Applicability of Neural Network and Machine Learning Methodologies to Natural Language Processing (1995)

Citations: 8 (3 self)

### BibTeX

    @MISC{Lawrence95onthe,
      author = {Steve Lawrence and C. Lee Giles and Sandiway Fong},
      title = {On the Applicability of Neural Network and Machine Learning Methodologies to Natural Language Processing},
      year = {1995}
    }

### Abstract

We examine the inductive inference of a complex grammar - specifically, we consider the task of training a model to classify natural language sentences as grammatical or ungrammatical, thereby exhibiting the same kind of discriminatory power provided by the Principles and Parameters linguistic framework, or Government-and-Binding theory. We investigate the following models: feed-forward neural networks, Frasconi-Gori-Soda and Back-Tsoi locally recurrent networks, Elman, Narendra & Parthasarathy, and Williams & Zipser recurrent networks, Euclidean and edit-distance nearest-neighbors, simulated annealing, and decision trees. The feed-forward neural networks and non-neural-network machine learning models are included primarily for comparison. We address the question: How can a neural network, with its distributed nature and gradient-descent-based iterative calculations, possess linguistic capability which is traditionally handled with symbolic computation and recursive processes? Initial...

### Citations

5229 | C4.5: Programs for Machine Learning - Quinlan - 1992 |

3944 | Neural Networks: A Comprehensive Foundation - Haykin - 1998 |

Citation Context: ...recurrent network using the last two words as input to the model. 1. Weight initialization. Random weights are initialized with the goal of ensuring that the sigmoids do not start out in saturation (Haykin 1994). In addition, several sets of random weights are tested and the set which provides the best performance on the training data is chosen. 2. Learning rate schedule. Relatively high learning rates a...
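
The two heuristics quoted here (small initial weights that keep the sigmoids out of saturation, plus keeping the best of several random initializations) are easy to sketch. A minimal illustration in Python/NumPy; `init_weights`, `best_of_n_restarts`, the layer shape, and the `1/sqrt(n_in)` scale are assumptions for the example, not the paper's code.

```python
import numpy as np

def init_weights(n_in, n_out, rng, scale=None):
    """Small uniform weights so sigmoid pre-activations start near zero,
    i.e. away from the flat, saturated tails (cf. the Haykin reference)."""
    if scale is None:
        scale = 1.0 / np.sqrt(n_in)  # assumed heuristic for unit-scaled inputs
    return rng.uniform(-scale, scale, size=(n_in, n_out))

def best_of_n_restarts(train_and_score, n_restarts=5, seed=0):
    """Test several sets of random weights and keep the set that performs
    best on the training data, as the excerpt describes."""
    rng = np.random.default_rng(seed)
    best_w, best_score = None, -np.inf
    for _ in range(n_restarts):
        w = init_weights(10, 1, rng)      # hypothetical layer shape
        score = train_and_score(w)        # caller trains and returns accuracy
        if score > best_score:
            best_w, best_score = w, score
    return best_w
```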

1853 | Introduction to the Theory of Neural Computation - Hertz, Krogh, et al. - 1991 |

1621 | Finding structure in time - Elman - 1990 |

Citation Context: ...$l = 0, 1, \ldots, L$ (layer), and $y^l_k\big|_{k=0} = 1$ (bias). 4. Williams and Zipser. A fully recurrent network as described in (Williams & Zipser 1989). 5. Elman. A simple recurrent network as described in (Elman 1990, Elman 1991). Initially, partial success was only obtained with models employing a large temporal input window. We were unable to train the networks using a small temporal window although it is theor...
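
The Elman architecture named in points 4-5 feeds the hidden state back in as a "context" input at the next time step, which is what lets a small temporal window carry longer-range structure in principle. A minimal forward-pass sketch, assuming sigmoid units and a single grammaticality output; the class name, shapes, and weight scale are hypothetical, and training (e.g. backpropagation through time) is omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ElmanRNN:
    """Simple recurrent network in the style of Elman (1990): the previous
    hidden state acts as an extra 'context' input at each step."""
    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        s = 0.1  # small initial weights, avoiding sigmoid saturation
        self.W_xh = rng.uniform(-s, s, (n_in, n_hidden))      # input -> hidden
        self.W_hh = rng.uniform(-s, s, (n_hidden, n_hidden))  # context -> hidden
        self.W_hy = rng.uniform(-s, s, (n_hidden, n_out))     # hidden -> output
        self.n_hidden = n_hidden

    def forward(self, xs):
        """xs: sequence of input vectors, e.g. one word encoding per step.
        Returns the output after the final word (e.g. P(grammatical))."""
        h = np.zeros(self.n_hidden)
        for x in xs:
            h = sigmoid(x @ self.W_xh + h @ self.W_hh)
        return sigmoid(h @ self.W_hy)
```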

1220 | Information Theory and Statistics - Kullback - 1968 |

845 | Lectures on Government and Binding - Chomsky - 1981 |

823 | C4.5: Programs for Machine Learning - Quinlan - 1993 |

Citation Context: ...nsider how you would define the cost for deleting a noun without knowing the context in which it appears. 5 Decision Tree Methods. We tested the C4.5 decision tree induction algorithm by Ross Quinlan (Quinlan 1993). Decision tree methods construct a tree which partitions the data at each level in the tree based on a particular feature of the data. C4.5 only deals with strings of constant length and we used an...
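
As a rough modern stand-in for the experiment described above: scikit-learn's `DecisionTreeClassifier` implements CART rather than C4.5, but with the entropy criterion it likewise splits on one feature per node by information gain. The fixed-length integer encoding and the toy data below are invented purely for illustration.

```python
from sklearn.tree import DecisionTreeClassifier

# Each row is a sentence padded to a constant length, one integer
# category code per word position (hypothetical encoding; C4.5 likewise
# required fixed-length feature vectors).
X = [
    [1, 4, 2, 0],   # e.g. noun verb det (pad)
    [1, 4, 2, 3],
    [4, 1, 0, 0],
    [2, 2, 4, 1],
]
y = [1, 1, 0, 0]    # 1 = grammatical, 0 = ungrammatical

clf = DecisionTreeClassifier(criterion="entropy")  # information gain, as in C4.5
clf.fit(X, y)
print(clf.predict([[1, 4, 2, 3]]))  # classify an unseen encoded sentence
```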

514 | Stochastic Complexity - Rissanen - 1989 |

512 | Identification and Control of Dynamical Systems Using Neural Networks - Narendra, Parthasarathy - 1990 |

429 | A learning algorithm for continually running fully recurrent neural networks - Williams, Zipser - 1989 |

360 | Three models for the description of language - Chomsky - 1956 |

330 | Introduction to Formal Language Theory - Harrison - 1978 |

329 | Distributed representations, simple recurrent networks, and grammatical structure - Elman - 1991 |

Citation Context: ...hen & Lee 1992). Do neural networks possess the power required for the task at hand? Yes, it has been shown that recurrent networks have the representational power required for hierarchical solutions (Elman 1991), and that they are Turing equivalent (Siegelmann & Sontag 1992). However, only recently has any work been successful with moderately large grammars. Recurrent neural networks have been used for seve...

275 | Inside-Outside Reestimation from Partially Bracketed Corpora - Pereira, Schabes - 1992 |

Citation Context: ...uage models have been based on finite-state descriptions such as n-grams or hidden Markov models. However, finite-state models cannot represent the hierarchical structures found in natural language (Pereira 1992). In the past few years several recurrent neural network architectures have emerged which have been used for grammatical inference (Cleeremans, Servan-Schreiber & McClelland 1989, Giles, Sun, Chen, L...

211 | Syntactic Pattern Recognition and Applications - Fu - 1982 |

211 | The Induction of Dynamical Recognizers - Pollack - 1991 |

188 | Very fast simulated re-annealing - Ingber - 1989 |

Citation Context: ...same model as those models successfully trained to 100% correct training set classification using backpropagation through time. We have used the adaptive simulated annealing package by Lester Ingber (Ingber 1989, Ingber 1993). We have obtained no significant results from simulated annealing trials. Currently, the best simulated annealing trial has obtained an NMSE of 1.2 after two days of execution on a Sili...
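
For orientation, here is the generic simulated-annealing loop such a trial runs, minimizing some cost such as the NMSE of a network viewed as a function of its weight vector. This is a plain Metropolis-style sketch, not Ingber's ASA (which adapts per-parameter temperature schedules); the names and cooling parameters are assumptions.

```python
import numpy as np

def simulated_annealing(cost, w0, n_steps=10_000, t0=1.0, decay=0.999, seed=0):
    """Minimize cost(w) by randomly perturbing w and accepting uphill
    moves with Boltzmann probability exp(-delta/T) under a cooling T."""
    rng = np.random.default_rng(seed)
    w, c = w0.copy(), cost(w0)
    best_w, best_c = w.copy(), c
    t = t0
    for _ in range(n_steps):
        cand = w + t * rng.standard_normal(w.shape)  # step size shrinks with T
        cc = cost(cand)
        if cc < c or rng.random() < np.exp((c - cc) / max(t, 1e-12)):
            w, c = cand, cc
            if c < best_c:
                best_w, best_c = w.copy(), c
        t *= decay  # geometric cooling schedule (an assumption)
    return best_w, best_c
```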

172 | Learning and extracting finite state automata with second-order recurrent neural networks - Giles, Miller, et al. - 1992 |

161 | On the computational power of neural nets - Siegelmann, Sontag - 1995 |

153 | Finite state automata and simple recurrent networks - Cleeremans, Servan-Schreiber, et al. - 1989 |

136 | Tree adjoining grammars: How much context-sensitivity is required to provide reasonable structural descriptions? In Natural Language Parsing, pp. 206-250 - Joshi - 1985 |

126 | Learning and applying contextual constraints in sentence comprehension - St. John, McClelland - 1990 |

124 | Experiments in Induction - Hunt, Marin, et al. - 1966 |

121 | Gradient-based learning algorithms for recurrent neural networks and their computational complexity - Williams, Peng - 1992 |

117 | An Efficient Gradient-Based Algorithm for On-line Training of Recurrent Neural Network Trajectories - Williams, Peng - 1990 |

97 | An Overview of Sequence Comparison - Kruskal - 1983 |

Citation Context: ...he distance between the two complete sequences. i and j range from 0 to the length of the respective sequences and the superscripts denote sequences of the corresponding length. For more details see (Kruskal 1983).

$$
d(a^i, b^j) = \min \begin{cases}
d(a^{i-1}, b^j) + w(a_i, 0) & \text{deletion of } a_i \\
d(a^{i-1}, b^{j-1}) + w(a_i, b_j) & b_j \text{ replaces } a_i \\
d(a^i, b^{j-1}) + w(0, b_j) & \text{insertion of } b_j
\end{cases}
$$
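
The recurrence quoted above is the standard weighted edit distance; in dynamic-programming form it fills an (i, j) table of prefix distances. A short sketch follows. The uniform default costs are a simplifying assumption; the surrounding excerpt's point is precisely that sensible symbol-dependent costs $w(a_i, b_j)$ are hard to define for natural language.

```python
def edit_distance(a, b, w_del=1, w_ins=1, w_sub=1):
    """d[i][j] = cheapest way to turn the first i symbols of a into the
    first j symbols of b via the three moves in the recurrence above.
    Uniform costs here; the general form uses w(a_i, 0), w(0, b_j), w(a_i, b_j)."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * w_del                  # delete all of a's prefix
    for j in range(1, n + 1):
        d[0][j] = j * w_ins                  # insert all of b's prefix
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if a[i - 1] == b[j - 1] else w_sub
            d[i][j] = min(
                d[i - 1][j] + w_del,         # deletion of a_i
                d[i - 1][j - 1] + sub,       # b_j replaces a_i
                d[i][j - 1] + w_ins,         # insertion of b_j
            )
    return d[m][n]

print(edit_distance("kitten", "sitting"))    # -> 3
```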

88 | Computation at the Onset of Chaos - Crutchfield, Young - 1990 |

88 | Paths and Categories - Pesetsky - 1982 |

82 | Induction of finite-state languages using second-order recurrent networks - Watrous, Kuhn - 1992 |

77 | FIR and IIR synapses, a new neural network architecture for time series modeling - Back, Tsoi - 1991 |

56 | Accelerated learning in layered neural networks - Solla, Levin, et al. - 1988 |

52 | Knowledge of Language: Its Nature, Origin, and Use - Chomsky - 1986 |

Citation Context: ...nguage is: How do people unfailingly manage to acquire such a complex rule system? A system so complex that it has to date resisted the efforts of linguists to adequately describe in a formal system (Chomsky 1986)? Here, we will provide a couple of examples of the kind of knowledge native speakers often take for granted. For instance, any native speaker of English knows that the adjective eager obligatorily t...

51 | Language learning: Cues or rules - MacWhinney, Leinbach, et al. - 1989 |

49 | Supervised learning of probability distributions by neural networks - Baum, Wilczek - 1988 |

45 | Learning finite state machines with self-clustering recurrent networks - Zeng, Goodman, et al. - 1993 |

44 | Note on learning rate schedules for stochastic optimization - Darken, Moody - 1990 |

43 | Discovering rules from large collections of examples: a case study - Quinlan - 1979 |

42 | Towards faster stochastic gradient search - Darken, Moody - 1991 |

40 | Local feedback multilayered networks - Frasconi, Gori, et al. - 1992 |

39 | Higher order recurrent networks and grammatical inference - Giles, Sun, et al. - 1990 |

38 | An Experimental Comparison of Recurrent Neural Networks - Horne, Giles - 1995 |

38 | Dynamic Construction of Finite-State Automata From Examples Using Hill-Climbing - Tomita - 1982 |

37 | Extracting and learning an unknown grammar with recurrent neural networks - Giles, Miller, et al. - 1992 |

37 | Learning algorithms and probability distributions in feed-forward and feedback networks - Hopfield - 1987 |

33 | Unified integration of explicit rules and learning by example in recurrent networks - Frasconi, Gori, et al. - 1995 |

27 | Structured representations and connectionist models - Elman - 1989 |

Citation Context: ...arge grammars. Recurrent neural networks have been used for several small natural language problems, e.g. papers using the Elman network for natural language tasks include: (Stolcke 1990, Allen 1983, Elman 1984, Harris & Elman 1984, John & McClelland 1990). 2 Data. Our primary data consists of 552 English positive and negative examples taken from an introductory GB-linguistics textbook by Lasnik and Uriagerek...

25 | Adaptive Simulated Annealing (ASA) - Ingber - 1993 |

Citation Context: ...those models successfully trained to 100% correct training set classification using backpropagation through time. We have used the adaptive simulated annealing package by Lester Ingber (Ingber 1989, Ingber 1993). We have obtained no significant results from simulated annealing trials. Currently, the best simulated annealing trial has obtained an NMSE of 1.2 after two days of execution on a Silicon Graphics...

25 | A Course in GB Syntax: Lectures on Binding and Empty Categories - Lasnik, Uriagereka - 1988 |

23 | The role of similarity in Hungarian vowel harmony: a connectionist account - Hare - 1990 |

21 | Learning Feature-based Semantics with Simple Recurrent Networks - Stolcke - 1990 |

Citation Context: ...ccessful with moderately large grammars. Recurrent neural networks have been used for several small natural language problems, e.g. papers using the Elman network for natural language tasks include: (Stolcke 1990, Allen 1983, Elman 1984, Harris & Elman 1984, John & McClelland 1990). 2 Data. Our primary data consists of 552 English positive and negative examples taken from an introductory GB-linguistics textbook...