This thesis demonstrates that several important kinds of natural language ambiguity can be resolved to state-of-the-art accuracy using a single statistical modeling technique based on the principle of maximum entropy. We discuss sentence boundary detection, part-of-speech tagging, prepositional phrase attachment, natural language parsing, and text categorization within the maximum entropy framework. In practice, we have found that maximum entropy models offer the following advantages:

State-of-the-art Accuracy: The probability models for all of the tasks discussed perform at or near state-of-the-art accuracy, or outperform competing learning algorithms when trained and tested under similar conditions. Methods that outperform those presented here require much more supervision, in the form of additional human involvement or additional supporting res...
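The abstract names the modeling technique but does not show its form. The sketch below is not the thesis's own code. It is a minimal conditional maximum entropy classifier of the form p(a | b) = exp(sum_j lambda_j f_j(a, b)) / Z(b), fit with Generalized Iterative Scaling (GIS), a standard estimation procedure for such models.

Several details are illustrative assumptions rather than anything stated in the abstract:

- the feature scheme (binary indicators pairing a context predicate with an outcome);
- the class name `MaxEnt`;
- the iteration count;
- the toy tagging data.

```python
import math
from collections import defaultdict


class MaxEnt:
    """Sketch of a conditional maxent model p(a | b) trained with GIS."""

    def __init__(self, events, iterations=200):
        # events: list of (predicates, outcome); predicates is a list of
        # strings describing context b, outcome is the label a.
        self.outcomes = sorted({a for _, a in events})
        # Assumed feature scheme: f(a, b) = 1 iff predicate p is in b and the
        # outcome is a; only (p, a) pairs seen in training are kept.
        self.features = {(p, a) for b, a in events for p in b}
        # GIS requires sum_j f_j(a, b) == C for every (a, b). A correction
        # feature (key None) makes up the difference; the +1 keeps its
        # value, and hence its empirical expectation, strictly positive.
        self.C = 1 + max(len(self._active(b, a))
                         for b, _ in events for a in self.outcomes)
        self.weights = defaultdict(float)
        self._train(events, iterations)

    def _active(self, b, a):
        return [(p, a) for p in set(b) if (p, a) in self.features]

    def _feature_values(self, b, a):
        act = self._active(b, a)
        values = {f: 1.0 for f in act}
        values[None] = float(self.C - len(act))
        return values

    def prob(self, b):
        """Return {outcome: p(outcome | b)} via a normalized exponential."""
        scores = {
            a: sum(self.weights[f] * v
                   for f, v in self._feature_values(b, a).items())
            for a in self.outcomes
        }
        m = max(scores.values())  # subtract max for numerical stability
        exps = {a: math.exp(s - m) for a, s in scores.items()}
        z = sum(exps.values())
        return {a: e / z for a, e in exps.items()}

    def _train(self, events, iterations):
        n = len(events)
        empirical = defaultdict(float)
        for b, a in events:
            for f, v in self._feature_values(b, a).items():
                empirical[f] += v / n
        for _ in range(iterations):
            # Model expectation of each feature over the training contexts.
            expected = defaultdict(float)
            for b, _ in events:
                p = self.prob(b)
                for a in self.outcomes:
                    for f, v in self._feature_values(b, a).items():
                        expected[f] += p[a] * v / n
            # GIS update: lambda_j += (1 / C) * log(E_emp[f_j] / E_model[f_j]).
            for f, emp in empirical.items():
                self.weights[f] += math.log(emp / expected[f]) / self.C


# Hypothetical toy part-of-speech events, for illustration only.
events = [
    (["w=the"], "DT"),
    (["w=dog", "prev=DT"], "NN"),
    (["w=runs", "prev=NN"], "VBZ"),
    (["w=the"], "DT"),
    (["w=cat", "prev=DT"], "NN"),
]
model = MaxEnt(events)
print(model.prob(["prev=DT"]))  # unseen word after a determiner
```

The correction feature is the usual device for satisfying GIS's constant-feature-sum requirement. Because it shifts every outcome's score by the same amount in a context with no active features, such a context receives a uniform distribution.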