The concept of maximum entropy can be traced back along multiple threads to Biblical times. Only recently, however, have computers become powerful enough to permit the wide-scale application of this concept to real-world problems in statistical estimation and pattern recognition. In this paper we describe a method for statistical modeling based on maximum entropy. We present a maximum-likelihood approach for automatically constructing maximum entropy models and describe how to implement this approach efficiently, using as examples several problems in natural language processing.