A Whole Sentence Maximum Entropy Language Model (1997) [23 citations — 6 self]
Abstract:
We introduce a new kind of language model, which models whole sentences or utterances directly using the Maximum Entropy paradigm. The new model is conceptually simpler, and more naturally suited to modeling whole-sentence phenomena, than the conditional ME models proposed to date. By avoiding the chain rule, the model treats each sentence or utterance as a "bag of features", where features are arbitrary computable properties of the sentence. The model is unnormalizable, but this does not interfere with training (done via sampling) or with use. Using the model is computationally straightforward. The main computational cost of training the model is in generating sample sentences from a Gibbs distribution. Interestingly, this cost has different dependencies than, and is potentially lower than, the cost in the comparable conditional ME model.
1 Motivation
Conventional statistical language models estimate the probability of a sentence s by using the chain rule to decompose it into a product of condit...
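The "bag of features" scoring described in the abstract can be illustrated with a minimal sketch. It assumes the usual exponential (Gibbs) form, in which a sentence's unnormalized log-score is log p0(s) plus a weighted sum of feature values; the abstract does not spell this form out. The feature functions, weights, and the uniform baseline `uniform_log_p0` are hypothetical and chosen only for illustration. Because two sentences share the same normalization constant, they can be compared without ever computing it, which is one way to read the claim that the lack of normalization does not interfere with use.

```python
import math
from typing import Callable, Dict, List

Sentence = List[str]
Feature = Callable[[Sentence], float]


def uniform_log_p0(sentence: Sentence, vocab_size: int = 1000) -> float:
    """Hypothetical baseline: every token is equally likely (an assumption)."""
    return -len(sentence) * math.log(vocab_size)


def log_score(
    sentence: Sentence,
    features: Dict[str, Feature],
    weights: Dict[str, float],
    log_p0: Callable[[Sentence], float] = uniform_log_p0,
) -> float:
    """Unnormalized log-score: log p0(s) + sum_i lambda_i * f_i(s)."""
    return log_p0(sentence) + sum(
        weights[name] * f(sentence) for name, f in features.items()
    )


# Hypothetical whole-sentence features: any computable property of the sentence.
features: Dict[str, Feature] = {
    "has_question_mark": lambda s: 1.0 if "?" in s else 0.0,
    "longer_than_5": lambda s: 1.0 if len(s) > 5 else 0.0,
}
# Hypothetical weights; in the paper's setting these would be trained via sampling.
weights = {"has_question_mark": 0.7, "longer_than_5": -0.3}

a = ["is", "it", "raining", "?"]
b = ["is", "it", "raining", "."]

# The normalization constant cancels in the difference of log-scores.
log_ratio = log_score(a, features, weights) - log_score(b, features, weights)
print(f"log p(a) - log p(b) = {log_ratio:.3f}")
```

Only the difference of scores is used here, so the sketch never needs the sum over all possible sentences.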
Citations
| 2439 | Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images – Geman, Geman - 1984 |
| 628 | A Maximum Entropy Approach to Natural Language Processing – Berger, Della Pietra, et al. - 1996 |
| 362 | Inducing features of random fields – Della Pietra, Della Pietra, et al. - 1997 |
| 311 | Information theory and statistical mechanics – Jaynes - 1957 |
| 295 | Generalized iterative scaling for log-linear models – Darroch, Ratcliff - 1972 |
| 152 | A maximum entropy approach to adaptive statistical language modeling – Rosenfeld - 1996 |
| 89 | A Maximum Entropy Model for Prepositional Phrase Attachment – Ratnaparkhi, Reynar, et al. - 1994 |
| 65 | Trigger-based language models: a maximum entropy approach – Lau, Rosenfeld, et al. - 1993 |
| 26 | Adaptive language modeling using minimum discriminant estimation – Della Pietra, Della Pietra, et al. - 1992 |

