A Whole Sentence Maximum Entropy Language Model (1997) [23 citations — 6 self]

by R. Rosenfeld
Proceedings of the IEEE Workshop on Speech Recognition and Understanding

Abstract:

We introduce a new kind of language model, which models whole sentences or utterances directly using the Maximum Entropy paradigm. The new model is conceptually simpler, and more naturally suited to modeling whole-sentence phenomena, than the conditional ME models proposed to date. By avoiding the chain rule, the model treats each sentence or utterance as a "bag of features", where features are arbitrary computable properties of the sentence. The model is unnormalizable, but this does not interfere with training (done via sampling) or with use. Using the model is computationally straightforward. The main computational cost of training the model is in generating sample sentences from a Gibbs distribution. Interestingly, this cost has different dependencies, and is potentially lower, than in the comparable conditional ME model.

1 Motivation

Conventional statistical language models estimate the probability of a sentence s by using the chain rule to decompose it into a product of condit...
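The abstract's "bag of features" view, and its claim that the missing normalization does not interfere with use, can be illustrated with a minimal sketch. It assumes the usual exponential form for such models, an unnormalized score of log p0(s) plus a weighted sum of feature values; the abstract itself does not spell this form out. The feature functions, the weights, and the names `log_score` and `rank` are hypothetical and chosen only for illustration.

```python
# Hypothetical whole-sentence features: each maps a full token list to a number.
def length_over_ten(tokens):
    return 1.0 if len(tokens) > 10 else 0.0

def has_repeated_word(tokens):
    return 1.0 if len(set(tokens)) < len(tokens) else 0.0

FEATURES = [length_over_ten, has_repeated_word]
WEIGHTS = [-0.5, -1.2]  # illustrative values, not trained parameters


def log_score(tokens, log_p0=0.0):
    """Unnormalized log-score: log p0(s) + sum_i lambda_i * f_i(s).

    log_p0 stands in for a baseline model's log-probability (assumed form).
    The normalization constant is never computed.
    """
    return log_p0 + sum(w * f(tokens) for w, f in zip(WEIGHTS, FEATURES))


def rank(hypotheses):
    """Order candidate sentences by unnormalized score, best first.

    The unknown normalizer is the same for every sentence, so it
    cannot change the ordering.
    """
    return sorted(hypotheses, key=log_score, reverse=True)
```

Because every candidate shares the same unknown normalizer, comparing sentences (for example, reordering a list of recognition hypotheses) needs only the unnormalized scores.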

Citations

2439 Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images – Geman, Geman - 1984
628 A Maximum Entropy Approach to Natural Language Processing – Berger, Della Pietra, et al. - 1996
362 Inducing features of random fields – Della Pietra, Della Pietra, et al. - 1997
311 Information theory and statistical mechanics – Jaynes - 1957
295 Generalized iterative scaling for log-linear models – Darroch, Ratcliff - 1972
152 A maximum entropy approach to adaptive statistical language modeling – Rosenfeld - 1996
89 A Maximum Entropy Model for Prepositional Phrase Attachment – Ratnaparkhi, Reynar, et al. - 1994
65 Trigger-based language models: a maximum entropy approach – Lau, Rosenfeld, et al. - 1993
26 Adaptive language modeling using minimum discriminant estimation – Della Pietra, Della Pietra, et al. - 1992