Results 1 - 5 of 5
A Maximum Entropy Approach to Adaptive Statistical Language Modeling
Computer Speech and Language, 1996
Abstract
Cited by 242 (11 self)
An adaptive statistical language model is described, which successfully integrates long-distance linguistic information with other knowledge sources. Most existing statistical language models exploit only the immediate history of a text. To extract information from further back in the document's history, we propose and use trigger pairs as the basic information-bearing elements. This allows the model to adapt its expectations to the topic of discourse. Next, statistical evidence from multiple sources must be combined. Traditionally, linear interpolation and its variants have been used, but these are shown here to be seriously deficient. Instead, we apply the principle of Maximum Entropy (ME). Each information source gives rise to a set of constraints, to be imposed on the combined estimate. The intersection of these constraints is the set of probability functions which are consistent with all the information sources. The function with the highest entropy within that set is the ME solution...
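The trigger pairs described in this abstract are selected by how much information one word's presence in the history carries about another word occurring later. A minimal sketch of that idea, using document-level pointwise mutual information as a stand-in for the paper's average mutual information criterion (the toy corpus and candidate pairs below are illustrative assumptions, not the paper's data):

```python
from math import log2

# Hypothetical toy "documents"; corpus and trigger candidates are
# illustrative assumptions, not the paper's setup.
docs = [
    "the stock market fell as shares of the bank dropped".split(),
    "the bank raised rates and the stock price fell".split(),
    "the chef cooked pasta while the bank stayed closed".split(),
    "rain fell on the garden and the chef smiled".split(),
]

def trigger_score(a, b, docs):
    """Pointwise mutual information between 'a appears in a document'
    and 'b appears in the same document' -- a simple proxy for the
    average mutual information used to rank candidate trigger pairs."""
    n = len(docs)
    has_a = sum(a in d for d in docs)
    has_b = sum(b in d for d in docs)
    both = sum(a in d and b in d for d in docs)
    if both == 0:
        return float("-inf")
    return log2((both / n) / ((has_a / n) * (has_b / n)))

# 'stock' in the history should be a better trigger for 'fell'
# than the topically unrelated 'chef'
print(trigger_score("stock", "fell", docs))
print(trigger_score("chef", "fell", docs))
```

High-scoring pairs would then contribute constraints to the combined ME estimate, alongside the conventional n-gram constraints.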
Statistical Language Modeling Using a Variable Context Length
In Proc. Int. Conf. Spoken Language Processing, 1996
Abstract
Cited by 2 (0 self)
In this paper we investigate statistical language models with a variable context length. For such models, the number of relevant words in a context is not fixed, as in conventional M-gram models, but depends on the context itself. We develop a measure for the quality of variable-length models and present a pruning algorithm for the creation of such models, based on this measure. Further, we address the question of how the use of a special backing-off distribution can improve the language models. Experiments were performed...
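One common way to realize such variable-length models is to keep a long context only when its next-word distribution differs enough from that of its shortened (backed-off) context. A minimal sketch of that pruning decision, using count-weighted KL divergence as the quality measure (the toy corpus, the threshold value, and the specific measure are illustrative assumptions, not necessarily the paper's):

```python
from collections import Counter
from math import log2

# Illustrative toy corpus; real models would use large text collections.
corpus = "a b a b a c a b a b a c a b".split()

def ngram_dist(corpus, context):
    """Distribution over the next word following `context` (a tuple),
    plus the number of times the context occurred."""
    k = len(context)
    followers = Counter(
        corpus[i + k]
        for i in range(len(corpus) - k)
        if tuple(corpus[i:i + k]) == context
    )
    total = sum(followers.values())
    return {w: c / total for w, c in followers.items()}, total

def keep_context(corpus, context, threshold=0.02):
    """Keep the long context only if it adds enough information
    over its suffix (the backed-off, shortened context)."""
    long_dist, count = ngram_dist(corpus, context)
    short_dist, _ = ngram_dist(corpus, context[1:])
    # KL(long || short), weighted by how often the long context occurs
    kl = sum(p * log2(p / short_dist[w]) for w, p in long_dist.items())
    weight = count / len(corpus)
    return weight * kl > threshold

# ("a","b") predicts the same follower as ("b",) alone, so it is pruned;
# ("b","a") changes the follower distribution, so it is kept.
print(keep_context(corpus, ("a", "b")))
print(keep_context(corpus, ("b", "a")))
```

Every follower of the long context is also a follower of its suffix, so the KL term is always well defined.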
Rare Probability Estimation under Regularly Varying Heavy Tails
Abstract
This paper studies the problem of estimating the probability of symbols that have occurred very rarely, in samples drawn independently from an unknown, possibly infinite, discrete distribution. In particular, we study the multiplicative consistency of estimators, defined as the ratio of the estimate to the true quantity converging to one. We first show that the classical Good-Turing estimator is not universally consistent in this sense, despite enjoying favorable additive properties. We then use Karamata’s theory of regular variation to prove that regularly varying heavy tails are sufficient for consistency. At the core of this result is a multiplicative concentration that we establish both by extending the McAllester-Ortiz additive concentration for the missing mass to all rare probabilities and by exploiting regular variation. We also derive a family of estimators which, in addition to being consistent, address some of the shortcomings of the Good-Turing estimator. For example, they perform smoothing implicitly and have the absolute discounting structure of many heuristic algorithms. This also establishes a discrete parallel to extreme value theory, and many of the techniques therein can be adapted to the framework that we set forth.
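The classical Good-Turing estimator that this abstract takes as its starting point adjusts the count of a symbol seen r times to r* = (r+1) N_{r+1} / N_r, where N_r is the number of distinct symbols seen exactly r times, and assigns total mass N_1/N to unseen symbols. A minimal sketch (the sample and the fallback for empty count-of-counts buckets are illustrative choices; practical implementations smooth N_r first):

```python
from collections import Counter

# Illustrative sample: a:5, b:3, c:2, d:1, e:1  (N = 12 draws)
sample = list("aaaaabbbccde")

def good_turing(sample):
    """Good-Turing probability estimates and the missing mass N_1/N."""
    counts = Counter(sample)
    n = len(sample)
    # N_r: number of distinct symbols with frequency exactly r
    freq_of_freqs = Counter(counts.values())
    probs = {}
    for sym, r in counts.items():
        if freq_of_freqs.get(r + 1, 0) > 0:
            # adjusted count r* = (r + 1) * N_{r+1} / N_r
            r_star = (r + 1) * freq_of_freqs[r + 1] / freq_of_freqs[r]
        else:
            r_star = r  # fall back to the raw count when N_{r+1} = 0
        probs[sym] = r_star / n
    missing_mass = freq_of_freqs.get(1, 0) / n  # total mass of unseen symbols
    return probs, missing_mass

probs, p0 = good_turing(sample)
print(p0)  # N_1 / N = 2/12
```

Note how the singletons d and e are discounted from raw 1/12 via r* = 2 * N_2 / N_1 = 1, which is the implicit-smoothing behavior the abstract's proposed estimators make explicit with absolute discounting.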