A Hybrid Approach To Adaptive Statistical Language Modeling
Proceedings of the ARPA Workshop on Human Language Technology, 1994
Abstract

Cited by 24 (2 self)
We describe our latest attempt at adaptive language modeling. At the heart of our approach is a Maximum Entropy (ME) model which incorporates many knowledge sources in a consistent manner. The other components are a selective unigram cache, a conditional bigram cache, and a conventional static trigram. We describe the knowledge sources used to build such a model with ARPA's official WSJ corpus, and report on perplexity and word error rate results obtained with it. Then, three different adaptation paradigms are discussed, and an additional experiment, based on AP wire data, is used to compare them.

1. OVERVIEW OF ME FRAMEWORK

Using several different probability estimates to arrive at one combined estimate is a general problem that arises in many tasks. The Maximum Entropy (ME) principle has recently been demonstrated as a powerful tool for combining statistical estimates from diverse sources [1, 2, 3]. The ME principle ([4, 5]) proposes the following:

1. Reformulate the different estimates as constraints on the expectation of various functions, to be satisfied by the target (combined) estimate.
2. Among all probability distributions that satisfy these constraints, choose the one that has the highest entropy.

More specifically, for estimating a probability function $P(x)$, each constraint $i$ is associated with a constraint function $f_i(x)$ and a desired expectation $c_i$. The constraint is then written as:

$$E_P f_i \stackrel{\mathrm{def}}{=} \sum_x P(x) f_i(x) = c_i. \tag{1}$$

Given consistent constraints, a unique ME solution is guaranteed to exist, and to be of the form:

$$P(x) = \prod_i \mu_i^{f_i(x)}, \tag{2}$$

where the $\mu_i$'s are some unknown constants, to be found. Probability functions of the form (2) are called log-linear, and the family of functions defined by holding the $f_i$'s fixed and varying the $\mu_i$'s is called an exponential family.
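
As an illustration only, constraints of the form (1) and a solution of the form (2) can be sketched on a toy problem. Everything in this sketch is an assumption made for the example and does not come from the paper: the four-element domain, the two indicator constraint functions, the target expectations, and the names `fit_maxent` and `distribution`. The constants are found here by plain gradient ascent on the log-parameters, chosen for brevity; it is not claimed to be the fitting procedure used by the authors. The normalizing constant is written explicitly, although in form (2) it could equally be absorbed by a constant-valued constraint function.

```python
import math


def distribution(domain, features, lambdas):
    """Return P(x) proportional to prod_i mu_i ** f_i(x), with mu_i = exp(lambda_i)."""
    weights = [
        math.exp(sum(lam * f(x) for lam, f in zip(lambdas, features)))
        for x in domain
    ]
    z = sum(weights)  # explicit normalizer (an assumption of this sketch)
    return [w / z for w in weights]


def fit_maxent(domain, features, targets, steps=5000, lr=0.5):
    """Adjust lambda_i = log(mu_i) until E_P[f_i] matches each target c_i."""
    lambdas = [0.0] * len(features)  # all zeros = uniform distribution
    for _ in range(steps):
        probs = distribution(domain, features, lambdas)
        for i, (f, c) in enumerate(zip(features, targets)):
            expected = sum(p * f(x) for p, x in zip(probs, domain))
            # Raise lambda_i if the model under-predicts f_i, lower it otherwise.
            lambdas[i] += lr * (c - expected)
    mus = [math.exp(lam) for lam in lambdas]
    return mus, distribution(domain, features, lambdas)


# Hypothetical toy problem: two overlapping indicator constraints.
domain = [0, 1, 2, 3]
features = [
    lambda x: 1.0 if x in (0, 1) else 0.0,  # f_1, desired expectation c_1 = 0.7
    lambda x: 1.0 if x in (1, 2) else 0.0,  # f_2, desired expectation c_2 = 0.4
]
targets = [0.7, 0.4]

mus, probs = fit_maxent(domain, features, targets)
print("mu:", [round(m, 4) for m in mus])
print("P :", [round(p, 4) for p in probs])
```

In this toy case the two indicators end up statistically independent under the fitted distribution, so the result can be checked by hand: the probabilities approach 0.42, 0.28, 0.12 and 0.18, and the constants approach 7/3 and 2/3. Among all distributions meeting both constraints, this is the one with the highest entropy.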