Results 1 - 5 of 5
A survey of smoothing techniques for ME models
IEEE Transactions on Speech and Audio Processing, 2000
Abstract

Cited by 85 (1 self)
Abstract—In certain contexts, maximum entropy (ME) modeling can be viewed as maximum likelihood (ML) training for exponential models, and like other ML methods is prone to overfitting of training data. Several smoothing methods for ME models have been proposed to address this problem, but previous results do not make it clear how these smoothing methods compare with smoothing methods for other types of related models. In this work, we survey previous work in ME smoothing and compare the performance of several of these algorithms with conventional techniques for smoothing n-gram language models. Because of the mature body of research in n-gram model smoothing and the close connection between ME and conventional n-gram models, this domain is well-suited to gauge the performance of ME smoothing methods. Over a large number of data sets, we find that fuzzy ME smoothing performs as well as or better than all other algorithms under consideration. We contrast this method with previous n-gram smoothing methods to explain its superior performance. Index Terms—Exponential models, language modeling, maximum entropy, minimum divergence, n-gram models, smoothing.
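One of the ME smoothing methods this survey covers, a Gaussian prior on the parameters (an L2 penalty on the log-likelihood), can be sketched in a few lines. The toy softmax model, feature encoding, data, and hyperparameters below are invented for illustration; they are not the survey's models or experiments.

```python
import math

# Toy conditional maximum-entropy (softmax) model trained by batch
# gradient ascent with a Gaussian prior on the weights, i.e. the
# penalty term -w^2 / (2 * sigma2), whose gradient is -w / sigma2.
# This keeps weights finite even for features seen with only one class,
# where unsmoothed ML training would push them to infinity.

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def train(data, n_feats, n_classes, sigma2=1.0, lr=0.1, epochs=200):
    # data: list of (active_feature_indices, class_label) pairs
    # w[c][f]: weight of binary feature f for class c
    w = [[0.0] * n_feats for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * n_feats for _ in range(n_classes)]
        for feats, y in data:
            p = softmax([sum(w[c][f] for f in feats) for c in range(n_classes)])
            for c in range(n_classes):
                for f in feats:
                    # observed minus expected feature count
                    grad[c][f] += (1.0 if c == y else 0.0) - p[c]
        for c in range(n_classes):
            for f in range(n_feats):
                # Gaussian prior pulls each weight toward zero (smoothing)
                grad[c][f] -= w[c][f] / sigma2
                w[c][f] += lr * grad[c][f]
    return w
```

Without the `w / sigma2` term this is plain ML training; with it, the fixed point balances the data gradient against the prior, so weights stay bounded even when a feature co-occurs exclusively with one class.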
Evaluation and Extension of Maximum Entropy Models with Inequality Constraints
, 2003
Abstract

Cited by 25 (0 self)
A maximum entropy (ME) model is usually estimated so that it conforms to equality constraints on feature expectations.
Estimating Probabilities from Small Samples
Abstract
A novel solution is presented to a recurring problem in statistical modeling—estimating a probability mass function (pmf) for a discrete random variable from a small sample. The solution naturally leads to smooth pmf estimates, requires no held-out data, and makes no prior assumptions about the unknown pmf, while still providing a way to incorporate prior knowledge when available. A pmf is deemed admissible as an estimate if it assigns a higher likelihood to the observed value of a sufficient statistic than to any other value possible for the same sample size. The maximum likelihood estimate is trivially admissible by this definition, but so are many other pmfs. An estimate is selected from this admissible family via criteria such as maximum entropy or minimum I-divergence. Empirical results in statistical language modeling are presented to demonstrate that estimates obtained in this manner have performance that is competitive with state-of-the-art estimates, and have additional desirable properties not found in the state-of-the-art.
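The admissibility criterion in this abstract can be made concrete for the simplest case, a Bernoulli parameter estimated from k successes in n trials: a candidate p is admissible if it makes the observed count k at least as likely as every other possible count. The grid search, resolution, and tie-breaking below are our own illustrative choices, not the paper's construction.

```python
from math import comb

# Admissibility for a Bernoulli pmf estimated from k successes in n
# trials: p is admissible if Binom(k; n, p) >= Binom(j; n, p) for all j.
# Among admissible p we pick the maximum-entropy one, i.e. the one
# closest to the uniform value 1/2.

def binom_pmf(j, n, p):
    return comb(n, j) * p**j * (1 - p)**(n - j)

def admissible(k, n, p):
    lk = binom_pmf(k, n, p)
    return all(lk >= binom_pmf(j, n, p) for j in range(n + 1))

def max_entropy_estimate(k, n, grid=1001):
    # scan a grid of candidate p values, keep the admissible ones,
    # return the admissible p closest to 0.5
    cands = [i / (grid - 1) for i in range(grid)]
    adm = [p for p in cands if admissible(k, n, p)]
    return min(adm, key=lambda p: abs(p - 0.5))
```

For example, after two heads in two tosses the ML estimate p = 1 is admissible, but so is every p down to about 2/3; the maximum-entropy selection returns roughly 2/3 rather than the overfit value 1, which is exactly the smoothing effect the abstract describes.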
Evaluation and Extension of Maximum Entropy Models with Inequality Constraints
Abstract
A maximum entropy (ME) model is usually estimated so that it conforms to equality constraints on feature expectations. However, the equality constraint is inappropriate for sparse and therefore unreliable features. This study explores an ME model with box-type inequality constraints, where the equality can be violated to reflect this unreliability. We evaluate the inequality ME model using text categorization datasets. We also propose an extension of the inequality ME model, which results in a natural integration with the Gaussian MAP estimation. Experimental results demonstrate the advantage of the inequality models and the proposed extension.
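The practical effect of relaxing equality constraints to a box can be illustrated through the dual view, where the allowed slack acts like an L1-style penalty on the weights and drives the parameters of sparse, unreliable features to exactly zero. The soft-threshold (proximal) update, toy data, and penalty value below are our own sketch of that sparsity effect, not the paper's formulation or experiments.

```python
import math

# Proximal-gradient training of a toy softmax model with an L1 penalty,
# illustrating how slack in box-type expectation constraints zeroes out
# the weights of rare features while frequent features keep large weights.

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def soft_threshold(x, t):
    # proximal operator of t * |x|: shrink toward zero, clip to exactly 0
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def train_l1(data, n_feats, n_classes, penalty=0.5, lr=0.1, epochs=300):
    w = [[0.0] * n_feats for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * n_feats for _ in range(n_classes)]
        for feats, y in data:
            p = softmax([sum(w[c][f] for f in feats) for c in range(n_classes)])
            for c in range(n_classes):
                for f in feats:
                    grad[c][f] += (1.0 if c == y else 0.0) - p[c]
        for c in range(n_classes):
            for f in range(n_feats):
                # gradient step, then shrink by the L1 penalty
                w[c][f] = soft_threshold(w[c][f] + lr * grad[c][f], lr * penalty)
    return w
```

On data where feature 2 is seen only once, its data gradient never exceeds the shrinkage threshold, so its weights stay at exactly zero, while a feature seen eight times accumulates enough gradient to survive; a Gaussian prior, by contrast, would shrink the rare feature's weight without zeroing it.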
Building Maximum Entropy . . .
Abstract
Over recent years, text classification has become one of the key techniques for organizing information. Since hand-coding text classifiers is impractical and hand-labeling text is time- and labor-consuming, it is preferable to learn classifiers from a small amount of labeled examples and a large amount of unlabeled data. In many cases, such as online information retrieval or database applications, such unlabeled data are easily and abundantly available. Although many learning algorithms of this kind have been designed, most of them rely on certain assumptions that are dependent on specific datasets. Consequently, the lack of generality makes these algorithms unstable across different datasets. Therefore, we favor an algorithm with as little dependence on such assumptions, or with as weak assumptions, as possible. Maximum entropy (MaxEnt) models offer a generic framework meeting this requirement. Built upon a set of features, which makes them equivalent to undirected graphical models, they provide natural leverage for feature selection. Most importantly, the only assumption made by MaxEnt is that the average feature values on labeled data give a