Results 1 -
5 of
5
A Gaussian Prior for Smoothing Maximum Entropy Models
, 1999
"... In certain contexts, maximum entropy (ME) modeling can be viewed as maximum likelihood training for exponential models, and like other maximum likelihood methods is prone to overfitting of training data. Several smoothing methods for maximum entropy models have been proposed to address this problem, ..."
Abstract
-
Cited by 181 (1 self)
- Add to MetaCart
In certain contexts, maximum entropy (ME) modeling can be viewed as maximum likelihood training for exponential models, and like other maximum likelihood methods is prone to overfitting of training data. Several smoothing methods for maximum entropy models have been proposed to address this problem, but previous results do not make it clear how these smoothing methods compare with smoothing methods for other types of related models. In this work, we survey previous work in maximum entropy smoothing and compare the performance of several of these algorithms with conventional techniques for smoothing n-gram language models. Because of the mature body of research in n-gram model smoothing and the close connection between maximum entropy and conventional n-gram models, this domain is well-suited to gauge the performance of maximum entropy smoothing methods. Over a large number of data sets, we find that an ME smoothing method proposed to us by Lafferty [1] performs as well as or better tha...
Exponential Priors for Maximum Entropy Models
- In Proceedings of the Annual Meeting of the Association for Computational Linguistics
, 2003
"... this paper. Finally, thanks to Stan Chen and Roni Rosenfeld: our derivation for Exponential priors closely follows the text of their derivation for Gaussian priors. ..."
Abstract
-
Cited by 42 (0 self)
- Add to MetaCart
this paper. Finally, thanks to Stan Chen and Roni Rosenfeld: our derivation for Exponential priors closely follows the text of their derivation for Gaussian priors.
Maximum Entropy Density Estimation with Generalized Regularization and an Application to Species Distribution Modeling
"... We present a unified and complete account of maximum entropy density estimation subject to constraints represented by convex potential functions or, alternatively, by convex regularization. We provide fully general performance guarantees and an algorithm with a complete convergence proof. As special ..."
Abstract
-
Cited by 13 (0 self)
- Add to MetaCart
We present a unified and complete account of maximum entropy density estimation subject to constraints represented by convex potential functions or, alternatively, by convex regularization. We provide fully general performance guarantees and an algorithm with a complete convergence proof. As special cases, we easily derive performance guarantees for many known regularization types, including ℓ1, ℓ2, ℓ 2 2, and ℓ1+ ℓ 2 2 style regularization. We propose an algorithm solving a large and general subclass of generalized maximum entropy problems, including all discussed in the paper, and prove its convergence. Our approach generalizes and unifies techniques based on information geometry and Bregman divergences as well as those based more directly on compactness. Our work is motivated by a novel application of maximum entropy to species distribution modeling, an important problem in conservation biology and ecology. In a set of experiments on real-world data, we demonstrate the utility of maximum entropy in this setting. We explore effects of different feature types, sample sizes, and regularization levels on the performance of maxent, and discuss interpretability of the resulting models.
Performance Prediction for Exponential Language Models
"... We investigate the task of performance prediction for language models belonging to the exponential family. First, we attempt to empirically discover a formula for predicting test set cross-entropy for n-gram language models. We build models over varying domains, data set sizes, and n-gram orders, an ..."
Abstract
-
Cited by 5 (3 self)
- Add to MetaCart
We investigate the task of performance prediction for language models belonging to the exponential family. First, we attempt to empirically discover a formula for predicting test set cross-entropy for n-gram language models. We build models over varying domains, data set sizes, and n-gram orders, and perform linear regression to see whether we can model test set performance as a simple function of training set performance and various model statistics. Remarkably, we find a simple relationship that predicts test set performance with a correlation of 0.9997. We analyze why this relationship holds and show that it holds for other exponential language models as well, including class-based models and minimum discrimination information models. Finally, we discuss how this relationship can be applied to improve language model performance. 1
Maximum entropy density estimation and modeling geographic distributions of species
, 2007
"... Maximum entropy (maxent) approach, formally equivalent to maximum likelihood, is a widely used density-estimation method. When input datasets are small, maxent is likely to overfit. Overfitting can be eliminated by various smoothing techniques, such as regularization and constraint relaxation, but t ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
Maximum entropy (maxent) approach, formally equivalent to maximum likelihood, is a widely used density-estimation method. When input datasets are small, maxent is likely to overfit. Overfitting can be eliminated by various smoothing techniques, such as regularization and constraint relaxation, but theory explaining their properties is often missing or needs to be derived for each case separately. In this dissertation, we propose a unified treatment for a large and general class of smoothing techniques. We provide fully general guarantees on their statistical performance and propose optimization algorithms with complete convergence proofs. As special cases, we can easily derive performance guarantees for many known regularization types including L1 and L2-squared regularization. Furthermore, our general approach enables us to derive entirely new regularization functions with superior statistical guarantees. The new regularization functions use information about the structure of the feature space, incorporate information about sample selection bias, and combine information across several related density-estimation tasks. We propose algorithms solving a large and general subclass of generalized maxent problems, including all

