Results 1–10 of 11
A survey of smoothing techniques for ME models
IEEE Transactions on Speech and Audio Processing, 2000
Evaluation and Extension of Maximum Entropy Models with Inequality Constraints
2003
Cited by 25 (0 self)
Abstract:
A maximum entropy (ME) model is usually estimated so that it conforms to equality constraints on feature expectations.
Estimating Probabilities from Small Samples
Cited by 1 (0 self)
Abstract:
A novel solution is presented to a recurring problem in statistical modeling: estimating a probability mass function (pmf) for a discrete random variable from a small sample. The solution naturally leads to smooth pmf estimates, requires no held-out data, nor makes any prior assumptions about the unknown pmf, while still providing a way to incorporate prior knowledge when available. A pmf is deemed admissible as an estimate if it assigns merely a higher likelihood to the observed value of a sufficient statistic than to any other value possible for the same sample size. The maximum likelihood estimate is trivially admissible by this definition, but so are many other pmfs. An estimate is selected from this admissible family via criteria such as maximum entropy or minimum I-divergence. Empirical results in statistical language modeling are presented to demonstrate that estimates obtained in this manner have performance that is competitive with state-of-the-art estimates, and have additional desirable properties not found in the state-of-the-art.
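The admissibility criterion above can be checked directly for small samples by comparing the multinomial likelihood of the observed count vector against every other count vector of the same size. A minimal sketch (the toy counts and candidate pmfs below are illustrative assumptions, not data from the paper):

```python
from math import factorial

def multinomial_likelihood(counts, p):
    """Probability of observing this exact count vector in sum(counts) draws from pmf p."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)
    prob = float(coef)
    for c, pi in zip(counts, p):
        prob *= pi ** c  # note: 0 ** 0 == 1, so unseen values with p_i == 0 are handled
    return prob

def count_vectors(n, k):
    """Enumerate all ways to distribute n observations over k values."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in count_vectors(n - first, k - 1):
            yield (first,) + rest

def is_admissible(observed, p, tol=1e-12):
    """p is admissible if no other count vector of the same sample size is more likely."""
    best = multinomial_likelihood(observed, p)
    return all(best + tol >= multinomial_likelihood(c, p)
               for c in count_vectors(sum(observed), len(observed)))
```

With observed counts (3, 2, 0), both the maximum likelihood estimate (0.6, 0.4, 0.0) and the add-one (Laplace) estimate (0.5, 0.375, 0.125) pass this test, while a pmf concentrated on the unseen value does not.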
Log-Linear Models
2004
Cited by 1 (0 self)
Abstract:
This is yet another introduction to log-linear (“maximum entropy”) models for NLP practitioners, in the spirit of Berger (1996) and Ratnaparkhi (1997b). The derivations here are similar to Berger’s, but more details are filled in and some errors are corrected. I do not address iterative scaling (Darroch and Ratcliff, 1972), but rather give derivations of the gradient and Hessian of the dual objective function (conditional likelihood). Note: This is a draft; please contact the author if you have comments, and do not cite or circulate this document. 1 Log-Linear Models Log-linear models have become a widely used tool in NLP classification tasks (Berger et al., 1996; Ratnaparkhi, 1998). Log-linear models assign joint probabilities to observation/label pairs (x, y) ∈ X × Y as follows: Pr(x, y) =
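The formula truncated at the end of the abstract is the standard joint log-linear form, Pr(x, y) = exp(Σ_j θ_j f_j(x, y)) / Z. A minimal sketch under that standard definition (the label sets, feature functions, and weights below are toy assumptions, not from the paper):

```python
import math
from itertools import product

# Toy observation/label spaces (illustrative only).
X = ["rain", "sun"]
Y = ["umbrella", "shades"]

# Feature functions f_j(x, y) and weights theta_j (assumed values).
features = [
    lambda x, y: 1.0 if (x, y) == ("rain", "umbrella") else 0.0,
    lambda x, y: 1.0 if (x, y) == ("sun", "shades") else 0.0,
]
theta = [1.5, 1.0]

def score(x, y):
    """Unnormalized weight: exp(sum_j theta_j * f_j(x, y))."""
    return math.exp(sum(t * f(x, y) for t, f in zip(theta, features)))

# The partition function Z sums the scores over all pairs in X x Y,
# so that Pr is a proper joint distribution.
Z = sum(score(x, y) for x, y in product(X, Y))

def prob(x, y):
    return score(x, y) / Z
```

Pairs that fire a positively weighted feature, such as ("rain", "umbrella"), receive proportionally more joint probability mass than pairs that fire none.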
Building Maximum Entropy . . .
Abstract:
Over recent years, text classification has become one of the key techniques for organizing information. Since hand-coding text classifiers is impractical and hand-labeling text is time- and labor-consuming, it is preferable to learn classifiers from a small amount of labeled examples and a large amount of unlabeled data. In many cases, such as online information retrieval or database applications, such unlabeled data are easily and abundantly available. Although many such learning algorithms have been designed, most of them rely on certain assumptions that are dependent on specific datasets. Consequently, the lack of generality makes these algorithms unstable across different datasets. Therefore, we favor an algorithm with as little dependence on such assumptions, or as weak an assumption, as possible. Maximum entropy models (MaxEnt) offer a generic framework meeting this requirement. Built upon a set of features, which makes them equivalent to undirected graphical models, they provide a natural leverage for feature selection. Most importantly, the only assumption made by MaxEnt is that the average feature values on labeled data give a
Evaluation and Extension of Maximum Entropy Models with Inequality Constraints
Abstract:
A maximum entropy (ME) model is usually estimated so that it conforms to equality constraints on feature expectations. However, the equality constraint is inappropriate for sparse and therefore unreliable features. This study explores an ME model with box-type inequality constraints, where the equality can be violated to reflect this unreliability. We evaluate the inequality ME model using text categorization datasets. We also propose an extension of the inequality ME model, which results in a natural integration with the Gaussian MAP estimation. Experimental results demonstrate the advantage of the inequality models and the proposed extension.
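The box-type relaxation described above replaces the equality constraint E_model[f] = E_data[f] with a tolerance band around the empirical expectation. A minimal illustration of the two constraint types (the toy distributions, the indicator feature, and the width delta are assumptions, not values from the paper):

```python
def expectation(p, f, support):
    """Expected value of feature f under pmf p."""
    return sum(p[x] * f(x) for x in support)

support = [0, 1, 2]
f = lambda x: float(x == 0)            # a sparse indicator feature
empirical = {0: 1 / 5, 1: 2 / 5, 2: 2 / 5}  # empirical pmf from a toy 5-sample corpus
model = {0: 0.25, 1: 0.40, 2: 0.35}         # a smoothed candidate model

e_emp = expectation(empirical, f, support)
e_mod = expectation(model, f, support)

delta = 0.1  # assumed box width
equality_ok = abs(e_mod - e_emp) < 1e-12        # strict equality constraint
box_ok = -delta <= e_mod - e_emp <= delta       # box-type inequality constraint
```

The smoothed model violates the strict equality but stays inside the box, which is exactly the slack the abstract argues sparse, unreliable features need.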
Maximum Likelihood Set for Estimating a Probability Mass Function
(Letter, communicated by Liam Paninski)
Abstract:
We propose a new method for estimating the probability mass function (pmf) of a discrete and finite random variable from a small sample. We focus on the observed counts (the number of times each value appears in the sample) and define the maximum likelihood set (MLS) as the set of pmfs that put more mass on the observed counts than on any other set of counts possible for the same sample size. We characterize the MLS in detail in this article. We show that the MLS is a diamond-shaped subset of the probability simplex [0, 1]^k bounded by at most k × (k − 1) hyperplanes, where k is the number of possible values of the random variable. The MLS always contains the empirical distribution, as well as a family of Bayesian estimators based on a Dirichlet prior, particularly the well-known Laplace estimator. We propose to select from the MLS the pmf that is closest to a fixed pmf that encodes prior knowledge. When using Kullback-Leibler distance for this selection, the optimization problem comprises finding the minimum of a convex function over a domain defined by linear inequalities, for which standard numerical procedures are available. We apply this estimate to language modeling, using Zipf’s law to encode prior knowledge, and show that this method permits obtaining state-of-the-art results while being conceptually simpler than most competing methods.
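The k × (k − 1) hyperplanes arise from comparing the observed counts against each neighboring count vector obtained by moving one observation from value i to value j; that comparison reduces to counts[i] · p[j] ≤ (counts[j] + 1) · p[i]. A sketch of the resulting membership test under that reading (the inequality is derived here from the multinomial likelihood ratio, and the counts and pmfs are toy assumptions):

```python
def in_mls(counts, p, tol=1e-12):
    """Test whether pmf p lies in the maximum likelihood set of `counts`.

    Moving one observation from value i to value j must not increase the
    multinomial likelihood; the likelihood ratio works out to
    p[i] * (counts[j] + 1) / (p[j] * counts[i]) >= 1, i.e. the inequality below.
    """
    k = len(counts)
    return all(counts[i] * p[j] <= (counts[j] + 1) * p[i] + tol
               for i in range(k) for j in range(k) if i != j)

counts = (3, 2, 0)                  # toy sample of size 5 over 3 values
empirical = (0.6, 0.4, 0.0)         # the empirical distribution, always in the MLS
laplace = (4 / 8, 3 / 8, 1 / 8)     # Laplace (add-one) estimate, also in the MLS
```

As the abstract states, both the empirical distribution and the Laplace estimator land inside the polytope, while a pmf that piles mass on the unseen third value falls outside it.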
On the existence and characterization of the maxent distribution under general moment inequality constraints
Prakash Ishwar, Member, IEEE, and Pierre Moulin, Fellow, IEEE
Abstract: A broad set of sufficient conditions that guarantees the existence of the maximum entropy (maxent) distribution consistent with specified bounds on certain generalized moments is derived. Most results in the literature are either focused on the minimum cross-entropy distribution, or apply only to distributions with a bounded-volume support, or address only equality constraints. The results of this work hold for general moment inequality constraints for probability distributions with possibly unbounded support, and the technical conditions are explicitly on the underlying generalized moment functions. An analytical characterization of the maxent distribution is also derived using results from the theory of constrained optimization in infinite-dimensional normed linear spaces. Several auxiliary results of independent interest, pertaining to certain properties of convex coercive functions, are also presented.
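In standard notation, the moment-inequality maxent problem the abstract refers to can be sketched as follows (the symbols are generic, not taken from the paper):

```latex
\max_{p}\; H(p) = -\int p(x)\,\log p(x)\,dx
\quad \text{s.t.} \quad \int p(x)\,dx = 1, \qquad
a_j \le \int \phi_j(x)\,p(x)\,dx \le b_j, \quad j = 1, \dots, m,
```

and, when a maximizer exists, Lagrangian duality yields the familiar Gibbs form

```latex
p^{*}(x) = \exp\!\Big(-\lambda_0 - \sum_{j=1}^{m} \lambda_j\,\phi_j(x)\Big),
```

where the sign of each multiplier $\lambda_j$ is fixed by which side of the box $[a_j, b_j]$ is active. The paper's contribution is precisely the conditions on the $\phi_j$ under which this maximizer exists for unbounded supports.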
Maximum entropy models with inequality constraints: A case study on text categorization