A Maximum Entropy Approach to Adaptive Statistical Language Modeling
Computer Speech and Language, 1996
Cited by 201 (11 self)
An adaptive statistical language model is described, which successfully integrates long-distance linguistic information with other knowledge sources. Most existing statistical language models exploit only the immediate history of a text. To extract information from further back in the document's history, we propose and use trigger pairs as the basic information-bearing elements. This allows the model to adapt its expectations to the topic of discourse. Next, statistical evidence from multiple sources must be combined. Traditionally, linear interpolation and its variants have been used, but these are shown here to be seriously deficient. Instead, we apply the principle of Maximum Entropy (ME). Each information source gives rise to a set of constraints, to be imposed on the combined estimate. The intersection of these constraints is the set of probability functions which are consistent with all the information sources. The function with the highest entropy within that set is the ME solution...
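To make the "constraints, intersection, highest entropy" step concrete, here is a minimal sketch that finds the maximum entropy distribution over a toy vocabulary subject to two expectation constraints, using Generalized Iterative Scaling (GIS). The vocabulary, the two constraints, and the function names are hypothetical illustrations, not the paper's model or features. The constraints pin P(cat) to 0.3 and P(dog) to 0.2, and the entropy criterion then splits the remaining 0.5 evenly between "the" and "sat".

```python
import math

def me_distribution(outcomes, feats, lam):
    # p(x) = exp(sum_i lam_i * f_i(x)) / Z
    scores = {x: math.exp(sum(l * f(x) for l, f in zip(lam, feats))) for x in outcomes}
    z = sum(scores.values())
    return {x: s / z for x, s in scores.items()}

def gis(outcomes, features, targets, iterations=1000):
    """Generalized Iterative Scaling: the max-entropy p with E_p[f_i] = targets[i]."""
    # GIS needs the features to sum to the same constant C on every outcome,
    # so a slack feature tops each outcome up to C.
    C = max(sum(f(x) for f in features) for x in outcomes)
    feats = list(features) + [lambda x: C - sum(f(x) for f in features)]
    tgts = list(targets) + [C - sum(targets)]
    lam = [0.0] * len(feats)
    for _ in range(iterations):
        p = me_distribution(outcomes, feats, lam)
        # Simultaneous update of all weights from the current model expectations.
        for i, f in enumerate(feats):
            expected = sum(p[x] * f(x) for x in outcomes)
            lam[i] += math.log(tgts[i] / expected) / C
    return me_distribution(outcomes, feats, lam)

# Hypothetical sources: a "topic" source says P(cat or dog) = 0.5,
# a "trigger" source says P(cat) = 0.3.
words = ["the", "cat", "dog", "sat"]
constraints = [lambda w: 1 if w in ("cat", "dog") else 0,
               lambda w: 1 if w == "cat" else 0]
p = gis(words, constraints, [0.5, 0.3])
print({w: round(q, 3) for w, q in p.items()})
```

Unlike linear interpolation, nothing here assigns fixed mixing weights to the sources; each source only contributes a constraint, and the solution is the flattest distribution that satisfies all of them.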
Maximum Entropy Models for Natural Language Ambiguity Resolution
1998
Cited by 167 (1 self)
The best aspect of a research environment, in my opinion, is the abundance of bright people with whom you argue, discuss, and nurture your ideas. I thank all of the people at Penn and elsewhere who have given me the feedback that has helped me to separate the good ideas from the bad ideas. I hope that I have kept the good ideas in this thesis, and left the bad ideas out! I would like to acknowledge the following people for their contribution to my education: I thank my advisor Mitch Marcus, who gave me the intellectual freedom to pursue what I believed to be the best way to approach natural language processing, and also gave me direction when necessary. I also thank Mitch for many fascinating conversations, both personal and professional, over the last four years at Penn. I thank all of my thesis committee members: John Lafferty from Carnegie Mellon University, Aravind Joshi, Lyle Ungar, and Mark Liberman, for their extremely valuable suggestions and comments about my thesis research. I thank Mike Collins, Jason Eisner, and Dan Melamed, with whom I've had many stimulating and impromptu discussions in the LINC lab. I owe them much gratitude for their valuable feedback on numerous rough drafts of papers and thesis chapters.
A maximum entropy approach to named entity recognition
1999
Cited by 115 (3 self)
Acknowledgments: This work would not have been possible without the support of many people inside and outside of New York University. My advisor, Professor Ralph Grishman, has provided me with a great deal of useful advice, including suggesting the problem of named entity recognition to me as a promising application for maximum entropy modeling. More than that, he has helped me work through a great deal of literature in statistical computational linguistics, and he generously supplied me with the necessary time, equipment, and resources of his research staff, which enabled me to put together the MENE system. I would also like to thank the other members of NYU's Proteus project for their assistance. In particular, John Sterling helped me to develop the idea of integrating the Proteus parser with the MENE system in the month before the MUC-7 evaluation. He and Eugene Agichtein put in extremely long hours leading up to the evaluation and helped to make it a success. The work on porting the MENE system to Japanese would not have been possible without the assistance of my friend and colleague, Satoshi Sekine. In addition, I would like to thank him for helping me out as the only English-speaking participant in the IREX evaluation. For his assistance with my upcoming trip to Japan and for all his work on translating IREX instructions for my benefit, I am very grateful.
A Simple Introduction to Maximum Entropy Models for Natural Language Processing
Cited by 63 (0 self)
Many problems in natural language processing can be viewed as linguistic classification problems, in which linguistic contexts are used to predict linguistic classes. Maximum entropy models offer a clean way to combine diverse pieces of contextual evidence in order to estimate the probability of a certain linguistic class occurring with a certain linguistic context. This report demonstrates the use of a particular maximum entropy model on an example problem, and then proves some relevant mathematical facts about the model in a simple and accessible manner. This report also describes an existing procedure called Generalized Iterative Scaling, which estimates the parameters of this particular model. The goal of this report is to provide enough detail to re-implement the maximum entropy models described in [Ratnaparkhi, 1996; Reynar and Ratnaparkhi, 1997; Ratnaparkhi, 1997] and also to provide a simple explanation of the maximum entropy formalism.
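As a rough illustration of a conditional maximum entropy classifier trained with Generalized Iterative Scaling, here is a minimal sketch of p(a | b) = exp(sum_j lam_j f_j(a, b)) / Z(b), where each feature pairs a context predicate with a class. The report writes the model in product form with alpha_j = exp(lam_j); the exponent form below is equivalent. The feature scheme ((predicate, class) pairs seen in training), the handling of the GIS correction feature, and the toy word-sense data are assumptions made for illustration.

```python
import math
from collections import defaultdict

CORRECTION = "__correction__"  # key for GIS's correction feature C - sum_j f_j(a, b)

def active_features(weights, context, label):
    # Feature (predicate, label) fires when its predicate appears in the context.
    return [(p, label) for p in context if (p, label) in weights]

def conditional_probs(weights, labels, C, context):
    # p(a | b) = exp(sum_j lam_j f_j(a, b) + lam_corr * (C - #active)) / Z(b)
    scores = {}
    for a in labels:
        act = active_features(weights, context, a)
        s = sum(weights[f] for f in act) + weights[CORRECTION] * (C - len(act))
        scores[a] = math.exp(s)
    z = sum(scores.values())
    return {a: s / z for a, s in scores.items()}

def train_gis(events, iterations=100):
    """events: list of (set of context predicates, class label)."""
    labels = sorted({a for _, a in events})
    weights = {(p, a): 0.0 for ctx, a in events for p in ctx}
    weights[CORRECTION] = 0.0
    n = len(events)
    C = max(len(active_features(weights, ctx, a)) for ctx, _ in events for a in labels)
    # Empirical expectations of every feature, including the correction feature.
    empirical = defaultdict(float)
    for ctx, a in events:
        act = active_features(weights, ctx, a)
        for f in act:
            empirical[f] += 1 / n
        empirical[CORRECTION] += (C - len(act)) / n
    for _ in range(iterations):
        # Model expectations use the empirical distribution of contexts.
        model = defaultdict(float)
        for ctx, _gold in events:
            probs = conditional_probs(weights, labels, C, ctx)
            for a, pa in probs.items():
                act = active_features(weights, ctx, a)
                for f in act:
                    model[f] += pa / n
                model[CORRECTION] += pa * (C - len(act)) / n
        for f in weights:
            # Simplification: a feature with zero empirical expectation is left at 0.
            if empirical[f] > 0:
                weights[f] += math.log(empirical[f] / model[f]) / C
    return weights, labels, C

# Hypothetical toy data: disambiguating "bank" from its left neighbour.
events = [
    ({"prev=river", "word=bank"}, "geo"),
    ({"word=bank"}, "fin"),
    ({"prev=savings", "word=bank"}, "fin"),
]
weights, labels, C = train_gis(events)
for ctx, gold in events:
    probs = conditional_probs(weights, labels, C, ctx)
    print(sorted(ctx), max(probs, key=probs.get), gold)
```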
Using Unlabeled Data to Improve Text Classification
2001
Cited by 41 (0 self)
One key difficulty with text classification learning algorithms is that they require many hand-labeled examples to learn accurately. This dissertation demonstrates that supervised learning algorithms that use a small number of labeled examples and many inexpensive unlabeled examples can create high-accuracy text classifiers. By assuming that documents are created by a parametric generative model, Expectation-Maximization (EM) finds local maximum a posteriori models and classifiers from all the data, labeled and unlabeled. These generative models do not capture all the intricacies of text; however, in some domains this technique substantially improves classification accuracy, especially when labeled data are sparse. Two problems arise from this basic approach. First, unlabeled data can hurt performance in domains where the generative modeling assumptions are too strongly violated. In this case the assumptions can be made more representative in two ways: by modeling sub-topic class structure, and by modeling super-topic hierarchical class relationships. By doing so, model probability and classification accuracy come into correspondence, allowing unlabeled data to improve classification performance. The second problem is that even with a representative model, the improvements given by unlabeled data do not sufficiently compensate for a paucity of labeled data. Here, limited labeled data provide EM initializations that lead to low-probability models. Performance can be significantly improved by using active learning to select high-quality initializations, and by using alternatives to EM that avoid low-probability local maxima.
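The basic EM approach described above can be sketched with a multinomial naive Bayes model: train on the labeled documents, then alternate between soft-labeling the unlabeled documents (E-step) and re-estimating the model from all documents (M-step). This is a minimal sketch under stated assumptions: Laplace smoothing stands in for the MAP prior, and the toy documents, smoothing constant, and iteration count are hypothetical. It omits the dissertation's refinements (sub-topic components, class hierarchies, active learning).

```python
import math
from collections import Counter

def train_nb(docs, resp, classes, vocab, alpha=1.0):
    """M-step: multinomial naive Bayes from soft class weights resp[i][c],
    with Laplace smoothing (MAP estimate under a Dirichlet prior)."""
    n = len(docs)
    prior = {c: (alpha + sum(r[c] for r in resp)) / (alpha * len(classes) + n) for c in classes}
    word_probs = {}
    for c in classes:
        counts = Counter()
        for doc, r in zip(docs, resp):
            for w in doc:
                counts[w] += r[c]
        total = sum(counts.values())
        word_probs[c] = {w: (alpha + counts[w]) / (alpha * len(vocab) + total) for w in vocab}
    return prior, word_probs

def posterior(model, doc, classes):
    """P(c | doc) computed in log space; words outside the vocabulary are skipped."""
    prior, word_probs = model
    logp = {c: math.log(prior[c]) + sum(math.log(word_probs[c][w]) for w in doc if w in word_probs[c])
            for c in classes}
    top = max(logp.values())
    unnorm = {c: math.exp(v - top) for c, v in logp.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}

def em_naive_bayes(labeled, unlabeled, iterations=10):
    classes = sorted({y for _, y in labeled})
    vocab = {w for d, _ in labeled for w in d} | {w for d in unlabeled for w in d}
    docs = [d for d, _ in labeled] + unlabeled
    hard = [{c: float(c == y) for c in classes} for _, y in labeled]
    model = train_nb([d for d, _ in labeled], hard, classes, vocab)  # labeled data only
    for _ in range(iterations):
        soft = [posterior(model, d, classes) for d in unlabeled]    # E-step
        model = train_nb(docs, hard + soft, classes, vocab)          # M-step on all data
    return model, classes

# Hypothetical toy corpus: "coach" and "shares" never occur in the labeled data.
labeled = [(["goal", "match", "team"], "sport"), (["stock", "market", "price"], "finance")]
unlabeled = [["team", "coach", "goal"], ["market", "shares", "price"],
             ["coach", "team"], ["shares", "stock"]]
model, classes = em_naive_bayes(labeled, unlabeled)
print(posterior(model, ["coach"], classes), posterior(model, ["shares"], classes))
```

The labeled data alone say nothing about "coach" or "shares"; EM learns their class association from the unlabeled documents in which they co-occur with labeled vocabulary.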
On the toric algebra of graphical models
2006
Cited by 28 (5 self)
We formulate necessary and sufficient conditions for an arbitrary discrete probability distribution to factor according to an undirected graphical model, or a log-linear model, or other more general exponential models. For decomposable graphical models these conditions are equivalent to a set of conditional independence statements similar to the Hammersley–Clifford theorem; however, we show that for nondecomposable graphical models they are not. We also show that nondecomposable models can have nonrational maximum likelihood estimates. These results are used to give several novel characterizations of decomposable graphical models.
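For a small decomposable example, the chain graph A-B-C (a hypothetical three-variable model), the conditional independence statement in question is A ⊥ C | B, which holds exactly when p(a,b,c) p(b) = p(a,b) p(b,c) in every cell. The sketch below checks that identity numerically. It illustrates only the decomposable case via a CI test; it does not implement the paper's toric (binomial) conditions or address nondecomposable models.

```python
from collections import defaultdict
from itertools import product

def satisfies_chain_ci(p, tol=1e-12):
    """p: dict mapping (a, b, c) to a probability (missing cells count as 0).
    True iff A and C are conditionally independent given B."""
    pab, pbc, pb = defaultdict(float), defaultdict(float), defaultdict(float)
    for (a, b, c), v in p.items():
        pab[a, b] += v
        pbc[b, c] += v
        pb[b] += v
    vals_a = {a for a, _, _ in p}
    vals_b = {b for _, b, _ in p}
    vals_c = {c for _, _, c in p}
    # Check p(a,b,c) p(b) == p(a,b) p(b,c) on every cell, including zero cells.
    return all(abs(p.get((a, b, c), 0.0) * pb[b] - pab[a, b] * pbc[b, c]) <= tol
               for a, b, c in product(vals_a, vals_b, vals_c))

# A distribution that factors over the cliques {A,B} and {B,C} (hypothetical potentials).
phi_ab = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 1.0}
phi_bc = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 4.0}
raw = {(a, b, c): phi_ab[a, b] * phi_bc[b, c] for a, b, c in product((0, 1), repeat=3)}
z = sum(raw.values())
factored = {k: v / z for k, v in raw.items()}

# A distribution with B fixed and A = C, so A and C stay dependent given B.
coupled = {(0, 0, 0): 0.5, (1, 0, 1): 0.5}
print(satisfies_chain_ci(factored), satisfies_chain_ci(coupled))
```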
Kullback-Leibler approximation of spectral density functions
IEEE Trans. Inform. Theory, 2003
Cited by 19 (11 self)
We introduce a Kullback–Leibler-type distance between spectral density functions of stationary stochastic processes and solve the problem of optimal approximation of a given spectral density Ψ by one that is consistent with prescribed second-order statistics. In general, such statistics are expressed as the state covariance of a linear filter driven by a stochastic process whose spectral density is sought. In this context, we show i) that there is a unique spectral density Φ which minimizes this Kullback–Leibler distance, ii) that this optimal approximant is of the form Ψ/Q, where the "correction term" Q is a rational spectral density function, and iii) that the coefficients of Q can be obtained numerically by solving a suitable convex optimization problem. In the special case where Ψ = 1, the convex functional becomes quadratic and the solution is then specified by linear equations. Index Terms: Approximation of power spectra, cross-entropy minimization, Kullback–Leibler distance, mutual information, optimization, spectral estimation.
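To give a feel for a Kullback–Leibler-type distance between spectral densities, here is a minimal numerical sketch assuming the distance takes the form S(Ψ‖Φ) = (1/2π) ∫ Ψ(θ) log(Ψ(θ)/Φ(θ)) dθ over [-π, π); the paper's exact definition and normalization should be checked before relying on this. The AR(1) densities and their parameters are hypothetical test inputs. The sketch only evaluates the distance; it does not solve the paper's constrained approximation problem.

```python
import math

def ar1_density(a, sigma2=1.0):
    """Spectral density of a hypothetical AR(1) process: sigma2 / |1 - a e^{-i theta}|^2."""
    return lambda theta: sigma2 / (1 - 2 * a * math.cos(theta) + a * a)

def kl_spectral(psi, phi, n=4096):
    """Midpoint-rule estimate of (1/2pi) * integral over [-pi, pi) of psi * log(psi / phi)."""
    h = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        theta = -math.pi + (k + 0.5) * h
        total += psi(theta) * math.log(psi(theta) / phi(theta))
    return total * h / (2 * math.pi)

psi = ar1_density(0.5)
phi = ar1_density(0.3, sigma2=0.8)
print(kl_spectral(psi, psi), kl_spectral(psi, phi))
```

The midpoint rule is used because the integrand is smooth and periodic in θ, where equally spaced sampling converges very quickly. Note that without normalizing Ψ and Φ to equal total power, this quantity can be negative.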
A convex optimization approach to generalized moment problems, Control and Modeling of Complex Systems
Cybernetics in the 21st Century: Festschrift in Honor of Hidenori Kimura on the Occasion of his 60th, 2003
Cited by 11 (8 self)
In this paper we present a universal solution to the generalized moment problem, with a nonclassical complexity constraint. We show that this solution can be obtained by minimizing a strictly convex nonlinear functional. This optimization problem is derived in two different ways. We first derive it intrinsically, in a geometric way, by path integration of a one-form which defines the generalized moment problem. It is observed that this one-form is closed and defined on a convex set, and thus exact with, perhaps surprisingly, a strictly convex primitive function. We also derive this convex functional as the dual of a problem of maximizing a cross-entropy functional. In particular, these approaches give a constructive parameterization of all solutions to the Nevanlinna-Pick interpolation problem, with possible higher-order interpolation at certain points in the complex plane, with a degree constraint, as well as all solutions to the rational covariance extension problem, two areas which have been advanced by the work of Hidenori Kimura. Illustrations of these results in system identification and probability are also mentioned. Key words: moment problems, convex optimization, Nevanlinna-Pick interpolation, covariance extension, systems identification, Kullback-Leibler distance.
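The duality between cross-entropy maximization and convex minimization can be seen in a much simpler, discrete setting. The sketch below is a one-moment analogue, not the paper's functional: it finds the distribution p on a finite grid that minimizes KL(p‖q) relative to a prior q subject to a prescribed mean c. The optimum has the form p_k ∝ q_k exp(λ x_k), and λ minimizes the strictly convex dual log Z(λ) - λc, solved here with Newton's method. The grid, the uniform prior, and the target mean are hypothetical.

```python
import math

def tilt(points, prior, lam):
    """Exponential tilt p_k = q_k exp(lam * x_k) / Z(lam)."""
    w = [q * math.exp(lam * x) for x, q in zip(points, prior)]
    z = sum(w)
    return [wk / z for wk in w]

def min_cross_entropy(points, prior, target_mean, iterations=50):
    """Minimize KL(p || prior) subject to sum_k p_k x_k = target_mean by Newton's
    method on the convex dual log Z(lam) - lam * target_mean."""
    if not min(points) < target_mean < max(points):
        raise ValueError("target mean must lie strictly inside the support")
    lam = 0.0
    for _ in range(iterations):
        p = tilt(points, prior, lam)
        mean = sum(pk * x for pk, x in zip(p, points))
        var = sum(pk * (x - mean) ** 2 for pk, x in zip(p, points))
        # Dual gradient is (mean - target); dual Hessian is the variance.
        lam -= (mean - target_mean) / var
    return tilt(points, prior, lam), lam

points = list(range(10))
prior = [0.1] * 10  # hypothetical uniform prior
p, lam = min_cross_entropy(points, prior, 6.0)
print(lam, [round(pk, 4) for pk in p])
```

Because the dual is strictly convex with Hessian equal to the tilted variance, each Newton step is well defined whenever the target lies strictly inside the support, which mirrors, in miniature, why a strictly convex functional yields a unique solution.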
A Logically Sound Method for Uncertain Reasoning With Quantified Conditionals
In Proceedings ECSQARU/FAPR-97, LNAI 1244, 1997
Cited by 4 (1 self)
Conditionals play a central part in knowledge representation and reasoning. By describing relationships between antecedents and consequences as "if-then" sentences, conditionals have a range of expressiveness that includes commonsense knowledge as well as scientific statements. In this paper, we present the principles of maximum entropy and of minimum cross-entropy (ME-principles) as a logically sound and practicable method for representing and reasoning with quantified conditionals. First, the meaning of these principles is made clear by sketching a characterization from a completely conditional-logical point of view. Then, in the second part of this paper, we apply the techniques presented to derive ME-deduction schemes and illustrate them with examples.
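As a small illustration of maximum entropy reasoning with a quantified conditional, here is a minimal sketch assuming two propositional atoms A and B and a single conditional (B|A)[x], read as P(B|A) = x. The constraint is linear: E[f] = 0 with f = 1 - x on the world AB, f = -x on A~B, and 0 elsewhere, so the ME distribution has the form p ∝ exp(λf). The world names and the bisection solver are illustrative choices, not the paper's deduction schemes. Solving E[f] = 0 by hand gives exp(λ) = x/(1 - x), and the two ~A worlds receive equal probability because the conditional says nothing about them.

```python
import math

WORLDS = ["AB", "A~B", "~AB", "~A~B"]

def me_distribution(x, tol=1e-12):
    """Maximum entropy distribution over the four worlds subject to P(B|A) = x, 0 < x < 1."""
    # P(B|A) = x rewritten as E[f] = 0 for the feature below.
    f = {"AB": 1 - x, "A~B": -x, "~AB": 0.0, "~A~B": 0.0}

    def dist(lam):
        w = {k: math.exp(lam * f[k]) for k in WORLDS}
        z = sum(w.values())
        return {k: v / z for k, v in w.items()}

    def expectation(lam):
        p = dist(lam)
        return sum(p[k] * f[k] for k in WORLDS)

    # E[f] is increasing in lam (its derivative is Var[f] > 0), so bisect.
    lo, hi = -50.0, 50.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expectation(mid) < 0:
            lo = mid
        else:
            hi = mid
    return dist((lo + hi) / 2)

p = me_distribution(0.8)
print({k: round(v, 4) for k, v in p.items()})
```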

