Results 1-10 of 22
Inducing Features of Random Fields
 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
, 1997
"... We present a technique for constructing random fields from a set of training samples. The learning paradigm builds increasingly complex fields by allowing potential functions, or features, that are supported by increasingly large subgraphs. Each feature has a weight that is trained by minimizing the ..."
Abstract

Cited by 556 (13 self)
We present a technique for constructing random fields from a set of training samples. The learning paradigm builds increasingly complex fields by allowing potential functions, or features, that are supported by increasingly large subgraphs. Each feature has a weight that is trained by minimizing the Kullback-Leibler divergence between the model and the empirical distribution of the training data. A greedy algorithm determines how features are incrementally added to the field and an iterative scaling algorithm is used to estimate the optimal values of the weights. The random field models and techniques introduced in this paper differ from those common to much of the computer vision literature in that the underlying random fields are non-Markovian and have a large number of parameters that must be estimated. Relations to other learning approaches, including decision trees, are given. As a demonstration of the method, we describe its application to the problem of automatic word classifica...
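A minimal sketch of the greedy selection step the abstract describes, on a toy discrete outcome space. Here the gain of a candidate feature is approximated by the mismatch between its empirical and model expectations; the paper uses the exact KL-reduction gain, and all names and data below are illustrative assumptions.

```python
# Greedy feature-induction step (simplified): pick the candidate feature
# whose empirical and model expectations disagree most, as a crude proxy
# for the KL-reduction gain used in the paper.  Data here is made up.
outcomes = [0, 1, 2, 3]
empirical = [0.1, 0.4, 0.4, 0.1]   # empirical distribution of training data
model = [0.25, 0.25, 0.25, 0.25]   # current (uniform) random field
candidates = {                      # candidate binary features f(x)
    "f_a": [1, 1, 0, 0],
    "f_b": [0, 1, 1, 0],
}

def expectation(dist, f):
    """E_dist[f] over the discrete outcome space."""
    return sum(d * fx for d, fx in zip(dist, f))

# greedy step: select the feature with the largest expectation mismatch
best = max(candidates, key=lambda name: abs(
    expectation(empirical, candidates[name]) -
    expectation(model, candidates[name])))
```

After a feature is selected, its weight would be fit by iterative scaling and the scoring repeated against the updated field.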
Maximum Entropy Models for Natural Language Ambiguity Resolution
, 1998
"... The best aspect of a research environment, in my opinion, is the abundance of bright people with whom you argue, discuss, and nurture your ideas. I thank all of the people at Penn and elsewhere who have given me the feedback that has helped me to separate the good ideas from the bad ideas. I hope th ..."
Abstract

Cited by 206 (1 self)
The best aspect of a research environment, in my opinion, is the abundance of bright people with whom you argue, discuss, and nurture your ideas. I thank all of the people at Penn and elsewhere who have given me the feedback that has helped me to separate the good ideas from the bad ideas. I hope that I have kept the good ideas in this thesis, and left the bad ideas out! I would like to acknowledge the following people for their contribution to my education: I thank my advisor Mitch Marcus, who gave me the intellectual freedom to pursue what I believed to be the best way to approach natural language processing, and also gave me direction when necessary. I also thank Mitch for many fascinating conversations, both personal and professional, over the last four years at Penn. I thank all of my thesis committee members: John Lafferty from Carnegie Mellon University, Aravind Joshi, Lyle Ungar, and Mark Liberman, for their extremely valuable suggestions and comments about my thesis research. I thank Mike Collins, Jason Eisner, and Dan Melamed, with whom I've had many stimulating and impromptu discussions in the LINC lab. I owe them much gratitude for their valuable feedback on numerous rough drafts of papers and thesis chapters.
Cluster Expansions And Iterative Scaling For Maximum Entropy Language Models
 Maximum Entropy and Bayesian Methods
, 1995
"... . The maximum entropy method has recently been successfully introduced to a variety of natural language applications. In each of these applications, however, the power of the maximum entropy method is achieved at the cost of a considerable increase in computational requirements. In this paper we pre ..."
Abstract

Cited by 21 (1 self)
The maximum entropy method has recently been successfully introduced to a variety of natural language applications. In each of these applications, however, the power of the maximum entropy method is achieved at the cost of a considerable increase in computational requirements. In this paper we present a technique, closely related to the classical cluster expansion from statistical mechanics, for reducing the computational demands necessary to calculate conditional maximum entropy language models.
1. Introduction
In this paper we present a computational technique that can enable faster calculation of maximum entropy models. The starting point for our method is an algorithm [1] for constructing maximum entropy distributions that is an extension of the generalized iterative scaling algorithm of Darroch and Ratcliff [2,3]. The extended algorithm relaxes the assumption of [2,3] that the constraint functions sum to a constant, and results in a set of decoupled polynomial equations, one fo...
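A small sketch of the generalized iterative scaling idea the abstract builds on: fit a maximum entropy distribution over a tiny discrete space so that model feature expectations match given targets. The feature matrix, targets, and step rule below are illustrative assumptions, not taken from the paper; the damped update uses the usual GIS constant C (the maximum total feature count per outcome).

```python
import math

# Generalized-iterative-scaling-style fit of a maxent model
# p(x) ∝ exp(Σ_i λ_i f_i(x)) to target feature expectations.
# All data here is made up for illustration.
outcomes = [0, 1, 2, 3]
features = [            # features[i][x] = f_i(x)
    [1, 1, 0, 0],
    [0, 1, 1, 0],
]
target = [0.4, 0.5]     # desired expectations E_emp[f_i]

# GIS constant: maximum over x of Σ_i f_i(x)
C = max(sum(f[x] for f in features) for x in outcomes)

lam = [0.0] * len(features)
for _ in range(2000):
    # current model distribution
    w = [math.exp(sum(l * f[x] for l, f in zip(lam, features)))
         for x in outcomes]
    Z = sum(w)
    p = [wx / Z for wx in w]
    # damped multiplicative update: λ_i += (1/C) log(E_emp[f_i] / E_model[f_i])
    for i, f in enumerate(features):
        e_model = sum(p[x] * f[x] for x in outcomes)
        lam[i] += math.log(target[i] / e_model) / C
```

At convergence the model expectations match the targets; the paper's contribution is a cluster-expansion technique that reduces the cost of computing the normalizer and expectations in such updates.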
LANGUAGE MODEL ADAPTATION FOR AUTOMATIC SPEECH RECOGNITION AND STATISTICAL MACHINE TRANSLATION
, 2004
"... Language modeling is critical and indispensable for many natural language applications such as automatic speech recognition and machine translation. Due to the complexity of natural language grammars, it is almost impossible to construct language models by a set of linguistic rules; therefore stati ..."
Abstract

Cited by 1 (0 self)
Language modeling is critical and indispensable for many natural language applications such as automatic speech recognition and machine translation. Due to the complexity of natural language grammars, it is almost impossible to construct language models by a set of linguistic rules; therefore statistical techniques have been dominant for language modeling over the last few decades. All statistical modeling techniques, in principle, work under some conditions: 1) a reasonable amount of training data is available and 2) the training data comes from the same population as the test data to which we want to apply our model. Based on observations from the training data, we build statistical models and therefore, the success of a statistical model is crucially dependent on the training data. In other words, if we don’t have enough data for training, or the training data is not matched with the test data, we are not able to build accurate statistical models. This thesis presents novel methods to cope with those problems in language modeling—language model adaptation.
Iterative proportional scaling via decomposable submodels for contingency tables
, 2006
"... We propose iterative proportional scaling (IPS) via decomposable submodels for maximizing likelihood function of a hierarchical model for contingency tables. In ordinary IPS the proportional scaling is performed by cycling through the members of the generating class of a hierarchical model. We propo ..."
Abstract

Cited by 1 (1 self)
We propose iterative proportional scaling (IPS) via decomposable submodels for maximizing the likelihood function of a hierarchical model for contingency tables. In ordinary IPS the proportional scaling is performed by cycling through the members of the generating class of a hierarchical model. We propose to adjust more marginals at each step. This is accomplished by expressing the generating class as a union of decomposable submodels and cycling through the decomposable models. We prove convergence of our proposed procedure, if the amount of scaling is adjusted properly at each step. We also analyze the proposed algorithms around the maximum likelihood estimate (MLE) in detail. Faster convergence of our proposed procedure is illustrated by numerical examples.
Keywords and phrases: decomposable model, hierarchical model, I-projection, iterative proportional fitting, Kullback-Leibler divergence.
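The "ordinary IPS" baseline the abstract refers to can be sketched as classical iterative proportional fitting on a 2x2 contingency table, cycling through the row and column marginal constraints. The seed table and target marginals below are illustrative assumptions.

```python
# Ordinary iterative proportional scaling (IPS/IPF) on a 2x2 table:
# alternately rescale rows and columns to match target marginals.
# Seed table and targets are made up for illustration.
table = [[1.0, 1.0], [1.0, 1.0]]
row_targets = [0.6, 0.4]
col_targets = [0.3, 0.7]

for _ in range(50):
    # scale each row to match its target row marginal
    for r in range(2):
        s = sum(table[r])
        table[r] = [v * row_targets[r] / s for v in table[r]]
    # scale each column to match its target column marginal
    for c in range(2):
        s = table[0][c] + table[1][c]
        for r in range(2):
            table[r][c] *= col_targets[c] / s
```

The paper's variant groups constraints into decomposable submodels so that several marginals are adjusted per step rather than one generator at a time.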
AN ITERATIVE PROCEDURE FOR GENERAL PROBABILITY MEASURES TO OBTAIN I-PROJECTIONS ONTO INTERSECTIONS OF CONVEX SETS
, 2006
"... The iterative proportional fitting procedure (IPFP) was introduced formally by Deming and Stephan in 1940. For bivariate densities, this procedure has been investigated by Kullback and Rüschendorf. It is well known that the IPFP is a sequence of successive Iprojections onto sets of probability meas ..."
Abstract
The iterative proportional fitting procedure (IPFP) was introduced formally by Deming and Stephan in 1940. For bivariate densities, this procedure has been investigated by Kullback and Rüschendorf. It is well known that the IPFP is a sequence of successive I-projections onto sets of probability measures with fixed marginals. However, when finding the I-projection onto the intersection of arbitrary closed, convex sets (e.g., marginal stochastic orders), a sequence of successive I-projections onto these sets may not lead to the actual solution. Addressing this situation, we present a new iterative I-projection algorithm. Under reasonable assumptions and using tools from Fenchel duality, convergence of this algorithm to the true solution is shown. The cases of infinite-dimensional IPFP and marginal stochastic orders are worked out in this context.
1. Introduction
For two probability measures (PM) P and Q defined on an arbitrary measurable space (X,B), the I-divergence or the Kullback–
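The I-divergence that these I-projections minimize is the Kullback-Leibler divergence; for discrete distributions it can be sketched directly. The example vectors are illustrative assumptions.

```python
import math

# I-divergence (Kullback-Leibler divergence) D(P || Q) for discrete
# distributions; an I-projection of Q onto a convex set minimizes this
# over P in the set.  Example distributions are made up.
def i_divergence(p, q):
    """Sum of p_i * log(p_i / q_i), with the convention 0*log(0/q) = 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]
q = [1 / 3, 1 / 3, 1 / 3]
```

D(P||P) = 0 and D(P||Q) > 0 whenever P differs from Q, which is why successive I-projections can only move the iterate closer to each constraint set.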
Member
, 2003
"... Various applications of information theoretical and combinatorial methods in data mining are presented. An axiomatization has been introduced for a family of entropies including both Shannon entropy and the Gini index as special cases. These entropies, and distances based on them, were then applied ..."
Abstract
Various applications of information theoretical and combinatorial methods in data mining are presented. An axiomatization has been introduced for a family of entropies including both Shannon entropy and the Gini index as special cases. These entropies, and distances based on them, were then applied to decision tree construction. It has been shown experimentally that trees using distances based on generalized entropies as splitting criteria are smaller than those constructed using other criteria without significant loss in accuracy. One of the major problems in association rule mining is the huge number of rules produced. This work contains contributions to two principal methods of addressing the problem: sorting rules based on some interestingness measure, and rule pruning. A new measure of rule interestingness is introduced generalizing three well-known measures: chi-squared, entropy gain and Gini gain, which moreover gives a whole family of intermediate measures with interesting properties. Also, a method of pruning association rules using the Maximum Entropy Principle has
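One well-known one-parameter family containing both Shannon entropy and the Gini index is the Havrda-Charvát (Tsallis) form, sketched below; the thesis's own axiomatization may differ, so this is only an illustrative instance of such a family.

```python
import math

# Havrda-Charvát / Tsallis entropy family: recovers Shannon entropy in the
# limit beta -> 1 and the Gini index (1 - sum p_i^2) at beta = 2.
# The thesis's axiomatized family may differ; this is illustrative.
def generalized_entropy(p, beta):
    if abs(beta - 1.0) < 1e-12:
        # Shannon entropy (in nats) as the beta -> 1 limit
        return -sum(pi * math.log(pi) for pi in p if pi > 0)
    return (1.0 - sum(pi ** beta for pi in p)) / (beta - 1.0)

dist = [0.5, 0.25, 0.25]
```

Intermediate values of beta give the family of splitting criteria and interestingness measures between the chi-squared/Gini and entropy-gain extremes that the abstract mentions.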