A Maximum Entropy approach to Natural Language Processing
COMPUTATIONAL LINGUISTICS, 1996
Cited by 1342 (5 self)
The concept of maximum entropy can be traced back along multiple threads to Biblical times. Only recently, however, have computers become powerful enough to permit the widescale application of this concept to real world problems in statistical estimation and pattern recognition. In this paper we describe a method for statistical modeling based on maximum entropy. We present a maximum-likelihood approach for automatically constructing maximum entropy models and describe how to implement this approach efficiently, using as examples several problems in natural language processing.
Approximating discrete probability distributions with dependence trees
IEEE TRANSACTIONS ON INFORMATION THEORY, 1968
Cited by 878 (0 self)
A method is presented to approximate optimally an n-dimensional discrete probability distribution by a product of second-order distributions, or the distribution of the first-order tree dependence. The problem is to find an optimum set of n - 1 first-order dependence relationships among the n variables. It is shown that the procedure derived in this paper yields an approximation of a minimum difference in information. It is further shown that when this procedure is applied to empirical observations from an unknown distribution of tree dependence, the procedure is the maximum-likelihood estimate of the distribution.
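The tree construction this abstract describes is commonly implemented as a maximum-weight spanning tree over pairwise mutual information. The following is a minimal sketch under that assumption, not the paper's own code: the function names `mutual_information` and `chow_liu_edges` are hypothetical, the probabilities are plain empirical frequencies, and Kruskal's algorithm is one of several ways to build the tree.

```python
from collections import Counter
from itertools import combinations
from math import log


def mutual_information(samples, i, j):
    """Empirical mutual information (in nats) between columns i and j."""
    n = len(samples)
    count_i = Counter(s[i] for s in samples)
    count_j = Counter(s[j] for s in samples)
    count_ij = Counter((s[i], s[j]) for s in samples)
    # sum over observed pairs of p(a, b) * log(p(a, b) / (p(a) * p(b)))
    return sum(
        (c / n) * log(c * n / (count_i[a] * count_j[b]))
        for (a, b), c in count_ij.items()
    )


def chow_liu_edges(samples):
    """Return n - 1 edges of a maximum-weight spanning tree over pairwise MI."""
    n_vars = len(samples[0])
    weighted = sorted(
        ((mutual_information(samples, i, j), i, j)
         for i, j in combinations(range(n_vars), 2)),
        reverse=True,  # heaviest (most informative) pairs first
    )
    parent = list(range(n_vars))  # union-find forest used by Kruskal's algorithm

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = []
    for _, i, j in weighted:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # adding this edge does not close a cycle
            parent[root_i] = root_j
            edges.append((i, j))
    return edges


# Illustrative data (made up): variables 0 and 1 always agree,
# variable 2 is independent of both.
samples = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
edges = chow_liu_edges(samples)
```

With these samples the strongly dependent pair (0, 1) is always selected, and the tree has n - 1 = 2 edges.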
Inducing Features of Random Fields
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1997
Cited by 666 (14 self)
We present a technique for constructing random fields from a set of training samples. The learning paradigm builds increasingly complex fields by allowing potential functions, or features, that are supported by increasingly large subgraphs. Each feature has a weight that is trained by minimizing the Kullback-Leibler divergence between the model and the empirical distribution of the training data. A greedy algorithm determines how features are incrementally added to the field and an iterative scaling algorithm is used to estimate the optimal values of the weights. The random field models and techniques introduced in this paper differ from those common to much of the computer vision literature in that the underlying random fields are non-Markovian and have a large number of parameters that must be estimated. Relations to other learning approaches, including decision trees, are given. As a demonstration of the method, we describe its application to the problem of automatic word classifica...
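To make the weight-estimation step concrete, here is a small sketch of generalized iterative scaling (GIS) on a four-state toy problem. GIS is a swapped-in, simpler relative of the iterative scaling procedure the abstract mentions and is not necessarily the variant the paper uses. The names `model_probs` and `fit_gis`, the features, and the target expectations are all hypothetical. GIS assumes non-negative features whose sum is the same constant for every state, so a slack feature is added here.

```python
from math import exp, log


def model_probs(states, features, weights):
    """Log-linear model: p(x) proportional to exp(sum_i w_i * f_i(x))."""
    scores = [
        exp(sum(w * f(x) for w, f in zip(weights, features)))
        for x in states
    ]
    z = sum(scores)  # normalizing constant
    return [s / z for s in scores]


def fit_gis(states, features, target, iterations=200):
    """Generalized iterative scaling.

    Assumes every feature is non-negative and that sum_i f_i(x) is the
    same constant C for all states x.
    """
    C = sum(f(states[0]) for f in features)
    weights = [0.0] * len(features)
    for _ in range(iterations):
        # All expectations are taken under the model from the previous step.
        probs = model_probs(states, features, weights)
        for i, f in enumerate(features):
            expected = sum(p * f(x) for p, x in zip(probs, states))
            weights[i] += log(target[i] / expected) / C
    return weights


# Hypothetical toy problem: four states, two indicator features plus slack.
states = [0, 1, 2, 3]


def f1(x):
    return 1.0 if x in (0, 1) else 0.0


def f2(x):
    return 1.0 if x in (1, 2) else 0.0


def slack(x):
    return 2.0 - f1(x) - f2(x)  # makes the features sum to C = 2 everywhere


features = [f1, f2, slack]
target = [0.7, 0.5, 0.8]  # assumed empirical feature expectations

weights = fit_gis(states, features, target)
probs = model_probs(states, features, weights)
```

After fitting, the model's feature expectations agree with `target` to within numerical tolerance. In this toy case the fitted distribution makes the two indicator features independent, as the maximum entropy principle suggests.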
Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals
J. Comput. Biol., 2004
An algebra for probabilistic databases
Cited by 148 (1 self)
An algebra is presented for a simple probabilistic data model that may be regarded as an extension of the standard relational model. The probabilistic algebra is developed in such a way that (restricted to α-acyclic database schemes) the relational algebra is a homomorphic image of it. Strictly probabilistic results are emphasized. Variations on the basic probabilistic data model are discussed. The algebra is used to explicate a commonly used statistical smoothing procedure and is shown to be potentially very useful for decision support with uncertain information.
A deterministic strongly polynomial algorithm for matrix scaling and approximate permanents
A decision theoretic framework for approximating concepts
International Journal of Man-Machine Studies, 1992
Cited by 44 (22 self)
This paper explores the implications of approximating a concept based on the Bayesian decision procedure, which provides a plausible unification of the fuzzy set and rough set approaches for approximating a concept. We show that if a given concept is approximated by one set, the same result given by the α-cut in fuzzy set theory is obtained. On the other hand, if a given concept is approximated by two sets, we can derive both the algebraic and probabilistic rough set approximations. Moreover, based on the well-known principle of maximum (minimum) entropy, we give a useful interpretation of fuzzy intersection and union. Our results enhance the understanding and broaden the applications of both fuzzy and rough sets.
Maximum Entropy Modeling with Clausal Constraints
In Proceedings of the 7th International Workshop on Inductive Logic Programming, 1997
Cited by 37 (1 self)
We present the learning system Maccent, which addresses the novel task of stochastic MAximum ENTropy modeling with Clausal Constraints. The Maximum Entropy method is a Bayesian method based on the principle that the target stochastic model should be as uniform as possible, subject to known constraints. Maccent incorporates clausal constraints that are based on the evaluation of Prolog clauses in examples represented as Prolog programs. We build on an existing maximum-likelihood approach to maximum entropy modeling, which we upgrade along two dimensions: (1) Maccent can handle larger search spaces, due to a partial ordering defined on the space of clausal constraints, and (2) it uses a richer first-order logic format. In comparison with other inductive logic programming systems, Maccent seems to be the first that explicitly constructs a conditional probability distribution p(C|I) based on an empirical distribution p̃(C|I) (where p(C|I) (p̃(C|I)) gives the induced (observed) probability of ...
Convexity, Maximum Likelihood and All That
1996
Cited by 9 (0 self)
This note is meant as a gentle but comprehensive introduction to the expectation-maximization (EM) and improved iterative scaling (IIS) algorithms, two popular techniques in maximum likelihood estimation. The focus in this tutorial is on the foundation common to the two algorithms: convex functions and their convenient properties. Where examples are called for, we draw from applications in human language technology.
1 Introduction
The task is to characterize the behavior of a real or imaginary stochastic process. By "stochastic process," we mean something which generates a sequence of observable output values. These values can be viewed as a discrete time series. We denote a single observation by y, a random variable which takes on values in some alphabet Y. The modelling problem is to come up with an accurate (in a sense made precise later) model p(y) of the process. If the identity of y is influenced by some conditioning information x ∈ X, then we might seek instead a conditional m...
Multisource Data Classification with Dependence Trees
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2002
Cited by 7 (3 self)
In order to apply a statistical approach to the classification of multisource remote-sensing data, one of the main problems to face lies in the estimation of probability distribution functions. This problem arises out of the difficulty of defining a common statistical model for such heterogeneous data. A possible solution is to adopt nonparametric approaches, which rely on the availability of training samples without any assumption about the related statistical distributions. The purpose of this paper is to investigate the suitability of the concept of dependence trees for the integration of multisource information through estimation of probability distributions. First, this concept, introduced by Chow and Liu, is used to provide an approximation of a probability distribution defined in an n-dimensional space by a product of n - 1 probability distributions defined in two-dimensional (2D) spaces; this approximation corresponds, in terms of graph theoretical interpretation, to a tree of dependence. For each land cover class, a dependence tree is generated by minimizing an appropriate closeness measure. Then, a nonparametric estimation of the second-order probability distributions is carried out through the Parzen window approach, based on the implementation of 2D Gaussian kernels. In this way, it is possible to reduce the complexity of the estimation, while capturing a significant part of the interdependence among variables. A comparison with other multisource data fusion methods, namely, the multilayer perceptron (MLP) method, the nearest neighbor (NN) method, and a Bayesian hierarchical classifier (BHC), is made.
Experimental results obtained on multisensor [airborne thematic mapper (ATM) and synthetic aperture radar (SAR)] and multisource (experimental synthetic aperture radar (ESAR) and a textural feature) data sets show that the proposed fusion method based on dependence trees is able to provide a classification accuracy similar to those of the other methods considered, but with the advantage of a reduced computational load.
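The Parzen window step with 2D Gaussian kernels can be illustrated with a short sketch. It assumes an isotropic kernel with a single bandwidth; the abstract does not state the kernel shape or the bandwidth choice, so both are assumptions, and the name `parzen_density_2d` is hypothetical.

```python
from math import exp, pi


def parzen_density_2d(point, samples, bandwidth):
    """Parzen window density estimate at `point` from 2D `samples`.

    Assumes an isotropic Gaussian kernel with standard deviation
    `bandwidth` (an illustrative choice, not taken from the paper).
    """
    x, y = point
    norm = 2.0 * pi * bandwidth ** 2  # normalizer of a 2D isotropic Gaussian
    total = sum(
        exp(-((x - sx) ** 2 + (y - sy) ** 2) / (2.0 * bandwidth ** 2))
        for sx, sy in samples
    )
    return total / (len(samples) * norm)


# Illustrative training samples for one pair of variables (made up).
training = [(0.0, 0.0), (1.0, 0.5), (0.8, 0.9)]
density = parzen_density_2d((0.5, 0.5), training, bandwidth=0.5)
```

In a dependence-tree classifier, one such estimate would be needed for each edge of each class's tree, which is where the reduction from an n-dimensional to a 2D estimation problem comes from.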