Results 1–10 of 15
A Maximum Entropy Approach to Natural Language Processing
 Computational Linguistics
, 1996
"... The concept of maximum entropy can be traced back along multiple threads to Biblical times. Only recently, however, have computers become powerful enough to permit the widescale application of this concept to real world problems in statistical estimation and pattern recognition. In this paper we des ..."
Abstract

Cited by 1082 (5 self)
 Add to MetaCart
The concept of maximum entropy can be traced back along multiple threads to Biblical times. Only recently, however, have computers become powerful enough to permit the wide-scale application of this concept to real-world problems in statistical estimation and pattern recognition. In this paper we describe a method for statistical modeling based on maximum entropy. We present a maximum-likelihood approach for automatically constructing maximum entropy models and describe how to implement this approach efficiently, using as examples several problems in natural language processing.
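The maximum-likelihood fitting of a maximum entropy model described in this abstract can be sketched as follows — a minimal illustration in our own notation, not the paper's implementation, with a toy feature set and data assumed for the example:

```python
import math

# Fit a maximum entropy model p(y) ∝ exp(Σ_i w_i f_i(y)) over a small
# outcome space so that model feature expectations match empirical ones.
# Plain gradient ascent on the log-likelihood (the paper uses more
# sophisticated iterative scaling; this is only a sketch).
def fit_maxent(outcomes, features, empirical_counts, n_obs, steps=2000, lr=0.5):
    w = [0.0] * len(features)
    targets = [c / n_obs for c in empirical_counts]  # empirical E[f_i]
    for _ in range(steps):
        # model distribution under current weights
        scores = [math.exp(sum(w[i] * f(y) for i, f in enumerate(features)))
                  for y in outcomes]
        z = sum(scores)
        p = [s / z for s in scores]
        # gradient of log-likelihood: empirical E[f_i] minus model E[f_i]
        for i, f in enumerate(features):
            model_exp = sum(p[j] * f(y) for j, y in enumerate(outcomes))
            w[i] += lr * (targets[i] - model_exp)
    return w, p

# toy example: outcomes a, b, c; one feature firing on 'a',
# observed 60 times in 100 observations
outcomes = ['a', 'b', 'c']
features = [lambda y: 1.0 if y == 'a' else 0.0]
w, p = fit_maxent(outcomes, features, empirical_counts=[60], n_obs=100)
```

The fitted model puts probability 0.6 on 'a' (matching the constraint) and, being maximally uniform otherwise, splits the remaining mass equally between 'b' and 'c'.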
Approximating discrete probability distributions with dependence trees
 IEEE Transactions on Information Theory
, 1968
"... AbsfracfA method is presented to approximate optimally an ndimensional discrete probability distribution by a product of secondorder distributions, or the distribution of the firstorder tree dependence.The problem is to find an optimum set of n 1 first order dependence relationship among the n ..."
Abstract

Cited by 645 (0 self)
 Add to MetaCart
Abstract—A method is presented to approximate optimally an n-dimensional discrete probability distribution by a product of second-order distributions, or the distribution of first-order tree dependence. The problem is to find an optimum set of n − 1 first-order dependence relationships among the n variables. It is shown that the procedure derived in this paper yields an approximation with a minimum difference in information. It is further shown that when this procedure is applied to empirical observations from an unknown distribution of tree dependence, the procedure is the maximum-likelihood estimate of the distribution.
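The optimal dependence tree of this abstract is found by weighting each pair of variables with its mutual information and taking a maximum-weight spanning tree. A small self-contained sketch (function names and the toy data are ours, not the paper's):

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(samples, i, j):
    """Empirical mutual information between variables i and j."""
    n = len(samples)
    pij = Counter((s[i], s[j]) for s in samples)
    pi = Counter(s[i] for s in samples)
    pj = Counter(s[j] for s in samples)
    mi = 0.0
    for (a, b), c in pij.items():
        mi += (c / n) * math.log((c / n) / ((pi[a] / n) * (pj[b] / n)))
    return mi

def chow_liu_tree(samples, n_vars):
    """Keep the n-1 highest-MI edges that form a spanning tree (Kruskal)."""
    edges = sorted(((mutual_information(samples, i, j), i, j)
                    for i, j in combinations(range(n_vars), 2)), reverse=True)
    parent = list(range(n_vars))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:            # edge joins two components: keep it
            parent[ri] = rj
            tree.append((i, j))
    return tree

# toy data: X1 always equals X0; X2 is an independent coin
samples = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)] * 5
tree = chow_liu_tree(samples, 3)
```

On this data the high-information edge (0, 1) is always selected, and one zero-information edge completes the tree.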
Inducing Features of Random Fields
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1997
"... We present a technique for constructing random fields from a set of training samples. The learning paradigm builds increasingly complex fields by allowing potential functions, or features, that are supported by increasingly large subgraphs. Each feature has a weight that is trained by minimizing the ..."
Abstract

Cited by 554 (14 self)
 Add to MetaCart
We present a technique for constructing random fields from a set of training samples. The learning paradigm builds increasingly complex fields by allowing potential functions, or features, that are supported by increasingly large subgraphs. Each feature has a weight that is trained by minimizing the Kullback-Leibler divergence between the model and the empirical distribution of the training data. A greedy algorithm determines how features are incrementally added to the field, and an iterative scaling algorithm is used to estimate the optimal values of the weights. The random field models and techniques introduced in this paper differ from those common to much of the computer vision literature in that the underlying random fields are non-Markovian and have a large number of parameters that must be estimated. Relations to other learning approaches, including decision trees, are given. As a demonstration of the method, we describe its application to the problem of automatic word classifica...
An algebra for probabilistic databases
"... An algebra is presented for a simple probabilistic data model that may be regarded as an extension of the standard relational model. The probabilistic algebra is developed in such a way that (restricted to αacyclic database schemes) the relational algebra is a homomorphic image of it. Strictly prob ..."
Abstract

Cited by 125 (1 self)
 Add to MetaCart
An algebra is presented for a simple probabilistic data model that may be regarded as an extension of the standard relational model. The probabilistic algebra is developed in such a way that (restricted to α-acyclic database schemes) the relational algebra is a homomorphic image of it. Strictly probabilistic results are emphasized. Variations on the basic probabilistic data model are discussed. The algebra is used to explicate a commonly used statistical smoothing procedure and is shown to be potentially very useful for decision support with uncertain information.
A Deterministic Strongly Polynomial Algorithm for Matrix Scaling and Approximate Permanents
"... We present a deterministic strongly polynomial algorithm that computes the permanent of a nonnegative n x n matrix to within a multiplicative factor of e^n. To this end ..."
Abstract

Cited by 63 (8 self)
 Add to MetaCart
We present a deterministic strongly polynomial algorithm that computes the permanent of a nonnegative n × n matrix to within a multiplicative factor of e^n. To this end ...
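Permanent bounds of this kind rest on matrix scaling: bringing a nonnegative matrix to (nearly) doubly stochastic form by row and column rescalings. The simplest such procedure is Sinkhorn's alternating normalization — related to, but not the same as, the deterministic algorithm of the paper — sketched here on an illustrative 2 × 2 matrix:

```python
# Sinkhorn-style scaling: alternately normalize rows and columns until the
# matrix is (nearly) doubly stochastic. Pure-Python sketch, no libraries.
def sinkhorn(matrix, iters=200):
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(iters):
        for i in range(n):                       # row normalization
            s = sum(a[i])
            a[i] = [x / s for x in a[i]]
        for j in range(n):                       # column normalization
            s = sum(a[i][j] for i in range(n))
            for i in range(n):
                a[i][j] /= s
    return a

m = sinkhorn([[2.0, 1.0], [1.0, 3.0]])
row_sums = [sum(r) for r in m]
col_sums = [sum(m[i][j] for i in range(2)) for j in range(2)]
```

For a matrix with strictly positive entries the iteration converges quickly, leaving all row and column sums equal to 1.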
Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals
 J. Comput. Biol
, 2004
"... ..."
Maximum Entropy Modeling with Clausal Constraints
 In Proceedings of the 7th International Workshop on Inductive Logic Programming
, 1997
"... We present the learning system Maccent which addresses the novel task of stochastic MAximum ENTropy modeling with Clausal Constraints. Maximum Entropy method is a Bayesian method based on the principle that the target stochastic model should be as uniform as possible, subject to known constraints. ..."
Abstract

Cited by 37 (1 self)
 Add to MetaCart
We present the learning system Maccent, which addresses the novel task of stochastic MAximum ENTropy modeling with Clausal Constraints. The maximum entropy method is a Bayesian method based on the principle that the target stochastic model should be as uniform as possible, subject to known constraints. Maccent incorporates clausal constraints that are based on the evaluation of Prolog clauses in examples represented as Prolog programs. We build on an existing maximum-likelihood approach to maximum entropy modeling, which we upgrade along two dimensions: (1) Maccent can handle larger search spaces, due to a partial ordering defined on the space of clausal constraints, and (2) Maccent uses a richer first-order logic format. In comparison with other inductive logic programming systems, Maccent seems to be the first that explicitly constructs a conditional probability distribution p(C|I) based on an empirical distribution ~p(C|I) (where p(C|I) and ~p(C|I) give the induced and observed probability of ...
A decision theoretic framework for approximating concepts
 International Journal of Man-Machine Studies
, 1992
"... This paper explores the implications of approximating a concept based on the Bayesian decision procedure, which provides a plausible unification of the fuzzy set and rough set approaches for approximating a concept. We show that if a given concept is approximated by one set, the same result given by ..."
Abstract

Cited by 36 (20 self)
 Add to MetaCart
This paper explores the implications of approximating a concept based on the Bayesian decision procedure, which provides a plausible unification of the fuzzy set and rough set approaches for approximating a concept. We show that if a given concept is approximated by one set, the same result given by the α-cut in fuzzy set theory is obtained. On the other hand, if a given concept is approximated by two sets, we can derive both the algebraic and probabilistic rough set approximations. Moreover, based on the well-known principle of maximum (minimum) entropy, we give a useful interpretation of fuzzy intersection and union. Our results enhance the understanding and broaden the applications of both fuzzy and rough sets.
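The α-cut and the standard min/max fuzzy operations the abstract refers to are easy to state concretely. A toy illustration with membership values of our own choosing, not an example from the paper:

```python
# Two fuzzy sets over the same universe, as membership dictionaries.
mu_A = {'x1': 0.9, 'x2': 0.4, 'x3': 0.7}
mu_B = {'x1': 0.5, 'x2': 0.8, 'x3': 0.2}

def alpha_cut(mu, alpha):
    """Crisp set of elements whose membership is at least alpha."""
    return {x for x, m in mu.items() if m >= alpha}

# Standard fuzzy intersection (min) and union (max).
intersection = {x: min(mu_A[x], mu_B[x]) for x in mu_A}
union        = {x: max(mu_A[x], mu_B[x]) for x in mu_A}

cut = alpha_cut(mu_A, 0.6)
```

Here the 0.6-cut of A keeps exactly the elements x1 and x3, the crisp set the paper recovers when a concept is approximated by one set.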
Convexity, Maximum Likelihood and All That
, 1996
"... This note is meant as a gentle but comprehensive introduction to the expectationmaximization (EM) and improved iterative scaling (IIS) algorithms, two popular techniques in maximum likelihood estimation. The focus in this tutorial is on the foundation common to the two algorithms: convex functions a ..."
Abstract

Cited by 4 (0 self)
 Add to MetaCart
This note is meant as a gentle but comprehensive introduction to the expectation-maximization (EM) and improved iterative scaling (IIS) algorithms, two popular techniques in maximum likelihood estimation. The focus in this tutorial is on the foundation common to the two algorithms: convex functions and their convenient properties. Where examples are called for, we draw from applications in human language technology.

1 Introduction

The task is to characterize the behavior of a real or imaginary stochastic process. By "stochastic process," we mean something which generates a sequence of observable output values. These values can be viewed as a discrete time series. We denote a single observation by y, a random variable which takes on values in some alphabet Y. The modelling problem is to come up with an accurate (in a sense made precise later) model p(y) of the process. If the identity of y is influenced by some conditioning information x ∈ X, then we might seek instead a conditional m...
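The EM algorithm the note introduces can be seen in miniature on a two-component Gaussian mixture in one dimension — a standard maximum-likelihood problem of exactly the kind it treats. The data and initial means below are illustrative assumptions, not taken from the note:

```python
import math

def em_gmm(data, mu, iters=50, sigma=1.0):
    """EM for a 1-D mixture of two equal-weight Gaussians with fixed sigma."""
    mu = list(mu)
    for _ in range(iters):
        # E-step: posterior responsibility of component 0 for each point
        r0 = []
        for x in data:
            p0 = math.exp(-0.5 * ((x - mu[0]) / sigma) ** 2)
            p1 = math.exp(-0.5 * ((x - mu[1]) / sigma) ** 2)
            r0.append(p0 / (p0 + p1))
        # M-step: responsibility-weighted means
        mu[0] = sum(r * x for r, x in zip(r0, data)) / sum(r0)
        mu[1] = sum((1 - r) * x for r, x in zip(r0, data)) / sum(1 - r for r in r0)
    return mu

# two well-separated clusters near 0 and 5
data = [0.1, -0.2, 0.0, 5.1, 4.9, 5.0]
mu = em_gmm(data, [1.0, 4.0])
```

Each iteration provably increases the likelihood (the convexity argument the note develops), and here the means converge to the two cluster centers.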
A Bayesian Neural Network Model With Extensions
, 1993
"... This report deals with a Bayesian neural network in a classifier context. In our network model, the units represent stochastic events, and the state of the units are related to the probability of these events. The basic Bayesian model is a onelayer neural network, which calculates the posterior pro ..."
Abstract

Cited by 2 (1 self)
 Add to MetaCart
This report deals with a Bayesian neural network in a classifier context. In our network model, the units represent stochastic events, and the states of the units are related to the probabilities of these events. The basic Bayesian model is a one-layer neural network, which calculates the posterior probabilities of events, given some observed, independent events. The formulas underlying this network are examined, and generalized in order to make the network handle graded input, n-ary attributes, and continuous-valued attributes. The one-layer model is then extended to a multi-layer architecture, to handle dependencies between input attributes. A few variations of this multi-layer Bayesian neural network are discussed. The final result is a fairly general multi-layer Bayesian neural network, capable of handling discrete as well as continuous-valued attributes.

1 Introduction

The Bayesian neural network [15, 13] discussed here is originally a one-layer artificial neural network of the Hop...
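The posterior computation the one-layer model performs — combining independent observed events via Bayes' rule — can be sketched in a few lines. The priors and likelihoods below are made-up numbers for illustration, not taken from the report:

```python
def posterior(prior, likelihoods):
    """Bayes' rule for a discrete class variable.

    prior[c]       : P(class c)
    likelihoods[c] : P(observed evidence | class c), with independent
                     pieces of evidence multiplied together by the caller.
    """
    joint = {c: prior[c] * likelihoods[c] for c in prior}
    z = sum(joint.values())
    return {c: v / z for c, v in joint.items()}

# two classes, two independent observed events per class
p = posterior(prior={'A': 0.5, 'B': 0.5},
              likelihoods={'A': 0.9 * 0.8, 'B': 0.3 * 0.4})
```

With equal priors, the class whose likelihood product is larger dominates the posterior: here P(A | evidence) = 0.72 / (0.72 + 0.12) = 6/7.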