Results 1–10 of 16
Structure Learning in Conditional Probability Models via an Entropic Prior and Parameter Extinction
, 1998
Cited by 66 (0 self)
We introduce an entropic prior for multinomial parameter estimation problems and solve for its maximum...
Similarity-based approaches to natural language processing
, 1997
Cited by 42 (3 self)
Statistical methods for automatically extracting information about associations between words or documents from large collections of text have the potential to have considerable impact in a number of areas, such as information retrieval and natural-language-based user interfaces. However, even huge bodies of text yield highly unreliable estimates of the probability of relatively common events, and, in fact, perfectly reasonable events may not occur in the training data at all. This is known as the sparse data problem. Traditional approaches to the sparse data problem use crude approximations. We propose a different solution: if we are able to organize the data into classes of similar events, then, if information about an event is lacking, we can estimate its behavior from information about similar events. This thesis presents two such similarity-based approaches, where, in general, we measure similarity by the Kullback-Leibler divergence, an information-theoretic quantity. Our first approach is to build soft, hierarchical clusters: soft, because each event belongs to each cluster with some probability; hierarchical, because cluster centroids are iteratively split to model finer distinctions. Our clustering method, which uses the technique of deterministic annealing,
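The Kullback-Leibler divergence this abstract uses as its similarity measure is easy to sketch. The following toy example (hypothetical verbs and made-up distributions, purely illustrative, not the thesis's data) shows how it ranks "similar events":

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats.

    Assumes p and q are distributions over the same events and that
    q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical next-object distributions P(object | verb) over the
# events (water, tea, code); the numbers are invented for illustration.
p_drink   = [0.50, 0.30, 0.20]
p_sip     = [0.45, 0.35, 0.20]
p_compile = [0.05, 0.05, 0.90]

# "sip" behaves far more like "drink" than "compile" does, so sparse
# counts for "sip" could be smoothed using information about "drink".
assert kl_divergence(p_drink, p_sip) < kl_divergence(p_drink, p_compile)
```

Note that D(p || q) is zero exactly when the two distributions agree, and is asymmetric in its arguments, which is why class-based smoothing schemes must choose the direction of comparison.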
Entropic Priors for Discrete Probabilistic Networks and for Mixtures of Gaussian Models
 in Bayesian Inference and Maximum Entropy Methods
, 2002
Cited by 12 (0 self)
The ongoing, unprecedented exponential explosion of available computing power has radically transformed the methods of statistical inference. What used to be a small minority of statisticians advocating the use of priors and strict adherence to Bayes' theorem is now becoming the norm across disciplines. The evolutionary direction is now clear. The trend is towards more realistic, flexible and complex likelihoods characterized by an ever increasing number of parameters. This makes the old question "What should the prior be?" acquire a new central importance in the modern Bayesian theory of inference. Entropic priors provide one answer to the problem of prior selection. The general definition of an entropic prior has existed since 1988 [1], but it was not until 1998 [2] that it was found that they provide a new notion of complete ignorance. This paper reintroduces the family of entropic priors as minimizers of mutual information between the data and the parameters, as in [2], but with a small change and a correction. The general formalism is then applied to two large classes of models: discrete probabilistic networks and univariate finite mixtures of Gaussians. It is also shown how to perform inference by efficiently sampling the corresponding posterior distributions.
Entropic Priors
, 1991
Cited by 6 (0 self)
Entropic priors assign probabilities by combining in an inseparable way the information theoretic concept of entropy with the underlying Riemannian geometry of the hypothesis space. These priors form the cornerstone of a developing new and more objective Bayesian theory of inference. Contents: 1 Introduction; 2 Background: Entropy, Geometry, and Priors; 2.1 The Kullback Number; 2.2 Fisher Information Metric; 2.3 Prior Information is More Data; 3 Applications; 3.1 Empirical Bayes; 3.2 Time Series; 3.2.1 The Signal Manifold; 3.2.2 Separating Frequencies from Amplitudes; 3.3 Image Reconstruction; 3.3.1 Digital Imaging ...
A Full Bayesian Approach to Curve and Surface Reconstruction
, 1999
Cited by 4 (2 self)
When interpolating incomplete data, one can choose a parametric model, or opt for a more general approach and use a nonparametric model which allows a very large class of interpolants. A popular nonparametric model for interpolating various types of data is based on regularization, which looks for an interpolant that is both close to the data and also "smooth" in some sense. Formally, this interpolant is obtained by minimizing an error functional which is the weighted sum of a "fidelity term" and a "smoothness term".
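The fidelity-plus-smoothness objective this abstract describes can be sketched on a 1-D grid. This is a minimal illustration under an assumed discretization (second differences as the smoothness term), not the paper's actual formulation:

```python
import numpy as np

# Find f on an n-point grid minimizing  ||S f - d||^2 + lam * ||D2 f||^2,
# where S samples f at the observed locations (the "fidelity term") and
# D2 is the discrete second-difference operator (the "smoothness term").
n = 50
idx = np.array([5, 20, 44])       # grid points where data is observed
d = np.array([0.0, 1.0, 0.2])     # incomplete observations (made up)
lam = 1.0                         # weight of the smoothness term

S = np.zeros((len(idx), n))
S[np.arange(len(idx)), idx] = 1.0

D2 = np.zeros((n - 2, n))
for i in range(n - 2):
    D2[i, i:i + 3] = [1.0, -2.0, 1.0]

# The objective is quadratic, so the minimizer solves the normal equations:
f = np.linalg.solve(S.T @ S + lam * D2.T @ D2, S.T @ d)

print(f[idx])  # the interpolant passes near the data points
```

Varying `lam` trades off the two terms: large values give very smooth (eventually affine) interpolants that stray from the data, small values give near-exact interpolation.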
On the Bayesian `Occam's factors' argument for Occam's razor, in Computational Learning Theory and Natural Learning Systems III, T. Petsche et al.
 Computational Learning Theory and Natural Learning Systems, Volume III: Selecting good models
, 1995
Cited by 4 (2 self)
Abstract: This paper discusses some of the problematic aspects of the Bayesian first-principles “proof” of Occam’s razor which involves Occam factors. Although it is true that the posterior for a model is reduced due to Occam factors if that model is capable of expressing many functions, the phenomenon need not have anything to do with Occam’s razor. This paper shows this by i) performing reductio ad absurdum on the argument that the Occam factors effect implies Occam’s razor; ii) presenting an alternative Bayesian approach which explicitly does not result in Occam’s razor; and finally iii) disentangling the underlying problem with viewing the Occam factors argument as a proof or “automatic embodiment” of Occam’s razor.
Learning Probabilistic Models: An Expected Utility Maximization Approach
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2003
Cited by 2 (1 self)
We consider the problem of learning a probabilistic model from the viewpoint of an expected utility maximizing decision maker/investor who would use the model to make decisions (bets), which result in well defined payoffs. In our new approach, we seek good out-of-sample model performance by considering a one-parameter family of Pareto optimal models, which we define in terms of consistency with the training data and consistency with a prior (benchmark) model. We measure the former by means of the large-sample distribution of a vector of sample-averaged features, and the latter by means of a generalized relative entropy. We express each Pareto optimal model as the solution of a strictly convex optimization problem and its strictly concave (and tractable) dual. Each dual
A Rigorous Investigation Of "Evidence" And "Occam Factors" In Bayesian Reasoning
 The Santa Fe Institute, 1660 Old Pecos Trail, Suite A, Santa Fe, NM
, 1992
Cited by 2 (0 self)
This paper first reviews the reasoning behind the Bayesian "evidence" procedure for setting parameters in the probability distributions involved in inductive inference. This paper then proves that the evidence procedure is incorrect. More precisely, this paper proves that the assumptions going into the evidence procedure do not, as claimed, "let the data determine the distributions". Instead, those assumptions simply amount to an implicit replacement of the original distributions, containing free parameters, with new distributions, none of whose parameters are free. For example, as used by MacKay [1991] in the context of neural nets, the evidence procedure is a means for using the training set to determine the free parameter α in the distribution P({w_i}) ∝ exp(−α Σ_{i=1}^N w_i²), where the w_i are the N weights in the network. As this paper proves, in actuality the assumptions going into MacKay's use of the evidence procedure do not result in a distribution P({w_i}) ∝ exp(−α ...
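For the Gaussian weight prior of this form, the hyperparameter choice made by a type-II maximum-likelihood step has a closed form. The sketch below covers only one simple special case (maximizing the normalized prior density of fixed weights over α); it is our illustration, not Wolpert's argument or MacKay's full evidence procedure:

```python
def evidence_alpha(weights):
    """For a prior P({w_i}) proportional to exp(-alpha * sum_i w_i^2),
    i.e. a zero-mean Gaussian with variance 1/(2*alpha) per weight,
    return the alpha maximizing the normalized density of the weights.

    Setting d/d_alpha [ (N/2)*log(alpha/pi) - alpha*sum_i w_i^2 ] = 0
    gives alpha = N / (2 * sum_i w_i^2).
    """
    n = len(weights)
    return n / (2.0 * sum(w * w for w in weights))

# Toy example: two weights of magnitude 1 give alpha = 2 / (2*2) = 0.5.
print(evidence_alpha([1.0, 1.0]))  # → 0.5
```

The point of contention in the abstract is not this arithmetic but what performing such a maximization implicitly does to the original parameterized family of distributions.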
The Adaptive Resolution Concept in Form-Free Distribution Estimation
 Proceedings of the Workshop on Physics and Computer Science
, 1999
Cited by 1 (1 self)
The ubiquitous ill-posed inverse problem of estimating a form-free distribution and the respective reliability given a set of noisy or incomplete data is solved with Bayesian probability theory by exploiting prior information. The method applies to problems where we do not have enough information to be able to characterize the distribution by a specific type of functional model or do not have enough confidence in functions with a few specific parameters. The price for the flexibility to allow for any conceivable solution is a large number of variables.
Entropic Priors
, 1991
Abstract: Entropic priors assign probabilities by combining in an inseparable way the information theoretic concept of entropy with the underlying Riemannian geometry of the hypothesis space. These priors form the cornerstone of a developing new and more objective Bayesian theory of inference.