Bayesian Interpolation - Neural Computation, 1991
Cited by 417 (17 self)
Although Bayesian analysis has been in use since Laplace, the Bayesian method of model comparison has only recently been developed in depth. In this paper, the Bayesian approach to regularisation and model comparison is demonstrated by studying the inference problem of interpolating noisy data. The concepts and methods described are quite general and can be applied to many other problems. Regularising constants are set by examining their posterior probability distribution. Alternative regularisers (priors) and alternative basis sets are objectively compared by evaluating the evidence for them. "Occam's razor" is automatically embodied by this framework. The way in which Bayes infers the values of regularising constants and noise levels has an elegant interpretation in terms of the effective number of parameters determined by the data set. This framework is due to Gull and Skilling. 1 Data modelling and Occam's razor. In science, a central task is to develop and compare models to a...
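As a rough illustration of setting a regularising constant by its evidence, the sketch below uses a deliberately tiny model that is an assumption of this example, not taken from the paper: a single weight w with t = w*x + noise, a Gaussian prior w ~ N(0, 1/alpha), and Gaussian noise of known precision beta. The names `log_evidence`, `best_alpha`, and the toy data are hypothetical. For this model the evidence is available in closed form, and the quantity `gamma` plays the role of the effective number of parameters determined by the data.

```python
import math

def log_evidence(xs, ts, alpha, beta):
    """Log marginal likelihood of the toy model t = w*x + noise.

    Assumptions of this sketch: prior w ~ N(0, 1/alpha), Gaussian noise
    with precision beta. Returns (log evidence, gamma), where gamma is
    the effective number of well-determined parameters (between 0 and 1
    here, because the model has a single weight).
    """
    n = len(xs)
    s_xx = sum(x * x for x in xs)
    s_xt = sum(x * t for x, t in zip(xs, ts))
    a = alpha + beta * s_xx              # posterior precision of w
    m = beta * s_xt / a                  # posterior mean of w
    e_data = 0.5 * beta * sum((t - m * x) ** 2 for x, t in zip(xs, ts))
    e_prior = 0.5 * alpha * m * m
    log_ev = (0.5 * math.log(alpha) + 0.5 * n * math.log(beta)
              - e_data - e_prior - 0.5 * math.log(a)
              - 0.5 * n * math.log(2.0 * math.pi))
    gamma = beta * s_xx / a
    return log_ev, gamma

def best_alpha(xs, ts, beta, alphas):
    """Pick the candidate alpha with the highest evidence (grid search)."""
    return max(alphas, key=lambda al: log_evidence(xs, ts, al, beta)[0])

# Hypothetical toy data, roughly t = x plus small noise.
xs = [0.0, 1.0, 2.0, 3.0]
ts = [0.1, 0.9, 2.2, 2.9]
alphas = [10.0 ** k for k in range(-4, 5)]
alpha_star = best_alpha(xs, ts, beta=25.0, alphas=alphas)
```

A grid search is used only to keep the sketch short; with several basis functions the same quantities would be computed from matrices rather than scalars.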
An Information-Theoretic Approach to Traffic Matrix Estimation - In Proc. ACM SIGCOMM, 2003
Cited by 97 (12 self)
Traffic matrices are required inputs for many IP network management
A Hierarchical Dirichlet Language Model - Natural Language Engineering, 1994
Cited by 67 (3 self)
We discuss a hierarchical probabilistic model whose predictions are similar to those of the popular language modelling procedure known as "smoothing". A number of interesting differences from smoothing emerge. The insights gained from a probabilistic view of this problem point towards new directions for language modelling. The ideas of this paper are also applicable to other problems such as the modelling of triphones in speech, and DNA and protein sequences in molecular biology. The new algorithm is compared with smoothing on a two million word corpus. The methods prove to be about equally accurate, with the hierarchical model using fewer computational resources. Contents: 1 Introduction; 1.1 The bigram language model with smoothing; 1.2 Any rational predictive procedure can be made Bayesian; 2 An explicit model using Dirichlet priors; 2.1 The inferences we will make; 2.2 The likelihood function; 2.3 What prior?; 2.4 A convenient family of priors: Dirichlet distributions; 2.5 ...
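The abstract does not give the predictive formula, so the following is only a minimal sketch of the standard Dirichlet-multinomial predictive distribution for bigrams, which has the same form as a smoothing rule. The concentration `alpha`, the base measure `m`, the function names, and the toy sentence are all assumptions of this example; how the paper infers its hyperparameters is not shown here.

```python
from collections import Counter, defaultdict

def bigram_counts(tokens):
    """Count how often each word follows each context word."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def dirichlet_predictive(counts, prev, word, alpha, m):
    """P(word | prev) with a Dirichlet(alpha * m) prior on each context row.

    The result mixes the observed bigram frequency with the base measure
    m[word]; an unseen context falls back to m[word] exactly.
    """
    row = counts.get(prev, Counter())
    total = sum(row.values())
    return (row[word] + alpha * m[word]) / (total + alpha)

# Hypothetical toy corpus; the base measure is the unigram frequency.
tokens = "the cat sat on the mat the cat ran".split()
vocab = sorted(set(tokens))
unigrams = Counter(tokens)
m = {w: unigrams[w] / len(tokens) for w in vocab}
counts = bigram_counts(tokens)
p_cat_given_the = dirichlet_predictive(counts, "the", "cat", alpha=2.0, m=m)
```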
From Laplace to Supernova SN 1987A: Bayesian Inference in Astrophysics, 1990
Cited by 42 (2 self)
The Bayesian approach to probability theory is presented as an alternative to the currently used long-run relative frequency approach, which does not offer clear, compelling criteria for the design of statistical methods. Bayesian probability theory offers unique and demonstrably optimal solutions to well-posed statistical problems, and is historically the original approach to statistics. The reasons for earlier rejection of Bayesian methods are discussed, and it is noted that the work of Cox, Jaynes, and others answers earlier objections, giving Bayesian inference a firm logical and mathematical foundation as the correct mathematical language for quantifying uncertainty. The Bayesian approaches to parameter estimation and model comparison are outlined and illustrated by application to a simple problem based on the Gaussian distribution. As further illustrations of the Bayesian paradigm, Bayesian solutions to two interesting astrophysical problems are outlined: the measurement of wea...
On the use of evidence in neural networks - In Advances in Neural Information Processing Systems, 1992
Cited by 19 (3 self)
The Bayesian “evidence” approximation, which is closely related to generalized maximum likelihood, has recently been employed to determine the noise and weight-penalty terms for training neural nets. This paper shows that it is far simpler to perform the exact calculation than it is to set up the evidence approximation. Moreover, unlike that approximation, the exact result does not have to be re-calculated for every new data set. Nor does it require the running of complex numerical computer code (the exact result is closed form). In addition, it turns out that for neural nets, the evidence procedure’s MAP estimate is in toto approximation error. Another advantage of the exact analysis is that it does not lead to incorrect intuition, like the claim that one can “evaluate different priors in light of the data”. This paper ends by discussing sufficiency conditions for the evidence approximation to hold, along with the implications of those conditions. Although couched in terms of neural nets, the analysis of this paper holds for any Bayesian interpolation problem.
Exploiting the generic viewpoint assumption - IJCV, 1996
Cited by 17 (0 self)
The "generic viewpoint" assumption states that an observer is not in a special position relative to the scene. It is commonly used to disqualify scene interpretations that assume special viewpoints, following a binary decision that the viewpoint was either generic or accidental. In this paper, we apply Bayesian statistics to quantify the probability of a view, and so derive a useful tool to estimate scene parameters. This approach may increase the scope and accuracy of scene estimates. It applies to a range of vision problems. We show shape from shading examples, where we rank shapes or reflectance functions in cases where these are otherwise unknown. The rankings agree with the perceived values.
A New Entropy Measure Based on the Wavelet Transform . . . , 1998
Cited by 16 (14 self)
We present in this brief a new way to measure the information in a signal, based on noise modeling. We show that the use of such an entropy-related measure leads to good results for signal restoration. I. INTRODUCTION. The term "entropy" is due to Clausius (1865), and the concept of entropy was introduced by Boltzmann into statistical mechanics, in order to measure the number of microscopic ways that a given macroscopic state can be realized. Shannon [11] founded the mathematical theory of communication when he suggested that the information gained in a measurement depends on the number of possible outcomes out of which one is realized. Shannon also suggested that the entropy can be used for maximization of the bits transferred under a quality constraint. Jaynes [7] proposed to use the entropy measure for radio interferometric image deconvolution, in order to select, from a set of possible solutions, the one that contains the minimum of information, or following his entropy definition, that which h...
Bayesian Decision Theory, the Maximum Local Mass Estimate, and Color Constancy - In Proceedings: Fifth International Conference on Computer Vision, pp. 210-217, (IEEE Computer, 1995
Cited by 16 (3 self)
Vision algorithms are often developed in a Bayesian framework. Two estimators are commonly used: maximum a posteriori (MAP), and minimum mean squared error (MMSE). We argue that neither is appropriate for perception problems. The MAP estimator makes insufficient use of structure in the posterior probability. The squared error penalty of the MMSE estimator does not reflect typical penalties. We describe a new
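To make the two commonly used estimators concrete, here is a small sketch that computes MAP and MMSE estimates from a posterior tabulated on a grid. The bimodal posterior, the grid, and the function names are hypothetical and chosen for illustration only; this is not an example from the paper, and the estimator the paper goes on to propose is not shown. In this toy case the MAP estimate sits on a tall, narrow peak that carries little probability mass, while the MMSE estimate (the posterior mean) lands between the two modes.

```python
import math

def normalise(weights):
    """Scale non-negative weights so they sum to one."""
    z = sum(weights)
    return [w / z for w in weights]

def map_estimate(grid, post):
    """Grid point with the highest posterior probability."""
    return max(zip(grid, post), key=lambda gp: gp[1])[0]

def mmse_estimate(grid, post):
    """Posterior mean, which minimises expected squared error."""
    return sum(g * p for g, p in zip(grid, post))

# Hypothetical posterior: a tall narrow bump at -2 and a lower, wider
# bump at +2 that holds most of the probability mass.
grid = [i / 100 for i in range(-500, 501)]
unnorm = [0.6 * math.exp(-0.5 * ((x + 2.0) / 0.2) ** 2)
          + 0.4 * math.exp(-0.5 * ((x - 2.0) / 1.0) ** 2) for x in grid]
post = normalise(unnorm)

x_map = map_estimate(grid, post)
x_mmse = mmse_estimate(grid, post)
```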
Can the Maximum Entropy Principle Be Explained as a Consistency Requirement?, 1997
Cited by 13 (1 self)
The principle of maximum entropy is a general method to assign values to probability distributions on the basis of partial information. This principle, introduced by Jaynes in 1957, forms an extension of the classical principle of insufficient reason. It has been further generalized, both in mathematical formulation and in intended scope, into the principle of maximum relative entropy or of minimum information. It has been claimed that these principles are singled out as unique methods of statistical inference that agree with certain compelling consistency requirements. This paper reviews these consistency arguments and the surrounding controversy. It is shown that the uniqueness proofs are flawed, or rest on unreasonably strong assumptions. A more general class of inference rules, maximizing the so-called Rényi entropies, is exhibited which also fulfills the reasonable part of the consistency assumptions. 1 Introduction. In any application of probability theory to the pro...
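For readers unfamiliar with the Rényi entropies mentioned above, the sketch below computes the standard Rényi entropy of a discrete distribution, H_a(p) = log(sum_i p_i^a) / (1 - a), which reduces to the Shannon entropy as the order a approaches 1. The function names and the example distribution are assumptions of this illustration; the inference rules studied in the paper, which maximize such entropies under constraints, are not implemented here.

```python
import math

def shannon_entropy(p):
    """Shannon entropy in nats; zero-probability outcomes contribute nothing."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def renyi_entropy(p, order):
    """Renyi entropy of the given order (in nats) for a discrete distribution.

    order == 1 is handled as the Shannon limit; order must be non-negative.
    """
    if order < 0:
        raise ValueError("order must be non-negative")
    if abs(order - 1.0) < 1e-12:
        return shannon_entropy(p)
    return math.log(sum(pi ** order for pi in p if pi > 0)) / (1.0 - order)

# Hypothetical example distribution.
p = [0.5, 0.25, 0.125, 0.125]
h_half = renyi_entropy(p, 0.5)
h_one = renyi_entropy(p, 1.0)
h_two = renyi_entropy(p, 2.0)
```

For a uniform distribution every order gives the same value, the logarithm of the number of outcomes; for non-uniform distributions the entropy does not increase as the order grows.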

