Results 1-10 of 16
Prior Probabilities
IEEE Transactions on Systems Science and Cybernetics, 1968
Cited by 165 (3 self)
e case of location and scale parameters, rate constants, and in Bernoulli trials with unknown probability of success. In realistic problems, both the transformation group analysis and the principle of maximum entropy are needed to determine the prior. The distributions thus found are uniquely determined by the prior information, independently of the choice of parameters. In a certain class of problems, therefore, the prior distributions may now be claimed to be fully as "objective" as the sampling distributions. I. Background of the problem Since the time of Laplace, applications of probability theory have been hampered by difficulties in the treatment of prior information. In realistic problems of decision or inference, we often have prior information which is highly relevant to the question being asked; to fail to take it into account is to commit the most obvious inconsistency of reasoning and may lead to absurd or dangerously misleading results. As an extreme examp
A WEAKLY INFORMATIVE DEFAULT PRIOR DISTRIBUTION FOR LOGISTIC AND OTHER REGRESSION MODELS
Cited by 14 (4 self)
We propose a new prior distribution for classical (nonhierarchical) logistic regression models, constructed by first scaling all nonbinary variables to have mean 0 and standard deviation 0.5, and then placing independent Student-t prior distributions on the coefficients. As a default choice, we recommend the Cauchy distribution with center 0 and scale 2.5, which in the simplest setting is a longer-tailed version of the distribution attained by assuming one-half additional success and one-half additional failure in a logistic regression. Cross-validation on a corpus of datasets shows the Cauchy class of prior distributions to outperform existing implementations of Gaussian and Laplace priors. We recommend this prior distribution as a default choice for routine applied use. It has the advantage of always giving answers, even when there is complete separation in logistic regression (a common problem, even when the sample size is large and the number of predictors is small), and also automatically applying more shrinkage to higher-order interactions. This can
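The recipe in this abstract (rescale nonbinary inputs to mean 0 and standard deviation 0.5, then put a Cauchy prior with center 0 and scale 2.5 on each coefficient) can be illustrated with a minimal Python sketch. The helper names, the single-slope model without an intercept, the use of the sample standard deviation, and the toy data are all assumptions made for illustration; this is not the authors' implementation.

```python
import math

def rescale(values):
    """Center a nonbinary input at 0 and scale it to standard deviation 0.5."""
    n = len(values)
    mean = sum(values) / n
    # Assumption: sample standard deviation (n - 1 denominator).
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [0.5 * (v - mean) / sd for v in values]

def cauchy_log_density(beta, center=0.0, scale=2.5):
    """Log density of a Cauchy(center, scale) prior evaluated at beta."""
    z = (beta - center) / scale
    return -math.log(math.pi * scale * (1.0 + z * z))

def log_sigmoid(s):
    """Numerically stable log(1 / (1 + exp(-s)))."""
    if s >= 0:
        return -math.log1p(math.exp(-s))
    return s - math.log1p(math.exp(s))

def log_likelihood(beta, xs, ys):
    """Logistic log-likelihood for one slope and no intercept (toy model)."""
    return sum(log_sigmoid(beta * x if y == 1 else -beta * x)
               for x, y in zip(xs, ys))

def log_posterior(beta, xs, ys):
    """Unnormalized log posterior: log-likelihood plus Cauchy(0, 2.5) log prior."""
    return log_likelihood(beta, xs, ys) + cauchy_log_density(beta)

# Hypothetical, completely separated toy data: y is 1 exactly when x > 3.
xs = rescale([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
ys = [0, 0, 0, 1, 1, 1]
```

On this toy data the likelihood alone keeps increasing as the slope grows, so it has no finite maximizer, while the log posterior eventually decreases because of the prior term. That is one way to read the abstract's claim that the prior always gives answers under complete separation.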
A Compendium of Conjugate Priors
1997
Cited by 7 (0 self)
This report reviews conjugate priors and priors closed under sampling for a variety of data generating processes where the prior distributions are univariate, bivariate, and multivariate. The effects of transformations on conjugate prior relationships are considered and cases where conjugate prior relationships can be applied under transformations are identified. Univariate and bivariate prior relationships are verified using Monte Carlo methods.
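As an illustration of what checking a conjugate relationship by Monte Carlo can look like, the Python sketch below uses the textbook Beta prior for Bernoulli trials. The choice of this particular pair, the rejection-sampling check, and every name and number in the code are assumptions for illustration, not the report's procedure.

```python
import random

def beta_binomial_update(a, b, successes, failures):
    """Conjugate update: Beta(a, b) prior -> Beta(a + successes, b + failures)."""
    return a + successes, b + failures

def monte_carlo_posterior_mean(a, b, n_trials, observed_successes,
                               n_draws=100_000, seed=0):
    """Estimate the posterior mean by rejection sampling.

    Draw theta from the prior, simulate n_trials Bernoulli outcomes, and keep
    theta only when the simulated success count matches the observed one.
    """
    rng = random.Random(seed)
    kept = []
    for _ in range(n_draws):
        theta = rng.betavariate(a, b)
        simulated = sum(rng.random() < theta for _ in range(n_trials))
        if simulated == observed_successes:
            kept.append(theta)
    return sum(kept) / len(kept)

# Hypothetical example: Beta(2, 2) prior, 3 successes observed in 5 trials.
a_post, b_post = beta_binomial_update(2.0, 2.0, successes=3, failures=2)
exact_mean = a_post / (a_post + b_post)  # mean of Beta(5, 4)
mc_mean = monte_carlo_posterior_mean(2.0, 2.0, n_trials=5, observed_successes=3)
```

If the conjugate relationship holds, `mc_mean` should agree with `exact_mean` up to simulation noise, which shrinks as `n_draws` grows.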
A noninformative prior for neural networks
Machine Learning, 2000
Cited by 3 (0 self)
While many implementations of Bayesian neural networks use large, complex hierarchical priors, in much of modern Bayesian statistics, noninformative (flat) priors are very common. This paper introduces a noninformative prior for feedforward neural networks, describing several theoretical and practical advantages of this approach. Details of implementation via Markov chain Monte Carlo are included.
RESURRECTING LOGICAL PROBABILITY
ERKENNTNIS, 2001
Cited by 3 (0 self)
The logical interpretation of probability, or “objective Bayesianism” – the theory that (some) probabilities are strictly logical degrees of partial implication – is defended. The main argument against it is that it requires the assignment of prior probabilities, and that any attempt to determine them by symmetry via a “principle of insufficient reason” inevitably leads to paradox. Three replies are advanced: that priors are imprecise or of little weight, so that disagreement about them does not matter, within limits; that it is possible to distinguish reasonable from unreasonable priors on logical grounds; and that in real cases disagreement about priors can usually be explained by differences in the background information. It is argued also that proponents of alternative conceptions of probability, such as frequentists, Bayesians and Popperians, are unable to avoid committing themselves to the basic principles of logical probability.
A Representation Theorem and Applications
In ‘Proceedings of the Seventh European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU)’, Lecture Notes in Artificial Intelligence, 2003
Cited by 2 (2 self)
We introduce a set of transformations on the set of all probability distributions over a finite state space, and show that these transformations are the only ones that preserve certain elementary probabilistic relationships. This result provides a new perspective on a variety of probabilistic inference problems in which invariance considerations play a role. Two particular applications we consider in this paper are the development of an equivariance-based approach to the problem of measure selection, and a new justification for Haldane's prior as the distribution that encodes prior ignorance about the parameter of a multinomial distribution.
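For orientation, Haldane's prior for a multinomial parameter is usually written in the following textbook form. The symbols $\theta_i$, $n_i$, $N$ and $k$ are introduced here for illustration and are not taken from the paper, whose own derivation via the representation theorem is not reproduced.

```latex
% Haldane's (improper) prior on the multinomial parameter
% \theta = (\theta_1, \dots, \theta_k), with \sum_i \theta_i = 1:
\pi(\theta) \;\propto\; \prod_{i=1}^{k} \theta_i^{-1}

% After observing counts n_1, \dots, n_k with N = \sum_i n_i:
\pi(\theta \mid n) \;\propto\; \prod_{i=1}^{k} \theta_i^{\,n_i - 1},
\qquad \text{i.e.}\quad \theta \mid n \sim \mathrm{Dirichlet}(n_1, \dots, n_k)

% The posterior is proper only when every n_i > 0, and its mean is
\mathbb{E}[\theta_i \mid n] \;=\; \frac{n_i}{N}
```

Under this prior the posterior mean coincides with the observed relative frequencies, which matches the reading of the prior as adding no pseudo-observations of its own.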
A representation theorem and applications to measure selection and noninformative priors
Cited by 2 (1 self)
We introduce a set of transformations on the set of all probability distributions over a finite state space, and show that these transformations are the only ones that preserve certain elementary probabilistic relationships. This result provides a new perspective on a variety of probabilistic inference problems in which invariance considerations play a role. Two particular applications we consider in this paper are the development of an equivariance-based approach to the problem of measure selection, and a new justification for Haldane’s prior as the distribution that encodes prior ignorance about the parameter of a multinomial distribution.
Bayesian Geometric Theory of Statistical Inference
1996
Cited by 1 (1 self)
Statistical estimation is studied in the framework of Bayesian decision theory and information geometry. Extending information deviation (divergence) to the space of finite measures, it is shown that for a given prior and sample there exist ideal estimates given by a posterior average. The error of any estimate is decomposed into the sum of its deviation from the ideal estimate and the error of the ideal estimate. The optimal estimates on any model are given by projections of the ideal estimates onto the model. Under usual assumptions the ideal estimates are sufficient statistics of the posterior. Several important non-Bayesian theories are also expressed as special cases. Abbreviated Title: Bayesian Information Geometry. Affiliation: Neural Computing Research Group, Dept of Computer Science and Applied Mathematics, Aston University, Aston Triangle, Birmingham B4 7ET, UK. Current address: Huaiyu Zhu, Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA. Email: zhuh@santafe....
Default Priors for Neural Network Classification
2005
Cited by 1 (0 self)
Feedforward neural networks are a popular tool for classification, offering a method for fully flexible modeling. This paper looks at the underlying probability model, so as to understand statistically what is going on in order to facilitate an intelligent choice of prior for a fully Bayesian analysis. The parameters turn out to be difficult or impossible to interpret, and yet a coherent prior requires a quantification of this inherent uncertainty. Several approaches are discussed, including flat priors, Jeffreys priors and reference priors. Key Words: Bayesian neural network; nonparametric classification; noninformative prior
models
2006
Cited by 1 (1 self)
default prior distribution for logistic and other regression