On the Dirichlet Prior and Bayesian Regularization
- In Advances in Neural Information Processing Systems 15
, 2002
Cited by 20 (2 self)
A common objective in learning a model from data is to recover its network structure, while the model parameters are of minor interest. For example, we may wish to recover regulatory networks from high-throughput data sources. In this paper we examine how Bayesian regularization using a Dirichlet prior over the model parameters affects the learned model structure in a domain with discrete variables. Surprisingly, a weak prior in the sense of smaller equivalent sample size leads to a strong regularization of the model structure (sparse graph) given a sufficiently large data set. In particular, the empty graph is obtained in the limit of a vanishing strength of prior belief. This is diametrically opposite to what one may expect in this limit, namely the complete graph from an (unregularized) maximum likelihood estimate. Since the prior affects the parameters as expected, the prior strength balances a "trade-off" between regularizing the parameters or the structure of the model. We demonstrate the benefits of optimizing this trade-off in the sense of predictive accuracy.
Representation Dependence in Probabilistic Inference
- JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH
, 2004
Cited by 18 (1 self)
Non-deductive reasoning systems are often representation dependent: representing the same situation in two different ways may cause such a system to return two different answers. Some have viewed
Severe Testing as a Basic Concept in a Neyman-Pearson Philosophy of Induction
- BRITISH JOURNAL FOR THE PHILOSOPHY OF SCIENCE
, 2006
Cited by 14 (6 self)
Despite the widespread use of key concepts of the Neyman–Pearson (N–P) statistical paradigm—type I and II errors, significance levels, power, confidence levels—they have been the subject of philosophical controversy and debate for over 60 years. Both current and long-standing problems of N–P tests stem from unclarity and confusion, even among N–P adherents, as to how a test’s (pre-data) error probabilities are to be used for (post-data) inductive inference as opposed to inductive behavior. We argue that the relevance of error probabilities is to ensure that only statistical hypotheses that have passed severe or probative tests are inferred from the data. The severity criterion supplies a meta-statistical principle for evaluating proposed statistical inferences, avoiding classic fallacies from tests that are overly sensitive, as well as those not sensitive enough to particular errors and discrepancies.
Bayesian information criterion for censored survival models
- Biometrics
Cited by 13 (3 self)
We investigate the Bayesian Information Criterion (BIC) for variable selection in models for censored survival data. Kass and Wasserman (1995) showed that BIC provides a close approximation to the Bayes factor when a unit-information prior on the parameter space is used. We propose a revision of the penalty term in BIC so that it is defined in terms of the number of uncensored events instead of the number of observations. For the simplest censored data model, that of exponential distributions of survival times (i.e. a constant hazard rate), this revision results in a better approximation to the exact Bayes factor based on a conjugate unit-information prior. In the Cox proportional hazards regression model, we propose defining BIC in terms of the maximized partial likelihood. Using the number of deaths rather than the number of individuals in the BIC penalty term corresponds to a more realistic prior on the parameter space, and is shown to improve predictive performance for assessing stroke risk in the Cardiovascular Health Study.
Constructing a Logic of Plausible Inference: a Guide To Cox's Theorem
- International Journal of Approximate Reasoning
, 2003
Cited by 9 (0 self)
Cox's Theorem provides a theoretical basis for using probability theory as a general logic of plausible inference. The theorem states that any system for plausible reasoning that satisfies certain qualitative requirements intended to ensure consistency with classical deductive logic and correspondence with commonsense reasoning is isomorphic to probability theory. However, the requirements used to obtain this result have been the subject of much debate. We review Cox's Theorem, discussing its requirements, the intuition and reasoning behind these, and the most important objections, and finish with an abbreviated proof of the theorem.
Information Geometry and Prior Selection
, 2002
Cited by 7 (6 self)
In this contribution, we study the problem of prior selection arising in Bayesian inference. There is an extensive literature on the construction of noninformative priors, and the subject seems far from being definitively solved [1]. Here we revisit this subject with differential geometry tools and propose to construct the prior in a Bayesian decision theoretic framework. We show how the construction of the prior by projection is the best way to take into account the modeling restriction. For instance, we apply this procedure to curved parametric families, where the ignorance is directly expressed by the relative geometry of the restricted model in the wider model containing it.
The Maximum Entropy On The Mean Method, Noise And Sensitivity
Cited by 6 (2 self)
In this paper we address the problem of building convenient criteria to solve linear and noisy inverse problems of the form y = Ax + n. Our approach is based on the specification of constraints on the solution x through its belonging to a given convex set C. The solution is chosen as the mean of the distribution which is the closest to a reference measure μ on C with respect to the Kullback divergence, or cross-entropy. This is therefore called the Maximum Entropy on the Mean Method (MEMM). This problem is shown to be equivalent to the convex one x = arg min_x F(x) subject to y = Ax (in the noiseless case). Many classical criteria are found to be particular solutions with different reference measures μ. But except for some measures, these primal criteria have no explicit expression. Nevertheless, taking advantage of a dual formulation of the problem, the MEMM enables us to compute a solution in such cases. This indicates that such criteria could hardly have been derived without t...
The Behrens-Fisher problem revisited: A Bayes-frequentist synthesis
, 2001
Cited by 6 (0 self)
The Behrens-Fisher problem concerns the inference for the difference between the means of two normal populations whose ratio of variances is unknown. In this situation, Fisher's fiducial interval differs markedly from the Neyman-Pearson confidence interval. A prior proposed by Jeffreys leads to a credible interval that is equivalent to Fisher's solution, but carries a different interpretation. The authors propose an alternative prior leading to a credible interval whose asymptotic coverage probability matches the frequentist coverage probability more accurately than Jeffreys' interval. Their simulation results indicate excellent matching even in small samples.
Default priors for Bayesian and frequentist inference
- J. Royal Statist. Soc. B
, 2010
Cited by 6 (3 self)
We investigate the choice of default prior for use with likelihood to facilitate Bayesian and frequentist inference. Such a prior is a density or relative density that weights an observed likelihood function leading to the elimination of parameters not of interest and accordingly providing a density type assessment for a parameter of interest. For regular models with independent coordinates we develop a second-order prior for the full parameter based on an approximate location relation from near a parameter value to near the observed data point; this derives directly from the coordinate distribution functions and is closely linked to the original Bayes approach. We then develop a modified prior that is targeted on a component parameter of interest and avoids the marginalization paradoxes of Dawid, Stone and Zidek (1973); this uses some extensions of Welch-Peers theory that modify the Jeffreys prior and builds more generally on the approximate location property. A third type of prior is then developed that targets a vector interest parameter in the presence of a vector nuisance parameter and is based more directly on the original Jeffreys approach. Examples are given to clarify the computation of the priors and the flexibility of the approach.
HIGHER ORDER SEMIPARAMETRIC FREQUENTIST INFERENCE WITH THE PROFILE SAMPLER
- SUBMITTED TO THE ANNALS OF STATISTICS
, 2006
Cited by 5 (4 self)
We consider higher order frequentist inference for the parametric component of a semiparametric model based on sampling from the posterior profile distribution. The first order validity of this procedure established by Lee, Kosorok and Fine (2005) is extended to second order validity in the setting where the infinite dimensional nuisance parameter achieves the parametric rate. Specifically, we obtain higher order estimates of the maximum profile likelihood estimator and of the efficient Fisher information. Moreover, we prove that an exact frequentist confidence interval for the parametric component at level alpha can be estimated by the alpha level credible set from the profile sampler with an error of order O_P(n^{-1}). As far as we are aware, these results are the first higher order frequentist results obtained for semiparametric estimation. A fully Bayesian interpretation is established under a certain data dependent prior. The theory is verified for three specific examples.

