Results 1 - 10
of
43
Bayes Factors
, 1995
"... In a 1935 paper, and in his book Theory of Probability, Jeffreys developed a methodology for quantifying the evidence in favor of a scientific theory. The centerpiece was a number, now called the Bayes factor, which is the posterior odds of the null hypothesis when the prior probability on the null ..."
Abstract
-
Cited by 718 (65 self)
- Add to MetaCart
In a 1935 paper, and in his book Theory of Probability, Jeffreys developed a methodology for quantifying the evidence in favor of a scientific theory. The centerpiece was a number, now called the Bayes factor, which is the posterior odds of the null hypothesis when the prior probability on the null is one-half. Although there has been much discussion of Bayesian hypothesis testing in the context of criticism of P -values, less attention has been given to the Bayes factor as a practical tool of applied statistics. In this paper we review and discuss the uses of Bayes factors in the context of five scientific applications in genetics, sports, ecology, sociology and psychology.
Efficient approximations for the marginal likelihood of Bayesian networks with hidden variables
- Machine Learning
, 1997
"... We discuss Bayesian methods for learning Bayesian networks when data sets are incomplete. In particular, we examine asymptotic approximations for the marginal likelihood of incomplete data given a Bayesian network. We consider the Laplace approximation and the less accurate but more efficient BIC/MD ..."
Abstract
-
Cited by 155 (9 self)
- Add to MetaCart
We discuss Bayesian methods for learning Bayesian networks when data sets are incomplete. In particular, we examine asymptotic approximations for the marginal likelihood of incomplete data given a Bayesian network. We consider the Laplace approximation and the less accurate but more efficient BIC/MDL approximation. We also consider approximations proposed by Draper (1993) and Cheeseman and Stutz (1995). These approximations are as efficient as BIC/MDL, but their accuracy has not been studied in any depth. We compare the accuracy of these approximations under the assumption that the Laplace approximation is the most accurate. In experiments using synthetic data generated from discrete naive-Bayes models having a hidden root node, we find that (1) the BIC/MDL measure is the least accurate, having a bias in favor of simple models, and (2) the Draper and CS measures are the most accurate. 1
Model Selection and the Principle of Minimum Description Length
- Journal of the American Statistical Association
, 1998
"... This paper reviews the principle of Minimum Description Length (MDL) for problems of model selection. By viewing statistical modeling as a means of generating descriptions of observed data, the MDL framework discriminates between competing models based on the complexity of each description. This ..."
Abstract
-
Cited by 114 (4 self)
- Add to MetaCart
This paper reviews the principle of Minimum Description Length (MDL) for problems of model selection. By viewing statistical modeling as a means of generating descriptions of observed data, the MDL framework discriminates between competing models based on the complexity of each description. This approach began with Kolmogorov's theory of algorithmic complexity, matured in the literature on information theory, and has recently received renewed interest within the statistics community. In the pages that follow, we review both the practical as well as the theoretical aspects of MDL as a tool for model selection, emphasizing the rich connections between information theory and statistics. At the boundary between these two disciplines, we find many interesting interpretations of popular frequentist and Bayesian procedures. As we will see, MDL provides an objective umbrella under which rather disparate approaches to statistical modeling can co-exist and be compared. We illustrate th...
Universal Prediction
- IEEE Transactions on Information Theory
, 1998
"... This paper consists of an overview on universal prediction from an information-theoretic perspective. Special attention is given to the notion of probability assignment under the selfinformation loss function, which is directly related to the theory of universal data compression. ..."
Abstract
-
Cited by 99 (6 self)
- Add to MetaCart
This paper consists of an overview on universal prediction from an information-theoretic perspective. Special attention is given to the notion of probability assignment under the selfinformation loss function, which is directly related to the theory of universal data compression.
Bayes factors and model uncertainty
- DEPARTMENT OF STATISTICS, UNIVERSITY OFWASHINGTON
, 1993
"... In a 1935 paper, and in his book Theory of Probability, Jeffreys developed a methodology for quantifying the evidence in favor of a scientific theory. The centerpiece was a number, now called the Bayes factor, which is the posterior odds of the null hypothesis when the prior probability on the null ..."
Abstract
-
Cited by 69 (6 self)
- Add to MetaCart
In a 1935 paper, and in his book Theory of Probability, Jeffreys developed a methodology for quantifying the evidence in favor of a scientific theory. The centerpiece was a number, now called the Bayes factor, which is the posterior odds of the null hypothesis when the prior probability on the null is one-half. Although there has been much discussion of Bayesian hypothesis testing in the context of criticism of P-values, less attention has been given to the Bayes factor as a practical tool of applied statistics. In this paper we review and discuss the uses of Bayes factors in the context of five scientific applications. The points we emphasize are:- from Jeffreys's Bayesian point of view, the purpose of hypothesis testing is to evaluate the evidence in favor of a scientific theory;- Bayes factors offer a way of evaluating evidence in favor ofa null hypothesis;- Bayes factors provide a way of incorporating external information into the evaluation of evidence about a hypothesis;- Bayes factors are very general, and do not require alternative models to be nested;- several techniques are available for computing Bayes factors, including asymptotic approximations which are easy to compute using the output from standard packages that maximize likelihoods;- in "non-standard " statistical models that do not satisfy common regularity conditions, it can be technically simpler to calculate Bayes factors than to derive non-Bayesian significance
Model Selection for Probabilistic Clustering Using Cross-Validated Likelihood
- Statistics and Computing
, 1998
"... Cross-validated likelihood is investigated as a tool for automatically determining the appropriate number of components (given the data) in finite mixture modelling, particularly in the context of model-based probabilistic clustering. The conceptual framework for the cross-validation approach to mod ..."
Abstract
-
Cited by 46 (3 self)
- Add to MetaCart
Cross-validated likelihood is investigated as a tool for automatically determining the appropriate number of components (given the data) in finite mixture modelling, particularly in the context of model-based probabilistic clustering. The conceptual framework for the cross-validation approach to model selection is direct in the sense that models are judged directly on their out-of-sample predictive performance. The method is applied to a well-known clustering problem in the atmospheric science literature using historical records of upper atmosphere geopotential height in the Northern hemisphere. Cross-validated likelihood provides strong evidence for three clusters in the data set, providing an objective confirmation of earlier results derived using non-probabilistic clustering techniques. 1 Introduction Cross-validation is a well-known technique in supervised learning to select a model from a family of candidate models. Examples include selecting the best classification tree using cr...
A tutorial introduction to the minimum description length principle
- in Advances in Minimum Description Length: Theory and Applications. 2005
"... ..."
2003): “Forecast uncertainties in macroeconometric modelling: an application to the UK economy
- Journal of the American Statistical Association
"... This paper argues that probability forecasts convey information on the uncertainties that surround macro-economic forecasts in a straightforward manner which is preferable to other alternatives, including the use of confidence intervals. Probability forecasts obtained using a small benchmark macroec ..."
Abstract
-
Cited by 31 (10 self)
- Add to MetaCart
This paper argues that probability forecasts convey information on the uncertainties that surround macro-economic forecasts in a straightforward manner which is preferable to other alternatives, including the use of confidence intervals. Probability forecasts obtained using a small benchmark macroeconometric model as well as a number of other alternatives are presented and evaluated using recursive forecasts generated over the period 1999q1-2001q1. Out of sample probability forecasts of inflation and output growth are also provided over the period 2001q2-2003q1, and their implications discussed in relation to the Bank of England’s inflation target and the need to avoid recessions, both as separate events and jointly. The robustness of the results to parameter and model uncertainties is also investigated by a pragmatic implementation of the Bayesian model averaging approach.
If You’re So Smart, Why Aren’t You Rich? Belief Selection in Complete and Incomplete Markets
, 2001
"... ..."
Models and Selection Criteria for Regression and Classification
- Uncertainty in Arificial Intelligence 13
, 1997
"... When performing regression or classification, we are interested in the conditional probability distribution for an outcome or class variable Y given a set of explanatory or input variables X. We consider Bayesian models for this task. In particular, we examine a special class of models, which we ca ..."
Abstract
-
Cited by 20 (2 self)
- Add to MetaCart
When performing regression or classification, we are interested in the conditional probability distribution for an outcome or class variable Y given a set of explanatory or input variables X. We consider Bayesian models for this task. In particular, we examine a special class of models, which we call Bayesian regression/classification (BRC) models, that can be factored into independent conditional (yjx) and input (x) models. These models are convenient, because the conditional model (the portion of the full model that we care about) can be analyzed by itself. We examine the practice of transforming arbitrary Bayesian models to BRC models, and argue that this practice is often inappropriate because it ignores prior knowledge that may be important for learning. In addition, we examine Bayesian methods for learning models from data. We discuss two criteria for Bayesian model selection that are appropriate for repression/classification: one described by Spiegelhalter et al. (1993), and an...

