Results 1–10 of 12
Model selection and accounting for model uncertainty in graphical models using Occam's window
1993
Cited by 266 (46 self)
Abstract: We consider the problem of model selection and accounting for model uncertainty in high-dimensional contingency tables, motivated by expert system applications. The approach most used currently is a stepwise strategy guided by tests based on approximate asymptotic P-values, leading to the selection of a single model; inference is then conditional on the selected model. The sampling properties of such a strategy are complex, and the failure to take account of model uncertainty leads to underestimation of uncertainty about quantities of interest. In principle, a panacea is provided by the standard Bayesian formalism, which averages the posterior distributions of the quantity of interest under each of the models, weighted by their posterior model probabilities. Furthermore, this approach is optimal in the sense of maximising predictive ability. However, it has not been used in practice because computing the posterior model probabilities is hard and the number of models is very large (often greater than 10^11). We argue that the standard Bayesian formalism is unsatisfactory and we propose an alternative Bayesian approach that, we contend, takes full account of the true model uncertainty by averaging over a much smaller set of models. An efficient search algorithm is developed for finding these models. We consider two classes of graphical models that arise in expert systems: the recursive causal models and the decomposable …
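The averaging idea in this abstract — weighting each model by an approximate posterior model probability and discarding models that fall outside Occam's window — can be sketched numerically. This is a minimal illustration using a BIC approximation; the model names and BIC scores are made up for the example and this is not the authors' search algorithm.

```python
import math

# Hypothetical BIC scores for four candidate graphical models (illustrative
# numbers only; in practice these come from fitting each model to the data).
bics = {"M1": 1042.3, "M2": 1040.1, "M3": 1051.7, "M4": 1040.9}

# BIC approximation to posterior model probabilities: p(M_k | D) ∝ exp(-BIC_k / 2).
best = min(bics.values())
raw = {m: math.exp(-(b - best) / 2) for m, b in bics.items()}
total = sum(raw.values())
post = {m: r / total for m, r in raw.items()}

# Occam's window: keep only models whose posterior probability is within a
# factor C of the best model's, then renormalise and average over survivors.
C = 20.0
top = max(post.values())
window = {m: p for m, p in post.items() if top / p <= C}
z = sum(window.values())
weights = {m: p / z for m, p in window.items()}
print(weights)  # M3 falls outside the window; M1, M2, M4 share the weight
```

Any posterior quantity of interest would then be a `weights`-weighted average of the per-model posteriors.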
Bayesian Model Averaging for Linear Regression Models
Journal of the American Statistical Association, 1997
Cited by 184 (13 self)
Abstract: We consider the problem of accounting for model uncertainty in linear regression models. Conditioning on a single selected model ignores model uncertainty, and thus leads to the underestimation of uncertainty when making inferences about quantities of interest. A Bayesian solution to this problem involves averaging over all possible models (i.e., combinations of predictors) when making inferences about quantities of interest …
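For a small predictor set, the averaging over all combinations of predictors described above can be enumerated directly. The sketch below is a toy illustration under a BIC approximation to the posterior model probabilities, with simulated data; it is not the paper's implementation.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y depends on x0 only; x1 and x2 are pure-noise predictors.
n = 200
X = rng.normal(size=(n, 3))
y = 2.0 * X[:, 0] + rng.normal(size=n)

def bic_of(subset):
    """BIC of the OLS regression of y on an intercept plus the given columns."""
    Z = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    rss = float(np.sum((y - Z @ beta) ** 2))
    return n * np.log(rss / n) + Z.shape[1] * np.log(n)

# Enumerate all 2^3 predictor subsets and weight them by exp(-BIC/2).
subsets = [s for r in range(4) for s in itertools.combinations(range(3), r)]
bics = np.array([bic_of(s) for s in subsets])
w = np.exp(-(bics - bics.min()) / 2)
w /= w.sum()

# Posterior inclusion probability of each predictor under the BIC weights.
incl = {j: sum(wi for wi, s in zip(w, subsets) if j in s) for j in range(3)}
print(incl)  # x0 (the true predictor) should be near 1, the others much smaller
```

Predictions or coefficient posteriors would likewise be averaged over `subsets` with weights `w`, rather than conditioned on one selected subset.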
Prior Probabilities
IEEE Transactions on Systems Science and Cybernetics, 1968
Cited by 166 (3 self)
Abstract: … the case of location and scale parameters, rate constants, and in Bernoulli trials with unknown probability of success. In realistic problems, both the transformation group analysis and the principle of maximum entropy are needed to determine the prior. The distributions thus found are uniquely determined by the prior information, independently of the choice of parameters. In a certain class of problems, therefore, the prior distributions may now be claimed to be fully as "objective" as the sampling distributions. I. Background of the problem. Since the time of Laplace, applications of probability theory have been hampered by difficulties in the treatment of prior information. In realistic problems of decision or inference, we often have prior information which is highly relevant to the question being asked; to fail to take it into account is to commit the most obvious inconsistency of reasoning and may lead to absurd or dangerously misleading results. As an extreme example …
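The principle of maximum entropy invoked here picks, among all distributions satisfying the stated constraints, the one with the largest entropy. A standard worked example (Jaynes's dice problem, used here only as an illustration, not necessarily taken from this paper) finds the maximum-entropy distribution on a die's six faces subject to a mean of 4.5; the distribution has the exponential-family form p_i ∝ exp(λi), and λ can be found by bisection.

```python
import math

# Maximum-entropy distribution on faces {1..6} with mean constrained to 4.5:
# p_i ∝ exp(lam * i), with lam chosen so that the mean constraint holds.
faces = range(1, 7)

def mean_for(lam):
    w = [math.exp(lam * i) for i in faces]
    z = sum(w)
    return sum(i * wi for i, wi in zip(faces, w)) / z

# mean_for is increasing in lam; mean 4.5 > 3.5 implies lam > 0, so bisect.
lo, hi = 0.0, 5.0
for _ in range(100):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if mean_for(mid) < 4.5 else (lo, mid)

lam = (lo + hi) / 2
probs = [math.exp(lam * i) for i in faces]
z = sum(probs)
probs = [p / z for p in probs]
print([round(p, 3) for p in probs])  # increasing in the face value
```

The unconstrained case (mean 3.5) recovers the uniform distribution, i.e. λ = 0.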
The Strength of Statistical Evidence for Composite Hypotheses: Inference to the Best Explanation
2010
Cited by 6 (4 self)
Abstract: A general function to quantify the weight of evidence in a sample of data for one hypothesis over another is derived from the law of likelihood and from a statistical formalization of inference to the best explanation. For a fixed parameter of interest, the resulting weight of evidence that favors one composite hypothesis over another is the likelihood ratio using, for each hypothesis, the parameter value consistent with that hypothesis that maximizes the likelihood function over the parameter of interest. Since the weight of evidence is generally only known up to a nuisance parameter, it is approximated by replacing the likelihood function with a reduced likelihood function on the interest parameter space. Unlike the Bayes factor, and unlike the p-value under interpretations that extend its scope, the weight of evidence is coherent in the sense that it cannot support a hypothesis over any hypothesis that it entails. Further, when comparing the hypothesis that the parameter lies outside a nontrivial interval to the hypothesis that it lies within the interval, the proposed method of weighing evidence almost always asymptotically favors the correct hypothesis …
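The quantity described — a likelihood ratio in which each composite hypothesis is represented by its best-fitting parameter value — amounts to a profile likelihood ratio. A minimal sketch for a binomial success probability, with a hypothetical cutoff of 0.5 (the example is illustrative and not from the paper):

```python
import math

def binom_loglik(p, k, n):
    # Log-likelihood of k successes in n Bernoulli(p) trials
    # (the binomial coefficient cancels in likelihood ratios).
    return k * math.log(p) + (n - k) * math.log(1 - p)

def weight_of_evidence(k, n, cutoff=0.5):
    """Profile-likelihood weight of evidence for H1: p > cutoff over H0: p <= cutoff."""
    mle = k / n
    # Within each composite hypothesis the likelihood is maximized at the
    # unrestricted MLE if it satisfies the constraint, else at the boundary.
    p1 = mle if mle > cutoff else cutoff
    p0 = mle if mle <= cutoff else cutoff
    return math.exp(binom_loglik(p1, k, n) - binom_loglik(p0, k, n))

print(weight_of_evidence(70, 100))  # >> 1: evidence favors p > 0.5
print(weight_of_evidence(50, 100))  # 1: evidence is neutral at the boundary
```

Values above 1 favor H1 and values below 1 favor H0; the ratio cannot favor a hypothesis over one that it entails, matching the coherence property claimed in the abstract.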
Bayesian computation: a statistical revolution
2003
Cited by 4 (0 self)
Abstract: The 1990s saw a statistical revolution sparked predominantly by the phenomenal advances in computing technology from the early 1980s onwards. These advances enabled the development of powerful new computational tools, which reignited interest in a philosophy of statistics that had lain almost dormant since the turn of the century. In this paper we briefly review the historical and philosophical foundations of the two schools of statistical thought, before examining the implications of the reascendance of the Bayesian paradigm for both current and future statistical practice.
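The "powerful new computational tools" behind this revolution are chiefly Markov chain Monte Carlo methods. As a hedged illustration, the sketch below runs a random-walk Metropolis sampler on a toy posterior (normal likelihood with known unit variance and a flat prior); the data and tuning constants are made up for the example.

```python
import math
import random

random.seed(1)

# Toy data; with a flat prior and unit-variance normal likelihood, the
# posterior for the mean is N(sample mean, 1/n), so MCMC should recover it.
data = [1.8, 2.2, 1.9, 2.4, 2.0, 2.1, 1.7, 2.3]

def log_post(mu):
    return -0.5 * sum((x - mu) ** 2 for x in data)

mu, chain = 0.0, []
for step in range(20000):
    prop = mu + random.gauss(0.0, 0.5)          # symmetric random-walk proposal
    if math.log(random.random()) < log_post(prop) - log_post(mu):
        mu = prop                               # Metropolis accept step
    if step >= 2000:                            # discard burn-in
        chain.append(mu)

mean = sum(chain) / len(chain)
print(round(mean, 2))  # close to the sample mean of the data
```

The same scheme extends to posteriors with no closed form, which is what made the Bayesian paradigm computationally practical.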
Explanation Trees for Causal Bayesian Networks
Cited by 3 (0 self)
Abstract: Bayesian networks can be used to extract explanations about the observed state of a subset of variables. In this paper, we explicate the desiderata of an explanation and confront them with the concept of explanation proposed by existing methods. The necessity of taking causal approaches into account when a causal graph is available is discussed. We then introduce causal explanation trees, based on the construction of explanation trees using the measure of causal information flow (Ay and Polani, 2006). This approach is compared to several other methods on known networks.
Highly Informative Priors
1985
Cited by 3 (0 self)
Abstract: INTRODUCTION. The statistical problems envisaged in our pedagogy are almost always ones in which we acquire new data D that give evidence concerning some hypotheses H, H′, … (this includes parameter estimation, since H might be the statement that a parameter lies in a certain interval); and we make inferences about them solely from the data. Indeed, Fisher's maxim, "Let the data speak for themselves", seems to imply that it would be wrong, a violation of "scientific objectivity", to allow ourselves to be influenced by other considerations such as prior knowledge about H. Yet the very act of choosing a model (i.e. a sampling distribution conditional on H) is a means of expressing some kind of prior knowledge about the existence and nature of H and its observable effects. This was noted by John Tukey (1978), who observed that sampling theory is in the curious …
Unknown heterogeneity, the EC-EM algorithm, and Large T Approximation
1996
Cited by 2 (1 self)
Abstract: We study a panel structure with n subjects/entities being observed over T periods. We consider a class of models for each subject's data-generating process, and allow for unknown heterogeneity. In other words, we do not know how many types we have, what the types are, and which subjects belong to each type. We propose a large-T approximation to the posterior mode on the unknowns through the Estimation/Classification (EC) algorithm of El-Gamal and Grether (1995), which is linear in n, T, and the unknown number of types. If our class of models (likelihood functions) allows for a consistent, asymptotically normal estimator under the assumption of homogeneity (number of types = 1), then the estimators obtained by our EC algorithm inherit those asymptotic properties as T → ∞ and then as n → ∞ (with a block-diagonal covariance matrix facilitating hypothesis testing). We then propose a large-T approximation to the EM algorithm to obtain posteriors on the subject classifications and diagnostic …
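The Estimation/Classification idea alternates between estimating each type's parameters from its currently assigned subjects and reassigning each subject to its best-fitting type. The sketch below is a toy version with two Gaussian types and a known number of types; it illustrates the alternating scheme only and is not El-Gamal and Grether's implementation.

```python
import random

random.seed(0)

# Toy panel: n subjects observed over T periods, two latent types whose
# observations are Gaussian with means 0 and 3 (unit variance).
n, T = 30, 50
truth = [i % 2 for i in range(n)]
panel = [[random.gauss(3.0 * truth[i], 1.0) for _ in range(T)] for i in range(n)]

def ssq(xs, mu):
    # Negative log-likelihood up to constants for a unit-variance Gaussian.
    return sum((x - mu) ** 2 for x in xs)

# Initialize the type means from two subjects' sample means, then alternate:
# (C) classify each subject to the type with the highest likelihood,
# (E) re-estimate each type's mean from its assigned subjects.
mus = [sum(panel[i]) / T for i in range(2)]
for _ in range(20):
    assign = [min(range(2), key=lambda g: ssq(panel[i], mus[g])) for i in range(n)]
    for g in range(2):
        members = [x for i in range(n) if assign[i] == g for x in panel[i]]
        if members:
            mus[g] = sum(members) / len(members)

print(mus)  # with T this large, classification is essentially exact
```

The cost of each sweep is linear in n and T, which is the scalability property the abstract emphasizes.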
An Estimated DSGE Model of the Indian Economy
2010
Abstract: We develop a closed-economy DSGE model of the Indian economy and estimate it by Bayesian maximum-likelihood methods using Dynare. We build up in stages to a model with a number of features important for emerging economies in general and the Indian economy in particular: a large proportion of credit-constrained consumers, a financial accelerator facing domestic firms seeking to finance their investment, and an informal sector. The simulation properties of the estimated model are examined under a generalized inflation-targeting Taylor-type interest-rate rule with forward- and backward-looking components. We find that, in terms of model posterior probabilities and standard moments criteria, inclusion of the above financial frictions and an informal sector significantly improves the model fit. JEL Classification: E52, E37, E58.