Results 1  10
of
97
Marginal likelihood from the Gibbs output
 J. Am. Stat. Assoc
, 1995
"... Your use of the JSTOR archive indicates your acceptance of JSTOR's Terms and Conditions of Use, available at ..."
Abstract

Cited by 392 (25 self)
 Add to MetaCart
(Show Context)
Your use of the JSTOR archive indicates your acceptance of JSTOR's Terms and Conditions of Use, available at
Model selection and accounting for model uncertainty in graphical models using Occam's window
, 1993
"... We consider the problem of model selection and accounting for model uncertainty in highdimensional contingency tables, motivated by expert system applications. The approach most used currently is a stepwise strategy guided by tests based on approximate asymptotic Pvalues leading to the selection o ..."
Abstract

Cited by 293 (46 self)
 Add to MetaCart
We consider the problem of model selection and accounting for model uncertainty in highdimensional contingency tables, motivated by expert system applications. The approach most used currently is a stepwise strategy guided by tests based on approximate asymptotic Pvalues leading to the selection of a single model; inference is then conditional on the selected model. The sampling properties of such a strategy are complex, and the failure to take account of model uncertainty leads to underestimation of uncertainty about quantities of interest. In principle, a panacea is provided by the standard Bayesian formalism which averages the posterior distributions of the quantity of interest under each of the models, weighted by their posterior model probabilities. Furthermore, this approach is optimal in the sense of maximising predictive ability. However, this has not been used in practice because computing the posterior model probabilities is hard and the number of models is very large (often greater than 1011). We argue that the standard Bayesian formalism is unsatisfactory and we propose an alternative Bayesian approach that, we contend, takes full account of the true model uncertainty byaveraging overamuch smaller set of models. An efficient search algorithm is developed for nding these models. We consider two classes of graphical models that arise in expert systems: the recursive causal models and the decomposable
Operations for Learning with Graphical Models
 Journal of Artificial Intelligence Research
, 1994
"... This paper is a multidisciplinary review of empirical, statistical learning from a graphical model perspective. Wellknown examples of graphical models include Bayesian networks, directed graphs representing a Markov chain, and undirected networks representing a Markov field. These graphical models ..."
Abstract

Cited by 253 (12 self)
 Add to MetaCart
This paper is a multidisciplinary review of empirical, statistical learning from a graphical model perspective. Wellknown examples of graphical models include Bayesian networks, directed graphs representing a Markov chain, and undirected networks representing a Markov field. These graphical models are extended to model data analysis and empirical learning using the notation of plates. Graphical operations for simplifying and manipulating a problem are provided including decomposition, differentiation, and the manipulation of probability models from the exponential family. Two standard algorithm schemas for learning are reviewed in a graphical framework: Gibbs sampling and the expectation maximization algorithm. Using these operations and schemas, some popular algorithms can be synthesized from their graphical specification. This includes versions of linear regression, techniques for feedforward networks, and learning Gaussian and discrete Bayesian networks from data. The paper conclu...
Bayesian measures of model complexity and fit
 Journal of the Royal Statistical Society, Series B
, 2002
"... [Read before The Royal Statistical Society at a meeting organized by the Research ..."
Abstract

Cited by 203 (3 self)
 Add to MetaCart
[Read before The Royal Statistical Society at a meeting organized by the Research
A Guide to the Literature on Learning Probabilistic Networks From Data
, 1996
"... This literature review discusses different methods under the general rubric of learning Bayesian networks from data, and includes some overlapping work on more general probabilistic networks. Connections are drawn between the statistical, neural network, and uncertainty communities, and between the ..."
Abstract

Cited by 179 (0 self)
 Add to MetaCart
This literature review discusses different methods under the general rubric of learning Bayesian networks from data, and includes some overlapping work on more general probabilistic networks. Connections are drawn between the statistical, neural network, and uncertainty communities, and between the different methodological communities, such as Bayesian, description length, and classical statistics. Basic concepts for learning and Bayesian networks are introduced and methods are then reviewed. Methods are discussed for learning parameters of a probabilistic network, for learning the structure, and for learning hidden variables. The presentation avoids formal definitions and theorems, as these are plentiful in the literature, and instead illustrates key concepts with simplified examples. Keywords Bayesian networks, graphical models, hidden variables, learning, learning structure, probabilistic networks, knowledge discovery. I. Introduction Probabilistic networks or probabilistic gra...
Prior distributions for variance parameters in hierarchical models
 Bayesian Analysis
, 2006
"... Various noninformative prior distributions have been suggested for scale parameters in hierarchical models. We construct a new foldednoncentralt family of conditionally conjugate priors for hierarchical standard deviation parameters, and then consider noninformative and weakly informative priors i ..."
Abstract

Cited by 179 (13 self)
 Add to MetaCart
(Show Context)
Various noninformative prior distributions have been suggested for scale parameters in hierarchical models. We construct a new foldednoncentralt family of conditionally conjugate priors for hierarchical standard deviation parameters, and then consider noninformative and weakly informative priors in this family. We use an example to illustrate serious problems with the inversegamma family of “noninformative ” prior distributions. We suggest instead to use a uniform prior on the hierarchical standard deviation, using the halft family when the number of groups is small and in other settings where a weakly informative prior is desired.
A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion
 Journal of the American Statistical Association
, 1995
"... ..."
Probabilistic Independent Component Analysis
, 2003
"... Independent Component Analysis is becoming a popular exploratory method for analysing complex data such as that from FMRI experiments. The application of such 'modelfree' methods, however, has been somewhat restricted both by the view that results can be uninterpretable and by the lack of ..."
Abstract

Cited by 91 (12 self)
 Add to MetaCart
(Show Context)
Independent Component Analysis is becoming a popular exploratory method for analysing complex data such as that from FMRI experiments. The application of such 'modelfree' methods, however, has been somewhat restricted both by the view that results can be uninterpretable and by the lack of ability to quantify statistical significance. We present an integrated approach to Probabilistic ICA for FMRI data that allows for nonsquare mixing in the presence of Gaussian noise. We employ an objective estimation of the amount of Gaussian noise through Bayesian analysis of the true dimensionality of the data, i.e. the number of activation and nonGaussian noise sources. Reduction of the data to this 'true' subspace before the ICA decomposition automatically results in an estimate of the noise, leading to the ability to assign significance to voxels in ICA spatial maps. Estimation of the number of intrinsic sources not only enables us to carry out probabilistic modelling, but also achieves an asymptotically unique decomposition of the data. This reduces problems of interpretation, as each final independent component is now much more likely to be due to only one physical or physiological process. We also describe other improvements to standard ICA, such as temporal prewhitening and variance normafisation of timeseries, the latter being particularly useful in the context of dimensionality reduction when weak activation is present. We discuss the use of prior information about the spatiotemporal nature of the source processes, and an alternativehypothesis testing approach for inference, using Gaussian mixture models. The performance of our approach is illustrated and evaluated on real and complex artificial FMRI data, and compared to the spatiotemporal accuracy of restfits obtaine...
Comparing Dynamic Causal Models
 NEUROIMAGE
, 2004
"... This article describes the use of Bayes factors for comparing Dynamic Causal Models (DCMs). DCMs are used to make inferences about effective connectivity from functional Magnetic Resonance Imaging (fMRI) data. These inferences, however, are contingent upon assumptions about model structure, that is, ..."
Abstract

Cited by 86 (33 self)
 Add to MetaCart
(Show Context)
This article describes the use of Bayes factors for comparing Dynamic Causal Models (DCMs). DCMs are used to make inferences about effective connectivity from functional Magnetic Resonance Imaging (fMRI) data. These inferences, however, are contingent upon assumptions about model structure, that is, the connectivity pattern between the regions included in the model. Given the current lack of detailed knowledge on anatomical connectivity in the human brain, there are often considerable degrees of freedom when defining the connectional structure of DCMs. In addition, many plausible scientific hypotheses may exist about which connections are changed by experimental manipulation, and a formal procedure for directly comparing these competing hypotheses is highly desirable. In this article, we show how Bayes factors can be used to guide choices about model structure, both with regard to the intrinsic connectivity pattern and the contextual modulation of individual connections. The combined use of Bayes factors and DCM thus allows one to evaluate competing scientific theories about the architecture of largescale neural networks and the neuronal interactions that mediate perception and cognition.