Results 1  10
of
69
Efficient approximations for the marginal likelihood of Bayesian networks with hidden variables
 Machine Learning
, 1997
"... We discuss Bayesian methods for learning Bayesian networks when data sets are incomplete. In particular, we examine asymptotic approximations for the marginal likelihood of incomplete data given a Bayesian network. We consider the Laplace approximation and the less accurate but more efficient BIC/MD ..."
Abstract

Cited by 195 (12 self)
 Add to MetaCart
We discuss Bayesian methods for learning Bayesian networks when data sets are incomplete. In particular, we examine asymptotic approximations for the marginal likelihood of incomplete data given a Bayesian network. We consider the Laplace approximation and the less accurate but more efficient BIC/MDL approximation. We also consider approximations proposed by Draper (1993) and Cheeseman and Stutz (1995). These approximations are as efficient as BIC/MDL, but their accuracy has not been studied in any depth. We compare the accuracy of these approximations under the assumption that the Laplace approximation is the most accurate. In experiments using synthetic data generated from discrete naiveBayes models having a hidden root node, we find that (1) the BIC/MDL measure is the least accurate, having a bias in favor of simple models, and (2) the Draper and CS measures are the most accurate. 1
Markov Chain Monte Carlo Model Determination for Hierarchical and Graphical Loglinear Models
 Biometrika
, 1996
"... this paper, we will only consider undirected graphical models. For details of Bayesian model selection for directed graphical models see Madigan et al (1995). An (undirected) graphical model is determined by a set of conditional independence constraints of the form `fl 1 is independent of fl 2 condi ..."
Abstract

Cited by 79 (12 self)
 Add to MetaCart
this paper, we will only consider undirected graphical models. For details of Bayesian model selection for directed graphical models see Madigan et al (1995). An (undirected) graphical model is determined by a set of conditional independence constraints of the form `fl 1 is independent of fl 2 conditional on all other fl i 2 C'. Graphical models are so called because they can each be represented as a graph with vertex set C and an edge between each pair fl 1 and fl 2 unless fl 1 and fl 2 are conditionally independent as described above. Darroch, Lauritzen and Speed (1980) show that each graphical loglinear model is hierarchical, with generators given by the cliques (complete subgraphs) of the graph. The total number of possible graphical models is clearly given by 2 (
Estimating the integrated likelihood via posterior simulation using the harmonic mean identity
 Bayesian Statistics
, 2007
"... The integrated likelihood (also called the marginal likelihood or the normalizing constant) is a central quantity in Bayesian model selection and model averaging. It is defined as the integral over the parameter space of the likelihood times the prior density. The Bayes factor for model comparison a ..."
Abstract

Cited by 49 (2 self)
 Add to MetaCart
(Show Context)
The integrated likelihood (also called the marginal likelihood or the normalizing constant) is a central quantity in Bayesian model selection and model averaging. It is defined as the integral over the parameter space of the likelihood times the prior density. The Bayes factor for model comparison and Bayesian testing is a ratio of integrated likelihoods, and the model weights in Bayesian model averaging are proportional to the integrated likelihoods. We consider the estimation of the integrated likelihood from posterior simulation output, aiming at a generic method that uses only the likelihoods from the posterior simulation iterations. The key is the harmonic mean identity, which says that the reciprocal of the integrated likelihood is equal to the posterior harmonic mean of the likelihood. The simplest estimator based on the identity is thus the harmonic mean of the likelihoods. While this is an unbiased and simulationconsistent estimator, its reciprocal can have infinite variance and so it is unstable in general. We describe two methods for stabilizing the harmonic mean estimator. In the first one, the parameter space is reduced in such a way that the modified estimator involves a harmonic mean of heaviertailed densities, thus resulting in a finite variance estimator. The resulting
Bayesian Estimation and Testing of Structural Equation Models
 Psychometrika
, 1999
"... The Gibbs sampler can be used to obtain samples of arbitrary size from the posterior distribution over the parameters of a structural equation model (SEM) given covariance data and a prior distribution over the parameters. Point estimates, standard deviations and interval estimates for the parameter ..."
Abstract

Cited by 43 (10 self)
 Add to MetaCart
The Gibbs sampler can be used to obtain samples of arbitrary size from the posterior distribution over the parameters of a structural equation model (SEM) given covariance data and a prior distribution over the parameters. Point estimates, standard deviations and interval estimates for the parameters can be computed from these samples. If the prior distribution over the parameters is uninformative, the posterior is proportional to the likelihood, and asymptotically the inferences based on the Gibbs sample are the same as those based on the maximum likelihood solution, e.g., output from LISREL or EQS. In small samples, however, the likelihood surface is not Gaussian and in some cases contains local maxima. Nevertheless, the Gibbs sample comes from the correct posterior distribution over the parameters regardless of the sample size and the shape of the likelihood surface. With an informative prior distribution over the parameters, the posterior can be used to make inferences about the parameters of underidentified models, as we illustrate on a simple errorsinvariables model.
Bayesian Model Selection in Finite Mixtures by Marginal Density Decompositions
 JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
, 2001
"... ..."
Bayesian finite mixtures with an unknown number of components: the allocation sampler
 University of Glasgow
, 2005
"... A new Markov chain Monte Carlo method for the Bayesian analysis of finite mixture distributions with an unknown number of components is presented. The sampler is characterized by a state space consisting only of the number of components and the latent allocation variables. Its main advantage is that ..."
Abstract

Cited by 30 (1 self)
 Add to MetaCart
(Show Context)
A new Markov chain Monte Carlo method for the Bayesian analysis of finite mixture distributions with an unknown number of components is presented. The sampler is characterized by a state space consisting only of the number of components and the latent allocation variables. Its main advantage is that it can be used, with minimal changes, for mixtures of components from any parametric family, under the assumption that the component parameters can be integrated out of the model analytically. Artificial and real data sets are used to illustrate the method and mixtures of univariate and of multivariate normals are explicitly considered. The problem of label switching, when parameter inference is of interest, is addressed in a postprocessing stage.
An empirical comparison of logit choice models with discrete versus continuous representations of heterogeneity
 Journal of Marketing Research
, 2002
"... Currently, there is an important debate about the relative merits of models with discrete and continuous representations of consumer heterogeneity. In a recent JMR study, Andrews, Ansari, and Currim (2002; hereafter AAC) compared metric conjoint analysis models with discrete and continuous represent ..."
Abstract

Cited by 19 (0 self)
 Add to MetaCart
Currently, there is an important debate about the relative merits of models with discrete and continuous representations of consumer heterogeneity. In a recent JMR study, Andrews, Ansari, and Currim (2002; hereafter AAC) compared metric conjoint analysis models with discrete and continuous representations of heterogeneity and found no differences between the two models with respect to parameter recovery and prediction of ratings for holdout profiles. Models with continuous representations of heterogeneity fit the data better than models with discrete representations of heterogeneity. The goal of the current study is to compare the relative performance of logit choice models with discrete versus continuous representations of heterogeneity in terms of the accuracy of householdlevel parameters, fit, and forecasting accuracy. To accomplish this goal, the authors conduct an extensive simulation experiment with logit models in a scanner data context, using an experimental design based on AAC and other recent simulation studies. One of the main findings is that models with continuous and discrete representations of heterogeneity recover householdlevel parameter estimates and predict holdout choices about equally well except when the number of purchases per household is small, in which case the models with continuous representations perform very poorly. As in the AAC study, models with continuous representations of heterogeneity fit the data better.
The Posterior Probability of Bayes Nets with Strong Dependences
 Soft Computing
, 1999
"... Stochastic independence is an idealized relationship located at one end of a continuum of values measuring degrees of dependence. Modeling real world systems, we are often not interested in the distinction between exact independence and any degree of dependence, but between weak ignorable and strong ..."
Abstract

Cited by 17 (1 self)
 Add to MetaCart
(Show Context)
Stochastic independence is an idealized relationship located at one end of a continuum of values measuring degrees of dependence. Modeling real world systems, we are often not interested in the distinction between exact independence and any degree of dependence, but between weak ignorable and strong substantial dependence. Good models map significant deviance from independence and neglect approximate independence or dependence weaker than a noise threshold. This intuition is applied to learning the structure of Bayes nets from data. We determine the conditional posterior probabilities of structures given that the degree of dependence at each of their nodes exceeds a critical noise level. Deviance from independence is measured by mutual information. Arc probabilities are determined by the amount of mutual information the neighbors contribute to a node, is greater than a critical minimum deviance from independence. A Ø 2 approximation for the probability density function of mutual info...
2006 “A Bayesian Approach to Diffusion Models of DecisionMaking and Response Time” NIPS
"... We present a computational Bayesian approach for Wiener diffusion models, which are prominent accounts of response time distributions in decisionmaking. We first develop a general closedform analytic approximation to the response time distributions for onedimensional diffusion processes, and deri ..."
Abstract

Cited by 15 (2 self)
 Add to MetaCart
(Show Context)
We present a computational Bayesian approach for Wiener diffusion models, which are prominent accounts of response time distributions in decisionmaking. We first develop a general closedform analytic approximation to the response time distributions for onedimensional diffusion processes, and derive the required Wiener diffusion as a special case. We use this result to undertake Bayesian modeling of benchmark data, using posterior sampling to draw inferences about the interesting psychological parameters. With the aid of the benchmark data, we show the Bayesian account has several advantages, including dealing naturally with the parameter variation needed to account for some key features of the data, and providing quantitative measures to guide decisions about model construction. 1