## Easy Computation of Bayes Factors and Normalizing Constants for Mixture Models via Mixture Importance Sampling (2001)

Citations: 3 - 0 self

### BibTeX

@TECHREPORT{Emond01easycomputation,
    author = {Mary J. Emond and Adrian E. Raftery and Russell J. Steele},
    title = {Easy Computation of Bayes Factors and Normalizing Constants for Mixture Models via Mixture Importance Sampling},
    institution = {},
    year = {2001}
}

### Abstract

We propose a method for approximating integrated likelihoods, or posterior normalizing constants, in finite mixture models, for which analytic approximations such as the Laplace method are invalid. Integrated likelihoods are key components of Bayes factors and of the posterior model probabilities used in Bayesian model averaging. The method starts by formulating the model in terms of the unobserved group memberships, Z, and making these, rather than the model parameters, the variables of integration. The integral is then evaluated using importance sampling over the Z. The tricky part is choosing the importance sampling function, and we study the use of mixtures as importance sampling functions. We propose two forms of this: defensive mixture importance sampling (DMIS), and Z-distance importance sampling. We choose the parameters of the mixture adaptively, and we show how this can be done so as to approximately minimize the variance of the approximation to the integral.
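The idea in the abstract can be illustrated on a toy problem. The sketch below is not the authors' implementation: it assumes a two-component binomial mixture with known equal mixing weights, independent Beta(1, 1) priors on the two success probabilities, and a hypothetical plug-in estimate `theta` used to build the importance function. Under those assumptions the integrated likelihood is p(x) = Σ_Z L(x | Z) p(Z), where L(x | Z) has a closed form, and it is estimated by averaging L(x | Z) p(Z) / q(Z) over draws from a defensive mixture q(Z) = (1 − δ) · (plug-in membership distribution, averaged over both labellings) + δ · p(Z). All function names, the data, and the value of δ are illustrative choices.

```python
import math
import random
from itertools import product


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_lik_given_z(x, m, z):
    """log L(x | Z) with each component's success probability integrated
    out under a Beta(1, 1) prior (an assumption of this sketch)."""
    total = sum(math.log(math.comb(m, xi)) for xi in x)
    for g in (0, 1):
        s = sum(xi for xi, zi in zip(x, z) if zi == g)      # successes in group g
        f = sum(m - xi for xi, zi in zip(x, z) if zi == g)  # failures in group g
        total += log_beta(1 + s, 1 + f)                     # B(1, 1) = 1
    return total


def membership_probs(x, m, theta):
    """P(z_i = 1 | x_i, theta) for equal mixing weights."""
    probs = []
    for xi in x:
        l0 = theta[0] ** xi * (1 - theta[0]) ** (m - xi)
        l1 = theta[1] ** xi * (1 - theta[1]) ** (m - xi)
        probs.append(l1 / (l0 + l1))
    return probs


def plug_in_prob(z, r):
    """Probability of grouping z under independent memberships r."""
    return math.prod(ri if zi == 1 else 1 - ri for zi, ri in zip(z, r))


def q_prob(z, r, delta):
    """Defensive mixture: plug-in part averaged over the 2 labellings,
    plus delta times the prior p(Z) = 0.5 ** n."""
    flipped = tuple(1 - zi for zi in z)
    sym = 0.5 * (plug_in_prob(z, r) + plug_in_prob(flipped, r))
    return (1 - delta) * sym + delta * 0.5 ** len(z)


def sample_q(r, delta, rng):
    """Draw one grouping Z from the defensive mixture."""
    if rng.random() < delta:
        return tuple(rng.randrange(2) for _ in r)       # draw from the prior
    z = tuple(1 if rng.random() < ri else 0 for ri in r)
    if rng.random() < 0.5:                              # random relabelling
        z = tuple(1 - zi for zi in z)
    return z


def dmis_estimate(x, m, theta, delta, n_samples, seed=0):
    """Importance sampling estimate of the integrated likelihood p(x)."""
    rng = random.Random(seed)
    r = membership_probs(x, m, theta)
    prior = 0.5 ** len(x)
    total = 0.0
    for _ in range(n_samples):
        z = sample_q(r, delta, rng)
        total += math.exp(log_lik_given_z(x, m, z)) * prior / q_prob(z, r, delta)
    return total / n_samples


def exact_integrated_likelihood(x, m):
    """Brute-force sum over all 2 ** n groupings (feasible only for tiny n)."""
    prior = 0.5 ** len(x)
    return sum(math.exp(log_lik_given_z(x, m, z)) * prior
               for z in product((0, 1), repeat=len(x)))


if __name__ == "__main__":
    data = [1, 2, 0, 8, 9, 7]  # made-up counts out of m = 10 trials
    print("exact:", exact_integrated_likelihood(data, 10))
    print("DMIS :", dmis_estimate(data, 10, (0.1, 0.8), 0.2, 10000, seed=1))
```

Because the prior component keeps q(Z) at least δ · p(Z), every importance weight is bounded by L(x | Z) / δ, which is the point of the defensive construction. The sketch draws from the mixture by simple random selection of a component and fixes δ by hand; it does not attempt the stratified sampling or the adaptive, variance-minimizing choice of mixture parameters that the abstract describes.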

### Citations

9054 | Maximum likelihood from incomplete data via the EM algorithm - Dempster, Laird, et al. - 1977 |

2771 | Estimating the dimension of a model - Schwarz - 1978 |

Citation Context: ...considered by Rozenkranz and Raftery (1994), Raftery (1996b) and Lewis and Raftery (1997). The Bayesian Information Criterion (BIC) can be used as an asymptotic approximation to the log Bayes factor (Schwarz, 1978; Kass and Wasserman, 1995). In finite mixture models, however, none of these methods is fully satisfactory. Two features of mixture models make many current methods for approximating the integrated l...

1176 | Bayes factors - Kass, Raftery - 1995 |

724 | Statistical Analysis of Finite Mixture Distributions - Titterington, Smith, et al. - 1985 |

515 | Bayesian Classification (AutoClass): Theory and Results - Cheeseman, Stutz - 1996 |

485 | On Bayesian analysis of Mixtures with an Unknown Number of Components - Richardson, Green - 1997 |

Citation Context: ...is not trivial with MCMC because of the dependence between successive samples. Reversible jump MCMC methods can be used to estimate Bayes factors and posterior model probabilities for mixture models (Richardson and Green 1997), but they are also prone to label-switching problems, and convergence can be even more of a problem than for regular MCMC. For example, in the rejoinder to the discussion of their paper, Richardson ...

392 | Marginal Likelihood from the Gibbs Output - Chib - 1995 |

Citation Context: ...els that have substantial numbers of parameters. Markov chain Monte Carlo (MCMC) can be used to estimate mixture models, and associated methods can be used to approximate integrated likelihoods (e.g. Chib 1995, Raftery 1996b). However, in addition to the usual problems with MCMC methods (dependent samples, convergence issues, complexity of programming and implementation), in mixture models they can easily ...

323 | Bayesian Model Selection in Social Research - Raftery - 1995 |

Citation Context: ...he BIC was in terms of the Laplace method, and it provides a good approximation to the integrated likelihood in regular models for a unit information prior on the parameters (Kass and Wasserman 1995; Raftery 1995). This justification does not hold for mixture models, although BIC does provide a consistent estimate of the number of components in the mixture (Leroux 1992; Keribin 1998), it leads to density esti...

319 | Model-Based Clustering, Discriminant Analysis, and Density Estimation - Fraley, Raftery - 2002 |

316 | How Many Clusters? Which Clustering Method? Answers Via Model-Based Cluster Analysis. The Computer Journal 41(8):578–88 - Fraley, Raftery - 1998 |

Citation Context: ...in 1998), it leads to density estimates that are consistent for the true density (Roeder and Wasserman 1997), and it has given good results in a range of applications (e.g. Dasgupta and Raftery 1998; Fraley and Raftery 1998, 2000). Quadrature methods can be used but they begin to break down for problems with more than 4 parameters (Evans and Swartz 1995), and the number of parameters in mixture models quickly surpasses ...

229 | Accurate approximations for posterior moments and marginal densities - Tierney, Kadane - 1986 |

Citation Context: ...height. Additional local modes are often present (Lindsay, 1995; Titterington, Smith and Makov, 1985), especially when more than G components are fitted (Atwood et al, 1992). The Laplace method (e.g. Tierney and Kadane 1986) provides an analytic approximation to the integrated likelihood based on the assumption that the posterior distribution is approximately elliptically contoured (e.g. Raftery 1996a), and when this as...

190 | A Monte Carlo implementation of the EM algorithm and the poor man’s data augmentation algorithms - Wei, Tanner - 1990 |

183 | Efficient approximations for the marginal likelihood of Bayesian networks with hidden variables - Chickering, Heckerman - 1997 |

164 | Approximate Bayesian Inference With the Weighted Likelihood Bootstrap (with discussion) - Newton, Raftery - 1994 |

149 | A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion - Kass, Wasserman - 1995 |

Citation Context: ...ozenkranz and Raftery (1994), Raftery (1996b) and Lewis and Raftery (1997). The Bayesian Information Criterion (BIC) can be used as an asymptotic approximation to the log Bayes factor (Schwarz, 1978; Kass and Wasserman, 1995). In finite mixture models, however, none of these methods is fully satisfactory. Two features of mixture models make many current methods for approximating the integrated likelihood problematic. The...

143 | Mixture models: Theory, geometry and applications - Lindsay - 1995 |

Citation Context: ... hold in finite mixture models whenever one estimates a model with G components but the true number of components is smaller, so that the true parameter values lie on the edge of the parameter space (Lindsay 1995). A second feature is the "label-switching" problem, namely that the likelihood is invariant to relabelling of the mixture components, and so has G! modes of the same height. Additional local modes a...

124 | Practical Bayesian density estimation using mixtures of normals - Roeder, Wasserman - 1997 |

Citation Context: ...models, although BIC does provide a consistent estimate of the number of components in the mixture (Leroux 1992; Keribin 1998), it leads to density estimates that are consistent for the true density (Roeder and Wasserman 1997), and it has given good results in a range of applications (e.g. Dasgupta and Raftery 1998; Fraley and Raftery 1998, 2000). Quadrature methods can be used but they begin to break down for problems wi...

124 | Dealing with label switching in mixture models - Stephens - 2000 |

118 | Assessing a mixture model for clustering with the integrated completed likelihood - Biernacki, Celeux, et al. - 2000 |

118 | Computational and inferential difficulties with mixture posterior distributions - Celeux, Hurn, et al. - 2000 |

117 | Theory of Probability (3rd ed.) - Jeffreys - 1961 |

110 | Approximate Bayes factors and accounting for model uncertainty in generalised linear models - Raftery - 1996 |

Citation Context: ...od (e.g. Tierney and Kadane 1986) provides an analytic approximation to the integrated likelihood based on the assumption that the posterior distribution is approximately elliptically contoured (e.g. Raftery 1996a), and when this assumption holds it can provide approximations of remarkable quality (e.g. Tierney and Kadane 1986; Grunwald, Guttorp and Raftery 1993; Lewis and Raftery 1997). However, for mixture ...

89 | Detecting features in spatial point processes with clutter via model-based clustering - Dasgupta, Raftery - 1998 |

Citation Context: ...mixture (Leroux 1992; Keribin 1998), it leads to density estimates that are consistent for the true density (Roeder and Wasserman 1997), and it has given good results in a range of applications (e.g. Dasgupta and Raftery 1998; Fraley and Raftery 1998, 2000). Quadrature methods can be used but they begin to break down for problems with more than 4 parameters (Evans and Swartz 1995), and the number of parameters in mixture ...

78 | Approximating posterior distributions by mixtures - West - 1993 |

62 | Weighted average importance sampling and defensive mixture distributions - Hesterberg - 1995 |

Citation Context: ...cess of any importance sampling method depends critically on the importance sampling function, and here two methods for creating this function are proposed: (1) defensive mixture importance sampling (Hesterberg 1995), and (2) sampling via perturbation of an initial grouping that has high posterior probability (the "Z-distance method"). The second method appears to be new. In both of these methods, the importance...

53 | Consistent estimation of a mixing distribution - Leroux - 1992 |

Citation Context: ...he parameters (Kass and Wasserman 1995; Raftery 1995). This justification does not hold for mixture models, although BIC does provide a consistent estimate of the number of components in the mixture (Leroux 1992; Keribin 1998), it leads to density estimates that are consistent for the true density (Roeder and Wasserman 1997), and it has given good results in a range of applications (e.g. Dasgupta and Raftery...

51 | Hypothesis testing and model selection - Raftery - 1996 |

Citation Context: ...od (e.g. Tierney and Kadane 1986) provides an analytic approximation to the integrated likelihood based on the assumption that the posterior distribution is approximately elliptically contoured (e.g. Raftery 1996a), and when this assumption holds it can provide approximations of remarkable quality (e.g. Tierney and Kadane 1986; Grunwald, Guttorp and Raftery 1993; Lewis and Raftery 1997). However, for mixture ...

41 | On the asymptotic behaviour of posterior distributions - Walker - 1969 |

38 | Estimating Bayes Factors via Posterior Simulation with the Laplace-Metropolis Estimator - Lewis, Raftery - 1997 |

Citation Context: ...tely elliptically contoured (e.g. Raftery 1996a), and when this assumption holds it can provide approximations of remarkable quality (e.g. Tierney and Kadane 1986; Grunwald, Guttorp and Raftery 1993; Lewis and Raftery 1997). However, for mixture models this assumption fails when the model being fit has G components and the actual number of components is smaller (Lindsay 1995), which is a situation of great interest for...

36 | Methods for approximating integrals in statistics with special emphasis on Bayesian integration problems - Evans, Swartz - 1995 |

Citation Context: ...sults in a range of applications (e.g. Dasgupta and Raftery 1998; Fraley and Raftery 1998, 2000). Quadrature methods can be used but they begin to break down for problems with more than 4 parameters (Evans and Swartz 1995), and the number of parameters in mixture models quickly surpasses this as the number of groups and/or the dimension of the data increase. When testing or comparing mixture models, one is typically c...

32 | Local adaptive importance sampling for multivariate densities with strong nonlinear relationships - Givens, Raftery - 1996 |

26 | Integration of multimodal functions by Monte Carlo importance sampling - Oh, Berger - 1993 |

Citation Context: ... $= (1-\delta)\,\frac{1}{G!}\sum_{b} p(Z \mid x, \hat{\theta}_b) + \delta\, p(Z)$, (16) where $b$ indexes the $G!$ permutations of the components of $\hat{\theta}$. The sampling is best done in a stratified manner, since this is both simpler and more efficient (Oh and Berger, 1993; Hesterberg, 1995): if $K$ samples are planned, then $Z$ is drawn from $p(Z)$ for $K\delta$ of the samples, and from each of the $p(Z \mid \hat{\theta}_b, x)$'s for $K(1-\delta)/G!$ of the samples. We may still calculate $\delta_{\mathrm{opt}}$ using (1...

24 | Using Bootstrap Likelihood Ratios in Finite Mixture Models - Feng, McCulloch - 1996 |

Citation Context: ...o restrict the integration to a subset of the parameter space so that L(x|Z)p(Z) is unimodal over this subset. Allowingsto converge over a set of G points still leads to a finite limit, however (see Feng and McCulloch, 1996). 3. Condition (3) in Appendix A and the assumption of positive definiteness for (r0) and J(r0) both require that rj > 0, j = 1, ..., G. It is of interest to know whether these conditions may be relaxe...

17 | Time series of continuous proportions - Grunwald, Raftery, et al. - 1993 |

15 | Determination of the frequency of loss of heterozygosity in esophageal adenocarcinoma by cell sorting, whole genome amplification and microsatellite polymorphisms. Oncogene 12 - Barrett, Galipeau, et al. - 1996 |

Citation Context: ...mance of the importance sampling methods for a particular application in molecular biology (Newton et al, 1998; Desai, 2000). Data Sets 1 and 2 are each a subset from one of two allelotype data sets (Barrett et al, 1996; and Shibagaki et al, 1994, respectively) where it is of interest to determine whether there exist two binomial components (Newton et al, 1998). Data Set 3 is simulated data from a single binomial co...

15 | Inference in molecular population genetics (with discussion) - Stephens, Donnelly - 2000 |

11 | On the statistical analysis of allelic-loss data - Newton, Gould, et al. - 1998 |

Citation Context: ...0,000 for Data Sets 4, 5, and 6. This model was chosen for the simulation study in order to study the performance of the importance sampling methods for a particular application in molecular biology (Newton et al, 1998; Desai, 2000). Data Sets 1 and 2 are each a subset from one of two allelotype data sets (Barrett et al, 1996; and Shibagaki et al, 1994, respectively) where it is of interest to determine whether the...

10 | Mixture Models for Genetic changes in cancer cells - Desai - 2000 |

Citation Context: ...4, 5, and 6. This model was chosen for the simulation study in order to study the performance of the importance sampling methods for a particular application in molecular biology (Newton et al, 1998; Desai, 2000). Data Sets 1 and 2 are each a subset from one of two allelotype data sets (Barrett et al, 1996; and Shibagaki et al, 1994, respectively) where it is of interest to determine whether there exist two ...

10 | Consistent estimate of the order of mixture models. Comptes Rendus de l’Académie des Sciences, Série I-Mathématiques 326 - Keribin - 1998 |

Citation Context: ... (Kass and Wasserman 1995; Raftery 1995). This justification does not hold for mixture models, although BIC does provide a consistent estimate of the number of components in the mixture (Leroux 1992; Keribin 1998), it leads to density estimates that are consistent for the true density (Roeder and Wasserman 1997), and it has given good results in a range of applications (e.g. Dasgupta and Raftery 1998; Fraley ...

9 | Reweighting Monte Carlo Mixtures - Geyer - 1991 |

5 | Covariate selection in hierarchical models of hospital admission counts: A Bayes factor approach - Rozenkranz, Raftery - 1994 |

4 | Contribution to the discussion of paper by Richardson and Green - Celeux - 1997 |

Citation Context: ...the usual problems with MCMC methods (dependent samples, convergence issues, complexity of programming and implementation), in mixture models they can easily fall foul of the label-switching problem (Celeux 1997; Stephens 1997, 2000). For example, Neal (1998) pointed out that Chib's (1995) results for a mixture model were in error for this reason. Assessing the accuracy of the estimated integrated likelihood...

4 | Erroneous results in "Marginal likelihood from the Gibbs output". Manuscript. ftp.cs.utoronto.ca/pub/radford/chib-letter.pdf - Neal - 1998 |

4 | Adaptive mixture importance sampling - Raghavan, Cox - 1998 |

4 | Contribution to the discussion of Richardson and - Stephens - 1997 |

Citation Context: ...blems with MCMC methods (dependent samples, convergence issues, complexity of programming and implementation), in mixture models they can easily fall foul of the label-switching problem (Celeux 1997; Stephens 1997, 2000). For example, Neal (1998) pointed out that Chib's (1995) results for a mixture model were in error for this reason. Assessing the accuracy of the estimated integrated likelihoods is not trivia...

3 | Computational aspects of fitting a mixture of two normal distributions using maximum likelihood - Atwood, Wilson, et al. - 1992 |

Citation Context: ...e components, and so has G! modes of the same height. Additional local modes are often present (Lindsay, 1995; Titterington, Smith and Makov, 1985), especially when more than G components are fitted (Atwood et al, 1992). The Laplace method (e.g. Tierney and Kadane 1986) provides an analytic approximation to the integrated likelihood based on the assumption that the posterior distribution is approximately elliptical...

2 | A new importance sampling method to compute Bayes factors for mixture models with application to allelic-loss data - Desai, Emond - 2001 |