## Computing Normalizing Constants for Finite Mixture Models via Incremental Mixture Importance Sampling (IMIS) (2003)


Citations: 14 (5 self)

### BibTeX

@MISC{Steele03computingnormalizing,
  author = {Russell J. Steele and Adrian E. Raftery and Mary J. Emond},
  title = {Computing Normalizing Constants for Finite Mixture Models via Incremental Mixture Importance Sampling (IMIS)},
  year = {2003}
}


### Abstract

We propose a method for approximating integrated likelihoods in finite mixture models. We formulate the model in terms of the unobserved group memberships, z, and make them the variables of integration. The integral is then evaluated using importance sampling over the z. We propose an adaptive importance sampling function which is itself a mixture, with two types of component distributions, one concentrated and one diffuse. The more concentrated type of component serves the usual purpose of an importance sampling function, sampling mostly group assignments of high posterior probability. The less concentrated type of component allows the importance sampling function to explore the space in a controlled way to find other, unvisited assignments with high posterior probability. Components are added adaptively, one at a time, to cover areas of high posterior probability not well covered by the current importance sampling function. The method is called Incremental Mixture Importance Sampling (IMIS). IMIS is easy to implement and to monitor for convergence. It scales easily for higher dimensional...
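The core computation in the abstract, importance sampling over the group memberships z, can be sketched generically. Everything in the toy below is hypothetical: the joint density, the label space of four binary memberships, and the uniform importance function are chosen only so the exact marginal is computable for comparison; IMIS itself builds the importance function adaptively as a mixture.

```python
import random
from itertools import product

# Toy version of the abstract's estimator: the integrated likelihood is
#   p(y) = sum_z p(y | z) p(z)
# over group assignments z, estimated by drawing z ~ q and averaging the
# importance weights w(z) = p(y | z) p(z) / q(z).

N = 4  # number of observations; z ranges over {0,1}^4, i.e. 16 assignments

def joint(z):
    """Hypothetical unnormalized p(y | z): assignments that group adjacent
    items together score higher (a stand-in for a real mixture likelihood)."""
    return 2.0 ** sum(1 for a, b in zip(z, z[1:]) if a == b)

def exact_marginal():
    """Brute-force sum over all 2^N assignments, with p(z) uniform."""
    return sum(joint(z) * 0.5 ** N for z in product((0, 1), repeat=N))

def is_estimate(draws=50_000, seed=0):
    """Importance-sampling estimate with a uniform importance function q."""
    rng = random.Random(seed)
    q = 0.5 ** N
    total = 0.0
    for _ in range(draws):
        z = tuple(rng.randint(0, 1) for _ in range(N))
        total += joint(z) * 0.5 ** N / q
    return total / draws
```

With q equal to the uniform p(z) the weights reduce to `joint(z)`; the point of IMIS, per the abstract, is to replace this fixed q with an adaptively grown mixture so that high-probability assignments are both found and well covered.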

### Citations

2490 | Estimating the dimension of a model - Schwarz - 1978 |

Citation context: ...by Rozenkranz and Raftery (1994), Raftery (1996b), and Lewis and Raftery (1997). The Bayesian information criterion (BIC) can be used as the basis for an asymptotic approximation to the Bayes factor (Schwarz 1978; Kass and Wasserman 1995; Raftery 1995). For finite mixture models, however, none of these methods is fully satisfactory. Two features of mixture models make many current methods for approximating th...
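The BIC route described in this context is easy to make concrete. A minimal sketch, with the relation 2 log BF₁₀ ≈ BIC₀ − BIC₁; the numeric values used in the assertions below are made up for illustration:

```python
import math

def bic(loglik, n_params, n_obs):
    """Schwarz's criterion: BIC = -2 * max log-likelihood + k * log(n)."""
    return -2.0 * loglik + n_params * math.log(n_obs)

def approx_log_bayes_factor(loglik1, k1, loglik0, k0, n_obs):
    """Asymptotic approximation log BF_10 ~ (BIC_0 - BIC_1) / 2;
    positive values favor model 1 over model 0."""
    return (bic(loglik0, k0, n_obs) - bic(loglik1, k1, n_obs)) / 2.0
```

As the surrounding text notes, this approximation is problematic for mixtures, since its regularity conditions fail when the fitted model has more components than the truth.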

1049 | Bayes factors - Kass, Raftery - 1995 |

Citation context: ...ce approximate 95% confidence bands for Î within 1.0 of the “gold” standard long EMC estimate. This is adequate for interpretation on the standard scale for interpreting Bayes factors (Jeffreys 1961; Kass and Raftery 1995), which views a Bayes factor of three or less as weak evidence or, in Jeffreys’s words, “evidence not worth more than a bare mention.” Sampling from the prior gives a reasonably good answer when aver...

655 | Statistical Analysis of Finite Mixture Distributions - Titterington, Smith, et al. - 1985 |

497 | Bayesian Classification (AutoClass): Theory and Results - Cheeseman, Stutz - 1996 |

461 | On Bayesian analysis of mixtures with an unknown number of components - Richardson, Green - 1997 |

Citation context: ...addressed difficulties with the multiple likelihood modes due to label-switching by specifying ordering constraints on the parameters by, for example, constraining θ1 > θ2 for a two-component mixture (Richardson and Green 1997). There are two drawbacks to this sort of prior specification. First, ordering components can become complicated as the dimensionality of θi and the number of groups both increase. Second, other rese...

431 | Monte Carlo Methods - Hammersley, Handscomb - 1964 |

343 | Marginal likelihood from the Gibbs output - Chib - 1995 |

Citation context: ...Laplace method does not work in this situation. Markov chain Monte Carlo (MCMC) can be used to estimate mixture models, and associated methods can be used to approximate integrated likelihoods (e.g., Chib 1995; Raftery 1996b). However, in addition to the usual problems with MCMC methods (dependent samples, convergence issues, complexity of programming and impl...

298 | How many clusters? Which clustering method? Answers Via Model Based Cluster Analysis - Fraley, Raftery - 1998 |

296 | Adaptive rejection sampling for Gibbs sampling - Gilks, Wild - 1992 |

Citation context: ...method, we used simple Metropolis proposal densities. Other proposal approaches and tuning parameters could have been used that may have increased the efficiency of the bridge sampling estimator (see Gilks and Wild 1992 or Neal 2003 for examples). However, the fact that other more sophisticated (and computationally expensive) adjustments must be made in order to get better estimates from MCMC-based methods only make...

282 | Model-based clustering, discriminant analysis, and density estimation - Fraley, Raftery - 2002 |

209 | Accurate Approximations for Posterior Moments and Marginal Densities - Tierney, Kadane - 1986 |

Citation context: ...as G! modes of the same height. Additional local modes are often present (Lindsay 1995; Titterington, Smith, and Makov 1985; Atwood, Wilson, Elston, and Bailey-Wilson 1992). The Laplace method (e.g., Tierney and Kadane 1986) provides an analytic approximation to the integrated likelihood based on the assumption that the posterior distribution is approximately elliptically contoured (e.g., Raftery 1996a), and when this a...

168 | A Monte carlo implementation of the EM algorithm and the poor man’s data augmentation algorithms - Wei, Tanner - 1990 |

148 | Simulating normalizing constants: from importance sampling to bridge sampling to path sampling, Statist - Gelman, Meng - 1998 |

134 | A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion - Kass, Wasserman - 1995 |

Citation context: ...and Raftery (1994), Raftery (1996b), and Lewis and Raftery (1997). The Bayesian information criterion (BIC) can be used as the basis for an asymptotic approximation to the Bayes factor (Schwarz 1978; Kass and Wasserman 1995; Raftery 1995). For finite mixture models, however, none of these methods is fully satisfactory. Two features of mixture models make many current methods for approximating the integrated likelihood p...

127 | Mixture models: theory, geometry and applications - Lindsay - 1995 |

Citation context: ...hold in finite mixture models whenever one estimates a model with G components but the true number of components is smaller, so that the true parameter values lie on the edge of the parameter space (Lindsay 1995). A second feature is the “label-switching” problem, namely that the likelihood is invariant to relabeling of the mixture components, and so has G! modes of the same height. Additional local modes ar...

119 | Practical Bayesian density estimation using mixtures of normals - Roeder, Wasserman - 1997 |

118 | Dealing with label-switching in mixture models - Stephens - 2000 |

116 | Simulating ratios of normalizing constants via a simple identity: a theoretical exploration - Meng, Wong - 1996 |

Citation context: ...ikewise in order to facilitate comparisons with other methods. Liang and Wong (2001) suggested a simulated annealing MCMC approach for calculating normalizing constants combined with bridge sampling (Meng and Wong 1996). Their method, called evolutionary Monte Carlo (EMC), requires running several (in their examples, 20) Markov chains, each of which samples from f_i(τ) = (p(y|τ,G))^{u_i} p(τ|G), where u_i = 0, 0.05, ..., 1...
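The tempering scheme quoted above is easy to sketch: each EMC chain targets f_i(τ) ∝ p(y|τ,G)^{u_i} p(τ|G), which on the log scale simply interpolates between the prior (u = 0) and the posterior (u = 1). The sketch below shows only the temperature ladder and the tempered log-density, not the full EMC sampler with its exchange and crossover moves:

```python
def tempered_logpdf(log_lik, log_prior, u):
    """Log of f_u(tau) up to an additive constant:
    u * log p(y | tau, G) + log p(tau | G)."""
    return u * log_lik + log_prior

# Ladder u_i = 0, 0.05, ..., 1 from the quoted description (21 levels).
ladder = [round(0.05 * i, 2) for i in range(21)]
```

Each chain i then runs ordinary MCMC on `tempered_logpdf(..., u=ladder[i])`, and bridge sampling links the normalizing constants of adjacent levels.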

115 | Computational and inferential difficulties with mixture posterior distributions - Celeux, Hurn, et al. - 2000 |

Citation context: ...as the dimensionality of θi and the number of groups both increase. Second, other researchers have found that ordering the components can cause computational and inferential difficulties (see, e.g., Celeux et al. 2000 and Stephens 2000). Another difficulty with sampling from p(z|τ̂,y) is that p(z|τ̂,y) often contains many values close to 1, which does not allow the importance sampling function to explore much of t...

114 | Assessing a mixture model for clustering with the integrated completed likelihood - Biernacki, Celeux, et al. - 2000 |

109 | Marginal likelihood from the Metropolis–Hastings output - Chib, Jeliazkov - 2001 |

85 | Bayesian Model Selection - Raftery - 1995 |

Citation context: ...ry (1996b), and Lewis and Raftery (1997). The Bayesian information criterion (BIC) can be used as the basis for an asymptotic approximation to the Bayes factor (Schwarz 1978; Kass and Wasserman 1995; Raftery 1995). For finite mixture models, however, none of these methods is fully satisfactory. Two features of mixture models make many current methods for approximating the integrated likelihood problematic. Th...

74 | Approximating Posterior Distribution by Mixture - West - 1993 |

Citation context: ...ce. Hesterberg (1995) suggested a simple fix for this particular drawback of importance sampling. Although mixtures of importance sampling functions had been proposed in the past (Oh and Berger 1993; West 1993; Givens and Raftery 1996), Hesterberg (1995) was the first to suggest using the Monte Carlo sampling function, p(z), as a component of the mixture importance sampling function δp(z) + (1 − δ)g(z), givin...
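Hesterberg's defensive mixture quoted in this context bounds the importance weights: with q(z) = δp(z) + (1 − δ)g(z), the weight f(z)p(z)/q(z) can never exceed f(z)/δ, even where the concentrated component g misses mass. A toy discrete sketch; the densities and integrand below are hypothetical, chosen so that g alone would miss the tail states entirely:

```python
import random

STATES = list(range(10))
p = {z: 0.1 for z in STATES}                          # natural density p(z)
g = {z: 0.5 if z in (4, 5) else 0.0 for z in STATES}  # concentrated g, misses the tails

def f(z):
    """Hypothetical integrand: mass at the mode and in tail states g ignores."""
    return 1.0 if z in (0, 9) else 5.0 if z in (4, 5) else 0.0

def defensive_estimate(delta=0.5, draws=100_000, seed=1):
    """Estimate sum_z f(z) p(z) with the defensive mixture q = delta*p + (1-delta)*g.
    Sampling from g alone would never visit states 0 and 9, biasing the estimate;
    the p component guarantees every state with p(z) > 0 is covered."""
    rng = random.Random(seed)
    q = {z: delta * p[z] + (1 - delta) * g[z] for z in STATES}
    zs = rng.choices(STATES, weights=[q[z] for z in STATES], k=draws)
    return sum(f(z) * p[z] / q[z] for z in zs) / draws
```

IMIS adopts the same defensive idea but, rather than fixing g, grows it incrementally by adding components where the current mixture covers the posterior poorly.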

72 | Sequential Importance Sampling for Nonparametric Bayes Models: The Next Generation - MacEachern, Clyde, et al. - 1999 |

67 | Bayesian analysis of mixture models with an unknown number of components - an alternative to reversible jump methods - Stephens |

60 | Density estimation with confidence sets exemplified by superclusters and voids in the galaxies - Roeder - 1992 |

57 | Weighted average importance sampling and defensive mixture distributions - Hesterberg - 1995 |

Citation context: ...er than the parameters of the component densities) that are themselves mixtures and are specified adaptively. We propose two approaches to this. The first takes defensive mixture importance sampling (Hesterberg 1995; Raghavan and Cox 1998) as a starting point, and the second is based on sampling via perturbation of an initial grouping that has high posterior probability. One key advantage of our approach is that...

47 | Theory of Probability, 3rd Edition - Jeffreys - 1961 |

46 | Real-Parameter Evolutionary Monte Carlo With Applications to Bayesian Mixture Models - Liang, Wong |

42 | Inference in molecular population genetics - Stephens, Donnelly - 2000 |

41 | Theory of Probability (3rd ed.) - Jeffreys - 1961 |

Citation context: ...all runs produce approximate 95% confidence bands for Î within 1.0 of the “gold” standard long EMC estimate. This is adequate for interpretation on the standard scale for interpreting Bayes factors (Jeffreys 1961; Kass and Raftery 1995), which views a Bayes factor of three or less as weak evidence or, in Jeffreys’s words, “evidence not worth more than a bare mention.” Sampling from the prior gives a reasonabl...

36 | Methods for approximating integrals in statistics with special emphasis on Bayesian integration problems - Evans, Swartz - 1995 |

35 | Estimating Bayes factors via posterior simulation with the Laplace-Metropolis estimator - Lewis, Raftery - 1997 |

Citation context: ...y elliptically contoured (e.g., Raftery 1996a), and when this assumption holds it can provide approximations of remarkable quality (e.g., Tierney and Kadane 1986; Grunwald, Guttorp, and Raftery 1993; Lewis and Raftery 1997). However, for mixture models this assumption fails when the model being fit has G components and the actual number of components is smaller (Lindsay 1995), which is a situation of great interest for...

31 | Local adaptive importance sampling for multivariate densities with strong nonlinear relationships - Givens, Raftery - 1996 |

Citation context: ...erg (1995) suggested a simple fix for this particular drawback of importance sampling. Although mixtures of importance sampling functions had been proposed in the past (Oh and Berger 1993; West 1993; Givens and Raftery 1996), Hesterberg (1995) was the first to suggest using the Monte Carlo sampling function, p(z), as a component of the mixture importance sampling function δp(z) + (1 − δ)g(z), giving the following importanc...

30 | Rejection control for sequential importance sampling - Liu, Chen, et al. - 1998 |

30 | Safe and effective importance sampling - Owen, Zhou - 2000 |

26 | Integration of multimodal functions by Monte Carlo importance sampling - Oh, Berger - 1993 |

Citation context: ...a very large variance. Hesterberg (1995) suggested a simple fix for this particular drawback of importance sampling. Although mixtures of importance sampling functions had been proposed in the past (Oh and Berger 1993; West 1993; Givens and Raftery 1996), Hesterberg (1995) was the first to suggest using the Monte Carlo sampling function, p(z), as a component of the mixture importance sampling function δp(z) + (1 − δ)...

20 | Monte Carlo Methods - Chen, Shao, et al. - 2000 |

20 | Bayesian analysis of mixtures with an unknown number of components — an alternative to reversible jump methods - Stephens - 2000 |

Citation context: ...ent samples, convergence issues, complexity of programming and implementation), in mixture models they can easily fall foul of the label-switching problem (Celeux 1997; Celeux, Hurn, and Robert 2000; Stephens 1997, 2000b). For example, Neal (1998) pointed out that Chib’s (1995) results for a mixture model were in error for this reason. The problem could be solved correctly using the methods of Chib and Jeliazk...

17 | An Attempt to Define the Nature of Chemical Diabetes Using a Multidimensional Analysis - Reaven, Miller - 1979 |

Citation context: ...onte Carlo integration for this example makes it undesirable, and it is included here only for comparison. 3.2 DIABETES DATA Next we consider a higher-dimensional example from the medical literature (Reaven and Miller 1979). The dataset consists of blood measures of insulin, glucose, and insulin resistance levels (SSPG) for 145 diabetes patients; the pairs plot of the data is shown in Figure 4. Fraley and Raftery (1998...

15 | Time series of continuous proportions - Grunwald, Raftery, et al. - 1993 |

15 | Erroneous results in “Marginal likelihood from the Gibbs output” - Neal - 1999 |

13 | Probes of large-scale structure in the Corona Borealis region - Postman, Huchra, et al. - 1986 |

12 | Determination of the frequency of loss of heterozygosity in esophageal adenocarcinoma by cell sorting, whole genome amplification and microsatellite polymorphisms - Barrett, Galipeau, et al. - 1996 |

11 | On the statistical analysis of allelic-loss data - Newton, Gould, et al. - 1998 |

10 | Mixture Models for Genetic changes in cancer cells - Desai - 2000 |

9 | Reweighting Monte Carlo Mixtures - Geyer - 1991 |

Citation context: ...ng the mixture importance sampling function by incrementally adding components to the mixture to capture parts of the space that have been missed. We use a mixture importance sampling function (as in Geyer 1991), based on several τ*_j’s, where each τ*_j = (θ*, π*) corresponds to a local posterior mode in the parameter space. In the notation of Section 2.2, we will be adaptively constructing a function o...

8 | Methods for Approximating Integrals - Evans, Swartz - 1995 |

5 | Covariate selection in hierarchical models of hospital admission counts: A Bayes factor approach - Rozenkranz, Raftery - 1994 |

4 | Adaptive mixture importance sampling - Raghavan, Cox - 1998 |

Citation context: ...eters of the component densities) that are themselves mixtures and are specified adaptively. We propose two approaches to this. The first takes defensive mixture importance sampling (Hesterberg 1995; Raghavan and Cox 1998) as a starting point, and the second is based on sampling via perturbation of an initial grouping that has high posterior probability. One key advantage of our approach is that the algorithm does not...