
## Bayesian density estimation and inference using mixtures (1995)

Venue: J. Amer. Statist. Assoc.

Citations: 653 (18 self)

### Citations

3626 | Equation of state calculations by fast computing machines
- Metropolis, Rosenbluth, et al.
- 1953
Citation Context: ...versity, Durham, NC 27708. Michael D. Escobar was partially financed by National Cancer Institute Grant RO1-CA54852-01, a National Research Service Award from National Institutes of Mental Health Grant MH 15758, and by the Natural Sciences and Engineering Research Council of Canada. Mike West was partially financed by National Science Foundation Grants DMS-8903842 and DMS-9024793. The authors would like to thank Hani Doss and Steve MacEachern for helpful discussions. Some of the earlier references on Markov chain Monte Carlo methods include work of Geman and Geman (1984), Hastings (1970), Metropolis et al. (1953), and Tanner and Wong (1987). Besag and Green (1993) and Smith and Roberts (1993) recently reviewed Markov chain Monte Carlo methods. The basic normal mixture model, similar to that of Ferguson (1983), is described as follows. Suppose that data Y1, ..., Yn are conditionally independent and normally distributed, (Yi | πi) ~ N(μi, Vi), with means μi and variances Vi determining the parameters πi = (μi, Vi), i = 1, ..., n. Suppose further that the πi come from some prior distribution on ℝ × ℝ⁺. Having observed data Dn = {y1, ..., yn}, with yi the observed value of Yi, the distribution of a f...
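The basic model in this passage can be sketched generatively: draw πi = (μi, Vi) from a Dirichlet process via the Pólya urn, then draw Yi | πi ~ N(μi, Vi). This is an illustrative sketch, not the paper's code; the normal/inverse-gamma base measure G0 and its hyperparameters (mu0, k0, a, b) are assumed choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_dp_normal_mixture(n, alpha=1.0, mu0=0.0, k0=1.0, a=2.0, b=1.0):
    """Generate Y_1..Y_n: pi_i = (mu_i, V_i) via the Polya urn, then
    (Y_i | pi_i) ~ N(mu_i, V_i). G0 here is normal/inverse-gamma with
    hyperparameters (mu0, k0, a, b) -- illustrative assumptions."""
    params = []                      # pi_1, ..., pi_{i-1} drawn so far
    y = np.empty(n)
    for i in range(n):
        if rng.random() < alpha / (alpha + i):
            V = 1.0 / rng.gamma(a, 1.0 / b)          # V ~ inverse-gamma(a, b)
            mu = rng.normal(mu0, np.sqrt(k0 * V))    # mu | V ~ N(mu0, k0 * V)
            params.append((mu, V))                   # fresh atom from G0
        else:
            params.append(params[rng.integers(i)])   # coincide with an earlier pi_j
        mu_i, V_i = params[i]
        y[i] = rng.normal(mu_i, np.sqrt(V_i))
    return y, params

y, params = sample_dp_normal_mixture(200, alpha=2.0)
k = len(set(params))                 # number of distinct components, k <= n
```

Because the urn reuses earlier atoms with positive probability, k is typically much smaller than n, which is the discreteness property the paper exploits.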

3087 |
An Introduction to Probability Theory and Its Applications,
- FELLER
- 1968
Citation Context: ...distinct values πj* = (μj*, Vj*), with some nj taking this common value. Then (7) reduces to (πi | π(i), Dn) ~ q0 Gi(πi) + Σj qj* δπj*(πi), where the weights now include the nj, viz. qj* ∝ nj exp{−(yi − μj*)²/(2Vj*)} (2πVj*)^(−1/2). The sampling process results in an approximate draw from p(π | Dn). Escobar (1994) discussed theoretical aspects of convergence in the simpler case where Vi is known. Unfortunately, the proof in that simple case does not extend easily to this model, because the qj can get arbitrarily close to 1. This results in a violation of the equicontinuity condition required by Escobar (1988, 1994), Feller (1971, pp. 271-272), and Tanner and Wong (1987). Instead, we use the results from Tierney (1994), which are based on the monograph by Nummelin (1984). The theorem is stated later; the proof and additional discussion of convergence issues are contained in the Appendix. Let Q1(π(0), A) be the probability that, with initial value π(0) and after one iteration, Algorithm I produces a sample value that is contained in the measurable set A. Let Qs(π(0), A) be the probability that, with initial value π(0) and after s iterations, Algorithm I produces a sample value that is contained in the measurable se...
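The collapsed conditional above mixes a baseline weight q0 with weights qj* ∝ nj · N(yi; μj*, Vj*) over the distinct values. A minimal sketch of those weights, assuming q0 and the cluster summaries (μj*, Vj*, nj) are supplied by the caller (the form of q0 depends on the base measure and is not reproduced here):

```python
import numpy as np

def conditional_weights(y_i, clusters, q0):
    """Normalized weights in the collapsed conditional (7): index 0 is the
    baseline component (weight q0, assumed given); index j >= 1 has weight
    proportional to n_j * N(y_i; mu_j*, V_j*) for each distinct value
    pi_j* = (mu_j*, V_j*) with multiplicity n_j."""
    w = [q0]
    for mu_star, V_star, n_j in clusters:
        w.append(n_j * np.exp(-(y_i - mu_star) ** 2 / (2 * V_star))
                     / np.sqrt(2 * np.pi * V_star))
    w = np.asarray(w)
    return w / w.sum()               # probabilities over the k+1 choices

# toy call: y_i = 0 sits on the first cluster, far from the second
probs = conditional_weights(0.0, [(0.0, 1.0, 3), (5.0, 1.0, 2)], q0=0.1)
```

The passage's convergence worry is visible here: when one nj dominates and yi sits near μj*, that single weight approaches 1.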

2368 | An Introduction to Probability Theory and Its Applications - Feller - 1971 |

1246 |
Sampling-based approaches to calculating marginal densities
- Gelfand, Smith
- 1990
Citation Context: ...the Monte Carlo approximation to (6) given by p(y_{n+1} | Dn) ≈ N⁻¹ Σ_{r=1..N} p(y_{n+1} | π(r), m(r), τ(r)), (8) with the summands given by the mixtures in (5), and the notation now explicitly recognizes the dependence on the sampled values of m and τ. Additional information available includes the sampled values of k, {k(r), r = 1, ..., N}, which directly provide a histogram approximation to p(k | Dn), of interest in assessing the number of components. The posteriors for m and/or τ may also be approximated by mixtures of their conditional posteriors noted earlier, following general principles expounded by Gelfand and Smith (1990). For m, this leads to the mixture of normals p(m | Dn) ≈ N⁻¹ Σ p(m | τ(r), π(r)); for τ, to the mixture of inverse gammas p(τ | Dn) ≈ N⁻¹ Σ p(τ | m(r), π(r)), the sums being over r = 1, ..., N in each case. Using Theorem 3 of Tierney (1994), it can be shown that the path averages of bounded functions converge almost surely to their posterior expectations. Therefore, estimates of the cumulative distribution functions, estimates of the probability functions of discrete random variables, and histogram estimates of probability density functions all converge almost surely to the posterior expectations. The ...
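The ergodic average (8) can be sketched directly: average the conditional predictive density over Gibbs draws. Here each draw is summarized as a list of (weight, mean, variance) normal components, an assumed encoding of the sampled state, not the paper's data structure.

```python
import numpy as np

def normal_pdf(y, mu, V):
    return np.exp(-(y - mu) ** 2 / (2 * V)) / np.sqrt(2 * np.pi * V)

def predictive_density(y_grid, sampled_mixtures):
    """Monte Carlo average (8): p(y | D_n) ~= N^{-1} sum_r p(y | state^(r)),
    each sampled state given as a list of (weight, mean, variance)
    components of the mixture in (5) -- an illustrative encoding."""
    est = np.zeros_like(y_grid, dtype=float)
    for comps in sampled_mixtures:           # one Gibbs draw per r
        for w, mu, V in comps:
            est += w * normal_pdf(y_grid, mu, V)
    return est / len(sampled_mixtures)

grid = np.linspace(-3, 3, 7)
dens = predictive_density(grid, [[(1.0, 0.0, 1.0)]])   # one draw, one component
```

With many draws, the same loop also yields the mixture-of-normals estimate of p(m | Dn) and the mixture-of-inverse-gammas estimate of p(τ | Dn) described in the passage, by swapping in the corresponding conditional densities.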

1209 |
A Bayesian analysis of some nonparametric problems. The Annals of Statistics
- Ferguson
- 1973
Citation Context: ...ture of normals (Escobar 1994; Ferguson 1983; West 1990). In particular, we suppose that G ~ D(αG0), a Dirichlet process defined by α, a positive scalar, and G0(·), a specified bivariate distribution function over ℝ × ℝ⁺. G0(·) is the prior expectation of G(·), so that E{G(π)} = G0(π) for all π ∈ ℝ × ℝ⁺, and α is a precision parameter determining the concentration of the prior for G(·) about G0(·). Write π = {π1, ..., πn}. A key feature of the model structure, and of its analysis, relates to the discreteness of G(·) under the Dirichlet process assumption. (Details may be found in Ferguson 1973.) Briefly, in any sample π of size n from G(·) there is positive probability of coincident values. See this as follows. For any i = 1, ..., n, let π(i) be π without πi: π(i) = {π1, ..., π_{i−1}, π_{i+1}, ..., πn}. Then the conditional prior for (πi | π(i)) is (πi | π(i)) ~ α a_{n−1} G0(πi) + a_{n−1} Σ_{j=1, j≠i} δπj(πi), (1) where δx(π) denotes a unit point mass at π = x and a_r = 1/(α + r) for positive integers r. Similarly, the distribution of (π_{n+1} | π) is given by (π_{n+1} | π) ~ α a_n G0(π_{n+1}) + a_n Σ_{i=1..n} δπi(π_{n+1}), (2) Thus, given π, a sample of size n from G(·), the next case ...
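The mixture weights in the conditional prior (2) are simple enough to compute exactly, which makes the "positive probability of coincident values" concrete: a sketch of those weights follows.

```python
def urn_weights(n, alpha):
    """Weights in the conditional prior (2): with a_n = 1/(alpha + n),
    pi_{n+1} is a fresh draw from G0 with probability alpha * a_n, and
    equals each existing pi_i with probability a_n, so a tie with some
    earlier draw has total probability n / (alpha + n)."""
    a_n = 1.0 / (alpha + n)
    return alpha * a_n, a_n          # (P(new atom from G0), P(pi_{n+1} = pi_i) per i)

p_new, p_each = urn_weights(n=10, alpha=1.0)   # p_new = 1/11, each tie = 1/11
```

This also shows the role of α as a precision parameter: large α pushes p_new toward 1 (many distinct components), small α favors ties.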

959 |
Markov chains and stochastic stability.
- Meyn, Tweedie
- 1993
Citation Context: ...or distribution, that the chain be Harris recurrent, and that the posterior expectations of p(y_{n+1} | π(r), m(r), τ(r)) be bounded and equal to p(y_{n+1} | Dn). By our Theorem 2, we know that the Markov chain converges. It is straightforward to show that the path averages have the right expectation. We will make our recurrent Markov chain Harris recurrent by throwing away a set of starting values that have measure zero under the posterior distribution. See this as follows. First of all, we know that our Markov chain is positive recurrent by Theorem 1 of Tierney (1994). Theorems 9.0.1 and 9.1.5 of Meyn and Tweedie (1993) state that the state space can be divided into two disjoint sets H and T, where the set T is a transient null set and the set H is an absorbing set with the property that our Markov chain restricted to this set is Harris recurrent. Therefore, if we do not use starting values in T, then we start our chain in the absorbing set H, and in this state space our recurrent chain is Harris recurrent. For a fuller discussion of this argument, please see the discussion around Theorems 9.0.1 and 9.1.5 of Meyn and Tweedie (1993). Finally, we need to show that the posterior expectations are finite...

926 | The calculation of posterior distributions by data augmentation
- Tanner, Wong
- 1987
Citation Context: ...Some of the earlier references on Markov chain Monte Carlo methods include work of Geman and Geman (1984), Hastings (1970), Metropolis et al. (1953), and Tanner and Wong (1987). Besag and Green (1993) and Smith and Roberts (1993) recently reviewed Markov chain Monte Carlo methods. The basic normal mixture model, similar to that of Ferguson (1983), is described as follows. ...

906 | Statistical Analysis of Finite Mixture Distributions. - Titterington, Smith, et al. - 1985 |

643 |
Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. The Annals of Statistics,
- Antoniak
- 1974
Citation Context: ...p(y_{n+1} | Dn) = ∫ p(y_{n+1} | π) dP(π | Dn). (6) Direct evaluation of (6) is computationally extremely involved for even rather small sample size n due to the inherent complexity of the posterior P(π | Dn) (Antoniak 1974; Escobar 1992; Lo 1984; West 1990). Fortunately, Monte Carlo approximation is possible using extensions of the iterative technique in Escobar (1988, 1994), now described. 3. COMPUTATIONS Recall that, ...

520 | Sampling based approaches to calculating marginal densities. - Gelfand, Smith - 1990 |

456 |
Bayesian Computation via the Gibbs Sampler and Related Markov Chain Monte Carlo Methods
- Smith, Roberts
- 1993
Citation Context: ...Besag and Green (1993) and Smith and Roberts (1993) recently reviewed Markov chain Monte Carlo methods. The basic normal mixture model, similar to that of Ferguson (1983), is described as follows. ...

372 |
General Irreducible Markov Chains and Non-Negative Operators
- Nummelin
- 1984
Citation Context: ...Instead, we use the results from Tierney (1994), which are based on the monograph by Nummelin (1984). The theorem is stated later; the proof and additional discussion of convergence issues are contained in the Appendix. ... For the Markov chain implied by Algorithm I, Q1(·, ·) is called the transition kernel for the Markov chain. (For an explicit representati...

157 |
Spatial statistics and Bayesian computation.
- Besag, Green
- 1993
Citation Context: ...Some of the earlier references on Markov chain Monte Carlo methods include work of Geman and Geman (1984), Hastings (1970), Metropolis et al. (1953), and Tanner and Wong (1987). Besag and Green (1993) and Smith and Roberts (1993) recently reviewed Markov chain Monte Carlo methods. ...

143 |
Using Kernel Density Estimates to Investigate Multimodality
- Silverman
- 1981
Citation Context: ...N⁻¹ Σ_{r=1..N} p(m0 | τ(r), π(r)) → p(m0 | Dn) a.s. Theorem 5. The estimate of the posterior density of m, evaluated at the fixed point m0, is strongly consistent for almost all starting values of the algorithm. That is, for almost all starting values, given any fixed point m0, N⁻¹ Σ_{r=1..N} p(m0 | τ(r), π(r)) → p(m0 | Dn) a.s. 4. MIXTURE DECONVOLUTION Common, and closely linked, objectives in density estimation are the assessment of the number of components of a discrete mixture and inference about the number of modes of a population distribution (see, for example, Hartigan and Hartigan 1985, Roeder 1990, and Silverman 1981). In our framework, prior and posterior distributions for the number of components underlying an observed data set are readily derived, as is shown and illustrated here, and, if desired, inference on modality questions can be deduced as a byproduct. Consider generating a sample π of size n from the model in (1), resulting in predicting an observation using the mixture (5). With knowledge of π, this mixture is the Bayesian estimate of the population distribution. The number of distinct components k from which the n realized observat...
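The histogram approximation to p(k | Dn) described here reduces to counting distinct π values per Gibbs draw. A minimal sketch, assuming each sampled configuration is stored as a list of hashable (μ, V) pairs:

```python
from collections import Counter

def k_posterior(sampled_configs):
    """Histogram approximation to p(k | D_n): k^(r) is the number of
    distinct pi values in the r-th sampled configuration."""
    ks = [len(set(config)) for config in sampled_configs]
    n_draws = len(ks)
    return {k: c / n_draws for k, c in sorted(Counter(ks).items())}

# two toy draws: the first has a coincident pair, so k = 2; the second k = 3
draws = [[(0.0, 1.0), (0.0, 1.0), (2.0, 1.0)],
         [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]]
hist = k_posterior(draws)
```

Inference on modality can then be read off as a byproduct, as the passage notes, by examining the sampled mixtures rather than just their component counts.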

135 | A semiparametric Bayesian model for randomised block designs. - Bush, MacEachern - 1996 |

123 | The dip test of unimodality
- Hartigan, Hartigan
- 1985
Citation Context: ...number of modes of a population distribution. Roeder (1990), for example, considered nonparametric inference on the number of modes in a mixture. Various methods exist for inference about the modality of mixtures (Hartigan and Hartigan 1985; Silverman 1981; Roeder 1990), though approaches to direct inference on numbers of components are less well developed. In our framework, prior and posterior distributions for the number o...

110 |
Facilitating the Gibbs sampler: the Gibbs stopper and the griddy-Gibbs sampler.
- Ritter, Tanner
- 1992
Citation Context: ...d depending on the form of the prior p(α). Alternatively, we may discretise the range of α so that (11) provides a discrete approximation to the posteriors -- the so-called `griddy Gibbs' approach (Ritter and Tanner 1991). More attractively, sampling from the exact, continuous posterior (11) is possible in the Gibbs iterations when the prior p(α) comes from the class of mixtures of gamma distributions. We develop th...
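The griddy-Gibbs idea mentioned here is mechanical: evaluate the (unnormalized) conditional for α on a grid, normalize, and draw from the resulting discrete distribution. A sketch under the assumption that the model-specific log conditional is supplied by the caller:

```python
import numpy as np

rng = np.random.default_rng(1)

def griddy_gibbs_draw(grid, log_conditional):
    """One `griddy Gibbs' step (Ritter and Tanner): evaluate the
    unnormalized log conditional of alpha on a grid, normalize, and
    sample from the discrete approximation to the posterior."""
    lp = np.array([log_conditional(a) for a in grid])
    lp -= lp.max()                   # stabilize before exponentiating
    p = np.exp(lp)
    p /= p.sum()
    return rng.choice(grid, p=p)

grid = np.linspace(0.1, 5.0, 50)
draw = griddy_gibbs_draw(grid, lambda a: -a)   # toy decreasing log conditional
```

As the passage notes, this is only an approximation on the grid; with a mixture-of-gammas prior on α the exact continuous conditional can be sampled instead.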

31 |
Exploring Posterior Distributions using Markov Chains
- Tierney
- 1991
Citation Context: ...on functions appears in Figures 1(c) and 1(d). A nice way to exhibit uncertainties about density and distribution functions is via `live' animated graphical display of sequentially sampled functions (Tierney 1991). Restricted to static plots, we prefer displaying sampled curves to bands mapping pointwise interval estimates of the functions, since the latter do not define density or distribution functions. The ...

30 |
Monte Carlo Sampling Methods Using Markov Chains and Their Applications
- Hastings
- 1970
Citation Context: ...Some of the earlier references on Markov chain Monte Carlo methods include work of Geman and Geman (1984), Hastings (1970), Metropolis et al. (1953), and Tanner and Wong (1987). Besag and Green (1993) and Smith and Roberts (1993) recently reviewed Markov chain Monte Carlo methods. ...

28 |
Estimating the Means of Several Normal Populations by Nonparametric Estimation of the Distribution of the Means
- Escobar
- 1988
Citation Context: ...p(y_{n+1} | Dn): If the common prior distribution for the πi is uncertain and modelled, in whole or in part, as a Dirichlet process, then the data come from a Dirichlet mixture of normals (Ferguson 1983; Escobar 1988, 1994; West 1990). The important special case in which Vi = V has been studied widely; references appear in West (1990, 1992), who considers the common setup in which the μi have an uncertain prior ...

25 |
Computations of mixtures of Dirichlet Processes .
- Kuo
- 1986
Citation Context: ...who considered the common setup in which the μi have an uncertain prior that is modeled as a Dirichlet process with a normal base measure (see also West and Cao 1993). The connections with kernel estimation techniques are explored in these papers, as are some analytic and numerical approximations to the predictive distributions derived from such models. The analysis covers problems of estimating the Vi. Escobar (1988, 1994) considered similar models, differing in the use of a uniform Dirichlet process base measure, and assuming Vi = V, known. Ferguson (1983), using Monte Carlo techniques from Kuo (1986), considered more generally the case of possibly distinct and uncertain Vi. The suitability of this model form for density estimation has been well argued there and in the earlier references. With a suitable Dirichlet process prior structure, described later, this model produces predictive distributions qualitatively similar to kernel techniques, but catering for differing degrees of smoothing across the sample space through the use of possibly differing variances. © 1995 American Statistical Association, Journal of the American Statistical Association, June 1995, Vol. 90, No. 430, Theory and Met...

21 | Probes of Large-Scale Structure in the Corona Borealis Region - Postman, Huchra, et al. - 1986 |

11 | Estimating Normal Means With a Conjugate-Style Dirichlet Process Prior - MacEachern - 1994 |

10 | Assessing mechanisms of neural synaptic activity - WEST, CAO - 1993 |

9 |
On a Class of Bayesian Nonparametric Estimates: 1. Density Estimates
- Lo
- 1984
Citation Context: ...Direct evaluation of (6) is computationally extremely involved for even rather small sample size n due to the inherent complexity of the posterior P(π | Dn) (Antoniak 1974; Escobar 1992; Lo 1984; West 1990). Fortunately, Monte Carlo approximation is possible using extensions of the iterative technique in Escobar (1988, 1994), now described. ...

7 |
The geometry of mixture likelihoods, part I: a general theory.
- Lindsay
- 1983
Citation Context: ...based on mixtures of standard components, such as normal mixtures, underlie mainstream approaches to density estimation, including kernel techniques (Silverman 1986), nonparametric maximum likelihood (Lindsay 1983), and Bayesian approaches using mixtures of Dirichlet processes (Ferguson 1983). The latter provide theoretical bases for more traditional, nonparametric methods, such as kernel techniques, and hence...

4 |
Bayesian Kernel Density Estimation, Discussion Paper 90-A02
- West
- 1990
Citation Context: ...onal nonparametric methods, such as kernel techniques, and hence a modeling framework within which the various practical problems of local versus global smoothing, smoothing parameter estimation, and the assessment of uncertainty about density estimates may be addressed. In contrast with nonparametric approaches, a formal model allows these problems to be addressed directly via inference about the relevant model parameters. We discuss these issues using data distributions derived as normal mixtures in the framework of mixtures of Dirichlet processes, essentially the framework of Ferguson (1983). West (1990) discussed these models in a special case of the framework studied here. West's paper is concerned with developing approximations to predictive distributions based on a clustering algorithm motivated by the model structure and draws obvious connections with kernel approaches. The current article develops, in a more general framework, a computational method that allows for the evaluation of posterior distributions for all model parameters and direct evaluation of predictive distributions. As a natural by-product, we develop approaches to inference about the numbers of components and modes in a po...

3 |
Density Estimation With Confidence Sets Exemplified by Superclusters and Voids in the Galaxies
- Roeder
- 1990
Citation Context: ...Various methods exist for inference about the modality of mixtures (Hartigan and Hartigan 1985; Silverman 1981; Roeder 1990), though approaches to direct inference on numbers of components are less well developed. In our framework, prior and posterior distributions for the number o...

2 |
Bayesian Kernel Density Estimation, Discussion Paper #90-A02
- West
- 1990
Citation Context: ...If the common prior distribution for the πi is uncertain and modelled, in whole or in part, as a Dirichlet process, then the data come from a Dirichlet mixture of normals (Ferguson 1983; Escobar 1988, 1994; West 1990). The important special case in which Vi = V has been studied widely; references appear in West (1990, 1992), who considers the common setup in which the μi have an uncertain prior which is modelled...
