## Particle Learning for General Mixtures

### BibTeX

@MISC{Carvalho_particlelearning,
    author = {M. Carvalho and Hedibert F. Lopes and Nicholas G. Polson and Matthew A. Taddy},
    title = {Particle Learning for General Mixtures},
    year = {}
}

### Abstract

This paper develops efficient sequential learning methods for the estimation of general mixture models. The approach is distinguished from alternative particle filtering methods in two major ways. First, each iteration begins by resampling particles according to posterior predictive probability, leading to a more efficient set for propagation. Second, each particle tracks only the sufficient statistics for the latent mixture components, leading to reduced-dimensional inference. In addition, we describe how the approach applies to more general mixture models of current interest in the literature, in the hope that this will encourage more researchers to adopt sequential Monte Carlo methods for fitting their sophisticated mixture-based models. Finally, we show that this particle learning approach leads to straightforward tools for marginal likelihood calculation and posterior cluster allocation. Specific versions of the algorithm are derived for standard density estimation applications based on both finite mixture models and Dirichlet process mixture models, as well as for the less common settings of latent feature selection through an Indian buffet process and dependent distribution tracking through a probit stick-breaking model. Three simulation examples are presented: density estimation and model selection for a finite mixture model; a simulation study for Dirichlet process density estimation with as many as 12,500 observations of 25-dimensional data; and an example of nonparametric mixture regression that requires learning truncated approximations to the infinite random mixing distribution.
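The resample-then-propagate idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a one-dimensional Dirichlet process mixture of normals with known observation variance `sigma2` and a conjugate `N(m0, tau2)` prior on component means, so that each particle only needs to carry per-component counts and sums. All function and parameter names here are hypothetical.

```python
import numpy as np

def normal_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y (vectorised over mean and var)."""
    return np.exp(-0.5 * (y - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def allocation_weights(y, counts, sums, t, alpha, sigma2, m0, tau2):
    """Unnormalised predictive weights: existing components first, new one last."""
    counts = np.asarray(counts, dtype=float)
    sums = np.asarray(sums, dtype=float)
    # Conjugate posterior of each component mean given its sufficient statistics.
    post_var = 1.0 / (1.0 / tau2 + counts / sigma2)
    post_mean = post_var * (m0 / tau2 + sums / sigma2)
    existing = counts / (alpha + t) * normal_pdf(y, post_mean, sigma2 + post_var)
    new = alpha / (alpha + t) * normal_pdf(y, m0, sigma2 + tau2)
    return np.append(existing, new)

def particle_learning_dp(ys, n_particles=200, alpha=1.0, sigma2=1.0,
                         m0=0.0, tau2=10.0, seed=0):
    """Assumed toy model; returns final particles and a log marginal likelihood estimate."""
    rng = np.random.default_rng(seed)
    # Each particle holds only sufficient statistics: (counts, sums) per component.
    particles = [([], []) for _ in range(n_particles)]
    log_marginal = 0.0
    for t, y in enumerate(ys):
        weights = [allocation_weights(y, c, s, t, alpha, sigma2, m0, tau2)
                   for c, s in particles]
        predictive = np.array([w.sum() for w in weights])
        # The particle average of the predictive estimates p(y_t | y_1..y_{t-1}).
        log_marginal += np.log(predictive.mean())
        # Step 1: resample particles in proportion to their predictive probability.
        idx = rng.choice(n_particles, size=n_particles,
                         p=predictive / predictive.sum())
        # Step 2: propagate by sampling an allocation and updating the statistics.
        propagated = []
        for i in idx:
            counts, sums = list(particles[i][0]), list(particles[i][1])
            w = weights[i]
            k = rng.choice(len(w), p=w / w.sum())
            if k == len(counts):      # open a new mixture component
                counts.append(1)
                sums.append(y)
            else:                     # add y to an existing component
                counts[k] += 1
                sums[k] += y
            propagated.append((counts, sums))
        particles = propagated
    return particles, log_marginal
```

Under these assumptions, calling `particle_learning_dp(ys)` on a one-dimensional array returns the particle set and a running estimate of the log marginal likelihood; the component counts within each particle give a draw of the cluster sizes. Storing only counts and sums, rather than the full allocation history, is what keeps the per-particle state low-dimensional in this sketch.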

### Citations

716 | A Bayesian analysis of some nonparametric problems - Ferguson - 1973 |
Citation context: ...bilities for the prior on G, including the simple finite dimensional models leading to a finite mixture model specification. The most common models, including the very popular Dirichlet process (DP; Ferguson, 1973), are based on the stick-breaking construction for an infinite set of probability weights. Other priors of this type include the beta two-parameter process (Ishwaran and Zarepour, 2000) and kernel st...

520 | Filtering via simulation: Auxiliary particle filters - Pitt, Shephard - 1999 |
Citation context: ...of particles. In contrast, PL always resamples particles first, proportional to the predictive probability of a new observation. In this regard, PL can be interpreted as an Auxiliary Particle Filter (Pitt and Shephard, 1999) version of MCL. Liu and Chen (1998) discuss the potential advantages of resampling first in the context of dynamic systems where only state variables are unknown; we take this discussion further b...

443 | On Bayesian analysis of mixtures with an unknown number of components (with discussion) - Richardson, Green - 1998 |

419 | Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems - Antoniak - 1974 |

400 | Bayesian density estimation and inference using mixtures - Escobar, West - 1995 |
Citation context: ...nvolves sampling from an updated gamma posterior for each particle. Example 2: DP Mixtures of Multivariate Normals. The d-dimensional DP multivariate normal mixture (DP-MVN) model has density function (Escobar and West, 1995) $f(y_t; G) = \int N(y_t \mid \mu_t, \Sigma_t)\, dG(\mu_t, \Sigma_t)$, and $G \sim DP(\alpha, G_0(\mu, \Sigma))$, (7) with given concentration parameter $\alpha$ and conjugate centering distribution $G_0 = N(\mu; \lambda, \Sigma/\kappa)\, W(\Sigma^{-1}; \nu, \Omega)$, where $W(\Sigma^{-1}; \nu, \Omega)$ de...

375 | Markov Chain Sampling Methods for Dirichlet Process Mixture Models - Neal - 2000 |

326 | Marginal likelihood from the Gibbs output - Chib - 1995 |
Citation context: ...pproach offers a simple and robust sequential Monte Carlo alternative to the traditionally hard problem of approximating marginal predictive densities via MCMC output (see, e.g., Basu and Chib, 2003; Chib, 1995; Chib and Jeliazkov, 2001; Han and Carlin, 2001). The following sections will carefully discuss definition of $Z_t$ and exemplify application of this general PL mixtures strategy in a variety of models....

267 | Ferguson distributions via Pólya urn schemes - Blackwell, MacQueen - 1973 |
Citation context: ... probability function can be used to develop probability distributions over $y_t$ through use of de Finetti's representation theorem. Many common nonparametric priors, including the Dirichlet process (Blackwell and MacQueen, 1973) and, more generally, beta-Stacy processes (Walker and Muliere, 1997), Pólya trees (Muliere and Walker, 1997), and species sampling models (Perman et al., 1992) can be characterized in this way. More...

216 | Gibbs Sampling Methods for Stick-Breaking Priors - Ishwaran, James - 2001 |

185 | Infinite latent feature models and the Indian buffet process - Ghahramani, Griffiths - 2005 |

185 | Sequential imputations and Bayesian missing data problems - Kong, Liu, et al. - 1994 |

161 | Prior distributions on spaces of probability measures - Ferguson - 1974 |

147 | Sequential Monte Carlo samplers - Del Moral, Doucet, et al. - 2006 |

143 | Estimating Mixture of Dirichlet Process Models - MacEachern, Müller - 1998 |

130 | Variational inference for Dirichlet process mixtures - Blei, Jordan - 2005 |

111 | Dealing with label-switching in mixture models - Stephens - 2000 |

106 | Exchangeable and partially exchangeable random partitions - Pitman - 1995 |
Citation context: ...ery naturally within common Bayesian nonparametric modeling frameworks. In what follows, we are motivated by models where the prior on the mixing distribution is defined via a species sampling model (Pitman, 1995) that guarantees almost surely discrete realizations of G. Making a parallel to (1) and (2), an informal formulation of the collapsed state-space model is E [f(yt+1; G) | Zt] = E [dG(θ) | Zt] = ∫ ∫ k...

106 | On a class of Bayesian nonparametric estimates. I. Density estimates - Lo - 1984 |
Citation context: ...parametric mixture models have, since the early work of Ferguson (1974) and Antoniak (1974), emerged as a dominant modeling tool for Bayesian nonparametric density estimation (see also Ferguson, 1983; Lo, 1984). Unlike in finite mixture models, the number of unique mixture components is random. For this reason, both $m_t$ and $\theta^\star_t$ now depend upon the "time" t. Analogously to the finite setting, the posterior...

101 | Marginal Likelihood from Metropolis-Hastings Output - Chib, Jeliazkov - 2001 |
Citation context: ...rs a simple and robust sequential Monte Carlo alternative to the traditionally hard problem of approximating marginal predictive densities via MCMC output (see, e.g., Basu and Chib, 2003; Chib, 1995; Chib and Jeliazkov, 2001; Han and Carlin, 2001). The following sections will carefully discuss definition of $Z_t$ and exemplify application of this general PL mixtures strategy in a variety of models. 2 Density Estimation. In t...

98 | A semiparametric Bayesian model for randomised block designs - Bush, MacEachern - 1996 |

95 | Monte Carlo Smoothing for Nonlinear Time Series - Godsill, Doucet, et al. - 2004 |

71 | A class of dependent Dirichlet Processes - Griffin, Steel - 2004 |
Citation context: ...y directly. In addition, a number of different approaches have recently been proposed for the construction of dependent nonparametric mixture models with correlated stick-breaking weights (see, e.g., Griffin and Steel, 2006; Rodriguez and Dunson, 2009). Although these models tend to be more difficult to fit through MCMC than the single-θ type schemes, it is often possible to develop fairly straightforward PL simulation ...

68 | Sequential importance sampling for nonparametric Bayes models: The next generation - MacEachern, Clyde, et al. - 1999 |

66 | Size-biased sampling of Poisson point processes and excursions - Perman, Pitman, et al. - 1992 |
Citation context: ...cluding the Dirichlet process (Blackwell and MacQueen, 1973) and, more generally, beta-Stacy processes (Walker and Muliere, 1997), Pólya trees (Muliere and Walker, 1997), and species sampling models (Perman et al., 1992) can be characterized in this way. More recently, Lee, Quintana, Müller, and Trippa (2008) propose a general approach to defining predictive probability functions for species sampling models, and arg...

65 | Bayesian Density Estimation by Mixtures of Normal Distributions - Ferguson - 1983 |
Citation context: ...ls Discrete nonparametric mixture models have, since the early work of Ferguson (1974) and Antoniak (1974), emerged as a dominant modeling tool for Bayesian nonparametric density estimation (see also Ferguson, 1983; Lo, 1984). Unlike in finite mixture models, the number of unique mixture components is random. For this reason, both $m_t$ and $\theta^\star_t$ now depend upon the "time" t. Analogously to the finite setting, the...

63 | Bayesian nonparametric spatial modeling with Dirichlet process mixing - Gelfand, Kottas, et al. - 2005 |

61 | A sequential particle filter method for static models - Chopin - 2002 |

49 | Nonparametric Bayesian data analysis - Müller, Quintana - 2004 |

40 | Kernel stick-breaking processes - Dunson, Park - 2008 |
Citation context: ...k-breaking construction for an infinite set of probability weights. Other priors of this type include the beta two-parameter process (Ishwaran and Zarepour, 2000) and kernel stick-breaking processes (Dunson and Park, 2008). Pólya trees (e.g., Paddock et al., 2003) provide an alternative where the distribution is built through a random partitioning of the measurable space. We refer the reader to Walker, Damien, Laud, an...

37 | Particle filters for mixture models with an unknown number of components - Fearnhead - 2004 |

36 | Markov chain Monte Carlo methods for computing Bayes factors - Han, Carlin - 2001 |
Citation context: ...uential Monte Carlo alternative to the traditionally hard problem of approximating marginal predictive densities via MCMC output (see, e.g., Basu and Chib, 2003; Chib, 1995; Chib and Jeliazkov, 2001; Han and Carlin, 2001). The following sections will carefully discuss definition of $Z_t$ and exemplify application of this general PL mixtures strategy in a variety of models. 2 Density Estimation. In this section, we consid...

33 | The nested Dirichlet process - Rodriguez, Dunson, et al. |

32 | Markov chain Monte Carlo in approximate Dirichlet and beta two-parameter process hierarchical models. Biometrika - Ishwaran, Zarepour - 2000 |
Citation context: ...ular Dirichlet process (DP; Ferguson, 1973), are based on the stick-breaking construction for an infinite set of probability weights. Other priors of this type include the beta two-parameter process (Ishwaran and Zarepour, 2000) and kernel stick-breaking processes (Dunson and Park, 2008). Pólya trees (e.g., Paddock et al., 2003) provide an alternative where the distribution is built through a random partitioning of the measu...

29 | Bayesian nonparametric inference for random distributions and related functions (with discussion) - Walker, Damien, et al. - 1999 |

27 | A computational approach for full nonparametric Bayesian inference under Dirichlet process mixture models - Gelfand, Kottas - 2002 |

23 | On Population-Based Simulation for Static Inference - Jasra, Stephens - 2005 |

22 | Marginal likelihood and Bayes factors for Dirichlet process mixture models - Basu, Chib - 2003 |
Citation context: ... (i) ) t−1 /N. This approach offers a simple and robust sequential Monte Carlo alternative to the traditionally hard problem of approximating marginal predictive densities via MCMC output (see, e.g., Basu and Chib, 2003; Chib, 1995; Chib and Jeliazkov, 2001; Han and Carlin, 2001). The following sections will carefully discuss definition of $Z_t$ and exemplify application of this general PL mixtures strategy in a variet...

18 | Particle learning and smoothing - Carvalho, Johannes, et al. - 2010 |
Citation context: ...ring problems (i.e., in absence of model uncertainty), but the advantage of resampling first is most pronounced in learning problems where the particles include a set of unknown model parameters (see Carvalho et al., 2009; Lopes et al., 2010). The second feature of PL for mixtures is a subtle distinction in the statistics that are tracked in time. The MacEachern et al. framework attempts to track a smoothed distributio...

13 | Beta-Stacy processes and a generalization of the Pólya-urn scheme - Walker, Muliere - 1997 |
Citation context: ...er $y_t$ through use of de Finetti's representation theorem. Many common nonparametric priors, including the Dirichlet process (Blackwell and MacQueen, 1973) and, more generally, beta-Stacy processes (Walker and Muliere, 1997), Pólya trees (Muliere and Walker, 1997), and species sampling models (Perman et al., 1992) can be characterized in this way. More recently, Lee, Quintana, Müller, and Trippa (2008) propose a general...

12 | Computational aspects of nonparametric Bayesian analysis with applications to the modelling of multiple binary sequences - Quintana, Newton - 2000 |

8 | Randomized Pólya tree models for nonparametric Bayesian inference - Paddock, Ruggeri, et al. - 2003 |
Citation context: ...t of probability weights. Other priors of this type include the beta two-parameter process (Ishwaran and Zarepour, 2000) and kernel stick-breaking processes (Dunson and Park, 2008). Pólya trees (e.g., Paddock et al., 2003) provide an alternative where the distribution is built through a random partitioning of the measurable space. We refer the reader to Walker, Damien, Laud, and Smith (1999) or Müller and Quintana (20...

8 | Nonparametric Bayesian models through probit stick-breaking processes - Rodriguez, Dunson - 2009 |

6 | Nonparametric Bayesian survival analysis using mixtures of Weibull distributions - Kottas - 2006 |

6 | A Bayesian non-parametric approach to survival analysis using Pólya trees - Muliere, Walker - 1997 |
Citation context: ...entation theorem. Many common nonparametric priors, including the Dirichlet process (Blackwell and MacQueen, 1973) and, more generally, beta-Stacy processes (Walker and Muliere, 1997), Pólya trees (Muliere and Walker, 1997), and species sampling models (Perman et al., 1992) can be characterized in this way. More recently, Lee, Quintana, Müller, and Trippa (2008) propose a general approach to defining predictive probabi...

6 | Bayesian nonparametric approach to inference for quantile regression - Taddy, Kottas - 2009 |
Citation context: ...), and hence about G itself, rather than about E[f(y; G)]. For example, functionals of the conditional density f(x, y; G)/f(x; G) are the objects of inference in implied conditional regression (e.g., Taddy and Kottas, 2009), and Kottas (2006) describes inference for the hazard function derived from f(y; G). The standard approach to sampling G is to apply a truncated version of the constructive definition in (17) to dra...

6 | Kernel stick-breaking processes. Biometrika - Dunson, Park - 2007 |
Citation context: ...ck-breaking construction for an infinite set of probability weights. Other priors of this type include the beta two-parameter process (Ishwaran and Zarepour, 2000) and kernel stick-breaking processes (Dunson and Park, 2008). Pólya trees (e.g., Paddock et al., 2003) provide an alternative where the distribution is built through a random partitioning of the measurable space. We refer the reader to Walker, Damien, Laud, and...

6 | An autoregressive process for beta random variables - McKenzie - 1985 |
Citation context: ...series of beta random variables. In detail, the model is as in (21) except that each series of stick-breaking weights, $v_l = [v_{l1}, \ldots, v_{lT}]$, is modeled as a Beta Autoregressive Process (introduced by McKenzie, 1985). Section 3.2 of Taddy (2010) details the PL algorithm for this model. In contrast with the algorithms of 2.2, it is not possible to integrate over all of the stick-breaking weights, and a finite num...

4 | A Learning - GAINES, ANDREAE - 1966 |

3 | Defining predictive probability functions for species sampling models - Lee, Müller, et al. - 2009 |