## Variational inference for Dirichlet process mixtures (2005)

Venue: | Bayesian Analysis |

Citations: | 147 - 18 self |

### BibTeX

@ARTICLE{Blei05variationalinference,
  author = {David M. Blei and Michael I. Jordan},
  title = {Variational inference for Dirichlet process mixtures},
  journal = {Bayesian Analysis},
  year = {2005},
  volume = {1},
  pages = {121--144}
}

### Abstract

Dirichlet process (DP) mixture models are the cornerstone of nonparametric Bayesian statistics, and the development of Markov chain Monte Carlo (MCMC) sampling methods for DP mixtures has enabled the application of nonparametric Bayesian methods to a variety of practical data analysis problems. However, MCMC sampling can be prohibitively slow, and it is important to explore alternatives. One class of alternatives is provided by variational methods, a class of deterministic algorithms that convert inference problems into optimization problems (Opper and Saad 2001; Wainwright and Jordan 2003). Thus far, variational methods have mainly been explored in the parametric setting, in particular within the formalism of the exponential family (Attias 2000; Ghahramani and Beal 2001; Blei et al. 2003). In this paper, we present a variational inference algorithm for DP mixtures. We present experiments that compare the algorithm to Gibbs sampling algorithms for DP mixtures of Gaussians and present an application to a large-scale image analysis problem.

### Citations

4100 |
Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images
- Geman, Geman
- 1984
Citation Context ... by iteratively sampling each latent variable conditional on the data and the most recently sampled values of the other latent variables. This yields a chain with the desired stationary distribution (Geman and Geman, 1984; Gelfand and Smith, 1990; Neal, 1993). Below, we review the Gibbs sampling algorithms for DP and TDP mixtures. 3.1 Collapsed Gibbs sampling In the collapsed Gibbs sampler for a DP mixture with conjug... |

3839 | Convex Analysis
- Rockafellar
- 1970
Citation Context ...dz. As discussed by Wainwright and Jordan (2003), this quantity can also be expressed variationally as: $a(\theta) = \sup_{\mu \in \mathcal{M}} \{\theta^T \mu - a^*(\mu)\}$ (12), where $a^*(\mu)$ is the Fenchel-Legendre conjugate of $a(\theta)$ (Rockafellar, 1970), and $\mathcal{M}$ is the set of realizable expected sufficient statistics: $\mathcal{M} = \{\mu : \mu = \int t(z) p(z) h(z)\,dz, \text{ for some } p\}$. There is a one-to-one mapping between parameters $\theta$ and the interior of $\mathcal{M}$ (Brown, 1986). Ac... |
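
The variational form of the log-partition function quoted in this context can be checked numerically on a small example. The sketch below assumes a Bernoulli family, which is an illustrative choice and not one taken from the paper: there $a(\theta) = \log(1 + e^{\theta})$, the conjugate $a^*(\mu)$ is the negative entropy, and $\mathcal{M}$ is the interval $[0, 1]$. The function names are hypothetical.

```python
import numpy as np

def log_partition(theta):
    # a(theta) = log(1 + exp(theta)) for the Bernoulli family (assumed example)
    return np.logaddexp(0.0, theta)

def conjugate(mu):
    # a*(mu): Fenchel-Legendre conjugate of a, i.e. the negative Bernoulli entropy
    return mu * np.log(mu) + (1.0 - mu) * np.log(1.0 - mu)

def variational_log_partition(theta, grid_size=100001):
    # Approximate sup over mu in the interior of M = (0, 1) with a dense grid
    mu = np.linspace(1e-6, 1.0 - 1e-6, grid_size)
    return np.max(theta * mu - conjugate(mu))

thetas = [-2.0, 0.0, 1.5]
gaps = [abs(log_partition(t) - variational_log_partition(t)) for t in thetas]
print(gaps)
```

The gaps should be close to zero up to the grid resolution, since the supremum is attained at the mean parameter $\mu = 1 / (1 + e^{-\theta})$.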

2662 | Latent Dirichlet allocation
- Blei, Ng, et al.
- 2003
Citation Context ...ordan, 2003). Thus far, variational methods have mainly been explored in the parametric setting, in particular within the formalism of the exponential family (Attias, 2000; Ghahramani and Beal, 2001; Blei et al., 2003). In this paper, we present a variational inference algorithm for DP mixtures. We present experiments that compare the algorithm to Gibbs sampling algorithms for DP mixtures of Gaussians and present ... |

1028 | Monte Carlo Statistical Methods
- Robert, Casella
- 1999
Citation Context ...active field of research. Theoretical bounds on the mixing time are of little practical use, and there is no consensus on how to choose among the several empirical methods developed for this purpose (Robert and Casella 2004). But there are several potential disadvantages of variational methods as well. First, the optimization procedure can fall prey to local maxima in the variational parameter space. Local maxima can be... |

872 | An introduction to variational methods for graphical models
- Jordan, Ghahramani, et al.
- 1998
Citation Context ...is also important to explore alternatives, particularly in the context of large-scale problems. One such class of alternatives is provided by variational inference methods (Ghahramani and Beal, 2001; Jordan et al., 1999; Opper and Saad, 2001; Wainwright and Jordan, 2003; Wiegerinck, 2000). Like MCMC, variational inference methods have their roots in statistical physics, and, in contradistinction to MCMC methods, the... |

837 | Neuro-Dynamic programming - Bertsekas, Tsitsiklis - 1996 |

799 |
A Bayesian analysis of some nonparametric problems
- Ferguson
- 1973
Citation Context ...ontinuous random variable, let $G_0$ be a non-atomic probability distribution for $\eta$, and let $\alpha$ be a positive, real-valued scalar. A random measure $G$ is distributed according to a Dirichlet process (DP) (Ferguson, 1973), with scaling parameter $\alpha$ and base measure $G_0$, if for all natural numbers $k$ and $k$-partitions $\{B_1, \ldots, B_k\}$: $(G(\eta \in B_1), G(\eta \in B_2), \ldots, G(\eta \in B_k)) \sim \mathrm{Dir}(\alpha G_0(B_1), \alpha G_0(B_2), \ldots, \alpha G_0(B_k))$. I... |

597 | Probabilistic inference using Markov chain Monte Carlo methods
- Neal
- 1993
Citation Context ...nditional on the data and the most recently sampled values of the other latent variables. This yields a chain with the desired stationary distribution (Geman and Geman, 1984; Gelfand and Smith, 1990; Neal, 1993). Below, we review the Gibbs sampling algorithms for DP and TDP mixtures. 3.1 Collapsed Gibbs sampling In the collapsed Gibbs sampler for a DP mixture with conjugate base measure (Escobar and West, 1... |

474 | Graphical models, exponential families, and variational inference
- Wainwright, Jordan
- 2003
Citation Context ...plore alternatives. One class of alternatives is provided by variational methods, a class of deterministic algorithms that convert inference problems into optimization problems (Opper and Saad, 2001; Wainwright and Jordan, 2003). Thus far, variational methods have mainly been explored in the parametric setting, in particular within the formalism of the exponential family (Attias, 2000; Ghahramani and Beal, 2001; Blei et al.... |

460 |
Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems
- Antoniak
- 1974
Citation Context ...een most directly in the stick-breaking representation of the DP, in which $G$ is represented explicitly as an infinite sum of atomic measures (Sethuraman, 1994). The Dirichlet process mixture model (Antoniak, 1974) adds a level to the hierarchy, treating $\eta_n$ as the parameter of the distribution of the $n$th observation. Given the discreteness of $G$, the DP mixture has an interpretation as a mixture model with an u... |

454 | Bayesian density estimation and inference using mixtures
- Escobar, West
- 1995
Citation Context ...$p(\eta \mid x_1, \ldots, x_N)$ is complicated and difficult to characterize in a closed form in the DP mixture setting. MCMC provides one class of approximations for this posterior and the predictive density (Escobar and West, 1995; Neal, 2000). In this paper, we present a variational inference algorithm for DP mixtures based on the stick-breaking representation of the underlying DP. The algorithm involves two probability distr... |

425 |
Sampling based approaches to calculating marginal densities
- Gelfand, Smith
- 1990
Citation Context ...g each latent variable conditional on the data and the most recently sampled values of the other latent variables. This yields a chain with the desired stationary distribution (Geman and Geman, 1984; Gelfand and Smith, 1990; Neal, 1993). Below, we review the Gibbs sampling algorithms for DP and TDP mixtures. 3.1 Collapsed Gibbs sampling In the collapsed Gibbs sampler for a DP mixture with conjugate base measure (Escobar... |

410 | Markov chain sampling methods for Dirichlet process mixture models
- Neal
- 2000
Citation Context ... complicated and difficult to characterize in a closed form in the DP mixture setting. MCMC provides one class of approximations for this posterior and the predictive density (Escobar and West, 1995; Neal, 2000). In this paper, we present a variational inference algorithm for DP mixtures based on the stick-breaking representation of the underlying DP. The algorithm involves two probability distributions: the... |

348 |
A constructive definition of Dirichlet priors
- Sethuraman
- 1994
Citation Context ...om measure $G$ is discrete with probability one. This is seen most directly in the stick-breaking representation of the DP, in which $G$ is represented explicitly as an infinite sum of atomic measures (Sethuraman, 1994). The Dirichlet process mixture model (Antoniak, 1974) adds a level to the hierarchy, treating $\eta_n$ as the parameter of the distribution of the $n$th observation. Given the discreteness of $G$, the DP mixt... |
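
The stick-breaking representation mentioned in this context can be sketched in a few lines. This is a minimal illustration and not the authors' code: it assumes the standard construction with stick proportions drawn from Beta(1, α), a truncation level `K` (the last proportion is set to 1 so the weights sum to one), and a standard normal base measure $G_0$ chosen purely for the example. The function name and parameters are hypothetical.

```python
import numpy as np

def truncated_stick_breaking(alpha, K, rng):
    """Draw K mixture weights from a stick-breaking construction truncated at K."""
    # Stick proportions V_k ~ Beta(1, alpha)
    v = rng.beta(1.0, alpha, size=K)
    # Truncation: the last piece takes whatever remains of the stick
    v[-1] = 1.0
    # Length of stick remaining before each break: prod_{j<k} (1 - V_j)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

rng = np.random.default_rng(0)
weights = truncated_stick_breaking(alpha=2.0, K=20, rng=rng)
# Atom locations drawn from the base measure G0 (assumed standard normal here)
atoms = rng.normal(size=20)
print(weights.sum())
```

Together, `weights` and `atoms` describe one truncated draw of the discrete random measure $G$; smaller values of `alpha` concentrate the mass on the first few atoms.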

347 | Automatic image annotation and retrieval using cross-media relevance models
- Jeon, Lavrenko, et al.
Citation Context ...Image analysis Finite Gaussian mixture models are widely used in computer vision to model natural images for the purposes of automatic clustering, retrieval, and classification (Barnard et al., 2003; Jeon et al., 2003). These applications are often large-scale data analysis problems, involving thousands of data points (images) in hundreds of dimensions (pixels). The appropriate number of mixture components to use ... |

303 |
Ferguson distributions via Pólya urn schemes
- Blackwell, MacQueen
- 1973
Citation Context ...at $\{\eta_1, \ldots, \eta_N\}$ are subsequently drawn independently from $G$: $\eta_n \mid G \sim G$. Marginalizing out the random measure $G$, the joint distribution of $\{\eta_1, \ldots, \eta_N\}$ turns out to follow a Pólya urn scheme (Blackwell and MacQueen, 1973). Thus, positive probability is assigned to configurations in which different $\eta_n$ take on identical values, and the underlying random measure $G$ is discrete with probability one. This is seen most dire... |
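
The Pólya urn scheme referred to in this context can be simulated directly, which makes the repeated values visible. The sketch below is an illustration under stated assumptions, not the paper's implementation: it uses the standard urn rule (a fresh draw from $G_0$ with probability $\alpha / (\alpha + n)$ after $n$ previous draws, otherwise a uniformly chosen earlier value) and a standard normal $G_0$ picked only for the example. The helper `polya_urn_draws` is hypothetical.

```python
import numpy as np

def polya_urn_draws(alpha, N, base_sampler, rng):
    """Sample eta_1..eta_N from the Polya urn obtained by marginalizing out G."""
    draws = []
    for n in range(N):
        # With probability alpha / (alpha + n), draw a new value from G0
        if rng.random() < alpha / (alpha + n):
            draws.append(base_sampler(rng))
        else:
            # Otherwise repeat one of the n earlier values, chosen uniformly
            draws.append(draws[rng.integers(n)])
    return np.array(draws)

rng = np.random.default_rng(1)
etas = polya_urn_draws(alpha=1.0, N=200,
                       base_sampler=lambda r: r.normal(), rng=rng)
print(len(np.unique(etas)), "distinct values among", len(etas), "draws")
```

Even though $G_0$ is continuous in this example, many draws coincide, which is the clustering behaviour the DP mixture relies on.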

298 | Variational algorithms for approximate Bayesian inference - Beal - 2003 |

240 | Gibbs sampling methods for stick-breaking priors
- Ishwaran, James
- 2001
Citation Context ... efficiently in any direct way. It must be approximated, and Markov chain Monte Carlo (MCMC) methods are the method of choice for approximating these posteriors (Escobar and West, 1995; Neal, 2000; Ishwaran and James, 2001). As in the parametric setting, the idea behind MCMC for approximate posterior inference in the DP mixture is to construct a Markov chain for which the stationary distribution is the posterior of int... |

202 | A variational Bayesian framework for graphical models
- Attias
- 2000
Citation Context ...s (Opper and Saad, 2001; Wainwright and Jordan, 2003). Thus far, variational methods have mainly been explored in the parametric setting, in particular within the formalism of the exponential family (Attias, 2000; Ghahramani and Beal, 2001; Blei et al., 2003). In this paper, we present a variational inference algorithm for DP mixtures. We present experiments that compare the algorithm to Gibbs sampling algori... |

166 |
Fundamentals of Statistical Exponential Families
- Brown
- 1986
Citation Context ...ockafellar, 1970), and $\mathcal{M}$ is the set of realizable expected sufficient statistics: $\mathcal{M} = \{\mu : \mu = \int t(z) p(z) h(z)\,dz, \text{ for some } p\}$. There is a one-to-one mapping between parameters $\theta$ and the interior of $\mathcal{M}$ (Brown, 1986). Accordingly, the interior of $\mathcal{M}$ is often referred to as the set of mean parameters. Let $\theta(\mu)$ be a natural parameter corresponding to the mean parameter $\mu \in \mathcal{M}$; thus $E_{\theta}[t(Z)] = \mu$. Let $q(z \mid \theta(\mu))$ den... |

128 |
Monte Carlo Statistical Methods (Springer-Verlag)
- Robert, Casella
- 2004
Citation Context ...active field of research. Theoretical bounds on the mixing time are of little practical use, and there is no consensus on how to choose among the several empirical methods developed for this purpose (Robert and Casella, 1999). Furthermore, in this context, the variational technique provides an explicit estimate of the infinite-dimensional parameter $G$ by using the truncated stick-breaking construction. The best Gibbs samp... |

112 |
Estimating normal means with a conjugate style Dirichlet process prior
- MacEachern
- 1994
Citation Context ...the posterior distribution $p(\eta \mid x_1, \ldots, x_N, G_0, \alpha)$ is complicated and is not available in a closed form. MCMC provides one class of approximations for this posterior and the predictive density (MacEachern 1994; Escobar and West 1995; Neal 2000). [Footnote 1: Ferguson (1973) parameterizes the Dirichlet process by a single base measure, which is $\alpha G_0$ in our notation.] In this paper, we prese... |

110 | Propagation algorithms for variational Bayesian learning
- Ghahramani, Beal
- 2000
Citation Context ...aad, 2001; Wainwright and Jordan, 2003). Thus far, variational methods have mainly been explored in the parametric setting, in particular within the formalism of the exponential family (Attias, 2000; Ghahramani and Beal, 2001; Blei et al., 2003). In this paper, we present a variational inference algorithm for DP mixtures. We present experiments that compare the algorithm to Gibbs sampling algorithms for DP mixtures of Gau... |

51 |
Advanced Mean Field Methods: Theory and Practice
- Opper, Saad
- 2001
Citation Context ... it is important to explore alternatives. One class of alternatives is provided by variational methods, a class of deterministic algorithms that convert inference problems into optimization problems (Opper and Saad, 2001; Wainwright and Jordan, 2003). Thus far, variational methods have mainly been explored in the parametric setting, in particular within the formalism of the exponential family (Attias, 2000; Ghahraman... |

47 | Comment: One long run with diagnostics: Implementation strategies for Markov Chain Monte Carlo - Raftery, Lewis - 1992 |

44 |
Concepts of independence for proportions with a generalization of the Dirichlet distribution
- Connor, Mosimann
- 1969
Citation Context ... $\gamma_{k,2} = \alpha + \sum_{i=k+1}^{K} \sum_{n=1}^{N} z_n^i$. This step follows from the conjugacy between the multinomial data $z$ and the truncated stick-breaking construction, which is a generalized Dirichlet distribution (Connor and Mosimann, 1969). 3. For $k \in \{1, \ldots, K\}$, independently sample $\eta_k^*$ from $p(\eta_k^* \mid \tau_k)$. This distribution is in the same family as the base measure, with parameters: $\tau_{k,1} = \lambda_1 + \sum_{i \neq n} z_i^k x_i$, $\tau_{k,2} = \lambda_2 + \sum_{i \neq n}$ ... |

43 | Variational approximations between mean field theory and the junction tree algorithm
- Wiegerinck
- 2000
Citation Context ...f large-scale problems. One such class of alternatives is provided by variational inference methods (Ghahramani and Beal, 2001; Jordan et al., 1999; Opper and Saad, 2001; Wainwright and Jordan, 2003; Wiegerinck, 2000). Like MCMC, variational inference methods have their roots in statistical physics, and, in contradistinction to MCMC methods, they are deterministic. The basic idea of variational inference is to fo... |

39 |
VIBES: A variational inference engine for Bayesian networks
- Bishop, Winn, et al.
Citation Context ... for a beta distribution, $\tau_t$ are natural parameters for the distributions of $\eta_t^*$, and $\phi_n$ are parameters for a multinomial distribution. [Footnote 3: This relationship has inspired the software package VIBES (Bishop et al., 2003), which is a variational version of the popular BUGS package (Gilks et al., 1996).] Notice that in the model of Figure 1, the variables $V$, $\eta^*$, and $Z$ are each identically distributed, whereas, under the variational distribution, there is a different parameter for each vari... |

29 | A computational approach for full nonparametric Bayesian inference under Dirichlet process mixture models - Gelfand, Kottas - 2002 |

13 |
Markov Chain Monte Carlo Methods in Practice
- Gilks, Richardson, et al.
- 1996
Citation Context ... , and $\phi_n$ are parameters for a multinomial distribution. [Footnote 3: This relationship has inspired the software package VIBES (Bishop et al., 2003), which is a variational version of the popular BUGS package (Gilks et al., 1996).] Notice that in the model of Figure 1, the variables $V$, $\eta^*$, and $Z$ are each identically distributed, whereas, under the variational distribution, there is a different parameter for each vari... |

1 | Markov chain sampling methods for Dirichlet process mixture models - Blei, Jordan - 2000 |