## An Alternative Prior Process for Nonparametric Bayesian Clustering

### BibTeX

@MISC{Wallach_analternative,

author = {Hanna M. Wallach and Shane T. Jensen and Lee Dicker and Katherine A. Heller},

title = {An Alternative Prior Process for Nonparametric Bayesian Clustering},

year = {2010}

}

### Abstract

Prior distributions play a crucial role in Bayesian approaches to clustering. Two commonly used prior distributions are the Dirichlet and Pitman-Yor processes. In this paper, we investigate the predictive probabilities that underlie these processes, and the implicit “rich-get-richer” characteristic of the resulting partitions. We explore an alternative prior for nonparametric Bayesian clustering, the uniform process, for applications where the “rich-get-richer” property is undesirable. We also explore the cost of this process: partitions are no longer exchangeable with respect to the ordering of variables. We present new asymptotic and simulation-based results for the clustering characteristics of the uniform process and compare these with known results for the Dirichlet and Pitman-Yor processes. We compare performance on a real document clustering task, demonstrating the practical advantage of the uniform process despite its lack of exchangeability over orderings.
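The contrast between the three priors can be illustrated with a small simulation. The sketch below is not the authors' code; it assumes the standard sequential predictive rules: under the Dirichlet process an existing cluster is chosen with weight equal to its size and a new cluster with weight θ; under the Pitman-Yor process the weights are the cluster size minus the discount α, and θ + Kα for a new cluster; under the uniform process each of the K existing clusters has weight 1 and a new cluster has weight θ. The function names `predictive_weights` and `sample_partition` and their parameters are hypothetical, chosen only for this illustration.

```python
import random


def predictive_weights(sizes, theta, process="dp", alpha=0.0):
    """Unnormalized predictive weights for the next observation.

    sizes: current cluster sizes; the last returned entry is the
    weight of opening a new cluster. Names are illustrative only.
    """
    k = len(sizes)
    if process == "dp":
        # Dirichlet process: weight of an existing cluster is its size.
        return [float(n) for n in sizes] + [theta]
    if process == "py":
        # Pitman-Yor process: the discount alpha is taken from each cluster.
        return [n - alpha for n in sizes] + [theta + k * alpha]
    if process == "uniform":
        # Uniform process: existing clusters are equally likely, whatever their size.
        return [1.0] * k + [theta]
    raise ValueError(f"unknown process: {process}")


def sample_partition(n_obs, theta, process="dp", alpha=0.0, seed=0):
    """Draw cluster sizes by assigning n_obs observations one at a time."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(n_obs):
        weights = predictive_weights(sizes, theta, process, alpha)
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(sizes):
            sizes.append(1)      # open a new cluster
        else:
            sizes[choice] += 1   # join an existing cluster
    return sizes
```

Calling `sample_partition(1000, 1.0, "dp")` and `sample_partition(1000, 1.0, "uniform")` and comparing the sorted cluster sizes gives a rough feel for how strongly each prior concentrates observations in a few large clusters. Because the uniform process is not exchangeable, results from this sequential scheme depend on the order in which observations are processed.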

### Citations

4024 | Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images
- Geman, Geman
- 1984
Citation Context ...re-specified and fixed. The vector $c$ denotes the cluster assignments for the documents: $c_d$ is the cluster assignment for document $d$. Given a set of observed documents $W = \{w_d\}_{d=1}^{D}$, Gibbs sampling (Geman and Geman, 1984) can be used to infer the latent cluster assignments $c$. Specifically, the cluster assignment $c_d$ for document $d$ can be resampled from $P(c_d \mid c_{\setminus d}, w, \theta) \propto P(c_d \mid c_{\setminus d}, \theta) \cdot P(w_d \mid c_d, c_{\setminus d}, W_{\setminus d}, \beta$... |

588 | Hierarchical Dirichlet Processes
- Teh, Jordan, et al.
- 2004
Citation Context ...rings. 1 Introduction Nonparametric Bayesian models provide a powerful and popular approach to many difficult statistical problems, including document clustering (Zhang et al., 2005), topic modeling (Teh et al., 2006b), and clustering motifs in DNA sequences (Jensen and Liu, Appearing in Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS) 2010, Chia Laguna Resort,... |

344 | A constructive definition of Dirichlet priors
- Sethuraman
- 1994
Citation Context ...t cluster. New observations are therefore more likely to join already-large clusters. The “rich-get-richer” characteristic is also evident in the stick-breaking construction of the Dirichlet process (Sethuraman, 1994; Ishwaran and James, 2001), where each unique point mass is assigned a random weight. These weights are generated as a product of Beta random variables, which can be visualized as breaks of a unit-le... |

234 | Gibbs sampling methods for stick-breaking priors
- Ishwaran, James
Citation Context ...ervations are therefore more likely to join already-large clusters. The “rich-get-richer” characteristic is also evident in the stick-breaking construction of the Dirichlet process (Sethuraman, 1994; Ishwaran and James, 2001), where each unique point mass is assigned a random weight. These weights are generated as a product of Beta random variables, which can be visualized as breaks of a unit-length stick. Earlier breaks... |

232 | The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator
- Pitman, Yor
- 1995
Citation Context ...reaks of a unit-length stick. Earlier breaks of the stick will tend to lead to larger weights, which again gives rise to the “rich-get-richer” property. 2.2 Pitman-Yor Process The Pitman-Yor process (Pitman and Yor, 1997) has three parameters: a concentration parameter θ, a base distribution $G_0$, and a discount parameter 0 ≤ α < 1. Together, θ and α control the formation of new clusters. The Pitman-Yor predictive prob... |

172 | Slice sampling
- Neal
- 2003
Citation Context ...sets of clusters and documents, respectively, excluding document $d$. The vector $\beta = (\beta, \beta_1, \beta_0)$ represents the concentration parameters in the model, which can be inferred from $W$ using slice sampling (Neal, 2003), as described by Wallach (2008). The likelihood component of (16) is $P(w_d \mid c_d, c_{\setminus d}, W_{\setminus d}, \beta) = \prod_{n=1}^{N_d} \frac{N^{<d,n}_{w_n \mid d} + \beta \, \frac{N^{<d,n}_{w_n \mid c_d} + \beta_1 \, \frac{N^{<d,n}_{w_n} + \beta_0 \frac{1}{W}}{\sum_w N^{<d,n}_{w} + \beta_0}}{\sum_w N^{<d,n}_{w \mid c_d} + \beta_1}}{\sum \ldots}$... |

133 | Exchangeability and Related Topics
- Aldous
- 1985
Citation Context ... observations in that cluster) and joins a new cluster, consisting of $X_{N+1}$ only, with probability proportional to θ. This predictive probability is evident in the Chinese restaurant process metaphor (Aldous, 1985). The most obvious characteristic of the Dirichlet process predictive probability (given by (2)) is the “rich-get-richer” property: the probability of joining an existing cluster is proportional to th... |

128 | Combinatorial stochastic processes - Pitman - 2002 |

114 | Logarithmic Combinatorial Structures: a Probabilistic Approach
- Arratia, Barbour, et al.
- 2003
Citation Context ...d number of unique clusters $K_N$ in a partition is $E(K_N \mid \mathrm{DP}) = \sum_{n=1}^{N} \frac{\theta}{n - 1 + \theta} \simeq \theta \log N$ (5). The expected number of clusters of size $M$ is $\lim_{N \to \infty} E(H_{M,N} \mid \mathrm{DP}) = \frac{\theta}{M}$ (6). This well-known result (Arratia et al., 2003) implies that as $N \to \infty$, the expected number of clusters of size $M$ is inversely proportional to $M$ regardless of the value of θ. In other words, in expectation, there will be a small number of large cl... |

96 | Some Developments of the Blackwell-MacQueen Urn Scheme - Pitman - 1996 |

89 | A hierarchical bayesian language model based on pitman-yor processes
- Teh
- 2006
Citation Context ...itman and Yor (1997) introduced the Pitman-Yor process, a two-parameter generalization of the Dirichlet process. These processes can also be nested within a hierarchical structure (Teh et al., 2006a; Teh, 2006). A key property of any model based on Dirichlet or Pitman-Yor processes is that the posterior distribution provides a partition of the data into clusters, without requiring that the number of cluste... |

73 | Modelling Heterogeneity with and without the Dirichlet Process - Green, Richardson - 1999 |

60 | Nonparametric Bayesian Data Analysis - Müller, Quintana - 2004 |

58 | Generalized weighted Chinese restaurant process for species sampling mixture models - Ishwaran, James - 2003 |

50 | Evaluation methods for topic models
- Wallach, Murray, et al.
- 2009
Citation Context ...el. We compute $\log P(W^{\text{test}} \mid D^{\text{train}}, \theta, \beta) = \log \sum_{c^{\text{test}}} P(W^{\text{test}}, c^{\text{test}} \mid D^{\text{train}}, \theta, \beta)$, where $D^{\text{train}} = (W^{\text{train}}, c^{\text{train}})$ and the sum over $c^{\text{test}}$ is approximated using a novel variant of the “left-to-right” algorithm of Wallach et al. (2009) (see supplementary materials). We average this quantity over runs of the Gibbs sampler for $W^{\text{train}}$, runs of the “left-to-right” algorithm, and twenty permutations of the ... |

33 | Identification of co-regulated genes through Bayesian clustering of predicted regulatory binding sites
- Qin, McCue, et al.
- 2003
Citation Context ...ince new observations are more likely to be assigned to larger clusters. For many tasks, however, a prior over partitions that induces more uniformly-sized clusters is desirable. The uniform process (Qin et al., 2003; Jensen and Liu, 2008) is one such prior. The predictive probability for the uniform process is given by $P(\psi_{N+1} \mid \psi_1, \ldots, \psi_N, \theta, G_0) = \begin{cases} \frac{1}{K+\theta} & \psi_{N+1} = \phi_k \in \{\phi_1, \ldots, \phi_K\} \\ \frac{\theta}{K+\theta} & \psi_{N+1} \sim G_0 \end{cases}$ (4). T... |

30 | Structured topic models for language - Wallach - 2008 |

28 | A probabilistic model for online document clustering with application to novelty detection
- Zhang, Ghahramani, et al.
Citation Context ...its lack of exchangeability over orderings. 1 Introduction Nonparametric Bayesian models provide a powerful and popular approach to many difficult statistical problems, including document clustering (Zhang et al., 2005), topic modeling (Teh et al., 2006b), and clustering motifs in DNA sequences (Jensen and Liu, ... |

13 | Recent Developments in Document Clustering - Andrews, Fox |

7 | Bayesian Clustering of Transcription Factor Binding Motifs
- Jensen, Liu
- 2007
Citation Context ...ons are more likely to be assigned to larger clusters. For many tasks, however, a prior over partitions that induces more uniformly-sized clusters is desirable. The uniform process (Qin et al., 2003; Jensen and Liu, 2008) is one such prior. The predictive probability for the uniform process is given by $P(\psi_{N+1} \mid \psi_1, \ldots, \psi_N, \theta, G_0) = \begin{cases} \frac{1}{K+\theta} & \psi_{N+1} = \phi_k \in \{\phi_1, \ldots, \phi_K\} \\ \frac{\theta}{K+\theta} & \psi_{N+1} \sim G_0 \end{cases}$ (4). The probability that ne... |

6 | Hierarchical Dirichlet processes |

6 | Bayesian modeling of dependency trees using hierarchical Pitman-Yor priors
- Wallach, Hann, et al.
- 2008
Citation Context ...iscount parameter α serves to reduce the probability of adding a new observation to an existing cluster. This prior is particularly well-suited to natural language processing applications (Teh, 2006; Wallach et al., 2008) because it yields power-law behavior (cluster usage) when 0 < α < 1. 2.3 Uniform Process Predictive probabilities (2) and (3) result in partitions that are dominated by a few large clusters, since n... |

4 | Flexible priors for infinite mixture models - Welling - 2006 |