Results 1-10 of 187
Hierarchical Dirichlet processes
Journal of the American Statistical Association, 2004
Cited by 535 (56 self)
... program. The authors wish to acknowledge helpful discussions with Lancelot James and Jim Pitman and the referees for useful comments. We consider problems involving groups of data, where each observation within a group is a draw from a mixture model, and where it is desirable to share mixture components between groups. We assume that the number of mixture components is unknown a priori and is to be inferred from the data. In this setting it is natural to consider sets of Dirichlet processes, one for each group, where the well-known clustering property of the Dirichlet process provides a nonparametric prior for the number of mixture components within each group. Given our desire to tie the mixture models in the various groups, we consider a hierarchical model, specifically one in which the base measure for the child Dirichlet processes is itself distributed according to a Dirichlet process. Such a base measure being discrete, the child Dirichlet processes necessarily share atoms. Thus, as desired, the mixture models in the different groups necessarily share mixture components. We discuss representations of hierarchical Dirichlet processes in terms of ...
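The atom-sharing argument in this abstract can be illustrated with a small sketch. It is not the authors' algorithm: it assumes a finite truncation of the top-level Dirichlet process, a standard normal base measure, and hypothetical names (`sample_truncated_hdp`, `gamma`, `alpha`). Because the truncated top-level measure has finitely many atoms, each child Dirichlet process reduces to a Dirichlet draw over those same atoms.

```python
import random


def stick_breaking(concentration, truncation, rng):
    """Truncated stick-breaking weights for a DP with the given concentration."""
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, concentration)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # last atom absorbs the leftover mass
    return weights


def dirichlet(alphas, rng):
    """Dirichlet draw obtained by normalizing independent gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_truncated_hdp(gamma, alpha, n_groups, truncation, seed=0):
    """Draw shared atoms, top-level weights, and per-group weights."""
    rng = random.Random(seed)
    # Assumption: the base measure H is N(0, 1); any distribution would do.
    atoms = [rng.gauss(0.0, 1.0) for _ in range(truncation)]
    beta = stick_breaking(gamma, truncation, rng)  # weights of the parent G0
    # A DP whose base measure is discrete with finitely many atoms has
    # Dirichlet(alpha * beta) weights on exactly those atoms.
    group_weights = [dirichlet([alpha * b for b in beta], rng)
                     for _ in range(n_groups)]
    return atoms, beta, group_weights
```

Every group reweights the same `atoms` list, which is the sense in which the mixture components are shared; only the weights differ between groups.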
Gibbs Sampling Methods for Stick-Breaking Priors
Cited by 212 (17 self)
... In this paper we present two general types of Gibbs samplers that can be used to fit posteriors of Bayesian hierarchical models based on stick-breaking priors. The first type of Gibbs sampler, referred to as a Pólya urn Gibbs sampler, is a generalized version of a widely used Gibbs sampling method currently employed for Dirichlet process computing. This method applies to stick-breaking priors with a known Pólya urn characterization, that is, priors with an explicit and simple prediction rule. Our second method, the blocked Gibbs sampler, is based on an entirely different approach that works by directly sampling values from the posterior of the random measure. The blocked Gibbs sampler can be viewed as a more general approach as it works without requiring an explicit prediction rule. We find that the blocked Gibbs sampler avoids some of the limitations seen with the Pólya urn approach and should be simpler for nonexperts to use.
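As a rough illustration of what a stick-breaking prior is, the sketch below draws the weights of a finite stick-breaking measure. It shows only the prior construction, not either Gibbs sampler. The function name and the parameter lists `a` and `b` are hypothetical; the final break is set to one so the weights sum to one, which is the usual truncation device.

```python
import random


def stick_breaking_weights(a, b, seed=0):
    """Draw p_1..p_N with V_k ~ Beta(a[k], b[k]) and
    p_k = V_k * prod_{l<k} (1 - V_l); the last V is fixed to 1."""
    rng = random.Random(seed)
    n = len(a)
    weights, remaining = [], 1.0
    for k in range(n):
        v = 1.0 if k == n - 1 else rng.betavariate(a[k], b[k])
        weights.append(remaining * v)   # length of the piece broken off
        remaining *= 1.0 - v            # what is left of the stick
    return weights


# Dirichlet process with concentration alpha: a_k = 1, b_k = alpha.
alpha, n_atoms = 2.0, 20
dp_weights = stick_breaking_weights([1.0] * n_atoms, [alpha] * n_atoms)

# Two-parameter (Pitman-Yor) case with discount d and strength theta:
# a_k = 1 - d, b_k = theta + k * d for k = 1, 2, ...
d, theta = 0.5, 1.0
py_weights = stick_breaking_weights(
    [1.0 - d] * n_atoms,
    [theta + (k + 1) * d for k in range(n_atoms)],
)
```

Sampling the weights directly, rather than marginalizing them into a prediction rule, is the representation a blocked sampler would work with.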
Brownian Excursions, Critical Random Graphs and the Multiplicative Coalescent
1996
Cited by 84 (10 self)
Let $(B^t(s), 0 \le s < \infty)$ be reflecting inhomogeneous Brownian motion with drift $t - s$ at time $s$, started with $B^t(0) = 0$. Consider the random graph $G(n, n^{-1} + t n^{-4/3})$, whose largest components have size of order $n^{2/3}$. Normalizing by $n^{-2/3}$, the asymptotic joint distribution of component sizes is the same as the joint distribution of excursion lengths of $B^t$ (Corollary 2). The dynamics of merging of components as $t$ increases are abstracted to define the multiplicative coalescent process. The states of this process are vectors $x$ of nonnegative real cluster sizes $(x_i)$, and clusters with sizes $x_i$ and $x_j$ merge at rate $x_i x_j$. The multiplicative coalescent is shown to be a Feller process on $l^2$. The random graph limit specifies the standard multiplicative coalescent, which starts from infinitesimally small clusters at time $-\infty$: the existence of such a process is not obvious. AMS 1991 subject classifications. 60C05, 60J50, Key words and phras...
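The scaling described in this abstract can be explored numerically. The sketch below samples $G(n, p)$ with $p = n^{-1} + t n^{-4/3}$, finds its components with a union-find structure, and rescales the largest sizes by $n^{-2/3}$. Function names and the `top` parameter are hypothetical, and the quadratic edge loop is only practical for modest $n$.

```python
import random


def component_sizes(n, p, rng):
    """Component sizes of an Erdos-Renyi graph G(n, p), largest first."""
    parent = list(range(n))
    size = [1] * n

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:  # edge {i, j} is present
                ri, rj = find(i), find(j)
                if ri != rj:
                    if size[ri] < size[rj]:
                        ri, rj = rj, ri
                    parent[rj] = ri
                    size[ri] += size[rj]
    return sorted((size[r] for r in range(n) if parent[r] == r), reverse=True)


def rescaled_components(n, t, top=5, seed=0):
    """Largest component sizes in the critical window, times n^(-2/3)."""
    rng = random.Random(seed)
    # A negative p (very negative t at small n) simply yields no edges.
    p = 1.0 / n + t * n ** (-4.0 / 3.0)
    sizes = component_sizes(n, p, rng)
    return [s * n ** (-2.0 / 3.0) for s in sizes[:top]]
```

Calling `rescaled_components` at increasing values of `t` for a fixed seed gives independent graphs rather than a coupled process, so it shows the marginal scaling only, not the merging dynamics.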
A hierarchical Bayesian language model based on Pitman–Yor processes
In Coling/ACL, 2006
Cited by 78 (8 self)
We propose a new hierarchical Bayesian n-gram model of natural languages. Our model makes use of a generalization of the commonly used Dirichlet distributions called Pitman–Yor processes which produce power-law distributions more closely resembling those in natural languages. We show that an approximation to the hierarchical Pitman–Yor language model recovers the exact formulation of interpolated Kneser-Ney, one of the best smoothing methods for n-gram language models. Experiments verify that our model gives cross entropy results superior to interpolated Kneser-Ney and comparable to modified Kneser-Ney.
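The link to interpolated Kneser-Ney can be seen in the predictive rule of a single Pitman–Yor "restaurant". The sketch below assumes the standard two-parameter form, with customer counts, table counts, a discount, a strength, and a back-off probability from the shorter context; all argument names are hypothetical and the counts in the example are made up for illustration.

```python
def pitman_yor_predictive(count_w, tables_w, count_total, tables_total,
                          discount, strength, parent_prob):
    """Predictive probability of word w in one context.

    count_w / tables_w: customers and tables for w in this context.
    count_total / tables_total: the same, summed over all words.
    parent_prob: probability of w under the shorter (back-off) context.
    """
    if count_total == 0:
        return parent_prob  # unseen context: back off entirely
    denom = strength + count_total
    seated = (count_w - discount * tables_w) / denom
    backoff = (strength + discount * tables_total) / denom
    return seated + backoff * parent_prob


# With strength = 0 and one table per observed word type, the rule has the
# shape of interpolated Kneser-Ney: absolute discounting plus a back-off
# weight proportional to the number of distinct word types seen.
counts = {"a": 3, "b": 1}           # illustrative counts
vocab = ["a", "b", "c", "d"]
uniform = 1.0 / len(vocab)          # assumed back-off distribution
probs = {
    w: pitman_yor_predictive(counts.get(w, 0), 1 if w in counts else 0,
                             sum(counts.values()), len(counts),
                             discount=0.75, strength=0.0,
                             parent_prob=uniform)
    for w in vocab
}
```

In a full hierarchical model the table counts are latent and inferred; fixing them to one per word type is the approximation under which the Kneser-Ney form appears.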
Interpolating between types and tokens by estimating power-law generators
In Advances in Neural Information Processing Systems 18, 2006
Cited by 76 (13 self)
Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that generically produce power laws, augmenting standard generative models with an adaptor that produces the appropriate pattern of token frequencies. We show that taking a particular stochastic process – the Pitman–Yor process – as an adaptor justifies the appearance of type frequencies in formal analyses of natural language, and improves the performance of a model for unsupervised learning of morphology.
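The generator-plus-adaptor idea can be sketched as a two-stage sampler, assuming the usual Chinese-restaurant form of the Pitman–Yor process: each new table gets a label from the generator, and tokens reuse existing tables with probability proportional to table size minus a discount. The function name and the uniform generator in the usage line are hypothetical.

```python
import random
from collections import Counter


def pitman_yor_adaptor(n_tokens, discount, strength, generator, seed=0):
    """Generate tokens by seating customers in a Pitman-Yor restaurant.

    generator(rng) draws a label for each new table. Requires
    0 <= discount < 1 and strength > -discount.
    """
    rng = random.Random(seed)
    table_counts, table_labels, tokens = [], [], []
    for n in range(n_tokens):
        k = len(table_counts)
        # Total mass is strength + n; a new table has mass strength + d * k.
        r = rng.random() * (strength + n) - (strength + discount * k)
        chosen = None
        if k > 0 and r >= 0.0:
            chosen = k - 1  # guard against floating-point leftovers
            for idx, count in enumerate(table_counts):
                r -= count - discount  # existing table has mass count - d
                if r < 0.0:
                    chosen = idx
                    break
        if chosen is None:
            table_counts.append(1)
            table_labels.append(generator(rng))  # one generator draw per table
            chosen = k
        else:
            table_counts[chosen] += 1
        tokens.append(table_labels[chosen])
    return tokens, table_labels


# Usage: a uniform generator over 1000 hypothetical word ids.
tokens, labels = pitman_yor_adaptor(2000, 0.5, 1.0,
                                    lambda rng: rng.randrange(1000))
token_freq = Counter(tokens)
```

The generator is consulted once per table rather than once per token, which is why its estimates end up closer to type counts than to token counts as the discount grows.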
The Standard Additive Coalescent
1997
Cited by 63 (22 self)
Regard an element of the set $\Delta := \{(x_1, x_2, \ldots) : x_1 \ge x_2 \ge \cdots \ge 0, \sum_i x_i = 1\}$ as a fragmentation of unit mass into clusters of masses $x_i$. The additive coalescent of Evans and Pitman (1997) is the $\Delta$-valued Markov process in which pairs of clusters of masses $\{x_i, x_j\}$ merge into a cluster of mass $x_i + x_j$ at rate $x_i + x_j$. They showed that a version $(X^\infty(t), -\infty < t < \infty)$ of this process arises as an $n \to \infty$ weak limit of the process started at time $-\frac{1}{2} \log n$ with $n$ clusters of mass $1/n$. We show this standard additive coalescent may be constructed from the continuum random tree of Aldous (1991, 1993) by Poisson splitting along the skeleton of the tree. We describe the distribution of $X^\infty(t)$ on $\Delta$ at a fixed time $t$. We show that the size of the cluster containing a given atom, as a process in $t$, has a simple representation in terms of the stable subordinator of index $1/2$. As $t \to -\infty$, we establish a Gaussian limit for (centered and norm...
Generalized weighted Chinese restaurant processes for species sampling mixture models
Statistica Sinica, 2003
Cited by 52 (8 self)
Abstract: The class of species sampling mixture models is introduced as an extension of semiparametric models based on the Dirichlet process to models based on the general class of species sampling priors, or equivalently the class of all exchangeable urn distributions. Using Fubini calculus in conjunction with Pitman (1995, 1996), we derive characterizations of the posterior distribution in terms of a posterior partition distribution that extend the results of Lo (1984) for the Dirichlet process. These results provide a better understanding of models and have both theoretical and practical applications. To facilitate the use of our models we generalize the work in Brunner, Chan, James and Lo (2001) by extending their weighted Chinese restaurant (WCR) Monte Carlo procedure, an i.i.d. sequential importance sampling (SIS) procedure for approximating posterior mean functionals based on the Dirichlet process, to the case of approximation of mean functionals and additionally their posterior laws in species sampling mixture models. We also discuss collapsed Gibbs sampling, Pólya urn Gibbs sampling and a Pólya urn SIS scheme. Our framework allows for numerous applications, including multiplicative counting process models subject to weighted gamma processes, as well as nonparametric and semiparametric hierarchical models based on the Dirichlet process, its two-parameter extension, the Pitman–Yor process and finite dimensional Dirichlet priors. Key words and phrases: Dirichlet process, exchangeable partition, finite dimensional Dirichlet prior, two-parameter Poisson-Dirichlet process, prediction rule, random probability measure, species sampling sequence.
Construction Of Markovian Coalescents
Ann. Inst. Henri Poincaré, 1997
Cited by 44 (20 self)
Partition-valued and measure-valued coalescent Markov processes are constructed whose state describes the decomposition of a finite total mass $m$ into a finite or countably infinite number of masses with sum $m$, and whose evolution is determined by the following intuitive prescription: each pair of masses of magnitudes $x$ and $y$ runs the risk of a binary collision to form a single mass of magnitude $x + y$ at rate $\kappa(x, y)$, for some nonnegative, symmetric collision rate kernel $\kappa(x, y)$. Such processes with finitely many masses have been used to model polymerization, coagulation, condensation, and the evolution of galactic clusters by gravitational attraction. With a suitable choice of state space, and under appropriate restrictions on $\kappa$ and the initial distribution of mass, it is shown that such processes can be constructed as Feller or Feller-like processes. A number of further results are obtained for the additive coalescent with collision kernel $\kappa(x, y) = x + y$. This process, which arises fro...
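For finitely many masses, the prescription in this abstract amounts to a continuous-time Markov chain that can be simulated directly. The sketch below is a plain Gillespie-style simulation with a user-supplied kernel; the function name, the `t_max` cutoff, and the example starting configuration are assumptions for illustration, and nothing here addresses the infinite-mass constructions the paper is about.

```python
import random


def simulate_coalescent(masses, kernel, t_max, seed=0):
    """Run a finite coalescent until time t_max or until one mass remains.

    kernel(x, y) is a nonnegative symmetric collision rate for a pair of
    masses x and y. Returns the masses at the stopping time, largest first.
    """
    rng = random.Random(seed)
    masses = list(masses)
    t = 0.0
    while len(masses) > 1:
        pairs = [(i, j) for i in range(len(masses))
                 for j in range(i + 1, len(masses))]
        rates = [kernel(masses[i], masses[j]) for i, j in pairs]
        total = sum(rates)
        if total <= 0.0:
            break  # no pair can collide
        t += rng.expovariate(total)  # waiting time to the next collision
        if t > t_max:
            break
        # Pick the colliding pair with probability proportional to its rate.
        i, j = rng.choices(pairs, weights=rates)[0]
        merged = masses[i] + masses[j]
        masses = [m for idx, m in enumerate(masses) if idx not in (i, j)]
        masses.append(merged)
    return sorted(masses, reverse=True)


# Additive kernel, started from n equal clusters of mass 1/n.
n = 10
final = simulate_coalescent([1.0 / n] * n, lambda x, y: x + y, t_max=1.0)
```

Passing `lambda x, y: x * y` instead gives the multiplicative kernel; the total mass is conserved in either case because each step only replaces two masses by their sum.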
Kernel stick-breaking processes
2007
Cited by 39 (11 self)
Summary. This article proposes a class of kernel stick-breaking processes (KSBP) for uncountable collections of dependent random probability measures. The KSBP is constructed by first introducing an infinite sequence of random locations. Independent random probability measures and beta-distributed random weights are assigned to each location. Predictor-dependent random probability measures are then constructed by mixing over the locations, with stick-breaking probabilities expressed as a kernel multiplied by the beta weights. Some theoretical properties of the KSBP are described, including a covariate-dependent prediction rule. A retrospective MCMC algorithm is developed for posterior computation, and the methods are illustrated using a simulated example and an epidemiologic application.
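The construction described here, stick-breaking probabilities formed as a kernel times a beta weight, can be sketched for a scalar predictor. The code assumes a Gaussian-shaped kernel bounded by one, a finite number of locations, and hypothetical names (`ksbp_weights`, `bandwidth`); the choice of kernel and the truncation are illustrative assumptions, not taken from the abstract.

```python
import math
import random


def ksbp_weights(x, locations, betas, bandwidth=1.0):
    """Predictor-dependent stick-breaking weights at predictor value x.

    Each break is u_h = K(x, location_h) * V_h with V_h a beta weight, and
    the weight of component h is u_h * prod_{l<h} (1 - u_l).
    Returns the weights and the mass left over after the last location.
    """
    weights, remaining = [], 1.0
    for loc, v in zip(locations, betas):
        # Assumed kernel: exp(-(x - loc)^2 / (2 * bandwidth^2)), in (0, 1].
        u = v * math.exp(-((x - loc) ** 2) / (2.0 * bandwidth ** 2))
        weights.append(remaining * u)
        remaining *= 1.0 - u
    return weights, remaining


rng = random.Random(0)
locations = [rng.uniform(0.0, 1.0) for _ in range(15)]   # random locations
betas = [rng.betavariate(1.0, 1.0) for _ in range(15)]   # beta weights
w_near, left_near = ksbp_weights(0.30, locations, betas, bandwidth=0.2)
w_also, left_also = ksbp_weights(0.31, locations, betas, bandwidth=0.2)
```

Predictor values that are close, such as 0.30 and 0.31 above, see nearly the same kernel values and hence nearly the same weights, which is how dependence across predictors enters.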