Results 1–10 of 356
Hierarchical Dirichlet processes.
- Journal of the American Statistical Association, 2006
"... We consider problems involving groups of data where each observation within a group is a draw from a mixture model and where it is desirable to share mixture components between groups. We assume that the number of mixture components is unknown a priori and is to be inferred from the data. In this s ..."
Abstract
-
Cited by 942 (78 self)
- Add to MetaCart
(Show Context)
We consider problems involving groups of data where each observation within a group is a draw from a mixture model and where it is desirable to share mixture components between groups. We assume that the number of mixture components is unknown a priori and is to be inferred from the data. In this setting it is natural to consider sets of Dirichlet processes, one for each group, where the well-known clustering property of the Dirichlet process provides a nonparametric prior for the number of mixture components within each group. Given our desire to tie the mixture models in the various groups, we consider a hierarchical model, specifically one in which the base measure for the child Dirichlet processes is itself distributed according to a Dirichlet process. Such a base measure being discrete, the child Dirichlet processes necessarily share atoms. Thus, as desired, the mixture models in the different groups necessarily share mixture components. We discuss representations of hierarchical Dirichlet processes in terms of a stick-breaking process, and a generalization of the Chinese restaurant process that we refer to as the "Chinese restaurant franchise." We present Markov chain Monte Carlo algorithms for posterior inference in hierarchical Dirichlet process mixtures and describe applications to problems in information retrieval and text modeling.
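To make the "Chinese restaurant franchise" metaphor concrete, here is a minimal forward-sampling sketch in Python (a hypothetical illustration, not the paper's posterior sampler): each group runs its own Chinese restaurant process, and every new table draws its dish from a shared franchise-level process, so dishes (mixture components) recur across groups. The function name and parameter values are invented for the example.

```python
import random

def crf_sample(group_sizes, alpha0=1.0, gamma=1.0, seed=0):
    """Forward-sample dish assignments from a Chinese restaurant
    franchise: one restaurant (CRP, concentration alpha0) per group,
    all drawing dishes from a shared franchise-level CRP
    (concentration gamma), so mixture components are shared."""
    rng = random.Random(seed)
    menu = []                         # tables serving each dish, franchise-wide
    out = []
    for size in group_sizes:
        tables = []                   # customers at each table in this group
        dish = []                     # dish served at each table
        labels = []
        for _ in range(size):
            # existing tables proportional to occupancy, new table to alpha0
            w = tables + [alpha0]
            t = rng.choices(range(len(w)), weights=w)[0]
            if t == len(tables):      # new table: draw its dish from the menu
                dw = menu + [gamma]   # existing dishes by table count, new by gamma
                k = rng.choices(range(len(dw)), weights=dw)[0]
                if k == len(menu):
                    menu.append(0)
                menu[k] += 1
                tables.append(0)
                dish.append(k)
            tables[t] += 1
            labels.append(dish[t])
        out.append(labels)
    return out

# Dish (component) indices recur across both groups, as the HDP requires.
print(crf_sample([15, 15], alpha0=2.0, gamma=1.5))
```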
Gibbs Sampling Methods for Stick-Breaking Priors
"... ... In this paper we present two general types of Gibbs samplers that can be used to fit posteriors of Bayesian hierarchical models based on stick-breaking priors. The first type of Gibbs sampler, referred to as a Polya urn Gibbs sampler, is a generalized version of a widely used Gibbs sampling meth ..."
Abstract
-
Cited by 388 (19 self)
- Add to MetaCart
(Show Context)
In this paper we present two general types of Gibbs samplers that can be used to fit posteriors of Bayesian hierarchical models based on stick-breaking priors. The first type of Gibbs sampler, referred to as a Pólya urn Gibbs sampler, is a generalized version of a widely used Gibbs sampling method currently employed for Dirichlet process computing. This method applies to stick-breaking priors with a known Pólya urn characterization, that is, priors with an explicit and simple prediction rule. Our second method, the blocked Gibbs sampler, is based on an entirely different approach that works by directly sampling values from the posterior of the random measure. The blocked Gibbs sampler can be viewed as a more general approach, as it works without requiring an explicit prediction rule. We find that the blocked Gibbs sampler avoids some of the limitations seen with the Pólya urn approach and should be simpler for non-experts to use.
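For intuition about the representation the blocked Gibbs sampler works with, the sketch below draws the weight vector of a truncated stick-breaking prior directly; the Beta(1, alpha) sticks are the Dirichlet-process special case, and the truncation level and all names are assumptions made for illustration, not code from the paper.

```python
import numpy as np

def truncated_stick_breaking(alpha, N, rng):
    """Weight vector of an N-level truncated stick-breaking prior:
    V_k ~ Beta(1, alpha), p_k = V_k * prod_{l<k} (1 - V_l), and V_N
    is set to 1 so the weights sum to one. Beta(1, alpha) sticks are
    the Dirichlet-process special case. The blocked Gibbs sampler
    updates finite vectors like this directly rather than
    marginalizing the random measure out."""
    V = rng.beta(1.0, alpha, size=N)
    V[-1] = 1.0
    return V * np.concatenate(([1.0], np.cumprod(1.0 - V[:-1])))

rng = np.random.default_rng(0)
p = truncated_stick_breaking(alpha=2.0, N=25, rng=rng)
print(p.sum(), p[:5])          # sums to 1; early sticks carry most mass
```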
Coalescents With Multiple Collisions
- Ann. Probab., 1999
"... For each finite measure on [0 ..."
A hierarchical Bayesian language model based on Pitman–Yor processes
- In Coling/ACL, 2006
"... We propose a new hierarchical Bayesian n-gram model of natural languages. Our model makes use of a generalization of the commonly used Dirichlet distributions called Pitman-Yor processes which produce power-law distributions more closely resembling those in natural languages. We show that an approxi ..."
Abstract
-
Cited by 148 (10 self)
- Add to MetaCart
(Show Context)
We propose a new hierarchical Bayesian n-gram model of natural languages. Our model makes use of a generalization of the commonly used Dirichlet distributions called Pitman–Yor processes, which produce power-law distributions more closely resembling those in natural languages. We show that an approximation to the hierarchical Pitman–Yor language model recovers the exact formulation of interpolated Kneser–Ney, one of the best smoothing methods for n-gram language models. Experiments verify that our model gives cross-entropy results superior to interpolated Kneser–Ney and comparable to modified Kneser–Ney.
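The recovery of interpolated Kneser–Ney can be made tangible with a toy bigram estimator: under the approximation the abstract mentions (at most one table per dish in each Pitman–Yor restaurant, strength parameter zero), the predictive probabilities take the familiar interpolated Kneser–Ney form sketched below. The discount value and all names are illustrative.

```python
from collections import Counter, defaultdict

def kneser_ney_bigram(tokens, discount=0.75):
    """Interpolated Kneser-Ney bigram estimator, the form the abstract
    says an approximated hierarchical Pitman-Yor LM recovers (one
    table per dish, strength 0, discount = `discount`). Toy sketch."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    ctx_totals = Counter(tokens[:-1])
    ctx_types = defaultdict(set)        # distinct continuations of each context
    hist_types = defaultdict(set)       # distinct histories of each word
    for u, w in bigrams:
        ctx_types[u].add(w)
        hist_types[w].add(u)
    n_types = len(bigrams)              # number of distinct bigram types

    def p_cont(w):                      # lower-order continuation probability
        return len(hist_types[w]) / n_types

    def p(w, u):
        c_u = ctx_totals[u]
        if c_u == 0:                    # unseen context: back off entirely
            return p_cont(w)
        lam = discount * len(ctx_types[u]) / c_u   # interpolation weight
        return max(bigrams[(u, w)] - discount, 0) / c_u + lam * p_cont(w)

    return p

p = kneser_ney_bigram("the cat sat on the mat the cat ran".split())
print(p("cat", "the"), p("mat", "the"))
```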
Interpolating between types and tokens by estimating power-law generators
- In Advances in Neural Information Processing Systems 18, 2006
"... Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that generically produce power-laws, augmenting standard generative ..."
Abstract
-
Cited by 123 (19 self)
- Add to MetaCart
Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that generically produce power-laws, augmenting standard generative models with an adaptor that produces the appropriate pattern of token frequencies. We show that taking a particular stochastic process, the Pitman–Yor process, as an adaptor justifies the appearance of type frequencies in formal analyses of natural language, and improves the performance of a model for unsupervised learning of morphology.
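The power-law pattern the adaptor produces is easy to observe by simulating the two-parameter (Pitman–Yor) Chinese restaurant process; a small sketch, with discount and strength values chosen arbitrarily:

```python
import random
from collections import Counter

def pitman_yor_crp(n, d=0.9, theta=1.0, seed=0):
    """Simulate table sizes under the two-parameter Chinese restaurant
    process: join table k with probability proportional to (n_k - d),
    start a new table with probability proportional to (theta + d*K).
    The discount d controls the power-law tail of the table sizes."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(n):
        w = [s - d for s in sizes] + [theta + d * len(sizes)]
        k = rng.choices(range(len(w)), weights=w)[0]
        if k == len(sizes):
            sizes.append(0)
        sizes[k] += 1
    return sizes

sizes = pitman_yor_crp(20000)
print(sorted(Counter(sizes).items())[:10])   # frequency of table sizes
```

With discount d close to 1 the frequency-of-frequencies table is dominated by singletons, the heavy-tailed behaviour the abstract attributes to natural-language token frequencies.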
Adaptor grammars: a framework for specifying compositional nonparametric Bayesian models
- In Advances in Neural Information Processing Systems 19, 2007
"... This paper introduces adaptor grammars, a class of probabilistic models of lan-guage that generalize probabilistic context-free grammars (PCFGs). Adaptor grammars augment the probabilistic rules of PCFGs with “adaptors ” that can in-duce dependencies among successive uses. With a particular choice o ..."
Abstract
-
Cited by 117 (19 self)
- Add to MetaCart
(Show Context)
This paper introduces adaptor grammars, a class of probabilistic models of language that generalize probabilistic context-free grammars (PCFGs). Adaptor grammars augment the probabilistic rules of PCFGs with “adaptors” that can induce dependencies among successive uses. With a particular choice of adaptor, based on the Pitman–Yor process, nonparametric Bayesian models of language using Dirichlet processes and hierarchical Dirichlet processes can be written as simple grammars. We present a general-purpose inference algorithm for adaptor grammars, making it easy to define and use such models, and illustrate how several existing nonparametric Bayesian models can be expressed within this framework.
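A toy sketch of the caching behaviour that adaptors add to a PCFG, assuming a made-up grammar and modelling the adaptor as a Pitman–Yor CRP over previously generated yields (the paper's general-purpose inference algorithm is not reproduced here):

```python
import random

# A made-up grammar for illustration; 'Word' is the adapted nonterminal.
GRAMMAR = {
    "Word": [(["Stem", "Suffix"], 1.0)],
    "Stem": [([c], 0.25) for c in "abcd"],
    "Suffix": [(["ing"], 0.5), (["ed"], 0.5)],
}
ADAPTED = {"Word"}

def expand(sym, cache, rng, d=0.5, theta=1.0):
    """Generate a yield for `sym`. Adapted nonterminals reuse cached
    yields with Pitman-Yor CRP(d, theta) probabilities, which is the
    dependence among successive uses that adaptors introduce."""
    if sym not in GRAMMAR:                       # terminal symbol
        return sym
    if sym in ADAPTED:
        entries = cache.setdefault(sym, [])      # [yield, count] pairs
        w = [c - d for _, c in entries] + [theta + d * len(entries)]
        k = rng.choices(range(len(w)), weights=w)[0]
        if k < len(entries):                     # reuse a cached subtree
            entries[k][1] += 1
            return entries[k][0]
    rules = GRAMMAR[sym]
    rhs = rng.choices([r for r, _ in rules], weights=[q for _, q in rules])[0]
    out = "".join(expand(s, cache, rng, d, theta) for s in rhs)
    if sym in ADAPTED:
        cache[sym].append([out, 1])
    return out

rng, cache = random.Random(0), {}
print([expand("Word", cache, rng) for _ in range(12)])   # repeats emerge
```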
Brownian Excursions, Critical Random Graphs and the Multiplicative Coalescent
- 1996
"... Let (B t (s); 0 s ! 1) be reflecting inhomogeneous Brownian motion with drift t \Gamma s at time s, started with B t (0) = 0. Consider the random graph G(n; n \Gamma1 +tn \Gamma4=3 ), whose largest components have size of order n 2=3 . Normalizing by n \Gamma2=3 , the asymptotic joint d ..."
Abstract
-
Cited by 106 (8 self)
- Add to MetaCart
Let $(B^t(s),\ 0 \le s < \infty)$ be reflecting inhomogeneous Brownian motion with drift $t - s$ at time $s$, started with $B^t(0) = 0$. Consider the random graph $G(n, n^{-1} + tn^{-4/3})$, whose largest components have size of order $n^{2/3}$. Normalizing by $n^{-2/3}$, the asymptotic joint distribution of component sizes is the same as the joint distribution of excursion lengths of $B^t$ (Corollary 2). The dynamics of merging of components as $t$ increases are abstracted to define the multiplicative coalescent process. The states of this process are vectors $x$ of nonnegative real cluster sizes $(x_i)$, and clusters with sizes $x_i$ and $x_j$ merge at rate $x_i x_j$. The multiplicative coalescent is shown to be a Feller process on $\ell^2$. The random graph limit specifies the standard multiplicative coalescent, which starts from infinitesimally small clusters at time $-\infty$; the existence of such a process is not obvious. AMS 1991 subject classifications: 60C05, 60J50.
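A finite Gillespie-style simulation makes the merge dynamics concrete. The helper below is a generic pairwise coalescent with a pluggable rate kernel; rate(a, b) = a * b gives a finite multiplicative coalescent. It is only a sketch: the standard multiplicative coalescent starts from infinitesimally small clusters at time $-\infty$, which no finite simulation captures.

```python
import random

def coalesce(sizes, t_end, rate, seed=0):
    """Gillespie simulation of a finite coalescent: each pair of
    clusters merges at rate `rate(x_i, x_j)`. With rate(a, b) = a * b
    this is a finite multiplicative coalescent."""
    rng = random.Random(seed)
    xs, t = list(sizes), 0.0
    while len(xs) > 1:
        pairs = [(i, j) for i in range(len(xs)) for j in range(i + 1, len(xs))]
        rates = [rate(xs[i], xs[j]) for i, j in pairs]
        t += rng.expovariate(sum(rates))     # waiting time to next merger
        if t > t_end:
            break
        i, j = rng.choices(pairs, weights=rates)[0]
        xs[i] += xs[j]                       # merge clusters i and j
        del xs[j]
    return sorted(xs, reverse=True)

print(coalesce([1.0] * 30, t_end=0.05, rate=lambda a, b: a * b))
```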
The Standard Additive Coalescent
- 1997
"... Regard an element of the set \Delta := f(x 1 ; x 2 ; : : :) : x 1 x 2 : : : 0; X i x i = 1g as a fragmentation of unit mass into clusters of masses x i . The additive coalescent of Evans and Pitman (1997) is the \Delta-valued Markov process in which pairs of clusters of masses fx i ; x j g mer ..."
Abstract
-
Cited by 87 (21 self)
- Add to MetaCart
Regard an element of the set $\Delta := \{(x_1, x_2, \ldots) : x_1 \ge x_2 \ge \cdots \ge 0,\ \sum_i x_i = 1\}$ as a fragmentation of unit mass into clusters of masses $x_i$. The additive coalescent of Evans and Pitman (1997) is the $\Delta$-valued Markov process in which pairs of clusters of masses $\{x_i, x_j\}$ merge into a cluster of mass $x_i + x_j$ at rate $x_i + x_j$. They showed that a version $(X^\infty(t),\ -\infty < t < \infty)$ of this process arises as an $n \to \infty$ weak limit of the process started at time $-\tfrac{1}{2}\log n$ with $n$ clusters of mass $1/n$. We show this standard additive coalescent may be constructed from the continuum random tree of Aldous (1991, 1993) by Poisson splitting along the skeleton of the tree. We describe the distribution of $X^\infty(t)$ on $\Delta$ at a fixed time $t$. We show that the size of the cluster containing a given atom, as a process in $t$, has a simple representation in terms of the stable subordinator of index $1/2$. As $t \to -\infty$, we establish a Gaussian limit for (centered and normalized) ...
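The same illustrative `coalesce` helper sketched above covers the additive case by swapping the kernel to the sum of the masses:

```python
# Additive kernel: clusters of masses a and b merge at rate a + b,
# as in the Evans-Pitman additive coalescent described above.
print(coalesce([1.0 / 30] * 30, t_end=1.0, rate=lambda a, b: a + b))
```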
Generalized weighted Chinese restaurant processes for species sampling mixture models
- Statistica Sinica, 2003
"... The class of species sampling mixture models is introduced as an extension of semiparametric models based on the Dirichlet process to models based on the general class of species sampling priors, or equivalently the class of all exchangeable urn distributions. Using Fubini calculus in conjunction ..."
Abstract
-
Cited by 86 (11 self)
- Add to MetaCart
The class of species sampling mixture models is introduced as an extension of semiparametric models based on the Dirichlet process to models based on the general class of species sampling priors, or equivalently the class of all exchangeable urn distributions. Using Fubini calculus in conjunction with Pitman (1995, 1996), we derive characterizations of the posterior distribution in terms of a posterior partition distribution that extend the results of Lo (1984) for the Dirichlet process. These results provide a better understanding of the models and have both theoretical and practical applications. To facilitate the use of our models we generalize the work in Brunner, Chan, James and Lo (2001) by extending their weighted Chinese restaurant (WCR) Monte Carlo procedure, an i.i.d. sequential importance sampling (SIS) procedure for approximating posterior mean functionals based on the Dirichlet process, to the case of approximation of mean functionals and additionally their posterior laws in species sampling mixture models. We also discuss collapsed Gibbs sampling, Pólya urn Gibbs sampling and a Pólya urn SIS scheme. Our framework allows for numerous applications, including multiplicative counting process models subject to weighted gamma processes, as well as nonparametric and semiparametric hierarchical models based on the Dirichlet process, its two-parameter extension, the Pitman-Yor process and finite dimensional Dirichlet priors.
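The "explicit and simple prediction rule" that species sampling priors are built from can be sketched generically: a sequential sampler parameterized by an urn rule, with the two-parameter Pitman–Yor urn (the Dirichlet-process urn at discount 0) as one member of the class. All names and parameter values are illustrative assumptions.

```python
import random

def species_sample(n, urn, seed=0):
    """Sequentially sample species labels from an exchangeable urn:
    `urn(counts)` returns unnormalized weights for the existing
    species followed by one weight for a brand-new species."""
    rng = random.Random(seed)
    counts, labels = [], []
    for _ in range(n):
        w = urn(counts)
        k = rng.choices(range(len(w)), weights=w)[0]
        if k == len(counts):
            counts.append(0)
        counts[k] += 1
        labels.append(k)
    return labels

def pitman_yor_urn(d=0.3, theta=1.0):
    """Two-parameter urn; d = 0 gives the Blackwell-MacQueen
    (Dirichlet process) urn."""
    return lambda counts: [c - d for c in counts] + [theta + d * len(counts)]

print(species_sample(15, pitman_yor_urn()))
```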
Kernel stick-breaking processes
- 2007
"... Summary. This article proposes a class of kernel stick-breaking processes (KSBP) for un-countable collections of dependent random probability measures. The KSBP is constructed by first introducing an infinite sequence of random locations. Independent random probabil-ity measures and beta-distributed ..."
Abstract
-
Cited by 74 (17 self)
- Add to MetaCart
Summary. This article proposes a class of kernel stick-breaking processes (KSBP) for uncountable collections of dependent random probability measures. The KSBP is constructed by first introducing an infinite sequence of random locations. Independent random probability measures and beta-distributed random weights are assigned to each location. Predictor-dependent random probability measures are then constructed by mixing over the locations, with stick-breaking probabilities expressed as a kernel multiplied by the beta weights. Some theoretical properties of the KSBP are described, including a covariate-dependent prediction rule. A retrospective MCMC algorithm is developed for posterior computation, and the methods are illustrated using a simulated example and an epidemiologic application.
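A minimal sketch of the covariate-dependent weights this construction yields, assuming a Gaussian kernel and made-up locations and sticks (the paper's retrospective MCMC sampler is not shown):

```python
import numpy as np

def ksbp_weights(x, locations, V, bandwidth=1.0):
    """Predictor-dependent stick-breaking weights in the spirit of the
    KSBP: pi_h(x) = U_h(x) * prod_{l<h} (1 - U_l(x)) with
    U_h(x) = K(x, Gamma_h) * V_h, here using a Gaussian kernel K.
    Leftover mass would belong to the truncated tail."""
    K = np.exp(-0.5 * ((x - locations) / bandwidth) ** 2)
    U = K * V
    return U * np.concatenate(([1.0], np.cumprod(1.0 - U[:-1])))

rng = np.random.default_rng(0)
locations = rng.uniform(0.0, 10.0, 10)   # random locations Gamma_h (made up)
V = rng.beta(1.0, 1.0, 10)               # beta-distributed stick weights
for x in (2.0, 8.0):
    print(x, ksbp_weights(x, locations, V).round(3))
```

Evaluating the weights at two different predictor values shows mass shifting toward sticks whose locations lie near the predictor, which is the dependence mechanism the abstract describes.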