The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator (1997)

by J. Pitman, M. Yor
Venue: Ann. Probab.
Results 1 - 10 of 356

Hierarchical Dirichlet processes.

by Yee Whye Teh, Michael I. Jordan, Matthew J. Beal, David M. Blei - Journal of the American Statistical Association, 2006
"... We consider problems involving groups of data where each observation within a group is a draw from a mixture model and where it is desirable to share mixture components between groups. We assume that the number of mixture components is unknown a priori and is to be inferred from the data. In this s ..."
Abstract - Cited by 942 (78 self) - Add to MetaCart
We consider problems involving groups of data where each observation within a group is a draw from a mixture model and where it is desirable to share mixture components between groups. We assume that the number of mixture components is unknown a priori and is to be inferred from the data. In this setting it is natural to consider sets of Dirichlet processes, one for each group, where the well-known clustering property of the Dirichlet process provides a nonparametric prior for the number of mixture components within each group. Given our desire to tie the mixture models in the various groups, we consider a hierarchical model, specifically one in which the base measure for the child Dirichlet processes is itself distributed according to a Dirichlet process. Such a base measure being discrete, the child Dirichlet processes necessarily share atoms. Thus, as desired, the mixture models in the different groups necessarily share mixture components. We discuss representations of hierarchical Dirichlet processes in terms of a stick-breaking process, and a generalization of the Chinese restaurant process that we refer to as the "Chinese restaurant franchise." We present Markov chain Monte Carlo algorithms for posterior inference in hierarchical Dirichlet process mixtures and describe applications to problems in information retrieval and text modeling.
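To make the "Chinese restaurant franchise" concrete, here is a minimal sketch of the generative metaphor, not the authors' inference code: each group is a restaurant whose customers sit at tables under a CRP with concentration alpha0, and each new table orders a dish from a franchise-wide CRP with concentration gamma over a base draw. All names and defaults are illustrative.

```python
import random

def crf_seat(customers_per_group, alpha0=1.0, gamma=1.0,
             base=lambda: random.randrange(10**6)):
    """Chinese restaurant franchise: group-level CRPs whose tables
    share dishes drawn from a single franchise-level CRP."""
    dish_counts = {}          # dish -> number of tables serving it (all groups)
    assignments = []
    for n in customers_per_group:
        tables = []           # per-group: list of [num_customers, dish]
        group = []
        for _ in range(n):
            total = sum(t[0] for t in tables)
            r = random.uniform(0, total + alpha0)
            acc = 0.0
            for t in tables:
                acc += t[0]
                if r < acc:           # join an existing table, share its dish
                    t[0] += 1
                    group.append(t[1])
                    break
            else:                     # open a new table: pick its dish from
                m = sum(dish_counts.values())   # the franchise-level CRP
                r2 = random.uniform(0, m + gamma)
                acc2, dish = 0.0, None
                for d, c in dish_counts.items():
                    acc2 += c
                    if r2 < acc2:
                        dish = d      # reuse a dish already on the menu
                        break
                if dish is None:
                    dish = base()     # brand-new dish from the base measure
                dish_counts[dish] = dish_counts.get(dish, 0) + 1
                tables.append([1, dish])
                group.append(dish)
        assignments.append(group)
    return assignments  # groups share dishes, i.e. mixture components
```

Running crf_seat([50, 50]) typically returns two groups whose label sets overlap, which is exactly the atom-sharing property the hierarchical construction is designed to produce.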

Citation Context

...ns and Extensions. The DP is the canonical distribution over probability measures, and a wide range of generalizations have been proposed in the literature. First and foremost is the Pitman-Yor process [13, 11], which has recently seen successful applications modeling data exhibiting power-law properties [14, 15]. The Pitman-Yor process includes a third parameter d ∈ [0, 1), with d = 0 reducing to the DP. T...
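The role of the third parameter is easiest to see in the stick-breaking construction of the Pitman-Yor process: V_k ~ Beta(1 - d, theta + k*d) and w_k = V_k ∏_{l<k}(1 - V_l). A short sketch, with an arbitrary truncation level K (illustrative, not from the cited papers):

```python
import numpy as np

def py_stick_weights(d, theta, K=1000, rng=np.random.default_rng(0)):
    """Truncated Pitman-Yor stick-breaking: V_k ~ Beta(1-d, theta+k*d),
    w_k = V_k * prod_{l<k} (1-V_l).  d=0 recovers the Dirichlet process."""
    ks = np.arange(1, K + 1)
    v = rng.beta(1.0 - d, theta + ks * d)
    return v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))

# d > 0 gives heavier (power-law) tails than the DP's geometric decay
print(np.sort(py_stick_weights(0.5, 1.0))[::-1][:5])
print(np.sort(py_stick_weights(0.0, 1.0))[::-1][:5])
```

With d > 0 the sorted weights decay polynomially rather than geometrically, which is the power-law behavior the context above cites.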

Gibbs Sampling Methods for Stick-Breaking Priors

by Hemant Ishwaran, Lancelot F. James - Journal of the American Statistical Association, 2001
"... ... In this paper we present two general types of Gibbs samplers that can be used to fit posteriors of Bayesian hierarchical models based on stick-breaking priors. The first type of Gibbs sampler, referred to as a Polya urn Gibbs sampler, is a generalized version of a widely used Gibbs sampling meth ..."
Abstract - Cited by 388 (19 self) - Add to MetaCart
... In this paper we present two general types of Gibbs samplers that can be used to fit posteriors of Bayesian hierarchical models based on stick-breaking priors. The first type of Gibbs sampler, referred to as a Pólya urn Gibbs sampler, is a generalized version of a widely used Gibbs sampling method currently employed for Dirichlet process computing. This method applies to stick-breaking priors with a known Pólya urn characterization; that is, priors with an explicit and simple prediction rule. Our second method, the blocked Gibbs sampler, is based on an entirely different approach that works by directly sampling values from the posterior of the random measure. The blocked Gibbs sampler can be viewed as a more general approach as it works without requiring an explicit prediction rule. We find that the blocked Gibbs avoids some of the limitations seen with the Pólya urn approach and should be simpler for non-experts to use.
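The "explicit and simple prediction rule" a Pólya urn Gibbs sampler exploits is, in the Dirichlet process case, the Blackwell-MacQueen urn. A hedged sketch of that rule follows (forward sampling only; a full Gibbs sampler would resample each latent value from this rule given all the others and the data):

```python
import random

def polya_urn_draws(n, alpha, base_draw):
    """Blackwell-MacQueen Polya urn for a DP(alpha, H): the (i+1)-th draw
    repeats an earlier value w.p. i/(i+alpha), else samples fresh from H."""
    draws = []
    for i in range(n):
        if random.random() < i / (i + alpha):
            draws.append(random.choice(draws))  # reuse an old atom
        else:
            draws.append(base_draw())           # new atom from base H
    return draws

sample = polya_urn_draws(100, alpha=2.0, base_draw=random.random)
print(len(set(sample)), "distinct atoms among 100 draws")
```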

Citation Context

...llection of seemingly unrelated measures scattered throughout the literature. These include (a) the Ferguson Dirichlet process (Ferguson 1973, 1974), (b) the two-parameter Poisson–Dirichlet process (Pitman and Yor 1997), (c) Dirichlet-multinomial processes (Muliere and Secchi 1995), m-spike models (Liu 1996), finite dimensional Dirichlet priors (Ishwaran and Zarepour 2000b,c) and (d) beta two-parameter processes (Is...

Coalescents With Multiple Collisions

by Jim Pitman - Ann. Probab., 1999
"... For each finite measure on [0 ..."
Abstract - Cited by 183 (11 self) - Add to MetaCart
For each finite measure on [0, 1] ...

Citation Context

...rtition of N characterized by the EPF (15), or by frequencies of the form (12)-(13), an (α, θ) partition. It was shown in [31] how to construct an (α, θ) partition by a simple urn scheme. Following [37], define the Poisson-Dirichlet distribution with parameters (α, θ), abbreviated PD(α, θ), to be the distribution of ranked frequencies of an (α, θ) partition. That is, PD(α, θ) is the distribution...
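A minimal version of such an urn scheme (my sketch, following the standard seating rule rather than any code in [31] or [37], and assuming θ > 0): with blocks of sizes n_1, ..., n_k after n draws, the next draw joins block i with probability (n_i − α)/(n + θ) and opens a new block with probability (θ + kα)/(n + θ). The sorted block frequencies then approximate a PD(α, θ) sample for large n.

```python
import random

def two_param_crp(n, alpha, theta):
    """(alpha, theta) urn: given blocks of sizes n_1..n_k after m customers,
    the next customer joins block i w.p. (n_i - alpha)/(m + theta) and
    starts a new block w.p. (theta + k*alpha)/(m + theta)."""
    sizes = []
    for m in range(n):
        k = len(sizes)
        r = random.uniform(0, m + theta)
        acc = theta + k * alpha          # mass reserved for a new block
        if r < acc:
            sizes.append(1)
            continue
        for i in range(k):
            acc += sizes[i] - alpha
            if r < acc:
                sizes[i] += 1
                break
    return sorted(sizes, reverse=True)

sizes = two_param_crp(10000, alpha=0.5, theta=1.0)
print([s / 10000 for s in sizes[:5]])   # ranked frequencies ~ PD(0.5, 1.0)
```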

A hierarchical Bayesian language model based on Pitman–Yor processes

by Yee Whye Teh - In Coling/ACL, 2006
"... We propose a new hierarchical Bayesian n-gram model of natural languages. Our model makes use of a generalization of the commonly used Dirichlet distributions called Pitman-Yor processes which produce power-law distributions more closely resembling those in natural languages. We show that an approxi ..."
Abstract - Cited by 148 (10 self) - Add to MetaCart
We propose a new hierarchical Bayesian n-gram model of natural languages. Our model makes use of a generalization of the commonly used Dirichlet distributions called Pitman-Yor processes, which produce power-law distributions more closely resembling those in natural languages. We show that an approximation to the hierarchical Pitman-Yor language model recovers the exact formulation of interpolated Kneser-Ney, one of the best smoothing methods for n-gram language models. Experiments verify that our model gives cross entropy results superior to interpolated Kneser-Ney and comparable to modified Kneser-Ney.
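The connection to Kneser-Ney is visible in the Pitman-Yor predictive rule when each word type is restricted to a single table, which is the kind of approximation the abstract refers to: seen counts are discounted by d, and the reclaimed mass theta + d*T interpolates with the base distribution. A sketch for a single (unigram-like) restaurant, with illustrative names and a toy base distribution:

```python
from collections import Counter

def py_predictive(counts, d, theta, p_base):
    """Pitman-Yor predictive with one table per word type: each seen
    count is discounted by d, and the leftover mass theta + d*T is
    backed off to the base distribution p_base."""
    c = sum(counts.values())
    T = len(counts)                      # number of distinct types
    def prob(w):
        disc = max(counts.get(w, 0) - d, 0.0)
        return (disc + (theta + d * T) * p_base(w)) / (theta + c)
    return prob

counts = Counter("the cat sat on the mat the end".split())
p = py_predictive(counts, d=0.75, theta=1.0, p_base=lambda w: 1.0 / 50000)
print(p("the"), p("unseen"))
```

This matches the absolute-discounting-plus-interpolation form of interpolated Kneser-Ney; the hierarchical model stacks one such restaurant per n-gram context.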

Citation Context

...en variable is distributed according to a Pitman-Yor process, a nonparametric generalization of the Dirichlet distribution that is widely studied in the statistics and probability theory communities (Pitman and Yor, 1997; Ishwaran and James, 2001; Pitman, 2002). ...

Interpolating between types and tokens by estimating power-law generators

by Sharon Goldwater, Thomas L. Griffiths, Mark Johnson - In Advances in Neural Information Processing Systems 18, 2006
"... Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that generically produce power-laws, augmenting standard generative ..."
Abstract - Cited by 123 (19 self) - Add to MetaCart
Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that generically produce power-laws, augmenting standard generative models with an adaptor that produces the appropriate pattern of token frequencies. We show that taking a particular stochastic process – the Pitman-Yor process – as an adaptor justifies the appearance of type frequencies in formal analyses of natural language, and improves the performance of a model for unsupervised learning of morphology.
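A quick way to see the claimed power-law behavior (a simulation sketch, not the paper's experiments): under a Pitman-Yor adaptor with parameters (a, b), token m+1 is a new type with probability (b + ka)/(m + b), where k is the number of types so far, so the type count grows roughly like n^a for a > 0 and only logarithmically for a = 0.

```python
import random

def num_types(n_tokens, a, b):
    """Count types generated by a Pitman-Yor adaptor (two-parameter CRP):
    token m+1 is a new type w.p. (b + k*a)/(m + b), where k = types so far."""
    k = 0
    for m in range(n_tokens):
        if random.uniform(0, m + b) < b + k * a:
            k += 1
    return k

for n in (10**3, 10**4, 10**5):
    print(n, num_types(n, a=0.8, b=1.0), num_types(n, a=0.0, b=1.0))
# with a = 0.8 the type count scales roughly like n^0.8;
# with a = 0 it grows only logarithmically, as for a Dirichlet process
```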

Adaptor grammars: a framework for specifying compositional nonparametric Bayesian models

by Mark Johnson, Thomas L. Griffiths - In Advances in Neural Information Processing Systems 19, 2007
"... This paper introduces adaptor grammars, a class of probabilistic models of lan-guage that generalize probabilistic context-free grammars (PCFGs). Adaptor grammars augment the probabilistic rules of PCFGs with “adaptors ” that can in-duce dependencies among successive uses. With a particular choice o ..."
Abstract - Cited by 117 (19 self) - Add to MetaCart
This paper introduces adaptor grammars, a class of probabilistic models of language that generalize probabilistic context-free grammars (PCFGs). Adaptor grammars augment the probabilistic rules of PCFGs with "adaptors" that can induce dependencies among successive uses. With a particular choice of adaptor, based on the Pitman-Yor process, nonparametric Bayesian models of language using Dirichlet processes and hierarchical Dirichlet processes can be written as simple grammars. We present a general-purpose inference algorithm for adaptor grammars, making it easy to define and use such models, and illustrate how several existing nonparametric Bayesian models can be expressed within this framework.
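One concrete reading of an "adaptor" (a hypothetical sketch, not the paper's implementation): a wrapper that caches the outputs of a base generator and reuses them with two-parameter CRP probabilities, which induces the dependencies among successive uses mentioned above.

```python
import random

class PYAdaptor:
    """Cache a base generator's outputs and reuse them with two-parameter
    CRP probabilities: a cached item with count c_i is returned with
    probability (c_i - a)/(n + b); otherwise the base is called afresh."""
    def __init__(self, base, a=0.5, b=1.0):
        self.base, self.a, self.b = base, a, b
        self.items, self.counts, self.n = [], [], 0

    def __call__(self):
        r = random.uniform(0, self.n + self.b)
        acc = self.b + len(self.items) * self.a  # mass reserved for fresh draws
        if r >= acc:
            for i, c in enumerate(self.counts):  # reuse region
                acc += c - self.a
                if r < acc:
                    self.counts[i] += 1
                    self.n += 1
                    return self.items[i]
        item = self.base()                       # fresh draw from the base
        self.items.append(item)
        self.counts.append(1)
        self.n += 1
        return item

# an adapted generator repeats earlier outputs, unlike the i.i.d. base
gen = PYAdaptor(lambda: random.choice(["NP VP", "NP", "VP PP"]))
print([gen() for _ in range(10)])
```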

Citation Context

...acterized by a simple grammar. Adaptor grammars provide a simple framework for defining nonparametric Bayesian models of language. With a particular choice of adaptor, based on the Pitman-Yor process [1, 2, 3], simple context-free grammars specify distributions commonly used in nonparametric Bayesian statistics, such as Dirichlet processes [4] and hierarchical Dirichlet processes [5]. As a consequence, man...

Brownian Excursions, Critical Random Graphs and the Multiplicative Coalescent

by David J. Aldous, 1996
"... Let (B t (s); 0 s ! 1) be reflecting inhomogeneous Brownian motion with drift t \Gamma s at time s, started with B t (0) = 0. Consider the random graph G(n; n \Gamma1 +tn \Gamma4=3 ), whose largest components have size of order n 2=3 . Normalizing by n \Gamma2=3 , the asymptotic joint d ..."
Abstract - Cited by 106 (8 self) - Add to MetaCart
Let $(B^t(s), 0 \le s < \infty)$ be reflecting inhomogeneous Brownian motion with drift $t - s$ at time $s$, started with $B^t(0) = 0$. Consider the random graph $G(n, n^{-1} + t n^{-4/3})$, whose largest components have size of order $n^{2/3}$. Normalizing by $n^{-2/3}$, the asymptotic joint distribution of component sizes is the same as the joint distribution of excursion lengths of $B^t$ (Corollary 2). The dynamics of merging of components as $t$ increases are abstracted to define the multiplicative coalescent process. The states of this process are vectors $x$ of nonnegative real cluster sizes $(x_i)$, and clusters with sizes $x_i$ and $x_j$ merge at rate $x_i x_j$. The multiplicative coalescent is shown to be a Feller process on $\ell^2$. The random graph limit specifies the standard multiplicative coalescent, which starts from infinitesimally small clusters at time $-\infty$: the existence of such a process is not obvious. AMS 1991 subject classifications: 60C05, 60J50. Key words and phras...
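For intuition, the finite-state dynamics are easy to simulate with a Gillespie scheme (an illustrative sketch; the paper's standard multiplicative coalescent is the $n \to \infty$ limit from infinitesimal clusters, which no finite simulation reproduces). With total mass $S$, the total merge rate is $(S^2 - \sum_i x_i^2)/2$, and a merging pair is chosen with probability proportional to $x_i x_j$.

```python
import random

def multiplicative_coalescent(x, t_end):
    """Gillespie simulation of the finite multiplicative coalescent:
    clusters with sizes x_i and x_j merge at rate x_i * x_j.  The total
    rate is (S^2 - sum x_i^2)/2 with S = sum x_i."""
    x = list(x)
    t = 0.0
    while len(x) > 1:
        S = sum(x)
        total_rate = (S * S - sum(xi * xi for xi in x)) / 2.0
        t += random.expovariate(total_rate)
        if t > t_end:
            break
        # rejection-sample an unordered pair with prob proportional to x_i*x_j
        while True:
            i = random.choices(range(len(x)), weights=x)[0]
            j = random.choices(range(len(x)), weights=x)[0]
            if i != j:
                break
        i, j = sorted((i, j))
        x[i] += x.pop(j)   # sizes are arbitrary positive reals (l^2 states)
    return sorted(x, reverse=True)

print(multiplicative_coalescent([1.0 / 50] * 50, t_end=1.0)[:5])
```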

The Standard Additive Coalescent

by David Aldous, Jim Pitman, 1997
"... Regard an element of the set \Delta := f(x 1 ; x 2 ; : : :) : x 1 x 2 : : : 0; X i x i = 1g as a fragmentation of unit mass into clusters of masses x i . The additive coalescent of Evans and Pitman (1997) is the \Delta-valued Markov process in which pairs of clusters of masses fx i ; x j g mer ..."
Abstract - Cited by 87 (21 self) - Add to MetaCart
Regard an element of the set $\Delta := \{(x_1, x_2, \ldots) : x_1 \ge x_2 \ge \cdots \ge 0,\ \sum_i x_i = 1\}$ as a fragmentation of unit mass into clusters of masses $x_i$. The additive coalescent of Evans and Pitman (1997) is the $\Delta$-valued Markov process in which pairs of clusters of masses $\{x_i, x_j\}$ merge into a cluster of mass $x_i + x_j$ at rate $x_i + x_j$. They showed that a version $(X^\infty(t), -\infty < t < \infty)$ of this process arises as the $n \to \infty$ weak limit of the process started at time $-\frac{1}{2} \log n$ with $n$ clusters of mass $1/n$. We show this standard additive coalescent may be constructed from the continuum random tree of Aldous (1991, 1993) by Poisson splitting along the skeleton of the tree. We describe the distribution of $X^\infty(t)$ on $\Delta$ at a fixed time $t$. We show that the size of the cluster containing a given atom, as a process in $t$, has a simple representation in terms of the stable subordinator of index $1/2$. As $t \to -\infty$, we establish a Gaussian limit for (centered and norm...
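The same Gillespie idea adapts to the additive kernel (again an illustrative finite-$n$ sketch, not the paper's continuum-tree construction): since the total mass is 1, the total merge rate with $k$ clusters is $\sum_{i<j}(x_i + x_j) = k - 1$, and the clock can be started at $-\frac{1}{2}\log n$ with $n$ clusters of mass $1/n$ as in the abstract.

```python
import math
import random
from itertools import combinations

def additive_coalescent(n, t_end):
    """Additive coalescent from the abstract's initial condition:
    n clusters of mass 1/n, clock started at t0 = -0.5*log(n).
    Pairs {i, j} merge at rate x_i + x_j; with total mass 1 the
    total rate with k clusters is k - 1."""
    x = [1.0 / n] * n
    t = -0.5 * math.log(n)
    while len(x) > 1:
        k = len(x)
        t += random.expovariate(k - 1.0)   # total rate sum_{i<j}(x_i + x_j)
        if t > t_end:
            break
        pairs = list(combinations(range(k), 2))
        i, j = random.choices(pairs,
                              weights=[x[a] + x[b] for a, b in pairs])[0]
        x[i] += x.pop(j)                   # j > i, so index i is unaffected
    return sorted(x, reverse=True)

print(additive_coalescent(200, t_end=0.0)[:5])
```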

Generalized weighted Chinese restaurant processes for species sampling mixture models

by Hemant Ishwaran, Lancelot F. James - Statistica Sinica, 2003
"... The class of species sampling mixture models is introduced as an extension of semiparametric models based on the Dirichlet process to models based on the general class of species sampling priors, or equivalently the class of all exchangeable urn distributions. Using Fubini calculus in conjunction ..."
Abstract - Cited by 86 (11 self) - Add to MetaCart
The class of species sampling mixture models is introduced as an extension of semiparametric models based on the Dirichlet process to models based on the general class of species sampling priors, or equivalently the class of all exchangeable urn distributions. Using Fubini calculus in conjunction with Pitman (1995, 1996), we derive characterizations of the posterior distribution in terms of a posterior partition distribution that extend the results of Lo (1984) for the Dirichlet process. These results provide a better understanding of models and have both theoretical and practical applications. To facilitate the use of our models we generalize the work in Brunner, Chan, James and Lo (2001) by extending their weighted Chinese restaurant (WCR) Monte Carlo procedure, an i.i.d. sequential importance sampling (SIS) procedure for approximating posterior mean functionals based on the Dirichlet process, to the case of approximation of mean functionals and additionally their posterior laws in species sampling mixture models. We also discuss collapsed Gibbs sampling, Pólya urn Gibbs sampling and a Pólya urn SIS scheme. Our framework allows for numerous applications, including multiplicative counting process models subject to weighted gamma processes, as well as nonparametric and semiparametric hierarchical models based on the Dirichlet process, its two-parameter extension, the Pitman-Yor process and finite dimensional Dirichlet priors.

Kernel stick-breaking processes

by David B. Dunson, Ju-hyun Park, 2007
"... Summary. This article proposes a class of kernel stick-breaking processes (KSBP) for un-countable collections of dependent random probability measures. The KSBP is constructed by first introducing an infinite sequence of random locations. Independent random probabil-ity measures and beta-distributed ..."
Abstract - Cited by 74 (17 self) - Add to MetaCart
Summary. This article proposes a class of kernel stick-breaking processes (KSBP) for uncountable collections of dependent random probability measures. The KSBP is constructed by first introducing an infinite sequence of random locations. Independent random probability measures and beta-distributed random weights are assigned to each location. Predictor-dependent random probability measures are then constructed by mixing over the locations, with stick-breaking probabilities expressed as a kernel multiplied by the beta weights. Some theoretical properties of the KSBP are described, including a covariate-dependent prediction rule. A retrospective MCMC algorithm is developed for posterior computation, and the methods are illustrated using a simulated example and an epidemiologic application.
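The construction translates directly into the weight formula $\pi_h(x) = K(x, \Gamma_h)\, V_h \prod_{l<h} (1 - K(x, \Gamma_l) V_l)$. A truncated sketch with an illustrative Gaussian kernel (the bandwidth psi, the truncation level H, and the Beta(1, 1) stick weights are my arbitrary choices, not the article's):

```python
import numpy as np

rng = np.random.default_rng(1)

def ksbp_weights(x, locs, V, psi=5.0):
    """Kernel stick-breaking weights at predictor value x:
    pi_h(x) = K(x, loc_h) * V_h * prod_{l<h} (1 - K(x, loc_l) * V_l),
    with a Gaussian kernel bounded by 1 (an illustrative choice)."""
    K = np.exp(-psi * (x - locs) ** 2)
    KV = K * V
    return KV * np.concatenate(([1.0], np.cumprod(1.0 - KV[:-1])))

H = 50
locs = rng.uniform(0, 1, H)        # random locations Gamma_h
V = rng.beta(1.0, 1.0, H)          # beta-distributed stick weights
for x in (0.1, 0.9):
    w = ksbp_weights(x, locs, V)
    print(x, w.argmax(), w.sum())  # truncated, so weights sum to < 1
```

Because the kernel downweights sticks far from x, nearby predictor values share high-weight components, which is how the construction induces dependence across the collection of measures.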