The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator
, 1995
Cited by 221 (37 self)
The two-parameter Poisson-Dirichlet distribution, denoted PD(α, θ), is a distribution on the set of decreasing positive sequences with sum 1. The usual Poisson-Dirichlet distribution with a single parameter θ, introduced by Kingman, is PD(0, θ). Known properties of PD(0, θ), including the Markov chain description due to Vershik-Shmidt-Ignatov, are generalized to the two-parameter case. The size-biased random permutation of PD(α, θ) is a simple residual allocation model proposed by Engen in the context of species diversity, and rediscovered by Perman and the authors in the study of excursions of Brownian motion and Bessel processes. For 0 < α < 1, PD(α, 0) is the asymptotic distribution of ranked lengths of excursions of a Markov chain away from a state whose recurrence time distribution is in the domain of attraction of a stable law of index α. Formulae in this case trace back to work of Darling, Lamperti and Wendel in the 1950s and 1960s. The distribution of ranked lengths of e...
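The residual allocation model mentioned in this abstract can be illustrated with a short stick-breaking sketch. It assumes the standard parametrisation, which the abstract itself does not spell out: independent fractions V_i ~ Beta(1 - α, θ + iα) and weights P_i = V_i (1 - V_1)...(1 - V_{i-1}), with 0 ≤ α < 1 and θ > -α. The function names are illustrative, and ranking a truncated sequence only approximates the ranked limit.

```python
import random


def gem_stick_breaking(alpha, theta, n_sticks, rng=None):
    """Return the first n_sticks weights of a residual allocation model.

    Assumption (not stated in the abstract): V_i ~ Beta(1 - alpha, theta + i*alpha),
    independently, and P_i = V_i * prod_{j<i} (1 - V_j).
    """
    if not (0 <= alpha < 1 and theta > -alpha):
        raise ValueError("need 0 <= alpha < 1 and theta > -alpha")
    rng = rng or random.Random()
    weights = []
    remaining = 1.0  # mass of the stick not yet allocated
    for i in range(1, n_sticks + 1):
        v = rng.betavariate(1.0 - alpha, theta + i * alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights


def ranked(weights):
    """Sort weights in decreasing order (approximate ranked sequence)."""
    return sorted(weights, reverse=True)
```

Calling `ranked(gem_stick_breaking(0.5, 1.0, 1000))` gives a decreasing sequence whose total mass is slightly below 1 because of the truncation.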
Probabilistic Models for Bacterial Taxonomy
 INTERNATIONAL STATISTICAL REVIEW
, 2000
Cited by 8 (3 self)
We give a survey of different probabilistic partitioning methods that have been applied to bacterial taxonomy. We introduce a theoretical framework, which makes it possible to treat the various models in a unified way. The key concepts of our approach are prediction and storing of microbiological information in a Bayesian forecasting setting. We show that there is a close connection between classification and probabilistic identification and that, in fact, our approach ties these two concepts together in a coherent way.
Large deviations for Dirichlet processes and Poisson-Dirichlet distribution with two parameters
 Electron. J. Probab.
, 2007
Spectrum: joint Bayesian inference of population structure and recombination events
 Bioinformatics
, 2007
Cited by 4 (3 self)
Motivation: While genetic properties such as linkage disequilibrium (LD) and population structure are closely related under a common inheritance process, the statistical methodologies developed so far mostly deal with LD analysis and structural inference separately, using specialized models that do not capture their statistical and genetic relationships. Also, most of these approaches ignore the inherent uncertainty in the genetic complexity of the data and rely on inflexible models built on a closed genetic space. These limitations may make it difficult to infer detailed and consistent structural information from rich genomic data such as populational SNP profiles. Results: We propose a new model-based approach to address these issues through joint inference of population structure and recombination events under a nonparametric Bayesian framework; we present Spectrum, an efficient implementation based on our new model. We validated Spectrum on simulated data and applied it to two real SNP datasets, including single-population Daly data and the four-population HapMap data. Our method performs well relative to LDhat 2.0 in estimating the recombination rates and hotspots on these datasets. More interestingly, it generates an ancestral spectrum for representing population structures which not only displays substructure based on population founders but also reveals details of the genetic diversity of each individual. It offers an alternative view of the population structures to that offered by Structure 2.1, which ignores chromosome-level mutation and combination with respect to founders.
A HIERARCHICAL DIRICHLET PROCESS MIXTURE MODEL FOR HAPLOTYPE RECONSTRUCTION FROM MULTI-POPULATION DATA
, 2009
Cited by 3 (1 self)
The perennial problem of “how many clusters?” remains an issue of substantial interest in data mining and machine learning communities, and becomes particularly salient in large data sets such as populational genomic data where the number of clusters needs to be relatively large and open-ended. This problem gets further complicated in a co-clustering scenario in which one needs to solve multiple clustering problems simultaneously because of the presence of common centroids (e.g., ancestors) shared by clusters (e.g., possible descents from a certain ancestor) from different multiple-cluster samples (e.g., different human subpopulations). In this paper we present a hierarchical nonparametric Bayesian model to address this problem in the context of multi-population haplotype inference. Uncovering the haplotypes of single nucleotide polymorphisms is essential for many biological and medical applications. While it is uncommon for the genotype data to be pooled from multiple ethnically distinct populations, few existing programs have explicitly leveraged the individual ethnic information for haplotype inference. In this paper we present a new haplotype inference program, Haploi, which makes use of such information and is readily applicable to genotype sequences with thousands of SNPs from heterogeneous populations, with competent and sometimes superior speed and accuracy compared to state-of-the-art programs. Underlying Haploi is a new haplotype distribution model based on a nonparametric Bayesian formalism known as the hierarchical Dirichlet process, which represents a tractable surrogate to the coalescent process. The proposed model is exchangeable, unbounded, and capable of coupling demographic information of different populations. It offers a well-founded statistical framework for posterior inference of individual haplotypes, the size and configuration of haplotype ancestor pools, and other parameters of interest given genotype data.
AN ASYMPTOTIC SAMPLING FORMULA FOR THE COALESCENT WITH RECOMBINATION
Cited by 2 (0 self)
Ewens sampling formula (ESF) is a one-parameter family of probability distributions with a number of intriguing combinatorial connections. This elegant closed-form formula first arose in biology as the stationary probability distribution of a sample configuration at one locus under the infinite-alleles model of mutation. Since its discovery in the early 1970s, the ESF has been used in various biological applications, and has sparked several interesting mathematical generalizations. In the population genetics community, extending the underlying random-mating model to include recombination has received much attention in the past, but no general closed-form sampling formula is currently known even for the simplest extension, that is, a model with two loci. In this paper, we show that it is possible to obtain useful closed-form results in the case where the population-scaled recombination rate ρ is large but not necessarily infinite. Specifically, we consider an asymptotic expansion of the two-locus sampling formula in inverse powers of ρ and obtain closed-form expressions for the first few terms in the expansion. Our asymptotic sampling formula applies to arbitrary sample sizes and configurations.
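For readers unfamiliar with the one-locus ESF that this abstract builds on, a minimal sketch of its standard closed form may help. The abstract does not write the formula out, so this is an assumption: for a sample of size n in which a_j allelic types appear exactly j times, the probability is n! / (θ(θ+1)...(θ+n-1)) times the product over j of θ^{a_j} / (j^{a_j} a_j!). The function names are illustrative.

```python
from math import factorial


def rising_factorial(theta, n):
    """Return theta * (theta + 1) * ... * (theta + n - 1)."""
    result = 1.0
    for i in range(n):
        result *= theta + i
    return result


def ewens_probability(counts, theta):
    """Probability of an allelic configuration under the one-locus ESF.

    counts[j - 1] is the number of allelic types seen exactly j times,
    so the sample size is n = sum_j j * counts[j - 1].
    """
    n = sum(j * a for j, a in enumerate(counts, start=1))
    prob = factorial(n) / rising_factorial(theta, n)
    for j, a in enumerate(counts, start=1):
        prob *= theta ** a / (j ** a * factorial(a))
    return prob
```

For a sample of size 3 the three possible configurations are `[3, 0, 0]`, `[1, 1, 0]` and `[0, 0, 1]`, and their probabilities sum to 1 for any θ > 0.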
Record indices and age-ordered frequencies in Exchangeable Gibbs Partitions
, 2008
Cited by 2 (0 self)
We consider a random partition $\Pi$ of $\mathbb{N} = \{1, 2, \ldots\}$ such that, for each $n$, its restriction $\Pi_n$ to $[n] = \{1, \ldots, n\}$ is given by an exchangeable Gibbs partition with parameters $\alpha, V$ for $\alpha \in (-\infty, 1]$ and $V = (V_{n,k})$ defined recursively by setting $V_{1,1} = 1$ and $V_{n,k} = (n - \alpha k)V_{n+1,k} + V_{n+1,k+1}$, $k \le n = 1, 2, \ldots$ (Gnedin and Pitman 2006). By ranking the blocks $\Pi_{n1}, \ldots, \Pi_{nk}$ of $\Pi_n$ by their age-order, i.e. by the order of their least elements $i_1, \ldots, i_k$, we study how the distribution of the frequencies of the blocks depends on $i_1, \ldots, i_k$. Several interesting representations for the limit age-ordered relative frequencies $X_1, X_2, \ldots$ of $\Pi$ arise, depending on which $i_j$'s one conditions on. In particular, conditioning on the entire vector $\mathbf{i} = (i_1, i_2, \ldots)$ with $1 = i_1 < i_2 < \cdots$, a representation is $X_j = \xi_{j-1} \prod_{i=j}^{\infty} (1 - \xi_i)$, $j = 1, 2, \ldots$, where the $\xi_j$'s are independent Beta random variables with parameters, respectively, $(1 - \alpha, i_{j+1} - \alpha j - 1)$. We show the connection of such a representation with the so-called Beta-Stacy class of random discrete distributions (Walker and Muliere 1997). The vector $\mathbf{i}$ is found to form a Markov chain depending on both $\alpha$ and $V$. When $V$ is chosen from Pitman's subfamily, the two-parameter GEM distribution is re-obtained by averaging the $\xi$ over $\mathbf{i}$. Conditioning on $i_k$ alone, we give two alternative representations for the Laplace transform of both $-\log X_k$ and $-\log\left(\sum_{i=1}^{k} X_i\right)$, and we characterize Ewens' partitions as the only exchangeable Gibbs partitions for which $-\log X_k \mid i_k$ can be represented as an infinite sum of independent random variables. We finally show that, for every $k$, conditional on $\sum_{i=1}^{k} X_i$, the distribution of the normalized age-ordered frequencies $X_1 / \sum_{i=1}^{k} X_i, \ldots, X_k / \sum_{i=1}^{k} X_i$ is a mixture of Dirichlet distributions on the $(k - 1)$-dimensional simplex, whose mixing measure is indexed by $i_k$. We provide a non-trivial explicit formula for the marginal distribution of $i_k$. Many of the mentioned representations are extensions of Griffiths and Lessard (2005) results on Ewens' partitions.
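The backward recursion for the weights V can be checked numerically on Pitman's two-parameter subfamily. The sketch below assumes the standard closed form for that subfamily, which the abstract does not state: V_{n,k} is the product of (θ + iα) for i = 1, ..., k-1, divided by the product of (θ + j) for j = 1, ..., n-1. The function names are illustrative.

```python
def pitman_weight(n, k, alpha, theta):
    """Assumed closed form of V_{n,k} for Pitman's two-parameter subfamily."""
    numerator = 1.0
    for i in range(1, k):
        numerator *= theta + i * alpha
    denominator = 1.0
    for j in range(1, n):
        denominator *= theta + j
    return numerator / denominator


def recursion_gap(n, k, alpha, theta):
    """Absolute gap in V_{n,k} = (n - alpha*k) V_{n+1,k} + V_{n+1,k+1}."""
    lhs = pitman_weight(n, k, alpha, theta)
    rhs = ((n - alpha * k) * pitman_weight(n + 1, k, alpha, theta)
           + pitman_weight(n + 1, k + 1, alpha, theta))
    return abs(lhs - rhs)
```

The gap vanishes up to rounding because (n - αk) + (θ + kα) = n + θ, which is exactly the factor that separates the two denominators.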
A Nonparametric Bayesian Approach for Haplotype Reconstruction from Single and Multi-Population Data
, 2007
Cited by 1 (0 self)
Uncovering the haplotypes of single nucleotide polymorphisms and their population demography is essential for many biological and medical applications. Methods for haplotype inference developed thus far, including those based on approximate coalescence, finite mixtures, and maximal parsimony, often bypass issues such as unknown complexity of haplotype space and demographic structures underlying multi-population genotype data. In this paper, we propose a new class of haplotype inference models based on a nonparametric Bayesian formalism built on the Dirichlet process, which represents a tractable surrogate to the coalescent process underlying population haplotypes and offers a well-founded statistical framework to tackle the aforementioned issues. Our proposed model, known as a hierarchical Dirichlet process mixture, is exchangeable, unbounded, and capable of coupling demographic information of different populations for posterior inference of individual haplotypes, the size and configuration of haplotype ancestor pools, and other parameters of interest given genotype data. The resulting haplotype inference program, Haploi, is readily applicable to genotype sequences with thousands of SNPs, at a time cost often two orders of magnitude less than that of the state-of-the-art PHASE program, with competitive and sometimes superior performance. Haploi also significantly outperforms several other extant algorithms on both simulated and realistic data.
Demand Creation and Economic Growth
, 1999
Cited by 1 (0 self)
In the standard literature, the fundamental factor restraining economic growth is diminishing returns to capital. This paper presents a model in which the factor restraining growth is saturation of demand. We begin with the common observation that an individual product or sector grows fast at first, but its growth eventually declines to zero. The economy sustains growth by the introduction of new products/industries. Preferences are endogenous in this model. The introduction of new products/industries affects preferences, and creates demand. By so doing, it induces capital accumulation, and ultimately sustains economic growth.
Lecture Topics
, 2006
A diffusion process model of the frequency of a mutation
◦ Reversibility of a 1-dimensional diffusion process
◦ Frequency spectrum and age of a mutation
General binary coalescent trees
◦ Combinatorial derivation of the age of a mutation
◦ Ewens' sampling formula, a combinatorial derivation
◦ Coalescent lineage distributions
Gene trees and Coalescent trees
◦ DNA sequences and the infinitely-many-sites model
◦ Mutation histories, gene trees and coalescent trees
◦ Ancestral inference from gene trees
Importance sampling on coalescent histories
◦ Constructing importance sampling algorithms
◦ Examples of particular models and ancestral inference
The Ancestral Recombination graph
◦ Graphical description
◦ Probability calculations on the graph
◦ An MCMC algorithm for the time to the most recent ancestor along sequences
A diffusion process model of the frequency of a mutation
◦ Diffusion process model of the frequency of a mutation
◦ Reversibility of a 1-dimensional diffusion process
◦ Simulation of diffusion paths
◦ Frequency spectrum and age of a mutation
The population frequency of a mutation
The frequency $\{X(t), t \ge 0\}$ is modelled by a diffusion process with generator
$$L = \frac{1}{2}\sigma^2(x)\frac{\partial^2}{\partial x^2} + \mu(x)\frac{\partial}{\partial x}, \qquad \sigma^2(x) = x(1 - x), \quad \mu(x) = \beta(x)\,x(1 - x).$$
Denote $\Delta X(t) = X(t + \Delta t) - X(t)$. Then
$$E(\Delta X(t) \mid X(t) = x) = \mu(x)\Delta t + o(\Delta t),$$
$$\mathrm{Var}(\Delta X(t) \mid X(t) = x) = \sigma^2(x)\Delta t + o(\Delta t).$$
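The two infinitesimal moments above suggest a simple way to simulate approximate paths of the frequency process. The following is a minimal Euler-Maruyama sketch, not the simulation method of the lectures: each step adds a Gaussian increment with mean µ(x)Δt and variance σ²(x)Δt. Clipping to [0, 1] and treating both boundaries as absorbing are assumptions made here, as are the function and parameter names.

```python
import math
import random


def simulate_frequency_path(x0, beta, dt, n_steps, rng=None):
    """Euler-Maruyama path with mu(x) = beta(x) x (1 - x), sigma^2(x) = x (1 - x).

    beta is a callable giving the selection term beta(x). The path is clipped
    to [0, 1] and stays put once it hits 0 or 1 (assumed absorbing boundaries).
    """
    rng = rng or random.Random()
    x = x0
    path = [x]
    for _ in range(n_steps):
        if 0.0 < x < 1.0:
            drift = beta(x) * x * (1.0 - x)
            variance = x * (1.0 - x)
            x += drift * dt + math.sqrt(variance * dt) * rng.gauss(0.0, 1.0)
            x = min(1.0, max(0.0, x))  # keep the frequency inside [0, 1]
        path.append(x)
    return path
```

For example, `simulate_frequency_path(0.1, lambda x: 2.0, 1e-3, 5000)` sketches one path under constant β(x) = 2; smaller `dt` reduces the discretisation error near the boundaries.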