The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator.
, 1995
Cited by 347 (33 self)
The two-parameter Poisson-Dirichlet distribution, denoted PD(α, θ), is a distribution on the set of decreasing positive sequences with sum 1. The usual Poisson-Dirichlet distribution with a single parameter θ, introduced by Kingman, is PD(0, θ). Known properties of PD(0, θ), including the Markov chain description due to Vershik-Shmidt-Ignatov, are generalized to the two-parameter case. The size-biased random permutation of PD(α, θ) is a simple residual allocation model proposed by Engen in the context of species diversity, and rediscovered by Perman and the authors in the study of excursions of Brownian motion and Bessel processes. For 0 < α < 1, PD(α, 0) is the asymptotic distribution of ranked lengths of excursions of a Markov chain away from a state whose recurrence time distribution is in the domain of attraction of a stable law of index α. Formulae in this case trace back to work of Darling, Lamperti and Wendel in the 1950s and 60s. The distribution of ranked lengths of e...
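The residual allocation model mentioned in this abstract can be illustrated by stick-breaking. The following is a minimal sketch, assuming the standard parametrization in which the i-th residual fraction is Beta(1 − α, θ + iα) with 0 ≤ α < 1 and θ > −α. The names `gem_weights` and `ranked`, and the truncation at a fixed number of sticks, are illustrative choices and do not come from the paper.

```python
import random

def gem_weights(alpha, theta, n_sticks, rng=None):
    """Return the first n_sticks weights of a size-biased (GEM) sequence.

    Assumes residual fractions W_i ~ Beta(1 - alpha, theta + i * alpha),
    drawn independently for i = 1, 2, ...
    """
    if not (0 <= alpha < 1 and theta > -alpha):
        raise ValueError("need 0 <= alpha < 1 and theta > -alpha")
    rng = rng or random.Random()
    weights = []
    remaining = 1.0  # length of the stick not yet allocated
    for i in range(1, n_sticks + 1):
        w = rng.betavariate(1 - alpha, theta + i * alpha)
        weights.append(remaining * w)  # break off a fraction w of what is left
        remaining *= 1 - w
    return weights

def ranked(weights):
    """Sort weights in decreasing order (a truncated approximation to PD)."""
    return sorted(weights, reverse=True)

if __name__ == "__main__":
    sample = gem_weights(alpha=0.5, theta=1.0, n_sticks=20, rng=random.Random(0))
    print(ranked(sample)[:5])
```

Because only finitely many sticks are broken, the ranked output only approximates a draw from PD(α, θ); the unallocated mass `remaining` shrinks as `n_sticks` grows.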
Where have all the interactions gone? Estimating the coverage of two-hybrid protein interaction maps
 PLoS Comput. Biol
, 2007
Cited by 20 (3 self)
Yeast two-hybrid screens are an important method for mapping pairwise physical interactions between proteins. The fraction of interactions detected in independent screens can be very small, and an outstanding challenge is to determine the reason for the low overlap. Low overlap can arise from either a high false-discovery rate (interaction sets have low overlap because each set is contaminated by a large number of stochastic false-positive interactions) or a high false-negative rate (interaction sets have low overlap because each misses many true interactions). We extend capture–recapture theory to provide the first unified model for false-positive and false-negative rates for two-hybrid screens. Analysis of yeast, worm, and fly data indicates that 25% to 45% of the reported interactions are likely false positives. Membrane proteins have higher false-discovery rates on average, and signal transduction proteins have lower rates. The overall false-negative rate ranges from 75% for worm to 90% for fly, which arises from a roughly 50% false-negative rate due to statistical undersampling and a 55% to 85% false-negative rate due to proteins that appear to be systematically lost from the assays. Finally, statistical model selection conclusively rejects the Erdös-Rényi network model in favor of the power law model for yeast and the truncated power law for worm and fly degree distributions. Much as genome sequencing coverage estimates were essential for planning the human genome sequencing project, the coverage estimates developed here will be valuable for guiding future proteomic screens.
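As background for the capture-recapture idea this abstract builds on, here is a minimal sketch of the classical two-sample Lincoln-Petersen estimator. It is not the unified false-positive/false-negative model of the paper, and it assumes no false positives at all. The function name and the counts in the usage lines are hypothetical.

```python
def lincoln_petersen(n_first, n_second, n_both):
    """Classical two-sample capture-recapture estimate of the total count.

    n_first  -- items detected in the first screen
    n_second -- items detected in the second screen
    n_both   -- items detected in both screens
    """
    if n_both <= 0:
        raise ValueError("estimator is undefined without overlap")
    return n_first * n_second / n_both

if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    total = lincoln_petersen(200, 150, 30)
    print(total)        # estimated number of true interactions
    print(200 / total)  # estimated coverage of the first screen
```

The estimate grows as the overlap shrinks, which is why a small overlap between screens is compatible with a large number of undetected interactions when false positives are ignored.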
Random Discrete Distributions Derived From Self-Similar Random Sets
 Electronic J. Probability
, 1996
Cited by 15 (10 self)
A model is proposed for a decreasing sequence of random variables (V_1, V_2, ...) with Σ_n V_n = 1, which generalizes the Poisson-Dirichlet distribution and the distribution of ranked lengths of excursions of a Brownian motion or recurrent Bessel process. Let V_n be the length of the nth longest component interval of [0, 1] \ Z, where Z is an a.s. non-empty random closed subset of (0, ∞) of Lebesgue measure 0, and Z is self-similar, i.e. cZ has the same distribution as Z for every c > 0. Then for 0 ≤ a < b ≤ 1 the expected number of n's such that V_n ∈ (a, b) equals ∫_a^b v^{−1} F(dv), where the structural distribution F is identical to the distribution of 1 − sup(Z ∩ [0, 1]). Then F(dv) = f(v)dv where (1 − v)f(v) is a decreasing function of v, and every such probability distribution F on [0, 1] can arise from this construction. Keywords: interval partition, zero set, excursion lengths, regenerative set, structural distribution. AMS subject classificat...
Record indices and age-ordered frequencies in Exchangeable Gibbs Partitions
, 2008
Cited by 11 (0 self)
We consider a random partition Π of N = {1, 2, ...} such that, for each n, its restriction Π_n to [n] = {1, ..., n} is given by an exchangeable Gibbs partition with parameters α, V for α ∈ (−∞, 1] and V = (V_{n,k}) defined recursively by setting V_{1,1} = 1 and V_{n,k} = (n − αk)V_{n+1,k} + V_{n+1,k+1}, k ≤ n = 1, 2, ... (Gnedin and Pitman 2006). By ranking the blocks Π_{n1}, ..., Π_{nk} of Π_n by their age-order, i.e. by the order of their least elements i_1, ..., i_k, we study how the distribution of the frequencies of the blocks depends on i_1, ..., i_k. Several interesting representations for the limit age-ordered relative frequencies X_1, X_2, ... of Π arise, depending on which i_j's one conditions on. In particular, conditioning on the entire vector i = 1 = i_1 < i_2 < ..., a representation is X_j = ξ_{j−1} ∏_{i=j}^{∞} (1 − ξ_i), j = 1, 2, ..., where the ξ_j's are independent Beta random variables with parameters, respectively, (1 − α, i_{j+1} − αj − 1). We show the connection of such a representation with the so-called Beta-Stacy class of random discrete distributions (Walker and Muliere 1997). The vector i is found to form a Markov chain depending on both α and V. When V is chosen from Pitman's subfamily, the two-parameter GEM distribution is reobtained by averaging the ξ over i. Conditioning on i_k alone, we give two alternative representations for the Laplace transform of both − log X_k and − log(∑_{i=1}^{k} X_i), and we characterize Ewens' partitions as the only exchangeable Gibbs partitions for which − log X_k | i_k can be represented as an infinite sum of independent random variables. We finally show that, for every k, conditional on ∑_{i=1}^{k} X_i, the distribution of the normalized age-ordered frequencies X_1 / ∑_{i=1}^{k} X_i, ..., X_k / ∑_{i=1}^{k} X_i is a mixture of Dirichlet distributions on the (k − 1)-dimensional simplex, whose mixing measure is indexed by i_k. We provide a nontrivial explicit formula for the marginal distribution of i_k. Many of the mentioned representations are extensions of Griffiths and Lessard (2005) results on Ewens' partitions.
THE GENEALOGY OF BRANCHING PROCESSES AND THE AGE OF OUR MOST RECENT COMMON ANCESTOR
 APPLIED PROBABILITY TRUST
, 1995
Cited by 9 (1 self)
We obtain a weak approximation for the reduced family tree in a near-critical Markov branching process when the time interval considered is long; we also extend Yaglom's theorem and the exponential law to this case. These results are then applied to the problem of estimating the age of our most recent common female ancestor, using mitochondrial DNA sequences taken from a sample of contemporary humans.
An accurate model for genetic hitchhiking
 Genetics
, 2007
Cited by 5 (0 self)
We suggest a simple deterministic approximation for the growth of the favoured-allele frequency during a selective sweep. Using this approximation we introduce an accurate model for genetic hitchhiking. Only when Ns < 10 (N is the population size and s denotes the selection coefficient) are discrepancies between our approximation and direct numerical simulations of a Moran model noticeable. Our model describes the gene genealogies of a contiguous segment of neutral loci close to the selected one, and it does not assume that the selective sweep happens instantaneously. This enables us to compute SNP distributions on the neutral segment without bias.
Importance sampling for the infinite sites model
, 2008
Cited by 4 (0 self)
Importance sampling or Markov Chain Monte Carlo sampling is required for state-of-the-art statistical analysis of population genetics data. The applicability of these sampling-based inference techniques depends crucially on the proposal distribution. In this paper, we discuss importance sampling for the infinite sites model. The infinite sites assumption is attractive because it constrains the number of possible genealogies, thereby allowing for the analysis of larger data sets. We recall the Griffiths-Tavaré and Stephens-Donnelly proposals and emphasize the relation between the latter proposal and exact sampling from the infinite alleles model. We also introduce a new proposal that takes knowledge of the ancestral state into account. The new proposal is derived from a new result on exact sampling from a single site. The methods are illustrated on simulated data sets and the data considered in Griffiths and Tavaré (1994).
Sampling formulae arising from random Dirichlet populations
, 2004
Cited by 3 (3 self)
Consider the random Dirichlet partition of the interval into n fragments at temperature θ > 0. Some statistical features of this random discrete distribution are recalled, together with explicit results on the law of its size-biased permutation. Using these, pre-asymptotic versions of the Ewens and Donnelly-Tavaré-Griffiths sampling formulae from finite Dirichlet partitions are computed exactly. From these, new proofs of the usual sampling formulae from random proportions with GEM(γ) distribution are supplied, when considering the Kingman limit n ↑ ∞, θ ↓ 0 while nθ = γ > 0.
Convergence Time to the Ewens Sampling Formula
Cited by 1 (0 self)
In this paper, we establish the cutoff phenomenon for the discrete time infinite alleles Moran model. If M is the population size and µ is the mutation rate, we find a cutoff time of log(Mµ)/µ generations. The stationary distribution for this process in the case of sampling without replacement is the Ewens sampling formula. We show that the bound for the total variation distance from the generation t distribution to the Ewens sampling formula is well approximated by one of the extreme value distributions, namely, a standard Gumbel distribution. Beginning with the card shuffling examples of Aldous and Diaconis and extending the ideas of Donnelly and Rodrigues for the two allele model, this model adds to the list of Markov chains that display the cutoff phenomenon. Because of the broad use of infinite alleles models, this cutoff sets the time scale of applicability for statistical tests based on the Ewens sampling formula and other tests of neutrality in a number of population genetic studies.
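For orientation, the two quantities named in this abstract can be evaluated directly. The sketch below assumes the standard form of the Ewens sampling formula, in which a sample of size n with a_j allelic types represented j times has probability n!/θ_(n) · ∏_j θ^{a_j} / (j^{a_j} a_j!), where θ_(n) = θ(θ + 1)...(θ + n − 1). The function names are illustrative, and θ is passed in directly because the abstract does not state how it relates to M and µ.

```python
from math import factorial, log

def ewens_probability(counts, theta):
    """Probability of an allelic partition under the Ewens sampling formula.

    counts[j - 1] is the number of allelic types seen exactly j times.
    """
    n = sum(j * a for j, a in enumerate(counts, start=1))
    rising = 1.0  # rising factorial theta * (theta + 1) * ... * (theta + n - 1)
    for i in range(n):
        rising *= theta + i
    prob = factorial(n) / rising
    for j, a in enumerate(counts, start=1):
        prob *= theta ** a / (j ** a * factorial(a))
    return prob

def cutoff_time(M, mu):
    """Cutoff time log(M * mu) / mu quoted in the abstract.

    The value is positive only when M * mu > 1.
    """
    return log(M * mu) / mu

if __name__ == "__main__":
    # The three allelic partitions of a sample of size 3.
    for counts in ([3, 0, 0], [1, 1, 0], [0, 0, 1]):
        print(counts, ewens_probability(counts, theta=1.5))
    print(cutoff_time(M=1000, mu=0.01))
```

Summing the formula over all allelic partitions of a fixed n gives 1, which is a quick sanity check on any implementation.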
Model ∗
Importance sampling or Markov Chain Monte Carlo sampling is required for state-of-the-art statistical analysis of population genetics data. The applicability of these sampling-based inference techniques depends crucially on the proposal distribution. In this paper, we discuss importance sampling for the infinite sites model. The infinite sites assumption is attractive because it constrains the number of possible genealogies, thereby allowing for the analysis of larger data sets. We recall the Griffiths-Tavaré and Stephens-Donnelly proposals and emphasize the relation between the latter proposal and exact sampling from the infinite alleles model. We also introduce a new proposal that takes knowledge of the ancestral state into account. The new proposal is derived from a new result on exact sampling from a single site. The methods are illustrated on simulated data sets and the data considered in Griffiths and Tavaré (1994).