Results 1-10 of 60
Probability approximations via the Poisson clumping heuristic
Applied Mathematical Sciences, 1989
Probabilistic and Statistical Properties of Words: An Overview
Journal of Computational Biology, 2000
Cited by 87 (1 self)
In the following, an overview is given on statistical and probabilistic properties of words, as occurring in the analysis of biological sequences. Counts of occurrence, counts of clumps, and renewal counts are distinguished, and exact distributions as well as normal approximations, Poisson process approximations, and compound Poisson approximations are derived. Here, a sequence is modelled as a stationary ergodic Markov chain; a test for determining the appropriate order of the Markov chain is described. The convergence results take the error made by estimating the Markovian transition probabilities into account. The main tools involved are moment generating functions, martingales, Stein’s method, and the Chen-Stein method. Similar results are given for occurrences of multiple patterns, and, as an example, the problem of unique recoverability of a sequence from SBH chip data is discussed. Special emphasis lies on disentangling the complicated dependence structure between word occurrences, due to self-overlap as well as due to overlap between words. The results can be used to derive approximate, and conservative, confidence intervals for tests. Key words: word counts, renewal counts, Markov model, exact distribution, normal approximation, Poisson process approximation, compound Poisson approximation, occurrences of multiple words, sequencing by hybridization, martingales, moment generating functions, Stein’s method, Chen-Stein method.
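The three kinds of counts distinguished in this abstract can be illustrated with a minimal sketch. It assumes the usual definitions: an occurrence is any (possibly overlapping) match, a clump is a maximal run of occurrences in which consecutive matches overlap, and a renewal is an occurrence that does not overlap the previously counted renewal. The function names are hypothetical and are not taken from the paper.

```python
def occurrences(seq, word):
    """Start positions of all (possibly overlapping) occurrences of word in seq."""
    m = len(word)
    return [i for i in range(len(seq) - m + 1) if seq[i:i + m] == word]


def clump_count(seq, word):
    """Number of clumps: maximal runs of occurrences where neighbours overlap."""
    m = len(word)
    clumps = 0
    last = None
    for i in occurrences(seq, word):
        # Two occurrences of a length-m word overlap iff their starts differ by < m.
        if last is None or i - last >= m:
            clumps += 1
        last = i
    return clumps


def renewal_count(seq, word):
    """Number of renewals: occurrences counted greedily from the left without overlap."""
    m = len(word)
    count = 0
    next_free = 0
    for i in occurrences(seq, word):
        if i >= next_free:
            count += 1
            next_free = i + m
    return count
```

For the self-overlapping word "AA" in the sequence "AAAA", the three counts differ: there are three occurrences (positions 0, 1, 2), one clump, and two renewals (positions 0 and 2).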
Sequence Comparison Significance and Poisson Approximation
Stat. Sci., 1994
Cited by 40 (4 self)
The Chen-Stein method of Poisson approximation has been used to establish theorems about comparison of two DNA or protein sequences. The most useful result for sequence alignment applies to alignment scoring for aligned letters and no gaps. However, there has not been a valid method to assign statistical significance to alignment scores with gaps. In this paper we extend Poisson approximation techniques using the Aldous clumping heuristic to a practical method of estimating statistical significance.
Multivariate normal approximation using exchangeable pairs
Cited by 18 (4 self)
Since the introduction of Stein’s method in the early 1970s, much research has been done in extending and strengthening it; however, there does not exist a version of Stein’s original method of exchangeable pairs for multivariate normal approximation. The aim of this article is to fill this void. We present two abstract normal approximation theorems using exchangeable pairs in multivariate contexts, one for situations in which the underlying symmetries are discrete, and one for situations involving continuous symmetry groups. We provide several illustrative examples, including a multivariate version of Hoeffding’s combinatorial central limit theorem and a treatment of projections of Haar measure on the orthogonal and unitary groups.
Stein's method and Plancherel measure of the symmetric group
2003
Cited by 18 (7 self)
We initiate a Stein’s method approach to the study of the Plancherel measure of the symmetric group. A new proof of Kerov’s central limit theorem for character ratios of random representations of the symmetric group on transpositions is obtained; the proof gives an error term. The construction of an exchangeable pair needed for applying Stein’s method arises from the theory of harmonic functions on Bratteli diagrams. We also find the spectrum of the Markov chain on partitions underlying the construction of the exchangeable pair. This yields an intriguing method for studying the asymptotic decomposition of tensor powers of some representations of the symmetric group.
Poisson process approximation for sequence repeats, and sequencing by hybridization
J. of Computational Biology, 1996
Cited by 18 (0 self)
Sequencing by hybridization is a tool to determine a DNA sequence from the unordered list of all l-tuples contained in this sequence; typical numbers for l are l = 8, 10, 12. For theoretical purposes we assume that the multiset of all l-tuples is known. This multiset determines the DNA sequence uniquely if none of the so-called Ukkonen transformations are possible. These transformations require repeats of (l − 1)-tuples in the sequence, with these repeats occurring in certain spatial patterns. We model DNA as an i.i.d. sequence. We first prove Poisson process approximations for the process of indicators of all leftmost long repeats allowing self-overlap and for the process of indicators of all leftmost long repeats without self-overlap. Using the Chen-Stein method, we get bounds on the error of these approximations. As a corollary, we approximate the distribution of longest repeats. In the second step we analyze the spatial patterns of the repeats. Finally we combine these two steps to prove an approximation for the probability that a random sequence is uniquely recoverable from its list of l-tuples. For all our results we give some numerical examples including error bounds. Key words: sequencing by hybridization, sequence repeats, DNA sequences, Chen-Stein method, Poisson process approximation, Ukkonen transformations.
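The recoverability problem described in this abstract can be made concrete with a small sketch. It builds the multiset of l-tuples (length-l substrings) of a sequence and lists the repeated (l − 1)-tuples. The helper names and the toy sequences are assumptions for illustration only; the sketch does not check the spatial patterns that an Ukkonen transformation additionally requires, so a reported repeat signals only that non-uniqueness is possible.

```python
from collections import Counter


def tuple_multiset(seq, l):
    """Multiset (as a Counter) of all length-l substrings of seq."""
    return Counter(seq[i:i + l] for i in range(len(seq) - l + 1))


def repeated_tuples(seq, k):
    """Length-k substrings that occur more than once in seq, with their counts."""
    return {w: c for w, c in tuple_multiset(seq, k).items() if c > 1}


# Toy example with l = 3: two different sequences with the same 3-tuple multiset.
s1 = "GACGATGA"
s2 = "GATGACGA"
same_spectrum = tuple_multiset(s1, 3) == tuple_multiset(s2, 3)  # True
repeats = repeated_tuples(s1, 2)  # {"GA": 3}
```

Here the 2-tuple "GA" occurs three times in s1, and exchanging the two segments between its occurrences gives s2, which has the same multiset of 3-tuples. A sequence such as "GACT", which has no repeated 2-tuple, admits no such rearrangement.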
Inequalities for Rare Events in Time-Reversible Markov Chains I
Stochastic Inequalities, IMS, 1992
Cited by 17 (1 self)
The distribution of waiting time until a rare event is often approximated by the exponential distribution. In the context of first hitting times for stationary reversible chains, the error has a simple explicit bound involving only the mean waiting time ET and the relaxation time τ of the chain. We recall general upper and lower bounds on ET and then discuss improvements available in the case ET ≫ τ where the exponential approximation holds. In a sequel, Stein's method will be used to get explicit bounds on the Poisson approximation for the number of nonadjacent visits to a rare subset.
Poisson approximation for functionals of random trees
and Alg, 1996
Cited by 14 (2 self)
We use Poisson approximation techniques for sums of indicator random variables to derive explicit error bounds and central limit theorems for several functionals of random trees. In particular, we consider (i) the number of comparisons for successful and unsuccessful search in a binary search tree and (ii) internode distances in increasing trees. The Poisson approximation setting is shown to be a natural and fairly simple framework for deriving asymptotic results.
Stein's Method and Birth-Death Processes
Ann. Probab., 1999
Cited by 14 (7 self)
Barbour (1988) introduced a probabilistic view of Stein's method for estimating the error in probability approximations. However, in the case of approximations by general distributions on the integers, there have been no purely probabilistic proofs of Stein bounds till this paper. Furthermore, the methods introduced here apply to a very large class of approximating distributions on the nonnegative integers, amongst which there is a natural class for higher-order approximations by probability distributions rather than signed measures (as previously). The methods also produce Stein magic factors for process approximations which do not increase with the number of observations and which are simpler to apply than those in Brown, Weinberg and Xia (2000). Key words and phrases. Stein's method, birth-death process, distributional approximation, total variation distance, Poisson process, Wasserstein distance, compound Poisson distribution, negative binomial distribution, polynomial birth-death...
Asymptotics of Poisson approximation to random discrete distributions: an analytic approach
Advances in Applied Probability, 1998
Cited by 14 (9 self)
In this paper, we shall describe the asymptotic behaviors of several distances of Poisson approximation to a wide class of discrete distributions covering many examples from number theory, combinatorics and arithmetic semigroups. Our aim is to show that whenever (analytic) generating functions of the random variables in question are available, complex-analytic methods can be used to derive precise asymptotic results for the five distances above. Actually, we shall consider the following generalized distances: let α > 0 be a fixed positive number, [the defining formulas of the distances d_TV, d_FM, d_K and d_M are garbled in the source; the surviving fragments are two suprema and the term |P(X = j) − P(Y = j)|, and any α-indices on the distances are lost]. Note that d_TV = d_M. Besides the case α = 1 (and α = 1/2 for d_M), only the case d_TV was previously studied by Franken [39] for Poisson approximation to the sum of independent but not identically distributed Bernoulli random variables. We take these quantities as our measures of degree of nearness of Poisson approximation, some of which may be interpreted as certain norms in suitable space as many authors did (cf. [12, 22, 23, 74, 96]). For a large class of discrete distributions, we shall derive an asymptotic main term together with an error estimate for each of these distances. Our results are thus "approximation theorems" rather than "limit theorems". The common form of the underlying structure of these distributions suggests the study of an analytic scheme as we did previously for normal approximation and large deviations (cf. [53, 54]). Many concrete examples from probabilistic number theory and combinatorial structures will justify the study of this scheme. Our treatment being completely general, many extensions can be further pursued with essentially the same line of methods. We shall di...
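As a numerical illustration of the setting mentioned in this abstract (Poisson approximation to a sum of independent, non-identically distributed Bernoulli variables), the following sketch computes three distances exactly. It assumes the standard textbook definitions of the total variation, Kolmogorov and point-metric distances, not the generalized α-versions studied in the paper, and the function names and the truncation parameter `tail` are hypothetical.

```python
import math


def bernoulli_sum_pmf(ps):
    """Exact pmf of a sum of independent Bernoulli(p_i) variables, by convolution."""
    pmf = [1.0]
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)
            nxt[k + 1] += mass * p
        pmf = nxt
    return pmf


def poisson_pmf(lam, kmax):
    """Poisson(lam) pmf on 0..kmax, computed by the recursion p_k = p_{k-1} * lam / k."""
    pmf = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        pmf.append(pmf[-1] * lam / k)
    return pmf


def poisson_distances(ps, tail=60):
    """Return (d_TV, d_K, d_M) between sum of Bernoulli(p_i) and Poisson(sum p_i).

    The Poisson pmf is truncated at len(ps) + tail; the neglected mass is added
    to the total variation sum (assumption: tail is large enough that it is tiny).
    """
    lam = sum(ps)
    p = bernoulli_sum_pmf(ps) + [0.0] * tail
    q = poisson_pmf(lam, len(ps) + tail)
    diffs = [abs(a - b) for a, b in zip(p, q)]
    leftover = max(0.0, 1.0 - sum(q))
    d_tv = 0.5 * (sum(diffs) + leftover)   # half the L1 distance between pmfs
    d_m = max(diffs)                       # largest pointwise pmf difference
    d_k, cp, cq = 0.0, 0.0, 0.0
    for a, b in zip(p, q):                 # largest difference between cdfs
        cp += a
        cq += b
        d_k = max(d_k, abs(cp - cq))
    return d_tv, d_k, d_m
```

For a single Bernoulli(p) variable the total variation distance to Poisson(p) is p(1 − e^(−p)), which gives a simple check of the code; for several small p_i the computed d_TV also stays below the classical bound given by the sum of the p_i squared.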