Results 1-10 of 480
Probabilistic and Statistical Properties of Words: An Overview
Journal of Computational Biology, 2000
"... In the following, an overview is given on statistical and probabilistic properties of words, as occurring in the analysis of biological sequences. Counts of occurrence, counts of clumps, and renewal counts are distinguished, and exact distributions as well as normal approximations, Poisson process a ..."
Abstract

Cited by 84 (1 self)
In the following, an overview is given of statistical and probabilistic properties of words, as occurring in the analysis of biological sequences. Counts of occurrence, counts of clumps, and renewal counts are distinguished, and exact distributions as well as normal approximations, Poisson process approximations, and compound Poisson approximations are derived. Here, a sequence is modelled as a stationary ergodic Markov chain; a test for determining the appropriate order of the Markov chain is described. The convergence results take into account the error made by estimating the Markovian transition probabilities. The main tools involved are moment generating functions, martingales, Stein's method, and the Chen-Stein method. Similar results are given for occurrences of multiple patterns, and, as an example, the problem of unique recoverability of a sequence from SBH chip data is discussed. Special emphasis is placed on disentangling the complicated dependence structure between word occurrences, due to self-overlap as well as due to overlap between words. The results can be used to derive approximate, and conservative, confidence intervals for tests. Key words: word counts, renewal counts, Markov model, exact distribution, normal approximation, Poisson process approximation, compound Poisson approximation, occurrences of multiple words, sequencing by hybridization, martingales, moment generating functions, Stein's method, Chen-Stein method.
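To make the distinction between the three kinds of counts concrete, here is a minimal Python sketch. The exact definitions are assumptions chosen for illustration: the occurrence count counts every start position (overlaps allowed); the renewal count scans left to right and only counts an occurrence that starts after the previously counted one has ended; the clump count counts maximal runs of occurrences in which each one overlaps its predecessor. The function names are hypothetical, and the sketch involves no Markov model or approximation, only the raw counts.

```python
def occurrence_positions(seq: str, word: str) -> list[int]:
    """Start positions of all (possibly overlapping) occurrences of word in seq."""
    k = len(word)
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] == word]


def occurrence_count(seq: str, word: str) -> int:
    # Every start position counts, even if it overlaps an earlier occurrence.
    return len(occurrence_positions(seq, word))


def renewal_count(seq: str, word: str) -> int:
    # Greedy left-to-right scan: after counting an occurrence, the next one
    # is only counted if it starts after the current one has ended.
    k = len(word)
    count, next_free = 0, 0
    for i in occurrence_positions(seq, word):
        if i >= next_free:
            count += 1
            next_free = i + k
    return count


def clump_count(seq: str, word: str) -> int:
    # A clump is a maximal run of occurrences in which each occurrence
    # overlaps the previous one (start positions closer than len(word)).
    k = len(word)
    count, prev = 0, None
    for i in occurrence_positions(seq, word):
        if prev is None or i - prev >= k:
            count += 1  # no overlap with the previous occurrence: new clump
        prev = i
    return count


seq = "ATATATA"
print(occurrence_count(seq, "ATA"), renewal_count(seq, "ATA"), clump_count(seq, "ATA"))
# prints: 3 2 1
```

On "ATATATA" the three counts of the self-overlapping word "ATA" all differ, which is the self-overlap effect the overview emphasizes; for a word that cannot overlap itself, the three counts coincide.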
A Variety Theorem Without Complementation.
1995
"... this paper is to show that such results are not isolated, but are instances of a result as general as Eilenberg's theorem. On the language side, we consider positive varieties of languages, which have the same properties as varieties of languages except they are not supposed to be closed under compl ..."
Abstract

Cited by 68 (27 self)
this paper is to show that such results are not isolated, but are instances of a result as general as Eilenberg's theorem. On the language side, we consider positive varieties of languages, which have the same properties as varieties of languages except that they are not required to be closed under complement. On the algebraic side, varieties of finite semigroups are replaced by varieties of finite ordered semigroups. Our main result states that there is a one-to-one correspondence between positive varieties of languages and varieties of finite ordered semigroups. Due to the lack of space, we shall just give a few examples of this correspondence and defer to future papers the detailed study of our new types of varieties. For instance, P. Weil and the author have shown that the theorems of Birkhoff and Reiterman can be extended to ordered semigroups by replacing equations by inequations
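One way to see why dropping closure under complement calls for an order is that a language and its complement have the same syntactic monoid but opposite syntactic orders. The Python sketch below computes the transition monoid of a small complete minimal DFA (for a minimal automaton this is the syntactic monoid) and compares elements under the order "u ≤ v iff xvy ∈ L implies xuy ∈ L for all words x, y". This is one of two conventions found in the literature and is an assumption here; the helper names, the state encoding (states numbered 0..k-1, all reachable) and the example language A*aA* over {a, b} are illustrative choices.

```python
from collections import deque


def transition_monoid(states, alphabet, delta):
    """Map each transformation f (f[q] = state reached from q) to a shortest word
    realising it. Assumes states are numbered 0..k-1."""
    identity = tuple(states)  # realised by the empty word
    reps = {identity: ""}
    queue = deque([identity])
    while queue:
        f = queue.popleft()
        for a in alphabet:
            g = tuple(delta[(f[q], a)] for q in states)  # read f's word, then a
            if g not in reps:
                reps[g] = reps[f] + a
                queue.append(g)
    return reps


def included(p, q, alphabet, delta, finals):
    """True iff every word accepted from state p is also accepted from state q."""
    seen, stack = {(p, q)}, [(p, q)]
    while stack:
        x, y = stack.pop()
        if x in finals and y not in finals:
            return False
        for a in alphabet:
            nxt = (delta[(x, a)], delta[(y, a)])
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return True


def leq(u, v, states, alphabet, delta, finals):
    """Syntactic order u <= v (assumed convention: x v y in L implies x u y in L).
    Assumes every state is reachable, so states stand in for all prefixes x."""
    return all(included(v[s], u[s], alphabet, delta, finals) for s in states)


# Minimal DFA for L = A*aA* over {a, b}: state 1 means "an a has been read".
states, alphabet = [0, 1], "ab"
delta = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 1}
monoid = transition_monoid(states, alphabet, delta)
one, zero = (0, 1), (1, 1)  # classes of the empty word and of the word a
print(sorted(monoid.values()))  # ['', 'a']
# L itself (final state 1): a lies below 1, not the other way round.
print(leq(zero, one, states, alphabet, delta, {1}), leq(one, zero, states, alphabet, delta, {1}))  # True False
# Complement b* (final state 0): same monoid, reversed order.
print(leq(one, zero, states, alphabet, delta, {0}), leq(zero, one, states, alphabet, delta, {0}))  # True False
```

The monoid {1, a} is the same for L and for its complement b*, so an unordered variety cannot separate them, while the order (a ≤ 1 for L, 1 ≤ a for b*) does.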
On the numerical integration of ordinary differential equations by symmetric composition methods
SIAM J. Sci. Comput., 1995
"... Abstract. Differential equations of the form ˙x = X = A + B are considered, where the vector fields A and B can be integrated exactly, enabling numerical integration of X by composition of the flows of A and B. Various symmetric compositions are investigated for order, complexity, and reversibility. ..."
Abstract

Cited by 60 (10 self)
Abstract. Differential equations of the form $\dot{x} = X = A + B$ are considered, where the vector fields $A$ and $B$ can be integrated exactly, enabling numerical integration of $X$ by composition of the flows of $A$ and $B$. Various symmetric compositions are investigated for order, complexity, and reversibility. Free Lie algebra theory gives simple formulae for the number of determining equations for a method to have a particular order. A new, more accurate way of applying the methods thus obtained to compositions of an arbitrary first-order integrator is described and tested. The determining equations are explored, and new methods up to 100 times more accurate (at constant work) than those previously known are given.

1. Composition methods. Composition methods are particularly useful for numerically integrating differential equations when the equations have some special structure which it is advantageous to preserve. They tend to have larger local truncation errors than standard (Runge-Kutta, multistep) methods [4,5], but this defect can be more than compensated for by their superior conservation properties. Capital letters such as $X$ will denote vector fields on some space with coordinates $x$, with flows $\exp(tX)$, i.e., $\dot{x} = X(x) \Rightarrow x(t) = \exp(tX)(x(0))$. The vector field $X$ is given and is to be integrated numerically with fixed time step $t$. Composition methods apply when one can write $X = A + B$ in such a way that $\exp(tA)$ and $\exp(tB)$ can both be calculated explicitly. Then the most elementary such method is the map (essentially the "Lie-Trotter" formula [26])

$$\varphi\colon x \mapsto x' = \exp(tA)\exp(tB)(x) = x(t) + O(t^2). \qquad (1.1)$$

The advantage of composing exact solutions in this way is that many geometric properties of the true flow $\exp(tX)$ are preserved: group properties in particular. If $X$, $A$, and $B$ are Hamiltonian vector fields then both $\exp(tX)$ and the map $\varphi$
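The gap between the elementary map (1.1) and a symmetric composition can be seen numerically. The sketch below is an illustrative assumption, not code from the paper: it splits the harmonic oscillator q' = p, p' = -q into A (drift: q' = p, p' = 0) and B (kick: q' = 0, p' = -q), both of which have exact flows. It compares a first-order Lie-Trotter step (one of the two possible orderings) with the symmetric (Strang) composition "half step of A, full step of B, half step of A", measuring the global error at T = 1 for two step sizes.

```python
import math


def drift(q, p, t):
    # Exact flow of A: q' = p, p' = 0.
    return q + t * p, p


def kick(q, p, t):
    # Exact flow of B: q' = 0, p' = -q.
    return q, p - t * q


def lie_trotter(q, p, h):
    # First-order composition: a full kick followed by a full drift.
    q, p = kick(q, p, h)
    return drift(q, p, h)


def strang(q, p, h):
    # Symmetric (second-order) composition: half drift, kick, half drift.
    q, p = drift(q, p, h / 2)
    q, p = kick(q, p, h)
    return drift(q, p, h / 2)


def global_error(step, n_steps, T=1.0):
    """Euclidean error at time T starting from q = 1, p = 0."""
    h = T / n_steps
    q, p = 1.0, 0.0
    for _ in range(n_steps):
        q, p = step(q, p, h)
    # Exact solution of q' = p, p' = -q: q(t) = cos t, p(t) = -sin t.
    return math.hypot(q - math.cos(T), p + math.sin(T))


for method in (lie_trotter, strang):
    e1, e2 = global_error(method, 100), global_error(method, 200)
    print(method.__name__, e1, e2, e1 / e2)
```

Halving the step size should roughly halve the Lie-Trotter error and divide the symmetric composition's error by about four, reflecting first and second order respectively, at the same number of flow evaluations per step up to one extra half drift.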
Basic Analytic Combinatorics of Directed Lattice Paths
Theoretical Computer Science, 2001
"... This paper develops a unified enumerative and asymptotic theory of directed 2dimensional lattice paths in halfplanes and quarterplanes. The lattice paths are speci ed by a finite set of rules that are both time and space homogeneous, and have a privileged direction of increase. (They are then ess ..."
Abstract

Cited by 59 (11 self)
This paper develops a unified enumerative and asymptotic theory of directed 2-dimensional lattice paths in half-planes and quarter-planes. The lattice paths are specified by a finite set of rules that are both time and space homogeneous, and have a privileged direction of increase. (They are then essentially 1-dimensional objects.) The theory relies on a specific "kernel method" that provides an important decomposition of the algebraic generating functions involved, as well as on a generic study of singularities of an associated algebraic curve. Consequences are precise computable estimates for the number of lattice paths of a given length under various constraints (bridges, excursions, meanders) as well as a characterization of the limit laws associated to several basic parameters of paths.
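The constraint classes named above can be counted by brute force for small lengths, which is a handy check on any generating-function result. The sketch below is plain dynamic programming over heights, not the kernel method of the paper; the step set is given as a tuple of vertical displacements (each step advancing one unit in the privileged direction), and the function name is hypothetical. Following the usual terminology, a bridge ends at height 0, a meander never goes below 0, and an excursion does both.

```python
from collections import defaultdict


def count_paths(steps, n):
    """Return (walks, bridges, meanders, excursions) of length n for a step set
    given as vertical displacements, all paths starting at height 0."""
    free = {0: 1}    # unconstrained paths, indexed by current height
    nonneg = {0: 1}  # paths that never go below height 0
    for _ in range(n):
        next_free = defaultdict(int)
        next_nonneg = defaultdict(int)
        for h, c in free.items():
            for s in steps:
                next_free[h + s] += c
        for h, c in nonneg.items():
            for s in steps:
                if h + s >= 0:  # forbid dipping below the axis
                    next_nonneg[h + s] += c
        free, nonneg = next_free, next_nonneg
    return (sum(free.values()), free.get(0, 0),
            sum(nonneg.values()), nonneg.get(0, 0))


dyck = (1, -1)
motzkin = (1, 0, -1)
print(count_paths(dyck, 6))  # (64, 20, 20, 5)
print([count_paths(motzkin, n)[3] for n in range(7)])  # [1, 1, 2, 4, 9, 21, 51]
```

For the Dyck steps (+1, -1) the excursion counts are Catalan numbers, and for the Motzkin steps (+1, 0, -1) they are Motzkin numbers, the kind of sequences whose asymptotics the paper derives in general.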
Uniform Spectral Properties Of One-Dimensional Quasicrystals, IV. Quasi-Sturmian Potentials
I. Absence of eigenvalues, Commun. Math. Phys., 2000
"... We consider discrete onedimensional Schrodinger operators with quasiSturmian potentials. We present a new approach to the trace map dynamical system which is independent of the initial conditions and establish a characterization of the spectrum in terms of bounded trace map orbits. Using this, ..."
Abstract

Cited by 51 (32 self)
We consider discrete one-dimensional Schrödinger operators with quasi-Sturmian potentials. We present a new approach to the trace map dynamical system which is independent of the initial conditions and establish a characterization of the spectrum in terms of bounded trace map orbits. Using this, it is shown that the operators have purely singular continuous spectrum and their spectrum is a Cantor set of Lebesgue measure zero. We also exhibit a subclass having purely α-continuous spectrum. All these results hold uniformly on the hull generated by a given potential.
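As background for the term, a Sturmian sequence has exactly n + 1 distinct factors (blocks) of length n, and a quasi-Sturmian sequence has n + k of them for some fixed k and all sufficiently large n. The Python sketch below generates a Sturmian sequence as a rotation sequence with an irrational slope and counts its factors on a finite prefix. The golden-mean slope, the phase 0 and the prefix length are arbitrary illustrative choices, and the sketch says nothing about the spectral results themselves.

```python
import math


def sturmian(alpha, theta, length):
    """Rotation sequence v_n = floor((n+1)*alpha + theta) - floor(n*alpha + theta)."""
    return [math.floor((n + 1) * alpha + theta) - math.floor(n * alpha + theta)
            for n in range(length)]


def complexity(seq, n):
    """Number of distinct factors (contiguous blocks) of length n in seq."""
    return len({tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)})


alpha = (math.sqrt(5) - 1) / 2  # irrational slope in (0, 1): the golden mean
v = sturmian(alpha, 0.0, 2000)
print([complexity(v, n) for n in range(1, 11)])  # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```

Because the slope lies in (0, 1), the sequence takes only the values 0 and 1, and the factor counts grow by exactly one per unit length, the k = 1 case of the quasi-Sturmian condition.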
Enumeration of Planar Constellations
Adv. in Appl. Math., 2000
"... The enumeration of transitive ordered factorizations of a given permutation is a combinatorial problem related to singularity theory. Let n ? 1, m ? 2, and let oe 0 be a permutation of Sn having d i cycles of length i, for i ? 1. We prove that the number of mtuples (oe 1 ; : : : ; oe m ) of permut ..."
Abstract

Cited by 43 (2 self)
The enumeration of transitive ordered factorizations of a given permutation is a combinatorial problem related to singularity theory. Let $n \ge 1$, $m \ge 2$, and let $\sigma_0$ be a permutation of $S_n$ having $d_i$ cycles of length $i$, for $i \ge 1$. We prove that the number of $m$-tuples $(\sigma_1, \ldots, \sigma_m)$ of permutations of $S_n$ such that:
• $\sigma_1 \sigma_2 \cdots \sigma_m = \sigma_0$,
• the group generated by $\sigma_1, \ldots, \sigma_m$ acts transitively on $\{1, 2, \ldots, n\}$,
• $\sum_{i=0}^{m} c(\sigma_i) = n(m-1) + 2$, where $c(\sigma_i)$ denotes the number of cycles of $\sigma_i$,
is

$$m \, \frac{[(m-1)n - 1]!}{[(m-1)n - c(\sigma_0) + 2]!} \prod_{i \ge 1} \left[ i \binom{mi-1}{i} \right]^{d_i}.$$

A one-to-one correspondence relates these $m$-tuples to some rooted planar maps, which we call constellations and enumerate via a bijection with some bicolored trees. For $m = 2$, we recover a formula of Tutte for the number of Eulerian maps. The proof extends the method applied in [21] to the latter case, and relies on the idea...
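The counting formula can be checked exhaustively for very small n. The Python sketch below enumerates all m-tuples whose product is σ_0 (the last factor is forced by the others), keeps those satisfying the transitivity and cycle-count conditions, and compares the total with the closed form. Permutations are tuples of images of 0..n-1 and the product is composition of maps; the helper names are hypothetical, and brute force is only practical for n ≤ 4 or so.

```python
from fractions import Fraction
from itertools import permutations, product
from math import comb, factorial


def compose(p, q):
    # (p * q)(x) = p(q(x)); permutations are tuples of images of 0..n-1.
    return tuple(p[q[x]] for x in range(len(q)))


def inverse(p):
    inv = [0] * len(p)
    for x, y in enumerate(p):
        inv[y] = x
    return tuple(inv)


def cycle_lengths(p):
    """Lengths of the cycles of p, one entry per cycle."""
    seen, lengths = set(), []
    for start in range(len(p)):
        if start not in seen:
            length, x = 0, start
            while x not in seen:
                seen.add(x)
                x = p[x]
                length += 1
            lengths.append(length)
    return lengths


def is_transitive(perms, n):
    # Orbit of 0 under the generated group (finite, so inverses are not needed).
    orbit, stack = {0}, [0]
    while stack:
        x = stack.pop()
        for p in perms:
            if p[x] not in orbit:
                orbit.add(p[x])
                stack.append(p[x])
    return len(orbit) == n


def brute_force(sigma0, m):
    n = len(sigma0)
    target = n * (m - 1) + 2
    count = 0
    for head in product(permutations(range(n)), repeat=m - 1):
        running = tuple(range(n))
        for s in head:
            running = compose(running, s)
        last = compose(inverse(running), sigma0)  # forces sigma_1 ... sigma_m = sigma_0
        factors = head + (last,)
        cycles = sum(len(cycle_lengths(s)) for s in (sigma0,) + factors)
        if cycles == target and is_transitive(factors, n):
            count += 1
    return count


def formula(sigma0, m):
    n = len(sigma0)
    lengths = cycle_lengths(sigma0)
    value = Fraction(m * factorial((m - 1) * n - 1),
                     factorial((m - 1) * n - len(lengths) + 2))
    for i in lengths:  # one factor per cycle, i.e. exponent d_i for length i
        value *= i * comb(m * i - 1, i)
    assert value.denominator == 1
    return int(value)


for n in range(1, 5):
    for m in (2, 3):
        for sigma0 in permutations(range(n)):
            assert brute_force(sigma0, m) == formula(sigma0, m)
```

For m = 2 and σ_0 an n-cycle, both sides give the Catalan number of n (for example 14 when n = 4), a familiar special case of the planar (genus 0) condition.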
Transcendence of Sturmian or morphic continued fractions
J. Number Theory
"... Communicated byM. Waldschmidt ..."
Efficient detection of unusual words
J. Comp. Biol., 2000
"... Words that are, by some measure, over or underrepresented in the context of larger sequences have been variously implicated in biological functions and mechanisms. In most approaches to such anomaly detections, the words (up to a certain length) are enumerated more or less exhaustively and are indi ..."
Abstract

Cited by 37 (8 self)
Words that are, by some measure, over- or under-represented in the context of larger sequences have been variously implicated in biological functions and mechanisms. In most approaches to such anomaly detection, the words (up to a certain length) are enumerated more or less exhaustively and are individually checked in terms of observed and expected frequencies, variances, and scores of discrepancy and significance thereof. Here we take the global approach of annotating the suffix tree of a sequence with some such values and scores, having in mind to use it as a collective detector of all unexpected behaviors, or perhaps just as a preliminary filter for words suspicious enough to undergo more accurate scrutiny. We consider in depth the simple probabilistic model in which sequences are produced by a random source emitting symbols from a known alphabet independently and according to a given distribution. Our main result shows that, within this model, full tree annotations can be carried out in a time-and-space optimal fashion for the mean, variance and some of the adopted measures of significance. This result is achieved by an ad hoc embedding in statistical expressions of the combinatorial structure of the periods of a string. Specifically,
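For a single word, the mean and variance of its overlapping occurrence count under the i.i.d. model can be computed from the periods of the word, the combinatorial ingredient mentioned above. The sketch below handles one word at a time with exact fractions; it does not build or annotate a suffix tree, and it uses the standard decomposition of the variance into single-position terms plus covariances of overlapping start positions. Function names are hypothetical, and an exhaustive enumeration over all short sequences is included as a check.

```python
from fractions import Fraction
from itertools import product


def periods(word):
    """Periods d (1 <= d < len(word)) with word[i] == word[i + d] for all valid i."""
    m = len(word)
    return [d for d in range(1, m) if word[d:] == word[:m - d]]


def prob(s, p):
    """Probability of the string s under i.i.d. letter probabilities p."""
    result = Fraction(1)
    for ch in s:
        result *= p[ch]
    return result


def mean_and_variance(word, n, p):
    """Exact mean and variance of the number of (overlapping) occurrences of
    word in a random i.i.d. sequence of length n."""
    m = len(word)
    M = n - m + 1  # number of possible start positions
    if M <= 0:
        return Fraction(0), Fraction(0)
    pw = prob(word, p)
    mean = M * pw
    var = M * pw * (1 - pw)  # variances of the individual indicators
    per = set(periods(word))
    for d in range(1, min(m, M)):
        # Starts i and i + d overlap (d < m): both occur only if d is a period,
        # and then the extra letters are the last d letters of the word.
        joint = pw * prob(word[m - d:], p) if d in per else Fraction(0)
        var += 2 * (M - d) * (joint - pw * pw)
    # Starts at distance >= m are independent and contribute no covariance.
    return mean, var


def exact_by_enumeration(word, n, p):
    """Mean and variance by summing over all |alphabet|**n sequences (tiny n only)."""
    m = len(word)
    mean = second = Fraction(0)
    for letters in product(p, repeat=n):
        s = "".join(letters)
        count = sum(s[i:i + m] == word for i in range(n - m + 1))
        weight = prob(s, p)
        mean += weight * count
        second += weight * count * count
    return mean, second - mean * mean


uniform = {"a": Fraction(1, 2), "b": Fraction(1, 2)}
print(periods("aba"), mean_and_variance("aba", 10, uniform))
```

Only the set of periods and the probabilities of word suffixes enter the covariance terms, which is what makes it possible to share this work across many words at once.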