Results 1-9 of 9
On the power of profiles for transcription factor binding site detection
Statistical Applications in Genetics and Molecular Biology, 2003
Abstract

Cited by 33 (7 self)
Transcription factor binding site (TFBS) detection plays an important role in computational biology, with applications in gene finding and gene regulation. The sites are often modeled by gapless profiles, also known as position weight matrices. Past research has focused on the significance of profile scores (the ability to avoid false positives), but this alone is not enough: the profile must also have the power to detect the true positive signals. Several complete genomes are now available, and the search for TFBSs is moving to a large scale, so discriminating signal from noise becomes even more challenging. Since TFBS profiles are usually estimated from only a few experimentally confirmed instances, careful regularization is important. We present a novel method that is well suited to this situation. We further develop measures that help in judging profile quality, based on both the sensitivity and the selectivity of a profile. We show that these quality measures can be computed efficiently, and we propose statistically well-founded methods for choosing score thresholds. Our findings are applied to the TRANSFAC database of transcription factor binding sites. The results are disturbing: if we insist on a significance level of 5% in sequences of length 500, only 19% of the profiles detect a true signal instance with 95% success probability under varying background sequence compositions.
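The abstract contrasts two properties of a profile. Significance means avoiding false positives under a background model. Power means detecting true sites. The sketch below makes both concrete for a single word of profile width and rests on several assumptions:

- an i.i.d. background;
- simple pseudocount smoothing as the regularizer (a standard technique, not the novel method the paper presents);
- the profile's own smoothed column frequencies as the signal model;
- log-odds scores rounded to integers, so that the exact score distribution can be computed by dynamic programming over the columns.

The site data, the function names and the parameters (`pseudocount`, `scale`, `alpha`) are hypothetical.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def column_probs(sites, pseudocount=0.5):
    """Per-column letter probabilities of aligned, equal-length sites, smoothed with pseudocounts."""
    cols = []
    for j in range(len(sites[0])):
        counts = {a: pseudocount for a in ALPHABET}
        for s in sites:
            counts[s[j]] += 1
        total = sum(counts.values())
        cols.append({a: counts[a] / total for a in ALPHABET})
    return cols


def log_odds_pwm(cols, background, scale=10):
    """Integer-rounded log-odds scores, so the total score takes few distinct values."""
    return [{a: round(scale * math.log2(col[a] / background[a])) for a in ALPHABET}
            for col in cols]


def score(pwm, word):
    """Profile score of a word whose length equals the profile width."""
    return sum(col[c] for col, c in zip(pwm, word))


def score_distribution(pwm, letter_probs):
    """Exact score distribution when the letter at position j is drawn from letter_probs[j], independently."""
    dist = {0: 1.0}
    for col, probs in zip(pwm, letter_probs):
        nxt = defaultdict(float)
        for s, p in dist.items():
            for a in ALPHABET:
                nxt[s + col[a]] += p * probs[a]
        dist = dict(nxt)
    return dist


def tail(dist, t):
    """P(score >= t) under the given distribution."""
    return sum(p for s, p in dist.items() if s >= t)


def significance_threshold(pwm, background, alpha):
    """Smallest threshold whose background tail probability is at most alpha."""
    null = score_distribution(pwm, [background] * len(pwm))
    for t in sorted(null):
        if tail(null, t) <= alpha:
            return t
    return max(null) + 1


# Hypothetical toy sites and a uniform background (illustration only).
sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT", "TAAAAT"]
bg = {a: 0.25 for a in ALPHABET}
cols = column_probs(sites)
pwm = log_odds_pwm(cols, bg)
t = significance_threshold(pwm, bg, alpha=1e-3)
null = score_distribution(pwm, [bg] * len(pwm))
signal = score_distribution(pwm, cols)  # assumed signal model: the profile's own columns
print(t, tail(null, t), tail(signal, t))  # threshold, false-positive rate, sensitivity
```

The threshold above controls false positives for one word only. Reaching a 5% level for a whole sequence of length 500 would also require accounting for the roughly 500 positions scanned, which this sketch does not attempt.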
Partial Words and the Critical Factorization Theorem
J. Combin. Theory Ser. A, 2007
Abstract

Cited by 10 (6 self)
The study of combinatorics on words, or finite sequences of symbols from a finite alphabet, finds applications in several areas of biology, computer science, mathematics, and physics. Molecular biology, in particular, has stimulated considerable interest in the study of combinatorics on partial words, that is, sequences that may contain a number of “do not know” symbols, also called “holes”. This paper is devoted to a fundamental result on periods of words, the Critical Factorization Theorem, which states that the period of a word is always locally detectable in at least one position of the word, resulting in a corresponding critical factorization. Here, we describe precisely the class of partial words w with one hole for which the weak period is locally detectable in at least one position of w. Our proof provides an algorithm that computes a critical factorization when one exists. A World Wide Web server interface at
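The definitions behind the Critical Factorization Theorem can be checked directly on small full words (no holes). The brute-force sketch below computes three things:

- the period of a word;
- the local period at each cut w[:i] | w[i:];
- the first critical factorization.

It also checks the theorem exhaustively on short binary words. This is a polynomial-time illustration of the classical statement, not the paper's algorithm for partial words with one hole. All function names are hypothetical.

```python
from itertools import product


def period(w):
    """Smallest p >= 1 such that w[j] == w[j + p] for every valid j."""
    n = len(w)
    for p in range(1, n + 1):
        if all(w[j] == w[j + p] for j in range(n - p)):
            return p
    return n


def local_period(w, i):
    """Smallest repetition length p at the cut w[:i] | w[i:], for 0 < i < len(w).

    p qualifies when some word r of length p is suffix-comparable with w[:i]
    and prefix-comparable with w[i:]; equivalently, w[i - p + k] == w[i + k]
    for every k in range(p) whose two indices both fall inside w.
    """
    n = len(w)
    for p in range(1, n + 1):
        if all(w[i - p + k] == w[i + k]
               for k in range(p)
               if i - p + k >= 0 and i + k < n):
            return p
    return n


def critical_factorization(w):
    """First cut whose local period equals the global period, or None if len(w) < 2."""
    p = period(w)
    for i in range(1, len(w)):
        if local_period(w, i) == p:
            return w[:i], w[i:]
    return None


def theorem_holds_up_to(max_len, alphabet="ab"):
    """Brute-force check that max local period == period for all words of length 2..max_len."""
    for n in range(2, max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if max(local_period(w, i) for i in range(1, n)) != period(w):
                return False
    return True


print([local_period("abaab", i) for i in range(1, 5)])  # [2, 3, 1, 3]
print(critical_factorization("abaab"))                    # ('ab', 'aab')
```

For "abaab" the period is 3. The cuts after positions 2 and 4 are both critical, and the sketch reports the first one.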
Linear-time computation of local periods
Theoret. Comput. Sci.
Abstract

Cited by 3 (0 self)
Abstract. We present a linear-time algorithm for computing all local periods of a given word. This subsumes, and is substantially more powerful than, both the computation of the (global) period of the word and the computation of a critical factorization, whose existence is implied by the Critical Factorization Theorem.
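The global period alone can already be computed in linear time with the classical Knuth-Morris-Pratt border (failure) table. This uses the fact that the period of w equals |w| minus the length of its longest proper border. The sketch below covers only this global-period special case. It is not the authors' algorithm for all local periods, and the function names are hypothetical.

```python
def border_table(w):
    """border[i] = length of the longest proper border (prefix that is also a suffix) of w[:i + 1]."""
    border = [0] * len(w)
    k = 0
    for i in range(1, len(w)):
        # Fall back through shorter borders until the next letter can extend one.
        while k > 0 and w[i] != w[k]:
            k = border[k - 1]
        if w[i] == w[k]:
            k += 1
        border[i] = k
    return border


def global_period(w):
    """Period of w in O(len(w)) time: len(w) minus its longest proper border."""
    if not w:
        return 0
    return len(w) - border_table(w)[-1]


print([global_period(w) for w in ["abaab", "aaaa", "abcabcab", "abcd"]])  # [3, 1, 3, 4]
```

The while loop runs in amortized constant time per letter, because k increases by at most one per iteration of the outer loop and every fallback strictly decreases it.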
Equations on partial words
MFCS 2006, 31st International Symposium on Mathematical Foundations of Computer Science, Lecture Notes in Computer Science, 2006
Abstract

Cited by 3 (2 self)
It is well known that some of the most basic properties of words, like commutativity ($xy = yx$) and conjugacy ($xz = zy$), can be expressed as solutions of word equations. An important problem is to decide whether or not a given equation on words has a solution. For instance, the equation $x^m y^n = z^p$ has only periodic solutions in a free monoid; that is, if $x^m y^n = z^p$ holds with integers $m, n, p \geq 2$, then there exists a word $w$ such that $x$, $y$, $z$ are powers of $w$. This result, which has received a lot of attention, was first proved by Lyndon and Schützenberger for free groups. In this paper, we investigate equations on partial words. Partial words are sequences over a finite alphabet that may contain a number of “do not know” symbols. When we speak about equations on partial words, we replace the notion of equality ($=$) with compatibility ($\uparrow$). Among other equations, we solve $xy \uparrow yx$, $xz \uparrow zy$, and special cases of $x^m y^n \uparrow z^p$ for integers $m, n, p \geq 2$.
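For partial words, compatibility replaces equality. Two partial words of the same length are compatible when, at every position, their letters agree or at least one of them is a hole. The sketch below checks the compatibility forms of the commutativity and conjugacy equations for concrete words. It tests given x, y, z rather than solving the equations. The hole symbol `?` and the function names are assumptions made for illustration.

```python
HOLE = "?"  # assumed hole ("do not know") symbol for this illustration


def compatible(u, v):
    """u ↑ v: equal length, and at each position the letters agree or one is a hole."""
    return len(u) == len(v) and all(a == b or HOLE in (a, b) for a, b in zip(u, v))


def commute_compatibly(x, y):
    """Does xy ↑ yx hold for these particular partial words?"""
    return compatible(x + y, y + x)


def conjugate_compatibly(x, z, y):
    """Does xz ↑ zy hold for these particular partial words?"""
    return compatible(x + z, z + y)


print(commute_compatibly("ab", "abab"))       # True: full words that commute
print(commute_compatibly("a?", "ab"))         # True: "a?ab" ↑ "aba?"
print(commute_compatibly("ab", "ba"))         # False: "abba" vs "baab"
print(conjugate_compatibly("a?", "a", "b?"))  # True: "a?a" ↑ "ab?"
# Compatibility is not transitive: a ↑ ? and ? ↑ b, yet not a ↑ b.
print(compatible("a", "?"), compatible("?", "b"), compatible("a", "b"))  # True True False
```

Because compatibility is not transitive, arguments that chain equalities between full words cannot simply be reused for partial words.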
On the Distribution of the Number of Missing Words in Random Texts
2003
Abstract

Cited by 2 (0 self)
Introduction. Let X be the random number of missing words of length $q$ (also called q-grams) in a random text of length $n+q-1$ over an alphabet $\Sigma$ of size $\sigma$. The underlying probability space is $(\Sigma^{n+q-1}, \mathcal{P}(\Sigma^{n+q-1}), \mathbb{P})$, where $\mathbb{P}$ is the uniform distribution on $\Sigma^{n+q-1}$. Let Y be the analogous random variable for independently drawn q-grams; this is classically interpreted as the number of empty urns after $n$ balls have been thrown randomly and independently into $\sigma^q$ urns. It is interesting to compare the laws of X and Y. In both cases, $n$ q-grams are randomly drawn. The difference is that Y counts missing q-grams when the q-grams are drawn independently, and X counts them when they are drawn from a single text of length $n+q-1$, such that two successive q-grams overlap by $q-1$ characters. The law of Y is quite well understood (for example, see [7, 14]), but the law of X has received little attention so far. An algorithmic approach to compute the expectation has
(E.R. is supported by the Programme inter-EPST BioInformatique 2000, the Genopole from Montpellier, and by the University of Montpellier II.)
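In the urn model, each of the $\sigma^q$ urns stays empty with probability $(1 - \sigma^{-q})^n$. By linearity of expectation, $E[Y] = \sigma^q (1 - \sigma^{-q})^n$. The sketch below compares this with $E[X]$, which it obtains by brute-force enumeration of every text under the uniform model described above. For $q = 1$ the q-grams do not overlap and the two expectations coincide; for $q \geq 2$ the overlap can make them differ. This is not the algorithmic approach the paper refers to, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def expected_missing_independent(sigma, q, n):
    """E[Y]: expected number of empty urns after n balls are thrown into sigma**q urns."""
    m = sigma ** q
    return m * (1 - Fraction(1, m)) ** n


def expected_missing_in_text(alphabet, q, n):
    """E[X] by enumerating every text of length n + q - 1 (feasible only for tiny parameters)."""
    sigma, length = len(alphabet), n + q - 1
    total = 0
    for text in product(alphabet, repeat=length):
        present = {text[i:i + q] for i in range(n)}  # the n overlapping q-grams
        total += sigma ** q - len(present)
    return Fraction(total, sigma ** length)


print(expected_missing_independent(2, 2, 3))  # 27/16
print(expected_missing_in_text("ab", 2, 3))   # 13/8, i.e. 26/16
```

Exact rational arithmetic avoids rounding, so that a small difference such as 26/16 versus 27/16 is not blurred.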
Using reads to annotate the genome: influence of length, background distribution, and sequence errors on prediction capacity
2009
Abstract
Using reads to annotate the genome: influence of length, background distribution, and sequence errors on prediction capacity
Using reads to annotate the genome: influence of length, background distribution, and sequence errors on prediction capacity
2009
Abstract
doi:10.1093/nar/gkp492 Using reads to annotate the genome: influence of length, background distribution, and sequence errors on prediction capacity
Analysis of an exhaustive search algorithm in random graphs and the $n^{c \log n}$-asymptotics
2012
Abstract
Analysis of an exhaustive search algorithm in random graphs and the $n^{c \log n}$-asymptotics