Finding Maximal Repetitions in a Word in Linear Time
 In Symposium on Foundations of Computer Science
, 1999
Cited by 50 (4 self)
A repetition in a word w is a subword with the period of at most half of the subword length. We study maximal repetitions occurring in w, that is, those for which any extended subword of w has a bigger period. The set of such repetitions represents in a compact way all repetitions in w. We first prove a combinatorial result asserting that the sum of exponents of all maximal repetitions of a word of length n is bounded by a linear function in n. This implies, in particular, that there is only a linear number of maximal repetitions in a word. This allows us to construct a linear-time algorithm for finding all maximal repetitions. Some consequences and applications of these results are discussed, as well as related works.
Algorithms for Discovering Repeated Patterns in Multidimensional Representations of Polyphonic Music
, 2003
Cited by 47 (14 self)
In this paper we give an overview of four algorithms that we have developed for pattern matching, pattern discovery and data compression in multidimensional datasets. We show that these algorithms can fruitfully be used for processing musical data. In particular, we show that our algorithms can discover instances of perceptually significant musical repetition that cannot be found using previous approaches. We also describe results that suggest the possibility of using our data-compression algorithm for modelling expert motivic-thematic music analysis.
Algorithms for Computing Approximate Repetitions in Musical Sequences
 International Journal of Computer Mathematics
, 1999
Cited by 44 (14 self)
Here we introduce two new notions of approximate matching with application in computer-assisted music analysis. We present algorithms for each notion of approximation: for approximate string matching and for computing approximate squares.
On Maximal Repetitions in Words
 J. Discrete Algorithms
, 1999
Cited by 38 (5 self)
A (fractional) repetition in a word w is a subword with the period of at most half of the subword length. We study maximal repetitions occurring in w, that is, those for which any extended subword of w has a bigger period. The set of such repetitions represents in a compact way all repetitions in w. We first study maximal repetitions in Fibonacci words: we count their exact number, and estimate the sum of their exponents. These quantities turn out to be linearly bounded in the length of the word. We then prove that the maximal number of maximal repetitions in general words (on arbitrary alphabet) of length n is linearly bounded in n, and we mention some applications and consequences of this result.
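The definition in this abstract can be made concrete with a small brute-force sketch. It is not the linear-time algorithm from the papers listed here; it is a quadratic scan, written only to illustrate what "maximal repetition" means. The function name and the (start, end, period) output format are assumptions made for this illustration.

```python
def maximal_repetitions(w):
    """Return sorted (start, end, period) triples, with end exclusive.

    Brute-force illustration (hypothetical helper, not the papers' method):
    a maximal repetition is a subword w[start:end] whose smallest period p
    satisfies end - start >= 2 * p and which cannot be extended to the
    left or to the right without losing period p.
    """
    n = len(w)
    found = {}  # (start, end) -> smallest period seen for that interval
    for p in range(1, n // 2 + 1):
        k = 0
        while k + p < n:
            if w[k] != w[k + p]:
                k += 1
                continue
            i = k
            # Extend the stretch of positions k with w[k] == w[k + p].
            while k + p < n and w[k] == w[k + p]:
                k += 1
            # w[i : k + p] has period p and is not extendable on either side.
            if k - i >= p:  # length (k - i) + p is at least 2 * p
                # Periods are visited in increasing order, so the first
                # period recorded for an interval is its smallest one.
                found.setdefault((i, k + p), p)
    return sorted((s, e, p) for (s, e), p in found.items())


print(maximal_repetitions("mississippi"))
```

For "mississippi" this reports the subword "ississi" (period 3) together with the three short repetitions "ss", "ss" and "pp" (period 1).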
Linear Time Algorithms for Finding and Representing all Tandem Repeats in a String
 TREES, AND SEQUENCES: COMPUTER SCIENCE AND COMPUTATIONAL BIOLOGY
, 1998
Cited by 34 (2 self)
A tandem repeat (or square) is a string $\alpha\alpha$, where $\alpha$ is a nonempty string. We present an $O(|S|)$-time algorithm that operates on the suffix tree $T(S)$ for a string $S$, finding and marking the endpoint in $T(S)$ of every tandem repeat that occurs in $S$. This decorated suffix tree implicitly represents all occurrences of tandem repeats in $S$, and can be used to efficiently solve many questions concerning tandem repeats and tandem arrays in $S$. This improves and generalizes several prior efforts to efficiently capture large subsets of tandem repeats.
Finding maximal pairs with bounded gap
 Proceedings of the 10th Annual Symposium on Combinatorial Pattern Matching (CPM), volume 1645 of Lecture Notes in Computer Science
, 1999
Cited by 26 (6 self)
A pair in a string is the occurrence of the same substring twice. A pair is maximal if the two occurrences of the substring cannot be extended to the left and right without making them different. The gap of a pair is the number of characters between the two occurrences of the substring. In this paper we present methods for finding all maximal pairs under various constraints on the gap. In a string of length n we can find all maximal pairs with gap in an interval bounded from above and below in time O(n log n + z), where z is the number of reported pairs. If the upper bound is removed, the time reduces to O(n + z). Since a tandem repeat is a pair where the gap is zero, our methods can be seen as a generalization of finding tandem repeats. The running time of our methods equals the running time of well-known methods for finding tandem repeats.
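The notions of maximal pair and gap in this abstract can be illustrated with a naive enumeration. This is a sketch under stated assumptions, not the O(n log n + z) method of the paper: it checks every pair of start positions directly, the function name and the (i, j, length) output format are hypothetical, and only non-negative gaps are reported by default.

```python
def maximal_pairs(s, min_gap=0, max_gap=None):
    """Return (i, j, length) triples with s[i:i+length] == s[j:j+length].

    Naive illustration (hypothetical helper, roughly cubic time): a pair is
    kept only if it cannot be extended to the left or to the right, and if
    its gap j - (i + length) lies in [min_gap, max_gap].
    """
    n = len(s)
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            # Skip pairs that could be extended one character to the left.
            if i > 0 and s[i - 1] == s[j - 1]:
                continue
            # Longest common extension to the right gives right-maximality.
            length = 0
            while j + length < n and s[i + length] == s[j + length]:
                length += 1
            if length == 0:
                continue
            gap = j - (i + length)
            if gap >= min_gap and (max_gap is None or gap <= max_gap):
                pairs.append((i, j, length))
    return pairs


print(maximal_pairs("abcab"))  # the two occurrences of "ab", gap 1
print(maximal_pairs("abab"))   # gap 0, i.e. the tandem repeat "abab"
```

Passing max_gap=0 restricts the output to pairs with gap zero, which matches the abstract's remark that a tandem repeat is a pair whose gap is zero.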
Finding approximate repetitions under Hamming distance
 Theoretical Computer Science
, 2001
Cited by 25 (1 self)
The problem of computing tandem repetitions with K possible mismatches is studied. Two main definitions are considered, and for both of them an O(nK log K + S) algorithm is proposed (where S is the size of the output). This improves, in particular, the bound obtained in [LS93]. Finally, other possible definitions are briefly analyzed.
How Many Squares Can a String Contain?
, 1998
Cited by 21 (0 self)
All our words (strings) are over a fixed alphabet. A square is a subword of the form $uu = u^2$, where $u$ is a nonempty word. Two squares are distinct if they are of different shape, not just translates of each other. A word $u$ is primitive if $u$ cannot be written in the form $u = v^j$ for some $j \ge 2$. A square $u^2$ with $u$ primitive is primitive rooted. Let $M(n)$ denote the maximum number of distinct squares, $P(n)$ the number of distinct primitive rooted squares in a word of length $n$. We prove: no position in any word can be the beginning of the rightmost occurrence of more than two squares, from which we deduce $M(n) < 2n$ for all $n > 0$, and $P(n) = n - o(n)$ for infinitely many $n$. 1. Introduction. We consider words (strings, sequences) over a fixed alphabet throughout. A square is a pair of identical adjacent subwords, such as $1011010110 = 10110^2$ over the binary alphabet. Two squares are distinct if they are of different shape, not just translates of each other. Denote by M (n...
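As a small illustration of "distinct squares" in this abstract, the sketch below collects the set of different square shapes in a word by direct comparison of adjacent halves. It is a brute-force check, not a method from the paper, and the function name is an assumption made for this example.

```python
def distinct_squares(w):
    """Return the set of distinct squares uu that occur as subwords of w.

    Brute-force illustration (hypothetical helper): two occurrences of the
    same shape count once, because the results are stored in a set.
    """
    n = len(w)
    return {
        w[i:i + 2 * h]
        for h in range(1, n // 2 + 1)          # h = length of the root u
        for i in range(n - 2 * h + 1)          # i = start of the square
        if w[i:i + h] == w[i + h:i + 2 * h]    # adjacent halves are equal
    }


word = "1011010110"  # the example square from the abstract
squares = distinct_squares(word)
print(sorted(squares, key=len))
print(len(squares) < 2 * len(word))
```

On the abstract's example word the set contains five shapes, including the word itself, which is well under the stated bound of 2n = 20.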
Identifying Satellites and Periodic Repetitions in Biological Sequences
, 1998
Cited by 14 (2 self)
We present in this paper an algorithm for identifying satellites in DNA sequences. Satellites (simple, micro, or mini) are repeats in number between 30 and as many as 1,000,000 whose lengths vary between 2 and hundreds of base pairs and that appear, with some mutations, in tandem along the sequence. We concentrate here on short to moderately long (up to 30-40 base pairs) approximate tandem repeats where copies may differ up to $\epsilon$ = 15-20% from a consensus model of the repeating unit (implying individual units may vary by $2\epsilon$ from each other). The algorithm is composed of two parts. The first one consists of a filter that basically eliminates all regions whose probability of containing a satellite is less than one in $10^4$ when $\epsilon$ = 10%. The second part realizes an exhaustive exploration of the space of all possible models for the repeating units present in the sequence. It therefore has the advantage over previous work of being able to report a consensus model, say m, of the repe...
A characterization of the Squares in a Fibonacci string
 THEORETICAL COMPUTER SCIENCE
Cited by 13 (0 self)
A (finite) Fibonacci string $F_n$ is defined as follows: $F_0 = b$, $F_1 = a$; for every integer $n \ge 2$, $F_n = F_{n-1} F_{n-2}$. For $n \ge 1$, the length of $F_n$ is denoted by $f_n = |F_n|$. The infinite Fibonacci string $F$ is the string which contains every $F_n$, $n \ge 1$, as a prefix. Apart from their general theoretical importance, Fibonacci strings are often cited as worst case examples for algorithms which compute all the repetitions or all the "Abelian squares" in a given string. In this paper we provide a characterization of all the squares in $F$, hence in every prefix $F_n$; this characterization naturally gives rise to a $\Theta(f_n)$ algorithm which specifies all the squares of $F_n$ in an appropriate encoding. This encoding is made possible by the fact that the squares of $F_n$ occur consecutively, in "runs", the number of which is $\Theta(f_n)$. By contrast, the known general algorithms for the computation of the repetitions in an arbitrary string require $\Theta(f_n \log f_n)$ time (and pro...
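The recurrence for Fibonacci strings in this abstract is easy to try out directly. The following is a minimal sketch, assuming the convention that the string with index 0 is "b" and the string with index 1 is "a", each later string being the previous one followed by the one before it; the function name is hypothetical.

```python
def fibonacci_string(n):
    """Return the n-th finite Fibonacci string.

    Convention assumed from the abstract: index 0 gives "b", index 1 gives
    "a", and every later string is the previous string followed by the
    string before that.
    """
    prev, cur = "b", "a"
    if n == 0:
        return prev
    for _ in range(n - 1):
        prev, cur = cur, cur + prev
    return cur


for n in range(6):
    print(n, fibonacci_string(n))

# Each string (from index 1 on) is a prefix of the next one, which is why
# an infinite Fibonacci string containing all of them as prefixes exists.
print(all(fibonacci_string(n + 1).startswith(fibonacci_string(n))
          for n in range(1, 12)))
```

The lengths of these strings follow the Fibonacci numbers (1, 1, 2, 3, 5, 8, ...), so the string grows exponentially in its index and small indices are enough for experiments.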