The enhanced suffix array and its applications to genome analysis
In Proc. Workshop on Algorithms in Bioinformatics, Lecture Notes in Computer Science, 2002
Cited by 43 (5 self)
Abstract. In large-scale applications such as computational genome analysis, the space requirement of the suffix tree is a severe drawback. In this paper, we present a uniform framework that enables us to systematically replace every string processing algorithm that is based on a bottom-up traversal of a suffix tree by a corresponding algorithm based on an enhanced suffix array (a suffix array enhanced with the lcp-table). In this framework, we will show how maximal, supermaximal, and tandem repeats, as well as maximal unique matches, can be efficiently computed. Because enhanced suffix arrays require much less space than suffix trees, very large genomes can now be indexed and analyzed, a task which was not feasible before. Experimental results demonstrate that our programs require not only less space but also much less time than other programs developed for the same tasks.
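As a rough illustration of the data structure named in this abstract, the sketch below builds a suffix array together with its lcp-table and uses the pair to find a longest repeated substring. It is not the paper's implementation: the function names are hypothetical, the suffix array is built by naive sorting, and the lcp-table is filled with the standard Kasai et al. scan.

```python
def suffix_array(s):
    # Naive construction: sort suffix start positions by the suffix text.
    # Quadratic or worse in practice; for illustration only.
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_table(s, sa):
    # lcp[k] = length of the longest common prefix of the suffixes
    # starting at sa[k-1] and sa[k]; lcp[0] is defined as 0.
    n = len(s)
    rank = [0] * n
    for k, i in enumerate(sa):
        rank[i] = k
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp


def longest_repeat(s):
    # A longest repeated substring is the common prefix of two adjacent
    # suffixes in the suffix array with the largest lcp value.
    sa = suffix_array(s)
    lcp = lcp_table(s, sa)
    if not lcp:
        return ""
    k = max(range(len(lcp)), key=lcp.__getitem__)
    return s[sa[k]:sa[k] + lcp[k]]
```

For the string "banana", the suffix array is [5, 3, 1, 0, 4, 2], the lcp-table is [0, 1, 3, 0, 0, 2], and the longest repeat is "ana".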
Finding approximate repetitions under Hamming distance
Theoretical Computer Science, 2001
Cited by 25 (1 self)
The problem of computing tandem repetitions with K possible mismatches is studied. Two main definitions are considered, and for both of them an O(nK log K + S) algorithm is proposed (S the size of the output). This improves, in particular, the bound obtained in [LS93]. Finally, other possible definitions are briefly analyzed.
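The abstract does not spell out its two definitions, so the sketch below assumes one plausible reading: a tandem repetition with K mismatches is a factor uv with |u| = |v| = p whose halves differ in at most K positions (Hamming distance). The brute-force search is only meant to make that definition concrete; it is not the O(nK log K + S) algorithm, and the names `hamming`, `approx_squares`, and `min_period` are hypothetical.

```python
def hamming(u, v):
    # Number of positions at which two equal-length strings differ.
    return sum(a != b for a, b in zip(u, v))


def approx_squares(w, k, min_period=1):
    # Report every (start, period) such that the two adjacent blocks
    # w[start:start+period] and w[start+period:start+2*period]
    # differ in at most k positions. Brute force over all candidates.
    n = len(w)
    found = []
    for p in range(min_period, n // 2 + 1):
        for i in range(n - 2 * p + 1):
            if hamming(w[i:i + p], w[i + p:i + 2 * p]) <= k:
                found.append((i, p))
    return found
```

With w = "abcabd", k = 1 and min_period = 3, the only hit is (0, 3), since "abc" and "abd" differ in one position. The `min_period` parameter is there because any period p ≤ k matches trivially.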
Finding approximate tandem repeats in genomic sequences
J. Comp. Biol., 2005
Cited by 11 (0 self)
An efficient algorithm is presented for detecting approximate tandem repeats in genomic sequences. The algorithm is based on a flexible statistical model which allows a wide range of definitions of approximate tandem repeats. The ideas and methods underlying the algorithm are described and examined, and its effectiveness on genomic data is demonstrated.
New lower bounds for the maximum number of runs in a string
In Proc. Prague Stringology Conference (PSC’08), 2008
Cited by 10 (2 self)
Abstract. We show a new lower bound for the maximum number of runs in a string. We prove that for any ε > 0, (α − ε)n is an asymptotic lower bound, where α = 174719/184973 ≈ 0.944565. It is superior to the previous bound 3/(1 + √5) ≈ 0.927 given by Franěk et al. [6,7]. Moreover, our construction of the strings and the proof is much simpler than theirs.
Finding Repeats With Fixed Gap
In: Proc. of the 7th Int’l Symp. on String Processing and Information Retrieval (SPIRE). Washington: IEEE Computer Society, 2000
Cited by 7 (2 self)
We propose an algorithm for finding in a word all pairs of occurrences of the same subword with a given distance r between them. The obtained complexity is O(n log r + S), where S is the size of the output. We also show how the algorithm can be modified in order to find all such pairs of occurrences separated by a given word. The solution uses an algorithm for finding all quasisquares in two strings, a problem that generalizes the known problem of searching for squares.
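To make the problem statement concrete, here is a brute-force sketch that lists all pairs of occurrences of the same subword separated by exactly r letters. It assumes that "distance r" means the number of letters between the end of the first occurrence and the start of the second; the paper may measure it differently. It is far slower than the O(n log r + S) bound and the function name is hypothetical.

```python
def repeats_with_gap(w, r):
    # Report every (start, length) such that the subword
    # w[start:start+length] occurs again beginning exactly r letters
    # after its first occurrence ends.
    n = len(w)
    found = []
    for i in range(n):
        length = 1
        while i + 2 * length + r <= n:
            second = i + length + r
            if w[i:i + length] == w[second:second + length]:
                found.append((i, length))
            length += 1
    return found
```

For w = "abxab" and r = 1, the only result is (0, 2): "ab" at position 0 and again at position 3, with the single letter "x" in between. Setting r = 0 reduces the problem to finding ordinary squares.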
Computing longest previous factor in linear time and applications
Cited by 7 (4 self)
Abstract. We give two optimal linear-time algorithms for computing the Longest Previous Factor (LPF) array corresponding to a string w. For any position i in w, LPF[i] gives the length of the longest factor of w starting at position i that occurs previously in w. Several properties and applications of LPF are investigated. They include computing the Lempel-Ziv factorization of a string and detecting all repetitions (runs) in a string in linear time independently of the integer alphabet size.
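The LPF definition above can be checked directly with a quadratic-or-worse scan, and the resulting array yields a Lempel-Ziv factorization. The sketch below is not one of the paper's linear-time algorithms. It assumes that "occurs previously" means an occurrence starting at some position j < i (overlap with position i allowed), and that each Lempel-Ziv factor has length max(1, LPF[i]). Both function names are hypothetical.

```python
def lpf_naive(w):
    # lpf[i] = length of the longest factor starting at i that also
    # starts at some earlier position j < i (occurrences may overlap).
    n = len(w)
    lpf = [0] * n
    for i in range(n):
        for j in range(i):
            length = 0
            while i + length < n and w[j + length] == w[i + length]:
                length += 1
            lpf[i] = max(lpf[i], length)
    return lpf


def lz_factorize(w):
    # Each factor is either a single new letter or the longest
    # previously occurring factor at the current position.
    lpf = lpf_naive(w)
    factors = []
    i = 0
    while i < len(w):
        step = max(1, lpf[i])
        factors.append(w[i:i + step])
        i += step
    return factors
```

For w = "abab", the LPF array is [0, 0, 2, 1] and the factorization is "a", "b", "ab". For w = "aaaa", the LPF array is [0, 3, 2, 1] and the factorization is "a", "aaa".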
Efficient Algorithms for Handling Molecular Weighted Sequences
In 3rd IFIP International Conference on Theoretical Computer Science, 2004
Cited by 5 (5 self)
Abstract. In this paper we introduce the Weighted Suffix Tree, an efficient data structure for computing string regularities in weighted sequences of molecular data. Molecular Weighted Sequences can model important biological processes such as the DNA Assembly Process or the DNA-Protein Binding Process. Thus pattern matching, or identification of repeated patterns, in biological weighted sequences is a very important procedure in the translation of gene expression and regulation. We present time- and space-efficient algorithms for constructing the weighted suffix tree and some applications of the proposed data structure to problems taken from the Molecular Biology area, such as pattern matching, repeats discovery, discovery of the longest common subsequence of two weighted sequences, and computation of covers.
Linear-time computation of local periods
Theoret. Comput. Sci.
Cited by 3 (0 self)
Abstract. We present a linear-time algorithm for computing all local periods of a given word. This subsumes (but is substantially more powerful than) the computation of the (global) period of the word on the one hand and, on the other hand, the computation of a critical factorization, implied by the Critical Factorization Theorem.