The enhanced suffix array and its applications to genome analysis
- In Proc. Workshop on Algorithms in Bioinformatics, in Lecture Notes in Computer Science
, 2002
Cited by 38 (5 self)
Abstract. In large-scale applications such as computational genome analysis, the space requirement of the suffix tree is a severe drawback. In this paper, we present a uniform framework that enables us to systematically replace every string processing algorithm that is based on a bottom-up traversal of a suffix tree by a corresponding algorithm based on an enhanced suffix array (a suffix array enhanced with the lcp-table). In this framework, we will show how maximal, supermaximal, and tandem repeats, as well as maximal unique matches, can be efficiently computed. Because enhanced suffix arrays require much less space than suffix trees, very large genomes can now be indexed and analyzed, a task which was not feasible before. Experimental results demonstrate that our programs require not only less space but also much less time than other programs developed for the same tasks.
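As a rough illustration of the data structure named in this abstract, the sketch below builds a suffix array together with its lcp-table and uses the pair to find a longest repeated substring. It is not the paper's framework: the function names are hypothetical, the suffix array is built by naive sorting (far too slow for genomes), and the lcp-table is filled with the standard Kasai-style scan.

```python
def suffix_array(text):
    """Naive construction: sort suffix start positions lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_table(text, sa):
    """lcp[k] = length of the longest common prefix of the suffixes
    starting at sa[k-1] and sa[k]; lcp[0] is 0 by convention."""
    n = len(text)
    rank = [0] * n
    for k, start in enumerate(sa):
        rank[start] = k
    lcp = [0] * n
    h = 0  # current match length, reused between consecutive text positions
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]  # suffix preceding suffix i in sorted order
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp


def longest_repeat(text):
    """A longest substring occurring at least twice (overlaps allowed)."""
    if not text:
        return ""
    sa = suffix_array(text)
    lcp = lcp_table(text, sa)
    k = max(range(len(text)), key=lcp.__getitem__)
    return text[sa[k]:sa[k] + lcp[k]]


if __name__ == "__main__":
    sa = suffix_array("banana")
    print(sa, lcp_table("banana", sa), longest_repeat("banana"))
```

The largest entry of the lcp-table marks two adjacent suffixes sharing the longest prefix, which is why a single pass over the table suffices here without any tree structure.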
Finding approximate repetitions under Hamming distance
- Theoretical Computer Science
, 2001
Cited by 21 (1 self)
The problem of computing tandem repetitions with K possible mismatches is studied. Two main definitions are considered, and for both of them an O(nK log K + S) algorithm is proposed (where S is the size of the output). This improves, in particular, the bound obtained in [LS93]. Finally, other possible definitions are briefly analyzed.
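To make the problem statement concrete, here is a brute-force sketch under one possible reading: a tandem repetition with at most K mismatches is a factor uv with |u| = |v| and Hamming distance between u and v at most K. The abstract mentions two definitions without spelling them out, so this reading is an assumption, the function names are hypothetical, and the running time is cubic rather than the O(nK log K + S) bound of the paper.

```python
def hamming(u, v):
    """Number of positions at which two equal-length strings differ."""
    return sum(a != b for a, b in zip(u, v))


def approx_tandem_repeats(w, k):
    """Return all (start, period) pairs such that the two halves
    w[start:start+period] and w[start+period:start+2*period]
    differ in at most k positions. Brute force, for illustration only."""
    n = len(w)
    found = []
    for period in range(1, n // 2 + 1):
        for start in range(n - 2 * period + 1):
            left = w[start:start + period]
            right = w[start + period:start + 2 * period]
            if hamming(left, right) <= k:
                found.append((start, period))
    return found


if __name__ == "__main__":
    print(approx_tandem_repeats("abab", 0))
    print((0, 3) in approx_tandem_repeats("abcabd", 1))
```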
Efficient Algorithms for Handling Molecular Weighted Sequences
- In 3rd IFIP International Conference on Theoretical Computer Science
, 2004
Cited by 4 (4 self)
Abstract. In this paper we introduce the Weighted Suffix Tree, an efficient data structure for computing string regularities in weighted sequences of molecular data. Molecular Weighted Sequences can model important biological processes such as the DNA Assembly Process or the DNA-Protein Binding Process. Thus pattern matching, or identification of repeated patterns, in biological weighted sequences is a very important procedure in the translation of gene expression and regulation. We present time- and space-efficient algorithms for constructing the weighted suffix tree and some applications of the proposed data structure to problems taken from the Molecular Biology area, such as pattern matching, repeats discovery, discovery of the longest common subsequence of two weighted sequences, and computation of covers.
Finding Repeats With Fixed Gap
- In: Proc. of the 7th Int’l Symp. on String Processing and Information Retrieval (SPIRE). Washington: IEEE Computer Society
, 2000
Cited by 4 (2 self)
We propose an algorithm for finding in a word all pairs of occurrences of the same subword with a given distance r between them. The obtained complexity is O(n log r + S), where S is the size of the output. We also show how the algorithm can be modified in order to find all such pairs of occurrences separated by a given word. The solution uses an algorithm for finding all quasi-squares in two strings, a problem that generalizes the known problem of searching for squares.
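The problem in this abstract can be sketched by brute force, assuming that "distance r" means the gap between the two occurrences, i.e. a factor of the form u v u with |v| = r. That interpretation, the function name, and the choice to report every pair (not only maximal ones) are assumptions made for illustration; the sketch does not reach the O(n log r + S) bound.

```python
def fixed_gap_repeats(w, r):
    """Return all (i, j, length) such that w[i:i+length] == w[j:j+length]
    and exactly r characters separate the end of the first occurrence
    from the start of the second (j == i + length + r)."""
    n = len(w)
    found = []
    for length in range(1, (n - r) // 2 + 1):
        for i in range(n - 2 * length - r + 1):
            j = i + length + r
            if w[i:i + length] == w[j:j + length]:
                found.append((i, j, length))
    return found


if __name__ == "__main__":
    print(fixed_gap_repeats("abcab", 1))
    print(fixed_gap_repeats("abcab", 2))
```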
Linear-time computation of local periods
- Theoret. Comput. Sci
Cited by 2 (0 self)
Abstract. We present a linear-time algorithm for computing all local periods of a given word. This subsumes (but is substantially more powerful than) the computation of the (global) period of the word and, on the other hand, the computation of a critical factorization, implied by the Critical Factorization Theorem.
New lower bounds for the maximum number of runs in a string
- in Proc. Prague Stringology Conference (PSC’08), 2008
Cited by 2 (1 self)
Abstract. We show a new lower bound for the maximum number of runs in a string. We prove that for any ε > 0, (α − ε)n is an asymptotic lower bound, where α = 174719/184973 ≈ 0.944565. It is superior to the previous bound 3/(1 + √5) ≈ 0.927 given by Franěk et al. [6,7]. Moreover, our construction of the strings and the proof is much simpler than theirs.
Computing longest previous factor in linear time and applications
Cited by 2 (2 self)
Abstract. We give two optimal linear-time algorithms for computing the Longest Previous Factor (LPF) array corresponding to a string w. For any position i in w, LPF[i] gives the length of the longest factor of w starting at position i that occurs previously in w. Several properties and applications of LPF are investigated. They include computing the Lempel-Ziv factorization of a string and detecting all repetitions (runs) in a string in linear time independently of the integer alphabet size.
Key words: algorithms design, strings, suffix array, longest common prefix, longest previous factor, Lempel–Ziv factorization, repetitions, runs
MSC: 68W05, 68W40, 68R15
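The LPF definition in this abstract can be checked on small inputs with a direct quadratic sketch; it is not one of the paper's linear-time algorithms. The function names are hypothetical, "occurs previously" is taken to mean "also starts at some position j < i" (overlaps allowed), and the factorization shown is one common Lempel-Ziv variant in which each factor is either a fresh single letter or the longest previously occurring factor.

```python
def lpf_naive(w):
    """LPF[i] = length of the longest factor starting at i that also
    starts at some earlier position j < i (occurrences may overlap)."""
    n = len(w)
    lpf = [0] * n
    for i in range(n):
        best = 0
        for j in range(i):
            k = 0
            # j + k < i + k, so bounding i + k also keeps j + k in range
            while i + k < n and w[j + k] == w[i + k]:
                k += 1
            best = max(best, k)
        lpf[i] = best
    return lpf


def lz_factors(w):
    """Greedy factorization driven by LPF: each factor has length
    max(1, LPF[i]) where i is the current position."""
    lpf = lpf_naive(w)
    factors = []
    i = 0
    while i < len(w):
        step = max(1, lpf[i])
        factors.append(w[i:i + step])
        i += step
    return factors


if __name__ == "__main__":
    print(lpf_naive("abaab"))
    print(lz_factors("abababa"))
```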
In-place update of suffix array while recoding words
- in "International Journal of Foundations of Computer Science (IJFCS)", 2010. Symbiose 31
Cited by 2 (0 self)
Abstract. Motivated by grammatical inference and data compression applications, we propose an algorithm to update a suffix array after the substitution, in the indexed text, of some occurrences of a given word by a new character. Compared to other published index update methods, the problem addressed here may require the modification of a large number of distinct positions over the original text. The proposed algorithm uses the specific internal order of suffix arrays in order to update simultaneously groups of entries, and ensures that only entries to be modified are visited. Experiments confirm a significant execution time speed-up compared to the construction of suffix array from scratch at each step of the application.

