The String B-Tree: A New Data Structure for String Search in External Memory and its Applications.
 Journal of the ACM
, 1998
Cited by 122 (11 self)
We introduce a new text-indexing data structure, the String B-Tree, that can be seen as a link between some traditional external-memory and string-matching data structures. In a short phrase, it is a combination of B-trees and Patricia tries for internal-node indices that is made more effective by adding extra pointers to speed up search and update operations. Consequently, the String B-Tree overcomes the theoretical limitations of inverted files, B-trees, prefix B-trees, suffix arrays, compacted tries and suffix trees. String B-trees have the same worst-case performance as B-trees but they manage unbounded-length strings and perform much more powerful search operations such as the ones supported by suffix trees. String B-trees are also effective in main memory (RAM model) because they improve the online suffix tree search on a dynamic set of strings. They can also be successfully applied to database indexing and software duplication.
Combinatorial algorithms for DNA sequence assembly
 Algorithmica
, 1993
Cited by 45 (3 self)
The trend towards very large DNA sequencing projects, such as those being undertaken as part of the human genome initiative, necessitates the development of efficient and precise algorithms for assembling a long DNA sequence from the fragments obtained by shotgun sequencing or other methods. The sequence reconstruction problem that we take as our formulation of DNA sequence assembly is a variation of the shortest common superstring problem, complicated by the presence of sequencing errors and reverse complements of fragments. Since the simpler superstring problem is NP-hard, any efficient reconstruction procedure must resort to heuristics. In this paper, however, a four-phase approach based on rigorous design criteria is presented, and has been found to be very accurate in practice. Our method is robust in the sense that it can accommodate high sequencing error rates and list a series of alternate solutions in the event that several appear equally good. Moreover, it uses a limited form ...
Incremental String Comparison
 SIAM JOURNAL ON COMPUTING
, 1995
Cited by 41 (3 self)
The problem of comparing two sequences A and B to determine their LCS or the edit distance between them has been much studied. In this paper we consider the following incremental version of these problems: given an appropriate encoding of a comparison between A and B, can one incrementally compute the answer for A and bB, and the answer for A and Bb, with equal efficiency, where b is an additional symbol? Our main result is a theorem exposing a surprising relationship between the dynamic programming solutions for two such "adjacent" problems. Given a threshold k on the number of differences to be permitted in an alignment, the theorem leads directly to an O(k) algorithm for incrementally computing a new solution from an old one, in contrast to the O(k²) time required to compute a solution from scratch. We further show with a series of applications that this algorithm is indeed more powerful than its non-incremental counterpart by solving the applications with greater asymptotic ef...
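As an illustration of what a threshold k on the number of differences means, the following is a minimal Python sketch of a thresholded edit-distance check. It is not the incremental algorithm of the paper: it is the standard dynamic program restricted to a band of diagonals, and the function name `bounded_edit_distance` is a hypothetical name chosen for this example.

```python
from typing import Optional


def bounded_edit_distance(a: str, b: str, k: int) -> Optional[int]:
    """Return the edit distance between a and b if it is at most k, else None.

    Illustrative sketch only: only cells with |i - j| <= k are evaluated,
    because an alignment with at most k differences never leaves that band.
    For clarity, full rows are allocated even though only O(k) cells per
    row are actually computed.
    """
    n, m = len(a), len(b)
    if abs(n - m) > k:
        return None  # the length difference alone already exceeds k
    inf = k + 1  # any value above k means "more than k differences"
    # prev[j] holds the distance between a[:i-1] and b[:j] (row 0 initially)
    prev = [j if j <= k else inf for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        if i <= k:
            cur[0] = i  # delete the first i symbols of a
        for j in range(max(1, i - k), min(m, i + k) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost,  # match or substitution
                         prev[j] + 1,         # deletion from a
                         cur[j - 1] + 1,      # insertion into a
                         inf)
        prev = cur
    return prev[m] if prev[m] <= k else None
```

Calling `bounded_edit_distance("kitten", "sitting", 3)` stays inside the band and reports the distance, while a threshold of 2 makes the function give up and return `None`.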
Suffix Trees and their Applications in String Algorithms
, 1993
Cited by 17 (0 self)
The suffix tree is a compacted trie that stores all suffixes of a given text string. This data structure has been intensively employed in pattern matching on strings and trees, with a wide range of applications, such as molecular biology, data processing, text editing, term rewriting, interpreter design, information retrieval, abstract data types and many others. In this paper, we survey some applications of suffix trees and some algorithmic techniques for their construction. Special emphasis is given to the most recent developments in this area, such as parallel algorithms for suffix tree construction and generalizations of suffix trees to higher dimensions, which are important in multidimensional pattern matching. Work partially supported by the ESPRIT BRA ALCOM II under contract no. 7141 and by the Italian MURST Project "Algoritmi, Modelli di Calcolo e Strutture Informative". Part of this work was done while the author was visiting AT&T Bell Laboratories. Email: grossi@di.uni...
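To make the idea of "a trie that stores all suffixes" concrete, here is a minimal Python sketch of an uncompacted suffix trie with a substring query. It is a simplification, not the structure surveyed in the paper: a real suffix tree compacts unary paths and can be built in linear time, whereas this naive version takes quadratic time and space. The names `build_suffix_trie` and `contains` are hypothetical.

```python
def build_suffix_trie(text: str) -> dict:
    """Insert every suffix of text into a nested-dict trie (naive, O(n^2))."""
    root: dict = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            # Follow the edge labelled ch, creating it if it does not exist.
            node = node.setdefault(ch, {})
    return root


def contains(trie: dict, pattern: str) -> bool:
    """A pattern occurs in the text iff it spells a path from the root."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True
```

With `trie = build_suffix_trie("banana")`, the query `contains(trie, "nan")` succeeds in time proportional to the pattern length, independent of where the occurrence lies in the text.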
A 2 2/3-Approximation Algorithm for the Shortest Superstring Problem
 DIMACS WORKSHOP ON SEQUENCING AND MAPPING
, 1995
Cited by 12 (0 self)
Given a collection of strings $S = \{s_1, \ldots, s_n\}$ over an alphabet, a superstring of $S$ is a string containing each $s_i$ as a substring; that is, for each $i$, $1 \le i \le n$, the superstring contains a block of $|s_i|$ consecutive characters that match $s_i$ exactly. The shortest superstring problem is the problem of finding a superstring of minimum length. The shortest superstring problem has applications in both data compression and computational biology. In data compression, the problem is a part of a general model of string compression proposed by Gallant, Maier and Storer (JCSS '80). Much of the recent interest in the problem is due to its application to DNA sequence assembly. The problem has been shown to be NP-hard; in fact, it was shown by Blum et al. (JACM '94) to be MAX SNP-hard. The first $O(1)$-approximation was also due to Blum et al., who gave an algorithm that always returns a superstring no more than 3 times the length of an optimal solution. Several researchers have published results that improve on the approximation ratio; of these, the best previous result is our algorithm ShortString, which achieves a 2 3
Compressed index for dynamic text
 Proc. of Data Compression Conference (DCC)
, 2004
Cited by 9 (0 self)
This paper investigates how to index a text which is subject to updates. The best solution in the literature [6] is based on suffix tree using $O(n \log n)$ bits of storage, where $n$ is the length of the text. It supports finding all occurrences of a pattern $P$ in $O(|P| + occ)$ time, where $occ$ is the number of occurrences. Each text update consists of inserting or deleting a substring of length $y$ and can be supported in $O(y + n)$ time. In this paper, we initiate the study of compressed index using only $O(n \log |\Sigma|)$ bits of space, where $\Sigma$ denotes the alphabet. Our solution supports finding all occurrences of a pattern $P$ in $O(|P| \log^2 n (\log n + \log |\Sigma|) + occ \log^{1+\epsilon} n)$ time, while insertion or deletion of a substring of length $y$ can be done in $O((y + n) \log^{2+\epsilon} n)$ amortized time, where $0 < \epsilon \le 1$. The core part of our data structure is based on the recent work on Compressed Suffix Trees
Algorithms and Orders for Finding Noncommutative Gröbner Bases
, 1997
Cited by 9 (1 self)
The problem of choosing efficient algorithms and good admissible orders for computing Gröbner bases in noncommutative algebras is considered. Gröbner bases are an important tool that makes many problems in polynomial algebra computationally tractable. However, the computation of Gröbner bases is expensive, and in noncommutative algebras is not guaranteed to terminate. The algorithm, together with the order used to determine the leading term of each polynomial, are known to affect the cost of the computation, and are the focus of this thesis. A Gröbner basis is a set of polynomials computed, using Buchberger's algorithm, from another set of polynomials. The noncommutative form of Buchberger's algorithm repeatedly constructs a new polynomial from a triple, which is a pair of polynomials whose leading terms overlap and form a nontrivial common multiple. The algorithm leaves a number of details underspecified, and can be altered to improve its behavior. A significant improvement is the devel...
Sequential and Parallel Approximation of Shortest Superstrings
, 1997
Cited by 5 (1 self)
Superstrings have many applications in data compression and genetics. However, the decision version of the shortest superstring problem is NP-complete. In this paper we examine the complexity of approximating shortest superstrings. There are two basic measures of the approximations: the length factor and the compression factor. The well-known and practical approximation algorithm is the sequential algorithm GREEDY. It approximates the shortest superstring with the compression factor of 1/2 and with the length factor of 4. Our main results are: (1) A sequential length approximation algorithm which achieves a length factor of 2.83. This result improves the best previously known bound of 2.89 due to Teng and Yao. Very recently, this bound was improved by Kosaraju, Park, and Stein to 2.79, and by Armen and Stein to 2.75. (2) A proof that the algorithm GREEDY is not paralleliz
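The sequential GREEDY heuristic named in the abstract above is commonly described as repeatedly merging the two strings with the largest overlap. The Python sketch below follows that common description under stated assumptions: the function names `overlap` and `greedy_superstring` are hypothetical, ties are broken by scan order, and the brute-force overlap search is chosen for readability rather than speed. It is not the improved approximation algorithm of the paper.

```python
from typing import List


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for length in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_superstring(strings: List[str]) -> str:
    """Merge the pair with maximum overlap until one string remains."""
    # Drop duplicates, then drop strings already contained in another one.
    items = list(dict.fromkeys(strings))
    items = [s for s in items if not any(s != t and s in t for t in items)]
    while len(items) > 1:
        best_ov, best_i, best_j = -1, 0, 1
        for i, a in enumerate(items):
            for j, b in enumerate(items):
                if i != j:
                    ov = overlap(a, b)
                    if ov > best_ov:
                        best_ov, best_i, best_j = ov, i, j
        # Glue the two strings together, writing the shared part only once.
        merged = items[best_i] + items[best_j][best_ov:]
        items = [s for idx, s in enumerate(items) if idx not in (best_i, best_j)]
        items.append(merged)
    return items[0] if items else ""
```

For the input `["abc", "bcd", "cde"]` the two merges each save two characters, so the result has length 5 where plain concatenation would need 9. The output is always a superstring of the input, but it is a heuristic answer and need not be a shortest one.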
Incremental clone detection. Diploma thesis
, 2008
Cited by 3 (1 self)
I would like to thank the members of the Software Engineering Group for discussing my ideas and giving me feedback. Special thanks go to Rainer Koschke and Renate Klempien-Hinrichs for supervising this thesis. Furthermore, I would like to apologize to my family and friends, who did not get the share of my time that they deserved. Declaration of Authorship: I declare that I wrote this thesis without external help. I did not use any sources except those explicitly stated or referenced in the bibliography. All parts which have been literally or according to their meaning taken from publications are indicated as such.