Results 1 - 10 of 13
The String Edit Distance Matching Problems with Moves
, 2006
Cited by 52 (2 self)
The edit distance between two strings S and R is defined to be the minimum number of character inserts, deletes and changes needed to convert R to S. Given a text string t of length n and a pattern string p of length m, informally, the string edit distance matching problem is to compute the smallest edit distance between p and substrings of t. We relax the problem so that (a) we allow an additional operation, namely, substring moves, and (b) we allow approximation of this string edit distance. Our result is a near-linear-time deterministic algorithm to produce a factor of O(log n log* n) approximation to the string edit distance with moves. This is the first known significantly subquadratic algorithm for a string edit distance problem in which the distance involves nontrivial alignments. Our results are obtained by embedding strings into L1 vector space using a simplified parsing technique we call Edit
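The matching problem defined in this abstract can be illustrated with the classical quadratic-time dynamic program. The following is a minimal sketch of that exact baseline, not the paper's near-linear approximation algorithm, and it does not support substring moves. The function name `min_substring_edit_distance` is a hypothetical choice for illustration.

```python
def min_substring_edit_distance(p: str, t: str) -> int:
    """Smallest edit distance between p and any substring of t.

    Exact O(m * n) dynamic program with unit-cost inserts, deletes and
    changes. This is an illustrative baseline only, not the approximation
    algorithm with moves described in the abstract.
    """
    m = len(p)
    # Column for the empty text prefix: p[:i] against the empty substring
    # costs i operations.
    col = list(range(m + 1))
    best = col[m]
    for ch in t:
        # new[0] = 0 because a match may start at any text position.
        new = [0] * (m + 1)
        for i in range(1, m + 1):
            new[i] = min(
                col[i - 1] + (p[i - 1] != ch),  # match or change
                col[i] + 1,                     # skip a text character
                new[i - 1] + 1,                 # skip a pattern character
            )
        col = new
        # col[m] is the best cost over substrings ending at this position.
        best = min(best, col[m])
    return best
```

For example, `min_substring_edit_distance("abcd", "xxabdxx")` compares the pattern against every substring of the text and finds that `"abd"` is one edit away.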
Optimal Parallel Suffix Tree Construction
, 1997
Cited by 16 (1 self)
An O(m)-work, O(m)-space, O(log m)-time CREW-PRAM algorithm for constructing the suffix tree of a string s of length m drawn from any fixed alphabet set is obtained. This is the first known work and space optimal parallel algorithm for this problem. It can be generalized to a string s drawn from any general alphabet set to perform in O(log m) time and O(m log |Σ|) work and space, after the characters in s have been sorted alphabetically, where |Σ| is the number of distinct characters in s. In this case too, the algorithm is work-optimal.
Suffix Trees and their Applications in String Algorithms
, 1993
Cited by 14 (0 self)
The suffix tree is a compacted trie that stores all suffixes of a given text string. This data structure has been intensively employed in pattern matching on strings and trees, with a wide range of applications, such as molecular biology, data processing, text editing, term rewriting, interpreter design, information retrieval, abstract data types and many others. In this paper, we survey some applications of suffix trees and some algorithmic techniques for their construction. Special emphasis is given to the most recent developments in this area, such as parallel algorithms for suffix tree construction and generalizations of suffix trees to higher dimensions, which are important in multidimensional pattern matching. Work partially supported by the ESPRIT BRA ALCOM II under contract no. 7141 and by the Italian MURST Project "Algoritmi, Modelli di Calcolo e Strutture Informative". Part of this work was done while the author was visiting AT&T Bell Laboratories. Email: grossi@di.uni...
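To make the definition concrete, here is a small sketch of the uncompacted form of the structure: a trie holding every suffix of a text, used for substring queries. It is an assumption-laden simplification. A true suffix tree compacts unary paths into single edges and can be built in linear time, whereas this naive version takes quadratic time and space. The names `build_suffix_trie` and `contains` are hypothetical.

```python
def build_suffix_trie(text: str) -> dict:
    """Insert every suffix of text into a nested-dict trie (naive, O(n^2))."""
    root: dict = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            # Follow the existing child for ch, creating it if missing.
            node = node.setdefault(ch, {})
    return root


def contains(trie: dict, pattern: str) -> bool:
    """Return True if pattern is a substring of the indexed text.

    Every substring is a prefix of some suffix, so it is enough to walk
    the trie from the root along the characters of pattern.
    """
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True
```

A query costs time proportional to the pattern length, independent of the text length, which is the property that makes suffix-based indexes attractive for pattern matching.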
Optimal Logarithmic Time Randomized Suffix Tree Construction
- In Proc 23rd ICALP
, 1996
Cited by 14 (3 self)
The suffix tree of a string, the fundamental data structure in the area of combinatorial pattern matching, has many elegant applications. In this paper, we present a novel, simple sequential algorithm for the construction of suffix trees. We are also able to parallelize our algorithm so that we settle the main open problem in the construction of suffix trees: we give a Las Vegas CRCW PRAM algorithm that constructs the suffix tree of a binary string of length n in O(log n) time and O(n) work with high probability. In contrast, the previously known work-optimal algorithms, while deterministic, take# (log n) time.
Alphabet Independent And Dictionary Scaled Matching
, 1996
Cited by 11 (4 self)
The rapidly growing need for analysis of digitized images in multimedia systems has led to a variety of interesting problems in multidimensional pattern matching. One of the problems is that of scaled matching, finding all appearances of a pattern in a text in all sizes. Another important problem is dictionary matching, quick search through a dictionary of preprocessed patterns in order to find all dictionary patterns that appear in the input text. In this paper we provide a simple algorithm for two-dimensional scaled matching. Our algorithm is the first linear-time alphabet-independent scaled matching algorithm. Its running time is O(|T|), where |T| is the text size, and is independent of |Σ|, the size of the alphabet. The main idea behind our algorithm is identifying and exploiting a scaling-invariant property of patterns. Our technique generalizes to produce the first known algorithm for scaled dictionary matching. We can find all appearances of all dictionary pa...
Perfect hashing for strings: Formalization and Algorithms
- In Proc 7th CPM
, 1996
Cited by 6 (2 self)
Numbers and strings are two objects manipulated by most programs. Hashing has been well-studied for numbers and it has been effective in practice. In contrast, basic hashing issues for strings remain largely unexplored. In this paper, we identify and formulate the core hashing problem for strings that we call substring hashing. Our main technical results are highly efficient sequential/parallel (CRCW PRAM) Las Vegas type algorithms that determine a perfect hash function for substring hashing. For example, given a binary string of length n, one of our algorithms finds a perfect hash function in O(log n) time, O(n) work, and O(n) space; the hash value for any substring can then be computed in O(log log n) time using a single processor. Our approach relies on a novel use of the suffix tree of a string. In implementing our approach, we design optimal parallel algorithms for the problem of determining weighted ancestors on an edge-weighted tree that may be of independent interest.
Optimal Parallel Construction of Minimal Suffix and Factor Automata
, 1995
Cited by 5 (0 self)
This paper gives optimal parallel algorithms for the construction of the smallest deterministic finite automata recognizing all the suffixes and the factors of a string. The algorithms use recently discovered optimal parallel suffix tree construction algorithms together with data structures for the efficient manipulation of trees, exploiting the well known relation between suffix and factor automata and suffix trees.
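For readers unfamiliar with the object being constructed, the sketch below builds a suffix automaton with the standard sequential online construction. It is not the parallel, suffix-tree-based method of this paper. The class name `SuffixAutomaton` and its method names are hypothetical. Walking transitions from the initial state and treating every state as accepting recognizes all factors (substrings), although the result is not necessarily the minimal factor automaton.

```python
class SuffixAutomaton:
    """Suffix automaton of a string, built online one character at a time."""

    def __init__(self, s: str) -> None:
        self.next = [{}]    # transitions of each state
        self.link = [-1]    # suffix link of each state
        self.length = [0]   # length of the longest string reaching each state
        last = 0
        for ch in s:
            last = self._extend(last, ch)

    def _new_state(self, length: int, trans: dict, link: int) -> int:
        self.next.append(trans)
        self.link.append(link)
        self.length.append(length)
        return len(self.length) - 1

    def _extend(self, last: int, ch: str) -> int:
        cur = self._new_state(self.length[last] + 1, {}, -1)
        p = last
        # Add the new transition along the suffix-link chain.
        while p != -1 and ch not in self.next[p]:
            self.next[p][ch] = cur
            p = self.link[p]
        if p == -1:
            self.link[cur] = 0
        else:
            q = self.next[p][ch]
            if self.length[p] + 1 == self.length[q]:
                self.link[cur] = q
            else:
                # Split q: the clone keeps q's transitions with a shorter length.
                clone = self._new_state(
                    self.length[p] + 1, dict(self.next[q]), self.link[q]
                )
                while p != -1 and self.next[p].get(ch) == q:
                    self.next[p][ch] = clone
                    p = self.link[p]
                self.link[q] = clone
                self.link[cur] = clone
        return cur

    def has_factor(self, w: str) -> bool:
        """Return True if w is a factor (substring) of the indexed string."""
        state = 0
        for ch in w:
            if ch not in self.next[state]:
                return False
            state = self.next[state][ch]
        return True
```

The automaton stays small: for a string of length n >= 2 the construction creates at most 2n - 1 states, which is why it is a compact alternative to the suffix tree.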
Linear-Time Construction of Two-Dimensional Suffix Trees (Extended Abstract)
- In Proceedings of the 26th International Colloquium on Automata, Languages and Programming (ICALP), volume 1644 of LNCS
, 1999
Cited by 4 (1 self)
Dong Kyue Kim, Kunsoo Park. Department of Computer Engineering, Seoul National University, Seoul 151-742, Korea. {dkkim,kpark}@theory.snu.ac.kr. Abstract. The suffix tree of a string S is a compacted trie that represents all suffixes of S. Linear-time algorithms for constructing the suffix tree have been known for quite a while. In two dimensions, however, linear-time construction of two-dimensional suffix trees has been an open problem. We present the first linear-time algorithm for constructing two-dimensional suffix trees.
Random Suffix Search Trees
, 2003
Cited by 2 (1 self)
A random suffix search tree is a binary search tree constructed for the suffixes X_i = 0.B_i B_{i+1} B_{i+2} ... of a sequence B_1, B_2, B_3, ... of independent identically distributed random b-ary digits B_j. Let D_n denote the depth of the node for X_n in this tree when B_1 is uniform on Z_b. We show that for any value of b > 1, E D_n = 2 log n + O(log log n), just as for the random binary search tree. We also show that D_n / E D_n → 1 in probability.
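The construction in this abstract can be simulated directly. The sketch below makes several assumptions that are not in the source: each infinite suffix is approximated by its first `pad` digits, ties between truncated keys go to the right subtree, indices are 0-based, and the logarithm in the stated bound is taken to be natural. The function name `suffix_search_tree_depths` is hypothetical.

```python
import math
import random


def suffix_search_tree_depths(digits: list, n: int, pad: int) -> list:
    """Insert the first n suffixes of digits into a BST; return node depths.

    The suffix starting at index i is approximated by the tuple
    digits[i:i + pad], compared lexicographically (an assumption standing
    in for comparing the real numbers 0.B_i B_{i+1} ...).
    """
    keys = [tuple(digits[i:i + pad]) for i in range(n)]
    left = [None] * n
    right = [None] * n
    depth = [0] * n            # the first suffix is the root, at depth 0
    for i in range(1, n):
        node, d = 0, 0
        while True:
            d += 1
            child = left if keys[i] < keys[node] else right
            if child[node] is None:
                child[node] = i    # attach the new suffix as a leaf
                depth[i] = d
                break
            node = child[node]
    return depth


if __name__ == "__main__":
    rng = random.Random(0)     # fixed seed so repeated runs agree
    n, pad, b = 2000, 64, 2
    digits = [rng.randrange(b) for _ in range(n + pad)]
    depths = suffix_search_tree_depths(digits, n, pad)
    # Compare the depth of the last inserted suffix with the leading term.
    print(depths[-1], 2 * math.log(n))
```

A single run gives one sample of D_n, so averaging over many seeds is needed before comparing against the 2 log n leading term.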
Dictionary Automaton in Optimal Space
, 1999
Cited by 1 (0 self)
In this paper we describe the data structure of a time and space efficient string dictionary automaton, providing insertion and deletion of strings and finite state machine based substring searching. If the input alphabet is bounded and the patterns are mutually substring-free, an optimal worst-case bound of O(|m|) for both insertion and deletion of a pattern m is achieved. The underlying structure is a multi suffix tree. Let M be the set of strings stored in the dictionary resulting from an arbitrary sequence of Insert and Delete operations. One main result in this paper is that the space complexity of the augmented multi suffix tree is O(d) with d = Σ_{m∈M} |m|. This bound is optimal if we assume that each pattern in the dictionary uses at least linear space. Additionally, we present a new on-line substring search algorithm to find one substring of x in time O(|x|). The novelty of the approach is to realize a multi suffix tree based finite state automaton for substring ...

