Results 1–10 of 57
The String Edit Distance Matching Problem with Moves
, 2006
Abstract

Cited by 60 (3 self)
The edit distance between two strings S and R is defined to be the minimum number of character inserts, deletes and changes needed to convert R to S. Given a text string t of length n, and a pattern string p of length m, informally, the string edit distance matching problem is to compute the smallest edit distance between p and substrings of t. We relax the problem so that (a) we allow an additional operation, namely, substring moves, and (b) we allow approximation of this string edit distance. Our result is a near-linear-time deterministic algorithm that produces a factor O(log n log* n) approximation to the string edit distance with moves. This is the first known significantly subquadratic algorithm for a string edit distance problem in which the distance involves nontrivial alignments. Our results are obtained by embedding strings into L1 vector space using a simplified parsing technique we call Edit ...
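The edit distance the abstract starts from (before moves are added) is the classic insert/delete/change distance, computable by the textbook dynamic program. A minimal sketch of that baseline definition; substring moves and the paper's approximation are not modeled here:

```python
def edit_distance(r: str, s: str) -> int:
    """Classic DP edit distance: minimum inserts, deletes and changes
    needed to convert r into s. O(|r|*|s|) time and space."""
    m, n = len(r), len(s)
    # dp[i][j] = edit distance between r[:i] and s[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i              # delete all of r[:i]
    for j in range(n + 1):
        dp[0][j] = j              # insert all of s[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if r[i - 1] == s[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # delete
                           dp[i][j - 1] + 1,         # insert
                           dp[i - 1][j - 1] + cost)  # change or match
    return dp[m][n]
```

The matching problem then asks for this distance between p and every substring of t; the paper's contribution is avoiding the quadratic cost this naive route implies.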
Dictionary matching and indexing with errors and don't cares
 In STOC '04
, 2004
Overlap Matching
 Information and Computation
, 2001
Abstract

Cited by 19 (5 self)
We propose a new paradigm for string matching, namely structural matching. In structural matching, the text and pattern contents are not important. Rather, some areas in the text and pattern are singled out, say intervals. A "match" is a text location where a specified relation between the text and pattern areas is satisfied. In particular, we define the structural matching problem of Overlap (Parity) Matching: we seek the text locations where all overlaps of the given pattern and text intervals have even length. We show that this problem can be solved in time O(n log m), where the text length is n and the pattern length is m. As an application of overlap matching, we show how to reduce the String Matching with Swaps problem to the overlap matching problem. The String Matching with Swaps problem is the problem of string matching in the presence of local swaps. The best known deterministic upper bound for this problem was O(nm^{1/3} log m log σ) for a general alphabet Σ, where ...
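The overlap-parity definition above can be checked directly, if slowly: slide the pattern intervals along the text and test every pairwise overlap for even length. A naive sketch of the definition (the interval representation is illustrative, not from the paper, and this runs far slower than the O(n log m) algorithm):

```python
def parity_matches(text_intervals, pattern_intervals, n, m):
    """Report text positions i where every overlap between a pattern
    interval (shifted by i) and a text interval has even length.
    Intervals are half-open pairs (start, end). Brute force."""
    matches = []
    for i in range(n - m + 1):
        ok = True
        for (ps, pe) in pattern_intervals:
            for (ts, te) in text_intervals:
                lo, hi = max(ps + i, ts), min(pe + i, te)
                overlap = max(0, hi - lo)
                if overlap % 2 == 1:   # odd-length overlap: no match here
                    ok = False
        if ok:
            matches.append(i)
    return matches
```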
Approximate Subset Matching with Don't Cares
Abstract

Cited by 13 (1 self)
The Subset Matching problem was recently introduced by Cole and Hariharan. The input of the problem is a text array of n sets totaling s elements and a pattern array of m sets totaling s′ elements. There is a match of the pattern at a text location if every pattern set is a subset of the corresponding text set. Subset matching has proven to be a powerful technique and enabled finding an efficient solution to the Tree Matching problem. The subset matching model may prove useful in solving other hard problems, e.g. Swap Matching. In this paper we investigate the complexity of approximate subset matching with "don't care"s. We provide two algorithms for the problem: a randomized algorithm whose complexity is O((s + n + (n/m)s′)√m log² m) and a deterministic algorithm whose complexity is O((s + n)√s′ log m).
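The exact matching condition in the abstract is easy to state in code: position i matches when each pattern set is contained in the aligned text set. A brute-force sketch of that exact version (the paper's subject is the harder approximate variant with don't cares):

```python
def subset_match(text_sets, pattern_sets):
    """Naive subset matching: return all text locations i such that
    every pattern set is a subset of the corresponding text set."""
    n, m = len(text_sets), len(pattern_sets)
    return [i for i in range(n - m + 1)
            if all(p <= text_sets[i + j]        # <= is set containment
                   for j, p in enumerate(pattern_sets))]
```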
k-mismatch with don’t cares
 In ESA
, 2007
Abstract

Cited by 13 (6 self)
Abstract. We give the first nontrivial algorithms for the k-mismatch pattern matching problem with don’t cares. Given a text t of length n and a pattern p of length m with don’t care symbols and a bound k, our algorithms find all the places where the pattern matches the text with at most k mismatches. We first give an O(n(k + log n log log n) log m) time randomised solution which finds the correct answer with high probability. We then present a new deterministic O(nk² log³ m) time solution that uses tools developed for group testing, and finally an approach based on k-selectors that runs in O(nk polylog m) time but requires O(poly m) time preprocessing. In each case, the location of the mismatches at each alignment is also given at no extra cost.
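The problem being solved can be stated as a brute-force check: at each alignment, count positions where text and pattern differ and neither holds a don't care. A naive O(nm) sketch, which also reports the mismatch locations as the paper's algorithms do (the '?' wildcard symbol is an illustrative choice):

```python
def k_mismatch_positions(t: str, p: str, k: int, wildcard: str = "?"):
    """Return (alignment, mismatch_positions) for every alignment of p
    against t with at most k mismatches; a don't care in either string
    matches anything. Brute force, not the paper's algorithms."""
    n, m = len(t), len(p)
    out = []
    for i in range(n - m + 1):
        mism = [j for j in range(m)
                if t[i + j] != p[j]
                and wildcard not in (t[i + j], p[j])]
        if len(mism) <= k:
            out.append((i, mism))
    return out
```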
Random Access to Grammar-Compressed Strings
, 2011
Abstract

Cited by 11 (0 self)
Let S be a string of length N compressed into a context-free grammar S of size n. We present two representations of S achieving O(log N) random access time, and either O(n · α_k(n)) construction time and space on the pointer machine model, or O(n) construction time and space on the RAM. Here, α_k(n) is the inverse of the k-th row of Ackermann’s function. Our representations also efficiently support decompression of any substring of S: we can decompress any substring of length m in the same complexity as a single random access query plus additional O(m) time. Combining these results with fast algorithms for uncompressed approximate string matching leads to several efficient algorithms for approximate string matching on grammar-compressed strings without decompression. For instance, we can find all approximate occurrences of a pattern P with at most k errors in time O(n(min{|P|k, k⁴ + |P|} + log N) + occ), where occ is the number of occurrences of P in S. Finally, we are able to generalize our results to navigation and other operations on grammar-compressed trees. All of the above bounds significantly improve the currently best known results. To achieve these bounds, we introduce several new techniques and data structures of independent interest, including a predecessor data structure, two "biased" weighted ancestor data structures, and a compact representation of heavy paths in grammars.
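The basic object here, a string stored as a grammar, can be sketched with a straight-line program: each rule is either a character or the concatenation of two earlier rules, and random access descends the derivation tree using precomputed expansion lengths. This naive descent costs O(grammar depth) per access, which can be Θ(n); the paper's contribution is getting it down to O(log N). A sketch under those assumptions (class and method names are illustrative):

```python
class SLP:
    """Straight-line program: a grammar generating exactly one string."""

    def __init__(self):
        self.rules = []   # each rule: ("t", char) or ("n", left_id, right_id)
        self.length = []  # expansion length of each rule

    def terminal(self, c):
        self.rules.append(("t", c))
        self.length.append(1)
        return len(self.rules) - 1

    def concat(self, a, b):
        self.rules.append(("n", a, b))
        self.length.append(self.length[a] + self.length[b])
        return len(self.rules) - 1

    def access(self, rule_id, i):
        """Return character i of the expansion of rule_id, by walking
        down the derivation tree guided by subtree lengths."""
        while self.rules[rule_id][0] == "n":
            _, a, b = self.rules[rule_id]
            if i < self.length[a]:
                rule_id = a           # target lies in the left subtree
            else:
                i -= self.length[a]   # skip the left expansion
                rule_id = b
        return self.rules[rule_id][1]
```

For example, the rules a, b, ab = a·b, abab = ab·ab expand to "abab" while storing each distinct rule only once, which is where the compression comes from.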
Pattern matching with address errors: rearrangement distances
 In SODA
, 2006
Abstract

Cited by 11 (1 self)
Historically, approximate pattern matching has mainly focused on coping with errors in the data, while the order of the text/pattern was assumed to be more or less correct. In this paper we consider a class of pattern matching problems where the content is assumed to be correct, while the locations may have shifted/changed. We formally define a broad class of problems of this type, capturing situations in which the pattern is obtained from the text by a sequence of rearrangements. We consider several natural rearrangement schemes, including the analogues of the ℓ1 and ℓ2 distances, as well as two distances based on interchanges. For these, we present efficient algorithms to solve the resulting string matching problems.
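To make the ℓ1 analogue concrete: when two strings contain the same multiset of characters, a rearrangement pairs each character of one with an equal character of the other, and the ℓ1 cost is the total distance the characters move. Pairing equal characters in order of occurrence minimizes this sum. A sketch of the distance itself, assuming that in-order pairing (this is not the paper's matching algorithm):

```python
from collections import defaultdict

def l1_rearrangement_distance(p: str, s: str):
    """l1 rearrangement distance between strings with equal character
    multisets: pair equal characters in order of occurrence and sum
    the absolute position shifts. Returns None if not rearrangeable."""
    if sorted(p) != sorted(s):
        return None
    positions = defaultdict(list)   # occurrences of each char in s
    for j, c in enumerate(s):
        positions[c].append(j)
    used = defaultdict(int)
    total = 0
    for i, c in enumerate(p):
        j = positions[c][used[c]]   # next unused occurrence of c in s
        used[c] += 1
        total += abs(i - j)
    return total
```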
Finding patterns with variable length gaps or don’t cares
 of Lecture Notes in Computer Science
, 2006
Abstract

Cited by 8 (3 self)
Abstract. In this paper we present new algorithms to handle the pattern matching problem where the pattern can contain variable length gaps. Given a pattern P with variable length gaps and a text T, our algorithm works in O(n + m + α log(max_{1≤i≤ℓ}(b_i − a_i))) time, where n is the length of the text, m is the sum of the lengths of the component subpatterns, α is the total number of occurrences of the component subpatterns in the text, and a_i and b_i are, respectively, the minimum and maximum number of don’t cares allowed between the i-th and (i+1)-st components of the pattern. We also present another algorithm which, given a suffix array of the text, can report whether P occurs in T in O(m + α log log n) time. Both algorithms record enough information to report all the occurrences of P in T. Furthermore, the techniques used in our algorithms are shown to be useful in many other contexts.
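A variable-length-gap pattern P1 gap(a_1,b_1) P2 ... translates directly into a regular expression with bounded wildcards, which gives a simple brute-force baseline for the problem the abstract describes (this is only a reference check, far from the stated bounds; function names are illustrative):

```python
import re

def gap_pattern_to_regex(subpatterns, gaps):
    """Build a regex for P1 .{a1,b1} P2 .{a2,b2} ... where gaps[i] is
    the (min, max) number of don't cares between components i and i+1."""
    parts = [re.escape(subpatterns[0])]
    for (a, b), sub in zip(gaps, subpatterns[1:]):
        parts.append(".{%d,%d}" % (a, b))
        parts.append(re.escape(sub))
    return "".join(parts)

def find_gap_occurrences(text, subpatterns, gaps):
    """Start positions of all (possibly overlapping) occurrences,
    using a lookahead so matches may overlap."""
    rx = re.compile("(?=(%s))" % gap_pattern_to_regex(subpatterns, gaps))
    return [m.start() for m in rx.finditer(text)]
```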
On Exact and Approximation Algorithms for Distinguishing Substring Selection
 In Proc. 14th FCT, volume 2751 of LNCS
, 2003
Abstract

Cited by 8 (2 self)
The NP-complete Distinguishing Substring Selection problem (DSSS for short) asks, given a set of "good" strings and a set of "bad" strings, for a solution string which is, with respect to the Hamming metric, "away" from the good strings and "close" to the bad strings.
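While finding such a string is NP-complete, verifying a candidate is easy. One plausible reading of "away from good, close to bad" in the Hamming metric is sketched below: the candidate must be far from every aligned substring of every good string, and close to some substring of each bad string. The exact thresholds and quantifiers vary across formulations, so the parameter names and the precise condition here are illustrative assumptions, not the paper's definition:

```python
def hamming(a: str, b: str) -> int:
    # Hamming distance of equal-length strings
    return sum(x != y for x, y in zip(a, b))

def is_distinguishing(candidate, good, bad, d_good, d_bad):
    """Verifier for one formalization of DSSS (parameters illustrative):
    candidate is >= d_good from every length-|candidate| substring of
    each good string, and <= d_bad from some substring of each bad one."""
    L = len(candidate)
    far = all(hamming(candidate, g[i:i + L]) >= d_good
              for g in good for i in range(len(g) - L + 1))
    close = all(any(hamming(candidate, b[i:i + L]) <= d_bad
                    for i in range(len(b) - L + 1))
                for b in bad)
    return far and close
```

This easy-to-verify / hard-to-find gap is exactly what places the problem in NP and motivates the exact and approximation algorithms of the paper.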