Results 1 - 10 of 37
Verifying candidate matches in sparse and wildcard matching
 In Proceedings of the 34th Annual ACM Symposium on Theory of Computing (STOC 2002)
, 2002
Cited by 52 (3 self)
This paper obtains the following results on pattern matching problems in which the text has length n and the pattern has length m. An O(n log m) time deterministic algorithm for the String Matching with Wildcards problem, even when the alphabet is large. An O(k log² m) time Las Vegas algorithm for the Sparse String Matching with Wildcards problem, where k << n is the number of nonzeros in the text. We also give Las Vegas algorithms for the higher dimensional version of this problem. As an application of the above, an O(n log² m) time Las Vegas algorithm for the Subset Matching and Tree Pattern Matching problems, and a Las Vegas algorithm for the Geometric Pattern Matching problem. Finally, an O(n log² m) time deterministic algorithm for Subset Matching and Tree Pattern Matching. The crucial new idea underlying the first three results above is that of confirming matches by convolving vectors obtained by coding characters in the alphabet with non-boolean (i.e., rational or even complex) entries; in contrast, almost all previous pattern matching algorithms consider only boolean codes for the alphabet. The crucial new idea underlying the fourth result is a simpler method of shifting characters which ensures that each character occurs as a singleton in some shift.
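As a rough illustration of confirming matches by convolving numeric (non-boolean) character codes, the sketch below uses a well-known later formulation, not the construction from this paper. It assumes characters are coded as positive integers and wildcards as 0, so that alignment i matches exactly when the sum over j of p_j t_(i+j) (p_j - t_(i+j))² is zero. The function name, the `?` wildcard symbol, and the naive correlation loop are all assumptions made for readability; swapping the three correlations for FFT-based ones is what would give O(n log m) time.

```python
def wildcard_matches(text, pattern, wildcard="?"):
    """Return start positions where pattern matches text.

    The wildcard symbol may appear in either string and matches anything.
    Illustrative sketch only: correlations are computed naively in O(nm).
    """
    # Assumed encoding: wildcard -> 0, every other character -> positive integer.
    def encode(s):
        return [0 if ch == wildcard else ord(ch) + 1 for ch in s]

    t, p = encode(text), encode(pattern)
    n, m = len(t), len(p)

    def correlate(a, b):
        # out[i] = sum_j a[j] * b[i + j]; an FFT would compute this faster.
        return [sum(a[j] * b[i + j] for j in range(m)) for i in range(n - m + 1)]

    # Expand p * t * (p - t)^2 = p^3 t - 2 p^2 t^2 + p t^3 into three correlations.
    c1 = correlate([x ** 3 for x in p], t)
    c2 = correlate([x ** 2 for x in p], [y ** 2 for y in t])
    c3 = correlate(p, [y ** 3 for y in t])

    # Every term is non-negative, so the sum is zero only if every position agrees
    # or involves a wildcard.
    return [i for i in range(n - m + 1) if c1[i] - 2 * c2[i] + c3[i] == 0]
```

For instance, `wildcard_matches("abcabd", "ab?")` reports the alignments at offsets 0 and 3.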
Pattern Matching for Spatial Point Sets
 Proc. 39th Annu. IEEE Sympos. Found. Comput. Sci.
, 1998
Cited by 35 (0 self)
Two sets of points in d-dimensional space are given: a data set D consisting of N points, and a pattern set or probe P consisting of k points. We address the problem of determining whether there is a transformation, among a specified group of transformations of the space, carrying P into or near (meaning at a small directed Hausdorff distance of) D. The groups we consider are translations and rigid motions. Runtimes of approximately O(n log n) and O(n^d log n) respectively are obtained (letting n = max{N, k} and omitting the effects of several secondary parameters). For translations, a runtime of approximately O(n(ak + 1) log² n) is obtained for the case that a constant fraction a < 1 of the points of the probe is allowed to fail to match.
Efficient pattern-matching with don’t cares
 In Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms
, 2002
Cited by 26 (0 self)
We present a randomized algorithm for the string matching with don't cares problem. Based on the simple fingerprint method of Karp and Rabin for ordinary string matching [4], our algorithm runs in time O(n log m) for a text of length n and a pattern of length m and is simpler and slightly faster than the previous algorithms [3, 5, 1]. 1. Introduction. We extend the simple randomized fingerprinting algorithm of Karp and Rabin [4] to the problem of string matching with don't cares. Our algorithm uses a single, simple convolution. This is optimal in the sense that the string matching with don't cares problem is at least as hard as the boolean convolution problem [6]. Thus, to improve our run time of O(n log m) on text of length n and pattern of length m, one would have to improve on the Fast Fourier Transform.
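The fingerprint idea can be sketched roughly as follows, under assumptions that are not taken from the paper: don't cares occur only in the pattern, characters are coded by their code points, and every fixed pattern position j gets a random weight r_j modulo a fixed prime. An alignment i is reported when the sum over j of r_j (t_(i+j) - p_j) is congruent to 0, which always holds at a true match and holds at a mismatch only with probability of about 1/PRIME. The term summing r_j t_(i+j) is the single convolution; it is computed naively here, whereas an FFT would bring the cost to O(n log m). All names and the choice of modulus are hypothetical.

```python
import random

PRIME = (1 << 61) - 1  # assumed modulus (a Mersenne prime), chosen for illustration


def fingerprint_matches(text, pattern, wildcard="?", seed=None):
    """Monte Carlo matching with don't cares in the pattern only.

    True matches are always reported; a false positive at any one alignment
    has probability of about 1/PRIME over the choice of random weights.
    """
    rng = random.Random(seed)
    n, m = len(text), len(pattern)

    # Random nonzero weight for each fixed position, weight 0 for don't cares.
    r = [0 if ch == wildcard else rng.randrange(1, PRIME) for ch in pattern]

    # Fingerprint of the pattern itself (don't cares contribute nothing).
    target = sum(rj * ord(ch) for rj, ch in zip(r, pattern)) % PRIME

    t = [ord(ch) for ch in text]
    hits = []
    for i in range(n - m + 1):
        # One entry of the convolution of the weights with the text (naive here).
        window = sum(r[j] * t[i + j] for j in range(m)) % PRIME
        if window == target:
            hits.append(i)
    return hits
```

A call such as `fingerprint_matches("abcabd", "ab?", seed=1)` is expected to report offsets 0 and 3; a verification pass over the reported offsets would turn this into an error-free variant at extra cost.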
Efficient algorithms for substring near neighbor problem
 in Proc. 17th Annu. ACM-SIAM Sympos. Discrete Algorithms
Cited by 16 (3 self)
In this paper we consider the problem of finding the approximate nearest neighbor when the data set points are the substrings of a given text T. Specifically, for a string T of length n, we present a data structure which does the following: given a pattern P, if there is a substring of T within the distance R from P, it reports a (possibly different) substring of T within distance cR from P. The length of the pattern P, denoted by m, is not known in advance. For the case where the distances are measured using the Hamming distance, we present a data structure which uses Õ(n^(1+1/c)) space and has Õ(n^(1/c) + m n^(o(1))) query time. This essentially matches the earlier bounds of [Ind98], which assumed that the pattern length m is fixed in advance. In addition, our data structure can be constructed in time Õ(n^(1+1/c) + n^(1+o(1)) M^(1/3)), where M is an upper bound for m. This essentially matches the preprocessing bound of [Ind98] as long as the term Õ(n^(1+1/c)) dominates the running time, which is the case when, e.g., c < 3. We also extend our results to the case where the distances are measured according to the l1 distance. The query time and the space bound are essentially the same, while the preprocessing time becomes Õ(n^(1+1/c) + n^(1+o(1)) M^(2/3)).
k-mismatch with don’t cares
 In ESA
, 2007
Cited by 16 (5 self)
We give the first nontrivial algorithms for the k-mismatch pattern matching problem with don’t cares. Given a text t of length n and a pattern p of length m with don’t care symbols and a bound k, our algorithms find all the places that the pattern matches the text with at most k mismatches. We first give an O(n(k + log n log log n) log m) time randomised solution which finds the correct answer with high probability. We then present a new deterministic O(nk² log³ m) time solution that uses tools developed for group testing and finally an approach based on k-selectors that runs in O(nk polylog m) time but requires O(poly m) time preprocessing. In each case, the location of the mismatches at each alignment is also given at no extra cost.
Geometric Pattern Matching: A Performance Study
, 1999
Cited by 14 (1 self)
In this paper, we undertake a performance study of some recent algorithms for geometric pattern matching. These algorithms cover two general paradigms for pattern matching: alignment and combinatorial pattern matching. We present analytical and empirical evaluations of these schemes. Our results indicate that a proper implementation of an alignment-based method outperforms other (often asymptotically better) approaches.
A New Approach to Pattern Matching in Degenerate DNA/RNA Sequences and Distributed Pattern Matching
Cited by 7 (0 self)
In this paper, we consider the pattern matching problem in DNA and RNA sequences where either the pattern or the text can be degenerate, i.e., contain sets of characters. We present an asymptotically faster algorithm for the above problem that works in O(n log m) time, where n and m are the lengths of the text and the pattern, respectively. We also suggest an efficient implementation of our algorithm, which works in linear time when the pattern size is small. Finally, we also describe how our approach can be used to solve the distributed pattern matching problem. Keywords: algorithm, degenerate, DNA/RNA sequence, pattern matching.
Faster pattern matching with character classes using prime number encoding
 Journal of Computer and System Sciences
Cited by 7 (1 self)
In pattern matching with character classes the goal is to find all occurrences of a pattern of length m in a text of length n, where each pattern position consists of an allowed set of characters from a finite alphabet Σ. We present an FFT-based algorithm that uses a novel prime-numbers encoding scheme, which is log n / log m times faster than the fastest extant approaches, which are based on boolean convolutions. In particular, if m^|Σ| = n^O(1), our algorithm runs in time O(n log m), matching the complexity of the fastest techniques for wildcard matching, a special case of our problem. A major advantage of our algorithm is that it allows a tradeoff between the running time and the RAM word size. Our algorithm also speeds up solutions to approximate matching with character classes problems, namely matching with k mismatches and Hamming distance, as well as to the subset matching problem.
A Black Box for Online Approximate Pattern Matching
Cited by 7 (3 self)
We present a deterministic black box solution for online approximate matching. Given a pattern of length m and a streaming text of length n that arrives one character at a time, the task is to report the distance between the pattern and a sliding window of the text as soon as the new character arrives. Our solution requires O(Σ_{j=1}^{log₂ m} T(n, 2^(j−1))/n) time for each input character, where T(n, m) is the total running time of the best offline algorithm. The types of approximation that are supported include exact matching with wildcards, matching under the Hamming norm, approximating the Hamming norm, k-mismatch and numerical measures such as the L2 and L1 norms. For these examples, the resulting online algorithms take O(log² m), O(√(m log m)), O(log² m/ε²), O(√(k log k) log m), O(log² m) and O(√(m log m)) time per character respectively. The space overhead is O(m), which we show is optimal.
Consequences of Faster Alignment of Sequences
Cited by 7 (3 self)
The Local Alignment problem is a classical problem with applications in biology. Given two input strings and a scoring function on pairs of letters, one is asked to find the substrings of the two input strings that are most similar under the scoring function. The best algorithms for Local Alignment run in time that is roughly quadratic in the string length. It is a big open problem whether substantially subquadratic algorithms exist. In this paper we show that for all ε > 0, an O(n^(2−ε)) time algorithm for Local Alignment on strings of length n would imply breakthroughs on three longstanding open problems: it would imply that for some δ > 0, 3SUM on n numbers is in O(n^(2−δ)) time, CNF-SAT on n variables is in O((2 − δ)^n) time, and Max Weight 4-Clique is in O(n^(4−δ)) time. Our result for CNF-SAT also applies to the easier problem of finding the longest common substring of binary strings with don’t cares. We also give strong conditional lower bounds for the more general Multiple Local Alignment problem on k strings, under both k-wise and SP scoring, and for other string similarity problems such as Global Alignment with gap penalties and normalized Longest Common Subsequence.