A New Approach to Text Searching
Cited by 229 (15 self)
We introduce a family of simple and fast algorithms for solving the classical string matching problem, string matching with classes of symbols, don't care symbols and complement symbols, and multiple patterns. In addition we solve the same problems allowing up to k mismatches. Among the features of these algorithms are that they don't need to buffer the input, they are real time algorithms (for constant size patterns), and they are suitable to be implemented in hardware.
1 Introduction
String searching is a very important component of many problems, including text editing, bibliographic retrieval, and symbol manipulation. Recent surveys of string searching can be found in [17, 4]. The string matching problem consists of finding all occurrences of a pattern of length m in a text of length n. We generalize the problem allowing "don't care" symbols, the complement of a symbol, and any finite class of symbols. We solve this problem for one or more patterns, with or without mismatches. Fo...
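The bit-parallel idea behind this kind of algorithm can be sketched as follows. This is a minimal Shift-And style illustration, not necessarily the paper's exact formulation; the name `shift_and_search` and the encoding of a pattern as a list of sets (with `None` standing for a don't care symbol) are assumptions made for this example.

```python
def shift_and_search(pattern, text):
    """Return start indices where pattern matches text.

    pattern: list whose items are either a set of allowed characters
    (a class of symbols) or None (a don't care symbol). This encoding
    is an assumption of the sketch.
    """
    m = len(pattern)
    if m == 0:
        return []
    alphabet = set(text)
    # mask[c] has bit i set when pattern position i accepts character c.
    mask = {c: 0 for c in alphabet}
    for i, allowed in enumerate(pattern):
        for c in alphabet:
            if allowed is None or c in allowed:
                mask[c] |= 1 << i
    accept = 1 << (m - 1)
    state = 0
    hits = []
    for pos, c in enumerate(text):
        # Bit i of state is set when pattern[0..i] matches the text ending at pos.
        state = ((state << 1) | 1) & mask[c]
        if state & accept:
            hits.append(pos - m + 1)
    return hits


print(shift_and_search([{"a"}, None, {"c", "d"}], "abcxaddaxc"))
```

The text is read one character at a time with no buffering, which matches the property claimed in the abstract. A complement symbol can be expressed in this encoding as the set of all other characters.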
Fast and Practical Approximate String Matching
In Combinatorial Pattern Matching, Third Annual Symposium, 1992
Cited by 54 (0 self)
We present new algorithms for approximate string matching based on simple, but efficient, ideas. First, we present an algorithm for string matching with mismatches based on arithmetical operations that runs in linear worst case time for most practical cases. This is a new approach to string searching. Second, we present an algorithm for string matching with errors based on partitioning the pattern that requires linear expected time for typical inputs.
1 Introduction
Approximate string matching is one of the main problems in combinatorial pattern matching. Recently, several new approaches emphasizing the expected search time and practicality have appeared [3, 4, 27, 32, 31, 17], in contrast to older results, most of which are of theoretical interest only. Here, we continue this trend by presenting two new simple and efficient algorithms for approximate string matching. First, we present an algorithm for string matching with k mismatches. This problem consists of finding all instances o...
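The partitioning idea can be sketched as a filter: if the pattern is cut into k + 1 pieces, any occurrence with at most k errors must contain at least one piece unchanged, so exact hits of the pieces select the text areas worth verifying. The sketch below is an illustration under that assumption; the names `ends_within_k` and `partition_search`, and the use of a dynamic programming verifier, are choices made here and are not taken from the paper.

```python
def ends_within_k(pattern, text, k):
    """End indices e such that some substring of text ending at e
    is within edit distance k of pattern (column-wise dynamic programming)."""
    m = len(pattern)
    col = list(range(m + 1))           # distances against the empty text
    ends = []
    for e, c in enumerate(text):
        prev_diag, col[0] = col[0], 0  # a match may start at any position
        for i in range(1, m + 1):
            cost = min(col[i] + 1,                 # extra text character
                       col[i - 1] + 1,             # missing pattern character
                       prev_diag + (pattern[i - 1] != c))
            prev_diag, col[i] = col[i], cost
        if col[m] <= k:
            ends.append(e)
    return ends


def partition_search(pattern, text, k):
    """Same result as ends_within_k, but only verifies around exact piece hits."""
    m = len(pattern)
    if k >= m:                          # pieces would be empty, no filtering possible
        return ends_within_k(pattern, text, k)
    # Cut the pattern into k + 1 non-empty pieces.
    bounds = [i * m // (k + 1) for i in range(k + 2)]
    found = set()
    for lo, hi in zip(bounds, bounds[1:]):
        piece = pattern[lo:hi]
        q = text.find(piece)
        while q != -1:
            # Any occurrence containing this piece lies inside this window.
            start = max(0, q - lo - k)
            stop = min(len(text), q - lo + m + k)
            for e in ends_within_k(pattern, text[start:stop], k):
                found.add(start + e)
            q = text.find(piece, q + 1)
    return sorted(found)


print(partition_search("abc", "xxabcxx", 1))
```

When piece hits are rare, most of the text is never passed to the verifier, which is the intuition behind the expected-time claim.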
Dictionary matching and indexing with errors and don’t cares
In STOC ’04, 2004
Cited by 50 (1 self)
This paper considers various flavors of the following online problem: preprocess a text or collection of strings, so that given a query string p, all matches of p with the text can be reported quickly. In this paper we consider matches in which a bounded number of mismatches are allowed, or in which a bounded number of “don’t care” characters are allowed. The specific problems we look at are: indexing, in which there is a single text t, and we seek locations where p matches a substring of t; dictionary queries, in which a collection of strings is given upfront, and we seek those strings which match p in their entirety; and dictionary matching, in which a collection of strings is given upfront, and we seek those substrings of a (long) p which match an original string in its entirety. These are all instances of an all-to-all matching problem, for which we provide a single solution. The performance bounds all have a similar character. For example, for the indexing problem with $n = |t|$ and $m = |p|$, the query time for k substitutions is $O\left(m + \frac{(c_1 \log n)^k}{k!} + \#\text{matches}\right)$, with a data structure of size $O\left(n \frac{(c_2 \log n)^k}{k!}\right)$ and a preprocessing time of $O\left(n \frac{(c_2 \log n)^k}{k!}\right)$, where $c_1, c_2 > 1$ are constants. The deterministic preprocessing assumes a weakly nonuniform RAM model; this assumption is not needed if randomization is used in the preprocessing.
Faster Algorithms for String Matching Problems: Matching the Convolution Bound
In Proceedings of the 39th Symposium on Foundations of Computer Science, 1998
Cited by 30 (5 self)
In this paper we give a randomized O(n log n)-time algorithm for the string matching with don't cares problem. This improves the Fischer-Paterson bound [10] from 1974 and answers the open problem posed (among others) by Weiner [30] and Galil [11]. Using the same technique, we give an O(n log n)-time algorithm for other problems, including subset matching and tree pattern matching [15, 21, 9, 7, 17] and (general) approximate threshold matching [28, 17]. As this bound essentially matches the complexity of computing the Fast Fourier Transform, which is the only known technique for solving problems of this type, it is likely that the algorithms are in fact optimal. Additionally, the technique used for the threshold matching problem can be applied to the online version of this problem, in which we are allowed to preprocess the text and are required to process the pattern in time sublinear in the text length. This result involves an interesting variant of the Karp-Rabin fingerprint m...
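To show why convolutions are the natural tool for don't cares, here is a small sketch of a deterministic formulation. It is not the randomized algorithm of this paper. Symbols are encoded as positive integers and don't cares as 0, so position i matches exactly when the sum over j of p_j t_{i+j} (p_j - t_{i+j})^2 is zero. The names `correlate` and `wildcard_match` are hypothetical, and the correlations are computed directly in O(nm) for clarity; an FFT would compute each of the three in O(n log n).

```python
def correlate(p, t):
    """c[i] = sum over j of p[j] * t[i + j].

    Computed directly here; this is the step an FFT would accelerate.
    """
    m, n = len(p), len(t)
    return [sum(p[j] * t[i + j] for j in range(m)) for i in range(n - m + 1)]


def wildcard_match(pattern, text, wildcard="?"):
    """Start indices where pattern matches text; wildcard may occur in either."""
    def encode(s):
        # Don't care -> 0, every other symbol -> a positive integer.
        return [0 if ch == wildcard else ord(ch) + 1 for ch in s]

    p, t = encode(pattern), encode(text)
    if len(p) > len(t):
        return []
    # Expand sum p*t*(p - t)^2 into three correlations.
    a = correlate([x ** 3 for x in p], t)
    b = correlate([x ** 2 for x in p], [y ** 2 for y in t])
    c = correlate(p, [y ** 3 for y in t])
    return [i for i in range(len(a)) if a[i] - 2 * b[i] + c[i] == 0]


print(wildcard_match("a?c", "abcaxcab?"))
```

Every term of the sum is non-negative, so the total is zero only when each aligned pair is equal or contains a don't care.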
Pattern Matching with Swaps
1997
Cited by 19 (8 self)
Let a text string $T$ of $n$ symbols and a pattern string $P$ of $m$ symbols from alphabet $\Sigma$ be given. A swapped version $T'$ of $T$ is a length $n$ string derived from $T$ by a series of local swaps (i.e. $t'_\ell \leftarrow t_{\ell+1}$ and $t'_{\ell+1} \leftarrow t_\ell$), where each element can participate in no more than one swap. The Pattern Matching with Swaps problem is that of finding all locations $i$ for which there exists a swapped version $T'$ of $T$ where there is an exact matching of $P$ in location $i$ of $T'$. It has been an open problem whether swapped matching can be done in less than $O(mn)$ time. In this paper we show the first algorithm that solves the pattern matching with swaps problem in time $o(mn)$. We present an algorithm whose time complexity is $O(n m^{1/3} \log m \log^2 \min(m, |\Sigma|))$ for a general alphabet $\Sigma$.
Key Words: Design and analysis of algorithms, combinatorial algorithms on words, pattern matching, pattern matching with swaps, nonstandard pattern matching.
Department of Mathematics...
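The problem definition can be made concrete with a direct O(mn) baseline, the bound the paper improves on. This is not the paper's algorithm. The sketch tracks, for each text position in the window, whether it stays in place ("N"), swaps with its right neighbor ("R"), or swaps with its left neighbor ("L"). Swaps with a neighbor just outside the window are allowed here, which is one reading of the definition above; the function names are hypothetical.

```python
def swapped_char(text, x, state):
    """Character at position x of T' for the given swap state, or None if impossible."""
    if state == "N":
        return text[x]
    y = x + 1 if state == "R" else x - 1
    return text[y] if 0 <= y < len(text) else None


def swap_match(pattern, text):
    """Locations i where pattern matches some swapped version of text exactly."""
    m, n = len(pattern), len(text)
    hits = []
    for i in range(n - m + 1):
        alive = {"N", "R", "L"}          # swap states allowed for position i
        for j in range(m):
            x = i + j
            ok = {s for s in alive if swapped_char(text, x, s) == pattern[j]}
            # A right swap forces the next position to swap left; otherwise
            # the next position may stay in place or start a new swap.
            alive = set()
            if "R" in ok:
                alive.add("L")
            if ok & {"N", "L"}:
                alive |= {"N", "R"}
            if not alive:
                break
        else:
            hits.append(i)
    return hits


print(swap_match("abc", "bacxacb"))
```

Because each element takes part in at most one swap, the state of one position fully determines what the next position may do, so three states per position are enough.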
Fast String Matching with Mismatches
Information and Computation, 1994
Cited by 18 (3 self)
We describe and analyze three simple algorithms, fast on average, for solving the problem of string matching with a bounded number of mismatches. These are the naive algorithm, an algorithm based on the Boyer-Moore approach, and ad hoc deterministic finite automata searching. We include simulation results that compare these algorithms to previous works.
1 Introduction
The problem of string matching with k mismatches consists of finding all occurrences of a pattern of length m in a text of length n such that in at most k positions the text and the pattern have different symbols. In the following, we assume that $0 < k < m$ and $m \le n$. The case of k = 0 is the well known exact string matching problem, and if k = m the solution is trivial. Landau and Vishkin [LV86] gave the first efficient algorithm to solve this particular problem. Their algorithm uses O(kn + km log m) time and O(k(n + m)) space. While it is fast, the space required is unacceptable for most practical purposes. Galil ...
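The naive algorithm mentioned above can be sketched in a few lines: try every alignment and stop comparing as soon as more than k mismatches have been seen. The function name `naive_k_mismatches` is an assumption of this example, and the paper's own version may differ in detail.

```python
def naive_k_mismatches(pattern, text, k):
    """Start indices where pattern matches text with at most k mismatches."""
    m, n = len(pattern), len(text)
    hits = []
    for i in range(n - m + 1):
        errors = 0
        for j in range(m):
            if text[i + j] != pattern[j]:
                errors += 1
                if errors > k:
                    break            # this alignment can no longer match
        else:
            hits.append(i)           # loop finished with at most k mismatches
    return hits


print(naive_k_mismatches("abc", "abdabcxbc", 1))
```

The early exit is what makes the method fast on average: on random text most alignments are rejected after a few comparisons, even though the worst case remains O(mn).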
Pattern matching with address errors: rearrangement distances
In SODA, 2006
Cited by 11 (0 self)
Historically, approximate pattern matching has mainly focused on coping with errors in the data, while the order of the text/pattern was assumed to be more or less correct. In this paper we consider a class of pattern matching problems where the content is assumed to be correct, while the locations may have shifted/changed. We formally define a broad class of problems of this type, capturing situations in which the pattern is obtained from the text by a sequence of rearrangements. We consider several natural rearrangement schemes, including the analogues of the ℓ1 and ℓ2 distances, as well as two distances based on interchanges. For these, we present efficient algorithms to solve the resulting string matching problems.
Approximate parameterized matching
In Proc. 12th European Symposium on Algorithms (ESA), 2004
Cited by 8 (3 self)
Two equal length strings $s$ and $s'$, over alphabets $\Sigma_s$ and $\Sigma_{s'}$, parameterize match if there exists a bijection $\pi: \Sigma_s \to \Sigma_{s'}$ such that $\pi(s) = s'$, where $\pi(s)$ is the renaming of each character of $s$ via $\pi$. Parameterized matching is the problem of finding all parameterized matches of a pattern string $p$ in a text $t$, and approximate parameterized matching is the problem of finding, at each location, a bijection $\pi$ that maximizes the number of characters that are mapped from $p$ to the appropriate $|p|$-length substring of $t$.
Parameterized matching was introduced as a model for software duplication detection in software maintenance systems and also has applications in image processing and computational biology. For example, approximate parameterized matching models image searching with variable color maps in the presence of errors.
We consider the problem for which an error threshold, $k$, is given and the goal is to find all locations in $t$ for which there exists a bijection $\pi$ which maps $p$ into the appropriate $|p|$-length substring of $t$ with at most $k$ mismatched mapped elements.
We show that (1) the approximate parameterized matching, when $|p| = |t|$, is equivalent to the maximum matching problem on graphs, implying that (2) maximum matching is reducible to the approximate parameterized matching with threshold $k$, up to an $O(\log |t|)$ factor (this can be achieved by reducing approximate parameterized matching to the problem by using a binary search on the $k$'s). Given the best known maximum matching algorithms, an $O(m^{1.5})$ bound, where $m = |p| = |t|$, is implied for approximate parameterized matching. We show that (3) for the $k$ threshold problem we can do this in $O(m + k^{1.5})$. Our main result (4) is an $O(nk^{1.5} + mk \log m)$ time algorithm, where $m = |p|$ and $n = |t|$.
1 Introduction
In the traditional pattern matching model [11, 19], one seeks exact occurrences of a given pattern $p$ in a text $t$, i.e. text locations where every text symbol is equal to its corresponding pattern symbol. For two equal length strings
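The exact (error-free) notion of a parameterized match defined above can be checked with two dictionaries, one per direction of the bijection. This sketch covers only the exact case for two equal length strings, not the approximate algorithms of the paper; the function name `parameterized_match` is hypothetical.

```python
def parameterized_match(s, t):
    """True when some bijection between the symbols of s and t renames s into t."""
    if len(s) != len(t):
        return False
    forward, backward = {}, {}
    for a, b in zip(s, t):
        # setdefault records the first pairing and returns the stored one after
        # that, so any later conflict in either direction is detected.
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True


print(parameterized_match("abab", "xyxy"))
print(parameterized_match("abab", "xyyx"))
```

Checking both directions is what enforces a bijection: `forward` alone would accept "ab" against "xx", where two symbols are mapped to the same target.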
Boyer-Moore strategy to efficient approximate string matching
2007
Cited by 7 (1 self)
We propose a simple but efficient algorithm for searching all occurrences of a pattern or a class of patterns (length m) in a text (length n) with at most k mismatches. This algorithm relies on the Shift-Add algorithm of Baeza-Yates and Gonnet [6], which represents the current state of the search as a number stored in a bit word and uses the ability of programming languages to handle bit words. The state representation should therefore not exceed the word size $\omega$, that is, $m(\lceil \log_2 (k + 1) \rceil + 1) \le \omega$. This algorithm consists of a preprocessing step and a searching step. It is linear and performs 3n operations during the searching step. Notions of shift and character skip, found in the Boyer-Moore (BM) [9] approach, are introduced in this algorithm. Provided that the considered alphabet is large enough (compared to the pattern length), the average number of operations performed by our algorithm during the searching step becomes $n\left(2 + \frac{k+4}{m-k}\right)$.
1 Introduction
Our purpose is approximate m...
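The Shift-Add state described above can be sketched as follows. Each pattern position owns a field of $\lceil \log_2 (k + 1) \rceil + 1$ bits that counts mismatches, and the top bit of each field is moved to a separate overflow word so that counts never carry into a neighboring field. This is an illustration of the underlying Shift-Add idea only, without the Boyer-Moore style skips proposed in the paper; the name `shift_add_search` is hypothetical, and Python integers hide the word size limit that a real implementation must respect.

```python
def shift_add_search(pattern, text, k):
    """Start indices where pattern matches text with at most k mismatches."""
    m = len(pattern)
    if m == 0:
        return []
    b = k.bit_length() + 1                    # ceil(log2(k + 1)) + 1 bits per field
    full = (1 << (m * b)) - 1                 # keeps exactly m fields
    high = sum(1 << (i * b + b - 1) for i in range(m))   # top bit of every field
    # add[c] holds a 1 in field i when pattern[i] differs from c.
    add = {c: sum(1 << (i * b) for i in range(m) if pattern[i] != c)
           for c in set(text)}
    top = (m - 1) * b                         # bit offset of the last field
    state = overflow = 0
    hits = []
    for pos, c in enumerate(text):
        state = ((state << b) + add[c]) & full
        # Remember fields whose count reached 2^(b-1) > k, then clear that bit.
        overflow = ((overflow << b) | (state & high)) & full
        state &= ~high
        if pos >= m - 1 and not (overflow >> top) and (state >> top) <= k:
            hits.append(pos - m + 1)
    return hits


print(shift_add_search("abc", "abdabcxbc", 1))
```

With k = 0 every field is a single bit and the update reduces to the exact matching case.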
Finding patterns with variable length gaps or don’t cares
of Lecture Notes in Computer Science, 2006
Cited by 6 (3 self)
Abstract. In this paper we present new algorithms to handle the pattern matching problem where the pattern can contain variable length gaps. Given a pattern $P$ with variable length gaps and a text $T$, our algorithm works in $O(n + m + \alpha \log(\max_{1 \le i \le l}(b_i - a_i)))$ time, where $n$ is the length of the text, $m$ is the summation of the lengths of the component subpatterns, $\alpha$ is the total number of occurrences of the component subpatterns in the text, and $a_i$ and $b_i$ are, respectively, the minimum and maximum number of don’t cares allowed between the $i$th and $(i+1)$st component of the pattern. We also present another algorithm which, given a suffix array of the text, can report whether $P$ occurs in $T$ in $O(m + \alpha \log \log n)$ time. Both algorithms record information to report all the occurrences of $P$ in $T$. Furthermore, the techniques used in our algorithms are shown to be useful in many other contexts.
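The problem statement can be illustrated with a straightforward baseline: locate every occurrence of each component subpattern, then chain them while checking that the gap between consecutive components lies in $[a_i, b_i]$. This is a simple sketch of the problem, not the algorithm of the paper; `occurrences` and `gapped_match_ends` are hypothetical names, and prefix sums stand in for whatever range structure an efficient solution would use.

```python
from itertools import accumulate


def occurrences(sub, text):
    """Start indices of all (possibly overlapping) occurrences of sub in text."""
    out, q = [], text.find(sub)
    while q != -1:
        out.append(q)
        q = text.find(sub, q + 1)
    return out


def gapped_match_ends(components, gaps, text):
    """Positions just past the end of each full match.

    components: list of subpattern strings P_1..P_l.
    gaps: list of (a_i, b_i) pairs, one between each pair of neighbors.
    """
    n = len(text)
    # valid[e]: a partial match of the components so far ends just before e.
    valid = [False] * (n + 1)
    for s in occurrences(components[0], text):
        valid[s + len(components[0])] = True
    for sub, (a, b) in zip(components[1:], gaps):
        prefix = [0] + list(accumulate(valid))   # prefix[i] = count of True in valid[:i]
        nxt = [False] * (n + 1)
        for s in occurrences(sub, text):
            lo, hi = max(0, s - b), s - a        # previous end must lie in [s - b, s - a]
            if hi >= lo and prefix[hi + 1] - prefix[lo] > 0:
                nxt[s + len(sub)] = True
        valid = nxt
    return [e for e, ok in enumerate(valid) if ok]


print(gapped_match_ends(["ab", "cd"], [(1, 2)], "abxcdabcdabxxxcd"))
```

The cost is driven by the number of component occurrences, which corresponds to the quantity $\alpha$ in the bound above.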