Results 1 - 10 of 145
A New Approach to Text Searching
Cited by 293 (15 self)
We introduce a family of simple and fast algorithms for solving the classical string matching problem, string matching with classes of symbols, don't care symbols and complement symbols, and multiple patterns. In addition we solve the same problems allowing up to k mismatches. Among the features of these algorithms are that they don't need to buffer the input, they are real-time algorithms (for constant-size patterns), and they are suitable to be implemented in hardware.
1 Introduction
String searching is a very important component of many problems, including text editing, bibliographic retrieval, and symbol manipulation. Recent surveys of string searching can be found in [17, 4]. The string matching problem consists of finding all occurrences of a pattern of length m in a text of length n. We generalize the problem allowing "don't care" symbols, the complement of a symbol, and any finite class of symbols. We solve this problem for one or more patterns, with or without mismatches. Fo...
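The abstract describes matching with classes, don't care symbols, and complement symbols without buffering the input. A minimal bit-parallel sketch in the shift-and style can illustrate how one state word handles all three; whether this matches the paper's exact formulation is an assumption, and the names `ANY`, `compile_pattern`, `shift_and_search`, and the `("not", chars)` encoding are hypothetical.

```python
ANY = None  # hypothetical marker for a don't-care position


def compile_pattern(positions):
    """Build one bit mask per character.

    Each item of `positions` is a string of allowed characters (a class),
    ANY (don't care), or ("not", chars) for a complement class.
    """
    def accepts(p, c):
        if p is ANY:
            return True
        if isinstance(p, tuple):
            return c not in p[1]
        return c in p

    alphabet = set()
    for p in positions:
        if p is not ANY:
            alphabet.update(p[1] if isinstance(p, tuple) else p)
    # Bit i of masks[c] is set when position i accepts character c.
    masks = {c: sum(1 << i for i, p in enumerate(positions) if accepts(p, c))
             for c in alphabet}
    # Characters never mentioned match only don't-care and complement positions.
    default = sum(1 << i for i, p in enumerate(positions)
                  if p is ANY or isinstance(p, tuple))
    return masks, default


def shift_and_search(positions, text):
    """Return start indices of all matches, reading the text one character at a time."""
    m = len(positions)
    masks, default = compile_pattern(positions)
    accept = 1 << (m - 1)
    state, hits = 0, []
    for j, c in enumerate(text):
        # Extend every partial match by one character, and start a new one.
        state = ((state << 1) | 1) & masks.get(c, default)
        if state & accept:
            hits.append(j - m + 1)
    return hits
```

For example, `shift_and_search(["a", "bc", ANY, ("not", "x")], "abxy acyx abcd")` looks for an `a`, then `b` or `c`, then any character, then anything except `x`. The loop never revisits earlier text, which is the property the abstract refers to as not needing to buffer the input.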
Dictionary matching and indexing with errors and don't cares
 In Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing. ACM
, 2004
Fast and Flexible String Matching by Combining Bit-parallelism and Suffix Automata
 ACM Journal of Experimental Algorithmics (JEA)
, 1998
Cited by 74 (10 self)
... In this paper we merge bit-parallelism and suffix automata, so that a nondeterministic suffix automaton is simulated using bit-parallelism. The resulting algorithm, called BNDM, obtains the best from both worlds. It is much simpler to implement than BDM and nearly as simple as Shift-Or. It inherits from Shift-Or the ability to handle flexible patterns and from BDM the ability to skip characters. BNDM is 30%-40% faster than BDM and up to 7 times faster than Shift-Or. When compared to the fastest existing algorithms on exact patterns (which belong to the BM family), BNDM is from 20% slower to 3 times faster, depending on the alphabet size. With respect to flexible pattern searching, BNDM is by far the fastest technique to deal with classes of characters and is competitive for searching allowing errors. In particular, BNDM seems well suited to computational biology applications, since it is the fastest algorithm to search on DNA sequences and flexible searching is an important problem in that
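The combination described above, a suffix automaton simulated with bit operations plus window skipping, can be sketched as follows. This is a minimal illustration for exact patterns written from the textbook description of BNDM, not code from the paper; the function name `bndm_search` is hypothetical, and Python's unbounded integers hide the machine-word limit a real implementation would face.

```python
def bndm_search(pattern, text):
    """Return start indices of all exact occurrences of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Bit (m - 1 - i) of masks[c] is set when pattern[i] == c.
    masks = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << (m - 1 - i))
    full = (1 << m) - 1
    top = 1 << (m - 1)

    hits, pos = [], 0
    while pos <= n - m:
        j, last, state = m, m, full
        # Scan the window right to left while the text read so far
        # is still a factor of the pattern.
        while state:
            state &= masks.get(text[pos + j - 1], 0)
            j -= 1
            if state & top:
                if j > 0:
                    last = j          # a pattern prefix starts here: safe shift
                else:
                    hits.append(pos)  # the whole window matched
            state = (state << 1) & full
        pos += last  # skip characters that cannot start a match
    return hits
```

The `last` variable is where the skipping comes from: the window jumps to the most recent position at which a prefix of the pattern was seen, so on text with few partial matches most characters are never inspected.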
Fast and Practical Approximate String Matching
 In Combinatorial Pattern Matching, Third Annual Symposium
, 1992
Cited by 68 (0 self)
We present new algorithms for approximate string matching based on simple, but efficient, ideas. First, we present an algorithm for string matching with mismatches based on arithmetical operations that runs in linear worst case time for most practical cases. This is a new approach to string searching. Second, we present an algorithm for string matching with errors based on partitioning the pattern that requires linear expected time for typical inputs.
1 Introduction
Approximate string matching is one of the main problems in combinatorial pattern matching. Recently, several new approaches emphasizing the expected search time and practicality have appeared [3, 4, 27, 32, 31, 17], in contrast to older results, most of which are only of theoretical interest. Here, we continue this trend by presenting two new simple and efficient algorithms for approximate string matching. First, we present an algorithm for string matching with k mismatches. This problem consists of finding all instances o...
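The second algorithm is described only as "partitioning the pattern". One common reading, assumed here, is the pigeonhole filter: split the pattern into k + 1 pieces, note that k errors cannot touch every piece, search each piece exactly, and verify only the text around each hit. The sketch below follows that reading with a standard dynamic-programming verifier; the function names and the use of edit distance are assumptions, not details taken from the paper.

```python
def sellers_ends(pattern, text, k, base=0):
    """End positions (exclusive, offset by base) of substrings of text
    within edit distance k of pattern, via column-wise dynamic programming."""
    m = len(pattern)
    col = list(range(m + 1))  # distances against the empty text prefix
    ends = set()
    for j, c in enumerate(text):
        diag = col[0]  # value from the previous column, previous row
        for i in range(1, m + 1):
            cur = min(col[i] + 1, col[i - 1] + 1, diag + (pattern[i - 1] != c))
            diag, col[i] = col[i], cur
        if col[m] <= k:
            ends.add(base + j + 1)
    return ends


def partition_search(pattern, text, k):
    """Filter with k + 1 exact pieces, then verify a window around each hit."""
    m = len(pattern)
    if m <= k:
        raise ValueError("pattern must be longer than k")
    piece_len = m // (k + 1)
    ends = set()
    for idx in range(k + 1):
        off = idx * piece_len
        # The last piece absorbs any remainder.
        piece = pattern[off:] if idx == k else pattern[off:off + piece_len]
        p = text.find(piece)
        while p != -1:
            # An occurrence containing this piece lies inside this window.
            lo = max(0, p - off - k)
            hi = min(len(text), p - off + m + k)
            ends |= sellers_ends(pattern, text[lo:hi], k, base=lo)
            p = text.find(piece, p + 1)
    return sorted(ends)
```

The filter is cheap when the pieces are long enough to be rare in the text, which is the situation the abstract calls typical inputs; the sketch makes no attempt to reproduce the paper's running-time analysis.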
Faster Algorithms for String Matching with k Mismatches
Cited by 67 (14 self)
The string matching with mismatches problem is that of finding the number of mismatches between pattern P of length m and every length m substring of the text T. Currently, the best algorithms for this problem are the following. The Landau-Vishkin algorithm finds all locations where the pattern has at most k errors (where k is part of the input) in time $O(nk)$. The Abrahamson algorithm finds the number of mismatches at every location in time $O(n\sqrt{m \log m})$. We present an algorithm that is faster than both. Our algorithm finds all locations where the pattern has at most k errors in time $O(n\sqrt{k \log k})$. We also show an algorithm that solves the above problem in time $O((n + nk^3/m) \log k)$.
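To make the problem statement concrete, here is a naive O(nm) baseline that computes the mismatch count at every alignment and then applies the threshold k. It is only a reference point for the definitions above, not any of the algorithms named in the abstract; the function names are hypothetical.

```python
def mismatch_counts(pattern, text):
    """Number of mismatches between pattern and each length-m substring of text."""
    m = len(pattern)
    return [
        sum(a != b for a, b in zip(pattern, text[i:i + m]))
        for i in range(len(text) - m + 1)
    ]


def k_mismatch_positions(pattern, text, k):
    """Start positions where pattern matches with at most k mismatches."""
    return [i for i, d in enumerate(mismatch_counts(pattern, text)) if d <= k]
```

A faster method has to return the same values as `mismatch_counts` (the counting version) or the same list as `k_mismatch_positions` (the thresholded version), so this baseline is useful for testing one.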
Verifying candidate matches in sparse and wildcard matching
 In Proceedings of the 34th Annual ACM Symposium on Theory of Computing (STOC 2002)
, 2002
Cited by 52 (3 self)
This paper obtains the following results on pattern matching problems in which the text has length n and the pattern has length m. An $O(n \log m)$ time deterministic algorithm for the String Matching with Wildcards problem, even when the alphabet is large. An $O(k \log^2 m)$ time Las Vegas algorithm for the Sparse String Matching with Wildcards problem, where k << n is the number of nonzeros in the text. We also give Las Vegas algorithms for the higher dimensional version of this problem. As an application of the above, an $O(n \log^2 m)$ time Las Vegas algorithm for the Subset Matching and Tree Pattern Matching problems, and a Las Vegas algorithm for the Geometric Pattern Matching problem. Finally, an $O(n \log^2 m)$ time deterministic algorithm for Subset Matching and Tree Pattern Matching. The crucial new idea underlying the first three results above is that of confirming matches by convolving vectors obtained by coding characters in the alphabet with non-boolean (i.e., rational or even complex) entries; in contrast, almost all previous pattern matching algorithms consider only boolean codes for the alphabet. The crucial new idea underlying the fourth result is a simpler method of shifting characters which ensures that each character occurs as a singleton in some shift.
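The idea of confirming matches by correlating non-boolean codes can be illustrated with a commonly taught sum-of-squares formulation, which is not the paper's own construction. Wildcards are coded as 0 and every other character as a positive integer, so the sum over j of p_j * t_(i+j) * (p_j - t_(i+j))^2 is zero exactly when alignment i matches. In the sketch below `correlate` is a direct O(nm) stand-in for an FFT-based convolution, and `WILDCARD`, `encode`, and `wildcard_matches` are hypothetical names.

```python
WILDCARD = "?"  # hypothetical wildcard marker


def encode(s):
    """Code wildcards as 0 and every other character as a positive integer."""
    return [0 if ch == WILDCARD else ord(ch) + 1 for ch in s]


def correlate(p, t):
    """Direct cross-correlation; an FFT would compute the same values faster."""
    m = len(p)
    return [sum(p[j] * t[i + j] for j in range(m)) for i in range(len(t) - m + 1)]


def wildcard_matches(pattern, text):
    """Start positions where pattern matches text, wildcards allowed in both."""
    p, t = encode(pattern), encode(text)
    # Expand sum p*t*(p - t)^2 = sum p^3*t - 2*sum p^2*t^2 + sum p*t^3.
    c1 = correlate([x ** 3 for x in p], t)
    c2 = correlate([x ** 2 for x in p], [y ** 2 for y in t])
    c3 = correlate(p, [y ** 3 for y in t])
    return [i for i in range(len(c1)) if c1[i] - 2 * c2[i] + c3[i] == 0]
```

Every term of the sum is non-negative, so a zero total means each aligned pair is either equal or involves a wildcard. Only three correlations are needed regardless of alphabet size, which is the benefit of moving beyond one boolean vector per character.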
Optimal and nearly optimal algorithms for approximating polynomial zeros
 Comput. Math. Appl
, 1996
Cited by 48 (15 self)
We substantially improve the known algorithms for approximating all the complex zeros of an nth-degree polynomial p(x). Our new algorithms save both Boolean and arithmetic sequential time, versus the previous best algorithms of Schönhage [1], Pan [2], and Neff and Reif [3]. In parallel (NC) implementation, we dramatically decrease the number of processors, versus the parallel algorithm of Neff [4], which was the only NC algorithm known for this problem so far. Specifically, under the simple normalization assumption that the variable x has been scaled so as to confine the zeros of p(x) to the unit disc $\{x : |x| \le 1\}$, our algorithms (which promise to be practically effective) approximate all the zeros of p(x) within the absolute error bound $2^{-b}$, by using order of n arithmetic operations and order of $(b + n)n^2$ Boolean (bitwise) operations (in both cases up to within polylogarithmic factors). The algorithms allow their optimal (work preserving) NC parallelization, so that they can be implemented by using polylogarithmic time and the orders of n arithmetic processors or $(b + n)n^2$ Boolean processors. All the cited bounds on the computational complexity are within polylogarithmic factors from the optimum (in terms of n and b) under both arithmetic and Boolean models of computation (in the Boolean case, under the additional (realistic) assumption that n = O(b)).
Computing similarity between RNA strings
, 1996
Cited by 47 (4 self)
Ribonucleic acid (RNA) strings are strings over the four-letter alphabet {A, C, G, U} with a secondary structure of base pairing between A-U and C-G pairs in the string. Edges are drawn between two bases that are paired in the secondary structure and these edges have traditionally been assumed to be noncrossing. The noncrossing base pairing naturally leads to a tree-like representation of the secondary structure of RNA strings. In this paper, we address several notions of similarity between two RNA strings that take into account both the primary sequence and secondary base-pairing structure of the strings. We present efficient algorithms for exact matching and approximate matching between two RNA strings. We define a notion of alignment between two RNA strings and devise algorithms based on dynamic programming. We then present a method for optimally aligning a given RNA string with unknown secondary structure to one with known sequence and structure, thus attacking the structure prediction problem in the case when the structure of a closely related sequence is known. The techniques employed to prove our results include reductions to well-known string matching problems allowing wild cards and ranges, and speeding up dynamic programming by using the tree structures implicit in the secondary structure of RNA strings.
On the use of Regular Expressions for Searching Text
 ACM Transactions on Programming Languages and Systems
, 1995
Cited by 41 (3 self)
The use of regular expressions to search text is well known and understood as a useful technique. It is then surprising that the standard techniques and tools prove to be of limited use for searching text formatted with SGML or other similar markup languages. Experience with structured text search has caused us to carefully reexamine the current practice. The generally accepted rule of "leftmost longest match" is an unfortunate choice and is at the root of the difficulties. We instead propose a rule which is semantically cleaner and is incidentally simpler and more efficient to implement. This rule is generally applicable to any text search application.
1 Introduction
Regular expressions are widely regarded as a precise, succinct notation for specifying a text search, with a straightforward efficient implementation. Many people routinely use regular expressions to specify searches in text editors and with standalone search tools such as the Unix grep utility. A regular expression ...
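The difficulty with longest-match semantics on markup can be seen in a small example. The sketch below uses Python's `re` module, whose greedy quantifier takes the longest extension from the leftmost start; the non-greedy `.*?` shown for contrast is a feature of Python's engine and is not claimed to be the rule the paper proposes. The sample string and function names are made up for illustration.

```python
import re

SAMPLE = "<b>x</b> and <b>y</b>"  # hypothetical marked-up text


def greedy_elements(text):
    """Greedy '.*' extends as far as possible and swallows both elements."""
    return re.findall(r"<b>.*</b>", text)


def lazy_elements(text):
    """Non-greedy '.*?' stops at the first closing tag."""
    return re.findall(r"<b>.*?</b>", text)


print(greedy_elements(SAMPLE))  # ['<b>x</b> and <b>y</b>']
print(lazy_elements(SAMPLE))    # ['<b>x</b>', '<b>y</b>']
```

The greedy result spans from the first opening tag to the last closing tag, which is rarely what a search over structured text intends.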
Faster Algorithms for String Matching Problems: Matching the Convolution Bound
 In Proceedings of the 39th Symposium on Foundations of Computer Science
, 1998
Cited by 37 (3 self)
In this paper we give a randomized O(n log n)-time algorithm for the string matching with don't cares problem. This improves the Fischer-Paterson bound [10] from 1974 and answers the open problem posed (among others) by Weiner [30] and Galil [11]. Using the same technique, we give an O(n log n)-time algorithm for other problems, including subset matching and tree pattern matching [15, 21, 9, 7, 17] and (general) approximate threshold matching [28, 17]. As this bound essentially matches the complexity of computing the Fast Fourier Transform, which is the only known technique for solving problems of this type, it is likely that the algorithms are in fact optimal. Additionally, the technique used for the threshold matching problem can be applied to the online version of this problem, in which we are allowed to preprocess the text and are required to process the pattern in time sublinear in the text length. This result involves an interesting variant of the Karp-Rabin fingerprint m...
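For background on the fingerprint idea mentioned at the end of the abstract, here is a sketch of the standard Karp-Rabin rolling-hash search for exact patterns. It is the textbook method, not the variant the paper develops; the function name and the default `base` and `mod` values are assumptions chosen for illustration.

```python
def karp_rabin_search(pattern, text, base=256, mod=(1 << 61) - 1):
    """Return start indices of exact occurrences, using a rolling fingerprint."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the window's leading character
    hp = ht = 0
    for i in range(m):
        hp = (hp * base + ord(pattern[i])) % mod
        ht = (ht * base + ord(text[i])) % mod
    hits = []
    for i in range(n - m + 1):
        # Equal fingerprints are only a candidate; compare to rule out collisions.
        if hp == ht and text[i:i + m] == pattern:
            hits.append(i)
        if i + m < n:
            # Drop text[i], shift, and append text[i + m] in constant time.
            ht = ((ht - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return hits
```

The explicit string comparison makes the result exact at the cost of extra work on fingerprint collisions; dropping it gives the usual Monte Carlo trade-off.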