Fast Text Searching for Regular Expressions or Automaton Searching on Tries
Cited by 43 (6 self)
We present algorithms for efficient searching of regular expressions on preprocessed text, using a Patricia tree as a logical model for the index. We obtain searching algorithms that run in logarithmic expected time in the size of the text for a wide subclass of regular expressions, and in sublinear expected time for any regular expression. This is the first such algorithm to be found with this complexity.
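To make the automaton-on-trie idea concrete, here is a minimal Python sketch. It is not the paper's algorithm: it assumes a naive, uncompressed suffix trie (quadratic space) in place of a Patricia tree, and a hand-written deterministic automaton for the expression `ab*c`. The names `build_suffix_trie` and `search_trie` are hypothetical. The point it illustrates is that the automaton is run down the index rather than over the text, so whole subtrees are skipped once no transition applies.

```python
def build_suffix_trie(text):
    """Naive suffix trie; each node records the start positions of the
    suffixes that pass through it (illustration only, quadratic space)."""
    root = {"children": {}, "starts": list(range(len(text)))}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node["children"].setdefault(ch, {"children": {}, "starts": []})
            node["starts"].append(start)
    return root


def search_trie(root, delta, start_state, accepting):
    """Run a DFA over the trie and return the text positions where a match begins.

    delta maps (state, character) to the next state; a missing entry means
    the automaton dies, so that whole subtree is pruned.
    """
    hits = set()
    stack = [(root, start_state)]
    while stack:
        node, state = stack.pop()
        if state in accepting:
            # Every suffix below this node starts with a matching string.
            hits.update(node["starts"])
            continue
        for ch, child in node["children"].items():
            nxt = delta.get((state, ch))
            if nxt is not None:
                stack.append((child, nxt))
    return sorted(hits)


# Hand-built DFA for the expression ab*c (an assumption of this sketch).
AB_STAR_C = {(0, "a"): 1, (1, "b"): 1, (1, "c"): 2}

trie = build_suffix_trie("xabbcabc")
print(search_trie(trie, AB_STAR_C, 0, {2}))  # prints [1, 5]
```

The traversal stops at the first accepting state on each path, so it reports where matches begin rather than every matching substring.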
An Alphabet Independent Approach to Two Dimensional Matching
, 1994
Cited by 25 (8 self)
There are many solutions to the string matching problem which are strictly linear in the input size and independent of alphabet size. Furthermore, the model of computation for these algorithms is very weak: they allow only simple arithmetic and comparisons of equality between characters of the input. In contrast, algorithms for two-dimensional matching have needed stronger models of computation, most notably assuming a totally ordered alphabet. The fastest algorithms for two-dimensional matching have therefore had a logarithmic dependence on the alphabet size. In the worst case, this gives an algorithm that runs in O(n log m) with O(m log m) preprocessing.
Pattern Matching with Swaps
, 1997
Cited by 17 (7 self)
Let a text string $T$ of $n$ symbols and a pattern string $P$ of $m$ symbols from alphabet $\Sigma$ be given. A swapped version $T'$ of $T$ is a length $n$ string derived from $T$ by a series of local swaps (i.e. $t'_\ell \leftarrow t_{\ell+1}$ and $t'_{\ell+1} \leftarrow t_\ell$), where each element can participate in no more than one swap. The Pattern Matching with Swaps problem is that of finding all locations $i$ for which there exists a swapped version $T'$ of $T$ where there is an exact matching of $P$ in location $i$ of $T'$. It has been an open problem whether swapped matching can be done in less than $O(mn)$ time. In this paper we show the first algorithm that solves the pattern matching with swaps problem in time $o(mn)$. We present an algorithm whose time complexity is $O(n m^{1/3} \log m \log^2 \min(m, |\Sigma|))$ for a general alphabet $\Sigma$. Key Words: Design and analysis of algorithms, combinatorial algorithms on words, pattern matching, pattern matching with swaps, non-standard pattern matching. Department of Mathematics...
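The problem definition above can be checked directly with a straightforward O(mn) procedure, which is the baseline the abstract says the paper improves on. The Python sketch below is that naive baseline, not the paper's algorithm. The function name `swap_match_positions` is hypothetical, and the handling of swaps that straddle the edges of the matched window is this sketch's own reading of the definition (swaps are applied to the text, so a neighbour just outside the window may be swapped in).

```python
def swap_match_positions(text, pattern):
    """Return every location i such that some swapped version of `text`
    (disjoint swaps of adjacent characters) contains `pattern` at i.
    Naive O(mn) check, for illustration only."""
    n, m = len(text), len(pattern)
    hits = []
    if m == 0 or m > n:
        return hits
    for i in range(n - m + 1):
        # ok[j]: pattern[:j] can be matched at text[i:i+j] with no swap
        # left pending across the boundary after position i + j - 1.
        ok = [False] * (m + 1)
        ok[0] = True
        # Assumption: text[i-1] may be swapped into the first window position.
        if i > 0 and text[i - 1] == pattern[0]:
            ok[1] = True
        for j in range(m):
            if not ok[j]:
                continue
            if text[i + j] == pattern[j]:  # leave this character in place
                ok[j + 1] = True
            if (j + 1 < m and text[i + j] == pattern[j + 1]
                    and text[i + j + 1] == pattern[j]):  # swap with right neighbour
                ok[j + 2] = True
        # Assumption: text[i+m] may be swapped into the last window position.
        end_swap = ok[m - 1] and i + m < n and text[i + m] == pattern[m - 1]
        if ok[m] or end_swap:
            hits.append(i)
    return hits


print(swap_match_positions("abcd", "bac"))  # prints [0]
```

Because each element may take part in at most one swap, the check only ever needs to look one position ahead, which is why a single left-to-right pass per location suffices.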
On Boyer-Moore Automata
, 1994
Cited by 7 (2 self)
The notion of the Boyer-Moore automaton was introduced by Knuth, Morris and Pratt in their historic paper on fast pattern matching. It leads to an algorithm that requires more preprocessing but is more efficient than the original Boyer-Moore algorithm. We formalize the notion of the Boyer-Moore automaton, and we give an efficient building algorithm. Also, bounds on the number of states are presented, and the concept of potential of a transition is introduced to improve the worst and average case behavior of these machines. We show that looking at the rightmost unknown character, as suggested by Knuth et al., is not necessarily optimal. Keywords: string searching, pattern matching, finite automaton, average case analysis. 1 Introduction String searching is a very important component of many problems, including text editing, data retrieval and letter manipulation. Formally, the string searching or string matching problem consists of finding all occurrences (or the first occurrence) of a p...
Multidimensional Pattern Matching: A Survey
, 1992
Cited by 5 (0 self)
We review some recent algorithms motivated by computer vision. The problem inspiring this research is that of searching an aerial photograph for all appearances of some object. The issues we discuss are local errors, scaling, compression and dictionary matching. We review deterministic serial techniques that are used for multidimensional pattern matching and discuss their strengths and weaknesses. College of Computing, Georgia Institute of Technology, Atlanta, Georgia 30332-0280. Partially supported by NSF grant IRI-9013055. 1 Motivation String Matching is one of the most widely studied problems in computer science [Gal85]. Part of its appeal is in its direct applicability to "real world" problems. The Knuth-Morris-Pratt [KMP77] algorithm is directly implemented in the emacs "s" and UNIX "grep" commands. The longest common subsequence dynamic programming algorithm [CKK72] is implemented in the UNIX "diff" command. The largest overlap heuristic for finding the shortest common s...
About the Size of Boyer-Moore Automata
Cited by 1 (0 self)
We study the size of Boyer-Moore automata introduced in Knuth, Morris & Pratt's famous paper on pattern matching. We experimentally exhibit a finite class of binary patterns, which produce large Boyer-Moore automata. The best approximation curve for their sizes is a polynomial $O(m^7)$, or even an exponential $O(2^{0.4m})$, in the length $m$ of the patterns. All the previously known maximal sizes were at most cubic in $m$. Our results suggest studying two particular infinite classes of patterns, for which we conjecture that the generated automata have size $\Omega(m^5)$. 1 Introduction The string matching problem appears in many applications in computer science, like word processing, text editing, data retrieval, symbol manipulation, etc. This problem has been extensively studied in theoretical computer science [2, 11], since the two historic papers by Knuth, Morris & Pratt [18], and by Boyer & Moore [6], both published in 1977. The string matching problem consists of finding ...
Efficient Parallel And Serial Approximate String Matching
, 1986
Consider the string matching problem, where differences between characters of the pattern and characters of the text are allowed. Each difference is due to either a mismatch between a character of the text and a character of the pattern or a superfluous character in the text or a superfluous character in the pattern. Given a text of length n, a pattern of length m and an integer k, we present parallel and serial algorithms for finding all occurrences of the pattern in the text with at most k differences. The first part of the parallel algorithm consists of analysis of the pattern and takes O(log m) time using $m^2$ processors. The rest of the algorithm consists of handling the text. The text handling part applies the following new approach. This part starts by obtaining a concise characterization of the text which is based solely on substrings of the pattern in O(log m) time using n / log m processors. Then the desired output is derived from this characterization together with the tabl...
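For orientation, the k-differences problem stated above can be solved serially with the textbook O(mn) dynamic program, in which a match may begin at any text position for free. The Python sketch below shows that classical baseline only; it is not the parallel or serial algorithm of this paper, and the function name `k_difference_end_positions` is hypothetical.

```python
def k_difference_end_positions(text, pattern, k):
    """Return the text indices at which an occurrence of `pattern` with at
    most k differences (mismatch, extra text character, or extra pattern
    character) ends. Classical O(mn) dynamic program, for illustration."""
    m = len(pattern)
    # prev[i]: fewest differences between pattern[:i] and some substring of
    # the text ending just before the current character.
    prev = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text):
        cur = [0] * (m + 1)  # cur[0] = 0: an occurrence may start anywhere
        for i in range(1, m + 1):
            mismatch = 0 if pattern[i - 1] == ch else 1
            cur[i] = min(prev[i - 1] + mismatch,  # match or mismatch
                         prev[i] + 1,             # superfluous text character
                         cur[i - 1] + 1)          # superfluous pattern character
        if cur[m] <= k:
            ends.append(j)
        prev = cur
    return ends


print(k_difference_end_positions("abcdef", "bxd", 1))  # prints [3]
```

Only two columns of the table are kept at a time, so the sketch uses O(m) extra space.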
Suffix Trees for Integer Alphabets Revisited
, 1999
Farach recently gave a linear-time algorithm for constructing suffix trees for integer alphabets, which solves a major open problem on index data structures. We present a new and somewhat cleaner algorithm for constructing suffix trees for integer alphabets in linear time.

