Results 1–8 of 8
NRgrep: A Fast and Flexible Pattern Matching Tool
 Software: Practice and Experience (SPE)
, 2000
Abstract

Cited by 37 (7 self)
We present nrgrep ("nondeterministic reverse grep"), a new pattern matching tool designed for efficient search of complex patterns. Unlike previous tools of the grep family, such as agrep and GNU grep, nrgrep is based on a single and uniform concept: the bit-parallel simulation of a nondeterministic suffix automaton. As a result, nrgrep can search for anything from simple patterns to regular expressions, exactly or allowing errors in the matches, with an efficiency that degrades smoothly as the complexity of the searched pattern increases. Another concept fully integrated into nrgrep, and one that contributes to this smoothness, is the selection of adequate subpatterns for fast scanning, which is absent in many current tools. We show that the efficiency of nrgrep is similar to that of the fastest existing string matching tools for the simplest patterns, and by far unmatched for more complex patterns.
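For plain strings, the bit-parallel simulation of a nondeterministic suffix automaton that nrgrep builds on is the BNDM algorithm. A minimal sketch in Python (the function name and the plain-string restriction are ours; nrgrep itself generalizes this to classes, wildcards, regular expressions, and errors):

```python
def bndm(pattern, text):
    """Backward Nondeterministic DAWG Matching: bit-parallel
    simulation of the suffix automaton of the reversed pattern.
    Returns the start positions of all occurrences."""
    m, n = len(pattern), len(text)
    # B[c]: bit i is set iff pattern[m-1-i] == c (pattern reversed)
    B = {}
    for i, c in enumerate(pattern):
        B[c] = B.get(c, 0) | (1 << (m - 1 - i))
    occurrences, pos, mask = [], 0, (1 << m) - 1
    while pos <= n - m:
        j, last, D = m, m, mask
        while D:
            D &= B.get(text[pos + j - 1], 0)
            j -= 1
            if D & (1 << (m - 1)):        # a pattern prefix ends here
                if j > 0:
                    last = j              # remember the prefix position
                else:
                    occurrences.append(pos)
                    break
            D = (D << 1) & mask
        pos += last                        # safe shift past the window
    return occurrences
```

The shift by `last` is what makes the scan sublinear on average: the window is skipped past the last position where a pattern prefix was still alive.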
Fast and compact regular expression matching, 2005. Submitted to a journal. Preprint available at arxiv.org/cs/0509069
Abstract

Cited by 19 (4 self)
We study four problems in string matching, namely regular expression matching, approximate regular expression matching, string edit distance, and subsequence indexing, on a standard word RAM model of computation that allows logarithmic-sized words to be manipulated in constant time. We show how to improve the space and/or remove a dependency on the alphabet size for each problem, either by an improved tabulation technique for an existing algorithm or by combining known algorithms in a new way.
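The flavor of such tabulation can be illustrated on plain bit-parallel NFA simulation: instead of following each active state's transitions one by one, precompute, for every possible byte of the state bitmask, the union of successor states. A hypothetical sketch (the `follow` encoding and all names are ours, not the paper's):

```python
def make_tabulated_step(follow, nstates, chunk=8):
    """Given follow[i] = bitmask of states reachable from state i in
    one move, precompute T[k][b] = union of follow[] over the states
    encoded by byte value b in chunk k of the state word.  A full
    step then costs a few table lookups instead of one per state."""
    nchunks = (nstates + chunk - 1) // chunk
    T = [[0] * (1 << chunk) for _ in range(nchunks)]
    for k in range(nchunks):
        for b in range(1 << chunk):
            acc = 0
            for i in range(chunk):
                if (b >> i) & 1 and k * chunk + i < nstates:
                    acc |= follow[k * chunk + i]
            T[k][b] = acc
    lo = (1 << chunk) - 1

    def step(D):
        out = 0
        for k in range(nchunks):
            out |= T[k][(D >> (k * chunk)) & lo]
        return out

    return step
```

The space/time trade-off is in `chunk`: larger chunks mean fewer lookups per step but exponentially larger tables, which is exactly the kind of dependency the paper works to improve.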
Regular Expression Searching on Compressed Text
 Journal of Discrete Algorithms
, 2003
Abstract

Cited by 13 (1 self)
We present a solution to the problem of regular expression searching on compressed text.
Flexible pattern matching
 Journal of Applied Statistics
, 2002
Abstract

Cited by 9 (0 self)
An important subtask of the pattern discovery process is pattern matching, where the pattern sought is already known and we want to determine how often and where it occurs in a sequence. In this paper we review the most practical techniques to find patterns of different kinds. We show how regular expressions can be searched for with general techniques, and how simpler patterns can be dealt with more simply and efficiently. We consider exact as well as approximate pattern matching. Also, we cover both sequential searching, where the sequence cannot be preprocessed, and indexed searching, where we have a data structure built over the sequence to speed up the search.
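For the simplest case the survey covers, exact matching of a short pattern, the bit-parallel Shift-And algorithm is the canonical sequential technique. A minimal sketch (ours, assuming the pattern fits in one machine word):

```python
def shift_and(pattern, text):
    """Shift-And: bit i of D is set iff pattern[0..i] matches a
    suffix of the text read so far.  Returns match start positions."""
    m = len(pattern)
    B = {}
    for i, c in enumerate(pattern):   # B[c]: positions of c in pattern
        B[c] = B.get(c, 0) | (1 << i)
    D, accept, starts = 0, 1 << (m - 1), []
    for j, c in enumerate(text):
        D = ((D << 1) | 1) & B.get(c, 0)
        if D & accept:                # pattern ends at position j
            starts.append(j - m + 1)
    return starts
```

Each text character costs a constant number of word operations, independent of how many prefixes are simultaneously alive, which is the appeal of the bit-parallel simulation.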
New techniques for regular expression searching
 Algorithmica
, 2005
Abstract

Cited by 7 (0 self)
We present two new techniques for regular expression searching and use them to derive faster practical algorithms. Based on the specific properties of Glushkov’s nondeterministic finite automaton construction algorithm, we show how to encode a deterministic finite automaton (DFA) using O(m 2^m) bits, where m is the number of characters, excluding operator symbols, in the regular expression. This compares favorably against the worst case of O(m 2^m |Σ|) bits needed by a classical DFA representation (where Σ is the alphabet) and the O(m 2^{2m}) bits needed by the Wu and Manber approach implemented in Agrep. We also present a new way to search for regular expressions, which is able to skip text characters. The idea is to determine the minimum length ℓ of a string matching the regular expression, manipulate the original automaton so that it recognizes all the reverse prefixes, of length up to ℓ, of the strings originally accepted, and use it to skip text characters as done for exact string matching in previous work. We combine these techniques into two algorithms, one able and one unable to skip text characters. The algorithms are simple to implement, and our experiments show that they permit fast searching for regular expressions, normally faster than any existing algorithm.
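The first step of the character-skipping search, computing the minimum length ℓ of a matching string, is a simple recursion over the regular expression's syntax tree. A sketch using a hypothetical tuple-based AST (our encoding, not the paper's):

```python
def min_match_len(node):
    """Minimum length of a string matched by the regex AST `node`,
    encoded as tuples: ('char', c), ('cat', l, r), ('alt', l, r),
    ('star', e), ('plus', e), ('eps',)."""
    op = node[0]
    if op == 'char':
        return 1
    if op == 'eps':
        return 0
    if op == 'cat':
        return min_match_len(node[1]) + min_match_len(node[2])
    if op == 'alt':
        return min(min_match_len(node[1]), min_match_len(node[2]))
    if op == 'star':
        return 0                      # zero repetitions are allowed
    if op == 'plus':
        return min_match_len(node[1])
    raise ValueError(op)
```

For example, `(a|bc)d*` has ℓ = 1, so the skipping search can never shift by more than one window of length 1 plus whatever the reverse-prefix scan allows.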
On NFA reductions
 THEORY IS FOREVER
, 2004
Abstract

Cited by 6 (2 self)
We give faster algorithms for two methods of reducing the number of states in nondeterministic finite automata: the first uses equivalences and the second uses preorders. We develop restricted reduction algorithms that operate on position automata while preserving some of their properties. We show empirically that these reductions are effective in substantially reducing the memory requirements of regular expression search algorithms, and we compare the effectiveness of the different reductions.
Approximate Regular Expression Searching with Arbitrary Integer Weights
 NORDIC JOURNAL OF COMPUTING
, 2003
Abstract

Cited by 5 (2 self)
We present a bit-parallel technique to search a text of length n for a regular expression of m symbols permitting k differences, in worst case time O(mn / log_k s), where s is the amount of main memory that can be allocated. The algorithm permits arbitrary integer weights and matches the complexity of the best previous techniques, but it is simpler and faster in practice. Along the way, we define a new recurrence for approximate searching where the current values depend only on previous values. Interestingly, our algorithm turns out to be a relevant option also for simple approximate string matching with arbitrary integer weights.
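The baseline the paper improves on is the classical column-by-column dynamic programming for approximate text searching, which generalizes directly to integer weights. A minimal sketch (the function name and default weights are ours for illustration):

```python
def approx_search(pattern, text, k, w_sub=1, w_ins=1, w_del=1):
    """Report end positions in text where pattern occurs with weighted
    edit cost <= k.  col[i] is the cheapest cost of matching
    pattern[0..i-1] against some suffix of the text read so far
    (row 0 stays 0: an occurrence may start anywhere)."""
    m = len(pattern)
    col = [i * w_del for i in range(m + 1)]
    ends = []
    for j, c in enumerate(text, 1):
        new = [0] * (m + 1)
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else w_sub
            new[i] = min(col[i - 1] + cost,    # match / substitute
                         col[i] + w_ins,       # text char c unmatched
                         new[i - 1] + w_del)   # pattern char unmatched
        col = new
        if col[m] <= k:
            ends.append(j)
    return ends
```

Note that `new[i]` depends on `new[i - 1]` within the same column; the paper's contribution is a recurrence where current values depend only on *previous* values, which is what makes an efficient bit-parallel formulation possible.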
Compact and Fast Algorithms for Regular Expression Search
 International Journal of Computer Mathematics (IJCM)
, 2004
Abstract

Cited by 3 (0 self)
This paper describes an improvement of the brute-force determinization algorithm in the case of homogeneous NFAs, as well as its application to pattern matching. Brute-force determinization with limited memory may produce a partially determinized automaton, but its bounded complexity makes it a fail-safe procedure, in contrast to the classical subset construction.
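A memory-bounded determinization of this kind can be sketched as a subset construction that simply stops materializing new DFA states once a budget is hit, leaving a partial DFA (the encoding below is our own illustration, not the paper's algorithm for homogeneous NFAs):

```python
from collections import deque

def bounded_determinize(delta, start, accepts, alphabet, max_states=64):
    """Subset construction that stops creating new DFA states once
    max_states sets have been materialized, yielding a partially
    determinized automaton; a full matcher would fall back to NFA
    simulation on the missing transitions.
    delta: dict mapping (nfa_state, char) -> iterable of nfa states."""
    start_set = frozenset([start])
    ids = {start_set: 0}                 # state set -> DFA state id
    dtrans, queue = {}, deque([start_set])
    while queue:
        S = queue.popleft()
        for c in alphabet:
            T = frozenset(q2 for q in S for q2 in delta.get((q, c), ()))
            if T not in ids:
                if len(ids) >= max_states:
                    continue             # budget hit: transition undefined
                ids[T] = len(ids)
                queue.append(T)
            dtrans[(ids[S], c)] = ids[T]
    dacc = {i for S, i in ids.items() if S & accepts}
    return dtrans, dacc
```

Because the number of materialized states is capped, preprocessing cost is bounded regardless of the regular expression, which is the fail-safe property the abstract contrasts with full subset construction and its worst-case exponential blow-up.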