Efficient Submatch Addressing for Regular Expressions (2001) [6 citations — 0 self]

by Ville Laurikari

Abstract:

String pattern matching in its different forms is an important topic in theoretical computer science. This thesis concentrates on the problem of regular expression matching with submatch addressing, where the position and extent of the substrings matched by given subexpressions must be provided. The algorithms in widespread use at the time either take exponential worst-case time to find a match, can handle only a subset of all regular expressions, or use space proportional to the length of the input string where constant space would suffice. This thesis proposes a new method for solving the submatch addressing problem using nondeterministic finite automata with transitions augmented by copy-on-write update operations. The resulting algorithm makes a single pass over the input string and always runs in time linear in the length of the input. Space consumption depends only on the regular expression used, and not on the input string. To the author's knowledge, this is a new result. A prototype of a POSIX.2-compatible regular expression matcher using the algorithm was implemented. Benchmarking results indicate that the prototype compares favorably against some popular implementations. Furthermore, the absence of exponential or polynomial worst-case running times makes it possible to use any regular expression without performance problems, which is not the case with previous implementations or algorithms.
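
To make the idea of single-pass submatch addressing concrete, here is a minimal sketch in Python. It is not the algorithm from the thesis: it is a simplified thread-list simulation of a nondeterministic automaton in which each thread carries a small tuple of tag positions. The instruction names (`save`, `char`, `split`, `jmp`, `match`), the function `run`, and the hand-compiled program `PROG` for the expression `(a+)(b*)` are all assumptions made for this illustration. Tag tuples are immutable, so every update builds a new tuple, which loosely mirrors the copy-on-write updates mentioned in the abstract.

```python
# Hypothetical hand-compiled program for the whole-string pattern (a+)(b*).
# Tags 0/1 delimit group 1, tags 2/3 delimit group 2.
PROG = [
    ("save", 0),      # 0: group 1 starts
    ("char", "a"),    # 1
    ("split", 1, 3),  # 2: prefer another 'a', else leave the loop
    ("save", 1),      # 3: group 1 ends
    ("save", 2),      # 4: group 2 starts
    ("split", 6, 8),  # 5: prefer another 'b', else leave the loop
    ("char", "b"),    # 6
    ("jmp", 5),       # 7
    ("save", 3),      # 8: group 2 ends
    ("match",),       # 9
]


def run(prog, text):
    """Match prog against all of text in one left-to-right pass.

    Returns the tuple of tag positions for the highest-priority match,
    or None if the whole string does not match.
    """
    ntags = 1 + max((ins[1] for ins in prog if ins[0] == "save"), default=-1)

    def add_thread(threads, seen, pc, tags, pos):
        # Follow instructions that consume no input, higher priority first.
        if pc in seen:
            return
        seen.add(pc)
        op = prog[pc]
        if op[0] == "jmp":
            add_thread(threads, seen, op[1], tags, pos)
        elif op[0] == "split":
            add_thread(threads, seen, op[1], tags, pos)
            add_thread(threads, seen, op[2], tags, pos)
        elif op[0] == "save":
            # Build a new tuple instead of mutating a shared one.
            new_tags = tags[:op[1]] + (pos,) + tags[op[1] + 1:]
            add_thread(threads, seen, pc + 1, new_tags, pos)
        else:
            # "char" and "match" wait in the thread list.
            threads.append((pc, tags))

    current = []
    add_thread(current, set(), 0, (None,) * ntags, 0)
    for pos in range(len(text) + 1):
        nxt, seen = [], set()
        for pc, tags in current:
            op = prog[pc]
            if op[0] == "match":
                if pos == len(text):
                    return tags
            elif op[0] == "char" and pos < len(text) and text[pos] == op[1]:
                add_thread(nxt, seen, pc + 1, tags, pos + 1)
        current = nxt
    return None


print(run(PROG, "aaabb"))  # (0, 3, 3, 5)
```

For the input `aaabb` the returned tags say that group 1 spans positions 0 to 3 and group 2 spans positions 3 to 5. Each thread list holds at most one entry per program instruction, so in this sketch the working space depends on the size of the program and the number of tags rather than on the length of the input, and each input character is examined once.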

Citations

2010 The Design and Analysis of Computer Algorithms – Aho, Hopcroft, et al. - 1974
1052 The C Programming Language – Kernighan, Ritchie - 1978
553 Binary codes capable of correcting deletions, insertions and reversals – Levenshtein - 1966
461 A Logical Calculus of the Ideas Immanent in Nervous Activity – McCulloch, Pitts - 1943
226 Elements of the theory of computation – Lewis, Papadimitriou - 1981
179 Storing a sparse table with O(1) worst case access time – Fredman, Komlós, et al. - 1984
156 Purely functional data structures – Okasaki - 1998
117 LEX – a lexical analyzer generator – Lesk, Schmidt - 1975
113 Dynamic Perfect Hashing: Upper and Lower Bounds – Dietzfelbinger, Karlin, et al. - 1994
112 Derivatives of Regular Expressions – Brzozowski - 1964
92 Regular expression pattern matching for XML – Hosoya, Pierce
71 Deterministic Part-of-Speech Tagging with Finite-State Transducers – Roche, Schabes - 1995
67 From Regular Expressions to Deterministic Automata – Berry, Sethi - 1986
66 Codes and Automata – Berstel, Perrin, et al. - 2009
56 Regular expressions and state graphs for automata – McNaughton, Yamada - 1960
51 Approximate matching of regular expressions – Myers, Miller - 1989
49 Storing a sparse table – Tarjan, Yao - 1979
42 Economy of description by automata, grammars, and formal systems – Meyer, Fischer - 1971
36 On the use of Regular Expressions for Searching Text – Clarke, Cormack - 1997
28 A four-russian algorithm for regular expression pattern matching – Myers - 1992
26 Functional Programming with Graphs – Erwig - 1997
17 Flex—Fast Lexical Analyzer Generator – Paxson - 1995
13 Representation of events in nerve nets and finite automata – Kleene - 1956
13 NFAs with tagged transitions, their conversion to deterministic automata and application to regular expressions – Laurikari - 2000
12 Programming techniques: Regular expression search algorithm – Thompson - 1968
10 Reporting exact and approximate regular expression matches – Guimaraes, Oliva, et al. - 1998
8 Storing a dynamic sparse table – Aho, Lee - 1986
8 Approximate regular expression pattern matching with concave gap penalties – Knight, Myers - 1995
8 Real-time Garbage Collection of a Functional Persistent Heap – Oksanen - 1999
7 SNOBOL, a string manipulation language – Farber, Griswold, et al. - 1964
6 Efficiently building a parse tree from a regular expression – Dubé, Feeley - 2000
6 A procedure for checking equality of regular expressions – Ginzburg - 1967
5 Extending regular expressions with context operators and parse extraction – Kearns - 1991
5 Generation of pattern-matching algorithms by extended regular expressions – Nakata - 1993
5 Regular expressions with semantic rules and their application to data structure directed programs – Nakata, Sassa - 1991
4 Algorithms for finding patterns in strings – Aho - 1990
3 Finding patterns common to a set of strings (extended abstract) – Angluin - 1979
3 TLex v.68 user's manual – Kearns - 1990
3 Regular expressions with nested levels of back referencing form a hierarchy – Larsen - 1998
3 Design of sequential machines from their regular expressions – Ott, Feinstein - 1961
3 Languages and Parsing, volume 1 of Parsing Theory – Sippu, Soisalon-Soininen - 1988
2 Generating finite-state transducers for semistructured data extraction from the web – Hsu, Dung - 1998
1 Partial derivatives of regular expressions and finite automaton constructions – Antimirov - 1996
1 XML: The Annotated Specification – DuCharme - 1999
1 On the succinctness of different representations of languages – Hartmanis - 1980
1 Regular expression types for XML – Hosoya, Vouillon, et al. - 2000
1 Approximate regular expression matching – Mutko - 1996
1 Parsing with finite state transducers – Roche