## Reporting exact and approximate regular expression matches (1998)

Venue: | Proceedings of the 9th Annual Symposium on Combinatorial Pattern Matching, number 1448 in LNCS |

Citations: | 10 - 0 self |

### BibTeX

@INPROCEEDINGS{Myers98reportingexact,

author = {Eugene W. Myers and Paulo Oliva and Katia S. Guimaraes},

title = {Reporting exact and approximate regular expression matches},

booktitle = {Proceedings of the 9th Annual Symposium on Combinatorial Pattern Matching, number 1448 in LNCS},

year = {1998},

pages = {91--103},

publisher = {Springer-Verlag}

}

### Abstract

While much work has been done on determining if a document or a line of a document contains an exact or approximate match to a regular expression, less effort has been expended in formulating and determining what to report as "the match" once such a "hit" is detected. For exact regular expression pattern matching, we give algorithms for finding a longest match, all symbols involved in some match, and finding optimal submatches to tagged parts of a pattern. For approximate regular expression matching, we develop notions of what constitutes a significant match, give algorithms for them, and also for finding a longest match and all symbols in a match.

### Citations

1209 | Tcl and the Tk Toolkit
- Ousterhout
- 1994
Citation Context ... each line containing a match, working on delivering a meaningful match rather than one that is an artifact of the scanning/filtration algorithm. The widely accepted standard, e.g. Perl [WS91], Tcl/Tk [Ous94], and the IEEE Posix standard [IEE92], for exact regular expression pattern matching is to report the left-most longest match, i.e. the matching substring whose left end is leftmost, and if there are ...
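The left-most longest rule quoted in this context can be illustrated with a brute-force sketch. This is not the paper's algorithm: it simply tries every start position from the left and, for each, every end position from the longest down, using Python's standard `re` module as the matcher. The function name `leftmost_longest` is an assumption made for illustration.

```python
import re

def leftmost_longest(pattern, text):
    """Return (start, end) of the left-most longest match, or None.

    Brute force for illustration only: O(N^2) candidate substrings,
    each checked with re's fullmatch restricted to text[start:end].
    """
    compiled = re.compile(pattern)
    n = len(text)
    for start in range(n + 1):              # leftmost start first
        for end in range(n, start - 1, -1):  # longest end first
            if compiled.fullmatch(text, start, end):
                return start, end
    return None

# Python's own search picks the first alternative that succeeds,
# which is not necessarily the longest match at that position.
first_alternative = re.search("a|ab", "xab").span()
longest = leftmost_longest("a|ab", "xab")
```

Here `first_alternative` covers only the `a`, while `longest` covers `ab`, which is the kind of difference a reporting policy has to settle.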

433 | Algorithms in C
- Sedgewick
- 1990

180 |
Programming Perl. O'Reilly and Associates
- Wall, Schwartz
- 1991
Citation Context ...nd more time on each line containing a match, working on delivering a meaningful match rather than one that is an artifact of the scanning/filtration algorithm. The widely accepted standard, e.g. Perl [WS91], Tcl/Tk [Ous94], and the IEEE Posix standard [IEE92], for exact regular expression pattern matching is to report the left-most longest match, i.e. the matching substring whose left end is leftmost, a...

117 |
Regular expression search algorithm
- Thompson
- 1968
Citation Context ...t matches above, it is easy to see that $S_f(i) = \{s : C(i, s) \neq \infty\}$. While this demonstrates that $S_f(i)$ can be computed from $S_f(i-1)$ in $O(P)$ worst-case time, the traditional reaching algorithm of [Tho68] does so in $O(|S_f(i-1)| + |S_f(i)|)$ time as one is free to discover the states in $S_f(i)$ in any order. This is superior in practice as it is frequently the case that the average size of the state sets i...
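The state-set computation mentioned in this context can be sketched as a Thompson-style simulation: the set for position i is obtained from the set for position i-1 by following transitions on the i-th character and then closing under epsilon moves, visiting states in whatever order they are discovered. The NFA below (for the pattern `a?b`), the names `EPS`, `DELTA`, `epsilon_closure`, and `state_sets`, and the choice to anchor the match at the start of the text are all assumptions made for illustration, not details taken from the paper.

```python
def epsilon_closure(states, eps):
    """All states reachable from `states` via epsilon moves only."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in eps.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

def state_sets(text, start, delta, eps):
    """Return [S(0), S(1), ..., S(N)], where S(i) is the set of NFA
    states reachable after reading the first i characters of `text`."""
    current = epsilon_closure({start}, eps)
    sets = [current]
    for ch in text:
        # Only states in the previous set are expanded.
        moved = {t for s in current for t in delta.get((s, ch), ())}
        current = epsilon_closure(moved, eps)
        sets.append(current)
    return sets

# Hypothetical hand-built NFA for a?b: state 0 is the start, 2 is final.
EPS = {0: [1]}                       # skip the optional 'a'
DELTA = {(0, "a"): [1], (1, "b"): [2]}

sets_for_ab = state_sets("ab", 0, DELTA, EPS)
```

Each step only touches the states of the previous and the new set, which is why this style of simulation tends to be cheap when the sets stay small.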

59 |
Approximate matching of regular expressions
- Myers, Miller
- 1989
Citation Context ...r within the target with equal probability. Interestingly, this problem can be solved in O(PN) time using a specialization of the approximate regular expression matching algorithm of Miller and Myers [MM88] that accommodates any additive alignment scoring scheme. Quickly we review this result and then proceed to the specialization. First, recall that any regular expression R can be converted to a state...

53 |
Identification of Common Molecular Sequences
- Smith, Waterman
- 1981
Citation Context ...Sel84] explored algorithms for finding such matches in the context of molecular biology. This work appears to have been forgotten in the wake of the current popularity of the Smith-Waterman algorithm [SW81]. Sellers' basic idea is as follows. Suppose scoring is with respect to a general additive scoring scheme $\delta$, and suppose one wants to detect only matches for which $\delta(A, P)/|P| \le r$. Sellers observ...

25 |
Pattern recognition genetic sequences by mismatch density
- Sellers
- 1984
Citation Context ... between two sequences be one for which the difference ratio of every prefix and suffix of the match is less than r. Intuitively, every "extension" of the match is significant. In the early 1980's Sellers [Sel84] explored algorithms for finding such matches in the context of molecular biology. This work appears to have been forgotten in the wake of the current popularity of the Smith-Waterman algorithm [SW81]. ...

2 |
Portable Operating System Interface (POSIX)
- IEEE
- 1994
Citation Context ...g on delivering a meaningful match rather than one that is an artifact of the scanning/filtration algorithm. The widely accepted standard, e.g. Perl [WS91], Tcl/Tk [Ous94], and the IEEE Posix standard [IEE92], for exact regular expression pattern matching is to report the left-most longest match, i.e. the matching substring whose left end is leftmost, and if there are several with such a left end, then De...

1 |
On the use of regular expressions for searching text
- Clarke, Cormack
- 1997
Citation Context ...est and shortest matches. In a recent paper, Clarke and Cormack argue that shortest matches have superior search properties when looking at patterns that involve matching several regular expressions [CC97]. On the other hand, we know of no reported work on reporting approximate matches to regular expressions, save that there are connections to work on finding locally optimal alignments [SW81, Sel84]. Our...

1 | Going against the grain
- Myers, Jain
- 1996
Citation Context ...). Thus the "grain" of the computations for $S_f$ and $S_r$ oppose each other. If space is a problem in a particular context, then one can employ the "going against the grain" algorithm of Myers and Jain [MJ96], to compute $S_r(1), S_r(2), S_r(3), \ldots, S_r(N)$ in the given order using $O(tPN)$ time and $O(PN^{1/t})$ space for any choice of $t \ge 1$. Choosing $t = \log N$ gives an $O(PN \log N)$ time and $O(P \log N)$ space worst-ca...
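The time/space trade-off in this context can be illustrated for the single case t = 2 with a generic checkpointing sketch. It is not the Myers and Jain algorithm itself: it only shows how a sequence defined by a one-directional recurrence can be delivered in the opposite order while holding roughly 2√N values, at the cost of running the recurrence about twice. The function name `reversed_iterates` and the abstract `step` callback are assumptions made for illustration.

```python
import math

def reversed_iterates(step, x0, n):
    """Yield x_n, x_{n-1}, ..., x_0 where x_{k+1} = step(x_k),
    keeping only about 2*sqrt(n) values in memory at a time."""
    block = max(1, math.isqrt(n))

    # Pass 1 (with the grain): keep every block-th value as a checkpoint.
    checkpoints = {}
    x = x0
    for k in range(n + 1):
        if k % block == 0:
            checkpoints[k] = x
        if k < n:
            x = step(x)

    # Pass 2 (against the grain): rebuild one block at a time from its
    # checkpoint, emit it backwards, then discard it.
    for base in sorted(checkpoints, reverse=True):
        segment = [checkpoints[base]]
        for _ in range(min(block - 1, n - base)):
            segment.append(step(segment[-1]))
        yield from reversed(segment)

countdown = list(reversed_iterates(lambda v: v + 1, 0, 10))
```

With a state-set recurrence in place of `step`, each stored value would be a set of up to P states, which is where the factor P in the quoted bounds comes from.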

1 |
Identification of common molecular sequence
- Smith, Waterman
- 1981
Citation Context ... [Sel84] explored algorithms for finding such matches in the context of molecular biology. This work appears to have been forgotten in the wake of the current popularity of the Smith-Waterman algorithm [SW81]. Sellers' basic idea is as follows. Suppose scoring is with respect to a general additive scoring scheme $\delta$, and suppose one wants to detect only matches for which $\delta(A, P)/|P| \le r$. Sellers observed that ...