## Regular Expression Searching on Compressed Text (2003)

Venue: | Journal of Discrete Algorithms |

Citations: | 13 - 1 self |

### Citations

1522 | A universal algorithm for sequential data compression.
- Ziv, Lempel
- 1977
Citation Context ...hen the text is compressed. Text compression [5] exploits the redundancies of the text to represent it using less space. There are many different compression schemes, among which the Ziv-Lempel family [35, 36] is one of the best in practice because of its good compression ratios combined with efficient compression and decompression times. The compressed matching problem consists of searching for a pattern on ... |

954 | Compression of individual sequences via variable rate coding.
- Ziv, Lempel
- 1978
Citation Context ...hen the text is compressed. Text compression [5] exploits the redundancies of the text to represent it using less space. There are many different compression schemes, among which the Ziv-Lempel family [35, 36] is one of the best in practice because of its good compression ratios combined with efficient compression and decompression times. The compressed matching problem consists of searching for a pattern on ... |

722 | Modeling for text compression.
- Bell, Witten, et al.
- 1989
Citation Context ...gular expression searching is quite old and has received continuous attention since the sixties. A particularly interesting case of text searching arises when the text is compressed. Text compression [5] exploits the redundancies of the text to represent it using less space. There are many different compression schemes, among which the Ziv-Lempel family [35, 36] is one of the best in practice because ... |

598 | A guided tour to approximate string matching.
- Navarro
- 2001
Citation Context ...ing to notice that any solution for compressed regular expression searching implies a solution for compressed approximate string matching, as the latter can be expressed as the output of an automaton [20]. Consider the NFA for k = 2 differences shown in Figure 4. Every row denotes the number of differences seen (the first row zero, the second row one, etc.). Every column represents matching a pattern pre... |
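
The row-per-difference NFA described in this context can be simulated with one bit mask per row. The sketch below is only an illustration in the style of the bit-parallel approach of the Wu and Manber entry listed here, not the algorithm of the indexed paper; the function name `approx_search` and its return convention (end positions of matches) are assumptions made for the example.

```python
def approx_search(pattern: str, text: str, k: int) -> list[int]:
    """Return the 0-based end positions in `text` where `pattern`
    matches with at most `k` differences (insert/delete/substitute).

    Hypothetical helper: rows[d] is the bit mask of active NFA states
    in row d (d differences seen); bit i means pattern[:i+1] matched.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    m = len(pattern)
    # masks[c] has bit i set iff pattern[i] == c.
    masks: dict[str, int] = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << i)
    full = (1 << m) - 1
    accept = 1 << (m - 1)
    # Row d starts with its first d states active (d pattern chars skipped).
    rows = [(1 << d) - 1 for d in range(k + 1)]
    ends = []
    for j, c in enumerate(text):
        b = masks.get(c, 0)
        prev_old = rows[0]                      # row d-1 before this char
        rows[0] = ((rows[0] << 1) | 1) & b      # exact-match row
        prev_new = rows[0]                      # row d-1 after this char
        for d in range(1, k + 1):
            old = rows[d]
            rows[d] = (
                (((old << 1) | 1) & b)          # horizontal arrow: match
                | prev_old                      # vertical arrow: extra text char
                | (prev_old << 1)               # diagonal arrow: substitution
                | (prev_new << 1)               # diagonal epsilon: skipped pattern char
                | 1
            ) & full
            prev_old, prev_new = old, rows[d]
        if rows[k] & accept:
            ends.append(j)
    return ends
```

With `k = 0` this degenerates to exact Shift-And matching; each additional row costs one more mask update per text character.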

562 | A technique for high performance data compression.
- Welch
- 1984
Citation Context ...is is the problem we solve in this paper: we present the first solution for compressed regular expression searching. The format we choose is the Ziv-Lempel family, focusing in the LZ78 and LZW variants [36, 33]. Given a text of length u compressed into length n, we are able to find the R occurrences of a regular expression of length m in O(2^m + mn + Rm log m) worst case time, needing O(2^m + mn) space. We also ... |

368 | Fast text searching allowing errors.
- Wu, Manber
- 1992
Citation Context ...t-parallel simulation of an NFA, or as an implementation of a DFA (where the identifier of each deterministic state is the bit mask as a whole). This idea has been used several times, under Thompson's [34] and Glushkov's [27] constructions. By using different properties of the constructions, both manage to implement the transition function D using O(2^m) space (actually, the Thompson-based version [34] may need O(2^(2m)) s... |

331 | An Introduction to the Analysis of Algorithms. - Sedgewick, Flajolet - 1996 |

152 | Regular expression search algorithm.
- Thompson
- 1968
Citation Context ...mbination of active/inactive NFA states becomes a single DFA state. Given a regular expression E, there are several techniques to produce an NFA that recognizes L(E). The most classical is Thompson's [30]. Given an expression of length m, this method produces an NFA of at most 2m states and 4m edges. A less popular one is Glushkov's [9], which produces an NFA of exactly m+1 states but O(m^2) edges. T... |

121 | From regular expressions to deterministic automata,
- Berry, Sethi
- 1986
Citation Context ...[9], which produces an NFA of exactly m+1 states but O(m^2) edges. To fix ideas we will assume in this paper that we build NFAs using the version of Glushkov's algorithm popularized by Berry and Sethi [6]. The problem of searching for a regular expression E in a given text string T is that of finding all the text substrings that belong to L(E). These are called occurrences. For simplicity, we report the... |

121 | A string matching algorithm fast on the average,
- Commentz-Walter
- 1979
Citation Context ...gth of a string matching the regular expression and forms a trie with all the prefixes of that length of strings matching the regular expression. A multipattern search algorithm like Commentz-Walter [7] is run over those prefixes as a filter to detect text areas where a complete occurrence may start. Those areas are then verified with a classical algorithm. Another technique of this kind is used in Gnu ... |

106 | String matching in Lempel-Ziv compressed strings.
- Farach, Thorup
- 1998
Citation Context ...orithm for exact searching is from 1994, by Amir, Benson and Farach [3], who search LZ78 compressed texts needing time and space O(m^2 + n). The only search technique for LZ77 is by Farach and Thorup [8], a randomized algorithm to determine in time O(m + n log^2(u/n)) whether a pattern is present or not in the text. An extension of the first work [3] to multipattern searching was presented by Kida et ... |

106 | Average-case analysis of algorithms and data structures. - Vitter, Flajolet - 1990 |

105 | The abstract theory of automata,
- Glushkov
- 1961
Citation Context ...ce an NFA that recognizes L(E). The most classical is Thompson's [30]. Given an expression of length m, this method produces an NFA of at most 2m states and 4m edges. A less popular one is Glushkov's [9], which produces an NFA of exactly m+1 states but O(m^2) edges. To fix ideas we will assume in this paper that we build NFAs using the version of Glushkov's algorithm popularized by Berry and Sethi [6]... |

85 | Efficient two-dimensional compressed matching. In:
- Amir, Benson
- 1992
Citation Context ...thod is also different: instead of a Boyer-Moore like algorithm, it is based on BNDM [26]. 3.2 Compressed Pattern Matching The compressed matching problem was first defined in the work of Amir and Benson [2] as the task of performing string matching in a compressed text without decompressing it. Given a text T, a corresponding compressed string Z = z_1 ... z_n, and a pattern P, the compressed matchi... |

76 | A text compression scheme that allows fast searching directly in the compressed file
- Manber
- 1997
Citation Context ...], but they need the text to contain natural language and be large (say, 10 Mb or more). Moreover, they allow only searching for whole words and phrases. There are also other practical ad-hoc methods [15], but the compression they obtain is poor. Moreover, in these compression formats n = Θ(u), so the speedups can only be measured in practical terms. The second line of research considers Ziv-Lempel co... |

71 | A Method for the Construction of Minimum Redundancy Codes
- Huffman
- 1951
Citation Context ...re R is the number of matches (note that it could be that R = u > n). Two different approaches exist to search compressed text. The first one is rather practical. Efficient solutions based on Huffman coding [10] on words have been presented by Moura et al. [18], but they need the text to contain natural language and be large (say, 10 Mb or more). Moreover, they allow only searching for whole words and phrase... |

65 | Fast text searching for regular expressions or automaton searching on tries. - Baeza-Yates, Gonnet |

60 | A general practical approach to pattern matching over Ziv-Lempel compressed text,
- Navarro, Raffinot
- 1999
Citation Context ...first experimental results in this area. They achieve O(m^2 + n) time and space, although this time m is the total length of all the patterns. New practical results were presented by Navarro and Raffinot [25], who proposed a general scheme to search Ziv-Lempel compressed texts (simple and extended patterns) and specialized it for the particular cases of LZ77, LZ78 and a new variant proposed that was compe... |

55 | A four russians algorithm for regular expression pattern matching.
- Myers
- 1992
Citation Context ... 2 subtables of size 2^(m/2). We need to access two tables for a transition but need only the square root of the space. Some techniques have been proposed to obtain a tradeoff between NFAs and DFAs. In [19] a four-russians approach is presented that obtains O(mu / log u) worst-case time and extra space. The idea is to divide the syntax tree of the regular expression into "modules", which are subtrees of ... |

52 | Boyer-Moore string matching over Ziv-Lempel compressed text,
- Navarro, Raffinot
- 2000
Citation Context ...or Huffman coding of words [18], but the solution is limited to search for a whole word and retrieve whole words that are similar. The first true solutions appeared very recently, by Kärkkäinen et al. [11], Matsumoto et al. [16] and Navarro et al. [23]. 4 A Search Algorithm We present now our approach for regular expression searching a text Z = b_1 ... b_n, which is expressed by the LZ78 algorithm a... |

28 | Faster Approximate String Matching Over Compressed Text,
- Navarro, Takeda, et al.
- 2001
Citation Context ...n is limited to search for a whole word and retrieve whole words that are similar. The first true solutions appeared very recently, by Kärkkäinen et al. [11], Matsumoto et al. [16] and Navarro et al. [23]. 4 A Search Algorithm We present now our approach for regular expression searching a text Z = b_1 ... b_n, which is expressed by the LZ78 algorithm as a sequence of n blocks. Our goal is to find the... |
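
To make the phrase "a sequence of n blocks" concrete, here is a minimal LZ78 parsing sketch. It is an illustration only, not code from the indexed paper; the names `lz78_parse` and `lz78_expand`, the `(reference, character)` block layout, and the empty-character convention for a trailing phrase are assumptions made for the example.

```python
def lz78_parse(text: str) -> list[tuple[int, str]]:
    """Split `text` into LZ78 blocks (ref, char).

    Block i (1-based) spells block `ref` followed by `char`;
    ref 0 denotes the empty block.
    """
    trie: dict[tuple[int, str], int] = {}   # (parent block, char) -> block number
    blocks: list[tuple[int, str]] = []
    cur = 0                                 # block matched so far (0 = empty)
    for ch in text:
        nxt = trie.get((cur, ch))
        if nxt is not None:
            cur = nxt                       # extend the current phrase
        else:
            blocks.append((cur, ch))        # emit a new block
            trie[(cur, ch)] = len(blocks)
            cur = 0
    if cur != 0:
        # Assumed convention: a trailing phrase that is already in the
        # dictionary is emitted with an empty character.
        blocks.append((cur, ""))
    return blocks


def lz78_expand(blocks: list[tuple[int, str]]) -> str:
    """Rebuild the original text from LZ78 blocks."""
    phrases = [""]
    for ref, ch in blocks:
        phrases.append(phrases[ref] + ch)
    return "".join(phrases[1:])
```

For the text `abababa` (u = 7) the parse yields n = 4 blocks: `a`, `b`, `ab`, `aba`. A compressed-text search processes these n blocks rather than the u original characters.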

26 | Multiple pattern matching in LZW compressed text. In:
- Kida, Takeda, et al.
- 1998
Citation Context ...randomized algorithm to determine in time O(m + n log^2(u/n)) whether a pattern is present or not in the text. An extension of the first work [3] to multipattern searching was presented by Kida et al. [13], together with the first experimental results in this area. They achieve O(m^2 + n) time and space, although this time m is the total length of all the patterns. New practical results were presented by... |

25 | A unifying framework for compressed pattern matching
- Kida, Shibata, et al.
- 1999
Citation Context ...lt, restricted to the LZW format, was independently found and presented by Kida et al. [14]. The same group generalized the existing algorithms and nicely unified the concepts in a general framework [12]. Recently, Navarro and Tarhio [28] presented a new, faster, algorithm based on Boyer-Moore. Approximate string matching on compressed text aims at finding the pattern where a limited number of differenc... |

25 | Shift-And approach to pattern matching in LZW compressed text,
- Kida, Takeda, et al.
- 1999
Citation Context ...f LZ77, LZ78 and a new variant proposed that was competitive and convenient for search purposes. A similar result, restricted to the LZW format, was independently found and presented by Kida et al. [14]. The same group generalized the existing algorithms and nicely unified the concepts in a general framework [12]. Recently, Navarro and Tarhio [28] presented a new, faster, algorithm based on Boyer-Moo... |

19 | Let sleeping files lie: pattern matching in Z-compressed files
- Amir, Benson, et al.
- 1996
Citation Context ...Lempel compressed texts is much more complex, since the pattern can appear in different forms across the compressed text. The first algorithm for exact searching is from 1994, by Amir, Benson and Farach [3], who search LZ78 compressed texts needing time and space O(m^2 + n). The only search technique for LZ77 is by Farach and Thorup [8], a randomized algorithm to determine in time O(m + n log^2(u/n)) w... |

16 | Variations on a theme by Ziv and Lempel - Miller, Wegman - 1984 |

15 | Fast regular expression search
- Navarro, Raffinot
Citation Context ...s are searched for and the areas where they appear are checked for complete occurrences using a lazy deterministic automaton (i.e., built on the fly). The most recent development, also in this line, is [24]. They invert the arrows of the DFA and make all states initial and the initial state final. The result is an automaton that recognizes all the reverse prefixes of strings matching the regular expression... |

13 | Bit-parallel approach to approximate string matching in compressed texts
- Matsumoto, Kida, et al.
- 2000
Citation Context ...ds [18], but the solution is limited to search for a whole word and retrieve whole words that are similar. The first true solutions appeared very recently, by Kärkkäinen et al. [11], Matsumoto et al. [16] and Navarro et al. [23]. 4 A Search Algorithm We present now our approach for regular expression searching a text Z = b_1 ... b_n, which is expressed by the LZ78 algorithm as a sequence of n block... |

13 | Compact DFA representation for fast regular expression search.
- Navarro, Raffinot
- 2001
Citation Context ...n of an NFA, or as an implementation of a DFA (where the identifier of each deterministic state is the bit mask as a whole). This idea has been used several times, under Thompson's [34] and Glushkov's [27] constructions. By using different properties of the constructions, both manage to implement the transition function D using O(2^m) space (actually, the Thompson-based version [34] may need O(2^(2m)) s... |

11 | Fast and flexible string matching by combining bit-parallelism and suffix automata
- Navarro, Raffinot
- 1998
Citation Context ...ching the regular expression. The idea is in this sense similar to that of [32], but takes less space. The search method is also different: instead of a Boyer-Moore like algorithm, it is based on BNDM [26]. 3.2 Compressed Pattern Matching The compressed matching problem was first defined in the work of Amir and Benson [2] as the task of performing string matching in a compressed text without decompressing... |

9 | Fast and flexible word searching on compressed text
- Moura, Navarro, et al.
Citation Context ...be that R = u > n). Two different approaches exist to search compressed text. The first one is rather practical. Efficient solutions based on Huffman coding [10] on words have been presented by Moura et al. [18], but they need the text to contain natural language and be large (say, 10 Mb or more). Moreover, they allow only searching for whole words and phrases. There are also other practical ad-hoc methods [... |

9 | A new regular grammar pattern matching algorithm.
- Watson
- 1996
Citation Context ...at a good implementation of the automaton, but they must inspect all the text characters. Other proposals try to skip some text characters, as it is usual for simple pattern matching. For example, in [32] they present an algorithm that determines the minimum length of a string matching the regular expression and forms a trie with all the prefixes of that length of strings matching the regular express... |

5 | Nr-grep: A fast and flexible pattern matching tool
- Navarro
- 2000
Citation Context ...ithms. A first one, DFA, uses a bit-parallel DFA to process the text [27]. This is interesting because it is the algorithm we are modifying to work on compressed text. A second one, the software nrgrep [21], uses a character skipping technique for searching [24, 27], which is much faster. In any case, the time to decompress is an order of magnitude higher than that to search the uncompressed text, so th... |

3 | Regular expression searching over Ziv-Lempel compressed text
- Navarro
- 2001
Citation Context ...gorithms on uncompressed text, showing that we can search the compressed text twice as fast as the naive approach of decompressing and then searching. A preliminary version of this paper appeared in [22]. 2 Basic Concepts 2.1 Strings, Regular Expressions and Automata We give a very basic introduction to the subject. For more details see, for example, [1]. Given an alphabet (finite set of symbols) of... |