## Searching BWT compressed text with the Boyer-Moore algorithm and binary search (2002)

Venue: Proceedings, IEEE Data Compression Conference, 2002

Citations: 11 (6 self)

### BibTeX

@INPROCEEDINGS{Bell02searchingbwt,
    author = {Tim Bell and Matt Powell and Amar Mukherjee and Don Adjeroh},
    title = {Searching BWT compressed text with the Boyer-Moore algorithm and binary search},
    booktitle = {Proceedings, IEEE Data Compression Conference, 2002},
    year = {2002},
    pages = {112--121}
}

### Abstract

This paper explores two techniques for on-line exact pattern matching in files that have been compressed using the Burrows-Wheeler transform. We investigate two approaches. The first is an application of the Boyer-Moore algorithm (Boyer & Moore 1977) to a transformed string. The second approach is based on the observation that the transform effectively contains a sorted list of all substrings of the original text, which can be exploited for very rapid searching using a variant of binary search. Both methods are faster than a decompress-and-search approach for small numbers of queries, and binary search is much faster even for large numbers of queries.
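The observation behind the second approach can be illustrated with a minimal Python sketch. This is not the authors' compressed-domain method: it builds an explicit suffix array from the plain text with a naive sort, purely to show why the sorted order underlying the transform supports binary search. The function names and the `"\0"` sentinel are assumptions made for this illustration; the sentinel is assumed not to occur in the text, and the pattern is assumed to be non-empty.

```python
def bwt_with_suffix_array(text, sentinel="\0"):
    """Return (bwt_string, suffix_array) for text + sentinel.

    Naive construction for illustration only. Because the sentinel is
    unique and sorts before every other character, sorting the suffixes
    gives the same order as sorting the rotations.
    """
    s = text + sentinel
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    # The BWT output is the character preceding each sorted suffix
    # (index -1 wraps around to the sentinel for the suffix at 0).
    bwt = "".join(s[i - 1] for i in sa)
    return bwt, sa


def find_occurrences(text, pattern, sentinel="\0"):
    """Return the sorted start positions of pattern in text."""
    s = text + sentinel
    _, sa = bwt_with_suffix_array(text, sentinel)
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # All suffixes in sa[first:lo] start with the pattern.
    return sorted(sa[first:lo])


if __name__ == "__main__":
    print(find_occurrences("abracadabra", "abra"))
```

Every occurrence of the pattern is a prefix of some suffix, and those suffixes sit in one contiguous block of the sorted order, so two binary searches locate the whole block with O(log n) prefix comparisons.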

### Citations

1141 | A universal algorithm for sequential data compression - Ziv, Lempel - 1977 |
Citation Context: ...jeroh & Mukherjee (2001) outlines several techniques for online compressed-domain pattern matching in both text and images. Many of these techniques are based on the LZ family of compression systems (Ziv & Lempel 1977, Ziv & Lempel 1978), but others include methods for Huffman-coded text and run-length encoding. Little work has been done with the Burrows-Wheeler transform, although some research has been undertake...

628 | Fast pattern matching in strings - Knuth, Morris, et al. - 1977 |

572 | A fast string searching algorithm - Boyer, Moore - 1977 |
Citation Context: ...ues for on-line exact pattern matching in files that have been compressed using the Burrows-Wheeler transform. We investigate two approaches. The first is an application of the Boyer-Moore algorithm (Boyer & Moore 1977) to a transformed string. The second approach is based on the observation that the transform effectively contains a sorted list of all substrings of the original text, which can be exploited for very...

566 | A Block-Sorting Lossless Data Compression Algorithm - Burrows, Wheeler - 1994 |
Citation Context: ...eries, and binary search is much faster even for large numbers of queries. 1 Introduction This paper investigates on-line exact pattern matching in files compressed with the Burrows-Wheeler transform (Burrows & Wheeler 1994). By ‘on-line’ pattern matching, we refer to methods that do not require a pre-computed index; all the work of pattern matching is done at query time. They are particularly suitable for texts that are...

180 | Opportunistic Data Structures with Applications - Ferragina, Manzini - 2000 |
Citation Context: ...thods for Huffman-coded text and run-length encoding. Little work has been done with the Burrows-Wheeler transform, although some research has been undertaken in the area of offline pattern matching (Ferragina & Manzini 2000, Ferragina & Manzini 2001, Sadakane & Imai 1999, Sadakane 2000). Throughout this paper we will refer to the pattern matching problem in terms of searching for a pattern P of length m in a text T of l...

149 | A locally adaptive data compression scheme - Bentley, Sleator, et al. - 1986 |
Citation Context: ... compression program, bsmp, was developed. bsmp uses a four-stage compression system: 1. a Burrows-Wheeler transform, with the block size set to the size of the entire file, 2. a move-to-front coder (Bentley et al. 1986), which takes advantage of the high level of local repetition in the BWT output, 3. a run-length coder, to remove the long sequences of zeroes in the MTF output, and 4. an order-0 arithmetic coder. N...

9 | Processing truncated terms in document retrieval systems - Bratley, Choueka - 1982 |

8 | A Cooperative Distributed Text Database Management Method Unifying Search and Compression Based on the Burrows-Wheeler Transformation - Sadakane, Imai - 1999 |
Citation Context: .... Little work has been done with the Burrows-Wheeler transform, although some research has been undertaken in the area of offline pattern matching (Ferragina & Manzini 2000, Ferragina & Manzini 2001, Sadakane & Imai 1999, Sadakane 2000). Throughout this paper we will refer to the pattern matching problem in terms of searching for a pattern P of length m in a text T of length n. The input alphabet will be referred to ...

6 | Pattern matching in compressed text and images - Bell, Adjeroh, et al. - 2001 |

5 | Block sorting text compression: final report - Fenwick - 1996 |

2 | An experimental study of a compressed index - Ferragina, Manzini - 2001 |
Citation Context: ...xt and run-length encoding. Little work has been done with the Burrows-Wheeler transform, although some research has been undertaken in the area of offline pattern matching (Ferragina & Manzini 2000, Ferragina & Manzini 2001, Sadakane & Imai 1999, Sadakane 2000). Throughout this paper we will refer to the pattern matching problem in terms of searching for a pattern P of length m in a text T of length n. The input alphabe...

2 | Unifying Text Search and Compression: Suffix Sorting, Block Sorting and Suffix Arrays - Sadakane - 2000 |
Citation Context: ... done with the Burrows-Wheeler transform, although some research has been undertaken in the area of offline pattern matching (Ferragina & Manzini 2000, Ferragina & Manzini 2001, Sadakane & Imai 1999, Sadakane 2000). Throughout this paper we will refer to the pattern matching problem in terms of searching for a pattern P of length m in a text T of length n. The input alphabet will be referred to as Σ; similarly...

2 | Managing Gigabytes, second edn - Witten, Moffat, et al. - 1999 |
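The citation context for Bentley et al. (1986) above describes bsmp as a four-stage pipeline. The Python sketch below illustrates only the first three stages (Burrows-Wheeler transform, move-to-front coding, run-length coding of zeroes) and omits the arithmetic coder. It is not the bsmp implementation: the function names, the `"\0"` sentinel, and the `(0, run_length)` output format of the run-length stage are all assumptions chosen for illustration.

```python
def bwt(text, sentinel="\0"):
    """Burrows-Wheeler transform by naively sorting all rotations.

    The sentinel is assumed not to occur in text.
    """
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)


def move_to_front(data, alphabet):
    """Replace each symbol by its current index in a self-organising list.

    Recently seen symbols move to index 0, so the local repetition in
    BWT output turns into small numbers, often runs of zeroes.
    """
    table = list(alphabet)
    out = []
    for ch in data:
        idx = table.index(ch)
        out.append(idx)
        table.insert(0, table.pop(idx))
    return out


def run_length_zeros(codes):
    """Collapse each run of zeroes into a (0, run_length) pair.

    Non-zero codes pass through unchanged. This output format is a
    hypothetical choice for the sketch.
    """
    out = []
    i = 0
    while i < len(codes):
        if codes[i] == 0:
            j = i
            while j < len(codes) and codes[j] == 0:
                j += 1
            out.append((0, j - i))
            i = j
        else:
            out.append(codes[i])
            i += 1
    return out


if __name__ == "__main__":
    transformed = bwt("banana")
    alphabet = sorted(set(transformed))
    codes = move_to_front(transformed, alphabet)
    print(run_length_zeros(codes))
```

On short inputs the benefit is barely visible; on long natural-language text the BWT groups symbols with similar contexts, so the move-to-front output is dominated by zeroes and the run-length stage shortens it considerably before entropy coding.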