Compressed suffix arrays and suffix trees with applications to text indexing and string matching (extended abstract) (2000)
| Venue: | in Proceedings of the 32nd Annual ACM Symposium on the Theory of Computing |
| Citations: | 172 - 15 self |
BibTeX
@INPROCEEDINGS{Grossi00compressedsuffix,
author = {Roberto Grossi and Jeffrey Scott Vitter},
title = {Compressed suffix arrays and suffix trees with applications to text indexing and string matching (extended abstract)},
booktitle = {Proceedings of the 32nd Annual ACM Symposium on the Theory of Computing},
year = {2000},
pages = {397--406}
}
Abstract
The proliferation of online text, such as found on the World Wide Web and in online databases, motivates the need for space-efficient text indexing methods that support fast string searching. We model this scenario as follows: Consider a text T consisting of n symbols drawn from a fixed alphabet Σ. The text T can be represented in n lg |Σ| bits by encoding each symbol with lg |Σ| bits. The goal is to support fast online queries for searching any string pattern P of m symbols, with T being fully scanned only once, namely, when the index is created at preprocessing time. The text indexing schemes published in the literature are greedy in terms of space usage: they require Ω(n lg n) additional bits of space in the worst case. For example, in the standard unit cost RAM, suffix trees and suffix arrays need Ω(n) memory words, each of Ω(lg n) bits. These indexes are larger than the text itself by a multiplicative factor of Ω(lg_|Σ| n), which is significant when Σ is of constant size, such as in ASCII or Unicode. On the other hand, these indexes support fast searching, either in O(m lg |Σ|) time or in O(m + lg n) time, plus an output-sensitive cost O(occ) for listing the occ pattern occurrences. We present a new text index that is based upon compressed representations of suffix arrays and suffix trees. It achieves a fast O(m / lg_|Σ| n + lg^ε_|Σ| n) search time in the worst case, for any constant
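As an illustration of the baseline the abstract compares against, the following is a minimal sketch of a plain, uncompressed suffix array with binary-search pattern matching. It is not the compressed index of the paper. The function names `build_suffix_array` and `find_occurrences` are hypothetical, the construction is deliberately naive, and the search shown takes O(m lg n) time rather than the O(m + lg n) bound, which requires additional longest-common-prefix information.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text, in lexicographic order.

    Naive construction for illustration only: sorting whole suffixes is far
    slower than dedicated suffix array construction algorithms.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return the sorted start positions of every occurrence of pattern in text."""
    m = len(pattern)

    def prefix(rank):
        # First m symbols of the suffix that has the given rank.
        start = suffix_array[rank]
        return text[start:start + m]

    # Lower bound: first rank whose m-symbol prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first rank whose m-symbol prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # All suffixes ranked in [first, lo) start with pattern.
    return sorted(suffix_array[first:lo])


text = "banana"
suffix_array = build_suffix_array(text)
matches = find_occurrences(text, suffix_array, "ana")
```

The array stores n integers of about lg n bits each, which is the Ω(n lg n)-bit overhead the abstract refers to, compared with n lg |Σ| bits for the text itself.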







