Improved compressed indexes for full-text document retrieval (2011)
| Venue: | In Proc. 18th SPIRE |
| Citations: | 3 - 2 self |
BibTeX
@INPROCEEDINGS{Belazzougui11improvedcompressed,
author = {Djamal Belazzougui and Gonzalo Navarro},
title = {Improved compressed indexes for full-text document retrieval},
booktitle = {Proc. 18th SPIRE},
year = {2011},
pages = {286--297}
}
Abstract
We give new space/time tradeoffs for compressed indexes that answer document retrieval queries on general sequences. On a collection of D documents of total length n, current approaches require at least |CSA| + O(n lg D / lg lg D) or 2|CSA| + o(n) bits of space, where CSA is a full-text index. Using monotone minimum perfect hash functions, we give new algorithms for document listing with frequencies and top-k document retrieval using just |CSA| + O(n lg lg lg D) bits. We also improve current solutions that use 2|CSA| + o(n) bits, and consider other problems such as colored range listing, top-k most important documents, and computing arbitrary frequencies.

1 Introduction and Related Work

Full-text document retrieval is the problem of, given a collection of D documents (i.e., general sequences over alphabet [1, σ]), concatenated into a text T[1, n],







