## Top-k document retrieval in optimal time and linear space (2012)

Venue: | In Proc. 22nd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2012) |

Citations: | 14 - 8 self |

@INPROCEEDINGS{Navarro12top-kdocument,

author = {Gonzalo Navarro and Yakov Nekrich},

title = {Top-k document retrieval in optimal time and linear space},

booktitle = {In Proc. 22nd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2012)},

year = {2012},

pages = {1066--1077}

}

### Abstract

We describe a data structure that uses O(n)-word space and reports the k most relevant documents that contain a query pattern P in optimal O(|P| + k) time. Our construction supports an ample set of important relevance measures, such as the frequency of P in a document and the minimal distance between two occurrences of P in a document. We show how to reduce the space of the data structure from O(n log n) to O(n(log σ + log D + log log n)) bits, where σ is the alphabet size and D is the total number of documents.
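To make the problem statement concrete, the following is a naive baseline sketch of top-k document retrieval, not the data structure from the paper. It scans every document, so its running time grows with the total text length rather than achieving O(|P| + k). The function names (`occurrences`, `frequency`, `min_distance`, `top_k_by_frequency`) are hypothetical and chosen only for illustration; the two relevance measures mirror the ones named in the abstract.

```python
import heapq
from typing import List, Optional, Tuple


def occurrences(doc: str, pattern: str) -> List[int]:
    """Return the start positions of all (possibly overlapping) matches."""
    if not pattern:
        return []
    positions = []
    start = doc.find(pattern)
    while start != -1:
        positions.append(start)
        start = doc.find(pattern, start + 1)
    return positions


def frequency(doc: str, pattern: str) -> int:
    """Relevance measure 1: number of occurrences of pattern in doc."""
    return len(occurrences(doc, pattern))


def min_distance(doc: str, pattern: str) -> Optional[int]:
    """Relevance measure 2: smallest gap between two occurrences.

    Returns None when the pattern occurs fewer than two times.
    """
    pos = occurrences(doc, pattern)
    if len(pos) < 2:
        return None
    # Positions are sorted, so the minimum gap is between neighbours.
    return min(b - a for a, b in zip(pos, pos[1:]))


def top_k_by_frequency(docs: List[str], pattern: str, k: int) -> List[Tuple[int, int]]:
    """Return up to k (doc_index, frequency) pairs, most frequent first.

    Documents that do not contain the pattern are never reported.
    Ties are broken by the smaller document index (an arbitrary choice).
    """
    hits = [(frequency(d, pattern), i) for i, d in enumerate(docs)]
    hits = [(f, i) for f, i in hits if f > 0]
    best = heapq.nsmallest(k, hits, key=lambda t: (-t[0], t[1]))
    return [(i, f) for f, i in best]
```

For example, with the documents `["abracadabra", "banana", "cabana", "xyz"]` and the pattern `"ana"`, only the second and third documents contain the pattern, and `"banana"` ranks first because it contains two overlapping occurrences.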

