## Searching in One Billion Vectors: Re-rank with Source Coding (author manuscript, 2011)

### BibTeX

@MISC{Jégou11authormanuscript,
  author = {Hervé Jégou and Romain Tavenard and Matthijs Douze and Laurent Amsaleg},
  title = {Searching in One Billion Vectors: Re-rank with Source Coding},
  year = {2011}
}

### Abstract

Recent indexing techniques inspired by source coding have been shown to successfully index billions of high-dimensional vectors in memory. In this paper, we propose an approach that re-ranks the neighbor hypotheses obtained by these compressed-domain indexing methods. In contrast to the usual post-verification scheme, which performs exact distance calculations on the short-list of hypotheses, our method refines the estimated distances using short quantization codes, which avoids reading the full vectors from disk. We have released a new public dataset of one billion 128-dimensional vectors and propose an experimental setup to evaluate high-dimensional indexing algorithms at a realistic scale. Experiments show that our method accurately and efficiently re-ranks the neighbor hypotheses while using little memory compared to the full vector representation.

Index Terms: nearest neighbor search, quantization, source coding, high-dimensional indexing, large databases
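The two-stage idea in the abstract (a cheap short-list from compact codes, then re-ranking with extra short codes instead of the raw vectors) can be illustrated with a small sketch. It assumes product quantization (PQ) with asymmetric distance computation (ADC) as the compressed-domain index, and a second PQ trained on the residuals `x - q(x)` as the refinement code. The names `PQ`, `kmeans` and `search`, the toy sizes, and the synthetic Gaussian data are hypothetical choices for illustration, not the paper's implementation or parameters.

```python
import random

# Illustrative sketch only: PQ, kmeans and search are hypothetical names,
# and product quantization is assumed as the source coder.

def sqdist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centroids):
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda j: sqdist(p, centroids[j]))

def kmeans(points, k, rng, iters=10):
    """Plain Lloyd's k-means; returns k centroids as lists."""
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            buckets[nearest(p, centroids)].append(p)
        for j, b in enumerate(buckets):
            if b:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(b) for col in zip(*b)]
    return centroids

class PQ:
    """Product quantizer: m sub-quantizers with k centroids each."""
    def __init__(self, m, k):
        self.m, self.k = m, k

    def split(self, x):
        return [x[i * self.ds:(i + 1) * self.ds] for i in range(self.m)]

    def fit(self, data, rng):
        self.ds = len(data[0]) // self.m  # sub-vector length
        self.codebooks = [
            kmeans([tuple(self.split(x)[i]) for x in data], self.k, rng)
            for i in range(self.m)
        ]
        return self

    def encode(self, x):
        return [nearest(s, cb) for s, cb in zip(self.split(x), self.codebooks)]

    def decode(self, code):
        return [v for c, cb in zip(code, self.codebooks) for v in cb[c]]

    def adc_table(self, q):
        # Asymmetric distance: the query stays unquantized; precompute its
        # squared distance to every centroid of every sub-quantizer.
        return [[sqdist(s, c) for c in cb]
                for s, cb in zip(self.split(q), self.codebooks)]

def search(q, codes, pq, refine_codes, pq_r, shortlist=10, topk=3):
    table = pq.adc_table(q)

    def estimated(i):
        # Step 1: ADC estimate computed from the short codes only.
        return sum(table[j][c] for j, c in enumerate(codes[i]))

    cand = sorted(range(len(codes)), key=estimated)[:shortlist]

    def refined(i):
        # Step 2: x_hat = q(x) + q_r(x - q(x)), decoded from the refinement
        # codes instead of reading the raw vectors.
        x_hat = [a + b for a, b in
                 zip(pq.decode(codes[i]), pq_r.decode(refine_codes[i]))]
        return sqdist(q, x_hat)

    return sorted(cand, key=refined)[:topk]

rng = random.Random(42)
data = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(500)]
pq = PQ(m=4, k=16).fit(data, rng)
codes = [pq.encode(x) for x in data]
# Refinement stage: a second PQ trained on the coarse residuals.
residuals = [[a - b for a, b in zip(x, pq.decode(c))] for x, c in zip(data, codes)]
pq_r = PQ(m=4, k=16).fit(residuals, rng)
refine_codes = [pq_r.encode(r) for r in residuals]

if __name__ == "__main__":
    print(search(data[7], codes, pq, refine_codes, pq_r))
```

In this sketch both stages stay in memory: the coarse codes give a cheap short-list, and the refinement codes add a few bytes per vector so that the short-list can be re-ranked without fetching the original vectors. Raising `shortlist` trades extra decoding work for a better chance that the true neighbors survive step 1.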

