Near Duplicate Image Detection: min-Hash and tf-idf Weighting
Citations: 32 - 1 self
BibTeX
@MISC{Philbin_nearduplicate,
author = {James Philbin and Andrew Zisserman},
title = {Near Duplicate Image Detection: min-Hash and tf-idf Weighting},
year = {}
}
Abstract
This paper proposes two novel image similarity measures for fast indexing via locality sensitive hashing. The similarity measures are applied and evaluated in the context of near duplicate image detection. The proposed method uses a visual vocabulary of vector quantized local feature descriptors (SIFT) and, for retrieval, exploits enhanced min-Hash techniques. In standard min-Hash, an approximate set intersection between document descriptors is used as the similarity measure. We propose an efficient way of exploiting more sophisticated similarity measures that have proven to be essential in image / particular object retrieval. The proposed similarity measures do not require extra computational effort compared to the original measure. We focus primarily on scalability to very large image and video databases, where fast query processing is necessary. The method requires only a small amount of data to be stored for each image. We demonstrate our method on the TrecVid 2006 data set, which contains approximately 146K key frames, and also on the challenging University of Kentucky image retrieval database.
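To make the baseline mentioned in the abstract concrete, the following is a minimal sketch of standard min-Hash applied to sets of visual word IDs. It illustrates only the plain set-overlap estimate, not the paper's proposed weighted similarity measures. All names (`make_hash_params`, `minhash_sketch`, `estimate_similarity`), the linear hash family, and the choice of prime are assumptions made for illustration and do not come from the paper.

```python
import random

# Assumption: visual word IDs are non-negative integers smaller than PRIME.
PRIME = 2_147_483_647  # Mersenne prime 2^31 - 1


def make_hash_params(num_hashes, seed=0):
    """Draw (a, b) pairs for hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_sketch(word_ids, params):
    """Keep, for each hash function, the minimum hash value over the set."""
    return [min((a * w + b) % PRIME for w in word_ids) for a, b in params]


def estimate_similarity(sketch_a, sketch_b):
    """Fraction of agreeing min-hashes; approximates |A & B| / |A | B|."""
    matches = sum(1 for x, y in zip(sketch_a, sketch_b) if x == y)
    return matches / len(sketch_a)


def jaccard(set_a, set_b):
    """Exact set overlap, for comparison with the estimate."""
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    params = make_hash_params(num_hashes=256, seed=42)
    # Hypothetical images, each represented as a set of visual word IDs.
    image_a = {3, 17, 42, 101, 256, 999, 4096}
    image_b = {3, 17, 42, 101, 300, 999, 5000}
    est = estimate_similarity(minhash_sketch(image_a, params),
                              minhash_sketch(image_b, params))
    print("estimated:", est, "exact:", jaccard(image_a, image_b))
```

Only the fixed-length sketch needs to be stored per image, which is consistent with the abstract's point about small per-image storage. In a retrieval setting the min-hash values would typically be grouped into tuples and used as hash-table keys, which this sketch omits.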







