Incremental Cluster-Based Retrieval using Compressed Cluster-Skipping Inverted Files
Citations: 6 (2 self)
BibTeX
@MISC{Altingovde_incrementalcluster-based,
author = {Ismail Sengor Altingovde and Engin Demir and Fazli Can},
title = {Incremental Cluster-Based Retrieval using Compressed Cluster-Skipping Inverted Files},
year = {}
}
Abstract
We propose a unique cluster-based retrieval (CBR) strategy that uses a new cluster-skipping inverted file to improve query processing efficiency. The new inverted file incorporates cluster membership and centroid information, along with the usual document information, into a single structure. In our incremental-CBR strategy, the best(-matching) clusters and the best(-matching) documents of those clusters are computed together during query evaluation, with a single posting list access per query term. As we switch from term to term, the best clusters are recomputed and can change dynamically. During query-document matching, only the portions of the posting lists that correspond to the best clusters are considered; the rest are skipped. The proposed approach is essentially tailored for environments where inverted files are compressed, and it provides substantial efficiency improvements while yielding comparable or sometimes better effectiveness figures. Our experiments with various collections show that the incremental-CBR strategy using a compressed cluster-skipping inverted file significantly improves CPU time efficiency regardless of the query length. The new compressed inverted file imposes an acceptable storage overhead in comparison to a typical inverted file. We also show that our approach scales well with the collection size.
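The query evaluation loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the in-memory index layout, the name `incremental_cbr`, the parameters `num_best_clusters` and `num_docs`, and the scoring (summing centroid weights for clusters and raw term frequencies for documents) are all assumptions made for illustration, and compression is left out entirely. Each posting list is assumed to be grouped into per-cluster blocks whose header carries the cluster id and a centroid weight, so a block can be skipped without reading its postings.

```python
from collections import defaultdict
import heapq

# Hypothetical in-memory layout (an assumption, not the paper's on-disk format):
# term -> list of cluster blocks, each block being
# (cluster_id, centroid_weight, [(doc_id, term_freq), ...]).
INDEX = {
    "cluster": [
        (0, 0.9, [(1, 3), (2, 1)]),
        (1, 0.1, [(5, 1)]),
    ],
    "skipping": [
        (0, 0.7, [(2, 2)]),
        (2, 0.4, [(8, 4)]),
    ],
}


def incremental_cbr(query_terms, index, num_best_clusters=1, num_docs=3):
    """Return (ranked documents, number of postings skipped)."""
    cluster_scores = defaultdict(float)  # cluster accumulators
    doc_scores = defaultdict(float)      # document accumulators
    skipped = 0

    for term in query_terms:
        blocks = index.get(term, [])

        # Update cluster accumulators from the block headers only.
        for cluster_id, centroid_weight, _ in blocks:
            cluster_scores[cluster_id] += centroid_weight

        # Recompute the best clusters; the set may change from term to term.
        best = set(heapq.nlargest(num_best_clusters, cluster_scores,
                                  key=cluster_scores.get))

        # Score documents only in blocks that belong to a best cluster.
        for cluster_id, _, postings in blocks:
            if cluster_id not in best:
                skipped += len(postings)  # this block would be jumped over
                continue
            for doc_id, term_freq in postings:
                doc_scores[doc_id] += term_freq

    # Highest score first; ties broken by ascending document id.
    ranked = sorted(doc_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:num_docs], skipped


if __name__ == "__main__":
    ranked, skipped = incremental_cbr(["cluster", "skipping"], INDEX)
    print(ranked, skipped)
```

In this toy index, cluster 0 stays the best cluster for both query terms, so the blocks for clusters 1 and 2 are never scored. The sketch walks each list of blocks twice for readability; with headers and skip pointers stored inline, the same work could plausibly be done in one sequential pass over the posting list, which is the property the abstract emphasizes.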







