Performance, Experimentation
BibTeX
@MISC{Macdonald_performance_experimentation,
author = {Craig Macdonald and Iadh Ounis},
title = {Performance, Experimentation},
year = {}
}
Abstract
Information retrieval systems often use proximity or term dependence models to increase the effectiveness of document retrieval. Many existing proximity models examine document-level local statistics, such as the frequency with which pairs of query terms occur within fixed-size windows of each document, before applying standard or adapted weighting functions, for instance Markov Random Fields. Term weighting models use Inverse Document Frequency (IDF) to control the influence of occurrences of different query terms in documents. Similarly, some proximity models also take into account the frequency of pairs of query terms in the entire corpus of documents. However, pair frequency is an expensive statistic to pre-compute at indexing time, or to compute at retrieval time before scoring documents. In this work, we examine, in a uniform setting, the importance of such global statistics for proximity weighting. We investigate two sources of global statistics, namely the target corpus and the entire Web. Experiments are conducted using the TREC GOV2 and ClueWeb09 test collections. Our results show that local statistics alone are sufficient for effective retrieval, and that global statistics usually bring no significant improvement in effectiveness compared to the same proximity approaches without them.







