Engineering a multi-purpose test collection for Web retrieval experiments (2001)
Citations: 73 (3 self)
BibTeX
@MISC{Bailey01engineeringa,
author = {Peter Bailey and Nick Craswell and David Hawking},
title = {Engineering a multi-purpose test collection for Web retrieval experiments},
year = {2001}
}
Abstract
Past research into text retrieval methods for the Web has been restricted by the lack of a test collection capable of supporting experiments that are both realistic and reproducible. The 1.69-million-document WT10g collection is proposed as a multi-purpose testbed for experiments with these attributes in distributed IR, hyperlink algorithms and conventional ad hoc retrieval. WT10g was constructed by selecting documents from a larger superset in such a way that desirable corpus properties were preserved or optimised. These properties include a high degree of inter-server connectivity, integrity of server holdings, inclusion of documents related to a very wide spread of likely queries, and a realistic distribution of server holding sizes. Using a site (homepage) finding experiment, we confirm that WT10g contains exploitable link information: on this task, Okapi BM25 works better on propagated link anchor text than on full text.

Keywords: Web retrieval, Link-based ranking, Distributed information retrieval, Test collections
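To make "BM25 on propagated link anchor text" concrete, the sketch below builds one surrogate document per link target by concatenating the anchor text of all links pointing at it, then ranks those surrogates with Okapi BM25. It is an illustration under stated assumptions, not the authors' implementation: the `(source, target, anchor_text)` triple format, the helper names `propagate_anchor_text` and `bm25_scores`, the example URLs, the whitespace tokenizer, and the parameters `k1=1.2`, `b=0.75` are all hypothetical choices. It also uses a non-negative IDF variant (as in Lucene), which differs slightly from the classic Okapi IDF.

```python
import math
from collections import Counter, defaultdict


def propagate_anchor_text(links):
    """Build one surrogate 'anchor document' per target URL.

    `links` is an iterable of (source_url, target_url, anchor_text) triples
    (a hypothetical input format). All anchor text pointing at a target is
    concatenated into that target's token list.
    """
    surrogates = defaultdict(list)
    for _source, target, anchor in links:
        surrogates[target].extend(anchor.lower().split())
    return dict(surrogates)


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document (doc_id -> list of tokens) against `query` with BM25.

    k1 and b are common defaults, assumed here rather than taken from the paper.
    """
    n = len(docs)
    avgdl = sum(len(toks) for toks in docs.values()) / n
    df = Counter()
    for toks in docs.values():
        df.update(set(toks))  # document frequency: count each term once per doc

    scores = {}
    for doc_id, toks in docs.items():
        tf = Counter(toks)
        dl = len(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Non-negative IDF variant (Lucene-style "+1" inside the log).
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalisation.
            norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * dl / avgdl))
            score += idf * norm
        scores[doc_id] = score
    return scores


# Usage: three hypothetical links, two of which point at a site's homepage.
links = [
    ("http://a.example/", "http://acme.example/", "Acme Corporation home page"),
    ("http://b.example/", "http://acme.example/", "Acme"),
    ("http://c.example/", "http://acme.example/products", "Acme products"),
]
anchor_docs = propagate_anchor_text(links)
scores = bm25_scores("acme home page", anchor_docs)
best = max(scores, key=scores.get)  # "http://acme.example/"
```

In a homepage-finding setting, a site's entry page tends to accumulate anchor text that names the site, which is one plausible reason anchor surrogates can outperform full text on this task. A full-text run would call `bm25_scores` on tokenized page bodies instead of `anchor_docs`.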