Experimentation, Measurement
by
Paul Thomas
BibTeX
@MISC{Thomas_experimentation_measurement,
author = {Paul Thomas},
title = {Experimentation, Measurement},
year = {}
}
Abstract
Algorithms in distributed information retrieval often rely on accurate knowledge of the size of a collection. The “multiple capture-recapture” method of Shokouhi et al. is one of the more reliable algorithms for estimating collection size, but it requires every sample to contain the same number of documents. Such uniform samples are often hard to obtain in a working system. A simple generalisation of multiple capture-recapture removes this requirement. Simulations show it is as accurate as the original method even when sample sizes vary considerably, making it a useful technique in real tools.
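
The abstract does not give the estimator itself, so the following is only a minimal sketch of one way a capture-recapture estimate could handle samples of different sizes. It assumes each sample is drawn uniformly at random without replacement, so that two samples of sizes m_i and m_j from a collection of N documents share about m_i * m_j / N documents on average; summing over all pairs of samples and solving for N gives the estimate. The function names `estimate_collection_size` and `simulate` are hypothetical, and this formula is an illustrative assumption rather than one quoted from the paper.

```python
import random
from itertools import combinations


def estimate_collection_size(samples):
    """Estimate collection size from samples that may differ in size.

    samples: list of sets of document identifiers.
    Assumption (not taken from the source): each sample is a uniform
    random draw without replacement from the same collection.
    """
    size_products = 0  # sum of m_i * m_j over all pairs of samples
    duplicates = 0     # documents shared by each pair, summed over pairs
    for a, b in combinations(samples, 2):
        size_products += len(a) * len(b)
        duplicates += len(a & b)
    if duplicates == 0:
        raise ValueError("no duplicates observed; size cannot be estimated")
    return size_products / duplicates


def simulate(true_size, sample_sizes, seed=0):
    """Draw random samples from a synthetic collection and estimate its size."""
    rng = random.Random(seed)
    samples = [set(rng.sample(range(true_size), m)) for m in sample_sizes]
    return estimate_collection_size(samples)


if __name__ == "__main__":
    # Sample sizes deliberately vary; the true size here is 10000.
    print(simulate(10_000, [200, 400, 600, 800, 1000]))
```

When every sample has the same size m and there are T samples, the sum of size products is T(T-1)m²/2, so the sketch reduces to the uniform-sample case. The estimate is undefined when no document appears in more than one sample, which is why the function raises an error rather than returning a value.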







