Results 1 - 3 of 3
Tracks, the Efficiency and Data Centric Tracks. In the Link-the-Wiki Track, we
"... Abstract. In this paper, we describe University of Otago’s participation ..."
and Retrieval – Search process. General Terms
Abstract
Ranking function performance reached a plateau in 1994. The reason for this is investigated. First, the performance of BM25 is measured as the proportion of queries satisfied on the first page of 10 results, and it performs well. It is then compared to human performance, and the two are comparable. The conclusion is that there is little room for ranking function improvement.
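The measure named in this abstract, the proportion of queries satisfied on the first page of 10 results, can be sketched as follows. This is only an illustration: the abstract does not define "satisfied", so the sketch assumes a query counts as satisfied when at least one relevant document appears in the top 10. The function name `success_at_k` and the dictionary inputs are hypothetical, not taken from the paper.

```python
def success_at_k(rankings, relevant, k=10):
    """Fraction of queries with at least one relevant document in the top k.

    rankings: dict mapping query id -> ranked list of document ids
    relevant: dict mapping query id -> set of relevant document ids
    Assumption: "satisfied" means one or more relevant hits in the top k.
    """
    if not rankings:
        raise ValueError("at least one query is required")
    satisfied = sum(
        1
        for qid, ranked in rankings.items()
        if any(doc in relevant.get(qid, set()) for doc in ranked[:k])
    )
    return satisfied / len(rankings)


# Toy data: q1 has a relevant document on the first page, q2 does not.
rankings = {"q1": ["d3", "d7", "d1"], "q2": ["d2", "d9"]}
relevant = {"q1": {"d7"}, "q2": {"d5"}}
print(success_at_k(rankings, relevant))
```

The same function could be applied to a run produced by BM25 and to rankings produced by people, which is the kind of comparison the abstract describes.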
Document Clustering Evaluation: Divergence from a Random Baseline
Abstract
Divergence from a random baseline is a technique for the evaluation of document clustering. It ensures cluster quality measures are performing work that prevents ineffective clusterings from giving high scores to clusterings that provide no useful result. These concepts are defined and analysed using intrinsic and extrinsic approaches to the evaluation of document cluster quality. This includes the classical clusters-to-categories approach and a novel approach that uses ad hoc information retrieval. The divergence from a random baseline approach is able to differentiate ineffective clusterings encountered in the INEX XML Mining track. It also appears to perform a normalisation similar to the Normalised Mutual Information (NMI) measure but it can be applied to any measure of cluster quality. When it is applied to the intrinsic measure of distortion as measured by RMSE, subtraction from a random baseline provides a clear optimum that is not apparent otherwise. This approach can be applied to any clustering evaluation. This paper describes its use in the context of document clustering evaluation.
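A minimal sketch of the subtraction described in this abstract is given below. It rests on assumptions the abstract does not spell out: the random baseline is built here by shuffling the cluster assignments (which keeps every cluster the same size), the baseline score is averaged over several shuffles, and purity stands in as the quality measure. The names `purity` and `divergence_from_random` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def purity(assignments, labels):
    """Extrinsic quality: share of documents matching their cluster's majority label."""
    members = defaultdict(list)
    for cluster, label in zip(assignments, labels):
        members[cluster].append(label)
    majority = sum(Counter(ls).most_common(1)[0][1] for ls in members.values())
    return majority / len(labels)


def divergence_from_random(assignments, labels, measure=purity, trials=100, seed=0):
    """Score of a clustering minus the mean score of shuffled baselines.

    Assumption: shuffling the assignment list yields a random clustering
    with the same cluster sizes as the one being evaluated.
    """
    rng = random.Random(seed)
    shuffled = list(assignments)
    baseline = 0.0
    for _ in range(trials):
        rng.shuffle(shuffled)
        baseline += measure(shuffled, labels)
    baseline /= trials
    return measure(assignments, labels) - baseline


labels = ["a", "a", "b", "b"]
# One cluster per document has perfect purity but does no useful work.
print(divergence_from_random([0, 1, 2, 3], labels))
# A clustering that matches the categories scores above its random baseline.
print(divergence_from_random([0, 0, 1, 1], labels))
```

In this toy case the one-cluster-per-document solution has a raw purity of 1.0, yet its divergence is zero because every shuffle scores the same, which is the kind of ineffective clustering the abstract says the technique is meant to expose.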

