An Investigation of the Preconditions for Effective Data Fusion in Information Retrieval: A Pilot Study
, 1998
Abstract
Cited by 8 (0 self)
Effective automation of the information retrieval task has long been an active area of research, leading to sophisticated retrieval models. With many IR schemes available, researchers have begun to investigate the benefits of combining the results of different IR schemes to improve performance. Many successful data fusion experiments are reported in the IR literature, but there are also experiments in which data fusion did not work even though the same fusion rules were used. What is needed is a theory that tells us a priori when data fusion methods should be used. We categorize the different theoretical justifications of data fusion into two approaches, examine their implications, analyze some of the unsuccessful data fusion experiments, and propose two preconditions for effective data fusion: (1) the precondition of efficacy and (2) the precondition of dissimilarity. We have developed a mathematical measure (Pair-out-of-order) of inter-scheme dissimilarity, and have developed algorithms and co...
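The abstract names a Pair-out-of-order measure of inter-scheme dissimilarity but does not define it here. As a rough illustration only, the sketch below assumes one plausible reading: the fraction of document pairs, among documents returned by both schemes, that the two rankings place in opposite order. The function name `pair_out_of_order` and this normalization are assumptions, not the authors' definition.

```python
from itertools import combinations


def pair_out_of_order(ranking_a, ranking_b):
    """Fraction of shared document pairs that the two rankings order oppositely.

    Assumed reading of a pairwise dissimilarity measure; the paper's own
    definition may differ (for example in how unshared documents are treated).
    """
    # Map each document to its rank position in each list.
    pos_a = {doc: i for i, doc in enumerate(ranking_a)}
    pos_b = {doc: i for i, doc in enumerate(ranking_b)}

    # Only documents retrieved by both schemes can be compared pairwise.
    shared = [doc for doc in ranking_a if doc in pos_b]
    pairs = list(combinations(shared, 2))
    if not pairs:
        return 0.0

    # A pair is out of order when the rank differences have opposite signs.
    discordant = sum(
        1
        for x, y in pairs
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )
    return discordant / len(pairs)
```

Under this reading, two identical rankings score 0 and a fully reversed ranking scores 1, so larger values indicate more dissimilar schemes.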
Predicting the Effectiveness of Naïve Data Fusion on the Basis of System Characteristics
- Journal of the American Society for Information Science
, 2000
Abstract
Effective automation of the information retrieval task has long been an active area of research, leading to sophisticated retrieval models. With many IR schemes available, researchers have begun to investigate the benefits of combining the results of different IR schemes to improve performance, in the process called "data fusion". There are many successful data fusion experiments reported in the IR literature, but there are also cases in which it did not work well. Thus it would be quite valuable to have a theory that can predict, in advance, whether fusion of two or more retrieval schemes will be worth doing. In a previous study (Ng and Kantor, 1998), we identified two predictive variables for the effectiveness of fusion: (a) a list-based measure of output dissimilarity, and (b) a pair-wise measure of the similarity of performance of the two schemes. In this paper we investigate the predictive power of these two variables in simple symmetrical data fusion. We use the IR systems participating in the TREC 4 routing task to train a model that predicts the effectiveness of data fusion, and use the IR systems participating in the TREC 5 routing task to test that model. The model asks, "when will fusion perform better than an oracle who uses the best scheme from each pair?" We explore statistical techniques for fitting the model to the training data and use the receiver operating characteristic curve of signal detection theory to represent the power of the resulting models. The trained prediction methods predict whether fusion will beat an oracle at levels much higher than could be achieved by chance.
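The two ideas in this abstract, simple symmetrical fusion and ROC-based evaluation of a predictor, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: "symmetrical" is taken here to mean an unweighted sum of min-max normalized scores, and the predictor is summarized by the area under its ROC curve. The names `fuse_symmetric` and `roc_auc` are hypothetical.

```python
def _min_max(scores):
    """Rescale a doc -> score mapping to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: no ranking information, so contribute nothing.
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse_symmetric(scores_a, scores_b):
    """Rank documents by the equal-weight sum of normalized scores.

    Assumption: a document missing from one scheme contributes 0 from it.
    Ties are broken by document id so the output is deterministic.
    """
    norm_a, norm_b = _min_max(scores_a), _min_max(scores_b)
    docs = set(norm_a) | set(norm_b)
    fused = {d: norm_a.get(d, 0.0) + norm_b.get(d, 0.0) for d in docs}
    return sorted(docs, key=lambda d: (-fused[d], d))


def roc_auc(predictions, labels):
    """Area under the ROC curve for a predictor of "fusion beats the oracle".

    predictions: real-valued scores, higher meaning fusion is expected to win.
    labels: 1 if fusion actually beat the best single scheme, else 0.
    Computed as the probability that a random positive case outscores a
    random negative case, counting ties as one half.
    """
    pos = [p for p, y in zip(predictions, labels) if y == 1]
    neg = [p for p, y in zip(predictions, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to chance-level prediction, so a predictor performing "much higher than chance" would score well above 0.5 on held-out pairs of schemes.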

