Improving automatic query expansion
, 1998
Cited by 195 (3 self)
Most casual users of IR systems type short queries. Recent research has shown that adding new words to these queries via ad hoc feedback improves the retrieval effectiveness of such queries. We investigate ways to improve this query expansion process by refining the set of documents used in feedback. We start by using manually formulated Boolean filters along with proximity constraints. Our approach is similar to the one proposed by Hearst [12]. Next, we investigate a completely automatic method that makes use of term cooccurrence information to estimate word correlation. Experimental results show that refining the set of documents used in query expansion often prevents the query drift caused by blind expansion and yields substantial improvements in retrieval effectiveness, both in terms of average precision and precision in the top twenty documents. More importantly, the fully automatic approach developed in this study performs competitively with the best manual approach and requires little computational overhead.
Evaluating Evaluation Measure Stability
, 2000
Cited by 131 (5 self)
This paper presents a novel way of examining the accuracy of the evaluation measures commonly used in information retrieval experiments. It validates several of the rules of thumb experimenters use, such as that the number of queries needed for a good experiment is at least 25 and 50 is better, while challenging other beliefs, such as that the common evaluation measures are equally reliable. As an example, we show that Precision at 30 documents has about twice the average error rate of Average Precision. These results can help information retrieval researchers design experiments that provide a desired level of confidence in their results. In particular, we suggest researchers using Web measures such as Precision at 10 documents will need to use many more than 50 queries or will have to require two methods to have a very large difference in evaluation scores before concluding that the two methods are actually different.
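For readers unfamiliar with the two measures compared in this abstract, the following is a minimal sketch of their standard textbook definitions, not the paper's own code. The function names, document identifiers, and the toy ranking are all hypothetical.

```python
from typing import Sequence, Set


def precision_at_k(ranking: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Sum of the precision at each rank holding a relevant document,
    divided by the total number of relevant documents."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


# Hypothetical toy data: one ranked list and its relevance judgments.
ranking = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d5"}

p_at_3 = precision_at_k(ranking, relevant, 3)
ap = average_precision(ranking, relevant)
```

Precision at a fixed cutoff looks only at the top of the ranking, while Average Precision uses every relevant document, which is one intuition for why the two measures can differ in stability.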
How Reliable are the Results of Large-Scale Information Retrieval Experiments?
- Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
, 1998
Cited by 100 (3 self)
Two stages in measurement of techniques for information retrieval are gathering of documents for relevance assessment and use of the assessments to numerically evaluate effectiveness. We consider both of these stages in the context of the TREC experiments, to determine whether they lead to measurements that are trustworthy and fair. Our detailed empirical investigation of the TREC results shows that the measured relative performance of systems appears to be reliable, but that recall is overestimated: it is likely that many relevant documents have not been found. We propose a new pooling strategy that can significantly increase the number of relevant documents found for given effort, without compromising fairness.
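As background for the pooling discussion, here is a minimal sketch of conventional depth-k pooling, in which the top documents of every submitted run are merged into one set for assessment. It does not implement the new strategy the paper proposes. The function name, system names, and rankings are hypothetical.

```python
from typing import Dict, List, Set


def depth_k_pool(runs: Dict[str, List[str]], depth: int) -> Set[str]:
    """Union of the top-`depth` documents from every run.

    Only pooled documents are judged; anything outside the pool is
    treated as non-relevant, which is how recall can be overestimated.
    """
    pool: Set[str] = set()
    for ranking in runs.values():
        pool.update(ranking[:depth])
    return pool


# Hypothetical runs from three systems for a single topic.
runs = {
    "sysA": ["d1", "d2", "d3", "d4"],
    "sysB": ["d2", "d5", "d1", "d6"],
    "sysC": ["d7", "d1", "d8", "d2"],
}

pool = depth_k_pool(runs, depth=2)

# Documents that some system retrieved but that never get judged.
retrieved = {doc for ranking in runs.values() for doc in ranking}
unjudged = retrieved - pool
```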
Incremental Relevance Feedback for Information Filtering
, 1996
Cited by 90 (4 self)
We use data from the TREC routing experiments to explore how relevance feedback can be applied incrementally --- using a few judged documents each time --- to achieve results that are as good as if the feedback occurred in one pass. We show that relatively few judgments are needed to get high-quality results. We also demonstrate methods that reduce the amount of information archived from past judged documents without adversely affecting effectiveness. A novel simulation shows that such techniques are useful for handling long-standing queries with drifting notions of relevance.
Overview of the Fifth Text REtrieval Conference (TREC-5)
- Proceedings of the Fifth Text REtrieval Conference (TREC-5)
, 1997
Comparing the Performance of Database Selection Algorithms
, 1999
Cited by 89 (23 self)
We compare the performance of two database selection algorithms reported in the literature. Their performance is compared using a common testbed designed specifically for database selection techniques. The testbed is a decomposition of the TREC/TIPSTER data into 236 subcollections. We present results of a recent investigation of the performance of the CORI algorithm and compare the performance with earlier work that examined the performance of gGlOSS. The databases from our testbed were ranked using both the gGlOSS and CORI techniques and compared to the RBR baseline, a baseline derived from TREC relevance judgements. We examined the degree to which CORI and gGlOSS approximate this baseline. Our results confirm our earlier observation that the gGlOSS Ideal(l) ranks do not estimate relevance- This work supported in part by DARPA contract N6600197-C-8542 and NASA GSRP NGT5-50062. This work supported in part by NSF, the Library of Congress, and the Department of Commerce under agre...
Projections for Efficient Document Clustering
, 1997
Cited by 86 (0 self)
Clustering is increasing in importance, but linear- and even constant-time clustering algorithms are often too slow for real-time applications. A simple way to speed up clustering is to speed up the distance calculations at the heart of clustering routines. We study two techniques for improving the cost of distance calculations, LSI and truncation, and determine both how much these techniques speed up clustering and how much they affect the quality of the resulting clusters. We find that the speed increase is significant while --- surprisingly --- the quality of clustering is not adversely affected. We conclude that truncation yields clusters as good as those produced by full-profile clustering while offering a significant speed advantage.
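To illustrate the idea of truncation mentioned in this abstract, the sketch below keeps only the highest-weighted terms of each sparse vector before computing cosine similarity, so the distance calculation touches fewer terms. This is one plausible reading under stated assumptions, not the authors' implementation. The function names, term weights, and cutoff are hypothetical.

```python
import math
from typing import Dict


def truncate(vec: Dict[str, float], n: int) -> Dict[str, float]:
    """Keep only the n terms with the largest weights."""
    top = sorted(vec.items(), key=lambda item: item[1], reverse=True)[:n]
    return dict(top)


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical document and cluster-centroid profiles.
doc = {"retrieval": 0.8, "query": 0.6, "the": 0.1, "of": 0.05}
centroid = {"retrieval": 0.7, "query": 0.5, "cluster": 0.4, "of": 0.02}

full_sim = cosine(doc, centroid)
doc_t = truncate(doc, 2)
centroid_t = truncate(centroid, 2)
truncated_sim = cosine(doc_t, centroid_t)
```

The cost of each similarity computation shrinks with the number of retained terms. Whether cluster quality survives a given cutoff is an empirical question of the kind the abstract reports on.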
Simple, Proven Approaches to Text Retrieval
, 1997
Cited by 86 (3 self)
This technical note describes straightforward techniques for document indexing and retrieval that have been solidly established through extensive testing and are easy to apply. They are useful for many different types of text material, are viable for very large files, and have the advantage that they do not require special skills or training for searching, but are easy for end users. The document and text retrieval methods described here have a sound theoretical basis, are well established by extensive testing, and the ideas involved are now implemented in some commercial retrieval systems. Testing in the last few years has, in particular, shown that the methods presented here work very well with full texts, not only title and abstracts, and with large files of texts containing three quarters of a million documents. These tests, the TREC Tests (see Harman 1993 - 1997; IP&M 1995), have been rigorous comparative evaluations involving many different approaches to information retrieval. ...
Overview of the Sixth Text REtrieval Conference (TREC-6)
- The Fifth Text REtrieval Conference (TREC-5). NIST Special Publication 500-238, National Institute of Standards and Technology
, 1998
Cited by 83 (2 self)
This paper serves as an introduction to the research described in detail in the remainder of the volume. The next section defines the common retrieval tasks performed in TREC-6. Sections 3 and 4 provide details regarding the test collections and the evaluation methodology used in TREC. Section 5 provides an overview of the retrieval results. The final section summarizes the main themes learned from the experiments.
Experimental components for the evaluation of interactive information retrieval systems
- Journal of Documentation
, 2000
Cited by 67 (0 self)

