@INPROCEEDINGS{Allan95relevancefeedback,
    author = {James Allan},
    title = {Relevance Feedback With Too Much Data},
    booktitle = {},
    year = {1995},
    pages = {337--343}
}
Abstract
Modern text collections often contain large documents which span several subject areas. Such documents are problematic for relevance feedback since inappropriate terms can easily be chosen. This study explores the highly effective approach of feeding back passages of large documents. A less-expensive method which discards long documents is also reviewed and found to be effective if there are enough relevant documents. A hybrid approach which feeds back short documents and passages of long documents may be the best compromise.

1 Introduction

As the amount of on-line text has increased, so has the size of individual documents in those collections. Information retrieval methods that could easily be applied to the full text of abstracts or short documents are sometimes less effective or prohibitively expensive for large documents. This problem has led to a resurgence of interest in techniques for handling large texts, including passage retrieval, theme identification, document su...