A Language Modeling Approach to Information Retrieval
by Unknown Authors
@MISC{_alanguage,
author = {},
title = {A Language Modeling Approach to Information Retrieval},
year = {}
}
Abstract
Models of document indexing and document retrieval have been extensively studied. The integration of these two classes of models has been the goal of several researchers, but it is a very difficult problem. We argue that much of the reason for this is the lack of an adequate indexing model. This suggests that perhaps a better indexing model would help solve the problem. However, we feel that making unwarranted parametric assumptions will not lead to better retrieval performance. Furthermore, making prior assumptions about the similarity of documents is not warranted either. Instead, we propose an approach to retrieval based on probabilistic language modeling. We estimate models for each document individually. Our approach to modeling is non-parametric and integrates document indexing and document retrieval into a single model. One advantage of our approach is that collection statistics, which are used heuristically in many other retrieval models, are an integral part of our model. We have implemented our model and tested it empirically. Our approach significantly outperforms standard tf.idf weighting on two different collections and query sets.
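
The abstract does not spell out the estimator, so the following is only a minimal sketch of the general idea: estimate a separate language model for each document, then rank documents by the probability that their model generates the query. It assumes simple linear interpolation between the document's relative term frequency and the collection-wide frequency. That smoothing is a common stand-in, not necessarily the authors' method. The function name `rank_by_query_likelihood`, the weight `lam`, and the toy documents are all hypothetical.

```python
import math
from collections import Counter


def rank_by_query_likelihood(docs, query, lam=0.5):
    """Rank documents by log P(query | document model), highest first.

    docs:  dict mapping a document id to its list of tokens
    query: list of query tokens
    lam:   weight on the per-document estimate (hypothetical default)
    """
    # Collection statistics enter the model directly as a background
    # distribution, rather than as a separate heuristic weight.
    collection = Counter()
    for tokens in docs.values():
        collection.update(tokens)
    total = sum(collection.values())

    scores = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        length = len(tokens)
        log_p = 0.0
        for term in query:
            p_doc = tf[term] / length if length else 0.0
            p_coll = collection[term] / total if total else 0.0
            # Assumed smoothing: linear interpolation of the two estimates.
            p = lam * p_doc + (1 - lam) * p_coll
            if p == 0.0:
                # Term never occurs in the collection: query is impossible.
                log_p = float("-inf")
                break
            log_p += math.log(p)
        scores[doc_id] = log_p
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy collection, for illustration only.
docs = {
    "d1": "language models for information retrieval".split(),
    "d2": "document indexing and document retrieval".split(),
    "d3": "probabilistic indexing models".split(),
}
ranking = rank_by_query_likelihood(docs, "document retrieval".split())
```

In this toy collection `d2` ranks first because it contains both query terms. `d1` and `d3` still receive nonzero scores through the collection-wide estimate, which is one way collection statistics can be part of the model itself rather than an added weighting.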
