Results 1 - 4 of 4
Two-stage language models for information retrieval, 2003
Cited by 173 (19 self)
Abstract
The optimal settings of retrieval parameters often depend on both the document collection and the query, and are usually found through empirical tuning. In this paper, we propose a family of two-stage language models for information retrieval that explicitly captures the different influences of the query and the document collection on the optimal settings of retrieval parameters. As a special case, we present a two-stage smoothing method that allows us to estimate the smoothing parameters completely automatically. In the first stage, the document language model is smoothed using a Dirichlet prior with the collection language model as the reference model. In the second stage, the smoothed document language model is further interpolated with a query background language model. We propose a leave-one-out method for estimating the Dirichlet parameter of the first stage, and the use of document mixture models for estimating the interpolation parameter of the second stage. Evaluation on five different databases and four types of queries indicates that the two-stage smoothing method with the proposed parameter estimation methods consistently gives retrieval performance that is close to, or better than, the best results achieved using a single smoothing method and exhaustive parameter search on the test data.
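The two stages described above combine into a single smoothed estimate, p(w|d) = (1 - λ)(c(w,d) + μ p(w|C)) / (|d| + μ) + λ p(w|U), where μ is the Dirichlet parameter and λ the interpolation weight. The following is a minimal Python sketch of that formula plus a leave-one-out estimate of μ, under stated assumptions: the query background model p(w|U) falls back to the collection model p(w|C) when none is supplied; μ is picked by maximizing the leave-one-out log-likelihood over a fixed grid (a simple stand-in, not the paper's own optimizer); and λ is set by hand, since the document-mixture estimation of λ is not implemented here. All function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical helper names; an illustrative sketch, not the paper's reference code.


def collection_model(docs):
    """Maximum-likelihood collection language model p(w|C)."""
    counts = Counter()
    for doc in docs:
        counts.update(doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def two_stage_prob(word, doc_counts, doc_len, p_coll, mu, lam, p_bg=None):
    """Two-stage smoothed p(w|d).

    Stage 1: Dirichlet prior smoothing with p(w|C) as the reference model.
    Stage 2: linear interpolation with a query background model p(w|U).
    Assumption: p(w|U) falls back to p(w|C) when p_bg is None.
    """
    pc = p_coll.get(word, 0.0)
    pu = pc if p_bg is None else p_bg.get(word, 0.0)
    stage1 = (doc_counts.get(word, 0) + mu * pc) / (doc_len + mu)
    return (1.0 - lam) * stage1 + lam * pu


def score(query, doc, p_coll, mu, lam):
    """Query log-likelihood of doc; skips query words unseen in the collection."""
    counts = Counter(doc)
    return sum(
        math.log(two_stage_prob(w, counts, len(doc), p_coll, mu, lam))
        for w in query
        if p_coll.get(w, 0.0) > 0.0
    )


def leave_one_out_loglik(docs, p_coll, mu):
    """Leave-one-out log-likelihood of the collection under stage-1 smoothing:
    each occurrence of w in d is predicted from d with that occurrence removed."""
    total = 0.0
    for doc in docs:
        counts = Counter(doc)
        n = len(doc)
        for w, c in counts.items():
            total += c * math.log((c - 1 + mu * p_coll[w]) / (n - 1 + mu))
    return total


def estimate_mu(docs, p_coll, grid=(10, 50, 100, 500, 1000, 2000, 5000)):
    """Pick mu from a fixed grid of positive values (stand-in for a real optimizer)."""
    return max(grid, key=lambda mu: leave_one_out_loglik(docs, p_coll, mu))


if __name__ == "__main__":
    docs = [["a", "b", "a"], ["b", "c"], ["a", "c", "c", "d"]]
    p_coll = collection_model(docs)
    mu = estimate_mu(docs, p_coll)
    lam = 0.5  # set by hand here; the paper estimates it with document mixture models
    for i, d in enumerate(docs):
        print(i, round(score(["a", "d"], d, p_coll, mu, lam), 4))
```

Because both stages produce proper distributions over the collection vocabulary, the interpolated estimate also sums to one, which is a quick sanity check when adapting the sketch.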
A Brief Review of Information Retrieval Models, 2007
Cited by 1 (0 self)
Abstract
Information retrieval models have been studied for decades, leading to a huge body of literature on the topic. In this paper, we briefly review this body of literature along with a discussion of some recent trends.
An Exploration of Formalized Retrieval Heuristics
Abstract
Empirical studies of information retrieval methods show that good retrieval performance is closely related to the use of various retrieval heuristics, such as TF-IDF weighting. Any effective retrieval formula, no matter how it was originally motivated, often boils down to an explicit or implicit implementation of these heuristics. A basic research question is thus: which "necessary" heuristics, exactly, account for good retrieval performance? In this paper, we present a formal study of these retrieval heuristics. We formally define a set of basic desirable constraints that any reasonable retrieval function should satisfy, and check these constraints on a variety of representative retrieval functions. We find that none of these retrieval functions satisfies all the constraints unconditionally. Empirical results show that when a constraint is not satisfied, this often indicates that the method is not optimal, and when a constraint is satisfied only for a certain range of parameter values, performance tends to be poor when the parameter is outside that range. In general, we find that the empirical performance of a retrieval formula is tightly related to how well it satisfies these constraints. Thus the proposed constraints can explain many empirical observations and make it possible to evaluate any existing or new retrieval formula analytically.
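To make the idea of a retrieval constraint concrete, here is a small Python sketch of one constraint of the kind described above, phrased in our own words rather than taken from the paper's formal definitions: for a one-term query and documents of equal length, a document with more occurrences of the term should score strictly higher. The checker, the Dirichlet-smoothed scoring function, and the deliberately flawed `capped_tf_score` are all hypothetical illustrations. Note that this probes only a finite set of term frequencies, so it is an empirical spot check, not the analytical proof the abstract refers to.

```python
import math


def dirichlet_score(tf, doc_len, p_wc, mu):
    """Single-term query likelihood log p(w|d) under Dirichlet smoothing."""
    return math.log((tf + mu * p_wc) / (doc_len + mu))


def capped_tf_score(tf, doc_len, cap):
    """Toy scoring function that ignores term counts above `cap` (illustrative only)."""
    return float(min(tf, cap))


def satisfies_tf_constraint(score_fn, doc_len, max_tf, **params):
    """Hypothetical check: with a one-term query and a fixed document length,
    the score must strictly increase as the term frequency grows.
    Assumes max_tf <= doc_len so every tested tf is feasible."""
    scores = [score_fn(tf, doc_len, **params) for tf in range(max_tf + 1)]
    return all(lo < hi for lo, hi in zip(scores, scores[1:]))


if __name__ == "__main__":
    print(satisfies_tf_constraint(dirichlet_score, 100, 20, p_wc=0.01, mu=1000))  # True
    print(satisfies_tf_constraint(capped_tf_score, 100, 20, cap=3))  # False
```

The capped function passes when `max_tf` does not exceed its cap and fails beyond it, which loosely mirrors the abstract's observation that some functions satisfy a constraint only within a certain range.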

