Results 1 - 10
of
112
Time-Based Language Models
, 2003
"... We explore the relationship between time and relevance using TREC ad-hoc queries. A type of query is identified that favors very recent documents. We propose a time-based language model approach to retrieval for these queries. We show how time can be incorporated into both query-likelihood models an ..."
Abstract
-
Cited by 244 (29 self)
- Add to MetaCart
We explore the relationship between time and relevance using TREC ad-hoc queries. A type of query is identified that favors very recent documents. We propose a time-based language model approach to retrieval for these queries. We show how time can be incorporated into both query-likelihood models and relevance models. We carried out experiments to compare time-based language models to heuristic techniques for incorporating document recency in the ranking. Our results show that time-based models perform as well as or better than the best of the heuristic techniques.
Parsimonious Language Models for Information Retrieval
- In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
, 2004
"... We systematically investigate a new approach to estimating the parameters of language models for information retrieval, called parsimonious language models. Parsimonious language models explicitly address the relation between levels of language models that are typically used for smoothing. As such, ..."
Abstract
-
Cited by 216 (37 self)
- Add to MetaCart
We systematically investigate a new approach to estimating the parameters of language models for information retrieval, called parsimonious language models. Parsimonious language models explicitly address the relation between levels of language models that are typically used for smoothing. As such, they need fewer (non-zero) parameters to describe the data. We apply parsimonious models at three stages of the retrieval process:1) at indexing time; 2) at search time; 3) at feedback time. Experimental results show that we are able to build models that are significantly smaller than standard models, but that still perform at least as well as the standard approaches.
Model-based Feedback in the Language Modeling Approach to Information Retrieval
- In Proceedings of Tenth International Conference on Information and Knowledge Management
, 2001
"... The language modeling approach to retrieval has been shown to perform well empirically. One advantage of this new approach is its statistical foundations. However, feedback, as one important component in a retrieval system, has only been dealt with heuristically in this new retrieval approach: the o ..."
Abstract
-
Cited by 118 (17 self)
- Add to MetaCart
The language modeling approach to retrieval has been shown to perform well empirically. One advantage of this new approach is its statistical foundations. However, feedback, as one important component in a retrieval system, has only been dealt with heuristically in this new retrieval approach: the original query is usually literally expanded by adding ditional terms to it. Such expansion-based feedback creates an inconsistent interpretation of the original and the expanded query. In this paper, we present a more principled approach to feedback in the language modeling approach. Specifically, we treat feedback as updating the query language model based on the extra evidence carried by the feedback documents. Such a model-based feedback strategy easily fits into an extension of the language modeling approach. We propose and evaluate two different approaches to updating a query language model based on feedback documents, one based on a generarive probabilistic model of feedback documents and one based on minimization of the KL-divergence over feedback documents. Experiment resuits show that both approaches are effective and outperform the Rocchio feedback approach.
COMBINING APPROACHES TO INFORMATION RETRIEVAL
"... The combination of different text representations and search strategies has become a standard technique for improving the effectiveness of information retrieval. Combination, for example, has been studied extensively in the TREC evaluations and is the basis of the “meta-search” engines used on the W ..."
Abstract
-
Cited by 76 (1 self)
- Add to MetaCart
The combination of different text representations and search strategies has become a standard technique for improving the effectiveness of information retrieval. Combination, for example, has been studied extensively in the TREC evaluations and is the basis of the “meta-search” engines used on the Web. This paper examines the development of this technique, including both experimental results and the retrieval models that have been proposed as formal frameworks for combination. We show that combining approaches for information retrieval can be modeled as combining the outputs of multiple classifiers based on one or more representations, and that this simple model can provide explanations for many of the experimental results. We also show that this view of combination is very similar to the inference net model, and that a new approach to retrieval based on language models supports combination and can be integrated with the inference net model.
A language modeling framework for resource selection and results merging
- IN CIKM 2002
, 2002
"... Statistical language models have been proposed recently for several information retrieval tasks, including the resource selection task in distributed information retrieval. This paper extends the language modeling approach to integrate resource selection, ad-hoc searching, and merging of results fro ..."
Abstract
-
Cited by 60 (5 self)
- Add to MetaCart
Statistical language models have been proposed recently for several information retrieval tasks, including the resource selection task in distributed information retrieval. This paper extends the language modeling approach to integrate resource selection, ad-hoc searching, and merging of results from different text databases into a single probabilistic retrieval model. This new approach is designed primarily for Intranet environments, where it is reasonable to assume that resource providers are relatively homogeneous and can adopt the same kind of search engine. Experiments demonstrate that this new, integrated approach is at least as effective as the prior state-of-the-art in distributed IR.
Inferring query performance using pre-retrieval predictors
- In Proc. Symposium on String Processing and Information Retrieval
, 2004
"... Abstract. The prediction of query performance is an interesting and important issue in Information Retrieval (IR). Current predictors involve the use of relevance scores, which are time-consuming to compute. Therefore, current predictors are not very suitable for practical applications. In this pape ..."
Abstract
-
Cited by 46 (4 self)
- Add to MetaCart
Abstract. The prediction of query performance is an interesting and important issue in Information Retrieval (IR). Current predictors involve the use of relevance scores, which are time-consuming to compute. Therefore, current predictors are not very suitable for practical applications. In this paper, we study a set of predictors of query performance, which can be generated prior to the retrieval process. The linear and non-parametric correlations of the predictors with query performance are thoroughly assessed on the TREC disk4 and disk5 (minus CR) collections. According to the results, some of the proposed predictors have significant correlation with query performance, showing that these predictors can be useful to infer query performance in practical applications. 1
Question answering passage retrieval using dependency relations
- In SIGIR 2005
, 2005
"... State-of-the-art question answering (QA) systems employ termdensity ranking to retrieve answer passages. Such methods often retrieve incorrect passages as relationships among question terms are not considered. Previous studies attempted to address this problem by matching dependency relations betwee ..."
Abstract
-
Cited by 41 (3 self)
- Add to MetaCart
State-of-the-art question answering (QA) systems employ termdensity ranking to retrieve answer passages. Such methods often retrieve incorrect passages as relationships among question terms are not considered. Previous studies attempted to address this problem by matching dependency relations between questions and answers. They used strict matching, which fails when semantically equivalent relationships are phrased differently. We propose fuzzy relation matching based on statistical models. We present two methods for learning relation mapping scores from past QA pairs: one based on mutual information and the other on expectation maximization. Experimental results show that our method significantly outperforms state-of-the-art density-based passage retrieval methods by up to 78 % in mean reciprocal rank. Relation matching also brings about a 50 % improvement in a system enhanced by query expansion.
discriminant model for information retrieval
- In the Proceedings of SIGIR’2005
, 2005
"... This paper presents a new discriminative model for information retrieval (IR), referred to as linear discriminant model (LDM), which provides a flexible framework to incorporate arbitrary features. LDM is different from most existing models in that it takes into account a variety of linguistic featu ..."
Abstract
-
Cited by 38 (12 self)
- Add to MetaCart
This paper presents a new discriminative model for information retrieval (IR), referred to as linear discriminant model (LDM), which provides a flexible framework to incorporate arbitrary features. LDM is different from most existing models in that it takes into account a variety of linguistic features that are derived from the component models of HMM that is widely used in language modeling approaches to IR. Therefore, LDM is a means of melding discriminative and generative models for IR. We present two algorithms of parameter learning for LDM. One is to optimize the average precision (AP) directly using an iterative procedure. The other is a perceptron-based algorithm that minimizes the number of discordant document-pairs in a rank list. The effectiveness of our approach has been evaluated on the task of ad hoc retrieval using six English and Chinese TREC test sets. Results show that (1) in most test sets, LDM significantly outperforms the state-of-the-art language modeling approaches and the classical probabilistic retrieval model; (2) it is more appropriate to train LDM using a measure of AP rather than likelihood if the IR system is graded on AP; and (3) linguistic features (e.g. phrases and dependences) are effective for IR if they are incorporated properly.

