Reducing Long Queries Using Query Quality Predictors
Cited by 15 (2 self)
Long queries frequently contain many extraneous terms that hinder retrieval of relevant documents. We present techniques to reduce long queries to more effective shorter ones that lack those extraneous terms. Our work is motivated by the observation that perfectly reducing long TREC description queries can lead to an average improvement of 30% in mean average precision. Our approach involves transforming the reduction problem into a problem of learning to rank all subsets of the original query (sub-queries) based on their predicted quality, and selecting the top sub-query. We use various measures of query quality described in the literature as features to represent sub-queries, and train a classifier. Replacing the original long query with the top-ranked sub-query chosen by the ranking classifier results in a statistically significant average improvement of 8% on our test sets. Analysis of the results shows that query reduction is well suited for moderately performing long queries, and that a small set of query quality predictors is well suited for the task of ranking sub-queries.
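The reduction step described in this abstract (enumerate the sub-queries, score each one, keep the best) can be sketched as follows. This is a minimal illustration, not the authors' system: `toy_quality` and the `TOY_IDF` values are hypothetical stand-ins for the learned ranker and its query quality features.

```python
from itertools import combinations

def sub_queries(terms):
    """Yield every non-empty subset of the query terms, keeping term order."""
    for size in range(1, len(terms) + 1):
        for combo in combinations(terms, size):
            yield combo

def best_sub_query(terms, predict_quality):
    """Return the sub-query with the highest predicted quality."""
    return max(sub_queries(terms), key=predict_quality)

# Hypothetical per-term weights, invented for illustration only.
TOY_IDF = {"effects": 1.2, "of": 0.1, "global": 2.0, "warming": 3.1}

def toy_quality(sub_query):
    """Stand-in predictor: mean toy IDF of the terms in the sub-query."""
    return sum(TOY_IDF[t] for t in sub_query) / len(sub_query)

query = ["effects", "of", "global", "warming"]
best = best_sub_query(query, toy_quality)
```

A query with n terms has 2^n - 1 non-empty sub-queries, so exhaustive enumeration is only practical for fairly short inputs.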
Predicting Query Performance by Query-Drift Estimation
2009
Cited by 5 (4 self)
Predicting query performance, that is, the effectiveness of a search performed in response to a query, is a highly important and challenging problem. Our novel approach to addressing this challenge is based on estimating the potential amount of query drift in the result list, i.e., the presence (and dominance) of aspects or topics not related to the query in top-retrieved documents. We argue that query-drift can potentially be estimated by measuring the diversity (e.g., standard deviation) of the retrieval scores of these documents. Empirical evaluation demonstrates the prediction effectiveness of our approach for several retrieval models. Specifically, the prediction success is better, over most tested TREC corpora, than that of state-of-the-art prediction methods.
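The score-diversity idea mentioned in this abstract can be illustrated with a short sketch. It assumes the predictor is simply the population standard deviation of the top-k retrieval scores; the function name, the cutoff `k`, and the lack of any normalization are assumptions made here, not details taken from the paper.

```python
import statistics

def score_std_predictor(scores, k=100):
    """Standard deviation of the k highest retrieval scores.

    Assumption: scores are comparable within one result list; any
    normalization used in the paper is not reproduced here.
    """
    top = sorted(scores, reverse=True)[:k]
    if len(top) < 2:
        return 0.0  # spread is undefined for fewer than two scores
    return statistics.pstdev(top)

# Made-up retrieval scores for two result lists.
flat_list = [5.0, 5.0, 5.0, 5.0]
spread_list = [9.0, 5.0, 3.0, 1.0]
flat_value = score_std_predictor(flat_list)
spread_value = score_std_predictor(spread_list)
```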
A case for improved evaluation of query difficulty prediction
In: Proc. SIGIR’09. (2009) 640–641
Cited by 2 (0 self)
Query difficulty prediction aims to identify, in advance, how well an information retrieval system will perform when faced with a particular search request. The current standard evaluation methodology involves calculating a correlation coefficient, to indicate how strongly the predicted query difficulty is related to an actual system performance measure, usually Average Precision. We run a series of experiments based on predictors that have been shown to perform well in the literature, comparing these across different TREC runs. Our results demonstrate that the current evaluation methodology is severely limited. Although it can be used to demonstrate the performance of a predictor for a single system, such performance is not consistent over a variety of retrieval systems. We conclude that published results in the query difficulty area are generally not comparable, and recommend that prediction be evaluated against a spectrum of underlying search systems.
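The evaluation methodology discussed in this abstract can be sketched as a per-system correlation between predictor values and Average Precision. The example below uses Pearson's coefficient as one possible choice; the system names and all numbers are invented to show how the same predictor can correlate very differently with two systems.

```python
import math

def pearson(xs, ys):
    """Pearson linear correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)

# Made-up predictor outputs for four queries.
predicted = [0.2, 0.4, 0.6, 0.8]

# Made-up Average Precision of two hypothetical systems on the same queries.
ap_by_system = {
    "system_a": [0.1, 0.3, 0.5, 0.7],
    "system_b": [0.7, 0.5, 0.3, 0.1],
}

# One correlation per system rather than a single headline number.
correlations = {name: pearson(predicted, ap) for name, ap in ap_by_system.items()}
```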
Back to the roots: A probabilistic framework for query-performance prediction
In Proceedings of CIKM, 2012
Cited by 1 (1 self)
The query-performance prediction task is estimating the effectiveness of a search performed in response to a query when no relevance judgments are available. Although there exist many effective prediction methods, these differ substantially in their basic principles, and rely on diverse hypotheses about the characteristics of effective retrieval. We present a novel fundamental probabilistic prediction framework. Using the framework, we derive and explain various previously proposed prediction methods that might seem completely different, but turn out to share the same formal basis. The derivations provide new perspectives on several predictors (e.g., Clarity). The framework is also used to devise new prediction approaches that outperform the state-of-the-art.
Predicting Neighbor Goodness in Collaborative Filtering
Performance prediction has gained increasing attention in the IR field since the middle of the past decade and has become an established research topic in the field. The present work restates the problem in the subarea of Collaborative Filtering (CF), where it has barely been researched so far. We investigate the adaptation of clarity-based query performance predictors to define predictors of neighbor performance in CF. The proposed predictors are introduced in a memory-based CF algorithm to produce a dynamic variant where neighbor ratings are weighted based on their predicted performance. The approach is tested with encouraging empirical results, as the dynamic variants consistently outperform the baseline algorithms, with increasing difference on small neighborhoods.
Predicting Query Performance on the Web
Predicting performance of queries has many useful applications like automatic query reformulation and automatic spell correction. However, accurate and effective performance prediction on the Web is a challenge. In particular, measures such as Clarity, that work well on homogeneous TREC-like collections, are not as effective on the Web. In this paper, we develop an effective and efficient approach for online performance prediction on the Web. We propose the use of retrieval scores, and aggregates of the rank-time features used by the document-ranking algorithm, to train regressors for query performance prediction. For a set of more than 12,000 queries sampled from the query logs of a major search engine, our approach achieves a linear correlation of 0.78 with DCG, and 0.52 with NDCG. Analysis of the prediction effectiveness shows that (i) hard queries are easier to identify while easy queries are harder to identify, (ii) NDCG, a non-linear effectiveness measure, is much harder to predict than DCG, and (iii) long queries’ performance prediction is easier than prediction for short queries.
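For readers unfamiliar with the two target measures, the sketch below computes DCG and NDCG for one ranked list. It assumes the common formulation with a raw gain and a log2 rank discount; the abstract does not say which variant the authors used, so the exact formula is an assumption.

```python
import math

def dcg(gains):
    """Discounted cumulative gain: each gain is divided by log2(rank + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def ndcg(gains):
    """DCG divided by the DCG of the ideal (descending) ordering of the same gains."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0

# Made-up graded relevance labels for one query, in ranked order.
ranked_gains = [1, 3, 0, 2]
query_dcg = dcg(ranked_gains)
query_ndcg = ndcg(ranked_gains)
```

Because NDCG divides by a per-query ideal value, two queries with the same DCG can have different NDCG scores.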
Predicting the Performance of Recommender Systems: An Information Theoretic Approach
Performance prediction is an appealing problem in Recommender Systems, as it enables an array of strategies for deciding when to deliver or hold back recommendations based on their foreseen accuracy. The problem, however, has been barely addressed explicitly in the area. In this paper, we propose adaptations of query clarity techniques from ad-hoc Information Retrieval to define performance predictors in the context of Recommender Systems, which we refer to as user clarity. Our experiments show positive results with different user clarity models in terms of the correlation with a single recommender's performance. Empirical results show significant dependency between this correlation and the recommendation method at hand, as well as competitive results in terms of average correlation.
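Query clarity is usually defined as the Kullback-Leibler divergence between a query language model and the collection language model. The sketch below computes that divergence over a toy vocabulary; treating a user's distribution over items as the "query" side is an assumption made here for illustration, not the specific user clarity models proposed in the paper, and all probabilities are invented.

```python
import math

def clarity(focused_model, background_model):
    """KL divergence (in bits) of a focused distribution from a background one.

    Both arguments map vocabulary entries to probabilities. The background
    model is assumed to give non-zero probability to every entry that the
    focused model covers.
    """
    return sum(
        p * math.log2(p / background_model[w])
        for w, p in focused_model.items()
        if p > 0
    )

# Made-up background distribution over four items.
background = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}

# Hypothetical users: one concentrated on a single item, one matching the background.
focused_user = {"a": 1.0}
diffuse_user = dict(background)

focused_clarity = clarity(focused_user, background)
diffuse_clarity = clarity(diffuse_user, background)
```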
Learning from the Past: Answering New Questions with Past Answers
2012
Community-based Question Answering sites, such as Yahoo! Answers or Baidu Zhidao, allow users to get answers to complex, detailed and personal questions from other users. However, since answering a question depends on the ability and willingness of users to address the asker’s needs, a significant fraction of the questions remain unanswered. We measured that in Yahoo! Answers, this fraction represents 15% of all incoming English questions. At the same time, we discovered that around 25% of questions in certain categories are recurrent, at least at the question-title level, over a period of one year. We attempt to reduce the rate of unanswered questions in Yahoo! Answers by reusing the large repository of past resolved questions, openly available on the site. More specifically, we estimate the probability that certain new questions can be satisfactorily answered by a best answer from the past, using a statistical model specifically trained for this task. We leverage concepts and methods from query-performance prediction and natural language processing in order to extract a wide range of features for our model. The key challenge here is to achieve a level of quality similar to the one provided by the best human answerers. We evaluated our algorithm on offline data extracted from Yahoo! Answers, but more interestingly, also on online data by using three “live” answering robots that automatically provide past answers to new questions when a certain degree of confidence is reached. We report the success rate of these robots in three active Yahoo! Answers categories in terms of accuracy, coverage and askers’ satisfaction. This work presents a first attempt, to the best of our knowledge, at automatic question answering for questions of a social nature, by reusing past answers of high quality.
Predicting Query Performance for Fusion-Based Retrieval
Gad Markovits
Estimating the effectiveness of a search performed in response to a query in the absence of relevance judgments is the goal of query-performance prediction methods. Post-retrieval predictors analyze the result list of the most highly ranked documents. We address the prediction challenge for retrieval approaches wherein the final result list is produced by fusing document lists that were retrieved in response to a query. To that end, we present a novel fundamental prediction framework that accounts for the special characteristics of the fusion setting, i.e., the use of intermediate retrieved lists. The framework is based on integrating prediction performed upon the final result list with that performed upon the lists that were fused to create it; prediction integration is controlled based on inter-list similarities. We empirically demonstrate the merits of various predictors instantiated from the framework. As a case in point, their prediction quality substantially transcends that of applying state-of-the-art predictors upon the final result list.

