## Inferring query performance using pre-retrieval predictors (2004)

Venue: | In Proc. Symposium on String Processing and Information Retrieval |

Citations: | 66 - 5 self |

### BibTeX

@INPROCEEDINGS{He04inferringquery,

author = {Ben He and Iadh Ounis},

title = {Inferring query performance using pre-retrieval predictors},

booktitle = {In Proc. Symposium on String Processing and Information Retrieval},

year = {2004},

pages = {43--54},

publisher = {Springer Verlag}

}

### Abstract

The prediction of query performance is an interesting and important issue in Information Retrieval (IR). Current predictors involve the use of relevance scores, which are time-consuming to compute. Therefore, current predictors are not very suitable for practical applications. In this paper, we study a set of predictors of query performance, which can be generated prior to the retrieval process. The linear and non-parametric correlations of the predictors with query performance are thoroughly assessed on the TREC disk4 and disk5 (minus CR) collections. According to the results, some of the proposed predictors have significant correlation with query performance, showing that these predictors can be useful to infer query performance in practical applications.
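The abstract refers to two kinds of correlation between a predictor and query performance: linear (Pearson's r) and non-parametric (Spearman's rank correlation). The following is a minimal sketch of both, assuming per-query predictor values and per-query average precision are available as plain lists. The function names are illustrative and do not come from the paper, and ties are given average ranks, which is one common convention rather than something the record specifies.

```python
import math


def pearson(xs, ys):
    """Linear (Pearson) correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))
```

Under these assumptions, `spearman(predictor_values, ap_values)` would give the rank correlation for one predictor over a query set, where both argument names are hypothetical.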

### Citations

882 | A language modeling approach to information retrieval
- Ponte, Croft
- 1998
(Show Context)
Citation Context ...features should be computed prior to the retrieval process. The proposed list of predictors is inspired by previous works related to probabilistic IR models, including the language modelling approach [11] and Amati & van Rijsbergen’s Divergence From Randomness (DFR) models [3]: – Query length. According to Zhai & Lafferty’s work [15], in the language modelling approach, the query length has a strong e... |

701 | A study of smoothing methods for language models applied to information retrieval
- Zhai, Lafferty
(Show Context)
Citation Context ...o probabilistic IR models, including the language modelling approach [11] and Amati & van Rijsbergen’s Divergence From Randomness (DFR) models [3]: – Query length. According to Zhai & Lafferty’s work [15], in the language modelling approach, the query length has a strong effect on the smoothing methods. In our previous work, we also found that the query length heavily affects the length normalisation ... |

533 | Probability and statistics
- DeGroot, Schervish
- 2004
(Show Context)
Citation Context ...in [10], the size of this document set is an important property of the query. Following [10], in this work, we define the query scope as follows: Definition 5 (ω): The query scope is: $\omega = -\log(n_Q/N)$ (5) where $n_Q$ is the number of documents containing at least one of the query terms, and $N$ is the number of documents in the whole collection. In the following sections, we will study the correlations of ... |

264 | A probabilistic model of information retrieval: development and status: Part 1 and Part 2. Information Processing and Management
- Sparck-Jones, Walker, et al.
- 2000
(Show Context)
Citation Context ...iven by $k_1\left((1 - b) + b \frac{l}{avg\_l}\right)$, where $l$ and $avg\_l$ are the document length and the average document length in the collection, respectively. For the parameters $k_1$ and $k_3$, we use the standard setting of [14], i.e. $k_1 = 1.2$ and $k_3 = 1000$. $qtf$ is the number of occurrences of a given term in the query and $tf$ is the within document frequency of the given term. $b$ is the free parameter of BM25’s term frequency... |

194 | A general language model for information retrieval
- Song, Croft
- 1999
(Show Context)
Citation Context ...ion 2 (see Equation (7)) is also automatically set to 1.64 in our experiments for TREC4. Regarding the generation of AP, Cronen-Townsend et al. apply Song & Croft’s multinomial language model for CS [13], and we apply PL2 for SCS. Since $r_s(SCS, AP)$ is stable for statistically diverse term-weighting models, i.e. PL2 and BM25 (see Table 4), we believe that the use of the two different term-weighting mo... |

193 | Predicting query performance
- Cronen-Townsend, Zhou, et al.
- 2002
(Show Context)
Citation Context ...s is an important measure reflecting the retrieval performance of an IR system. It particularly refers to how an IR system deals with poorly-performing queries. As stressed by Cronen-Townsend et al. [4], poorly-performing queries considerably hurt the effectiveness of an IR system. Indeed, this issue has become important in IR research. For example, in 2003, TREC proposed a new track, namely the Rob... |

150 | Probabilistic models of information retrieval based on measuring the divergence from randomness
- Amati, Rijsbergen
(Show Context)
Citation Context ...ist of predictors is inspired by previous works related to probabilistic IR models, including the language modelling approach [11] and Amati & van Rijsbergen’s Divergence From Randomness (DFR) models [3]: – Query length. According to Zhai & Lafferty’s work [15], in the language modelling approach, the query length has a strong effect on the smoothing methods. In our previous work, we also found that ... |

137 | Okapi at TREC-4
- Robertson, Walker, et al.
- 1996
(Show Context)
Citation Context ...des the applied c value for the three types of queries. As one of the most well-established IR systems, Okapi uses BM25 to measure the term weight, where the idf factor $w^{(1)}$ is normalised as follows [12]: $w(t, d) = w^{(1)} \cdot \frac{(k_1 + 1)\,tf}{K + tf} \cdot \frac{(k_3 + 1)\,qtf}{k_3 + qtf}$ where $w$ is the final weight. $K$ is given by $k_1\left((1 - b) + b \frac{l}{avg\_l}\right)$, where $l$ and $avg\_l$ are the document length and the average document length ... |

105 | Nonparametric Statistical Inference
- Gibbons
- 1971
(Show Context)
Citation Context ...is section, instead of the linear correlation, we check the non-parametric correlations of the predictors with AP. An appropriate measure for the non-parametric test is Spearman’s rank correlation [6]. In this paper, we denote Spearman’s correlation between variables X and Y as $r_s(X, Y)$. The test data and experimental setting for checking Spearman’s correlation are the same as the previou... |

54 | Query difficulty, robustness, and selective application of query expansion
- Amati, Carpineto, et al.
- 2004
(Show Context)
Citation Context ...f poorly-performing queries. Moreover, the use of reliable query performance predictors is a step towards determining, for each query, the most appropriate retrieval strategy. For example, in [2], the use of query performance predictors made it possible to devise a selective decision methodology that avoids the failure of query expansion. In order to predict the performance of a query, the first step is ... |

50 | Recent experiments with INQUERY
- Allan, Ballesteros, et al.
- 1996
(Show Context)
Citation Context ...tion of informative amount in its composing terms, called $\gamma_1$, is represented as: $\gamma_1 = \sigma_{idf}$ where $\sigma_{idf}$ is the standard deviation of the idf of the terms in Q. For idf, we use INQUERY’s idf formula [1]: $idf(t) = \frac{\log_2\left((N + 0.5)/N_t\right)}{\log_2(N + 1)}$ where $N_t$ is the number of documents in which the query term t appears and $N$ is the number of documents in the whole collection. Another possible definition re... |

17 | A study of parameter tuning for term frequency normalization
- He, Ounis
- 2003
(Show Context)
Citation Context ..., the query length has a strong effect on the smoothing methods. In our previous work, we also found that the query length heavily affects the length normalisation methods of the probabilistic models [7]. For example, the optimal setting for the so-called normalisation 2 in Amati & van Rijsbergen’s probabilistic framework is query-dependent [3]. The empirically obtained setting of its parameter c is ... |

14 | University of Glasgow at the Web track: Dynamic application of hyperlink analysis using the query scope
- Plachouras, Cacheda, et al.
- 2003
(Show Context)
Citation Context ...Query scope. Similar to the clarity score, an alternative indication of the generality/speciality of a query is the size of the document set containing at least one of the query terms. As stressed in [10], the size of this document set is an important property of the query. Following [10], in this work, we define the query scope as follows: Definition 5 (ω): The query scope is: $\omega = -\log(n_Q/N)$ (5) whe... |

8 | Employing the resolution power of search keys
- Pirkola, Jarvelin
- 2001
(Show Context)
Citation Context .... As stressed by Pirkola and Jarvelin, the difference between the resolution power of the query terms, which is given as the idf(t) values, could affect the effectiveness of the retrieval performance [9]. Therefore, the distribution of the idf(t) factors in the composing query terms might be an intrinsic feature that affects the retrieval performance. In this paper, we investigate the following two p... |

4 | A query-based pre-retrieval model selection approach to information retrieval
- He, Ounis
- 2004
(Show Context)
Citation Context ...ry length. Measuring the correlation, we obtained r = 0.0585 and a p-value of 0.3124, which again indicates a very low correlation. Therefore, query length seems to be very weakly correlated with AP. (8) Table 3. The correlations r of the predictors with AP, and the related p-values. The results are given separately with respect to the three types of queries. Significant correlations are shown in bol... |
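
Several of the citation contexts above quote formulas from the paper: INQUERY's idf, the predictor γ1 (standard deviation of the idf of the query terms), the query scope ω, and the BM25 term weight. The sketch below restates them in Python as a reading aid, not as the authors' implementation. All function and parameter names are hypothetical. The population form of the standard deviation and the natural logarithm in ω are assumptions, since the quoted text does not specify either.

```python
import math


def inquery_idf(n_docs, doc_freq):
    """INQUERY idf: log2((N + 0.5) / N_t) / log2(N + 1)."""
    return math.log2((n_docs + 0.5) / doc_freq) / math.log2(n_docs + 1)


def gamma1(n_docs, doc_freqs):
    """Standard deviation of idf over the query terms.

    Assumption: population standard deviation (divide by the term count).
    """
    idfs = [inquery_idf(n_docs, df) for df in doc_freqs]
    mean = sum(idfs) / len(idfs)
    return math.sqrt(sum((v - mean) ** 2 for v in idfs) / len(idfs))


def query_scope(n_docs, n_matching):
    """Query scope: -log(n_Q / N).

    Assumption: natural logarithm; the quoted definition gives no base.
    """
    return -math.log(n_matching / n_docs)


def bm25_weight(w1, tf, qtf, doc_len, avg_doc_len, b, k1=1.2, k3=1000.0):
    """BM25 term weight as quoted above, with k1 = 1.2 and k3 = 1000.

    w1 is the idf factor w^(1); b is the free length-normalisation parameter.
    """
    big_k = k1 * ((1 - b) + b * doc_len / avg_doc_len)
    return w1 * ((k1 + 1) * tf / (big_k + tf)) * ((k3 + 1) * qtf / (k3 + qtf))
```

The first three functions need only collection statistics (document frequencies and collection size), which is what allows such predictors to be computed before retrieval. `bm25_weight` is included only to make the quoted weighting formula concrete.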