## oro.open.ac.uk Bias-Variance Analysis in Estimating True Query Model for Information Retrieval

### Citations

1152 | A language modeling approach to information retrieval
- Ponte, Croft
- 1998
Citation Context ... Indeed, there are other kinds of probabilistic retrieval models (Fuhr, 2001; van Rijsbergen, 1997; Zhai, 2007). Our focus in this paper is on the language modeling (LM) approach. The LM approaches (Ponte and Croft, 1998; Zhai and Lafferty, 2001) are derived by estimating how probable it is for a document to generate a query (Sparck Jones et al., 2003). There is no explicit relevance in the formulation in early LM ap...

960 | A study of smoothing methods for language models applied to ad hoc information retrieval
- Zhai, Lafferty
- 2001
Citation Context ...er kinds of probabilistic retrieval models (Fuhr, 2001; van Rijsbergen, 1997; Zhai, 2007). Our focus in this paper is on the language modeling (LM) approach. The LM approaches (Ponte and Croft, 1998; Zhai and Lafferty, 2001) are derived by estimating how probable it is for a document to generate a query (Sparck Jones et al., 2003). There is no explicit relevance in the formulation in early LM approaches, where the query...

727 | Neural networks and the bias/variance dilemma
- Geman, Bienenstock, et al.
- 1992
Citation Context ...ve, i.e., bias-variance tradeoff. The bias-variance tradeoff is fundamental in the estimation theory and has been extensively studied in density estimation (Zucchini et al., 2005), linear regression (Geman et al., 1992), classification (Valentini et al., 2004), and other areas (Bishop, 2006). In general, the bias represents the gap between the expectation (i.e., mean) of estimated values and the true target value, ...

440 | Relevance-based language models.
- Lavrenko, Croft
- 2001
Citation Context ...elevance in the formulation in early LM approaches, where the query representation is the original query language model estimated by the maximum-likelihood method. Later on, the relevance model (RM) (Lavrenko and Croft, 2001) was developed by assuming that the query and its relevant documents are random samples from an underlying relevance model R. In practice, RM estimates an expanded query language model, which is gene...

431 | Pattern Recognition and Machine Learning (Information Science and Statistics)
- Bishop
- 2006
Citation Context ...the estimation theory and has been extensively studied in density estimation (Zucchini et al., 2005), linear regression (Geman et al., 1992), classification (Valentini et al., 2004), and other areas (Bishop, 2006). In general, the bias represents the gap between the expectation (i.e., mean) of estimated values and the true target value, while the variance represents the variability over all estimated values. ...

381 | Document language models, query models, and risk minimization for information retrieval
- Lafferty, Zhai
- 2001
Citation Context ...nsion, the document ranking is based on the second-round retrieval using the expanded query model. For any estimated query model, the document retrieval can be based on the negative KL-Divergence (Lafferty and Zhai, 2001) between the estimated query language model $\theta_{q_i}$ and document language model $\theta_d$: $-D(\theta_{q_i} \| \theta_d) = -H(\theta_{q_i}, \theta_d) + H(\theta_{q_i})$ (24), where $H(\theta_{q_i}, \theta_d)$ is the cross entropy between $\theta_{q_i}$ and $\theta_d$, an...

237 | On relevance, probabilistic indexing, and information retrieval
- Maron, Kuhns
- 1960
Citation Context ...r by summarizing the main contributions and highlighting the potential impact and future research directions. 2. Literature Review. Over decades, various probabilistic IR models have been developed (Maron and Kuhns, 1960; Lafferty and Zhai, 2003; Zhai, 2007; Robertson and Zaragoza, 2009) to estimate document relevance with respect to an information need (often represented as a query). One way is from the document-gen...

146 | Experiments using the Lemur toolkit - Ogilvie, Callan - 2001 |

88 | Probabilistic Relevance Models Based on Document and Query Generation, chapter 1
- Lafferty, Zhai
- 2002
Citation Context ...in contributions and highlighting the potential impact and future research directions. 2. Literature Review. Over decades, various probabilistic IR models have been developed (Maron and Kuhns, 1960; Lafferty and Zhai, 2003; Zhai, 2007; Robertson and Zaragoza, 2009) to estimate document relevance with respect to an information need (often represented as a query). One way is from the document-generation point of view, le...

85 | Tree induction vs. logistic regression: A learning-curve analysis
- Perlich, Provost, et al.
- 2003
Citation Context ... stable than those of the complex one. To reduce the bias and variance simultaneously, one often needs more data (e.g., larger sample size or more training data) (Brain and Webb, 1999; Bishop, 2006; Perlich et al., 2003), or well designed methods (e.g., combination method, also called as ensemble method) (Valentini et al., 2004; Ghahramani et al., 2003). In the context of query language modeling, we will analyze the...

81 | Query difficulty, robustness and selective application of query expansion. - Amati, Carpineto, et al. - 2004 |

80 | Portfolio theory of information retrieval
- Wang, Zhu
- 2009
Citation Context ... investigate the retrieval effectiveness and stability across topics/queries. The proposed bias-variance analysis is different from the existing mean-variance analysis in document ranking (Wang, 2009; Wang and Zhu, 2009; Zhu et al., 2009). In mean-variance analysis, the variance is associated to the relevance score, while the bias and variance in our paper are associated to the retrieval performance and estimation q...

56 | Regularized estimation of mixture models for robust pseudo-relevance feedback
- Tao, Zhai
- 2006
Citation Context ...S-divergence. 4.2.4. Combination between Original and Expanded Query Models. The combination between original and expanded query models was widely studied in the literature (Abdul-Jaleel et al., 2004; Tao and Zhai, 2006; Li, 2008; Lv and Zhai, 2009). Basically, the combination can be formulated as $\theta^{(c)}_{q_i} = \lambda\,\theta^{(o)}_{q_i} + (1-\lambda)\,\theta^{(f)}_{q_i}$ (26), where $\theta^{(c)}_{q_i}$ is the combined query model, $\lambda$ is the combination coefficient of the...

48 | The probabilistic relevance framework
- Robertson, Zaragoza
- 2009
Citation Context ... potential impact and future research directions. 2. Literature Review. Over decades, various probabilistic IR models have been developed (Maron and Kuhns, 1960; Lafferty and Zhai, 2003; Zhai, 2007; Robertson and Zaragoza, 2009) to estimate document relevance with respect to an information need (often represented as a query). One way is from the document-generation point of view, leading to the classical probabilistic model...

43 | UMASS at TREC 2004—novelty and hard.
- Abdul-Jaleel, Allan, et al.
- 2004
Citation Context ...ation bias-variance using JS-divergence. 4.2.4. Combination between Original and Expanded Query Models. The combination between original and expanded query models was widely studied in the literature (Abdul-Jaleel et al., 2004; Tao and Zhai, 2006; Li, 2008; Lv and Zhai, 2009). Basically, the combination can be formulated as $\theta^{(c)}_{q_i} = \lambda\,\theta^{(o)}_{q_i} + (1-\lambda)\,\theta^{(f)}_{q_i}$ (26), where $\theta^{(c)}_{q_i}$ is the combined query model, $\lambda$ is the combinatio...

37 | Bias-variance analysis of support vector machines for the development of SVM-based ensemble methods
- Valentini, Dietterich
- 2004
Citation Context ... bias-variance tradeoff is fundamental in the estimation theory and has been extensively studied in density estimation (Zucchini et al., 2005), linear regression (Geman et al., 1992), classification (Valentini et al., 2004), and other areas (Bishop, 2006). In general, the bias represents the gap between the expectation (i.e., mean) of estimated values and the true target value, while the variance represents the variabi...

32 | Reducing the risk of query expansion via robust constrained optimization.
- Collins-Thompson
- 2009
Citation Context ...information. Collins-Thompson and Callan (2007) investigated the uncertainty of feedback-based query models and proposed to resample different feedback document models using Bootstrap sampling. In (Collins-Thompson, 2009b; Dillon and Collins-Thompson, 2010), the risk and reward tradeoff and optimization for query expansion were discussed. Lv et al. (2011) proposed a FeedbackBoost method to improve the robustness of th...

32 | Estimation and use of uncertainty in pseudo-relevance feedback. - Collins-Thompson, Callan - 2007 |

31 | Blind men and elephants: Six approaches to TREC data
- Banks, Over, et al.
- 1999
Citation Context ... the retrieval performance than to the estimation quality with respect to the true query model. The variance of retrieval performance across different queries has been investigated in the literature (Banks et al., 1999). The variation of the query difficulty/hardness across different topics was studied in the query expansion task (Amati et al., 2004). More recently, Robertson and Kanoulas (2012) have investigated t...

22 | Adaptive relevance feedback in information retrieval.
- Lv, Zhai
- 2009
Citation Context ...on between Original and Expanded Query Models. The combination between original and expanded query models was widely studied in the literature (Abdul-Jaleel et al., 2004; Tao and Zhai, 2006; Li, 2008; Lv and Zhai, 2009). Basically, the combination can be formulated as $\theta^{(c)}_{q_i} = \lambda\,\theta^{(o)}_{q_i} + (1-\lambda)\,\theta^{(f)}_{q_i}$ (26), where $\theta^{(c)}_{q_i}$ is the combined query model, $\lambda$ is the combination coefficient of the original query $\theta^{(o)}_{q_i}$, ...

20 | Balancing Exploration and Exploitation in Listwise and Pairwise Online Learning to Rank for Information Retrieval. - Hofmann, Whiteson, et al. - 2013 |

16 | Mean-variance analysis: A new document ranking theory in information retrieval
- Wang
- 2009
Citation Context ... tradeoff to investigate the retrieval effectiveness and stability across topics/queries. The proposed bias-variance analysis is different from the existing mean-variance analysis in document ranking (Wang, 2009; Wang and Zhu, 2009; Zhu et al., 2009). In mean-variance analysis, the variance is associated to the relevance score, while the bias and variance in our paper are associated to the retrieval performa...

15 | Language modelling and relevance
- Sparck-Jones, Robertson, et al.
- 2003
Citation Context ...r is on the language modeling (LM) approach. The LM approaches (Ponte and Croft, 1998; Zhai and Lafferty, 2001) are derived by estimating how probable it is for a document to generate a query (Sparck Jones et al., 2003). There is no explicit relevance in the formulation in early LM approaches, where the query representation is the original query language model estimated by the maximum-likelihood method. Later on, t...

13 | Risky Business: Modeling and Exploiting Uncertainty in Information Retrieval.
- Zhu, Wang, et al.
- 2009
Citation Context ...rieval effectiveness and stability across topics/queries. The proposed bias-variance analysis is different from the existing mean-variance analysis in document ranking (Wang, 2009; Wang and Zhu, 2009; Zhu et al., 2009). In mean-variance analysis, the variance is associated to the relevance score, while the bias and variance in our paper are associated to the retrieval performance and estimation quality. Moreover, ...

13 | Query-drift prevention for robust query expansion, SIGIR
- Zighelnic, Kurland
- 2008
Citation Context ... the weights of original query terms while reducing the influence of non-relevant terms in the expanded query model. This can actually prevent the query drifting from the underlying information need (Zighelnic and Kurland, 2008). If the downside performance can be prevented, this could reduce the variance of the expanded query model. On the other hand, the bias can also be reduced if the retrieval performance on average can...

10 | Language models and uncertain inference in information retrieval
- Fuhr
- 2001
Citation Context ...ach (Lafferty and Zhai, 2003). Lafferty and Zhai (2003) considered the above two directions into a unified generative relevance model. Indeed, there are other kinds of probabilistic retrieval models (Fuhr, 2001; van Rijsbergen, 1997; Zhai, 2007). Our focus in this paper is on the language modeling (LM) approach. The LM approaches (Ponte and Croft, 1998; Zhai and Lafferty, 2001) are derived by estimating how...

9 | On The Effect of Data Set Size on Bias and Variance in Classification Learning - Brain, Webb - 1999 |

9 | Bayesian classifier combination.
- Ghahramani, Kim
- 2003
Citation Context ...e size or more training data) (Brain and Webb, 1999; Bishop, 2006; Perlich et al., 2003), or well designed methods (e.g., combination method, also called as ensemble method) (Valentini et al., 2004; Ghahramani et al., 2003). In the context of query language modeling, we will analyze the above factors that can affect the bias and variance in Section 4.2. 3.2. Bias and Variance Regarding Retrieval Performance. We now defi...

8 | Accounting for stability of retrieval algorithms using risk-reward curves
- Collins-Thompson
- 2009
Citation Context ...evant documents given a query. Despite its effectiveness in general, the expanded query model is often less stable in the sense that its performance is not stable across different individual queries (Collins-Thompson, 2009a). The expanded query model may perform less effectively than the original query model for some queries (Amati et al., 2004). Recently, many methods have been proposed to improve the robustness of que...

7 | A unified optimization framework for robust pseudo-relevance feedback algorithms
- Dillon, Collins-Thompson
- 2010
Citation Context ...pson and Callan (2007) investigated the uncertainty of feedback-based query models and proposed to resample different feedback document models using Bootstrap sampling. In (Collins-Thompson, 2009b; Dillon and Collins-Thompson, 2010), the risk and reward tradeoff and optimization for query expansion were discussed. Lv et al. (2011) proposed a FeedbackBoost method to improve the robustness of the expanded query model. In our opin...

7 | On per-topic variance in IR evaluation. - Robertson, Kanoulas - 2012 |

5 | A new robust relevance model in the language model framework
- Li
- 2008
Citation Context ... Combination between Original and Expanded Query Models. The combination between original and expanded query models was widely studied in the literature (Abdul-Jaleel et al., 2004; Tao and Zhai, 2006; Li, 2008; Lv and Zhai, 2009). Basically, the combination can be formulated as $\theta^{(c)}_{q_i} = \lambda\,\theta^{(o)}_{q_i} + (1-\lambda)\,\theta^{(f)}_{q_i}$ (26), where $\theta^{(c)}_{q_i}$ is the combined query model, $\lambda$ is the combination coefficient of the original ...

5 | A brief review of information retrieval models
- Zhai
- 2007
Citation Context ...lighting the potential impact and future research directions. 2. Literature Review. Over decades, various probabilistic IR models have been developed (Maron and Kuhns, 1960; Lafferty and Zhai, 2003; Zhai, 2007; Robertson and Zaragoza, 2009) to estimate document relevance with respect to an information need (often represented as a query). One way is from the document-generation point of view, leading to the...

5 | Approximating true relevance distribution from a mixture model based on irrelevance data - Zhang, Hou, et al. - 2009 |

4 | Exploration-exploitation tradeoff in interactive relevance feedback
- Karimzadehgan, Zhai
- 2010
Citation Context ...y looking at the tradeoff between the bias and variance. Our work is also related to but different from the recent research on the exploration-exploitation tradeoff in interactive relevance feedback (Karimzadehgan and Zhai, 2010, 2012) and in online learning to rank (Hofmann et al., 2012). We formulate the bias and variance (see the next section) and analyze the tradeoff between them in query language modeling (see Section 4...

4 | A boosting approach to improving pseudo-relevance feedback.
- Lv, Zhai, et al.
- 2011
Citation Context ...he proposed bias-variance analysis and evaluation methodology, we can study other query language model estimation methods (e.g., models in (Collins-Thompson, 2009b; Dillon and Collins-Thompson, 2010; Lv et al., 2011)). The proposed bias-variance analysis could also be applied to study the bias-variance of other IR models in terms of their retrieval effectiveness and stability. For instance, we may be able to stu...

3 | A learning approach to optimizing exploration-exploitation tradeoff in relevance feedback
- Karimzadehgan, Zhai
Citation Context ...online learning to rank (Hofmann et al., 2012). We formulate the bias and variance (see the next section) and analyze the tradeoff between them in query language modeling (see Section 4.2), while in (Karimzadehgan and Zhai, 2012), the bias and variance are not defined or formulated. Nevertheless, the exploration-exploitation tradeoff occurs in our experiments (see Section 5.4.4), in the sense that the estimated model may ove...

3 | Robust models in information retrieval
- Lipka, Stein
- 2011
Citation Context ...nsively studied in parameter estimation (Lebanon, 2010; Duda et al., 2001), density estimation (Zucchini et al., 2005), linear regression (Geman et al., 1992), classification (Valentini et al., 2004; Lipka and Stein, 2011), and other areas (Bishop, 2006). We first briefly explain the classical bias-variance decomposition for the squared loss of the estimation. Let us consider an estimator y for the unknown true targe...

3 | Bias-variance decomposition of ir evaluation - Zhang, Song, et al. - 2013 |

3 | On modeling rank-independent risk in estimating probability of relevance, AIRS
- Zhang, Song, et al.
- 2011
Citation Context ...t it is worthwhile to investigate the bias-variance of the expanded query model with smoothed document weights. To facilitate the investigation, we adopt a simple document weight smoothing method (Zhang et al., 2011), which can be formulated as: $\tilde{S}_{q_i}(d) = \frac{[S_{q_i}(d)]^{1/s}}{\sum_{d' \in D} [S_{q_i}(d')]^{1/s}}$ (27), where $\tilde{S}_{q_i}(d)$ is the smoothed document weight, $S_{q_i}(d)$ is the original document weight, and $s$ ($s > 0$) is a parameter...

3 | A study of document weight smoothness in pseudo relevance feedback
- Zhang, Song, et al.
- 2010
Citation Context ...be non-relevant. It has been shown that properly smoothing the document weights (with moderate smoothing parameters) can improve the effectiveness (measured by MAP) of feedback-based query expansion (Zhang et al., 2010, 2011). On the other hand, for some individual queries, smoothing may affect the discriminativity between the relevant documents and non-relevant document in the PRF document set. For instance, if to...

2 | variance, and mse of estimators - Lebanon |

2 | Readings in information retrieval. Ch. A nonclassical logic for information retrieval - Rijsbergen - 1997 |