## A Risk Minimization Framework for Information Retrieval (2003)

### Other Repositories/Bibliography

Venue: In Proceedings of the ACM SIGIR 2003 Workshop on Mathematical/Formal Methods in IR. ACM

Citations: 54 (1 self)

### BibTeX

@INPROCEEDINGS{Zhai03arisk,
  author = {ChengXiang Zhai and John Lafferty},
  title = {A Risk Minimization Framework for Information Retrieval},
  booktitle = {Proceedings of the ACM SIGIR 2003 Workshop on Mathematical/Formal Methods in IR},
  year = {2003},
  publisher = {ACM}
}

### Abstract

This paper presents a novel probabilistic information retrieval framework in which the retrieval problem is formally treated as a statistical decision problem. In this framework, queries and documents are modeled using statistical language models (i.e., probabilistic models of text), user preferences are modeled through loss functions, and retrieval is cast as a risk minimization problem. We discuss how this framework can unify existing retrieval models and accommodate the systematic development of new retrieval models. As an example of using the framework to model non-traditional retrieval problems, we derive new retrieval models for subtopic retrieval, which is concerned with retrieving documents to cover many different subtopics of a general query topic. These new models differ from traditional retrieval models in that they go beyond independent topical relevance.
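To make the decision-theoretic view concrete, here is a minimal Python sketch of one possible instance of the framework: queries and documents are represented by unigram language models, the loss of returning a document is taken to be the KL divergence between the query model and the document model, and ranking amounts to sorting by increasing loss. The KL loss, the linear-interpolation smoothing, the weight `lam`, and all function names are assumptions chosen for illustration; the abstract does not prescribe them.

```python
import math
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram distribution over the given tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def smoothed_doc_model(doc_tokens, collection_model, lam=0.5):
    """Interpolate the document model with the collection model.

    Linear interpolation with weight `lam` (0 < lam <= 1) is an assumed
    smoothing choice, not something fixed by the framework.
    """
    doc_model = unigram_model(doc_tokens)
    return {
        w: (1.0 - lam) * doc_model.get(w, 0.0) + lam * p_coll
        for w, p_coll in collection_model.items()
    }


def kl_loss(query_model, doc_model):
    """KL divergence D(query || document), used here as the loss function."""
    return sum(p * math.log(p / doc_model[w]) for w, p in query_model.items())


def rank_by_risk(query_tokens, docs, lam=0.5):
    """Return (doc_id, loss) pairs sorted by increasing loss."""
    collection_model = unigram_model([w for toks in docs.values() for w in toks])
    # Drop query words unseen in the collection so every smoothed
    # document probability used below is strictly positive.
    query_model = unigram_model([w for w in query_tokens if w in collection_model])
    scored = [
        (doc_id, kl_loss(query_model, smoothed_doc_model(toks, collection_model, lam)))
        for doc_id, toks in docs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    toy_docs = {
        "d1": ["risk", "minimization", "retrieval"],
        "d2": ["cooking", "pasta", "recipes"],
    }
    for doc_id, loss in rank_by_risk(["risk", "retrieval"], toy_docs):
        print(doc_id, round(loss, 3))
```

Swapping `kl_loss` for a different loss function changes the retrieval model without touching the rest of the pipeline, which is the kind of modularity the abstract attributes to the framework. A loss that also depends on previously ranked documents would be needed for the subtopic retrieval setting, and this sketch does not cover that case.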

### Citations

3502 | Introduction to Modern Information Retrieval - Salton, McGill - 1983
Citation Context: ...s such correlation to generate a relevance status value (RSV) for each document and rank documents accordingly. The vector space model is the most well known model of this type (Salton et al., 1975a; Salton and McGill, 1983; Salton, 1989), in which a document and a query are represented as two term vectors in a high-dimensional term space and each term is assigned a weight that reflects its “importance” to the document ... |

3076 | Indexing by latent semantic analysis - Deerwester, Dumais, et al. - 1990
Citation Context: ... semantic indexing can be applied to reduce the dimension of the term space and to capture the semantic “closeness” among terms, in an effort to improve the representation of the documents and query (Deerwester et al., 1990). A document can also be represented by a multinomial distribution over the terms, as in the distribution model of indexing proposed in (Wong and Yao, 1989). The main criticism of the vector space mo... |

2713 | Latent Dirichlet allocation - Blei, Ng, et al. - 2003 |

1755 | Term-weighting approaches in automatic text retrieval - Salton, Buckley - 1988
Citation Context: ...997; Zhai, 1997). Many heuristics have also been proposed to improve term weighting, but again, no weighting method has been found to be significantly better than the heuristic TF-IDF term weighting (Salton and Buckley, 1988). To address the variance in the length of documents, an effective weighting formula also needs to incorporate document length heuristically (Singhal et al., 1996). Salton et al. introduced the idea ... |

1352 | Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer - Salton - 1989
Citation Context: ...erate a relevance status value (RSV) for each document and rank documents accordingly. The vector space model is the most well known model of this type (Salton et al., 1975a; Salton and McGill, 1983; Salton, 1989), in which a document and a query are represented as two term vectors in a high-dimensional term space and each term is assigned a weight that reflects its “importance” to the document or the query. ... |

995 | A vector space model for automatic indexing - Salton, Wong, et al. - 1975
Citation Context: ... measure that preserves such correlation to generate a relevance status value (RSV) for each document and rank documents accordingly. The vector space model is the most well known model of this type (Salton et al., 1975a; Salton and McGill, 1983; Salton, 1989), in which a document and a query are represented as two term vectors in a high-dimensional term space and each term is assigned a weight that reflects its “im... |

962 | A Language Modeling Approach to Information Retrieval - Ponte, Croft - 1998
Citation Context: ...language model estimation. Smoothing of a document language model with some kind of collection language model has been very popular in the existing work. For example, geometric smoothing was used in (Ponte and Croft, 1998); linear interpolation smoothing was used in (Hiemstra and Kraaij, 1998; Berger and Lafferty, 1999), and was viewed as a 2-state hidden Markov model in (Miller et al., 1999). Berger and Lafferty expl... |

895 | Probabilistic latent semantic indexing - Hofmann - 1999 |

670 | Relevance Weighting of Search Terms - Robertson, Sparck-Jones, et al. - 1976 |

591 | The use of MMR, diversity-based reranking for reordering documents and producing summaries - Carbonell, Goldstein - 1998
Citation Context: ... this task. The first type of loss function is the Maximal Marginal Relevance (MMR) loss function, in which we encode a preference for retrieving documents that are both topically relevant and novel (Carbonell and Goldstein, 1998). In essence, the goal is to retrieve relevant documents and, at the same time, minimize the chance that the user will see redundant documents as he or she goes through the ranked list of documents. ... |

403 | Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval - Robertson, Walker - 1994
Citation Context: ...f documents (Robertson et al., 1981). While this model has not shown superior empirical performance itself, an approximation of the model based on a simple TF formula turns out to be quite effective (Robertson and Walker, 1994). A different way of introducing term frequency into the model is implicit in text categorization approaches which view a document as being generated from a unigram language model (Kalt, 1996; McCall... |

398 | Pivoted document length normalization - Singhal, Buckley, et al. - 1996
Citation Context: ...heuristic TF-IDF term weighting (Salton and Buckley, 1988). To address the variance in the length of documents, an effective weighting formula also needs to incorporate document length heuristically (Singhal et al., 1996). Salton et al. introduced the idea of the discrimination value of an indexing term (Salton et al., 1975b), which is the increase or decrease in the mean inter-document distance caused by adding the ... |

396 | Naive (Bayes) at forty: The independence assumption in information retrieval - Lewis - 1998
Citation Context: ...st well known classical probabilistic model. It assumes that terms are independently distributed in each of the two relevance models, so is essentially a naïve Bayes classifier for document ranking (Lewis, 1998).¹ There have been several efforts to improve the binary representation. Van Rijsbergen extended the binary independence model by capturing some term dependency as defined by a minimum-spanning tree... |

359 | The inquery retrieval system - Callan, Croft, et al. - 1992 |

326 | Document language models, query models, and risk minimization for information retrieval - Lafferty, Zhai - 2001 |

303 | A probabilistic model of information retrieval: development and comparative experiments, part 2 - Sparck Jones, Walker, et al. - 2000
Citation Context: ...of parameters. [Section 2.2: Probabilistic Relevance Models] In a probabilistic relevance model, one is interested in the question “What is the probability that this document is relevant to this query?” (Sparck Jones et al., 2000). Given a query, a document is assumed to be either relevant or non-relevant, but the system relies on a probabilistic model to infer this value. Formally, let random variables D and Q denote a docum... |

287 | Information Retrieval as Statistical Translation - Berger, Lafferty - 1999
Citation Context: ...stic models based on statistical language modeling. The language modeling approach was first introduced by Ponte and Croft (1998) and also explored in (Hiemstra and Kraaij, 1998; Miller et al., 1999; Berger and Lafferty, 1999; Song and Croft, 1999). The estimation of a language model based on a document (i.e., the estimation of $p(\cdot \mid D, r)$) is the key component in the language modeling approach. Indeed, most work in this d... |

257 | The Probability Ranking Principle in IR - Robertson - 1997 |

243 | Evaluation of an inference network-based retrieval model. ACM Trans. Inf. Syst. 9:187 - Turtle, Croft - 1991
Citation Context: ...lar form of the language modeling approach can also be derived using this general probabilistic concept space model (Fuhr, 2001). The inference network model is also based on probabilistic inference (Turtle and Croft, 1991). It is essentially a Bayesian belief network that models the dependency between the satisfaction of a query and the observation of documents. The estimation of relevance is based on the computation ... |

224 | Two-Stage Language Models for Information Retrieval - Zhai, Lafferty - 2002 |

208 | On relevance, probabilistic indexing, and information retrieval - Maron, Kuhns - 1960
Citation Context: ...which view a document as being generated from a unigram language model (Kalt, 1996; McCallum and Nigam, 1998). Models based on query generation ($p(D, Q \mid R) = p(Q \mid D, R)\,p(D \mid R)$) have been explored in (Maron and Kuhns, 1960), (Robertson et al., 1982), (Fuhr, 1992) and (Lafferty and Zhai, 2003). Indeed, the Probabilistic Indexing model proposed in (Maron and Kuhns, 1960) is the very first probabilistic retrieval model, i... |

208 | A general language model for information retrieval - Song, Croft - 1999
Citation Context: ...tical language modeling. The language modeling approach was first introduced by Ponte and Croft (1998) and also explored in (Hiemstra and Kraaij, 1998; Miller et al., 1999; Berger and Lafferty, 1999; Song and Croft, 1999). The estimation of a language model based on a document (i.e., the estimation of $p(\cdot \mid D, r)$) is the key component in the language modeling approach. Indeed, most work in this direction differs mainl... |

205 | A Hidden Markov Model Information Retrieval System - Miller, Leek, et al. - 1999
Citation Context: ...w family of probabilistic models based on statistical language modeling. The language modeling approach was first introduced by Ponte and Croft (1998) and also explored in (Hiemstra and Kraaij, 1998; Miller et al., 1999; Berger and Lafferty, 1999; Song and Croft, 1999). The estimation of a language model based on a document (i.e., the estimation of $p(\cdot \mid D, r)$) is the key component in the language modeling approach. ... |

183 | A non-classical logic for Information Retrieval - Rijsbergen - 1986 |

177 | Using probabilistic models of document retrieval without relevance information - Croft, Harper - 1979
Citation Context: ...n no explicit relevance information is available. Typically, $p(t \mid Q, r)$ is set to a constant and $p(t \mid Q, \bar{r})$ is estimated under the assumption that each document in the collection is not relevant (Croft and Harper, 1979; Robertson and Walker, 1997). Recently, Lavrenko and Croft made progress in estimating the rel... [Footnote 2: The use of a multinomial model for documents was actually first introduced in (Wong and Yao, 1989), but... |

171 | Representation and Learning in Information Retrieval - Lewis - 1992
Citation Context: ..., and the weighting of the indexing terms. The choice of different indexing units has been extensively studied, but no significant improvement has been achieved over the simplest word-based indexing (Lewis, 1992), although recent evaluation has shown more promising improvement through the use of linguistic phrases (Evans and Zhai, 1996; Strzalkowski, 1997; Zhai, 1997). Many heuristics have also been proposed... |

165 | Beyond independent relevance: Methods and evaluation metrics for subtopic retrieval - Zhai, Cohen, et al. - 2003
Citation Context: ...an et al., 2001), where $p(\mathrm{Rel} \mid d)$ is denoted $p(\mathrm{Useful} \mid d)$. In practice, there will be a compromise between retrieving documents with new content and avoiding non-relevant documents. In (Zhai, 2002; Zhai et al., 2003), this loss function is investigated with $p(\mathrm{Rel} \mid d)$ being assumed to be proportional to $p(q \mid d)$ and $p(\mathrm{New} \mid d)$ being estimated with a mixture language model. A deficiency in the way the MMR loss functi... |

114 | Probabilistic models in information retrieval - Fuhr - 1992
Citation Context: ...ing to document generation and query generation (Lafferty and Zhai, 2003). Most classic probabilistic retrieval models (Robertson and Sparck Jones, 1976; van Rijsbergen, 1979; Robertson et al., 1981; Fuhr, 1992) are based on document generation (i.e., $p(D, Q \mid R) = p(D \mid Q, R)\,p(Q \mid R)$). The Binary Independence Retrieval (BIR) model (Robertson and Sparck Jones, 1976; Fuhr, 1992) is perhaps the most well known c... |

110 | Twenty-One at TREC-7: ad-hoc and cross-language track - Hiemstra, Kraaij - 1999
Citation Context: ...justification for this new family of probabilistic models based on statistical language modeling. The language modeling approach was first introduced by Ponte and Croft (1998) and also explored in (Hiemstra and Kraaij, 1998; Miller et al., 1999; Berger and Lafferty, 1999; Song and Croft, 1999). The estimation of a language model based on a document (i.e., the estimation of $p(\cdot \mid D, r)$) is the key component in the languag... |

101 | A probabilistic learning approach for document indexing - Fuhr, Buckley - 1991 |

101 | A theory of term importance in automatic text analysis - Salton, Yang, et al. - 1975 |

100 | A probabilistic approach to automatic keyword indexing. Parts 1 and 2 - Harter - 1975 |

100 | On modeling information retrieval with probabilistic inference - Wong, Yao - 1995
Citation Context: ...evidence. The decision-theoretic view of retrieval allows the risk minimization framework to be more general than other retrieval frameworks such as the probabilistic inference framework proposed in (Wong and Yao, 1995) and the inference network framework (Turtle and Croft, 1991). [Section 6.2: Risk Minimization and the Probability Ranking Principle] The Probability Ranking Principle (PRP) has often been taken as the foundati... |

83 | Temporal summaries of news topics - Allan, Gupta, et al. - 2001
Citation Context: ...not affected by the novelty of documents. When c3 = c2, we would score documents based on $p(\mathrm{Rel} \mid d)\,p(\mathrm{New} \mid d)$, which is essentially the scoring formula for generating temporal summaries proposed in (Allan et al., 2001), where $p(\mathrm{Rel} \mid d)$ is denoted $p(\mathrm{Useful} \mid d)$. In practice, there will be a compromise between retrieving documents with new content and avoiding non-relevant documents. In (Zhai, 2002; Zhai et al., 20... |

81 | Noun-Phrase Analysis in Unrestricted Text for Information Retrieval - Evans, Zhai - 1996
Citation Context: ...o significant improvement has been achieved over the simplest word-based indexing (Lewis, 1992), although recent evaluation has shown more promising improvement through the use of linguistic phrases (Evans and Zhai, 1996; Strzalkowski, 1997; Zhai, 1997). Many heuristics have also been proposed to improve term weighting, but again, no weighting method has been found to be significantly better than the heuristic TF-IDF... |

80 | Probabilistic relevance models based on document and query generation - Lafferty, Zhai - 2003
Citation Context: ... $\log \frac{p(r \mid D, Q)}{p(\bar{r} \mid D, Q)} = \log \frac{p(D, Q \mid r)\,p(r)}{p(D, Q \mid \bar{r})\,p(\bar{r})}$. There are two different ways to factor the conditional probability $p(D, Q \mid R)$, corresponding to document generation and query generation (Lafferty and Zhai, 2003). Most classic probabilistic retrieval models (Robertson and Sparck Jones, 1976; van Rijsbergen, 1979; Robertson et al., 1981; Fuhr, 1992) are based on document generation (i.e., $p(D, Q \mid R) = p(D \mid Q,$... |

72 | On relevance weights with little relevance information - Robertson, Walker - 1997
Citation Context: ...information is available. Typically, $p(t \mid Q, r)$ is set to a constant and $p(t \mid Q, \bar{r})$ is estimated under the assumption that each document in the collection is not relevant (Croft and Harper, 1979; Robertson and Walker, 1997). Recently, Lavrenko and Croft made progress in estimating the rel... [Footnote 2: The use of a multinomial model for documents was actually first introduced in (Wong and Yao, 1989), but was not exploited as a lang... |

63 | TREC-7 interactive track report - Over - 1999
Citation Context: ...rieval preference may prefer a ranking of documents where the top documents cover different subtopics. This problem, referred to as “aspect retrieval,” was investigated in the TREC interactive track (Over, 1998), where the purpose was to study how an interactive retrieval system can help a user to efficiently gather diverse information about a topic. How can we formally define a retrieval model for such a s... |

63 | Model-based feedback in the KL-divergence retrieval model - Zhai, Lafferty - 2001
Citation Context: ... is precisely the log-likelihood criterion used by Ponte and Croft (1998) in introducing the language modeling approach, which has been used in all work on the language modeling approach to date. In (Zhai and Lafferty, 2001), new methods were developed to estimate a model $\hat{\theta}_Q$, leading to significantly improved performance over the use of the empirical distribution $\hat{\theta}_Q$. [Section 4.1.3: “Binned” distance loss functions] We now co... |

55 | Probabilistic models of indexing and searching - Robertson, Rijsbergen, et al. - 1981 |

54 | Document and passage retrieval based on hidden Markov models - Mittendorf, Schauble - 1994 |

50 | Applying Bayesian Networks to Information Retrieval - Fung, Favero - 1995
Citation Context: ...d on the computation of the conditional probability that the query is satisfied given that the document is observed. Other similar uses of Bayesian belief networks in retrieval have been presented in (Fung and Favero, 1995; Ribeiro and Muntz, 1996; Ribeiro-Neto et al., 2000). Kwok’s network model may also be considered as performing a probabilistic inference (Kwok, 1995), though it is based on spread activation. The in... |

50 | Probability of relevance: a unification of two competing models for document retrieval - Robertson, Maron, et al. - 1982
Citation Context: ...being generated from a unigram language model (Kalt, 1996; McCallum and Nigam, 1998). Models based on query generation ($p(D, Q \mid R) = p(Q \mid D, R)\,p(D \mid R)$) have been explored in (Maron and Kuhns, 1960), (Robertson et al., 1982), (Fuhr, 1992) and (Lafferty and Zhai, 2003). Indeed, the Probabilistic Indexing model proposed in (Maron and Kuhns, 1960) is the very first probabilistic retrieval model, in which the indexing terms... |

47 | Extending the Boolean and Vector Space Model of Information Retrieval with P-norm Queries and Multiple Concept Types - Fox - 1983 |

41 | Some inconsistencies and misnomers in probabilistic information retrieval - Cooper - 1991
Citation Context: ...y generation is used to decompose the generative model. This work provides a relevance-based justification for this new family of probabilistic models based on statistical language modeling. [Footnote 1: The required underlying independence assumption for the final retrieval formula is actually weaker (Cooper, 1991).] The language modeling approach was first introduced by Ponte and Croft (1998) and also explored in (Hi... |

41 | A network approach to probabilistic information retrieval - Kwok - 1995
Citation Context: ...k in retrieval have been presented in (Fung and Favero, 1995; Ribeiro and Muntz, 1996; Ribeiro-Neto et al., 2000). Kwok’s network model may also be considered as performing a probabilistic inference (Kwok, 1995), though it is based on spread activation. The inference network model is a very general formalism; with different ways to realize the probabilistic relationship between the evidence of observing doc... |

36 | Risk minimization and language modeling in text retrieval - Zhai - 2002
Citation Context: ...re and a novelty measure. While there may be many different ways to specify such a loss function, the problem of deriving a well motivated loss of this type largely remains an open research question (Zhai, 2002). Suppose we make the simplifying assumption that a relevance score and a novelty score can be computed independently. In this case we can define our loss function as a direct combination of the two ... |

34 | A new probabilistic model of text classification and retrieval - Kalt - 1996
Citation Context: ...n and Walker, 1994). A different way of introducing term frequency into the model is implicit in text categorization approaches which view a document as being generated from a unigram language model (Kalt, 1996; McCallum and Nigam, 1998). Models based on query generation ($p(D, Q \mid R) = p(Q \mid D, R)\,p(D \mid R)$) have been explored in (Maron and Kuhns, 1960), (Robertson et al., 1982), (Fuhr, 1992) and (Lafferty and Z... |

34 | Fast statistical parsing of noun phrases for document indexing. Fifth Conference on Applied Natural Language Processing, 312–319 - Zhai - 1997 |

32 | Foundations of probabilistic and utility-theoretic indexing - Cooper, Maron - 1978
Citation Context: ...ion-theoretic view is not new; in the 1970s, researchers were already studying how to choose and weight indexing terms from a decision-theoretic perspective (Bookstein and Swanson, 1975; Harter, 1975; Cooper and Maron, 1978). The probability ranking principle had also been justified based on optimizing the statistical decision about whether to retrieve a document (Robertson, 1977). However, the action/decision space con... |