## From Retrieval Status Values to Probabilities of Relevance for Advanced IR Applications (2003)

Venue: Information Retrieval

Citations: 9 (3 self)

### BibTeX

@ARTICLE{Nottelmann03fromretrieval,
    author = {Henrik Nottelmann and Norbert Fuhr},
    title = {From Retrieval Status Values to Probabilities of Relevance for Advanced IR Applications},
    journal = {Information Retrieval},
    year = {2003},
    volume = {6},
    pages = {2003}
}

### Abstract

In this paper, we explore the use of linear and logistic mapping functions for different retrieval methods. In a series of upper-bound experiments, we compare the approximation quality of the different mapping functions. We also investigate the effect on the resulting retrieval quality in distributed retrieval (only merging, without resource selection). These experiments show that good estimates of the actual probability of relevance can be achieved, and that the logistic model outperforms the linear one. Retrieval quality for distributed retrieval is only slightly improved by using the logistic function.
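As a rough illustration of the two families of mapping functions compared in the abstract, the sketch below maps a retrieval status value (RSV) to an estimated probability of relevance. The logistic form follows the definition quoted in the citation contexts on this page, exp(b0 + b1·x) / (1 + exp(b0 + b1·x)). The function names, the clipping of the linear output to [0, 1], and the parameter names `c0` and `c1` are assumptions made for this example and are not taken from the paper.

```python
import math


def linear_mapping(rsv, c0=0.0, c1=1.0):
    """Affine linear mapping c0 + c1 * rsv.

    Clipping to [0, 1] is an assumption of this sketch, added so that
    the result can be read as a probability.
    """
    return min(1.0, max(0.0, c0 + c1 * rsv))


def logistic_mapping(rsv, b0, b1):
    """Logistic mapping exp(b0 + b1 * rsv) / (1 + exp(b0 + b1 * rsv))."""
    z = b0 + b1 * rsv
    # Evaluate in a numerically stable way for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


if __name__ == "__main__":
    # Hypothetical RSVs and parameters, for illustration only.
    for rsv in (0.0, 0.5, 1.0, 2.0):
        print(rsv, linear_mapping(rsv, 0.1, 0.4), logistic_mapping(rsv, -2.0, 3.0))
```

Unlike the linear mapping, the logistic mapping always stays strictly between 0 and 1 for finite parameters, so no clipping is needed.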

### Citations

1802 |
An algorithm for suffix stripping
- Porter
- 1980
Citation Context ...ries, where we used all fields. These queries contain 39–185 terms (average 87.5) and are common in TREC-based evaluations. For both documents and queries, terms are stemmed (using the Porter stemmer [18]), and stop words (the TREC “common words”) are removed. The relevance judgements are the standard TREC relevance judgements [12], documents with no judgement are treated as irrelevant. In our exper... |

1527 | Term-weighting approaches in automatic text retrieval
- Salton, Buckley
Citation Context ...ine of the angle between the vectors. We use a simple version of TF·IDF which computes the scalar product of the query and the document vector $\mathrm{RSV}(d,q) := \sum_{t \in q} \mathit{dtf}(t,d) \cdot \mathit{qtf}(t,q)$ (3) with tfx.tfx [22] document weights $\mathit{dtf}_{TF \cdot IDF}(t,d) := \mathit{tf}(t,d) \cdot \log \frac{|DL| + 1}{\mathit{df}(t) + 0.5}$ (4). As this implementation of TF·IDF does not normalise the term frequency, it is biased towards longer documents. Raw tf v... |

882 | A language modeling approach to information retrieval
- Ponte, Croft
- 1998
Citation Context ...etrieval function in formula 3. LM: Statistical language models have a long history in the fields of speech recognition and statistical language processing, and proved recently to be effective for IR [17]. The basic idea is that queries and documents are generated by statistical language models. E.g., the document language model is described by the “indexing weights” 2 http://www-2.cs.cmu.edu/~lemur d... |

330 | The INQUERY retrieval system
- Callan, Croft, et al.
Citation Context ...+ 0.5 As this implementation of TF·IDF does not normalise the term frequency, it is biased towards longer documents. Raw tf values tf (t,q) are used as query term weights. INQUERY: The INQUERY system [3, 1] is based on inference networks with document nodes, indexing term nodes, query concept (i.e., query term) nodes and a single query node. The probability of the arcs between a document node and text n... |

234 |
The Probability Ranking Principle in IR
- Robertson
- 1977
Citation Context ...thods for considering specific representations of documents and queries. The key advantage of probabilistic models is their underlying theoretic justification. The Probability Ranking Principle (PRP) [20] states that optimum retrieval (defined w.r.t. document representations) is given if the documents are ranked according to the probability Pr(rel|d,q) that document d is relevant to a user query q (... |

177 |
A non-classical logic for information retrieval
- van Rijsbergen, C.J.
- 1986
Citation Context ... status values and probabilities of relevance (specified by a “mapping function”). For Rijsbergen’s paradigm of uncertain inference (a probabilistic generalisation of the logical view on databases) [23], a linear relationship between the retrieval status value Pr(q ← d) (“probability of inference”) and the probability of relevance has been proposed [24]. Recently, [13] proposed a mixture model for a... |

173 | Query-Based Sampling of Text Databases
- Callan
Citation Context .... 3.1 Experimental Setup The work described in this paper originates from work in the field of distributed information retrieval. Thus, we used the TREC-123 test bed with the CMU 100 collection split [2] which is heavily used for evaluating distributed IR methods. The collections are of roughly the same size (about 33 megabytes), but vary in the number of documents they contain. The documents inside ... |

141 | The effectiveness of GlOSS for the text database discovery problem
- Gravano, Garcia-Molina, et al.
- 1994
Citation Context ...{0,1}, [0,1], ℝ). However, for advanced applications we need the probability Pr(rel|d,q) that d is relevant to q (“probability of relevance”). (Footnote 1: In contrast to resource-ranking algorithms like GlOSS [11, 10] or CORI [2] which only compute a matching score between collections and the given query.) The actual relationship between the RSV of a document and its probability of relevance is approximated by a ... |

116 |
The Analysis of Cross-Classified Categorical Data (Second Edition)
- Fienberg
- 1980
Citation Context ...tinuous function f which approximates the discrete step function (15). Obviously, the pure and affine linear functions (10,9) are not appropriate. Instead, one good candidate is the logistic function [5, 6] $f_{\log} : \mathbb{R} \to [0,1]$, $f_{\log}(x) := \frac{\exp(b_0 + b_1 \cdot x)}{1 + \exp(b_0 + b_1 \cdot x)}$ with the two parameters $b_0$ and $b_1$. Figure 1 depicts some logistic functions with different parameters. One of the nice properties o... |

100 |
On modeling information retrieval with probabilistic inference
- Wong, Yao
- 1995
Citation Context ...he scalar product (RSVs in ℝ). Probabilistic models are widely used in Information Retrieval. Besides the fact that even classical non-probabilistic models can be given a probabilistic interpretation [25], current language models extend classical probabilistic models by methods for considering specific representations of documents and queries. The key advantage of probabilistic models is their underly... |

93 | A Probabilistic Learning Approach for Document Indexing
- Fuhr, Buckley
- 1991
Citation Context ...gistic functions have been used in different application areas within IR for quite some time, e.g. for text categorisation [8] or retrieval functions [4, 9] (logistic variant of the model proposed in [7]). In this paper, we extend this work by evaluating linear and logistic functions for three different retrieval methods: TF·IDF, INQUERY and language models. For these models, linear functions can be ... |

79 | Modeling score distributions for combining the outputs of search engines
- Manmatha, Feng
- 2001
Citation Context ...the logical view on databases) [23], a linear relationship between the retrieval status value Pr(q ← d) (“probability of inference”) and the probability of relevance has been proposed [24]. Recently, [13] proposed a mixture model for approximating the RSV distribution. The RSVs of the relevant documents are modelled by a normal (Gaussian) distribution, and the RSVs of the non-relevant documents are ap... |

56 | INQUERY does battle with TREC-6
- Allan, Callan, et al.
- 1997
Citation Context ...k documents according to their “retrieval status values” with respect to (w.r.t.) the given query. In the Boolean model, RSVs are either zero or one. Fuzzy retrieval allows for RSVs in the interval [0,1]. The well-known vector-space-model can be used with the cosine metric (RSVs in [0,1]) or the scalar product (RSVs in ℝ). Probabilistic models are widely used in Information Retrieval. Besides the fa... |

49 |
Evaluating different methods of estimating retrieval quality for resource selection
- Nottelmann, Fuhr
- 2003
Citation Context ...non-relevant) document from the retrieved set. Resource selection: In distributed IR, resource selection is the task to determine the best collections to be searched. The decision-theoretic framework [14] for resource selection aims at estimating the number of relevant documents;¹ for this, the probabilities of relevance of the top-ranked documents have to be approximated. Data fusion: The probabilit... |

41 | Inferring probability of relevance using the method of logistic regression
- Gey
- 1994
Citation Context ...ference) and Pr(rel|q,d) than linear functions. Logistic functions have been used in different application areas within IR for quite some time, e.g. for text categorisation [8] or retrieval functions [4, 9] (logistic variant of the model proposed in [7]). In this paper, we extend this work by evaluating linear and logistic functions for three different retrieval methods: TF·IDF, INQUERY and language mod... |

39 |
Probabilistic retrieval based on staged logistic regression
- Cooper, Dabney
Citation Context ...ference) and Pr(rel|q,d) than linear functions. Logistic functions have been used in different application areas within IR for quite some time, e.g. for text categorisation [8] or retrieval functions [4, 9] (logistic variant of the model proposed in [7]). In this paper, we extend this work by evaluating linear and logistic functions for three different retrieval methods: TF·IDF, INQUERY and language mod... |

37 |
Applied categorical data analysis
- Freeman
- 1987
Citation Context ...tinuous function f which approximates the discrete step function (15). Obviously, the pure and affine linear functions (10,9) are not appropriate. Instead, one good candidate is the logistic function [5, 6] $f_{\log} : \mathbb{R} \to [0,1]$, $f_{\log}(x) := \frac{\exp(b_0 + b_1 \cdot x)}{1 + \exp(b_0 + b_1 \cdot x)}$ with the two parameters $b_0$ and $b_1$. Figure 1 depicts some logistic functions with different parameters. One of the nice properties o... |

31 |
The Second Text Retrieval Conference
- Harman, editor
- 1994
Citation Context ...t index only contains the <text> sections of the documents (with the different document indexing weights dtf (t,d) as described in subsection 2.1). Queries are based on TREC topics 51–100 and 101–150 [12], respectively. We use three different sets of queries: 1. Short queries, where we only used the <title> field. Short queries contain between 1 and 7 terms (average 3.3), and are similar to those subm... |

15 | Generalizing GlOSS to vector-space databases and broker hierarchies
- Gravano, Garcia-Molina
- 1995
Citation Context ...{0,1}, [0,1], ℝ). However, for advanced applications we need the probability Pr(rel|d,q) that d is relevant to q (“probability of relevance”). (Footnote 1: In contrast to resource-ranking algorithms like GlOSS [11, 10] or CORI [2] which only compute a matching score between collections and the given query.) The actual relationship between the RSV of a document and its probability of relevance is approximated by a ... |

12 | Combining model-oriented and description-oriented approaches for probabilistic indexing
- Fuhr, Pfeifer
- 1991
Citation Context ...d) (the RSV in uncertain inference) and Pr(rel|q,d) than linear functions. Logistic functions have been used in different application areas within IR for quite some time, e.g. for text categorisation [8] or retrieval functions [4, 9] (logistic variant of the model proposed in [7]). In this paper, we extend this work by evaluating linear and logistic functions for three different retrieval methods: TF... |

10 | From uncertain inference to probability of relevance for advanced IR applications
- Nottelmann, Fuhr
- 2003
Citation Context ...ies of relevance as a normalisation of the RSVs is a natural and theoretically well-founded solution to this problem. We already investigated mapping functions for the paradigm of uncertain inference [15]. Here, logistic functions proved to be more effective for approximating the relationship between Pr(q ← d) (the RSV in uncertain inference) and Pr(rel|q,d) than linear functions. Logistic functions h... |

10 |
Probabilistic retrieval revisited
- van Rijsbergen
- 1992
Citation Context ...neralisation of the logical view on databases) [23], a linear relationship between the retrieval status value Pr(q ← d) (“probability of inference”) and the probability of relevance has been proposed [24]. Recently, [13] proposed a mixture model for approximating the RSV distribution. The RSVs of the relevant documents are modelled by a normal (Gaussian) distribution, and the RSVs of the non-relevant ... |

2 |
Entwicklung und Untersuchung von verbesserten probabilistischen Indexierungsfunktionen für Freitext-Indexierung (in German)
- Pollmann
- 1993
Citation Context ...les (document, query, RSV, relevance judgement), parameters can be learned by means of regression methods. Possible optimisation criteria are maximum likelihood [8] or least-square polynomials [16]. In both cases, extrema of a function (the likelihood function or the square error) have to be determined, i.e. the points where the first derivative equals zero. The resulting equation cannot be sol... |
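The last citation context mentions learning the mapping parameters from samples of (document, query, RSV, relevance judgement) by regression, with maximum likelihood as one possible criterion. The sketch below shows one way this could look for the logistic mapping. Plain gradient ascent on the average log-likelihood is a choice made for this example, and the excerpt does not say which numerical method the paper uses. The names `fit_logistic`, `lr`, and `steps`, and the toy data in the usage note, are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function exp(z) / (1 + exp(z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(rsvs, labels, lr=0.1, steps=5000):
    """Estimate b0, b1 of the logistic mapping by maximum likelihood.

    rsvs   -- retrieval status values, one per (document, query) pair
    labels -- 1 if the pair was judged relevant, 0 otherwise
    Uses simple gradient ascent (an assumption of this sketch).
    """
    b0, b1 = 0.0, 0.0
    n = len(rsvs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(rsvs, labels):
            err = y - sigmoid(b0 + b1 * x)  # gradient of the log-likelihood
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1
```

With toy data such as `fit_logistic([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9], [0, 0, 1, 0, 0, 1, 1, 1, 1])`, the fitted slope `b1` comes out positive, so higher RSVs map to higher estimated probabilities of relevance.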