## Models for retrieval with probabilistic indexing (1989)

Venue: | Information Processing and Management |

Citations: | 86 - 14 self |

### BibTeX

@ARTICLE{Fuhr89modelsfor,

author = {Norbert Fuhr},

title = {Models for retrieval with probabilistic indexing},

journal = {Information Processing and Management},

year = {1989},

pages = {55--72}

}

### Abstract

In this article three retrieval models for probabilistic indexing are described, along with evaluation results for each. First is the binary independence indexing (BII) model, which is a generalized version of the Maron and Kuhns indexing model. In this model, the indexing weight of a descriptor in a document is an estimate of the probability of relevance of this document with respect to queries using this descriptor. Second is the retrieval-with-probabilistic-indexing (RPI) model, which is suited to different kinds of probabilistic indexing. For that we assume that each indexing scheme has its own concept of “correctness” to which the probabilities relate. In addition to the probabilistic indexing weights, the RPI model provides the possibility of relevance weighting of search terms. A third model that is similar was proposed by Croft some years ago as an extension of the binary independence retrieval model, but it can be shown that this model is not based on the probabilistic ranking principle. The probabilistic indexing weights required for any of these models can be provided by an application of the Darmstadt indexing approach (DIA) for indexing with descriptors from a controlled vocabulary. The experimental results show significant improvements over retrieval with binary indexing. Finally, suggestions are made regarding how the DIA can be applied to probabilistic indexing with free text terms.
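To make the contrast between binary and weighted probabilistic indexing in the abstract concrete, here is a minimal Python sketch. It is only an illustration under loud assumptions: the additive score (summing the indexing weights of the query's descriptors) is chosen for brevity and is not the BII or RPI ranking formula from the article, and the function name `rank`, the documents, the descriptors, and the weights are all invented for this example.

```python
from typing import Dict, List, Set

# Hypothetical index: document id -> {descriptor: probabilistic indexing weight}.
# All values are invented for illustration only.
INDEX: Dict[str, Dict[str, float]] = {
    "d1": {"retrieval": 0.9, "indexing": 0.8},
    "d2": {"retrieval": 0.3, "indexing": 0.2},
    "d3": {"retrieval": 0.95},
}

QUERY: Set[str] = {"retrieval", "indexing"}


def rank(index: Dict[str, Dict[str, float]],
         query: Set[str],
         binary: bool = False) -> List[str]:
    """Rank documents by a simple additive score (an assumed toy scoring rule).

    With binary=True every assigned descriptor counts as 1, which mimics
    binary indexing; otherwise the indexing weight itself is summed.
    """
    scores: Dict[str, float] = {}
    for doc, weights in index.items():
        score = 0.0
        for descriptor in query:
            weight = weights.get(descriptor, 0.0)
            if binary:
                score += 1.0 if weight > 0.0 else 0.0
            else:
                score += weight
        scores[doc] = score
    # Highest score first; ties are broken by document id for determinism.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


weighted_ranking = rank(INDEX, QUERY)
binary_ranking = rank(INDEX, QUERY, binary=True)
```

In this toy data, binary indexing cannot distinguish `d1` from `d2` (both carry both descriptors), whereas the weighted variant moves `d3`, with one strongly weighted descriptor, ahead of `d2`, whose two descriptors are both weakly weighted.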

### Citations

604 |
Relevance weighting of search terms
- Robertson, Jones
- 1976
Citation Context ... All three assumptions relate to the distribution of descriptors in the queries. Formula (1) says that the distribution of the descriptors in all queries is independent, whereas formulas (2)/(3) say that the distribution of the descriptors is independent only in those queries where the document $d_m$ is relevant/nonrelevant to the corresponding request. Using assumptions (1) and (2), we get the... |

191 |
On relevance, probabilistic indexing and information retrieval
- Maron, Kuhns
- 1960
Citation Context ...improvements in retrieval effectiveness can be achieved when binary indexing is replaced by weighted probabilistic indexing. The first paper on probabilistic indexing was published by Maron and Kuhns [5]. The central idea of their model is to estimate for each descriptor in a document a probability of relevance: the probability that the document is relevant to a request which is formulated with this ... |

165 |
Using Probabilistic Models of Document Retrieval Without Relevance Information
- Croft, Harper
- 1979
Citation Context ... the ranking value. On the other hand, the EGX/IDF function does not perform better than the function without IDF weights in our experiments. This result is different from those described in [lS] and [24], where significant improvements were gained with the usage of IDF weights. We assume that this is caused by the different kinds of query terms (controlled vocabulary vs. free text terms) and indexing... |

90 | A probabilistic approach to automatic keyword indexing: part 1 - Harter - 1975 |

89 | A theory of term importance in automatic text analysis - Salton, Yang, et al. - 1975 |

68 | Probabilistic models for automatic indexing - Bookstein, Swanson - 1974 |

52 | An evaluation of feedback in document retrieval using cooccurrence data - Harper, Rijsbergen - 1978 |

50 |
Probabilistic models of indexing and searching
- Robertson, Rijsbergen, et al.
- 1980
Citation Context ... MODEL FOR RETRIEVAL WITH PROBABILISTIC INDEXING. The retrieval-with-probabilistic-indexing (RPI) model described here is similar to the so-called 2-Poisson-independence (TPI) model described in [17]. The main difference between the TPI model and the RPI model is that the RPI model is suited to different probabilistic indexing schemes, whereas the TPI model is an extension of the 2-Poisson model ... |

28 |
Experiments with representation in a document retrieval system
- Croft
- 1983
Citation Context ...bilistic retrieval model suited to a binary indexing of documents. Croft developed an extension of this model for the combination with weighted probabilistic indexing in [11], and evaluated it later [18]. Here we will give a short description of these models and compare Croft’s model with the RPI model. In the BIR model, a document $d_m$ is represented by a binary vector $x_m = (x_{m_1}, \ldots, x_{m_n})$ where $x_{m_i}$ = ... |

20 |
A decision theoretic foundation for indexing
- Bookstein, Swanson
- 1975
Citation Context ...we regard the document-request relationship between $d_m$ and $f_k$ with respect to all indexings $x \in X$. For the probability of this event we get $P(R \mid f_k, d_m) = \sum_{x \in X} P(R \mid x, f_k) \cdot P(x \mid d_m)$ (9). Now we apply Bayes’ theorem: $P(R \mid f_k, d_m) = \sum_{x \in X} P(R \mid f_k) \cdot \frac{P(x \mid R, f_k)}{P(x \mid f_k)} \cdot P(x \mid d_m)$ (10). Equation (10) is a general formula for retrieval with probabilistic indexing. Here all dependencies between descriptors can be considered. Before ... |

17 |
Retrieval test evaluation of a rule based automatic indexing (AIR/PHYS)
- Fuhr, Knorz
- 1984
Citation Context ...rther details, see [12-14]). The DIA is a dictionary-based indexing approach for automatic indexing from document titles and abstracts, with a prescribed indexing vocabulary. In the AIR retrieval test [13] it was demonstrated that the DIA is suited even to broad subject fields such as physics. The indexing task consists of two steps, a description step and a decision step. In the description step infor... |

13 |
The probability ranking principle in information retrieval
- Robertson
- 1977
Citation Context ...on application of the following three independence assumptions: $P(x_k) = \prod_{i=1}^{n} P(x_{k_i})$ (1), $P(x_k \mid R, d_m) = \prod_{i=1}^{n} P(x_{k_i} \mid R, d_m)$ (2), $P(x_k \mid \bar{R}, d_m) = \prod_{i=1}^{n} P(x_{k_i} \mid \bar{R}, d_m)$ (3). All three assumptions relate to the distribution of descriptors in the queries. Formula (1) says that the distribution of the descriptors in all queries is independent, whereas formulas (2)/(3) say that the distribution of the descriptors is independent only in those queries where the docum... |

10 |
Probabilistic automatic indexing by learning from human indexers
- Robertson, Harding
- 1984
Citation Context ... respect to all indexings $x \in X$. For the probability of this event we get $P(R \mid f_k, d_m) = \sum_{x \in X} P(R \mid x, f_k) \cdot P(x \mid d_m)$ (9). Now we apply Bayes’ theorem: $P(R \mid f_k, d_m) = \sum_{x \in X} P(R \mid f_k) \cdot \frac{P(x \mid R, f_k)}{P(x \mid f_k)} \cdot P(x \mid d_m)$ (10). Equation (10) is a general formula for retrieval with probabilistic indexing. Here all dependencies between descriptors can be considered. Before we apply some independence assumptions to simplify this formula, le... |

10 |
Automatisches Indexieren als Erkennen abstrakter Objekte
- Knorz
- 1983
Citation Context ...lopment of indexing functions have been investigated for the DIA. In [15] a probabilistic formula for this purpose is described. Here we will concentrate on the polynomial approach developed by Knorz [12,16], which uses polynomial classifiers. For this approach, the relevance description y is mapped to a description vector y. The definition of this mapping has to be done heuristically [12,16]. Then a coe... |

9 |
A Failure Analysis on the Limitations of Suffixing in an Online Environment
- Harman
- 1987
Citation Context ...values. Another criterion for the definition of term classes might be the document frequency of the terms. Word stemming: For the application of the DIA, two types of word stemming have been used. (In [28], three word stemming algorithms are compared with respect to their influence on retrieval quality. In contrast to our approach, the different stemming algorithms are only used for ... |

7 |
A probabilistic model of dictionary-based automatic indexing
- Fuhr
- 1985
Citation Context ...nding descriptor assignment would be correct. This estimation is done by the indexing function a(y). Different methods for the development of indexing functions have been investigated for the DIA. In [15] a probabilistic formula for this purpose is described. Here we will concentrate on the polynomial approach developed by Knorz [12,16], which uses polynomial classifiers. For this approach, the releva... |

5 | Recent Trends in Automatic Information Retrieval - Salton - 1986 |

4 |
Probabilistic approaches to the document retrieval problem
- Maron
- 1983
Citation Context ...s descriptor. But this model has never been investigated in experiments, because of the problem of estimating the required probabilistic parameters. All suggestions for solving this problem (see also [6]) require too much intellectual effort. In the meantime, other models of probabilistic, automatic indexing have been developed that are based on certain forms of document representation. The well-know... |

4 |
Automatische Indexierung zwischen Forschung und Anwendung. Olms
- Lustig
- 1986
Citation Context ...assumptions of the 2-Poisson model are inappropriate. Here we propose a new approach for the estimation of index term weights that is based on the concept of the form of occurrence (FOC) from the DIA [14]. This concept is more powerful than the approaches mentioned before. The basic idea is that the task of identifying terms in a document cannot be done perfectly. Instead of having a single definition... |

3 |
A Decision Theory Approach to Optimal Automatic Indexing
- Knorz
- 1982
Citation Context ...lopment of indexing functions have been investigated for the DIA. In [15] a probabilistic formula for this purpose is described. Here we will concentrate on the polynomial approach developed by Knorz [12,16], which uses polynomial classifiers. For this approach, the relevance description y is mapped to a description vector y. The definition of this mapping has to be done heuristically [12,16]. Then a coe... |

3 |
Probabilistisches indexing und retrieval
- Fuhr
- 1988
Citation Context ...ee of relevance. Some experiments not described here have shown that the difference between retrieval results remains the same, whether a binary or a multivalue relevance scale is used for evaluation [21]. 7. ESTIMATION OF PROBABILISTIC PARAMETERS. To apply the ranking formulas described above, three kinds of probabilistic parameters have to be est... |

1 |
Precision weighting - an effective automatic indexing method
- Salton, Yu
- 1976
Citation Context ... All three assumptions relate to the distribution of descriptors in the queries. Formula (1) says that the distribution of the descriptors in all queries is independent, whereas formulas (2)/(3) say that the distribution of the descriptors is independent only in those queries where the document $d_m$ is relevant/nonrelevant to the corresponding request. Using assumptions (1) and (2), we get... |

1 |
Planung und Durchführung der Retrievaltests
- Bollmann, Jochum, et al.
- 1986
Citation Context ...tions that the descriptors are distributed independently in all relevant and all nonrelevant documents we get: $g(x_m) = \log \frac{P(x_m \mid R, f_k)}{P(x_m \mid \bar{R}, f_k)} = \log \prod_{i=1}^{n} \frac{P(x_{m_i} \mid R, f_k)}{P(x_{m_i} \mid \bar{R}, f_k)}$ (19). After doing some simplifications (see e.g. [11]), we end up with ... where $p_{ik} = P(x_{m_i} = 1 \mid R, f_k)$, $q_{ik} = P(x_{m_i} = 1 \mid \bar{R}, f_k)$, and $p_{ik} = q_{ik}$ for all $s_i$ not in the set of query terms. The se... |

1 | Outline of a general probabilistic retrieval model - unknown authors |

1 |
Development of automatic indexing for the AIR retrieval test. Experiments by means of ALIBABA
- unknown authors
- 1983
Citation Context ...ings differ in the definition of correctness, to which the probabilistic parameters relate: • Indexing Al was taken from the AIR retrieval test. The indexing function was derived from manual indexing [22] using a learning sample of 1,000 documents with about 24,000 relevance descriptions. • Indexing 11 was adopted on the basis of the retrieval results of the query sample B. According to our applicatio... |

1 |
Probabilistic search term weighting-some negative results
- Fuhr, Müller
- 1987
Citation Context ... 1 gave nearly equal results. No results of experiments with search term weights based on relevance feedback data are given here because there was no appropriate test sample available (see also [23]). 8. EXPERIMENTS. With the different ranking formulas, experiments were made using sample A of the test collection. The results are given in Table 5. Experiments 1-13 deal with the BII model. In exper... |

1 |
Two Poisson and binary independence assumptions for probabilistic document retrieval
- Losee, Bookstein, et al.
Citation Context ...significant improvements of retrieval quality were gained. But for the only probabilistic approach to solve this problem, the 2-Poisson model [7-9], no improvement over binary indexing could be shown [17,27]. Obviously, the basic assumptions of the 2-Poisson model are inappropriate. Here we propose a new approach for the estimation of index term weights that is based on the concept of the form of occurre... |