## A critical investigation of recall and precision as measures of retrieval system performance (1989)


Venue: | ACM Transactions on Information Systems |

Citations: | 90 - 0 self |

### BibTeX

@ARTICLE{Raghavan89acritical,
    author = {Vijay V. Raghavan and Gwang S. Jung and Peter Bollmann},
    title = {A critical investigation of recall and precision as measures of retrieval system performance},
    journal = {ACM Transactions on Information Systems},
    year = {1989},
    volume = {7},
    pages = {205--229}
}

### Abstract

Recall and precision are often used to evaluate the effectiveness of information retrieval systems. They are easy to define if there is a single query and if the retrieval result generated for the query is a linear ordering. However, when the retrieval results are weakly ordered, in the sense that several documents have an identical retrieval status value with respect to a query, some probabilistic notion of precision has to be introduced. Relevance probability, expected precision, and so forth, are some alternatives mentioned in the literature for this purpose. Furthermore, when many queries are to be evaluated and the retrieval results averaged over these queries, some method of interpolation of precision values at certain preselected recall levels is needed. The currently popular approaches for handling both a weak ordering and interpolation are found to be inconsistent, and the results obtained are not easy to interpret. Moreover, in cases where some alternatives are available, no comparative analysis that would facilitate the selection of a particular strategy has been provided. In this paper, we systematically investigate the various problems and issues associated with the use of recall and precision as measures of retrieval system performance. Our motivation is to provide a comparative analysis of methods available for defining precision in a probabilistic sense and to promote a better understanding of the various issues involved in retrieval performance evaluation.
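
The single-query, linear-ordering case that the abstract calls "easy to define" can be sketched as follows. This is a minimal illustration, not code from the paper: the function name `recall_precision_points` and the boolean encoding of relevance judgments are assumptions made here.

```python
from typing import List, Tuple


def recall_precision_points(ranking: List[bool], total_relevant: int) -> List[Tuple[float, float]]:
    """Return (recall, precision) after each retrieved document.

    ranking[k] is True when the document at rank k+1 is relevant.
    Assumes a linear ordering, i.e. no two documents are tied.
    """
    if total_relevant <= 0:
        raise ValueError("total_relevant must be positive")
    points = []
    hits = 0  # relevant documents seen so far
    for cutoff, is_relevant in enumerate(ranking, start=1):
        hits += is_relevant
        recall = hits / total_relevant   # share of all relevant documents retrieved
        precision = hits / cutoff        # share of retrieved documents that are relevant
        points.append((recall, precision))
    return points
```

Under a weak ordering, the tied documents have no fixed position in `ranking`, which is why the abstract says a probabilistic notion of precision is needed.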

### Citations

1693 | Term weighting approaches in automatic text retrieval - Salton, Buckley - 1988 |
Citation Context ...hing function. The second one is the standard cosine similarity. The third one, called “the best probabilistic term weight,” and the fourth one, termed “the best (tf × idf),” were proposed in [22] and [23]. For each document collection, retrieval results based on simple matching, cosine similarity, best probabilistic term weight, and best (tf × idf) functions are obtained. Following that, we use the me...

245 | The SMART retrieval system: experiments in automatic document processing. Englewood Cliffs - Salton - 1971 |
Citation Context ...ded information, and the ability of the system to retrieve useful items. This approach is hard to realize, if not impossible, because it is difficult to obtain all the relevant measurement parameters [21,25]. Even if it were possible to have all the information available, how to combine them appropriately to obtain a single measure is another question. Consequently, it is common practice in research inve...

171 | Implementation of the SMART information retrieval system - Buckley - 1985 |
Citation Context ...results based on this given set of queries, some technique of interpolation of precision values is needed. A method of interpolation based on the use of the ceiling operation was utilized in the past [7,21,34]. With this method, the interpretation of precision is difficult and not amenable to objective treatment, when all the documents in the final rank are not retrieved. We instead propose an interpolatio...

156 | Computer evaluation of indexing and text processing - Salton, Lesk - 1968 |

104 | On the specification of term values in automatic indexing - Salton, Yang - 1973 |

86 | Introduction to Logic - Suppes - 1957 |
Citation Context ...weak ordering [6]. Formally, a linear ordering is reflexive, transitive, antisymmetric, and connected (every pair of elements is comparable). In contrast, a weak ordering may not satisfy antisymmetry [29]. In other words, a weak ordering reduces to [ACM Transactions on Information Systems, Vol. 7, No. 3, July 1989; p. 208; V. V. Raghavan, P. Bollmann, and G. S. Jung] linear ordering as a special case. Lin...

85 | Expected search length: A single measure of retrieval effectiveness based on weak ordering action of retrieval systems - Cooper - 1968 |

44 | On selecting a measure of retrieval effectiveness - Cooper - 1973 |

41 | Effectiveness of Information Retrieval Methods - Swets - 1969 |

39 | A theory of indexing - Salton - 1975 |

22 | On the inverse relationship of recall and precision - Cleverdon - 1972 |

19 | Foundation of evaluation - Rijsbergen - 1974 |

18 | The parametric description of retrieval tests: Part II: Overall measures - Robertson - 1969 |

16 | Evaluation tests of information retrieval systems - Cleverdon |
Citation Context ...y adopted) affect the ultimate retrieval results and hence the outcome of the performance evaluation. Six different evaluation criteria, deemed most critical to a user population, were pointed out in [9] and [25], namely, recall, precision, effort, time, form of presentation, and coverage. Among them, recall and precision have received the most attention in the literature. Recall is defined as the ra...

13 | Measurement-theoretical investigation of the MZ-metric - Bollmann, Cherniavsky - 1981 |
Citation Context ...ver different queries at standardized recall values. Note, however, that although this method is used for making experimental comparisons, its meaning as a function of recall is yet to be determined. (3) The third experiment examined retrieval evaluation results based on the intuitive-PRECALL method where averaging over queries is done at selected ND values. In experiments 1, 2, and 3, four document ...

10 | On selecting a measure of retrieval effectiveness: Part II. Implementation of the philosophy - Cooper - 1973 |
Citation Context ...elps in the systematic selection of techniques to deal with problems of weak ordering and multiple queries. APPENDIX: PROOF FOR LEMMA 3.1. The proof of this lemma is a generalization of Cooper’s proof [12] in the sense that we use the Γ function here. Γ(x) is related to the Beta function [16] by $\frac{\Gamma(x)\,\Gamma(y)}{\Gamma(x+y)} = \int_0^1 t^{x-1}(1-t)^{y-1}\,dt = B(x, y)$ for x > 0, y > 0. (A.1) And we get for 0 < s ≤ r Γ(s...
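
The Gamma/Beta relation quoted in this context is a standard identity, and it can be checked numerically. The sketch below is illustrative only: the function names are made up here, and the midpoint-rule integration is a convenience that behaves well for x, y ≥ 1, where the integrand is bounded.

```python
import math


def beta_via_gamma(x: float, y: float) -> float:
    """B(x, y) computed as Gamma(x) * Gamma(y) / Gamma(x + y)."""
    return math.gamma(x) * math.gamma(y) / math.gamma(x + y)


def beta_via_integral(x: float, y: float, steps: int = 20_000) -> float:
    """B(x, y) as the integral of t^(x-1) * (1-t)^(y-1) over [0, 1] (midpoint rule)."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h  # midpoint of the k-th subinterval
        total += t ** (x - 1) * (1 - t) ** (y - 1)
    return total * h


if __name__ == "__main__":
    # The two values should agree to many decimal places.
    print(beta_via_gamma(2, 3), beta_via_integral(2, 3))
```

Because Γ(n + 1) = n! for non-negative integers n, the same function is what lets factorials be extended to non-integer arguments.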

10 | Introduction to mathematical statistics (4th ed.) - Hoel - 1971 |
Citation Context ...! (r - s)!(i - u)! / for integer s. If we now interpolate all factorials that contain an s with the Γ function we obtain the following lemma. LEMMA 3.1. Let esl be calculated by the Γ function [16]. Then, $esl = j + \frac{s \cdot i}{r + 1}$ for 0 < s ≤ r. PROOF. A proof of this lemma is given in the Appendix. The method of proof is similar to Cooper’s for integer s. □ With this result we find a simple formula for esl ...
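
For integer s, the expected search length can be cross-checked by brute force. The sketch below assumes Cooper's closed form esl = j + s·i/(r + 1), with j the number of nonrelevant documents in the levels before the final one, and r relevant and i nonrelevant documents tied in the final level, of which s relevant ones are still needed. These readings of the symbols, and the function names, are assumptions made here because the excerpt is truncated.

```python
from fractions import Fraction
from itertools import permutations


def esl_closed_form(j: int, i: int, r: int, s: int) -> Fraction:
    """Assumed closed form: esl = j + s * i / (r + 1), for 1 <= s <= r."""
    if not 1 <= s <= r:
        raise ValueError("need 1 <= s <= r")
    return Fraction(j) + Fraction(s * i, r + 1)


def esl_brute_force(j: int, i: int, r: int, s: int) -> Fraction:
    """Average, over all equally likely arrangements of the tied final level,
    the number of nonrelevant documents examined before the s-th relevant one."""
    if not 1 <= s <= r:
        raise ValueError("need 1 <= s <= r")
    level = [True] * r + [False] * i  # True marks a relevant document
    arrangements = set(permutations(level))  # distinct orders, each equally likely
    total = 0
    for order in arrangements:
        relevant_seen = 0
        nonrelevant_seen = 0
        for is_relevant in order:
            if is_relevant:
                relevant_seen += 1
                if relevant_seen == s:
                    break
            else:
                nonrelevant_seen += 1
        total += nonrelevant_seen
    return Fraction(j) + Fraction(total, len(arrangements))
```

The enumeration grows factorially, so it is only practical for small r and i. It covers integer s only; the lemma's extension to non-integer s through the Γ function is not reproduced here.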

9 | A general mathematical model for information retrieval systems - Bookstein, Cooper - 1976 |
Citation Context ...ery item in the collection is assigned a distinct RSV by the similarity function used. However, if more than one item is present at the same level, with an identical RSV, it is termed a weak ordering [6]. Formally, a linear ordering is reflexive, transitive, antisymmetric, and connected (every pair of elements is comparable). In contrast, a weak ordering may not satisfy antisymmetry [29]. In other wo...
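
The distinction drawn in this context can be made concrete by grouping documents into levels of equal retrieval status value (RSV). This is a minimal sketch: the helper names and the dictionary encoding of RSVs are assumptions made here, not taken from the paper.

```python
from itertools import groupby
from typing import Dict, List


def rank_levels(rsv: Dict[str, float]) -> List[List[str]]:
    """Group document ids into levels of identical RSV, best level first."""
    # Sort by descending RSV; ties are broken by id only to keep the output stable.
    ordered = sorted(rsv.items(), key=lambda item: (-item[1], item[0]))
    return [[doc for doc, _ in group]
            for _, group in groupby(ordered, key=lambda item: item[1])]


def is_linear_ordering(rsv: Dict[str, float]) -> bool:
    """True when every level holds exactly one document (no ties)."""
    return all(len(level) == 1 for level in rank_levels(rsv))


if __name__ == "__main__":
    scores = {"d1": 0.9, "d2": 0.5, "d3": 0.5, "d4": 0.1}
    print(rank_levels(scores))         # d2 and d3 share a level
    print(is_linear_ordering(scores))  # False: the ordering is only weak
```

A linear ordering is then the special case in which every level is a singleton.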

9 | Distance between sets as an objective measure of retrieval effectiveness - Heine - 1973 |

7 | Evaluation problems in interactive information retrieval - Salton - 1970 |

7 | Recent trends in automatic information retrieval - Salton - 1986 |
Citation Context ...mple matching function. The second one is the standard cosine similarity. The third one, called “the best probabilistic term weight,” and the fourth one, termed “the best (tf × idf),” were proposed in [22] and [23]. For each document collection, retrieval results based on simple matching, cosine similarity, best probabilistic term weight, and best (tf × idf) functions are obtained. Following that, we u...

6 | Performance averaging for recall and precision - Sparck Jones - 1978 |
Citation Context ...aging the precision values of all queries at that recall point. Although some other methods of interpolation have been considered in the literature (e.g., [28]), the ceiling method is quite typical of other such methods currently in use. We refer to PRECALL with this ceiling interpolation as the ceiling-PRECALL in the remainder of this paper. 2.3 Motivation...
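
One plausible reading of ceiling interpolation is sketched below: to report precision at a standard recall level x for a query with n relevant documents, walk down the ranking until ⌈x·n⌉ relevant documents have been retrieved and take the precision at that point. The excerpt does not spell out the definition, so this reading, the function name, and the handling of unreachable levels are all assumptions.

```python
import math
from typing import List, Optional


def precision_at_recall_ceiling(ranking: List[bool],
                                total_relevant: int,
                                recall_level: float) -> Optional[float]:
    """Precision at the cutoff where ceil(recall_level * total_relevant)
    relevant documents have been retrieved; None if that point is never reached."""
    if total_relevant <= 0 or not 0 < recall_level <= 1:
        raise ValueError("need total_relevant > 0 and 0 < recall_level <= 1")
    # Round before taking the ceiling so float noise (e.g. 0.7 * 10) cannot add a document.
    needed = math.ceil(round(recall_level * total_relevant, 9))
    hits = 0
    for cutoff, is_relevant in enumerate(ranking, start=1):
        hits += is_relevant
        if hits == needed:
            return hits / cutoff
    return None  # the ranking ends before enough relevant documents appear
```

Averaging over queries would then take the mean of these values at each preselected recall level. This sketch assumes a linear ordering and does not address ties in the final rank.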

5 | Two axioms for evaluation measures in information retrieval - Bollmann - 1984 |
Citation Context ... ceiling or the intuitive interpolation). In other words, can one measure conclude that retrieval result A is better than retrieval result B, while the other measure leads to the opposite conclusion? (2) The second experiment investigated retrieval performance comparisons based on two measures: PRR under intuitive interpolation and PRECALL. To compare the two approaches fairly, we bring them to a com...

5 | Evaluation of information retrieval systems: A decision theory approach - Kraft, Bookstein - 1978 |

5 | Stopping rules and their effect on expected search length - Kraft, Lee - 1979 |

3 | A comparison of evaluation measures for document-retrieval systems - Bollmann - 1977 |
Citation Context ...ePRECALL method proposed in Section 3.4 what conclusions can be drawn in comparing it to other measures? Specifically, the following three experiments have been carried out to answer such questions. (1) The first experiment investigated whether by using the same measure, say PRR, claims about the relative performance of systems get reversed if we choose different methods of interpolation (i.e., eith...

3 | Problem of Evaluating Retrieval Systems I - Cherniavsky, Lakhuty - 1970 |

2 | A utility-theoretic analysis of expected search length - Bollmann, Raghavan - 1988 |

1 | Probability of relevance and expected precision in evaluating retrieval performance - Bollmann, Raghavan, et al. |
Citation Context ... the equations of how to obtain a closed-form formula for EP as well as what is a natural method of interpolation for EP are still being addressed. We will provide some answers in these directions in [5]. It is hoped that this investigation contributes to a better understanding of precision defined as a function of NR or ND as methods of evaluation and that it helps in the systematic selection of tec...

1 | The inverse relationship of precision and recall - Heine - 1973 |

1 | Introduction to Modern Information Retrieval - Salton, McGill - 1983 |
Citation Context ...ded information, and the ability of the system to retrieve useful items. This approach is hard to realize, if not impossible, because it is difficult to obtain all the relevant measurement parameters [21,25]. Even if it were possible to have all the information available, how to combine them appropriately to obtain a single measure is another question. Consequently, it is common practice in research inve...

1 | Information Retrieval, 2nd ed. Butterworth Scientific Ltd - Rijsbergen - 1979 |
Citation Context ... [Fig. 2. Parametric interpretation of the graph obtained by the intuitive-PRECALL method.] ...an indirect approach similar to that mentioned in van Rijsbergen [32], where he describes Recall and Precision as a function of a common parameter λ. The above analysis provides an interpretation of points on the graph obtained by the intuitive-PRECALL method for one q...

1 | Precision weighting: An effective automatic indexing method - Yu, Salton - 1976 |

1 | A single-pass method for determining the semantic relationship between terms - Yu, Raghavan - 1977 |
Citation Context ...results based on this given set of queries, some technique of interpolation of precision values is needed. A method of interpolation based on the use of the ceiling operation was utilized in the past [7,21,34]. With this method, the interpretation of precision is difficult and not amenable to objective treatment, when all the documents in the final rank are not retrieved. We instead propose an interpolatio...