Information Retrieval
, 1979
Cited by 288 (2 self)
Information retrieval is a wide, often loosely-defined term but in these pages I shall be concerned only with automatic information retrieval systems. Automatic as opposed to manual and information as opposed to data or fact. Unfortunately the word information can be very misleading. In the context of information retrieval (IR), information, in the technical meaning given in Shannon's theory of communication, is not readily measured (Shannon and Weaver [1]). In fact, in many cases one can adequately describe the kind of retrieval by simply substituting 'document' for 'information'. Nevertheless, 'information retrieval' has become accepted as a description of the kind of work published by Cleverdon, Salton, Sparck Jones, Lancaster and others. A perfectly straightforward definition along these lines is given by Lancaster [2]: 'Information retrieval is the term conventionally, though somewhat inaccurately, applied to the type of activity discussed in this volume. An information retrieval system does not inform (i.e. change the knowledge of) the user on the subject of his inquiry. It merely informs on the existence (or non-existence) and whereabouts of documents relating to his request.' This specifically excludes Question-Answering systems as typified by Winograd [3] and those described by Minsky [4]. It also excludes data retrieval systems such as used by, say, the stock exchange for on-line quotations.
Learning to Order Things
- Journal of Artificial Intelligence Research
, 1998
Cited by 265 (9 self)
There are many applications in which it is desirable to order rather than classify instances. Here we consider the problem of learning how to order, given feedback in the form of preference judgments, i.e., statements to the effect that one instance should be ranked ahead of another. We outline a two-stage approach in which one first learns by conventional means a preference function, of the form PREF(u, v), which indicates whether it is advisable to rank u before v. New instances are then ordered so as to maximize agreements with the learned preference function. We show that the problem of finding the ordering that agrees best with a preference function is NP-complete, even under very restrictive assumptions. Nevertheless, we describe a simple greedy algorithm that is guaranteed to find a good approximation. We then discuss an on-line learning algorithm, based on the "Hedge" algorithm, for finding a good linear combination of ranking "experts." We use the ordering algorith...
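The second stage described in this abstract, ordering instances to agree with a learned preference function, can be illustrated with a small greedy procedure. The sketch below is an assumption-laden illustration and not necessarily the authors' exact algorithm: it assumes a hypothetical callable `pref(u, v)` returning a value in [0, 1], assumes the items are distinct and hashable, and repeatedly picks the item whose outgoing preferences most outweigh its incoming ones.

```python
from typing import Callable, Dict, Hashable, Iterable, List


def greedy_order(
    items: Iterable[Hashable],
    pref: Callable[[Hashable, Hashable], float],
) -> List[Hashable]:
    """Order items greedily so the result tends to agree with pref(u, v).

    pref(u, v) is a hypothetical preference function: values near 1 mean
    "rank u before v", values near 0 mean the opposite.
    """
    remaining = list(items)
    # Potential of v: preference for v over the others minus
    # preference for the others over v.
    potential: Dict[Hashable, float] = {
        v: sum(pref(v, u) - pref(u, v) for u in remaining if u != v)
        for v in remaining
    }
    ordering: List[Hashable] = []
    while remaining:
        best = max(remaining, key=lambda v: potential[v])
        ordering.append(best)
        remaining.remove(best)
        del potential[best]
        # Remove the contribution of the chosen item from the rest.
        for v in remaining:
            potential[v] += pref(best, v) - pref(v, best)
    return ordering


if __name__ == "__main__":
    # Toy preference: smaller numbers should come first.
    print(greedy_order([3, 1, 2], lambda u, v: 1.0 if u < v else 0.0))
```

With a consistent preference function like the toy one above, the greedy pass recovers the underlying order; with inconsistent (cyclic) preferences it only approximates the best agreement.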
Cumulated Gain-based Evaluation of IR Techniques
- ACM Transactions on Information Systems
, 2002
Cited by 233 (3 self)
Modern large retrieval environments tend to overwhelm their users by their large output. Since all documents are not of equal relevance to their users, highly relevant documents should be identified and ranked first for presentation to the users. In order to develop IR techniques in this direction, it is necessary to develop evaluation approaches and methods that credit IR methods for their ability to retrieve highly relevant documents. This can be done by extending traditional evaluation methods, i.e., recall and precision based on binary relevance assessments, to graded relevance assessments. Alternatively, novel measures based on graded relevance assessments may be developed. This paper proposes three novel measures that compute the cumulative gain the user obtains by examining the retrieval result up to a given ranked position. The first one accumulates the relevance scores of retrieved documents along the ranked result list. The second one is similar but applies a discount factor on the relevance scores in order to devaluate late-retrieved documents. The third one computes the relative-to-the-ideal performance of IR techniques, based on the cumulative gain they are able to yield. The novel measures are defined and discussed and then their use is demonstrated in a case study using TREC data - sample system run results for 20 queries in TREC-7. As relevance base we used novel graded relevance assessments on a four-point scale. The test results indicate that the proposed measures credit IR methods for their ability to retrieve highly relevant documents and allow testing of statistical significance of effectiveness differences. The graphs based on the measures also provide insight into the performance of IR techniques and allow interpretation, e.g., from the user point of ...
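The three measures named in this abstract can be sketched in a few lines. The abstract does not give the discount function, so the version below assumes a logarithmic discount with base `base` applied only from rank `base` onward, and it assumes the ideal ranking is obtained by sorting the supplied gains in descending order; both choices, and all function names, are illustrative assumptions.

```python
import math
from typing import List, Sequence


def cumulated_gain(gains: Sequence[float]) -> List[float]:
    """Running sum of graded relevance scores along the ranked list."""
    out, total = [], 0.0
    for g in gains:
        total += g
        out.append(total)
    return out


def discounted_cumulated_gain(gains: Sequence[float], base: float = 2.0) -> List[float]:
    """Running sum with late ranks discounted by log_base(rank) (assumed form)."""
    if base <= 1:
        raise ValueError("base must be greater than 1")
    out, total = [], 0.0
    for rank, g in enumerate(gains, start=1):
        # Ranks below the base are left undiscounted in this sketch.
        discount = math.log(rank, base) if rank >= base else 1.0
        total += g / discount
        out.append(total)
    return out


def normalized_dcg(gains: Sequence[float], base: float = 2.0) -> List[float]:
    """Position-wise ratio of the actual DCG to the DCG of an ideal ordering."""
    actual = discounted_cumulated_gain(gains, base)
    ideal = discounted_cumulated_gain(sorted(gains, reverse=True), base)
    return [a / i if i > 0 else 0.0 for a, i in zip(actual, ideal)]


if __name__ == "__main__":
    gains = [3, 2, 3, 0, 1, 2]  # graded judgments on a 0-3 scale (made up)
    print(cumulated_gain(gains))
    print(discounted_cumulated_gain(gains))
    print(normalized_dcg(gains))
```

In a real evaluation the ideal vector would normally be built from all judged documents for the query, not only the retrieved ones; the sketch keeps everything in one list for brevity.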
Using Statistical Testing in the Evaluation of Retrieval Experiments
, 1993
Cited by 149 (0 self)
The standard strategies for evaluation based on precision and recall are examined and their relative advantages and disadvantages are discussed. In particular, it is suggested that relevance feedback be evaluated from the perspective of the user. A number of different statistical tests are described for determining if differences in performance between retrieval methods are significant. These tests have often been ignored in the past because most are based on an assumption of normality which is not strictly valid for the standard performance measures. However, one can test this assumption using simple diagnostic plots, and if it is a poor approximation, there are a number of non-parametric alternatives.
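As one concrete instance of a non-parametric alternative, a paired two-sided sign test over per-query scores can be written with the standard library alone. The abstract does not say which tests the paper covers, so the choice of the sign test here is an assumption made for illustration, as are the function name and the decision to drop tied queries.

```python
from math import comb
from typing import Sequence


def sign_test_p_value(scores_a: Sequence[float], scores_b: Sequence[float]) -> float:
    """Two-sided exact sign test on paired per-query scores.

    Under the null hypothesis each non-tied query is equally likely to
    favour either method, so the number of wins is Binomial(n, 0.5).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must be paired (same length)")
    wins_a = sum(a > b for a, b in zip(scores_a, scores_b))
    wins_b = sum(a < b for a, b in zip(scores_a, scores_b))
    n = wins_a + wins_b  # tied queries are dropped
    if n == 0:
        return 1.0
    k = min(wins_a, wins_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical average-precision scores for two methods on 10 queries.
    method_a = [0.42, 0.55, 0.31, 0.60, 0.48, 0.52, 0.39, 0.70, 0.45, 0.58]
    method_b = [0.40, 0.50, 0.30, 0.55, 0.41, 0.49, 0.35, 0.66, 0.44, 0.51]
    print(sign_test_p_value(method_a, method_b))
```

The sign test uses only the direction of each per-query difference, which is why it needs no normality assumption; the price is lower power than tests that also use the magnitudes.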
A critical investigation of recall and precision as measures of retrieval system performance
- ACM Transactions on Information Systems
, 1989
Cited by 67 (0 self)
Recall and precision are often used to evaluate the effectiveness of information retrieval systems. They are easy to define if there is a single query and if the retrieval result generated for the query is a linear ordering. However, when the retrieval results are weakly ordered, in the sense that several documents have an identical retrieval status value with respect to a query, some probabilistic notion of precision has to be introduced. Relevance probability, expected precision, and so forth, are some alternatives mentioned in the literature for this purpose. Furthermore, when many queries are to be evaluated and the retrieval results averaged over these queries, some method of interpolation of precision values at certain preselected recall levels is needed. The currently popular approaches for handling both a weak ordering and interpolation are found to be inconsistent, and the results obtained are not easy to interpret. Moreover, in cases where some alternatives are available, no comparative analysis that would facilitate the selection of a particular strategy has been provided. In this paper, we systematically investigate the various problems and issues associated with the use of recall and precision as measures of retrieval system performance. Our motivation is to provide a comparative analysis of methods available for defining precision in a probabilistic sense and to promote a better understanding of the various issues involved in retrieval performance evaluation.
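For the simple case the abstract calls easy to define (one query, a strict linear ordering with no ties), recall and precision at each rank and a basic interpolation to preselected recall levels might look like the sketch below. The interpolation rule used here, taking the maximum precision at any recall at or above the level, is one common convention and is an assumption; it is not the analysis proposed in the paper, and weakly ordered results are not handled.

```python
from typing import List, Sequence, Tuple


def precision_recall_points(
    ranked_relevance: Sequence[bool], total_relevant: int
) -> List[Tuple[float, float]]:
    """(recall, precision) after each rank of a strictly ordered result list."""
    if total_relevant <= 0:
        raise ValueError("total_relevant must be positive")
    points, hits = [], 0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
        points.append((hits / total_relevant, hits / rank))
    return points


def interpolated_precision(
    points: Sequence[Tuple[float, float]],
    recall_levels: Sequence[float] = tuple(i / 10 for i in range(11)),
) -> List[float]:
    """Precision at each recall level: max precision at any recall >= level."""
    return [
        max((p for r, p in points if r >= level), default=0.0)
        for level in recall_levels
    ]


if __name__ == "__main__":
    # Hypothetical judgments for the top four results; two relevant documents exist.
    pts = precision_recall_points([True, False, True, False], total_relevant=2)
    print(pts)
    print(interpolated_precision(pts))
```

Averaging the interpolated values level by level across queries gives the familiar recall-precision curve; the abstract's point is that different conventions for ties and interpolation can yield results that are hard to compare.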
Evaluation by Highly Relevant Documents
- In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
, 2001
Cited by 62 (3 self)
Given the size of the web, the search engine industry has argued that engines should be evaluated by their ability to retrieve highly relevant pages rather than all possible relevant pages. To explore the role highly relevant documents play in retrieval system evaluation, assessors for the TREC-9 web track used a three-point relevance scale and also selected best pages for each topic. The relative effectiveness of runs evaluated by different relevant document sets differed, confirming the hypothesis that different retrieval techniques work better for retrieving highly relevant documents. Yet evaluating by highly relevant documents can be unstable since there are relatively few highly relevant documents. TREC assessors frequently disagreed in their selection of the best page, and subsequent evaluation by best page across different assessors varied widely. The discounted cumulative gain measure introduced by Järvelin and Kekäläinen increases evaluation stability by incorporating all relevance judgments while still giving precedence to highly relevant documents.
Content-Based Image Retrieval with Self-Organizing Maps
- PATTERN RECOGNITION LETTERS
, 1999
Cited by 43 (9 self)
The recent development of computing hardware has resulted in a rapid increase of visual information such as databases of images. To successfully utilize this increasing amount of data, we need effective ways to process it. Content-based image retrieval utilizes the visual content of images directly in the process of retrieving relevant images from a database. The retrieval is based on visual features such as the colors, textures, shapes, and spatial relations the image contains rather than traditional textual keywords. These features are usually extracted automatically, without the need for a human operator. In the literature survey part o...
Expected Reciprocal Rank for Graded Relevance
- CIKM'09, NOVEMBER 2–6, 2009, HONG KONG, CHINA.
, 2009
Cited by 32 (6 self)
While numerous metrics for information retrieval are available in the case of binary relevance, there is only one commonly used metric for graded relevance, namely the Discounted Cumulative Gain (DCG). A drawback of DCG is its additive nature and the underlying independence assumption: a document in a given position always has the same gain and discount independently of the documents shown above it. Inspired by the “cascade” user model, we present a new editorial metric for graded relevance which overcomes this difficulty and implicitly discounts documents which are shown below very relevant documents. More precisely, this new metric is defined as the expected reciprocal length of time that the user will take to find a relevant document. This can be seen as an extension of the classical reciprocal rank to the graded relevance case and we call this metric Expected Reciprocal Rank (ERR). We conduct an extensive evaluation on the query logs of a commercial search engine and show that ERR correlates better with click metrics than other editorial metrics.
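The cascade idea in this abstract can be made concrete with a short sketch: the user scans down the list, stops at rank r with some probability that depends on the document's grade, and the metric is the expected value of 1/r. The abstract does not state how a grade is turned into a stopping probability, so the mapping used below, (2^g - 1) / 2^g_max, is an assumption for illustration, as are the function and parameter names.

```python
from typing import Sequence


def expected_reciprocal_rank(grades: Sequence[int], max_grade: int) -> float:
    """Expected reciprocal rank under a simple cascade user model.

    grades: editorial relevance grades of the ranked documents (0..max_grade).
    The grade-to-probability mapping is an assumed convention.
    """
    err = 0.0
    p_reach = 1.0  # probability the user gets as far as the current rank
    for rank, grade in enumerate(grades, start=1):
        p_stop = (2 ** grade - 1) / 2 ** max_grade
        err += p_reach * p_stop / rank
        # The user continues only if this document did not satisfy them.
        p_reach *= 1.0 - p_stop
    return err


if __name__ == "__main__":
    # A highly relevant document at rank 1 suppresses the credit for later ones.
    print(expected_reciprocal_rank([4, 3, 0, 2], max_grade=4))
    print(expected_reciprocal_rank([0, 3, 4, 2], max_grade=4))
```

Because `p_reach` shrinks after every satisfying document, the contribution of a document depends on what was shown above it, which is the dependence the abstract says DCG lacks.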
Overview of the INitiative for the Evaluation of XML retrieval (INEX) 2002
- IN: PROC. OF THE FIRST WORKSHOP OF THE INITIATIVE FOR THE EVALUATION OF XML RETRIEVAL (INEX), DAGSTUHL, 2002
Cited by 28 (7 self)
The INitiative for the Evaluation of XML retrieval (INEX) aims at providing an infrastructure for evaluating the effectiveness of content-oriented XML retrieval. In the first round of INEX, in 2002, a test collection of real world XML documents along with standard topics and respective relevance assessments has been created. Research groups from 36 different organisations participated in this collaborative effort. In this article we describe the test collection and how it was constructed. An overview of the metrics used to evaluate the effectiveness of XML retrieval approaches and of the evaluation results of 51 submissions from the INEX 2002 participants is also provided.
Coverage, Relevance, and Ranking: The Impact of Query Operators on . . .
- ACM TRANSACTIONS ON INFORMATION SYSTEMS
, 2003

