Cumulated Gain-based Evaluation of IR Techniques - ACM Transactions on Information Systems, 2002
Cited by 233 (3 self)
Modern large retrieval environments tend to overwhelm their users with their large output. Since not all documents are equally relevant to their users, highly relevant documents should be identified and ranked first for presentation to the users. In order to develop IR techniques in this direction, it is necessary to develop evaluation approaches and methods that credit IR methods for their ability to retrieve highly relevant documents. This can be done by extending traditional evaluation methods, i.e., recall and precision based on binary relevance assessments, to graded relevance assessments. Alternatively, novel measures based on graded relevance assessments may be developed. This paper proposes three novel measures that compute the cumulative gain the user obtains by examining the retrieval result up to a given ranked position. The first one accumulates the relevance scores of retrieved documents along the ranked result list. The second one is similar but applies a discount factor to the relevance scores in order to devalue late-retrieved documents. The third one computes the relative-to-the-ideal performance of IR techniques, based on the cumulative gain they are able to yield. The novel measures are defined and discussed, and then their use is demonstrated in a case study using TREC data: sample system run results for 20 queries in TREC-7. As the relevance base, we used novel graded relevance assessments on a four-point scale. The test results indicate that the proposed measures credit IR methods for their ability to retrieve highly relevant documents and allow testing of the statistical significance of effectiveness differences. The graphs based on the measures also provide insight into the performance of IR techniques and allow interpretation, e.g., from the user point of ...
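The three measures can be sketched in Python as a minimal illustration, not the authors' code. The discount used here (the gain at 1-based rank i >= b is divided by log_b(i), with b = 2 by default, and earlier ranks are left undiscounted) is an assumed form of the "discount factor" the abstract mentions. The ideal ranking is assumed to be all judged gains sorted in descending order, and the gain values in the example are made up on a 0-3 scale to match a four-point relevance scale.

```python
import math
from typing import List, Sequence


def cumulated_gain(gains: Sequence[float]) -> List[float]:
    """CG[i]: sum of the graded relevance scores from rank 1 down to rank i."""
    cg: List[float] = []
    total = 0.0
    for g in gains:
        total += g
        cg.append(total)
    return cg


def discounted_cumulated_gain(gains: Sequence[float], b: float = 2.0) -> List[float]:
    """DCG[i]: like CG, but gains at 1-based ranks >= b are divided by log_b(rank).

    Ranks below b are left undiscounted (assumed discount form).
    """
    dcg: List[float] = []
    total = 0.0
    for rank, g in enumerate(gains, start=1):
        total += g if rank < b else g / math.log(rank, b)
        dcg.append(total)
    return dcg


def normalized_dcg(gains: Sequence[float], judged_gains: Sequence[float],
                   b: float = 2.0) -> List[float]:
    """Relative-to-the-ideal DCG: the run's DCG divided by the ideal run's DCG.

    The ideal run is assumed to be all judged gains sorted in descending order.
    """
    ideal = sorted(judged_gains, reverse=True)[: len(gains)]
    ideal += [0.0] * (len(gains) - len(ideal))  # pad if the recall base is short
    actual = discounted_cumulated_gain(gains, b)
    best = discounted_cumulated_gain(ideal, b)
    return [a / i if i > 0 else 0.0 for a, i in zip(actual, best)]


# Hypothetical run: gains of the top six retrieved documents on a 0-3 scale.
run = [3, 2, 3, 0, 1, 2]
print(cumulated_gain(run))                           # [3.0, 5.0, 8.0, 8.0, 9.0, 11.0]
print(round(discounted_cumulated_gain(run)[-1], 4))  # 8.0972
print(normalized_dcg(run, judged_gains=run))
```

Because the normalized vector divides by the best achievable DCG at each rank, its values lie between 0 and 1, which makes runs for different queries comparable.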
IR evaluation methods for retrieving highly relevant documents, 2000
Cited by 218 (4 self)
This paper proposes evaluation methods based on the use of non-dichotomous relevance judgements in IR experiments. It is argued that evaluation methods should credit IR methods for their ability to retrieve highly relevant documents. This is desirable from the user point of view in modern large IR environments. The proposed methods are (1) a novel application of P-R curves and average precision computations based on separate recall bases for documents of different degrees of relevance, and (2) two novel measures computing the cumulative gain the user obtains by examining the retrieval result up to a given ranked position. We then demonstrate the use of these evaluation methods in a case study on the effectiveness of query types, based on combinations of query structures and expansion, in retrieving documents of various degrees of relevance. The test was run with a best-match retrieval system (InQuery) in a text database consisting of newspaper articles. The results indicate that the tested strong query structures are most effective in retrieving highly relevant documents. The differences between the query types are of practical importance and statistically significant. More generally, the novel evaluation methods and the case study demonstrate that non-dichotomous relevance assessments are applicable in IR experiments, may reveal interesting phenomena, and allow harder testing of IR methods.
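The idea of separate recall bases can be illustrated with non-interpolated average precision computed once per relevance criterion. This is a sketch under assumptions: document IDs are hypothetical strings, judgments use a 0-3 scale, and a recall base is simply the set of judged documents whose grade falls in a chosen set of levels. The paper's exact recall-base definitions may differ.

```python
from typing import Dict, List, Set


def average_precision(ranking: List[str], judgments: Dict[str, int],
                      levels: Set[int]) -> float:
    """Non-interpolated average precision against one relevance-level recall base.

    Only documents whose grade is in `levels` count as relevant; all others
    (including lower-graded relevant documents) are treated as non-relevant.
    """
    recall_base = {doc for doc, grade in judgments.items() if grade in levels}
    if not recall_base:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in recall_base:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document
    # Relevant documents that were never retrieved contribute zero precision.
    return precision_sum / len(recall_base)


judgments = {"d1": 3, "d2": 1, "d3": 0, "d4": 3, "d5": 2}
run = ["d1", "d2", "d3", "d4"]
print(average_precision(run, judgments, {3}))        # highly relevant only: 0.75
print(average_precision(run, judgments, {1, 2, 3}))  # any degree of relevance: 0.6875
```

The same run scores differently depending on which recall base is used, which is what lets a method be credited (or not) specifically for retrieving highly relevant documents.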
A Method for Measuring Wide Range Performance of Boolean Queries in Full-Text Databases, 2000
Cited by 22 (7 self)
A new laboratory-based method for the evaluation of Boolean queries in free-text searching of full-text databases is proposed. The method is based on a controlled formulation of inclusive query plans, on an automatic conversion of query plans into a set of elementary queries, and on composing optimal queries at varying operational levels by combining appropriate subsets of elementary queries. The method is based on the idea of reverse engineering and exploits full relevance data of documents to find the query performing optimally within given operational constraints. The proposed ...
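A toy version of the elementary-query idea is sketched below. For illustration only, it assumes that a query plan is a list of facets, each facet a list of alternative terms; that an elementary query is a conjunction of one term from each facet; and that an "operational level" is a target recall. The greedy selection used to approximate the optimal combination of elementary queries is a swapped-in heuristic, not the algorithm from the paper, and the inverted index and document IDs are hypothetical.

```python
from itertools import product
from typing import Dict, List, Sequence, Set, Tuple

Index = Dict[str, Set[str]]            # term -> IDs of documents containing it
EQs = Dict[Tuple[str, ...], Set[str]]  # elementary query -> documents it retrieves


def elementary_queries(plan: Sequence[Sequence[str]], index: Index) -> EQs:
    """Expand a facet-based query plan (at least one facet) into one-term-per-facet ANDs."""
    eqs: EQs = {}
    for terms in product(*plan):
        postings = [index.get(t, set()) for t in terms]
        eqs[terms] = set.intersection(*postings)  # Boolean AND across facets
    return eqs


def greedy_query(eqs: EQs, relevant: Set[str],
                 target_recall: float) -> Tuple[List[Tuple[str, ...]], Set[str]]:
    """OR together elementary queries until the target recall is reached.

    Each step adds the EQ that contributes at least one new relevant document
    and keeps the precision of the combined result highest (greedy heuristic).
    """
    chosen: List[Tuple[str, ...]] = []
    retrieved: Set[str] = set()

    def union_precision(item: Tuple[Tuple[str, ...], Set[str]]) -> float:
        docs = retrieved | item[1]
        return len(docs & relevant) / len(docs)

    while len(retrieved & relevant) < target_recall * len(relevant):
        candidates = [(t, d) for t, d in eqs.items()
                      if t not in chosen and (d - retrieved) & relevant]
        if not candidates:
            break  # the plan cannot reach the target recall
        terms, docs = max(candidates, key=union_precision)
        chosen.append(terms)
        retrieved |= docs
    return chosen, retrieved


index = {
    "car": {"d1", "d2", "d3"}, "automobile": {"d4", "d5"},
    "pollution": {"d1", "d4", "d6"}, "emissions": {"d2", "d5", "d7"},
}
plan = [["car", "automobile"], ["pollution", "emissions"]]
eqs = elementary_queries(plan, index)
chosen, retrieved = greedy_query(eqs, relevant={"d1", "d2", "d4"}, target_recall=1.0)
print(chosen)  # [('car', 'pollution'), ('car', 'emissions'), ('automobile', 'pollution')]
```

Running the selection at several target recall levels yields one composed query per operational level, which is the kind of sweep the abstract describes.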
How well do physicians use electronic information retrieval systems? A framework for investigation and systematic review - Journal of the American Medical Association, 1998
Cited by 20 (4 self)
Objective.—Despite the proliferation of electronic information retrieval (IR) systems for physicians, their effectiveness has not been well assessed. The purpose of this review is to provide a conceptual framework and to apply the results of previous studies to this framework. Data Sources.—All sources of medical informatics and information science literature, including MEDLINE, along with bibliographies of textbooks in these areas, were searched from 1966 to January 1998. Study Selection.—All articles presenting either classifications of evaluation studies or their results, with an emphasis on those studying use by physicians. Data Extraction.—A framework for evaluation was developed, consisting of frequency of use, purpose of use, user satisfaction, searching utility, search failure, and outcomes. All studies were then assessed based on the framework. Data Synthesis.—Due to the heterogeneity and simplistic study designs, no meta-analysis of studies could be done. General conclusions were drawn from data where appropriate. A total of 47 articles were found to include an evaluation component and were used to develop the framework. Of these, 21 articles met the inclusion criteria for 1 or more of the categories in the framework. Most use of IR systems by physicians still occurs with bibliographic rather than full-text databases. Overall use of IR systems occurs just 0.3 to 9 times per physician per month, whereas physicians have 2 unanswered questions for every 3 patients. Conclusions.—Studies comparing IR systems with different searching features have not shown that advanced searching methods are significantly more effective than simple text word methods. Most searches retrieve only one fourth to one half of the relevant articles on a given topic and, once retrieved, little is known about how these articles are interpreted or applied. These studies imply that further research and development are needed to improve system utility and performance.
Automatic indexing of documents from journal descriptors: A preliminary investigation - Journal of the American Society for Information Science, 1999
Cited by 12 (2 self)
A new, fully automated approach for indexing documents is presented, based on associating textwords in a training set of bibliographic citations with the indexing of journals. This journal-level indexing takes the form of a consistent, timely set of journal descriptors (JDs) indexing the individual journals themselves. This indexing is maintained in journal records in a serials authority database. The advantage of this novel approach is that the training set does not depend on previous manual indexing of hundreds of thousands of documents (i.e., any such indexing already in the training set is not used), but rather on the relatively small intellectual effort of indexing at the journal level, usually a matter of a few thousand unique journals, for which retrospective indexing to maintain consistency and currency may be feasible. If successful, JD indexing would provide topical categorization of documents outside the training set, such as journal articles, monographs, Web documents, and reports from the grey literature, and could therefore be applied in searching. Because JDs are quite general, corresponding to subject domains, their most probable use would be for improving or refining search results.
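One simple way to picture the word-to-JD association is sketched below. It is an assumption-laden illustration, not the paper's algorithm. Each training citation inherits the JDs of its journal. A word's association with a JD is the fraction of training citations containing the word whose journal carries that JD. A new document's JDs are ranked by averaging these associations over its known words. The journal names, descriptors, and texts are hypothetical.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, List, Tuple

Model = Dict[str, Dict[str, float]]  # word -> {JD: association score}


def train(citations: List[Tuple[str, str]], journal_jds: Dict[str, List[str]]) -> Model:
    """Associate words in training citations with the JDs of their journals."""
    word_jd: DefaultDict[str, Counter] = defaultdict(Counter)
    word_count: Counter = Counter()
    for journal, text in citations:
        for word in set(text.lower().split()):  # count each word once per citation
            word_count[word] += 1
            for jd in journal_jds.get(journal, []):
                word_jd[word][jd] += 1
    return {w: {jd: n / word_count[w] for jd, n in jds.items()}
            for w, jds in word_jd.items()}


def rank_jds(text: str, model: Model) -> List[Tuple[str, float]]:
    """Rank JDs for an unseen document by averaging its words' associations."""
    words = [w for w in set(text.lower().split()) if w in model]
    scores: Counter = Counter()
    for word in words:
        for jd, score in model[word].items():
            scores[jd] += score / len(words)
    return scores.most_common()


citations = [
    ("J Cardiol", "heart failure therapy"),
    ("J Neurol", "brain stroke therapy"),
    ("J Cardiol", "heart valve surgery"),
]
model = train(citations, {"J Cardiol": ["Cardiology"], "J Neurol": ["Neurology"]})
print(rank_jds("new therapy for heart failure", model))  # Cardiology ranked first
```

Note that no document in the training set needs its own manual indexing here; only the journal-to-JD table is curated, which mirrors the advantage the abstract claims.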
A Novel Method For The Evaluation Of Boolean Query Effectiveness Across A Wide Operational Range, 2000
Cited by 6 (4 self)
Traditional methods for the system-oriented evaluation of Boolean IR systems suffer from validity and reliability problems. Laboratory-based research neglects the searcher and studies suboptimal queries. Research on operational systems fails to make a distinction between searcher performance and system performance. Nor is this approach capable of measuring performance at standard points of operation (e.g., across R0.0-R1.0).
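Measuring precision at standard recall points is conventionally done with interpolated precision at the eleven recall levels 0.0, 0.1, ..., 1.0, where precision at level r is the highest precision observed at any recall of at least r. The sketch below computes this for a single ranked result list. It illustrates the standard points themselves, not the Boolean-specific method the paper proposes (a Boolean query returns an unranked set), and the document IDs are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def eleven_point_precision(ranking: Sequence[str], relevant: Set[str]) -> List[float]:
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0."""
    if not relevant:
        raise ValueError("recall is undefined without relevant documents")
    observed: List[Tuple[float, float]] = []  # (recall, precision) at each relevant hit
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            observed.append((hits / len(relevant), hits / rank))
    levels = [i / 10 for i in range(11)]
    # Interpolation: best precision at any recall >= the standard level.
    return [max((p for r, p in observed if r >= level), default=0.0) for level in levels]


# Four relevant documents, three of them retrieved in a five-document ranking.
print(eleven_point_precision(["a", "x", "b", "y", "c"], {"a", "b", "c", "d"}))
```

Levels beyond the highest recall actually reached get precision 0.0, which is why a method that cannot reach high recall is penalized across the upper end of the R0.0-R1.0 range.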
Adding Boolean-quality control to best-match searching via an improved user interface, 2000
Cited by 4 (1 self)
While end users these days seem happy with best-match text-retrieval systems, it appears that expert searchers still prefer exact-match (Boolean) text-retrieval systems by an overwhelming margin. This is somewhat surprising. Most expert searchers were probably trained with Boolean systems, and an obvious factor is simply preferring the familiar, but we argue that a second major factor is that these experts feel a much greater sense of control with Boolean than with best-match systems. We have designed a best-match system, MIRV, incorporating user-interface features that we believe will give experts a sense of control comparable to that of Boolean systems and that we believe end users will also be happy with. We implemented MIRV's document viewer and did a controlled user study, with encouraging results. This material is based on work supported in part by the National Science Foundation, Library of Congress and Department of Commerce under cooperative agreement number EEC-9209...
A Comparison of Boolean and Natural Language Searching for the TREC-6 Interactive Task, 1997
Cited by 3 (0 self)
The TREC-6 interactive task used a multi-site experimental protocol, where each participating site compared an "experimental" system with a common "control" system used at all sites. For the Oregon Health Sciences University site, the "experimental" system was a Boolean interface to the MG system, while the control system was, as for all sites, the natural language ZPRISE system. Performance was measured by aspectual recall and precision. OHSU searchers did well overall, achieving the highest overall aspectual precision. These searchers obtained below-average aspectual recall overall, although they achieved above-average aspectual recall with the control system, indicating that for the TREC-6 interactive task, a natural language searching system was superior to a Boolean one.
Background
A long-standing research issue of interest to information retrieval (IR) researchers at Oregon Health Sciences University (OHSU) is whether end-user searchers achieve better results with Boolean or n...
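Aspectual recall and precision can be sketched as follows, using the TREC interactive-track reading of the measures as we understand it: aspectual recall is the fraction of a topic's known aspects covered by the documents a searcher saved, and aspectual precision is the fraction of saved documents that cover at least one aspect. The aspect judgments and document IDs here are hypothetical.

```python
from typing import Dict, Sequence, Set, Tuple


def aspectual_recall_precision(saved: Sequence[str],
                               doc_aspects: Dict[str, Set[str]],
                               topic_aspects: Set[str]) -> Tuple[float, float]:
    """Return (aspectual recall, aspectual precision) for one searcher's saved list."""
    covered: Set[str] = set()
    useful = 0  # saved documents that contain at least one topic aspect
    for doc in saved:
        aspects = doc_aspects.get(doc, set()) & topic_aspects
        covered |= aspects
        if aspects:
            useful += 1
    recall = len(covered) / len(topic_aspects) if topic_aspects else 0.0
    precision = useful / len(saved) if saved else 0.0
    return recall, precision


topic = {"A1", "A2", "A3", "A4"}
judged = {"d1": {"A1", "A2"}, "d2": set(), "d3": {"A2"}}
print(aspectual_recall_precision(["d1", "d2", "d3"], judged, topic))  # (0.5, 0.6666666666666666)
```

Because recall counts distinct aspects rather than documents, saving a second document that repeats an already covered aspect (d3 above) raises precision but not recall.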
A retrospective evaluation method for exact-match and best-match queries applying an interactive query performance analyser - in Crestani, F. et al. (Eds), Advances in Information Retrieval: Proceedings of the 24th European Colloquium on IR Research, 2002
Cited by 2 (2 self)
A retrospective method for the performance comparison of queries based on different IR models is introduced. The method is based on the interactive optimisation of queries by a group of test searchers using a query performance analyser. The case experiment focused on comparing the maximum effectiveness of Boolean exact-match queries and of structured and unstructured best-match queries. The experiment verified the problems in maintaining the precision of Boolean queries at high recall levels. Interesting similarities were also observed between structured and unstructured best-match queries, challenging the results of earlier studies.
User-Oriented Evaluation Methods for Information Retrieval: A Case Study Based on Conceptual Models for Query Expansion
Cited by 2 (0 self)
This paper discusses evaluation methods based on the use of non-dichotomous relevance judgements in information retrieval (IR) experiments. It is argued that evaluation methods should credit IR methods for their ability to retrieve highly relevant documents. This is desirable from the user's point of view in modern large IR environments. The proposed methods are (1) a novel application of P-R curves and average precision computations based on separate recall bases for documents of different degrees of relevance, and (2) two novel measures computing the cumulated gain the user obtains by examining the retrieval result up to a given ranked position. We then demonstrate the use of these evaluation methods in a case study on the effectiveness of query types, based on combinations of query structures and expansion, in retrieving documents of various degrees of relevance. Query expansion is based on concepts, which are selected from a conceptual model and then expanded by semantic relationships given in the model. The test is run with a best-match retrieval system (InQuery) in a text database consisting of newspaper articles.
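Concept-based expansion into a structured query might look like the sketch below. The conceptual model, its relation names ("narrower", "associative"), the concept-to-term lexicon, and the output syntax (an InQuery-style #sum of #syn groups, one per concept) are all assumptions for illustration; the paper's model and query construction may differ.

```python
from typing import Dict, List, Sequence

Model = Dict[str, Dict[str, List[str]]]  # concept -> relation -> related concepts
Lexicon = Dict[str, List[str]]           # concept -> search terms expressing it


def expand(concept: str, model: Model, lexicon: Lexicon,
           follow: Sequence[str]) -> List[str]:
    """Terms for a concept plus those of concepts one relation step away."""
    concepts = [concept] + [c for rel in follow
                            for c in model.get(concept, {}).get(rel, [])]
    terms: List[str] = []
    for c in concepts:
        for t in lexicon.get(c, []):
            if t not in terms:  # keep first occurrence, drop duplicates
                terms.append(t)
    return terms


def structured_query(concepts: Sequence[str], model: Model, lexicon: Lexicon,
                     follow: Sequence[str] = ()) -> str:
    """One synonym group per concept, combined with #sum (assumed InQuery-style syntax)."""
    groups = ["#syn(" + " ".join(expand(c, model, lexicon, follow)) + ")" for c in concepts]
    return "#sum(" + " ".join(groups) + ")"


model = {"pollution": {"narrower": ["smog"], "associative": ["exhaust gas"]}}
lexicon = {"car": ["car", "automobile"], "pollution": ["pollution", "contamination"],
           "smog": ["smog"], "exhaust gas": ["exhaust"]}
print(structured_query(["car", "pollution"], model, lexicon))
# #sum(#syn(car automobile) #syn(pollution contamination))
print(structured_query(["car", "pollution"], model, lexicon, follow=["narrower"]))
# #sum(#syn(car automobile) #syn(pollution contamination smog))
```

Grouping each concept's expansion terms inside its own synonym operator keeps the query structured by concept, so adding expansion terms widens a concept rather than adding new independent query terms.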

