Cumulated Gain-based Evaluation of IR Techniques
- ACM Transactions on Information Systems
, 2002
Cited by 233 (3 self)
Modern large retrieval environments tend to overwhelm their users by their large output. Since all documents are not of equal relevance to their users, highly relevant documents should be identified and ranked first for presentation to the users. In order to develop IR techniques to this direction, it is necessary to develop evaluation approaches and methods that credit IR methods for their ability to retrieve highly relevant documents. This can be done by extending traditional evaluation methods, i.e., recall and precision based on binary relevance assessments, to graded relevance assessments. Alternatively, novel measures based on graded relevance assessments may be developed. This paper proposes three novel measures that compute the cumulative gain the user obtains by examining the retrieval result up to a given ranked position. The first one accumulates the relevance scores of retrieved documents along the ranked result list. The second one is similar but applies a discount factor on the relevance scores in order to devaluate late-retrieved documents. The third one computes the relative-to-the-ideal performance of IR techniques, based on the cumulative gain they are able to yield. The novel measures are defined and discussed and then their use is demonstrated in a case study using TREC data - sample system run results for 20 queries in TREC-7. As relevance base we used novel graded relevance assessments on a four-point scale. The test results indicate that the proposed measures credit IR methods for their ability to retrieve highly relevant documents and allow testing of statistical significance of effectiveness differences. The graphs based on the measures also provide insight into the performance of IR techniques and allow interpretation, e.g., from the user point of ...
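The three measures described in this abstract can be illustrated with a minimal Python sketch. The abstract does not give the discount function, so the logarithmic discount used here (gains at rank i >= base divided by log_base(i), ranks below the base left undiscounted) is an assumption, as are the function names and the example gains on a 0-3 scale. The ideal ordering is built by sorting only the gains in the list, which is a simplification; an ideal vector over all known relevant documents for the query would be the stricter choice.

```python
import math


def cumulated_gain(gains):
    """CG: running sum of graded relevance scores down the ranked list."""
    cg, total = [], 0.0
    for gain in gains:
        total += gain
        cg.append(total)
    return cg


def discounted_cumulated_gain(gains, base=2):
    """DCG: like CG, but late-retrieved documents contribute less.

    Assumption of this sketch: a gain at rank i >= base is divided by
    log_base(i); ranks below the base are not discounted.
    """
    dcg, total = [], 0.0
    for rank, gain in enumerate(gains, start=1):
        total += gain if rank < base else gain / math.log(rank, base)
        dcg.append(total)
    return dcg


def normalized_dcg(gains, base=2):
    """Relative-to-the-ideal DCG at every rank.

    Simplification: the ideal list is the same gains sorted in
    descending order, not the full set of relevant documents.
    """
    actual = discounted_cumulated_gain(gains, base)
    ideal = discounted_cumulated_gain(sorted(gains, reverse=True), base)
    return [a / i if i > 0 else 0.0 for a, i in zip(actual, ideal)]


if __name__ == "__main__":
    ranked_gains = [3, 2, 3, 0, 1, 2]  # hypothetical graded judgments, 0-3
    print(cumulated_gain(ranked_gains))
    print(discounted_cumulated_gain(ranked_gains))
    print(normalized_dcg(ranked_gains))
```

A larger base models a more patient user, since the discount starts later and grows more slowly.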
Harvest: A Scalable, Customizable Discovery and Access System
, 1995
Cited by 159 (7 self)
Rapid growth in data volume, user base, and data diversity renders Internet-accessible information increasingly difficult to use effectively. In this paper we introduce Harvest, a system that provides an integrated set of customizable tools for gathering information from diverse repositories, building topic-specific content indexes, flexibly searching the indexes, widely replicating them, and caching objects as they are retrieved across the Internet. The system interoperates with WWW clients and with HTTP, FTP, Gopher, and NetNews information resources. We discuss the design and implementation of Harvest and its subsystems, give examples of its uses, and provide measurements indicating that Harvest can significantly reduce server load, network traffic, and space requirements when building indexes, compared with previous systems. We also discuss several popular indexes we have built using Harvest, underscoring the customizability and scalability of the system.
Reusing Software: Issues And Research Directions
, 1995
Cited by 143 (7 self)
Software productivity has been steadily increasing over the last 30 years, but not enough to close the gap between the demands placed on the software industry and what the state of the practice can deliver [22,39]; nothing short of an order of magnitude increase in productivity will extricate the software industry from its perennial crisis [39,67]. Several decades of intensive research in software engineering and artificial intelligence left few alternatives but software reuse as the (only) realistic approach to bring about the gains of productivity and quality that the software industry needs. In this paper, we discuss the implications of reuse on the production, with an emphasis on the technical challenges. Software reuse involves building software that is reusable by design, and building with reusable software. Software reuse includes reusing both the products of previous software projects, and the processes deployed to produce them, leading to a wide spectrum of reuse approaches, from the building blocks (reusing products) approach on one hand, to the generative or reusable processor (reusing processes) on the other [68]. We discuss the implications of such approaches on the organization, control, and method of software development and discuss proposed models for their economic analysis. Software reuse benefits from methodologies and tools to: 1) build more readily reusable software, and 2) locate, evaluate, and tailor reusable software, the latter being critical for the building blocks approach. Both sets of issues are discussed in this paper, with a focus on application generators and object-oriented development for the first, and a thorough discussion of retrieval techniques for software components, component composition (or bottom-up design) and transformational systems for the second. We conclude by highlighting areas that, in our opinion, are worthy of further investigation.
Cognitive perspectives of information retrieval interaction: elements of a cognitive IR theory
- Journal of Documentation
, 1996
Cited by 96 (7 self)
The objective of the paper is to amalgamate theories of text retrieval from various research traditions into a cognitive theory for information retrieval interaction. Set in a cognitive framework, the paper outlines the concept of polyrepresentation applied to both the user's cognitive space and the information space of IR systems. The concept seeks to represent the current user's information need, problem state, and domain work task or interest in a structure of causality. Further, it implies that we should apply different methods of representation and a variety of IR techniques of different cognitive and functional origin simultaneously to each semantic full-text entity in the information space. The cognitive differences imply that by applying cognitive overlaps of information objects, originating from different interpretations of such objects through time and by type, the degree of uncertainty inherent in IR is decreased. Polyrepresentation and the use of cognitive overlaps are associated with, but not identical to, data
Internet Browsing and Searching: User Evaluations of Category Map and Concept Space Techniques
- Journal of the American Society for Information Science
, 1998
Using Latent Semantic Analysis To Improve Access To Textual Information
- SIGCHI Conference on Human Factors in Computing Systems
, 1988
Cited by 84 (1 self)
This paper describes a new approach for dealing with the vocabulary problem in human-computer interaction. Most approaches to retrieving textual materials depend on a lexical match between words in users' requests and those in or assigned to database objects. Because of the tremendous diversity in the words people use to describe the same object, lexical matching methods are necessarily incomplete and imprecise [5]. The latent semantic indexing approach tries to overcome these problems by automatically organizing text objects into a semantic structure more appropriate for matching user requests. This is done by taking advantage of implicit higher-order structure in the association of terms with text objects. The particular technique used is singular-value decomposition, in which a large term by text-object matrix is decomposed into a set of about 50 to 150 orthogonal factors from which the original matrix can be approximated by linear combination. Terms and objects are represented by 50 to 150 dimensional vectors and matched against user queries in this “semantic” space. Initial tests find this completely automatic method widely applicable and a promising way to improve users' access to many kinds of textual materials, or to objects and services for which textual descriptions are available.
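The decomposition-and-matching idea in this abstract can be sketched in a few lines of Python with NumPy. This is an illustration under stated assumptions, not the authors' implementation: the toy vocabulary and documents are invented, the function names are hypothetical, only 2 factors are kept instead of the 50 to 150 the abstract mentions, and the query is projected with the common fold-in formula (query vector times the term factors, scaled by the inverse singular values), which the abstract does not spell out.

```python
import numpy as np


def lsi_fit(term_doc, k):
    """Truncated SVD of a term-by-document matrix, keeping k factors."""
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :]


def fold_in_query(query_vec, u_k, s_k):
    """Project a term-space query into the k-dimensional factor space."""
    return (query_vec @ u_k) / s_k


def score_documents(query_vec, u_k, s_k, vt_k):
    """Cosine similarity between the folded-in query and each document."""
    q = fold_in_query(query_vec, u_k, s_k)
    docs = vt_k.T  # one row per document in factor space
    norms = np.linalg.norm(docs, axis=1) * np.linalg.norm(q)
    return docs @ q / (norms + 1e-12)


# Hypothetical toy collection: rows are terms, columns are documents.
terms = ["car", "automobile", "engine", "flower", "petal"]
term_doc = np.array([
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # flower
    [0, 0, 1, 0],  # petal
], dtype=float)

u_k, s_k, vt_k = lsi_fit(term_doc, k=2)
query = np.array([1, 0, 0, 0, 0], dtype=float)  # the single word "car"
sims = score_documents(query, u_k, s_k, vt_k)

if __name__ == "__main__":
    for doc_id, sim in enumerate(sims):
        print(f"doc {doc_id}: {sim:.3f}")
```

In this toy collection the second document never contains the word "car", yet it scores as high as the first because both share "engine". The two flower documents score near zero. A purely lexical match would miss the second document.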
Information-seeking strategies of novices using a full-text electronic encyclopedia
- Journal of the American Society for Information Science
, 1989
Cited by 67 (3 self)
An exploratory study was conducted of elementary school children searching a full-text electronic encyclopedia on CD-ROM. Twenty-eight third and fourth graders and 24 sixth graders conducted two assigned searches, one open-ended, the other one closed, after two demonstration sessions. Keystrokes captured by the computer and observer notes were used to examine user information-seeking strategies from a mental model perspective. Older searchers were more successful in finding required information, and took less time than younger searchers. No differences in total number of moves were found. Analysis of search patterns showed that novices used a heuristic, highly interactive search strategy. Searchers used sentence and phrase queries, indicating unique mental models for this search system. Most searchers accepted system defaults and used the AND connective in formulating queries. Transition matrix analyses showed that younger searchers generally favored query refining moves and older searchers favored examining title and text moves. Suggestions for system designers were made and future research questions were identified.
Machine learning for information retrieval: neural networks, symbolic learning, and genetic algorithms
- Journal of the American Society for Information Science
, 1995
Cited by 56 (9 self)
Information retrieval using probabilistic techniques has attracted significant attention on the part of researchers in information and computer science over the past few decades. In the 1980s, knowledge-based techniques also made an impressive contribution to “intelligent” information retrieval and indexing. More recently, information science researchers have turned to other newer artificial-intelligence-based inductive learning techniques including neural networks, symbolic learning, and genetic algorithms. These newer techniques, which are grounded on diverse paradigms, have provided great opportunities for researchers to enhance the information processing and retrieval capabilities of current information storage and retrieval systems. In this article, we first provide an overview of these newer techniques and their use in information science research. To familiarize readers with these techniques, we present three popular methods: the connectionist Hopfield network; the symbolic ID3/ID5R; and evolution-based genetic algorithms. We discuss their knowledge representations and algorithms in the context of information retrieval. Sample implementation and testing results from our own research are also provided for each technique. We believe these techniques are promising in their ability to analyze user queries, identify users’ information needs, and suggest alternatives for search. With proper user-system interactions, these methods can greatly complement the prevailing full-text, keyword-based, probabilistic, and knowledge-based techniques.
A Concept Space Approach to Addressing the Vocabulary Problem in Scientific Information Retrieval: An Experiment on the Worm Community System
- Journal of the American Society for Information Science
, 1997
Cited by 56 (14 self)
This research presents an algorithmic approach to addressing the vocabulary problem in scientific information retrieval and information sharing, using the molecular biology domain as an example. We first present a literature review of cognitive studies related to the vocabulary problem and vocabulary-based search aids (thesauri) and then discuss techniques for building robust and domain-specific thesauri to assist in cross-domain scientific information retrieval. Using a variation of the automatic thesaurus generation techniques, which we refer to as the concept space approach, we recently conducted an experiment in the molecular biology domain in which we created a C. elegans worm thesaurus of 7,657 worm-specific terms and a Drosophila fly thesaurus of 15,626 terms. About 30% of these terms overlapped, which created vocabulary paths
Measuring Search Engine Quality
, 2001
Cited by 47 (8 self)
The effectiveness of twenty public search engines is evaluated using TREC-inspired methods and a set of 54 queries taken from real Web search logs. The World Wide Web is taken as the test collection and a combination of crawler and text retrieval system is evaluated. The engines are compared on a range of measures derivable from binary relevance judgments of the first seven live results returned. Statistical testing reveals a significant difference between engines and high inter-correlations between measures. Surprisingly, given the dynamic nature of the Web and the time elapsed, there is also a high correlation between results of this study and a previous study by Gordon and Pathak. For nearly all engines, there is a gradual decline in precision at increasing cutoff after some initial fluctuation. Performance of the engines as a group is found to be inferior to the group of participants in the TREC-8 Large Web task, although the best engines approach the median of those systems. Shortcomings of current Web search evaluation methodology are identified and recommendations are made for future improvements. In particular, the present study and its predecessors deal with queries which are assumed to derive from a need to find a selection of documents relevant to a topic. By contrast, real Web search reflects a range of other information need types which require different judging and different measures. The authors wish to acknowledge that this work was carried out partly within the Cooperative Research Centre for Advanced Computational Systems established under the Australian Government's Cooperative Research Centres Program.
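One measure that can be derived from binary judgments of the first seven results is precision at a cutoff, which this abstract reports as declining with increasing cutoff. A minimal Python sketch follows. The abstract does not list the exact measures used, so the function names, the example judgments, and the choice to divide by the cutoff even when fewer results were returned are all assumptions of this illustration.

```python
def precision_at_k(judgments, k):
    """Fraction of the top-k results judged relevant (1) rather than not (0).

    Assumption: a result list shorter than k is still divided by k,
    so missing results count as non-relevant.
    """
    if k <= 0:
        raise ValueError("cutoff k must be positive")
    return sum(judgments[:k]) / k


def mean_precision_curve(runs, max_k=7):
    """Mean precision over all queries at each cutoff 1..max_k."""
    return [
        sum(precision_at_k(judgments, k) for judgments in runs) / len(runs)
        for k in range(1, max_k + 1)
    ]


if __name__ == "__main__":
    # Hypothetical binary judgments for two queries, seven results each.
    engine_runs = [
        [1, 0, 1, 1, 0, 0, 1],
        [1, 1, 0, 0, 0, 1, 0],
    ]
    for cutoff, value in enumerate(mean_precision_curve(engine_runs), start=1):
        print(f"P@{cutoff} = {value:.3f}")
```

Comparing these per-cutoff means across engines gives the kind of curve the abstract refers to. Significance testing would be applied to the per-query values, not to the means.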

