A Concept Space Approach to Addressing the Vocabulary Problem in Scientific Information Retrieval: An Experiment on the Worm Community System
- Journal of the American Society for Information Science
, 1997
Cited by 56 (14 self)
This research presents an algorithmic approach to addressing the vocabulary problem in scientific information retrieval and information sharing, using the molecular biology domain as an example. We first present a literature review of cognitive studies related to the vocabulary problem and vocabulary-based search aids (thesauri) and then discuss techniques for building robust and domain-specific thesauri to assist in cross-domain scientific information retrieval. Using a variation of the automatic thesaurus generation techniques, which we refer to as the concept space approach, we recently conducted an experiment in the molecular biology domain in which we created a C. elegans worm thesaurus of 7,657 worm-specific terms and a Drosophila fly thesaurus of 15,626 terms. About 30% of these terms overlapped, which created vocabulary paths
An Evolutionary Approach to Constructing Effective Software Reuse Repositories
- ACM Transactions on Software Engineering and Methodology
, 1997
Cited by 32 (3 self)
This article outlines an approach that avoids these problems by choosing a retrieval method that utilizes minimal repository structure to effectively support the process of finding software components. The approach is demonstrated through a pair of proof-of-concept prototypes: PEEL, a tool to semiautomatically identify reusable components, and CodeFinder, a retrieval system that compensates for the lack of explicit knowledge structures through a spreading activation retrieval process. CodeFinder also allows component representations to be modified while users are searching for information. This mechanism adapts to the changing nature of the information in the repository and incrementally improves the repository while people use it. The combination of these techniques holds potential for designing software repositories that minimize up-front costs, effectively support the search process, and evolve with an organization's changing needs.
Optimizing Ranking Functions: A Connectionist Approach to Adaptive Information Retrieval
- DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING, THE UNIVERSITY OF CALIFORNIA, SAN DIEGO
, 1994
Cited by 26 (5 self)
This dissertation examines the use of adaptive methods to automatically improve the performance of ranked text retrieval systems. The goal of a ranked retrieval system is to manage a large collection of text documents and to order documents for a user based on the estimated relevance of the documents to the user's information need (or query). The ordering enables the user to quickly find documents of interest. Ranked retrieval is a difficult problem because of the ambiguity of natural language, the large size of the collections, and because of the varying needs of users and varying collection characteristics. We propose and empirically validate general adaptive methods which improve the ability of a large class of retrieval systems to rank documents effectively. Our main adaptive method is to numerically optimize free parameters in a retrieval system by minimizing a non-metric criterion function. The criterion measures how well the system is ranking documents relative to a target ordering, defined by a set of training queries which include the users' desired document orderings. Thus, the system learns parameter settings which better enable it to rank relevant documents before irrelevant. The non-metric approach is interesting because it is a general adaptive method, an alternative to supervised methods for training neural networks in domains in which rank order or prioritization is important. A second adaptive method is also examined, which is applicable to a restricted class of retrieval systems but which permits an analytic solution. The adaptive methods are applied to a number of problems in text retrieval to validate their utility and practical efficiency. The applications include: A dimensionality reduction of vector-based document representations to a vector spa...
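The idea of tuning free parameters against a target ordering can be illustrated with a minimal sketch. It is not the dissertation's criterion or optimizer: the pairwise agreement score, the one-parameter scorer `combined_scores`, and the grid search in `fit_weight` are all assumptions chosen for brevity.

```python
def rank_agreement(scores, preferred_pairs):
    """Fraction of (better, worse) document pairs that the scores order correctly."""
    if not preferred_pairs:
        raise ValueError("need at least one preference pair")
    correct = sum(1 for better, worse in preferred_pairs
                  if scores[better] > scores[worse])
    return correct / len(preferred_pairs)


def combined_scores(features, weight):
    """Hypothetical scorer with one free parameter: weight*f1 + (1-weight)*f2."""
    return {doc: weight * f1 + (1 - weight) * f2
            for doc, (f1, f2) in features.items()}


def fit_weight(features, preferred_pairs, steps=10):
    """Grid search (an assumption, not numerical optimization) over the weight."""
    candidates = [i / steps for i in range(steps + 1)]
    return max(candidates,
               key=lambda w: rank_agreement(combined_scores(features, w),
                                            preferred_pairs))


# Toy training data: two evidence sources per document and a target ordering
# d1 > d2 > d3 expressed as preference pairs.
features = {"d1": (0.9, 0.1), "d2": (0.2, 0.8), "d3": (0.1, 0.2)}
preferred_pairs = [("d1", "d2"), ("d2", "d3")]
best_weight = fit_weight(features, preferred_pairs)
```

Only the relative order of documents enters the score, which is the sense in which such a criterion depends on rank rather than on the magnitude of the scores.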
Bibliometric Information Retrieval System (BIRS): A Web Search Interface Utilizing Bibliometric Research Results
- Journal of the American Society for Information Science
, 2000
Cited by 8 (0 self)
Introduction. The Internet and WWW have already established themselves as major factors in the operation of scholarly communities worldwide. Today, the Internet is used in all spheres of life for exchange of information. Information resources on the Internet are increasing tremendously. Gordon and Pathak (1999) suggested that the primary use of the Internet is for information retrieval. Search engines are considered as the most important tool for retrieving information on the Web, and consequently form a critical area of research (Gaines, Chen, & Shaw, 1997; Lawrence & Giles, 1998). Despite the effectiveness of Internet-based or online information retrieval, problems still exist. Woodward (1996) argued that the Internet is currently in a state of near chaos in terms of access and organization of information. Voorbij (1999) found that 67% of the Internet users agree or strongly agree with the difficulty to perform subject searches on the Internet. Users, especially the novice and irregular users, find it difficult to phrase their information needs due to the lack of knowledge literacy in search domain (Bates, 1986, 1998). Alth
An Investigation of the Preconditions for Effective Data Fusion in Information Retrieval: A Pilot Study
, 1998
Cited by 8 (0 self)
Effective automation of the information retrieval task has long been an active area of research, leading to sophisticated retrieval models. With many IR schemes available, researchers have begun to investigate the benefits of combining the results of different IR schemes to improve performance. There are many successful data fusion experiments reported in IR literature, but there are also experiments in which data fusion did not work while using the same fusion rules. What is needed is a theory to tell a priori when one should use data fusion methods. We categorize different theoretical justifications of data fusion into two approaches, examine their implications, analyze some of the unsuccessful data fusion experiments, and propose two preconditions for effective data fusion: (1) The precondition of efficacy and (2) The precondition of dissimilarity. We have developed a mathematical measure (Pair-out-of-order) to measure inter-scheme dissimilarity, and have developed algorithms and co...
The design of knowledge-rich browsing interfaces for retrieval in digital libraries. Doctoral dissertation
, 1999
Incorporating the results of co-word analyses to increase search variety for information retrieval
- Journal of Information Science
Cited by 5 (2 self)
This research aims to incorporate the results of co-word analysis into information retrieval as a means to increase search variety for end users in the domain of information retrieval. Relevant data were first collected from the Science Citation Index and Social Science Citation Index for the period 1987-1997. The results of co-word analysis on the data were compared with similar data obtained from three thesauri: the LISA thesaurus, LCSH (Library of Congress Subject Headings), and the Thesaurus of Information Technology Terms. The differences detected between them indicate that search variety may be increased by combining co-word analysis with the use of traditional thesauri. Subsequently, the results of co-word analysis were compared across two periods (1987-1991 and 1992-1997). The changes identified between the periods imply that co-word analysis may be used to directly identify dynamic changes in its chosen domain area, thereby providing more up-to-date information to aid the information search process.
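A minimal sketch of the kind of co-occurrence counting that co-word analysis rests on, assuming each record is simply a list of keywords. The function names, the sample records, and the comparison against a thesaurus entry are illustrative assumptions, not the procedure used in the paper.

```python
from collections import Counter
from itertools import combinations


def coword_counts(records):
    """Count how often each unordered keyword pair appears in the same record."""
    counts = Counter()
    for keywords in records:
        # Deduplicate and sort so each pair has one canonical ordering.
        for pair in combinations(sorted(set(keywords)), 2):
            counts[pair] += 1
    return counts


def related_terms(term, counts, thesaurus_terms=()):
    """Terms co-occurring with `term` that a (hypothetical) thesaurus entry lacks."""
    linked = Counter()
    for (a, b), n in counts.items():
        if a == term:
            linked[b] += n
        elif b == term:
            linked[a] += n
    known = set(thesaurus_terms)
    return [(t, n) for t, n in linked.most_common() if t not in known]


# Made-up keyword records, for illustration only.
records = [
    ["information retrieval", "thesaurus", "indexing"],
    ["information retrieval", "thesaurus"],
    ["information retrieval", "relevance feedback"],
]
counts = coword_counts(records)
suggestions = related_terms("information retrieval", counts,
                            thesaurus_terms=["thesaurus"])
```

Terms that co-occur with a query term but are absent from its thesaurus entry are the candidates for widening a search. Running the same count on records from two periods and comparing the results would surface changes over time.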
Predicting the Effectiveness of Naïve Data Fusion on the Basis of System Characteristics
- Journal of American Society for Information Science
, 2000
Effective automation of the information retrieval task has long been an active area of research, leading to sophisticated retrieval models. With many IR schemes available, researchers have begun to investigate the benefits of combining the results of different IR schemes to improve performance, in the process called "data fusion". There are many successful data fusion experiments reported in IR literature, but there are also cases in which it did not work well. Thus it would be quite valuable to have a theory which can predict, in advance, whether fusion of two or more retrieval schemes will be worth doing. In a previous study (Ng and Kantor, 1998), we identified two predictive variables for the effectiveness of fusion: (a) a list-based measure of output dissimilarity, and (b) a pair-wise measure of the similarity of performance of the two schemes. In this paper we investigate the predictive power of these two variables in simple symmetrical data fusion. We use the IR systems participating in the TREC 4 routing task to train a model that predicts the effectiveness of data fusion, and use the IR systems participating in the TREC 5 routing task to test that model. The model asks, "when will fusion perform better than an oracle who uses the best scheme from each pair?" We explore statistical techniques for fitting the model to the training data and use the receiver operating characteristic curve of signal detection theory to represent the power of the resulting models. The trained prediction methods predict whether fusion will beat an oracle, at levels much higher than could be achieved by chance.
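The "beat the oracle" question can be made concrete with a small sketch. The abstract does not specify the fusion rule or the effectiveness measure, so the min-max normalization, the summed scores, and precision at k used here are assumptions, and all names and data are hypothetical.

```python
def normalize(scores):
    """Min-max normalize one scheme's scores to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(scores_a, scores_b):
    """Symmetric fusion (assumed rule): sum of the two normalized scores."""
    a, b = normalize(scores_a), normalize(scores_b)
    return {doc: a.get(doc, 0.0) + b.get(doc, 0.0) for doc in set(a) | set(b)}


def precision_at_k(scores, relevant, k):
    """Share of the top-k documents that are relevant (ties broken by doc id)."""
    ranked = sorted(scores, key=lambda doc: (-scores[doc], doc))
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def fusion_beats_oracle(scores_a, scores_b, relevant, k):
    """True if the fused ranking outperforms the better of the two schemes."""
    best_single = max(precision_at_k(scores_a, relevant, k),
                      precision_at_k(scores_b, relevant, k))
    return precision_at_k(fuse(scores_a, scores_b), relevant, k) > best_single


# Made-up scores from two schemes that each rank one irrelevant document first.
scheme_a = {"d1": 3.0, "d2": 2.0, "d3": 1.0, "d4": 0.0}
scheme_b = {"d4": 3.0, "d3": 2.5, "d2": 2.0, "d1": 0.0}
relevant = {"d2", "d3"}
outcome = fusion_beats_oracle(scheme_a, scheme_b, relevant, k=2)
```

In this toy case the two schemes make different mistakes, so summing their evidence promotes the documents both rate reasonably well. Fusing a scheme with itself cannot change the ranking and therefore never beats the oracle.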
Incorporating syntactic dependency information towards improved coding of lengthy medical concepts in clinical reports

