Query expansion using lexical-semantic relations
- In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
, 1994
Cited by 395 (1 self)
Applications such as office automation, news filtering, help facilities in complex systems, and the like require the ability to retrieve documents from full-text databases where vocabulary problems can be particularly severe. Experiments performed on small collections with single-domain thesauri suggest that expanding query vectors with words that are lexically related to the original query words can ameliorate some of the problems of mismatched vocabularies. This paper examines the utility of lexical query expansion in the large, diverse TREC collection. Concepts are represented by WordNet synonym sets and are expanded by following the typed links included in WordNet. Experimental results show this query expansion technique makes little difference in retrieval effectiveness if the original queries are relatively complete descriptions of the information being sought, even when the concepts to be expanded are selected by hand. Less well developed queries can be significantly improved by expansion of hand-chosen concepts. However, an automatic procedure that can approximate the set of hand-picked synonym sets has yet to be devised, and expanding by the synonym sets that are automatically generated can degrade retrieval performance.
Concept Based Query Expansion
, 1993
Cited by 233 (2 self)
Query expansion methods have been studied for a long time - with debatable success in many instances. In this paper we present a probabilistic query expansion model based on a similarity thesaurus which was constructed automatically. A similarity thesaurus reflects domain knowledge about the particular collection from which it is constructed. We address the two important issues with query expansion: the selection and the weighting of additional search terms. In contrast to earlier methods, our queries are expanded by adding those terms that are most similar to the concept of the query, rather than selecting terms that are similar to the query terms. Our experiments show that this kind of query expansion results in a notable improvement in the retrieval effectiveness when measured using both recall-precision and usefulness.
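The contrast drawn in this abstract, between expanding by similarity to the whole query concept and expanding by similarity to individual query terms, can be illustrated with a minimal sketch. It is a simplification and not the authors' probabilistic model: the toy thesaurus `SIM`, its similarity values, and the function names `expand_by_concept` and `expand_by_term` are all hypothetical, and a candidate's concept score is assumed to be a weighted sum of its similarities to the query terms.

```python
from typing import Dict, List

# Hypothetical toy similarity thesaurus (values invented for illustration):
# SIM[a][b] is the similarity between query term a and candidate term b.
SIM: Dict[str, Dict[str, float]] = {
    "java": {"coffee": 0.9, "python": 0.5, "compiler": 0.4},
    "code": {"python": 0.6, "compiler": 0.8},
}


def expand_by_concept(query_weights: Dict[str, float], k: int) -> List[str]:
    """Rank candidates by their weighted similarity to the query as a whole."""
    scores: Dict[str, float] = {}
    for q_term, q_weight in query_weights.items():
        for cand, sim in SIM.get(q_term, {}).items():
            if cand in query_weights:
                continue  # never re-add an original query term
            scores[cand] = scores.get(cand, 0.0) + q_weight * sim
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


def expand_by_term(query_weights: Dict[str, float], k_per_term: int) -> List[str]:
    """Baseline: take the nearest neighbours of each query term separately."""
    picked: List[str] = []
    for q_term in query_weights:
        neighbours = SIM.get(q_term, {})
        best = sorted(neighbours, key=lambda t: (-neighbours[t], t))[:k_per_term]
        for cand in best:
            if cand not in query_weights and cand not in picked:
                picked.append(cand)
    return picked


query = {"java": 1.0, "code": 1.0}
print(expand_by_concept(query, 2))  # terms related to both query terms rank first
print(expand_by_term(query, 1))     # "coffee" enters via "java" alone
```

In this toy data the per-term baseline admits "coffee", which is close to "java" but unrelated to "code", while the concept-level score favours terms that are similar to both query terms.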
An Association Thesaurus for Information Retrieval
- In RIAO 94 Conference Proceedings
, 1994
Cited by 182 (11 self)
Although commonly used in both commercial and experimental information retrieval systems, thesauri have not demonstrated consistent benefits for retrieval performance, and it is difficult to construct a thesaurus automatically for large text databases. In this paper, an approach, called PhraseFinder, is proposed to construct collection-dependent association thesauri automatically using large full-text document collections. The association thesaurus can be accessed through natural language queries in INQUERY, an information retrieval system based on the probabilistic inference network. Experiments are conducted in INQUERY to evaluate different types of association thesauri, and thesauri constructed for a variety of collections.
1 Introduction
A thesaurus is a set of items (phrases or words) plus a set of relations between these items. Although thesauri are commonly used in both commercial and experimental IR systems, experiments have shown inconsistent effects on retrieval effectiven...
The limitations of term co-occurrence data for query expansion in document retrieval systems
- Journal of the American Society for Information Science
, 1991
Cited by 116 (0 self)
Term cooccurrence data has been extensively used in document retrieval systems for the identification of indexing terms that are similar to those that have been specified in a user query: these similar terms can then be used to augment the original query statement. Despite the plausibility of this approach to query expansion, the retrieval effectiveness of the expanded queries is often no greater than, or even less than, the effectiveness of the unexpanded queries. This article demonstrates that the similar terms identified by cooccurrence data in a query expansion system tend to occur very frequently in the database that is being searched. Unfortunately, frequent terms tend to discriminate poorly between relevant and nonrelevant documents, and the general effect of query expansion is thus to add terms that do little or nothing to improve the discriminatory power of the original query.
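The claim that frequent terms discriminate poorly can be made concrete with inverse document frequency (IDF). This is only an illustrative sketch: the abstract does not say which measure the article uses, so the choice of IDF, the function name `idf`, and the document counts below are assumptions.

```python
import math


def idf(doc_freq: int, num_docs: int) -> float:
    """Inverse document frequency, log(N / df); assumes 0 < doc_freq <= num_docs."""
    return math.log(num_docs / doc_freq)


NUM_DOCS = 1000  # hypothetical collection size

# Hypothetical document frequencies: a rare term and two frequent ones.
for term, df in [("rare_term", 10), ("frequent_term", 500), ("ubiquitous_term", 1000)]:
    print(term, round(idf(df, NUM_DOCS), 3))
```

A term that occurs in every document has an IDF of zero, so adding it to a query does nothing to separate relevant from nonrelevant documents; the more frequent an expansion term is, the closer it comes to that limit.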
Experiments on Using Semantic Distances Between Words in Image Caption Retrieval
, 1996
Cited by 109 (3 self)
Traditional approaches to information retrieval are based upon representing a user's query as a bag of query terms and a document as a bag of index terms and computing a degree of similarity between the two based on the overlap or number of query terms in common between them. Our long-term approach to IR applications is based upon precomputing semantically-based word-word similarities, work which is described elsewhere, and using these as part of the document-query similarity measure. A basic premise of our word-to-word similarity measure is that the input to this computation is the correct or intended word sense, but in information retrieval applications, automatic and accurate word sense disambiguation remains an unsolved problem. In this paper we describe our first successful application of these ideas to an information retrieval application, specifically the indexing and retrieval of captions describing the content of images. We have hand-captioned 2714 images and to circumvent, fo...
Selecting Good Expansion Terms for Pseudo-Relevance Feedback
Cited by 92 (7 self)
Pseudo-relevance feedback assumes that the most frequent terms in the pseudo-feedback documents are useful for retrieval. In this study, we re-examine this assumption and show that it does not hold in reality: many expansion terms identified in traditional approaches are in fact unrelated to the query and harmful to retrieval. We also show that good expansion terms cannot be distinguished from bad ones merely by their distributions in the feedback documents and in the whole collection. We then propose to integrate a term classification process to predict the usefulness of expansion terms. Multiple additional features can be integrated in this process. Our experiments on three TREC collections show that retrieval effectiveness can be much improved when term classification is used. In addition, we demonstrate that good terms should be identified directly according to their possible impact on retrieval effectiveness, i.e. using supervised learning, instead of unsupervised learning.
Retrieval Effectiveness of an Ontology-Based Model for Information Selection
- The VLDB Journal
, 2004
Cited by 54 (16 self)
Technology in the field of digital media generates huge amounts of non-textual information, audio, video, and images, along with more familiar textual information. The potential for exchange and retrieval of information is vast and daunting. The key problem in achieving efficient and user-friendly retrieval is the development of a search mechanism to guarantee delivery of minimal irrelevant information (high precision) while ensuring relevant information is not overlooked (high recall). The traditional solution employs keyword-based search. The only documents retrieved are