Query Expansion Using Local and Global Document Analysis
- In Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
, 1996
Cited by 384 (14 self)
Automatic query expansion has long been suggested as a technique for dealing with the fundamental issue of word mismatch in information retrieval. A number of approaches to expansion have been studied and, more recently, attention has focused on techniques that analyze the corpus to discover word relationships (global techniques) and those that analyze documents retrieved by the initial query (local feedback). In this paper, we compare the effectiveness of these approaches and show that, although global analysis has some advantages, local analysis is generally more effective. We also show that using global analysis techniques, such as word context and phrase structure, on the local set of documents produces results that are both more effective and more predictable than simple local feedback.
1 Introduction
The problem of word mismatch is fundamental to information retrieval. Simply stated, it means that people often use different words to describe concepts in their queries than auth...
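The "local feedback" idea named in this abstract can be illustrated with a minimal sketch. It is not the paper's method: the overlap-count ranking, the whitespace tokenizer, and the names `local_feedback_expand`, `top_docs`, and `num_terms` are assumptions made only for illustration.

```python
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (an assumption; real systems stem and drop stopwords).
    return text.lower().split()


def local_feedback_expand(query, documents, top_docs=2, num_terms=2):
    """Expand a query with frequent terms from the top-ranked documents."""
    query_terms = tokenize(query)
    query_set = set(query_terms)

    # Initial retrieval: score each document by how many of its tokens are query terms.
    scored = []
    for index, doc in enumerate(documents):
        tokens = tokenize(doc)
        score = sum(1 for token in tokens if token in query_set)
        scored.append((score, index, tokens))
    scored.sort(key=lambda item: (-item[0], item[1]))

    # Local analysis: count non-query terms in the top documents that matched at all.
    counts = Counter()
    for score, _, tokens in scored[:top_docs]:
        if score > 0:
            counts.update(token for token in tokens if token not in query_set)

    # Deterministic ordering: highest count first, then alphabetical.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return query_terms + [term for term, _ in ranked[:num_terms]]


docs = [
    "car engine repair",
    "automobile engine repair manual",
    "banana bread recipe",
]
print(local_feedback_expand("engine", docs))
```

Because the expansion terms come only from documents retrieved by the initial query, a poor initial ranking feeds poor terms back in, which is one reason simple local feedback can be unpredictable.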
Concept Based Query Expansion
, 1993
Cited by 147 (2 self)
Query expansion methods have been studied for a long time - with debatable success in many instances. In this paper we present a probabilistic query expansion model based on a similarity thesaurus which was constructed automatically. A similarity thesaurus reflects domain knowledge about the particular collection from which it is constructed. We address the two important issues with query expansion: the selection and the weighting of additional search terms. In contrast to earlier methods, our queries are expanded by adding those terms that are most similar to the concept of the query, rather than selecting terms that are similar to the query terms. Our experiments show that this kind of query expansion results in a notable improvement in the retrieval effectiveness when measured using both recall-precision and usefulness.
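The distinction drawn in this abstract, between terms similar to the query as a whole and terms similar to individual query terms, can be sketched as follows. This is a hedged illustration only: the hand-written `thesaurus` values, the weighted-sum scoring, and the function name `expand_by_query_concept` are assumptions, not the authors' implementation.

```python
def expand_by_query_concept(query_weights, thesaurus, num_terms=2):
    """Rank candidate terms by their summed, weighted similarity to all query terms.

    query_weights: dict mapping each query term to its weight.
    thesaurus: dict mapping a term to {similar_term: similarity} (hypothetical data).
    """
    scores = {}
    for query_term, weight in query_weights.items():
        for candidate, similarity in thesaurus.get(query_term, {}).items():
            if candidate in query_weights:
                continue  # never re-add a term that is already in the query
            scores[candidate] = scores.get(candidate, 0.0) + weight * similarity
    # Highest combined score first; ties broken alphabetically.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:num_terms]


# Hypothetical similarity thesaurus for illustration.
thesaurus = {
    "java": {"coffee": 0.6, "programming": 0.5, "island": 0.4},
    "compiler": {"programming": 0.5, "parser": 0.3},
}
print(expand_by_query_concept({"java": 1.0, "compiler": 1.0}, thesaurus))
```

In this toy data "coffee" is the single closest neighbor of "java", yet "programming" ranks first because it is related to both query terms, which is the effect the abstract attributes to matching the query concept.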
Applying Associative Retrieval Techniques to Alleviate the Sparsity Problem in Collaborative Filtering
- ACM Transactions on Information Systems
, 2004
Cited by 66 (10 self)
In this article, we propose to deal with this sparsity problem by applying an associative retrieval framework and related spreading activation algorithms to explore transitive associations among consumers through their past transactions and feedback. Such transitive associations are a valuable source of information to help infer consumer interests and can be explored to deal with the sparsity problem. To evaluate the effectiveness of our approach, we have conducted an experimental study using a data set from an online bookstore. We experimented with three spreading activation algorithms including a constrained Leaky Capacitor algorithm, a branch-and-bound serial symbolic search algorithm, and a Hopfield net parallel relaxation search algorithm. These algorithms were compared with several collaborative filtering approaches that do not consider the transitive associations: a simple graph search approach, two variations of the user-based approach, and an item-based approach. Our experimental results indicate that spreading activation-based approaches significantly outperformed the other collaborative filtering methods as measured by recommendation precision, recall, the F-measure, and the rank score. We also observed the over-activation effect of the spreading activation approach, that is, incorporating transitive associations with past transactional data that is not sparse may "dilute" the data used to infer user preferences and lead to degradation in recommendation performance.
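A generic spreading activation pass over a consumer-product graph gives a feel for how transitive associations can surface recommendations. The sketch below is not any of the three algorithms named in the abstract; the even energy split, the `decay` factor, the fixed `steps` count, and the sample purchase data are all assumptions.

```python
def spread_activation(purchases, start_consumer, steps=3, decay=0.5):
    """Return unpurchased products ranked by activation spread from one consumer."""
    # Build an undirected bipartite graph: ("c", consumer) <-> ("p", product).
    neighbors = {}
    for consumer, products in purchases.items():
        for product in products:
            neighbors.setdefault(("c", consumer), set()).add(("p", product))
            neighbors.setdefault(("p", product), set()).add(("c", consumer))

    start = ("c", start_consumer)
    activation = {start: 1.0}
    frontier = {start: 1.0}
    for _ in range(steps):
        next_frontier = {}
        for node, energy in frontier.items():
            linked = neighbors.get(node, ())
            if not linked:
                continue
            # Each node passes a decayed, evenly split share of its energy onward.
            share = decay * energy / len(linked)
            for other in linked:
                next_frontier[other] = next_frontier.get(other, 0.0) + share
        for node, energy in next_frontier.items():
            activation[node] = activation.get(node, 0.0) + energy
        frontier = next_frontier

    owned = purchases.get(start_consumer, set())
    scores = {
        name: value
        for (kind, name), value in activation.items()
        if kind == "p" and name not in owned
    }
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical transactions: ann and cat share no purchase, but bob links them.
purchases = {
    "ann": {"book_a"},
    "bob": {"book_a", "book_b"},
    "cat": {"book_b", "book_c"},
}
print(spread_activation(purchases, "ann", steps=5))
```

With few steps only products of direct neighbors are reached; more steps reach products through chains of consumers, and on dense data that extra reach is where the over-activation effect described above would be expected to appear.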
A Concept Space Approach to Addressing the Vocabulary Problem in Scientific Information Retrieval: An Experiment on the Worm Community System
- Journal of the American Society for Information Science
, 1997
Cited by 56 (14 self)
This research presents an algorithmic approach to addressing the vocabulary problem in scientific information retrieval and information sharing, using the molecular biology domain as an example. We first present a literature review of cognitive studies related to the vocabulary problem and vocabulary-based search aids (thesauri) and then discuss techniques for building robust and domain-specific thesauri to assist in cross-domain scientific information retrieval. Using a variation of the automatic thesaurus generation techniques, which we refer to as the concept space approach, we recently conducted an experiment in the molecular biology domain in which we created a C. elegans worm thesaurus of 7,657 worm-specific terms and a Drosophila fly thesaurus of 15,626 terms. About 30% of these terms overlapped, which created vocabulary paths
Tuning a Corpus Analysis Approach for Automatic Query Expansion
, 1997
Cited by 38 (2 self)
Searching online text collections can be both rewarding and frustrating. While valuable information can be found, typically many irrelevant documents are also retrieved and many relevant ones are missed. Terminology mismatches between the user's query and document contents are a main cause of retrieval failures. Expanding a user's query with related words can improve search performance, but finding and using related words is an open problem. This research uses corpus analysis techniques to automatically discover similar words directly from the contents of the untagged databases. Using these similarities, user queries are automatically expanded, resulting in conceptual retrieval rather than requiring exact word matches between queries and documents. This work has been extended to multi-database collections where each sub-database has a collection-specific similarity matrix associated with it. If the best matrix is selected, substantial search improvements are possible. However, automati...
A Parallel Computing Approach to Creating Engineering Concept Spaces for Semantic Retrieval: The Illinois Digital Library Initiative Project
- IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1996
Cited by 37 (12 self)
This research presents preliminary results generated from the semantic retrieval research component of the Illinois Digital Library Initiative (DLI) project. Using a variation of the automatic thesaurus generation techniques, to which we refer as the concept space approach, we aimed to create graphs of domain-specific concepts (terms) and their weighted co-occurrence relationships for all major engineering domains. Merging these concept spaces and providing traversal paths across different concept spaces could potentially help alleviate the vocabulary (difference) problem evident in large-scale information retrieval. We have experimented previously with such a technique for a smaller molecular biology domain (Worm Community System, with 10+ MBs of document collection) with encouraging results. In order to address the scalability issue related to large-scale information retrieval and analysis for the current Illinois DLI project, we recently conducted experiments using the concept sp...
Query Expansion by Mining User Logs
- IEEE Transactions on Knowledge and Data Engineering
, 2003
Cited by 35 (4 self)
Queries to search engines on the Web are usually short. They do not provide sufficient evidence for an effective selection of relevant documents. Previous research has proposed the utilization of query expansion to deal with this problem. However, expansion terms are usually determined on term co-occurrences within documents. In this study, we propose a new method for query expansion based on user interactions recorded in user logs. The central idea is to extract correlations between query terms and document terms by analyzing user logs. These correlations are then used to select high-quality expansion terms for new queries. Compared to
Searching the Web by Constrained Spreading Activation.
, 2000
Cited by 33 (0 self)
Intelligent Information Retrieval is concerned with the application of intelligent techniques, such as semantic networks, neural networks and inference nets, to Information Retrieval. The field of research has seen a number of applications of Constrained Spreading Activation (CSA) techniques on domain knowledge networks. However, there has never been any application of these techniques to the World Wide Web. The Web is a very important information resource, but users find that looking for a relevant piece of information in the Web can be like "looking for a needle in a haystack". We were therefore motivated to design and develop a prototype system, WebSCSA (Web Search by CSA), that applies a CSA technique to retrieve information from the Web using an ostensive approach to querying similar to query-by-example. In this paper we describe the system and its underlying model. Furthermore, we report on an experiment carried out with human subjects to evaluate the effectiveness of WebSCSA. We tested whether WebSCSA improves retrieval of relevant information on top of Web search engine results and how well WebSCSA serves as an agent browser for the user. The results of the experiments are promising, and show that there is much potential for further research on the use of CSA techniques to search the Web.
Solving The Word Mismatch Problem Through Automatic Text Analysis
, 1997
Cited by 17 (0 self)
Information Retrieval (IR) is concerned with locating documents that are relevant for a user's information need or query from a large collection of documents. A fundamental problem for information retrieval is word mismatch. A query is usually a short and incomplete description of the underlying information need. The users of IR systems and the authors of the documents often use different words to refer to the same concepts. This thesis addresses the word mismatch problem through automatic text analysis. We investigate two text analysis techniques, corpus analysis and local context analysis, and apply them in two domains of word mismatch, stemming and general query expansion. Experimental results show that these techniques ca...

