Naive (Bayes) at Forty: The Independence Assumption in Information Retrieval
, 1998
Cited by 268 (1 self)
The naive Bayes classifier, currently experiencing a renaissance in machine learning, has long been a core technique in information retrieval. We review some of the variations of naive Bayes models used for text retrieval and classification, focusing on the distributional assumptions made about word occurrences in documents.
A Probabilistic Model of Information Retrieval: Development and Status
, 1998
Cited by 206 (16 self)
The paper combines a comprehensive account of the probabilistic model of retrieval with new systematic experiments on TREC Programme material. It presents the model from its foundations through its logical development to cover more aspects of retrieval data and a wider range of system functions. Each step in the argument is matched by comparative retrieval tests, to provide a single coherent account of a major line of research. The experiments demonstrate, for a large test collection, that the probabilistic model is effective and robust, and that it responds appropriately, with major improvements in performance, to key features of retrieval situations.
The limitations of term co-occurrence data for query expansion in document retrieval systems
- Journal of the American Society for Information Science
, 1991
Cited by 82 (0 self)
Term cooccurrence data has been extensively used in document retrieval systems for the identification of indexing terms that are similar to those that have been specified in a user query: these similar terms can then be used to augment the original query statement. Despite the plausibility of this approach to query expansion, the retrieval effectiveness of the expanded queries is often no greater than, or even less than, the effectiveness of the unexpanded queries. This article demonstrates that the similar terms identified by cooccurrence data in a query expansion system tend to occur very frequently in the database that is being searched. Unfortunately, frequent terms tend to discriminate poorly between relevant and nonrelevant documents, and the general effect of query expansion is thus to add terms that do little or nothing to improve the discriminatory power of the original query.
Models for retrieval with probabilistic indexing
- Information Processing and Management
, 1989
Cited by 78 (14 self)
In this article three retrieval models for probabilistic indexing are described along with evaluation results for each. First is the binary independence indexing (BII) model, which is a generalized version of the Maron and Kuhns indexing model. In this model, the indexing weight of a descriptor in a document is an estimate of the probability of relevance of this document with respect to queries using this descriptor. Second is the retrieval-with-probabilistic-indexing (RPI) model, which is suited to different kinds of probabilistic indexing. For that we assume that each indexing scheme has its own concept of “correctness” to which the probabilities relate. In addition to the probabilistic indexing weights, the RPI model provides the possibility of relevance weighting of search terms. A third model that is similar was proposed by Croft some years ago as an extension of the binary independence retrieval model, but it can be shown that this model is not based on the probabilistic ranking principle. The probabilistic indexing weights required for any of these models can be provided by an application of the Darmstadt indexing approach (DIA) for indexing with descriptors from a controlled vocabulary. The experimental results show significant improvements over retrieval with binary indexing. Finally, suggestions are made regarding how the DIA can be applied to probabilistic indexing with free text terms.
A Theory of Term Weighting Based on Exploratory Data Analysis
- Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
, 1998
Cited by 39 (1 self)
Techniques of exploratory data analysis are used to study the weight of evidence that the occurrence of a query term provides in support of the hypothesis that a document is relevant to an information need. In particular, the relationship between the document frequency and the weight of evidence is investigated. A correlation between document frequency normalized by collection size and the mutual information between relevance and term occurrence is uncovered. This correlation is found to be robust across a variety of query sets and document collections. Based on this relationship, a theoretical explanation of the efficacy of inverse document frequency for term weighting is developed which differs in both style and content from theories previously put forth. The theory predicts that a "flattening" of idf at both low and high frequency should result in improved retrieval performance. This altered idf formulation is tested on all TREC query sets. Retrieval results corroborate the predicti...
Learning in Intelligent Information Retrieval
- In Proceedings of the Eighth International Workshop on Machine Learning
, 1991
Cited by 26 (2 self)
Information retrieval (IR) systems are used for finding, within a large text database, those documents containing information needed by a user. The complex and poorly understood semantics of documents and user queries has made feedback and adaptation important characteristics of IR systems. In this paper we briefly survey previous research on machine learning in IR systems and discuss promising areas for future research at the intersection of these two fields.
1 Introduction
The goal of information retrieval (IR) techniques is to find, within a large database of documents, those documents which satisfy a user information need. Typically the stored documents are composed of natural language text, though IR techniques have also been applied to databases of stored speech, images, computer source code, and other forms of information. In contrast to conventional database techniques, IR techniques are most useful when the semantics of the objects to be retrieved is unclear, and the relation...
User choices: A new yardstick for the evaluation of ranking algorithms for interactive query expansion
- Information Processing and Management
, 1995
Cited by 23 (0 self)
The performance of eight ranking algorithms was evaluated with respect to their effectiveness in ranking terms for query expansion. The evaluation was conducted within an investigation of interactive query expansion and relevance feedback in a real operational environment. This study focuses on the identification of algorithms that most effectively take cognizance of user preferences. User choices (i.e. the terms selected by the searchers for the query expansion search) provided the yardstick for the evaluation of the eight ranking algorithms. This methodology introduces a user-oriented approach in evaluating ranking algorithms for query expansion in contrast to the standard, system-oriented approaches. Similarities in the performance of the eight algorithms and the ways that these algorithms rank terms were the main focus of this evaluation. The findings demonstrate that the r-lohi, wpq, emim, and porter algorithms have similar performance in bringing good terms to the top of a ranked list of terms for query expansion. However, further evaluation of the algorithms in different (e.g. full-text) environments is needed before these results can be generalized beyond the context of the present study.
The Effect of Accessing Non-Matching Documents on Relevance Feedback
- ACM Transactions on Information Systems
, 1997
Cited by 23 (0 self)
Traditional information retrieval (IR)... This paper shows that, in systems that allow access to non-matching documents (e.g. hybrid hypertext and information retrieval systems), the strength of the effect of giving relevance feedback varies between matching and non-matching documents. For positive feedback the results shown here are encouraging as they can be justified by an intuitive view of the process. However, for negative feedback the results show behaviour that cannot easily be justified and that varies greatly depending on the model of feedback used.

