Improving retrieval performance by relevance feedback
- Journal of the American Society for Information Science, 1990
- Cited by 538 (6 self)
Relevance feedback is an automatic process, introduced over 20 years ago, designed to produce improved query formulations following an initial retrieval operation. The principal relevance feedback methods described over the years are examined briefly, and evaluation data are included to demonstrate the effectiveness of the various methods. Prescriptions are given for conducting text retrieval operations iteratively using relevance feedback. Introduction to Relevance Feedback It is well known that the original query formulation process is not transparent to most information system users. In particular, without detailed knowledge of the collection make-up, and of the retrieval environment, most users find
Probabilistic Models for Information Retrieval based on Divergence from Randomness
- ACM Transactions on Information Systems, 2002
- Cited by 111 (5 self)
We introduce and create a framework for deriving probabilistic models of Information Retrieval. The models are nonparametric models of IR obtained in the language model approach. We derive term-weighting models by measuring the divergence of the actual term distribution from that obtained under a random process. Among the random processes we study the binomial distribution and Bose–Einstein statistics. We define two types of term frequency normalization for tuning term weights in the document–query matching process. The first normalization assumes that documents have the same length and measures the information gain with the observed term once it has been accepted as a good descriptor of the observed document. The second normalization is related to the document length and to other statistics. These two normalization methods are applied to the basic models in succession to obtain weighting formulae. Results show that our framework produces different nonparametric models forming baseline alternatives to the standard tf-idf model.
Probabilistic Models in Information Retrieval
- The Computer Journal, 1992
- Cited by 87 (4 self)
In this paper, an introduction and survey over probabilistic information retrieval (IR) is given. First, the basic concepts of this approach are described: the probability ranking principle shows that optimum retrieval quality can be achieved under certain assumptions; a conceptual model for IR along with the corresponding event space clarify the interpretation of the probabilistic parameters involved. For the estimation of these parameters, three different learning strategies are distinguished, namely query-related, document-related and description-related learning. As a representative for each of these strategies, a specific model is described. A new approach regards IR as uncertain inference; here, imaging is used as a new technique for estimating the probabilistic parameters, and probabilistic inference networks support more complex forms of inference. Finally, the more general problems of parameter estimation, query expansion and the development of models for advanced document representations are discussed.
The limitations of term co-occurrence data for query expansion in document retrieval systems
- Journal of the American Society for Information Science, 1991
- Cited by 82 (0 self)
Term cooccurrence data has been extensively used in document retrieval systems for the identification of indexing terms that are similar to those that have been specified in a user query: these similar terms can then be used to augment the original query statement. Despite the plausibility of this approach to query expansion, the retrieval effectiveness of the expanded queries is often no greater than, or even less than, the effectiveness of the unexpanded queries. This article demonstrates that the similar terms identified by cooccurrence data in a query expansion system tend to occur very frequently in the database that is being searched. Unfortunately, frequent terms tend to discriminate poorly between relevant and nonrelevant documents, and the general effect of query expansion is thus to add terms that do little or nothing to improve the discriminatory power of the original query.
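The effect described in this abstract can be illustrated with a minimal sketch. It assumes a toy corpus, raw document-level co-occurrence counts as the similarity measure, and a plain log(N/df) inverse document frequency as a stand-in for discriminatory power; none of these choices, nor the names `DOCS`, `expansion_candidates`, and `idf`, come from the article itself.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each document is a list of index terms.
DOCS = [
    ["retrieval", "system", "data", "index"],
    ["retrieval", "system", "data", "query"],
    ["retrieval", "system", "feedback"],
    ["system", "data", "network"],
    ["system", "data", "storage"],
    ["system", "compiler"],
]

def doc_frequencies(docs):
    """Count the number of documents each term occurs in."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return df

def cooccurrence_counts(docs):
    """Count the documents in which each unordered term pair co-occurs."""
    pairs = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            pairs[(a, b)] += 1
    return pairs

def expansion_candidates(term, docs, k=2):
    """Return the k terms that co-occur most often with `term`.

    Ranking by raw co-occurrence count is an assumption of this sketch.
    Ties are broken alphabetically.
    """
    scores = Counter()
    for (a, b), n in cooccurrence_counts(docs).items():
        if a == term:
            scores[b] = n
        elif b == term:
            scores[a] = n
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]

def idf(term, docs):
    """Inverse document frequency, log(N / df); lower means less discriminating."""
    return math.log(len(docs) / doc_frequencies(docs)[term])

if __name__ == "__main__":
    for candidate in expansion_candidates("retrieval", DOCS):
        print(candidate, round(idf(candidate, DOCS), 3))
    print("retrieval", round(idf("retrieval", DOCS), 3))
```

In this toy corpus the terms most strongly associated with the query term are also the most frequent ones, so each has a lower inverse document frequency than the query term it is meant to supplement.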
User choices: A new yardstick for the evaluation of ranking algorithms for interactive query expansion
- Information Processing and Management, 1995
- Cited by 23 (0 self)
The performance of eight ranking algorithms was evaluated with respect to their effectiveness in ranking terms for query expansion. The evaluation was conducted within an investigation of interactive query expansion and relevance feedback in a real operational environment. This study focuses on the identification of algorithms that most effectively take cognizance of user preferences. User choices (i.e. the terms selected by the searchers for the query expansion search) provided the yardstick for the evaluation of the eight ranking algorithms. This methodology introduces a user-oriented approach in evaluating ranking algorithms for query expansion in contrast to the standard, system-oriented approaches. Similarities in the performance of the eight algorithms and the ways that these algorithms rank terms were the main focus of this evaluation. The findings demonstrate that the r-lohi, wpq, emim, and porter algorithms have similar performance in bringing good terms to the top of a ranked list of terms for query expansion. However, further evaluation of the algorithms in different (e.g. full-text) environments is needed before these results can be generalized beyond the context of the present study.
Evaluating implicit feedback models using searcher simulations
- ACM Transactions on Information Systems, 2005
- Cited by 20 (4 self)
In this article we describe an evaluation of relevance feedback (RF) algorithms using searcher simulations. Since these algorithms select additional terms for query modification based on inferences made from searcher interaction, not on relevance information searchers explicitly provide (as in traditional RF), we refer to them as implicit feedback models. We introduce six different models that base their decisions on the interactions of searchers and use different approaches to rank query modification terms. The aim of this article is to determine which of these models should be used to assist searchers in the systems we develop. To evaluate these models we used searcher simulations that afforded us more control over the experimental conditions than experiments with human subjects and allowed complex interaction to be modeled without the need for costly human experimentation. The simulation-based evaluation methodology measures how well the models learn the distribution of terms across relevant documents (i.e., learn what information is relevant) and how well they improve search effectiveness (i.e., create effective search queries). Our findings show that an implicit feedback model based on Jeffrey’s rule of conditioning outperformed other
The Effects Of Query Complexity, Expansion And Structure On Retrieval Performance In Probabilistic Text Retrieval
- University of Tampere, 1999
- Cited by 18 (6 self)
queries using all search facets identified from requests, low complexity was achieved by formulating queries with major facets only. Query expansion was based on a thesaurus, from which the expansion keys were elicited for queries. There were five expansion types: (1) the first query version was an unexpanded, original query with one search key for each search concept (original search concepts) elicited from the test thesaurus; (2) the synonyms of the original search keys were added to the original query; (3) search keys representing the narrower concepts of the original search concepts were added to the original query; (4) search keys representing the associative concepts of the original search concepts were added to the original query; (5) all previous expansion keys were cumulatively added to the original query. Query structure refers to the syntactic structure of a query expression, marked with query operators and parentheses. The structure of queries was either weak (queries with n
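The five expansion types listed in this abstract might be sketched as follows. The thesaurus contents, the relation names, and the choice to group search keys per concept are hypothetical and serve only to illustrate the scheme; the study's actual test thesaurus and query language are not reproduced here.

```python
# Hypothetical toy thesaurus: search concept -> relation -> expansion keys.
THESAURUS = {
    "car": {
        "synonyms": ["automobile"],
        "narrower": ["sedan", "hatchback"],
        "associative": ["traffic"],
    },
    "pollution": {
        "synonyms": ["contamination"],
        "narrower": ["smog"],
        "associative": ["emission"],
    },
}

# Thesaurus relations used by each of the five expansion types in the abstract.
EXPANSION_TYPES = {
    1: [],                                       # unexpanded original query
    2: ["synonyms"],                             # original keys + synonyms
    3: ["narrower"],                             # original keys + narrower concepts
    4: ["associative"],                          # original keys + associative concepts
    5: ["synonyms", "narrower", "associative"],  # all expansions, cumulatively
}

def expand_query(concepts, expansion_type, thesaurus=THESAURUS):
    """Return one list of search keys per search concept.

    Grouping keys by concept is an assumption of this sketch; it leaves the
    choice of query structure (operators, parentheses) to a later step.
    """
    relations = EXPANSION_TYPES[expansion_type]
    expanded = []
    for concept in concepts:
        keys = [concept]
        entry = thesaurus.get(concept, {})
        for relation in relations:
            keys.extend(entry.get(relation, []))
        expanded.append(keys)
    return expanded

if __name__ == "__main__":
    for expansion_type in sorted(EXPANSION_TYPES):
        print(expansion_type, expand_query(["car", "pollution"], expansion_type))
```

Keeping the keys grouped by concept means the same expanded key sets can later be combined under different operators, which is one way to vary query structure independently of expansion.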
Combining Multiple Evidence from Different Relevance Feedback Methods
- Cited by 16 (0 self)
It has been known that using different representations of a query retrieves different sets of documents.
Effective Profiling of Consumer Information Retrieval Needs: A Unified Framework and Empirical Comparison
- Cited by 10 (4 self)
Due to the overwhelming volume of information that is increasingly available, many people rely on current awareness systems to keep abreast of the latest developments in the fields that they are interested in, as evidenced in the popularity of subscriptions to news-monitoring and digital library services. The success of these services, however, often requires effective acquisition of users' personal standing interests as represented in personal profiles. Our objective in this paper is twofold. First, we have introduced a new method for profile generation and compared it against other well-known methods. We have found promising results. Second, although there are various methods proposed in information retrieval and machine learning literature to address the issue of profiling, a unified framework and systematic cross-system comparison to help users, especially service providers, to determine the most effective way of profiling consumers is still lacking in the literature. In this paper, we try to fill the gap by looking at these methods from a more integrated point of view based on statistical contingency theory. Variations of these methods are then systematically tested on three well-known routing systems and results are analyzed and reported.
Optimum Probability Estimation from Empirical Distributions
- Information Processing and Management, 1989
- Cited by 7 (4 self)
Probability estimation is important for the application of probabilistic models as well as for any evaluation in IR. We discuss the interdependencies between parameter estimation and certain properties of probabilistic models: dependence assumptions, binary vs. nonbinary features, estimation sample selection. Then we define an optimum estimate for binary features which can be applied to various typical estimation problems in IR. A method for computing this estimate using empirical data is described. Some experiments show the applicability of our method, whereas comparable approaches are partially based on false assumptions or yield biased estimates. 1 Parameter estimation in IR In IR the development of theoretical models and their evaluation in experiments is of equal importance: A model which cannot be evaluated (applied) is of very little use, while an evaluation can show its weaknesses and strengths and give evidence for further developments. As will be discussed below, any evaluation in IR involves some kind of parameter estimation, even for non-probabilistic models. So it is interesting to note that the problem of parameter estimation has been discussed only by a few authors ([Rijsbergen 77], [Robertson & Bovey 82], [Bookstein 83], [?]). In this paper, an attempt is

