Results 1 -
3 of
3
A Probabilistic Learning Approach for Document Indexing
- ACM TRANSACTIONS ON INFORMATION SYSTEMS
, 1991
"... We describe a method for probabilistic document indexing using relevance feedback data that has been collected from a set of queries. Our approach is based on three new concepts: (1) Abstraction from specific terms and documents, which overcomes the restriction of limited relevance information fo ..."
Abstract
-
Cited by 84 (12 self)
- Add to MetaCart
We describe a method for probabilistic document indexing using relevance feedback data that has been collected from a set of queries. Our approach is based on three new concepts: (1) Abstraction from specific terms and documents, which overcomes the restriction of limited relevance information for parameter estimation. (2) Flexibility of the representation, which allows the integration of new text analysis and knowledge-based methods in our approach as well as the consideration of document structures or different types of terms. (3) Probabilistic learning or classification methods for the estimation of the indexing weights making better use of the available relevance information. Our approach can be applied under restrictions that hold for real applications. We give experimental results for five test collections which show improvements over other indexing methods.
Models for retrieval with probabilistic indexing
- Information Processing and Management
, 1989
"... Abstract- in this article three retrieval models for probabilistic indexing are described along with evaluation results for each. First is the binary independence indexing @II) model, which is a generalized version of the Maron and Kuhns indexing model. In this model, the indexing weight of a descri ..."
Abstract
-
Cited by 78 (14 self)
- Add to MetaCart
Abstract- in this article three retrieval models for probabilistic indexing are described along with evaluation results for each. First is the binary independence indexing @II) model, which is a generalized version of the Maron and Kuhns indexing model. In this model, the indexing weight of a descriptor in a document is an estimate of the proba-bility of relevance of this document with respect to queries using this descriptor. Sec-ond is the retrieval-with-probabilistic-indexing (RPI) model, which is suited to different kinds of probabilistic indexing. For that we assume that each indexing scheme has its own concept of “correctness ” to which the probabilities relate. In addition to the prob-abilistic indexing weights, the RPI model provides the possibility of reIevance weight-ing of search terms. A third mode1 that is similar was proposed by Croft some years ago as an extension of the binary independence retrieval model but it can be shown that this model is not based on the probabilistic ranking principle. The probabilistic indexing weights required for any of these models can be provided by an application of the Darm-stadt indexing approach (DIA) for indexing with descriptors from a controlled vocabu-Iary. The experimental results show signi~cant improvements over retrieval with binary indexing. Finally, suggestions are made regarding how the DIA can be applied to prob-abilistic indexing with free text terms. 1.
Optimum Probability Estimation from Empirical Distributions
- Information Processing and Management
, 1989
"... Probability estimation is important for the application of probabilistic models as well as for any evaluation in IR. We discuss the interdependencies between parameter estimation and certain properties of probabilistic models: dependence assumptions, binary vs. nonbinary features, estimation sample ..."
Abstract
-
Cited by 7 (4 self)
- Add to MetaCart
Probability estimation is important for the application of probabilistic models as well as for any evaluation in IR. We discuss the interdependencies between parameter estimation and certain properties of probabilistic models: dependence assumptions, binary vs. nonbinary features, estimation sample selection. Then we define an optimum estimate for binary features which can be applied to various typical estimation problems in IR. A method for computing this estimate using empirical data is described. Some experiments show the applicability of our method, whereas comparable approaches are partially based on false assumptions or yield biased estimates. 1 Parameter estimation in IR In IR the development of theoretical models and their evaluation in experiments is of equal importance: A model which cannot be evaluated (applied) is of very little use, while an evaluation can show its weaknesses and strengths and give evidence for further developments. As will be discussed below, any evaluation in IR involves some kind of parameter estimation, even for non-probabilistic models. So it is interesting to note that the problem of parameter estimation has been discussed only by a few authors ( [Rijsbergen 77], [Robertson & Bovey 82], [Bookstein 83], [?]). In this paper, an attempt is

