Results 1 -
8 of
8
Probabilistic Models in Information Retrieval
- The Computer Journal
, 1992
"... In this paper, an introduction and survey over probabilistic information retrieval (IR) is given. First, the basic concepts of this approach are described: the probability ranking principle shows that optimum retrieval quality can be achieved under certain assumptions; a conceptual model for IR alon ..."
Abstract
-
Cited by 87 (4 self)
- Add to MetaCart
In this paper, an introduction and survey over probabilistic information retrieval (IR) is given. First, the basic concepts of this approach are described: the probability ranking principle shows that optimum retrieval quality can be achieved under certain assumptions; a conceptual model for IR along with the corresponding event space clarify the interpretation of the probabilistic parameters involved. For the estimation of these parameters, three different learning strategies are distinguished, namely query-related, document-related and description-related learning. As a representative for each of these strategies, a specific model is described. A new approach regards IR as uncertain inference; here, imaging is used as a new technique for estimating the probabilistic parameters, and probabilistic inference networks support more complex forms of inference. Finally, the more general problems of parameter estimation, query expansion and the development of models for advanced document representations are discussed.
A Probabilistic Learning Approach for Document Indexing
- ACM TRANSACTIONS ON INFORMATION SYSTEMS
, 1991
"... We describe a method for probabilistic document indexing using relevance feedback data that has been collected from a set of queries. Our approach is based on three new concepts: (1) Abstraction from specific terms and documents, which overcomes the restriction of limited relevance information fo ..."
Abstract
-
Cited by 84 (12 self)
- Add to MetaCart
We describe a method for probabilistic document indexing using relevance feedback data that has been collected from a set of queries. Our approach is based on three new concepts: (1) Abstraction from specific terms and documents, which overcomes the restriction of limited relevance information for parameter estimation. (2) Flexibility of the representation, which allows the integration of new text analysis and knowledge-based methods in our approach as well as the consideration of document structures or different types of terms. (3) Probabilistic learning or classification methods for the estimation of the indexing weights making better use of the available relevance information. Our approach can be applied under restrictions that hold for real applications. We give experimental results for five test collections which show improvements over other indexing methods.
Models for retrieval with probabilistic indexing
- Information Processing and Management
, 1989
"... Abstract- in this article three retrieval models for probabilistic indexing are described along with evaluation results for each. First is the binary independence indexing @II) model, which is a generalized version of the Maron and Kuhns indexing model. In this model, the indexing weight of a descri ..."
Abstract
-
Cited by 78 (14 self)
- Add to MetaCart
Abstract- in this article three retrieval models for probabilistic indexing are described along with evaluation results for each. First is the binary independence indexing @II) model, which is a generalized version of the Maron and Kuhns indexing model. In this model, the indexing weight of a descriptor in a document is an estimate of the proba-bility of relevance of this document with respect to queries using this descriptor. Sec-ond is the retrieval-with-probabilistic-indexing (RPI) model, which is suited to different kinds of probabilistic indexing. For that we assume that each indexing scheme has its own concept of “correctness ” to which the probabilities relate. In addition to the prob-abilistic indexing weights, the RPI model provides the possibility of reIevance weight-ing of search terms. A third mode1 that is similar was proposed by Croft some years ago as an extension of the binary independence retrieval model but it can be shown that this model is not based on the probabilistic ranking principle. The probabilistic indexing weights required for any of these models can be provided by an application of the Darm-stadt indexing approach (DIA) for indexing with descriptors from a controlled vocabu-Iary. The experimental results show signi~cant improvements over retrieval with binary indexing. Finally, suggestions are made regarding how the DIA can be applied to prob-abilistic indexing with free text terms. 1.
A probabilistic framework for vague queries and imprecise information in databases
- PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON VERY LARGE DATABASES
, 1990
"... A probabilistic learning model for vague queries and missing or imprecise information in databases is described. Instead of retrieving only a set of answers, our approach yields a ranking of objects from the database in response to a query. By using relevance judgements from the user about the objec ..."
Abstract
-
Cited by 51 (11 self)
- Add to MetaCart
A probabilistic learning model for vague queries and missing or imprecise information in databases is described. Instead of retrieving only a set of answers, our approach yields a ranking of objects from the database in response to a query. By using relevance judgements from the user about the objects retrieved, the ranking for the actual query as well as the overall retrieval quality of the system can be further improved. For specifying different kinds of conditions in vague queries, the notion of vague pred-icates is introduced. Based on the underlying probabilistic model, also imprecise or missing attribute values can be treated easily. In addition, the corresponding formulas can be applied in combination with standard predicates (from two-valued logic), thus extending standard database systems for coping with missing or imprecise data.
Automatic Indexing Based on Bayesian Inference Networks
- IN PROC 16TH ANN INT ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR'93
, 1993
"... In this paper, a Bayesian inference network model for automatic indexing with index terms (descriptors) from a prescribed vocabulary is presented. It requires an indexing dictionary with rules mapping terms of the respective subject field onto descriptors and inverted lists for terms occuring in a s ..."
Abstract
-
Cited by 48 (0 self)
- Add to MetaCart
In this paper, a Bayesian inference network model for automatic indexing with index terms (descriptors) from a prescribed vocabulary is presented. It requires an indexing dictionary with rules mapping terms of the respective subject field onto descriptors and inverted lists for terms occuring in a set of documents of the subject field and descriptors manually assigned to these documents. The indexing dictionary can be derived automatically from a set of manually indexed documents. An application of the network model is described, followed by an indexing example and some experimental results about the indexing performance of the network model.
AIR/X - a Rule-Based Multistage Indexing System for Large Subject Fields
- Proceedings of RIAO'91
, 1991
"... AIR/X is a rule-based system for indexing with terms (descriptors) from a prescribed vocabulary. For this task, an indexing dictionary with rules for mapping terms from the text onto descriptors is required, which can be derived automatically from a set of manually indexed documents. Based on the ..."
Abstract
-
Cited by 46 (5 self)
- Add to MetaCart
AIR/X is a rule-based system for indexing with terms (descriptors) from a prescribed vocabulary. For this task, an indexing dictionary with rules for mapping terms from the text onto descriptors is required, which can be derived automatically from a set of manually indexed documents. Based on the Darmstadt Indexing Approach, the indexing task is divided into a description step and a decision step. First, terms (single words or phrases) are identified in the document text. With term-descriptor rules from the dictionary, descriptor indications are formed. The set of all indications from a document leading to the same descriptor is called a relevance description. A probabilistic classification procedure computes indexing weights for each relevance description. Since the whole system is rule-based, it can be adapted to different subject fields by appropriate modifications of the rule bases. A major application of AIR/X is the AIR/PHYS system developed for a large physics database. This application is described in more detail along with experimental results.
Combining Model-Oriented and Description-Oriented Approaches for Probabilistic Indexing
"... We distinguish model-oriented and description-oriented approaches in probabilistic information retrieval. The former refer to certain representations of documents and queries and use additional independence assumptions, whereas the latter map documents and queries onto feature vectors which form the ..."
Abstract
-
Cited by 11 (6 self)
- Add to MetaCart
We distinguish model-oriented and description-oriented approaches in probabilistic information retrieval. The former refer to certain representations of documents and queries and use additional independence assumptions, whereas the latter map documents and queries onto feature vectors which form the input to certain classification procedures or regression methods. Descriptionoriented approaches are more flexible with respect to the underlying representations, but the definition of the feature vector is a heuristic step. In this paper, we combine a probabilistic model for the Darmstadt Indexing Approach with logistic regression. Here the probabilistic model forms a guideline for the definition of the feature vector. Experiments with the purely theoretical approach and with several heuristic variations show that heuristic assumptions may yield significant improvements.
Automatic Indexing in Operation: The Rule-Based System AIR/X for Large Subject Fields
, 1993
"... AIR/X is a rule-based system for automatic indexing with a controlled vocabulary. The indexing process consists of several stages, with specific rule bases involved in each stage. Most of these rule bases are constructed automatically, especially the large number of term-descriptor rules. We describ ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
AIR/X is a rule-based system for automatic indexing with a controlled vocabulary. The indexing process consists of several stages, with specific rule bases involved in each stage. Most of these rule bases are constructed automatically, especially the large number of term-descriptor rules. We describe the different stages and the overall architecture of the system. Then we present a specific application, the AIR/PHYS system developed for a large physics database. We illustrate the system by giving a detailed example and present experimental results for different system parameter settings. 1 Introduction The AIR/X system described in this paper performs an automatic indexing with index terms (called descriptors here) from a controlled vocabulary. The texts to be indexed are abstracts written in English. The indexing process consists of several stages, with specific rule bases involved in each stage. In order to cope with large subject fields, appropriate rule bases have to be developed....

