AIR/X - a Rule-Based Multistage Indexing System for Large Subject Fields. Proceedings of RIAO'91, 1991.
Cited by 46 (5 self)
Abstract:
AIR/X is a rule-based system for indexing with terms (descriptors) from a prescribed vocabulary. For this task, an indexing dictionary with rules for mapping terms from the text onto descriptors is required, which can be derived automatically from a set of manually indexed documents. Based on the Darmstadt Indexing Approach, the indexing task is divided into a description step and a decision step. First, terms (single words or phrases) are identified in the document text. With term-descriptor rules from the dictionary, descriptor indications are formed. The set of all indications from a document leading to the same descriptor is called a relevance description. A probabilistic classification procedure computes indexing weights for each relevance description. Since the whole system is rule-based, it can be adapted to different subject fields by appropriate modifications of the rule bases. A major application of AIR/X is the AIR/PHYS system developed for a large physics database. This application is described in more detail along with experimental results.
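The description step summarized above (deriving term-descriptor rules from manually indexed documents, then grouping a document's descriptor indications into relevance descriptions) can be illustrated with a short Python sketch. It assumes documents have already been reduced to lists of terms, and it estimates each rule weight as the fraction of manually indexed documents containing term t that were also assigned descriptor s, z(t, s) = h(t, s) / f(t). This relative-frequency estimate, the `min_count` cutoff, the function names `derive_rules` and `relevance_descriptions`, and the toy physics descriptors are all assumptions for illustration, not necessarily the exact procedure used in AIR/X.

```python
from collections import Counter, defaultdict

# Minimal sketch of a Darmstadt-Indexing-Approach-style description step.
# ASSUMPTION: rule weights are relative frequencies z(t, s) = h(t, s) / f(t)
# estimated from manually indexed documents; all names are hypothetical.


def derive_rules(indexed_docs, min_count=2):
    """Build an indexing dictionary {term: {descriptor: z}} from
    (terms, descriptors) pairs of manually indexed documents."""
    f = Counter()  # f[t]: number of documents containing term t
    h = Counter()  # h[(t, s)]: documents containing t AND indexed with s
    for terms, descriptors in indexed_docs:
        for t in set(terms):
            f[t] += 1
            for s in set(descriptors):
                h[(t, s)] += 1
    rules = defaultdict(dict)
    for (t, s), count in h.items():
        if f[t] >= min_count:  # ASSUMPTION: skip terms seen too rarely
            rules[t][s] = count / f[t]
    return dict(rules)


def relevance_descriptions(doc_terms, rules):
    """Apply term-descriptor rules to one document's terms and group the
    resulting indications by descriptor: {descriptor: [(term, z), ...]}."""
    groups = defaultdict(list)
    for t in dict.fromkeys(doc_terms):  # unique terms, original order kept
        for s, z in rules.get(t, {}).items():
            groups[s].append((t, z))
    return dict(groups)


# Toy training data: (terms found in the text, manually assigned descriptors).
training = [
    (["laser", "plasma"], {"LASERS", "PLASMA"}),
    (["laser", "optics"], {"LASERS"}),
    (["plasma", "fusion"], {"PLASMA", "FUSION"}),
]
rules = derive_rules(training)
print(relevance_descriptions(["laser", "plasma"], rules))
```

Each key of the returned mapping corresponds to one relevance description: every indication that leads to the same descriptor is collected under it, ready to be passed to a decision step that computes an indexing weight.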

Probabilistic Information Retrieval as Combination of Abstraction, Inductive Learning and Probabilistic Assumptions, 1994.
Cited by 23 (1 self)
Abstract:
We show that previous approaches in probabilistic information retrieval are based on one or two of the three concepts of abstraction, inductive learning and probabilistic assumptions, and we propose a new approach which combines all three concepts. This approach is illustrated for the case of indexing with a controlled ...

Combining Model-Oriented and Description-Oriented Approaches for Probabilistic Indexing
Cited by 11 (6 self)
Abstract:
We distinguish model-oriented and description-oriented approaches in probabilistic information retrieval. The former refer to certain representations of documents and queries and use additional independence assumptions, whereas the latter map documents and queries onto feature vectors which form the input to certain classification procedures or regression methods. Description-oriented approaches are more flexible with respect to the underlying representations, but the definition of the feature vector is a heuristic step. In this paper, we combine a probabilistic model for the Darmstadt Indexing Approach with logistic regression. Here the probabilistic model forms a guideline for the definition of the feature vector. Experiments with the purely theoretical approach and with several heuristic variations show that heuristic assumptions may yield significant improvements.
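The description-oriented decision step described in this abstract can be sketched as logistic regression over a feature vector computed from each relevance description, here given as a list of (term, rule weight) pairs. This is a minimal illustration only: the three features (a constant bias term, the largest rule weight among the indications, and the number of indications), the plain batch gradient-ascent fitting, the toy training data, and all function names are assumptions, not the feature definition the paper derives from its probabilistic model.

```python
import math

# Minimal sketch of a description-oriented decision step: logistic
# regression mapping a feature vector of one relevance description to an
# indexing weight. ASSUMPTION: the features below are hypothetical.


def features(indications):
    """Feature vector for a relevance description given as [(term, z), ...]:
    bias, largest rule weight z, and number of indications."""
    zs = [z for _, z in indications]
    return [1.0, max(zs), float(len(zs))]


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Fit weights by batch gradient ascent on the mean log-likelihood.
    y[i] is 1 if the descriptor was assigned manually, else 0."""
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            for j, xj in enumerate(xi):
                grad[j] += (yi - p) * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w


def indexing_weight(w, x):
    """Estimated probability that the descriptor should be assigned."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))


# Toy training set: relevance descriptions labeled by manual indexing.
X = [
    features([("laser", 0.9), ("plasma", 0.8)]),
    features([("optics", 0.8), ("laser", 0.7)]),
    features([("fusion", 0.1)]),
    features([("optics", 0.2)]),
]
y = [1, 1, 0, 0]
w = fit_logistic(X, y)
print(indexing_weight(w, features([("laser", 0.85), ("optics", 0.5)])))
```

Each relevance description becomes one training example, labeled by whether the descriptor was assigned manually; the fitted model then yields an indexing weight between 0 and 1 for unseen relevance descriptions. A library implementation of logistic regression could replace the hand-written gradient loop; it is spelled out here only to keep the sketch self-contained.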

Automatic Indexing in Operation: The Rule-Based System AIR/X for Large Subject Fields, 1993.
Cited by 3 (0 self)
Abstract:
AIR/X is a rule-based system for automatic indexing with a controlled vocabulary. The indexing process consists of several stages, with specific rule bases involved in each stage. Most of these rule bases are constructed automatically, especially the large number of term-descriptor rules. We describe the different stages and the overall architecture of the system. Then we present a specific application, the AIR/PHYS system developed for a large physics database. We illustrate the system by giving a detailed example and present experimental results for different system parameter settings.

1 Introduction

The AIR/X system described in this paper performs automatic indexing with index terms (called descriptors here) from a controlled vocabulary. The texts to be indexed are abstracts written in English. The indexing process consists of several stages, with specific rule bases involved in each stage. In order to cope with large subject fields, appropriate rule bases have to be developed....

