Models for retrieval with probabilistic indexing
- Information Processing and Management
, 1989
Cited by 78 (14 self)
In this article, three retrieval models for probabilistic indexing are described along with evaluation results for each. First is the binary independence indexing (BII) model, which is a generalized version of the Maron and Kuhns indexing model. In this model, the indexing weight of a descriptor in a document is an estimate of the probability of relevance of this document with respect to queries using this descriptor. Second is the retrieval-with-probabilistic-indexing (RPI) model, which is suited to different kinds of probabilistic indexing. For that we assume that each indexing scheme has its own concept of “correctness” to which the probabilities relate. In addition to the probabilistic indexing weights, the RPI model provides the possibility of relevance weighting of search terms. A third model that is similar was proposed by Croft some years ago as an extension of the binary independence retrieval model, but it can be shown that this model is not based on the probabilistic ranking principle. The probabilistic indexing weights required for any of these models can be provided by an application of the Darmstadt indexing approach (DIA) for indexing with descriptors from a controlled vocabulary. The experimental results show significant improvements over retrieval with binary indexing. Finally, suggestions are made regarding how the DIA can be applied to probabilistic indexing with free text terms.
AIR/X - a Rule-Based Multistage Indexing System for Large Subject Fields
- Proceedings of RIAO'91
, 1991
Cited by 46 (5 self)
AIR/X is a rule-based system for indexing with terms (descriptors) from a prescribed vocabulary. For this task, an indexing dictionary with rules for mapping terms from the text onto descriptors is required, which can be derived automatically from a set of manually indexed documents. Based on the Darmstadt Indexing Approach, the indexing task is divided into a description step and a decision step. First, terms (single words or phrases) are identified in the document text. With term-descriptor rules from the dictionary, descriptor indications are formed. The set of all indications from a document leading to the same descriptor is called a relevance description. A probabilistic classification procedure computes indexing weights for each relevance description. Since the whole system is rule-based, it can be adapted to different subject fields by appropriate modifications of the rule bases. A major application of AIR/X is the AIR/PHYS system developed for a large physics database. This application is described in more detail along with experimental results.
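The two-step structure described in this abstract (a description step that collects descriptor indications, then a decision step that turns each relevance description into an indexing weight) can be sketched roughly as follows. This is only an illustrative sketch, not the AIR/X implementation: the rule table `RULES`, its weights, the function names, and the substring matching are all hypothetical, and the noisy-OR combination in `decide` is a simple stand-in for the probabilistic classification procedure, whose details the abstract does not give.

```python
from collections import defaultdict

# Hypothetical term-descriptor rules: term -> [(descriptor, rule weight)].
# In AIR/X such rules come from an indexing dictionary; these values are made up.
RULES = {
    "laser": [("LASERS", 0.8), ("OPTICS", 0.3)],
    "optical fiber": [("FIBRE OPTICS", 0.9), ("OPTICS", 0.4)],
}


def describe(text, rules):
    """Description step: find dictionary terms in the text and group the
    resulting descriptor indications by descriptor. Each group plays the
    role of a relevance description."""
    lowered = text.lower()
    descriptions = defaultdict(list)
    for term, targets in rules.items():
        if term in lowered:  # naive substring match, an assumption of this sketch
            for descriptor, weight in targets:
                descriptions[descriptor].append((term, weight))
    return dict(descriptions)


def decide(relevance_description):
    """Decision step: map one relevance description to an indexing weight.
    Noisy-OR is used here as a placeholder, not the procedure used by AIR/X."""
    miss = 1.0
    for _term, weight in relevance_description:
        miss *= 1.0 - weight
    return 1.0 - miss


def index_document(text, rules=RULES):
    """Run both steps and return {descriptor: indexing weight}."""
    return {d: decide(rd) for d, rd in describe(text, rules).items()}


if __name__ == "__main__":
    for descriptor, weight in sorted(index_document(
            "A laser coupled into an optical fiber").items()):
        print(f"{descriptor}: {weight:.2f}")
```

Keeping the two steps as separate functions mirrors the division in the abstract: the rule base can be swapped for another subject field without touching the decision step.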
Automatic Indexing in Operation: The Rule-Based System AIR/X for Large Subject Fields
, 1993
Cited by 3 (0 self)
AIR/X is a rule-based system for automatic indexing with a controlled vocabulary. The indexing process consists of several stages, with specific rule bases involved in each stage. Most of these rule bases are constructed automatically, especially the large number of term-descriptor rules. We describe the different stages and the overall architecture of the system. Then we present a specific application, the AIR/PHYS system developed for a large physics database. We illustrate the system by giving a detailed example and present experimental results for different system parameter settings.
1 Introduction
The AIR/X system described in this paper performs an automatic indexing with index terms (called descriptors here) from a controlled vocabulary. The texts to be indexed are abstracts written in English. The indexing process consists of several stages, with specific rule bases involved in each stage. In order to cope with large subject fields, appropriate rule bases have to be developed....
An Information Retrieval View of Environmental Information Systems
, 1997
Cited by 2 (1 self)
In the design of future environmental systems, the semantics of the data as well as the kind of queries to those systems have to be considered. Environmental data is frequently uncertain and incomplete. Heterogeneous data structures as well as multimedia data have to be managed by the system. For interactive queries, the system should allow vague queries and query formulations that are independent of the specific structure of the data and its representation. For vague queries and imprecise data, methods developed in information retrieval can be applied. Heterogeneous data structures can be handled with concepts from object-oriented database management systems. In multimedia information systems, the problem of full integration of the different media is as yet unsolved, especially when the information a user searches for is stored in different media. We claim that the retrieval interface offered by current database management systems is not sufficient for interactive use. In addition, funct...

