A Probabilistic Learning Approach for Document Indexing
ACM Transactions on Information Systems, 1991
Cited by 84 (12 self)
We describe a method for probabilistic document indexing using relevance feedback data that has been collected from a set of queries. Our approach is based on three new concepts: (1) Abstraction from specific terms and documents, which overcomes the restriction of limited relevance information for parameter estimation. (2) Flexibility of the representation, which allows the integration of new text analysis and knowledge-based methods in our approach as well as the consideration of document structures or different types of terms. (3) Probabilistic learning or classification methods for the estimation of the indexing weights making better use of the available relevance information. Our approach can be applied under restrictions that hold for real applications. We give experimental results for five test collections which show improvements over other indexing methods.
A probabilistic framework for vague queries and imprecise information in databases
Proceedings of the 16th International Conference on Very Large Databases, 1990
Cited by 51 (11 self)
A probabilistic learning model for vague queries and missing or imprecise information in databases is described. Instead of retrieving only a set of answers, our approach yields a ranking of objects from the database in response to a query. By using relevance judgements from the user about the objects retrieved, the ranking for the actual query as well as the overall retrieval quality of the system can be further improved. For specifying different kinds of conditions in vague queries, the notion of vague predicates is introduced. Based on the underlying probabilistic model, imprecise or missing attribute values can also be treated easily. In addition, the corresponding formulas can be applied in combination with standard predicates (from two-valued logic), thus extending standard database systems to cope with missing or imprecise data.
AIR/X - a Rule-Based Multistage Indexing System for Large Subject Fields
Proceedings of RIAO'91, 1991
Cited by 46 (5 self)
AIR/X is a rule-based system for indexing with terms (descriptors) from a prescribed vocabulary. For this task, an indexing dictionary with rules for mapping terms from the text onto descriptors is required, which can be derived automatically from a set of manually indexed documents. Based on the Darmstadt Indexing Approach, the indexing task is divided into a description step and a decision step. First, terms (single words or phrases) are identified in the document text. With term-descriptor rules from the dictionary, descriptor indications are formed. The set of all indications from a document leading to the same descriptor is called a relevance description. A probabilistic classification procedure computes indexing weights for each relevance description. Since the whole system is rule-based, it can be adapted to different subject fields by appropriate modifications of the rule bases. A major application of AIR/X is the AIR/PHYS system developed for a large physics database. This application is described in more detail along with experimental results.
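The two-step structure described in this abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the rule table `RULES`, its weights, and the function names are hypothetical, and the decision step uses a simple noisy-OR combination as a stand-in, not the probabilistic classification procedure that AIR/X actually applies.

```python
from collections import defaultdict

# Hypothetical term-descriptor rules: term -> [(descriptor, rule weight)].
# In AIR/X such rules come from an indexing dictionary derived from
# manually indexed documents; these entries are invented for illustration.
RULES = {
    "neutron star": [("NEUTRON STARS", 0.9)],
    "pulsar": [("PULSARS", 0.8), ("NEUTRON STARS", 0.3)],
}


def description_step(text, rules):
    """Collect descriptor indications and group them by descriptor.

    The group of indications for one descriptor plays the role of the
    'relevance description' mentioned in the abstract.
    """
    lowered = text.lower()
    descriptions = defaultdict(list)
    for term, targets in rules.items():
        if term in lowered:  # naive substring match, for illustration only
            for descriptor, weight in targets:
                descriptions[descriptor].append((term, weight))
    return dict(descriptions)


def decision_step(descriptions):
    """Map each relevance description to an indexing weight.

    Assumption: noisy-OR over the rule weights, used here only as a
    placeholder for the real probabilistic classifier.
    """
    weights = {}
    for descriptor, indications in descriptions.items():
        miss = 1.0
        for _term, weight in indications:
            miss *= 1.0 - weight
        weights[descriptor] = 1.0 - miss
    return weights


descriptions = description_step("A pulsar is a rotating neutron star.", RULES)
weights = decision_step(descriptions)
```

Separating the two functions mirrors the abstract's point that the rule bases can be swapped for another subject field without touching the decision step.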
Probabilistic Information Retrieval as Combination of Abstraction, Inductive Learning and Probabilistic Assumptions
1994
Cited by 23 (1 self)
We show that former approaches in probabilistic information retrieval are based on one or two of the three concepts abstraction, inductive learning and probabilistic assumptions, and we propose a new approach which combines all three concepts. This approach is illustrated for the case of indexing with a controlled ...
Combining Model-Oriented and Description-Oriented Approaches for Probabilistic Indexing
Cited by 11 (6 self)
We distinguish model-oriented and description-oriented approaches in probabilistic information retrieval. The former refer to certain representations of documents and queries and use additional independence assumptions, whereas the latter map documents and queries onto feature vectors which form the input to certain classification procedures or regression methods. Description-oriented approaches are more flexible with respect to the underlying representations, but the definition of the feature vector is a heuristic step. In this paper, we combine a probabilistic model for the Darmstadt Indexing Approach with logistic regression. Here the probabilistic model forms a guideline for the definition of the feature vector. Experiments with the purely theoretical approach and with several heuristic variations show that heuristic assumptions may yield significant improvements.
From uncertain inference to probability of relevance for advanced IR applications
25th European Conference on Information Retrieval Research (ECIR 2003), 2003
Cited by 10 (4 self)
Uncertain inference is a probabilistic generalisation of the logical view on databases, ranking documents according to the probability that they logically imply the query. For tasks other than ad-hoc retrieval, estimates of the actual probability of relevance are required. In this paper, we investigate mapping functions between these two types of probability. For this purpose, we consider linear and logistic functions. The former have been proposed before, whereas we give a new theoretic justification for the latter. In a series of upper-bound experiments, we compare the goodness of fit of the two models. A second series of experiments investigates the effect on the resulting retrieval quality in the fusion step of distributed retrieval. These experiments show that good estimates of the actual probability of relevance can be achieved, and the logistic model outperforms the linear one. However, retrieval quality for distributed retrieval (only merging, without resource selection) is only slightly improved by using the logistic function.
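To make the two kinds of mapping function concrete, here is a minimal Python sketch of a linear and a logistic map from a probability of inference to an estimated probability of relevance. The parameter names (`c0`, `c1`, `b0`, `b1`), the clipping of the linear map to [0, 1], and the sample values are assumptions for illustration; the paper's exact parameterisation and fitted coefficients are not given in this abstract.

```python
import math


def linear_map(p_inference, c0, c1):
    """Linear mapping c0 + c1 * p, clipped to [0, 1] (clipping is an assumption)."""
    return min(1.0, max(0.0, c0 + c1 * p_inference))


def logistic_map(p_inference, b0, b1):
    """Logistic mapping 1 / (1 + exp(-(b0 + b1 * p)))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * p_inference)))


# Illustrative inputs and hypothetical coefficients, not fitted values.
scores = [0.0, 0.1, 0.5, 0.9]
linear = [linear_map(s, 0.0, 0.8) for s in scores]
logistic = [logistic_map(s, -4.0, 8.0) for s in scores]
```

Unlike the linear form, the logistic form always stays strictly between 0 and 1 and needs no clipping, which is one practical reason to prefer it when the output is treated as a probability.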
Automatic Indexing in Operation: The Rule-Based System AIR/X for Large Subject Fields
1993
Cited by 3 (0 self)
AIR/X is a rule-based system for automatic indexing with a controlled vocabulary. The indexing process consists of several stages, with specific rule bases involved in each stage. Most of these rule bases are constructed automatically, especially the large number of term-descriptor rules. We describe the different stages and the overall architecture of the system. Then we present a specific application, the AIR/PHYS system developed for a large physics database. We illustrate the system by giving a detailed example and present experimental results for different system parameter settings.
1 Introduction
The AIR/X system described in this paper performs automatic indexing with index terms (called descriptors here) from a controlled vocabulary. The texts to be indexed are abstracts written in English. The indexing process consists of several stages, with specific rule bases involved in each stage. In order to cope with large subject fields, appropriate rule bases have to be developed....
Predicting Retrieval Quality for Resource Selection in Distributed Information Retrieval
2002
Cited by 1 (1 self)
In a federated digital library system, it is too expensive to query every accessible library. Resource selection is the task of deciding which libraries a query should be passed to. Most existing resource selection algorithms compute a library ranking in a heuristic way. In this paper, we follow a different approach on a better theoretic foundation. The decision-theoretic approach tries to minimise the overall costs of the distributed retrieval. Costs here include, besides retrieval quality, time and money. We present different methods for estimating the retrieval quality of a library. We explore the relationship between the probability of inference and the probability of relevance, and approximate indexing weights with a normal distribution. We also evaluate the different methods on a large test-bed.
Ist Rd Project
This is the description of the MIND resource selection framework. It extends the decision-theoretic framework formerly developed by UNIDO by using a logistic function for modelling the relationship between score and probability of relevance, and by approximating the indexing weights (and, as a consequence, the document scores) by a normal distribution.

