We describe a method for probabilistic document indexing that uses relevance feedback data collected from a set of queries. Our approach is based on three new concepts: (1) Abstraction from specific terms and documents, which overcomes the restriction of limited relevance information for parameter estimation. (2) Flexibility of the representation, which allows new text analysis and knowledge-based methods to be integrated into our approach, and allows document structures or different types of terms to be taken into account. (3) Probabilistic learning or classification methods for estimating the indexing weights, which make better use of the available relevance information. Our approach can be applied under the restrictions that hold for real applications. We give experimental results for five test collections, which show improvements over other indexing methods.
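To make the three concepts more concrete, the following is a minimal sketch, not the method evaluated in the paper. It assumes that each (term, document) pair is described by a few term-independent features, that relevance feedback supplies a binary label per pair, and that logistic regression fitted by gradient descent stands in for the "probabilistic learning or classification method". The feature set, the function names (`features`, `fit`, `indexing_weight`), and the toy training data are all hypothetical.

```python
import math

def features(tf, max_tf, df, n_docs, in_title):
    """Describe a (term, document) pair without naming the term or document.

    Hypothetical feature set: any other description could be plugged in here.
    """
    return [
        1.0,                                        # bias
        tf / max_tf,                                # normalized within-document frequency
        math.log(n_docs / df) / math.log(n_docs),   # normalized inverse document frequency
        1.0 if in_title else 0.0,                   # does the term occur in the title?
    ]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit(samples, epochs=500, lr=0.5):
    """Fit logistic regression by batch gradient descent.

    samples: list of (feature_vector, label) pairs, where label is 1 if
    relevance feedback marked the pair as a correct assignment, else 0.
    """
    w = [0.0] * len(samples[0][0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in samples:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for i, xi in enumerate(x):
                grad[i] += err * xi
        w = [wi - lr * g / len(samples) for wi, g in zip(w, grad)]
    return w

def indexing_weight(w, x):
    """Estimated probability that the term is a correct index term for the document."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

# Toy, invented training data pooled over many terms and documents.
TRAIN = [
    (features(5, 5, 10, 1000, True), 1),
    (features(4, 6, 20, 1000, True), 1),
    (features(3, 4, 50, 1000, False), 1),
    (features(1, 8, 800, 1000, False), 0),
    (features(1, 10, 500, 1000, False), 0),
    (features(2, 9, 900, 1000, False), 0),
]

if __name__ == "__main__":
    w = fit(TRAIN)
    print(indexing_weight(w, features(3, 5, 30, 1000, True)))
```

Because the learner sees only feature vectors, evidence from all terms and documents is pooled into one training set, which is the point of abstracting from specific terms and documents. Changing the representation means changing only `features`.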