## A Theory of Term Weighting Based on Exploratory Data Analysis (1998)

Venue: | Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval |

Citations: | 41 - 1 self |

### BibTeX

@INPROCEEDINGS{Greiff98atheory,

author = {Warren R. Greiff},

title = {A Theory of Term Weighting Based on Exploratory Data Analysis},

booktitle = {Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval},

year = {1998},

pages = {11--19},

publisher = {ACM Press}

}

### Abstract

Techniques of exploratory data analysis are used to study the weight of evidence that the occurrence of a query term provides in support of the hypothesis that a document is relevant to an information need. In particular, the relationship between the document frequency and the weight of evidence is investigated. A correlation between document frequency normalized by collection size and the mutual information between relevance and term occurrence is uncovered. This correlation is found to be robust across a variety of query sets and document collections. Based on this relationship, a theoretical explanation of the efficacy of inverse document frequency for term weighting is developed which differs in both style and content from theories previously put forth. The theory predicts that a "flattening" of idf at both low and high frequency should result in improved retrieval performance. This altered idf formulation is tested on all TREC query sets. Retrieval results corroborate the predicti...
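As a rough illustration of the quantities named in the abstract, the sketch below computes plain idf from document frequency normalized by collection size, a "flattened" variant, and the log-ratio log p(rel|occ)/p(rel) that the paper refers to as mutual information. The cut-off values `low` and `high` and the simple clipping form are assumptions made for illustration only; the abstract does not give the paper's actual flattened formulation, and the function names are hypothetical.

```python
import math


def idf(df, n_docs):
    """Plain inverse document frequency: -log(df / N)."""
    return -math.log(df / n_docs)


def flattened_idf(df, n_docs, low=1e-4, high=1e-1):
    """idf held constant for very rare and very common terms.

    `low` and `high` are hypothetical cut-offs on df / N, not values
    taken from the paper.
    """
    p = df / n_docs
    p_clipped = min(max(p, low), high)
    return -math.log(p_clipped)


def mi_occ_rel(n_rel_occ, n_occ, n_rel, n_docs):
    """log( p(rel | occ) / p(rel) ), estimated from raw counts."""
    p_rel_given_occ = n_rel_occ / n_occ
    p_rel = n_rel / n_docs
    return math.log(p_rel_given_occ / p_rel)
```

With these assumed cut-offs, terms occurring in fewer than 0.01% or more than 10% of documents all receive the same weight as a term exactly at the corresponding cut-off, while terms in between keep their ordinary idf.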

### Citations

3389 |
Introduction to Modern Information Retrieval
- Salton, McGill
- 1983
Citation Context ...ishing documents of the collection from each other. Information theoretic considerations have also been used. In early work, information theory was used to derive a weight based on signal-noise ratio [22]. In [26], Wong and Yao develop a term weighting theory based on the entropy of a term's distribution in the collection. They show that idf weighting is easily derived as a special case of their more ... |

1074 |
Exploratory Data Analysis
- Tukey
- 1977
Citation Context ...ction of large quantities of data to a few summary statistics erases most of the message the data have for us. EDA embodies a set of useful methods and strategies, fomented primarily by John W. Tukey [24]. For example, techniques for data smoothing and re-expression of variables have been used in the study presented in this article. 3 Related Work In 1972, Sparck Jones convincingly demonstrated that ... |

648 |
Relevance weighting of search terms
- Robertson, Jones, et al.
- 1976
Citation Context ...lity of mutual occurrence of multiple query terms [17]; thus providing theoretical arguments for the use of $w_{sj}$. Together, in 1976, Robertson and Sparck Jones presented the Binary Independence Model [18], in which terms are weighted by: $w_{rsj} = \log \frac{p(occ \mid rel) \cdot (1 - p(occ \mid \overline{rel}))}{(1 - p(occ \mid rel)) \cdot p(occ \mid \overline{rel})}$ (4) where $p(occ \mid rel)$ is the probability of the term occurring in relevant d... |

354 | The INQUERY retrieval system
- Callan, Croft, et al.
- 1992
Citation Context ...s. The group at Berkeley has conducted extensive research into the use of logistic regression [10, 4]. Logistic regression is generally considered a natural approach for estimating a probability. The [0, 1] range that can be assumed by a probability does not correspond to other regression models, but is accounted for in logistic regression. Also, normality assumptions which are often behind the statisti... |

195 |
Overview of the first text retrieval conference
- Harman
- 1993
Citation Context ... of the term as evidence in favor of relevance. The study involved data from queries 051-100 from the first Text REtrieval Conference (TREC) and the Associated Press (AP) documents from TREC volume 1 [13]. Each data point corresponds to one query term. The query terms were taken from the concepts field of the TREC 1 topics. For the purposes of uncovering underlying statistical regularities, we wanted ... |

174 |
Using probabilistic models of document retrieval without relevance information
- Croft, Harper
- 1979
Citation Context ...ailability of relevance feedback information, on which estimates of the two conditional probabilities can be based. Applying the probabilistic approach of Robertson and Sparck Jones, Croft and Harper [5] work with an equivalent formulation of $w_{rsj}$: $w_{rsj} = \log \frac{p(occ \mid rel)}{1 - p(occ \mid rel)} - \log \frac{p(occ \mid \overline{rel})}{1 - p(occ \mid \overline{rel})}$ (5) Their goal is the development of a probabilistically justified we... |

151 | Using statistics in lexical analysis
- Church, Gale, et al.
- 1991
Citation Context ...as connections to information theory. Often referred to as mutual information, it has been used as a measure of variable dependence in both information retrieval [25, 6] and computational linguistics [2]. In a very important sense, it can be taken as a measure of the information about one event provided by the occurrence of another [7]. In our context, it can be taken as a measure of the information abo... |

140 |
Transmission of Information: A Statistical Theory of Communication
- Fano
- 1961
Citation Context ... both information retrieval [25, 6] and computational linguistics [2]. In a very important sense, it can be taken as a measure of the information about one event provided by the occurrence of another [7]. In our context, it can be taken as a measure of the information about relevance provided by the occurrence of a query term. In what follows, we shall adopt the notation, MI(occ; rel) for this quantity,... |

116 |
Probability and the Weighing of Evidence
- Good
- 1950
Citation Context ...and compare and contrast it with the work presented here. 2 Weight of Evidence & EDA Weight of Evidence I. J. Good formally defines the weight in favor of a hypothesis, h, provided by evidence, e, as [12, 11]: $woe(h : e) = \log \frac{O(h \mid e)}{O(h)}$ (1) which he thinks is a concept "almost as important as that of probability itself" [11, p. 249]. Good elucidates simple, natural desiderata for the formalization of t... |

105 | Overview of the fifth text retrieval conference
- Voorhees, Harman
- 1996
Citation Context ...hat accounts for the observed flattening. To test this prediction, we compared retrieval performance of two versions of the INQUERY IR system [1] on each of the ad-hoc tasks for TREC 1 through TREC 6 [14]. Queries were formed by taking all words from both the title and description. All stopwords were removed, as were all duplicates. The baseline system used pure idf term weighting with idf = -lo... |

67 |
On relevance weights with little relevance information
- Robertson, Walker
- 1997
Citation Context ... constant, corresponding to the log-odds of a term occurring in a relevant document. The second component is essentially equivalent to (3) for all but very high frequency terms. Robertson and Walker [19] have recently looked anew at the combination match weight, $w_{ch}$. They point out two "anomalies" of the Croft/Harper weights. One is that the probability of a term occurring in a relevant document mus... |

63 |
A statistical interpretation of term specificity and its application in retrieval
- Sparck-Jones
- 1972
Citation Context ...is the development of an explanatory theory of information retrieval. 1 Introduction In 1972, Sparck Jones demonstrated that document frequency can be used effectively for the weighting of query terms [23]. Ever since, formulations of inverse document frequency have played a key role in information retrieval research. In this paper a theory of why inverse document frequency has been so effective is dev... |

61 |
Full Text Retrieval based on Probabilistic Equations with Coefficients fitted by Logistic Regression
- Cooper, Chen, et al.
- 1994
Citation Context ...ion to determine coefficients for a polynomial weighting function of term-document pair descriptor variables. The group at Berkeley has conducted extensive research into the use of logistic regression [10, 4]. Logistic regression is generally considered a natural approach for estimating a probability. The [0, 1] range that can be assumed by a probability does not correspond to other regression models, but... |

57 |
Evaluation of feedback in document retrieval using co-occurrence data
- Harper, Rijsbergen
- 1978
Citation Context ... correlated, $woe(rel : occ_2)$ will be greater than $woe(rel : occ_2 \mid occ_1)$. It is generally accepted that interdependence of query terms has a noticeable impact on the effectiveness of term weighting [15, 25, 10]. Since, to date, we have made no attempt to model the influence of term dependence, determination of a precise function for estimation of $woe(rel : occ)$ is not indicated. What we look for, instead, i... |

46 |
Probabilistic retrieval based on staged logistic regression
- Cooper, Gey, et al.
Citation Context ...l techniques to fit the model to available data. In 1983, Fox used multiple regression analysis to derive an equation for predicting the probability that a document will be judged relevant to a query [3]. In [28], Yu and Mizuno use linear regression to determine parameter settings for both a binary and non-binary model. Fuhr and Buckley [9, 8] have used a least-square error criterion to determine coe... |

44 | Inferring probability of relevance using the method of logistic regression
- Gey
- 1994
Citation Context ...ion to determine coefficients for a polynomial weighting function of term-document pair descriptor variables. The group at Berkeley has conducted extensive research into the use of logistic regression [10, 4]. Logistic regression is generally considered a natural approach for estimating a probability. The [0, 1] range that can be assumed by a probability does not correspond to other regression models, but... |

35 |
Optimum polynomial retrieval functions based on the probability ranking principle
- Fuhr
- 1989
Citation Context ...obability that a document will be judged relevant to a query [3]. In [28], Yu and Mizuno use linear regression to determine parameter settings for both a binary and non-binary model. Fuhr and Buckley [9, 8] have used a least-square error criterion to determine coefficients for a polynomial weighting function of term-document pair descriptor variables. The group at Berkeley has conducted extensive researc... |

28 |
Weight of evidence: a brief survey
- Good
- 1985
Citation Context ...and compare and contrast it with the work presented here. 2 Weight of Evidence & EDA Weight of Evidence I. J. Good formally defines the weight in favor of a hypothesis, h, provided by evidence, e, as [12, 11]: $woe(h : e) = \log \frac{O(h \mid e)}{O(h)}$ (1) which he thinks is a concept "almost as important as that of probability itself" [11, p. 249]. Good elucidates simple, natural desiderata for the formalization of t... |

24 |
Probabilistic Document Indexing from Relevance Feedback Data
- Fuhr, Buckley
- 1990
Citation Context ...obability that a document will be judged relevant to a query [3]. In [28], Yu and Mizuno use linear regression to determine parameter settings for both a binary and non-binary model. Fuhr and Buckley [9, 8] have used a least-square error criterion to determine coefficients for a polynomial weighting function of term-document pair descriptor variables. The group at Berkeley has conducted extensive researc... |

21 | Exploratory Data Analysis - Hartwig, Dearing - 1979 |

20 | Corpus-Specific Stemming using Word Form Co-occurrences
- Croft, Xu
- 1995
Citation Context ...· p(occ) = $\log \frac{p(rel \mid occ)}{p(rel)}$ has connections to information theory. Often referred to as mutual information, it has been used as a measure of variable dependence in both information retrieval [25, 6] and computational linguistics [2]. In a very important sense, it can be taken as a measure of the information about one event provided by the occurrence of another [7]. In our context, it can be take... |

14 |
Automatic indexing using term discrimination and term precision measurements
- Salton, Wong, et al.
- 1976
Citation Context ... in this same period, Salton and coworkers reported both theoretical and empirical work on a ranking formula based on what they called term precision. In earlier papers, term precision was defined as [20]: $w_{tp} = \frac{p(occ \mid rel)}{1 - p(occ \mid rel)} \Big/ \frac{p(occ \mid \overline{rel})}{1 - p(occ \mid \overline{rel})}$ Later, term precision was defined as the log of this quantity [21, 27], yielding the same weight as given by Robertson and Sparc... |

12 |
Term weighting in information retrieval using the term precision model
- Yu, Lam, et al.
- 1982
Citation Context ...m precision. In earlier papers, term precision was defined as [20]: $w_{tp} = \frac{p(occ \mid rel)}{1 - p(occ \mid rel)} \Big/ \frac{p(occ \mid \overline{rel})}{1 - p(occ \mid \overline{rel})}$ Later, term precision was defined as the log of this quantity [21, 27], yielding the same weight as given by Robertson and Sparck Jones (eq. 4). The form they adopt for what amounts to $p(occ \mid rel)$ differs from that of both Croft/Harper and Robertson/Walker. The term prec... |

11 |
Declarative specifications
- Fuchs, Robertson
- 1996
Citation Context ...tation, Robertson pointed out that, viewed as a function of the probability of term occurrence, the sum of weights could be interpreted as the probability of mutual occurrence of multiple query terms [17]; thus providing theoretical arguments for the use of $w_{sj}$. Together, in 1976, Robertson and Sparck Jones presented the Binary Independence Model [18], in which terms are weighted by: $w_{rsj}$ = log p(occ... |

11 |
The measurement of term importance in automatic indexing
- Salton, Wu, et al.
Citation Context ...m precision. In earlier papers, term precision was defined as [20]: $w_{tp} = \frac{p(occ \mid rel)}{1 - p(occ \mid rel)} \Big/ \frac{p(occ \mid \overline{rel})}{1 - p(occ \mid \overline{rel})}$ Later, term precision was defined as the log of this quantity [21, 27], yielding the same weight as given by Robertson and Sparck Jones (eq. 4). The form they adopt for what amounts to $p(occ \mid rel)$ differs from that of both Croft/Harper and Robertson/Walker. The term prec... |

11 |
An information-theoretic measure of term specificity
- Wong, Yao
- 1992
Citation Context ...cuments of the collection from each other. Information theoretic considerations have also been used. In early work, information theory was used to derive a weight based on signal-noise ratio [22]. In [26], Wong and Yao develop a term weighting theory based on the entropy of a term's distribution in the collection. They show that idf weighting is easily derived as a special case of their more general w... |

8 |
Two learning schemes in information retrieval
- Yu, Mizuno
- 1988
Citation Context ...ques to fit the model to available data. In 1983, Fox used multiple regression analysis to derive an equation for predicting the probability that a document will be judged relevant to a query [3]. In [28], Yu and Mizuno use linear regression to determine parameter settings for both a binary and non-binary model. Fuhr and Buckley [9, 8] have used a least-square error criterion to determine coefficients... |
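
Several of the citation contexts above quote weights built from odds: Good's weight of evidence (eq. 1) and the Robertson/Sparck Jones weight in the log-odds-difference form used by Croft and Harper (eq. 5). The following is a minimal sketch of those two formulas, assuming the second conditional probability in eq. 5 is the probability of occurrence in non-relevant documents (the extracted text has lost the notation that distinguishes it). The function names are hypothetical and the probabilities are taken as given rather than estimated from data.

```python
import math


def odds(p):
    """Convert a probability to odds, O = p / (1 - p)."""
    return p / (1.0 - p)


def weight_of_evidence(p_h_given_e, p_h):
    """Good's woe(h : e) = log( O(h | e) / O(h) )."""
    return math.log(odds(p_h_given_e) / odds(p_h))


def rsj_weight(p_occ_rel, p_occ_nonrel):
    """Log odds of occurrence in relevant documents minus log odds of
    occurrence in non-relevant documents (assumed reading of eq. 5)."""
    return math.log(odds(p_occ_rel)) - math.log(odds(p_occ_nonrel))
```

Evidence that leaves the odds of the hypothesis unchanged has weight zero, and a term that is equally likely to occur in relevant and non-relevant documents likewise receives a weight of zero.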