Results 1 
2 of
2
Inferring probability of relevance using the method of logistic regression
 In Proceedings of ACM SIGIRâ€™94
, 1994
"... This research evaluates a model for probabilistic text and document retrieval; the model utilizes the technique of logistic regression to obtain equations which rank documents by probability of relevance as a function of document and query properties. Since the model infers probability of relevance ..."
Abstract

Cited by 43 (1 self)
 Add to MetaCart
This research evaluates a model for probabilistic text and document retrieval; the model utilizes the technique of logistic regression to obtain equations which rank documents by probability of relevance as a function of document and query properties. Since the model infers probability of relevance from statistical clues present in the texts of documents and queries, we call it logistic inference. By transforming the distribution of each statistical clue into its standardized distribution (one with mean v = O and standard deviation a = 1), the method allows one to apply logistic coefficients derived from a training collection to other document collections, with little loss of predictive power. The model is applied to three wellknown information retrieval test collections, and the results are compared directly to the particular vector space model of retrieval which uses termfrequency/inversedocumentfrequency (tfidf) weighting and the cosine similarity measure. In the comparison, the logistic inference method performs significantly better than (in two collections) or equally well as (in the third collection) the tfidf/cosine vector space model. The differences in performances of the two models were subjected to statistical tests to see if the differences are statistically significant or could have occurred by chance. 1.
Term Dependence: A Basis for Luhn and Zipf Models
, 2001
"... There are regularities in the statistical information provided by natural language terms about neighboring terms. We find that when phrase rank increases, moving from common to less common phrases, the value of the expected mutual information measure (EMIM) between the terms regularly decreases. ..."
Abstract

Cited by 7 (1 self)
 Add to MetaCart
There are regularities in the statistical information provided by natural language terms about neighboring terms. We find that when phrase rank increases, moving from common to less common phrases, the value of the expected mutual information measure (EMIM) between the terms regularly decreases. Luhn's model suggests that midrange terms are the best index terms and relevance discriminators. We suggest reasons for this principle based on the empirical relationships shown here between the rank of terms within phrases and the average mutual information between terms, which we refer to as the Inverse RepresentationEMIM principle. We also suggest an