Results 1-10 of 14
A Probabilistic Model of Information Retrieval: Development and Status
1998
Cited by 337 (23 self)
The paper combines a comprehensive account of the probabilistic model of retrieval with new systematic experiments on TREC Programme material. It presents the model from its foundations through its logical development to cover more aspects of retrieval data and a wider range of system functions. Each step in the argument is matched by comparative retrieval tests, to provide a single coherent account of a major line of research. The experiments demonstrate, for a large test collection, that the probabilistic model is effective and robust, and that it responds appropriately, with major improvements in performance, to key features of retrieval situations.
Some inconsistencies and misidentified modelling assumptions in probabilistic information retrieval
ACM Transactions on Information Systems, 1995
Cited by 29 (0 self)
Research in the probabilistic theory of information retrieval involves the construction of mathematical models based on statistical assumptions. One of the hazards inherent in this kind of theory construction is that the assumptions laid down may be inconsistent in unanticipated ways with the data to which they are applied. Another hazard is that the stated assumptions may not be those on which the derived modeling equations or resulting experiments are actually based. Both kinds of mistakes have been made in past research on probabilistic information retrieval. One consequence of these errors is that the statistical character of certain probabilistic IR models, including the so-called Binary Independence model, has been seriously misapprehended. Categories and Subject Descriptors: H.1.2 [Models and Principles]: User/Machine Systems;
Probabilistic Information Retrieval as Combination of Abstraction, Inductive Learning and Probabilistic Assumptions
1994
Cited by 26 (1 self)
We show that former approaches in probabilistic information retrieval are based on one or two of the three concepts abstraction, inductive learning and probabilistic assumptions, and we propose a new approach which combines all three concepts. This approach is illustrated for the case of indexing with a controlled ...
Combining Model-Oriented and Description-Oriented Approaches for Probabilistic Indexing
Cited by 11 (7 self)
We distinguish model-oriented and description-oriented approaches in probabilistic information retrieval. The former refer to certain representations of documents and queries and use additional independence assumptions, whereas the latter map documents and queries onto feature vectors which form the input to certain classification procedures or regression methods. Description-oriented approaches are more flexible with respect to the underlying representations, but the definition of the feature vector is a heuristic step. In this paper, we combine a probabilistic model for the Darmstadt Indexing Approach with logistic regression. Here the probabilistic model forms a guideline for the definition of the feature vector. Experiments with the purely theoretical approach and with several heuristic variations show that heuristic assumptions may yield significant improvements.
Capturing Term Dependencies using a Sentence Tree based Language Model
2002
Cited by 10 (2 self)
We describe a new probabilistic Sentence Tree Language Modeling approach that captures term dependency patterns in Topic Detection and Tracking's (TDT) Story Link Detection task. New features of the approach include modeling the syntactic structure of sentences in documents by a sentence-bin approach and a computationally efficient algorithm for capturing the most significant sentence-level term dependencies using a Maximum Spanning Tree approach, similar to Van Rijsbergen's modeling of document-level term dependencies.
Score Distributions in Information Retrieval
Cited by 9 (6 self)
We review the history of modeling score distributions, focusing on the mixture of normal-exponential by investigating the theoretical as well as the empirical evidence supporting its use. We discuss previously suggested conditions which valid binary mixture models should satisfy, such as the Recall-Fallout Convexity Hypothesis, and formulate two new hypotheses considering the component distributions under some limiting conditions of parameter values. From all the mixtures suggested in the past, the current theoretical argument points to the two gamma as the most likely universal model, with the normal-exponential being a usable approximation. Beyond the theoretical contribution, we provide new experimental evidence showing vector space or geometric models, and BM25, as being “friendly” to the normal-exponential, and that the non-convexity problem that the mixture possesses is practically not severe.
Optimum Probability Estimation from Empirical Distributions
Information Processing and Management, 1989
Cited by 8 (4 self)
Probability estimation is important for the application of probabilistic models as well as for any evaluation in IR. We discuss the interdependencies between parameter estimation and certain properties of probabilistic models: dependence assumptions, binary vs. nonbinary features, estimation sample selection. Then we define an optimum estimate for binary features which can be applied to various typical estimation problems in IR. A method for computing this estimate using empirical data is described. Some experiments show the applicability of our method, whereas comparable approaches are partially based on false assumptions or yield biased estimates.
Modeling score distributions in information retrieval
Cited by 3 (2 self)
We review the history of modeling score distributions, focusing on the mixture of normal-exponential by investigating the theoretical as well as the empirical evidence supporting its use. We discuss previously suggested conditions which valid binary mixture models should satisfy, such as the Recall-Fallout Convexity Hypothesis, and formulate two new hypotheses considering the component distributions, individually as well as in pairs, under some limiting conditions of parameter values. From all the mixtures suggested in the past, the current theoretical argument points to the two gamma as the most likely universal model, with the normal-exponential being a usable approximation. Beyond the theoretical contribution, we provide new experimental evidence showing vector space or geometric models, and BM25, as being ‘friendly’ to the normal-exponential, and that the non-convexity problem that the mixture possesses is practically not severe. Furthermore, we review recent non-binary mixture models, speculate on graded relevance, and consider methods such as logistic regression for score calibration.
An Adaptive Local Dependency Language Model: Relaxing the Naïve Bayes' Assumption
We describe a new probabilistic approach in the language modeling framework that captures adaptively the local term dependencies in documents. The new model works by boosting scores of documents that contain topic-specific local dependencies and exhibits the behavior of the unigram model in the absence of such dependencies. Contributions of the current work include adapting van Rijsbergen's [14] work in the classical probabilistic framework to the language modeling framework and adaptive modeling of within-sentence dependencies.