Discriminative Models for Information Retrieval
- SIGIR '04
, 2004
Cited by 66 (1 self)
Discriminative models have been preferred over generative models in many machine learning problems in the recent past owing to some of their attractive theoretical properties. In this paper, we explore the applicability of discriminative classifiers for IR. We have compared the performance of two popular discriminative models, namely the maximum entropy model and support vector machines, with that of language modeling, the state-of-the-art generative model for IR. Our experiments on ad-hoc retrieval indicate that although maximum entropy is significantly worse than language models, support vector machines are on par with language models. We argue that the main reason to prefer SVMs over language models is their ability to learn arbitrary features automatically, as demonstrated by our experiments on the home-page finding task of TREC-10.
The Maximum Entropy Approach and Probabilistic IR Models
- ACM TRANSACTIONS ON INFORMATION SYSTEMS
, 1998
Cited by 12 (0 self)
The Principle of Maximum Entropy is discussed and two classic probabilistic models of information retrieval, the Binary Independence Model of Robertson and Sparck Jones and the Combination Match Model of Croft and Harper are derived using the maximum entropy approach. The assumptions on which the classical models are based are not made. In their place, the probability distribution of maximum entropy consistent with a set of constraints is determined. It is argued that this subjectivist approach is more philosophically coherent than the frequentist conceptualization of probability that is often assumed as the basis of probabilistic modeling and that this philosophical stance has important practical consequences with respect to the realization of information retrieval research.
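As a brief illustration of what "the probability distribution of maximum entropy consistent with a set of constraints" means, the standard textbook formulation can be written as below. The symbols $f_i$, $c_i$, $\lambda_i$, and $Z$ are generic notation assumed here for illustration; they are not taken from the paper, whose own constraints may be stated differently.

```latex
% Maximize Shannon entropy subject to expectation constraints
\max_{p}\; H(p) = -\sum_{x} p(x)\,\log p(x)
\quad \text{subject to} \quad
\sum_{x} p(x)\, f_i(x) = c_i \;\; (i = 1,\dots,k),
\qquad \sum_{x} p(x) = 1 .

% Solving with Lagrange multipliers gives an exponential-family form
p(x) = \frac{1}{Z(\lambda)} \exp\!\Big( \sum_{i=1}^{k} \lambda_i f_i(x) \Big),
\qquad
Z(\lambda) = \sum_{x} \exp\!\Big( \sum_{i=1}^{k} \lambda_i f_i(x) \Big).
```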
Monitoring User-System Performance in Interactive Retrieval Tasks
- PROC. RIAO 2004
, 2004
Cited by 4 (1 self)
Monitoring user-system performance in interactive search is a challenging task. Traditional measures of retrieval evaluation, based on recall and precision, are not of any use in real time, for they require a priori knowledge of relevant documents. This paper shows how a Shannon entropy-based measure of user-system performance naturally falls in the framework of (interactive) probabilistic information retrieval. The value of entropy of the distribution of probability of relevance associated with the documents in the collection can be used to monitor search progress in live testing, to allow for example the system to select an optimal combination of search strategies. User profiling and tuning parameters of retrieval systems are other important applications.
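A minimal sketch of the kind of quantity this abstract describes: the Shannon entropy of a distribution built from per-document relevance scores. How the paper actually derives that distribution is not stated in the abstract; normalizing the scores so they sum to one is an assumption made here, and the function name `relevance_entropy` is hypothetical.

```python
import math


def relevance_entropy(scores):
    """Shannon entropy (in bits) of relevance scores normalized to sum to 1.

    Assumption: the distribution is obtained by dividing each non-negative
    score by the total; the paper may define it differently.
    """
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must contain at least one positive value")
    probs = [s / total for s in scores]
    # Terms with p == 0 contribute nothing to the entropy.
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Early in a search the system may be equally unsure about every document.
uniform = relevance_entropy([1, 1, 1, 1])
# After feedback, most of the probability mass sits on one document.
peaked = relevance_entropy([97, 1, 1, 1])
```

Under this reading, a falling entropy value over the course of a session would indicate that the system is becoming more certain about which documents are relevant, without needing relevance judgments in advance.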
Applying Maximum Entropy to Known-Item Email Retrieval
It is becoming increasingly common in information retrieval to combine evidence from multiple resources to compute the retrieval status value of documents. Although this has led to considerable improvements in several retrieval tasks, one of the outstanding issues is estimation of the respective weights that should be associated with the different sources of evidence. In this paper we propose to use maximum entropy in combination with the limited-memory BFGS (L-BFGS) algorithm to estimate feature weights. Examining the effectiveness of our approach with respect to the known-item finding task of the Enterprise track of TREC shows that it significantly outperforms a standard retrieval baseline and leads to competitive performance.
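The idea of learning weights for several sources of evidence can be sketched with a binary maximum entropy model, which is equivalent to logistic regression. This is only an illustration under stated assumptions: it uses plain gradient ascent in place of the limited-memory quasi-Newton optimizer named in the abstract, and the feature names (bias, subject match, body match) and the toy data are hypothetical, not taken from the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_maxent(examples, labels, epochs=500, lr=0.5):
    """Fit binary maxent (logistic) weights by batch gradient ascent.

    Note: plain gradient ascent is swapped in for the paper's optimizer.
    """
    n_features = len(examples[0])
    weights = [0.0] * n_features
    for _ in range(epochs):
        grad = [0.0] * n_features
        for x, y in zip(examples, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
            # Gradient of the log-likelihood: (observed - expected) * feature
            for j, xj in enumerate(x):
                grad[j] += (y - p) * xj
        weights = [w + lr * g / len(examples) for w, g in zip(weights, grad)]
    return weights


def score(weights, x):
    """Estimated probability that the document is the sought item."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


# Hypothetical features per (query, email) pair: [bias, subject_match, body_match]
examples = [
    [1.0, 0.9, 0.2], [1.0, 0.8, 0.7], [1.0, 0.7, 0.1],  # the known item
    [1.0, 0.1, 0.6], [1.0, 0.2, 0.1], [1.0, 0.3, 0.8],  # other emails
]
labels = [1, 1, 1, 0, 0, 0]

weights = train_maxent(examples, labels)
```

The learned `weights` can then be used with `score` to rank candidate emails; in this toy data the subject-match feature is the informative one, so it receives a positive weight.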
Distance, Minimum Cross-Entropy, and Path methods. Background and Purpose of the Study
, 1988
The maximum entropy principle may be applied to the design of probabilistic retrieval systems. When there are inconsistent expert judgments, the resulting optimization problem cannot be solved. The inconsistency of the expert judgments can be revealed by solving a linear programming formulation. In the case of inconsistent judgment, four plausible schemes are proposed in order to find revised judgments which are consistent with the true data structure but still reflect the original expert judgment. These schemes are the Interactive, Minimum
Searching Cultural Heritage Data: Does Structure Help Expert Searchers?
On-line search requests for cultural heritage (CH) material are often very short and mainly focused on names and dates, while the data provides much more detail and is highly structured, based on classification systems and ontologies. Apparently, typical users make no use of the available information and structure. Expert users such as museum curators have extensive knowledge of the objects in the collection and the classification systems used to describe them, and have complex information needs. In this paper we investigate the impact of exploiting the metadata structure on retrieval effectiveness of complex queries. Our findings are that 1) expert queries require little smoothing as all terms are important for identifying the right objects, 2) the field structure of CH descriptions can help improve early precision, 3) combining free-text retrieval and structured Boolean retrieval leads to significant improvements over either approach alone. Finally, from analysing the questions sent to a museum, we find that non-experts have more complex information needs than what search logs show us, suggesting they can benefit from systems that exploit structure as well.

