Risk Minimization and Language Modeling in Text Retrieval (2002)

by C Zhai

Results 1 - 10 of 20

A Generative Theory of Relevance

by Victor Lavrenko, 2004
Cited by 38 (1 self)
Abstract not found

A Risk Minimization Framework for Information Retrieval

by ChengXiang Zhai, John Lafferty - In Proceedings of the ACM SIGIR 2003 Workshop on Mathematical/Formal Methods in IR. ACM, 2003
Cited by 36 (1 self)
This paper presents a novel probabilistic information retrieval framework in which the retrieval problem is formally treated as a statistical decision problem. In this framework, queries and documents are modeled using statistical language models (i.e., probabilistic models of text), user preferences are modeled through loss functions, and retrieval is cast as a risk minimization problem. We discuss how this framework can unify existing retrieval models and accommodate the systematic development of new retrieval models. As an example of using the framework to model non-traditional retrieval problems, we derive new retrieval models for subtopic retrieval, which is concerned with retrieving documents to cover many different subtopics of a general query topic. These new models differ from traditional retrieval models in that they go beyond independent topical relevance.
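
The abstract above describes retrieval as choosing documents that minimize an expected loss defined over query and document language models. A minimal sketch of one way to instantiate that idea follows, assuming the loss is the KL divergence between a query model and a smoothed document model. The function names, the Jelinek-Mercer smoothing, and the default `lam=0.5` are illustrative assumptions, not details taken from the paper.

```python
import math
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram model: word -> relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def kl_risk(query_model, doc_tokens, collection_model, lam=0.5):
    """KL(query model || smoothed document model), used here as the risk.

    The document model is smoothed with the collection model
    (Jelinek-Mercer interpolation; an assumption made for this sketch).
    """
    doc_counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    risk = 0.0
    for word, p_query in query_model.items():
        p_ml = doc_counts[word] / doc_len if doc_len else 0.0
        p_doc = (1 - lam) * p_ml + lam * collection_model.get(word, 0.0)
        # Floor the probability so a query word unseen in the whole
        # collection does not cause a division by zero.
        p_doc = max(p_doc, 1e-12)
        risk += p_query * math.log(p_query / p_doc)
    return risk


def rank_by_risk(query_tokens, docs, lam=0.5):
    """Return document ids ordered from lowest to highest risk."""
    collection_model = unigram_model(
        [word for tokens in docs.values() for word in tokens]
    )
    query_model = unigram_model(query_tokens)
    scored = sorted(
        (kl_risk(query_model, tokens, collection_model, lam), doc_id)
        for doc_id, tokens in docs.items()
    )
    return [doc_id for _, doc_id in scored]
```

Swapping in a different loss function inside `kl_risk` changes the ranking criterion without touching the language models, which is the kind of separation the abstract attributes to the framework.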

Statistical Language Models for Information Retrieval. Tutorial Presentation at the

by Chengxiang Zhai - 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2006
Cited by 22 (3 self)
Statistical language models have recently been successfully applied to many information retrieval problems. A great deal of recent work has shown that statistical language models not only lead to superior empirical performance, but also facilitate parameter tuning and open up possibilities for modeling nontraditional retrieval problems. In general, statistical language models provide a principled way of modeling various kinds of retrieval problems. The purpose of this survey is to systematically and critically review the existing work in applying statistical language models to information retrieval, summarize their contributions, and point out outstanding challenges.

Active feedback in ad hoc information retrieval

by Xuehua Shen, Chengxiang Zhai - In Proceedings of SIGIR, 2005
Cited by 20 (2 self)
Information retrieval is, in general, an iterative search process, in which the user often has several interactions with a retrieval system for an information need. The retrieval system can actively probe a user with questions to clarify the information need instead of just passively responding to user queries. A basic question is thus how a retrieval system should propose questions to the user so that it can obtain maximum benefits from the feedback on these questions. In this paper, we study how a retrieval system can perform active feedback, i.e., how to choose documents for relevance feedback so that the system can learn most from the feedback information. We present a general framework for such an active feedback problem, and derive several practical algorithms as special cases. Empirical evaluation of these algorithms shows that the performance of traditional relevance feedback (presenting the top K documents) is consistently worse than that of presenting documents with more diversity. With a diversity-based selection algorithm, we obtain fewer relevant documents; however, these fewer documents have more learning benefits.
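
The contrast drawn above, between presenting the top K documents and presenting a more diverse set, can be illustrated with a short sketch. The gap-based selection below is only one simple way to spread the judged documents across the ranking; the abstract does not say which algorithms the paper derives, so the function names and the `gap` parameter are assumptions made for illustration.

```python
def top_k(ranking, k):
    """Traditional relevance feedback: present the k highest-ranked documents."""
    return ranking[:k]


def gapped_top_k(ranking, k, gap):
    """Diversity-oriented selection (illustrative assumption).

    Skip `gap` documents after each selected one, so the k documents shown
    to the user are spread further down the ranking and are less likely
    to be near-duplicates of one another.
    """
    return ranking[::gap + 1][:k]
```

With `gap=0` the second function reduces to plain top-K selection, which makes the two strategies easy to compare on the same ranking.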

Users and assessors in the context of INEX: Are relevance dimensions relevant

by Jovan Pehcevski - In Proceedings of the INEX 2005 Workshop on Element Retrieval Methodology, Second Edition, 2005
Cited by 13 (3 self)
The main aspects of XML retrieval are identified by analysing and comparing the following two behaviours: the behaviour of the assessor when judging the relevance of returned document components; and the behaviour of users when interacting with components of XML documents. We argue that the two INEX relevance dimensions, Exhaustivity and Specificity, are not orthogonal dimensions; indeed, an empirical analysis of each dimension reveals that the grades of the two dimensions are correlated to each other. By analysing the level of agreement between the assessor and the users, we aim at identifying the best units of retrieval. The results of our analysis show that the highest level of agreement is on highly relevant and on non-relevant document components, suggesting that only the end points of the INEX 10-point relevance scale are perceived in the same way by both the assessor and the users. We propose a new definition of relevance for XML retrieval and argue that its corresponding relevance scale would be a better choice for INEX.

A Statistical Approach to Retrieving Historical Manuscript Images without Recognition

by Toni Rath
Cited by 5 (4 self)
Handwritten historical document collections in libraries and other areas are often of interest to researchers, students or the general public. Convenient access to such corpora generally requires an index, which allows one to locate individual text units (pages, sentences, lines) that are relevant to a given query (usually provided as text). Several solutions are possible: manual annotation (very expensive), handwriting recognition (poor results) and word spotting - an image matching approach (computationally expensive).

Opinion Retrieval Experiments using Generative Models: Experiments for the TREC 2006 Blog Track

by Koji Eguchi, Chirag Shah
Cited by 2 (0 self)
Ranking blog posts that express opinions regarding a given topic should serve a critical function in helping users. We explored three types of opinion retrieval methods in the framework of probabilistic language models. The first method combines a topic-relevance model with an opinion-relevance model that captures the topic dependence of opinion expressions. The second method uses the probability that any opinion-bearing word appears in each target document as the document prior probability in the query-likelihood model. The third method uses the probability that any adjective or adverb appears in each target document as the document prior probability, assuming that opinionated documents tend to contain more adjectives or adverbs than other documents.

Integrating conceptual knowledge into relevance models: A model and estimation method

by Edgar Meij, Maarten De Rijke - In International Conference on the Theory of Information Retrieval (ICTIR), 2007
Cited by 1 (1 self)
We address the issue of combining explicit background knowledge with pseudo-relevance feedback from within a document collection. To this end, we use document-level annotations in tandem with generative language models to generate terms from pseudo-relevant documents and bias the probability estimates of expansion terms in a principled manner. By applying the knowledge inherent in document annotations, we aim to control query drift and reap the benefits of automatic query expansion in terms of recall without losing precision. We consider the parameters which are associated with our modeling and describe ways of estimating these automatically. We then evaluate our modeling and estimation methods on two test collections, both provided by the TREC Genomics track.

The Effect Of Smoothing In Language Models For Novelty Detection

by Ronald T. Fernández
Cited by 1 (1 self)
The novelty task consists of finding relevant and novel sentences in a ranking of documents given a query. In the literature, different techniques have been applied to address this problem. Nevertheless, little is known about Language Models for novelty detection and, especially, the effect of smoothing on the selection of novel sentences. Language Models can be used to study novelty and relevance in a principled way. These statistical models have been shown to perform well empirically in many Information Retrieval tasks. In this work we study formally the effects of smoothing on novelty detection. To this aim, we compare different techniques based on the Kullback-Leibler divergence and we analyze the sensitivity of retrieval performance to the smoothing parameters. The ability of Language Modeling estimation methods to handle quantitatively the uncertainty associated with the use of natural language is a powerful tool that can drive the future development of novelty-based mechanisms.
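
To make the role of smoothing in a KL-based novelty score concrete, here is a small sketch. It assumes novelty is measured as the KL divergence from a candidate sentence's model to the model of previously seen text, and it offers two standard smoothing methods, Jelinek-Mercer and Dirichlet prior. The function names, the default parameters, and this particular novelty definition are assumptions for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def background_model(tokens):
    """Relative word frequencies over all available text."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def jelinek_mercer(word, counts, length, background, lam=0.5):
    """Linear interpolation of the ML estimate with the background model."""
    p_ml = counts[word] / length if length else 0.0
    return (1 - lam) * p_ml + lam * background[word]


def dirichlet(word, counts, length, background, mu=100.0):
    """Dirichlet-prior smoothing with pseudo-count mass mu."""
    return (counts[word] + mu * background[word]) / (length + mu)


def kl_novelty(sentence, history, background, smooth=jelinek_mercer):
    """KL(sentence model || history model), summed over the vocabulary.

    Both models are smoothed with the same method, so every probability
    is strictly positive and the divergence is well defined.
    """
    s_counts, h_counts = Counter(sentence), Counter(history)
    score = 0.0
    for word in background:
        p_s = smooth(word, s_counts, len(sentence), background)
        p_h = smooth(word, h_counts, len(history), background)
        score += p_s * math.log(p_s / p_h)
    return score
```

Passing `smooth=dirichlet`, or wrapping either function with a different `lam` or `mu`, gives a simple way to probe how sensitive the novelty ordering is to the smoothing choice.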

The University of Amsterdam at the CLEF 2008 Domain Specific Track -- Parsimonious Relevance and Concept Models

by Edgar Meij, Maarten de Rijke
Cited by 1 (0 self)
... we address are threefold: (i) what are the effects of estimating and applying relevance models to the domain specific collection used at CLEF 2008, (ii) what are the results of parsimonizing these relevance models, and (iii) what are the results of applying concept models for blind relevance feedback? Parsimonization is a technique by which the term probabilities in a language model may be re-estimated based on a comparison with a reference model, making the resulting model more sparse and to the point. Concept models are term distributions over vocabulary terms, based on the language associated with concepts in a thesaurus or ontology and are estimated using the documents which are annotated with concepts. Concept models may be used for blind relevance feedback, by first translating a query to concepts and then back to query terms. We find that applying relevance models helps significantly for the current test collection, in terms of both mean average precision and early precision. Moreover, parsimonizing the relevance models helps mean average precision on title-only queries and early precision on title+narrative queries. Our concept models are able to significantly outperform a baseline query-likelihood run, both in terms of mean average precision and early precision on both title-only and title+narrative queries.
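
The re-estimation step described above, in which term probabilities are compared against a reference model so that the result becomes sparser, can be sketched as an EM loop. The version below is a minimal illustration assuming a two-component mixture of a document model and a background model. The mixing weight `lam`, the pruning `threshold`, and the function name are hypothetical choices, not values or names reported in the paper.

```python
from collections import Counter


def parsimonious_model(doc_tokens, background, lam=0.1, iterations=20,
                       threshold=1e-4):
    """Re-estimate P(t|D) against a background model via EM.

    Terms that the background model already explains well lose probability
    mass; terms whose probability falls below `threshold` are dropped.
    """
    tf = Counter(doc_tokens)
    total = sum(tf.values())
    p = {term: count / total for term, count in tf.items()}
    for _ in range(iterations):
        # E-step: expected count of each term attributed to the document model.
        expected = {
            term: tf[term] * (lam * p[term])
            / (lam * p[term] + (1 - lam) * background.get(term, 0.0))
            for term in p
        }
        # M-step: renormalize the expected counts into probabilities.
        norm = sum(expected.values())
        p = {term: value / norm for term, value in expected.items()}
        # Prune low-probability terms; keep everything if all would be pruned.
        kept = {term: value for term, value in p.items() if value >= threshold} or p
        norm = sum(kept.values())
        p = {term: value / norm for term, value in kept.items()}
    return p
```

For a document dominated by common words, the returned model concentrates its mass on the few terms that are frequent in the document but rare in the background.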