• Documents
  • Authors
  • Tables
  • Other Seers ▼
    RefSeer AckSeer CollabSeer SeerSeer
  • Log in
  • Sign up
  • MetaCart

CiteSeerX logo

Advanced Search Include Citations
Advanced Search Include Citations | Disambiguate

A Generative Theory of Relevance (0)

by V Lavrenko
Add To MetaCart

Tools

Sorted by:
Results 1 - 10 of 29
Next 10 →

Regularizing ad hoc retrieval scores

by Fernando Diaz , 2005
"... The cluster hypothesis states: closely related documents tend to be relevant to the same request. We exploit this hypothesis directly by adjusting ad hoc retrieval scores from an initial retrieval so that topically related documents receive similar scores. We refer to this process as score regulariz ..."
Abstract - Cited by 31 (1 self) - Add to MetaCart
The cluster hypothesis states: closely related documents tend to be relevant to the same request. We exploit this hypothesis directly by adjusting ad hoc retrieval scores from an initial retrieval so that topically related documents receive similar scores. We refer to this process as score regularization. Score regularization can be presented as an optimization problem, allowing the use of results from semisupervised learning. We demonstrate that regularized scores consistently and significantly rank documents better than un-regularized scores, given a variety of initial retrieval algorithms. We evaluate our method on two large corpora across a substantial number of topics.

Building effective question answering characters

by Anton Leuski, Ronakkumar Patel, David Traum - In Proceedings of the 7th SIGdial Workshop on Discourse and Dialogue , 2006
"... In this paper, we describe methods for building and evaluation of limited domain question-answering characters. Several classification techniques are tested, including text classification using support vector machines, language-model based retrieval, and cross-language information retrieval techniqu ..."
Abstract - Cited by 25 (17 self) - Add to MetaCart
In this paper, we describe methods for building and evaluation of limited domain question-answering characters. Several classification techniques are tested, including text classification using support vector machines, language-model based retrieval, and cross-language information retrieval techniques, with the latter having the highest success rate. We also evaluated the effect of speech recognition errors on performance with users, finding that retrieval is robust until recognition reaches over 50 % WER. 1

UMass at TREC 2004: Novelty and HARD

by Nasreen Abdul-jaleel, James Allan, W. Bruce Croft, O Diaz, Leah Larkey, Xiaoyan Li, Mark D. Smucker, Courtney Wade - In Proceedings of TREC-13 , 2004
"... For the TREC 2004 Novelty track, UMass participated in all four tasks. Although finding relevant sentences was harder this year than last, we continue to show marked improvements over the baseline of calling all sentences relevant, with a variant of tfidf being the most successful approach. We achie ..."
Abstract - Cited by 16 (4 self) - Add to MetaCart
For the TREC 2004 Novelty track, UMass participated in all four tasks. Although finding relevant sentences was harder this year than last, we continue to show marked improvements over the baseline of calling all sentences relevant, with a variant of tfidf being the most successful approach. We achieve 5–9 % improvements over the baseline in locating novel sentences, primarily by looking at the similarity of a sentence to earlier sentences and focusing on named entities. For the High Accuracy Retrieval from Documents (HARD) track, we investigated the use of clarification forms, fixed- and variable-length passage retrieval, and the use of metadata. Clarification form results indicate that passage level feedback can provide improvements comparable to user supplied related-text for document evaluation and outperforms related-text for passage evaluation. Document retrieval methods without a query expansion component show the most gains from related-text. We also found that displaying the top passages for feedback outperformed displaying centroid passages. Named entity feedback resulted in mixed performance. Our primary findings for passage retrieval are that document retrieval methods performed better than passage retrieval methods on the passage evaluation metric of binary preference at 12,000 characters, and that clarification forms improved passage retrieval for every retrieval method explored. We found no benefit to using variable-length passages over fixed-length passages for this corpus. Our use of geography and genre metadata resulted in no significant changes in retrieval performance.

Users and assessors in the context of INEX: Are relevance dimensions relevant

by Jovan Pehcevski - In Proceedings of the INEX 2005 Workshop on Element Retrieval Methodology, Second Edition , 2005
"... The main aspects of XML retrieval are identified by analysing and comparing the following two behaviours: the behaviour of the assessor when judging the relevance of returned document components; and the behaviour of users when interacting with components of XML documents. We argue that the two INEX ..."
Abstract - Cited by 13 (3 self) - Add to MetaCart
The main aspects of XML retrieval are identified by analysing and comparing the following two behaviours: the behaviour of the assessor when judging the relevance of returned document components; and the behaviour of users when interacting with components of XML documents. We argue that the two INEX relevance dimensions, Exhaustivity and Specificity, are not orthogonal dimensions; indeed, an empirical analysis of each dimension reveals that the grades of the two dimensions are correlated to each other. By analysing the level of agreement between the assessor and the users, we aim at identifying the best units of retrieval. The results of our analysis show that the highest level of agreement is on highly relevant and on non-relevant document components, suggesting that only the end points of the INEX 10-point relevance scale are perceived in the same way by both the assessor and the users. We propose a new definition of relevance for XML retrieval and argue that its corresponding relevance scale would be a better choice for INEX. 1.

The Opposite of Smoothing: A Language Model Approach to Ranking Query-Specific Document Clusters

by Oren Kurland , 2008
"... Exploiting information induced from (query-specific) clustering of top-retrieved documents has long been proposed as means for improving precision at the very top ranks of the returned results. We present a novel language model approach to ranking query-specific clusters by the presumed percentage o ..."
Abstract - Cited by 10 (5 self) - Add to MetaCart
Exploiting information induced from (query-specific) clustering of top-retrieved documents has long been proposed as means for improving precision at the very top ranks of the returned results. We present a novel language model approach to ranking query-specific clusters by the presumed percentage of relevant documents that they contain. While most previous cluster ranking approaches focus on the cluster as a whole, our model also exploits information induced from documents associated with the cluster. Our model substantially outperforms previous approaches for identifying clusters containing a high relevant-document percentage. Furthermore, using the model to produce document ranking yields precision-at-top-ranks performance that is consistently better than that of the initial ranking upon which clustering is performed; the performance also favorably compares with that of a state-of-the-art pseudo-feedback retrieval method.

Regularizing query-based retrieval scores

by Fernando Diaz - Information Retrieval , 2007
"... Abstract. We adapt the cluster hypothesis for score-based information retrieval by claiming that closely related documents should have similar scores. Given a retrieval from an arbitrary system, we describe an algorithm which directly optimizes this objective by adjusting retrieval scores so that to ..."
Abstract - Cited by 9 (2 self) - Add to MetaCart
Abstract. We adapt the cluster hypothesis for score-based information retrieval by claiming that closely related documents should have similar scores. Given a retrieval from an arbitrary system, we describe an algorithm which directly optimizes this objective by adjusting retrieval scores so that topically related documents receive similar scores. We refer to this process as score regularization. Because score regularization operates on retrieval scores, regardless of their origin, we can apply the technique to arbitrary initial retrieval rankings. Document rankings derived from regularized scores, when compared to rankings derived from un-regularized scores, consistently and significantly result in improved performance given a variety of baseline retrieval algorithms. We also present several proofs demonstrating that regularization generalizes methods such as pseudo-relevance feedback, document expansion, and cluster-based retrieval. Because of these strong empirical and theoretical results, we argue for the adoption of score regularization as general design principle or post-processing step for information retrieval systems.

Reducing the risk of query expansion via robust constrained optimization

by Kevyn Collins-thompson - Proceedings of the Eighteenth International Conference on Information and Knowledge Management (CIKM 2009). ACM. Hong
"... We introduce a new theoretical derivation, evaluation methods, and extensive empirical analysis for an automatic query expansion framework in which model estimation is cast as a robust constrained optimization problem. This framework provides a powerful method for modeling and solving complex expans ..."
Abstract - Cited by 9 (5 self) - Add to MetaCart
We introduce a new theoretical derivation, evaluation methods, and extensive empirical analysis for an automatic query expansion framework in which model estimation is cast as a robust constrained optimization problem. This framework provides a powerful method for modeling and solving complex expansion problems, by allowing multiple sources of domain knowledge or evidence to be encoded as simultaneous optimization constraints. Our robust optimization approach provides a clean theoretical way to model not only expansion benefit, but also expansion risk, by optimizing over uncertainty sets for the data. In addition, we introduce risk-reward curves to visualize expansion algorithm performance and analyze parameter sensitivity. We show that a robust approach significantly reduces the number and magnitude of expansion failures for a strong baseline algorithm, with no loss in average gain. Our approach is implemented as a highly efficient post-processing step that assumes little about the baseline expansion method used as input, making it easy to apply to existing expansion methods. We provide analysis showing that this approach is a natural and effective way to do selective expansion, automatically reducing or avoiding expansion in risky scenarios, and successfully attenuating noise in poor baseline methods.

UMass at TREC 2004: Notebook

by Nasreen Abdul-jaleel, James Allan, W. Bruce Croft, O Diaz, Leah Larkey, Xiaoyan Li, Donald Metzler, Mark D. Smucker, Trevor Strohman, Howard Turtle, Courtney Wade - In TREC 2004 , 2004
"... The retrieval model implemented in the Indri search engine is an enhanced version of the model described in [30], which combines the language modeling [35] and inference network [38] approaches to information retrieval. The resulting model allows structured queries similar to those used in INQUERY [ ..."
Abstract - Cited by 8 (2 self) - Add to MetaCart
The retrieval model implemented in the Indri search engine is an enhanced version of the model described in [30], which combines the language modeling [35] and inference network [38] approaches to information retrieval. The resulting model allows structured queries similar to those used in INQUERY [4] to be evaluated using language modeling estimates within the network, rather than tf.idf estimates. Figure 1.1 shows a graphical model representation of the network. As in the original inference network framework, documents are ranked according to P(I|D, α, β), the belief the information need I is met given document D and hyperparameters α and β as evidence. Due to space limitations, a general understanding of the inference network framework is assumed. See [30] and [38] to fill in any missing details. 1.1.1 DOCUMENT REPRESENTATION

A STATISTICAL APPROACH FOR TEXT PROCESSING IN VIRTUAL HUMANS

by Anton Leuski, David Traum
"... We describe a text classification approach based on statistical language modeling. We show how this approach can be used for several natural language processing tasks in a virtual human system. Specifically, we show it can applied to language understanding, language generation, and character respons ..."
Abstract - Cited by 8 (7 self) - Add to MetaCart
We describe a text classification approach based on statistical language modeling. We show how this approach can be used for several natural language processing tasks in a virtual human system. Specifically, we show it can applied to language understanding, language generation, and character response selection tasks. We illustrate these applications with some experimental results. 1. natural language understanding (NLU) module that interprets the text of the user’s utterance and converts it into some internal representation; 2. dialog manager (DM) module that analyzes the interpretation and selects the appropriate response; 3. natural language generation (NLG) module that converts the internal representation to the text of the response. 1

Utilizing Passage-Based Language Models for Document Retrieval

by Michael Bendersky, Oren Kurland , 2008
"... We show that several previously proposed passage-based document ranking principles, along with some new ones, can be derived from the same probabilistic model. We use language models to instantiate specific algorithms, and propose a passage language model that integrates information from the ambient ..."
Abstract - Cited by 4 (3 self) - Add to MetaCart
We show that several previously proposed passage-based document ranking principles, along with some new ones, can be derived from the same probabilistic model. We use language models to instantiate specific algorithms, and propose a passage language model that integrates information from the ambient document to an extent controlled by the estimated document homogeneity. Several document-homogeneity measures that we propose yield passage language models that are more effective than the standard passage model for basic document retrieval and for constructing and utilizing passage-based relevance models; the latter outperform a document-based relevance model. We also show that the homogeneity measures are effective means for integrating documentquery and passage-query similarity information for document retrieval.
The National Science Foundation
  • About CiteSeerX
  • Submit Documents
  • Privacy Policy
  • Help
  • Data
  • Source
  • Contact Us

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2010 The Pennsylvania State University