Results 1 - 10 of 25
Cluster-based retrieval using language models
In Proceedings of SIGIR, 2004
Cited by 90 (6 self)
Previous research on cluster-based retrieval has been inconclusive as to whether it improves retrieval effectiveness over document-based retrieval. Recent developments in the language modeling approach to IR have motivated us to re-examine this problem within this new retrieval framework. We propose two new models for cluster-based retrieval and evaluate them on several TREC collections. We show that cluster-based retrieval can perform consistently across collections of realistic size, and that significant improvements over document-based retrieval can be obtained in a fully automatic manner, without relevance information provided by humans.
An Exploration of Proximity Measures in Information Retrieval
2007
Cited by 24 (3 self)
In most existing retrieval models, documents are scored primarily based on various kinds of term statistics such as within-document frequencies, inverse document frequencies, and document lengths. Intuitively, the proximity of matched query terms in a document can also be exploited to promote scores of documents in which the matched query terms are close to each other. Such a proximity heuristic, however, has been largely under-explored in the literature; it is unclear how we can model proximity and incorporate a proximity measure into an existing retrieval model. In this paper, we systematically explore the query term proximity heuristic. Specifically, we propose and study the effectiveness of five different proximity measures, each modeling proximity from a different perspective. We then design two heuristic constraints and use them to guide us in incorporating the proposed proximity measures into an existing retrieval model. Experiments on five standard TREC test collections show that one of the proposed proximity measures is indeed highly correlated with document relevance, and by incorporating it into the KL-divergence language model and the Okapi BM25 model, we can significantly improve retrieval performance.
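To make the proximity heuristic concrete, here is a minimal sketch of one plausible distance-based measure: the smallest positional gap between occurrences of two different matched query terms. The abstract does not list the five measures, so the function names, the `1 / (1 + distance)` transform, and the `weight` parameter are illustrative assumptions rather than the paper's definitions.

```python
from itertools import combinations


def min_pair_distance(doc_tokens, query_terms):
    """Smallest positional gap between two distinct matched query terms.

    Returns None when fewer than two distinct query terms occur in the document.
    """
    positions = {}
    for index, token in enumerate(doc_tokens):
        if token in query_terms:
            positions.setdefault(token, []).append(index)
    if len(positions) < 2:
        return None
    best = None
    for term_a, term_b in combinations(positions, 2):
        for i in positions[term_a]:
            for j in positions[term_b]:
                gap = abs(i - j)
                if best is None or gap < best:
                    best = gap
    return best


def proximity_bonus(distance):
    """Hypothetical transform: a larger bonus for closer terms, zero if undefined."""
    if distance is None:
        return 0.0
    return 1.0 / (1.0 + distance)


def score_with_proximity(base_score, doc_tokens, query_terms, weight=1.0):
    """Add the (assumed) proximity bonus to a score from any base retrieval model."""
    distance = min_pair_distance(doc_tokens, query_terms)
    return base_score + weight * proximity_bonus(distance)
```

The base score could come from any term-statistics model, such as BM25 or a language model; the sketch only shows where a proximity term might enter.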
Passage retrieval and evaluation
2005
Cited by 6 (0 self)
Information retrieval researchers have studied passage retrieval extensively, yet there is no consensus within the community about how to evaluate the results of passage retrieval experiments. This paper describes five character-level passage evaluation measures and tasks for which they may be appropriate. In the second half of the paper we compare several passage retrieval models, including a new generative mixture model that outperforms strong baselines on many of the evaluation measures discussed in part one.
Structured Queries, Language Modeling, and Relevance Modeling in Cross-Language Information Retrieval
Information Processing and Management, Special Issue on Cross Language Information Retrieval, 2003
Cited by 5 (2 self)
Two probabilistic approaches to cross-lingual retrieval are in wide use today, those based on probabilistic models of relevance, as exemplified by INQUERY, and those based on language modeling. INQUERY, as a query net model, allows the easy incorporation of query operators, including a synonym operator, which has proven to be extremely useful in cross-language information retrieval (CLIR), in an approach often called structured query translation. In contrast, language models incorporate translation probabilities into a unified framework. We compare the two approaches on Arabic and Spanish data sets, using two kinds of bilingual dictionaries – one derived from a conventional dictionary, and one derived from a parallel corpus. We find that structured query processing gives slightly better results when queries are not expanded. On the other hand, when queries are expanded, language modeling gives better results, but only when using a probabilistic dictionary derived from a parallel corpus. We pursue two additional issues inherent in the comparison of structured query processing with language modeling. The first concerns query expansion, and the second is the role of translation probabilities. We compare conventional expansion techniques (pseudo-relevance feedback) with relevance modeling, a new IR approach which fits into the formal framework of language modeling. We find that relevance modeling and pseudo-relevance feedback achieve comparable levels of retrieval and that good translation probabilities confer a small but significant advantage.
Dedicated backing-off distributions for language model based passage retrieval
In Hildesheimer Informatik-Berichte, LWA, 2006
Cited by 4 (4 self)
Passage retrieval is an essential part of question answering systems. In this paper we use statistical language models to perform this task. Previous work has shown that language modeling techniques provide better results for both document and passage retrieval. The motivation behind this paper is to define new smoothing methods for passage retrieval in question answering systems. The long-term objective is to improve the ability of question answering systems to isolate the correct answer by choosing and evaluating the appropriate section of a document. In this work we use a three-step approach. The first two steps are standard document and passage retrieval using the Lemur toolkit. As a novel contribution, we propose as the third step a re-ranking using dedicated backing-off distributions. In particular, backing off from the passage-based language model to a language model trained on the document from which the passage is taken shows a significant improvement. For a TREC question answering task we can increase the mean average precision from 0.127 to
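The idea of letting a passage model fall back on the language model of its source document can be illustrated with a simple linear interpolation. This is only a sketch under stated assumptions: the abstract does not give the form of the dedicated backing-off distributions, so the interpolation scheme, the function name, and the weights `w_passage` and `w_document` are hypothetical, and passages, documents, and the collection are assumed to be non-empty token lists.

```python
import math
from collections import Counter


def interpolated_query_log_likelihood(query, passage, document, collection,
                                      w_passage=0.6, w_document=0.3):
    """Query log-likelihood under a passage model smoothed with its document.

    Each term probability mixes maximum-likelihood estimates from the passage,
    the document the passage was taken from, and the whole collection.
    The weights are illustrative assumptions, not values from the paper.
    """
    w_collection = 1.0 - w_passage - w_document
    p_counts = Counter(passage)
    d_counts = Counter(document)
    c_counts = Counter(collection)
    score = 0.0
    for term in query:
        prob = (w_passage * p_counts[term] / len(passage)
                + w_document * d_counts[term] / len(document)
                + w_collection * c_counts[term] / len(collection))
        if prob == 0.0:
            # The term is unseen everywhere, so the query has zero likelihood.
            return float("-inf")
        score += math.log(prob)
    return score
```

With this kind of smoothing, a passage that lacks a query term can still score well when its surrounding document contains that term, which is the intuition behind using the document as the fallback distribution.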
Utilizing Passage-Based Language Models for Document Retrieval
2008
Cited by 4 (3 self)
We show that several previously proposed passage-based document ranking principles, along with some new ones, can be derived from the same probabilistic model. We use language models to instantiate specific algorithms, and propose a passage language model that integrates information from the ambient document to an extent controlled by the estimated document homogeneity. Several document-homogeneity measures that we propose yield passage language models that are more effective than the standard passage model for basic document retrieval and for constructing and utilizing passage-based relevance models; the latter outperform a document-based relevance model. We also show that the homogeneity measures are effective means for integrating document-query and passage-query similarity information for document retrieval.
Topic Field Selection and Smoothing for XML Retrieval
2003
Cited by 3 (3 self)
Information retrieval from XML documents offers an opportunity to go below the document level in search of relevant information, making any element of an XML document a retrievable unit. We consider two dimensions along which we compare this element retrieval task with the traditional document retrieval task. We investigate how different topic representations and language model smoothing approaches affect the performance of the two tasks. We evaluate our ideas against the INEX 2002 XML retrieval test-suite.
Answer Passage Retrieval for Question Answering
Cited by 3 (0 self)
Document or passage retrieval is typically used as the first step in current question answering systems. The accuracy of the answer that is extracted from the passages and the efficiency of the question answering process will depend to some extent on the quality of this initial ranking. We show how language model approaches can be used to improve answer passage ranking. In particular, we show how a variety of prior language models trained on correct answer text allow us to incorporate into the retrieval step information that is often used in answer extraction, for example, the presence of tagged entities. We demonstrate the effectiveness of these models on the TREC9 QA Corpus.
Re-Ranking Search Results Using Document-Passage Graphs
2008
Cited by 2 (1 self)
We present a novel passage-based approach to re-ranking documents in an initially retrieved list so as to improve precision at top ranks. While most work on passage-based document retrieval ranks a document based on the query similarity of its constituent passages, our approach leverages information about the centrality of the document passages with respect to the initial document list. Passage centrality is induced over a bipartite document-passage graph, wherein edge weights represent document-passage similarities. Empirical evaluation shows that our approach yields effective re-ranking performance. Furthermore, the performance is superior to that of previously proposed passage-based document ranking methods.
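One way to picture centrality on a bipartite document-passage graph is a mutual-reinforcement iteration in the style of HITS: passages are central when they are similar to central documents, and vice versa. The abstract does not say how centrality is induced, so the HITS-style update, the function name, and the `weights` matrix layout below are assumptions made purely for illustration.

```python
def bipartite_centrality(weights, iterations=50):
    """Mutual-reinforcement centrality on a document-passage graph.

    weights[d][p] is the (non-negative) similarity between document d and
    passage p. Returns (document_scores, passage_scores), each normalized
    to sum to 1. This HITS-style update is an assumed stand-in, not the
    method from the paper.
    """
    n_docs = len(weights)
    n_passages = len(weights[0])
    doc_scores = [1.0] * n_docs
    passage_scores = [1.0] * n_passages
    for _ in range(iterations):
        # A passage is central if it is similar to central documents.
        passage_scores = [
            sum(weights[d][p] * doc_scores[d] for d in range(n_docs))
            for p in range(n_passages)
        ]
        total = sum(passage_scores) or 1.0
        passage_scores = [s / total for s in passage_scores]
        # A document is central if it is similar to central passages.
        doc_scores = [
            sum(weights[d][p] * passage_scores[p] for p in range(n_passages))
            for d in range(n_docs)
        ]
        total = sum(doc_scores) or 1.0
        doc_scores = [s / total for s in doc_scores]
    return doc_scores, passage_scores
```

Sorting the initially retrieved documents by the returned document scores (or by the scores of their passages) would give one possible re-ranking.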
Discriminative Probabilistic Models for Passage Based Retrieval
In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2008), Singapore
Cited by 2 (0 self)
The approach of using passage-level evidence for document retrieval has shown mixed results when applied to a variety of test beds with different characteristics. One main reason for the inconsistent performance is that there exists no unified framework to model the evidence of individual passages within a document. This paper proposes two probabilistic models to formally model the evidence of a set of top-ranked passages in a document. The first probabilistic model follows the retrieval criterion that a document is relevant if any passage in the document is relevant, and models each passage independently. The second probabilistic model goes a step further and incorporates the similarity correlations among the passages. Both models are trained in a discriminative manner. Furthermore, we present a combination approach to combine the ranked lists of document retrieval and passage-based retrieval. An extensive set of experiments has been conducted on four different TREC test beds to show the effectiveness of the proposed discriminative probabilistic models for passage-based retrieval. The proposed algorithms are compared with a state-of-the-art document retrieval algorithm and a language model approach for passage-based retrieval. Furthermore, our combined approach has been shown to provide better results than both document retrieval and passage-based retrieval approaches.
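The criterion that a document is relevant if any of its passages is relevant, with passages treated independently, corresponds to a noisy-OR combination: the document is non-relevant only when every passage is non-relevant. The sketch below assumes the per-passage relevance probabilities are already available; how the paper estimates them discriminatively is not shown, and the function name is hypothetical.

```python
def document_relevance_any_passage(passage_probs):
    """P(document relevant) = 1 - product over passages of (1 - p_i).

    passage_probs holds independent per-passage relevance probabilities,
    assumed to come from some separately trained passage model.
    """
    prob_none_relevant = 1.0
    for p in passage_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("passage probabilities must lie in [0, 1]")
        prob_none_relevant *= 1.0 - p
    return 1.0 - prob_none_relevant
```

For example, two passages that are each relevant with probability 0.5 give a document relevance of 0.75 under this independence assumption.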

