A Maximum Likelihood Ratio Information Retrieval Model (1999)

by Kenney Ng

Results 1 - 10 of 34

Parsimonious Language Models for Information Retrieval

by Djoerd Hiemstra, Stephen Robertson, Hugo Zaragoza - In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2004
Cited by 216 (37 self)
We systematically investigate a new approach to estimating the parameters of language models for information retrieval, called parsimonious language models. Parsimonious language models explicitly address the relation between levels of language models that are typically used for smoothing. As such, they need fewer (non-zero) parameters to describe the data. We apply parsimonious models at three stages of the retrieval process: 1) at indexing time; 2) at search time; 3) at feedback time. Experimental results show that we are able to build models that are significantly smaller than standard models, but that still perform at least as well as the standard approaches.

Model-based Feedback in the Language Modeling Approach to Information Retrieval

by Chengxiang Zhai, John Lafferty - In Proceedings of Tenth International Conference on Information and Knowledge Management, 2001
Cited by 118 (17 self)
The language modeling approach to retrieval has been shown to perform well empirically. One advantage of this new approach is its statistical foundations. However, feedback, as one important component in a retrieval system, has only been dealt with heuristically in this new retrieval approach: the original query is usually literally expanded by adding additional terms to it. Such expansion-based feedback creates an inconsistent interpretation of the original and the expanded query. In this paper, we present a more principled approach to feedback in the language modeling approach. Specifically, we treat feedback as updating the query language model based on the extra evidence carried by the feedback documents. Such a model-based feedback strategy easily fits into an extension of the language modeling approach. We propose and evaluate two different approaches to updating a query language model based on feedback documents, one based on a generative probabilistic model of feedback documents and one based on minimization of the KL-divergence over feedback documents. Experimental results show that both approaches are effective and outperform the Rocchio feedback approach.
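As a rough illustration of what "updating the query language model" can look like, the sketch below interpolates a query model with a model built from feedback documents and ranks documents by negative KL-divergence. It is a simplification under stated assumptions, not the estimation procedure of the paper: the feedback model here is a plain maximum-likelihood estimate, and the function names and the interpolation weight `alpha` are hypothetical.

```python
import math
from collections import Counter

def mle(tokens):
    """Maximum-likelihood unigram model: relative term frequencies."""
    counts = Counter(tokens)
    total = len(tokens)
    return {w: c / total for w, c in counts.items()}

def interpolate(p, q, alpha):
    """Linear mixture (1 - alpha) * p + alpha * q over the union vocabulary."""
    vocab = set(p) | set(q)
    return {w: (1 - alpha) * p.get(w, 0.0) + alpha * q.get(w, 0.0) for w in vocab}

def neg_kl(query_model, doc_model):
    """Ranking score -KL(query || doc); higher means a closer match.

    Assumes doc_model is smoothed so that it assigns non-zero probability
    to every term with non-zero probability in query_model.
    """
    return -sum(p * math.log(p / doc_model[w])
                for w, p in query_model.items() if p > 0)
```

In use, a document model would be smoothed against the collection, for example `interpolate(mle(doc), mle(collection), 0.5)`, and the updated query model would be `interpolate(mle(query), mle(feedback_tokens), alpha)`. Keeping the query as a distribution, rather than a growing list of terms, is what lets the original and the feedback evidence be weighted consistently.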

Pagerank without hyperlinks: structural re-ranking using links induced by language models

by Oren Kurland, Lillian Lee - In Proceedings of SIGIR, 2005
Cited by 66 (10 self)
Inspired by the PageRank and HITS (hubs and authorities) algorithms for Web search, we propose a structural re-ranking approach to ad hoc information retrieval: we reorder the documents in an initially retrieved set by exploiting asymmetric relationships between them. Specifically, we consider generation links, which indicate that the language model induced from one document assigns high probability to the text of another; in doing so, we take care to prevent bias against long documents. We study a number of re-ranking criteria based on measures of centrality in the graphs formed by generation links, and show that integrating centrality into standard language-model-based retrieval is quite effective at improving precision at top ranks.
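The idea of centrality over generation links can be sketched as follows. This is a toy illustration, not the authors' implementation: the smoothing scheme, the per-token normalization used to reduce length bias, the `top_k` link cutoff, and all function names are assumptions made for the example.

```python
import math
from collections import Counter

def unigram_model(tokens, background, mu=10.0):
    """Unigram model smoothed toward a background distribution."""
    counts = Counter(tokens)
    n = len(tokens)
    def prob(w):
        return (counts[w] + mu * background[w]) / (n + mu)
    return prob

def generation_score(model, tokens):
    """Per-token geometric mean probability, so long texts are not penalized."""
    return math.exp(sum(math.log(model(w)) for w in tokens) / len(tokens))

def generation_centrality(docs, top_k=2, damping=0.85, iters=50):
    """PageRank-style scores on a graph where each document links to the
    top_k other documents whose language models best generate its text."""
    all_tokens = [w for d in docs for w in d]
    total = len(all_tokens)
    background = {w: c / total for w, c in Counter(all_tokens).items()}
    models = [unigram_model(d, background) for d in docs]
    n = len(docs)
    links = []
    for o in range(n):
        scores = {g: generation_score(models[g], docs[o])
                  for g in range(n) if g != o}
        top = sorted(scores, key=scores.get, reverse=True)[:top_k]
        z = sum(scores[g] for g in top)
        links.append({g: scores[g] / z for g in top})  # outgoing weights sum to 1
    rank = [1.0 / n] * n
    for _ in range(iters):
        new = [(1 - damping) / n] * n
        for o in range(n):
            for g, w in links[o].items():
                new[g] += damping * rank[o] * w
        rank = new
    return rank
```

Calling `generation_centrality(docs)` on a list of tokenized documents (at least two) returns one score per document; the scores sum to 1 and could then be combined with an initial retrieval score for re-ranking.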

Subword-based Approaches for Spoken Document Retrieval

by Kenney Ng, 2000
Cited by 40 (0 self)
This thesis explores approaches to the problem of spoken document retrieval (SDR), which is the task of automatically indexing and then retrieving relevant items from a large collection of recorded speech messages in response to a user specified natural language text query. We investigate the use of subword unit representations for SDR as an alternative to words generated by either keyword spotting or continuous speech recognition. Our investigation is motivated by the observation that word-based retrieval approaches face the problem of either having to know the keywords to search for a priori, or requiring a very large recognition vocabulary in order to cover the contents of growing and diverse message collections. The use of subword units in the recognizer constrains the size of the vocabulary needed to cover the language; and the use of subword units as indexing terms allows for the detection of new user-specified query terms during retrieval. Four

Twenty-One at TREC-8: using Language Technology for Information Retrieval

by Wessel Kraaij, Renée Pohlmann, Djoerd Hiemstra, 1999
Cited by 32 (12 self)
This paper describes the official runs of the Twenty-One group for TREC-8. The Twenty-One group participated in the Ad-hoc, CLIR, Adaptive Filtering and SDR tracks. The main focus of our experiments is the development and evaluation of retrieval methods that are motivated by natural language processing techniques. The following new techniques are introduced in this paper. In the Ad-Hoc and CLIR tasks we experimented with automatic sense disambiguation followed by query expansion or translation. We used a combination of thesaurial and corpus information for the disambiguation process. We continued research on CLIR techniques which exploit the target corpus for an implicit disambiguation, by importing the translation probabilities into the probabilistic term-weighting framework. In filtering we extended the use of language models for document ranking with a relevance feedback algorithm for query term reweighting.

Term-specific smoothing for the language modeling approach to information retrieval: the importance of a query term

by Djoerd Hiemstra - In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2002
Cited by 30 (3 self)
This paper follows a formal approach to information retrieval based on statistical language models. By introducing some simple reformulations of the basic language modeling approach we introduce the notion of importance of a query term. The importance of a query term is an unknown parameter that explicitly models which of the query terms are generated from the relevant documents (the important terms), and which are not (the unimportant terms). The new language modeling approach is shown to explain a number of practical facts of today’s information retrieval systems that are not very well explained by the current state of information retrieval theory, including stop words, mandatory terms, coordination level ranking and retrieval using phrases.

Embedding web-based statistical translation models in cross-language information retrieval

by Wessel Kraaij, Jian-yun Nie, Michel Simard - Computational Linguistics, 2003
Cited by 29 (3 self)
Although more and more language pairs are covered by machine translation (MT) services, there are still many pairs that lack translation resources. Cross-language information retrieval (CLIR) is an application that needs translation functionality of a relatively low level of sophistication, since current models for information retrieval (IR) are still based on a bag of words. The Web provides a vast resource for the automatic construction of parallel corpora that can be used to train statistical translation models automatically. The resulting translation models can be embedded in several ways in a retrieval model. In this article, we will investigate the problem of automatically mining parallel texts from the Web and different ways of integrating the translation models within the retrieval process. Our experiments on standard test collections for CLIR show that the Web-based translation models can surpass commercial MT systems in CLIR tasks. These results open the perspective of constructing a fully automatic query translation device for CLIR at a very low cost.

Risk Minimization and Language Modeling in Text Retrieval

by ChengXiang Zhai, 2002
Cited by 29 (5 self)
Abstract not found

Better than the real thing? Iterative pseudo-query processing using cluster-based language models

by Oren Kurland, et al.
Cited by 20 (3 self)
Abstract not found

Statistical Cross-Language Information Retrieval using N-Best Query Translations

by Marcello Federico, Nicola Bertoldi, 2002
Cited by 20 (2 self)
This paper presents a novel statistical model for cross-language information retrieval. Given a written query in the source language, documents in the target language are ranked by integrating probabilities computed by two statistical models: a query-translation model, which generates most probable term-by-term translations of the query, and a query-document model, which evaluates the likelihood of each document and translation. Integration of the two scores is performed over the set of N most probable translations of the query. Experimental results with values N = 1, 5, 10 are presented on the Italian-English bilingual track data used in the CLEF 2000 and 2001 evaluation campaigns.
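The integration over N-best translations described above can be pictured with a small sketch. It assumes the simplest combination, a sum over the N most probable translations of the translation probability times a query-document likelihood; the smoothed unigram likelihood, the weight `lam`, and the function names are illustrative choices, not details taken from the paper.

```python
from collections import Counter

def query_likelihood(query_tokens, doc_tokens, collection_prob, lam=0.5):
    """Probability of the query under a document model that is linearly
    smoothed with collection-wide term probabilities."""
    counts = Counter(doc_tokens)
    n = len(doc_tokens)
    score = 1.0
    for w in query_tokens:
        score *= lam * counts[w] / n + (1 - lam) * collection_prob.get(w, 0.0)
    return score

def nbest_score(nbest, doc_tokens, collection_prob, n=5):
    """Sum translation probability times query likelihood over the n most
    probable translations. nbest is a list of (tokens, probability) pairs."""
    top = sorted(nbest, key=lambda pair: pair[1], reverse=True)[:n]
    return sum(p * query_likelihood(t, doc_tokens, collection_prob)
               for t, p in top)
```

With `n=1` only the single best translation is used; larger values let a document match through a less probable translation, at the cost of scoring more candidate queries per document.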