Parsimonious Language Models for Information Retrieval
In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2004
Cited by 216 (37 self)
We systematically investigate a new approach to estimating the parameters of language models for information retrieval, called parsimonious language models. Parsimonious language models explicitly address the relation between levels of language models that are typically used for smoothing. As such, they need fewer (non-zero) parameters to describe the data. We apply parsimonious models at three stages of the retrieval process: 1) at indexing time; 2) at search time; 3) at feedback time. Experimental results show that we are able to build models that are significantly smaller than standard models, but that still perform at least as well as the standard approaches.
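The parsimonious estimation idea can be illustrated with a small EM loop that moves probability mass away from terms the collection (background) model already explains, then prunes terms whose probability drops below a threshold. This is a minimal Python sketch assuming the two-component mixture formulation usually associated with parsimonious models; the function name `parsimonious_model`, the mixing weight `lam`, the `threshold`, and the fixed iteration count are illustrative assumptions rather than settings from the paper.

```python
def parsimonious_model(doc_tf, collection_prob, lam=0.1, threshold=1e-4, iterations=50):
    """Estimate a sparse document model P(t|D) against a fixed background P(t|C).

    doc_tf: term -> raw frequency in the document
    collection_prob: term -> background probability P(t|C), assumed > 0 for every doc term
    lam: weight of the document model in the two-component mixture (assumed value)
    threshold: terms whose probability falls below this are pruned (assumed value)
    """
    total = sum(doc_tf.values())
    # Start from the maximum-likelihood estimate.
    p_doc = {t: tf / total for t, tf in doc_tf.items()}
    for _ in range(iterations):
        # E-step: expected count of each term attributed to the document model
        # rather than to the background model.
        expected = {}
        for t, p in p_doc.items():
            num = lam * p
            expected[t] = doc_tf[t] * num / (num + (1 - lam) * collection_prob[t])
        # M-step: renormalise the expected counts into a distribution.
        norm = sum(expected.values())
        p_doc = {t: e / norm for t, e in expected.items()}
        # Prune low-probability terms, then renormalise the survivors.
        p_doc = {t: p for t, p in p_doc.items() if p >= threshold}
        if not p_doc:
            break
        norm = sum(p_doc.values())
        p_doc = {t: p / norm for t, p in p_doc.items()}
    return p_doc
```

With a document such as `{"the": 2, "retrieval": 5, "parsimonious": 3}` and a background model in which "the" is very frequent (e.g. P(the|C) = 0.2) while the other two terms are rare, the loop drives P(the|D) below the threshold and drops it, leaving a model with fewer non-zero parameters.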
Twenty-One at TREC-8: using Language Technology for Information Retrieval
1999
Cited by 32 (12 self)
This paper describes the official runs of the Twenty-One group for TREC-8. The Twenty-One group participated in the Ad-hoc, CLIR, Adaptive Filtering and SDR tracks. The main focus of our experiments is the development and evaluation of retrieval methods that are motivated by natural language processing techniques. The following new techniques are introduced in this paper. In the Ad-hoc and CLIR tasks we experimented with automatic sense disambiguation followed by query expansion or translation. We used a combination of thesaurus and corpus information for the disambiguation process. We continued research on CLIR techniques that exploit the target corpus for implicit disambiguation by importing the translation probabilities into the probabilistic term-weighting framework. In filtering we extended the use of language models for document ranking with a relevance feedback algorithm for query term reweighting. 1 Introduction Twenty-One is a project funded by the EU Tele...
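One way to picture importing translation probabilities into a language-model term-weighting framework is to replace each source-language query term by a translation-weighted mixture of smoothed target-language term probabilities. The sketch assumes linear interpolation smoothing with weight `lam` and skips query terms that have no known translation; `clir_score`, the smoothing weight, and the dictionary layouts are hypothetical choices for illustration, not the group's implementation.

```python
import math


def clir_score(query_terms, translations, doc_tf, collection_prob, lam=0.15):
    """Log query likelihood of a target-language document for a source-language query.

    translations: source term -> {target term: P(target | source)}
    doc_tf: target term -> frequency in the document
    collection_prob: target term -> P(term | collection), assumed > 0 for every target
    lam: smoothing weight of the document model (assumed value)
    """
    doc_len = sum(doc_tf.values())
    score = 0.0
    for s in query_terms:
        # Mix the smoothed probabilities of all possible translations of s,
        # weighted by their translation probabilities.
        p = 0.0
        for t, p_trans in translations.get(s, {}).items():
            p_doc = doc_tf.get(t, 0) / doc_len if doc_len else 0.0
            p += p_trans * ((1 - lam) * collection_prob[t] + lam * p_doc)
        # Untranslatable terms contribute nothing (a simplifying assumption).
        if p > 0:
            score += math.log(p)
    return score
```

For an English query term "bank" with hypothetical Dutch translations "bank" (0.7) and "oever" (0.3), a document containing "bank" scores higher than one containing only "oever", because the more probable translation carries more weight.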
Embedding web-based statistical translation models in cross-language information retrieval
Computational Linguistics, 2003
Cited by 29 (3 self)
Although more and more language pairs are covered by machine translation (MT) services, there are still many pairs that lack translation resources. Cross-language information retrieval (CLIR) is an application that needs translation functionality of a relatively low level of sophistication, since current models for information retrieval (IR) are still based on a bag of words. The Web provides a vast resource for the automatic construction of parallel corpora that can be used to train statistical translation models. The resulting translation models can be embedded in a retrieval model in several ways. In this article, we investigate the problem of automatically mining parallel texts from the Web and different ways of integrating the translation models within the retrieval process. Our experiments on standard test collections for CLIR show that the Web-based translation models can surpass commercial MT systems in CLIR tasks. These results open the prospect of constructing a fully automatic query translation device for CLIR at very low cost.
Improving Query Translation for Cross-Language Information Retrieval using Statistical Models
SIGIR'01, 2001
Cited by 21 (0 self)
Dictionaries have often been used for query translation in cross-language information retrieval (CLIR). However, we are faced with the problem of translation ambiguity, i.e., a dictionary stores multiple translations for a word. In addition, word-by-word query translation is not precise enough. In this paper, we explore several methods to improve on previous dictionary-based query translation. First, as many noun phrases as possible are recognized and translated as a whole by using statistical models and phrase translation patterns. Second, the best word translations are selected based on the cohesion of the translation words. Our experimental results on the TREC English-Chinese CLIR collection show that these techniques yield significant improvements over simple dictionary-based approaches and even outperform a high-quality machine translation system.
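Selecting translations by cohesion can be sketched as choosing one candidate per query word so that the pairwise association among the chosen words is maximal. This minimal Python sketch assumes a caller-supplied `cohesion` function (for example, one derived from target-language co-occurrence counts) and uses exhaustive search, which is only practical for short queries; both the function name `select_translations` and the scoring scheme are illustrative assumptions, not the paper's exact model.

```python
from itertools import combinations, product


def select_translations(candidates, cohesion):
    """Pick one translation per query word so that the chosen words cohere best.

    candidates: list of candidate-translation lists, one list per query word
    cohesion: function (word_a, word_b) -> association score (assumed to be
              supplied by the caller, e.g. from co-occurrence statistics)
    """
    best, best_score = None, float("-inf")
    # Exhaustive search over all combinations: exponential in query length.
    for choice in product(*candidates):
        # Total cohesion is the sum over all pairs of chosen translations.
        score = sum(cohesion(a, b) for a, b in combinations(choice, 2))
        if score > best_score:
            best, best_score = choice, score
    # If some query word has no candidates, no combination exists.
    return list(best) if best is not None else []
```

For instance, with hypothetical pinyin candidates `["yinhang", "hean"]` (bank, riverbank) and `["daikuan", "jiechu"]` (loan, lend), and a cohesion table that strongly associates "yinhang" with "daikuan", the function returns `["yinhang", "daikuan"]`.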
Statistical Cross-Language Information Retrieval using N-Best Query Translations
2002
Cited by 20 (2 self)
This paper presents a novel statistical model for cross-language information retrieval. Given a written query in the source language, documents in the target language are ranked by integrating probabilities computed by two statistical models: a query-translation model, which generates the most probable term-by-term translations of the query, and a query-document model, which evaluates the likelihood of each document and translation. The two scores are integrated over the set of N most probable translations of the query. Experimental results for N = 1, 5, and 10 are presented on the Italian-English bilingual track data used in the CLEF 2000 and 2001 evaluation campaigns.
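Integration over the N most probable translations can be sketched as a weighted sum: each candidate translation contributes its translation probability times the document's likelihood under that translation. This Python sketch assumes that factorization, a caller-supplied `doc_likelihood` function, and the hypothetical name `nbest_score`; the paper's actual models and normalization may differ.

```python
import heapq


def nbest_score(nbest, doc_likelihood, n=5):
    """Combine per-translation document scores over the N most probable translations.

    nbest: list of (translation_terms, p_translation) pairs from a translation model
    doc_likelihood: function translation_terms -> likelihood of that translation
                    under the document model (assumed to be supplied by the caller)
    n: number of top translations to integrate over
    """
    # Keep only the n translations with the highest translation probability.
    top = heapq.nlargest(n, nbest, key=lambda pair: pair[1])
    # Weight each translation's document likelihood by its probability.
    return sum(p_trans * doc_likelihood(terms) for terms, p_trans in top)
```

Setting `n=1` reduces the score to ranking with the single best translation, while larger `n` lets less probable translations contribute.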
Cross-Lingual Text Categorization
2003
Cited by 17 (2 self)
This article deals with the problem of Cross-Lingual Text Categorization (CLTC), which arises when documents in different languages must be classified according to the same classification tree. We describe practical and cost-effective solutions for automatic Cross-Lingual Text Categorization, both for the case in which a sufficient number of training examples is available for each new language and for the case in which no training examples are available for some language. Experimental ...
Translation Resources, Merging Strategies and Relevance Feedback for Cross-Language Information Retrieval
Cited by 14 (4 self)
This paper describes the official runs of the Twenty-One group for the first CLEF workshop. The Twenty-One group participated in the monolingual, bilingual and multilingual tasks. The following new techniques are introduced in this paper. In the bilingual task we experimented with different methods to estimate translation probabilities. In the multilingual task we experimented with refinements of raw-score merging techniques and with a new relevance feedback algorithm that re-estimates both the model's translation probabilities and the relevance weights. Finally, we performed preliminary experiments that exploit the Web to generate translation probabilities and bilingual dictionaries, notably for English-Italian and English-Dutch.
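Raw-score merging, the baseline that the multilingual runs refine, pools the per-language result lists and re-sorts them by their unnormalized retrieval scores. A minimal Python sketch, assuming each run returns `(doc_id, score)` pairs and using the hypothetical name `raw_score_merge`:

```python
import heapq


def raw_score_merge(ranked_lists, k=10):
    """Merge per-language result lists into one list by raw retrieval score.

    ranked_lists: dict language -> list of (doc_id, score) pairs
    Returns the k highest-scoring (language, doc_id, score) triples, best first.
    Raw-score merging treats scores from different runs as directly comparable.
    """
    # Flatten all runs into one stream of (language, doc_id, score) triples.
    pooled = (
        (lang, doc_id, score)
        for lang, results in ranked_lists.items()
        for doc_id, score in results
    )
    return heapq.nlargest(k, pooled, key=lambda item: item[2])
```

Because no normalization is applied, a run whose scores are systematically higher will dominate the merged list, which is the kind of behavior any refinement of this baseline has to take into account.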
Relating the New Language Models of Information Retrieval to the Traditional Retrieval Models
2000
Cited by 8 (0 self)
During the last two years, a number of different research groups introduced exciting new approaches to information retrieval based on statistical language models.
A database approach to content-based XML retrieval
In (Fuhr et al
Cited by 7 (4 self)
This paper describes a first prototype system for content-based retrieval from XML data. The system's design supports both XPath queries and complex information retrieval queries based on a language modelling approach to information retrieval. Evaluation using the INEX benchmark shows that it is beneficial to bias the system toward retrieving large XML fragments rather than small ones.
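A bias toward large XML fragments can be sketched as a length prior added to a language-model score: each element's smoothed query log-likelihood is increased by a multiple of the log of its length. This Python sketch assumes linear interpolation smoothing and a log-length prior; `element_score`, `lam`, and `length_weight` are illustrative assumptions, and the prototype's actual biasing scheme may differ.

```python
import math


def element_score(query_terms, element_tf, collection_prob, lam=0.2, length_weight=1.0):
    """Query log-likelihood of an XML element plus a log-length prior.

    element_tf: term -> frequency within the element's text
    collection_prob: term -> background probability, assumed > 0 for query terms
    lam: smoothing weight of the element model (assumed value)
    length_weight: strength of the bias toward larger elements (assumed knob)
    """
    length = sum(element_tf.values())
    if length == 0:
        return float("-inf")
    # Prior: longer elements get a higher starting score.
    score = length_weight * math.log(length)
    for q in query_terms:
        # Linearly interpolated element and collection probabilities.
        p = (1 - lam) * collection_prob[q] + lam * element_tf.get(q, 0) / length
        score += math.log(p)
    return score
```

With `length_weight=0.0`, two elements with identical term proportions score the same; with the default weight, the longer one ranks higher.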

