Cross-language information retrieval based on parallel texts and automatic mining of parallel texts from the web
In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1999
Cited by 90 (8 self)
This paper describes the use of a probabilistic translation model for cross-language IR (CLIR). The performance of this approach is compared with that of machine translation (MT). We show that with a probabilistic model we obtain performance close to that of an MT system. In addition, we investigated the possibility of automatically gathering parallel texts from the Web in order to construct a reasonable training corpus. The result is very encouraging: in several tests, such a training corpus proved as good as a manually constructed one for CLIR purposes.
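A probabilistic translation model lets each source query term stand for several weighted target-language terms instead of a single MT output. The sketch below is a minimal illustration under stated assumptions, not the paper's system: the translation table `TRANSLATION_TABLE`, its toy probabilities and the helper `translate_query` are all hypothetical, and real probabilities would come from a model trained on parallel texts.

```python
from collections import defaultdict

# Hypothetical translation table P(target | source) with invented toy values;
# in practice these would be estimated from parallel texts.
TRANSLATION_TABLE = {
    "bank": {"banque": 0.7, "rive": 0.3},
    "car": {"voiture": 0.8, "auto": 0.2},
}


def translate_query(query_terms, table, top_k=2):
    """Replace each source term by its top_k translations, weighted by P(t|s).

    Terms missing from the table are dropped; if several source terms
    translate to the same target term, their weights are summed.
    """
    weighted = defaultdict(float)
    for term in query_terms:
        candidates = sorted(table.get(term, {}).items(),
                            key=lambda kv: kv[1], reverse=True)
        for target, prob in candidates[:top_k]:
            weighted[target] += prob
    return dict(weighted)


print(translate_query(["bank", "car"], TRANSLATION_TABLE))
```

Keeping more than one translation per term (here `top_k=2`) is what distinguishes this from taking a single MT output; the weights can then feed directly into a weighted retrieval model.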
Cross-Lingual Relevance Models
2002
Cited by 66 (5 self)
We propose a formal model of Cross-Language Information Retrieval that does not rely on either query translation or document translation. Our approach leverages recent advances in language modeling to directly estimate an accurate topic model in the target language, starting with a query in the source language. The model integrates popular techniques of disambiguation and query expansion in a unified formal framework. We describe how the topic model can be estimated with either a parallel corpus or a dictionary. We test the framework by constructing Chinese topic models from English queries and using them in the CLIR task of TREC-9. The model achieves around 95% of the average precision of a strong monolingual baseline. In initial precision, our model outperforms the monolingual baseline by 20%. The main contribution of this work is the unified formal model that integrates techniques essential for effective Cross-Language Retrieval.
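With a parallel corpus, one way to estimate a target-language topic model from a source-language query is to weight each aligned (source, target) pair by how likely the query is under its source side, then mix the target-side word distributions with those weights. The following is a simplified sketch of that idea, assuming a uniform prior over pairs and Jelinek-Mercer smoothing with a hypothetical parameter `lam`; the function names and toy data are illustrative, and the paper's exact estimator may differ.

```python
from collections import Counter


def background_model(docs):
    """Maximum-likelihood unigram model over all tokens in docs."""
    counts = Counter(w for doc in docs for w in doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def smoothed_prob(word, counts, length, background, lam):
    """Jelinek-Mercer: lam * P_ml(w|doc) + (1 - lam) * P(w|background)."""
    return lam * counts[word] / length + (1 - lam) * background.get(word, 0.0)


def cross_lingual_relevance_model(query, parallel_pairs, lam=0.8):
    """Estimate P(t | query) for target words t from aligned, non-empty pairs.

    Each pair is weighted by the likelihood of the source-language query
    under its source side (uniform prior over pairs); the smoothed target-side
    models are then mixed using the normalised weights.
    """
    src_bg = background_model([src for src, _ in parallel_pairs])
    tgt_bg = background_model([tgt for _, tgt in parallel_pairs])

    weights = []
    for src, _ in parallel_pairs:
        src_counts = Counter(src)
        likelihood = 1.0
        for q in query:
            likelihood *= smoothed_prob(q, src_counts, len(src), src_bg, lam)
        weights.append(likelihood)

    total = sum(weights)
    if total == 0.0:
        return {}  # no query term occurs anywhere on the source side

    model = {}
    for (_, tgt), weight in zip(parallel_pairs, weights):
        tgt_counts = Counter(tgt)
        for word in tgt_bg:  # full target vocabulary, so the model sums to 1
            p = smoothed_prob(word, tgt_counts, len(tgt), tgt_bg, lam)
            model[word] = model.get(word, 0.0) + (weight / total) * p
    return model


pairs = [
    (["bank", "rates"], ["banque", "taux"]),
    (["river", "bank"], ["rive", "fleuve"]),
    (["car", "price"], ["voiture", "prix"]),
]
print(cross_lingual_relevance_model(["bank", "rates"], pairs))
```

No explicit translation step appears anywhere: the parallel pairs implicitly disambiguate "bank" towards the financial sense because the pair that also contains "rates" dominates the mixture.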
Information retrieval on the Web
ACM Computing Surveys, 2000
Cited by 58 (0 self)
In this paper we review studies of the growth of the Internet and technologies that are useful for information search and retrieval on the Web. We present data on the Internet from several different sources, e.g., current as well as projected number of users, hosts, and Web sites. Although numerical figures vary, overall trends cited
Should we Translate the Documents or the Queries in Cross-language Information Retrieval?
1999
Cited by 38 (1 self)
Previous comparisons of document and query translation were hampered by the differing quality of machine translation in the two opposite directions. We avoid this difficulty by training identical statistical translation models for both translation directions using the same training data. We investigate information retrieval between English and French, incorporating both translation directions into both document translation-based and query translation-based information retrieval, as well as into hybrid systems. We find that hybrids of document and query translation-based systems outperform query translation systems, even human-quality query translation systems. ...
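A hybrid system needs some way to merge the ranked lists produced by document translation and by query translation. The sketch below shows one simple, commonly used option, min-max score normalisation followed by a weighted sum; it is an assumption for illustration, not necessarily how the paper combined its runs, and `hybrid_scores` and its `weight` parameter are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} run to [0, 1]; a constant run maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_scores(doc_translation_run, query_translation_run, weight=0.5):
    """Fuse two runs; a document missing from a run gets 0 from that run.

    Returns (doc_id, fused_score) pairs sorted by descending score,
    with ties broken by doc_id for a deterministic order.
    """
    a = min_max_normalize(doc_translation_run)
    b = min_max_normalize(query_translation_run)
    fused = {doc: weight * a.get(doc, 0.0) + (1 - weight) * b.get(doc, 0.0)
             for doc in set(a) | set(b)}
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


doc_run = {"d1": 12.0, "d2": 8.0, "d3": 4.0}
query_run = {"d2": 3.0, "d3": 2.5, "d4": 1.0}
print(hybrid_scores(doc_run, query_run))
```

Normalising first matters because the two runs' raw scores need not be on comparable scales.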
Disambiguation strategies for cross-language information retrieval
In Proceedings of the Third European Conference on Research and Advanced Technology for Digital Libraries (ECDL), 1999
Cited by 33 (11 self)
Keywords: Cross-Language Information Retrieval, Statistical Machine
Twenty-One at TREC-8: using Language Technology for Information Retrieval
1999
Cited by 32 (12 self)
This paper describes the official runs of the Twenty-One group for TREC-8. The Twenty-One group participated in the Ad-hoc, CLIR, Adaptive Filtering and SDR tracks. The main focus of our experiments is the development and evaluation of retrieval methods motivated by natural language processing techniques. This paper introduces the following new techniques. In the Ad-hoc and CLIR tasks, we experimented with automatic sense disambiguation followed by query expansion or translation. We used a combination of thesaurus and corpus information for the disambiguation process. We continued research on CLIR techniques that exploit the target corpus for implicit disambiguation, by importing the translation probabilities into the probabilistic term-weighting framework. In filtering, we extended the use of language models for document ranking with a relevance feedback algorithm for query term reweighting. 1 Introduction Twenty-One is a project funded by the EU Tele...
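Importing translation probabilities into a language-model ranking can be illustrated by scoring each source query term through all of its translations inside a smoothed query-likelihood model, so the target collection itself decides which translation matters. This is a minimal sketch assuming a toy translation table, Jelinek-Mercer smoothing with a hypothetical `lam` in (0, 1), and the hypothetical function `translated_query_log_likelihood`; it is not presented as the group's exact formulation.

```python
import math
from collections import Counter


def translated_query_log_likelihood(query, doc, collection, translations, lam=0.5):
    """log P(Q|D), scoring each source term s through its translations t:

        P(s|D) = sum_t P(t|s) * (lam * P(t|D) + (1 - lam) * P(t|C))

    Assumes 0 < lam < 1 and non-empty doc/collection. A source term whose
    translations never occur in the collection gets P(s|D) = 0 for every
    document, so it is skipped; this cannot change the ranking.
    """
    doc_counts, coll_counts = Counter(doc), Counter(collection)
    score = 0.0
    for s in query:
        p = 0.0
        for t, p_t_given_s in translations.get(s, {}).items():
            p_doc = doc_counts[t] / len(doc)
            p_coll = coll_counts[t] / len(collection)
            p += p_t_given_s * (lam * p_doc + (1 - lam) * p_coll)
        if p > 0.0:
            score += math.log(p)
    return score


translations = {"bank": {"banque": 0.7, "rive": 0.3}}  # toy P(t|s) values
d1 = ["banque", "taux"]
d2 = ["rive", "fleuve"]
collection = d1 + d2
for name, doc in (("d1", d1), ("d2", d2)):
    print(name, translated_query_log_likelihood(["bank"], doc, collection, translations))
```

Because the translations are summed inside the logarithm rather than added as separate query terms, an ambiguous term with many translations is not over-weighted relative to an unambiguous one.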
Improving Query Translation for Cross-Language Information Retrieval using Statistical Models
SIGIR '01, 2001
Cited by 21 (0 self)
Dictionaries have often been used for query translation in cross-language information retrieval (CLIR). However, we are faced with the problem of translation ambiguity, i.e., a dictionary stores multiple translations for a word. In addition, word-by-word query translation is not precise enough. In this paper, we explore several methods to improve on previous dictionary-based query translation. First, as many noun phrases as possible are recognized and translated as a whole using statistical models and phrase translation patterns. Second, the best word translations are selected based on the cohesion of the translation words. Our experimental results on the TREC English-Chinese CLIR collection show that these techniques yield significant improvements over simple dictionary approaches, and even outperform a high-quality machine translation system.
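Selecting translations by cohesion means choosing, for every query word, the dictionary translation that fits best with the translations chosen for the other words. A minimal sketch, assuming cohesion is a pairwise score such as target-corpus co-occurrence: the `COOCCURRENCE` counts are invented toy values, the romanized Chinese tokens are placeholders, and the exhaustive search is a simplification that only scales to short queries.

```python
from itertools import combinations, product


def select_translations(candidates, cohesion):
    """Pick one translation per source word, maximising total pairwise cohesion.

    candidates: list of candidate-translation lists, one per source word.
    cohesion:   function (w1, w2) -> float.
    Tries every combination (exhaustive), so it suits only short queries;
    returns [] if any source word has no candidates.
    """
    best, best_score = [], float("-inf")
    for combo in product(*candidates):
        score = sum(cohesion(a, b) for a, b in combinations(combo, 2))
        if score > best_score:  # strict '>' keeps the first combination on ties
            best, best_score = list(combo), score
    return best


# Hypothetical co-occurrence counts between romanized Chinese candidates.
COOCCURRENCE = {
    frozenset(("lixi", "lilv")): 50,
    frozenset(("xingqu", "sudu")): 5,
    frozenset(("lixi", "sudu")): 2,
}


def cooccurrence(a, b):
    return COOCCURRENCE.get(frozenset((a, b)), 0)


# "interest" -> xingqu (hobby) / lixi (money); "rate" -> sudu (speed) / lilv (interest rate)
print(select_translations([["xingqu", "lixi"], ["sudu", "lilv"]], cooccurrence))
```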
Japanese/English Cross-Language Information Retrieval: Exploration of Query . . .
Computers and the Humanities, 2001
Cited by 21 (8 self)
Cross-language information retrieval (CLIR), where queries and documents are in different languages, has of late become one of the major topics within the information retrieval community. This paper
The Effects Of Query Complexity, Expansion And Structure On Retrieval Performance In Probabilistic Text Retrieval
University of Tampere, 1999
Cited by 18 (6 self)
...queries using all search facets identified from requests; low complexity was achieved by formulating queries with major facets only. Query expansion was based on a thesaurus, from which the expansion keys were elicited for queries. There were five expansion types: (1) the first query version was an unexpanded, original query with one search key for each search concept (original search concepts) elicited from the test thesaurus; (2) the synonyms of the original search keys were added to the original query; (3) search keys representing the narrower concepts of the original search concepts were added to the original query; (4) search keys representing the associative concepts of the original search concepts were added to the original query; (5) all previous expansion keys were cumulatively added to the original query. Query structure refers to the syntactic structure of a query expression, marked with query operators and parentheses. The structure of queries was either weak (queries with n
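The five expansion types can be made concrete by building each query version from the same thesaurus. In this sketch the thesaurus entries, the relation names `synonyms`/`narrower`/`associative`, and the helper `expand_query` are hypothetical; the concept name stands in for its original search key, and keys are kept grouped per concept (one list per search concept), which is only one assumed way of preserving facet structure, since the description of query structure is cut off above.

```python
# Hypothetical thesaurus entries; relation names are illustrative assumptions.
THESAURUS = {
    "pollution": {"synonyms": ["contamination"], "narrower": ["smog"],
                  "associative": ["emissions"]},
    "river": {"synonyms": ["stream"], "narrower": ["tributary"],
              "associative": ["water"]},
}

# Expansion types (1)-(5): which thesaurus relations are added to the original keys.
EXPANSION_TYPES = {
    "original": (),
    "synonyms": ("synonyms",),
    "narrower": ("narrower",),
    "associative": ("associative",),
    "all": ("synonyms", "narrower", "associative"),  # cumulative expansion
}


def expand_query(concepts, thesaurus, expansion_type):
    """Return one group of search keys per search concept."""
    relations = EXPANSION_TYPES[expansion_type]
    groups = []
    for concept in concepts:
        keys = [concept]  # the original search key for this concept
        entry = thesaurus.get(concept, {})
        for relation in relations:
            keys.extend(entry.get(relation, []))
        groups.append(keys)
    return groups


for expansion in EXPANSION_TYPES:
    print(expansion, expand_query(["pollution", "river"], THESAURUS, expansion))
```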
Improving cross-language retrieval using backoff translation
In Proceedings of the first international ..., 2001
Cited by 18 (8 self)
The limited coverage of available translation lexicons can pose a serious challenge in some cross-language information retrieval applications. We present two techniques for combining evidence from dictionary-based and corpus-based translation lexicons, and show that backoff translation outperforms a technique based on merging lexicons.
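Backoff translation consults translation evidence in a fixed order of preference and falls back to less specific matches only when more specific ones fail. The sketch below assumes one plausible order (exact surface form in each lexicon, then a stemmed form, then leaving the term untranslated); the stage order, the toy lexicons and the crude suffix-stripping `crude_stem` are assumptions standing in for a real stemmer and real lexicons, not the paper's exact procedure.

```python
def crude_stem(word):
    """Hypothetical stand-in for a real stemmer: strips a few English suffixes."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def backoff_translate(term, lexicons, stem=crude_stem):
    """Translate term using lexicons in priority order, backing off when needed."""
    # Stage 1: exact surface form, most trusted lexicon first.
    for lexicon in lexicons:
        if term in lexicon:
            return lexicon[term]
    # Stage 2: stemmed form of the term.
    stemmed = stem(term)
    for lexicon in lexicons:
        if stemmed in lexicon:
            return lexicon[stemmed]
    # Stage 3: keep the term untranslated (useful for names, for example).
    return [term]


dictionary_lexicon = {"bank": ["banque"]}                      # toy values
corpus_lexicon = {"banks": ["banques"], "loan": ["pret"]}      # toy values
lexicons = [dictionary_lexicon, corpus_lexicon]
for word in ("banks", "banking", "loans", "clinton"):
    print(word, backoff_translate(word, lexicons))
```

Unlike merging the lexicons into one, backoff only uses a less trusted source or a looser match when the preferred one yields nothing.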

