Results 1 - 8 of 8
Multilingual information retrieval using english and chinese queries
- In
, 2002
Cited by 15 (2 self)
We participated in the CLEF 2001 monolingual, bilingual, and multilingual tasks. Our interests in these tasks are to test the utility of applying Chinese word segmentation algorithms to German decompounding, to experiment with techniques for combining translations from diverse resources, and to experiment with different approaches to multilingual retrieval. This paper describes our retrieval experiments.
Experiments on Cross-language and Patent Retrieval at NTCIR-3 Workshop
- In Proceedings of NTCIR-3
, 2003
Cited by 8 (0 self)
The Berkeley group participated in the cross-language retrieval task and the patent retrieval task at the third NTCIR workshop. This paper describes our experiments on cross-language and patent retrieval. We present an automatic relevance feedback procedure for a document ranking formula based on logistic regression, and a procedure for automatically extracting Chinese/Japanese translations of English words from search results returned by Internet search engines using English words as queries.
Statistical query translation models for cross-language information retrieval
- ACM Transactions on Asian Language Information Processing (TALIP)
, 2006
Cited by 6 (0 self)
Query translation is an important task in cross-language information retrieval (CLIR), which aims to determine the best translation words and weights for a query. This paper presents three statistical query translation models that focus on resolution of query translation ambiguities. All the models assume that the selection of the translation of a query term depends on the translations of other terms in the query. They differ in the way linguistic structures are detected and exploited. The co-occurrence model treats a query as a bag of words, and uses all the other terms in the query as the context for translation disambiguation. The other two models exploit linguistic dependencies among terms. The noun phrase (NP) translation model detects NPs in a query, and translates each NP as a unit by assuming that the translation of a term only depends on other terms within the same NP. Similarly, the dependency translation model detects and translates dependency triples, such as verb-object, as units. The evaluations show that linguistic structures always lead to more precise translations. CLIR experiments on TREC Chinese collections show that all three models have a positive impact on query translation, and lead to significant improvements in CLIR performance over the simple dictionary-based translation method. The best results are obtained by combining the three models.
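The co-occurrence idea in the abstract above can be illustrated with a small sketch. This is not the paper's statistical model: it is a simplified exhaustive search that picks one translation per query term so that the chosen translations co-occur with each other as often as possible. The function names, the `TOY_COUNTS` table, and the example query are all hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical co-occurrence counts between candidate translations
# (illustrative numbers, not taken from any real corpus).
TOY_COUNTS = {
    ("银行", "利息"): 50,  # bank (financial) + interest (monetary)
    ("银行", "兴趣"): 3,   # bank (financial) + interest (hobby)
    ("河岸", "兴趣"): 1,   # river bank + interest (hobby)
}


def cooccurrence(a, b, counts):
    """Look up a co-occurrence count regardless of pair order."""
    return counts.get((a, b), counts.get((b, a), 0))


def best_translation(candidates, counts):
    """Choose one candidate per query term, maximising the sum of
    pairwise co-occurrence counts over the chosen translations.

    candidates: list of candidate-translation lists, one per query term.
    """
    best, best_score = (), -1
    for combo in product(*candidates):
        score = sum(
            cooccurrence(combo[i], combo[j], counts)
            for i in range(len(combo))
            for j in range(i + 1, len(combo))
        )
        if score > best_score:
            best, best_score = combo, score
    return list(best)


# Toy query "bank interest" with two candidate translations per term.
query_candidates = [["银行", "河岸"], ["利息", "兴趣"]]
chosen = best_translation(query_candidates, TOY_COUNTS)
```

Exhaustive search over all combinations is only practical for short queries; it is used here because it keeps the dependence of each term's translation on the other terms explicit.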
Resource Selection for Domain-Specific Cross-Lingual IR
- In Proc. of the 27th Annual Int’l ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR)
, 2004
Cited by 2 (0 self)
An under-explored question in cross-language information retrieval (CLIR) is to what degree the performance of CLIR methods depends on the availability of high-quality translation resources for particular domains. To address this issue, we evaluate several competitive CLIR methods - with different training corpora - on test documents in the medical domain. Our results show severe performance degradation when using a general-purpose training corpus or a commercial machine translation system (SYSTRAN), versus a domain-specific training corpus. A related unexplored question is whether we can improve CLIR performance by systematically analyzing training resources and optimally matching them to target collections. We start exploring this problem by suggesting a simple criterion for automatically matching training resources to target corpora. By using cosine similarity between training and target corpora as resource weights we obtained an average of 5.6% improvement over using all resources with no weights. The same metric yields 99.4% of the performance obtained when an oracle chooses the optimal resource every time.
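The resource-weighting criterion described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: corpora are represented as whitespace-tokenised term-frequency vectors, and the similarities are normalised to sum to 1, which the abstract does not specify. The names `cosine` and `resource_weights` and the toy corpora are hypothetical.

```python
import math
from collections import Counter
from typing import Dict


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def resource_weights(resources: Dict[str, str], target: str) -> Dict[str, float]:
    """Weight each training resource by its cosine similarity to the
    target corpus. Normalising to sum 1 is an assumption of this sketch."""
    target_vec = Counter(target.lower().split())
    sims = {
        name: cosine(Counter(text.lower().split()), target_vec)
        for name, text in resources.items()
    }
    total = sum(sims.values())
    if total == 0:
        # No overlap with any resource: fall back to uniform weights.
        return {name: 1.0 / len(sims) for name in sims}
    return {name: s / total for name, s in sims.items()}


# Toy corpora (hypothetical): one medical resource, one news resource.
toy_resources = {
    "medical": "patient dose therapy clinical trial patient",
    "news": "election market government trial",
}
weights = resource_weights(toy_resources, "clinical therapy for patient dose")
```

In this toy case the medical resource shares vocabulary with the target and the news resource does not, so the medical resource receives the larger weight.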
Multilingual Web Retrieval: An Experiment in English–Chinese Business Intelligence
, 2005
Cited by 2 (1 self)
As increasing numbers of non-English resources have become available on the Web, the interesting and important issue of how Web users can retrieve documents in different languages has arisen. Cross-language information retrieval (CLIR), the study of retrieving information in one language by queries expressed in another language, is a promising approach to the problem. Cross-language information retrieval has attracted much attention in recent years. Most research systems have achieved satisfactory performance on standard Text REtrieval Conference (TREC) collections such as news articles, but CLIR techniques have not been widely studied and evaluated for applications such as Web portals. In this article, the authors present their research in developing and evaluating a multilingual English–Chinese Web portal that incorporates various CLIR techniques for use in the business domain. A dictionary-based approach was adopted that combines phrasal translation, co-occurrence analysis, and pre- and posttranslation query expansion. The portal was evaluated by domain experts, using a set of queries in both English and Chinese. The experimental results showed that co-occurrence-based phrasal translation achieved a 74.6% improvement in precision over simple word-by-word translation. When used together, pre- and posttranslation query expansion improved the performance slightly, achieving a 78.0% improvement over the baseline word-by-word translation approach. In general, applying CLIR techniques in Web applications shows promise.
Exploiting the Web as the Multilingual Corpus for Unknown Query Translation
Cited by 1 (0 self)
Users' cross-lingual queries to a digital library system might be short and the query terms may not be included in a common translation dictionary (unknown terms). In this paper, we investigate the feasibility of exploiting the Web as the multilingual corpus source to translate unknown query terms for cross-language information retrieval in digital libraries. We propose a Web-based term translation approach to determine effective translations for unknown query terms by mining bilingual search-result pages obtained from a real Web search engine. This approach can enhance the construction of a domain-specific bilingual lexicon and bring multilingual support to a digital library that only has monolingual document collections. Very promising results have been obtained in generating effective translation equivalents for many unknown terms, including proper nouns, technical terms and Web query terms, and in assisting bilingual lexicon construction for a real digital library system.
Domain Adaptation of Translation Models for Multilingual Applications
, 2009
"... number W0550432. ..."
Translating Common English and Chinese Verb-Noun Pairs in Technical Documents with Collocational and Bilingual Information
We studied a special case for the translation of English verbs in verb-object pairs. Researchers have studied the effects of the linguistic information about the verbs being translated, and many have reported that considering the objects of the verbs improves the quality of translations. In this study, we took an extreme approach, assuming the availability of the Chinese translations of the English objects. We explored the issue with thousands of samples that we extracted from the 2011 NTCIR PatentMT workshop. The results indicated that, when the English verbs and objects were known, the information about the object's Chinese translation could still improve the quality of the verb's translations, though not significantly.

