Analysis and Evaluation of Comparable Corpora for Under Resourced Areas of Machine Translation
- Proceedings of the 3rd Workshop on Building and Using Comparable Corpora. Applications of Parallel and Comparable Corpora in Natural Language Engineering and the Humanities, 2010
Abstract - Cited by 2 (1 self)
A lack of sufficient linguistic resources and parallel corpora for many languages and domains is currently one of the major obstacles to further advancement of automated translation. The solution proposed in this paper is to exploit the fact that non-parallel bi- or multilingual text resources are much more widely available than parallel translation data. This position paper presents previous research in this field and the research plans of the ACCURAT project. The project's goal is to find, analyze, and evaluate novel methods that exploit comparable corpora to compensate for the shortage of linguistic resources, and ultimately to significantly improve MT quality for under-resourced languages and narrow domains.
Anchor Points for Bilingual Lexicon Extraction from Small Comparable Corpora
Revisiting context-based projection methods for term-translation spotting in comparable corpora
- In Proceedings of the 23rd International Conference on Computational Linguistics, 2010
Abstract - Cited by 1 (0 self)
Context-based projection methods for identifying the translation of terms in comparable corpora have attracted a lot of attention in the community, e.g. (Fung, 1998; Rapp, 1999). Surprisingly, none of those works has systematically investigated the impact of the many parameters controlling their approach. The present study aims to do just this. As a test case, we address the task of translating terms of the medical domain by exploiting pages mined from Wikipedia. One interesting outcome of this study is that significant gains can be obtained by using an association measure that is rarely used in practice.

