Results 1 - 5 of 5
2006. Statistical Machine Translation of German Compound Words
- FinTAL - 5th International Conference on Natural Language Processing, Springer Verlag, LNCS
, 2006
Cited by 11 (2 self)
Abstract. German compound words pose special problems to statistical machine translation systems: the occurrence of each of the components in the training data is not sufficient for successful translation. Even if the compound itself has been seen during training, the system may not be capable of translating it properly into two or more words. If German is the target language, the system might generate only separated components or may not be capable of choosing the correct compound. In this work, we investigate and compare different strategies for the treatment of German compound words in statistical machine translation systems. For translation from German, we compare linguistic-based and corpus-based compound splitting. For translation into German, we investigate splitting and rejoining German compounds, as well as joining English potential components. Additionally, we investigate word alignments enhanced with knowledge about the splitting points of German compounds. The translation quality is consistently improved by all methods for both translation directions.
Context-dependent alignment models for Statistical Machine Translation
- In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
, 2009
Cited by 4 (1 self)
We introduce alignment models for Machine Translation that take into account the context of a source word when determining its translation. Since the use of these contexts alone causes data sparsity problems, we develop a decision tree algorithm for clustering the contexts based on optimisation of the EM auxiliary function. We show that our context-dependent models lead to an improvement in alignment quality, and an increase in translation quality when the alignments are used in Arabic-English and Chinese-English translation.
2007. Getting to know Moses: Initial experiments on German-English factored translation
- Proceedings of ACL Second Workshop on Statistical Machine Translation. 181–184
, 2007
Cited by 3 (3 self)
We present results and experiences from our experiments with phrase-based statistical machine translation using Moses. The paper is based on the idea of using an off-the-shelf parser to supply linguistic information to a factored translation model and compare the results of German–English translation to the shared task baseline system based on word form. We report partial results for this model and results for two simplified setups. Our best setup takes advantage of the parser’s lemmatization and decompounding. A qualitative analysis of compound translation shows that decompounding improves translation quality.
Improving statistical word alignments with morpho-syntactic transformations
- Proceedings of 5th International Conference on Natural Language Processing, FinTAL’06
, 2006
Cited by 1 (0 self)
Abstract. This paper presents a wide range of statistical word alignment experiments incorporating morphosyntactic information. By means of parallel corpus transformations according to information of POS-tagging, lemmatization or stemming, we explore which linguistic information helps improve alignment error rates. For this, evaluation against a human word alignment reference is performed, aiming at an improved machine translation training scheme which eventually leads to improved SMT performance. Experiments are carried out on a Spanish–English European Parliament Proceedings parallel corpus, both in a large and a small data track. As expected, improvements due to introducing morphosyntactic information are bigger in case of data scarcity, but significant improvement is also achieved in a large data task, meaning that certain linguistic knowledge is relevant even in situations of large data availability.
Alignment Models and Algorithms for Statistical Machine Translation
, 2010
"... This degree is submitted to the University of Cambridge ..."

