Results 1 - 3 of 3
Weighted alignment matrices for statistical machine translation
- In Proceedings of the EMNLP, 2009
Abstract - Cited by 8 (4 self)
Current statistical machine translation systems usually extract rules from bilingual corpora annotated with 1-best alignments. They are prone to learning noisy rules because of alignment mistakes. We propose a new structure called the weighted alignment matrix to encode all possible alignments for a parallel text compactly. The key idea is to assign a probability to each word pair to indicate how well the two words are aligned. We design new algorithms for extracting phrase pairs from weighted alignment matrices and estimating their probabilities. Our experiments on multiple language pairs show that using weighted matrices achieves consistent improvements over using n-best lists in significantly less extraction time.
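The abstract does not say how the per-word-pair probabilities are obtained. As a minimal sketch only, assume they are accumulated from an n-best list of alignments whose scores are normalized to sum to one, so that each cell holds the total probability of the alignments containing that link. The function name `weighted_alignment_matrix`, the input layout, and the toy scores are hypothetical and are not taken from the paper.

```python
def weighted_alignment_matrix(nbest, src_len, tgt_len):
    """Build a src_len x tgt_len matrix of link probabilities.

    nbest is a list of (links, score) pairs, where links is a set of
    (source_index, target_index) tuples and score is a non-negative weight.
    This input format is an assumption made for illustration.
    """
    total = sum(score for _, score in nbest)
    if total <= 0:
        raise ValueError("n-best scores must sum to a positive value")

    matrix = [[0.0] * tgt_len for _ in range(src_len)]
    for links, score in nbest:
        prob = score / total  # normalize so alignment probabilities sum to 1
        for i, j in links:
            matrix[i][j] += prob  # a link's weight is the mass of alignments containing it
    return matrix


# Toy example: 2 source words, 3 target words, 3 candidate alignments.
nbest = [
    ({(0, 0), (1, 1)}, 0.6),
    ({(0, 0), (1, 2)}, 0.3),
    ({(0, 1), (1, 2)}, 0.1),
]
matrix = weighted_alignment_matrix(nbest, src_len=2, tgt_len=3)
for row in matrix:
    print([round(p, 2) for p in row])
```

Under these assumptions, a link that appears in several high-scoring alignments receives a weight near 1, while a link that appears only in a low-scoring alignment receives a small weight instead of being dropped, which is the contrast with 1-best annotation that the abstract draws.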
Building Strong Multilingual Aligned Corpora
Abstract
Recent advances have allowed algorithms that learn from aligned natural language texts to exploit aligned sentences in more than two languages. We investigate ways of combining $\binom{N}{2}$ bilingual aligned corpora to create a multilingual aligned corpus across N languages. As a result of the combination of several corpora, our algorithms output a multilingual corpus, with each aligned tuple assigned a quality score called ‘strength’ that may be used when learning from the multilingual corpus. We show that the addition of bilingual corpora used with alignment strengths can significantly improve Statistical Machine Translation quality on an Arabic→English task.
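To make the corpus count concrete, the following small sketch assumes one bilingual corpus per unordered pair of languages, which gives N(N-1)/2 corpora for N languages. The language codes are hypothetical placeholders, and the sketch only enumerates pairs; it does not attempt the combination algorithm or the strength score, which the abstract does not define.

```python
from itertools import combinations
from math import comb

# Hypothetical set of N = 4 languages, chosen only for illustration.
languages = ["ar", "en", "fr", "es"]

# One bilingual corpus per unordered language pair (an assumption).
pairs = list(combinations(languages, 2))

print(len(pairs))  # number of bilingual corpora to combine
for source, target in pairs:
    print(f"{source}-{target}")

# The enumeration matches the closed form N(N-1)/2.
n = len(languages)
assert len(pairs) == comb(n, 2) == n * (n - 1) // 2
```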

