A hierarchical phrase-based model for statistical machine translation
In ACL, 2005
Cited by 257 (7 self)
We present a statistical phrase-based translation model that uses hierarchical phrases, that is, phrases that contain subphrases. The model is formally a synchronous context-free grammar but is learned from a bitext without any syntactic information. Thus it can be seen as a shift to the formal machinery of syntax-based translation systems without any linguistic commitment. In our experiments using BLEU as a metric, the hierarchical phrase-based model achieves a relative improvement of 7.5% over Pharaoh, a state-of-the-art phrase-based system.
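To make "phrases that contain subphrases" concrete, here is a minimal Python sketch of applying one synchronous rule whose two sides share linked gaps. The rule, the romanized Chinese tokens, and the function name `apply_rule` are hypothetical illustrations, not taken from the paper; real rules would be extracted from word-aligned bitext and scored, which this sketch omits.

```python
from typing import Dict, List, Tuple

Phrase = Tuple[List[str], List[str]]  # (source words, target words)

def apply_rule(source_rhs: List[str], target_rhs: List[str],
               fillers: Dict[str, Phrase]) -> Phrase:
    """Fill the linked nonterminal gaps (e.g. "X1", "X2") on both sides of a
    synchronous rule with subphrase pairs translated by smaller rules."""
    src: List[str] = []
    tgt: List[str] = []
    for sym in source_rhs:
        # A gap symbol expands to the source side of its filler; terminals copy through.
        src.extend(fillers[sym][0] if sym in fillers else [sym])
    for sym in target_rhs:
        # The same gap index expands to the target side, possibly in a new position.
        tgt.extend(fillers[sym][1] if sym in fillers else [sym])
    return src, tgt

# Hypothetical rule X -> <X1 de X2, the X2 that X1>: the gaps swap order across languages.
src, tgt = apply_rule(
    ["X1", "de", "X2"],
    ["the", "X2", "that", "X1"],
    {"X1": (["you", "bangjiao"], ["have", "diplomatic", "relations"]),
     "X2": (["shaoshu", "guojia"], ["few", "countries"])},
)
print(" ".join(tgt))  # the few countries that have diplomatic relations
```

The swap of X1 and X2 happens inside the rule itself, which is how a grammar of this form can encode reordering without any syntactic labels.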
Word sense disambiguation improves statistical machine translation
In 45th Annual Meeting of the Association for Computational Linguistics (ACL-07), 2007
Cited by 45 (3 self)
Recent research presents conflicting evidence on whether word sense disambiguation (WSD) systems can help to improve the performance of statistical machine translation (MT) systems. In this paper, we successfully integrate a state-of-the-art WSD system into a state-of-the-art hierarchical phrase-based MT system, Hiero. We show for the first time that integrating a WSD system improves the performance of a state-of-the-art statistical MT system on an actual translation task. Furthermore, the improvement is statistically significant.
Arabic preprocessing schemes for statistical machine translation
In Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers, 2006
Cited by 29 (3 self)
Statistical machine translation is quite robust when it comes to the choice of input representation. It only requires consistency between training and testing. As a result, there is a wide range of possible preprocessing choices for data used in statistical machine translation. This is even more so for morphologically rich languages such as Arabic. In this paper, we study the effect of different word-level preprocessing schemes for Arabic on the quality of phrase-based statistical machine translation. We also present and evaluate different methods for combining preprocessing schemes resulting in improved translation quality.
A simple and effective hierarchical phrase reordering model
In Proceedings of EMNLP, 2008
Cited by 19 (5 self)
While phrase-based statistical machine translation systems currently deliver state-of-the-art performance, they remain weak on word order changes. Current phrase reordering models can properly handle swaps between adjacent phrases, but they typically lack the ability to perform the kind of long-distance reorderings possible with syntax-based systems. In this paper, we present a novel hierarchical phrase reordering model aimed at improving non-local reorderings, which seamlessly integrates with a standard phrase-based system with little loss of computational efficiency. We show that this model can successfully handle the key examples often used to motivate syntax-based systems, such as the rotation of a prepositional phrase around a noun phrase. We contrast our model with reordering models commonly used in phrase-based systems, and show that our approach provides statistically significant BLEU point gains for two language pairs: Chinese-English (+0.53 on MT05 and +0.71 on MT08) and Arabic-English (+0.55 on MT05).
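The abstract contrasts the new model with reordering models that only look at the adjacent previous phrase. As a hedged illustration of that conventional baseline (not the paper's hierarchical model), the sketch below labels each phrase as monotone (M), swap (S), or discontinuous (D) from the source span of the phrase translated immediately before it. The function name and the inclusive-span convention are assumptions made for the example.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # inclusive source-word indices covered by one phrase

def msd_orientations(source_spans: List[Span]) -> List[str]:
    """Label each phrase, visited in target order, as monotone (M), swap (S),
    or discontinuous (D) relative to the phrase translated just before it."""
    labels: List[str] = []
    prev_start, prev_end = -1, -1  # virtual phrase marking the sentence start
    for start, end in source_spans:
        if start == prev_end + 1:
            labels.append("M")  # continues right after the previous phrase
        elif end + 1 == prev_start:
            labels.append("S")  # sits immediately before the previous phrase
        else:
            labels.append("D")  # any other jump, including long-distance moves
        prev_start, prev_end = start, end
    return labels

print(msd_orientations([(0, 0), (3, 4), (1, 2)]))  # ['M', 'D', 'S']
```

Because each label only looks one phrase back, every long-distance move collapses into the catch-all D class; those non-local cases are what the hierarchical model described above is aimed at.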
Phrase-Based Backoff Models for Machine Translation of Highly Inflected Languages
In Proceedings of the 21st International Conference on Computational Linguistics, 2006
Cited by 12 (0 self)
We propose a backoff model for phrase-based machine translation that translates unseen word forms in foreign-language text by hierarchical morphological abstractions at the word and the phrase level. The model is evaluated on the Europarl corpus for German-English and Finnish-English translation and shows improvements over state-of-the-art phrase-based models.
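The backoff idea can be sketched at the word level as a cascade of lookups, each keyed by a coarser abstraction of the input form. This is a minimal illustration under stated assumptions: `toy_stem` is a hypothetical suffix stripper standing in for a real morphological analyzer, the two-level table is invented, and the paper's actual abstraction hierarchy and its phrase-level backoff are not reproduced.

```python
from typing import Callable, Dict, List, Optional, Tuple

Level = Tuple[Callable[[str], str], Dict[str, str]]  # (normalizer, lookup table)

def toy_stem(word: str) -> str:
    """Hypothetical suffix stripper standing in for a real morphological analyzer."""
    for suffix in ("en", "er", "es", "e", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def backoff_translate(word: str, levels: List[Level]) -> Optional[str]:
    """Try each abstraction level in order, from the surface form to coarser forms,
    and return the first translation found (None if every level misses)."""
    for normalize, table in levels:
        key = normalize(word)
        if key in table:
            return table[key]
    return None

levels: List[Level] = [
    (lambda w: w, {"Hunde": "dogs"}),  # level 0: exact surface form
    (toy_stem, {"Hund": "dog"}),       # level 1: stemmed form
]
print(backoff_translate("Hunden", levels))  # dog (unseen form, found via its stem)
```

The stem-level hit recovers a usable translation for an unseen form but loses its inflection, which is the usual tradeoff of backing off to a coarser abstraction.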
Combination of Arabic Preprocessing Schemes for Statistical Machine Translation
In Proceedings of COLING/ACL, 2006
Cited by 9 (3 self)
Statistical machine translation is quite robust when it comes to the choice of input representation. It only requires consistency between training and testing. As a result, there is a wide range of possible preprocessing choices for data used in statistical machine translation. This is even more so for morphologically rich languages such as Arabic. In this paper, we study the effect of different word-level preprocessing schemes for Arabic on the quality of phrase-based statistical machine translation. We also present and evaluate different methods for combining preprocessing schemes resulting in improved translation quality.
Decomposability of Translation Metrics for Improved Evaluation and Efficient Algorithms
Cited by 3 (0 self)
Bleu is the de facto standard for evaluation and development of statistical machine translation systems. We describe three real-world situations involving comparisons between different versions of the same system where one can obtain improvements in Bleu scores that are questionable or even absurd. These situations arise because Bleu lacks the property of decomposability, a property which is also computationally convenient for various applications. We propose a very conservative modification to Bleu and a cross between Bleu and word error rate that address these issues while improving correlation with human judgments.
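One facet of the missing decomposability can be seen with standard corpus-level Bleu (one reference per sentence, no smoothing). The sketch below is that textbook metric, not the modified metric the paper proposes; the function names are illustrative. Because clipped n-gram counts and lengths are pooled over the corpus before the geometric mean and brevity penalty are applied, the corpus score is not a combination of per-sentence scores.

```python
import math
from collections import Counter
from typing import List, Sequence

def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hyps: Sequence[List[str]], refs: Sequence[List[str]], max_n: int = 4) -> float:
    """Standard Bleu with one reference per sentence: clipped n-gram matches and
    lengths are summed over the whole corpus before combining."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngram_counts(hyp, n), ngram_counts(ref, n)
            matches[n - 1] += sum(min(c, r[g]) for g, c in h.items())  # clipped counts
            totals[n - 1] += sum(h.values())
    if min(matches) == 0:
        return 0.0  # any zero precision makes the geometric mean zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(log_precision)

hyps = [["a", "b", "c", "d"], ["x", "y"]]
refs = [["a", "b", "c", "d"], ["x", "y"]]
per_sentence = [corpus_bleu([h], [r]) for h, r in zip(hyps, refs)]
print(per_sentence, corpus_bleu(hyps, refs))  # [1.0, 0.0] 1.0
```

Here a perfect two-sentence output scores 1.0 at the corpus level while its sentence-level scores average 0.5, so the corpus number cannot be recovered by summing or averaging sentence-level numbers.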
Automatic Phrase Alignment: Using statistical n-gram alignment for syntactic phrase alignment
GSLT Statistical Methods, term paper
Cited by 3 (0 self)
A parallel treebank consists of syntactically annotated sentences in two or more languages, taken from translated (i.e. parallel) documents. These parallel sentences are linked through alignment. Much work has been done on sentence and word alignment, but not as much on the intermediate level. This paper explores using n-gram alignment created for statistical machine translation based on GIZA++ word alignment. The n-grams are compared to the syntactic phrases of two parallel treebanks to create phrase alignment. The experiments show good results, even though the n-gram alignment is not very good due to the small amount of training material.
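One simple reading of "the n-grams are compared to the syntactic phrases" is to keep an aligned n-gram pair only when both of its spans coincide with constituents in the respective treebank. The sketch below assumes that exact-span criterion and uses hypothetical inputs and a hypothetical function name; the paper may use a different matching rule.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # inclusive word indices of an n-gram or a constituent

def link_phrases(ngram_pairs: List[Tuple[Span, Span]],
                 src_constituents: Set[Span],
                 tgt_constituents: Set[Span]) -> List[Tuple[Span, Span]]:
    """Keep aligned n-gram pairs whose source and target spans are both
    syntactic phrases, yielding candidate phrase-alignment links."""
    return [(s, t) for s, t in ngram_pairs
            if s in src_constituents and t in tgt_constituents]

pairs = [((0, 1), (0, 2)), ((1, 2), (2, 3))]
print(link_phrases(pairs, {(0, 1), (2, 3)}, {(0, 2)}))  # [((0, 1), (0, 2))]
```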
Hindi-to-Urdu Machine Translation Through Transliteration
Cited by 1 (1 self)
We present a novel approach to integrate transliteration into Hindi-to-Urdu statistical machine translation. We propose two probabilistic models, based on conditional and joint probability formulations, that are novel solutions to the problem. Our models consider both transliteration and translation when translating a particular Hindi word given the context, whereas in previous work transliteration is only used for translating OOV (out-of-vocabulary) words. We use transliteration as a tool for disambiguating Hindi homonyms, which can be translated, transliterated, or transliterated differently depending on the context. We obtain final BLEU scores of 19.35 (conditional probability model) and 19.00 (joint probability model), compared to 14.30 for a baseline phrase-based system and 16.25 for a system that transliterates OOV words in the baseline system. This indicates that, for language pairs like Hindi-Urdu, transliteration is useful for more than translating OOV words.
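The abstract reports absolute BLEU scores only; converting them to relative gains is simple arithmetic, shown here as a quick check. The percentages are computed from the four reported scores and are not stated in the abstract; the dictionary keys are illustrative labels.

```python
def relative_gain(new: float, old: float) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    return 100.0 * (new - old) / old

scores = {"baseline": 14.30, "oov_transliteration": 16.25,
          "conditional": 19.35, "joint": 19.00}
for model in ("conditional", "joint"):
    for ref in ("baseline", "oov_transliteration"):
        print(f"{model} vs {ref}: {relative_gain(scores[model], scores[ref]):+.1f}%")
```

This gives roughly +35.3% and +19.1% for the conditional model and +32.9% and +16.9% for the joint model, over the baseline and the OOV-transliteration system respectively.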
A Joint Sequence Translation Model with Integrated Reordering
We present a novel machine translation model which models translation by a linear sequence of operations. In contrast to the “N-gram” model, this sequence includes not only translation but also reordering operations. Key ideas of our model are (i) a new reordering approach which better restricts the position to which a word or phrase can be moved, and is able to handle short and long distance reorderings in a unified way, and (ii) a joint sequence model for the translation and reordering probabilities which is more flexible than standard phrase-based MT. We observe statistically significant improvements in BLEU over Moses for German-to-English and Spanish-to-English tasks, and comparable results for a French-to-English task.

