Results 1 - 5 of 5
Arabic preprocessing schemes for statistical machine translation
- in Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers, 2006
Cited by 29 (3 self)
Statistical machine translation is quite robust when it comes to the choice of input representation. It only requires consistency between training and testing. As a result, there is a wide range of possible preprocessing choices for data used in statistical machine translation. This is even more so for morphologically rich languages such as Arabic. In this paper, we study the effect of different word-level preprocessing schemes for Arabic on the quality of phrase-based statistical machine translation. We also present and evaluate different methods for combining preprocessing schemes resulting in improved translation quality.
Computing Consensus Translation from Multiple Machine Translation Systems Using Enhanced Hypotheses Alignment
- Cambridge University Engineering Department, 2006
Cited by 26 (5 self)
This paper describes a novel method for computing a consensus translation from the outputs of multiple machine translation (MT) systems. The outputs are combined and a possibly new translation hypothesis can be generated. Similarly to the well-established ROVER approach of (Fiscus, 1997) for combining speech recognition hypotheses, the consensus translation is computed by voting on a confusion network. To create the confusion network, we produce pairwise word alignments of the original machine translation hypotheses with an enhanced statistical alignment algorithm that explicitly models word reordering. The context of a whole document of translations rather than a single sentence is taken into account to produce the alignment.
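The voting step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the hypotheses have already been aligned into equal-length columns (the paper's alignment algorithm is not reproduced here), and the names `EPS` and `consensus_from_aligned` as well as the optional per-system weights are hypothetical choices for illustration, not the authors' implementation.

```python
EPS = ""  # assumed marker for "no word" in an aligned slot


def consensus_from_aligned(aligned_hyps, weights=None):
    """Pick the highest-weighted word in each slot of a confusion network.

    aligned_hyps: list of equal-length token lists, already aligned slot by slot.
    weights: optional per-system weights (defaults to 1.0 for every system).
    """
    if not aligned_hyps:
        return []
    length = len(aligned_hyps[0])
    if any(len(hyp) != length for hyp in aligned_hyps):
        raise ValueError("all aligned hypotheses must have the same number of slots")
    if weights is None:
        weights = [1.0] * len(aligned_hyps)

    consensus = []
    for slot in range(length):
        votes = {}  # word -> accumulated weight, in first-seen order
        for hyp, weight in zip(aligned_hyps, weights):
            votes[hyp[slot]] = votes.get(hyp[slot], 0.0) + weight
        # max() returns the first maximal item, so ties go to the earliest system
        best_word = max(votes.items(), key=lambda item: item[1])[0]
        if best_word != EPS:  # an empty winner means the slot is dropped
            consensus.append(best_word)
    return consensus


if __name__ == "__main__":
    hyps = [
        ["the", "cat", "sat", EPS],
        ["the", "cat", "sits", "down"],
        ["a", "cat", "sat", "down"],
    ]
    print(" ".join(consensus_from_aligned(hyps)))
```

With equal weights this reduces to a per-slot majority vote; the combined output can differ from every individual input, which is the sense in which a "possibly new" hypothesis is generated.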
An empirical study on computing consensus translations from multiple machine translation systems
- In EMNLP, 2007
Cited by 10 (3 self)
This paper presents an empirical study on how different selections of input translation systems affect translation quality in system combination. We give empirical evidence that the systems to be combined should be of similar quality and need to be almost uncorrelated in order to be beneficial for system combination. Experimental results are presented for composite translations computed from large numbers of different research systems as well as a set of translation systems derived from one of the best-ranked machine translation engines in the 2006 NIST machine translation evaluation.
Combination of Arabic Preprocessing Schemes for Statistical Machine Translation
- Proceedings of COLING/ACL, 2006
Cited by 9 (3 self)
Statistical machine translation is quite robust when it comes to the choice of input representation. It only requires consistency between training and testing. As a result, there is a wide range of possible preprocessing choices for data used in statistical machine translation. This is even more so for morphologically rich languages such as Arabic. In this paper, we study the effect of different word-level preprocessing schemes for Arabic on the quality of phrase-based statistical machine translation. We also present and evaluate different methods for combining preprocessing schemes resulting in improved translation quality.
The NiCT-ATR statistical machine translation system for the IWSLT 2006 evaluation
- in Proc. IWSLT, 2006
Cited by 2 (0 self)
This paper describes the NiCT-ATR statistical machine translation (SMT) system used for the IWSLT 2006 evaluation campaign. We participated in all four language pair translation tasks (CE, JE, AE and IE) and both tracks (OPEN and CSTAR). We used a phrase-based SMT system in the OPEN track and a hybrid multiple translation engine in the CSTAR track. We also equipped our system with several new preprocessing and post-processing techniques for Chinese word segmentation, named entity translation, punctuation and capitalization, sentence splitting, and language model adaptation. Our experiments show these features significantly improved our system.

