The University of Maryland statistical machine translation system for the Fourth Workshop on Machine Translation
- In Proceedings of the EACL-2009 Workshop on Statistical Machine Translation, 2009
Cited by 6 (3 self)
This paper describes the techniques we explored to improve the translation of news text in the German-English and Hungarian-English tracks of the WMT09 shared translation task. Beginning with a conventional hierarchical phrase-based system, we found benefits from using word segmentation lattices as input, explicit generation of beginning- and end-of-sentence markers, minimum Bayes risk decoding, and incorporation of a feature scoring the alignment of function words in the hypothesized translation. We also explored the use of monolingual paraphrases to improve coverage, as well as co-training to improve the quality of the segmentation lattices used, but these did not lead to improvements.
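The minimum Bayes risk (MBR) decoding mentioned in this abstract can be illustrated with a minimal sketch. It assumes an n-best list whose posterior probabilities are already normalized, and it uses unigram F1 as a simple stand-in for the sentence-level gain function a real system would use. The function names and the toy n-best list are hypothetical and are not taken from the paper.

```python
from collections import Counter


def unigram_f1(hyp, ref):
    """Toy gain function (an assumption): unigram F1 between two token strings."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())  # multiset intersection of tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(h.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def mbr_decode(nbest):
    """Pick the hypothesis with the highest expected gain under the posterior.

    nbest: list of (hypothesis, posterior probability) pairs.
    """
    def expected_gain(candidate):
        return sum(p * unigram_f1(candidate, other) for other, p in nbest)

    return max((hyp for hyp, _ in nbest), key=expected_gain)


# Hypothetical 3-best list: the most probable hypothesis is not the MBR choice.
nbest = [("the cat sat", 0.4), ("a cat sat", 0.3), ("a cat sits", 0.3)]
print(mbr_decode(nbest))  # a cat sat
```

In this toy case the single most probable hypothesis is "the cat sat", but "a cat sat" shares more material with the rest of the list and so has the higher expected gain.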
Using a maximum entropy model to build segmentation lattices for MT
- In NAACL
Cited by 3 (0 self)
Recent work has shown that translating segmentation lattices (lattices that encode alternative ways of breaking the input to an MT system into words), rather than text in any particular segmentation, improves translation quality for languages whose orthography does not mark morpheme boundaries. However, much of this work has relied on multiple segmenters that perform differently on the same input to generate sufficiently diverse source segmentation lattices. In this work, we describe a maximum entropy model of compound word splitting that relies on a few general features that can be used to generate segmentation lattices for most languages with productive compounding. Using a model optimized for German translation, we present results showing significant improvements in translation quality in German-English, Hungarian-English, and Turkish-English translation over state-of-the-art baselines.
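To make the idea of a segmentation lattice concrete, here is a minimal sketch of how alternative compound splits of one word can be encoded as edges between character offsets. It only enumerates candidate splits against a toy vocabulary; the maximum entropy scoring described in the abstract is not shown. The function name, the `min_len` parameter, and the vocabulary are assumptions made for illustration.

```python
def segmentation_lattice(word, vocab, min_len=3):
    """Return lattice edges (start, end, piece) over character offsets of `word`.

    Assumes a non-empty word. The unsplit word is always one path; other edges
    are pieces found in `vocab` (a hypothetical word list). Edges that do not
    lie on a complete path from offset 0 to len(word) are pruned.
    """
    n = len(word)
    edges = {(0, n, word)}
    edges.update(
        (i, j, word[i:j])
        for i in range(n)
        for j in range(i + min_len, n + 1)
        if word[i:j] in vocab
    )
    edges = sorted(edges)

    # Offsets reachable from the start (edges are sorted by start offset).
    reachable = {0}
    for i, j, _ in edges:
        if i in reachable:
            reachable.add(j)

    # Offsets from which the end can still be reached.
    finishing = {n}
    for i, j, _ in reversed(edges):
        if j in finishing:
            finishing.add(i)

    return [(i, j, p) for i, j, p in edges if i in reachable and j in finishing]


# Toy vocabulary; "auf" is a dead end here because "nahme" is not listed.
vocab = {"ton", "band", "tonband", "aufnahme", "auf"}
for edge in segmentation_lattice("tonbandaufnahme", vocab):
    print(edge)
```

The resulting lattice encodes three segmentations of the word: unsplit, "tonband" + "aufnahme", and "ton" + "band" + "aufnahme". A decoder that accepts lattice input can then choose among them during translation.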
Candidacy Examination
What empirical evidence is there that adding syntactic constraints to MT decoding, and in particular to PMT decoding, will lead to improvements in translation quality? Your proposal claims that your method for adding syntactic constraints will result not only in a more complete search of the space of string permutations involved in PMT but also in an improved ability to discriminate between good and bad translations. In Section 3 you claim that the ability to account for syntactically governed re-ordering patterns is an advantage, and in Section 4 you claim, on the basis of a constructed example, that your proposed method will improve quality by removing ungrammatical but high-scoring distractor analyses, and that the completeness of the search will be improved by reducing the need for aggressive heuristics about re-ordering. Do you anticipate that separate constraints on re-ordering will still be required? If not, say why not. If so, briefly sketch how these constraints will be implemented and the means by which they will interact with the new syntactic constraints. Statistical MT (SMT) systems are based on the source-channel model of communication (Weaver, 1949; Brown et al., 1993, 1990) whereby an output string is modelled as being
Statistical Machine Translation with Local Language Models
Part-of-speech language modeling is commonly used as a component in statistical machine translation systems, but there is mixed evidence that its usage leads to significant improvements. We argue that its limited effectiveness is due to the lack of lexicalization. We introduce a new approach that builds a separate local language model for each word and part-of-speech pair. The resulting models lead to more context-sensitive probability distributions, and we also exploit the fact that different local models are used to estimate the language model probability of each word during decoding. Our approach is evaluated for Arabic- and Chinese-to-English translation. We show that it leads to statistically significant improvements for multiple test sets and also across different genres, when compared against a competitive baseline and a system using a part-of-speech model.
Translation Shared Task on Statistical Machine Translation: A Comparison of the Systems Output
The ACL Workshop on Statistical Machine Translation proposed a translation shared task focused on European language pairs. Participants used their systems to translate a test set of unseen source-language sentences into the target language. The languages involved were French, English, Spanish, German, Czech and Hungarian. The goal of this work is to quantitatively compare the translations generated by different systems. In particular, a selection of submitted runs for the French-English, German-English and Spanish-English tasks was analyzed. The systems involved in our investigation cover all the main approaches to machine translation, that is, rule-based, statistical, example-based and hybrid.

