Large-scale discriminative n-gram language models for statistical machine translation
- In Proceedings of AMTA, 2008
Cited by 7 (3 self)
We extend discriminative n-gram language modeling techniques originally proposed for automatic speech recognition to a statistical machine translation task. In this context, we propose a novel data selection method that leads to good models using a fraction of the training data. We carry out systematic experiments on several benchmark tests for Chinese-to-English translation using a hierarchical phrase-based machine translation system, and show that a discriminative language model significantly improves upon a state-of-the-art baseline. The experiments also highlight the benefits of our data selection method.
A critique of statistical machine translation
- in Walter Daelemans & Véronique Hoste (eds.), Journal of translation and interpreting studies: Special Issue on Evaluation of Translation Technology, Linguistica Antverpiensia
Cited by 1 (1 self)
Phrase-Based Statistical Machine Translation (PB-SMT) is clearly the leading paradigm in the field today. Nevertheless, and this may come as some surprise to the PB-SMT community, most translators, and somewhat more surprisingly perhaps, many experienced MT protagonists, find the basic model extremely difficult to understand. The main aim of this paper, therefore, is to discuss why this might be the case. Our basic thesis is that proponents of PB-SMT do not seek to address any community other than their own, for they do not feel any need to do so. We will demonstrate that this was not always the case; on the contrary, when statistical models of translation were first presented, the language used to describe how such a model might work was very conciliatory and inclusive. Over the next five years things changed considerably; once SMT achieved dominance, particularly over the rule-based paradigm, it had established a position where it did not need to bring along the rest of the MT community with it, and in our view, this has largely pertained to this day. Having discussed these issues, we will provide three additional observations: firstly, we will discuss the role of automatic MT evaluation metrics when describing PB-SMT systems; secondly, we will comment on the recent syntactic embellishments of PB-SMT, noting especially that most of these contributions have come from researchers who have prior experience in fields other than statistical models of translation; and finally, we will briefly comment on the relationship between PB-SMT and other models of translation, suggesting that there are many gains to be had if the SMT community were to open up more to the other MT paradigms.
Improving Chinese-English . . .
, 2009
Machine Translation (MT) is a task with multiple components, each of which can be very challenging. This thesis focuses on a difficult language pair – Chinese to English – and works on several language-specific aspects that make translation more difficult. The first challenge this thesis focuses on is the differences in the writing systems. In Chinese there are no explicit boundaries between words, and even the definition of a “word” is unclear. We build a general purpose Chinese word segmenter with linguistically inspired features that performs very well on the SIGHAN 2005 bakeoff data. Then we study how Chinese word segmenter performance is related to MT performance, and provide a way to tune the “word” unit in Chinese so that it can better match up with the English word granularity, and therefore improve MT performance. The second challenge we address is different word order between Chinese and English. We first perform error analysis on three state-of-the-art MT systems to see what the most prominent problems are, especially how different word orders cause translation errors. According to our findings, we propose two solutions to improve Chinese-to-English

