Results 11 -
18 of
18
Minimum Error Rate Training by Sampling the Translation Lattice
"... Minimum Error Rate Training is the algorithm for log-linear model parameter training most used in state-of-the-art Statistical Machine Translation systems. In its original formulation, the algorithm uses N-best lists output by the decoder to grow the Translation Pool that shapes the surface on which ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
Minimum Error Rate Training is the algorithm for log-linear model parameter training most used in state-of-the-art Statistical Machine Translation systems. In its original formulation, the algorithm uses N-best lists output by the decoder to grow the Translation Pool that shapes the surface on which the actual optimization is performed. Recent work has been done to extend the algorithm to use the entire translation lattice built by the decoder, instead of N-best lists. We propose here a third, intermediate way, consisting in growing the translation pool using samples randomly drawn from the translation lattice. We empirically measure a systematic improvement in the BLEU scores compared to training using N-best lists, without suffering the increase in computational complexity associated with operating with the whole lattice. 1
Assessing the Trade-Off between System Building Cost and Output Quality in Data-To-Text Generation
"... Abstract. Data-to-text generation systems tend to be knowledge-based and manually built, which limits their reusability and makes them time and cost-intensive to create and maintain. Methods for automating (part of) the system building process exist, but do such methods risk a loss in output quality ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
Abstract. Data-to-text generation systems tend to be knowledge-based and manually built, which limits their reusability and makes them time and cost-intensive to create and maintain. Methods for automating (part of) the system building process exist, but do such methods risk a loss in output quality? In this paper, we investigate the cost/quality trade-off in generation system building. We compare six data-to-text systems which were created by predominantly automatic techniques against six systems for the same domain which were created by predominantly manual techniques. We evaluate the systems using intrinsic automatic metrics and human quality ratings. We find that there is some correlation between degree of automation in the system-building process and output quality (more automation tending to mean lower evaluation scores). We also find that there are discrepancies between the results of the automatic evaluation metrics and the humanassessed evaluation experiments. We discuss caveats in assessing system-building cost and implications of the discrepancies in automatic and human evaluation. 1
Proceedings of the Workshop on Statistical Machine Translation, pages 154--157,
- In Proceedings on the Workshop on Statistical Machine Translation
, 2006
"... The joint probability model proposed by Marcu and Wong (2002) provides a strong probabilistic framework for phrase-based statistical machine translation (SMT). The model's usefulness is, however, limited by the computational complexity of estimating parameters at the phrase level. We present ..."
Abstract
- Add to MetaCart
The joint probability model proposed by Marcu and Wong (2002) provides a strong probabilistic framework for phrase-based statistical machine translation (SMT). The model's usefulness is, however, limited by the computational complexity of estimating parameters at the phrase level. We present the first model to use word alignments for constraining the space of phrasal alignments searched during Expectation Maximization (EM) training. Constraining the joint model improves performance, showing results that are very close to stateof -the-art phrase-based models. It also allows it to scale up to larger corpora and therefore be more widely applicable.
Printed in the United Kingdom 1 Automatic Generation of Weather Forecast Texts Using Comprehensive Probabilistic Generation-Space Models
, 2006
"... Two important recent trends in nlg are (i) probabilistic techniques and (ii) comprehensive approaches that move away from traditional strictly modular and sequential models. This paper reports experiments in which pcru — a generation framework that combines probabilistic generation methodology with ..."
Abstract
- Add to MetaCart
Two important recent trends in nlg are (i) probabilistic techniques and (ii) comprehensive approaches that move away from traditional strictly modular and sequential models. This paper reports experiments in which pcru — a generation framework that combines probabilistic generation methodology with a comprehensive model of the generation space — was used to semi-automatically create five different versions of a weather forecast generator. The generators were evaluated in terms of output quality, development time and computational efficiency against (i) human forecasters, (ii) a traditional handcrafted pipelined nlg system, and (iii) a halogen-style statistical generator. The most striking result is that despite acquiring all decision-making abilities automatically, the best pcru generators produce outputs of high enough quality to be scored more highly by human judges than forecasts written by experts. 1
Verb
"... The distortion cost function used in Mosesstyle machine translation systems has two flaws. First, it does not estimate the future cost of known required moves, thus increasing search errors. Second, all distortion is penalized linearly, even when appropriate reorderings are performed. Because the co ..."
Abstract
- Add to MetaCart
The distortion cost function used in Mosesstyle machine translation systems has two flaws. First, it does not estimate the future cost of known required moves, thus increasing search errors. Second, all distortion is penalized linearly, even when appropriate reorderings are performed. Because the cost function does not effectively constrain search, translation quality decreases at higher distortion limits, which are often needed when translating between languages of different typologies such as Arabic and English. To address these problems, we introduce a method for estimating future linear distortion cost, and a new discriminative distortion model that predicts word movement during translation. In combination, these extensions give a statistically significant improvement over a baseline distortion parameterization. When we triple the distortion limit, our model achieves a +2.32 BLEU average gain over Moses. 1
Improving Chinese-English . . .
, 2009
"... Machine Translation (MT) is a task with multiple components, each of which can be very challenging. This thesis focuses on a difficult language pair – Chinese to English – and works on several language-specific aspects that make translation more difficult. The first challenge this thesis focuses on ..."
Abstract
- Add to MetaCart
Machine Translation (MT) is a task with multiple components, each of which can be very challenging. This thesis focuses on a difficult language pair – Chinese to English – and works on several language-specific aspects that make translation more difficult. The first challenge this thesis focuses on is the differences in the writing systems. In Chinese there are no explicit boundaries between words, and even the definition of a “word” is unclear. We build a general purpose Chinese word segmenter with linguistically inspired features that performs very well on the SIGHAN 2005 bakeoff data. Then we study how Chinese word segmenter performance is related to MT performance, and provide a way to tune the “word ” unit in Chinese so that it can better match up with the English word granularity, and therefore improve MT performance. The second challenge we address is different word order between Chinese and English. We first perform error analysis on three state-of-the-art MT systems to see what the most prominent problems are, especially how different word orders cause translation errors. According to our findings, we propose two solutions to improve Chinese-to-English

