Results 1 - 10
of
35
Minimum bayes-risk decoding for statistical machine translation
- In Proceedings of HLT-NAACL
, 2004
"... We present Minimum Bayes-Risk (MBR) decoding for statistical machine translation. This statistical approach aims to minimize expected loss of translation errors under loss functions that measure translation performance. We describe a hierarchy of loss functions that incorporate different levels of l ..."
Abstract
-
Cited by 78 (10 self)
- Add to MetaCart
We present Minimum Bayes-Risk (MBR) decoding for statistical machine translation. This statistical approach aims to minimize expected loss of translation errors under loss functions that measure translation performance. We describe a hierarchy of loss functions that incorporate different levels of linguistic information from word strings, word-to-word alignments from an MT system, and syntactic structure from parse-trees of source and target language sentences. We report the performance of the MBR decoders on a Chinese-to-English translation task. Our results show that MBR decoding can be used to tune statistical MT performance for specific loss functions. 1
Online Large-Margin Training of Syntactic and Structural Translation Features
"... Minimum-error-rate training (MERT) is a bottleneck for current development in statistical machine translation because it is limited in the number of weights it can reliably optimize. Building on the work of Watanabe et al., we explore the use of the MIRA algorithm of Crammer et al. as an alternative ..."
Abstract
-
Cited by 37 (7 self)
- Add to MetaCart
Minimum-error-rate training (MERT) is a bottleneck for current development in statistical machine translation because it is limited in the number of weights it can reliably optimize. Building on the work of Watanabe et al., we explore the use of the MIRA algorithm of Crammer et al. as an alternative to MERT. We first show that by parallel processing and exploiting more of the parse forest, we can obtain results using MIRA that match or surpass MERT in terms of both translation quality and computational cost. We then test the method on two classes of features that address deficiencies in the Hiero hierarchical phrasebased model: first, we simultaneously train a large number of Marton and Resnik’s soft syntactic constraints, and, second, we introduce a novel structural distortion model. In both cases we obtain significant improvements in translation performance. Optimizing them in combination, for a total of 56 feature weights, we improve performance by 2.6 Bleu on a subset of the NIST 2006 Arabic-English evaluation data.
Variational Decoding for Statistical Machine Translation
"... Statistical models in machine translation exhibit spurious ambiguity. That is, the probability of an output string is split among many distinct derivations (e.g., trees or segmentations). In principle, the goodness of a string is measured by the total probability of its many derivations. However, fi ..."
Abstract
-
Cited by 13 (1 self)
- Add to MetaCart
Statistical models in machine translation exhibit spurious ambiguity. That is, the probability of an output string is split among many distinct derivations (e.g., trees or segmentations). In principle, the goodness of a string is measured by the total probability of its many derivations. However, finding the best string (e.g., during decoding) is then computationally intractable. Therefore, most systems use a simple Viterbi approximation that measures the goodness of a string using only its most probable derivation. Instead, we develop a variational approximation, which considers all the derivations but still allows tractable decoding. Our particular variational distributions are parameterized as n-gram models. We also analytically show that interpolating these n-gram models for different n is similar to minimumrisk decoding for BLEU (Tromble et al., 2008). Experiments show that our approach improves the state of the art. 1
Vine parsing and minimum risk reranking for speed and precision
- In CoNLL
, 2006
"... We describe our entry in the CoNLL-X shared task. The system consists of three phases: a probabilistic vine parser (Eisner and N. Smith, 2005) that produces unlabeled dependency trees, a probabilistic relation-labeling model, and a discriminative minimum risk reranker (D. Smith and Eisner, 2006). Th ..."
Abstract
-
Cited by 8 (1 self)
- Add to MetaCart
We describe our entry in the CoNLL-X shared task. The system consists of three phases: a probabilistic vine parser (Eisner and N. Smith, 2005) that produces unlabeled dependency trees, a probabilistic relation-labeling model, and a discriminative minimum risk reranker (D. Smith and Eisner, 2006). The system is designed for fast training and decoding and for high precision. We describe sources of crosslingual error and ways to ameliorate them. We then provide a detailed error analysis of parses produced for sentences in German (much training data) and Arabic (little training data). 1
Regularization and Search for Minimum Error Rate Training
"... Minimum error rate training (MERT) is a widely used learning procedure for statistical machine translation models. We contrast three search strategies for MERT: Powell’s method, the variant of coordinate descent found in the Moses MERT utility, and a novel stochastic method. It is shown that the sto ..."
Abstract
-
Cited by 7 (0 self)
- Add to MetaCart
Minimum error rate training (MERT) is a widely used learning procedure for statistical machine translation models. We contrast three search strategies for MERT: Powell’s method, the variant of coordinate descent found in the Moses MERT utility, and a novel stochastic method. It is shown that the stochastic method obtains test set gains of +0.98 BLEU on MT03 and +0.61 BLEU on MT05. We also present a method for regularizing the MERT objective that achieves statistically significant gains when combined with both Powell’s method and coordinate descent. 1
Cunei machine translation platform: System description
- Proceedings of the 3rd Workshop on Example-Based Machine Translation
, 2009
"... In this work we present Cunei, a hybrid, open-source platform for machine translation that models each example of a phrase-pair at run-time and combines them in dynamic collections. This results in a flexible framework that provides consistent modeling and the use of non-local features. 1 ..."
Abstract
-
Cited by 7 (4 self)
- Add to MetaCart
In this work we present Cunei, a hybrid, open-source platform for machine translation that models each example of a phrase-pair at run-time and combines them in dynamic collections. This results in a flexible framework that provides consistent modeling and the use of non-local features. 1
Large-scale discriminative n-gram language models for statistical machine translation
- In Proceedings of AMTA
, 2008
"... We extend discriminative n-gram language modeling techniques originally proposed for automatic speech recognition to a statistical machine translation task. In this context, we propose a novel data selection method that leads to good models using a fraction of the training data. We carry out systema ..."
Abstract
-
Cited by 7 (3 self)
- Add to MetaCart
We extend discriminative n-gram language modeling techniques originally proposed for automatic speech recognition to a statistical machine translation task. In this context, we propose a novel data selection method that leads to good models using a fraction of the training data. We carry out systematic experiments on several benchmark tests for Chinese to English translation using a hierarchical phrase-based machine translation system, and show that a discriminative language model significantly improves upon a state-of-the-art baseline. The experiments also highlight the benefits of our data selection method. 1
Softmax-margin crfs: Training log-linear models with loss functions
- In Proc. of NAACL
, 2010
"... We describe a method of incorporating taskspecific cost functions into standard conditional log-likelihood (CLL) training of linear structured prediction models. Recently introduced in the speech recognition community, we describe the method generally for structured models, highlight connections to ..."
Abstract
-
Cited by 7 (2 self)
- Add to MetaCart
We describe a method of incorporating taskspecific cost functions into standard conditional log-likelihood (CLL) training of linear structured prediction models. Recently introduced in the speech recognition community, we describe the method generally for structured models, highlight connections to CLL and max-margin learning for structured prediction (Taskar et al., 2003), and show that the method optimizes a bound on risk. The approach is simple, efficient, and easy to implement, requiring very little change to an existing CLL implementation. We present experimental results comparing with several commonly-used methods for training structured predictors for named-entity recognition. 1
Consensus training for consensus decoding in machine translation
- In EMNLP
, 2009
"... We propose a novel objective function for discriminatively tuning log-linear machine translation models. Our objective explicitly optimizes the BLEU score of expected n-gram counts, the same quantities that arise in forestbased consensus and minimum Bayes risk decoding methods. Our continuous object ..."
Abstract
-
Cited by 6 (0 self)
- Add to MetaCart
We propose a novel objective function for discriminatively tuning log-linear machine translation models. Our objective explicitly optimizes the BLEU score of expected n-gram counts, the same quantities that arise in forestbased consensus and minimum Bayes risk decoding methods. Our continuous objective can be optimized using simple gradient ascent. However, computing critical quantities in the gradient necessitates a novel dynamic program, which we also present here. Assuming BLEU as an evaluation measure, our objective function has two principle advantages over standard max BLEU tuning. First, it specifically optimizes model weights for downstream consensus decoding procedures. An unexpected second benefit is that it reduces overfitting, which can improve test set BLEU scores when using standard Viterbi decoding. 1
Joshua: An Open Source Toolkit for Parsing-based Machine Translation
"... We describe Joshua, an open source toolkit for statistical machine translation. Joshua implements all of the algorithms required for synchronous context free grammars (SCFGs): chart-parsing, n-gram language model integration, beamand cube-pruning, and k-best extraction. The toolkit also implements s ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
We describe Joshua, an open source toolkit for statistical machine translation. Joshua implements all of the algorithms required for synchronous context free grammars (SCFGs): chart-parsing, n-gram language model integration, beamand cube-pruning, and k-best extraction. The toolkit also implements suffix-array grammar extraction and minimum error rate training. It uses parallel and distributed computing techniques for scalability. We demonstrate that the toolkit achieves state of the art translation performance on the WMT09 French-English translation task. 1

