Results 1 - 10
of
32
Forestbased translation
- In Proceedings of ACL-08: HLT
, 2008
"... Among syntax-based translation models, the tree-based approach, which takes as input a parse tree of the source sentence, is a promising direction being faster and simpler than its string-based counterpart. However, current tree-based systems suffer from a major drawback: they only use the 1-best pa ..."
Abstract
-
Cited by 41 (16 self)
- Add to MetaCart
Among syntax-based translation models, the tree-based approach, which takes as input a parse tree of the source sentence, is a promising direction being faster and simpler than its string-based counterpart. However, current tree-based systems suffer from a major drawback: they only use the 1-best parse to direct the translation, which potentially introduces translation mistakes due to parsing errors. We propose a forest-based approach that translates a packed forest of exponentially many parses, which encodes many more alternatives than standard n-best lists. Large-scale experiments show an absolute improvement of 1.7 BLEU points over the 1-best baseline. This result is also 0.8 points higher than decoding with 30-best parses, and takes even less time. 1
Preference Grammars: Softening Syntactic Constraints to Improve Statistical Machine Translation
"... We propose a novel probabilistic synchoronous context-free grammar formalism for statistical machine translation, in which syntactic nonterminal labels are represented as “soft ” preferences rather than as “hard” matching constraints. This formalism allows us to efficiently score unlabeled synchrono ..."
Abstract
-
Cited by 9 (0 self)
- Add to MetaCart
We propose a novel probabilistic synchoronous context-free grammar formalism for statistical machine translation, in which syntactic nonterminal labels are represented as “soft ” preferences rather than as “hard” matching constraints. This formalism allows us to efficiently score unlabeled synchronous derivations without forgoing traditional syntactic constraints. Using this score as a feature in a log-linear model, we are able to approximate the selection of the most likely unlabeled derivation. This helps reduce fragmentation of probability across differently labeled derivations of the same translation. It also allows the importance of syntactic preferences to be learned alongside other features (e.g., the language model) and for particular labeling procedures. We show improvements in translation quality on small and medium sized Chinese-to-English translation tasks. 1
A Scalable Decoder for Parsing-based Machine Translation with Equivalent Language Model State Maintenance
- ACL SSST
, 2008
"... We describe a scalable decoder for parsingbased machine translation. The decoder is written in JAVA and implements all the essential algorithms described in Chiang (2007): chart-parsing, m-gram language model integration, beam- and cube-pruning, and unique k-best extraction. Additionally, parallel a ..."
Abstract
-
Cited by 7 (2 self)
- Add to MetaCart
We describe a scalable decoder for parsingbased machine translation. The decoder is written in JAVA and implements all the essential algorithms described in Chiang (2007): chart-parsing, m-gram language model integration, beam- and cube-pruning, and unique k-best extraction. Additionally, parallel and distributed computing techniques are exploited to make it scalable. We also propose an algorithm to maintain equivalent language model states that exploits the back-off property of m-gram language models: instead of maintaining a separate state for each distinguished sequence of “state ” words, we merge multiple states that can be made equivalent for language model probability calculations due to back-off. We demonstrate experimentally that our decoder is more than 30 times faster than a baseline decoder written in PYTHON. We propose to release our decoder as an opensource toolkit. 1
Efficient Parsing for Transducer Grammars
"... The tree-transducer grammars that arise in current syntactic machine translation systems are large, flat, and highly lexicalized. We address the problem of parsing efficiently with such grammars in three ways. First, we present a pair of grammar transformations that admit an efficient cubic-time CKY ..."
Abstract
-
Cited by 7 (1 self)
- Add to MetaCart
The tree-transducer grammars that arise in current syntactic machine translation systems are large, flat, and highly lexicalized. We address the problem of parsing efficiently with such grammars in three ways. First, we present a pair of grammar transformations that admit an efficient cubic-time CKY-style parsing algorithm despite leaving most of the grammar in n-ary form. Second, we show how the number of intermediate symbols generated by this transformation can be substantially reduced through binarization choices. Finally, we describe a two-pass coarse-to-fine parsing approach that prunes the search space using predictions from a subset of the original grammar. In all, parsing time reduces by 81%. We also describe a coarse-to-fine pruning scheme for forest-based language model reranking that allows a 100-fold increase in beam size while reducing decoding time. The resulting translations improve by 1.3 BLEU. 1
Consensus training for consensus decoding in machine translation
- In EMNLP
, 2009
"... We propose a novel objective function for discriminatively tuning log-linear machine translation models. Our objective explicitly optimizes the BLEU score of expected n-gram counts, the same quantities that arise in forestbased consensus and minimum Bayes risk decoding methods. Our continuous object ..."
Abstract
-
Cited by 6 (0 self)
- Add to MetaCart
We propose a novel objective function for discriminatively tuning log-linear machine translation models. Our objective explicitly optimizes the BLEU score of expected n-gram counts, the same quantities that arise in forestbased consensus and minimum Bayes risk decoding methods. Our continuous objective can be optimized using simple gradient ascent. However, computing critical quantities in the gradient necessitates a novel dynamic program, which we also present here. Assuming BLEU as an evaluation measure, our objective function has two principle advantages over standard max BLEU tuning. First, it specifically optimizes model weights for downstream consensus decoding procedures. An unexpected second benefit is that it reduces overfitting, which can improve test set BLEU scores when using standard Viterbi decoding. 1
Rule filtering by pattern for efficient hierarchical translation
- In Proceedings of the EACL
, 2009
"... We describe refinements to hierarchical translation search procedures intended to reduce both search errors and memory usage through modifications to hypothesis expansion in cube pruning and reductions in the size of the rule sets used in translation. Rules are put into syntactic classes based on th ..."
Abstract
-
Cited by 6 (1 self)
- Add to MetaCart
We describe refinements to hierarchical translation search procedures intended to reduce both search errors and memory usage through modifications to hypothesis expansion in cube pruning and reductions in the size of the rule sets used in translation. Rules are put into syntactic classes based on the number of non-terminals and the pattern, and various filtering strategies are then applied to assess the impact on translation speed and quality. Results are reported on the 2008 NIST Arabic-to-English evaluation task. 1
Stabilizing Minimum Error Rate Training
"... The most commonly used method for training feature weights in statistical machine translation (SMT) systems is Och’s minimum error rate training (MERT) procedure. A well-known problem with Och’s procedure is that it tends to be sensitive to small changes in the system, particularly when the number o ..."
Abstract
-
Cited by 6 (1 self)
- Add to MetaCart
The most commonly used method for training feature weights in statistical machine translation (SMT) systems is Och’s minimum error rate training (MERT) procedure. A well-known problem with Och’s procedure is that it tends to be sensitive to small changes in the system, particularly when the number of features is large. In this paper, we quantify the stability of Och’s procedure by supplying different random seeds to a core component of the procedure (Powell’s algorithm). We show that for systems with many features, there is extensive variation in outcomes, both on the development data and on the test data. We analyze the causes of this variation and propose modifications to the MERT procedure that improve stability while helping performance on test data. 1
Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation
"... We describe an exact decoding algorithm for syntax-based statistical translation. The approach uses Lagrangian relaxation to decompose the decoding problem into tractable subproblems, thereby avoiding exhaustive dynamic programming. The method recovers exact solutions, with certificates of optimalit ..."
Abstract
-
Cited by 5 (2 self)
- Add to MetaCart
We describe an exact decoding algorithm for syntax-based statistical translation. The approach uses Lagrangian relaxation to decompose the decoding problem into tractable subproblems, thereby avoiding exhaustive dynamic programming. The method recovers exact solutions, with certificates of optimality, on over 97 % of test examples; it has comparable speed to state-of-the-art decoders. 1
Jane: Open Source Hierarchical Translation, Extended with Reordering and Lexicon Models
"... We present Jane, RWTH’s hierarchical phrase-based translation system, which has been open sourced for the scientific community. This system has been in development at RWTH for the last two years and has been successfully applied in different machine translation evaluations. It includes extensions to ..."
Abstract
-
Cited by 4 (3 self)
- Add to MetaCart
We present Jane, RWTH’s hierarchical phrase-based translation system, which has been open sourced for the scientific community. This system has been in development at RWTH for the last two years and has been successfully applied in different machine translation evaluations. It includes extensions to the hierarchical approach developed by RWTH as well as other research institutions. In this paper we give an overview of its main features. We also introduce a novel reordering model for the hierarchical phrase-based approach which further enhances translation performance, and analyze the effect some recent extended lexicon models have on the performance of the system. 1
Asynchronous Binarization for Synchronous Grammars
"... Binarization of n-ary rules is critical for the efficiency of syntactic machine translation decoding. Because the target side of a rule will generally reorder the source side, it is complex (and sometimes impossible) to find synchronous rule binarizations. However, we show that synchronous binarizat ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
Binarization of n-ary rules is critical for the efficiency of syntactic machine translation decoding. Because the target side of a rule will generally reorder the source side, it is complex (and sometimes impossible) to find synchronous rule binarizations. However, we show that synchronous binarizations are not necessary in a two-stage decoder. Instead, the grammar can be binarized one way for the parsing stage, then rebinarized in a different way for the reranking stage. Each individual binarization considers only one monolingual projection of the grammar, entirely avoiding the constraints of synchronous binarization and allowing binarizations that are separately optimized for each stage. Compared to n-ary forest reranking, even simple target-side binarization schemes improve overall decoding accuracy. 1

