Joint Parsing and Translation
Cited by 4 (1 self)
Tree-based translation models, which exploit the linguistic syntax of the source language, usually separate decoding into two steps: parsing and translation. Although this separation makes tree-based decoding simple and efficient, its translation performance is usually limited by the number of parse trees offered by the parser. Alternatively, we propose to parse and translate jointly by casting tree-based translation as parsing. Given a source-language sentence, our joint decoder produces a parse tree on the source side and a translation on the target side simultaneously. By combining translation and parsing models in a discriminative framework, our approach significantly outperforms a forest-based tree-to-string system by 1.1 absolute BLEU points on the NIST 2005 Chinese-English test set. As a parser, our joint decoder achieves an F1 score of 80.6% on the Penn Chinese Treebank.
Learning Hierarchical Translation Structure with Linguistic Annotations
Cited by 3 (0 self)
While it is generally accepted that many translation phenomena are correlated with linguistic structures, employing linguistic syntax for translation has proven a highly non-trivial task. The key assumption behind many approaches is that translation is guided by the source and/or target language parse, employing rules extracted from the parse tree or performing tree transformations. These approaches enforce strict constraints and might overlook important translation phenomena that cross linguistic constituents. We propose a novel, flexible modelling approach to introduce linguistic information of varying granularity from the source side. Our method induces joint probability synchronous grammars and estimates their parameters by selecting and weighing together linguistically motivated rules according to an objective function directly targeting generalisation over future data. We obtain statistically significant improvements across 4 different language pairs with English as source, reaching up to +1.92 BLEU for Chinese as target.
unknown title
Tree-based statistical machine translation models have made significant progress in recent years, especially when replacing 1-best trees with packed forests. However, because parsing accuracy usually drops sharply as sentence length increases, translating long sentences often takes a long time and produces only degenerate translations. We propose a new method named sub-sentence division that reduces the decoding time and improves the translation quality for tree-based translation. Our approach divides long sentences into several sub-sentences by exploiting tree structures. Large-scale experiments on the NIST 2008 Chinese-to-English test set show that our approach achieves an absolute improvement of 1.1 BLEU points over the baseline system in 50% less time.
Sub-Sentence Division for Tree-Based Machine Translation
Tree-based statistical machine translation models have witnessed promising progress in recent years, especially when combined with forests. However, translating long sentences is time-consuming and yields lower quality, owing to the lower parsing accuracy and huge forest size on long sentences. A simple remedy is to divide them at punctuation. However, simply splitting sentences at punctuation can hurt parsing accuracy. We propose a new method named sub-sentence division that significantly reduces the decoding complexity while taking the structure of the whole parse tree into account. Large-scale experiments on NIST 2008 show that our approach achieves a 1.1 BLEU point improvement over the baseline system while running twice as fast, and scores 0.3 BLEU points higher than the approach of splitting by commas.
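The punctuation-based baseline mentioned in the abstract above can be illustrated with a minimal Python sketch. This is not the paper's sub-sentence division method, which relies on the parse tree; it only shows the simpler comparison point. The set of split markers, the `min_len` parameter, and the function name `split_by_punctuation` are assumptions made for illustration and do not come from the paper.

```python
from typing import List

# Assumption: illustrative split markers; the abstract only says "punctuation" / "commas".
SPLIT_TOKENS = {",", "，", ";", "；"}


def split_by_punctuation(tokens: List[str], min_len: int = 3) -> List[List[str]]:
    """Split a tokenized sentence after each split marker.

    A chunk is closed only once it holds at least `min_len` tokens
    (hypothetical parameter), so very short fragments keep growing.
    A short trailing remainder is merged into the previous chunk.
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    for tok in tokens:
        current.append(tok)
        if tok in SPLIT_TOKENS and len(current) >= min_len:
            chunks.append(current)
            current = []
    if current:
        if chunks and len(current) < min_len:
            chunks[-1].extend(current)  # avoid a tiny final chunk
        else:
            chunks.append(current)
    return chunks


if __name__ == "__main__":
    sentence = "the parser fails on long input , so we split it , then translate"
    for chunk in split_by_punctuation(sentence.split()):
        print(" ".join(chunk))
```

Each chunk would then be parsed and translated on its own. Because the split ignores syntactic structure, a chunk boundary may cut through a constituent, which is the weakness the abstract attributes to this baseline.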
Dependency-Based Bracketing Transduction Grammar for Statistical Machine Translation
In this paper, we propose a novel dependency-based bracketing transduction grammar for statistical machine translation, which converts a source sentence into a target dependency tree. Unlike conventional bracketing transduction grammar models, we encode target dependency information directly into our lexical rules, and then employ two different maximum entropy models to determine the reordering and combination of partial dependency structures when merging two neighboring blocks. With a dependency language model further incorporated, large-scale experiments on a Chinese-English task show that our system achieves significant improvements over the baseline system on various test sets, even with fewer phrases.
Utilizing Target-Side Semantic Role Labels to Assist Hierarchical Phrase-based Machine Translation
In this paper we present a novel approach to utilizing Semantic Role Labeling (SRL) information to improve Hierarchical Phrase-based Machine Translation. We propose an algorithm to extract SRL-aware Synchronous Context-Free Grammar (SCFG) rules. Conventional Hiero-style SCFG rules are also extracted in the same framework. Special conversion rules are applied to ensure that when SRL-aware SCFG rules are used in derivation, the decoder only generates hypotheses with complete semantic structures. We perform machine translation experiments using 9 different Chinese-English test sets. Our approach achieved an average BLEU score improvement of 0.49 as well as a 1.21-point reduction in TER.
Binarized Forest to String Translation
Tree-to-string translation is syntax-aware and efficient but sensitive to parsing errors. Forest-to-string translation approaches mitigate the risk of propagating parser errors into translation errors by considering a forest of alternative trees, as generated by a source language parser. We propose an alternative approach to generating forests that is based on combining sub-trees within the first-best parse through binarization. Provably, our binarization forest can cover any non-constituent phrase in a sentence while maintaining the desirable property that for each span there is at most one nonterminal, so that the grammar constant for decoding is relatively small. To reduce search errors, we apply the synchronous binarization technique to forest-to-string decoding. Combining the two techniques, we show that using a fast shift-reduce parser we can achieve significant quality gains in the NIST 2008 English-to-Chinese track (1.3 BLEU points over a phrase-based system, 0.8 BLEU points over a hierarchical phrase-based system). Consistent and significant gains are also shown in WMT 2010 in the English to German, French, Spanish and Czech tracks.
Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations
We propose a novel technique for learning how to transform source parse trees to improve the translation quality of syntax-based translation models using synchronous context-free grammars. We transform the source tree phrasal structure into a set of simpler structures, expose such decisions to the decoding process, and find the least expensive transformation operation to better model word reordering. In particular, we integrate synchronous binarizations, verb regrouping, and removal of redundant parse nodes, and incorporate a few important features such as translation boundaries. We learn the structural preferences from the data in a generative framework. The syntax-based translation system integrating the proposed techniques outperforms the best Arabic-English unconstrained system in NIST-08 evaluations by 1.3 absolute BLEU, which is statistically significant.
Survey: Weighted Extended Top-down Tree Transducers Part II -- Application in Machine Translation
2011
In this second part of the survey, we present the application of weighted extended top-down tree transducers in machine translation, which is the automatic translation of natural language texts. We present several formal properties that are relevant in machine translation and evaluate the weighted extended top-down tree transducer along those criteria. In addition, we demonstrate how to extract rules for an extended top-down tree transducer from existing linguistic data and how to obtain suitable rule weights automatically from similar information. Overall, the aim of the survey is twofold. First, it should provide a synopsis that illustrates how theory (tree transducers) and practice (machine translation) interact on this particular example. Second, it presents a uniform and simplified treatment of the rule extraction and training algorithms that is accessible to the non-expert. Additional details can be found in the original results that are referenced throughout the text.

