Results 1 - 10 of 24
Learning to translate with source and target syntax
- In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL ’10
Cited by 20 (2 self)
Statistical translation models that try to capture the recursive structure of language have been widely adopted over the last few years. These models make use of varying amounts of information from linguistic theory: some use none at all, some use information about the grammar of the target language, some use information about the grammar of the source language. But progress has been slower on translation models that are able to learn the relationship between the grammars of both the source and target language. We discuss the reasons why this has been a challenge, review existing attempts to meet this challenge, and show how some old and new ideas can be combined into a simple approach that uses both source and target syntax for significant improvements in translation accuracy.
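As a toy illustration of the kind of synchronous rule such models learn (the rule, labels, and sentence below are hypothetical examples, not taken from the paper), a minimal sketch of applying a source/target syntax rule that reorders constituents:

```python
# Toy sketch of a synchronous grammar rule pairing source- and
# target-side syntax. Rule and labels are hypothetical examples.

# A rule maps (parent label, source child order) to a target child order.
RULES = {
    # e.g. a source-side VP -> PP VB rendered as VB PP on the target side.
    ("VP", ("PP", "VB")): ("VB", "PP"),
}

def translate_node(label, children):
    """Reorder a node's children according to a synchronous rule.

    `children` is a list of (label, yield) pairs; if no rule matches,
    the source order is kept (monotone translation).
    """
    child_labels = tuple(lbl for lbl, _ in children)
    target_order = RULES.get((label, child_labels), child_labels)
    by_label = dict(children)  # assumes unique labels in this toy case
    return [by_label[lbl] for lbl in target_order]

# Source VP with children PP("with him") and VB("spoke"):
children = [("PP", "with him"), ("VB", "spoke")]
print(" ".join(translate_node("VP", children)))  # -> spoke with him
```

Real systems attach such reorderings to lexicalized rules with probabilities learned from aligned bitext; this sketch shows only the structural pairing of the two grammars.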
Unsupervised syntactic alignment with inversion transduction grammars
- In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2010
Cited by 12 (0 self)
Syntactic machine translation systems currently use word alignments to infer syntactic correspondences between the source and target languages. Instead, we propose an unsupervised ITG alignment model that directly aligns syntactic structures. Our model aligns spans in a source sentence to nodes in a target parse tree. We show that our model produces syntactically consistent analyses where possible, while being robust in the face of syntactic divergence. Alignment quality and end-to-end translation experiments demonstrate that this consistency yields higher quality alignments than our baseline.
Joint Parsing and Translation
Cited by 10 (2 self)
Tree-based translation models, which exploit the linguistic syntax of the source language, usually separate decoding into two steps: parsing and translation. Although this separation makes tree-based decoding simple and efficient, its translation performance is usually limited by the number of parse trees offered by the parser. Alternatively, we propose to parse and translate jointly by casting tree-based translation as parsing. Given a source-language sentence, our joint decoder produces a parse tree on the source side and a translation on the target side simultaneously. By combining translation and parsing models in a discriminative framework, our approach significantly outperforms a forest-based tree-to-string system by 1.1 absolute BLEU points on the NIST 2005 Chinese-English test set. As a parser, our joint decoder achieves an F1 score of 80.6% on the Penn Chinese Treebank.
Constituency to Dependency Translation with Forests
Cited by 8 (2 self)
Tree-to-string systems (and their forest-based extensions) have gained steady popularity thanks to their simplicity and efficiency, but there is a major limitation: they are unable to guarantee the grammaticality of the output, which is explicitly modeled in string-to-tree systems via target-side syntax. We thus propose to combine the advantages of both, and present a novel constituency-to-dependency translation model, which uses constituency forests on the source side to direct the translation, and dependency trees on the target side (as a language model) to ensure grammaticality. Medium-scale experiments show an absolute and statistically significant improvement of +0.7 BLEU points over a state-of-the-art forest-based tree-to-string system, even with fewer rules. This is also the first time that a tree-to-tree model has surpassed its tree-to-string counterparts.
Shallow Local Multi Bottom-up Tree Transducers in Statistical Machine Translation
Cited by 6 (4 self)
We present a new translation model integrating the shallow local multi bottom-up tree transducer. We perform a large-scale empirical evaluation of the resulting system, which demonstrates that we significantly beat a realistic tree-to-tree baseline on the WMT 2009 English → German translation task. As an additional contribution, we make the developed software and complete tool chain publicly available for further experimentation.
Improving Syntax-Augmented Machine Translation by Coarsening the Label Set
Cited by 4 (0 self)
We present a new variant of the Syntax-Augmented Machine Translation (SAMT) formalism with a category-coarsening algorithm originally developed for tree-to-tree grammars. We induce bilingual labels into the …
Adjoining Tree-to-String Translation
Cited by 4 (1 self)
We introduce synchronous tree adjoining grammars (TAG) into tree-to-string translation, which converts a source tree to a target string. Without reconstructing TAG derivations explicitly, our rule extraction algorithm directly learns tree-to-string rules from aligned Treebank-style trees. As tree-to-string translation casts decoding as a tree parsing problem rather than parsing the source string, the decoder still runs fast when adjoining is included. Running less than 2 times slower, the adjoining tree-to-string system improves translation quality by +0.7 BLEU over a baseline system that allows only tree substitution, on NIST Chinese-English test sets.
Joshua 3.0: Syntax-based Machine Translation with the Thrax Grammar Extractor
Cited by 4 (1 self)
We present progress on Joshua, an open-source decoder for hierarchical and syntax-based machine translation. The main focus is describing Thrax, a flexible, open-source synchronous context-free grammar extractor. Thrax extracts both hierarchical (Chiang, 2007) and syntax-augmented machine translation (Zollmann and Venugopal, 2006) grammars. It is built on Apache Hadoop for efficient distributed performance, and can easily be extended with support for new grammars, feature functions, and output formats.
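Extractors in this family conventionally emit one synchronous rule per line in the Hiero-style `|||`-delimited layout; as a hedged sketch (the field layout follows that convention and the rule below is a made-up example, not Thrax output), a minimal reader for such lines:

```python
# Minimal sketch of parsing a Hiero-style synchronous grammar line.
# Field layout assumed from the common ||| convention; the example
# rule is hypothetical, not actual Thrax output.

def parse_rule(line):
    """Split 'LHS ||| source ||| target ||| features' into its fields."""
    lhs, src, tgt, feats = [field.strip() for field in line.split("|||")]
    return {
        "lhs": lhs,
        "source": src.split(),      # tokens and linked nonterminals like [X,1]
        "target": tgt.split(),
        "features": [float(x) for x in feats.split()],
    }

rule = parse_rule("[X] ||| [X,1] of [X,2] ||| [X,2] 的 [X,1] ||| 0.3 1.2")
print(rule["target"])  # -> ['[X,2]', '的', '[X,1]']
```

The coindexed nonterminals (`[X,1]`, `[X,2]`) are what let the target side reorder the source constituents, which is the heart of hierarchical translation.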
Effective Use of Function Words for Rule Generalization in Forest-Based Translation
Cited by 3 (1 self)
In the present paper, we propose the effective use of function words to generate generalized translation rules for forest-based translation. Given aligned forest-string pairs, we extract composed tree-to-string translation rules that account for multiple interpretations of both aligned and unaligned target function words. In order to constrain the exhaustive attachment of function words, we bind them only to the nearby syntactic chunks yielded by a target dependency parser. The proposed approach can therefore not only capture source-tree-to-target-chunk correspondences but also use forest structures, which compactly encode an exponential number of parse trees, to properly generate target function words during decoding. Extensive experiments involving large-scale English-to-Japanese translation revealed a significant improvement of 1.8 points in BLEU score compared with a strong forest-to-string baseline system.
Machine Translation with Lattices and Forests
Cited by 2 (0 self)
Traditional 1-best translation pipelines suffer from a major drawback: the errors of the 1-best outputs, inevitably introduced by each module, propagate and accumulate along the pipeline. To alleviate this problem, we use compact structures, lattices and forests, in each module instead of 1-best results. We integrate both lattices and forests into a single tree-to-string system, and explore algorithms for lattice parsing, lattice-forest-based rule extraction, and decoding. More importantly, our model takes into account the probabilities of all the different steps, such as segmentation, parsing, and translation. The main advantage of our model is that we can make a global decision to search for the best segmentation, parse tree, and translation in one step. Medium-scale experiments show an improvement of +0.9 BLEU points over a state-of-the-art forest-based baseline.