Results 1 -
9 of
9
Syntactic Constraints on Paraphrases Extracted from Parallel Corpora
"... ccb cs jhu edu We improve the quality of paraphrases extracted from parallel corpora by requiring that phrases and their paraphrases be the same syntactic type. This is achieved by parsing the English side of a parallel corpus and altering the phrase extraction algorithm to extract phrase labels alo ..."
Abstract
-
Cited by 26 (6 self)
- Add to MetaCart
ccb cs jhu edu We improve the quality of paraphrases extracted from parallel corpora by requiring that phrases and their paraphrases be the same syntactic type. This is achieved by parsing the English side of a parallel corpus and altering the phrase extraction algorithm to extract phrase labels alongside bilingual phrase pairs. In order to retain broad coverage of non-constituent phrases, complex syntactic labels are introduced. A manual evaluation indicates a 19% absolute improvement in paraphrase quality over the baseline method. 1
The power of extended top-down tree transducers
- SIAM J. COMPUT
, 2008
"... Unfortunately, the class of transformations computed by linear extended top-down tree transducers with regular look-ahead is not closed under composition. It is shown that the class of transformations computed by certain linear bimorphisms coincides with the previously mentioned class. Moreover, it ..."
Abstract
-
Cited by 13 (11 self)
- Add to MetaCart
Unfortunately, the class of transformations computed by linear extended top-down tree transducers with regular look-ahead is not closed under composition. It is shown that the class of transformations computed by certain linear bimorphisms coincides with the previously mentioned class. Moreover, it is demonstrated that every linear epsilon-free extended top-down tree transducer with regular look-ahead can be implemented by a linear multi bottom-up tree transducer. The class of transformations computed by the latter device is shown to be closed under composition, and to be included in the composition of the class of transformations computed by top-down tree transducers with itself. More precisely, it constitutes the composition closure of the class of transformations computed by nite-copying top-down tree transducers.
Linguistically Annotated BTG for Statistical Machine Translation
"... Bracketing Transduction Grammar (BTG) is a natural choice for effective integration of desired linguistic knowledge into statistical machine translation (SMT). In this paper, we propose a Linguistically Annotated BTG (LABTG) for SMT. It conveys linguistic knowledge of source-side syntax structures t ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
Bracketing Transduction Grammar (BTG) is a natural choice for effective integration of desired linguistic knowledge into statistical machine translation (SMT). In this paper, we propose a Linguistically Annotated BTG (LABTG) for SMT. It conveys linguistic knowledge of source-side syntax structures to BTG hierarchical structures through linguistic annotation. From the linguistically annotated data, we learn annotated BTG rules and train linguistically motivated phrase translation model and reordering model. We also present an annotation algorithm that captures syntactic information for BTG nodes. The experiments show that the LABTG approach significantly outperforms a baseline BTGbased system and a state-of-the-art phrasebased system on the NIST MT-05 Chineseto-English translation task. Moreover, we empirically demonstrate that the proposed method achieves better translation selection and phrase reordering. 1
Inductive Detection of Language Features via Clustering Minimal Pairs: Toward Feature-Rich Grammars in Machine Translation
"... Syntax-based Machine Translation systems have recently become a focus of research with much hope that they will outperform traditional Phrase-Based Statistical Machine Translation (PBSMT). Toward this goal, we present a method for analyzing the morphosyntactic content of language from an Elicitation ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
Syntax-based Machine Translation systems have recently become a focus of research with much hope that they will outperform traditional Phrase-Based Statistical Machine Translation (PBSMT). Toward this goal, we present a method for analyzing the morphosyntactic content of language from an Elicitation Corpus such as the one being including in the LDC’s LCTL language packs. The presented method discovers a mapping between morphemes and linguistically relevant features. By providing this tool with which structure-based models of MT can be augmented with these rich features, we believe the discriminative power of current models can be improved. We conclude by outlining how the resulting output can then be used in inducing a morphosyntactically feature-rich grammar for AVENUE, a modern syntaxbased MT system. 1
Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations
"... We propose a novel technique of learning how to transform the source parse trees to improve the translation qualities of syntax-based translation models using synchronous context-free grammars. We transform the source tree phrasal structure into a set of simpler structures, expose such decisions to ..."
Abstract
- Add to MetaCart
We propose a novel technique of learning how to transform the source parse trees to improve the translation qualities of syntax-based translation models using synchronous context-free grammars. We transform the source tree phrasal structure into a set of simpler structures, expose such decisions to the decoding process, and find the least expensive transformation operation to better model word reordering. In particular, we integrate synchronous binarizations, verb regrouping, removal of redundant parse nodes, and incorporate a few important features such as translation boundaries. We learn the structural preferences from the data in a generative framework. The syntax-based translation system integrating the proposed techniques outperforms the best Arabic-English unconstrained system in NIST-08 evaluations by 1.3 absolute BLEU, which is statistically significant. 1
Preference Grammars and Decoding Algorithms for Probabilistic Synchronous Context Free Grammar Based Translation.
"... Probabilistic Synchronous Context-free Grammars (PSCFGs) [Aho and Ullmann, 1969, Wu, 1996] define weighted transduction rules to represent translation and reordering operations. When translation models use features that are defined locally, on each rule, there are efficient dynamic programming algor ..."
Abstract
- Add to MetaCart
Probabilistic Synchronous Context-free Grammars (PSCFGs) [Aho and Ullmann, 1969, Wu, 1996] define weighted transduction rules to represent translation and reordering operations. When translation models use features that are defined locally, on each rule, there are efficient dynamic programming algorithms to perform translation with these grammars [Kasami, 1965]. In general, the integration of non-local features into the translation model can make translation NP-hard, requiring decoding approximations that limit the impact of these features. In this thesis, we consider the impact and interaction between two non-local features, the n-gram language model (LM) and labels on rule nonterminal symbols in the Syntax-Augmented MT (SAMT) grammar [Zollmann and Venugopal, 2006]. While these features do not result in NP-hard search, they would lead to serious increases in wall-clock runtime if naïve dynamic programming methods are applied. We develop novel two-pass algorithms that make strong decoding approximations during a first pass search, generating a hypergraph of sentence spanning translation i derivations. In a second pass, we use knowledge about non-local features to explore
Chris Callison-Burch and
"... This paper describes the resource- and system-building efforts of an eight-week Johns Hopkins ..."
Abstract
- Add to MetaCart
This paper describes the resource- and system-building efforts of an eight-week Johns Hopkins
SCFG Latent Annotation for Machine Translation
"... We discuss learning latent annotations for synchronous context-free grammars (SCFG) for the purpose of improving machine translation. We show that learning annotations for nonterminals results in not only more accurate translation, but also faster SCFG decoding. 1. ..."
Abstract
- Add to MetaCart
We discuss learning latent annotations for synchronous context-free grammars (SCFG) for the purpose of improving machine translation. We show that learning annotations for nonterminals results in not only more accurate translation, but also faster SCFG decoding. 1.

