Results 1-10 of 27
Synchronous binarization for machine translation
In Proc. HLT-NAACL, 2006
Cited by 52 (11 self)
Systems based on synchronous grammars and tree transducers promise to improve the quality of statistical machine translation output, but are often very computationally intensive. The complexity is exponential in the size of individual grammar rules due to arbitrary reorderings between the two languages, and rules extracted from parallel corpora can be quite large. We devise a linear-time algorithm for factoring syntactic reorderings by binarizing synchronous rules when possible and show that the resulting rule set significantly improves the speed and accuracy of a state-of-the-art syntax-based machine translation system.
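To make the idea of binarizing a reordering concrete, here is a minimal sketch of a shift-reduce test on a permutation. It assumes a rule's reordering is given as a permutation of 1..n (target position of each source nonterminal); the function name `is_binarizable` is hypothetical, and this is an illustration in the spirit of the abstract, not the paper's implementation.

```python
def is_binarizable(perm):
    """Return True if perm can be built from binary straight/inverted merges.

    perm is assumed to be a permutation of 1..n giving, for each source-side
    nonterminal in order, its position on the target side.
    """
    stack = []  # each entry is a (low, high) span of target positions
    for p in perm:
        stack.append((p, p))  # shift
        # Reduce while the top two spans are contiguous on the target side.
        while len(stack) >= 2:
            lo1, hi1 = stack[-2]
            lo2, hi2 = stack[-1]
            if hi1 + 1 == lo2:      # straight (monotone) merge
                merged = (lo1, hi2)
            elif hi2 + 1 == lo1:    # inverted merge
                merged = (lo2, hi1)
            else:
                break
            stack[-2:] = [merged]
    # The rule binarizes synchronously iff everything collapsed into one span.
    return len(stack) == 1


print(is_binarizable([2, 1, 4, 3]))  # True
print(is_binarizable([2, 4, 1, 3]))  # False
```

Each element is pushed once and each merge removes one stack entry, so the loop does a linear amount of work in the rule length.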
Empirical lower bounds on the complexity of translational equivalence
In Proceedings of ACL 2006, 2006
Cited by 38 (1 self)
This paper describes a study of the patterns of translational equivalence exhibited by a variety of bitexts. The study found that the complexity of these patterns in every bitext was higher than suggested in the literature. These findings shed new light on why “syntactic” constraints have not helped to improve statistical translation models, including finite-state phrase-based models, tree-to-string models, and tree-to-tree models. The paper also presents evidence that inversion transduction grammars cannot generate some translational equivalence relations, even in relatively simple real bitexts in syntactically similar languages with rigid word order. Instructions for replicating our experiments are at
An Introduction to Synchronous Grammars
2006
Cited by 18 (0 self)
Synchronous context-free grammars are a generalization of context-free grammars (CFGs) that generate
Extracting synchronous grammar rules from word-level alignments in linear time
In Proceedings of the 22nd International Conference on Computational Linguistics (COLING-08), 2008
Cited by 15 (2 self)
We generalize Uno and Yagiura’s algorithm for finding all common intervals of two permutations to the setting of two sequences with many-to-many alignment links across the two sides. We show how to maximally decompose a word-aligned sentence pair in linear time, which can be used to generate all possible phrase pairs or a Synchronous Context-Free Grammar (SCFG) with the simplest rules possible. We also use the algorithm to precisely analyze the maximum SCFG rule length needed to cover hand-aligned data from various language pairs.
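As a rough illustration of what "phrase pairs" of a word-aligned sentence pair means, the sketch below enumerates tight phrase pairs by brute force. It is deliberately not the linear-time algorithm the abstract describes; the function name `consistent_phrase_pairs` and the representation of alignment links as 0-based `(source, target)` index pairs are assumptions made for this example.

```python
def consistent_phrase_pairs(links, src_len):
    """Enumerate tight phrase pairs consistent with the alignment links.

    links:   iterable of (source_index, target_index) pairs, 0-based
             (an assumed representation, not taken from the paper).
    src_len: number of source words.
    Returns a list of ((src_lo, src_hi), (tgt_lo, tgt_hi)) inclusive spans.
    """
    links = list(links)
    pairs = []
    for i in range(src_len):
        for j in range(i, src_len):
            # Target positions linked to the source span [i, j].
            tgt = [t for s, t in links if i <= s <= j]
            if not tgt:
                continue  # skip spans with no aligned words
            lo, hi = min(tgt), max(tgt)
            # Consistent: no link from the target span leaves the source span.
            if all(i <= s <= j for s, t in links if lo <= t <= hi):
                pairs.append(((i, j), (lo, hi)))
    return pairs


# Three words where the first two swap places and the third stays put.
print(consistent_phrase_pairs([(0, 1), (1, 0), (2, 2)], 3))
```

The nested loops make this quadratic in the number of source spans with a linear scan of the links inside, which is the kind of cost a linear-time decomposition is meant to avoid.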
Factoring synchronous grammars by sorting
In Proceedings of the International Conference on Computational Linguistics and the Association for Computational Linguistics (COLING/ACL-06), 2006
Cited by 11 (6 self)
Synchronous Context-Free Grammars (SCFGs) have been successfully exploited as translation models in machine translation applications. When parsing with an SCFG, computational complexity grows exponentially with the length of the rules, in the worst case. In this paper we examine the problem of factorizing each rule of an input SCFG to a generatively equivalent set of rules, each having the smallest possible length. Our algorithm works in time O(n log n), for each rule of length n. This improves upon previous results and solves an open problem about recognizing permutations that can be factored.
Prior derivation models for formally syntax-based translation using linguistically syntactic parsing and tree kernels
In Proceedings of the ACL’08: HLT SSST-2, 2008
Cited by 10 (5 self)
This paper presents an improved formally syntax-based SMT model, which is enriched by linguistically syntactic knowledge obtained from statistical constituent parsers. We propose a linguistically-motivated prior derivation model to score hypothesis derivations on top of the baseline model during the translation decoding. Moreover, we devise a fast training algorithm to achieve such improved models based on tree kernel methods. Experiments on an English-to-Chinese task demonstrate that our proposed models outperformed the baseline formally syntax-based models, while both of them achieved
Two monolingual parses are better than one (synchronous parse)
In Proc. of HLT-NAACL, 2010
Cited by 5 (2 self)
We describe a synchronous parsing algorithm that is based on two successive monolingual parses of an input sentence pair. Although the worst-case complexity of this algorithm is and must be O(n^6) for binary SCFGs, its average-case runtime is far better. We demonstrate that for a number of common synchronous parsing problems, the two-parse algorithm substantially outperforms alternative synchronous parsing strategies, making it efficient enough to be utilized without resorting to a pruned search.
Empirical lower bounds on alignment error rates in syntax-based machine translation
In SSST ’09, 2009
Cited by 4 (1 self)
The empirical adequacy of synchronous context-free grammars of rank two (2-SCFGs) (Satta and Peserico, 2005), used in syntax-based machine translation systems such as Wu (1997), Zhang et al. (2006) and Chiang (2007), in terms of what alignments they induce, has been discussed in Wu (1997) and Wellington et al. (2006), but with a one-sided focus on so-called “inside-out alignments”. Other alignment configurations that cannot be induced by 2-SCFGs are identified in this paper, and their frequencies across a wide collection of hand-aligned parallel corpora are examined. Empirical lower bounds on two measures of alignment error rate, i.e. the one introduced in Och and Ney (2000) and one where only complete translation units are considered, are derived for 2-SCFGs and related formalisms.
Better synchronous binarization for machine translation
In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, 2009
Cited by 4 (0 self)
Binarization of Synchronous Context Free Grammars (SCFG) is essential for achieving polynomial time complexity of decoding for SCFG parsing based machine translation systems. In this paper, we first investigate the excess edge competition issue caused by a left-heavy binary SCFG derived with the method of Zhang et al. (2006). Then we propose a new binarization method to mitigate the problem by exploring other alternative equivalent binary SCFGs. We present an algorithm that iteratively improves the resulting binary SCFG, and empirically show that our method can improve a string-to-tree statistical machine translation system based on the synchronous binarization method in Zhang et al. (2006) on the NIST machine translation evaluation tasks.
Worst-case synchronous grammar rules
In Proceedings of the 2007 Meeting of the North American chapter of the Association for Computational Linguistics (NAACL-07), 2007
Cited by 3 (3 self)
We relate the problem of finding the best application of a Synchronous Context-Free Grammar (SCFG) rule during parsing to a Markov Random Field. This representation allows us to use the theory of expander graphs to show that the complexity of SCFG parsing of an input sentence of length N is Ω(N^(cn)), for a grammar with maximum rule length n and some constant c. This improves on the previous best result of Ω(N^(c√n)).