Synchronous binarization for machine translation
In Proc. HLT-NAACL, 2006
Cited by 27 (10 self)
Systems based on synchronous grammars and tree transducers promise to improve the quality of statistical machine translation output, but are often very computationally intensive. The complexity is exponential in the size of individual grammar rules due to arbitrary re-orderings between the two languages, and rules extracted from parallel corpora can be quite large. We devise a linear-time algorithm for factoring syntactic re-orderings by binarizing synchronous rules when possible and show that the resulting rule set significantly improves the speed and accuracy of a state-of-the-art syntax-based machine translation system.
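As a rough illustration of what "binarizing synchronous rules when possible" can mean, the sketch below tests whether a re-ordering can be built entirely from binary merges of adjacent spans. It assumes a rule's re-ordering is given as a permutation of 1..n (source position to target position) and uses a shift-reduce scan over a stack of target-side spans. The function name `is_binarizable` and this representation are illustrative assumptions, not taken from the paper.

```python
from typing import List, Tuple


def is_binarizable(perm: List[int]) -> bool:
    """Return True if a permutation of 1..n reduces to a single span
    using only merges of two spans that are adjacent on the target side."""
    stack: List[Tuple[int, int]] = []  # each entry is a (low, high) target span
    for p in perm:
        stack.append((p, p))  # shift the next element as a unit span
        # Reduce while the top two spans are contiguous (in either order).
        while len(stack) >= 2:
            lo1, hi1 = stack[-2]
            lo2, hi2 = stack[-1]
            if hi1 + 1 == lo2 or hi2 + 1 == lo1:
                stack[-2:] = [(min(lo1, lo2), max(hi1, hi2))]
            else:
                break
    return len(stack) == 1


print(is_binarizable([2, 1, 4, 3]))  # True: (2 1) and (4 3) merge, then the pair merges
print(is_binarizable([2, 4, 1, 3]))  # False: no two neighbours ever form a contiguous span
```

Each element is pushed once and every reduction shrinks the stack, so the scan takes time linear in the rule length under these assumptions.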
Empirical lower bounds on the complexity of translational equivalence
In Proceedings of ACL, 2006
Cited by 25 (1 self)
This paper describes a study of the patterns of translational equivalence exhibited by a variety of bitexts. The study found that the complexity of these patterns in every bitext was higher than suggested in the literature. These findings shed new light on why “syntactic” constraints have not helped to improve statistical translation models, including finite-state phrase-based models, tree-to-string models, and tree-to-tree models. The paper also presents evidence that inversion transduction grammars cannot generate some translational equivalence relations, even in relatively simple real bitexts in syntactically similar languages with rigid word order. Instructions for replicating our experiments are at
Factoring synchronous grammars by sorting
In Proceedings of the International Conference on Computational Linguistics and the Association for Computational Linguistics (COLING/ACL-06), 2006
Cited by 8 (4 self)
Synchronous Context-Free Grammars (SCFGs) have been successfully exploited as translation models in machine translation applications. When parsing with an SCFG, computational complexity grows exponentially with the length of the rules, in the worst case. In this paper we examine the problem of factorizing each rule of an input SCFG to a generatively equivalent set of rules, each having the smallest possible length. Our algorithm works in time O(n log n), for each rule of length n. This improves upon previous results and solves an open problem about recognizing permutations that can be factored.
An Introduction to Synchronous Grammars
2006
Cited by 7 (0 self)
Synchronous context-free grammars are a generalization of context-free grammars (CFGs) that generate
Extracting synchronous grammar rules from word-level alignments in linear time
In Proceedings of the 22nd International Conference on Computational Linguistics (COLING-08), 2008
Cited by 6 (1 self)
We generalize Uno and Yagiura’s algorithm for finding all common intervals of two permutations to the setting of two sequences with many-to-many alignment links across the two sides. We show how to maximally decompose a word-aligned sentence pair in linear time, which can be used to generate all possible phrase pairs or a Synchronous Context-Free Grammar (SCFG) with the simplest rules possible. We also use the algorithm to precisely analyze the maximum SCFG rule length needed to cover hand-aligned data from various language pairs.
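To make the notion of a common interval concrete, here is a brute-force sketch for the simplest case only: one permutation compared against the identity. A range of positions is a common interval when the values it covers also form a contiguous range. This quadratic enumeration is not Uno and Yagiura's algorithm, nor the paper's generalization to many-to-many alignments. The name `common_intervals` and the 0-based position convention are assumptions made for illustration.

```python
from typing import List, Tuple


def common_intervals(perm: List[int]) -> List[Tuple[int, int]]:
    """List all (i, j) position ranges (0-based, inclusive, j > i)
    whose values form a contiguous range of integers."""
    n = len(perm)
    found: List[Tuple[int, int]] = []
    for i in range(n):
        lo = hi = perm[i]
        for j in range(i + 1, n):
            lo = min(lo, perm[j])
            hi = max(hi, perm[j])
            # j - i + 1 distinct values are contiguous iff they span exactly j - i.
            if hi - lo == j - i:
                found.append((i, j))
    return found


print(common_intervals([2, 1, 4, 3]))  # [(0, 1), (0, 3), (2, 3)]
print(common_intervals([2, 4, 1, 3]))  # [(0, 3)]
```

In the second call only the full range qualifies, which reflects that this re-ordering has no smaller sub-block that could be split off as a separate phrase pair or rule.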
Parsing and Translation Algorithms Based on Weighted Extended Tree Transducers
Cited by 2 (1 self)
This paper proposes a uniform framework for the development of parsing and translation algorithms for weighted extended (top-down) tree transducers and input strings. The asymptotic time complexity of these algorithms can be improved in practice by exploiting an algorithm for rule factorization in the above transducers.
Two monolingual parses are better than one (synchronous parse)
In Proc. of HLT-NAACL, 2010
Cited by 2 (2 self)
We describe a synchronous parsing algorithm that is based on two successive monolingual parses of an input sentence pair. Although the worst-case complexity of this algorithm is and must be O(n^6) for binary SCFGs, its average-case run-time is far better. We demonstrate that for a number of common synchronous parsing problems, the two-parse algorithm substantially outperforms alternative synchronous parsing strategies, making it efficient enough to be utilized without resorting to a pruned search.
Empirical lower bounds on the complexity of translational equivalence
In Proceedings of the 44th Annual Meeting of the Association for Computational Linguistics (ACL), 2006
Cited by 1 (1 self)
This paper describes a study of the patterns of translational equivalence exhibited by a variety of bitexts. The study found that the complexity of these patterns in every bitext was higher than suggested in the literature. These findings shed new light on why “syntactic” constraints have not helped to improve statistical translation models, including finite-state phrase-based models, tree-to-string models, and tree-to-tree models. The paper also presents evidence that inversion transduction grammars cannot generate some translational equivalence relations, even in relatively simple real bitexts in syntactically similar languages with rigid word order. Instructions for replicating our experiments are at
Enumeration of Factorizable Multi-Dimensional Permutations
Cited by 1 (1 self)
A d-dimensional permutation is a sequence of d+1 permutations with the leading element being the identity permutation. It can be viewed as an alignment structure across d+1 sequences, or visualized as the result of permuting n hypercubes of (d+1) dimensions. We study the hierarchical decomposition of d-dimensional permutations. We show that when d ≥ 2, the ratio between non-decomposable or simple d-permutations and all d-permutations approaches 1. We also prove that the growth rate of the number of d-permutations that can be factorized into k-ary branching trees approaches (k/e)^d as k grows.
Binarization, Synchronous Binarization, and Target-side Binarization
Cited by 1 (0 self)
Binarization is essential for achieving polynomial time complexities in parsing and syntax-based machine translation. This paper presents a new binarization scheme, target-side binarization, and compares it with source-side and synchronous binarizations on both string-based and tree-based systems using synchronous grammars. In particular, we demonstrate the effectiveness of target-side binarization on a large-scale tree-to-string translation system.

