Results 1 - 10
of
22
A survey of statistical machine translation
, 2007
"... Statistical machine translation (SMT) treats the translation of natural language as a machine learning problem. By examining many samples of human-produced translation, SMT algorithms automatically learn how to translate. SMT has made tremendous strides in less than two decades, and many popular tec ..."
Abstract
-
Cited by 30 (3 self)
- Add to MetaCart
Statistical machine translation (SMT) treats the translation of natural language as a machine learning problem. By examining many samples of human-produced translation, SMT algorithms automatically learn how to translate. SMT has made tremendous strides in less than two decades, and many popular techniques have only emerged within the last few years. This survey presents a tutorial overview of state-of-the-art SMT at the beginning of 2007. We begin with the context of the current research, and then move to a formal problem description and an overview of the four main subproblems: translational equivalence modeling, mathematical modeling, parameter estimation, and decoding. Along the way, we present a taxonomy of some different approaches within these areas. We conclude with an overview of evaluation and notes on future directions.
Syntactic re-alignment models for machine translation
- In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL-2007
, 2007
"... We present a method for improving word alignment for statistical syntax-based machine translation that employs a syntactically informed alignment model closer to the translation model than commonly-used word alignment models. This leads to extraction of more useful linguistic patterns and improved B ..."
Abstract
-
Cited by 9 (2 self)
- Add to MetaCart
We present a method for improving word alignment for statistical syntax-based machine translation that employs a syntactically informed alignment model closer to the translation model than commonly-used word alignment models. This leads to extraction of more useful linguistic patterns and improved BLEU scores on translation experiments in Chinese and Arabic. 1 Methods of statistical MT Roughly speaking, there are two paths commonly taken in statistical machine translation (Figure 1). The idealistic path uses an unsupervised learning algorithm such as EM (Demptser et al., 1977) to learn parameters for some proposed translation model from a bitext training corpus, and then directly translates using the weighted model. Some examples of the idealistic approach are the direct IBM word model (Berger et al., 1994; Germann et al., 2001), the phrase-based approach of Marcu and Wong (2002), and the syntax approaches of Wu (1996) and Yamada and Knight (2001). Idealistic approaches are conceptually simple and thus easy to relate to observed phenomena. However, as more parameters are added to the model the idealistic approach has not scaled well, for it is increasingly difficult to incorporate large amounts of training data efficiently over an increasingly large search space. Additionally, the EM procedure has a tendency to overfit its training data when the input units have varying explanatory powers, such as variable-size phrases or variable-height trees.
Factorization of synchronous context-free grammars in linear time
- In NAACL Workshop on Syntax and Structure in Statistical Translation (SSST
, 2007
"... Factoring a Synchronous Context-Free Grammar into an equivalent grammar with a smaller number of nonterminals in each rule enables synchronous parsing algorithms of lower complexity. The problem can be formalized as searching for the tree-decomposition of a given permutation with the minimal branchi ..."
Abstract
-
Cited by 7 (4 self)
- Add to MetaCart
Factoring a Synchronous Context-Free Grammar into an equivalent grammar with a smaller number of nonterminals in each rule enables synchronous parsing algorithms of lower complexity. The problem can be formalized as searching for the tree-decomposition of a given permutation with the minimal branching factor. In this paper, by modifying the algorithm of Uno and Yagiura (2000) for the closely related problem of finding all common intervals of two permutations, we achieve a linear time algorithm for the permutation factorization problem. We also use the algorithm to analyze the maximum SCFG rule length needed to cover hand-aligned data from various language pairs. 1
Re-structuring, Re-labeling, and Re-aligning for Syntax-Based Machine Translation
"... Language Weaver, Inc. This article shows that the structure of bilingual material from standard parsing and alignment tools is not optimal for training syntax-based statistical machine translation (SMT) systems. We present three modifications to the MT training data to improve the accuracy of a stat ..."
Abstract
-
Cited by 7 (0 self)
- Add to MetaCart
Language Weaver, Inc. This article shows that the structure of bilingual material from standard parsing and alignment tools is not optimal for training syntax-based statistical machine translation (SMT) systems. We present three modifications to the MT training data to improve the accuracy of a state-of-theart syntax MT system:re-structuring changes the syntactic structure of training parse trees to enable reuse of substructures; re-labeling alters bracket labels to enrich rule application context; and re-aligning unifies word alignment across sentences to remove bad word alignments and refine good ones. Better structures, labels, and word alignments are learned by the EM algorithm. We show that each individual technique leads to improvement as measured by BLEU, and we also show that the greatest improvement is achieved by combining them. We report an overall 1.48 BLEU improvement on the NIST08 evaluation set over a strong baseline in Chinese/English translation. 1. Background Syntactic methods have recently proven useful in statistical machine translation (SMT). In this article, we explore different ways of exploiting the structure of bilingual material for syntax-based SMT. In particular, we ask what kinds of tree structures, tree labels, and word alignments are best suited for improving end-to-end translation accuracy. We begin with structures from standard parsing and alignment tools, then use the EM algorithm to revise these structures in light of the translation task. We report an overall +1.48 BLEU improvement on a standard Chinese-to-English test.
Efficient Parsing for Transducer Grammars
"... The tree-transducer grammars that arise in current syntactic machine translation systems are large, flat, and highly lexicalized. We address the problem of parsing efficiently with such grammars in three ways. First, we present a pair of grammar transformations that admit an efficient cubic-time CKY ..."
Abstract
-
Cited by 7 (1 self)
- Add to MetaCart
The tree-transducer grammars that arise in current syntactic machine translation systems are large, flat, and highly lexicalized. We address the problem of parsing efficiently with such grammars in three ways. First, we present a pair of grammar transformations that admit an efficient cubic-time CKY-style parsing algorithm despite leaving most of the grammar in n-ary form. Second, we show how the number of intermediate symbols generated by this transformation can be substantially reduced through binarization choices. Finally, we describe a two-pass coarse-to-fine parsing approach that prunes the search space using predictions from a subset of the original grammar. In all, parsing time reduces by 81%. We also describe a coarse-to-fine pruning scheme for forest-based language model reranking that allows a 100-fold increase in beam size while reducing decoding time. The resulting translations improve by 1.3 BLEU. 1
Rule filtering by pattern for efficient hierarchical translation
- In Proceedings of the EACL
, 2009
"... We describe refinements to hierarchical translation search procedures intended to reduce both search errors and memory usage through modifications to hypothesis expansion in cube pruning and reductions in the size of the rule sets used in translation. Rules are put into syntactic classes based on th ..."
Abstract
-
Cited by 6 (1 self)
- Add to MetaCart
We describe refinements to hierarchical translation search procedures intended to reduce both search errors and memory usage through modifications to hypothesis expansion in cube pruning and reductions in the size of the rule sets used in translation. Rules are put into syntactic classes based on the number of non-terminals and the pattern, and various filtering strategies are then applied to assess the impact on translation speed and quality. Results are reported on the 2008 NIST Arabic-to-English evaluation task. 1
Inversion transduction grammar coverage of arabic-english word alignment for tree-structured statistical machine translation
- In Proceedings of the IEEE/ACL Workshop on Spoken Language Technology
, 2006
"... We present the first known direct measurement of word alignment coverage on an Arabic-English parallel corpus using inversion transduction grammar constraints. While direct measurements have been reported for several European and Asian languages, to date no results have been available for Arabic or ..."
Abstract
-
Cited by 3 (1 self)
- Add to MetaCart
We present the first known direct measurement of word alignment coverage on an Arabic-English parallel corpus using inversion transduction grammar constraints. While direct measurements have been reported for several European and Asian languages, to date no results have been available for Arabic or any Semitic language despite much recent activity on Arabic-English spoken language and text translation. Many recent syntax based statistical MT models operate within the domain of ITG expressiveness, often for efficiency reasons, so it has become important to determine the extent to which the ITG constraint assumption holds. Our results on Arabic provide further evidence that ITG expressiveness appears largely sufficient for core MT models.
Parsing and Translation Algorithms Based on Weighted Extended Tree Transducers
"... This paper proposes a uniform framework for the development of parsing and translation algorithms for weighted extended (top-down) tree transducers and input strings. The asymptotic time complexity of these algorithms can be improved in practice by exploiting an algorithm for rule factorization in t ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
This paper proposes a uniform framework for the development of parsing and translation algorithms for weighted extended (top-down) tree transducers and input strings. The asymptotic time complexity of these algorithms can be improved in practice by exploiting an algorithm for rule factorization in the above transducers. 1
Phrase Translation Probabilities with ITG Priors and Smoothing as Learning Objective
"... The conditional phrase translation probabilities constitute the principal components of phrase-based machine translation systems. These probabilities are estimated using a heuristic method that does not seem to optimize any reasonable objective function of the word-aligned, parallel training corpus. ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
The conditional phrase translation probabilities constitute the principal components of phrase-based machine translation systems. These probabilities are estimated using a heuristic method that does not seem to optimize any reasonable objective function of the word-aligned, parallel training corpus. Earlier efforts on devising a better understood estimator either do not scale to reasonably sized training data, or lead to deteriorating performance. In this paper we explore a new approach based on three ingredients (1) A generative model with a prior over latent segmentations derived from Inversion Transduction Grammar (ITG), (2) A phrase table containing all phrase pairs without length limit, and (3) Smoothing as learning objective using a novel Maximum-A-Posteriori version of Deleted Estimation working with Expectation-Maximization. Where others conclude that latent segmentations lead to overfitting and deteriorating performance, we show here that these three ingredients give performance equivalent to the heuristic method on reasonably sized training data.
Enumeration of Factorizable Multi-Dimensional Permutations
"... A d-dimensional permutation is a sequence of d + 1 permutations with the leading element being the identity permutation. It can be viewed as an alignment structure across d+1 sequences, or visualized as the result of permuting n hypercubes of (d+1) dimensions. We study the hierarchical decomposition ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
A d-dimensional permutation is a sequence of d + 1 permutations with the leading element being the identity permutation. It can be viewed as an alignment structure across d+1 sequences, or visualized as the result of permuting n hypercubes of (d+1) dimensions. We study the hierarchical decomposition of d-dimensional permutations. We show that when d ≥ 2, the ratio between non-decomposable or simple d-permutations and all d-permutations approaches 1. We also prove that the growth rate of the number of d-permutations that can be factorized into k-ary branching trees approaches � � k d e as k grows. 1

