Chinese Syntactic Reordering for Statistical Machine Translation
- In Proceedings of EMNLP, 2007
Cited by 38 (0 self)
Syntactic reordering approaches are an effective method for handling word-order differences between source and target languages in statistical machine translation (SMT) systems. This paper introduces a reordering approach for translation from Chinese to English. We describe a set of syntactic reordering rules that exploit systematic differences between Chinese and English word order. The resulting system is used as a preprocessor for both training and test sentences, transforming Chinese sentences to be much closer to English in terms of their word order. We evaluated the reordering approach within the MOSES phrase-based SMT system (Koehn et al., 2007). The reordering approach improved the BLEU score for the MOSES system from 28.52 to 30.86 on the NIST 2006 evaluation data. We also conducted a series of experiments to analyze the accuracy and impact of different types of reordering rules.
Maximum Entropy Based Phrase Reordering Model for Statistical Machine Translation
- In Proc. of COLING-ACL, 2006
Cited by 28 (7 self)
We propose a novel reordering model for phrase-based statistical machine translation (SMT) that uses a maximum entropy (MaxEnt) model to predict reorderings of neighbor blocks (phrase pairs). The model provides content-dependent, hierarchical phrasal reordering with generalization based on features automatically learned from a real-world bitext. We present an algorithm to extract all reordering events of neighbor blocks from bilingual data. In our experiments on Chinese-to-English translation, this MaxEnt-based reordering model obtains significant improvements in BLEU score on the NIST MT-05 and IWSLT-04 tasks.
A Probabilistic Approach to Syntax-based Reordering for Statistical Machine Translation
Cited by 14 (2 self)
Inspired by previous preprocessing approaches to SMT, this paper proposes a novel, probabilistic approach to reordering which combines the merits of syntax and phrase-based SMT. Given a source sentence and its parse tree, our method generates, by tree operations, an n-best list of reordered inputs, which are then fed to a standard phrase-based decoder to produce the optimal translation. Experiments show that, for the NIST MT-05 task of Chinese-to-English translation, the proposal leads to a BLEU improvement of 1.56%.
Regularization and Search for Minimum Error Rate Training
Cited by 7 (0 self)
Minimum error rate training (MERT) is a widely used learning procedure for statistical machine translation models. We contrast three search strategies for MERT: Powell’s method, the variant of coordinate descent found in the Moses MERT utility, and a novel stochastic method. It is shown that the stochastic method obtains test set gains of +0.98 BLEU on MT03 and +0.61 BLEU on MT05. We also present a method for regularizing the MERT objective that achieves statistically significant gains when combined with both Powell’s method and coordinate descent.
Segment Choice Models: Feature-Rich Models for Global Distortion in Statistical Machine Translation
- In HLT-NAACL, 2006
Cited by 6 (1 self)
This paper presents a new approach to distortion (phrase reordering) in phrase-based machine translation (MT). Distortion is modeled as a sequence of choices during translation. The approach yields trainable, probabilistic distortion models that are global: they assign a probability to each possible phrase reordering. These “segment choice” models (SCMs) can be trained on “segment-aligned” sentence pairs; they can be applied during decoding or rescoring. The approach yields a metric called “distortion perplexity” (“disperp”) for comparing SCMs offline on test data, analogous to perplexity for language models. A decision-tree-based SCM is tested on Chinese-to-English translation, and outperforms a baseline distortion penalty approach at the 99% confidence level.
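The abstract says only that distortion perplexity is analogous to language-model perplexity and does not give its formula. As a minimal sketch, assuming it is computed the same way (the exponential of the negative mean log-probability the model assigns to each observed choice), the calculation might look like this. The function name and the example probabilities are hypothetical.

```python
import math


def perplexity(choice_probs):
    """Perplexity of a sequence of model probabilities.

    Assumption: each entry is the probability a model assigned to the
    choice actually observed in held-out data. This mirrors standard
    language-model perplexity; the paper's exact definition of
    "disperp" may differ.
    """
    if not choice_probs:
        raise ValueError("need at least one probability")
    if any(p <= 0.0 or p > 1.0 for p in choice_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    mean_log_prob = sum(math.log(p) for p in choice_probs) / len(choice_probs)
    return math.exp(-mean_log_prob)


# Hypothetical example: a model that is always 25% sure of the observed
# choice behaves like a uniform pick among four options.
uniform_four = perplexity([0.25, 0.25, 0.25, 0.25])
```

Under this reading, a lower value means the model finds the observed reorderings less surprising, which is what makes it usable for comparing models offline.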
Large-scale statistical machine translation with weighted finite state transducers
- In Post Proceedings of the 7th International Workshop on Finite-State Methods and Natural Language Processing, FSMNLP 2008, 2009
Cited by 5 (3 self)
statistical machine translation system follows a generative model of translation and is implemented by the composition of component models of translation and movement realised as Weighted Finite State Transducers. Our flexible architecture requires no special purpose decoder and readily handles the large-scale natural language processing demands of state-of-the-art machine translation systems. In this paper we describe the CUED system’s participation in the NIST 2008 Arabic-English machine translation evaluation task. Key words: Statistical machine translation, weighted finite state transducers, large-scale natural language processing, finite state grammars.
A block bigram prediction model for statistical machine translation
- ACM Transactions on Speech and Language Processing, 2007
Cited by 3 (1 self)
In this paper, we present a novel training method for a localized phrase-based prediction model for statistical machine translation (SMT). The model predicts block neighbors to carry out a phrase-based translation that explicitly handles local phrase re-ordering. We use a maximum likelihood criterion to train a log-linear block bigram model which uses real-valued features (e.g. a language model score) as well as binary features based on the block identities themselves (e.g. block bigram features). The model training relies on an efficient enumeration of local block neighbors in parallel training data. A novel stochastic gradient descent (SGD) training algorithm is presented that can easily handle millions of features. Moreover, when viewing SMT as a block generation process, it becomes quite similar to sequential natural language annotation problems such as part-of-speech tagging, phrase chunking, or shallow parsing. The novel approach is successfully tested on a standard Arabic-English translation task using two different phrase re-ordering models: a block orientation model and a phrase-distortion model. Categories and Subject Descriptors: I.2.7 [Artificial Intelligence]: Natural Language Processing—statistical machine translation; G.3 [Probability and Statistics]: Statistical computing—stochastic gradient descent
Accurate Non-Hierarchical Phrase-Based Translation
Cited by 3 (0 self)
A principal weakness of conventional (i.e., non-hierarchical) phrase-based statistical machine translation is that it can only exploit continuous phrases. In this paper, we extend phrase-based decoding to allow both source and target phrasal discontinuities, which provide better generalization on unseen data and yield significant improvements to a standard phrase-based system (Moses). More interestingly, our discontinuous phrase-based system also outperforms a state-of-the-art hierarchical system (Joshua) by a very significant margin (+1.03 BLEU on average on five Chinese-English NIST test sets), even though both Joshua and our system support discontinuous phrases. Since the key difference between these two systems is that ours is not hierarchical—i.e., our system uses a string-based decoder instead of CKY, and it imposes no hard hierarchical reordering constraints during training and decoding—this paper sets out to challenge the commonly held belief that the tree-based parameterization of systems such as Hiero and Joshua is crucial to their good performance against Moses.
Web-Based Machine Translation
- 2003
Cited by 2 (1 self)
This chapter has two main aims: (i) to present the state-of-the-art in Machine Translation (MT), namely Phrase-Based Statistical MT, together with the major competing paradigms used in MT research and development today; and (ii) to provide an overview of the MT research carried out by my team here at DCU, characterised here in terms of ‘hybrid MT’. In addition, we provide our views on the directions that MT research might take in the near future, and conclude the chapter with lists of further reading for the interested reader.
Learning Linear Ordering Problems for Better Translation
Cited by 2 (0 self)
We apply machine learning to the Linear Ordering Problem in order to learn sentence-specific reordering models for machine translation. We demonstrate that even when these models are used as a mere preprocessing step for German-English translation, they significantly outperform Moses' integrated lexicalized reordering model. Our models are trained on automatically aligned bitext. Their form is simple but novel. They assess, based on features of the input sentence, how strongly each pair of input word tokens w_i, w_j would like to reverse their relative order. Combining all these pairwise preferences to find the best global reordering is NP-hard. However, we present a non-trivial O(n^3) algorithm, based on chart parsing, that at least finds the best reordering within a certain exponentially large neighborhood. We show how to iterate this reordering process within a local search algorithm, which we use in training.
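To make the pairwise-preference formulation concrete, the sketch below scores a permutation by summing one preference value per ordered pair of tokens and finds the best ordering by exhaustive search. The matrix `B`, its values, and the function names are assumptions for illustration. The brute-force search is a stand-in that is feasible only for very short inputs; it is not the chart-parsing neighborhood search the abstract describes.

```python
from itertools import permutations


def ordering_score(perm, B):
    """Sum B[i][j] over every pair where token i is placed before token j.

    B is a hypothetical preference matrix: B[i][j] is how strongly the
    model prefers token i to precede token j.
    """
    total = 0.0
    for a in range(len(perm)):
        for b in range(a + 1, len(perm)):
            total += B[perm[a]][perm[b]]
    return total


def best_ordering_brute_force(B):
    """Try all n! permutations; usable only for tiny n."""
    n = len(B)
    return max(permutations(range(n)), key=lambda p: ordering_score(p, B))


# Hypothetical preferences for a three-token sentence.
B = [
    [0.0, 1.0, -2.0],
    [-1.0, 0.0, 0.5],
    [2.0, -0.5, 0.0],
]
best = best_ordering_brute_force(B)
```

With these made-up values the search moves token 2 to the front, because its preference to precede token 0 outweighs the other pairwise terms.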

