## Maximum Entropy Based Phrase Reordering Model for Statistical Machine Translation (2006)

### Download Links

- [mtgroup.ict.ac.cn]
- [www.mt-archive.info]
- DBLP

### Other Repositories/Bibliography

Venue: Proc. of COLING-ACL

Citations: 59 (14 self)

### BibTeX

@INPROCEEDINGS{Xiong06maximumentropy,
  author    = {Deyi Xiong and Qun Liu and Shouxun Lin},
  title     = {Maximum Entropy Based Phrase Reordering Model for Statistical Machine Translation},
  booktitle = {Proceedings of COLING-ACL},
  year      = {2006},
  pages     = {521--528}
}

### Abstract

We propose a novel reordering model for phrase-based statistical machine translation (SMT) that uses a maximum entropy (MaxEnt) model to predict the reorderings of neighboring blocks (phrase pairs). The model provides content-dependent, hierarchical phrasal reordering with generalization, based on features automatically learned from a real-world bitext. We present an algorithm to extract all reordering events of neighboring blocks from bilingual data. In our experiments on Chinese-to-English translation, this MaxEnt-based reordering model obtains significant BLEU score improvements on the NIST MT-05 and IWSLT-04 tasks.
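The core idea of the abstract, classifying the order of two neighboring blocks as straight or inverted with a maximum entropy model, can be sketched as a small log-linear binary classifier. This is an illustrative toy, not the paper's implementation: the boundary-word features, training data, and hyperparameters below are invented for demonstration.

```python
import math

# Toy MaxEnt (log-linear) classifier over reordering orders of two
# neighboring blocks: label 1 = inverted, 0 = straight.

def features(block1_first_word, block2_first_word):
    # Sparse boundary-word indicator features, in the spirit of the
    # paper's lexical features (feature names are invented).
    return {f"b1.first={block1_first_word}": 1.0,
            f"b2.first={block2_first_word}": 1.0}

def prob_inverted(weights, feats):
    # Binary MaxEnt reduces to a logistic over the weighted feature sum.
    score = sum(weights.get(f, 0.0) * v for f, v in feats.items())
    return 1.0 / (1.0 + math.exp(-score))

def train(examples, epochs=50, lr=0.5):
    # Plain gradient ascent on the conditional log-likelihood.
    weights = {}
    for _ in range(epochs):
        for feats, label in examples:
            err = label - prob_inverted(weights, feats)
            for f, v in feats.items():
                weights[f] = weights.get(f, 0.0) + lr * err * v
    return weights

# Invented reordering examples standing in for extracted reordering events.
data = [
    (features("the", "of"), 1),   # inverted
    (features("he", "said"), 0),  # straight
    (features("the", "of"), 1),
    (features("we", "will"), 0),
]
w = train(data)
p = prob_inverted(w, features("the", "of"))  # high probability of inverted
```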

### Citations

640 | Statistical phrase-based translation
- Koehn, Och, et al.
- 2003
Citation Context: ...still a computationally expensive problem just like reordering at the word level (Knight, 1999). Many systems use very simple models to reorder phrases. One is distortion model (Och and Ney, 2004; Koehn et al., 2003) which penalizes translations according to their jump distance instead of their content. For example, if N words are skipped, a penalty of N will be paid regardless of which words are reordered. This... |
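The distance-based distortion penalty described in the context above can be sketched in a few lines. This is an assumption-laden illustration (the function name and index convention are invented): the cost depends only on the jump distance, never on which words are reordered.

```python
def distortion_penalty(prev_end, next_start):
    # Penalty equals the number of source words skipped between the end
    # of the previously translated phrase and the start of the next one,
    # regardless of content.
    return abs(next_start - prev_end - 1)

# Jumping from source position 3 to position 8 skips words 4..7.
cost = distortion_penalty(3, 8)  # -> 4
```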

453 | Improved Statistical Alignment Models - Och, Ney |

431 | Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora
- Wu
- 1997
Citation Context: ... the reorderings of phrases, but also integrates some phrasal generalizations into the global model. In this paper, we propose a novel solution for phrasal reordering. Here, under the ITG constraint (Wu, 1997; Zens et al., 2004), we need to consider just two kinds of reorderings, straight and inverted between two consecutive blocks. Therefore reordering can be modelled as a problem of classification with ... |

367 | A hierarchical phrase-based model for statistical machine translation
- Chiang
- 2005
Citation Context: ...lation probabilities in both directions, and exp(1) and exp(|x|) are the phrase penalty and word penalty, respectively. These features are very common in state-of-the-art systems (Koehn et al., 2005; Chiang, 2005) and λs are weights of features. For the reordering model Ω, we define it on the two consecutive blocks A_1 and A_2 and their order o ∈ {straight, inverted}: Ω = f(o, A_1, A_2) (6). Under this framew... |
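The log-linear feature combination mentioned in the context above (translation probabilities, phrase penalty exp(1), word penalty exp(|x|), with weights λ) can be sketched as a weighted sum in log space. The feature values and weights below are invented for illustration; this is not the paper's tuned configuration.

```python
import math

def loglinear_score(feature_values, weights):
    # score = sum_i lambda_i * log h_i(x); the model score is a weighted
    # product of features, computed as a sum of weighted log values.
    return sum(weights[name] * math.log(value)
               for name, value in feature_values.items())

feats = {
    "p(e|f)": 0.25,                # phrase translation probability
    "p(f|e)": 0.30,                # reverse translation probability
    "phrase_penalty": math.e,      # exp(1) per phrase
    "word_penalty": math.exp(3),   # exp(|x|) for a 3-word target phrase
}
lams = {"p(e|f)": 1.0, "p(f|e)": 1.0,
        "phrase_penalty": -0.5, "word_penalty": -0.2}
score = loglinear_score(feats, lams)
```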

350 | The alignment template approach to statistical machine translation
- Och, Ney
- 2004
Citation Context: ...vel, reordering is still a computationally expensive problem just like reordering at the word level (Knight, 1999). Many systems use very simple models to reorder phrases. One is distortion model (Och and Ney, 2004; Koehn et al., 2003) which penalizes translations according to their jump distance instead of their content. For example, if N words are skipped, a penalty of N will be paid regardless of which words... |

230 | A comparison of algorithms for maximum entropy parameter estimation - Malouf |

148 | Better k-best parsing - Huang, Chiang - 2005 |

116 | Decoding complexity in word-replacement translation models
- Knight
- 1999
Citation Context: ...ase-based systems can easily address reorderings of words within phrases. However, at the phrase level, reordering is still a computationally expensive problem just like reordering at the word level (Knight, 1999). Many systems use very simple models to reorder phrases. One is distortion model (Och and Ney, 2004; Koehn et al., 2003) which penalizes translations according to their jump distance instead of t... |

84 | A Polynomial-Time Algorithm for Statistical Machine Translation
- Wu
- 1996
Citation Context: ...which are common between two languages with very different orders. Another simple model is flat reordering model (Wu, 1996; Zens et al., 2004; Kumar et al., 2005) which is not content dependent either. Flat model assigns constant probabilities for monotone order and non-monotone order. The two probabilities can be set to... |
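The flat reordering model described in the context above can be sketched in a couple of lines: constant probabilities for monotone and non-monotone order, with no dependence on content. The 0.95/0.05 split is an invented example setting that prefers monotone order.

```python
# Flat reordering model: one fixed probability per order, content-blind.
FLAT_PROBS = {"monotone": 0.95, "non_monotone": 0.05}

def flat_reordering_prob(order, probs=FLAT_PROBS):
    # The phrases being reordered play no role; only the order label matters.
    return probs[order]
```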

42 | A unigram orientation model for statistical machine translation
- Tillmann
- 2004
Citation Context: ...t to prefer monotone or non-monotone orientations depending on the language pairs. In view of content-independency of the distortion and flat reordering models, several researchers (Och et al., 2004; Tillmann, 2004; Kumar et al., 2005; Koehn et al., 2005) proposed a more powerful model called lexicalized reordering model that is phrase dependent. Lexicalized reordering model learns local orientations (monotone ... |
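A lexicalized reordering model in the spirit of the context above estimates orientation probabilities per phrase pair by relative frequency over extracted orientation counts. The phrase pairs, orientation labels, and counts below are invented for illustration.

```python
from collections import Counter, defaultdict

# Per-phrase-pair orientation counts gathered during phrase extraction
# (toy observations; real models count over an aligned bitext).
counts = defaultdict(Counter)
observations = [
    (("xianggang", "hong kong"), "monotone"),
    (("xianggang", "hong kong"), "monotone"),
    (("xianggang", "hong kong"), "swap"),
]
for phrase_pair, orient in observations:
    counts[phrase_pair][orient] += 1

def orientation_prob(phrase_pair, orient):
    # Relative-frequency estimate of the local orientation, conditioned
    # on the phrase pair (hence "lexicalized", unlike the flat model).
    c = counts[phrase_pair]
    total = sum(c.values())
    return c[orient] / total if total else 0.0

p = orientation_prob(("xianggang", "hong kong"), "monotone")  # -> 2/3
```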

41 | Local phrase reordering models for statistical machine translation - Kumar, Byrne - 2005 |

29 | A Localized Prediction Model for Statistical Machine Translation - Tillmann, Zhang - 2005 |

17 | Considerations in Maximum Mutual Information and Minimum Classification Error training for Statistical Machine Translation - Venugopal, Vogel |

14 | Reordering Constraints for Phrase-Based Statistical Machine Translation
- Zens, Ney, et al.
- 2004
Citation Context: ...which are common between two languages with very different orders. Another simple model is flat reordering model (Wu, 1996; Zens et al., 2004; Kumar et al., 2005) which is not content dependent either. Flat model assigns constant probabilities for monotone order and non-monotone order. The two probabilities can be set to prefer monotone or... |

6 | Edinburgh system description for the 2005 IWSLT speech translation evaluation - Koehn, et al. - 2005 |

5 | Minimum Error Rate Training in Statistical Machine Translation - Och - 2003 |

2 | Statistical Machine Translation: From Single-Word Models to Alignment Templates (PhD thesis) - Och - 2003 |

2 | C4.5: Programs for Machine Learning
- Quinlan
- 1993
Citation Context: ...ords as well as the whole blocks against the order on the reordering examples extracted by the algorithm described above. The IGR is the measure used in the decision tree learning to select features (Quinlan, 1993). It represents how precisely the feature predicts the class. For feature f and class c, IGR(f, c) = (En(c) − En(c|f)) / En(f) (11), where En(·) is the entropy and En(·|·) is the conditiona... |
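The information gain ratio from the context above, IGR(f, c) = (En(c) − En(c|f)) / En(f), can be computed directly from empirical counts. A minimal sketch, with invented toy observations; a feature that perfectly predicts the class has IGR 1.

```python
import math
from collections import Counter

def entropy(labels):
    # Empirical Shannon entropy En(.) of a label sequence, in bits.
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def igr(feature_values, class_values):
    # IGR(f, c) = (En(c) - En(c|f)) / En(f)
    en_c = entropy(class_values)
    en_f = entropy(feature_values)
    # Conditional entropy En(c|f): entropy of the classes within each
    # feature-value subset, weighted by the subset's frequency.
    en_c_given_f = 0.0
    for fv in set(feature_values):
        subset = [c for f, c in zip(feature_values, class_values) if f == fv]
        en_c_given_f += len(subset) / len(class_values) * entropy(subset)
    return (en_c - en_c_given_f) / en_f

# Toy data: the feature value determines the reordering class exactly.
f_vals = ["a", "a", "b", "b"]
c_vals = ["straight", "straight", "inverted", "inverted"]
score = igr(f_vals, c_vals)  # -> 1.0
```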