## Discriminative reranking for machine translation (2004)


Venue: HLT-NAACL 2004

Citations: 66 (1 self)

### BibTeX

@INPROCEEDINGS{Shen04discriminativereranking,
  author    = {Libin Shen and Anoop Sarkar and Franz Josef Och},
  title     = {Discriminative reranking for machine translation},
  booktitle = {HLT-NAACL 2004},
  year      = {2004},
  pages     = {177--184}
}

### Abstract

This paper describes the application of discriminative reranking techniques to the problem of machine translation. For each sentence in the source language, we obtain from a baseline statistical machine translation system a ranked n-best list of candidate translations in the target language. We introduce two novel perceptron-inspired reranking algorithms that improve translation quality over the baseline system, as measured by the BLEU metric. We provide experimental results on the NIST 2003 Chinese-English large data track evaluation. We also provide theoretical analysis of our algorithms and experiments that verify that our algorithms provide state-of-the-art performance in machine translation.
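The abstract's "perceptron-inspired reranking" can be illustrated with a minimal sketch. This is the general idea only, not the paper's exact algorithms (which use splitting and ordinal regression with uneven margins): a linear scorer over candidate feature vectors, updated toward the oracle (highest-BLEU) candidate whenever the model ranks some other candidate first. All function names here are illustrative.

```python
import numpy as np

def perceptron_rerank_train(nbest_lists, bleu_scores, n_features, epochs=10):
    """Train a linear reranker on n-best lists with perceptron-style updates.

    nbest_lists: list of (k, n_features) arrays, one per source sentence.
    bleu_scores: list of length-k arrays of sentence-level BLEU per candidate.
    """
    w = np.zeros(n_features)
    for _ in range(epochs):
        for feats, bleu in zip(nbest_lists, bleu_scores):
            oracle = int(np.argmax(bleu))          # best candidate by BLEU
            predicted = int(np.argmax(feats @ w))  # model's current top choice
            if predicted != oracle:
                # move the weights toward the oracle, away from the mistake
                w += feats[oracle] - feats[predicted]
    return w

def rerank(w, feats):
    """Return candidate indices sorted best-first by the learned score."""
    return list(np.argsort(-(feats @ w)))
```

In practice each candidate's feature vector would combine the baseline model score with additional lexical and syntactic features, and the learned weights re-sort every n-best list at test time.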

### Citations

9804 | The nature of statistical learning theory
- Vapnik
- 1995
Citation Context: ...can not separate e1 from e10, but not for the case of e21 versus e30. 3.6 Large Margin Classifiers There are quite a few linear classifiers that can separate samples with large margin, such as SVMs (Vapnik, 1998), Boosting (Schapire et al., 1997), Winnow (Zhang, 2000) and Perceptron (Krauth and Mezard, 1987). The performance of SVMs is superior to other linear classifiers because of their ability to margin m...

1622 | Bleu: a method for automatic evaluation of machine translation (IBM Research Report RC22176)
- Papineni, Roukos
- 2001
Citation Context: ...450 different feature functions were used in order to improve the syntactic well-formedness of MT output. By reranking a 1000-best list generated by the baseline MT system from Och (2003), the BLEU (Papineni et al., 2001) score on the test dataset was improved from 31.6% to 32.9%. 2 Ranking and Reranking 2.1 Reranking for NLP tasks Like machine translation, parsing is another field of natural language processing in w...

1273 | The mathematics of statistical machine translation: Parameter estimation
- Brown, Pietra, et al.
- 1993
Citation Context: ...earning in natural language processing. Discriminative reranking for machine translation is an important application of our research project. 1.1 Statistical Machine Translation After the IBM Models (Brown et al., 1993) were proposed in 1993, various generative models have been proposed for Statistical Machine Translation (SMT) in the last ten years. In the IBM Models, the source-channel formalism, which was previou...

868 | A maximum-entropy-inspired parser
- Charniak
- 2000
Citation Context: ...ious learning algorithms have been employed in parse reranking, such as Boosting (Collins, 2000), Perceptron (Collins and Duffy, 2002), Support Vector Machines (Shen et al., 2003) and Log-linear models (Charniak, 2000; Collins, 2000). The reranking technique gives rise to a 13.5% error reduction in labeled recall/precision over the previous best generative parsing systems. 2 Discriminative Reranking for MT Inspired ...

767 | Boosting the margin: A new explanation for the effectiveness of voting methods
- Schapire, Freund, et al.
- 1997
Citation Context: ...10, but not for the case of e21 versus e30. 3.6 Large Margin Classifiers There are quite a few linear classifiers that can separate samples with large margin, such as SVMs (Vapnik, 1998), Boosting (Schapire et al., 1997), Winnow (Zhang, 2000) and Perceptron (Krauth and Mezard, 1987). The performance of SVMs is superior to other linear classifiers because of their ability to maximize the margin. However, SVMs are ext...

636 | A Statistical Approach to Machine Translation
- Brown, Cocke, et al.
- 1990
Citation Context: ...also provide theoretical analysis of our algorithms and experiments that verify that our algorithms provide state-of-the-art performance in machine translation. 1 Introduction The noisy-channel model (Brown et al., 1990) has been the foundation for statistical machine translation (SMT) for over ten years. Recently so-called reranking techniques, such as maximum entropy models (Och and Ney, 2002) and gradient methods...

476 | Stochastic inversion transduction grammars and bilingual parsing of parallel corpora
- Wu
- 1997
Citation Context: ...ectively. Parse trees have also been used in alignment models. Wu (1997) introduced constraints on alignments using a probabilistic synchronous context-free grammar restricted to Chomsky normal form. (Wu, 1997) was an implicit or self-organizing syntax model as it did not use a Treebank. Yamada and Knight (2001) used a statistical parser trained using a Treebank in the source language to produce parse trees...

417 | Discriminative Training and Maximum Entropy Models for Statistical Machine Translation
- Och, Ney
- 2002
Citation Context: ...noisy-channel model (Brown et al., 1990) has been the foundation for statistical machine translation (SMT) for over ten years. Recently so-called reranking techniques, such as maximum entropy models (Och and Ney, 2002) and gradient methods (Och, 2003), have been applied to machine translation (MT), and have provided significant improvements. In this paper, we introduce two novel machine learning algorithms special...

314 | Large margin rank boundaries for ordinal regression
- Herbrich, Obermayer, et al.
- 2000
Citation Context: ...does not work on the reranking tasks due to the introduction of global ranks. The other approach is to reduce the ranking problem to a classification problem by using the method of pairwise samples (Herbrich et al., 2000). The underlying assumption is that the samples of consecutive ranks are separable. This may become a problem in the case that ranks are unreliable when ranking does not strongly distinguish between ...
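The pairwise reduction this context describes can be sketched as follows: every pair of differently ranked candidates yields a difference vector labeled by which item ranks higher, turning ranking into binary classification in the style of Herbrich et al. (2000). Function names here are illustrative.

```python
import numpy as np

def pairwise_samples(feats, ranks):
    """Reduce a ranking problem to binary classification via pairwise samples.

    feats: (k, d) candidate feature vectors; ranks: length-k ranks (lower = better).
    Returns difference vectors labeled +1 when the first item outranks the second.
    """
    X, y = [], []
    k = len(ranks)
    for i in range(k):
        for j in range(i + 1, k):
            if ranks[i] == ranks[j]:
                continue  # ties carry no preference information
            diff = feats[i] - feats[j]
            label = 1 if ranks[i] < ranks[j] else -1
            X.append(diff)
            y.append(label)
            # add the mirrored pair so the two classes stay balanced
            X.append(-diff)
            y.append(-label)
    return np.array(X), np.array(y)
```

Any binary large-margin classifier trained on these samples then induces a ranking by sorting candidates on its linear score.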

275 | A syntax-based statistical translation model - Yamada, Knight - 2001

222 | New ranking algorithms for parsing and tagging: kernels over discrete structures and the voted perceptron
- Collins, Duffy
- 2002
Citation Context: ...f local and global features of various kinds, which are unavailable in generative models. Various learning algorithms have been employed in parse reranking, such as Boosting (Collins, 2000), Perceptron (Collins and Duffy, 2002), Support Vector Machines (Shen et al., 2003) and Log-linear models (Charniak, 2000; Collins, 2000). The reranking technique gives rise to a 13.5% error reduction in labeled recall/precision over the p...

173 | Pranking with ranking
- Crammer, Singer
- 2001
Citation Context: ...o large margin approaches have been used. One is the PRank algorithm, a variant of the perceptron algorithm, that uses multiple biases to represent the boundaries between every two consecutive ranks (Crammer and Singer, 2001; Harrington, 2003). However, as we will show in section 3.7, the PRank algorithm does not work on the reranking tasks due to the introduction of global ranks. The other approach is to reduce the rank...
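The PRank algorithm mentioned here keeps one shared weight vector plus a sequence of ordered thresholds, one boundary per pair of consecutive ranks. The sketch below is a generic reconstruction from the Crammer and Singer description, not code from either paper.

```python
import numpy as np

def prank_train(X, y, n_ranks, epochs=20):
    """PRank: a perceptron with one weight vector and ordered thresholds
    b_1 <= ... <= b_{k-1} separating consecutive ranks. Ranks are 1..n_ranks."""
    w = np.zeros(X.shape[1])
    b = np.zeros(n_ranks - 1)
    for _ in range(epochs):
        for x, rank in zip(X, y):
            score = w @ x
            # predicted rank: one plus the number of thresholds the score clears
            pred = 1 + int(np.sum(score >= b))
            if pred != rank:
                tau = np.zeros(n_ranks - 1)
                for r in range(n_ranks - 1):
                    # y_r says on which side of threshold r the true rank lies
                    y_r = 1 if rank > r + 1 else -1
                    if y_r * (score - b[r]) <= 0:
                        tau[r] = y_r
                w = w + tau.sum() * x   # pull the score past the violated thresholds
                b = b - tau             # and move those thresholds the other way
    return w, b

def prank_predict(w, b, x):
    return 1 + int(np.sum(w @ x >= b))
```

The update preserves the ordering of the thresholds, which is exactly the global-rank assumption that, per section 3.7 of the paper, breaks down when n-best lists from different sentences share no common rank scale.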

99 | Learning algorithms with optimal stability in neural networks
- Krauth, Mézard
- 1987
Citation Context: ...Classifiers There are quite a few linear classifiers that can separate samples with large margin, such as SVMs (Vapnik, 1998), Boosting (Schapire et al., 1997), Winnow (Zhang, 2000) and Perceptron (Krauth and Mezard, 1987). The performance of SVMs is superior to other linear classifiers because of their ability to maximize the margin. However, SVMs are extremely slow in training since they need to solve a quadratic pr...

60 | The perceptron algorithm with uneven margins
- Li, Zaragoza, et al.
- 2002
Citation Context: ...aller set of “better” features, cf. (Shen and Joshi, 2004). If the number of the non-discriminative features is large enough, the data set becomes unsplittable. We have tried using the λ trick as in (Li et al., 2002) to make data separable artificially, but the performance could not be improved with such features. We achieve similar results with Algorithm 2, the ordinal regression with uneven margin. It converge...

34 | Using ltag based features in parse reranking
- Shen, Sarkar, et al.
- 2003
Citation Context: ...are unavailable in generative models. Various learning algorithms have been employed in parse reranking, such as Boosting (Collins, 2000), Perceptron (Collins and Duffy, 2002), Support Vector Machines (Shen et al., 2003) and Log-linear models (Charniak, 2000; Collins, 2000). The reranking technique gives rise to a 13.5% error reduction in labeled recall/precision over the previous best generative parsing systems. 2 Di...

33 | An SVM based voting algorithm with application to parse reranking
- Shen, Joshi
- 2003
Citation Context: ...improvements in parsing. Various machine learning algorithms have been employed in parse reranking, such as Boosting (Collins, 2000), Perceptron (Collins and Duffy, 2002) and Support Vector Machines (Shen and Joshi, 2003). The reranking techniques have resulted in a 13.5% error reduction in labeled recall/precision over the previous best generative parsing models. Discriminative reranking methods for parsing typicall...

28 | Online Ranking/Collaborative filtering using the Perceptron Algorithm
- Harrington
- 2003
Citation Context: ...have been used. One is the PRank algorithm, a variant of the perceptron algorithm, that uses multiple biases to represent the boundaries between every two consecutive ranks (Crammer and Singer, 2001; Harrington, 2003). However, as we will show in section 3.7, the PRank algorithm does not work on the reranking tasks due to the introduction of global ranks. The other approach is to reduce the ranking problem to a c...

21 | Modeling with structures in statistical machine translation - Wang, Waibel - 1998

15 | Improving Statistical Natural Language Translation with Categories and Rules
- Och, Weber
- 1998
Citation Context: ...an SMT model based on phrase-based alignments. Since their translation model reordered phrases directly, it achieved higher accuracy for translation between languages with different word orders. In (Och and Weber, 1998; Och et al., 1999), a two-level alignment model was employed to utilize shallow phrase structures: alignment between templates was used to handle phrase reordering, and word alignments within a templ...

12 | Flexible margin selection for reranking with full pairwise samples
- Shen, Joshi
- 2004
Citation Context: ...lly use the notion of a margin as the distance between the best candidate parse and the rest of the parses. The reranking problem is reduced to a classification problem by using pairwise samples. In (Shen and Joshi, 2004), we have introduced a new perceptron-like ordinal regression algorithm for parse reranking. In that algorithm, pairwise samples are used for training and margins are defined as the distance between ...
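The uneven-margin idea this context alludes to can be sketched as a pairwise perceptron that demands a larger score gap between candidates whose ranks are further apart. The margin function below (a scaled difference of reciprocal ranks) is one illustrative choice, not necessarily the one used in Shen and Joshi (2004).

```python
import numpy as np

def uneven_margin_rerank_train(nbest_feats, nbest_ranks, n_features,
                               epochs=10, scale=0.1):
    """Perceptron-style ordinal regression over pairwise samples.

    nbest_feats: list of (k, n_features) arrays, one per source sentence.
    nbest_ranks: list of length-k rank lists (lower rank = better candidate).
    """
    w = np.zeros(n_features)
    for _ in range(epochs):
        for feats, ranks in zip(nbest_feats, nbest_ranks):
            k = len(ranks)
            for i in range(k):
                for j in range(k):
                    if ranks[i] < ranks[j]:  # candidate i should outscore j
                        # uneven margin: widely separated ranks need a bigger gap
                        margin = scale * (1.0 / ranks[i] - 1.0 / ranks[j])
                        if w @ (feats[i] - feats[j]) < margin:
                            w += feats[i] - feats[j]
    return w
```

Because the required gap shrinks as both ranks grow, mistakes near the top of the list drive larger corrections than mistakes deep in the n-best list, which matches the intuition that only the top candidates matter for translation output.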

5 | Large margin winnow methods for text categorization
- Zhang
- 2000
Citation Context: ...versus e30. 3.6 Large Margin Classifiers There are quite a few linear classifiers that can separate samples with large margin, such as SVMs (Vapnik, 1998), Boosting (Schapire et al., 1997), Winnow (Zhang, 2000) and Perceptron (Krauth and Mezard, 1987). The performance of SVMs is superior to other linear classifiers because of their ability to maximize the margin. However, SVMs are extremely slow in trainin...

2 | Improved alignment models for statistical machine translation
- Och, Tillmann, et al.
- 1999
Citation Context: ...gnment model based on shallow model structures. Since their translation model reordered phrases directly, it achieved higher accuracy for translation between languages with different word orders. In (Och et al., 1999), a two-level alignment model was employed to utilize shallow phrase structures. Alignment between templates was used to handle phrase reordering, and word alignments within a template were used to ha...