Assessing Phrase-Based Translation Models with Oracle Decoding
BibTeX
@MISC{Wisniewski_assessingphrase-based,
author = {Guillaume Wisniewski and Alexandre Allauzen},
title = {Assessing Phrase-Based Translation Models with Oracle Decoding},
year = {}
}
Abstract
Extant Statistical Machine Translation (SMT) systems are very complex pieces of software, which embed multiple layers of heuristics and involve very large numbers of numerical parameters. As a result, it is difficult to analyze output translations, and there is a real need for tools that could help developers better understand the various causes of errors. In this study, we take a step in that direction and present an attempt to evaluate the quality of the phrase-based translation model. In order to identify those translation errors that stem from deficiencies in the phrase table (PT), we propose to compute the oracle BLEU-4 score, that is, the best score that a system based on this PT can achieve on a reference corpus. By casting the computation of the oracle BLEU-1 as an Integer Linear Programming (ILP) problem, we show that it is possible to efficiently compute accurate lower bounds of this score, and we report measurements performed on several standard benchmarks. Various other applications of these oracle decoding techniques are also reported and discussed.







