Results 1-3 of 3
Measuring machine translation errors in new domains
In manuscript, 2013
Abstract

Cited by 6 (3 self)
We develop two techniques for analyzing the effect of porting a machine translation system to a new domain. One is a macro-level analysis that measures how domain shift affects corpus-level evaluation; the second is a micro-level analysis for word-level errors. We apply these methods to understand what happens when a Parliament-trained phrase-based machine translation system is applied in four very different domains: news, medical texts, scientific articles and movie subtitles. We present quantitative and qualitative experiments that highlight opportunities for future research in domain adaptation for machine translation.
Computing Lattice BLEU Oracle Scores for Machine Translation
Abstract

Cited by 1 (0 self)
The search space of Phrase-Based Statistical Machine Translation (PBSMT) systems can be represented in the form of a directed acyclic graph (lattice). The quality of this search space can thus be evaluated by computing the best achievable hypothesis in the lattice, the so-called oracle hypothesis. For common SMT metrics, however, this problem is NP-hard and can only be solved using heuristics. In this work, we present two new methods for efficiently computing BLEU oracles on lattices: the first one is based on a linear approximation of the corpus BLEU score and is solved using the FST formalism; the second one relies on an integer linear programming formulation and is solved both directly and using the Lagrangian relaxation framework. These new decoders are positively evaluated and compared with several alternatives from the literature for three language pairs, using lattices produced by two PBSMT systems.
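
To illustrate the idea of oracle decoding on a lattice, the following is a minimal sketch and not the method of the paper. It assumes a unigram-only linear score (a reward `theta1` for each word found in the reference and a per-word cost `theta0`), so every edge has a local weight and the oracle reduces to a best-path search in a DAG. The function name, the edge format, the toy lattice and the weight values are all hypothetical.

```python
from collections import defaultdict


def oracle_path(edges, start, end, reference, theta0=-0.2, theta1=1.0):
    """Return (score, words) of the best path under a unigram-only linear score.

    edges: list of (src, dst, word) triples. This sketch assumes nodes are
    integers numbered in topological order (src < dst for every edge).
    theta0 and theta1 are illustrative weights, not values from the paper.
    """
    ref_words = set(reference.split())
    outgoing = defaultdict(list)
    for src, dst, word in edges:
        outgoing[src].append((dst, word))

    # best[node] = (score, words) of the best partial path reaching node
    best = {start: (0.0, [])}
    for node in sorted(outgoing):  # increasing ids = topological order
        if node not in best:
            continue  # node not reachable from start
        score, words = best[node]
        for dst, word in outgoing[node]:
            gain = theta0 + (theta1 if word in ref_words else 0.0)
            candidate = (score + gain, words + [word])
            if dst not in best or candidate[0] > best[dst][0]:
                best[dst] = candidate
    return best[end]


# Toy lattice with two ways to finish the sentence (hypothetical data).
edges = [
    (0, 1, "the"), (0, 1, "a"),
    (1, 2, "cat"), (1, 2, "dog"),
    (2, 3, "sat"), (2, 4, "sits"),
    (3, 4, "down"),
]
score, words = oracle_path(edges, 0, 4, "the cat sat down")
print(" ".join(words))
```

With higher-order n-grams the edge weights are no longer local to a single word, which is one reason the abstract resorts to finite-state machinery and approximations.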
Lattice BLEU Oracles in Machine Translation
Artem Sokolov, Institut für Computerlinguistik, Universität Heidelberg
Abstract
The search space of Phrase-Based Statistical Machine Translation (PBSMT) systems can be represented as a directed acyclic graph (lattice). By exploring this search space, it is possible to analyze and understand the failures of PBSMT systems. Indeed, useful diagnoses can be obtained by computing the so-called oracle hypotheses, which are hypotheses in the search space that have the highest quality score. For standard SMT metrics, however, this problem is NP-hard and can only be solved approximately. In this work, we present two new methods for efficiently computing BLEU oracles on lattices: the first one is based on a linear approximation of the corpus BLEU score and is solved using generic shortest distance algorithms; the second one relies on an Integer Linear Programming (ILP) formulation of the oracle decoding that incorporates count clipping constraints. It can be solved either directly with a standard ILP solver or with Lagrangian relaxation techniques. These new decoders are evaluated and compared with several alternatives from the literature for three language pairs, using lattices produced by two PBSMT systems.
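
The count clipping mentioned in this abstract refers to BLEU's modified n-gram precision, in which each hypothesis n-gram is credited at most as many times as it occurs in the reference. The sketch below shows only that clipping step on plain strings; it does not reproduce the ILP formulation, and the function names and example sentences are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_matches(hypothesis, reference, n):
    """Return (clipped match count, total hypothesis n-gram count)."""
    hyp_counts = Counter(ngrams(hypothesis.split(), n))
    ref_counts = Counter(ngrams(reference.split(), n))
    # Clip each n-gram's credit at its count in the reference.
    matched = sum(min(count, ref_counts[gram])
                  for gram, count in hyp_counts.items())
    return matched, sum(hyp_counts.values())


# "the" appears three times in the hypothesis but is credited only once.
print(clipped_matches("the the the cat", "the cat sat", 1))
```

Because of the clipping, the credit for an n-gram depends on how often the rest of the path has already used it, so the score no longer decomposes over lattice edges.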