Results 1 - 4 of 4
Principles of context-based machine translation evaluation
- Machine Translation, 2002
Cited by 10 (0 self)
Abstract. This article defines a Framework for Machine Translation Evaluation (FEMTI) which relates the quality model used to evaluate a machine translation system to the purpose and context of the system. Our proposal attempts to put together, into a coherent picture, previous attempts to structure a domain characterised by overall complexity and local difficulties. In this article, we first summarise these attempts, then present an overview of the ISO/IEC guidelines for software evaluation (ISO/IEC 9126 and ISO/IEC 14598). As an application of these guidelines to machine translation software, we introduce FEMTI, a framework that is made of two interrelated classifications or taxonomies. The first classification enables evaluators to define an intended context of use, while the links to the second classification generate a relevant quality model (quality characteristics and metrics) for the respective context. The second classification provides definitions of various metrics used by the community. Further on, as part of ongoing, long-term research, we explain how metrics are analyzed, first from the general point of view of “meta-evaluation”, then focusing on examples. Finally, we show how consensus on the present framework is sought, and how feedback from the community is taken into account in the FEMTI life-cycle.
Key words: MT evaluation, quality models, evaluation metrics, context-based evaluation
A Fluency Error Categorization Scheme to Guide Automated Machine Translation Evaluation
- AMTA: Machine Translation: From Real Users to Research, 2004
Cited by 1 (0 self)
Abstract. Existing automated MT evaluation methods often require expert human translations. These are produced for every language pair evaluated and, due to this expense, subsequent evaluations tend to rely on the same texts, which do not necessarily reflect real MT use. In contrast, we are designing an automated MT evaluation system, intended for use by post-editors, purchasers and developers, that requires nothing but the raw MT output. Furthermore, our research is based on texts that reflect corporate use of MT. This paper describes our first step in system design: a hierarchical classification scheme of fluency errors in English MT output, to enable us to identify error types and frequencies, and guide the selection of errors for automated detection. We present results from the statistical analysis of 20,000 words of MT output, manually annotated using our classification scheme, and describe correlations between error frequencies and human scores for fluency and adequacy.
Sharing Problems and Solutions for Machine Translation of
Examples from chat interaction are presented to demonstrate that machine translation of written interaction shares many problems with translation of spoken interaction. The potential for common solutions to the problems is illustrated by describing operations that normalize and tag input before translation. Segmenting utterances into small translation units and processing short turns separately are also motivated using data from chat.
AN INVESTIGATION OF THE RELATIONSHIP BETWEEN AUTOMATED MACHINE TRANSLATION EVALUATION METRICS AND USER PERFORMANCE ON
This dissertation applies nonparametric statistical techniques to Machine Translation (MT) Evaluation using data from an MT Evaluation experiment conducted through a joint Army Research Laboratory (ARL) and Center for the Advanced Study of Language (CASL) project. In particular, the relationship between human task performance on an information extraction task with translated documents and well-known automated translation evaluation metric scores for those documents is studied. Findings from a correlation analysis of the connection between autometrics and task-based metrics are presented and contrasted with current strategies for evaluating translations. A novel idea for assessing partial rank correlation in the presence of grouping factors is also introduced. Lastly, this dissertation presents a framework for task-based MT evaluation and predictive modeling of task responses that gives new information about the relative predictive strengths of the different autometrics (and re-coded variants of them) within the statistical Generalized Linear Models developed in analyses of the Information Extraction Task data. This work shows that current autometrics are inadequate with respect to the

