2007b. Regression for Sentence-Level MT Evaluation with Pseudo References
In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL)
Cited by 16 (1 self)
Many automatic evaluation metrics for machine translation (MT) rely on making comparisons to human translations, a resource that may not always be available. In this work, we present a method for developing sentence-level MT evaluation metrics that do not directly rely on human reference translations. Our metrics are developed using regression learning and are based on a set of weaker indicators of fluency and adequacy (pseudo references). Experimental results suggest that they rival standard reference-based metrics in terms of correlations with human judgments on new test instances.
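A minimal sketch of the general idea in the abstract above, under stated assumptions that do not come from the listing: pseudo references are taken to be translations of the same source sentence produced by other MT systems, the weak indicators are unigram precision and recall averaged over those pseudo references, and ordinary least squares stands in for whatever regression learner the paper actually uses. All function names (`features`, `fit`, `predict`, `solve`) are hypothetical.

```python
"""Sketch: a sentence-level MT metric learned by regression over
pseudo-reference features. Feature set and OLS learner are assumptions."""
from collections import Counter


def unigram_pr(cand: list[str], ref: list[str]) -> tuple[float, float]:
    """Clipped unigram precision and recall of a candidate against one reference."""
    overlap = sum((Counter(cand) & Counter(ref)).values())
    precision = overlap / len(cand) if cand else 0.0
    recall = overlap / len(ref) if ref else 0.0
    return precision, recall


def features(candidate: str, pseudo_refs: list[str]) -> list[float]:
    """Bias term plus precision and recall averaged over all pseudo references."""
    cand = candidate.split()
    scores = [unigram_pr(cand, ref.split()) for ref in pseudo_refs]
    avg_p = sum(p for p, _ in scores) / len(scores)
    avg_r = sum(r for _, r in scores) / len(scores)
    return [1.0, avg_p, avg_r]


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit(xs: list[list[float]], ys: list[float], ridge: float = 1e-6) -> list[float]:
    """Least-squares weights from the normal equations; a tiny ridge keeps them solvable."""
    d = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) + (ridge if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(d)]
    return solve(xtx, xty)


def predict(w: list[float], x: list[float]) -> float:
    """Predicted quality score: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))


if __name__ == "__main__":
    # Toy, made-up data: candidate translation and a human score in [0, 1].
    refs = ["the cat sat on a mat", "a cat is sitting on the mat"]
    train = [("the cat sat on the mat", 0.9), ("cat the mat on", 0.4),
             ("dog runs fast", 0.1), ("the cat sat", 0.6)]
    w = fit([features(c, refs) for c, _ in train], [y for _, y in train])
    print(predict(w, features("a cat sat on the mat", refs)))
```

The learned weights decide how much each weak indicator counts; with real data one would train on sentences that have human judgments and then score new sentences using only pseudo references.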
2005. A Paraphrase-Based Approach to Machine Translation Evaluation
University of Maryland, College Park, 2005
Cited by 10 (0 self)
We propose a novel approach to automatic machine translation evaluation based on paraphrase identification. The quality of machine-generated output can be viewed as the extent to which the conveyed meaning matches the semantics of reference translations, independent of lexical and syntactic divergences. This idea is implemented in linear regression models that attempt to capture human judgments of adequacy and fluency, based on features that have previously been shown to be effective for paraphrase identification. We evaluated our model using the output of three different MT systems from the 2004 NIST Arabic-to-English MT evaluation. Results show that models employing paraphrase-based features correlate better with human judgments than models based purely on existing automatic MT metrics.
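The abstract evaluates models by how well their scores correlate with human judgments. The sketch below, offered only as an illustration, computes a few simple surface features of the kind often used as paraphrase-identification baselines (unigram and bigram overlap F1 and a length ratio) and measures each feature's Pearson correlation with human scores. The abstract does not list the paper's actual features, which aim to capture meaning beyond surface overlap; `paraphrase_features` and `pearson` are hypothetical names, and the data is made up.

```python
"""Sketch: paraphrase-style features and their correlation with human scores.
Feature choice and all names are illustrative assumptions."""
from collections import Counter
from math import sqrt


def ngrams(tokens: list[str], n: int) -> Counter:
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def overlap_f1(cand: list[str], ref: list[str], n: int) -> float:
    """Harmonic mean of clipped n-gram precision and recall."""
    c, r = ngrams(cand, n), ngrams(ref, n)
    match = sum((c & r).values())
    if match == 0:
        return 0.0
    p, rec = match / sum(c.values()), match / sum(r.values())
    return 2 * p * rec / (p + rec)


def paraphrase_features(candidate: str, reference: str) -> list[float]:
    """[unigram F1, bigram F1, length ratio] for one candidate/reference pair."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    length_ratio = min(len(cand), len(ref)) / max(len(cand), len(ref), 1)
    return [overlap_f1(cand, ref, 1), overlap_f1(cand, ref, 2), length_ratio]


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient; undefined if either list is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    # Toy, made-up (candidate, human adequacy score) pairs.
    data = [("the cat sat on the mat", 5), ("a cat sat on the mat", 4),
            ("mat the on sat cat", 2), ("dogs bark loudly", 1)]
    feats = [paraphrase_features(c, ref) for c, _ in data]
    human = [float(h) for _, h in data]
    for k, name in enumerate(["unigram F1", "bigram F1", "length ratio"]):
        print(name, pearson([f[k] for f in feats], human))
```

In a full system these features would be fed to a linear regression model, and the model's predictions (not the raw features) would be correlated with human judgments.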
Towards Heterogeneous Automatic MT Error Analysis
Cited by 1 (0 self)
This work studies the viability of performing heterogeneous automatic MT error analyses. Error analysis is undoubtedly one of the most crucial stages in the development cycle of an MT system. However, this process often receives too little attention, because performing an accurate error analysis requires intensive human labor. To speed up the error analysis process, we suggest partially automating it by having automatic evaluation metrics play a more active role. For that purpose, we have compiled a large and heterogeneous set of features at different linguistic levels and at different levels of granularity. Through a practical case study, we show how these features provide an effective means of elaborating interpretable and detailed automatic reports of translation quality.
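A rough sketch of what an automatic report built from features at several levels and granularities might look like. The feature set here is a hypothetical stand-in: one lexical feature (word overlap F1), one shallow-syntactic feature (POS-tag overlap F1), and a length ratio, with tokens assumed to arrive already POS-tagged as `(word, tag)` pairs instead of coming from a real tagger. Granularity is shown as sentence-level scores rolled up into system-level averages, plus the worst-scoring sentence ids per feature. `FEATURES` and `error_report` are hypothetical names.

```python
"""Sketch: a per-level, per-granularity error report for MT output.
Features and input format are illustrative assumptions."""
from collections import Counter
from statistics import mean


def f1(cand: list[str], ref: list[str]) -> float:
    """Clipped bag-of-items F1 between two sequences."""
    match = sum((Counter(cand) & Counter(ref)).values())
    if match == 0:
        return 0.0
    p, r = match / len(cand), match / len(ref)
    return 2 * p * r / (p + r)


# Each feature maps (candidate, reference), both lists of (word, pos), to [0, 1].
FEATURES = {
    "lexical:word-F1": lambda c, r: f1([w.lower() for w, _ in c], [w.lower() for w, _ in r]),
    "shallow-syntax:POS-F1": lambda c, r: f1([t for _, t in c], [t for _, t in r]),
    "length:ratio": lambda c, r: min(len(c), len(r)) / max(len(c), len(r), 1),
}


def error_report(pairs, worst_k=1):
    """System-level average and ids of the worst-scoring sentences, per feature."""
    report = {}
    for name, fn in FEATURES.items():
        scores = [fn(c, r) for c, r in pairs]  # sentence-level granularity
        worst = sorted(range(len(scores)), key=lambda i: scores[i])[:worst_k]
        report[name] = {"system": mean(scores), "worst_sentences": worst}
    return report


if __name__ == "__main__":
    # Toy, made-up tagged (candidate, reference) pairs.
    pairs = [
        ([("the", "DT"), ("cat", "NN")], [("the", "DT"), ("cat", "NN")]),
        ([("a", "DT"), ("dog", "NN"), ("barked", "VBD")], [("the", "DT"), ("cat", "NN")]),
    ]
    for name, entry in error_report(pairs).items():
        print(name, entry)
```

Reading the report feature by feature is what makes it interpretable: a sentence that scores well on POS overlap but poorly on word overlap points to lexical choice rather than structure as the likely error.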
Regression for Sentence-Level MT Evaluation with Pseudo References
Many automatic evaluation metrics for machine translation (MT) rely on making comparisons to human translations, a resource that may not always be available. We present a method for developing sentence-level MT evaluation metrics that do not directly rely on human reference translations. Our metrics are developed using regression learning and are based on a set of weaker indicators of fluency and adequacy (pseudo references). Experimental results suggest that they rival standard reference-based metrics in terms of correlations with human judgments on new test instances.

