Fluency, Adequacy, or HTER? Exploring Different Human Judgments with a Tunable MT Metric (2009)
- Association for Computational Linguistics
Cited by 20 (2 self)
Automatic Machine Translation (MT) evaluation metrics have traditionally been evaluated by the correlation of the scores they assign to MT output with human judgments of translation performance. Different types of human judgments, such as Fluency, Adequacy, and HTER, measure varying aspects of MT performance that can be captured by automatic MT metrics. We explore these differences through the use of a new tunable MT metric: TER-Plus, which extends the Translation Edit Rate evaluation metric with tunable parameters and the incorporation of morphology, synonymy and paraphrases. TER-Plus was shown to be one of the top metrics in NIST’s Metrics MATR 2008 Challenge, having the highest average rank in terms of Pearson and Spearman correlation. Optimizing TER-Plus to different types of human judgments yields significantly improved correlations and meaningful changes in the weight of different types of edits, demonstrating significant differences between the types of human judgments.
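The abstract above evaluates a metric by how well its scores correlate with human judgments, using Pearson and Spearman correlation. The following is a minimal sketch of those two coefficients in plain Python, not the evaluation code used in the paper. The function names and the sample scores are hypothetical and chosen only for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


if __name__ == "__main__":
    # Hypothetical data: an error-rate metric (lower is better) against
    # human adequacy scores (higher is better) for five segments.
    metric_scores = [0.10, 0.35, 0.40, 0.62, 0.80]
    human_scores = [5.0, 4.0, 4.5, 2.0, 1.0]
    print("Pearson: ", pearson(metric_scores, human_scores))
    print("Spearman:", spearman(metric_scores, human_scores))
```

Because an edit-rate metric counts errors, a good metric of that kind would be expected to correlate negatively with judgments such as Adequacy; comparisons between metrics would then look at the magnitude of the correlation.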
Ter-Plus: Paraphrase, Semantic, and Alignment Enhancements to Translation Edit Rate
Cited by 11 (1 self)
This paper describes a new evaluation metric, Ter-Plus (Terp), for automatic evaluation of machine translation. Terp is an extension of Translation Edit Rate (Ter). It builds on the success of Ter as an evaluation metric and alignment tool and addresses several of its weaknesses through the use of paraphrases, stemming, and synonyms, as well as edit costs that can be automatically optimized to correlate better with various types of human judgments. We present a correlation study comparing Terp to Bleu, Meteor, and Ter, and illustrate that Terp can better evaluate translation adequacy.
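The idea of an edit rate with tunable costs and relaxed matching can be illustrated with a small sketch. The code below is a deliberately simplified word-level edit distance, not the Terp implementation: it omits shifts and phrase substitutions (paraphrases), and it assumes normalization by the number of reference words. The `Costs` class, the `crude_stem` suffix stripper, the synonym set, and all default cost values are hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class Costs:
    """Tunable edit costs (illustrative defaults, not published values)."""
    insert: float = 1.0
    delete: float = 1.0
    substitute: float = 1.0
    stem: float = 0.5
    synonym: float = 0.5


def crude_stem(word):
    """Hypothetical stand-in for a real stemmer: strip a few suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def word_cost(hyp_word, ref_word, costs, synonyms):
    """Cost of aligning one hypothesis word to one reference word."""
    if hyp_word == ref_word:
        return 0.0
    if crude_stem(hyp_word) == crude_stem(ref_word):
        return costs.stem
    if frozenset((hyp_word, ref_word)) in synonyms:
        return costs.synonym
    return costs.substitute


def edit_rate(hypothesis, reference, costs, synonyms=frozenset()):
    """Weighted edit cost divided by the number of reference words."""
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dp[i][j] = minimum cost of turning hyp[:i] into ref[:j]
    dp = [[0.0] * (len(ref) + 1) for _ in range(len(hyp) + 1)]
    for i in range(1, len(hyp) + 1):
        dp[i][0] = dp[i - 1][0] + costs.delete
    for j in range(1, len(ref) + 1):
        dp[0][j] = dp[0][j - 1] + costs.insert
    for i in range(1, len(hyp) + 1):
        for j in range(1, len(ref) + 1):
            dp[i][j] = min(
                dp[i - 1][j] + costs.delete,
                dp[i][j - 1] + costs.insert,
                dp[i - 1][j - 1]
                + word_cost(hyp[i - 1], ref[j - 1], costs, synonyms),
            )
    return dp[len(hyp)][len(ref)] / len(ref)


if __name__ == "__main__":
    synonyms = {frozenset(("big", "large"))}  # hypothetical synonym table
    print(edit_rate("a big dog", "a large dog", Costs(), synonyms))
    print(edit_rate("the cats sat", "the cat sat", Costs(stem=0.0)))
```

Keeping the costs in one object means they can be handed to any optimizer that searches for the values giving the best correlation with a chosen type of human judgment, which is the tuning step the abstract describes.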
TERp System Description
Cited by 3 (0 self)
This paper describes TER-Plus (TERp), the University of Maryland / BBN Technologies submission for the NIST Metric MATR 2008 workshop on automatic machine translation evaluation metrics. TERp is an extension of Translation Edit Rate (TER) that builds on the success of TER as an evaluation metric and alignment tool while addressing several of its weaknesses through the use of paraphrases, morphological stemming, and synonyms, as well as edit costs that are optimized to correlate better with various types of human judgments.
in TERp: Stem Matches, Synonym Matches and Phrase Substitutions (Paraphrases).
TER-Plus (TERp) is an extended TER evaluation metric incorporating morphology, synonymy and paraphrases. There are three new edit operations

