Scaling the ISLE framework: Use of existing corpus resources for validation of MT evaluation metrics across languages (2002)
| Venue: | In Proceedings of LREC 2002. Las Palmas, Canary Islands |
| Citations: | 4 - 1 self |
BibTeX
@INPROCEEDINGS{Vanni02scalingthe,
author = {Michelle Vanni and Keith Miller},
title = {Scaling the ISLE framework: Use of existing corpus resources for validation of MT evaluation metrics across languages},
booktitle = {In Proceedings of LREC 2002. Las Palmas, Canary Islands},
year = {2002},
pages = {1254--1262}
}
Abstract
This paper describes a machine translation (MT) evaluation (MTE) research program which has benefited from the availability of two collections of source language texts and the results of processing these texts with several commercial MT engines (DARPA 1994, Doyon, Taylor, & White 1999). The methodology entails the systematic development of a predictive relationship between discrete, well-defined MTE metrics and specific information processing tasks that can be reliably performed with output of a given MT system. Unlike tests used in initial experiments on automated scoring (Jones and Rusk 2000), we employ traditional measures of MT output quality, selected from the International Standards for Language Engineering (ISLE) framework: Coherence, Clarity, Syntax, Morphology, General and Domain-specific Lexical robustness, to include Named-entity translation. Each test was originally validated on MT output produced by three Spanish-to-English systems (1994 DARPA MTE). We validate tests in the present work, however, with material taken from the MT Scale Evaluation research program produced by Japanese-to-English MT systems. Since Spanish and Japanese differ structurally on the morphological, syntactic, and discourse levels, a comparison of scores on tests measuring these output qualities should reveal how structural similarity, such as that enjoyed by Spanish and English, and structural contrast, such as that found between Japanese and English, affect the linguistic distinctions which must be accommodated by MT systems. Moreover, we show that metrics developed using Spanish-English MT output are equally effective when applied to Japanese-English MT output.