Learning to Translate with Multiple Objectives
"... We introduce an approach to optimize a machine translation (MT) system on multiple metrics simultaneously. Different metrics (e.g. BLEU, TER) focus on different aspects of translation quality; our multi-objective approach leverages these diverse aspects to improve overall quality. Our approach is ba ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
(Show Context)
We introduce an approach to optimize a machine translation (MT) system on multiple metrics simultaneously. Different metrics (e.g. BLEU, TER) focus on different aspects of translation quality; our multi-objective approach leverages these diverse aspects to improve overall quality. Our approach is based on the theory of Pareto Optimality. It is simple to implement on top of existing single-objective optimization methods (e.g. MERT, PRO) and outperforms ad hoc alternatives based on linear combination of metrics. We also discuss the issue of metric tunability and show that our Pareto approach is more effective in incorporating new metrics from MT evaluation for MT optimization.
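The abstract names Pareto Optimality as the core idea but gives no algorithmic detail, so the following is only a minimal sketch of the underlying notion: given candidates scored on several metrics (all oriented so that higher is better, e.g. BLEU and 1 - TER), keep the non-dominated ones. The function names and toy scores are illustrative assumptions, not the paper's MERT/PRO integration.

```python
# Minimal sketch of Pareto optimality for multi-metric comparison: keep every
# candidate that no other candidate beats on all metrics simultaneously.
# Names and toy scores are illustrative, not taken from the paper.

from typing import Dict, List, Tuple


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """True if score vector `a` is at least as good as `b` on every metric
    and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates: Dict[str, Tuple[float, ...]]) -> List[str]:
    """Return the names of candidates not dominated by any other candidate."""
    front = []
    for name, score in candidates.items():
        if not any(dominates(other, score)
                   for other_name, other in candidates.items()
                   if other_name != name):
            front.append(name)
    return front


if __name__ == "__main__":
    # Toy candidates scored as (BLEU, 1 - TER); values are made up.
    candidates = {
        "cand_A": (0.31, 0.52),
        "cand_B": (0.29, 0.55),
        "cand_C": (0.28, 0.50),   # dominated by both A and B
        "cand_D": (0.33, 0.48),
    }
    print(pareto_front(candidates))  # -> ['cand_A', 'cand_B', 'cand_D']
```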
TESLA at WMT 2011: Translation Evaluation and Tunable Metric
"... This paper describes the submission from the National University of Singapore to the WMT 2011 Shared Evaluation Task and the Tunable Metric Task. Our entry is TESLA in three different ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
This paper describes the submission from the National University of Singapore to the WMT 2011 Shared Evaluation Task and the Tunable Metric Task. Our entry is TESLA in three different …
PORT: a Precision-Order-Recall MT Evaluation Metric for Tuning. Accepted for publication in Proceedings of ACL, 2012.
"... Abstract Many machine translation (MT) evaluation metrics have been shown to correlate better with human judgment than BLEU. In principle, tuning on these metrics should yield better systems than tuning on BLEU. However, due to issues such as speed, requirements for linguistic resources, and optimi ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
(Show Context)
Many machine translation (MT) evaluation metrics have been shown to correlate better with human judgment than BLEU. In principle, tuning on these metrics should yield better systems than tuning on BLEU. However, due to issues such as speed, requirements for linguistic resources, and optimization difficulty, they have not been widely adopted for tuning. This paper presents PORT, a new MT evaluation metric which combines precision, recall and an ordering metric and which is primarily designed for tuning MT systems. PORT does not require external resources and is quick to compute. It has a better correlation with human judgment than BLEU. We compare PORT-tuned MT systems to BLEU-tuned baselines in five experimental conditions involving four language pairs. PORT tuning achieves consistently better performance than BLEU tuning, according to four automated metrics (including BLEU) and to human evaluation: in comparisons of outputs from 300 source sentences, human judges preferred the PORT-tuned output 45.3% of the time (vs. 32.7% BLEU tuning preferences and 22.0% ties).
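The abstract states only that PORT combines precision, recall and an ordering measure; the exact formula is not given there. The sketch below is therefore a hypothetical toy combination (unigram precision and recall plus a crude Kendall-tau-style ordering term), intended to illustrate the kind of quantities involved rather than PORT itself.

```python
# Toy metric in the spirit described by the abstract: combine unigram
# precision, unigram recall, and a simple word-ordering term.
# This is NOT the actual PORT formula; the harmonic-mean combination and the
# ordering term below are assumptions made for illustration.

from collections import Counter


def precision_recall(hyp: list, ref: list) -> tuple:
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    p = overlap / len(hyp) if hyp else 0.0
    r = overlap / len(ref) if ref else 0.0
    return p, r


def ordering_score(hyp: list, ref: list) -> float:
    """Fraction of concordant pairs among hypothesis words that also occur in
    the reference (a crude Kendall-tau-style ordering measure)."""
    shared = [w for w in hyp if w in ref]
    ref_pos = {w: i for i, w in enumerate(ref)}
    pairs = [(shared[i], shared[j])
             for i in range(len(shared)) for j in range(i + 1, len(shared))]
    if not pairs:
        return 1.0
    concordant = sum(1 for a, b in pairs if ref_pos[a] < ref_pos[b])
    return concordant / len(pairs)


def toy_port_like(hyp: list, ref: list) -> float:
    p, r = precision_recall(hyp, ref)
    o = ordering_score(hyp, ref)
    f = 2 * p * r / (p + r) if p + r else 0.0
    return f * o  # lexical match weighted by word order


if __name__ == "__main__":
    ref = "the cat sat on the mat".split()
    hyp = "on the mat the cat sat".split()
    print(round(toy_port_like(hyp, ref), 3))  # high overlap, poor ordering
```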
TESLA-CELAB: Translation Evaluation of Sentences with Linear-programming-based Analysis – Character-level Evaluation for Languages with Ambiguous Word Boundaries
"... Languages with Ambiguous word Boundaries) for automatic machine translation evaluation. For languages such as Chinese where words usually have meaningful internal structure and word boundaries are often fuzzy, TESLA-CELAB acknowledges the advantage of character-level evaluation over word-level evalu ..."
Abstract
- Add to MetaCart
(Show Context)
We introduce TESLA-CELAB (Translation Evaluation of Sentences with Linear-programming-based Analysis – Character-level Evaluation for Languages with Ambiguous Word Boundaries) for automatic machine translation evaluation. For languages such as Chinese, where words usually have meaningful internal structure and word boundaries are often fuzzy, TESLA-CELAB acknowledges the advantage of character-level evaluation over word-level evaluation. By reformulating the problem in the linear programming framework, TESLA-CELAB addresses several drawbacks of the character-level metrics, in particular the modeling of synonyms spanning multiple characters. We show empirically that TESLA-CELAB significantly outperforms character-level BLEU in the English-Chinese translation evaluation tasks.
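As a small illustration of why character-level evaluation suits languages with fuzzy word boundaries, the sketch below scores the same Chinese string under two different word segmentations: word-level n-gram matching penalizes the segmentation mismatch, while character-level matching does not. It uses plain character-bigram F1 and a made-up example; it is not TESLA-CELAB, which additionally models multi-character synonyms via linear programming.

```python
# Word-level vs character-level n-gram F1 on the same Chinese string under two
# different segmentations. Plain bigram F1 only; the example is made up.

from collections import Counter


def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(hyp_tokens, ref_tokens, n=2):
    hyp_ng, ref_ng = ngrams(hyp_tokens, n), ngrams(ref_tokens, n)
    overlap = sum((hyp_ng & ref_ng).values())
    p = overlap / max(sum(hyp_ng.values()), 1)
    r = overlap / max(sum(ref_ng.values()), 1)
    return 2 * p * r / (p + r) if p + r else 0.0


if __name__ == "__main__":
    # Same string, segmented into words two different ways (toy example).
    hyp_words = ["上海", "浦东", "开发"]
    ref_words = ["上海", "浦东开发"]
    print("word-level F1:", round(f1(hyp_words, ref_words), 2))  # 0.0
    # At the character level the segmentation difference disappears.
    hyp_chars = list("".join(hyp_words))
    ref_chars = list("".join(ref_words))
    print("char-level F1:", round(f1(hyp_chars, ref_chars), 2))  # 1.0
```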
User-Centered Design of Translation Systems
"... The goal of this thesis is to design an interactive translation system to support multilingual communication using the user-centered design approach; it details how to select the best machine translation for the user’s input message, customize translation for different communication topics, and inte ..."
Abstract
- Add to MetaCart
The goal of this thesis is to design an interactive translation system to support multilingual communication using the user-centered design approach; it details how to select the best machine translation for the user's input message, customize translation for different communication topics, and interact with users to improve translation quality for multilingual communication. Existing studies on machine-translation-mediated communication show that mistranslation can lead to ineffective communication. Traditionally, machine translators cannot prevent the transfer of mistranslations, and users do not know how a machine translator works; thus translation systems are just transparent channels to the users. We analyze three challenges arising from users' needs and limitations, from the perspective of monolingual, non-computing-professional users. The first challenge is how users can use multiple machine translators; the second is how users can customize translation; the last is how to help users repair mistranslations. Following …
Multi-Metric Optimization Using Ensemble Tuning
"... This paper examines tuning for statistical machine translation (SMT) with respect to multiple evaluation metrics. We propose several novel methods for tuning towards multiple objectives, including some based on ensemble decoding methods. Pareto-optimality is a natural way to think about multi-metric ..."
Abstract
- Add to MetaCart
(Show Context)
This paper examines tuning for statistical machine translation (SMT) with respect to multiple evaluation metrics. We propose several novel methods for tuning towards multiple objectives, including some based on ensemble decoding methods. Pareto-optimality is a natural way to think about multi-metric optimization (MMO), and our methods can effectively combine several Pareto-optimal solutions, obviating the need to choose one. Our best performing ensemble tuning method is a new algorithm for multi-metric optimization that searches for Pareto-optimal ensemble models. We study the effectiveness of our methods through experiments on multi-reference as well as single-reference datasets. Our experiments show simultaneous gains across several metrics (BLEU, RIBES) without any significant reduction in other metrics. This contrasts with traditional tuning, where gains are usually limited to a single metric. Our human evaluation results confirm that, in order to produce better MT output, optimizing multiple metrics is better than optimizing only one.
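Both this abstract and the first entry contrast Pareto-based multi-metric optimization with ad hoc linear combinations of metrics. For reference, the following minimal sketch shows that linear-combination baseline: several metric scores collapsed into one tuning objective with hand-fixed weights. The weights and toy scores are assumptions.

```python
# Minimal sketch of the "ad hoc" baseline that Pareto-based multi-metric
# tuning is contrasted with: collapse several metric scores into a single
# objective with fixed weights, then pick the candidate that maximizes it.
# The weights and toy scores below are illustrative assumptions.


def linear_objective(scores: dict, weights: dict) -> float:
    """Weighted sum of per-metric scores (all oriented higher-is-better)."""
    return sum(weights[m] * scores[m] for m in weights)


if __name__ == "__main__":
    weights = {"BLEU": 0.5, "RIBES": 0.5}  # fixed by hand, which is the drawback
    candidates = {
        "cand_A": {"BLEU": 0.31, "RIBES": 0.78},
        "cand_B": {"BLEU": 0.29, "RIBES": 0.82},
    }
    best = max(candidates, key=lambda c: linear_objective(candidates[c], weights))
    print(best)  # which candidate wins depends entirely on the chosen weights
```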
Probabilistic Finite State Machines for Regression-based MT Evaluation
"... Accurate and robust metrics for automatic evaluation are key to the development of statistical machine translation (MT) systems. We first introduce a new regression model that uses a probabilistic finite state machine (pFSM) to compute weighted edit distance as predictions of translation quality. We ..."
Abstract
- Add to MetaCart
(Show Context)
Accurate and robust metrics for automatic evaluation are key to the development of statistical machine translation (MT) systems. We first introduce a new regression model that uses a probabilistic finite state machine (pFSM) to compute weighted edit distance as predictions of translation quality. We also propose a novel pushdown automaton extension of the pFSM model for modeling word swapping and cross alignments that cannot be captured by standard edit distance models. Our models can easily incorporate a rich set of linguistic features, and automatically learn their weights, eliminating the need for ad-hoc parameter tuning. Our methods achieve state-of-the-art correlation with human judgments on two different prediction tasks across a diverse set of standard evaluations (NIST OpenMT06,08; WMT06-08).
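The abstract describes computing a weighted edit distance whose operation weights are learned by the pFSM. The sketch below shows only the generic weighted edit distance (dynamic programming with per-operation costs); the learned weights, linguistic features, and the pushdown extension for swaps are not reproduced, and the cost values are illustrative assumptions.

```python
# Generic weighted edit distance between a hypothesis and a reference, with
# per-operation costs. The paper's pFSM learns such weights (and a pushdown
# extension handles swaps); here the costs are fixed, illustrative values.


def weighted_edit_distance(hyp, ref, sub_cost=1.0, ins_cost=1.0, del_cost=1.0):
    m, n = len(hyp), len(ref)
    # dp[i][j] = cost of editing hyp[:i] into ref[:j]
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = dp[i - 1][0] + del_cost
    for j in range(1, n + 1):
        dp[0][j] = dp[0][j - 1] + ins_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            match = 0.0 if hyp[i - 1] == ref[j - 1] else sub_cost
            dp[i][j] = min(dp[i - 1][j - 1] + match,   # substitution / match
                           dp[i - 1][j] + del_cost,    # delete hypothesis word
                           dp[i][j - 1] + ins_cost)    # insert reference word
    return dp[m][n]


if __name__ == "__main__":
    hyp = "the cat sat mat".split()
    ref = "the cat sat on the mat".split()
    print(weighted_edit_distance(hyp, ref))  # 2.0: two insertions needed
```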
Confusion Network Based System Combination for Chinese Translation Output: Word-Level or Character-Level?
"... Recently, confusion network based system combination has applied successfully to various machine translation tasks. However, to construct the confusion network when combining the Chinese translation outputs from multiple machine translation systems, it is possible to either take a Chinese word as th ..."
Abstract
- Add to MetaCart
Recently, confusion network based system combination has been applied successfully to various machine translation tasks. However, to construct the confusion network when combining Chinese translation outputs from multiple machine translation systems, it is possible either to take a Chinese word as the atomic unit (word-level) or to take a Chinese character as the atomic unit (character-level). In this paper, we compare the word-level approach with the character-level approach for combining Chinese translation outputs on the NIST'08 EC tasks and the IWSLT'08 EC CRR challenge tasks. Our experimental results reveal that the character-level combination system significantly outperforms the word-level combination system.
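As a small illustration of the word-level versus character-level choice of atomic unit, the sketch below tokenizes two made-up segmented Chinese system outputs both ways; the confusion-network alignment and voting themselves are not shown.

```python
# The atomic-unit choice discussed in the paper: before aligning multiple
# Chinese system outputs into a confusion network, each output can be broken
# into word units or into character units. Only tokenization is shown; the
# alignment and voting are omitted. The example sentences are made up.


def word_units(segmented_output: str) -> list:
    """Word-level units: trust each system's own word segmentation."""
    return segmented_output.split()


def char_units(segmented_output: str) -> list:
    """Character-level units: ignore segmentation, use single characters."""
    return [c for c in segmented_output if not c.isspace()]


if __name__ == "__main__":
    # Two systems segment the same output differently ("这个" vs "这 个").
    sys_a = "我们 讨论 这个 问题"
    sys_b = "我们 讨论 这 个 问题"
    print(word_units(sys_a))   # ['我们', '讨论', '这个', '问题']
    print(word_units(sys_b))   # ['我们', '讨论', '这', '个', '问题']
    # At the character level both outputs reduce to the same unit sequence.
    print(char_units(sys_a) == char_units(sys_b))  # True
```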