Results 1 - 10 of 246
Improving statistical machine translation using word sense disambiguation
- In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
, 2007
"... We show for the first time that incorporating the predictions of a word sense disambigua-tion system within a typical phrase-based statistical machine translation (SMT) model consistently improves translation quality across all three different IWSLT Chinese-English test sets, as well as producing st ..."
Abstract - Cited by 128 (7 self)
We show for the first time that incorporating the predictions of a word sense disambiguation system within a typical phrase-based statistical machine translation (SMT) model consistently improves translation quality across all three different IWSLT Chinese-English test sets, as well as producing statistically significant improvements on the larger NIST Chinese-English MT task, and moreover never hurts performance on any test set, according not only to BLEU but to all eight most commonly used automatic evaluation metrics. Recent work has challenged the assumption that word sense disambiguation (WSD) systems are useful for SMT. Yet SMT translation quality still obviously suffers from inaccurate lexical choice. In this paper, we address this problem by investigating a new strategy for integrating WSD into an SMT system that performs fully phrasal multi-word disambiguation. Instead of directly incorporating a Senseval-style WSD system, we redefine the WSD task to match the exact same phrasal translation disambiguation task faced by phrase-based SMT systems. Our results provide the first known empirical evidence that lexical semantics are indeed useful for SMT, despite claims to the contrary.
Better Hypothesis Testing for Statistical Machine Translation: Controlling for Optimizer Instability.
- In Proceedings of the Association for Computational Linguistics
, 2011
"... Abstract In statistical machine translation, a researcher seeks to determine whether some innovation (e.g., a new feature, model, or inference algorithm) improves translation quality in comparison to a baseline system. To answer this question, he runs an experiment to evaluate the behavior of the t ..."
Abstract - Cited by 124 (15 self)
In statistical machine translation, a researcher seeks to determine whether some innovation (e.g., a new feature, model, or inference algorithm) improves translation quality in comparison to a baseline system. To answer this question, the researcher runs an experiment to evaluate the behavior of the two systems on held-out data. In this paper, we consider how to make such experiments more statistically reliable. We provide a systematic analysis of the effects of optimizer instability, an extraneous variable that is seldom controlled for, on experimental outcomes, and make recommendations for reporting results more accurately.
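The comparison the abstract describes is commonly made with resampling-based significance tests. As a minimal sketch (not the authors' exact protocol, which additionally controls for multiple optimizer runs), here is a paired bootstrap over toy per-sentence scores; the scores and the `paired_bootstrap` name are illustrative, and real corpus-level metrics like BLEU do not decompose into a simple per-sentence sum:

```python
import random

def paired_bootstrap(scores_a, scores_b, n_samples=1000, seed=0):
    """Fraction of bootstrap resamples in which system A beats system B.

    scores_a / scores_b are per-sentence quality scores for the same
    test sentences (toy stand-ins for per-sentence metric contributions).
    """
    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(n_samples):
        # Resample sentence indices with replacement, paired across systems.
        idx = [rng.randrange(n) for _ in range(n)]
        if sum(scores_a[i] for i in idx) > sum(scores_b[i] for i in idx):
            wins += 1
    return wins / n_samples

# Toy example: system A is slightly better on most sentences.
a = [0.42, 0.55, 0.61, 0.48, 0.53, 0.59, 0.44, 0.57]
b = [0.40, 0.50, 0.62, 0.45, 0.51, 0.55, 0.43, 0.52]
print(paired_bootstrap(a, b))
```

A win fraction near 1.0 suggests the improvement is unlikely to be a resampling artifact; the paper's point is that optimizer variance across tuning runs must also be accounted for before drawing that conclusion.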
A survey of statistical machine translation
, 2007
"... Statistical machine translation (SMT) treats the translation of natural language as a machine learning problem. By examining many samples of human-produced translation, SMT algorithms automatically learn how to translate. SMT has made tremendous strides in less than two decades, and many popular tec ..."
Abstract - Cited by 93 (6 self)
Statistical machine translation (SMT) treats the translation of natural language as a machine learning problem. By examining many samples of human-produced translation, SMT algorithms automatically learn how to translate. SMT has made tremendous strides in less than two decades, and many popular techniques have only emerged within the last few years. This survey presents a tutorial overview of state-of-the-art SMT at the beginning of 2007. We begin with the context of the current research, and then move to a formal problem description and an overview of the four main subproblems: translational equivalence modeling, mathematical modeling, parameter estimation, and decoding. Along the way, we present a taxonomy of some different approaches within these areas. We conclude with an overview of evaluation and notes on future directions.
Meteor 1.3: Automatic Metric for Reliable Optimization and Evaluation of Machine Translation Systems
"... This paper describes Meteor 1.3, our submission ..."
A Study of Translation Error Rate with Targeted Human Annotation
- In Proceedings of the Association for Machine Translation in the Americas (AMTA 2006)
, 2006
"... We define a new, intuitive measure for evaluating machine translation output that avoids the knowledge intensiveness of more meaning-based approaches, and the labor-intensiveness of human judgments. Translation Error Rate (TER) measures the amount of editing that a human would have to perform to cha ..."
Abstract - Cited by 53 (4 self)
We define a new, intuitive measure for evaluating machine translation output that avoids the knowledge intensiveness of more meaning-based approaches, and the labor-intensiveness of human judgments. Translation Error Rate (TER) measures the amount of editing that a human would have to perform to change a system output so it exactly matches a reference translation. We also compute a human-targeted TER (or HTER), where the minimum TER of the translation is computed against a human ‘targeted reference’ that preserves the meaning (provided by the reference translations) and is fluent, but is chosen to minimize the TER score for a particular system output. We show that: (1) The single-reference variant of TER correlates as well with human judgments of MT quality as the four-reference variant of BLEU; (2) The human-targeted HTER yields a 33% error-rate reduction and is shown to be very well correlated with human judgments; (3) The four-reference variant of TER and the single-reference variant of HTER yield higher correlations with human judgments than BLEU; (4) HTER yields higher correlations with human judgments than METEOR or its human-targeted variant (HMETEOR); and (5) The four-reference variant of TER correlates as well with a single human judgment as a second human judgment does, while HTER, HBLEU, and HMETEOR correlate significantly better with a human judgment than a second human judgment does.
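The core of TER is an edit distance over words, normalized by reference length. The sketch below is a deliberate simplification: real TER also allows block shifts of word sequences at edit cost 1, which plain edit distance omits, and the `simple_ter` name is illustrative:

```python
def simple_ter(hyp, ref):
    """Word-level edit distance divided by reference length.

    A simplified stand-in for TER: true TER additionally permits
    shifting a contiguous block of words at cost 1.
    """
    h, r = hyp.split(), ref.split()
    # Classic dynamic-programming edit distance over word sequences.
    d = [[0] * (len(r) + 1) for _ in range(len(h) + 1)]
    for i in range(len(h) + 1):
        d[i][0] = i
    for j in range(len(r) + 1):
        d[0][j] = j
    for i in range(1, len(h) + 1):
        for j in range(1, len(r) + 1):
            cost = 0 if h[i - 1] == r[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution/match
    return d[len(h)][len(r)] / len(r)

# One missing word against a six-word reference: 1 edit / 6 words.
print(simple_ter("the cat sat on mat", "the cat sat on the mat"))
```

Because shifts are missing, this sketch over-penalizes reorderings that TER proper would count as a single edit.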
Diagnosing meaning errors in short answers to reading comprehension questions
- Proceedings of the 3rd Workshop on Innovative Use of NLP for Building Educational Applications, held at ACL 2008. Columbus, Ohio: Association for Computational Linguistics
, 2008
"... A common focus of systems in Intelligent Computer-Assisted Language Learning (ICALL) is to provide immediate feedback to language learners working on exercises. Most of this research has focused on providing feedback on the form of the learner input. Foreign language practice and second language acq ..."
Abstract - Cited by 27 (22 self)
A common focus of systems in Intelligent Computer-Assisted Language Learning (ICALL) is to provide immediate feedback to language learners working on exercises. Most of this research has focused on providing feedback on the form of the learner input. Foreign language practice and second language acquisition research, on the other hand, emphasize the importance of exercises that require the learner to manipulate meaning. The ability of an ICALL system to diagnose and provide feedback on the meaning conveyed by a learner response depends on how well it can deal with the response variation allowed by an activity. We focus on short-answer reading comprehension questions, which have a clearly defined target response, but the learner may convey the meaning of the target in multiple ways. As the empirical basis of our work, we collected an English as a Second Language (ESL) learner corpus of short-answer reading comprehension questions, for which two graders provided target answers and correctness judgments. On this basis, we developed a Content-Assessment Module (CAM), which performs shallow semantic analysis to diagnose meaning errors. It reaches an accuracy of 88% for semantic error detection and 87% for semantic error diagnosis on a held-out test data set.
Incremental hypothesis alignment for building confusion networks with application to machine translation system combination
- In Proceedings of the Third Workshop on Statistical Machine Translation
, 2008
"... Confusion network decoding has been the most successful approach in combining outputs from multiple machine translation (MT) systems in the recent DARPA GALE and NIST Open MT evaluations. Due to the varying word order between outputs from different MT systems, the hypothesis alignment presents the b ..."
Abstract - Cited by 26 (1 self)
Confusion network decoding has been the most successful approach in combining outputs from multiple machine translation (MT) systems in the recent DARPA GALE and NIST Open MT evaluations. Due to the varying word order between outputs from different MT systems, the hypothesis alignment presents the biggest challenge in confusion network decoding. This paper describes an incremental alignment method to build confusion networks based on the translation edit rate (TER) algorithm. This new algorithm yields significant BLEU score improvements over other recent alignment methods on the GALE test sets and was used in BBN’s submission to the WMT08 shared translation task.
The RWTH statistical machine translation system for the IWSLT 2006 evaluation
- In Proc. Int. Workshop Spoken Language Translation, 2006
"... We give an overview of the RWTH phrase-based statistical machine translation system that was used in the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT) 2006. The system was ranked first with respect to the BLEU measure in all language pairs it was used Using ..."
Abstract - Cited by 24 (15 self)
We give an overview of the RWTH phrase-based statistical machine translation system that was used in the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT) 2006. The system was ranked first with respect to the BLEU measure in all language pairs in which it was used. Using a two-pass approach, we first generate the N-best translation candidates. The second pass consists of rescoring and reranking these candidates. We will give a description of the search algorithm as well as of the models used in each pass. We will also describe our method for dealing with punctuation restoration, in order to overcome the difficulties of spoken language translation. This work also includes a brief description of the system combination done by the partners participating in the European TC-Star project.
Dependency-Based Automatic Evaluation for Machine Translation
- In Proceedings of SSST, NAACL-HLT/AMTA Workshop on Syntax and Structure in Statistical Translation
, 2007
"... We present a novel method for evaluating the output of Machine Translation (MT), based on comparing the dependency structures of the translation and reference rather than their surface string forms. Our method uses a treebank-based, widecoverage, probabilistic Lexical-Functional Grammar (LFG) parser ..."
Abstract - Cited by 23 (1 self)
We present a novel method for evaluating the output of Machine Translation (MT), based on comparing the dependency structures of the translation and reference rather than their surface string forms. Our method uses a treebank-based, wide-coverage, probabilistic Lexical-Functional Grammar (LFG) parser to produce a set of structural dependencies for each translation-reference sentence pair, and then calculates the precision and recall for these dependencies. Our dependency-based evaluation, in contrast to most popular string-based evaluation metrics, will not unfairly penalize perfectly valid syntactic variations in the translation. In addition to allowing for legitimate syntactic differences, we use paraphrases in the evaluation process to account for lexical variation. In comparison with other metrics on 16,800 sentences of Chinese-English newswire text, our method reaches high correlation with human scores. An experiment with two translations of 4,000 sentences from Spanish-English Europarl shows that, in contrast to most other metrics, our method does not display a high bias towards statistical models of translation.
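The precision/recall computation over dependencies described above can be sketched as set overlap on (head, relation, dependent) triples. This is a toy illustration: the triple format, relation labels, and `dependency_f1` name are assumptions for the sketch, not the LFG parser's actual output representation:

```python
def dependency_f1(hyp_deps, ref_deps):
    """Precision, recall, and F1 over dependency triples.

    Each triple is (head, relation, dependent); labels here are
    illustrative, not a specific parser's inventory.
    """
    hyp, ref = set(hyp_deps), set(ref_deps)
    overlap = len(hyp & ref)
    p = overlap / len(hyp) if hyp else 0.0
    r = overlap / len(ref) if ref else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# A legitimate reordering of the same clause yields the same
# dependencies, so it is not penalized the way a string match would.
ref = {("saw", "subj", "John"), ("saw", "obj", "Mary"), ("saw", "adj", "yesterday")}
hyp = {("saw", "subj", "John"), ("saw", "obj", "Mary")}
print(dependency_f1(hyp, ref))
```

Here the hypothesis recovers two of the three reference dependencies, so precision is perfect while recall reflects the missing adjunct.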
Training a Multilingual Sportscaster: Using Perceptual Context to Learn Language
- Journal of Artificial Intelligence Research
, 2010
"... We present a novel framework for learning to interpret and generate language using only perceptual context as supervision. We demonstrate its capabilities by developing a system that learns to sportscast simulated robot soccer games in both English and Korean without any language-specific prior know ..."
Abstract - Cited by 23 (4 self)
We present a novel framework for learning to interpret and generate language using only perceptual context as supervision. We demonstrate its capabilities by developing a system that learns to sportscast simulated robot soccer games in both English and Korean without any language-specific prior knowledge. Training employs only ambiguous supervision consisting of a stream of descriptive textual comments and a sequence of events extracted from the simulation trace. The system simultaneously establishes correspondences between individual comments and the events that they describe while building a translation model that supports both parsing and generation. We also present a novel algorithm for learning which events are worth describing. Human evaluations of the generated commentaries indicate they are of reasonable quality and in some cases even on par with those produced by humans for our limited domain.