Findings of the 2009 Workshop on Statistical Machine Translation (Chris Callison-Burch)
Abstract
This paper presents the results of the WMT09 shared tasks, which included a translation task, a system combination task, and an evaluation task. We conducted a large-scale manual evaluation of 87 machine translation systems and 22 system combination entries. We used the ranking of these systems to measure how strongly automatic metrics correlate with human judgments of translation quality, for more than 20 metrics. We present a new evaluation technique whereby system output is edited and judged for correctness.
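One common way to quantify how well a metric's system ranking agrees with a human ranking is a rank correlation coefficient. The abstract does not say which coefficient was used, so the sketch below assumes Spearman's rho; the function names and the example scores are hypothetical and are not taken from the paper.

```python
def ranks(values):
    """Return 1-based ranks (highest value gets rank 1), averaging ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(xs, ys):
    """Spearman's rho, computed as the Pearson correlation of the ranks.

    Raises ZeroDivisionError if either list is constant.
    """
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical per-system scores: human judgment vs. an automatic metric.
human_scores = [0.7, 0.5, 0.6, 0.2]
metric_scores = [30.0, 25.0, 22.0, 10.0]
rho = spearman_rho(human_scores, metric_scores)
```

A value near 1 means the metric orders the systems almost as the human judges do; a value near -1 means it orders them in reverse.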
HYPOTHESIS RANKING AND TWO-PASS APPROACHES FOR MACHINE TRANSLATION SYSTEM COMBINATION
Abstract
Given a number of machine translations of a source segment, the goal of system combination is to produce a new translation that has better quality than all of them. This paper describes a number of improvements that were recently added to the JHU system combination scheme: (i) A hypothesis ranking technique which orders the system outputs, on a per-segment basis, according to predicted translation quality, thus improving a subsequent incremental combination step. (ii) A two-pass combination procedure, which first produces several combination outputs with the given translations, and then performs one more combination step with these new outputs. Results from the NIST MT09 informal system combination evaluation on Arabic-to-English and Urdu-to-English show that both approaches offer significant BLEU and TER gains over a baseline JHU combination scheme.
Combining Unsupervised and Supervised Alignments for MT: An Empirical Study
Abstract
Word alignment plays a central role in statistical MT (SMT) since almost all SMT systems extract translation rules from word-aligned parallel training data. While most SMT systems use unsupervised algorithms (e.g. GIZA++) for training word alignment, supervised methods, which exploit a small amount of human-aligned data, have recently become increasingly popular. This work empirically studies the performance of these two classes of alignment algorithms and explores strategies to combine them to improve overall system performance. We used two unsupervised aligners, GIZA++ and HMM, and one supervised aligner, ITG, in this study. To avoid language- and genre-specific conclusions, we ran experiments on test sets consisting of two language pairs (Chinese-to-English and Arabic-to-English) and two genres (newswire and weblog). Results show that the two classes of algorithms achieve the same level of MT performance. Modest improvements were achieved by taking the union of the translation grammars extracted from different alignments. Significant improvements (around 1.0 in BLEU) were achieved by combining outputs of different systems trained with different alignments. The improvements are consistent across languages and genres.
in TERp: Stem Matches, Synonym Matches and Phrase Substitutions (Paraphrases).
Abstract
TER-Plus (TERp) is an extended TER evaluation metric incorporating morphology, synonymy and paraphrases. There are three new edit operations in TERp: Stem Matches, Synonym Matches and Phrase Substitutions (Paraphrases).
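To illustrate how stem and synonym matches can soften a word-level edit distance, here is a minimal sketch. It is not the TERp implementation: it leaves out the shift operation and phrase substitutions, and the stemmer (`toy_stem`), the synonym table (`SYNONYMS`) and every cost in `COST` are hypothetical placeholders chosen for the example.

```python
def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strip a few English suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


# Hypothetical synonym pairs; a real system would use a lexical resource.
SYNONYMS = {frozenset(("car", "automobile"))}

# Hypothetical edit costs, chosen only for illustration.
COST = {"exact": 0.0, "stem": 0.2, "synonym": 0.3,
        "sub": 1.0, "ins": 1.0, "del": 1.0}


def match_cost(hyp_word, ref_word):
    """Cost of aligning one hypothesis word with one reference word."""
    if hyp_word == ref_word:
        return COST["exact"]
    if toy_stem(hyp_word) == toy_stem(ref_word):
        return COST["stem"]
    if frozenset((hyp_word, ref_word)) in SYNONYMS:
        return COST["synonym"]
    return COST["sub"]


def edit_cost(hypothesis, reference):
    """Minimum total edit cost between two whitespace-tokenized strings."""
    h, r = hypothesis.split(), reference.split()
    dp = [[0.0] * (len(r) + 1) for _ in range(len(h) + 1)]
    for i in range(1, len(h) + 1):
        dp[i][0] = dp[i - 1][0] + COST["del"]
    for j in range(1, len(r) + 1):
        dp[0][j] = dp[0][j - 1] + COST["ins"]
    for i in range(1, len(h) + 1):
        for j in range(1, len(r) + 1):
            dp[i][j] = min(
                dp[i - 1][j] + COST["del"],      # drop a hypothesis word
                dp[i][j - 1] + COST["ins"],      # add a reference word
                dp[i - 1][j - 1] + match_cost(h[i - 1], r[j - 1]),
            )
    return dp[len(h)][len(r)]


def normalized_score(hypothesis, reference):
    """Edit cost divided by reference length (lower is better)."""
    return edit_cost(hypothesis, reference) / len(reference.split())


score = normalized_score("the automobile turned", "the car turns")
```

With plain substitutions the example pair would cost two full edits; with the softer matches it costs one synonym match plus one stem match.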
Expected BLEU Training for Graphs: BBN System Description for WMT11 System Combination Task
Abstract
BBN submitted system combination outputs for Czech-English, German-English, Spanish-English, and French-English language pairs. All combinations were based on confusion network decoding. The confusion networks were built using an incremental hypothesis alignment algorithm with flexible matching. A novel bi-gram count feature, which can penalize bi-grams not present in the input hypotheses corresponding to a source sentence, was introduced in addition to the usual decoder features. The system combination weights were tuned using a graph-based expected BLEU as the objective function while incrementally expanding the networks to bi-gram and 5-gram contexts. The expected BLEU tuning described in this paper naturally generalizes to hypergraphs and can be used to optimize thousands of weights. The combination gained about 0.5-4.0 BLEU points over the best individual systems on the official WMT11 language pairs. A 39-system multisource combination achieved an 11.1 BLEU point gain.
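The abstract does not define the bi-gram count feature precisely. One plausible reading, sketched below under that assumption, is to count the bi-grams in a combination output that appear in none of the input hypotheses for the same source sentence, so that a decoder can penalize them. The function names and example sentences are hypothetical.

```python
def bigrams(tokens):
    """Return the list of adjacent token pairs in a token sequence."""
    return list(zip(tokens, tokens[1:]))


def unseen_bigram_count(candidate, hypotheses):
    """Count bi-grams in `candidate` that occur in none of `hypotheses`.

    All strings are whitespace-tokenized; a higher count means the
    candidate strays further from what any input system produced.
    """
    seen = set()
    for hypothesis in hypotheses:
        seen.update(bigrams(hypothesis.split()))
    return sum(1 for pair in bigrams(candidate.split()) if pair not in seen)


# Hypothetical system outputs for one source sentence.
system_outputs = ["the cat sat down", "a cat sat"]
supported = unseen_bigram_count("the cat sat", system_outputs)
unsupported = unseen_bigram_count("a cat down", system_outputs)
```

Confusion network decoding can stitch together word sequences that no single system produced, which is why a feature of this kind is useful as a check on the combined output.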
MANY improvements for WMT’11
Abstract
This paper describes the development work carried out on MANY for the 2011 WMT system combination evaluation campaign. Hypotheses from French/English and English/French MT systems were combined with a new version of MANY, an open source system combination software based on confusion network decoding, currently developed at LIUM. MANY has been updated to optimize decoder parameters with MERT, which proved to find better weights. The system combination yielded significant improvements in BLEU score when applied to system combination data from two languages.

