Results 1 - 10 of 124
Batch tuning strategies for statistical machine translation
- In HLT-NAACL, 2012
Cited by 62 (10 self)
There has been a proliferation of recent work on SMT tuning algorithms capable of handling larger feature sets than the traditional MERT approach. We analyze a number of these algorithms in terms of their sentence-level loss functions, which motivates several new approaches, including a Structured SVM. We perform empirical comparisons of eight different tuning strategies, including MERT, in a variety of settings. Among other results, we find that a simple and efficient batch version of MIRA performs at least as well as training online, and consistently outperforms other options.
WIT3: Web inventory of transcribed and translated talks.
- In Proc. EAMT, 2012
Cited by 52 (3 self)
We describe here a Web inventory named WIT3 that offers access to a collection of transcribed and translated talks. The core of WIT3 is the TED Talks corpus, which essentially redistributes the original content published by the TED Conference website (http://www.ted.com). Since 2007, the TED Conference, based in California, has been posting all video recordings of its talks together with subtitles in English and their translations in more than 80 languages. Aside from its cultural and social relevance, this content, which is published under the Creative Commons BY-NC-ND license, also represents a precious language resource for the machine translation research community, thanks to its size, variety of topics, and covered languages. This effort repurposes the original content in a way that is more convenient for machine translation researchers.
Structured Ramp Loss Minimization for Machine Translation
Cited by 37 (4 self)
This paper seeks to close the gap between training algorithms used in statistical machine translation and machine learning, specifically the framework of empirical risk minimization. We review well-known algorithms, arguing that they do not optimize the loss functions they are assumed to optimize when applied to machine translation. Instead, most have implicit connections to particular forms of ramp loss. We propose to minimize ramp loss directly and present a training algorithm that is easy to implement and that performs comparably to others. Most notably, our structured ramp loss minimization algorithm, RAMPION, is less sensitive to initialization and random seeds than standard approaches.
A class-based agreement model for generating accurately inflected translations
- In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL ’12, 2012
Cited by 16 (2 self)
When automatically translating from a weakly inflected source language like English to a target language with richer grammatical features such as gender and dual number, the output commonly contains morpho-syntactic agreement errors. To address this issue, we present a target-side, class-based agreement model. Agreement is promoted by scoring a sequence of fine-grained morpho-syntactic classes that are predicted during decoding for each translation hypothesis. For English-to-Arabic translation, our model yields a +1.04 BLEU average improvement over a state-of-the-art baseline. The model does not require bitext or phrase table annotations and can be easily implemented as a feature in many phrase-based decoders.
Unsupervised Word Alignment with Arbitrary Features
Cited by 16 (2 self)
We introduce a discriminatively trained, globally normalized, log-linear variant of the lexical translation models proposed by Brown et al. (1993). In our model, arbitrary, non-independent features may be freely incorporated, thereby overcoming the inherent limitation of generative models, which require that features be sensitive to the conditional independencies of the generative process. However, unlike previous work on discriminative modeling of word alignment (which also permits the use of arbitrary features), the parameters in our models are learned from unannotated parallel sentences, rather than from supervised word alignments. Using a variety of intrinsic and extrinsic measures, including translation performance, we show our model yields better alignments than generative baselines in a number of language pairs.
Mixing multiple translation models in statistical machine translation
- In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Jeju, Republic of Korea, 2012
Cited by 15 (9 self)
Statistical machine translation is often faced with the problem of combining training data from many diverse sources into a single translation model which then has to translate sentences in a new domain. We propose a novel approach, ensemble decoding, which combines a number of translation systems dynamically at the decoding step. In this paper, we evaluate performance on a domain adaptation setting where we translate sentences from the medical domain. Our experimental results show that ensemble decoding outperforms various strong baselines including mixture models, the current state-of-the-art for domain adaptation in machine translation.
Down-stream effects of tree-to-dependency conversions
Cited by 12 (3 self)
Dependency analysis relies on morphosyntactic evidence, as well as semantic evidence. In some cases, however, morphosyntactic evidence seems to be in conflict with semantic evidence. For this reason, dependency grammar theories, annotation guidelines and tree-to-dependency conversion schemes often differ in how they analyze various syntactic constructions. Most experiments for which constituent-based treebanks such as the Penn Treebank are converted into dependency treebanks rely blindly on one of four or five widely used tree-to-dependency conversion schemes. This paper evaluates the down-stream effect of the choice of conversion scheme, showing that it has a dramatic impact on end results.
Constructing parallel corpora for six Indian languages via crowdsourcing
- In Proceedings of the Seventh Workshop on Statistical Machine Translation, 2012
Cited by 12 (2 self)
Recent work has established the efficacy of Amazon’s Mechanical Turk for constructing parallel corpora for machine translation research. We apply this to building a collection of parallel corpora between English and six languages from the Indian subcontinent: Bengali, Hindi, Malayalam, Tamil, Telugu, and Urdu. These languages are low-resource, under-studied, and exhibit linguistic phenomena that are difficult for machine translation. We conduct a variety of baseline experiments and analysis, and release the data to the community.
Translating into Morphologically Rich Languages with Synthetic Phrases
Cited by 12 (3 self)
Translation into morphologically rich languages is an important but recalcitrant problem in MT. We present a simple and effective approach that deals with the problem in two phases. First, a discriminative model is learned to predict inflections of target words from rich source-side annotations. Then, this model is used to create additional sentence-specific word- and phrase-level translations that are added to a standard translation model as “synthetic” phrases. Our approach relies on morphological analysis of the target language, but we show that an unsupervised Bayesian model of morphology can successfully be used in place of a supervised analyzer. We report significant improvements in translation quality when translating from English to Russian, Hebrew and Swahili.
Analysing the Effect of Out-of-Domain Data on SMT Systems
Cited by 11 (2 self)
In statistical machine translation (SMT), it is known that performance declines when the training data is in a different domain from the test data. Nevertheless, it is frequently necessary to supplement scarce in-domain training data with out-of-domain data. In this paper, we first try to relate the effect of the out-of-domain data on translation performance to measures of corpus similarity, then we separately analyse the effect of adding the out-of-domain data at different parts of the training pipeline (alignment, phrase extraction, and phrase scoring). Through experiments in 2 domains and 8 language pairs, it is shown that the out-of-domain data improves coverage and translation of rare words, but may degrade the translation quality for more common words.