Results 11 - 20
of
39
Combining Multi-Domain Statistical Machine Translation Models using Automatic Classifiers
"... This paper presents a set of experiments on Domain Adaptation of Statistical Machine Translation systems. The experiments focus on Chinese-English and two domain-specific corpora. The paper presents a novel approach for combining multiple domain-trained translation models to achieve improved transla ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
This paper presents a set of experiments on Domain Adaptation of Statistical Machine Translation systems. The experiments focus on Chinese-English and two domain-specific corpora. The paper presents a novel approach for combining multiple domain-trained translation models to achieve improved translation quality for both domain-specific as well as combined sets of sentences. We train a statistical classifier to classify sentences according to the appropriate domain and utilize the corresponding domain-specific MT models to translate them. Experimental results show that the method achieves a statistically sig nificant absolute improvement of 1.58 BLEU (2.86% relative improvement) score over a translation model trained on combined data, and considerable improvements over a model using multiple decoding paths of the Moses decoder, for the combined domain test set. Furthermore, even for domain-specific test sets, our approach works almost as well as dedicated domain-specific models and perfect classification.
Stream-based Translation Models for Statistical Machine Translation
"... Typical statistical machine translation systems are trained with static parallel corpora. Here we account for scenarios with a continuous incoming stream of parallel training data. Such scenarios include daily governmental proceedings, sustained output from translation agencies, or crowd-sourced tra ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
Typical statistical machine translation systems are trained with static parallel corpora. Here we account for scenarios with a continuous incoming stream of parallel training data. Such scenarios include daily governmental proceedings, sustained output from translation agencies, or crowd-sourced translations. We show incorporating recent sentence pairs from the stream improves performance compared with a static baseline. Since frequent batch retraining is computationally demanding we introduce a fast incremental alternative using an online version of the EM algorithm. To bound our memory requirements we use a novel data-structure and associated training regime. When compared to frequent batch retraining, our online time and space-bounded model achieves the same performance with significantly less computational overhead. 1
Translation Model Adaptation by Resampling
"... The translation model of statistical machine translation systems is trained on parallel data coming from various sources and domains. These corpora are usually concatenated, word alignments are calculated and phrases are extracted. This means that the corpora are not weighted according to their impo ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
The translation model of statistical machine translation systems is trained on parallel data coming from various sources and domains. These corpora are usually concatenated, word alignments are calculated and phrases are extracted. This means that the corpora are not weighted according to their importance to the domain of the translation task. This is in contrast to the training of the language model for which well known techniques are used to weight the various sources of texts. On a smaller granularity, the automatic calculated word alignments differ in quality. This is usually not considered when extracting phrases either. In this paper we propose a method to automatically weight the different corpora and alignments. This is achieved with a resampling technique. We report experimental results for a small (IWSLT) and large (NIST) Arabic/English translation tasks. In both cases, significant improvements in the BLEU score were observed. 1
The University of Edinburgh System Description for IWSLT 2007
"... We present the University of Edinburgh’s submission for the IWSLT 2007 shared task. Our efforts focused on adapting our statistical machine translation system to the open data conditions for the Italian-English task of the evaluation campaign. We examine the challenges of building a system with a li ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
We present the University of Edinburgh’s submission for the IWSLT 2007 shared task. Our efforts focused on adapting our statistical machine translation system to the open data conditions for the Italian-English task of the evaluation campaign. We examine the challenges of building a system with a limited set of in-domain development data (SITAL), a small training corpus in a related but distinct domain (BTEC), and a large out of domain corpus (Europarl). We concentrated on the corrected text track, and present additional results of our experiments using the open-source Moses MT system with speech input. 1.
Automatic Translation of Court Judgments
"... This document presents an experiment in the automatic translation of Canadian Court judgments from English to French and from French to English. We show that although the language used in this type of legal text is complex and specialized, an SMT system can produce intelligible and useful translatio ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
This document presents an experiment in the automatic translation of Canadian Court judgments from English to French and from French to English. We show that although the language used in this type of legal text is complex and specialized, an SMT system can produce intelligible and useful translations, provided that the system can be trained on a vast amount of legal text. We also describe the results of a human evaluation of the output of the system. 1 Context of the work NLP Technologies 1 is an innovative enterprise devoted to the use of advanced information technologies in the judicial domain. Its main focus is the DecisionExpress ™ automatic summarization technology of legal information (Farzindar et al., 2004, Chieze et al. 2008). During the last year, a feasibility study was performed in collaboration with researchers from the RALI 2 at Université de Montréal to determine to what extent judgments from the Canadian Federal Courts could be automatically translated. As it happens, about 50 new judgments are produced weekly; 80 % of which are originally written in English, and 20%, in French. By law, the Federal Courts have to provide a trans-
Domain Adaptation for Machine Translation by Mining Unseen Words
"... We show that unseen words account for a large part of the translation error when moving to new domains. Using an extension of a recent approach to mining translations from comparable corpora (Haghighi et al., 2008), we are able to find translations for otherwise OOV terms. We show several approaches ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
We show that unseen words account for a large part of the translation error when moving to new domains. Using an extension of a recent approach to mining translations from comparable corpora (Haghighi et al., 2008), we are able to find translations for otherwise OOV terms. We show several approaches to integrating such translations into a phrasebased translation system, yielding consistent improvements in translations quality (between 0.5 and 1.5 Bleu points) on four domains and two language pairs. 1
Lightly-Supervised Training for Hierarchical Phrase-Based Machine Translation
"... In this paper we apply lightly-supervised training to a hierarchical phrase-based statistical machine translation system. We employ bitexts that have been built by automatically translating large amounts of monolingual data as additional parallel training corpora. We explore different ways of using ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
In this paper we apply lightly-supervised training to a hierarchical phrase-based statistical machine translation system. We employ bitexts that have been built by automatically translating large amounts of monolingual data as additional parallel training corpora. We explore different ways of using this additional data to improve our system. Our results show that integrating a second translation model with only non-hierarchical phrases extracted from the automatically generated bitexts is a reasonable approach. The translation performance matches the result we achieve with a joint extraction on all training bitexts while the system is kept smaller due to a considerably lower overall number of phrases. 1
Adaptation of Statistical Machine Translation Model for Cross-Lingual Information Retrieval in a Service Context
"... This work proposes to adapt an existing general SMT model for the task of translating queries that are subsequently going to be used to retrieve information from a target language collection. In the scenario that we focus on access to the document collection itself is not available and changes to th ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
This work proposes to adapt an existing general SMT model for the task of translating queries that are subsequently going to be used to retrieve information from a target language collection. In the scenario that we focus on access to the document collection itself is not available and changes to the IR model are not possible. We propose two ways to achieve the adaptation effect and both of them are aimed at tuning parameter weights on a set of parallel queries. The first approach is via a standard tuning procedure optimizing for BLEU score and the second one is via a reranking approach optimizing for MAP score. We also extend the second approach by using syntax-based features. Our experiments show improvements of 1-2.5 in terms of MAP score over the retrieval with the non-adapted translation. We show that these improvements are due both to the integration of the adaptation and syntax-features for the query translation task. 1
Perplexity Minimization for Translation Model Domain Adaptation in Statistical Machine Translation
"... We investigate the problem of domain adaptation for parallel data in Statistical Machine Translation (SMT). While techniques for domain adaptation of monolingual data can be borrowed for parallel data, we explore conceptual differences between translation model and language model domain adaptation a ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
We investigate the problem of domain adaptation for parallel data in Statistical Machine Translation (SMT). While techniques for domain adaptation of monolingual data can be borrowed for parallel data, we explore conceptual differences between translation model and language model domain adaptation and their effect on performance, such as the fact that translation models typically consist of several features that have different characteristics and can be optimized separately. We also explore adapting multiple (4–10) data sets with no a priori distinction between in-domain and out-of-domain data except for an in-domain development set. 1
Experiments with Swedish-English Statistical Machine Translation
"... We have conducted initial experiments with statistical machine translation between English and Swedish based on the Moses toolkit and the Europarl corpus. The main aim was to decrease processing times without harming translation quality by changing the settings in Moses and for the parameter tuning. ..."
Abstract
- Add to MetaCart
We have conducted initial experiments with statistical machine translation between English and Swedish based on the Moses toolkit and the Europarl corpus. The main aim was to decrease processing times without harming translation quality by changing the settings in Moses and for the parameter tuning. The experiments show that translation and tuning times can be cut around 10 times without harming translation quality, and in some cases the quality is even increased.

