Experiments in domain adaptation for statistical machine translation
Prague, Czech Republic. Association for Computational Linguistics, 2007
Cited by 43 (2 self)
The special challenge of the WMT 2007 shared task was domain adaptation. We took this opportunity to experiment with various ways of adapting a statistical machine translation system to a special domain (here: news commentary), when most of the training data is from a different domain (here: European Parliament speeches). This paper also gives a description of the submission of the University of Edinburgh to the shared task. 1 Our framework: the Moses MT system The open source Moses (Koehn et al., 2007) MT system was originally developed at the University
Enriching Morphologically Poor Languages for Statistical Machine Translation
2008
Cited by 14 (0 self)
We address the problem of translating from morphologically poor to morphologically rich languages by adding per-word linguistic information to the source language. We use the syntax of the source sentence to extract information for noun cases and verb persons and annotate the corresponding words accordingly. In experiments, we show improved performance for translating from English into Greek and Czech. For English–Greek, we reduce the verb conjugation error from 19% to 5.4% and the noun case agreement error from 9% to 6%.
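The error rates quoted above can also be read as relative reductions. The following is a minimal sketch of that arithmetic. The function name `relative_reduction` is an illustrative assumption and does not come from the paper. Only the four percentages are taken from the abstract.

```python
def relative_reduction(before: float, after: float) -> float:
    """Return the fraction of the original error rate that was eliminated."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# Error rates (in percent) quoted in the abstract for English-Greek.
verb_conjugation = relative_reduction(19.0, 5.4)
noun_case = relative_reduction(9.0, 6.0)

print(f"verb conjugation: {verb_conjugation:.1%} relative reduction")
print(f"noun case agreement: {noun_case:.1%} relative reduction")
```

Under this reading, the verb conjugation error drops by roughly 72% relative to its starting value, and the noun case agreement error by roughly a third.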
Quadratic-Time Dependency Parsing for Machine Translation
Cited by 3 (1 self)
Efficiency is a prime concern in syntactic MT decoding, yet significant developments in statistical parsing with respect to asymptotic efficiency haven’t yet been explored in MT. Recently, McDonald et al. (2005b) formalized dependency parsing as a maximum spanning tree (MST) problem, which can be solved in quadratic time relative to the length of the sentence. They show that MST parsing is almost as accurate as cubic-time dependency parsing in the case of English, and that it is more accurate with free word order languages. This paper applies MST parsing to MT, and describes how it can be integrated into a phrase-based decoder to compute dependency language model scores. Our results show that augmenting a state-of-the-art phrase-based system with this dependency language model leads to significant improvements in TER (0.92%) and BLEU (0.45%) scores on five NIST Chinese-English evaluation test sets.
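The abstract does not say how the dependency language model is parameterized. As a rough illustration only, the sketch below assumes a first-order model that scores each word given its head word and sums the log-probabilities over all arcs of a tree. The probability table, the `UNSEEN_PROB` floor, and the function name are hypothetical and are not taken from the paper.

```python
import math

# Hypothetical toy probabilities P(modifier | head); not taken from the paper.
DEP_PROBS = {
    ("<root>", "saw"): 0.5,
    ("saw", "john"): 0.2,
    ("saw", "mary"): 0.1,
}
UNSEEN_PROB = 1e-6  # assumed floor for unseen head-modifier pairs


def dependency_lm_logprob(words, heads):
    """Sum log P(word | head word) over all arcs of a dependency tree.

    heads[i] is the index of the head of words[i], or -1 for the root.
    """
    total = 0.0
    for word, head in zip(words, heads):
        head_word = "<root>" if head < 0 else words[head]
        total += math.log(DEP_PROBS.get((head_word, word), UNSEEN_PROB))
    return total


# "john saw mary": "saw" is the root, and both nouns attach to it.
score = dependency_lm_logprob(["john", "saw", "mary"], [1, -1, 1])
print(score)
```

In a decoder, a score of this kind would be one feature among several. How the tree for each partial hypothesis is obtained (the MST parsing step) is outside this sketch.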
CCG Contextual Labels in Hierarchical Phrase-Based SMT
Cited by 3 (2 self)
In this paper, we present a method to employ target-side syntactic contextual information in a Hierarchical Phrase-Based system. Our method uses Combinatory Categorial Grammar (CCG) to annotate training data with labels that represent the left and right syntactic context of target-side phrases. These labels are then used to assign labels to nonterminals in hierarchical rules. CCG-based contextual labels help to produce more grammatical translations by forcing phrases which replace nonterminals during translations to comply with the contextual constraints imposed by the labels. We present experiments which examine the performance of CCG contextual labels on Chinese–English and Arabic–English translation in the news and speech expressions domains using different data sizes and CCG-labeling settings. Our experiments show that our CCG contextual labels-based system achieved a 2.42% relative BLEU improvement over a Phrase-Based baseline on Arabic–English translation and a 1% relative BLEU improvement over a Hierarchical Phrase-Based system baseline on Chinese–English translation.
The University of Edinburgh System Description for IWSLT 2007
Cited by 1 (0 self)
We present the University of Edinburgh’s submission for the IWSLT 2007 shared task. Our efforts focused on adapting our statistical machine translation system to the open data conditions for the Italian-English task of the evaluation campaign. We examine the challenges of building a system with a limited set of in-domain development data (SITAL), a small training corpus in a related but distinct domain (BTEC), and a large out-of-domain corpus (Europarl). We concentrated on the corrected text track, and present additional results of our experiments using the open-source Moses MT system with speech input.
Addressing SMT Data Sparseness when Translating into Morphologically-Rich Languages
The phrase-based translation approach has overcome several drawbacks of word-based translation methods and has proved to significantly improve the quality of translated output. However, it shows less improvement when translating between languages with very different syntax and morphology, especially when the translation direction is from a language with limited word order and morphological variations to a highly inflected language. We describe an experiment that uses morpho-syntactic descriptions to translate and generate morphological information in factored machine translation. We show that, from English to a morphologically rich language, this setting performs better than the baseline phrase-based system when only a small parallel corpus is available. We also show that it scales well to a large parallel corpus when an additional target-side monolingual corpus is available.
Statistical Alignment Models for . . .
2007
The ever-increasing amount of parallel data opens a rich resource to multilingual natural language processing, enabling models to work on various translational aspects like detailed human annotations, syntax and semantics. With efficient statistical models, many cross-language applications have seen significant progress in recent years, such as statistical machine translation, speech-to-speech translation, cross-lingual information retrieval and bilingual lexicography. However, the current state-of-the-art statistical translation models rely heavily on word-level mixture models, a bottleneck that fails to represent the rich varieties and dependencies in translations. In contrast to word-based translations, phrase-based models are more robust in capturing various translation phenomena (e.g., local word reordering) than word-level models, and less susceptible to errors from preprocessing such as word segmentation and tokenization. Leveraging phrase-level knowledge in translation models is challenging yet rewarding: it also brings significant improvements in translation quality. Above the phrase-level are
Forest-guided Supertagger Training
Supertagging is an important technique for deep syntactic analysis. A supertagger is usually trained independently of the parser using a sequence labeling method. This presents an inconsistent training objective between the supertagger and the parser. In this paper, we propose a forest-guided supertagger training method to alleviate this problem by incorporating global grammar constraints into the supertagging process using a CFG filter. It also provides an approach to make the supertagger and the parser more tightly integrated. The experiment shows that, using the forest-guided trained supertagger, the parser achieved an absolute 0.68% improvement over the baseline in F-score for predicate-argument relation recognition accuracy and a competitive result of 89.31% with a faster parsing speed, compared to a state-of-the-art HPSG parser.

