Clause restructuring for statistical machine translation
In ACL, 2005
Cited by 65 (2 self)
We describe a method for incorporating syntactic information in statistical machine translation systems. The first step of the method is to parse the source language string that is being translated. The second step is to apply a series of transformations to the parse tree, effectively reordering the surface string on the source language side of the translation system. The goal of this step is to recover an underlying word order that is closer to the target language word order than the original string. The reordering approach is applied as a pre-processing step in both the training and decoding phases of a phrase-based statistical MT system. We describe experiments on translation from German to English, showing an improvement from a 25.2% BLEU score for a baseline system to a 26.8% BLEU score for the system with reordering, a statistically significant improvement.
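The parse-then-reorder idea in this abstract can be illustrated with a minimal sketch. The tree encoding (nested tuples), the labels `S`, `NP` and `V`, and the single rule "move a clause-final verb to follow the first constituent" are all assumptions made for illustration; they are not the transformations used in the paper.

```python
from typing import List, Union

# A tree is either a leaf word (str) or a tuple (label, child, child, ...).
Tree = Union[str, tuple]


def is_verb(node: Tree) -> bool:
    """True if the node is a constituent labelled 'V' (hypothetical label)."""
    return isinstance(node, tuple) and node[0] == "V"


def move_verb_after_subject(tree: Tree) -> Tree:
    """Recursively move a clause-final verb to sit right after the first child.

    This is one toy rule, assumed for illustration only.
    """
    if isinstance(tree, str):
        return tree
    label, *children = tree
    children = [move_verb_after_subject(child) for child in children]
    if label == "S" and len(children) > 2 and is_verb(children[-1]):
        verb = children.pop()
        children.insert(1, verb)
    return (label, *children)


def leaves(tree: Tree) -> List[str]:
    """Read the surface string off the tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    words: List[str] = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words


# German subordinate-clause order: "ich das Buch lese"
clause = ("S", ("NP", "ich"), ("NP", "das", "Buch"), ("V", "lese"))
reordered = " ".join(leaves(move_verb_after_subject(clause)))
print(reordered)
```

In a pipeline of the kind the abstract describes, the same reordering would be applied to the source side of the training data and to each input sentence at decoding time, so that the phrase-based system only ever sees reordered source text.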
Word-Sense Disambiguation for Machine Translation
In EMNLP, 2005
Cited by 39 (0 self)
In word sense disambiguation, a system attempts to determine the sense of a word from contextual features. Major barriers to building a high-performing word sense disambiguation system include the difficulty of labeling data for this task and of predicting fine-grained sense distinctions. These issues stem partly from the fact that the task is being treated in isolation from possible uses of automatically disambiguated data. In this paper, we consider the related task of word translation, where we wish to determine the correct translation of a word from context. We can use parallel language corpora as a large supply of partially labeled data for this task. We present algorithms for solving the word translation problem and demonstrate a significant improvement over a baseline system. We then show that the word-translation system can be used to improve performance on a simplified machine translation task and can effectively and accurately prune the set of candidate translations for a word.
VRML 97: The Virtual Reality Modeling Language, iso/iec 14772:1997
In: Proceedings of the ACL04 Workshop on Multiword Expressions: Integrating Processing, 2004
Cited by 4 (0 self)
We present a method for compositionally translating noun-noun (NN) compounds, using a word-level bilingual dictionary and syntactic templates for candidate generation, and corpus and dictionary statistics for selection. We propose a support vector learning-based method employing target language corpus and bilingual dictionary data, and evaluate it over an English-Japanese machine translation task. We show the proposed method to be superior to previous methods and also robust over low-frequency NN compounds.
Translation Selection for Japanese-English Noun-Noun Compounds
2003
Cited by 3 (0 self)
We present a method for compositionally translating Japanese NN compounds into English, using a word-level transfer dictionary and target language monolingual corpus. The method interpolates over fully specified and partial translation data, based on corpus evidence. In evaluation, we demonstrate that interpolation over the two data types is superior to using either one, and show that our method performs at an F-score of 0.68 over translation-aligned inputs and 0.66 over a random sample of 500 NN compounds.
Localization of Difficult-to-Translate Phrases
Cited by 2 (0 self)
This paper studies the impact that difficult-to-translate source-language phrases might have on the machine translation process. We formulate the notion of difficulty as a measurable quantity; we show that a classifier can be trained to predict whether a phrase might be difficult to translate; and we develop a framework that makes use of the classifier and external resources (such as human translators) to improve the overall translation quality. Through experimental work, we verify that by isolating difficult-to-translate phrases and processing them as special cases, their negative impact on the translation of the rest of the sentences can be reduced.
Improved Statistical Machine Translation Using Monolingual Paraphrases
Cited by 1 (1 self)
We propose a novel monolingual sentence paraphrasing method for augmenting the training data for statistical machine translation systems “for free”, by creating it from data that is already available rather than having to create more aligned data. Starting with a syntactic tree, we recursively generate new sentence variants where noun compounds are paraphrased using suitable prepositions, and vice versa: preposition-containing noun phrases are turned into noun compounds. The evaluation shows an improvement equivalent to 33%-50% of that of doubling the amount of training data.
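The two paraphrase directions mentioned in this abstract can be sketched on flat strings. This is a deliberate simplification: the abstract describes working from a syntactic tree, whereas the functions below (`expand_compound`, `compress_phrase`) and the preposition list are hypothetical and operate on whitespace-separated words only.

```python
from typing import Optional

# Illustrative preposition list, assumed for this sketch.
PREPOSITIONS = ("of", "for", "in", "on")


def expand_compound(compound: str, preposition: str = "of") -> str:
    """Turn a two-word noun compound 'modifier head' into 'head prep modifier'.

    Raises ValueError if the input is not exactly two words.
    """
    modifier, head = compound.split()
    return f"{head} {preposition} {modifier}"


def compress_phrase(phrase: str) -> Optional[str]:
    """Turn 'head prep modifier' into the compound 'modifier head'.

    Returns None when the phrase does not match that three-word pattern.
    """
    words = phrase.split()
    if len(words) == 3 and words[1] in PREPOSITIONS:
        return f"{words[2]} {words[0]}"
    return None


print(expand_compound("oil prices"))
print(compress_phrase("prices of oil"))
```

Each generated source-side variant would be paired with the unchanged target sentence to enlarge the training set. Choosing a suitable preposition for a given compound is the hard part and is not addressed by this sketch.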
Integrating Output from Specialized Modules in Machine Translation
2009
In many cases in SMT we want to allow specialized modules to propose translation fragments to the decoder and allow them to compete with translations contained in the phrase table. Transliteration is one module that may produce such specialized output. In this paper, as an example, we build a specialized Urdu transliteration module and integrate its output into an Urdu–English MT system. The module marks up the test text using an XML format, and the decoder allows alternate translations (transliterations) to compete.
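The markup step described in this abstract might look roughly like the sketch below. The abstract does not give the XML format, so the `<alt>` element, its `translation` attribute, and the `mark_up` function are hypothetical names chosen for illustration, and the placeholder tokens stand in for Urdu words.

```python
from typing import Dict, List
from xml.sax.saxutils import escape, quoteattr


def mark_up(tokens: List[str], suggestions: Dict[str, str]) -> str:
    """Wrap every token that has a suggested transliteration in an <alt> element.

    Tokens without a suggestion are passed through, XML-escaped.
    The element and attribute names are assumptions, not a real decoder format.
    """
    out = []
    for token in tokens:
        if token in suggestions:
            attr = quoteattr(suggestions[token])  # adds quotes and escapes
            out.append(f"<alt translation={attr}>{escape(token)}</alt>")
        else:
            out.append(escape(token))
    return " ".join(out)


# Placeholder source tokens; "w2" is assumed to be a name needing transliteration.
marked = mark_up(["w1", "w2", "w3"], {"w2": "Lahore"})
print(marked)
```

A decoder that reads such markup can treat the suggested fragment as one more candidate, scored alongside the phrase-table entries, rather than as a forced translation.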

