Results 1 - 8 of 8
Reordered search and tuple unfolding for ngram-based SMT
- PROC. OF THE MT SUMMIT X, 2005
Cited by 7 (3 self)
In Statistical Machine Translation, the use of reordering for certain language pairs can produce a significant improvement in translation accuracy. However, the search problem is shown to be NP-hard when arbitrary reorderings are allowed. This paper addresses the question of reordering for an Ngram-based SMT approach following two complementary strategies, namely reordered search and tuple unfolding. These strategies interact to improve translation quality in a Chinese to English task. On the one hand, we allow an Ngram-based decoder (MARIE) to perform a reordered search over the source sentence, while combining a translation tuple Ngram model, a target language model, a word penalty and a word distance model. Interestingly, even though the translation units are learnt sequentially, the reordered search produces an improved translation. On the other hand, we allow for a modification of the translation units that unfolds the tuples, so that shorter units are learnt from a new parallel corpus, where the source sentences are reordered according to the target language. This tuple unfolding technique reduces data sparseness and, when combined with the reordered search, further boosts translation performance. Translation accuracy and efficiency results are reported for the IWSLT 2004 Chinese to English task.
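The abstract above names four models that the decoder combines during a reordered search. The following is a minimal sketch of how such a combination could score competing hypotheses, assuming a weighted sum of feature values (a common choice in SMT, not confirmed by the abstract). All function names, weights and scores are hypothetical and are not taken from MARIE.

```python
def word_distance(source_order):
    """Sum of jumps between consecutively translated source positions.

    A monotone order such as [0, 1, 2] costs 0; every reordering adds cost.
    This particular definition is an assumption made for illustration.
    """
    total, expected = 0, 0
    for pos in source_order:
        total += abs(pos - expected)
        expected = pos + 1
    return total


def features_for(hyp):
    """Collect the four feature values named in the abstract for one hypothesis."""
    return {
        "tuple_ngram": hyp["tuple_logprob"],      # translation tuple Ngram model
        "target_lm": hyp["lm_logprob"],           # target language model
        "word_penalty": len(hyp["target"]),       # number of target words
        "word_distance": word_distance(hyp["source_order"]),
    }


def combined_score(features, weights):
    """Weighted sum of feature values (assumed combination scheme)."""
    return sum(weights[name] * features[name] for name in weights)


# Hypothetical weights; in practice they would be tuned on development data.
WEIGHTS = {"tuple_ngram": 1.0, "target_lm": 0.5,
           "word_penalty": 0.2, "word_distance": -0.3}

# Two made-up hypotheses for a three-word source sentence.
hypotheses = [
    {"target": ["a", "b", "c"], "source_order": [0, 1, 2],
     "tuple_logprob": -6.0, "lm_logprob": -8.0},
    {"target": ["a", "c", "b"], "source_order": [0, 2, 1],
     "tuple_logprob": -5.0, "lm_logprob": -6.0},
]

best = max(hypotheses, key=lambda h: combined_score(features_for(h), WEIGHTS))
```

In this toy setting the reordered hypothesis wins because its better model scores outweigh the word distance cost, which is the trade-off a word distance model is meant to control.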
Improving statistical machine translation by classifying and generalizing inflected verb forms
- In Proceedings of the 9th European Conference on Speech Communication and Technology, 2005
Cited by 5 (0 self)
This paper introduces a rule-based classification of single-word and compound verbs into a statistical machine translation approach. By replacing verb forms with the lemma of their head verb, the data sparseness problem caused by highly-inflected languages can be successfully addressed. On the other hand, the information of seen verb forms can be used to generate new translations for unseen verb forms. Translation results for an English to Spanish task are reported, producing a significant performance improvement.
Ngram-based versus Phrase-based Statistical Machine Translation
- In Proceedings of the International Workshop on Spoken Language Translation (IWSLT'05), 2005
Cited by 4 (2 self)
This work summarizes a comparison between two approaches to Statistical Machine Translation (SMT), namely Ngram-based and Phrase-based SMT. In both approaches, the translation process is based on bilingual units related by word-to-word alignments (pairs of source and target words), while the main differences lie in the extraction process of these units and the statistical modeling of the translation context. The study has been carried out on two different translation tasks (in terms of translation difficulty and amount of available training data), allowing for distortion (reordering) in the decoding process. It thus extends previous work where both approaches were compared under monotone conditions. We finally report comparative results in terms of translation accuracy, computation time and memory size. Results show that the Ngram-based approach outperforms the phrase-based approach by achieving similar accuracy scores in less computation time and with lower memory requirements.
Integration of POStag-based source reordering into SMT decoding by an extended search graph
- Proc. of the 7th Conf. of the Association for Machine Translation in the Americas, 2006
Cited by 3 (0 self)
This paper presents a reordering framework for statistical machine translation (SMT) where source-side reorderings are integrated into SMT decoding, allowing for a highly constrained reordered search graph. The monotone search is extended by means of a set of reordering patterns (linguistically motivated rewrite patterns). Patterns are automatically learnt in training from word-to-word alignments and source-side Part-Of-Speech (POS) tags. Traversing the extended search graph, the decoder evaluates every hypothesis using a group of widely used SMT models, helped by an additional Ngram language model of source-side POS tags. Experiments are reported on the Euparl task (Spanish-to-English and English-to-Spanish). Results are presented regarding translation accuracy (using human and automatic evaluations) and computational efficiency, showing significant improvements in translation quality for both translation directions at a very low computational cost.
Grouping Multi-word Expressions According to Part-Of-Speech in Statistical Machine Translation
Cited by 2 (0 self)
This paper studies a strategy for identifying and using multi-word expressions in Statistical Machine Translation. The performance of the proposed strategy for various types of multi-word expressions (like nouns or verbs) is evaluated in terms of alignment quality as well as translation accuracy. Evaluations are performed by using real-life data, namely the European Parliament corpus. Results from translation tasks from English-to-Spanish and from Spanish-to-English are presented and discussed.
LANGUAGE MODELING FOR VERBATIM TRANSLATION TASK
In this paper we present the first results towards finding a better TC-STAR 2006 verbatim transcription system configuration by improving language model performance. There is currently a lack of research devoted to special techniques for verbatim translation; we have therefore attempted to improve translation accuracy by combining the Final Text Edition (FTE) system with a supplementary verbatim corpus. Our work focused on finding the best combination of the baseline (FTE) and verbatim language models for the Spanish-English and English-Spanish language pairs. In order to improve overall system performance, a standard n-gram-based statistical machine translation (SMT) system was supplemented with a log-linear combination of additional feature functions and a linguistically motivated word reordering technique. In the final part of the study we compare the translation accuracy of the baseline system with that of the FTE-verbatim interpolated language model systems for various proportions of the linear combination of the language models.
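The linear combination of the FTE and verbatim language models described above can be illustrated with a short sketch. It assumes the usual form of linear interpolation, P(w) = lam * P_FTE(w) + (1 - lam) * P_verbatim(w), and sweeps the proportion lam over a grid. The abstract compares proportions by translation accuracy; held-out perplexity is used here only as a stand-in criterion, and all probabilities and names are hypothetical.

```python
import math


def interpolate(p_fte, p_verbatim, lam):
    """Linearly interpolate two language-model probabilities with weight lam."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return lam * p_fte + (1.0 - lam) * p_verbatim


def perplexity(probs):
    """Perplexity of a sequence of per-word probabilities."""
    log_sum = sum(math.log(p) for p in probs)
    return math.exp(-log_sum / len(probs))


def interpolated_perplexity(fte, verbatim, lam):
    """Perplexity of the interpolated model on a held-out word sequence."""
    return perplexity([interpolate(a, b, lam) for a, b in zip(fte, verbatim)])


def best_weight(fte, verbatim, weights):
    """Return the interpolation weight with the lowest held-out perplexity."""
    return min(weights, key=lambda lam: interpolated_perplexity(fte, verbatim, lam))


# Made-up per-word probabilities assigned by each model to a held-out text.
fte_probs = [0.10, 0.02, 0.30]
verbatim_probs = [0.05, 0.20, 0.10]

# Proportions 0.0, 0.1, ..., 1.0 of the FTE model in the mixture.
weights = [i / 10 for i in range(11)]
chosen = best_weight(fte_probs, verbatim_probs, weights)
```

Setting lam to 1.0 recovers the FTE baseline and 0.0 the verbatim-only model, so the sweep covers both extremes as well as the mixtures in between.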
Proceedings of the Workshop on Statistical Machine Translation, pages 154--157,
- In Proceedings of the Workshop on Statistical Machine Translation, 2006
The joint probability model proposed by Marcu and Wong (2002) provides a strong probabilistic framework for phrase-based statistical machine translation (SMT). The model's usefulness is, however, limited by the computational complexity of estimating parameters at the phrase level. We present the first model to use word alignments for constraining the space of phrasal alignments searched during Expectation Maximization (EM) training. Constraining the joint model improves performance, showing results that are very close to state-of-the-art phrase-based models. It also allows the model to scale up to larger corpora and therefore be more widely applicable.
TALP Phrase-based . . .
- PROC. OF THE HLT/NAACL WORKSHOP ON STATISTICAL MACHINE TRANSLATION, 2006
This paper reports translation results for the "Exploiting Parallel Texts for Statistical Machine Translation" (HLT-NAACL Workshop on Parallel Texts 2006). We have studied different techniques to improve the standard Phrase-Based translation system. Mainly, we introduce two reordering approaches and add morphological information.

