Results 1-9 of 9
A Systematic Comparison of Various Statistical Alignment Models
Computational Linguistics, 2003
Cited by 805 (22 self)
... this article the problem of finding the word alignment of a bilingual sentence-aligned corpus by using language-independent statistical methods. There is a vast literature on this topic, and many different systems have been suggested to solve this problem. Our work follows and extends the methods introduced by Brown, Della Pietra, Della Pietra, and Mercer (1993) by using refined statistical models for the translation process. The basic idea of this approach is to develop a model of the translation process with the word alignment as a hidden variable of this process, to apply statistical estimation theory to compute the "optimal" model parameters, and to perform alignment search to compute the best word alignment ...
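The hidden-alignment idea can be illustrated with IBM Model 1, the simplest of the Brown et al. (1993) models. The following is a minimal Python sketch, not the authors' implementation: it omits the empty (NULL) word, smoothing, and the higher-order models, and the function names `train_ibm1` and `viterbi_alignment` as well as the toy corpus are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

SentencePair = Tuple[List[str], List[str]]  # (source words f, target words e)

def train_ibm1(corpus: List[SentencePair], iterations: int = 10) -> Dict[Tuple[str, str], float]:
    """Estimate lexicon probabilities t(f | e) with EM; the alignment is the hidden variable."""
    f_vocab = {f for fs, _ in corpus for f in fs}
    uniform = 1.0 / len(f_vocab)
    t: Dict[Tuple[str, str], float] = defaultdict(lambda: uniform)
    for _ in range(iterations):
        count: Dict[Tuple[str, str], float] = defaultdict(float)
        total: Dict[str, float] = defaultdict(float)
        for fs, es in corpus:
            for f in fs:
                # E-step: posterior probability that f is aligned to each target word e
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    posterior = t[(f, e)] / norm
                    count[(f, e)] += posterior
                    total[e] += posterior
        # M-step: renormalize the expected counts per target word
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t

def viterbi_alignment(fs: List[str], es: List[str], t: Dict[Tuple[str, str], float]) -> List[int]:
    """Under Model 1 the best alignment links each source word to its most probable target word."""
    return [max(range(len(es)), key=lambda i: t.get((f, es[i]), 0.0)) for f in fs]

corpus = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
t = train_ibm1(corpus, iterations=10)
print(viterbi_alignment(["das", "Haus"], ["the", "house"], t))  # [0, 1]
```

No sentence pair in the toy corpus aligns "das" to "the" explicitly; EM discovers it because "das" and "the" co-occur in two pairs.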
The CMU Statistical Machine Translation System
In Proceedings of MT Summit IX, 2003
Cited by 20 (1 self)
In this paper we describe the components of our statistical machine translation system. This system ...
PESA: Phrase Pair Extraction as Sentence Splitting
In Proceedings of the Tenth Machine Translation ..., 2005
Cited by 19 (10 self)
Most statistical machine translation systems use phrase-to-phrase translations to capture local context information, leading to better lexical choice and more reliable local reordering. The quality of the phrase alignment is crucial to the quality of the resulting translations. Here, we propose a new phrase alignment method that is not based on the Viterbi path of word alignment models. Phrase alignment is viewed as a sentence splitting task: for a given splitting of the source sentence (source phrase, left segment, right segment), find the splitting of the target sentence that optimizes the overall sentence alignment probability. Experiments on different translation tasks show that this phrase alignment method leads to highly competitive translation results.
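A minimal sketch of the sentence-splitting search, assuming an IBM-1-style lexicon p(f|e) is already available: for a fixed source phrase, every target span is scored by how well it explains the phrase, and the rest of the target sentence is scored by how well it explains the rest of the source sentence. Only one translation direction is scored here, and the toy lexicon, the `FLOOR` constant, and the function names are assumptions for illustration.

```python
import math
from typing import Dict, List, Optional, Tuple

Lexicon = Dict[Tuple[str, str], float]  # p(f | e), e.g. estimated with IBM Model 1
FLOOR = 1e-10  # hypothetical floor for word pairs missing from the lexicon

def log_ibm1(src: List[str], tgt: List[str], lex: Lexicon) -> float:
    """Log-probability of src given tgt under an IBM-1-style uniform alignment."""
    if not src:
        return 0.0
    if not tgt:
        return len(src) * math.log(FLOOR)
    total = 0.0
    for f in src:
        p = sum(lex.get((f, e), 0.0) for e in tgt) / len(tgt)
        total += math.log(max(p, FLOOR))
    return total

def pesa_target_span(src: List[str], tgt: List[str], j1: int, j2: int,
                     lex: Lexicon) -> Optional[Tuple[int, int]]:
    """Find the target span (i1, i2), inclusive, maximizing
    log P(phrase | span) + log P(rest of src | rest of tgt)."""
    phrase = src[j1:j2 + 1]
    src_rest = src[:j1] + src[j2 + 1:]  # left segment + right segment
    best_score, best_span = -math.inf, None
    for i1 in range(len(tgt)):
        for i2 in range(i1, len(tgt)):
            inside = log_ibm1(phrase, tgt[i1:i2 + 1], lex)
            outside = log_ibm1(src_rest, tgt[:i1] + tgt[i2 + 1:], lex)
            if inside + outside > best_score:
                best_score, best_span = inside + outside, (i1, i2)
    return best_span

lex = {("das", "the"): 0.9, ("Haus", "house"): 0.9,
       ("ist", "is"): 0.9, ("klein", "small"): 0.9}
src = ["das", "Haus", "ist", "klein"]
tgt = ["the", "house", "is", "small"]
print(pesa_target_span(src, tgt, 0, 1, lex))  # (0, 1)
```

Because the outside segments must also be explained, a span that is too long or too short is penalized even when the phrase itself is well covered.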
Word reordering and a dynamic programming beam search algorithm for statistical machine translation
Computational Linguistics, 2003
Cited by 16 (3 self)
In this article, we describe an efficient beam search algorithm for statistical machine translation based on dynamic programming (DP). The search algorithm uses the translation model presented in Brown et al. (1993). Starting from a DP-based solution to the traveling-salesman problem, we present a novel technique to restrict the possible word reorderings between source and target language in order to achieve an efficient search algorithm. We present word reordering restrictions that are especially useful for the German-to-English translation direction. The restrictions are generalized, and a set of four parameters to control word reordering is introduced; these parameters can easily be adapted to new translation directions. The beam search procedure has been successfully tested on the Verbmobil task (German to English, 8,000-word vocabulary) and on the Canadian Hansards task (French to English, 100,000-word vocabulary). For the medium-sized Verbmobil task, a sentence can be translated in a few seconds, only a small number of search errors occur, and there is no performance degradation as measured by the word error criterion used in this article.
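The DP idea behind the search can be sketched as a Held-Karp-style recursion over states (set of covered source positions, last covered position), the standard DP formulation of the traveling-salesman problem. This Python sketch rests on assumptions: `cost` is a hypothetical stand-in for the combined model scores, there is no beam pruning, and the restriction shown (choose only among the first k uncovered positions) is a simple IBM-style constraint, not the article's four-parameter German-to-English restrictions.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Tuple[int, int]  # (bitmask of covered source positions, last covered position)

def dp_reorder(n: int, cost: Callable[[int, int], float],
               k: Optional[int] = None) -> Tuple[float, List[int], int]:
    """Held-Karp-style DP over (coverage, last) states.

    cost(prev, j) scores covering source position j right after prev (prev = -1
    at the start). If k is set, only the first k uncovered positions may be chosen
    next. Returns (best cost, visiting order, number of states created)."""
    start: State = (0, -1)
    best: Dict[State, Tuple[float, Optional[State]]] = {start: (0.0, None)}
    layers: List[List[State]] = [[] for _ in range(n + 1)]
    layers[0].append(start)
    for size in range(n):  # expand states layer by layer (number of covered words)
        for state in layers[size]:
            cov, last = state
            acc = best[state][0]
            uncovered = [j for j in range(n) if not (cov >> j) & 1]
            for j in (uncovered if k is None else uncovered[:k]):
                nxt: State = (cov | (1 << j), j)
                c = acc + cost(last, j)
                if nxt not in best:
                    layers[size + 1].append(nxt)
                    best[nxt] = (c, state)
                elif c < best[nxt][0]:
                    best[nxt] = (c, state)  # recombine: keep the cheaper path
    final = min(layers[n], key=lambda s: best[s][0])
    order: List[int] = []
    node: Optional[State] = final
    while node is not None and node != start:  # follow back-pointers
        order.append(node[1])
        node = best[node][1]
    return best[final][0], order[::-1], len(best)

def toy_cost(prev: int, j: int) -> float:
    # hypothetical: distortion |j - prev - 1| plus a bonus for translating the
    # participle (position 3) right after the auxiliary (position 1)
    return abs(j - prev - 1) - (4 if (prev, j) == (1, 3) else 0)

print(dp_reorder(4, toy_cost))       # (-1.0, [0, 1, 3, 2], 33)
print(dp_reorder(4, toy_cost, k=1))  # (0.0, [0, 1, 2, 3], 5)
```

With the toy cost, the unrestricted search creates all 33 states and prefers the order [0, 1, 3, 2], while k = 1 forces monotone order and creates only 5 states. This illustrates how a reordering restriction shrinks the state space.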
An ngram-based statistical machine translation decoder
In Proc. of the 9th European Conference on Speech Communication and Technology, Interspeech’05, 2005
Cited by 12 (8 self)
In this paper we describe MARIE, an Ngram-based statistical machine translation decoder. It is implemented using a beam search strategy, with distortion (or reordering) capabilities. The underlying translation model is based on an Ngram approach, extended to introduce reordering at the phrase level. The search graph structure is designed to perform very accurate comparisons, which allows a high level of pruning and improves the decoder's efficiency. We report several techniques for efficiently pruning the search space. The combinatorial explosion of the search space arising from the search graph structure is reduced by limiting the number of reorderings a given translation is allowed to perform, as well as the maximum distance a word (or a phrase) is allowed to be reordered. Finally, we report translation accuracy results on three different translation tasks.
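To make the effect of the two limits concrete, the following Python sketch enumerates the orders in which source positions could be translated and discards those that exceed a maximum number of reorderings or a maximum reordering distance. It assumes word-level (not tuple-level) reordering and defines a reordering as any step to a position other than the next monotone one, with distance |j - (prev + 1)|. These definitions and the name `allowed_orders` are assumptions, not MARIE's actual interface.

```python
from typing import List, Optional

def allowed_orders(n: int, max_jumps: Optional[int] = None,
                   max_dist: Optional[int] = None) -> List[List[int]]:
    """Enumerate translation orders of n source positions under reordering limits.
    A jump is a step to any position other than prev + 1 (prev = -1 at the start);
    its distance is |j - (prev + 1)|. None means 'no limit'."""
    results: List[List[int]] = []

    def extend(order: List[int], covered: List[bool], jumps: int) -> None:
        if len(order) == n:
            results.append(order.copy())
            return
        prev = order[-1] if order else -1
        for j in range(n):
            if covered[j]:
                continue
            dist = abs(j - (prev + 1))
            is_jump = dist > 0
            if is_jump and max_dist is not None and dist > max_dist:
                continue  # reordering distance limit
            if is_jump and max_jumps is not None and jumps >= max_jumps:
                continue  # limit on the number of reorderings
            covered[j] = True
            order.append(j)
            extend(order, covered, jumps + (1 if is_jump else 0))
            order.pop()  # backtrack
            covered[j] = False

    extend([], [False] * n, 0)
    return results

print(len(allowed_orders(3)))                           # 6
print(len(allowed_orders(3, max_dist=2)))               # 4
print(len(allowed_orders(3, max_jumps=2, max_dist=2)))  # 2
print(len(allowed_orders(8)), len(allowed_orders(8, max_jumps=2, max_dist=3)))
```

Without limits the number of orders grows as n!, whereas the two limits keep only orders close to monotone; the last line contrasts both counts for an eight-word sentence.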
A POS-Based Model for Long-Range Reorderings in SMT
In Proc. of the Fourth ACL Workshop on Statistical Machine Translation, 2009
Cited by 5 (2 self)
In this paper we describe a new approach to modeling long-range word reorderings in statistical machine translation (SMT). Until now, most SMT approaches have only been able to model local reorderings. But even related languages such as German and English can differ considerably in word order. In recent years, approaches that reorder the source sentence in a preprocessing step according to part-of-speech (POS)-based rules, so that it better matches the target sentence, have been applied successfully. We extend this approach to model long-range reorderings by introducing discontinuous rules. We tested the new approach on a German-English translation task and significantly improved translation quality, by up to 0.8 BLEU points, compared to a system that already uses continuous POS-based rules to model short-range reorderings.
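A discontinuous rule can be pictured as a POS pattern with a gap. The Python sketch below applies one such rule as a preprocessing step. The rule notation (`*` for a gap of one or more tokens, an index list for the new order), the STTS-style tags, and the example rule are illustrative assumptions, not the paper's rule format or rule set.

```python
from typing import List, Optional, Sequence, Tuple

Token = Tuple[str, str]  # (word, POS tag)
GAP = "*"  # hypothetical notation: matches one or more tokens

def _match(tags: Sequence[str], start: int,
           pattern: Sequence[str]) -> Optional[List[Tuple[int, int]]]:
    """Return [begin, end) spans for each pattern element, or None if the pattern
    does not match at start. Gaps are matched shortest-first."""
    if not pattern:
        return []
    head, rest = pattern[0], pattern[1:]
    if head == GAP:
        for end in range(start + 1, len(tags) + 1):
            tail = _match(tags, end, rest)
            if tail is not None:
                return [(start, end)] + tail
        return None
    if start < len(tags) and tags[start] == head:
        tail = _match(tags, start + 1, rest)
        if tail is not None:
            return [(start, start + 1)] + tail
    return None

def apply_rule(sent: List[Token], pattern: Sequence[str],
               order: Sequence[int]) -> List[Token]:
    """Reorder the first match of pattern: pattern element order[k] is placed k-th."""
    tags = [tag for _, tag in sent]
    for start in range(len(tags)):
        spans = _match(tags, start, pattern)
        if spans is not None:
            pieces = [sent[b:e] for b, e in spans]
            middle = [tok for k in order for tok in pieces[k]]
            return sent[:spans[0][0]] + middle + sent[spans[-1][1]:]
    return sent  # no match: leave the sentence unchanged

sentence = [("ich", "PPER"), ("habe", "VAFIN"), ("das", "ART"),
            ("Buch", "NN"), ("gelesen", "VVPP")]
# Discontinuous rule: VAFIN * VVPP -> VAFIN VVPP *
reordered = apply_rule(sentence, ("VAFIN", GAP, "VVPP"), [0, 2, 1])
print(" ".join(w for w, _ in reordered))  # ich habe gelesen das Buch
```

A continuous rule such as VAFIN VVPP -> VVPP VAFIN would not fire on this sentence, because the object "das Buch" separates the auxiliary from the participle; the gap is what lets the rule cover arbitrarily long intervening material.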
A Maximum Entropy/Minimum Divergence Translation Model
In ACL, 2000
Cited by 4 (0 self)
I present empirical comparisons between a linear combination of standard statistical language and translation models and an equivalent Maximum Entropy/Minimum Divergence (MEMD) model, using several different methods for automatic feature selection. The MEMD model significantly outperforms the standard model in test corpus perplexity, even though it has far fewer parameters.
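The difference between the two model forms can be sketched as follows. A linear combination interpolates the language-model and translation-model distributions, while an MEMD model multiplies a reference distribution q (here, the language model) by exponentiated weighted features and renormalizes; with fitted weights, this exponential form is the distribution closest to q in KL divergence among those matching the feature expectations. The Python sketch only evaluates both forms for fixed parameters. It does not train the weights or select features, and the toy distributions and the single feature are hypothetical.

```python
import math
from typing import Callable, Dict, List

Dist = Dict[str, float]

def linear_mix(p_lm: Dist, p_tm: Dist, lam: float) -> Dist:
    """Linear interpolation lam * p_lm + (1 - lam) * p_tm over a shared vocabulary."""
    return {w: lam * p_lm[w] + (1 - lam) * p_tm[w] for w in p_lm}

def memd(q: Dist, features: List[Callable[[str], float]], weights: List[float]) -> Dist:
    """p(w) = q(w) * exp(sum_i a_i * f_i(w)) / Z, with q as the reference distribution."""
    scores = {w: qw * math.exp(sum(a * f(w) for a, f in zip(weights, features)))
              for w, qw in q.items()}
    z = sum(scores.values())  # normalizer Z
    return {w: s / z for w, s in scores.items()}

# Toy next-word distributions for a fixed history and a source sentence containing "Haus"
p_lm = {"the": 0.5, "house": 0.25, "home": 0.25}
p_tm = {"the": 0.1, "house": 0.7, "home": 0.2}

def is_translation(w: str) -> float:
    # hypothetical binary feature: w is a dictionary translation of a source word
    return 1.0 if w == "house" else 0.0

print(linear_mix(p_lm, p_tm, 0.5))
print(memd(p_lm, [is_translation], [1.0]))
```

Note the parameter contrast: the linear mixture needs a full translation distribution p_tm, whereas the MEMD form adjusts q through a handful of feature weights, in line with the abstract's point that the MEMD model has far fewer parameters.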
The CMU statistical machine translation system for IWSLT 2005
In MT Summit IX, 2003
Cited by 2 (1 self)
In this paper we describe the CMU statistical machine translation system used in the IWSLT 2005 evaluation campaign. This system is based on phrase-to-phrase translations extracted from a bilingual corpus. We experimented with two phrase extraction methods: PESA on-the-fly phrase extraction and an alignment-free extraction method. The translation model, language model, and other features were combined in a log-linear model during decoding. We present our experiments on adapting the model to new data from a different domain, as well as on combining different translation hypotheses to obtain better translations. We participated in the supplied data track for manual transcriptions in the following translation directions: Arabic-English, Chinese-English, Japanese-English, and Korean-English. For the Chinese-English direction, we also worked on ASR output of the supplied data and with additional data in the unrestricted and C-STAR tracks.
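A log-linear combination scores each candidate translation e of a source sentence f as a weighted sum of feature functions, score(e, f) = sum_m lambda_m h_m(e, f), and the decoder keeps the highest-scoring candidate. The Python sketch below applies that rule to a fixed list of candidates. The feature names, values, and weights are hypothetical. Pooling candidates from several systems and rescoring them is only one simple way of combining hypotheses, not necessarily the method used in the paper.

```python
from typing import Dict, List, Tuple

Hypothesis = Tuple[str, Dict[str, float]]  # (candidate translation e, feature values h_m(e, f))

def loglinear_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of feature values: sum_m lambda_m * h_m(e, f)."""
    return sum(weights[name] * value for name, value in features.items())

def best_translation(hyps: List[Hypothesis], weights: Dict[str, float]) -> str:
    """Return the candidate with the highest log-linear score."""
    return max(hyps, key=lambda h: loglinear_score(h[1], weights))[0]

# Candidates pooled from (hypothetically) two systems; tm and lm are log-probabilities,
# length is the number of target words (used as a word bonus)
hyps: List[Hypothesis] = [
    ("the house is small", {"tm": -2.0, "lm": -4.0, "length": 4.0}),
    ("the house small", {"tm": -1.5, "lm": -6.0, "length": 3.0}),
]
weights = {"tm": 1.0, "lm": 1.0, "length": 0.5}
print(best_translation(hyps, weights))  # the house is small
```

The weights are what a tuning procedure would adjust; here the first candidate wins (score -4.0 against -6.0) because its better language-model score outweighs its slightly worse translation-model score.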
Statistical Approach With Factored Translation Models For Indian Languages
Factored translation models are an extension of phrase-based statistical translation models that integrate additional annotation at the word level. Here we present a study of statistical models and approaches for translating Hindi to English. Experiments were also conducted on alignment models using various word groupings, with GIZA++ used to predict their English translations and fertility. TAJ, a new word alignment model, was developed for Hindi to English. It used a bootstrapping mechanism and an expectation-maximization (EM) algorithm, together with POS tags, to boost word probabilities when building a bilingual dictionary. The accuracy of TAJ is compared with that of GIZA++ on a parallel corpus.
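In a factored representation, each word carries a tuple of factors, for example its surface form and its POS tag. Factored corpora for the Moses toolkit separate factors with `|`; the Python sketch below parses such a line so that each factor sequence can be used on its own. The function names and the transliterated Hindi tokens and tags are illustrative only and are not taken from the paper's data.

```python
from typing import List, Tuple

def parse_factored(line: str, n_factors: int) -> List[Tuple[str, ...]]:
    """Split a factored sentence such as 'word|POS word|POS' into factor tuples."""
    tokens: List[Tuple[str, ...]] = []
    for tok in line.split():
        factors = tuple(tok.split("|"))
        if len(factors) != n_factors:
            raise ValueError(f"expected {n_factors} factors in {tok!r}")
        tokens.append(factors)
    return tokens

def factor_column(tokens: List[Tuple[str, ...]], index: int) -> List[str]:
    """Extract one factor (e.g. 0 = surface form, 1 = POS tag) for every token."""
    return [tok[index] for tok in tokens]

sentence = parse_factored("ladka|NN khana|NN khata|VM hai|VAUX", 2)
print(factor_column(sentence, 0))  # ['ladka', 'khana', 'khata', 'hai']
print(factor_column(sentence, 1))  # ['NN', 'NN', 'VM', 'VAUX']
```

Keeping the factors aligned token by token is what lets a factored model condition on the POS sequence while still producing surface words.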

