## Smooth Bilingual N-gram Translation


### BibTeX

```
@MISC{Schwenk_smoothbilingual,
  author = {Holger Schwenk and Marta R. Costa-jussà and José A. R. Fonollosa},
  title  = {Smooth Bilingual N-gram Translation},
  year   = {}
}
```


### Abstract

We address the problem of smoothing translation probabilities in a bilingual N-gram-based statistical machine translation system. It is proposed to project the bilingual tuples onto a continuous space and to estimate the translation probabilities in this representation. A neural network is used to perform the projection and the probability estimation. Smoothing probabilities is most important for tasks with a limited amount of training material. We consider here the BTEC task of the 2006 IWSLT evaluation. Improvements in all official automatic measures are reported when translating from Italian to English. Using a continuous space model for the translation model and the target language model, an improvement of 1.5 BLEU on the test data is observed.
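The abstract's core idea — projecting discrete bilingual tuples into a continuous space and estimating N-gram probabilities there — can be sketched as a small feed-forward network. This is a minimal illustration, not the authors' implementation; the vocabulary size, embedding dimension, and layer shapes are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: V bilingual tuples, d-dim embeddings, h hidden units,
# a context of two preceding tuples (a trigram model).
V, d, h, context = 1000, 32, 64, 2

# Randomly initialized parameters of a tiny continuous-space tuple N-gram model.
E = rng.normal(scale=0.1, size=(V, d))             # tuple embedding table
W1 = rng.normal(scale=0.1, size=(context * d, h))  # projection -> hidden
W2 = rng.normal(scale=0.1, size=(h, V))            # hidden -> output scores

def tuple_ngram_probs(context_ids):
    """P(next tuple | two previous tuples), estimated in continuous space."""
    x = np.concatenate([E[i] for i in context_ids])  # project onto continuous space
    hid = np.tanh(x @ W1)
    scores = hid @ W2
    scores -= scores.max()                           # numerical stability
    p = np.exp(scores)
    return p / p.sum()                               # softmax over all V tuples

p = tuple_ngram_probs([3, 17])
assert abs(p.sum() - 1.0) < 1e-9
```

Because every tuple receives probability mass through its embedding, even in contexts never observed in training, the model interpolates smoothly between similar contexts — the smoothing effect the abstract describes.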

### Citations

1087 | Moses: Open source toolkit for statistical machine translation
- Koehn, Hoang, et al.
- 2007
Citation Context: ...In particular, the problem of generalization to new translations seems to be promising to us. This could be addressed by the so-called factored phrase-based model as implemented in the Moses decoder (Koehn et al., 2007). In this approach words are decomposed into several factors. These factors are translated and a target phrase is generated. This model could be complemented by a factored continuous tuple N-gram... |

975 | An empirical study of smoothing techniques for language modeling
- Chen, Goodman
- 1998
Citation Context: ...putational Natural Language Learning, pp. 430–438, Prague, June 2007. ©2007 Association for Computational Linguistics ...area of language modeling. A systematic comparison can for instance be found in (Chen and Goodman, 1999). Language models and phrase tables have in common that the probabilities of rare events may be overestimated. However, in language modeling probability mass must be redistributed in order to account... |
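Among the smoothing techniques surveyed by Chen and Goodman, Good-Turing discounting is the one the paper later adopts for its baseline tuple N-gram model. As a quick illustration with a made-up count-of-counts table, Good-Turing replaces an observed count r with r* = (r+1)·N_{r+1}/N_r, where N_r is the number of distinct N-grams seen exactly r times:

```python
from collections import Counter

# Hypothetical count-of-counts table: N_r = number of distinct
# N-grams observed exactly r times in some toy corpus.
count_of_counts = Counter({1: 8, 2: 3, 3: 1})

def good_turing(r):
    """Good-Turing discounted count r* = (r + 1) * N_{r+1} / N_r."""
    return (r + 1) * count_of_counts[r + 1] / count_of_counts[r]

# Rare events are discounted: a singleton behaves like 0.75 occurrences,
# which frees probability mass to be redistributed to unseen N-grams.
print(good_turing(1))  # 2 * 3 / 8 = 0.75
print(good_turing(2))  # 3 * 1 / 3 = 1.0
```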

909 | SRILM - An Extensible Language Modeling Toolkit
- Stolcke
- 2002
Citation Context: ... sentence: <s> how long#cuánto does#NULL last#dura the#el flight#vuelo </s> The reference bilingual trigram back-off translation model was trained on these bilingual tuples using the SRI LM toolkit (Stolcke, 2002). Different smoothing techniques were tried, and best results were obtained using Good-Turing discounting. The neural network approach was trained on exactly the same data. A context of two tuples wa... |
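In the tuple N-gram approach, each source#target pair is treated as a single atomic token, so a standard back-off language model can be trained directly over tuple sequences. A rough sketch of the data preparation, using the example sentence from the context above (the extraction step itself is illustrative, not the toolkit's actual pipeline):

```python
# One training "sentence" is a sequence of bilingual tuples (source#target),
# taken from the example in the citation context.
tuples = ["<s>", "how long#cuánto", "does#NULL", "last#dura",
          "the#el", "flight#vuelo", "</s>"]

# Extract trigrams over tuples, exactly as an N-gram LM toolkit would,
# with each whole tuple acting as one vocabulary item.
trigrams = [tuple(tuples[i:i + 3]) for i in range(len(tuples) - 2)]
for t in trigrams:
    print(t)
```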

443 | Discriminative training and maximum entropy models for statistical machine translation
- Och, Ney
Citation Context: ...$e^{*} = \arg\max_{e} p(e|f) = \arg\max_{e} \exp\big(\sum_i \lambda_i h_i(e,f)\big)$ (1) The feature functions h_i are the system models and the λ_i weights are typically optimized to maximize a scoring function on a development set (Och and Ney, 2002). The phrase translation probabilities P(ẽ|f̃) and P(f̃|ẽ) are usually obtained using relative frequency estimates. Statistical learning theory, however, tells us that relative frequency estima... |
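The log-linear combination in equation (1) is straightforward to sketch: each candidate translation e is scored by a weighted sum of feature functions, and the decoder picks the argmax (the exponential is monotone, so it can be dropped). The feature names, values, and weights below are invented for illustration:

```python
# Hypothetical feature values h_i(e, f) for two candidate translations
# of the same source sentence f.
candidates = {
    "the flight lasts two hours": {"log_tm": -2.1, "log_lm": -4.0, "length": 5},
    "the flight last two hour":   {"log_tm": -1.8, "log_lm": -6.5, "length": 5},
}

# Weights lambda_i; in a real system these are tuned on a development set.
weights = {"log_tm": 1.0, "log_lm": 0.8, "length": 0.1}

def score(features):
    """Log-linear score: sum_i lambda_i * h_i(e, f)."""
    return sum(weights[k] * v for k, v in features.items())

# The language-model feature penalizes the ungrammatical candidate.
best = max(candidates, key=lambda e: score(candidates[e]))
print(best)  # the flight lasts two hours
```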

327 | A comparison of alignment models for statistical machine translation
- Och, Ney
- 2000
Citation Context: ...During the last few years, the use of context in SMT systems has provided great improvements in translation. SMT has evolved from the original word-based approach to phrase-based translation systems (Och et al., 1999; Koehn et al., 2003). A phrase is defined as a group of source words f̃ that should be translated together into a group of target words ẽ. The translation model in phrase-based systems includes the... |

139 | A smorgasbord of features for statistical machine translation
- Och, Gildea, et al.
- 2004
Citation Context: ...e methods developed for language modeling can be used. Glass-box methods decompose P(ẽ|f̃) into a set of lexical distributions P(e|f̃). For instance, it was suggested to use IBM-1 probabilities (Och et al., 2004), or other lexical translation probabilities (Koehn et al., 2003; Zens and Ney, 2004). Some form of glass-box smoothing is now used in all state-of-the-art statistical machine translation systems. An... |
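Glass-box smoothing with IBM-1 scores a phrase pair from word-level lexical probabilities rather than from the phrase's (possibly rare) relative frequency, so unseen or low-count phrase pairs still receive a sensible score. A minimal sketch with a made-up lexical table t(e|f); the NULL source word is omitted for brevity:

```python
# Hypothetical word-level lexical table t(e|f), as estimated by IBM model 1.
t = {
    ("flight", "vuelo"): 0.9, ("flight", "el"): 0.01,
    ("the", "el"): 0.8,       ("the", "vuelo"): 0.05,
}

def ibm1_score(tgt_phrase, src_phrase):
    """IBM-1 style phrase score: product over target words of the
    average lexical probability over the source words."""
    score = 1.0
    for e in tgt_phrase:
        # Unseen word pairs get a small floor probability.
        score *= sum(t.get((e, f), 1e-6) for f in src_phrase) / len(src_phrase)
    return score

# A smoothed estimate for the phrase pair ("el vuelo" -> "the flight").
print(ibm1_score(["the", "flight"], ["el", "vuelo"]))  # 0.425 * 0.455
```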

91 |
The Mathematics of Statistical Machine Translation
- Brown, Pietra, et al.
- 1993
Citation Context: ...(f|e)Pr(e) where Pr(f|e) is the translation model and Pr(e) is the target language model. This approach is usually referred to as the noisy source-channel approach in statistical machine translation (Brown et al., 1993). During the last few years, the use of context in SMT systems has provided great improvements in translation. SMT has evolved from th... |
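The noisy source-channel decomposition referenced here follows from Bayes' rule, with the denominator dropped because it does not depend on the hypothesis e:

```latex
e^{*} = \arg\max_{e} \Pr(e \mid f)
      = \arg\max_{e} \frac{\Pr(f \mid e)\,\Pr(e)}{\Pr(f)}
      = \arg\max_{e} \Pr(f \mid e)\,\Pr(e)
```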

32 | Phrasetable smoothing for statistical machine translation
- Carpuat, Foster
- 2006
Citation Context: ...ion and the best matching phrase pair among the existing ones. We are only aware of one work that performs a systematic comparison of smoothing techniques in phrase-based machine translation systems (Foster et al., 2006). Two types of phrase-table smoothing were compared: black-box and glass-box methods. Black-box methods do not look inside phrases but instead treat them as atomic objects. By these means, all the method... |

29 | CONDOR, a new parallel, constrained extension of Powell’s UOBYQA algorithm: experimental results and comparison with the DFO algorithm - Berghen, Bersini - 2005 |

18 | Speech-to-speech translation based on finite-state transducers - Casacuberta, Llorenz, et al. - 2001 |

14 |
Continuous space language models, Computer Speech and Language
- Schwenk
- 2007
Citation Context: ...ingful interpolations even when only a limited amount of training material is available. This approach was successfully applied to language modeling in large vocabulary continuous speech recognition (Schwenk, 2007) and to language modeling in phrase-based SMT systems (Schwenk et al., 2006). In this paper, we investigate whether this approach is useful to smooth the probabilities involved in the bilingual tuple... |

6 | Factored Neural Language Models
- Alexandrescu, Kirchhoff
- 2006
Citation Context: ...e is generated. This model could be complemented by a factored continuous tuple N-gram. Factored word language models were already successfully used in speech recognition (Bilmes and Kirchhoff, 2003; Alexandrescu and Kirchhoff, 2006) and an extension to machine translation seems to be promising. The described smoothing method was explicitly developed to tackle the data sparseness problem in tasks like the BTEC corpus. It is well... |

3 |
Statistical phrase-based machine translation
- Koehn, Och, et al.
- 2003
Citation Context: ...w years, the use of context in SMT systems has provided great improvements in translation. SMT has evolved from the original word-based approach to phrase-based translation systems (Och et al., 1999; Koehn et al., 2003). A phrase is defined as a group of source words f̃ that should be translated together into a group of target words ẽ. The translation model in phrase-based systems includes the phrase translation... |

2 |
Toward a broad-coverage bilingual corpus for speech translation of travel conversations in the real world
- Takezawa, Sumita, et al.
Citation Context: ...C) as used in the 2006 evaluations of the international workshop on spoken language translation (IWSLT). This corpus consists of typical sentences from phrase books for tourists in several languages (Takezawa et al., 2002). We report results on the supplied development corpus of 489 sentences and the official test set of the IWSLT’06 evaluation. The main measure is the BLEU score, using seven reference translations. T... |

1 |
Factored language models and generalized backoff
- Bilmes, Kirchhoff
- 2003
Citation Context: ...lated and a target phrase is generated. This model could be complemented by a factored continuous tuple N-gram. Factored word language models were already successfully used in speech recognition (Bilmes and Kirchhoff, 2003; Alexandrescu and Kirchhoff, 2006) and an extension to machine translation seems to be promising. The described smoothing method was explicitly developed to tackle the data sparseness problem in task... |

1 | Overview of the IWSLT 2006 campaign - Paul - 2006 |