Results 1 - 7 of 7
PESA: Phrase Pair Extraction as Sentence Splitting
- in Proceedings: the tenth Machine Translation
, 2005
Cited by 19 (10 self)
Abstract
Most statistical machine translation systems use phrase-to-phrase translations to capture local context information, leading to better lexical choice and more reliable local reordering. The quality of the phrase alignment is crucial to the quality of the resulting translations. Here, we propose a new phrase alignment method that is not based on the Viterbi path of word alignment models. Instead, phrase alignment is viewed as a sentence splitting task: for a given splitting of the source sentence (source phrase, left segment, right segment), find the splitting of the target sentence that optimizes the overall sentence alignment probability. Experiments on different translation tasks show that this phrase alignment method leads to highly competitive translation results.
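The splitting idea can be illustrated with a small search over target spans. This is a minimal sketch, not the paper's exact model. It assumes IBM-1-style lexicon probabilities p(target | source), stored in a hypothetical dictionary keyed by (source_word, target_word). It also assumes a hypothetical smoothing floor for unseen pairs. Each candidate target span is scored by how well it is explained by the source phrase. The remaining target words are scored by how well they are explained by the remaining source words (left plus right segment).

```python
import math
from typing import Dict, List, Tuple

# Hypothetical lexicon format: (source_word, target_word) -> p(target | source)
Lexicon = Dict[Tuple[str, str], float]

FLOOR = 1e-7  # assumed smoothing floor for unseen word pairs


def segment_logprob(targets: List[str], sources: List[str], lex: Lexicon) -> float:
    """IBM-1-style log probability that each target word is generated by the given source words."""
    if not targets:
        return 0.0
    if not sources:
        return len(targets) * math.log(FLOOR)
    total = 0.0
    for t in targets:
        # Average the lexicon probability over all candidate source words.
        p = sum(lex.get((s, t), 0.0) for s in sources) / len(sources)
        total += math.log(max(p, FLOOR))
    return total


def best_target_span(src: List[str], tgt: List[str], j1: int, j2: int,
                     lex: Lexicon) -> Tuple[int, int, float]:
    """For the source phrase src[j1:j2], return (i1, i2, score) for the target span tgt[i1:i2]
    that maximizes the phrase-to-phrase score plus the rest-to-rest score."""
    inside_src = src[j1:j2]
    outside_src = src[:j1] + src[j2:]  # left segment + right segment
    best = (0, 0, -math.inf)
    for i1 in range(len(tgt)):
        for i2 in range(i1 + 1, len(tgt) + 1):
            inside = segment_logprob(tgt[i1:i2], inside_src, lex)
            outside = segment_logprob(tgt[:i1] + tgt[i2:], outside_src, lex)
            if inside + outside > best[2]:
                best = (i1, i2, inside + outside)
    return best
```

The exhaustive search over spans is quadratic in the target length, which is affordable for single sentence pairs. Working in log space avoids underflow when many word probabilities are multiplied.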
Flexible Speech Translation Systems
- Special Issue on Speech Translation, IEEE Transactions on Speech and Audio Processing, accepted for publication
, 2006
Cited by 3 (2 self)
Abstract
Speech translation research has made significant progress over the years, with many high-visibility efforts showing that translation of spontaneously spoken speech from and to diverse languages is possible and applicable in a variety of domains. As languages and domains continue to expand, practical concerns such as the portability and reconfigurability of speech translation systems come into play: system maintenance becomes a key issue, and data is never sufficient to cover changing domains across varying languages. In this paper, we discuss strategies to overcome the limits of today's speech translation systems. In the first part, we describe our layered system architecture, which allows for easy component integration, resource sharing across components, comparison of alternative approaches, and migration toward hybrid desktop/PDA or stand-alone PDA systems. In the second part, we show how flexibility and reconfigurability are implemented by relying more radically on learning approaches, using our English–Thai two-way speech translation system as a concrete example.
INCORPORATING MONOLINGUAL CORPORA INTO BILINGUAL LATENT SEMANTIC ANALYSIS FOR CROSSLINGUAL LM ADAPTATION
Cited by 2 (0 self)
Abstract
The major limitation of bilingual latent semantic analysis (bLSA) is its requirement for parallel training corpora. Motivated by semi-supervised learning, we propose a cluster-based bLSA training approach that incorporates monolingual corpora. Each parallel document pair is treated as the centroid of a parallel document cluster, and each monolingual document is assigned to the closest centroid according to topic similarity. The resulting parallel document clusters are used as constraints to enforce a one-to-one topic correspondence in variational EM. A slight performance improvement in crosslingual language model adaptation is observed compared to the baseline without monolingual corpora. Index Terms: monolingual corpora, bilingual LSA, crosslingual word trigger, crosslingual LM adaptation
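The cluster assignment step can be sketched as a nearest-centroid search. This is an illustration under stated assumptions, not the authors' code. It assumes every document has already been mapped to a topic vector in a shared topic space. Each centroid stands for one parallel document pair. Cosine similarity is used here as a stand-in for the unspecified "topic similarity".

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two topic vectors (0.0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def assign_monolingual(centroids: Sequence[Sequence[float]],
                       mono_docs: Sequence[Sequence[float]]) -> Dict[int, List[int]]:
    """Map each parallel-pair centroid index to the monolingual documents assigned to it."""
    clusters: Dict[int, List[int]] = {k: [] for k in range(len(centroids))}
    for d, vec in enumerate(mono_docs):
        # Attach the document to the most similar centroid (first one wins on ties).
        best = max(range(len(centroids)), key=lambda k: cosine(vec, centroids[k]))
        clusters[best].append(d)
    return clusters
```

The resulting index lists correspond to the parallel document clusters that the abstract describes. They would then constrain topic correspondence during training.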
The UKA/CMU statistical machine translation system for IWSLT 2006
- in Proceedings of IWSLT
, 2006
Cited by 1 (1 self)
Abstract
This paper describes the UKA/CMU statistical machine translation system used in the IWSLT 2006 evaluation campaign. The system is based on phrase-to-phrase translations extracted from a bilingual corpus. We compare two different phrase alignment techniques, both based on word alignment probabilities. The system was used for all language pairs and data conditions in the evaluation campaign, translating both the ASR output (1-best) and the correct recognition results.
Augmenting Manual Dictionaries for Statistical Machine Translation Systems
- In 2003 Proceedings of LREC
, 2004
Cited by 1 (1 self)
Abstract
We show that the usefulness of manually created dictionaries for a statistical machine translation system can be enhanced by automatically adding new translations that are simple morphological transformations (plural forms, different verb inflections) of the original entries. Further improvement is possible by assigning probabilities to the lexicon entries. We describe a method to do this on the basis of an automatically trained statistical lexicon. Experimental results are given for Chinese-to-English translation tasks and show a significant improvement in translation quality.
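The two steps in this abstract can be combined in a toy sketch: expand each manual entry with a morphological variant, then weight the candidates with a statistical lexicon. Several parts are assumptions and are not the paper's method. Only English plurals are generated, using naive suffix rules. The statistical lexicon is a hypothetical dictionary keyed by (source, target). Candidates the lexicon has never seen receive an assumed floor value before renormalization.

```python
from typing import Dict, List, Tuple


def english_plural(word: str) -> str:
    """Toy English pluralization rules (assumption; a real system would use a morphological tool)."""
    if word.endswith(("s", "x", "z", "ch", "sh")):
        return word + "es"
    if len(word) > 1 and word.endswith("y") and word[-2] not in "aeiou":
        return word[:-1] + "ies"
    return word + "s"


def augment_dictionary(manual: Dict[str, List[str]],
                       stat_lex: Dict[Tuple[str, str], float],
                       floor: float = 1e-4) -> Dict[str, Dict[str, float]]:
    """Add plural variants of each manual translation and assign normalized probabilities
    taken from a trained statistical lexicon (unseen pairs get an assumed floor value)."""
    augmented: Dict[str, Dict[str, float]] = {}
    for src, translations in manual.items():
        candidates = set(translations) | {english_plural(t) for t in translations}
        scores = {c: stat_lex.get((src, c), floor) for c in candidates}
        total = sum(scores.values())
        augmented[src] = {c: s / total for c, s in scores.items()}
    return augmented
```

Normalizing per source entry keeps the augmented translations of one source word a proper distribution. A decoder can then weigh them against each other.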
OPEN DOMAIN SPEECH RECOGNITION & TRANSLATION: LECTURES AND SPEECHES C. Fügen
- in Proc. IEEE Conf. on Acoustics, Speech, and Signal Processing – ICASSP
, 2006
Abstract
For years, speech translation has focused on the recognition and translation of discourse in limited domains, such as hotel reservations or scheduling tasks. Only recently have research projects been started to tackle open domain speech recognition and translation of complex tasks such as lectures and speeches. In this paper, we present ongoing work at our laboratory on open domain speech translation of lectures and parliamentary speeches. Starting from a translation system for European Parliament plenary sessions and a lecture speech recognition system, we show how both components perform together on speech translation of lectures.
Communicating Unknown Words in Machine Translation
Abstract
A new approach to handling unknown words in machine translation is presented. The basic idea is to find definitions for the unknown words on the source language side and translate those definitions instead. Only monolingual resources are required. These generally offer broader coverage than bilingual resources and are available for a large number of languages. To use this in a machine translation system, definitions are extracted automatically from online dictionaries and encyclopedias. The translated definition is then inserted and clearly marked in the original hypothesis. This is shown to lead to significant improvements in (subjective) translation quality.
... clear if it will be a positive or a negative event. The first sentence is relatively understandable, but there is the possibility that the unknown word might negate the actual sentence. A background lexicon can ameliorate this situation, but it will not be possible to have a lexicon covering all words.
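The insertion step can be sketched as a post-processing pass over a translation hypothesis. This is a minimal sketch under several assumptions. The decoder passes unknown words through untranslated. The caller supplies the set of unknown words and a dictionary of source-language definitions (standing in for the automatically extracted ones). The caller also supplies a translation function. The "[def: ...]" marker format is hypothetical.

```python
from typing import Callable, Dict, List, Set


def insert_definitions(hypothesis: List[str], unknown: Set[str],
                       definitions: Dict[str, str],
                       translate: Callable[[str], str]) -> str:
    """Return the hypothesis with a clearly marked, translated definition placed after
    each unknown word for which a monolingual definition is available."""
    out: List[str] = []
    for token in hypothesis:
        out.append(token)
        if token in unknown and token in definitions:
            # Hypothetical marker so readers can tell the inserted gloss from the translation.
            out.append("[def: " + translate(definitions[token]) + "]")
    return " ".join(out)
```

Keeping the unknown word in place and adding the gloss after it leaves the original hypothesis intact. This matches the abstract's point that the inserted definition is clearly marked.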

