Processing of Swedish Compounds for Phrase-Based Statistical Machine Translation
Cited by 3 (3 self)
Abstract. We investigated the effects of processing Swedish compounds for phrase-based SMT between Swedish and English. Compounds were split in a pre-processing step using an unsupervised empirical method. After translation into Swedish, compounds were merged, using a novel merging algorithm. We investigated two ways of handling compound parts, by marking them as compound parts or by normalizing them to a canonical form. We found that compound splitting did improve translation into Swedish, according to automatic metrics. For translation into English the results were not consistent across automatic metrics. However, error analysis of compound translation showed a small improvement in the systems that used splitting. The number of untranslated words in the English output was reduced by 50%.
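The split-then-merge pipeline described in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not specify the splitting criterion or the merging algorithm, so the sketch assumes a common frequency-based heuristic (pick the binary split whose parts have the highest geometric-mean corpus frequency) and a simple marker-based merge. The function names, the `@@` marker, and the toy frequencies are all hypothetical.

```python
def split_compound(word, freq, min_len=3):
    """Split `word` into two known parts if that scores higher than the whole word.

    Assumption: score = geometric mean of the parts' corpus frequencies.
    Only binary splits are tried, to keep the sketch short.
    """
    best_parts, best_score = [word], float(freq.get(word, 0))
    for i in range(min_len, len(word) - min_len + 1):
        left, right = word[:i], word[i:]
        if left in freq and right in freq:
            score = (freq[left] * freq[right]) ** 0.5
            if score > best_score:
                best_parts, best_score = [left, right], score
    return best_parts


def mark_parts(parts, marker="@@"):
    """Mark every non-final part so that it can be recognized after translation."""
    return [p + marker for p in parts[:-1]] + [parts[-1]]


def merge_marked(tokens, marker="@@"):
    """Rejoin marked parts with the token that follows them."""
    merged, buffer = [], ""
    for token in tokens:
        if token.endswith(marker):
            buffer += token[: -len(marker)]
        else:
            merged.append(buffer + token)
            buffer = ""
    if buffer:  # a dangling marked part at the end of the sentence
        merged.append(buffer)
    return merged


# Toy frequencies, invented for illustration only.
freq = {"järnväg": 5, "järn": 100, "väg": 200}
parts = split_compound("järnväg", freq)
sentence = ["en"] + mark_parts(parts)
restored = merge_marked(sentence)
```

Marking the non-final parts corresponds to the first of the two strategies the abstract mentions; the alternative, normalizing parts to a canonical form, would need language-specific rules that this sketch does not attempt.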
A Hybrid Morpheme-Word Representation for Machine Translation of Morphologically Rich Languages
Cited by 3 (0 self)
We propose a language-independent approach for improving statistical machine translation for morphologically rich languages using a hybrid morpheme-word representation where the basic unit of translation is the morpheme, but word boundaries are respected at all stages of the translation process. Our model extends the classic phrase-based model by means of (1) word boundary-aware morpheme-level phrase extraction, (2) minimum error-rate training for a morpheme-level translation model using word-level BLEU, and (3) joint scoring with morpheme- and word-level language models. Further improvements are achieved by combining our model with the classic one. The evaluation on English to Finnish using Europarl (714K sentence pairs; 15.5M English words) shows statistically significant improvements over the classic model based on BLEU and human judgments.
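Point (3), joint scoring with morpheme- and word-level language models, can be pictured with a toy sketch. This is an illustration under stated assumptions, not the paper's model: it uses unigram probabilities in place of real n-gram language models, a trailing `+` to mark morphemes that attach to the next token, and fixed interpolation weights. All names and numbers are hypothetical.

```python
import math


def to_words(morphemes, marker="+"):
    """Rejoin a morpheme sequence into words; a trailing marker means 'attach to next'."""
    words, buffer = [], ""
    for m in morphemes:
        if m.endswith(marker):
            buffer += m[: -len(marker)]
        else:
            words.append(buffer + m)
            buffer = ""
    if buffer:
        words.append(buffer)
    return words


def unigram_logprob(tokens, probs, floor=1e-6):
    """Toy language model: sum of log unigram probabilities with a floor for unseen tokens."""
    return sum(math.log(probs.get(t, floor)) for t in tokens)


def joint_score(morphemes, morph_probs, word_probs, w_morph=0.5, w_word=0.5):
    """Weighted sum of a morpheme-level and a word-level language model score."""
    morph_lp = unigram_logprob(morphemes, morph_probs)
    word_lp = unigram_logprob(to_words(morphemes), word_probs)
    return w_morph * morph_lp + w_word * word_lp


# Toy probabilities, invented for illustration only.
morph_probs = {"talo+": 0.1, "ssa": 0.2}
word_probs = {"talossa": 0.05}
score = joint_score(["talo+", "ssa"], morph_probs, word_probs)
```

The same `to_words` step is what would let a morpheme-level hypothesis be evaluated with word-level BLEU, as in point (2): the output is rejoined into words before it is compared with the reference.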
MoVi: Mobile Phone based Video Highlights via Collaborative Sensing
Cited by 3 (1 self)
Sensor networks have been conventionally defined as a network of sensor motes that collaboratively detect events and report them to a remote monitoring station. This paper makes an attempt to extend this notion to the social context by using mobile phones as a replacement for motes. We envision a social application where mobile phones collaboratively sense their ambience and recognize socially “interesting” events. The phone with a good view of the event triggers a video recording, and later, the video-clips from different phones are “stitched” into a video highlights of the occasion. We observe that such a video highlights is akin to the notion of event coverage in conventional sensor networks, only the notion of “event” has changed from physical to social. We have built a Mobile Phone based Video Highlights system (MoVi) using Nokia phones and iPod Nanos, and have experimented in real-life social gatherings. Results show that MoVi-generated video highlights (created offline) are quite similar to those created manually (i.e., by painstakingly editing the entire video of the occasion). In that sense, MoVi can be viewed as a collaborative information distillation tool capable of filtering events of social relevance.
Overview and results of Morpho Challenge 2009
In: Working Notes for the CLEF 2009 Workshop, 2009
Cited by 2 (1 self)
The goal of Morpho Challenge 2009 was to evaluate unsupervised algorithms that provide morpheme analyses for words in different languages and in various practical applications. Morpheme analysis is particularly useful in speech recognition, information retrieval and machine translation for morphologically rich languages where the number of different word forms is very large. The evaluations consisted of: 1. a comparison to grammatical morphemes, 2. using morphemes instead of words in information retrieval tasks, and 3. combining morpheme and word based systems in statistical machine translation tasks. The evaluation
Toward Using Morphology in French-English Phrase-based SMT
We describe the system used in our submission to the WMT-2009 French-English translation task. We use the Moses phrase-based Statistical Machine Translation system with two simple modifications of the decoding input and word-alignment strategy based on morphology, and analyze their impact on translation quality.
Exploiting Morphology in Speech Translation with Phrase-Based Finite-State Transducers
This work implements a novel formulation for phrase-based translation models making use of morpheme-based translation units under a stochastic finite-state framework. This approach has an additional interest for speech translation tasks since it leads to the integration of the acoustic and translation models. As a further contribution, this is the first paper addressing a Basque-to-Spanish speech translation task. For this purpose a morpheme-based finite-state recognition system is combined with a finite-state transducer that translates phrases of morphemes in the source language into usual sequences of words in the target language. The proposed models were assessed under a limited-domain application task. Good performance was obtained for the proposed phrase-based finite-state translation model using morphemes as translation units, and notable improvements were also obtained in decoding time.
Enhancing Morphological Alignment for Translating Highly Inflected Languages
We propose an unsupervised approach utilizing only raw corpora to enhance morphological alignment involving highly inflected languages. Our method focuses on closed-class morphemes, modeling their influence on nearby words. Our language-independent model recovers important links missing in the IBM Model 4 alignment and demonstrates improved end-to-end translations for English-Finnish and English-Hungarian.
Speech to speech machine translation: Biblical chatter from Finnish to English
Speech-to-speech machine translation is in some ways the peak of natural language processing, in that it deals directly with our original, oral mode of communication (as opposed to derived written language). As such, it presents challenges that are not to be taken lightly. Although existing technology covers each of the steps in the process, from speech recognition to synthesis, deriving a model of translation that is effective in the domain of spoken language is an interesting and challenging task. If we could teach our algorithms to learn as children acquire language, the result would be useful both for language technology and cognitive science. We propose several potential approaches, an implementation of a multi-path model that translates recognized morphemes alongside words, and a web-interface to test our speech translation tool as trained for Finnish to English. We also discuss current approaches to machine translation and the problems they face in adapting simultaneously to morphologically rich languages and to the spoken modality.
Modeling Inflection and Word-Formation in SMT
The current state-of-the-art in statistical machine translation (SMT) suffers from issues of sparsity and inadequate modeling power when translating into morphologically rich languages. We model both inflection and word-formation for the task of translating into German. We translate from English words to an underspecified German representation and then use linear-chain CRFs to predict the fully specified German representation. We show that improved modeling of inflection and word-formation leads to improved SMT.

