Example-based Machine Translation of the Basque Language
- In Proceedings of the 7th Conference of the Association for Machine Translation in the Americas
, 2006
Cited by 5 (5 self)
Basque is both a minority and a highly inflected language with free order of sentence constituents. Machine Translation of Basque is thus both a real need and a test bed for MT techniques. In this paper, we present a modular data-driven MT system which includes different chunkers as well as chunk aligners which can deal with the free order of sentence constituents of Basque. We conducted Basque-to-English translation experiments, evaluated on a large corpus (270,000 sentence pairs). The experimental results show that our system significantly outperforms state-of-the-art approaches according to several common automatic evaluation metrics.
Inferring Shallow-Transfer Machine Translation Rules from Small Parallel Corpora
Cited by 5 (2 self)
This paper describes a method for the automatic inference of structural transfer rules to be used in a shallow-transfer machine translation (MT) system from small parallel corpora. The structural transfer rules are based on alignment templates, like those used in statistical MT. Alignment templates are extracted from sentence-aligned parallel corpora and extended with a set of restrictions which are derived from the bilingual dictionary of the MT system and control their application as transfer rules. The experiments conducted using three different language pairs in the free/open-source MT platform Apertium show that translation quality improves over word-for-word translation (when no transfer rules are used), and that the resulting translation quality is close to that obtained using hand-coded transfer rules. The method we present is entirely unsupervised and benefits from information in the other modules of the MT system in which the inferred rules are applied.
MATREX: the DCU MT System for WMT 2008
Cited by 4 (3 self)
In this paper, we give a description of the machine translation system developed at DCU that was used for our participation in the evaluation campaign of the Third Workshop on Statistical Machine Translation at ACL 2008. We describe the modular design of our data-driven MT system with particular focus on the components used in this participation. We also describe some of the significant modules which were unused in this task. We participated in the EuroParl task for the following translation directions: Spanish–English and French–English, in which we employed our hybrid EBMT-SMT architecture to translate. We also participated in the Czech–English News and News Commentary tasks which represented a previously untested language pair for our system. We report results on the provided development and test sets.
Lost in Translation: the Problems of Using Mainstream MT Evaluation Metrics for Sign Language Translation
- In Proceedings of the 5th SALTMIL Workshop on Minority Languages at LREC’06
, 2006
Cited by 4 (3 self)
In this paper we consider the problems of applying corpus-based techniques to minority languages that are neither politically recognised nor have a formally accepted writing system, namely sign languages. We discuss the adoption of an annotated form of sign language data as a suitable corpus for the development of a data-driven machine translation (MT) system, and deal with issues that arise from its use. Useful software tools that facilitate easy annotation of video data are also discussed. Furthermore, we address the problems of using traditional MT evaluation metrics for sign language translation. Based on the candidate translations produced from our example-based machine translation system, we discuss why standard metrics fall short of providing an accurate evaluation and suggest more suitable evaluation methods.
Alignment-Guided Chunking
Cited by 4 (3 self)
We introduce an adaptable monolingual chunking approach, Alignment-Guided Chunking (AGC), which makes use of knowledge of word alignments acquired from bilingual corpora. Our approach is motivated by the observation that a sentence should be chunked differently depending on the foreseen end-tasks. For example, given the different requirements of translation into (say) French and German, it is inappropriate to chunk up an English string in exactly the same way as preparation for translation into one or other of these languages. We test our chunking approach on two language pairs: French–English and German–English, where these two bilingual corpora share the same English sentences. Two chunkers trained on French–English (FE-Chunker) and German–English (DE-Chunker) respectively are used to perform chunking on the same English sentences. We construct two test sets, suitable for French–English and German–English respectively. The performance of the two chunkers is evaluated on the appropriate test set; with one reference translation only, we report F-scores of 32.63% for the FE-Chunker and 40.41% for the DE-Chunker.
Wrapper syntax for example-based machine translation
- In Proceedings of the 7th Conference of the Association for Machine Translation in the Americas
, 2006
Cited by 3 (1 self)
TransBooster is a wrapper technology designed to improve the performance of wide-coverage machine translation systems. Using linguistically motivated syntactic information, it automatically decomposes source language sentences into shorter and syntactically simpler chunks, and recomposes their translations to form target language sentences. This generally improves both the word order and lexical selection of the translation. To date, TransBooster has been successfully applied to rule-based MT, statistical MT, and multi-engine MT. This paper presents the application of TransBooster to Example-Based Machine Translation. In an experiment conducted on test sets extracted from Europarl and the Penn II Treebank we show that our method can raise the BLEU score by up to 3.8% relative to the EBMT baseline. We also conduct a manual evaluation, showing that TransBooster-enhanced EBMT produces better output than the baseline EBMT in terms of fluency in 55% of the cases and in terms of accuracy in 53% of the cases.
Improving the Quality of Automated DVD Subtitles via Example-Based Machine Translation
, 2006
Cited by 3 (0 self)
Denoual (2005) discovered that, contrary to popular belief, an EBMT system trained on heterogeneous data produced significantly better results than a system trained on homogeneous data. Using similar evaluation metrics and a few additional ones, in this paper we show that this does not hold true for the automated translation of subtitles. In fact, our system (when trained on homogeneous data) shows a relative increase of 74% in BLEU in the language direction German-English and 86% in BLEU for English-German. Furthermore, we show that increasing the amount of heterogeneous data results in ‘bad examples’ being put forward as translation candidates, thus lowering the translation quality.
The use of
- STL and STL extensions in CGAL
, 1998
Cited by 3 (0 self)
Before children can ride a bicycle or tie their shoes, they have learned a great deal about how words are combined to form complex sentences. This achievement is especially impressive because children acquire most of this syntactic knowledge with little or no direct instruction. Nevertheless, mastering natural language syntax may be among the most difficult learning tasks
Learning to Translate: A Psycholinguistic Approach to the Induction of Grammars and Transfer Functions
, 1995
Cited by 2 (1 self)
... identified many constraints on the form and processing of human languages. By incorporating these constraints into a language learning system, it is possible to build a system that learns to translate (infers functions and grammars for machine translation) from an aligned bilingual corpus of sentences using understandable, symbolic linguistic principles and representations. This work focuses on one particular constraint, the Marker Hypothesis, which is shown to be powerful, understandable, and computationally accessible. This hypothesis has been incorporated into a family of systems that infer such transfer functions using standard multivariate optimization techniques. These systems have been tested on a variety of language pairs and corpora, demonstrating the language and corpus independence of this approach. Furthermore, the design principles are in theory independent of any particular inference technique or grammatical representation and reflect only the constraints of the Marke
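
The last abstract names the Marker Hypothesis without spelling it out. As commonly stated, the hypothesis holds that closed-class "marker" words (determiners, prepositions, conjunctions, pronouns) signal the syntactic structure of a sentence, so they can serve as chunk boundaries. The following is a minimal illustrative sketch of marker-based chunking under that reading, not the system described in the abstract. The marker list `MARKERS` and the function `marker_chunks` are hypothetical names introduced here, and the word list is deliberately tiny.

```python
# Hypothetical, deliberately small marker list (an assumption for illustration);
# a real system would use full closed-class lexicons per language.
MARKERS = {
    "the", "a", "an",                        # determiners
    "in", "on", "of", "to", "with", "for",   # prepositions
    "and", "or", "but",                      # conjunctions
    "he", "she", "it", "they",               # pronouns
}


def marker_chunks(sentence):
    """Split a whitespace-tokenised sentence into marker-initiated chunks.

    A marker word opens a new chunk only if the current chunk already
    holds at least one non-marker (content) word, so runs of adjacent
    markers such as "in the" stay together. A sentence that ends in a
    marker will yield a final marker-only chunk.
    """
    chunks, current, has_content = [], [], False
    for token in sentence.split():
        is_marker = token.lower() in MARKERS
        if is_marker and has_content:
            # Close the current chunk and start a new one at this marker.
            chunks.append(current)
            current, has_content = [], False
        current.append(token)
        if not is_marker:
            has_content = True
    if current:
        chunks.append(current)
    return [" ".join(chunk) for chunk in chunks]


print(marker_chunks("the man in the park walked to the shop with a dog"))
# ['the man', 'in the park walked', 'to the shop', 'with a dog']
```

Chunks produced this way on both sides of a sentence-aligned corpus could then be paired up as candidate translation units; how that pairing is done is outside the scope of this sketch.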

