Robust Language Pair-Independent Sub-Tree Alignment
Cited by 5 (0 self)
Data-driven approaches to machine translation (MT) achieve state-of-the-art results. Many syntax-aware approaches, such as Example-Based MT and Data-Oriented Translation, make use of tree pairs aligned at the sub-sentential level. Obtaining sub-sentential alignments manually is time-consuming and error-prone, and requires expert knowledge of both source and target languages. We propose a novel, language pair-independent algorithm which automatically induces alignments between phrase-structure trees. We evaluate the alignments themselves against a manually aligned gold standard, and perform an extrinsic evaluation by using the aligned data to train and test a DOT system. Our results show that translation accuracy is comparable to that of the same translation system trained on manually aligned data, and coverage improves.
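The intrinsic evaluation mentioned in this abstract, comparing induced alignments with a manually aligned gold standard, can be illustrated with a minimal sketch. The abstract does not say which metrics were used; precision, recall, and F1 over alignment links are an assumption here, and the node identifiers and the `score_alignment` helper are hypothetical.

```python
from typing import Set, Tuple

# A link pairs a source-tree node with a target-tree node.
# The node ids used below are made up for illustration.
Link = Tuple[str, str]


def score_alignment(predicted: Set[Link], gold: Set[Link]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) of predicted links against gold links."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data (assumed, not taken from the paper).
gold = {("S1", "S1'"), ("NP2", "NP3'"), ("VP4", "VP2'"), ("N5", "N6'")}
predicted = {("S1", "S1'"), ("NP2", "NP3'"), ("VP4", "VP5'")}

precision, recall, f1 = score_alignment(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Treating links as a set makes the comparison independent of the order in which an aligner emits them.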
Tree-based Target Language Modeling
Cited by 4 (3 self)
In this paper we describe an approach to target language modeling which is based on a large treebank. We assume a bag of bags as input for the target language generation component, leaving it up to this component to decide upon word and phrase order. An experiment with Dutch as target language shows that this approach to candidate translation reranking outperforms standard n-gram modeling, when measuring
Parallel Treebanks in Phrase-Based Statistical Machine Translation
Cited by 3 (1 self)
Given much recent discussion and the shift in focus of the field, it is becoming apparent that the incorporation of syntax is the way forward for the current state-of-the-art in machine translation (MT). Parallel treebanks are a relatively recent innovation and appear to be ideal candidates for MT training material. However, until recently there has been no other means to build them than by hand. In this paper, we describe how we make use of new tools to automatically build a large parallel treebank and extract a set of linguistically motivated phrase pairs from it. We show that adding these phrase pairs to the translation model of a baseline phrase-based statistical MT (PBSMT) system leads to significant improvements in translation quality. We describe further experiments on incorporating parallel treebank information into PBSMT, such as word alignments. We investigate the conditions under which the incorporation of parallel treebank data performs optimally. Finally, we discuss the potential of parallel treebanks in other paradigms of MT.
Removing the Distinction Between a Translation Memory, a Bilingual Dictionary and a Parallel Corpus
Cited by 2 (2 self)
This paper presents a prototype MT system which does not make the distinction between a dictionary, a sub-sentential aligned parallel corpus, and post-edited information (translators' output) like a translation memory. The system is based on the METIS approach (Vandeghinste et al., 2006), and uses an XML-based dictionary format in which not only simple word-to-word translations can be included, but which also contains complex dictionary entries, including discontinuous entries, like idioms and proverbs. The presented prototype is a system that automatically adapts its dictionary and target language corpus depending on the post-edited output as made by the users of the system, and will therefore have a learning curve in its performance.
Accuracy-Based Scoring for DOT: Towards Direct Error Minimization for Data-Oriented Translation
Cited by 1 (1 self)
In this work we present a novel technique to rescore fragments in the Data-Oriented Translation model based on their contribution to translation accuracy. We describe three new rescoring methods, and present the initial results of a pilot experiment on a small subset of the Europarl corpus. This work is a proof-of-concept, and is the first step in directly optimizing translation decisions solely on the hypothesized accuracy of potential translations resulting from those decisions.
(2009a) A critique of statistical machine translation
- in Walter Daelemans & Véronique Hoste (eds.), Journal of translation and interpreting studies: Special Issue on Evaluation of Translation Technology, Linguistica Antverpiensia
Cited by 1 (1 self)
Phrase-Based Statistical Machine Translation (PB-SMT) is clearly the leading paradigm in the field today. Nevertheless (and this may come as some surprise to the PB-SMT community), most translators and, somewhat more surprisingly perhaps, many experienced MT protagonists find the basic model extremely difficult to understand. The main aim of this paper, therefore, is to discuss why this might be the case. Our basic thesis is that proponents of PB-SMT do not seek to address any community other than their own, for they do not feel any need to do so. We will demonstrate that this was not always the case; on the contrary, when statistical models of translation were first presented, the language used to describe how such a model might work was very conciliatory and inclusive. Over the next five years things changed considerably; once SMT achieved dominance, particularly over the rule-based paradigm, it had established a position where it did not need to bring along the rest of the MT community with it, and in our view, this has largely pertained to this day. Having discussed these issues, we will provide three additional observations: firstly, we will discuss the role of automatic MT evaluation metrics when describing PB-SMT systems; secondly, we will comment on the recent syntactic embellishments of PB-SMT, noting especially that most of these contributions have come from researchers who have prior experience in fields other than statistical models of translation; and finally, we will briefly comment on the relationship between PB-SMT and other models of translation, suggesting that there are many gains to be had if the SMT community were to open up more to the other MT paradigms.
Bottom-up transfer in Example-based Machine Translation
Cited by 1 (0 self)
This paper describes the transfer component of a syntax-based Example-based Machine Translation system. The source sentence parse tree is matched in a bottom-up fashion with the source language side of a parallel example treebank, which results in a target forest which is sent to the target language generation component. The results on a 500-sentence test set are compared with a top-down approach to transfer of the same system, with the bottom-up approach yielding much better results.
Linguistic and Statistical Extensions of Data Oriented Parsing
2006
This thesis explores certain linguistic and statistical extensions of Data-Oriented Parsing (DOP). The central idea in DOP is to analyse new input on the basis of a collection of fragment-probability pairs. In its simplest version, Tree-DOP, the fragments used are subparts of simple phrase structure trees. Resolving ambiguity (i.e. selecting the optimal analysis) involves identifying the Most Probable Parse (MPP). Though empirical evaluation has shown state-of-the-art results, the linguistic expressive mechanism of this model is very limited. In addition, the algorithm used to compute the MPP has been shown to suffer from several disadvantages. The aim of the thesis is two-fold. In the first part, we seek to explore how the linguistic dimension of DOP can be enhanced. To this end, we investigate how the framework can be applied to representations based on a richer annotation scheme, specifically that of Head-driven Phrase Structure Grammar (HPSG). This investigation culminates in the development of an HPSG-DOP model, which takes maximal advantage of the underlying formalism. The proposed model embodies a number of positive characteristics
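As a rough illustration of the Most Probable Parse idea, here is a toy sketch under the usual Tree-DOP assumption that a derivation's probability is the product of its fragment probabilities and a parse's probability is the sum over all derivations that yield it. The fragment probabilities, parse labels, and function names are hypothetical, and brute-force enumeration of derivations is not the algorithm discussed in the thesis.

```python
from collections import defaultdict
from math import prod
from typing import Dict, List, Tuple

# Each derivation: (label of the parse it yields, probabilities of the fragments it uses).
# All numbers are toy values chosen for illustration.
Derivation = Tuple[str, List[float]]

derivations: List[Derivation] = [
    ("parse_A", [0.5, 0.2]),  # derivation probability 0.10
    ("parse_A", [0.1, 0.3]),  # derivation probability 0.03
    ("parse_B", [0.4, 0.3]),  # derivation probability 0.12
]


def parse_probabilities(derivs: List[Derivation]) -> Dict[str, float]:
    """Sum derivation probabilities per parse."""
    totals: Dict[str, float] = defaultdict(float)
    for parse, fragment_probs in derivs:
        totals[parse] += prod(fragment_probs)
    return dict(totals)


def most_probable_parse(derivs: List[Derivation]) -> str:
    """Return the parse with the highest summed probability."""
    probs = parse_probabilities(derivs)
    return max(probs, key=probs.get)


probs = parse_probabilities(derivations)
mpp = most_probable_parse(derivations)
print(probs, mpp)
```

In this toy data the single most probable derivation belongs to parse_B, yet parse_A has the higher summed probability, which is why the MPP cannot in general be read off the best derivation alone.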
Abstract
In this paper we describe and evaluate a top-down transfer component of a hybrid example-based machine translation system with an architecture similar to that of transfer MT systems, but with automatically derived transfer rules and dictionary entries based on a parallel treebank. The tests were carried out on the Dutch-to-English translation pair. Evaluation and error analysis have shown that the top-down transfer process has a number of shortcomings on which we wish to report and which we will try to solve in future work by applying bottom-up transfer.
Scaling up a hybrid MT system: From low to full resources
This article describes a hybrid approach to machine translation (MT) that is inspired by the rule-based, statistical, example-based, and other hybrid machine translation approaches currently used or described in academic literature. It describes how the approach was implemented for language pairs using only limited monolingual resources and hardly any parallel resources (the METIS-II system), and how it is currently implemented with rich resources on both the source and target side as well as rich parallel data (the PaCo-MT system). We aim to illustrate that a similar paradigm can be used, irrespective of the resources available, but of course with an impact on translation quality.

