A Matching Technique in Example-Based Machine Translation
, 1994
Cited by 16 (0 self)
This paper addresses an important problem in Example-Based Machine Translation (EBMT), namely how to measure similarity between a sentence fragment and a set of stored examples. A new method is proposed that measures similarity according to both surface structure and content. A second contribution is the use of clustering to make retrieval of the best matching example from the database more efficient. Results on a large number of test cases from the CELEX database are presented.
EBMT Seen as Case-based Reasoning
- In (Carl & Way
, 2001
Cited by 6 (0 self)
This paper looks at EBMT from the perspective of the Case-based Reasoning (CBR) paradigm. We attempt to describe the task of machine translation (MT) seen as a potential application of CBR, and attempt to describe MT in standard CBR terms. The aim is to see if other applications of CBR can suggest better ways to approach EBMT.
Stone Soup Translation: The Linked Automata Model
, 2002
Cited by 4 (0 self)
The automated translation of one natural language to another, known as machine translation (MT), typically requires successful modeling of the grammars of the languages and the relationship between them. Rather than hand-coding these grammars and relationships, some machine translation efforts employ data-driven methods, where the goal is to learn from a large amount of training examples of accurate translations. One such data-driven approach is statistical MT, where language and alignment models are automatically induced from parallel corpora. This work has also been extended to probabilistic finite-state approaches, most often via transducers.
Low-cost, High-performance Translation Retrieval: Dumber Is Better
- In Proc. of the 39th Annual Meeting of the ACL and 10th Conference of the EACL (ACL-EACL
, 2001
Cited by 3 (1 self)
In this paper, we compare the relative effects of segment order, segmentation and segment contiguity on the retrieval performance of a translation memory system. We take a selection of both bag-of-words and segment order-sensitive string comparison methods, and run each over both character- and word-segmented data, in combination with a range of local segment contiguity models (in the form of N-grams).
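As a rough illustration of how N-grams can add local contiguity to an otherwise order-insensitive comparison, the sketch below scores two segmented strings with a Dice coefficient over bags of N-grams. The Dice coefficient and the helper names `ngrams` and `dice` are assumptions chosen for illustration; the abstract does not say which comparison methods the paper uses.

```python
from collections import Counter


def ngrams(segments, n):
    """Return the contiguous n-grams of a sequence of segments."""
    return [tuple(segments[i:i + n]) for i in range(len(segments) - n + 1)]


def dice(a, b):
    """Dice coefficient over two bags (multisets) of items."""
    bag_a, bag_b = Counter(a), Counter(b)
    total = sum(bag_a.values()) + sum(bag_b.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count of each shared item.
    overlap = sum((bag_a & bag_b).values())
    return 2 * overlap / total


# Word segmentation; list("...") would give character segmentation instead.
x = "a b c".split()
y = "c b a".split()
unigram_score = dice(ngrams(x, 1), ngrams(y, 1))  # ignores order entirely
bigram_score = dice(ngrams(x, 2), ngrams(y, 2))   # sensitive to local order
```

With unigrams the two reordered strings look identical; with bigrams they share nothing, which is the sense in which an N-gram model captures segment contiguity.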
The Effects of Word Order and Segmentation on Translation Retrieval Performance
- In Proc. of the 18th International Conference on Computational Linguistics (COLING 2000
, 2000
Cited by 3 (1 self)
This research looks at the effects of word order and segmentation on translation retrieval performance for an experimental Japanese-English translation memory system. We implement a number of both bag-of-words and word order-sensitive similarity metrics, and test each over character-based and word-based indexing. The translation retrieval performance of each system configuration is evaluated empirically through the notion of word edit distance between translation candidate outputs and the model translation. Our results indicate that character-based indexing is consistently superior to word-based indexing, suggesting that segmentation is an unnecessary luxury in the given domain. Word order-sensitive approaches are demonstrated to generally outperform bag-of-words methods, with source language segment-level edit distance proving the most effective similarity metric.
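A minimal sketch of segment-level edit distance used for translation retrieval follows, assuming plain Levenshtein distance with unit costs. The function names, the toy English memory, and the whitespace word segmenter are all hypothetical; the paper's own system works on Japanese-English data and may weight or normalise the distance differently.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of segments (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def retrieve(query, memory, segment):
    """Return the stored source string closest to the query."""
    q = segment(query)
    return min(memory, key=lambda src: edit_distance(q, segment(src)))


char_segments = list        # character-based indexing
word_segments = str.split   # word-based indexing (whitespace only)

memory = ["open the file", "close the window", "open the window"]
best_by_word = retrieve("open a window", memory, word_segments)
best_by_char = retrieve("open a window", memory, char_segments)
```

Character segmentation needs no segmenter at all, which matters for languages such as Japanese that are written without spaces between words.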
Linguistic Knowledge and Complexity in an EBMT System Based on Translation Patterns
- In Proceedings of the Workshop on EBMT, MT Summit VIII
Cited by 2 (0 self)
An approach to Example-Based Machine Translation is presented which operates by extracting translation patterns from a bilingual corpus aligned at the level of the sentence. This is carried out using a language-neutral recursive machine-learning algorithm based on the principle of similar distributions of strings. The translation patterns extracted represent generalisations of sentences that are translations of each other and, to some extent, resemble transfer rules but with fewer constraints. The strings and variables of which translation patterns are composed are aligned in order to provide a more refined bilingual knowledge source, necessary for the recombination phase. A non-structural approach based on surface forms is error-prone and liable to produce translation patterns that are false translations. Such errors are highlighted and solutions are proposed by the addition of external linguistic resources, namely morphological analysis and part-of-speech tagging. The amount of linguistic resources added has consequences for computational complexity and portability.
Confidence Factor Assignment to Translation Templates
, 1998
Cited by 1 (0 self)
that I have read this thesis and that in my opinion it is fully adequate, in scope
Stone Soup Translation
, 2002
Cited by 1 (1 self)
The automated translation of one natural language to another, known as machine translation, typically requires successful modeling of the grammars of the two languages and of the relationship between them. Rather than hand-coding these grammars and relationships, some machine translation efforts have begun to employ statistical methods, where the goal is to learn from a large amount of training examples of accurate translations. This work has also been extended to probabilistic finite-state approaches, most often via transducers. In this project, a novel combination of finite-state devices is employed. The model proposed, which consists of two probabilistically linked automata, is more flexible than a transducer model, giving increased ability to handle word order differences. In addition to the model and algorithms for its construction and use, we present several increased-coverage techniques, including methods for extracting partial results from the model. We present preliminary results for a test corpus of English to Spanish translations, which suggest the model may serve as a base for rudimentary translation, when used in conjunction with these extensions.
A critique of statistical machine translation
- In Walter Daelemans & Véronique Hoste (eds.), Journal of translation and interpreting studies: Special Issue on Evaluation of Translation Technology, Linguistica Antverpiensia
, 2009
Cited by 1 (1 self)
Phrase-Based Statistical Machine Translation (PB-SMT) is clearly the leading paradigm in the field today. Nevertheless, and this may come as some surprise to the PB-SMT community, most translators, and somewhat more surprisingly perhaps, many experienced MT protagonists, find the basic model extremely difficult to understand. The main aim of this paper, therefore, is to discuss why this might be the case. Our basic thesis is that proponents of PB-SMT do not seek to address any community other than their own, for they do not feel any need to do so. We will demonstrate that this was not always the case; on the contrary, when statistical models of translation were first presented, the language used to describe how such a model might work was very conciliatory and inclusive. Over the next five years things changed considerably; once SMT achieved dominance, particularly over the rule-based paradigm, it had established a position where it did not need to bring along the rest of the MT community with it, and in our view, this has largely pertained to this day. Having discussed these issues, we will provide three additional observations: firstly, we will discuss the role of automatic MT evaluation metrics when describing PB-SMT systems; secondly, we will comment on the recent syntactic embellishments of PB-SMT, noting especially that most of these contributions have come from researchers who have prior experience in fields other than statistical models of translation; and finally, we will briefly comment on the relationship between PB-SMT and other models of translation, suggesting that there are many gains to be had if the SMT community were to open up more to the other MT paradigms.
Translation Pattern Extraction and Recombination for Example-Based Machine Translation
, 2001
An approach to Example-Based Machine Translation is presented which operates by extracting and recombining translation patterns from a bilingual corpus aligned at the level of the sentence. The translation patterns are extracted using a recursive machine-learning algorithm based on the principle of similar distributions of strings: source and target language lexical items that co-occur in the same two sentence-pairs are likely to be translations of each other. The translation patterns extracted represent generalisations of sentences that are translations of each other in that certain sequences of words are replaced by variables. The translation patterns resemble, to a certain extent, transfer rules but with fewer constraints since there is no concept of syntactic structure in this approach: translation patterns are extracted based on the

