Enriching Morphologically Poor Languages for Statistical Machine Translation
, 2008
Cited by 14 (0 self)
We address the problem of translating from morphologically poor to morphologically rich languages by adding per-word linguistic information to the source language. We use the syntax of the source sentence to extract information for noun cases and verb persons and annotate the corresponding words accordingly. In experiments, we show improved performance for translating from English into Greek and Czech. For English–Greek, we reduce the error on the verb conjugation from 19% to 5.4% and noun case agreement from 9% to 6%.
Data-Oriented Models of Parsing and Translation
, 2005
Cited by 12 (2 self)
A dissertation submitted in fulfilment of the requirements for the award of
An efficient phrase-to-phrase alignment model for arbitrarily long phrase and large corpora
- In Proceedings of the 10th Conference of the European Association for Machine Translation (EAMT-05)
, 2005
Cited by 10 (5 self)
Most statistical machine translation (SMT) systems use phrase-to-phrase translations to capture local context information, leading to better lexical choices and more reliable word reordering. Long phrases capture more context than short phrases and result in better translation quality. On the other hand, the increasing amount of bilingual data poses serious problems for storing all possible phrases. In this paper, we describe a novel phrase-to-phrase alignment model which allows for arbitrarily long phrases and works for very large bilingual corpora. This model is very efficient in both time and space, and the resulting translations are better than those of state-of-the-art systems.
Evaluating evaluation methods for generation in the presence of variation
- in Proceedings of CICLing 2005
, 2005
Cited by 8 (1 self)
Recent years have seen increasing interest in automatic metrics for the evaluation of generation systems. When a system can generate syntactic variation, automatic evaluation becomes more difficult. In this paper, we compare the performance of several automatic evaluation metrics using a corpus of automatically generated paraphrases. We show that these evaluation metrics can at least partially measure adequacy (similarity in meaning), but are not good measures of fluency (syntactic correctness). We make several proposals for improving the evaluation of generation systems that produce variation.
Maximal Lattice Overlap in Example-Based Machine Translation
, 2003
Cited by 2 (1 self)
Example-Based Machine Translation (EBMT) retrieves pre-translated phrases from a sentence-aligned bilingual training corpus to translate new input sentences. EBMT uses long pre-translated phrases effectively but is subject to disfluencies at phrasal translation boundaries. We address this problem by introducing a novel method that exploits overlapping phrasal translations and the increased confidence in translation accuracy they imply. We specify an efficient algorithm for producing translations using overlap. Finally, our empirical analysis indicates that this approach produces higher quality translations than the standard method of EBMT in a peak-to-peak comparison.
Using Patterns for Machine Translation (MT)
- In Proceedings of the European Association for Machine Translation
, 2006
Cited by 1 (0 self)
In this paper an innovative approach is presented for MT, which is based on pattern matching techniques, relies on extensive target language monolingual corpora and employs a series of similarity weights between the source and the target language. Our system is based on the notion of ‘patterns’, which are viewed as ‘models’ of target language strings, whose final form is defined by the corpus.
LANGUAGE MODEL ADAPTATION FOR AUTOMATIC SPEECH RECOGNITION AND STATISTICAL MACHINE TRANSLATION
, 2004
Cited by 1 (0 self)
Language modeling is critical and indispensable for many natural language applications such as automatic speech recognition and machine translation. Due to the complexity of natural language grammars, it is almost impossible to construct language models by a set of linguistic rules; therefore statistical techniques have been dominant for language modeling over the last few decades. All statistical modeling techniques, in principle, work under some conditions: 1) a reasonable amount of training data is available and 2) the training data comes from the same population as the test data to which we want to apply our model. Based on observations from the training data, we build statistical models, and therefore the success of a statistical model is crucially dependent on the training data. In other words, if we don’t have enough data for training, or the training data is not matched with the test data, we are not able to build accurate statistical models. This thesis presents novel methods to cope with those problems in language modeling: language model adaptation.
Summaries and the Process of Summarization from Evaluation of Automatic Text Summarization- A practical implementation
Text summarization (or rather, automatic text summarization) is the technique where a computer automatically creates an abstract, or summary, of one or more texts. The initial interest in automatic shortening of texts was spawned during the sixties in American research libraries. A large amount of scientific papers and books were to be digitally
Towards Interactive and Automatic Refinement of Translation Rules
Although Machine Translation (MT) has advanced recently for language pairs with large amounts of parallel data, translation quality has not yet reached satisfactory levels, especially not for resource-poor languages with little if any parallel text to train statistical or example-based MT systems. Rule-based transfer MT systems are the only feasible solution for resource-poor scenarios. However, it can prove very costly and time-consuming to have trained computational linguists with knowledge of both languages refine and extend translation rule sets manually. If the translation rules are written manually, no matter how many rules there are, coverage and accuracy can always be increased. If they are automatically learned, they might be either too general or too specific. Either way, in the face of unseen examples, the translation rules will need to be refined to account for new data. Thus, the goal of this thesis is to generalize post-edition efforts in an effective way, by identifying and correcting rules semi-automatically to improve coverage and overall translation quality, especially for resource-poor languages.

