2006b. Empirical lower bounds on the complexity of translational equivalence
- In Proceedings of the 44th Annual Meeting of the Association for Computational Linguistics (ACL
Cited by 1 (1 self)
This paper describes a study of the patterns of translational equivalence exhibited by a variety of bitexts. The study found that the complexity of these patterns in every bitext was higher than suggested in the literature. These findings shed new light on why “syntactic” constraints have not helped to improve statistical translation models, including finite-state phrase-based models, tree-to-string models, and tree-to-tree models. The paper also presents evidence that inversion transduction grammars cannot generate some translational equivalence relations, even in relatively simple real bitexts in syntactically similar languages with rigid word order. Instructions for replicating our experiments are at
Multimedia Content Processing and Retrieval in the REVEAL THIS setting
The explosion of multimedia digital content and the development of technologies that go beyond traditional broadcast and TV have rendered access to such content important for all end-users of these technologies. REVEAL THIS develops content processing technology able to semantically index, categorise and cross-link multiplatform, multimedia and multilingual digital content, providing the system user with search, retrieval, summarisation and translation functionalities. Index Terms: audio-image-text analysis, cross-media linking and indexing, cross-media categorisation, cross-media summarisation, cross-lingual translation.
Learning an Expert from Human Annotations in Statistical Machine Translation: the Case of Out-of-Vocabulary Words
We present a general method for incorporating an “expert” model into a Statistical Machine Translation (SMT) system, in order to improve its performance on a particular “area of expertise”, and apply this method to the specific task of finding adequate replacements for Out-of-Vocabulary (OOV) words. Candidate replacements are paraphrases and entailed phrases, obtained using monolingual resources. These candidate replacements are transformed into “dynamic biphrases”, generated at decoding time based on the context of each source sentence. Standard SMT features are enhanced with a number of new features aimed at scoring translations produced by using different replacements. Active learning is used to discriminatively train the model parameters from human assessments of the quality of translations. The learning framework yields an SMT system which is able to deal with sentences containing OOV words but also guarantees that the performance is not degraded for input sentences without OOV words. Results of experiments on English-French translation show that this method outperforms previous work addressing OOV words in terms of acceptability.
Complexity-Based Phrase-Table Filtering for Statistical Machine Translation
We describe an approach for filtering phrase tables in a Statistical Machine Translation system, which relies on a statistical independence measure called Noise, first introduced in (Moore, 2004). While previous work by (Johnson et al., 2007) also addressed the question of phrase table filtering, it relied on a simpler independence measure, the p-value, which is theoretically less satisfying than the Noise in this context. In this paper, we use Noise as the filtering criterion, and show that when we partition the bi-phrase tables into several sub-classes according to their complexity, using Noise leads to improvements in BLEU score that are unreachable using p-value, while allowing a similar amount of pruning of the phrase tables.
Generative Models of Monolingual and Bilingual Gappy Patterns
A growing body of machine translation research aims to exploit lexical patterns (e.g., n-grams and phrase pairs) with gaps (Simard et al., 2005; Chiang, 2005; Xiong et al., 2011). Typically, these “gappy patterns” are discovered using heuristics based on word alignments or local statistics such as mutual information. In this paper, we develop generative models of monolingual and parallel text that build sentences using gappy patterns of arbitrary length and with arbitrarily many gaps. We exploit Bayesian nonparametrics and collapsed Gibbs sampling to discover salient patterns in a corpus. We evaluate the patterns qualitatively and also add them as features to an MT system, reporting promising preliminary results.
Chunk alignment for Corpus-Based Machine Translation
, 2010
Since sub-sentential alignment is critically important to the translation quality of an Example-Based Machine Translation (EBMT) system, which operates by finding and combining phrase-level matches against the training examples, we developed a new alignment algorithm for the purpose of improving the EBMT system’s performance. This new Symmetric Probabilistic Alignment (SPA) algorithm treats the source and target languages in a symmetric fashion. We describe our basic algorithm and its primary extensions that enable use of surrounding context, and of positional preference information, compare its alignment accuracy with IBM Model 4, and report on experiments in which either IBM Model 4 or SPA alignments are substituted for the aligner currently built into the EBMT system. Both Model 4 and SPA are significantly better than the internal aligner. Then we extend SPA to exploit external alignment information from Moses and to output non-contiguous target phrases. We also alter SPA so that the weights for its feature scores are tuned using minimum error rate training. Our experiments show that exploiting
Productive Generation of Compound Words in Statistical Machine Translation
In many languages the use of compound words is very productive. A common practice to reduce sparsity consists in splitting compounds in the training data. When this is done, the system incurs the risk of translating components in non-consecutive positions, or in the wrong order. Furthermore, a post-processing step of compound merging is required to reconstruct compound words in the output. We present a method for increasing the chances that components that should be merged are translated into contiguous positions and in the right order. We also propose new heuristic methods for merging components that outperform all known methods, and a learning-based method that has accuracy similar to that of the heuristic method, is better at producing novel compounds, and can operate with no background linguistic resources.
Fundamental and New Approaches to Statistical Machine Translation
Statistical Machine Translation (SMT) is an approach to automatic text translation based on the use of statistical models and examples of translations. Although Machine Translation (MT) systems developed according to other paradigms are still in use, mainly rule-based or example-based MT, SMT dominates academic
A Dataset for Assessing Machine Translation Evaluation Metrics
We describe a dataset containing 16,000 translations produced by four machine translation systems and manually annotated for quality by professional translators. This dataset can be used in a range of tasks assessing machine translation evaluation metrics, from basic correlation analysis to training and test of machine learning-based metrics. By providing a standard dataset for such tasks, we hope to encourage the development of better MT evaluation metrics.

