Results 1 - 10 of 22
Inducing Crosslingual Distributed Representations of Words, 2012
Cited by 31 (1 self)
Distributed representations of words have proven extremely useful in numerous natural language processing tasks. Their appeal is that they can help alleviate data sparsity problems common to supervised learning. Methods for inducing these representations require only unlabeled language data, which are plentiful for many natural languages. In this work, we induce distributed representations for a pair of languages jointly. We treat it as a multitask learning problem where each task corresponds to a single word, and task relatedness is derived from co-occurrence statistics in bilingual parallel data. These representations can be used for a number of crosslingual learning tasks, where a learner can be trained on annotations present in one language and applied to test data in another. We show that our representations are informative by using them for crosslingual document classification, where classifiers trained on these representations substantially outperform strong baselines (e.g. machine translation) when applied to a new language.
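The multitask formulation above can be illustrated with a toy sketch, not the paper's actual method: each aligned word pair is pulled together in embedding space, weighted by its co-occurrence count in word-aligned parallel data. The vocabulary, counts, dimensionality, and squared-distance penalty below are all invented for illustration.

```python
import random

# Toy sketch (invented data): each word is a "task", and co-occurrence
# counts from parallel data set how strongly an aligned pair's vectors
# are pulled together.
random.seed(0)
DIM = 4

emb = {w: [random.uniform(-0.5, 0.5) for _ in range(DIM)]
       for w in ["dog", "cat", "Hund", "Katze"]}
cooc = {("dog", "Hund"): 10, ("cat", "Katze"): 8, ("dog", "Katze"): 1}
total = sum(cooc.values())

def dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

d0 = dist(emb["dog"], emb["Hund"])        # distance before training
lr = 0.1
for _ in range(200):
    for (s, t), c in cooc.items():
        w = c / total                     # task-relatedness weight
        es, et = emb[s], emb[t]
        for i in range(DIM):
            g = 2.0 * (es[i] - et[i])     # gradient of w * ||es - et||^2
            es[i] -= lr * w * g
            et[i] += lr * w * g
# frequently co-occurring pairs end up with nearby vectors
```

The weighting is the point of the sketch: a strongly co-occurring pair like dog/Hund contracts much faster than a weakly linked pair, which mirrors deriving task relatedness from bilingual co-occurrence statistics.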
Learning a Phrase-based Translation Model from Monolingual Data with Application to Domain Adaptation
Cited by 5 (0 self)
Currently, almost all of the statistical machine translation (SMT) models are trained with the parallel corpora in some specific domains. However, when it comes to a language pair or a different domain without any bilingual resources, the traditional SMT loses its power. Recently, some research works study the unsupervised SMT for inducing a simple word-based translation model from the monolingual corpora. It successfully bypasses the constraint of bitext for SMT and obtains a relatively promising result. In this paper, we take a step forward and propose a simple but effective method to induce a phrase-based model from the monolingual corpora given an automatically-induced translation lexicon or a manually-edited translation dictionary. We apply our method for the domain adaptation task and the extensive experiments show that our proposed method can substantially improve the translation quality.
Supervised Bilingual Lexicon Induction with Multiple Monolingual Signals
Cited by 4 (1 self)
Prior research into learning translations from source and target language monolingual texts has treated the task as an unsupervised learning problem. Although many techniques take advantage of a seed bilingual lexicon, this work is the first to use that data for supervised learning to combine a diverse set of signals derived from a pair of monolingual corpora into a single discriminative model. Even in a low resource machine translation setting, where induced translations have the potential to improve performance substantially, it is reasonable to assume access to some amount of data to perform this kind of optimization. Our work shows that only a few hundred translation pairs are needed to achieve strong performance on the bilingual lexicon induction task, and our approach yields an average relative gain in accuracy of nearly 50% over an unsupervised baseline. Large gains in accuracy hold for all 22 languages (low and high resource) that we investigate.
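The supervised combination step can be illustrated with a minimal perceptron over a few monolingual signal scores. This is only a sketch: the paper uses a richer discriminative model, and the feature names and values below are invented.

```python
# Hypothetical sketch: combine monolingual signals into one supervised
# classifier over candidate translation pairs. Features (invented):
# [contextual_similarity, temporal_similarity, orthographic_similarity]
train = [
    ([0.8, 0.7, 0.9], 1),   # correct translation pair
    ([0.2, 0.3, 0.1], 0),   # wrong pair
    ([0.7, 0.6, 0.8], 1),
    ([0.3, 0.2, 0.2], 0),
]

w = [0.0, 0.0, 0.0]
b = 0.0
for _ in range(20):                       # perceptron training passes
    for x, y in train:
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        pred = 1 if score > 0 else 0
        if pred != y:                     # update only on mistakes
            for i in range(len(w)):
                w[i] += (y - pred) * x[i]
            b += (y - pred)

def predict(x):
    """Classify a candidate pair from its signal scores."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
```

The point the abstract makes is that even a small labeled seed (here, four pairs) is enough to weight the signals jointly rather than relying on any one of them unsupervised.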
How to make words with vectors: Phrase generation in distributional semantics
Cited by 3 (0 self)
We introduce the problem of generation in distributional semantics: Given a distributional vector representing some meaning, how can we generate the phrase that best expresses that meaning? We motivate this novel challenge on theoretical and practical grounds and propose a simple data-driven approach to the estimation of generation functions. We test this in a monolingual scenario (paraphrase generation) as well as in a cross-lingual setting (translation by synthesizing adjective-noun phrase vectors in English and generating the equivalent expressions in Italian).
Graph-based Semi-Supervised Learning of Translation Models from Monolingual Data
Cited by 2 (0 self)
Statistical phrase-based translation learns translation rules from bilingual corpora, and has traditionally only used monolingual evidence to construct features that rescore existing translation candidates. In this work, we present a semi-supervised graph-based approach for generating new translation rules that leverages bilingual and monolingual data. The proposed technique first constructs phrase graphs using both source and target language monolingual corpora. Next, graph propagation identifies translations of phrases that were not observed in the bilingual corpus, assuming that similar phrases have similar translations. We report results on a large Arabic-English system and a medium-sized Urdu-English system. Our proposed approach significantly improves the performance of competitive phrase-based systems, leading to consistent improvements between 1 and 4 BLEU points on standard evaluation sets.
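The propagation step can be sketched as plain label propagation over a phrase graph: phrases seen in the bitext carry seed translation distributions, and unlabeled phrases inherit a weighted mix from their monolingual neighbours. The phrases, edge weights, and seed entry below are invented for illustration.

```python
# Hypothetical sketch of graph propagation for translation rules.
# Seeds: translation distributions observed in the bilingual corpus.
seeds = {"big dog": {"gros chien": 1.0}}
# Graph: phrase -> [(neighbour, monolingual similarity weight)]
graph = {
    "large dog": [("big dog", 0.9), ("small cat", 0.1)],
    "small cat": [("large dog", 0.1)],
}

def propagate(seeds, graph, iters=20):
    """Spread seed translation distributions to unlabeled phrases."""
    labels = dict(seeds)
    for _ in range(iters):
        for node, nbrs in graph.items():
            if node in seeds:             # seed labels stay clamped
                continue
            agg, norm = {}, 0.0
            for nbr, w in nbrs:           # weighted mix of neighbours
                for trans, p in labels.get(nbr, {}).items():
                    agg[trans] = agg.get(trans, 0.0) + w * p
                norm += w * sum(labels.get(nbr, {}).values())
            if norm > 0:                  # renormalize to a distribution
                labels[node] = {t: v / norm for t, v in agg.items()}
    return labels

result = propagate(seeds, graph)
# "large dog" never appeared in the bitext, yet receives "gros chien"
```

This captures the abstract's assumption directly: similar phrases (high-weight neighbours) end up with similar translation distributions.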
Dependency-Based Decipherment for Resource-Limited Machine Translation
Cited by 2 (0 self)
We introduce dependency relations into deciphering foreign languages and show that dependency relations help improve the state-of-the-art deciphering accuracy by over 500%. We learn a translation lexicon from large amounts of genuinely non-parallel data with decipherment to improve a phrase-based machine translation system trained with limited parallel data. In experiments, we observe BLEU gains of 1.2 to 1.8 across three different test sets.
Combining Bilingual and Comparable Corpora for Low Resource Machine Translation
Cited by 2 (1 self)
Statistical machine translation (SMT) performance suffers when models are trained on only small amounts of parallel data. The learned models typically have both low accuracy (incorrect translations and feature scores) and low coverage (high out-of-vocabulary rates). In this work, we use an additional data resource, comparable corpora, to improve both. Beginning with a small bitext and corresponding phrase-based SMT model, we improve coverage by using bilingual lexicon induction techniques to learn new translations from comparable corpora. Then, we supplement the model’s feature space with translation scores estimated over comparable corpora in order to improve accuracy. We observe improvements between 0.5 and 1.7 BLEU translating Tamil, Telugu,
Hallucinating Phrase Translations for Low Resource MT
Cited by 2 (1 self)
We demonstrate that “hallucinating” phrasal translations can significantly improve the quality of machine translation in low resource conditions. Our hallucinated phrase tables consist of entries composed from multiple unigram translations drawn from the baseline phrase table and from translations that are induced from monolingual corpora. The hallucinated phrase table is very noisy. Its translations are low precision but high recall. We counter this by introducing 30 new feature functions (including a variety of monolingually-estimated features) and by aggressively pruning the phrase table. Our analysis evaluates the intrinsic quality of our hallucinated phrase pairs as well as their impact in end-to-end Spanish-English and Hindi-English MT.
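The compositional part of hallucination — building candidate phrase translations out of per-word translations — can be sketched as a Cartesian product over unigram candidate lists. The lexicon entries below are invented, and the paper's feature scoring and pruning stages are omitted; the sketch only shows why the output is high recall but noisy.

```python
from itertools import product

# Hypothetical unigram translation lexicon (invented entries);
# in the paper these come from the baseline phrase table and from
# translations induced from monolingual corpora.
unigram = {
    "small": ["petit", "petite"],
    "dog": ["chien"],
}

def hallucinate(phrase):
    """Compose phrase translations from per-word candidates.
    Out-of-lexicon words pass through unchanged."""
    options = [unigram.get(w, [w]) for w in phrase.split()]
    return [" ".join(combo) for combo in product(*options)]
```

Every combination of word translations is emitted ("petite chien" is wrong in French but still generated), which is exactly the low-precision, high-recall behaviour the pruning and feature functions are introduced to counter.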
Beyond Parallel Data: Joint Word Alignment and Decipherment Improves Machine Translation
Cited by 1 (0 self)
Inspired by previous work, where decipherment is used to improve machine translation, we propose a new idea to combine word alignment and decipherment into a single learning process. We use EM to estimate the model parameters, not only to maximize the probability of the parallel corpus but also that of the monolingual corpus. We apply our approach to improve Malagasy-English machine translation, where only a small amount of parallel data is available. In our experiments, we observe gains of 0.9 to 2.1 BLEU over a strong baseline.
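The word-alignment half of this setup — EM over a parallel corpus — can be sketched as a toy IBM-Model-1-style estimator. The decipherment term and the joint objective from the paper are omitted, and the two-sentence corpus is invented.

```python
from collections import defaultdict

# Toy parallel corpus (invented English-French pairs)
bitext = [
    (["the", "dog"], ["le", "chien"]),
    (["the", "cat"], ["le", "chat"]),
]

# t[(f, e)] ~ P(f | e); a uniform constant works as initialization
# because the E-step normalizes over each sentence.
t = defaultdict(lambda: 1.0)
for _ in range(10):
    count = defaultdict(float)            # expected pair counts
    total = defaultdict(float)
    for es, fs in bitext:
        for f in fs:
            z = sum(t[(f, e)] for e in es)     # E-step: soft alignments
            for e in es:
                c = t[(f, e)] / z
                count[(f, e)] += c
                total[e] += c
    for (f, e), c in count.items():            # M-step: renormalize
        t[(f, e)] = c / total[e]

# "le" co-occurs with "the" in both pairs, so EM pins it to "the"
```

The paper's contribution is to run this kind of EM so that the same parameters also explain a monolingual corpus; the sketch shows only the parallel-corpus term of that objective.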