Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora
, 1997
Models of Translational Equivalence among Words
 Computational Linguistics
, 2000
Cited by 161 (2 self)
This article presents methods for biasing statistical translation models to reflect these properties. Evaluation with respect to independent human judgments has confirmed that translation models biased in this fashion are significantly more accurate than a baseline knowledge-free model. This article also shows how a statistical translation model can take advantage of pre-existing knowledge that might be available about particular language pairs. Even the simplest kinds of language-specific knowledge, such as the distinction between content words and function words, are shown to reliably boost translation model performance on some tasks. Statistical models that reflect knowledge about the model domain combine the best of both the rationalist and empiricist paradigms.
Translating Collocations for Bilingual Lexicons: A Statistical Approach
, 1996
Decoding Complexity in Word-Replacement Translation Models
 Computational Linguistics
, 1999
"... This paper looks at decoding complexity. ..."
A Word-to-Word Model of Translational Equivalence
, 1997
Cited by 82 (6 self)
Many multilingual NLP applications need to translate words between different languages, but cannot afford the computational expense of inducing or applying a full translation model. For these applications, we have designed a fast algorithm for estimating a partial translation model, which accounts for translational equivalence only at the word level. The model's precision/recall trade-off can be directly controlled via one threshold parameter. This feature makes the model more suitable for applications that are not fully statistical. The model's hidden parameters can be easily conditioned on information extrinsic to the model, providing an easy way to integrate pre-existing knowledge such as part-of-speech tags, dictionaries, word order, etc. Our model can link word tokens in parallel texts as well as other translation models in the literature. Unlike other translation models, it can automatically produce dictionary-sized translation lexicons, and it can do so with over 99% accuracy.
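The single-threshold precision/recall control described in this abstract can be pictured with a small, hypothetical sketch. The Dice coefficient below is a stand-in for the paper's actual link scores, and all names are illustrative assumptions, not the authors' method:

```python
from collections import Counter
from itertools import product

def partial_lexicon(bitext, threshold):
    """Estimate a partial word-to-word lexicon from sentence-aligned text.

    bitext: list of (source_words, target_words) sentence pairs.
    threshold: single parameter trading precision against recall --
    raising it keeps only the strongest word pairs.
    """
    cooc, src_freq, tgt_freq = Counter(), Counter(), Counter()
    for src, tgt in bitext:
        for s in set(src):
            src_freq[s] += 1
        for t in set(tgt):
            tgt_freq[t] += 1
        # Count each source/target word type pair once per sentence pair.
        for s, t in product(set(src), set(tgt)):
            cooc[(s, t)] += 1
    lexicon = {}
    for (s, t), c in cooc.items():
        # Dice coefficient as an illustrative association score.
        score = 2 * c / (src_freq[s] + tgt_freq[t])
        if score >= threshold:
            lexicon[(s, t)] = score
    return lexicon
```

With a high threshold only consistently co-occurring pairs survive (high precision); lowering it admits weaker candidates (higher recall).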
A Statistical View on Bilingual Lexicon Extraction: From Parallel Corpora to Non-Parallel Corpora
 Parallel Text Processing
, 1998
Cited by 68 (3 self)
We present two problems for statistically extracting bilingual lexicons: (1) How can noisy parallel corpora be used? (2) How can non-parallel yet comparable corpora be used? We describe our own work and contribution in relaxing the constraint of using only clean parallel corpora. DKvec is a method for extracting bilingual lexicons from noisy parallel corpora, based on the arrival distances of words. Using DKvec on noisy parallel corpora in English/Japanese and English/Chinese, our evaluations show a 55.35% precision from a small corpus and 89.93% precision from a larger corpus. Our major contribution is in the extraction of bilingual lexicons from non-parallel corpora. We present a first such result in this area, from a new method, Convec. Convec is based on the context information of a word to be translated. We show a 30% to 76% precision when top-one to top-20 translation candidates are considered. Most of the top-20 candidates are either collocations or words rela...
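The "arrival distances" mentioned in this abstract can be illustrated with a toy sketch: record the gaps between successive occurrences of a word, then compare two words' gap sequences. The dynamic-time-warping comparison below is an assumption about how such vectors might be matched, not a reproduction of DKvec itself:

```python
import math

def arrival_distances(tokens, word):
    """Gaps between successive occurrences of `word` in a token list."""
    positions = [i for i, w in enumerate(tokens) if w == word]
    return [b - a for a, b in zip(positions, positions[1:])]

def dtw(x, y):
    """Dynamic-time-warping distance between two arrival-distance vectors.

    Small distances suggest the two words recur with similar rhythm,
    as mutual translations in noisy parallel text tend to do.
    """
    n, m = len(x), len(y)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```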
Automatic Construction Of Clean Broad-Coverage Translation Lexicons
 In Proceedings of the 2nd Conference of the Association for Machine Translation in the Americas
Cited by 67 (9 self)
Word-level translational equivalences can be extracted from parallel texts by surprisingly simple statistical techniques. However, these techniques are easily fooled by indirect associations: pairs of unrelated words whose statistical properties resemble those of mutual translations. Indirect associations pollute the resulting translation lexicons, drastically reducing their precision. This paper presents an iterative lexicon cleaning method. On each iteration, most of the remaining incorrect lexicon entries are filtered out, without significant degradation in recall. This lexicon cleaning technique can produce translation lexicons with recall and precision both exceeding 90%, as well as dictionary-sized translation lexicons that are over 99% correct. Translation lexicons are explicit representations of translational equivalence at the word level. They are central to any machine translation system, and play a vital role in other multilingual applications, including ...
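The iterative filtering idea can be sketched as follows. The competition rule and the `ratio` parameter are illustrative assumptions standing in for the paper's actual re-estimation procedure: an entry is dropped when either of its words participates in a much stronger competing entry, which is exactly the signature of an indirect association:

```python
def clean_lexicon(entries, iterations=3, ratio=0.5):
    """Iteratively filter a scored lexicon {(src, tgt): score}.

    On each pass, drop entries scoring far below the best entry that
    shares either their source or target word (hypothetical rule
    mimicking the removal of indirect associations).
    """
    entries = dict(entries)
    for _ in range(iterations):
        best_src, best_tgt = {}, {}
        for (s, t), sc in entries.items():
            best_src[s] = max(best_src.get(s, 0.0), sc)
            best_tgt[t] = max(best_tgt.get(t, 0.0), sc)
        entries = {
            (s, t): sc
            for (s, t), sc in entries.items()
            if sc >= ratio * max(best_src[s], best_tgt[t])
        }
    return entries
```

Correct entries tend to be each other's strongest partners, so they survive each pass while weaker, indirect links are pruned.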
Finding terminology translations from non-parallel corpora
 In Proceedings of the 5th Annual Workshop on Very Large Corpora
, 1997
Cited by 64 (5 self)
We present a statistical word feature, the Word Relation Matrix, which can be used to find translated pairs of words and terms from non-parallel corpora, across language groups. Online dictionary entries are used as seed words to generate Word Relation Matrices for the unknown words according to correlation measures. Word Relation Matrices are then mapped across the corpora to find translation pairs. Translation accuracies are around 30% when only the top candidate is counted. Nevertheless, the top-20 candidate output gives a 50.9% average increase in accuracy when measured against human translator performance.
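A minimal sketch of the seed-word idea, assuming simple windowed co-occurrence counts and cosine similarity; both are stand-ins for the paper's Word Relation Matrix and correlation measures. Each unknown word is profiled by how often it co-occurs with known dictionary entries, and profiles are then compared across corpora:

```python
import math

def seed_vector(tokens, word, seeds, window=3):
    """Count co-occurrences of `word` with each dictionary seed word
    inside a +/-`window` token span (a simplified stand-in for the
    paper's Word Relation Matrix)."""
    idx = {s: k for k, s in enumerate(seeds)}
    vec = [0] * len(seeds)
    for i, w in enumerate(tokens):
        if w != word:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] in idx:
                vec[idx[tokens[j]]] += 1
    return vec

def cosine(u, v):
    """Cosine similarity between two seed-word profiles."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0
```

Because the seed words are paired across languages by the dictionary, a source word's profile can be compared directly with every target word's profile, and the highest-cosine targets become translation candidates.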
An Algorithm for Simultaneously Bracketing Parallel Texts by Aligning Words
, 1995
Cited by 43 (13 self)
We describe a grammarless method for simultaneously bracketing both halves of a parallel text and giving word alignments, assuming only a translation lexicon for the language pair. We introduce inversion-invariant transduction grammars, which serve as generative models for parallel bilingual sentences with weak order constraints. Focusing on transduction grammars for bracketing, we formulate a normal form, and a stochastic version amenable to a maximum-likelihood bracketing algorithm. Several extensions and experiments are discussed.
Compiling Bilingual Lexicon Entries from a Non-Parallel English-Chinese Corpus
 Proceedings of the Third Workshop on Very Large Corpora
Cited by 39 (2 self)
We propose a novel context heterogeneity similarity measure between words and their translations to help compile bilingual lexicon entries from a non-parallel English-Chinese corpus. Current algorithms for bilingual lexicon compilation rely on occurrence frequencies, length, or positional statistics derived from parallel texts. There is little correlation between such statistics of a word and its translation in non-parallel corpora. On the other hand, we suggest that words with productive context in one language translate to words with productive context in another language, and words with rigid context translate into words with rigid context. Context heterogeneity measures how productive the context of a word is in a given domain, independent of its absolute occurrence frequency in the text. Based on this information, we derive statistics of bilingual word pairs from a non-parallel corpus. These statistics can be used to bootstrap a bilingual dictionary compilation algorithm.
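One plausible reading of the context-heterogeneity measure, sketched under the assumption that it is the number of distinct left and right neighbours of a word divided by the word's frequency (the exact definition is the paper's; this formulation is illustrative):

```python
def context_heterogeneity(tokens, word):
    """Return (left, right) heterogeneity of `word`: the number of
    distinct left/right neighbours divided by the word's frequency.
    Productive-context words score near 1; rigid-context words near 0.
    """
    left, right = set(), set()
    freq = 0
    for i, w in enumerate(tokens):
        if w != word:
            continue
        freq += 1
        if i > 0:
            left.add(tokens[i - 1])
        if i + 1 < len(tokens):
            right.add(tokens[i + 1])
    if freq == 0:
        return (0.0, 0.0)
    return (len(left) / freq, len(right) / freq)
```

Because the measure is a frequency-normalized ratio, it can be compared across two corpora of different sizes and languages, which is what makes it usable on non-parallel text.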