Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora
, 1997
Termight: Identifying and Translating Technical Terminology
, 1994
Cited by 80 (1 self)
We propose a semi-automatic tool, termight, that helps professional translators and terminologists identify technical terms and their translations. The tool makes use of part-of-speech tagging and word-alignment programs to extract candidate terms and their translations. Although the extraction programs are far from perfect, it isn't too hard for the user to separate the wheat from the chaff. The extraction algorithms emphasize completeness; alternative proposals are likely to miss important but infrequent terms and translations. To reduce the burden on the user during the filtering phase, candidates are presented in a convenient order, along with some useful concordance evidence, in an interface that is designed to minimize keystrokes. Termight is currently being used by the trans...
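The candidate-extraction step described above can be pictured with a small sketch. It is not termight's code: it assumes input already tagged with Penn Treebank-style tags, uses a hypothetical noun-phrase pattern (runs of adjectives and nouns that end in a noun), and ranks candidates by frequency as one plausible "convenient order" for the user to filter.

```python
from collections import Counter

# Hypothetical tag classes (Penn Treebank style); not taken from termight.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
MODIFIER_TAGS = NOUN_TAGS | {"JJ"}


def candidate_terms(tagged_sentences, min_words=2):
    """Collect maximal runs of adjectives/nouns that end in a noun.

    tagged_sentences: iterable of lists of (word, tag) pairs.
    Returns (term, count) pairs, most frequent first.
    """
    counts = Counter()
    for sentence in tagged_sentences:
        run = []
        # A sentinel tag flushes the final run at the end of the sentence.
        for word, tag in sentence + [("", "END")]:
            if tag in MODIFIER_TAGS:
                run.append((word, tag))
                continue
            # Trim trailing adjectives so every candidate ends in a noun.
            while run and run[-1][1] not in NOUN_TAGS:
                run.pop()
            if len(run) >= min_words:
                counts[" ".join(w.lower() for w, _ in run)] += 1
            run = []
    return counts.most_common()
```

Because the pattern favours completeness, it will also propose word sequences that are not real terms; in the workflow the abstract describes, those are left for the user to reject.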
A Word-to-Word Model of Translational Equivalence
, 1997
Cited by 73 (6 self)
Many multilingual NLP applications need to translate words between different languages, but cannot afford the computational expense of inducing or applying a full translation model. For these applications, we have designed a fast algorithm for estimating a partial translation model, which accounts for translational equivalence only at the word level. The model's precision/recall trade-off can be directly controlled via one threshold parameter. This feature makes the model more suitable for applications that are not fully statistical. The model's hidden parameters can be easily conditioned on information extrinsic to the model, providing an easy way to integrate pre-existing knowledge such as part-of-speech, dictionaries, word order, etc. Our model can link word tokens in parallel texts as well as other translation models in the literature can. Unlike other translation models, it can automatically produce dictionary-sized translation lexicons, and it can do so with over 99% accuracy.
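The single-threshold trade-off can be illustrated with a greedy one-to-one linking pass over one sentence pair. This is a sketch under assumptions: the association scores are hypothetical inputs, and the best-first greedy strategy is a simplification in the spirit of competitive linking, not necessarily the paper's exact estimation procedure. Raising `threshold` drops low-scoring links (higher precision, lower recall); lowering it does the opposite.

```python
def link_tokens(source, target, score, threshold):
    """Greedily link tokens one-to-one, best-scoring pairs first.

    score: dict mapping (source_word, target_word) -> association score
           (hypothetical values; any association measure could be used).
    Returns a sorted list of (source_index, target_index) links.
    """
    candidates = []
    for i, s in enumerate(source):
        for j, t in enumerate(target):
            value = score.get((s, t), 0.0)
            if value >= threshold:  # the single precision/recall knob
                candidates.append((value, i, j))
    # Highest score first; ties broken by position for determinism.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    used_source, used_target, links = set(), set(), []
    for _, i, j in candidates:
        # Each token may take part in at most one link.
        if i not in used_source and j not in used_target:
            used_source.add(i)
            used_target.add(j)
            links.append((i, j))
    return sorted(links)
```

With toy scores where "the"/"la" scores 0.6 and the other true pairs score 0.8 or higher, a threshold of 0.5 links all four word pairs, while 0.7 keeps only the three strongest.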
Robust Bilingual Word Alignment for Machine Aided Translation
- In Proceedings of the Workshop on Very Large Corpora
, 1993
Cited by 64 (2 self)
We have developed a new program called word_align for aligning parallel text, that is, text such as the Canadian Hansards that is available in two or more languages. The program takes the output of char_align (Church, 1993), a robust alternative to sentence-based alignment programs, and applies word-level constraints using a version of Brown et al.'s Model 2 (Brown et al., 1993), modified and extended to deal with robustness issues. Word_align was tested on a subset of the Canadian Hansards supplied by Simard (Simard et al., 1992). The combination of word_align plus char_align reduces the variance (average square error) by a factor of 5 over char_align alone. More importantly, because word_align and char_align were designed to work robustly on texts that are smaller and noisier than the Hansards, it has been possible to successfully deploy the programs at AT&T Language Line Services, a commercial translation service, to help them with difficult terminology.
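For reference, the alignment decision in a plain Model 2 aligner can be sketched as follows. Under IBM Model 2, each French position j is aligned independently to the English position i (or to a NULL word at i = 0) that maximizes t(f_j | e_i) · a(i | j, l, m). The probability tables and the `floor` fallback are hypothetical toy choices, and none of word_align's robustness extensions are modelled.

```python
def model2_viterbi(e_words, f_words, t, a, floor=1e-9):
    """Return the most probable alignment under IBM Model 2.

    e_words: English sentence (positions 1..l); position 0 is NULL.
    f_words: French sentence (positions 1..m).
    t: dict (f_word, e_word) -> translation probability t(f|e).
    a: dict (i, j, l, m) -> alignment probability a(i|j,l,m).
    Missing entries fall back to `floor` (an assumed smoothing choice).
    Returns a list whose (j-1)-th entry is the English position chosen for j.
    """
    e_with_null = ["NULL"] + list(e_words)
    l, m = len(e_words), len(f_words)
    alignment = []
    for j, f in enumerate(f_words, start=1):
        # Model 2 factorizes over j, so each position is maximized separately.
        best_i = max(
            range(l + 1),
            key=lambda i: t.get((f, e_with_null[i]), floor)
            * a.get((i, j, l, m), floor),
        )
        alignment.append(best_i)
    return alignment
```

Because the positions are independent, the search is O(l · m) per sentence pair; a word with no translation evidence falls back to the NULL position when the alignment probabilities are flat.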
Automatic Construction Of Clean Broad-Coverage Translation Lexicons
- In Proceedings of the 2nd Conference of the Association for Machine Translation in the Americas
Cited by 55 (9 self)
Word-level translational equivalences can be extracted from parallel texts by surprisingly simple statistical techniques. However, these techniques are easily fooled by indirect associations: pairs of unrelated words whose statistical properties resemble those of mutual translations. Indirect associations pollute the resulting translation lexicons, drastically reducing their precision. This paper presents an iterative lexicon cleaning method. On each iteration, most of the remaining incorrect lexicon entries are filtered out, without significant degradation in recall. This lexicon cleaning technique can produce translation lexicons with recall and precision both exceeding 90%, as well as dictionary-sized translation lexicons that are over 99% correct.
1 Introduction
Translation lexicons are explicit representations of translational equivalence at the word level. They are central to any machine translation system, and play a vital role in other multilingual applications, including ...
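Why iteration helps against indirect associations can be seen in a small sketch. It assumes a one-to-one constraint within each aligned sentence pair: on every pass, co-occurring word pairs compete greedily, and a pair's new score is the fraction of its co-occurrences in which it actually won a link. An indirect associate rarely beats the true translation, so its score collapses. The Dice starting score, the iteration count, and the `keep` cut-off are assumptions; this is an illustrative reconstruction, not necessarily the paper's exact method.

```python
from collections import Counter
from itertools import product


def cooccurrence(corpus):
    """Count sentence-level occurrences and co-occurrences of word pairs."""
    src, tgt, both = Counter(), Counter(), Counter()
    for s_sent, t_sent in corpus:
        s_set, t_set = set(s_sent), set(t_sent)
        src.update(s_set)
        tgt.update(t_set)
        both.update(product(s_set, t_set))
    return src, tgt, both


def clean_lexicon(corpus, iterations=3, keep=0.5):
    """Return translation pairs that survive iterative one-to-one re-scoring."""
    src, tgt, both = cooccurrence(corpus)
    # Initial scores: Dice coefficient (an assumed starting measure).
    scores = {(s, t): 2 * c / (src[s] + tgt[t]) for (s, t), c in both.items()}
    for _ in range(iterations):
        wins = Counter()
        for s_sent, t_sent in corpus:
            pairs = sorted(product(set(s_sent), set(t_sent)),
                           key=lambda p: (-scores[p], p))
            used_s, used_t = set(), set()
            for s, t in pairs:
                if scores[(s, t)] <= 0.0:
                    break  # remaining pairs have no support at all
                # Each word may link at most once per sentence pair.
                if s not in used_s and t not in used_t:
                    used_s.add(s)
                    used_t.add(t)
                    wins[(s, t)] += 1
        # New score: share of co-occurrences in which the pair won a link.
        scores = {p: wins[p] / c for p, c in both.items()}
    return {p for p, value in scores.items() if value >= keep}
```

On a three-sentence toy corpus ("the house"/"la maison", "the car"/"la voiture", "a house"/"une maison"), Dice alone gives the indirect pair ("the", "voiture") a score of about 0.67, but after re-scoring only the four true pairs remain.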
Grammarless Extraction of Phrasal Translation Examples from Parallel Texts
- In Proceedings of the Sixth International Conference on Theoretical and Methodological Issues in Machine Translation
, 1995
Cited by 31 (7 self)
We describe a method for identifying subsentential phrasal translation examples in sentence-aligned parallel corpora, using only a probabilistic translation lexicon for the language pair. Our method differs from previous approaches in that (1) it is founded on a formal basis, making use of an inversion transduction grammar (ITG) formalism that we recently developed for bilingual language modeling, and (2) it requires no language-specific monolingual grammars for the source and target languages. Instead, we devise a generic, language-independent constituent-matching ITG with inherent expressiveness properties that correspond to a desirable level of matching flexibility. Bilingual parsing, in conjunction with a stochastic version of the ITG formalism, performs the phrasal translation extraction.
The Hong Kong University of Science & Technology Technical Report Series, Department of Computer Science. TMI-95, Wu.
1 Introduction
Phrasal translation examples at the subsentential level are an...
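The expressiveness restriction of a binary ITG can be made concrete. Every rule either keeps its two children in the same order (straight) or swaps them (inverted), so a one-to-one word alignment, viewed as a permutation of target positions, is derivable only if it can be built by repeatedly merging adjacent blocks that cover contiguous ranges. The shift-reduce check below is a standard way to test this; it illustrates the reordering constraint only, not the paper's phrasal extraction procedure, and the function name is hypothetical.

```python
def itg_derivable(permutation):
    """Return True if binary straight/inverted rules can produce this reordering.

    permutation[k] is the target position aligned to source word k
    (a one-to-one, fully aligned sentence pair is assumed).
    """
    # Each stack entry is a block covering target positions [low, high].
    stack = []
    for position in permutation:
        stack.append((position, position))
        # Merge the top two blocks while together they cover a contiguous range.
        while len(stack) >= 2:
            (lo2, hi2), (lo1, hi1) = stack[-2], stack[-1]
            if hi2 + 1 == lo1 or hi1 + 1 == lo2:  # straight or inverted
                stack[-2:] = [(min(lo1, lo2), max(hi1, hi2))]
            else:
                break
    return len(stack) <= 1
```

Monotone and fully reversed orders pass, as do nested swaps such as [1, 0, 3, 2]; the two "inside-out" patterns [1, 3, 0, 2] and [2, 0, 3, 1] are rejected, which is the limited matching flexibility the abstract alludes to.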
A Class-based Approach to Word Alignment
- Computational Linguistics
, 1997
Cited by 23 (2 self)
This paper presents an algorithm capable of identifying the translation for each word in a bilingual corpus. Previously proposed methods rely heavily on word-based statistics. Under a word-based approach, frequent words with a consistent translation can be aligned at a high rate of precision. However, words that are less frequent or exhibit diverse translations generally do not have statistically significant evidence for confident alignment, thereby leading to incomplete or incorrect alignments. The algorithm proposed herein attempts to broaden coverage by exploiting lexicographic resources. To this end, we draw on the word classification systems of the Longman Lexicon of Contemporary English (LLOCE) and Tongyici Cilin (Synonym Forest, CILIN). Automatically acquired class-based alignment rules are used to compensate for what is lacking in a bilingual dictionary such as the English-Chinese version of the Longman Dictionary of Contemporary English (LecDOCE). In addition, this alignment method is implemented using LecDOCE examples and their translations for training and testing, while further examples from a technical manual in both English and Chinese are used for an open test. Quantitative results of the closed and open tests are also summarized.
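The fallback from dictionary lookup to class-based rules can be sketched as follows. All data here are hypothetical stand-ins: `dictionary` plays the role of LecDOCE entries, `en_class` and `zh_class` stand for LLOCE and CILIN class assignments (the class codes and pinyin words are invented), and `class_rules` stands for automatically acquired class-pair rules. How the paper acquires and scores those rules is not reproduced.

```python
def align_word(english, chinese_words, dictionary, en_class, zh_class, class_rules):
    """Pick a translation for `english` among candidate Chinese words.

    1. Prefer a candidate listed in the bilingual dictionary.
    2. Otherwise fall back to a class-pair rule (English class, Chinese class).
    Returns the chosen Chinese word, or None if neither source applies.
    """
    for zh in chinese_words:
        if zh in dictionary.get(english, ()):
            return zh
    e_cls = en_class.get(english)
    if e_cls is None:
        return None  # no class information to fall back on
    for zh in chinese_words:
        z_cls = zh_class.get(zh)
        if z_cls is not None and (e_cls, z_cls) in class_rules:
            return zh
    return None
```

For example, if "sedan" is missing from the dictionary but its (hypothetical) vehicle class is paired by a rule with the class of "jiaoche", the class rule supplies the link the dictionary could not.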
Flow Network Models for Word Alignment and Terminology Extraction From Bilingual Corpora
, 1998
Cited by 19 (3 self)
This paper presents a new model for word alignments between parallel sentences, which allows one to accurately estimate different parameters in a computationally efficient way. An application of this model to bilingual terminology extraction, where terms are identified in one language and guessed, through the alignment process, in the other, is also described. An experiment conducted on a small English-French parallel corpus gave results with high precision, demonstrating the validity of the model.
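To make the flow-network view concrete, the sketch below casts one-to-one word alignment of a sentence pair as min-cost flow: source, then each source word, then each target word, then sink, all with unit capacity, where a word-to-word edge costs the negative of a hypothetical association score. Flow is augmented along Bellman-Ford shortest paths while a path still has negative cost; words with no scored partner simply stay unaligned. This is a generic min-cost-flow construction assumed for illustration; the paper's actual network, costs, and parameter estimation are not reproduced.

```python
def min_cost_alignment(n_src, n_tgt, score):
    """Align words one-to-one by successive shortest paths in a flow network.

    score: dict (i, j) -> association score for source word i, target word j
           (hypothetical values). Edge cost is -score, so cheaper = better.
    Returns a sorted list of (i, j) links.
    """
    # Node ids: 0 = source, 1..n_src = source words,
    # n_src+1..n_src+n_tgt = target words, last id = sink.
    sink = n_src + n_tgt + 1
    graph = [[] for _ in range(sink + 1)]  # edge: [to, capacity, cost, rev_index]

    def add_edge(u, v, cost):
        graph[u].append([v, 1, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    for i in range(n_src):
        add_edge(0, 1 + i, 0.0)
    for j in range(n_tgt):
        add_edge(1 + n_src + j, sink, 0.0)
    for (i, j), s in score.items():
        add_edge(1 + i, 1 + n_src + j, -s)

    while True:
        # Bellman-Ford copes with the negative costs in the residual graph.
        dist = [float("inf")] * (sink + 1)
        prev = [None] * (sink + 1)  # (node, edge index) used to reach each node
        dist[0] = 0.0
        for _ in range(sink):
            updated = False
            for u in range(sink + 1):
                if dist[u] == float("inf"):
                    continue
                for k, (v, cap, cost, _) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v] = dist[u] + cost
                        prev[v] = (u, k)
                        updated = True
            if not updated:
                break
        # Stop when no augmenting path lowers the total cost.
        if dist[sink] >= 0:
            break
        v = sink
        while v != 0:
            u, k = prev[v]
            edge = graph[u][k]
            edge[1] -= 1
            graph[v][edge[3]][1] += 1
            v = u

    links = []
    for i in range(n_src):
        for v, cap, _, _ in graph[1 + i]:
            # A saturated word-to-word edge carries one unit of flow.
            if n_src < v <= n_src + n_tgt and cap == 0:
                links.append((i, v - 1 - n_src))
    return sorted(links)
```

Unlike greedy best-first linking, the flow solution is globally optimal: with scores 0.9 for (0, 0), 0.8 for (0, 1), 0.85 for (1, 0) and 0.1 for (1, 1), greedy would pick (0, 0) and (1, 1) for a total of 1.0, while the flow reroutes to (0, 1) and (1, 0) for 1.65.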
Term Alignment in Use: Machine-Aided Human Translation
, 2000
Cited by 14 (4 self)
Keywords: Machine-Aided Human Translation, Translation Memory, Word Alignment, Terminology Extraction
1 Introduction
Parallel texts are a resource with many interesting applications. In this chapter, we look at how word and term alignment algorithms applied to parallel texts can be used for machine-aided human translation. Manual translation is a labor-intensive process. Machine translation systems do not produce translations of high enough quality to be acceptable in many situations, particularly for the localization of technical documentation. However, existing translations are an extremely valuable resource which can be exploited with software systems to improve the efficiency of human translation. Bilingual concordances and translation memories are two examples of such software systems which use parallel texts aligned at the sentence level. Recent advances in automatic terminology extraction and statistical alignment algorithms allow us to build systems which can recogni...
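A translation memory of the kind mentioned above retrieves an earlier translation whose source sentence resembles the one being translated. The sketch uses Python's standard `difflib.SequenceMatcher` ratio over word tokens as the similarity measure; that measure, the 0.7 cut-off, and the sample memory are assumptions for illustration, not the system described in the chapter.

```python
from difflib import SequenceMatcher


def fuzzy_lookup(memory, sentence, cutoff=0.7):
    """Return (score, source, target) for the best TM match, or None.

    memory: list of (source_sentence, target_sentence) pairs, i.e. a
    parallel text aligned at the sentence level.
    """
    words = sentence.lower().split()
    best = None
    for source, target in memory:
        # ratio() = 2 * matched tokens / total tokens in both sentences.
        ratio = SequenceMatcher(None, words, source.lower().split()).ratio()
        if ratio >= cutoff and (best is None or ratio > best[0]):
            best = (ratio, source, target)
    return best
```

Looking up "Press the reset button" against a memory containing "Press the power button" matches three of four tokens on each side (ratio 0.75), so the stored translation is offered for post-editing; term alignment could then flag which target word corresponds to the changed term.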

