A Word-to-Word Model of Translational Equivalence
, 1997
Cited by 73 (6 self)
Many multilingual NLP applications need to translate words between different languages, but cannot afford the computational expense of inducing or applying a full translation model. For these applications, we have designed a fast algorithm for estimating a partial translation model, which accounts for translational equivalence only at the word level. The model's precision/recall trade-off can be directly controlled via one threshold parameter. This feature makes the model more suitable for applications that are not fully statistical. The model's hidden parameters can be easily conditioned on information extrinsic to the model, providing an easy way to integrate pre-existing knowledge such as part-of-speech, dictionaries, word order, etc. Our model can link word tokens in parallel texts as well as other translation models in the literature. Unlike other translation models, it can automatically produce dictionary-sized translation lexicons, and it can do so with over 99% accuracy.
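The single-threshold precision/recall trade-off mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the candidate word pairs, their scores, the gold lexicon, and the function names `filter_lexicon` and `precision_recall` are all made up for illustration. It only shows how one cutoff on a per-pair score trades precision against recall.

```python
# Illustrative only: toy scores and a toy gold lexicon, not data from the paper.

def filter_lexicon(scored_pairs, threshold):
    """Keep candidate word pairs whose score is at least the threshold."""
    return {pair for pair, score in scored_pairs.items() if score >= threshold}


def precision_recall(predicted, gold):
    """Return (precision, recall) of a predicted lexicon against a gold lexicon."""
    if not predicted:
        return 1.0, 0.0  # convention: an empty prediction makes no errors
    correct = len(predicted & gold)
    return correct / len(predicted), correct / len(gold)


# Hypothetical candidate translation pairs with confidence scores.
scored_pairs = {
    ("maison", "house"): 0.95,
    ("chat", "cat"): 0.90,
    ("chien", "dog"): 0.60,
    ("maison", "home"): 0.40,
    ("chat", "dog"): 0.30,
    ("chien", "house"): 0.10,
}
gold = {("maison", "house"), ("chat", "cat"), ("chien", "dog"), ("maison", "home")}

strict = precision_recall(filter_lexicon(scored_pairs, 0.8), gold)  # few, reliable pairs
loose = precision_recall(filter_lexicon(scored_pairs, 0.2), gold)   # more pairs, some wrong
```

With these toy numbers, the strict threshold keeps only correct pairs but misses half of the gold lexicon, while the loose threshold recovers every gold pair at the cost of one wrong entry.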
Bitext Maps and Alignment via Pattern Recognition
- Computational Linguistics
, 1999
Cited by 68 (0 self)
This article advances the state of the art of bitext mapping by formulating the problem in terms of pattern recognition. From this point of view, the success of a bitext mapping algorithm hinges on how well it performs three tasks: signal generation, noise filtering, and search. The Smooth Injective Map Recognizer (SIMR) algorithm presented here integrates innovative approaches to each of these tasks. Objective evaluation has shown that SIMR's accuracy is consistently high for language pairs as diverse as French/English and Korean/English. If necessary, SIMR's bitext maps can be efficiently converted into segment alignments using the Geometric Segment Alignment (GSA) algorithm, which is also presented here. SIMR has produced bitext maps for over 200 megabytes of French-English bitexts. GSA has converted these maps into alignments. Both the maps and the alignments are available from the Linguistic Data Consortium.
Improving machine translation performance by exploiting non-parallel corpora
- Computational Linguistics
, 2005
Cited by 56 (2 self)
We present a novel method for discovering parallel sentences in comparable, non-parallel corpora. We train a maximum entropy classifier that, given a pair of sentences, can reliably determine whether or not they are translations of each other. Using this approach, we extract parallel data from large Chinese, Arabic, and English non-parallel newspaper corpora. We evaluate the quality of the extracted data by showing that it improves the performance of a state-of-the-art statistical machine translation system. We also show that a good-quality MT system can be built from scratch by starting with a very small parallel corpus (100,000 words) and exploiting a large non-parallel corpus. Thus, our method can be applied with great benefit to language pairs for which only scarce resources are available.
Fast and Accurate Sentence Alignment of Bilingual Corpora
- In Stephen D
, 2002
Cited by 41 (1 self)
We present a new method for aligning sentences with their translations in a parallel bilingual corpus. Previous approaches have generally been based either on sentence length or word correspondences. Sentence-length-based methods are relatively fast and fairly accurate. Word-correspondence-based methods are generally more accurate but much slower, and usually depend on cognates or a bilingual lexicon. Our method adapts and combines these approaches, achieving high accuracy at a modest computational cost, and requiring no knowledge of the languages or the corpus beyond division into words and sentences.
Automating knowledge acquisition for machine translation
- AI Mag
, 1997
Cited by 30 (3 self)
How can we write a computer program to translate an English sentence into Japanese? Anyone who has taken a graduate-level course in Artificial Intelligence knows the answer. First, compute the meaning of the English sentence. That is, convert it into logic or your favorite knowledge
Empirically Estimating Order Constraints for Content Planning in Generation
, 2001
Cited by 20 (2 self)
In a language generation system, a content planner embodies one or more "plans" that are usually hand-crafted, sometimes through manual analysis of target text. In this paper, we present a system that we developed to automatically learn elements of a plan and the ordering constraints among them. As training data, we use semantically annotated transcripts of domain experts performing the task our system is designed to mimic. Given the large degree of variation in the spoken language of the transcripts, we developed a novel algorithm to find parallels between transcripts based on techniques used in computational genomics. Our proposed methodology was evaluated in two ways: the learning and generalization capabilities were evaluated quantitatively using cross-validation, obtaining an accuracy of 89%, and a qualitative evaluation is also provided.
Methods and Practical Issues in Evaluating Alignment Techniques
, 1998
Cited by 16 (3 self)
This paper describes the work achieved in the first half of a 4-year cooperative research project (ARCADE), financed by AUPELF-UREF. The project is devoted to the evaluation of parallel text alignment techniques. In its first period ARCADE ran a competition between six systems on a sentence-to-sentence alignment task which yielded two main types of results. First, a large reference bilingual corpus comprising texts of different genres was created, each presenting various degrees of difficulty with respect to the alignment task. Second, significant methodological progress was made both on the evaluation protocols and metrics, and the algorithms used by the different systems. For the second phase, which is now underway, ARCADE has been opened to a larger number of teams who will tackle the problem of word-level alignment.
A Scalable Architecture for Bilingual Lexicography
- University of Pennsylvania
, 1997
Cited by 7 (1 self)
SABLE (Scalable Architecture for Bilingual LExicography) is a turn-key system for producing clean broad-coverage translation lexicons from raw, unaligned parallel texts (bitexts). SABLE is designed to work for any text genre, in any pair of languages. As long as the input texts are mutual translations, the relative word order of the input languages makes no difference. No SABLE component makes any assumptions about the kinds of text units in the input: no component makes any use of sentence boundaries. SABLE was designed with the following features in mind:
• Black box functionality: Automatic construction of translation lexicons requires only that the user provide the input bitexts and identify the two languages involved.
• Robustness: SABLE copes well with omissions and inversions in translations.
• Scalability: SABLE has been used successfully on bitexts larger than 130MB.
A Multilingual Procedure for Dictionary-based Sentence Alignment
- In: Proceedings of AMTA'98: Machine Translation and the Information Soup
, 1998
Cited by 4 (0 self)
This paper describes a sentence alignment technique based on a machine-readable dictionary. Alignment takes place in a single pass through the text, based on the scores of matches between pairs of source and target sentences. Pairings consisting of sets of matches are evaluated using a version of the Gale-Shapley solution to the stable marriage problem. An algorithm is described which can handle N-to-1 (or 1-to-N) matches, for N ≥ 0, i.e., deletions, 1-to-1 (including scrambling), and 1-to-many matches. A simple frequency-based method for acquiring supplemental dictionary entries is also discussed. We achieve high quality alignments using available bilingual dictionaries, both for closely related language pairs (Spanish/English) and more distantly related pairs (Japanese/English).
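For readers unfamiliar with the Gale-Shapley procedure named in this abstract, here is a minimal Python sketch of the textbook one-to-one version, applied to toy sentence-match scores. It is an assumption-laden illustration, not the paper's variant: it does not handle N-to-1 or 1-to-N matches, it assumes equally many source and target sentences with a score for every pair, and the sentence labels, scores, and function names are hypothetical.

```python
# Textbook one-to-one Gale-Shapley; NOT the N-to-1 variant the paper describes.

def prefs_from_scores(scores, sources, targets):
    """Rank partners by descending match score (scores are hypothetical)."""
    src = {s: sorted(targets, key=lambda t: -scores[(s, t)]) for s in sources}
    tgt = {t: sorted(sources, key=lambda s: -scores[(s, t)]) for t in targets}
    return src, tgt


def gale_shapley(proposer_prefs, receiver_prefs):
    """Return a stable matching {proposer: receiver}.

    Assumes equal numbers of proposers and receivers and complete
    preference lists on both sides.
    """
    # rank[r][p] = position of proposer p in receiver r's preference list
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # next receiver index to try
    engaged = {}                                  # receiver -> current proposer
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        if r not in engaged:
            engaged[r] = p
        elif rank[r][p] < rank[r][engaged[r]]:
            free.append(engaged[r])  # receiver prefers p; old partner is free again
            engaged[r] = p
        else:
            free.append(p)           # rejected; p will try its next choice
    return {p: r for r, p in engaged.items()}


# Toy match scores between two source and two target sentences.
scores = {("s1", "t1"): 5, ("s1", "t2"): 1, ("s2", "t1"): 4, ("s2", "t2"): 3}
src_prefs, tgt_prefs = prefs_from_scores(scores, ["s1", "s2"], ["t1", "t2"])
alignment = gale_shapley(src_prefs, tgt_prefs)
```

Here both source sentences rank t1 first, but t1 scores higher with s1, so s2 is displaced and settles on t2. The resulting pairing is stable in the sense that no source and target sentence would both prefer each other over their assigned partners.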

