Term Alignment in Use: Machine-Aided Human Translation
, 2000
Cited by 14 (4 self)
Keywords: Machine-Aided Human Translation, Translation Memory, Word Alignment, Terminology Extraction
1 Introduction
Parallel texts are a resource with many interesting applications. In this chapter, we look at how word and term alignment algorithms applied to parallel texts can be used for machine-aided human translation. Manual translation is a labor-intensive process. Machine translation systems do not produce translations of high enough quality to be acceptable in many situations, particularly for the localization of technical documentation. However, existing translations are an extremely valuable resource which can be exploited with software systems to improve the efficiency of human translation. Bilingual concordances and translation memories are two examples of such software systems which use parallel texts aligned at the sentence level. Recent advances in automatic terminology extraction and statistical alignment algorithms allow us to build systems which can recogni...
A language-neutral sparse-data algorithm for extracting translation patterns
- Proceedings of the 8th International Conference on Theoretical and Methodological Issues in Machine Translation (TMI 99)
, 1999
Cited by 12 (2 self)
In this paper, we present an algorithm for the automatic extraction of translation patterns between two (Indo-)European languages. These consist of possibly discontiguous text fragments, with the bilingual relationship between the text fragments and the discontinuities between them made explicit. The patterns are extracted from a bilingual parallel corpus aligned at the sentence level, without the need for linguistic analysis, and are used to build a translation memory (TM) database which is intended for use in a machine-aided human translation (MAHT) setting, such as a translator’s workbench (TWB). The patterns extracted could also form the basis for example-based machine translation (EBMT) without the need for complex linguistic or statistical processing. Given a TM database made up of our translation patterns and a source-language (SL) input string, relevant translation patterns combine to form target-language (TL) translations as suggestions to the translator. We evaluate the accuracy of the translation patterns extracted along with the quality of translations produced.
Using Parallel Corpora to enrich Multilingual Lexical Resources
- In Third International Conference on Language Resources and Evaluation
, 2002
Cited by 11 (4 self)
This paper describes the use of a bilingual vector model for the automatic discovery of German translations of English terms. The model is built by analysing co-occurrence patterns in a parallel corpus of English and German medical abstracts, a method also used for Cross-Lingual Information Retrieval. The model generates candidate German translations of English words using the cosine similarity measure between terms in the bilingual vector space. The correct translations could be added to UMLS, the multilingual dictionary in question. The accuracy of the translations is evaluated by measuring how many of the existing UMLS translations are correctly predicted by the vector translations. The model also detects synonymy, particularly acronyms. An online public demonstration of the model is available.
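The cosine-based candidate generation described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the vectors are built, so the sketch assumes each term is represented by its raw counts over aligned abstract pairs, and the names `build_vectors`, `cosine` and `candidates` are hypothetical.

```python
import math
from collections import defaultdict


def build_vectors(pairs):
    """Map each term to a sparse vector indexed by aligned-pair id.

    `pairs` is a list of (english_text, german_text) tuples. Assumption:
    whitespace tokenisation and raw counts, chosen only for illustration.
    """
    en_vecs, de_vecs = defaultdict(dict), defaultdict(dict)
    for doc_id, (en_text, de_text) in enumerate(pairs):
        for tok in en_text.lower().split():
            en_vecs[tok][doc_id] = en_vecs[tok].get(doc_id, 0) + 1
        for tok in de_text.lower().split():
            de_vecs[tok][doc_id] = de_vecs[tok].get(doc_id, 0) + 1
    return en_vecs, de_vecs


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def candidates(term, en_vecs, de_vecs, top_n=3):
    """Rank German terms by cosine similarity to an English term."""
    u = en_vecs.get(term, {})
    scored = [(cosine(u, v), de) for de, v in de_vecs.items()]
    # Highest similarity first; ties broken alphabetically for stability.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [de for score, de in scored[:top_n] if score > 0]
```

On a toy corpus such as `[("heart disease", "herz krankheit"), ("heart failure", "herz versagen")]`, the German term that occurs in exactly the same aligned pairs as the English query ranks first. A real system would need far more data and probably some term weighting.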
Evaluation of Word Alignment Systems
, 2000
Cited by 9 (1 self)
Recent years have seen a few serious attempts to develop methods and measures for the evaluation of word alignment systems, notably the Blinker project (Melamed, 1998) and the ARCADE project (Véronis and Langlais, forthcoming). In this paper we discuss different approaches to the problem and report on results from a project where two word alignment systems have been evaluated. These results include methods and tools for the generation of reference data and a set of measures for system performance. We note that the selection and sampling of reference data can have a great impact on scoring results.
From the Rosetta Stone to the Information Society: A Survey of parallel text processing
, 2000
Cited by 4 (0 self)
This introductory chapter provides a survey of the processing and use of parallel texts, i.e., texts accompanied by their translation. Throughout the chapter, the various authors' contributions to the book are considered and related to the state of the art in the field. Three themes are addressed, corresponding to the three parts of the book: (i) techniques and methodology for the alignment of parallel texts at various levels such as sentences, clauses or words; (ii) applications of parallel texts in fields such as translation, lexicography, and information retrieval; and (iii) available corpus resources and evaluation of alignment methods.
Automatic processing of multilingual medical terminology: Applications to thesaurus enrichment and cross-language information retrieval
- Artificial Intelligence in Medicine, 33(2)
, 2005
Cited by 3 (0 self)
We present in this article experiments on Multi-Language Information Extraction and Access in the medical domain. Methods for extracting bilingual lexicons from parallel and comparable corpora are described and their use in Multi-Language Information Access is illustrated. Our experiments show that these automatically extracted bilingual lexicons are accurate enough for semi-automatically enriching mono- or bilingual thesauri (such as UMLS), and that their use in Cross-language Information Retrieval (CLIR) significantly improves the retrieval performance and clearly outperforms existing bilingual lexicon resources (both general lexicons and specialized ones).
Building a Multilingual Parallel Subtitle Corpus
Cited by 2 (0 self)
This paper presents ongoing work on creating an extensive multilingual parallel corpus of movie subtitles. The corpus currently contains roughly 23,000 pairs of aligned subtitles covering about 2,700 movies in 29 languages. Subtitles mainly consist of transcribed speech, sometimes in a very condensed form. Insertions, deletions and paraphrases are very frequent, which makes subtitles a challenging data set to work with, especially when applying automatic sentence alignment. Standard alignment approaches rely on translation consistency either in terms of length or term translations or a combination of both. In the paper, we show that these approaches are not applicable to subtitles and we propose a new alignment approach based on time overlaps specifically designed for subtitles. In our experiments we obtain a significant improvement of alignment accuracy compared to standard length-based approaches.
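The idea of aligning subtitles by time overlap can be sketched as follows. The abstract gives no details of the actual procedure, so this is only an illustration under stated assumptions: each subtitle is a `(start, end, text)` tuple with times in seconds, two subtitles are linked when their overlap covers at least `min_ratio` of the shorter one, and the names `overlap`, `align_by_time` and `min_ratio` are hypothetical.

```python
def overlap(a, b):
    """Length in seconds of the time interval shared by two subtitles."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def align_by_time(src, tgt, min_ratio=0.5):
    """Link source and target subtitles whose display times overlap.

    Returns a list of (source_index, target_index) pairs. A source
    subtitle may link to several target subtitles, and vice versa.
    """
    links = []
    for i, s in enumerate(src):
        for j, t in enumerate(tgt):
            shorter = min(s[1] - s[0], t[1] - t[0])
            # Assumption: overlap is measured relative to the shorter
            # subtitle, so a short fragment fully covered by a long
            # subtitle still counts as aligned.
            if shorter > 0 and overlap(s, t) / shorter >= min_ratio:
                links.append((i, j))
    return links
```

Unlike length-based alignment, this kind of linking ignores the text itself, so insertions, deletions and paraphrases do not affect it directly. It does depend on the two subtitle files having comparable timing.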
Zweigenbaum P: Using Word Alignment to Extend Multilingual Medical Terminologies
- In the Proceedings of Language Resources and Evaluation 2006, Workshop on Acquiring and
Cited by 1 (1 self)
Medical terminologies such as those provided in the UMLS are never exhaustive and there is a constant need to enrich them, especially in terms of multilinguality. We present a methodology to acquire new French translations of English medical terms based on word alignment in a parallel corpus, i.e. pairing of corresponding words. We automatically collected a 27.7-million-word parallel English-French corpus. Based on a first 1.3-million-word extract of this corpus, we detected 3,255 French translations of English MeSH terms, among which 1,956 are new translations.
SBA-term: Sparse Bilingual Association for Terms
Cited by 1 (1 self)
Bilingual semantic term association is very useful in cross-language information retrieval, statistical machine translation, and many other applications in natural language processing. In this paper, we present a method, named SBA-term, which applies sparse linear regression (Lasso, least squares with an l1 penalty) and L2 rescaling of the design matrix to the task of bilingual term association. The approach hinges on formulating the task as a feature selection problem within a classification framework. Our experimental results indicate that our proposed method is more efficient than co-occurrence at extracting relevant bilingual semantic term associations. In addition, our approach connects the vibrant area of sparse machine learning to an important problem of natural language processing.
Improving Word Alignment in an English – Malay Parallel Corpus for Machine Translation
A bilingual parallel corpus is an important resource in constructing an English – Malay Bilingual Knowledge Base that is heavily used by our English to Malay machine translation system. We present an approach to word-level alignment of a bilingual parallel corpus that improves the translation quality of our English to Malay example-based machine translation. Initially, one-to-one word alignment was applied between the source and target languages. We revised this method to a many-to-one word alignment. The comparison of translation results for both methods shows that our many-to-one word alignment is capable of improving the translation quality.

