Knowledge-Lite Extraction of Multi-Word Units with Language Filters and Entropy Thresholds
- In Proceedings of RIAO'2000, Collège de
, 2000
Cited by 16 (3 self)
In this paper two approaches to knowledge-lite terminology extraction are compared, both involving language filters which are used to remove ill-formed multi-word units (MWUs). A knowledge-lite approach entails swift portability to new languages and to new domains, which is difficult to achieve if knowledge-intensive resources such as grammars, parsers, taggers and lexicons are used. The two approaches described in this paper have been applied in monolingual term extraction for translation purposes as well as in a pre-processing stage for bilingual word and MWU alignment. The implemented software has been tested for Swedish, English, German and French.
Introduction
Identifying terminology in a corpus of texts is related to the problem of identifying collocations and phrases. Producing compilations of such multi-word units is not a trivial problem. Statistical methods based on frequency or measuring mutual information scores for strings of words (cf. Choueka, 1988; Smadja, 1993; Nagao...
A language-neutral sparse-data algorithm for extracting translation patterns
- Proceedings of the 8th International Conference on Theoretical and Methodological Issues in Machine Translation (TMI 99)
, 1999
Cited by 12 (2 self)
In this paper, we present an algorithm for the automatic extraction of translation patterns between two (Indo-)European languages. These consist of possibly discontiguous text fragments, with the bilingual relationship between the text fragments and the discontinuities between them made explicit. The patterns are extracted from a bilingual parallel corpus aligned at the sentence level, without the need for linguistic analysis, and are used to build a translation memory database which is intended for use in a machine-aided human translation (MAHT) setting, such as a translator’s workbench (TWB). The patterns extracted could also form the basis for example-based machine translation (EBMT) without the need for complex linguistic or statistical processing. Given a TM database made up of our concept of translation patterns and a SL input string, relevant translation patterns combine to form TL translations as suggestions to the translator. We evaluate the accuracy of the translation patterns extracted along with the quality of translations produced.
A System for Incremental and Interactive Word Linking
- In Proceedings of the Third International Conference on Language Resources and Evaluation (LREC)
, 2002
Cited by 9 (2 self)
Aligned parallel corpora constitute a critical information resource for a great number of linguistic and technological endeavors. Automatic sentence alignment has reached a level whereby large parallel documents can be fully aligned with the aid of interactive post-editing tools. Word alignment systems have not yet reached the same level of performance, but are good enough to support full word alignment if embedded in an interactive system. In this paper we describe a system for fast and accurate word alignment currently under development at our department, where the user can review and improve the output from an automatic system in an incremental fashion.
Evaluation of Word Alignment Systems
, 2000
Cited by 9 (1 self)
Recent years have seen a few serious attempts to develop methods and measures for the evaluation of word alignment systems, notably the Blinker project (Melamed, 1998) and the ARCADE project (Véronis and Langlais, forthcoming). In this paper we discuss different approaches to the problem and report on results from a project where two word alignment systems have been evaluated. These results include methods and tools for the generation of reference data and a set of measures for system performance. We note that the selection and sampling of reference data can have a great impact on scoring results.
Multi-Align: Combining Linguistic and Statistical Techniques To Improve Alignments for Adaptable MT
- In Proceedings of AMTA’2004
, 2004
Cited by 8 (2 self)
The continuously growing MT market faces the challenge of translating new languages, diverse genres, and different domains using a variety of available linguistic resources.
Bilingual Parallel Corpora and Language Engineering
- In Proc. of Workshop on Language Engineering for South-Asian Languages
, 2001
Word to Word Alignment Strategies
- In Proceedings of the 20th International Conference on Computational Linguistics (COLING-2004)
, 2004
Cited by 5 (1 self)
Word alignment is a challenging task aiming at the identification of translational relations between words and multi-word units in parallel corpora. Many alignment strategies are based on links between single words. Different strategies can be used to find the optimal word alignment using such one-to-one word links including relations between multi-word units. In this paper seven algorithms are compared using a word alignment approach based on association clues and an English-Swedish bitext together with a handcrafted reference alignment used for evaluation.
The PLUG Project: Parallel corpora in Linköping, Uppsala, Göteborg: Aims and achievements
- Uppsala University
, 1999
Cited by 4 (0 self)
In this paper we present the aims and achievements of the PLUG project. It is a cooperative Swedish project focusing on the generation of translation data from sentence-aligned bitext with Swedish as the source or the target. A sentence-aligned quadrilingual corpus was established and used as a testbed. Two systems for word linking and contrastive lexical extraction were evaluated and improved with the aim of combining them into a common system. The common system will run as an application of a modular corpus tool also created in the project. The basic principles of the word linking systems are outlined and illustrative results are presented and discussed with regard to recall, precision, and application in example-based machine translation, enhanced machine translation, transfer-based machine translation and human translation. Further processing for these applications as well as integration issues remain to be explored. Finally, the extraction of syntactic translation data is an issue that remains to be approached. Focus will be set on verb valency with imperative and infinitive clauses as basic frames.
Using Similarity Scoring To Improve the Bilingual Dictionary for Word Alignment
- In Proc. ACL 2002
, 2002
Cited by 3 (1 self)
We describe an approach to improve the bilingual cooccurrence dictionary that is used for word alignment, and evaluate the improved dictionary using a version of the Competitive Linking algorithm. We demonstrate a problem faced by the Competitive Linking algorithm and present an approach to ameliorate it. In particular, we rebuild the bilingual dictionary by clustering similar words in a language and assigning them a higher cooccurrence score with a given word in the other language than each single word would have otherwise.
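For readers unfamiliar with the term, the greedy idea usually meant by competitive linking can be sketched as follows. This is a generic illustration, not the version used in the paper above: the function name `competitive_linking`, the toy English-Swedish word pairs, and the score values are all assumptions made for the example.

```python
def competitive_linking(scores):
    """Greedy one-to-one linking over candidate word pairs.

    `scores` maps (source_word, target_word) to an association score.
    Pairs are visited from highest to lowest score, and a pair is linked
    only if neither of its words has been linked already.
    """
    links = []
    used_src, used_tgt = set(), set()
    # Sort by descending score; ties are broken by the pair itself so the
    # result is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    for (src, tgt), _score in ranked:
        if src in used_src or tgt in used_tgt:
            continue  # one of the words has already won a better link
        links.append((src, tgt))
        used_src.add(src)
        used_tgt.add(tgt)
    return links


# Hypothetical co-occurrence scores for one sentence pair.
scores = {
    ("house", "hus"): 0.9,
    ("house", "huset"): 0.8,
    ("the", "huset"): 0.6,
    ("the", "hus"): 0.1,
}
links = competitive_linking(scores)
```

In this toy input the inflected form "huset" loses to "hus" for the word "house" because each word may be linked only once. Pooling similar forms into one cluster with a shared, higher score, as the abstract describes, is one way to reduce that kind of competition.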
Extracting multilingual lexicons from parallel corpora
, 2008
Cited by 3 (3 self)
Extracting bilingual dictionaries from corpora can be seen as a very fine-grained alignment process, where the aligned units are not paragraphs or sentences but words and phrases. Most approaches to this problem rely on statistical means to build translation lexicons from bilingual texts, roughly falling into two categories: the hypotheses testing approach and the estimating approach. There are pros and cons for each type of approach, some

