A Hybrid Relational Approach for WSD - First Results
Coling-ACL, 2006
Cited by 5 (3 self-citations)
We present a novel hybrid approach for Word Sense Disambiguation (WSD) that uses a relational formalism to represent instances and background knowledge. It is built using Inductive Logic Programming techniques to combine evidence from both sources during the learning process, producing a rule-based WSD model. We experimented with this approach to disambiguate 7 highly ambiguous verbs in English-Portuguese translation. Results showed that the approach is promising, achieving an average accuracy of 75%, which outperforms the other machine learning techniques investigated (66%).
Measuring Historical Word Sense Variation
Cited by 2 (1 self-citation)
We describe here a method for automatically identifying word sense variation in a dated collection of historical books in a large digital library. By leveraging a small set of known translation book pairs to induce a bilingual sense inventory and labeled training data for a WSD classifier, we are able to automatically classify the Latin word senses in a 389-million-word corpus and track the rise and fall of those senses over a span of two thousand years. We evaluate the performance of seven different classifiers both in a tenfold test on 83,892 words from the aligned parallel corpus and on a smaller, manually annotated sample of 525 words, measuring both the overall accuracy of each system and how well that accuracy correlates (via mean squared error) with the observed historical variation.
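The abstract does not say exactly how the mean squared error against historical variation is computed. As a minimal sketch under one stated assumption, we compare, for each time period, the share of tokens a classifier assigns to each sense with the share in a gold annotation, and average the squared differences over every (period, sense) cell. The function names, period labels, and sense labels below are hypothetical and the data is invented.

```python
from collections import Counter
from typing import Dict, List, Sequence


def sense_proportions(labels: Sequence[str], senses: Sequence[str]) -> List[float]:
    """Share of tokens assigned to each sense, in the order given by `senses`."""
    counts = Counter(labels)
    total = len(labels)
    return [counts[s] / total if total else 0.0 for s in senses]


def variation_mse(predicted: Dict[str, List[str]],
                  gold: Dict[str, List[str]],
                  senses: Sequence[str]) -> float:
    """Mean squared error between predicted and gold sense proportions,
    taken over every (period, sense) cell present in `gold` (an assumption)."""
    squared_errors = []
    for period, gold_labels in gold.items():
        p = sense_proportions(predicted.get(period, []), senses)
        g = sense_proportions(gold_labels, senses)
        squared_errors.extend((pi - gi) ** 2 for pi, gi in zip(p, g))
    return sum(squared_errors) / len(squared_errors)


# Toy data (invented): two hypothetical senses of one Latin word in two periods.
senses = ["speech", "prayer"]
gold = {"100 BCE": ["speech", "speech", "speech", "prayer"],
        "400 CE": ["prayer", "prayer", "speech", "prayer"]}
predicted = {"100 BCE": ["speech", "speech", "prayer", "prayer"],
             "400 CE": ["prayer", "prayer", "prayer", "prayer"]}
print(variation_mse(predicted, gold, senses))  # 0.0625
```

A low value means the classifier's labels reproduce the rise and fall of senses well, even if individual token labels are wrong; this is why such a measure complements plain accuracy.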
Translation Context Sensitive WSD
11th Annual Conference of the European Association for Machine Translation, 2006
Cited by 1 (1 self-citation)
While it is generally agreed that Word Sense Disambiguation (WSD) is an application-dependent task, the great majority of systems pursue application-independent approaches. We propose a strategy to support WSD for Machine Translation that is designed specifically for this application. It relies on the analysis of co-occurrences in the context involving words that have already been translated. Experiments on the English-Portuguese translation of 10 verbs using only this knowledge yielded an accuracy of 51%, which outperforms the most-frequent-translation baseline (37%). A less strict evaluation criterion, which considers the 10 best-ranked translations, showed the potential of this approach as an additional knowledge source for WSD: the correct translation was among the top 10 results in 92% of the cases.
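To make the two evaluation criteria concrete, here is a minimal sketch that ranks candidate Portuguese translations of an ambiguous verb by summing their co-occurrence counts with context words that have already been translated, then scores rankings by top-k accuracy (k = 1 corresponds to plain accuracy, k = 10 to the less strict criterion). The summed-count scoring, the function names, and the counts are all hypothetical illustrations, not the authors' actual method or data.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical co-occurrence counts gathered from a target-language corpus:
# (candidate translation, translated context word) -> number of co-occurrences.
CoocTable = Dict[Tuple[str, str], int]


def rank_translations(candidates: Sequence[str],
                      translated_context: Sequence[str],
                      cooc: CoocTable) -> List[str]:
    """Rank candidate translations by how often they co-occur with
    context words that have already been translated (sum of counts)."""
    scores = {c: sum(cooc.get((c, w), 0) for w in translated_context)
              for c in candidates}
    # Highest score first; ties broken alphabetically so the order is deterministic.
    return sorted(candidates, key=lambda c: (-scores[c], c))


def top_k_accuracy(rankings: Sequence[Sequence[str]],
                   gold: Sequence[str], k: int) -> float:
    """Fraction of instances whose gold translation appears in the top k."""
    hits = sum(1 for ranked, g in zip(rankings, gold) if g in ranked[:k])
    return hits / len(gold)


# Toy example (invented counts): translating "come" in a sentence whose
# context words have already been rendered as "casa" and "tarde".
cooc = {("chegar", "casa"): 12, ("voltar", "casa"): 9,
        ("vir", "tarde"): 3, ("chegar", "tarde"): 5}
ranking = rank_translations(["vir", "chegar", "voltar"], ["casa", "tarde"], cooc)
print(ranking)  # ['chegar', 'voltar', 'vir']
print(top_k_accuracy([ranking], ["voltar"], k=1))  # 0.0
print(top_k_accuracy([ranking], ["voltar"], k=2))  # 1.0
```

The example shows why a top-k criterion can be far more forgiving than strict accuracy: the correct translation may be ranked second and still count as a hit.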
A Hybrid Relational Approach for Word Sense Disambiguation
We propose a novel approach for word sense disambiguation that combines corpus-based evidence with background knowledge. Using an inductive logic programming technique, it generates expressive models that exploit several knowledge sources and also the relations between them. The approach is evaluated on two tasks: identifying the correct translation of verbs in English-Portuguese translation and disambiguating verbs from the Senseval-3 competition. The accuracy obtained in the multilingual task outperforms that of the alternative learning techniques investigated. The models also yielded a significant improvement in translation quality when integrated into a machine translation system. In the monolingual task, the approach performs as well as state-of-the-art systems for Senseval verbs.
Using a Parallel Corpus in Translation Practice and Research
There are so many variables underlying translation that examining anything longer than a few paragraphs of translated text at a time can become quite a daunting task. Using the technology of corpus linguistics, however, it is possible to analyse enormous quantities of translated text in unprecedented ways. A parallel language corpus, i.e., a computerized collection of texts in one language aligned with their translations into another language, can provide automatic access to countless features of translated texts that until now could not be studied systematically. COMPARA, a translation tool developed by Linguateca, is the largest public, edited online parallel corpus of English and Portuguese in the world. In its current version, 7.04, it provides access to almost three million words of original and translated fiction published in Portuguese and English. The aim of this presentation is to offer a brief description of the corpus and to demonstrate how it can be used in translation practice and research.
Key words: parallel corpora, Portuguese-English, translation.
A brief introduction to the COMPARA corpus
COMPARA is an extensible bidirectional parallel corpus of English and Portuguese. At ...
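The key property of a parallel corpus described above is that every unit of text is aligned with its translation, so a search on one language side immediately yields the corresponding passage in the other. The sketch below illustrates that idea as a simple whole-word concordance over aligned sentence pairs, searchable from either side. It is a hypothetical toy, not COMPARA's interface or data format; the `AlignedPair` type, the `concordance` function, and the sample sentences are invented for illustration.

```python
import re
from typing import List, NamedTuple


class AlignedPair(NamedTuple):
    """One alignment unit: an English sentence and its Portuguese counterpart."""
    english: str
    portuguese: str


def concordance(pairs: List[AlignedPair], query: str,
                side: str = "english") -> List[AlignedPair]:
    """Return every aligned pair whose `side` ("english" or "portuguese")
    contains `query` as a whole word (case-insensitive)."""
    if side not in AlignedPair._fields:
        raise ValueError(f"side must be one of {AlignedPair._fields}")
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    return [pair for pair in pairs if pattern.search(getattr(pair, side))]


# Invented sample pairs, not taken from COMPARA.
pairs = [
    AlignedPair("She came home late.", "Ela chegou tarde a casa."),
    AlignedPair("He came back yesterday.", "Ele voltou ontem."),
    AlignedPair("The home team won.", "A equipa da casa ganhou."),
]
for pair in concordance(pairs, "came"):
    print(pair.english, "|", pair.portuguese)
for pair in concordance(pairs, "casa", side="portuguese"):
    print(pair.portuguese, "|", pair.english)
```

Searching "came" returns two different Portuguese renderings ("chegou", "voltou"), which is the kind of translation variation a parallel corpus lets researchers observe systematically.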
Compiling and using a parallel corpus for research in translation
There are so many variables underlying translation that examining anything longer than a few paragraphs of translated text at a time can become quite a daunting task. The advent of corpus linguistics, however, has made it possible to analyse enormous quantities of translated text in unprecedented ways. In line with these advances, parallel corpora can provide access to many aspects of translation that could not previously be studied systematically. The first part of this paper discusses the different types of decisions that have to be made when building a parallel corpus, with particular emphasis on compilation questions that are unique to parallel corpora as opposed to corpora in general. This is followed by an account of the choices made when creating COMPARA, a post-edited, bidirectional parallel corpus of English and Portuguese literary texts with 3 million words, freely available for research and education at ...

