Evaluating Cross-Language Annotation Transfer in the MultiSemCor Corpus
- COLING 2004, Geneva
, 2004
Cited by 7 (0 self)
In this paper we illustrate and evaluate an approach to the creation of high quality linguistically annotated resources based on the exploitation of aligned parallel corpora. This approach is based on the assumption that if a text in one language has been annotated and its translation has not, annotations can be transferred from the source text to the target using word alignment as a bridge. The transfer approach has been tested in the creation of the MultiSemCor corpus, an English/Italian parallel corpus created on the basis of the English SemCor corpus. In MultiSemCor, texts are aligned at the word level and semantically annotated with a shared inventory of senses. We present some experiments carried out to evaluate the different steps involved in the methodology. The results of the evaluation suggest that the cross-language annotation transfer methodology is a promising solution allowing for the exploitation of existing (mostly English) annotated resources to bootstrap the creation of annotated corpora in new (resource-poor) languages with greatly reduced human effort.
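The transfer step described in this abstract can be illustrated with a minimal sketch. It assumes that a word alignment is available as (source index, target index) pairs and that source tokens carry sense labels. The function name `transfer_annotations`, the data layout, and the sense labels are hypothetical and are not taken from the MultiSemCor implementation.

```python
def transfer_annotations(source_senses, alignment, target_len):
    """Project sense labels from source tokens onto aligned target tokens.

    source_senses: list with one sense label (or None) per source token.
    alignment: iterable of (source_index, target_index) pairs.
    target_len: number of tokens in the target sentence.
    """
    target_senses = [None] * target_len
    for src_i, tgt_i in alignment:
        sense = source_senses[src_i]
        # Only annotated source tokens contribute; the first label wins
        # if several source tokens align to the same target token.
        if sense is not None and target_senses[tgt_i] is None:
            target_senses[tgt_i] = sense
    return target_senses


# Toy English/Italian example (labels are made up for illustration).
english = ["the", "old", "house"]
italian = ["la", "vecchia", "casa"]
senses = [None, "old#a#1", "house#n#1"]
links = [(0, 0), (1, 1), (2, 2)]
projected = transfer_annotations(senses, links, len(italian))
print(list(zip(italian, projected)))
```

Target tokens with no aligned, annotated source token stay unlabelled, which is where alignment errors or translation gaps would show up in such a scheme.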
Sense tagging: does it make sense?
- Corpus Linguistics’2001 Conference
, 2001
Cited by 4 (0 self)
Sense tagging is probably one of the challenges that corpus linguists have to face in the near future. So far, computerisation of this task has yielded very modest results despite numerous efforts, and sense tagging is turning out to be a touchy task. Difficulties stem from various sources, including the extraction of disambiguating information from the context. However, one of the main problems that lies upstream of the disambiguating process is the sense inventory itself. Most tagging efforts rely on traditional dictionaries to supply the reference senses, or on computer-oriented resources such as WordNet, which do not differ significantly from traditional dictionaries in terms of sense division. The present paper shows that human taggers perform very poorly when given a traditional dictionary as the reference, and that machines should therefore not be expected to perform any better if the same kind of resource is used. A detailed analysis reveals the lack of distributional criteria in dictionary entries: traditional dictionaries are chiefly concerned with meaning definition, and not with the surface clues (syntactic, collocational, etc.) that are required to match a given sense with a given corpus occurrence. It is argued that no fundamental progress can be made until large-scale lexical resources have been built that incorporate extensive distributional information, and that, until that time, any massive sense tagging efforts based on traditional dictionaries or computer-oriented resources such as WordNet would not only be premature but also questionable in terms of resource management.
LIHLA: A lexical aligner based on language-independent heuristics
, 2005
Cited by 3 (2 self)
Alignment of words and multiword units plays an important role in many natural language processing applications, such as example-based machine translation, transfer rule learning for machine translation, bilingual lexicography, word sense disambiguation, etc. In this paper we describe LIHLA, a lexical aligner which uses bilingual probabilistic lexicons generated by a freely available set of tools (NATools) and language-independent heuristics to find links between single words and multiword units in Brazilian Portuguese, Spanish and English parallel texts. The method has achieved a precision of 92.48% and 84.35% and a recall of 88.32% and 76.39% on Brazilian Portuguese--Spanish and Brazilian Portuguese--English parallel texts, respectively.
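The precision and recall figures reported above are measures over alignment links. As a minimal sketch, assuming the usual set-based definitions (the abstract does not state the exact evaluation protocol), they could be computed as follows. The function name `link_precision_recall` and the toy links are hypothetical.

```python
def link_precision_recall(predicted, reference):
    """Return (precision, recall) of predicted links against reference links.

    Both arguments are iterables of (source_index, target_index) pairs.
    """
    predicted, reference = set(predicted), set(reference)
    correct = len(predicted & reference)
    # Guard against empty sets to avoid division by zero.
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


# Toy example: three proposed links, four gold-standard links.
predicted_links = [(0, 0), (1, 1), (2, 3)]
reference_links = [(0, 0), (1, 1), (2, 2), (3, 3)]
precision, recall = link_precision_recall(predicted_links, reference_links)
print(f"precision={precision:.2%} recall={recall:.2%}")
```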
Evaluation of Methods for Sentence and Lexical Alignment of Brazilian Portuguese and English Parallel Texts
, 2004
Cited by 2 (0 self)
Parallel texts, i.e., texts in one language and their translations into other languages, are very useful nowadays for many applications such as machine translation and multilingual information retrieval. If these texts are aligned at the sentence or lexical level, their relevance increases considerably. In this paper we describe some experiments that have been carried out with Brazilian Portuguese and English parallel texts using well-known alignment methods: five methods for sentence alignment and two methods for lexical alignment. Some linguistic resources were built for these tasks and they are also described here. The results have shown that sentence alignment methods achieved 85.89% to 100% precision and word alignment methods, 51.84% to 95.61% on corpora from different genres.
Evaluating the LIHLA lexical aligner on Spanish, Brazilian Portuguese and Basque parallel texts
, 2005
Cited by 2 (0 self)
Alignment of words and multiword units plays an important role in many natural language processing applications, such as example-based machine translation, transfer rule learning for machine translation, bilingual lexicography, word sense disambiguation, etc. In this paper we describe LIHLA, a lexical aligner which uses bilingual probabilistic lexicons generated by a freely available set of tools (NATools) and language-independent heuristics to find links between single words and multiword units in sentence-aligned parallel texts. The method has achieved a precision of 92.44% and 85.09% and a recall of 91.13% and 64.66% on Brazilian Portuguese--Spanish and Spanish--Basque parallel texts, respectively.
Using Alignment for Multilingual Text Compression
Cited by 1 (0 self)
Multilingual text compression exploits the existence of the same text in several languages to compress the second and subsequent copies by reference to the first. We explore the details of this framework and present experimental results for parallel English and French texts.
Sense Tagging: Don't Look For The Meaning But For The Use
- Workshop on Computational Lexicography and Multimedia Dictionaries (COMLEX’2000)
, 2000
Automatic sense-tagging is one of the next challenges that corpus linguists have to face. So far, results are modest, despite numerous efforts, and sense-tagging appears as a vexing task. Difficulties stem from various sources, and in particular the extraction of disambiguating information from the context. However, one of the main problems comes upstream of the disambiguating process and lies in the sense inventory itself. Most tagging efforts use traditional dictionaries as the reference sense list, or machine-oriented resources such as WordNet which do not differ significantly from traditional dictionaries in terms of sense division. This paper shows that human taggers perform very poorly when they are given a traditional dictionary as reference, and that machines should therefore not be expected to perform better using the same kind of resource. A detailed analysis reveals the lack of distributional criteria in dictionary entries: traditional dictionaries are chiefly concerned with...
Improving a general-purpose Statistical Translation Engine by
The past decade has witnessed exciting work in the field of Statistical Machine Translation (SMT). However, accurate evaluation of its potential in real-life contexts is still a questionable issue.
Translation Spotting for Translation Memories
- In Proceedings of the HLT-NAACL 2003 Workshop on Building and Using Parallel Texts: Data Driven Machine Translation and Beyond - Volume 3
The term translation spotting (TS) refers to the task of identifying the target-language (TL) words that correspond to a given set of source-language (SL) words in a pair of text segments known to be mutual translations. This article examines this task within the context of a sub-sentential translation-memory system, i.e.

