Integrating subject field codes into wordnet
, 2000
Cited by 113 (8 self)
In this paper, we present a lexical resource where WordNet synsets are annotated with Subject Field Codes. We discuss both the methodological issues we dealt with and the annotation techniques used. A quantitative analysis of the resource coverage, as well as a qualitative evaluation of the proposed annotations, are reported.
An Unsupervised Method for Word Sense Tagging using Parallel
- Proceedings of ACL
, 2002
Cited by 51 (2 self)
We present an unsupervised method for word sense disambiguation that exploits translation correspondences in parallel corpora. The technique takes advantage of the fact that cross-language lexicalizations of the same concept tend to be consistent, preserving some core element of its semantics, and yet also variable, reflecting differing translator preferences and the influence of context. Working with parallel corpora introduces an extra complication for evaluation, since it is difficult to find a corpus that is both sense-tagged and parallel with another language; we therefore use pseudotranslations, created by machine translation systems, to make it possible to evaluate the approach against a standard test set. The results demonstrate that word-level translation correspondences are a valuable source of information for sense disambiguation.
Sense Discrimination with Parallel Corpora
, 2002
Cited by 51 (11 self)
This paper describes an experiment that uses translation equivalents derived from parallel corpora to determine sense distinctions that can be used for automatic sense-tagging and other disambiguation tasks. Our results show that sense distinctions derived from cross-lingual information are at least as reliable as those made by human annotators. Because our approach is fully automated through all its steps, it could provide means to obtain large samples of "sense-tagged" data without the high cost of human annotation.
Automatic Sense Tagging Using Parallel Corpora
- In Proceedings of the 6th Natural Language Processing Pacific Rim Symposium
, 2001
Cited by 20 (10 self)
This article reports the results of an analysis of translation equivalents in six languages from different language families, automatically extracted from an on-line 7-way parallel corpus of George Orwell’s Nineteen Eighty-Four. The goal is to determine sense distinctions that can be used to automatically sense-tag the data. Our results show that sense distinctions derived from cross-lingual information correspond to those made by human annotators, especially at the coarse-grained level. We also show that the reliability of sense assignments at finer-grained levels is comparable for human annotators and those produced automatically with cross-lingual data.
Parallel Translations as Sense Discriminators
- In Proceedings of the ACL SIGLEX workshop on Standardizing Lexical Resources
, 1999
Cited by 10 (0 self)
This article reports the results of a preliminary analysis of translation equivalents in four languages from different language families, extracted from an on-line parallel corpus of George Orwell's Nineteen Eighty-Four. The goal of the study is to determine the degree to which translation equivalents for different meanings of a polysemous word in English are lexicalized differently across a variety of languages, and to determine whether this information can be used to structure or create a set of sense distinctions useful in natural language processing applications. A coherence index is computed that measures the tendency for different senses of the same English word to be lexicalized differently, and from this data a clustering algorithm is used to create sense hierarchies.
Introduction: It is well known that the most nagging issue for word sense disambiguation (WSD) is the definition of just what a word sense is. At its base, the problem is a philosophical and linguistic one that i...
Finding Synonyms Using Automatic Word Alignment and Measures of Distributional Similarity
Cited by 7 (0 self)
There have been many proposals to extract semantically related words using measures of distributional similarity, but these typically are not able to distinguish between synonyms and other types of semantically related words such as antonyms, (co)hyponyms and hypernyms. We present a method based on automatic word alignment of parallel corpora consisting of documents translated into multiple languages and compare our method with a monolingual syntax-based method. The approach that uses aligned multilingual data to extract synonyms shows much higher precision and recall scores for the task of synonym extraction than the monolingual syntax-based approach.
Translation as Annotation
- Proceedings of the AI*IA 2003 Workshop "Topics and Perspectives of Natural Language Processing in
, 2003
Cited by 5 (3 self)
In this paper we illustrate an approach to the creation of high-quality linguistically annotated resources based on the exploitation of aligned parallel corpora. This approach is based on the key notion that translating a text can be seen as a linguistic annotation task which is easier than manual annotation with formal schemes. After translation, formal annotations can be automatically derived from aligned translated texts. We will show that translations can be exploited in various interesting ways to speed up and automate the linguistic annotation of texts. If none of the texts is already annotated, information from aligned texts can be exploited to carry out the annotation from scratch. Conversely, if the texts in one language have been annotated and the others have not, annotations can be transferred from one language to the other. The transfer-based method allows for the exploitation of existing (mostly English) annotated resources to bootstrap the creation of annotated corpora in new languages with highly reduced human effort.
Exploiting Hidden Meanings Using Bilingual Text
- In A. Gelbukh (Ed.), Lecture Notes in Computer Science 2945: Computational Linguistics and Intelligent Text Processing: Fifth International Conference, CICLing 2004 Proceedings (pp. 283–299)
, 2004
Cited by 5 (0 self)
The last decade has taught computational linguists that high performance on broad-coverage natural language processing tasks is best obtained using supervised learning techniques, which require annotation of large quantities of training data. But annotated text is hard to obtain.
Harvesting Multi-Word Expressions from Parallel Corpora
Cited by 2 (1 self)
The paper presents a set of approaches to extend the automatically created Slovene wordnet with nominal multiword expressions. In the first approach, multiword expressions from Princeton WordNet are translated with a technique that is based on word alignment and lexico-syntactic patterns. This is followed by extracting new terms from a monolingual corpus using keywordness ranking and contextual patterns. Finally, the multiword expressions are assigned a hypernym and added to our wordnet. Manual evaluation and comparison of the results show that the translation approach is the most straightforward and accurate. However, it is successfully complemented by the two monolingual approaches, which are able to identify more term candidates in the corpus that would otherwise go unnoticed. Some weaknesses of the proposed wordnet extension techniques are also addressed.
ALIGNING ONTOLOGIES THROUGH FORMAL CONCEPT ANALYSIS
Ontologies have been developed for a number of knowledge domains as diverse as clinical terminology, photo camera parts and micro-array gene expression data. However, an innate characteristic of the development of ontologies is that they are often created by independent groups of expertise, which generated the necessity of merging and aligning ontologies covering overlapping domains. Many algorithms and tools have been proposed for merging of ontologies, but most of them disregard the structural properties of the source ontologies, focusing mostly on syntactic analysis. This article focuses on an alignment method for ontologies based on Formal Concept Analysis, a data analysis technique founded on lattice theory.

