Exploiting Parallel Texts to Produce a Multilingual Sense Tagged Corpus for Word Sense Disambiguation
- In Proceedings of RANLP-05, Borovets, 2005
Cited by 9 (6 self)
We describe an approach to the automatic creation of a sense tagged corpus intended to train a word sense disambiguation (WSD) system for English-Portuguese machine translation. The approach uses parallel corpora, translation dictionaries and a set of straightforward heuristics. In an evaluation with nine corpora containing 10 ambiguous verbs, the approach achieved an average precision of 94%, compared with 58% when a state-of-the-art statistical alignment tool was used. The resulting corpus consists of 113,802 instances tagged with the senses (i.e., translations) of the 10 verbs. Besides the word-sense tags, this corpus provides other useful information, such as POS-tags, and can be readily used as input to supervised machine learning algorithms in order to build WSD models for machine translation.
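The idea of tagging a source verb with the translation found in the aligned target sentence can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy dictionary, the function name `tag_sense`, the lemmatized target tokens, and the rule of accepting a tag only when exactly one candidate translation appears. The abstract does not spell out its actual heuristics.

```python
from typing import List, Optional

# Hypothetical toy dictionary: English verb -> candidate Portuguese translations.
TRANSLATIONS = {
    "take": ["tomar", "levar", "pegar"],
}


def tag_sense(verb: str, target_lemmas: List[str]) -> Optional[str]:
    """Return the one dictionary translation of `verb` present in the aligned
    target sentence, or None when no candidate or several candidates appear."""
    candidates = TRANSLATIONS.get(verb, [])
    found = [t for t in candidates if t in target_lemmas]
    # Assumed heuristic: only an unambiguous match yields a sense tag.
    return found[0] if len(found) == 1 else None


# Usage: the target sentence is assumed to be lemmatized already.
print(tag_sense("take", ["ele", "levar", "o", "livro"]))
```

Leaving ambiguous sentence pairs untagged trades coverage for precision, which is one plausible way a heuristic tagger could favor precision over recall.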
Differentiating homonymy and polysemy in information retrieval
- In Proceedings of the HLT/EMNLP, 2005
Cited by 8 (0 self)
Recent studies into Web retrieval have shown that word sense disambiguation can increase retrieval effectiveness. However, it remains unclear what minimum disambiguation accuracy is required and at what granularity word sense must be defined in order to maximize these benefits. This study answers these questions using a simulation of the effects of ambiguity on information retrieval. It goes beyond previous studies by differentiating between homonymy and polysemy. Results show that retrieval is more sensitive to polysemy than homonymy and that, when resolving polysemy, accuracy as low as 55% can potentially lead to increased performance.
An empirical study for automatic acquisition of topic signatures
- Proceedings of Third International Global WordNet Conference. Jeju Island (Korea), 2006
Cited by 3 (1 self)
The main goal of this work is to compare different methods for building Topic Signatures, which are vectors of weighted words acquired from large corpora. We used two different software tools, ExRetriever [Cuadros et al., 2004] and Infomap [Dorow and Widdows, 2003], for acquiring Topic Signatures from corpora. Using these tools, we retrieve sense examples from large text collections. We also include in the comparison the Topic Signatures acquired previously by [Agirre and de la Calle, 2004] from the web. The three systems construct queries for each word sense using WordNet. ExRetriever and Infomap acquire the sense examples from the British National Corpus. The quality of the acquired Topic Signatures is indirectly evaluated on the Word Sense Disambiguation English Lexical Task of Senseval-2.
Estimating class priors in domain adaptation for word sense disambiguation
- In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, 2006
Cited by 3 (0 self)
Instances of a word drawn from different domains may have different sense priors (the proportions of the different senses of a word). This in turn affects the accuracy of word sense disambiguation (WSD) systems trained and applied on different domains. This paper presents a method to estimate the sense priors of words drawn from a new domain, and highlights the importance of using well-calibrated probabilities when performing these estimations. By using well-calibrated probabilities, we are able to estimate the sense priors effectively to achieve significant improvements in WSD accuracy.
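One common way to re-estimate class priors on unlabeled data from a new domain is an EM-style iteration over the classifier's posterior probabilities. The sketch below is an illustration under stated assumptions, not necessarily the estimator used in the paper: the function name `estimate_priors`, the fixed iteration count, and the toy posteriors are all hypothetical. It assumes the posteriors are reasonably calibrated and that every training prior is positive.

```python
from typing import List


def estimate_priors(posteriors: List[List[float]],
                    train_priors: List[float],
                    iterations: int = 50) -> List[float]:
    """EM-style re-estimation of class priors on a new domain.

    posteriors[i][c] is the trained classifier's P(c | x_i) for the i-th
    unlabeled new-domain instance; train_priors[c] is the training prior.
    """
    n_classes = len(train_priors)
    priors = list(train_priors)  # start from the training-domain priors
    for _ in range(iterations):
        totals = [0.0] * n_classes
        for post in posteriors:
            # E-step: rescale each posterior by the ratio new prior / old prior.
            scaled = [post[c] * priors[c] / train_priors[c]
                      for c in range(n_classes)]
            norm = sum(scaled)
            for c in range(n_classes):
                totals[c] += scaled[c] / norm
        # M-step: the new prior is the average adjusted posterior.
        priors = [t / len(posteriors) for t in totals]
    return priors


# Usage with toy posteriors: three of four instances favor the second sense.
toy = [[0.9, 0.1], [0.1, 0.9], [0.1, 0.9], [0.1, 0.9]]
print(estimate_priors(toy, [0.5, 0.5]))
```

Because the update multiplies posteriors directly, poorly calibrated probabilities distort the estimate, which is consistent with the abstract's emphasis on calibration.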
Unsupervised Learning of Ontology-Linked Selectional Preferences, Procs. of CIARP’2004
- Computational Linguistics
Cited by 1 (1 self)
We present a method for extracting selectional preferences of verbs from unannotated text. These selectional preferences are linked to an ontology (e.g. the hypernym relations found in WordNet), which allows for extending the coverage for unseen valency fillers. For example, if "drink vodka" is found in the training corpus, a whole WordNet hierarchy is assigned to the verb "to drink" ("drink liquor", "drink alcohol", "drink beverage", "drink substance", etc.), so that when "drink gin" is seen at a later stage, it is possible to relate the selectional preference "drink vodka" with "drink gin" (as gin is a co-hyponym of vodka). This information can be used for word sense disambiguation, prepositional phrase attachment disambiguation, syntactic disambiguation, and other applications within the approach of pattern-based statistical methods combined with knowledge. As an example, we present an application to word sense disambiguation based on the Senseval-2 training text for Spanish. The results of this experiment are similar to those obtained by Resnik for English.
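The vodka/gin example can be sketched with a toy hypernym table standing in for WordNet. The table contents and the names `ancestors`, `learn` and `shared_class` are hypothetical, chosen only to illustrate how a seen filler generalizes to an unseen co-hyponym; the paper's actual weighting of preferences is not shown in the abstract and is not modeled here.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy hypernym links standing in for WordNet.
HYPERNYM = {
    "vodka": "liquor",
    "gin": "liquor",
    "liquor": "alcohol",
    "alcohol": "beverage",
    "beverage": "substance",
}


def ancestors(noun: str) -> List[str]:
    """Return the noun followed by its hypernyms, most specific first."""
    chain = [noun]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def learn(pairs: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Assign each verb the whole hypernym chain of every observed filler."""
    prefs: Dict[str, Set[str]] = {}
    for verb, noun in pairs:
        prefs.setdefault(verb, set()).update(ancestors(noun))
    return prefs


def shared_class(prefs: Dict[str, Set[str]], verb: str, noun: str) -> Optional[str]:
    """Most specific class linking an unseen filler to the verb's preferences."""
    for cls in ancestors(noun):
        if cls in prefs.get(verb, set()):
            return cls
    return None


prefs = learn([("drink", "vodka")])
print(shared_class(prefs, "drink", "gin"))  # gin was never seen with "drink"
```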
Using web selectors for the disambiguation of all words
- In Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions (SEW-2009)
Cited by 1 (0 self)
This research examines a word sense disambiguation method using selectors acquired from the Web. Selectors describe words which may take the place of another given word within its local context. Work in using Web selectors for noun sense disambiguation is generalized into the disambiguation of verbs, adverbs, and adjectives as well. Additionally, this work incorporates previously ignored adverb context selectors and explores the effectiveness of each type of context selector according to its part of speech. Overall results for verb, adjective, and adverb disambiguation are well above a random baseline and slightly below the most frequent sense baseline, a point which noun sense disambiguation overcomes. Our experiments find that, for noun and verb sense disambiguation tasks, each type of context selector may assist target selectors in disambiguation. Finally, these experiments also help to draw insights about the future direction of similar research.
Word Relatives in Context for Word Sense Disambiguation
Cited by 1 (0 self)
Progress in Word Sense Disambiguation (WSD) is currently stalled by a lack of training data. We present in this paper a novel disambiguation algorithm that improves on previous systems based on acquisition of examples by incorporating local context information. With a basic configuration, our method is able to obtain state-of-the-art performance. We complemented this work by evaluating other well-known methods on the same dataset, and analysing the comparative results per word. We observed that each algorithm performed better for different types of words, and each of them failed for some particular words. We then proposed a simple unsupervised voting scheme that improved significantly over single systems, achieving the best unsupervised performance on both the Senseval 2 and Senseval 3 lexical sample datasets.
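A voting scheme over several systems' outputs can be as simple as a plurality vote. The sketch below is one possible reading, with assumptions labeled: the function name `vote`, the use of `None` for an abstaining system, and the choice to return no answer on a tie are all hypothetical, since the abstract does not describe how votes are counted or ties resolved.

```python
from collections import Counter
from typing import List, Optional


def vote(predictions: List[Optional[str]]) -> Optional[str]:
    """Plurality vote over per-system sense labels; None means a system abstained."""
    counts = Counter(p for p in predictions if p is not None)
    if not counts:
        return None  # every system abstained
    ranked = counts.most_common()
    # Assumed tie rule: give no answer when the top two senses tie.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Usage: three systems label one occurrence of an ambiguous word.
print(vote(["bank%1", "bank%2", "bank%1"]))
```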
On the Use of Automatically Acquired Examples for All-Nouns Word Sense Disambiguation
Cited by 1 (0 self)
This article focuses on Word Sense Disambiguation (WSD), which is a Natural Language Processing task that is thought to be important for many Language Technology applications, such as Information Retrieval, Information Extraction, or Machine Translation. One of the main issues preventing the deployment of WSD technology is the lack of training examples for Machine Learning systems, also known as the Knowledge Acquisition Bottleneck. A method which has been shown to work for small samples of words is the automatic acquisition of examples. We have previously shown that one of the most promising example acquisition methods scales up and produces a freely available database of 150 million examples from Web snippets for all polysemous nouns in WordNet. This paper focuses on the issues that arise when using those examples, alone or in addition to manually tagged examples, to train a supervised WSD system for all nouns. The extensive evaluation on both lexical-sample and all-words Senseval benchmarks shows that we are able to improve over commonly used baselines and to achieve top-rank performance. Making good use of the prior distributions of the senses proved to be a crucial factor.
An Automatic Approach to Create a Sense Tagged Corpus for Word Sense
- 2005
In this paper we describe a simple approach to the automatic creation of a sense tagged corpus intended for multilingual word sense disambiguation (WSD). The approach is based on English-Portuguese parallel corpora and a set of straightforward heuristics. In experiments with two corpora containing some verbs, a preliminary evaluation showed that, despite its simplicity, the proposed approach is quite promising. Besides the word-sense tags, the resulting corpus provides other kinds of useful information for WSD, such as POS-tags. We plan to employ the resulting corpus in a supervised machine learning process in order to build a WSD model for machine translation.

