Results 1 - 10
of
299
Semantic distance in WordNet: An experimental, application-oriented evaluation of five measures
- IN WORKSHOP ON WORDNET AND OTHER LEXICAL RESOURCES, SECOND MEETING OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS
, 2001
"... Five different proposed measures of similarity or semantic distance in WordNet were experimentally compared by examining their performance in a real-word spelling correction system. It was found that Jiang and Conrath 's measure gave the best results overall. That of Hirst and St-Onge seriously over ..."
Abstract
-
Cited by 204 (4 self)
- Add to MetaCart
Five different proposed measures of similarity or semantic distance in WordNet were experimentally compared by examining their performance in a real-word spelling correction system. It was found that Jiang and Conrath 's measure gave the best results overall. That of Hirst and St-Onge seriously over-related, that of Resnik seriously under-related, and those of Lin and of Leacock and Chodorow fell in between.
Computing semantic relatedness using Wikipedia-based explicit semantic analysis
- In Proceedings of the 20th International Joint Conference on Artificial Intelligence
, 2007
"... Computing semantic relatedness of natural language texts requires access to vast amounts of common-sense and domain-specific world knowledge. We propose Explicit Semantic Analysis (ESA), a novel method that represents the meaning of texts in a high-dimensional space of concepts derived from Wikipedi ..."
Abstract
-
Cited by 172 (7 self)
- Add to MetaCart
Computing semantic relatedness of natural language texts requires access to vast amounts of common-sense and domain-specific world knowledge. We propose Explicit Semantic Analysis (ESA), a novel method that represents the meaning of texts in a high-dimensional space of concepts derived from Wikipedia. We use machine learning techniques to explicitly represent the meaning of any text as a weighted vector of Wikipedia-based concepts. Assessing the relatedness of texts in this space amounts to comparing the corresponding vectors using conventional metrics (e.g., cosine). Compared with the previous state of the art, using ESA results in substantial improvements in correlation of computed relatedness scores with human judgments: from r =0.56 to 0.75 for individual words and from r =0.60 to 0.72 for texts. Importantly, due to the use of natural concepts, the ESA model is easy to explain to human users. 1
WordNet::Similarity -- Measuring the Relatedness of Concepts
, 2004
"... WordNet::Similarity is a freely available software package that makes it possible to measure the semantic similarity or relatedness between a pair of concepts (or word senses). It provides six measures of similarity, and three measures of relatedness, all of which are based on the lexical databa ..."
Abstract
-
Cited by 141 (3 self)
- Add to MetaCart
WordNet::Similarity is a freely available software package that makes it possible to measure the semantic similarity or relatedness between a pair of concepts (or word senses). It provides six measures of similarity, and three measures of relatedness, all of which are based on the lexical database WordNet. These measures are implemented as Perl modules which take as input two concepts, and return a numeric value that represents the degree to which they are similar or related.
Extended gloss overlaps as a measure of semantic relatedness
- In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence
, 2003
"... This paper presents a new measure of semantic relatedness between concepts that is based on the number of shared words (overlaps) in their definitions (glosses). This measure is unique in that it extends the glosses of the concepts under consideration to include the glosses of other concepts to whic ..."
Abstract
-
Cited by 121 (5 self)
- Add to MetaCart
This paper presents a new measure of semantic relatedness between concepts that is based on the number of shared words (overlaps) in their definitions (glosses). This measure is unique in that it extends the glosses of the concepts under consideration to include the glosses of other concepts to which they are related according to a given concept hierarchy. We show that this new measure reasonably correlates to human judgments. We introduce a new method of word sense disambiguation based on extended gloss overlaps, and demonstrate that it fares well on the SENSEVAL-2 lexical sample data. 1
Mining the Web for Synonyms: PMI-IR Versus LSA on TOEFL
, 2001
"... This paper presents a simple unsupervised learning algorithm for recognizing synonyms, based on statistical data acquired by querying a Web search engine. The algorithm, called PMI-IR, uses Pointwise Mutual Information (PMI) and Information Retrieval (IR) to measure the similarity of pairs of wo ..."
Abstract
-
Cited by 118 (10 self)
- Add to MetaCart
This paper presents a simple unsupervised learning algorithm for recognizing synonyms, based on statistical data acquired by querying a Web search engine. The algorithm, called PMI-IR, uses Pointwise Mutual Information (PMI) and Information Retrieval (IR) to measure the similarity of pairs of words. PMI-IR is empirically evaluated using 80 synonym test questions from the Test of English as a Foreign Language (TOEFL) and 50 synonym test questions from a collection of tests for students of English as a Second Language (ESL). On both tests, the algorithm obtains a score of 74%. PMI-IR is contrasted with Latent Semantic Analysis (LSA), which achieves a score of 64% on the same 80 TOEFL questions. The paper discusses potential applications of the new unsupervised learning algorithm and some implications of the results for LSA and LSI (Latent Semantic Indexing).
Evaluating WordNet-based measures of lexical semantic relatedness
- Computational Linguistics
, 2006
"... The quantification of lexical semantic relatedness has many applications in NLP, and many different measures have been proposed. We evaluate five of these measures, all of which use WordNet as their central resource, by comparing their performance in detecting and correcting real-word spelling error ..."
Abstract
-
Cited by 88 (0 self)
- Add to MetaCart
The quantification of lexical semantic relatedness has many applications in NLP, and many different measures have been proposed. We evaluate five of these measures, all of which use WordNet as their central resource, by comparing their performance in detecting and correcting real-word spelling errors. An information-content–based measure proposed by Jiang and Conrath is found superior to those proposed by Hirst and St-Onge, Leacock and Chodorow, Lin, and Resnik. In addition, we explain why distributional similarity is not an adequate proxy for lexical semantic relatedness. 1.
Using Measures of Semantic Relatedness for Word Sense Disambiguation
, 2003
"... This paper generalizes the Adapted Lesk Algorithm of Banerjee and Pedersen (2002) to a method of word sense disambiguation based on semantic relatedness. This is possible since Lesk's original algorithm (1986) is based on gloss overlaps which can be viewed as a measure of semantic relatedness. We ev ..."
Abstract
-
Cited by 85 (7 self)
- Add to MetaCart
This paper generalizes the Adapted Lesk Algorithm of Banerjee and Pedersen (2002) to a method of word sense disambiguation based on semantic relatedness. This is possible since Lesk's original algorithm (1986) is based on gloss overlaps which can be viewed as a measure of semantic relatedness. We evaluate a variety of measures of semantic relatedness when applied to word sense disambiguation by carrying out experiments using the English lexical sample data of Senseval-2. We find that the gloss overlaps of Adapted Lesk and the semantic distance measure of Jiang and Conrath (1997) result in the highest accuracy.
Dependency-based construction of semantic space models
- Computational Linguistics
, 2007
"... Traditionally, vector-based semantic space models use word co-occurrence counts from large corpora to represent lexical meaning. In this article we present a novel framework for constructing semantic spaces that take syntactic relations into account. We introduce a formalization for this class of mo ..."
Abstract
-
Cited by 79 (6 self)
- Add to MetaCart
Traditionally, vector-based semantic space models use word co-occurrence counts from large corpora to represent lexical meaning. In this article we present a novel framework for constructing semantic spaces that take syntactic relations into account. We introduce a formalization for this class of models which allows linguistic knowledge to guide the construction process. We evaluate our framework on a range of tasks relevant for cognitive science and natural language processing: semantic priming, synonymy detection and word sense disambiguation. In all cases, our framework obtains results that are comparable or superior to the state of the art. 1.
Similarity-based models of word cooccurrence probabilities
- Machine Learning
, 1999
"... Abstract. In many applications of natural language processing (NLP) it is necessary to determine the likelihood of a given word combination. For example, a speech recognizer may need to determine which of the two word combinations “eat a peach ” and “eat a beach ” is more likely. Statistical NLP met ..."
Abstract
-
Cited by 70 (0 self)
- Add to MetaCart
Abstract. In many applications of natural language processing (NLP) it is necessary to determine the likelihood of a given word combination. For example, a speech recognizer may need to determine which of the two word combinations “eat a peach ” and “eat a beach ” is more likely. Statistical NLP methods determine the likelihood of a word combination from its frequency in a training corpus. However, the nature of language is such that many word combinations are infrequent and do not occur in any given corpus. In this work we propose a method for estimating the probability of such previously unseen word combinations using available information on “most similar ” words. We describe probabilistic word association models based on distributional word similarity, and apply them to two tasks, language modeling and pseudo-word disambiguation. In the language modeling task, a similarity-based model is used to improve probability estimates for unseen bigrams in a back-off language model. The similaritybased method yields a 20 % perplexity improvement in the prediction of unseen bigrams and statistically significant reductions in speech-recognition error. We also compare four similarity-based estimation methods against back-off and maximum-likelihood estimation methods on a pseudo-word sense disambiguation task in which we controlled for both unigram and bigram frequency to avoid giving too much weight to easy-to-disambiguate high-frequency configurations. The similaritybased methods perform up to 40 % better on this particular task.
S.: Roget’s thesaurus and semantic similarity
- In: Proceedings of the RANLP-2003
, 2003
"... Roget’s Thesaurus has not been sufficiently appreciated in Natural Language Processing. We show that Roget's and WordNet are birds of a feather. In a few typical tests, we compare how the two resources help measure semantic similarity. One of the benchmarks is Miller and Charles ’ list of 30 noun pa ..."
Abstract
-
Cited by 56 (0 self)
- Add to MetaCart
Roget’s Thesaurus has not been sufficiently appreciated in Natural Language Processing. We show that Roget's and WordNet are birds of a feather. In a few typical tests, we compare how the two resources help measure semantic similarity. One of the benchmarks is Miller and Charles ’ list of 30 noun pairs to which human judges had assigned similarity measures. We correlate these measures with those computed by several NLP systems. The 30 pairs can be traced back to Rubenstein and Goodenough’s 65 pairs, which we have also studied. Our Roget’sbased system gets correlations of.878 for the smaller and.818 for the larger list of noun pairs; this is quite close to the.885 that Resnik obtained when he employed humans to replicate the Miller and Charles experiment. We further evaluate our measure by using Roget’s and Word-Net to answer 80 TOEFL, 50 ESL and 300 Reader’s Digest questions: the correct synonym must be selected amongst a group of four words. Our system gets 78.75%, 82.00 % and 74.33 % of the questions respectively, better than any published results. 1

