Evaluating WordNet-based measures of lexical semantic relatedness
- Computational Linguistics, 2006
Cited by 88 (0 self)
The quantification of lexical semantic relatedness has many applications in NLP, and many different measures have been proposed. We evaluate five of these measures, all of which use WordNet as their central resource, by comparing their performance in detecting and correcting real-word spelling errors. An information-content–based measure proposed by Jiang and Conrath is found superior to those proposed by Hirst and St-Onge, Leacock and Chodorow, Lin, and Resnik. In addition, we explain why distributional similarity is not an adequate proxy for lexical semantic relatedness.
Dependency-based construction of semantic space models
- Computational Linguistics, 2007
Cited by 79 (6 self)
Traditionally, vector-based semantic space models use word co-occurrence counts from large corpora to represent lexical meaning. In this article we present a novel framework for constructing semantic spaces that take syntactic relations into account. We introduce a formalization for this class of models which allows linguistic knowledge to guide the construction process. We evaluate our framework on a range of tasks relevant for cognitive science and natural language processing: semantic priming, synonymy detection and word sense disambiguation. In all cases, our framework obtains results that are comparable or superior to the state of the art.
Correcting Real-Word Spelling Errors by Restoring Lexical Cohesion
- 2001
Cited by 33 (2 self)
Spelling errors that happen to result in a real word in the lexicon cannot be detected by a conventional spelling checker. We present a method for detecting and correcting many such errors by identifying tokens that are semantically unrelated to their context and are spelling variations of words that would be related to the context. Relatedness to context is determined by a measure of semantic distance initially proposed by Jiang and Conrath (1997). We tested the method on an artificial corpus of errors; it achieved recall of up to 50% and precision of 18 to 25% -- levels that approach practical usability.
Environmental Determinants of Lexical Processing Effort
- 2000
Cited by 15 (2 self)
A central concern of psycholinguistic research is explaining the relative ease or difficulty involved in processing words. In this thesis, we explore the connection between lexical processing effort and measurable properties of the linguistic environment. Distributional information (information about a word's contexts of use) is easily extracted from large language corpora in the form of co-occurrence statistics. We claim that such simple distributional statistics can form the basis of a parsimonious model of lexical processing effort.
Measures and Applications of Lexical Distributional Similarity
- 2003
Cited by 14 (0 self)
This thesis is concerned with the measurement and application of lexical distributional similarity. Two words are said to be distributionally similar if they appear in similar contexts. This loose definition, however, has led to many measures being proposed or adopted from fields such as geometry, statistics, Information Retrieval (IR) and Information Theory. Our aim is to investigate the properties which make a good measure of lexical distributional similarity. We start by introducing the concept of lexical distributional similarity. We discuss potential applications, which can be roughly divided into distributional or language modelling applications and semantic applications, and methods of evaluation (Chapter 2). We look at existing measures of distributional similarity and carry out an empirical comparison of fifteen of these measures, paying particular attention to the effects of word frequency (Chapter 3). We propose a new general framework for distributional similarity based on the context of lexical substitutability, which we measure using the IR concepts of precision and recall. This framework allows us to investigate the key factors in similarity of asymmetry, the relative influence of different contexts and the extent to which words share a context (Chapter 4). Finally, we consider the application of distributional similarity in language modelling (Chapter 5) and as a predictor of semantic similarity using human judgements of similarity and a spelling correction task (Chapter 6).
A Context-based Model of Semantic Similarity
- 1997
Cited by 1 (0 self)
Lexical co-occurrence counts from large corpora have been used to construct high-dimensional vector-space models of language. Distances between word vectors extracted from these models are generally considered to reflect semantic similarity. Implicit in this assumption is that 'semantic distance' measurements correspond to human intuitions. This paper investigates the validity of one such measure, contextual similarity, calculated from the spoken part of the British National Corpus. In Experiment 1, a moderate correlation is found between human judgements of the semantic similarity between pairs of nouns and the model's measure of contextual similarity. The correlation between the two measures is confirmed in two additional experiments, using a new set of elicited ratings. The semantic similarity of same-category word pairs (Experiment 2A) and the similarity between words differing in syntactic category (Experiment 2B) are found to be predictable from contextual similarity. The results...
Improving word representations via global context and multiple word prototypes
- In Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL), 2012
Cited by 1 (0 self)
Unsupervised word representations are very useful in NLP tasks both as inputs to learning algorithms and as extra word features in NLP systems. However, most of these models are built with only local context and one representation per word. This is problematic because words are often polysemous and global context can also provide useful information for learning word meanings. We present a new neural network architecture which 1) learns word embeddings that better capture the semantics of words by incorporating both local and global document context, and 2) accounts for homonymy and polysemy by learning multiple embeddings per word. We introduce a new dataset with human judgments on pairs of words in sentential context, and evaluate our model on it, showing that our model outperforms competitive baselines and other neural language models.

