Measures of Distributional Similarity
- In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, 1999
Cited by 173 (2 self)
We study distributional similarity measures for the purpose of improving probability estimation for unseen cooccurrences. Our contributions are three-fold: an empirical comparison of a broad range of measures; a classification of similarity functions based on the information that they incorporate; and the introduction of a novel function that is superior at evaluating potential proxy distributions.
Word sense disambiguation: The state of the art
- Computational Linguistics, 1998
Cited by 92 (3 self)
The automatic disambiguation of word senses has been an interest and concern since the earliest days of computer treatment of language in the 1950's. Sense disambiguation is an “intermediate task” (Wilks and Stevenson, 1996) which is not an end in itself, but rather is necessary at one level or another to accomplish most natural language processing tasks. It is
Respect My Authority! HITS Without Hyperlinks, Utilizing Cluster-Based Language Models
- 2006
Cited by 33 (9 self)
We present an approach to improving the precision of an initial document ranking wherein we utilize cluster information within a graph-based framework. The main idea is to perform re-ranking based on centrality within bipartite graphs of documents (on one side) and clusters (on the other side), on the premise that these are mutually reinforcing entities. Links between entities are created via consideration of language models induced from them. We find that our cluster-document graphs give rise to much better retrieval performance than previously proposed document-only graphs do. For example, authority-based re-ranking of documents via a HITS-style cluster-based approach outperforms a previously-proposed PageRank-inspired algorithm applied to solely-document graphs. Moreover, we also show that computing authority scores for clusters constitutes an effective method for identifying clusters containing a large percentage of relevant documents.
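The mutual reinforcement between documents and clusters described above can be illustrated with a small HITS-style iteration on a bipartite graph. This is only a sketch under stated assumptions, not the authors' implementation: the function name `hits_bipartite`, the edge weights, and the choice of clusters as hubs and documents as authorities are all hypothetical. In the paper the links are induced from language models; here the weights are simply given as input.

```python
import math


def hits_bipartite(weights, iterations=50):
    """HITS-style scores on a bipartite cluster-document graph.

    weights[c][d] is a non-negative edge weight from cluster c to
    document d (hypothetical input; the paper derives links from
    language models). Clusters act as hubs, documents as authorities.
    """
    clusters = list(weights)
    docs = sorted({d for c in clusters for d in weights[c]})
    hub = {c: 1.0 for c in clusters}
    auth = {d: 1.0 for d in docs}
    for _ in range(iterations):
        # A document is authoritative if strongly linked from good hubs.
        auth = {d: sum(weights[c].get(d, 0.0) * hub[c] for c in clusters)
                for d in docs}
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {d: v / norm for d, v in auth.items()}
        # A cluster is a good hub if it links to authoritative documents.
        hub = {c: sum(w * auth[d] for d, w in weights[c].items())
               for c in clusters}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {c: v / norm for c, v in hub.items()}
    return auth, hub


# Toy graph with made-up weights: doc2 is linked from both clusters.
edge_weights = {
    "cluster_a": {"doc1": 0.9, "doc2": 0.8},
    "cluster_b": {"doc2": 0.7, "doc3": 0.1},
}
authority, hubness = hits_bipartite(edge_weights)
reranked = sorted(authority, key=authority.get, reverse=True)
print(reranked)
```

Sorting documents by authority score gives the re-ranked list; the hub scores of the clusters can be read the same way when the goal is to rank clusters instead.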
A clustering approach for nearly unsupervised recognition of nonliteral language
- In Proceedings of EACL-06, 2006
Cited by 27 (0 self)
In this paper we present TroFi (Trope Finder), a system for automatically classifying literal and nonliteral usages of verbs through nearly unsupervised word-sense disambiguation and clustering techniques. TroFi uses sentential context instead of selectional constraint violations or paths in semantic hierarchies. It also uses literal and nonliteral seed sets acquired and cleaned without human supervision in order to bootstrap learning. We adapt a word-sense disambiguation algorithm to our task and augment it with multiple seed set learners, a voting schema, and additional features like SuperTags and extrasentential context. Detailed experiments on hand-annotated data show that our enhanced algorithm outperforms the baseline by 24.4%. Using the TroFi algorithm, we also build the TroFi Example Base, an extensible resource of annotated literal/nonliteral examples which is freely available to the NLP research community.
Gold Standard Datasets for Evaluating Word Sense Disambiguation Programs
- In Computers and the Humanities, 1998
Cited by 21 (2 self)
There are now many computer programs for automatically determining the sense in which a word is being used. One would like to be able to say which are better, which worse, and also which words, or varieties of language, present particular problems to which algorithms. An evaluation exercise is required, and such an exercise requires a 'gold standard' dataset of correct answers. Producing this proves to be a difficult and challenging task. In this paper I discuss the background, challenges and strategies, and present a detailed methodology for ensuring that the gold standard is not fool's gold. A pilot ('SENSEVAL') is taking place under the auspices of ACL SIGLEX (the Le...
Context-Based Similarity Measures for Categorical Databases
- In PKDD
Cited by 15 (1 self)
Similarity between complex data objects is one of the central notions in data mining. We propose certain similarity (or distance) measures between various components of a 0/1 relation. We define measures between attributes, between rows, and between subrelations of the database. They find important applications in clustering, classification, and several other data mining processes. Our measures are based on the contexts of individual components. For example, two products (i.e., attributes) are deemed similar if their respective sets of customers (i.e., subrelations) are similar. This reveals more subtle relationships between components, something that is usually missing in simpler measures. Our problem of finding distance measures can be formulated as a system of nonlinear equations. We present an iterative algorithm which, when seeded with random initial values, converges quickly to stable distances in practice (typically requiring fewer than five iterations). The algorithm requires only one database scan. Results on artificial and real data show that our method is efficient and produces results with intuitive appeal.
Measures and Applications of Lexical Distributional Similarity
- 2003
Cited by 14 (0 self)
This thesis is concerned with the measurement and application of lexical distributional similarity. Two words are said to be distributionally similar if they appear in similar contexts. This loose definition, however, has led to many measures being proposed or adopted from fields such as geometry, statistics, Information Retrieval (IR) and Information Theory. Our aim is to investigate the properties which make a good measure of lexical distributional similarity. We start by introducing the concept of lexical distributional similarity. We discuss potential applications, which can be roughly divided into distributional or language modelling applications and semantic applications, and methods of evaluation (Chapter 2). We look at existing measures of distributional similarity and carry out an empirical comparison of fifteen of these measures, paying particular attention to the effects of word frequency (Chapter 3). We propose a new general framework for distributional similarity based on the context of lexical substitutability, which we measure using the IR concepts of precision and recall. This framework allows us to investigate the key factors in similarity of asymmetry, the relative influence of different contexts and the extent to which words share a context (Chapter 4). Finally, we consider the application of distributional similarity in language modelling (Chapter 5) and as a predictor of semantic similarity using human judgements of similarity and a spelling correction task (Chapter 6).
Word sense disambiguation using label propagation based semi-supervised learning
- Proceedings of the ACL, 2005
Cited by 12 (2 self)
Shortage of manually sense-tagged data is an obstacle to supervised word sense disambiguation (WSD) methods. In this paper we investigate a label propagation based semi-supervised learning algorithm for WSD, which combines unlabeled data with labeled data in the learning process by representing labeled and unlabeled examples as vertices in a weighted graph and iteratively propagating the label information from any vertex to nearby vertices until this process converges. This label propagation process realizes a global consistency assumption: similar examples should have similar labels. Our experimental results on benchmark corpora indicate that it consistently outperforms SVM when only very few labeled examples are available, and its performance is also better than monolingual bootstrapping, and comparable to bilingual bootstrapping.
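The propagation step described in this abstract can be sketched in a few lines. This is a generic label propagation loop under assumptions of my own, not the paper's exact procedure: the function name `propagate_labels`, the row-normalised transition matrix, the clamping of seed labels after each step, and the toy similarity values are all hypothetical.

```python
def propagate_labels(similarity, seed_labels, num_classes,
                     max_iter=1000, tol=1e-9):
    """Propagate labels over a weighted graph until scores stabilise.

    similarity: square list of lists of non-negative edge weights.
    seed_labels: dict mapping a vertex index to its known class index.
    Returns the predicted class index for every vertex.
    """
    n = len(similarity)
    # Row-normalise so each vertex averages over its neighbours.
    transition = []
    for row in similarity:
        total = sum(row)
        transition.append([w / total if total else 0.0 for w in row])

    scores = [[0.0] * num_classes for _ in range(n)]
    for i, label in seed_labels.items():
        scores[i][label] = 1.0

    for _ in range(max_iter):
        new_scores = [
            [sum(transition[i][j] * scores[j][k] for j in range(n))
             for k in range(num_classes)]
            for i in range(n)
        ]
        # Clamp labeled vertices back to their known labels.
        for i, label in seed_labels.items():
            new_scores[i] = [1.0 if k == label else 0.0
                             for k in range(num_classes)]
        change = max(abs(new_scores[i][k] - scores[i][k])
                     for i in range(n) for k in range(num_classes))
        scores = new_scores
        if change < tol:
            break
    return [max(range(num_classes), key=row.__getitem__) for row in scores]


# Toy chain 0 - 1 - 2 - 3 with a weak link between vertices 1 and 2.
toy_similarity = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
predicted = propagate_labels(toy_similarity, {0: 0, 3: 1}, num_classes=2)
print(predicted)
```

In the toy chain, each unlabeled vertex takes the label of the seed it is most strongly connected to, which is the "similar examples should have similar labels" assumption in miniature.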
Unsupervised word sense disambiguation using bilingual comparable corpora
- In Proceedings of the 19th International Conference on Computational Linguistics, 2002
Cited by 9 (5 self)
An unsupervised method for word sense disambiguation using a bilingual comparable corpus was developed. First, it extracts statistically significant pairs of related words from the corpus of each language. Then, aligning pairs of related words translingually, it calculates the correlation between the senses of a first-language polysemous word and the words related to the polysemous word, which can be regarded as clues for determining the most suitable sense. Finally, for each instance of the polysemous word, it selects the sense that maximizes the score, i.e., the sum of the correlations between each sense and the clues appearing in the context of the instance. To overcome both the problem of ambiguity in the translingual alignment of pairs of related words and that of disparity of topical coverage between corpora of different languages, an algorithm for calculating the correlation between senses and clues iteratively was devised. An experiment using Wall Street Journal and Nihon Keizai Shimbun corpora showed that the new method has promising performance; namely, the applicability and precision of its sense selection are 88.5% and 77.7%, respectively, averaged over 60 test polysemous words.
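The final selection step, choosing the sense whose clues in the context have the largest summed correlation, might look like the sketch below. It covers only that last step and assumes the sense-clue correlations are already available; the function name `select_sense`, the example senses, and the correlation values are invented for illustration, and each clue is counted once per context, which is an assumption.

```python
def select_sense(context_words, correlation):
    """Pick the sense with the highest summed clue correlation.

    correlation[sense][clue] is a precomputed sense-clue correlation
    (hypothetical values here; the paper computes them iteratively
    from a bilingual comparable corpus).
    """
    context = set(context_words)
    scores = {
        sense: sum(weight for clue, weight in clue_weights.items()
                   if clue in context)
        for sense, clue_weights in correlation.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Made-up correlations for two senses of the word "bank".
toy_correlation = {
    "finance": {"loan": 0.8, "interest": 0.6},
    "river": {"water": 0.9, "shore": 0.5},
}
best_sense, sense_scores = select_sense(
    ["the", "loan", "carried", "high", "interest"], toy_correlation
)
print(best_sense, sense_scores)
```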
Syntactic Features and Word Similarity for Supervised Metonymy Resolution
- In Proc. of ACL-03, 2003
Cited by 9 (2 self)
We present a supervised machine learning algorithm for metonymy resolution, which exploits the similarity between examples of conventional metonymy. We show that syntactic head-modifier relations are a high precision feature for metonymy recognition but suffer from data sparseness.

