Automatic Word Sense Discrimination
- Journal of Computational Linguistics
, 1998
Cited by 272 (0 self)
This paper presents context-group discrimination, a disambiguation algorithm based on clustering. Senses are interpreted as groups (or clusters) of similar contexts of the ambiguous word. Words, contexts, and senses are represented in Word Space, a high-dimensional, real-valued space in which closeness corresponds to semantic similarity. Similarity in Word Space is based on second-order co-occurrence: two tokens (or contexts) of the ambiguous word are assigned to the same sense cluster if the words they co-occur with in turn occur with similar words in a training corpus. The algorithm is automatic and unsupervised in both training and application: senses are induced from a corpus without labeled training instances or other external knowledge sources. The paper demonstrates good performance of context-group discrimination for a sample of natural and artificial ambiguous words.
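The idea of second-order co-occurrence described in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the toy corpus, and the use of whole sentences as the co-occurrence window are assumptions made here for illustration, and the clustering step is left out. Each word gets a vector of the words it co-occurs with, and a context is represented by the sum of the vectors of its words, so two contexts can be similar even when they share no word.

```python
from collections import Counter, defaultdict
import math


def word_vectors(corpus_sentences):
    """Map each word to counts of the words it co-occurs with.

    Assumption: the co-occurrence window is the whole sentence.
    """
    vectors = defaultdict(Counter)
    for sentence in corpus_sentences:
        for i, word in enumerate(sentence):
            for j, other in enumerate(sentence):
                if i != j:
                    vectors[word][other] += 1
    return vectors


def context_vector(context_words, vectors):
    """Second-order representation: sum the vectors of the context's words."""
    total = Counter()
    for word in context_words:
        total.update(vectors.get(word, {}))
    return total


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy training corpus (already tokenized).
corpus = [
    ["bank", "money", "loan"],
    ["money", "cash", "loan"],
    ["river", "water", "shore"],
    ["water", "stream", "shore"],
]
vectors = word_vectors(corpus)

# Three contexts that share no words with each other.
finance_a = context_vector(["money"], vectors)
finance_b = context_vector(["cash"], vectors)
nature = context_vector(["stream"], vectors)
```

Here `finance_a` and `finance_b` have no word in common, yet their second-order vectors overlap (both "money" and "cash" co-occur with "loan"), so their cosine similarity is positive, while the similarity between `finance_a` and `nature` is zero. A clustering algorithm applied to such context vectors would then yield the sense groups.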
Word Sense Discrimination by Clustering Contexts in Vector and Similarity Spaces
Cited by 45 (11 self)
This paper systematically compares unsupervised word sense discrimination techniques that cluster instances of a target word that occur in raw text using both vector and similarity spaces. The context of each instance is represented as a vector in a high-dimensional feature space. Discrimination is achieved by clustering these context vectors directly in vector space and also by finding pairwise similarities among the vectors and then clustering in similarity space. We employ two different representations of the context in which a target word occurs. First-order context vectors represent the context of each instance of a target word as a vector of features that occur in that context.
Recognizing Subjectivity: A Case Study of Manual Tagging
- Natural Language Engineering
, 1999
Cited by 34 (7 self)
In this paper, we describe a case study of a sentence-level categorization in which tagging instructions are developed and used by four judges to classify clauses from the Wall Street Journal as either subjective or objective. Agreement among the four judges is analyzed, and, based on that analysis, each clause is given a final classification. To provide empirical support for the classifications, correlations are assessed in the data between the subjective category and a basic semantic class posited by Quirk et al. (1985).
Knowledge lean word-sense disambiguation
- In Proceedings of the Fifteenth National Conference on Artificial Intelligence
, 1998
Cited by 31 (5 self)
We present a corpus-based approach to word-sense disambiguation that only requires information that can be automatically extracted from untagged text. We use unsupervised techniques to estimate the parameters of a model describing the conditional distribution of the sense group given the known contextual features. Both the EM algorithm and Gibbs Sampling are evaluated to determine which is more appropriate for our data. We compare their disambiguation accuracy in an experiment with thirteen different words and three feature sets. Gibbs Sampling results in a small but consistent improvement in disambiguation accuracy over the EM algorithm.
Word Translation Disambiguation Using Bilingual Bootstrapping
- COMPUTATIONAL LINGUISTICS
, 2002
Cited by 29 (2 self)
This paper proposes a new method for word translation disambiguation using a machine learning technique called 'Bilingual Bootstrapping'. Bilingual Bootstrapping makes use of, in learning, a small number of classified data and a large number of unclassified data in the source and the target languages in translation. It constructs classifiers in the two languages in parallel and repeatedly boosts the performance of the classifiers by further classifying data in each of the two languages and by exchanging information about the classified data between the two languages. Experimental results indicate that word translation disambiguation based on Bilingual Bootstrapping consistently and significantly outperforms the existing methods based on 'Monolingual Bootstrapping'.
Word sense disambiguation: a survey
- ACM COMPUTING SURVEYS
, 2009
Cited by 28 (9 self)
Word sense disambiguation (WSD) is the ability to identify the meaning of words in context in a computational manner. WSD is considered an AI-complete problem, that is, a task whose solution is at least as hard as the most difficult problems in artificial intelligence. We introduce the reader to the motivations for solving the ambiguity of words and provide a description of the task. We overview supervised, unsupervised, and knowledge-based approaches. The assessment of WSD systems is discussed in the context of the Senseval/Semeval campaigns, aiming at the objective evaluation of systems participating in several different disambiguation tasks. Finally, applications, open problems, and future directions are discussed.
Corpus-based Approaches to Semantic Interpretation in Natural . . .
, 1997
Cited by 26 (0 self)
This article is an introduction to some of the emerging research in the application of corpus-based learning techniques to problems in semantic interpretation. In particular, we focus on two important problems in semantic interpretation, namely, word-sense disambiguation and semantic parsing.
Evaluating high accuracy retrieval techniques
- In Proceedings of SIGIR
, 2004
Cited by 21 (0 self)
Although information retrieval research has always been concerned with improving the effectiveness of search, in some applications, such as information analysis, a more specific requirement exists for high accuracy retrieval. This means that achieving high precision in the top document ranks is paramount. In this paper we present work aimed at achieving high accuracy in ad-hoc document retrieval by incorporating approaches from question answering (QA). We focus on getting the first relevant result as high as possible in the ranked list and argue that traditional precision and recall are not appropriate measures for evaluating this task. We instead use the mean reciprocal rank (MRR) of the first relevant result. We evaluate three different methods for modifying queries to achieve high accuracy. The experiments done on TREC data provide support for the approach of using MRR and incorporating QA techniques for getting high accuracy in the ad-hoc retrieval task. Categories and Subject Descriptors: H.3.4 [Information Storage and Retrieval]: Systems and Software - Performance evaluation (efficiency and effectiveness); H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - Query formulation
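As a rough illustration of the measure named in this abstract, the mean reciprocal rank of the first relevant result can be computed as below. This is a minimal sketch, not the paper's evaluation code: the function name and the input format (one list of relevance flags per query, in rank order) are assumptions, as is the common convention that a query with no relevant result contributes 0.

```python
def mean_reciprocal_rank(rankings):
    """Average of 1/rank of the first relevant result over all queries.

    rankings: one list per query of booleans in rank order
    (True means the document at that rank is relevant).
    Assumption: a query with no relevant result contributes 0.
    """
    if not rankings:
        return 0.0
    total = 0.0
    for relevance_flags in rankings:
        for rank, is_relevant in enumerate(relevance_flags, start=1):
            if is_relevant:
                total += 1.0 / rank
                break
    return total / len(rankings)


# Hypothetical judgments for three queries: first relevant result
# at rank 1, rank 2, and rank 4 respectively.
example = [
    [True, False],
    [False, True],
    [False, False, False, True],
]
mrr = mean_reciprocal_rank(example)  # (1 + 1/2 + 1/4) / 3
```

Unlike precision and recall, this score depends only on where the first relevant document appears, which matches the stated goal of getting that result as high as possible in the ranked list.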
Word-Sense Distinguishability and Inter-Coder Agreement
, 1998
Cited by 19 (1 self)
It is common in NLP that the categories into which text is classified do not have fully objective definitions. Examples of such categories are lexical distinctions such as part-of-speech tags and word-sense distinctions, sentence-level distinctions such as phrase attachment, and discourse-level distinctions such as topic or speech-act categorization. This paper presents an approach to analyzing the agreement among human judges for the purpose of formulating a refined and more reliable set of category designations. We use these techniques to analyze the sense tags assigned by five judges to the noun interest. The initial tag set is taken from Longman's Dictionary of Contemporary English. Through this process of analysis, we automatically identify and assign a revised set of sense tags for the data. The revised tags exhibit high reliability as measured by Cohen's κ. Such techniques are important for formulating and evaluating both human and automated classification systems.
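For readers unfamiliar with the reliability measure mentioned above, Cohen's κ for a pair of judges compares the observed agreement with the agreement expected by chance from each judge's label frequencies. The sketch below is a generic two-judge computation, not the paper's analysis procedure (which involves five judges); the function name, the sense labels, and the handling of the degenerate case are assumptions.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two judges labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of the judges' marginal label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        # Both judges used one and the same label throughout; kappa is
        # undefined here, and returning 1.0 is a convention assumed
        # for this sketch.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical sense tags from two judges for four instances of a noun.
judge_1 = ["sense1", "sense1", "sense2", "sense2"]
judge_2 = ["sense1", "sense2", "sense2", "sense2"]
kappa = cohens_kappa(judge_1, judge_2)  # observed 0.75, expected 0.5
```

In this example the judges agree on three of four items, but half of that agreement is expected by chance, so κ is 0.5 rather than 0.75.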
Word sense disambiguation using label propagation based semi-supervised learning
- Proceedings of the ACL
, 2005
Cited by 12 (2 self)
Shortage of manually sense-tagged data is an obstacle to supervised word sense disambiguation (WSD) methods. In this paper we investigate a label propagation based semi-supervised learning algorithm for WSD, which combines unlabeled data with labeled data in the learning process by representing labeled and unlabeled examples as vertices in a weighted graph and iteratively propagating the label information from any vertex to nearby vertices until this process converges. This label propagation process realizes a global consistency assumption: similar examples should have similar labels. Our experimental results on benchmark corpora indicate that it consistently outperforms SVM when only very few labeled examples are available, and its performance is also better than monolingual bootstrapping, and comparable to bilingual bootstrapping.
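The propagation process described in this abstract can be sketched in a few lines. This is a simplified illustration, not the authors' algorithm as published: the function name, the row-normalized averaging update, the stopping tolerance, and the toy graph are all assumptions. Labeled vertices are clamped to their known labels, and each unlabeled vertex repeatedly takes the weighted average of its neighbors' label distributions until the values stop changing.

```python
def propagate_labels(weights, seed_labels, num_classes,
                     max_iters=1000, tol=1e-9):
    """Return one predicted class index per vertex.

    weights: symmetric matrix (list of lists) of edge weights.
    seed_labels: dict mapping vertex index to known class index.
    """
    n = len(weights)
    dist = [[0.0] * num_classes for _ in range(n)]
    for vertex, label in seed_labels.items():
        dist[vertex][label] = 1.0

    for _ in range(max_iters):
        new_dist = []
        for i in range(n):
            row_sum = sum(weights[i])
            if i in seed_labels or row_sum == 0:
                # Clamp labeled vertices; leave isolated vertices unchanged.
                new_dist.append(dist[i][:])
                continue
            # Weighted average of the neighbors' current distributions.
            new_dist.append([
                sum(weights[i][j] * dist[j][c] for j in range(n)) / row_sum
                for c in range(num_classes)
            ])
        delta = max(abs(new_dist[i][c] - dist[i][c])
                    for i in range(n) for c in range(num_classes))
        dist = new_dist
        if delta < tol:
            break

    # Pick the most probable class for every vertex.
    return [max(range(num_classes), key=lambda c: dist[i][c])
            for i in range(n)]


# Hypothetical chain graph 0 - 1 - 2 - 3 with a weak link between 1 and 2.
toy_weights = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
predicted = propagate_labels(toy_weights, {0: 0, 3: 1}, num_classes=2)
```

With only vertices 0 and 3 labeled, vertex 1 inherits the label of its strongly connected neighbor 0 and vertex 2 that of neighbor 3, which is the "similar examples should have similar labels" assumption in miniature.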

