Word sense disambiguation based on semantic density
- In Proceedings of the Coling-ACL'98 Workshop “Usage of WordNet in Natural Language Processing Systems”
, 1998
Cited by 23 (3 self)
This paper presents a Word Sense Disambiguation method based on the idea of semantic density between words. The disambiguation is done in the context of WordNet. The Internet is used as a raw corpus to provide statistical information for word associations. A metric is introduced and used to measure the semantic density and to rank all possible combinations of the senses of two words. This method provides a precision of 58% in indicating the correct sense for both words at the same time. The precision increases as we consider more choices: 70% for the top two ranked and 73% for the top three ranked.
Evaluating and Combining Approaches to Selectional Preference Acquisition
- In Proc. of the EACL
, 2003
Cited by 15 (0 self)
Previous work on the induction of selectional preferences has been mainly carried out for English and has concentrated almost exclusively on verbs and their direct objects. In this paper, we focus on class-based models of selectional preferences for German verbs and take into account not only direct objects, but also subjects and prepositional complements. We evaluate model performance against human judgments and show that there is no single method that overall performs best. We explore a variety of parametrizations for our models and demonstrate that model combination enhances agreement with human ratings.
Enriching a Lexical Semantic Net with Selectional Preferences by Means of Statistical Corpus Analysis
Cited by 14 (3 self)
Broad-coverage ontologies which represent lexical semantic knowledge are being built for more and more natural languages. Such resources provide very useful information for word sense disambiguation, which is crucial for a variety of NLP tasks (e.g. semantic annotation of corpora, information retrieval, or semantic inferencing). Since the manual encoding of such ontologies is very labour-intensive, the development of (semi-)automatic methods for acquiring lexical semantic information is an important task. This paper addresses the automatic acquisition of selectional preferences of verbs by means of statistical corpus analysis. Knowledge about such preferences is essential for inducing thematic relations, which link verbal concepts to nominal concepts that are selectionally preferred as their complements. Several approaches for learning selectional preferences from corpora have been proposed in recent years. However, their usefulness for ontology building is limited. This paper intr...
A Highly Accurate Bootstrapping Algorithm For Word Sense Disambiguation
, 2001
Cited by 13 (0 self)
In this paper, we present a bootstrapping algorithm for Word Sense Disambiguation which succeeds in disambiguating a subset of the words in the input text with very high precision. It uses WordNet and a semantically tagged corpus to identify the correct sense of the words in a given text. The bootstrapping process initializes a set of ambiguous words with all the nouns and verbs in the text. It then applies various disambiguation procedures and builds a set of disambiguated words: new words are sense tagged based on their relation to the already disambiguated words, and then added to the set. This process allows us to identify, in the original text, a set of words which can be disambiguated with high precision; 55% of the verbs and nouns are disambiguated with an accuracy of 92%.
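The bootstrapping loop described in this abstract can be sketched roughly as follows. This is only an illustration of the control flow (start with every word ambiguous, apply procedures, move newly tagged words into the disambiguated set, repeat until nothing changes). The function `bootstrap`, the two toy procedures, and the sense labels such as `oxygen#1` are hypothetical and are not taken from the paper, which relies on WordNet and a sense-tagged corpus.

```python
from typing import Callable, Dict, List, Optional

# A procedure inspects one ambiguous word plus the senses fixed so far and
# returns a sense label, or None when it cannot decide.
Procedure = Callable[[str, Dict[str, str]], Optional[str]]


def bootstrap(words: List[str], procedures: List[Procedure]) -> Dict[str, str]:
    """Repeatedly apply procedures until no further word can be sense tagged."""
    ambiguous = list(dict.fromkeys(words))  # deduplicate, keep text order
    resolved: Dict[str, str] = {}
    progress = True
    while progress and ambiguous:
        progress = False
        for word in list(ambiguous):  # iterate over a copy while removing
            for proc in procedures:
                sense = proc(word, resolved)
                if sense is not None:
                    resolved[word] = sense
                    ambiguous.remove(word)
                    progress = True
                    break
    return resolved


# Toy procedures, assumptions for illustration only.
def monosemous(word: str, resolved: Dict[str, str]) -> Optional[str]:
    # Pretend "oxygen" has a single sense and can be tagged immediately.
    return {"oxygen": "oxygen#1"}.get(word)


def related_to_resolved(word: str, resolved: Dict[str, str]) -> Optional[str]:
    # Pretend "gas" can be tagged once the related word "oxygen" is fixed.
    if word == "gas" and "oxygen" in resolved:
        return "gas#2"
    return None


result = bootstrap(["gas", "oxygen", "run"], [monosemous, related_to_resolved])
print(result)
```

In this toy run "run" is never tagged, which mirrors the abstract's point that only a subset of the words is disambiguated, in exchange for higher precision on that subset.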
A Semantic Kernel to classify texts with very few training examples
- In Proceedings of the Workshop on Learning in Web Search, at the 22nd International Conference on Machine Learning (ICML 2005)
, 2005
Cited by 12 (2 self)
Advanced techniques for accessing the information distributed on the Web often exploit automatic text categorization to filter out irrelevant data before activating specific search procedures. The drawback of such an approach is the need for a large number of training documents to train the target classifiers. One way to reduce this number is to use more effective document similarities based on prior knowledge. Unfortunately, previous work has shown that such information (e.g. WordNet) decreases retrieval accuracy. In this paper, we propose kernel functions to use prior knowledge in learning algorithms for document classification. Such kernels implement balanced and statistically coherent document similarities in a vector space by means of the term similarity based on the WordNet hierarchy. Cross-validation results show the benefit of the approach for Support Vector Machines when few training examples are available. Summary: Text categorization based on few examples is presented.
Clustering concept hierarchies from text
- In Proceedings of LREC
, 2004
Cited by 12 (0 self)
We present a novel approach to learning taxonomies or concept hierarchies from text. The approach is based on Formal Concept Analysis, a method mainly used for the analysis of data, i.e. for investigating and processing explicitly given information. Our approach is based on the distributional hypothesis, i.e. that nouns or terms are similar to the extent to which they share contexts. Further, we assume that verbs pose more or less strong selectional restrictions on their arguments. The concept hierarchy is built via Formal Concept Analysis using syntactic dependencies as attributes. The approach is evaluated by comparing the produced concept hierarchies against two handcrafted taxonomies from two different domains: tourism and finance. We compare the results of our approach against a hierarchical bottom-up clustering algorithm as well as against Bi-Section-Kmeans as an instance of a top-down clustering algorithm.
The Use of WordNet Sense Tagging in FAQFinder
- In Proceedings of the AAAI00 Workshop on AI and Web Search
, 2000
Cited by 11 (0 self)
FAQFinder is a Web-based, natural language question-answering system. It answers a user's question by searching the Usenet Frequently Asked Questions (FAQ) files for a similar FAQ question, and displaying its answer to the user. To find the most similar FAQ question, FAQFinder measures similarity in part by using WordNet (Miller, 1990). To increase the accuracy of the similarity metric, we have incorporated an automated WordNet sense tagger into the process. In this paper, we show that the use of this sense tagger improves FAQFinder's matching accuracy. We argue that WordNet sense tagging can also be used in more general Web search tasks.
Statistical Models for the Induction and Use of Selectional Preferences
- Cognitive Science
, 2002
Cited by 11 (0 self)
Selectional preferences have a long history in both generative and computational linguistics. However, since the publication of Resnik's dissertation in 1993, a new approach has surfaced in the computational linguistics community. This new line of research combines knowledge represented in a pre-defined semantic class hierarchy with statistical tools including information theory, statistical modeling, and Bayesian inference. These tools are used to learn selectional preferences from examples in a corpus. Instead of simple sets of semantic classes, selectional preferences are viewed as probability distributions over various entities. We survey research that extends Resnik's initial work, discuss the strengths and weaknesses of each approach, and show how they together form a cohesive line of research.
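As a rough illustration of treating selectional preferences as probability distributions over semantic classes, the sketch below computes the measure commonly attributed to Resnik's 1993 work: the selectional preference strength of a verb as the relative entropy between P(class | verb) and the prior P(class), and the selectional association of each class as its share of that strength. The function name, the flat class labels, and the toy counts are assumptions for illustration; the surveyed models use a class hierarchy such as WordNet and differ in their details.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple


def selectional_association(
    pairs: Iterable[Tuple[str, str]], verb: str
) -> Tuple[float, Dict[str, float]]:
    """Return (preference strength, association per class) for one verb.

    pairs: (verb, semantic_class) observations, e.g. verb-object counts
    already mapped to classes (the mapping itself is assumed here).
    """
    pairs = list(pairs)
    class_counts = Counter(c for _, c in pairs)              # for P(c)
    verb_counts = Counter(c for v, c in pairs if v == verb)  # for P(c | verb)
    total = sum(class_counts.values())
    verb_total = sum(verb_counts.values())
    if verb_total == 0:
        raise ValueError(f"no observations for verb {verb!r}")

    # Per-class terms of the relative entropy D(P(c | verb) || P(c)).
    terms = {}
    for c, n in verb_counts.items():
        p_c_given_v = n / verb_total
        p_c = class_counts[c] / total
        terms[c] = p_c_given_v * math.log(p_c_given_v / p_c)

    strength = sum(terms.values())
    if strength <= 0.0:
        # The verb's distribution matches the prior: no preference at all.
        return 0.0, {c: 0.0 for c in terms}
    return strength, {c: t / strength for c, t in terms.items()}


# Toy observations (invented counts, for illustration only).
observations = (
    [("eat", "food")] * 3 + [("eat", "artifact")]
    + [("see", "food")] + [("see", "artifact")] * 3
)
strength, assoc = selectional_association(observations, "eat")
print(strength, assoc)
```

With these toy counts, "eat" has a positive preference strength and a higher association with the class "food" than with "artifact", and the associations sum to one by construction.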
Applications of Lexical Information for Algorithmically Composing Multiple-Choice Cloze Items
- Proceedings of the Second Workshop on Building Educational Applications Using NLP. Association for Computational Linguistics
, 2005
Cited by 11 (0 self)
We report experience in applying techniques for natural language processing to algorithmically generating both reading and listening cloze items. We propose a word sense disambiguation-based method for locating sentences in which designated words carry specific senses, and apply a collocation-based method for selecting distractors that are necessary for multiple-choice cloze items. Experimental results indicate that our system was able to produce a usable item for every 1.6 items it returned. We also attempt to measure distance between sounds of words by considering phonetic features of the words. With the help of voice synthesizers, we were able to assist the task of composing listening cloze items. By providing both reading and listening cloze items, we would like to offer a somewhat adaptive system for assisting Taiwanese children in learning English vocabulary.
Clustering Syntactic Positions with Similar Semantic Requirements
Cited by 10 (2 self)
This paper describes an unsupervised strategy to acquire syntactico-semantic requirements of nouns, verbs, and adjectives from partially parsed text corpora. The linguistic notion of requirement underlying this strategy is based on two specific assumptions. First, it is assumed that two words in a dependency are mutually required. This phenomenon is called here "co-requirement". Second, it is also claimed that the set of words occurring in similar positions defines extensionally the requirements associated with these positions. The main aim of the learning strategy presented in this paper is to identify clusters of similar positions by identifying the words that define their requirements extensionally. This strategy allows us to learn the syntactic and semantic requirements of words in different positions. This information is used to solve attachment ambiguities. Results of this particular task are evaluated at the end of the paper. Extensive experimentation was performed on Portuguese text corpora.

