Unsupervised word sense disambiguation rivaling supervised methods
- In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, 1995
Cited by 383 (4 self)
This paper presents an unsupervised learning algorithm for sense disambiguation that, when trained on unannotated English text, rivals the performance of supervised techniques that require time-consuming hand annotations. The algorithm is based on two powerful constraints -- that words tend to have one sense per discourse and one sense per collocation -- exploited in an iterative bootstrapping procedure. Tested accuracy exceeds 96%.
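The iterative bootstrapping idea in this abstract can be illustrated with a small sketch. This is a simplified toy under stated assumptions, not the paper's algorithm: the function names, the seed words, the sense labels and the sample contexts are all hypothetical, and the rule-learning step (a context word becomes a rule only if every labelled context containing it has the same sense) stands in for whatever ranking the paper actually uses.

```python
from collections import Counter, defaultdict


def majority(senses):
    """Return the most frequent sense in a list, or None if the list is empty."""
    counts = Counter(senses)
    return counts.most_common(1)[0][0] if counts else None


def bootstrap_senses(contexts, seeds, max_iters=10):
    """Label each occurrence of an ambiguous word with a sense.

    contexts: list of (doc_id, set_of_context_words), one per occurrence.
    seeds:    dict mapping a few context words to a sense (hypothetical input).
    """
    rules = dict(seeds)                      # context word -> sense
    labels = [None] * len(contexts)
    for _ in range(max_iters):
        # One sense per collocation: vote with the rules that match a context.
        by_rule = [
            majority([rules[w] for w in sorted(words) if w in rules])
            for _doc, words in contexts
        ]
        # One sense per discourse: spread each document's majority sense
        # to every occurrence in that document.
        per_doc = defaultdict(list)
        for (doc, _words), sense in zip(contexts, by_rule):
            if sense is not None:
                per_doc[doc].append(sense)
        new_labels = [majority(per_doc[doc]) for doc, _words in contexts]
        if new_labels == labels:             # nothing changed: stop iterating
            break
        labels = new_labels
        # Learn new rules (simplified assumption): keep words that co-occur
        # with exactly one sense among the labelled contexts.
        seen = defaultdict(set)
        for (_doc, words), sense in zip(contexts, labels):
            if sense is not None:
                for w in words:
                    seen[w].add(sense)
        rules = dict(seeds)
        for w, senses in seen.items():
            if len(senses) == 1:
                rules.setdefault(w, next(iter(senses)))
    return labels


# Toy data (made up for illustration): five occurrences of "plant".
contexts = [
    ("d1", {"plant", "life", "species"}),
    ("d1", {"plant", "species", "grow"}),
    ("d2", {"plant", "manufacturing", "workers"}),
    ("d2", {"plant", "workers", "union"}),
    ("d3", {"plant", "grow", "garden"}),
]
seeds = {"life": "LIVING", "manufacturing": "FACTORY"}
print(bootstrap_senses(contexts, seeds))
```

In this toy run the occurrence in document d3 contains no seed word; it only receives a label in the second pass, after "grow" has been learned as a rule from document d1.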
Word-Sense Disambiguation Using Statistical Models of Roget's Categories Trained on Large Corpora
- 1992
Cited by 265 (10 self)
This paper describes a program that disambiguates English word senses in unrestricted text using statistical models of the major Roget's Thesaurus categories. Roget's categories serve as approximations of conceptual classes. The categories listed for a word in Roget's index tend to correspond to sense distinctions; thus selecting the most likely category provides a useful level of sense disambiguation. The selection of categories is accomplished by identifying and weighting words that are indicative of each category when seen in context, using a Bayesian theoretical framework. Other
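The category-selection step described in this abstract can be sketched roughly as follows. This is a minimal illustration assuming a log-ratio weight of the form log(P(word | category) / P(word)) with add-one smoothing and a uniform prior over categories; the abstract does not give the exact formula, and the function names, category labels and training words below are hypothetical.

```python
import math
from collections import Counter


def train_weights(category_contexts, smoothing=1.0):
    """Compute a weight for every (category, word) pair.

    category_contexts: dict mapping a category name to the list of words
    observed in contexts of that category (toy stand-in for corpus counts).
    """
    cat_counts = {c: Counter(ws) for c, ws in category_contexts.items()}
    total = Counter()
    for counts in cat_counts.values():
        total.update(counts)
    vocab_size = len(total)
    n_all = sum(total.values())
    weights = {}
    for cat, counts in cat_counts.items():
        n_cat = sum(counts.values())
        weights[cat] = {}
        for w in total:
            p_w_given_cat = (counts[w] + smoothing) / (n_cat + smoothing * vocab_size)
            p_w = (total[w] + smoothing) / (n_all + smoothing * vocab_size)
            # Positive weight: the word is indicative of this category.
            weights[cat][w] = math.log(p_w_given_cat / p_w)
    return weights


def most_likely_category(context, weights):
    """Sum the weights of the context words and pick the best category."""
    scores = {
        cat: sum(ws.get(w, 0.0) for w in context)   # unseen words contribute 0
        for cat, ws in weights.items()
    }
    return max(scores, key=scores.get)


# Toy training data (made up for illustration).
category_contexts = {
    "ANIMAL": ["species", "feed", "wild", "species"],
    "MACHINE": ["engine", "lift", "steel", "engine"],
}
weights = train_weights(category_contexts)
print(most_likely_category(["wild", "species"], weights))
print(most_likely_category(["engine", "steel"], weights))
```

Summing log weights corresponds to multiplying the per-word probability ratios, so the category with the highest total is the most likely one under this simplified model.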
Using WordNet for Building WordNets
- 1998
Cited by 32 (7 self)
This paper summarises a set of methodologies and techniques for the fast construction of multilingual WordNets. The
Degraded Text Recognition Using Visual And Linguistic Context
- 1995
Cited by 20 (2 self)
Recognition of degraded text is a challenging problem. To improve the performance of an OCR system on degraded images of text, postprocessing techniques are critical. The objective of postprocessing is to correct errors or to resolve ambiguities in OCR results by using contextual information. Depending on the extent of context used, there are different levels of postprocessing. In current commercial OCR systems, word-level postprocessing methods, such as dictionary-lookup, have been applied successfully. However, many OCR errors cannot be corrected by word-level postprocessing. To overcome this limitation, passage-level postprocessing, in which global contextual information is utilized, is necessary. In most current studies on passage-level postprocessing, linguistic context is the major resource to be exploited. This thesis addresses problems in degraded text recognition and discusses potential solutions through passage-level postprocessing. The objective is to develop a postprocessin...
Disambiguation by Association as a Practical Method: Experiments and Findings
- Journal of Quantitative Linguistics, 1995
Cited by 3 (0 self)
... We have replicated two well-known methods (of word sense disambiguation) due to Lesk (1986) and Ide and Veronis (1990), and have conducted trials using both methods on a corpus of 100 sentences. We also carried out experiments to determine whether the use of syntactic tagging would improve results. There are three principal findings of this work. Firstly, syntactic tagging improves the performance of all the disambiguation algorithms. Secondly, the Ide and Veronis method of depth 2 performs slightly better than the Lesk method. Thirdly, the performance of a particular algorithm is heavily dependent on the way in which it is measured.
Word Sense Disambiguation by Human Subjects: Computational and Psycholinguistic Applications
Cited by 2 (0 self)
Although automated word sense disambiguation has become a popular activity within computational lexicology, evaluation of the accuracy of disambiguation systems is still mostly limited to manual checking by the developer. This paper describes our work in collecting data on the disambiguation behavior of human subjects, with the intention of providing (1) a norm against which dictionary-based systems (and perhaps others) can be evaluated, and (2) a source of psycholinguistic information about previously unobserved aspects of human disambiguation, for the use of both psycholinguists and computational researchers. We also describe two of our most important tools: a questionnaire of ambiguous test words in various contexts, and a hypertext user interface for efficient and powerful collection of data from human subjects.

