Unsupervised Text Mining (1997)
Citations: 3 (3 self)
BibTeX
@TECHREPORT{Pedersen97unsupervisedtext,
author = {Ted Pedersen and Rebecca Bruce},
title = {Unsupervised Text Mining},
institution = {},
year = {1997}
}
Abstract
We describe the results of performing text mining on a challenging problem in natural language processing, word sense disambiguation. We compare two methods of unsupervised learning, Ward's minimum-variance clustering and the EM algorithm, that distinguish the meaning of an ambiguous word based only on features that can be automatically identified in text. This is a significant advantage over most previous approaches, which require a training sample where the meanings of ambiguous words have been manually disambiguated. The creation of sense-tagged text sufficient to serve as a training sample is expensive and time-consuming, and is yet another example of the knowledge acquisition bottleneck. We present experimental results showing the application of each of these algorithms to the disambiguation of three nouns using five different feature sets. We find that these methods can distinguish two senses of bill with accuracy of up to 82 percent, three senses of interest







