Topical Clustering of MRD Senses Based on Information Retrieval Techniques (1998)
Citations: 11 (1 self)
BibTeX
@MISC{Chen98topicalclustering,
author = {Jen Nan Chen and Jason S. Chang},
title = {Topical Clustering of MRD Senses Based on Information Retrieval Techniques},
year = {1998}
}
Abstract
... coarse semantic categories.

We briefly describe three on-line thesauri that have been used as sources of word sense divisions in the computational linguistics literature: WordNet (Miller et al. 1993), Roget's Thesaurus, and LLOCE. WordNet is organized as a set of hierarchical conceptual taxonomies of nouns, verbs, adjectives, and adverbs, whose nodes are sets of synonyms called synsets. From the WSD perspective, synsets are too fine-grained: the noun taxonomy alone contains 24,825 synsets for 32,264 distinct nouns, with a total of 43,136 senses. Acquiring WSD knowledge precise enough to make such fine distinctions would be difficult even from a substantial body of training material.

Roget's Thesaurus arranges words in a three-layer hierarchy, organizing over 30,000 distinct words into some 1,000 categories on the bottom layer. These categories are grouped into 39 middle-layer sections, which are in turn grouped into 6 top-layer classes. Each category is given a three-digit reference code. To make the hierarchy explicit, an uppercase letter from A to F is added to the reference code to denote the top-layer class of each category, as indicated in Table 2. Similarly, the middle layer is denoted with a lowercase section letter; the sections of class B (Space) are shown in Table 3. Thus, the full reference code of a category consists of an uppercase class letter, a lowercase section letter, and a three-digit category number.

Table 2. Top-layer classes of Roget's Thesaurus

Class   Categories   Description
A       1-182        Abstract relations
B       183-318      Space
C       319-446      Matter
D       447-594      Intellect: the exercise of the mind
E       595-816      Volition: the exercise of the will
F       817-990      Emotion, religion and morality

Table 3. Sections related to the Space class in Roget's Thesaurus.
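The reference-code scheme described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that a full code is the class letter, the section letter, and the zero-padded three-digit category number concatenated without separators. It also assumes that class A spans categories 1-182, which is inferred from class B starting at 183. Because the section table is not reproduced here, the section letter is supplied by the caller rather than derived. The names `CLASS_RANGES`, `class_of`, `make_code`, and `parse_code` are hypothetical.

```python
import re

# Category-number ranges of the six top-layer classes (Table 2).
# Assumption: class A is taken to span 1-182, inferred from class B starting at 183.
CLASS_RANGES = {
    "A": (1, 182),    # Abstract relations (range inferred)
    "B": (183, 318),  # Space
    "C": (319, 446),  # Matter
    "D": (447, 594),  # Intellect
    "E": (595, 816),  # Volition
    "F": (817, 990),  # Emotion, religion and morality
}

# Assumed code layout: class letter, section letter, three-digit number, no separators.
CODE_PATTERN = re.compile(r"([A-F])([a-z])(\d{3})")


def class_of(category: int) -> str:
    """Return the top-layer class letter for a bottom-layer category number."""
    for letter, (low, high) in CLASS_RANGES.items():
        if low <= category <= high:
            return letter
    raise ValueError(f"category {category} is outside 1-990")


def make_code(category: int, section: str) -> str:
    """Build a full reference code; the section letter must come from the caller."""
    if re.fullmatch(r"[a-z]", section) is None:
        raise ValueError("section must be a single lowercase letter")
    return f"{class_of(category)}{section}{category:03d}"


def parse_code(code: str) -> tuple[str, str, int]:
    """Split a reference code and check that its class letter matches its number."""
    match = CODE_PATTERN.fullmatch(code)
    if match is None:
        raise ValueError(f"malformed reference code: {code!r}")
    cls, section, digits = match.groups()
    category = int(digits)
    if class_of(category) != cls:
        raise ValueError(f"category {category} does not belong to class {cls}")
    return cls, section, category
```

For example, `make_code(200, "b")` yields `"Bb200"`, where the section letter `"b"` is purely illustrative. Keeping the class ranges in a single table means that `class_of` and the consistency check in `parse_code` cannot drift apart.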