Rich Document Representation for Document Clustering (2004)
Citations: 1 (1 self)
BibTeX
@MISC{Jalali04richdocument,
author = {Azam Jalali and Farhad Oroumchian},
title = {Rich Document Representation for Document Clustering},
year = {2004}
}
Abstract
In traditional document clustering models, a document is treated as a bag of words. In this paper we present a new method for generating feature vectors using sentence fragments, called logical terms and statements, in the PLIR system. PLIR is a knowledge-based information system based on the theory of plausible reasoning. We conducted a number of experiments using the OHSUMED document collection and the K-Means clustering method with seven different similarity measures between documents. The experiments seem to indicate that using richer features such as logical terms or statements for clustering tends to perform better than the simple bag-of-words approach within our experimental domain, namely the second phase of a two-stage retrieval system.







