Distributional Clustering Of Words For Text Categorization (2003)
BibTeX
@MISC{Bekkerman03distributionalclustering,
author = {Ron Bekkerman},
title = {Distributional Clustering Of Words For Text Categorization},
year = {2003}
}
Abstract
We study an approach to text categorization that combines distributional clustering of words and a Support Vector Machine (SVM) classifier. The word-cluster representation is computed using the recently introduced Information Bottleneck method, which generates a compact and efficient representation of documents. When combined with the classification power of the SVM, this method yields high performance in text categorization. We compare this technique with SVM-based categorization using the simple minded bag-of-words (BOW) representation. The comparison is performed over three known datasets. On one of these datasets (the 20 Newsgroups) the method that is based on word clusters significantly outperforms the word-based representation in terms of categorization accuracy or representation efficiency. On the two other sets (Reuters-21578 and WebKB) the word-based representation slightly outperforms the word-cluster representation. We investigate the potential reasons for this behavior.







