A Re-Examination of Text Categorization Methods (1999)
BibTeX
@MISC{Yang99are-examination,
author = {Yiming Yang and Xin Liu},
title = {A Re-Examination of Text Categorization Methods},
year = {1999}
}
Abstract
This paper reports a controlled study, with statistical significance tests, of five text categorization methods: Support Vector Machines (SVM), a k-Nearest Neighbor (kNN) classifier, a neural network (NNet) approach, the Linear Least Squares Fit (LLSF) mapping, and a Naive Bayes (NB) classifier. We focus on how robust these methods are to a skewed category distribution and on their performance as a function of training-set category frequency. Our results show that SVM, kNN and LLSF significantly outperform NNet and NB when the number of positive training instances per category is small (fewer than ten), and that all the methods perform comparably when the categories are sufficiently common (over 300 instances).
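Comparing methods "as a function of training-set category frequency" requires first counting the positive training documents for each category and grouping categories by that count. The sketch below does this using the two cut-offs named in the abstract: fewer than ten positives, and over 300. The helper `band_categories`, its parameter names and the `"rare"`/`"middle"`/`"common"` labels are hypothetical and used only for illustration. This is not the authors' evaluation code, and the paper's actual grouping of categories may differ.

```python
from collections import Counter


def band_categories(doc_labels, rare_max=10, common_min=300):
    """Group categories by their number of positive training documents.

    doc_labels: one list of category labels per training document
    (multi-label documents count once toward each of their categories).
    Hypothetical cut-offs: a category is "rare" if it has fewer than
    rare_max positives, "common" if it has more than common_min, and
    "middle" otherwise.
    """
    # Count each category at most once per document.
    counts = Counter(label for labels in doc_labels for label in set(labels))
    bands = {"rare": [], "middle": [], "common": []}
    for category, n in sorted(counts.items()):
        if n < rare_max:
            bands["rare"].append(category)
        elif n > common_min:
            bands["common"].append(category)
        else:
            bands["middle"].append(category)
    return counts, bands


if __name__ == "__main__":
    # Tiny toy corpus with small thresholds so that every band is exercised.
    docs = [["earn"], ["earn", "acq"], ["grain"]]
    counts, bands = band_categories(docs, rare_max=2, common_min=1)
    print(dict(counts))
    print(bands)
```

Per-method scores (for example, per-category F1) could then be averaged within each band to compare classifiers on rare and common categories separately.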