## A Scalability Analysis of Classifiers in Text Categorization (2003)


Venue: Proceedings of SIGIR-03, 26th ACM International Conference on Research and Development in Information Retrieval, ACM

Citations: 38 (3 self-citations)

### BibTeX

    @INPROCEEDINGS{Yang03ascalability,
      author    = {Yiming Yang and Jian Zhang and Bryan Kisiel},
      title     = {A Scalability Analysis of Classifiers in Text Categorization},
      booktitle = {Proceedings of SIGIR-03, 26th ACM International Conference on Research and Development in Information Retrieval, ACM},
      year      = {2003},
      pages     = {96--103},
      publisher = {Press}
    }

### Abstract

Real-world applications of text categorization often require a system to deal with tens of thousands of categories defined over a large taxonomy. This paper addresses the problem with respect to a set of popular algorithms in text categorization, including Support Vector Machines, k-nearest neighbor, ridge regression, linear least squares fit and logistic regression. By providing a formal analysis of the computational complexity of each classification method, followed by an investigation of the usage of different classifiers in a hierarchical setting of categorization, we show how the scalability of a method depends on the topology of the hierarchy and the category distributions. In addition, we are able to obtain tight bounds for the complexities by using the power law to approximate category distributions over a hierarchy. Experiments with kNN and SVM classifiers on the OHSUMED corpus are reported as concrete examples.
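The abstract's central point, that training cost depends on the hierarchy's topology and on how documents are distributed over categories, can be illustrated with a toy cost model. The sketch below is not the paper's analysis. It assumes: (a) each binary (one-vs-rest) classifier trained on `n` documents costs `n ** alpha` for some hypothetical exponent `alpha` (roughly 1 for linear-time learners, larger for superlinear ones such as some SVM solvers); (b) each document belongs to exactly one leaf category; (c) in the hierarchical setting, each internal node trains one classifier per child using only the documents under that node; and (d) leaf sizes follow a simple power law `largest * r ** (-beta)`. All names (`Node`, `flat_cost`, `hierarchical_cost`, `power_law_sizes`, `build_example`) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A category in a taxonomy (hypothetical structure for illustration)."""
    name: str
    docs: int = 0  # documents assigned directly to this node (leaves only, by assumption)
    children: list[Node] = field(default_factory=list)

    def total_docs(self) -> int:
        # Documents in this subtree, assuming single-label assignment to leaves.
        return self.docs + sum(c.total_docs() for c in self.children)


def leaves(node: Node) -> list[Node]:
    if not node.children:
        return [node]
    return [leaf for c in node.children for leaf in leaves(c)]


def flat_cost(root: Node, alpha: float) -> float:
    # Flat one-vs-rest: every leaf category's classifier sees all N documents.
    n = root.total_docs()
    return len(leaves(root)) * n ** alpha


def hierarchical_cost(node: Node, alpha: float) -> float:
    # Top-down: each internal node trains one classifier per child,
    # using only the documents that fall under that internal node.
    if not node.children:
        return 0.0
    local = len(node.children) * node.total_docs() ** alpha
    return local + sum(hierarchical_cost(c, alpha) for c in node.children)


def power_law_sizes(k: int, largest: int, beta: float) -> list[int]:
    # Assumed power-law category sizes: size of rank r is largest * r^(-beta).
    return [max(1, round(largest * r ** (-beta))) for r in range(1, k + 1)]


def build_example(beta: float = 1.0) -> Node:
    # Hypothetical two-level taxonomy: 10 top-level categories x 10 leaves each.
    root = Node("root")
    for i in range(10):
        top = Node(f"top{i}")
        for j, size in enumerate(power_law_sizes(10, 1000 // (i + 1), beta)):
            top.children.append(Node(f"top{i}/leaf{j}", docs=size))
        root.children.append(top)
    return root


if __name__ == "__main__":
    root = build_example()
    for alpha in (1.0, 2.0):
        print(f"alpha={alpha}: flat={flat_cost(root, alpha):.3g}, "
              f"hierarchical={hierarchical_cost(root, alpha):.3g}")
```

Under these assumptions, with `alpha = 1` the flat scheme costs 100·N for 100 leaves, while the two-level scheme costs 10·N at the root plus 10·N summed over the top-level nodes, because each level partitions the documents. For `alpha > 1` the gap widens, and how much it widens depends on how skewed the category sizes are, which is the kind of dependence on topology and distribution the abstract describes.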

