A Re-Examination of Text Categorization Methods (1999) [452 citations — 14 self]

Abstract:

This paper reports a controlled study with statistical significance tests on five text categorization methods: Support Vector Machines (SVM), a k-Nearest Neighbor (kNN) classifier, a neural network (NNet) approach, the Linear Least-Squares Fit (LLSF) mapping and a Naive Bayes (NB) classifier. We focus on the robustness of these methods in dealing with a skewed category distribution, and their performance as a function of the training-set category frequency. Our results show that SVM, kNN and LLSF significantly outperform NNet and NB when the number of positive training instances per category is small (less than ten), and that all the methods perform comparably when the categories are sufficiently common (over 300 instances).

Citations

1110 Support-vector networks – Cortes, Vapnik - 1995
1064 Text categorization with support vector machines: Learning with many relevant features – Joachims - 1999
574 A comparative study on feature selection in text categorization – Yang, Pedersen - 1997
519 A comparison of event models for naive Bayes text classification – McCallum, Nigam - 1998
353 An Evaluation of Statistical Approaches to Text Categorization – Yang - 1999
304 Hierarchically classifying documents using very few words – Koller, Sahami - 1997
213 A comparison of two learning algorithms for text categorization – Lewis, Ringuette - 1994
211 Training Algorithms for Linear Text Classifiers – Lewis - 1996
199 Nearest Neighbor (NN) Norms: NN Pattern Recognition Classification Techniques – Dasarathy - 1991
194 Context sensitive learning methods for text categorization – Cohen - 1999
151 Sequential minimal optimization: A fast algorithm for training support vector machines – Platt
128 Support Vector Machines: Training and Applications – Osuna - 1998
128 Expert network: Effective and efficient learning from human decisions in text categorization and retrieval – Yang - 1994
121 A neural network approach to topic spotting – Wiener, Pedersen, et al. - 1995
89 Feature selection, perceptron learning, and a usability case study for text categorization – Ng, Goh, et al. - 1997
85 An example-based mapping method for text categorization and retrieval – Yang, Chute - 1994
74 Towards language independent automated learning of text categorization models – Apte, Damerau, et al. - 1994
72 Classifying News Stories using Memory Based Reasoning – Masand, Linoff, et al. - 1992
57 CONSTRUE/TIS: a system for content-based indexing of a database of news stories – Hayes, Weinstein - 1990
56 Text categorization and relational learning – Cohen - 1995
55 Feature selection in statistical learning of text categorization – Yang, Pedersen - 1997
47 Using Generalized Instance Set for Automatic Texts Categorization – Lam, Ho - 1998
45 Automatic indexing based on Bayesian inference networks – Tzeras, Hartmann - 1993
44 AIR/X - a rule-based multistage indexing system for large subject fields – Fuhr, Hartmann, et al. - 1991
44 Cluster-Based Text Categorization: A Comparison of Category Search Strategies – Iwayama, Tokunaga - 1995
39 Text categorization: a symbolic approach – Moulinier, Raskinis, et al. - 1996
20 Text mining with decision rules and decision trees – Apte, Damerau - 1998
14 The Nature of Statistical Learning Theory – Vapnik - 1995
12 Statistical Theory and Methods – Berry, Lindgren - 1998
11 Is learning bias an issue on the text categorization problem – Moulinier - 1997
10 Distributional clustering of words for text categorization – Baker, McCallum - 1998
8 Sampling strategies and learning efficiency in text categorization – Yang - 1996
3 A comparison of event models for naive Bayes text classification – McCallum, Nigam - 1998
3 Expert network: Effective and efficient learning from human decisions in text categorization and retrieval – Yang - 1994
1 Statistics: Theory and Methods. Brooks/Cole, Pacific – Berry, Lindgren - 1990
1 Sampling strategies and learning efficiency in text categorization – Yang - 1996