## A Re-Examination of Text Categorization Methods (1999)

### Download Links

- [ranger.uta.edu]
- [www.cs.indiana.edu]
- [www.cs.cmu.edu]
- [nyc.lti.cs.cmu.edu]
- DBLP

Citations: 637 (19 self)

### BibTeX

```bibtex
@MISC{Yang99are-examination,
  author = {Yiming Yang and Xin Liu},
  title  = {A Re-Examination of Text Categorization Methods},
  year   = {1999}
}
```


### Abstract

This paper reports a controlled study with statistical significance tests on five text categorization methods: Support Vector Machines (SVM), a k-Nearest Neighbor (kNN) classifier, a neural network (NNet) approach, the Linear Least Squares Fit (LLSF) mapping, and a Naive Bayes (NB) classifier. We focus on the robustness of these methods in dealing with a skewed category distribution, and on their performance as a function of the training-set category frequency. Our results show that SVM, kNN and LLSF significantly outperform NNet and NB when the number of positive training instances per category is small (fewer than ten), and that all the methods perform comparably when the categories are sufficiently common (over 300 instances).

### Citations

2162 | Support-vector networks
- Cortes, Vapnik
- 1995
Citation Context ...uced by Vapnik in 1995 for solving two-class pattern recognition problems[27]. It is based on the Structural Risk Minimization principle, for which error-bound analysis has been theoretically motivated[27, 7]. The method is defined over a vector space where the problem is to find a decision surface that "best" separates the data points in two classes. In order to define the "best" separation, we need to i...
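
The context above sketches the geometric picture: a hyperplane w·x − b = 0 that separates the two classes. A minimal plain-Python illustration of that geometry, assuming a weight vector `w` and offset `b` have already been produced by some trainer; the function names are my own, not from SVM^light or any library:

```python
def svm_decision(w, b, x):
    # Which side of the separating hyperplane w·x - b = 0 does x fall on?
    score = sum(wi * xi for wi, xi in zip(w, x)) - b
    return 1 if score >= 0 else -1

def satisfies_margin(w, b, points):
    # Hard-margin constraints: y_i * (w·x_i - b) >= 1 for every training point
    return all(
        y * (sum(wi * xi for wi, xi in zip(w, x)) - b) >= 1
        for x, y in points
    )
```

For example, with w = [1, 0] and b = 0, the points (2, 0) labeled +1 and (−2, 0) labeled −1 satisfy the margin, while a point at (0.5, 0) labeled +1 falls inside it.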

1692 | Text categorization with Support Vector Machines: Learning with many relevant features
- Joachims
- 1998
Citation Context ...C literature. An increasing number of learning approaches have been applied, including regression models[9, 32], nearest neighbor classification[17, 29, 33, 31, 14], Bayesian probabilistic approaches[25, 16, 20, 13, 12, 18, 3], decision trees[9, 16, 20, 2, 12], inductive rule learning[1, 5, 6, 21], neural networks[28, 22], on-line learning[6, 15] and Support Vector Machines[12]. While the rich literature provides valuable...

952 | A comparative study on feature selection in text categorization
- Yang, Pedersen
- 1997
Citation Context ... most commonly investigated application domains in the TC literature. An increasing number of learning approaches have been applied, including regression models[9, 32], nearest neighbor classification[17, 29, 33, 31, 14], Bayesian probabilistic approaches[25, 16, 20, 13, 12, 18, 3], decision trees[9, 16, 20, 2, 12], inductive rule learning[1, 5, 6, 21], neural networks[28, 22], on-line learning[6, 15] and Support Ve...

756 | A Comparison of Event Models for Naive Bayes Text Classification. AAAI'98 Workshop on Learning for Text Categorization
- McCallum, Nigam
- 1998
Citation Context ...C literature. An increasing number of learning approaches have been applied, including regression models[9, 32], nearest neighbor classification[17, 29, 33, 31, 14], Bayesian probabilistic approaches[25, 16, 20, 13, 12, 18, 3], decision trees[9, 16, 20, 2, 12], inductive rule learning[1, 5, 6, 21], neural networks[28, 22], on-line learning[6, 15] and Support Vector Machines[12]. While the rich literature provides valuable...

490 | An evaluation of statistical approaches to text categorization
- Yang
- 1999
Citation Context ... most commonly investigated application domains in the TC literature. An increasing number of learning approaches have been applied, including regression models[9, 32], nearest neighbor classification[17, 29, 33, 31, 14], Bayesian probabilistic approaches[25, 16, 20, 13, 12, 18, 3], decision trees[9, 16, 20, 2, 12], inductive rule learning[1, 5, 6, 21], neural networks[28, 22], on-line learning[6, 15] and Support Ve...

420 | Hierarchically classifying documents using very few words
- Koller, Sahami
- 1997
Citation Context ...parse document vectors. 3.5 NB Naive Bayes (NB) probabilistic classifiers are commonly studied in machine learning[19]. An increasing number of evaluations of NB methods on Reuters have been published[16, 20, 13, 3, 18]. The basic idea in NB approaches is to use the joint probabilities of words and categories to estimate the probabilities of categories given a document. The naive part of NB methods is the assumption...
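
The "joint probabilities of words and categories" idea in the context above can be sketched in a few lines. This is a generic multinomial Naive Bayes with Laplace smoothing on made-up toy data, not the specific NB variant evaluated in the paper:

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes: pick the category maximizing the
    log joint probability of the document's words and the category."""

    def __init__(self, docs):
        # docs: list of (tokens, category) pairs -- toy training data
        self.cat_counts = Counter(cat for _, cat in docs)
        self.total_docs = len(docs)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, cat in docs:
            self.word_counts[cat].update(tokens)
            self.vocab.update(tokens)

    def classify(self, tokens):
        best_cat, best_lp = None, float("-inf")
        for cat, n_docs in self.cat_counts.items():
            lp = math.log(n_docs / self.total_docs)  # log prior
            denom = sum(self.word_counts[cat].values()) + len(self.vocab)
            for t in tokens:
                # Laplace (add-one) smoothing keeps unseen words finite
                lp += math.log((self.word_counts[cat][t] + 1) / denom)
            if lp > best_lp:
                best_cat, best_lp = cat, lp
        return best_cat
```

The "naive" assumption the context mentions is visible in the inner loop: word log-probabilities are simply summed, i.e. words are treated as conditionally independent given the category.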

289 | Sequential minimal optimization: A fast algorithm for training support vector machines
- Platt
- 1998
Citation Context ...w space become linearly separable[27, 7, 23]. Relatively efficient implementations of SVM include the SVM^light system by Joachims[12] and the Sequential Minimal Optimization (SMO) algorithm by Platt[24]. An interesting property of SVM is that the decision surface is determined only by the data points which have exactly the distance 1/||w|| from the decision plane. Those points are called the support ...

269 | Nearest neighbor (NN) norms: NN pattern classification techniques. Los Alamitos, CA
- Dasarathy
- 1991
Citation Context ...2 and our own version of kNN. 3.2 kNN kNN stands for k-nearest neighbor classification, a well-known statistical approach which has been intensively studied in pattern recognition for over four decades[8]. kNN has been applied to text categorization since the early stages of the research[17, 29, 11]. It is one of the top-performing methods on the benchmark Reuters corpus (the 21450 version, Apte ...
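
The kNN scheme the context refers to can be sketched as: rank training documents by similarity to the query document, then let the k nearest vote for their categories, weighted by similarity. A plain-Python sketch using cosine similarity over sparse term-weight dicts; the data layout is my own choice, not the paper's:

```python
import math
from collections import defaultdict

def cosine(u, v):
    # u, v: sparse term-weight vectors as {term: weight} dicts
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_scores(query, train, k=3):
    # Rank training docs by similarity to the query, then let the
    # k nearest vote for their categories, weighted by similarity.
    sims = sorted(
        ((cosine(query, doc["vec"]), doc["cats"]) for doc in train),
        key=lambda pair: pair[0],
        reverse=True,
    )
    scores = defaultdict(float)
    for sim, cats in sims[:k]:
        for cat in cats:
            scores[cat] += sim
    return dict(scores)
```

Thresholding the returned category scores then yields the multi-label assignments typical of the Reuters task.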

267 | A comparison of two learning algorithms for text categorization
- Lewis, Ringuette
- 1994
Citation Context ...C literature. An increasing number of learning approaches have been applied, including regression models[9, 32], nearest neighbor classification[17, 29, 33, 31, 14], Bayesian probabilistic approaches[25, 16, 20, 13, 12, 18, 3], decision trees[9, 16, 20, 2, 12], inductive rule learning[1, 5, 6, 21], neural networks[28, 22], on-line learning[6, 15] and Support Vector Machines[12]. While the rich literature provides valuable...

246 | Context-sensitive learning methods for text categorization
- Cohen, Singer
- 1999
Citation Context ... regression models[9, 32], nearest neighbor classification[17, 29, 33, 31, 14], Bayesian probabilistic approaches[25, 16, 20, 13, 12, 18, 3], decision trees[9, 16, 20, 2, 12], inductive rule learning[1, 5, 6, 21], neural networks[28, 22], on-line learning[6, 15] and Support Vector Machines[12]. While the rich literature provides valuable information about individual methods, clear conclusions about crossmeth...

244 | Training algorithms for linear text classifiers
- Lewis, Schapire, et al.
- 1996
Citation Context ...on[17, 29, 33, 31, 14], Bayesian probabilistic approaches[25, 16, 20, 13, 12, 18, 3], decision trees[9, 16, 20, 2, 12], inductive rule learning[1, 5, 6, 21], neural networks[28, 22], on-line learning[6, 15] and Support Vector Machines[12]. While the rich literature provides valuable information about individual methods, clear conclusions about cross-method comparison have been difficult because often th...

177 | Support vector machines: Training and applications
- Osuna, Freund, et al.
- 1997
Citation Context ... w · x_i − b ≥ +1 for y_i = +1 (1), and w · x_i − b ≤ −1 for y_i = −1 (2), and that the vector 2-norm of w is minimized. The SVM problem can be solved using quadratic programming techniques[27, 7, 23]. The algorithms for solving linearly separable cases can be extended for solving linearly non-separable cases by either introducing soft margin hyperplanes, or by mapping the original data vectors to...

153 | Expert network: effective and efficient learning from human decisions in text categorization and retrieval
- Yang
- 1994

152 | A neural network approach to topic spotting
- Wiener, Pedersen, et al.
- 1995
Citation Context ...rest neighbor classification[17, 29, 33, 31, 14], Bayesian probabilistic approaches[25, 16, 20, 13, 12, 18, 3], decision trees[9, 16, 20, 2, 12], inductive rule learning[1, 5, 6, 21], neural networks[28, 22], on-line learning[6, 15] and Support Vector Machines[12]. While the rich literature provides valuable information about individual methods, clear conclusions about cross-method comparison have been d...

115 | Feature selection, perceptron learning, and a usability case study for text categorization
- Ng, Goh, et al.
- 1997
Citation Context ...rest neighbor classification[17, 29, 33, 31, 14], Bayesian probabilistic approaches[25, 16, 20, 13, 12, 18, 3], decision trees[9, 16, 20, 2, 12], inductive rule learning[1, 5, 6, 21], neural networks[28, 22], on-line learning[6, 15] and Support Vector Machines[12]. While the rich literature provides valuable information about individual methods, clear conclusions about cross-method comparison have been d...

115 | An example-based mapping method for text categorization and retrieval
- Yang, Chute
- 1994
Citation Context ...tion was confirmed by our experiments with both versions of kNN on Reuters-21578 (see the results in Section 5). 3.3 LLSF LLSF stands for Linear Least Squares Fit, a mapping approach developed by Yang[32]. A multivariate regression model is automatically learned from a training set of documents and their categories. The training data are represented in the form of input/output vector pairs where the i...
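
The LLSF mapping described above is an ordinary least-squares problem. A sketch of the formulation, with symbols of my own choosing (A: matrix whose columns are training document vectors; B: matrix of the corresponding category vectors; the paper's own notation may differ):

```latex
F_{LS} = \arg\min_{F} \lVert F A - B \rVert_F^{2}
```

When $A A^{\top}$ is invertible this has the closed form $F_{LS} = B A^{\top} (A A^{\top})^{-1}$; in practice a truncated SVD of $A$ is commonly used since $A A^{\top}$ is often singular for sparse document matrices, and a new document vector $x$ is then categorized by ranking the entries of $F_{LS}\,x$.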

92 | Classifying news stories using memory-based reasoning
- Masand, Linoff, et al.
- 1992

84 | Towards language independent automated learning of text categorization models
- Apte, Damerau, et al.
- 1994
Citation Context ... regression models[9, 32], nearest neighbor classification[17, 29, 33, 31, 14], Bayesian probabilistic approaches[25, 16, 20, 13, 12, 18, 3], decision trees[9, 16, 20, 2, 12], inductive rule learning[1, 5, 6, 21], neural networks[28, 22], on-line learning[6, 15] and Support Vector Machines[12]. While the rich literature provides valuable information about individual methods, clear conclusions about crossmeth...

74 | CONSTRUE-TIS: A system for content-based indexing of a database of news stories
- Hayes, Weinstein
- 1990
Citation Context ... This corpus has become a new benchmark lately in TC evaluations, and is the refined version of several older versions, namely Reuters-22173 and Reuters-21450, on which many TC methods were evaluated[10, 16, 1, 28, 6, 33, 22, 31], but the results on the older versions may not be directly comparable to the results on the new version. For this paper we use the ApteMod version of Reuters-21578, which was obtained by eliminating ...

62 | Using a generalized instance set for automatic text categorization
- Lam, Ho
- 1998
Citation Context ... most commonly investigated application domains in the TC literature. An increasing number of learning approaches have been applied, including regression models[9, 32], nearest neighbor classification[17, 29, 33, 31, 14], Bayesian probabilistic approaches[25, 16, 20, 13, 12, 18, 3], decision trees[9, 16, 20, 2, 12], inductive rule learning[1, 5, 6, 21], neural networks[28, 22], on-line learning[6, 15] and Support Ve...

60 | Feature selection in statistical learning of text categorization - Yang, Pedersen - 1997

56 | Text categorization and relational learning
- Cohen
- 1995
Citation Context ... regression models[9, 32], nearest neighbor classification[17, 29, 33, 31, 14], Bayesian probabilistic approaches[25, 16, 20, 13, 12, 18, 3], decision trees[9, 16, 20, 2, 12], inductive rule learning[1, 5, 6, 21], neural networks[28, 22], on-line learning[6, 15] and Support Vector Machines[12]. While the rich literature provides valuable information about individual methods, clear conclusions about crossmeth...

54 | Automatic indexing based on Bayesian inference networks
- Tzeras, Hartmann
- 1993

51 | AIR/X - a rule-based multistage indexing system for large subject fields
- Fuhr, Hartmanna, et al.
- 1991
Citation Context ...ewswire stories, for example, is one of the most commonly investigated application domains in the TC literature. An increasing number of learning approaches have been applied, including regression models[9, 32], nearest neighbor classification[17, 29, 33, 31, 14], Bayesian probabilistic approaches[25, 16, 20, 13, 12, 18, 3], decision trees[9, 16, 20, 2, 12], inductive rule learning[1, 5, 6, 21], neural net...

51 | Cluster-based text categorization: a comparison of category search strategies
- Iwayama, Tokunaga
- 1995
Citation Context ...a well-known statistical approach which has been intensively studied in pattern recognition for over four decades[8]. kNN has been applied to text categorization since the early stages of the research[17, 29, 11]. It is one of the top-performing methods on the benchmark Reuters corpus (the 21450 version, Apte set); the other top-performing methods include LLSF by Yang, decision trees with boosting by Apte...

44 | Text categorization: a symbolic approach
- Moulinier, Raskinis, et al.
- 1996

26 | Text mining with decision rules and decision trees
- Apte, Damerau, et al.
- 1998
Citation Context ...ing approaches have been applied, including regression models[9, 32], nearest neighbor classification[17, 29, 33, 31, 14], Bayesian probabilistic approaches[25, 16, 20, 13, 12, 18, 3], decision trees[9, 16, 20, 2, 12], inductive rule learning[1, 5, 6, 21], neural networks[28, 22], on-line learning[6, 15] and Support Vector Machines[12]. While the rich literature provides valuable information about individual meth...

21 | The Nature of Statistical Learning Theory
- Vapnik
- 1995
Citation Context ...em's assignments (n × m). 3 Classifiers 3.1 SVM Support Vector Machines (SVM) is a relatively new learning approach introduced by Vapnik in 1995 for solving two-class pattern recognition problems[27]. It is based on the Structural Risk Minimization principle, for which error-bound analysis has been theoretically motivated[27, 7]. The method is defined over a vector space where the problem is to fi...

17 | Statistics: Theory and Methods
- Berry, Lindgren
- 1996
Citation Context ...1 for T = sd/s.e.(sd); otherwise, the standard normal distribution is used instead. 4.4 Macro t-test after rank transformation. To compare systems A and B based on the F1 values after rank transformation[4], the F1 values of the two systems on individual categories are pooled together and sorted, and these values are then replaced by the corresponding ranks. To make a distinction from the T-test ab...
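
The rank-transformed macro t-test described above can be sketched in plain Python: pool both systems' per-category F1 scores, replace each score by its rank in the pooled sort (averaging ranks over ties), and compute a paired t statistic on the per-category rank differences. The F1 values in the test are made up for illustration, and this is my reading of the procedure, not the authors' code:

```python
import math

def pooled_ranks(values):
    # Map each value to its 1-based rank in the sorted pool,
    # averaging ranks over ties.
    ordered = sorted(values)
    first, count = {}, {}
    for pos, v in enumerate(ordered, start=1):
        first.setdefault(v, pos)
        count[v] = count.get(v, 0) + 1
    return {v: first[v] + (count[v] - 1) / 2 for v in count}

def macro_t_on_ranks(f1_a, f1_b):
    # Pool both systems' per-category F1 scores, replace each score by
    # its rank, then compute a paired t statistic on the rank differences.
    rank = pooled_ranks(list(f1_a) + list(f1_b))
    diffs = [rank[a] - rank[b] for a, b in zip(f1_a, f1_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n)
```

The resulting statistic is compared against a t distribution with n − 1 degrees of freedom (or the standard normal for large n), exactly as in the unranked macro t-test.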

11 | Is learning bias an issue on the text categorization problem
- Moulinier
- 1997

10 | Distributional clustering of words for text categorization
- Baker, Mccallum
- 1998

9 | Sampling strategies and learning efficiency in text categorization
- Yang
- 1996
Citation Context ...nother open question for TC research is how robust methods are in solving problems with a skewed category distribution. Since categories typically have an extremely nonuniform distribution in practice[30], it would be meaningful to compare the performance of different classifiers with respect to category frequencies, and to measure how much the effectiveness of each method depends on the amount of dat...
