A Loss Function Analysis for Classification Methods in Text Categorization (2003) [16 citations — 5 self]
Abstract:
This paper presents a formal analysis of popular text classification methods, focusing on their loss functions. Minimizing the loss function is essential to the optimization of each method, and decomposing it into a training-set loss and a model-complexity penalty enables cross-method comparisons on a common basis from an optimization point of view. The methods analyzed are Support Vector Machines, Linear Regression, Logistic Regression, Neural Networks, Naive Bayes, K-Nearest Neighbor, Rocchio-style, and Multi-class Prototype classifiers. A theoretical analysis (including our new derivations) is provided for each method, along with evaluation results for all of the methods on the Reuters-21578 benchmark corpus. Using linear regression, neural networks, and logistic regression as examples, we show that properly tuning the balance between the training-set loss and the complexity penalty can have a significant impact on the performance of a classifier. For linear regression in particular, tuning the complexity penalty yielded a result (measured by macro-averaged F1) that outperformed all text categorization methods previously evaluated on that benchmark corpus, including Support Vector Machines.
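The abstract's central idea is that each classifier minimizes a training-set loss plus a complexity penalty. The minimal Python sketch below illustrates that decomposition for three of the listed methods, assuming a linear score f(x) = w·x + b, labels y in {-1, +1}, and a squared-norm penalty lam * ||w||^2. These are standard textbook forms (hinge loss for SVMs, squared loss for linear regression, log loss for logistic regression), not necessarily the paper's exact formulations; the toy data and the names `objective`, `fit_ridge`, and `lam` are hypothetical, and the fit uses plain gradient descent rather than any solver from the paper.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]

def score(w: Vector, b: float, x: Vector) -> float:
    """Linear decision score f(x) = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Per-example losses written as functions of the margin z = y * f(x), y in {-1, +1}.
def hinge_loss(z: float) -> float:
    """SVM: max(0, 1 - z)."""
    return max(0.0, 1.0 - z)

def squared_loss(z: float) -> float:
    """Least squares: (1 - z)^2, which equals (y - f(x))^2 when y is +1 or -1."""
    return (1.0 - z) ** 2

def logistic_loss(z: float) -> float:
    """Logistic regression: log(1 + exp(-z)), computed without overflow."""
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))

def objective(loss: Callable[[float], float], w: Vector, b: float,
              X: List[Vector], Y: List[int], lam: float) -> Tuple[float, float]:
    """Return (training-set loss, complexity penalty lam * ||w||^2); b is not penalized."""
    data_loss = sum(loss(y * score(w, b, x)) for x, y in zip(X, Y))
    penalty = lam * sum(wi * wi for wi in w)
    return data_loss, penalty

def fit_ridge(X: List[Vector], Y: List[int], lam: float,
              lr: float = 0.02, epochs: int = 1000) -> Tuple[List[float], float]:
    """Minimize sum (y - f(x))^2 + lam * ||w||^2 by batch gradient descent."""
    d = len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [2.0 * lam * wi for wi in w]   # gradient of the penalty term
        grad_b = 0.0
        for x, y in zip(X, Y):
            r = score(w, b, x) - y              # residual f(x) - y
            for j in range(d):
                grad_w[j] += 2.0 * r * x[j]     # gradient of the squared loss
            grad_b += 2.0 * r
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Hypothetical toy data: two features, centered so the bias decouples from w.
X = [[2.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-2.0, -1.0]]
Y = [1, 1, -1, -1]

for lam in (0.1, 1.0, 10.0):
    w, b = fit_ridge(X, Y, lam)
    data_loss, penalty = objective(squared_loss, w, b, X, Y, lam)
    print(f"lam={lam:5.1f}  w=[{w[0]:.3f}, {w[1]:.3f}]  "
          f"training loss={data_loss:.3f}  penalty={penalty:.3f}")
```

Raising `lam` shrinks ||w|| and raises the training-set loss; where to sit on that trade-off is the balance the abstract reports tuning for linear regression, neural networks, and logistic regression.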
Citations
| Times cited | Reference |
|---|---|
| 5139 | Statistical learning theory – Vapnik - 1998 |
| 1065 | Text categorization with support vector machines: Learning with many relevant features – Joachims - 1998 |
| 520 | A comparison of event models for naive bayes text classification – McCallum, Nigam - 1998 |
| 454 | A re-examination of text categorization methods – Yang, Liu - 1999 |
| 354 | An evaluation of statistical approaches to text categorization – Yang - 1997 |
| 265 | The elements of statistical learning: Data mining, inference and prediction – Hastie, Tibshirani, et al. - 2001 |
| 1 | RCV1: A New Text Categorization Test Collection (to appear) – Lewis, Yang, et al. - 2002 |
| 1 | An example-based mapping method for text classi and retrieval – Yang - 1994 |

