A Loss Function Analysis for Classification Methods in Text Categorization (2003)

by Fan Li, Yiming Yang

Abstract:

This paper presents a formal analysis of popular text classification methods, focusing on their loss functions. Minimizing these loss functions is essential to optimizing the methods, and decomposing each one into a training-set loss and a model-complexity term enables cross-method comparison on a common basis from an optimization point of view. The methods include Support Vector Machines, Linear Regression, Logistic Regression, Neural Networks, Naive Bayes, K-Nearest Neighbor, Rocchio-style and Multi-class Prototype classifiers. Theoretical analysis (including our new derivations) is provided for each method, along with evaluation results for all the methods on the Reuters-21578 benchmark corpus. Using linear regression, neural networks and logistic regression as examples, we show that properly tuning the balance between the training-set loss and the complexity penalty has a significant impact on the performance of a classifier. In linear regression in particular, tuning the complexity penalty yielded a result (measured using macro-averaged F1) that outperformed all text categorization methods previously evaluated on that benchmark corpus, including Support Vector Machines.
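
The decomposition described in the abstract can be illustrated with a minimal sketch. It assumes the generic regularized least-squares form, objective = training-set loss + lambda * ||w||^2, which may differ from the exact formulation used in the paper. The function names `objective` and `fit`, the toy data, and the grid of lambda values are hypothetical and chosen only for illustration.

```python
# Sketch of "training-set loss + complexity penalty" for linear regression.
# Assumption: squared-error loss with an L2 penalty (ridge regression);
# the toy data below is made up and is not from Reuters-21578.

def objective(w, X, y, lam):
    """Return (training_loss, complexity_penalty) for weights w."""
    train_loss = sum(
        (sum(wj * xj for wj, xj in zip(w, x)) - yi) ** 2
        for x, yi in zip(X, y)
    )
    penalty = lam * sum(wj * wj for wj in w)
    return train_loss, penalty


def fit(X, y, lam, lr=0.01, steps=5000):
    """Minimize training_loss + penalty by plain gradient descent."""
    d = len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        # Gradient of the penalty term lam * ||w||^2.
        grad = [2.0 * lam * wj for wj in w]
        # Gradient of the squared-error training loss.
        for x, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, x)) - yi
            for j in range(d):
                grad[j] += 2.0 * err * x[j]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


# Four toy documents with two features each and +1/-1 category labels.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
y = [1.0, -1.0, 1.0, 1.0]

# Sweep the complexity penalty and record both parts of the objective.
results = []
for lam in [0.0, 0.1, 1.0, 10.0]:
    w = fit(X, y, lam)
    train_loss, _ = objective(w, X, y, lam)
    sq_norm = sum(wj * wj for wj in w)
    results.append((lam, train_loss, sq_norm))
    print(f"lambda={lam:5.1f}  training loss={train_loss:.4f}  ||w||^2={sq_norm:.4f}")
```

As lambda grows, the fitted weights shrink and the training-set loss rises. In practice the value of lambda would be chosen on held-out data rather than on the training set, which is the kind of tuning the abstract refers to.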
