Text Categorization with Support Vector Machines: Learning with Many Relevant Features (1998) [1053 citations — 10 self]
http://ranger.uta.edu/~alp/ix/readings/SVMsforText
http://www.cs.iastate.edu/~jtian/cs573/Papers/Joac
http://www.cs.cornell.edu/People/tj/publications/j
http://www-ai.informatik.uni-dortmund.de/DOKUMENTE
Abstract:
This paper explores the use of Support Vector Machines (SVMs) for learning text classifiers from examples. It analyzes the particular properties of learning with text data and identifies why SVMs are appropriate for this task. Empirical results support the theoretical findings. SVMs achieve substantial improvements over the currently best performing methods and behave robustly over a variety of different learning tasks. Furthermore, they are fully automatic, eliminating the need for manual parameter tuning.
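As a rough illustration of the setting the abstract describes, the following is a minimal sketch of a linear SVM text classifier over TF-IDF features, using only the Python standard library. Everything in it is an assumption made for illustration: the toy corpus, the function names (`fit_idf`, `tfidf`, `train_linear_svm`, `predict`), the regularisation constant `lam`, and the omission of a bias term. The training loop is a Pegasos-style stochastic subgradient update chosen for brevity; it is not the optimiser or the experimental setup used in the paper.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Inverse document frequency for every term seen in the tokenised docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}


def tfidf(doc, idf):
    """L2-normalised sparse TF-IDF vector (a dict); unknown terms are ignored."""
    tf = Counter(term for term in doc if term in idf)
    vec = {term: count * idf[term] for term, count in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {term: v / norm for term, v in vec.items()}


def score(w, x):
    """Dot product between a sparse weight vector and a sparse example."""
    return sum(w.get(term, 0.0) * v for term, v in x.items())


def train_linear_svm(vectors, labels, lam=0.01, epochs=50):
    """Pegasos-style subgradient descent on the L2-regularised hinge loss.

    Assumption: no bias term, labels are +1 / -1, examples visited in order.
    """
    w = {}
    t = 0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            t += 1
            eta = 1.0 / (lam * t)                # decreasing step size
            violated = y * score(w, x) < 1.0     # margin check with current w
            for term in w:                       # shrink step (regulariser)
                w[term] *= 1.0 - eta * lam
            if violated:                         # hinge-loss subgradient step
                for term, v in x.items():
                    w[term] = w.get(term, 0.0) + eta * y * v
    return w


# Hypothetical toy corpus: +1 = grain-related, -1 = finance-related.
train_docs = [
    "wheat corn harvest grain".split(),
    "grain export wheat tonnes".split(),
    "bank interest rate loan".split(),
    "loan rate central bank".split(),
]
train_labels = [+1, +1, -1, -1]

idf = fit_idf(train_docs)
w = train_linear_svm([tfidf(d, idf) for d in train_docs], train_labels)


def predict(text):
    """Classify whitespace-tokenised text as +1 or -1 by the sign of the score."""
    return 1 if score(w, tfidf(text.split(), idf)) > 0 else -1
```

Calling `predict("wheat harvest")` or `predict("bank loan")` classifies unseen text with the learned weights. The weight vector is stored as a dict because text feature vectors are sparse: each document touches only a handful of the vocabulary's dimensions.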
Citations
| 5044 | Statistical Learning Theory – Vapnik - 1998 |
| 3356 | C4.5: Programs for Machine Learning – Quinlan - 1993 |
| 1091 | Support-vector networks – Cortes, Vapnik - 1995 |
| 915 | Term-weighting approaches in automatic text retrieval – Salton, Buckley - 1988 |
| 594 | Relevance feedback in information retrieval – Rocchio - 1971 |
| 565 | A comparative study on feature selection in text categorization – Yang, Pedersen - 1997 |
| 346 | An evaluation of statistical approaches to text categorization – Yang - 1999 |
| 250 | A probabilistic analysis of the Rocchio algorithm with TFIDF for text categorization (ICML-97) – Joachims - 1997 |
| 49 | The perceptron algorithm vs. Winnow: Linear vs. logarithmic mistake bounds when few input variables are relevant – Kivinen, Warmuth, et al. - 1997 |
| 18 | Using corpus statistics to remove redundant words in text categorization – Yang, Wilbur - 1996 |

