A Comparison of Two Learning Algorithms for Text Categorization (1994) [213 citations — 1 self]

by David D. Lewis, Marc Ringuette
In Third Annual Symposium on Document Analysis and Information Retrieval

Abstract:

This paper examines the use of inductive learning to categorize natural language documents into predefined content categories. Categorization of text is of increasing importance in information retrieval and natural language processing systems. Previous research on automated text categorization has mixed machine learning and knowledge engineering methods, making it difficult to draw conclusions about the performance of particular methods. In this paper we present empirical results on the performance of a Bayesian classifier and a decision tree learning algorithm on two text categorization data sets. We find that both algorithms achieve reasonable performance and allow controlled tradeoffs between false positives and false negatives. The stepwise feature selection in the decision tree algorithm is particularly effective in dealing with the large feature sets common in text categorization. However, even this algorithm is aided by an initial prefiltering of features, confirming the results...
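The tradeoff between false positives and false negatives mentioned in the abstract can be illustrated with a minimal sketch. It assumes a Bernoulli-style naive Bayes scorer over word presence with add-one smoothing, and a decision threshold on the log-odds score. This is not the authors' implementation: the function names, the toy documents, and the "wheat" category are all hypothetical and chosen only for illustration.

```python
import math

def train_bernoulli_nb(docs, labels, vocab):
    """Estimate smoothed class priors and per-class word-presence probabilities."""
    model = {}
    for cls in (True, False):
        class_docs = [d for d, y in zip(docs, labels) if y == cls]
        n = len(class_docs)
        model[cls] = {
            # Add-one smoothing keeps every probability strictly inside (0, 1).
            "prior": (n + 1) / (len(docs) + 2),
            "p": {w: (sum(w in d for d in class_docs) + 1) / (n + 2) for w in vocab},
        }
    return model

def log_odds(model, doc, vocab):
    """Log-odds that doc belongs to the category, assuming independent features."""
    score = math.log(model[True]["prior"]) - math.log(model[False]["prior"])
    for w in vocab:
        p_pos, p_neg = model[True]["p"][w], model[False]["p"][w]
        if w in doc:
            score += math.log(p_pos) - math.log(p_neg)
        else:
            score += math.log(1 - p_pos) - math.log(1 - p_neg)
    return score

def confusion(model, docs, labels, vocab, threshold):
    """Count (false positives, false negatives) when assigning at score >= threshold."""
    fp = fn = 0
    for d, y in zip(docs, labels):
        pred = log_odds(model, d, vocab) >= threshold
        fp += pred and not y
        fn += (not pred) and y
    return fp, fn

# Hypothetical toy corpus: each document is a set of words; label = "is about wheat".
docs = [
    {"wheat", "export"}, {"wheat", "harvest"}, {"wheat", "price"},
    {"oil", "price"}, {"oil", "export"}, {"bank", "rate"},
]
labels = [True, True, True, False, False, False]
vocab = sorted(set().union(*docs))
model = train_bernoulli_nb(docs, labels, vocab)

# Sweeping the threshold moves errors from one type to the other.
for t in (-1e9, 0.0, 1e9):
    print(t, confusion(model, docs, labels, vocab, t))
```

Raising the threshold can only shrink the set of documents assigned to the category, so false positives never increase and false negatives never decrease as the threshold grows. A decision tree can be given a similar control by thresholding the class proportion estimated at each leaf, though that variant is not shown here.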

Citations

2538 Induction of decision trees – Quinlan - 1986
524 Knowledge acquisition via incremental conceptual clustering – Fisher - 1987
247 Heuristic classification – Clancey - 1985
172 An evaluation of phrasal and clustered representations on a text categorization task – Lewis - 1992
135 Representation and learning in information retrieval – Lewis - 1992
129 Pattern Classification and Scene Analysis. A Wiley-Interscience Publication – Duda, Hart - 1973
110 Shift of bias for inductive concept learning – Utgoff - 1986
78 SCISOR: Extracting information from on-line news – Jacobs, Rau - 1990
74 A Theory of Learning Classification Rules – Buntine - 1990
63 An overview of the FRUMP system – DeJong - 1982
61 Evaluating text categorization – Lewis - 1991
57 CONSTRUE/TIS: a system for content-based indexing of a database of news stories – Hayes, Weinstein - 1990
54 Automatic indexing: An experimental inquiry – Maron - 1961
39 Poor estimates of context are worse than none – Gale, Church - 1990
27 Automatic document classification – Borko, Bernick - 1963
24 Introduction to IND and recursive partitioning – Buntine, Caruana - 1991
23 The Significance of the Cranfield Tests on Index Languages – Cleverdon - 1991
19 New York University: Description of the PROTEUS system as used for MUC-4 – Grishman, Macleod, et al. - 1992
14 Classification trees for information retrieval – Crawford, Fung, et al. - 1991
7 Trading MIPS and Memory for Knowledge Engineering: Automatic Classification of Census Returns on a Massively Parallel Supercomputer – Creecy, Masand, et al. - 1992
7 Data Extraction as Text Categorization: An Experiment with the MUC-3 Corpus – Lewis - 1991
6 Hughes Trainable Text Skimmer: Description of the TTS System as Used for MUC-3 – Dolan, Goldman, et al. - 1991
4 Description of the UNL/USL system used for MUC-3 – Deogun, Raghavan - 1991
3 Concept recognition in an automatic text-processing system for the life sciences – Vleduts-Stokolov - 1987
2 On recognizing planned deception – Hardt - 1988
2 The Intelligent Banking System: natural language processing for financial communications – Sahin, Sawyer - 1989
1 Learning with many irrelevant features. AAAI-91 – Almuallim, Dietterich - 1991
1 Advanced Decision Systems: Description of the CODEX system as used for MUC-3 – Balcom, Tong - 1991
1 A fuzzy measure of agreement between machine and manual assignment of documents to subject categories – Cerny, Okseniuk, et al. - 1983
1 SRI International: Description of the TACITUS system as used for MUC-3 – Hobbs - 1991