Robust Decision Trees: Removing Outliers from Databases (1995) [40 citations — 0 self]

by George H. John
In Knowledge Discovery and Data Mining

Abstract:

Finding and removing outliers is an important problem in data mining. Errors in large databases can be extremely common, so an important property of a data mining algorithm is robustness with respect to errors in the database. Most sophisticated methods in machine learning address this problem to some extent, but not fully, and can be improved by addressing the problem more directly. In this paper we examine C4.5, a decision tree algorithm that is already quite robust: few algorithms have been shown to consistently achieve higher accuracy. C4.5 incorporates a pruning scheme that partially addresses the outlier removal problem. In our Robust-C4.5 algorithm we extend the pruning method to fully remove the effect of outliers, and this results in improvement on many databases.

In U. M. Fayyad and R. Uthurusamy, editors, Proceedings of the First International Conference on Knowledge Discovery and Data Mining, pages 174-179, AAAI Press, Menlo Park, CA, 1995.
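The abstract does not spell out the procedure, so the following is only a rough sketch of one way "removing the effect of outliers" could be read: fit a model, drop the training records it misclassifies, and refit until no records are dropped. The one-split stump below is a deliberately tiny stand-in for a pruned decision tree, not C4.5, and the names `fit_stump` and `robust_fit` are hypothetical.

```python
from collections import Counter


def fit_stump(points):
    """Fit a one-split classifier on (x, label) pairs.

    Assumption: this stump is only a placeholder for a pruned tree learner.
    """
    majority = Counter(label for _, label in points).most_common(1)[0][0]
    # Baseline: no split at all, predict the majority label everywhere.
    errors = sum(label != majority for _, label in points)
    best = (errors, None, majority, majority)
    xs = sorted({x for x, _ in points})
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2  # candidate threshold between adjacent values
        left = Counter(label for x, label in points if x <= t)
        right = Counter(label for x, label in points if x > t)
        l_lab = left.most_common(1)[0][0]
        r_lab = right.most_common(1)[0][0]
        errors = len(points) - left[l_lab] - right[r_lab]
        if errors < best[0]:
            best = (errors, t, l_lab, r_lab)
    _, threshold, left_label, right_label = best

    def predict(x):
        if threshold is None or x <= threshold:
            return left_label
        return right_label

    return predict


def robust_fit(points):
    """Refit after discarding misclassified records, until none are discarded.

    Assumption: 'outlier' here means 'training record the current model
    gets wrong'; `points` must be non-empty.
    """
    data = list(points)
    while True:
        model = fit_stump(data)
        kept = [(x, y) for x, y in data if model(x) == y]
        if len(kept) == len(data):
            return model, data
        data = kept


# Two clean clusters plus two mislabeled records at x=2.5 and x=7.5.
points = [(1, "a"), (2, "a"), (2.5, "b"), (3, "a"), (4, "a"),
          (6, "b"), (7, "b"), (7.5, "a"), (8, "b"), (9, "b")]
model, kept = robust_fit(points)
```

In this toy data the split itself does not move after the two mislabeled records are dropped; the example is meant to show only the remove-and-refit loop, not the accuracy or tree-size effects reported in the paper.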

Citations

995 Robust Statistics – Huber - 1980
937 Principles of Database and knowledge-base systems, volume 2 – Ullman - 1988
788 Instance-based learning algorithms – Aha, Kibler, et al. - 1991
697 Robust Regression and Outlier Detection – Rousseeuw, Leroy - 1987
693 Generalized additive models – Hastie, Tibshirani - 1986
638 UCI repository of machine learning databases [machine-readable data repository] – Murphy, Aha - 1992
508 Neural networks and the bias/variance dilemma – Geman, Bienenstock, et al. - 1992
477 Irrelevant features and the subset selection problem – John, Kohavi, et al. - 1994
225 Improving generalization with active learning – Cohn, Atlas, et al. - 1994
197 Modern Applied Statistics with S-Plus – Venables, Ripley - 1997
37 An improved algorithm for incremental induction of decision trees – Utgoff - 1994
36 Classification and Regression Trees – Breiman, Friedman, et al. - 1993
26 Automatic capacity tuning of very large VC-dimension classifiers – Guyon, Boser, et al. - 1993
10 Outliers in Statistical Data. Third edition – Barnett, Lewis - 1994
7 Robust linear discriminant trees – John - 1996