The Effect Of Small Disjuncts And Class Distribution On Decision Tree Learning (2003)
Venue: Rutgers University
Citations: 3 (0 self)
BibTeX
@MISC{Weiss03theeffect,
author = {Gary Mitchell Weiss},
title = {The Effect Of Small Disjuncts And Class Distribution On Decision Tree Learning},
year = {2003}
}
Abstract
The main goal of classifier learning is to generate a model that makes few misclassification errors. Given this emphasis on error minimization, it makes sense to try to understand how the induction process gives rise to classifiers that make errors and whether we can identify those parts of the classifier that generate most of the errors. In this thesis we provide the first comprehensive studies of two major sources of classification errors. The first study concerns small disjuncts, which are those disjuncts within a classifier that cover only a few training examples. An analysis of classifiers induced from thirty data sets shows that these small disjuncts are extremely error prone and often account for the majority of all classification errors. Because small disjuncts largely determine classifier performance, we use them as a "lens" through which to study classifier induction. Factors such as pruning, training-set size, noise and class imbalance are each analyzed to determine how they affect small disjuncts and, more generally, classifier learning. The second
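The abstract defines small disjuncts as disjuncts that cover only a few training examples, and reports that they often account for most classification errors. The following Python sketch shows one simple way to measure that share. It assumes each disjunct (for example, a decision tree leaf) has already been summarized by how many training examples it covers and how many test errors it makes. The `Disjunct` record, the `small_disjunct_error_share` function, the `max_size` cutoff, and all the numbers are hypothetical. They illustrate the idea and are not the thesis's own metric, threshold, or results.

```python
from dataclasses import dataclass


@dataclass
class Disjunct:
    """Hypothetical summary of one disjunct (e.g. a decision tree leaf)."""
    size: int    # number of training examples the disjunct covers
    errors: int  # number of test examples it misclassifies


def small_disjunct_error_share(disjuncts, max_size):
    """Return the fraction of all test errors made by disjuncts whose
    training coverage is at most max_size (an assumed cutoff for 'small')."""
    total_errors = sum(d.errors for d in disjuncts)
    if total_errors == 0:
        return 0.0  # no errors at all, so no share to attribute
    small_errors = sum(d.errors for d in disjuncts if d.size <= max_size)
    return small_errors / total_errors


# Hypothetical leaves for illustration only, not data from the thesis:
# two large, fairly accurate disjuncts and three small, error-prone ones.
leaves = [
    Disjunct(size=200, errors=3),
    Disjunct(size=120, errors=2),
    Disjunct(size=4, errors=5),
    Disjunct(size=2, errors=4),
    Disjunct(size=1, errors=3),
]
print(round(small_disjunct_error_share(leaves, max_size=5), 3))  # → 0.706
```

In this made-up example, the three disjuncts that cover five or fewer training examples produce 12 of the 17 errors. This is the kind of concentration the abstract describes. Changing `max_size` shows how sensitive the measured share is to the chosen definition of "small".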