Reduced-Error Pruning With Significance Tests (1998)
| Venue: | Available: http://libra.msra.cn/paperdetail.aspx?id=305368 |
| Citations: | 4 - 0 self |
BibTeX
@INPROCEEDINGS{Frank98reduced-errorpruning,
author = {Eibe Frank and Ian H. Witten},
title = {Reduced-Error Pruning With Significance Tests},
booktitle = {Available: http://libra.msra.cn/paperdetail.aspx?id=305368},
year = {1998},
pages = {98}
}
Abstract
When building classification models, it is common practice to prune them to counter spurious effects of the training data: this often improves performance and reduces model size. "Reduced-error pruning" is a fast pruning procedure for decision trees that is known to produce small and accurate trees. Apart from the data from which the tree is grown, it uses an independent "pruning" set, and pruning decisions are based on the model's error rate on this fresh data. Recently it has been observed that reduced-error pruning overfits the pruning data, producing unnecessarily large decision trees. This paper investigates whether standard statistical significance tests can be used to counter this phenomenon. The problem of overfitting to the pruning set highlights the need for significance testing. We investigate two classes of test, "parametric" and "non-parametric." The standard chi-squared statistic can be used both in a parametric test and as the basis for a non-parametric permutation tes...
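As a rough illustration of the two kinds of test the abstract mentions, the sketch below computes the Pearson chi-squared statistic for a branch-by-class contingency table and estimates a permutation p-value by shuffling the class labels. It is not taken from the paper: the function names, the `n_permutations` and `seed` parameters, and the idea of applying the test to the instances reaching a node are assumptions made for this example only.

```python
import random
from collections import Counter


def chi_squared(branches, labels):
    """Pearson chi-squared statistic for a branch-by-class contingency table.

    branches[i] is the branch taken by instance i (hypothetical encoding),
    labels[i] is its class label.
    """
    n = len(labels)
    branch_totals = Counter(branches)
    class_totals = Counter(labels)
    observed = Counter(zip(branches, labels))
    stat = 0.0
    for b, nb in branch_totals.items():
        for c, nc in class_totals.items():
            expected = nb * nc / n  # count expected if branch and class were independent
            stat += (observed[(b, c)] - expected) ** 2 / expected
    return stat


def permutation_p_value(branches, labels, n_permutations=1000, seed=0):
    """Estimate how often shuffled labels give a statistic at least as large
    as the observed one (a Monte Carlo permutation test)."""
    rng = random.Random(seed)
    observed_stat = chi_squared(branches, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point summation order.
        if chi_squared(branches, shuffled) >= observed_stat - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_permutations + 1)


# Toy data: a split that separates the classes perfectly, and one that does not.
strong_branches = ["L"] * 10 + ["R"] * 10
strong_labels = ["a"] * 10 + ["b"] * 10
weak_branches = ["L", "L", "R", "R"] * 5
weak_labels = ["a", "b", "a", "b"] * 5

strong_stat = chi_squared(strong_branches, strong_labels)
weak_stat = chi_squared(weak_branches, weak_labels)
strong_p = permutation_p_value(strong_branches, strong_labels)
weak_p = permutation_p_value(weak_branches, weak_labels)
```

Under these assumptions, a pruning procedure could keep a subtree only when the p-value on the pruning data falls below a chosen significance level, and replace it with a leaf otherwise. Whether this matches the exact procedure evaluated in the paper cannot be determined from the truncated abstract.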