Overfitting Avoidance as Bias
, 1992
Cited by 126 (2 self)
Strategies for increasing predictive accuracy through selective pruning have been widely adopted by researchers in decision tree induction. It is easy to get the impression from research reports that there are statistical reasons for believing that these overfitting avoidance strategies do increase accuracy and that, as a research community, we are making progress toward developing powerful, general methods for guarding against overfitting in inducing decision trees. In fact, any overfitting avoidance strategy amounts to a form of bias and, as such, may degrade performance instead of improving it. If pruning methods have often proven successful in empirical tests, this is due, not to the methods, but to the choice of test problems. As examples in this article illustrate, overfitting avoidance strategies are not better or worse, but only more or less appropriate to specific application domains. We are not, and cannot be, making progress toward methods both powerful and general. ...
The role of Occam’s Razor in knowledge discovery
 Data Mining and Knowledge Discovery
, 1999
Cited by 86 (3 self)
Many KDD systems incorporate an implicit or explicit preference for simpler models, but this use of "Occam's razor" has been strongly criticized by several authors (e.g., Schaffer, 1993; Webb, 1996). This controversy arises partly because Occam's razor has been interpreted in two quite different ways. The first interpretation (simplicity is a goal in itself) is essentially correct, but is at heart a preference for more comprehensible models. The second interpretation (simplicity leads to greater accuracy) is much more problematic. A critical review of the theoretical arguments for and against it shows that it is unfounded as a universal principle, and demonstrably false. A review of empirical evidence shows that it also fails as a practical heuristic. This article argues that its continued use in KDD risks causing significant opportunities to be missed, and should therefore be restricted to the comparatively few applications where it is appropriate. The article proposes and reviews the use of domain constraints as an alternative for avoiding overfitting, and examines possible methods for handling the accuracy–comprehensibility tradeoff.
Multiple Comparisons in Induction Algorithms
 Machine Learning
, 1998
Cited by 82 (10 self)
David Jensen and Paul R. Cohen, Experimental Knowledge Systems Laboratory, Department of Computer Science, University of Massachusetts, Amherst, MA 01003-4610. A single mechanism is responsible for three pathologies of induction algorithms: attribute selection errors, overfitting, and oversearching. In each pathology, induction algorithms compare multiple items based on scores from an evaluation function and select the item with the maximum score. We call this a multiple comparison procedure. We analyze the statistical properties of such procedures and show how failure to adjust for these properties leads to the pathologies. We also discuss approaches that can control pathological behavior, including Bonferroni adjustment, randomization testing, and cross-validation. Keywords: inductive learning, overfitting, oversearching, attribute selection, hypothesis testing, parameter estimation.
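The mechanism this abstract describes (picking the maximum of several noisy scores and then judging it as if it were a single score) is easy to demonstrate empirically. The following sketch is illustrative only, not taken from the paper: it draws candidate "attribute scores" from a pure-noise null and shows how often the best of 20 clears a cutoff calibrated for one score at the 5% level.

```python
import random

random.seed(0)

TRIALS = 10_000
K = 20             # number of candidate attributes compared (arbitrary choice)
THRESHOLD = 1.645  # one-sided 5% cutoff for a SINGLE standard normal score

# How often does one noise score exceed the cutoff? (should be ~5%)
single_hits = sum(random.gauss(0, 1) > THRESHOLD for _ in range(TRIALS))

# How often does the BEST of K noise scores exceed the same cutoff?
max_hits = sum(
    max(random.gauss(0, 1) for _ in range(K)) > THRESHOLD
    for _ in range(TRIALS)
)

print(f"single score exceeds cutoff: {single_hits / TRIALS:.3f}")
print(f"best of {K} exceeds cutoff:  {max_hits / TRIALS:.3f}")
```

Even though every score is pure noise, the maximum of 20 clears the 5% cutoff roughly 64% of the time, which is exactly why an unadjusted induction algorithm keeps adding structure that reflects nothing real.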
Large Datasets Lead to Overly Complex Models: An Explanation and a Solution
, 1998
Cited by 46 (4 self)
This paper explores unexpected results that lie at the intersection of two common themes in the KDD community: large datasets and the goal of building compact models. Experiments with many different datasets and several model construction algorithms (including tree learning algorithms such as C4.5 with three different pruning methods, and rule learning algorithms such as C4.5rules and RIPPER) show that increasing the amount of data used to build a model often results in a linear increase in model size, even when that additional complexity results in no significant increase in model accuracy. Despite the promise of better parameter estimation held out by large datasets, as a practical matter, models built with large amounts of data are often needlessly complex and cumbersome. In the case of decision trees, the cause of this pathology is identified as a bias inherent in several common pruning techniques. Pruning errors made low in the tree, where there is insufficient data to make accurate parameter estimates, are propagated and magnified higher in the tree, working against the accurate parameter estimates that are made possible there by abundant data. We propose a general solution to this problem based on a statistical technique known as randomization testing, and empirically evaluate its utility.
Overfitting Explained
, 1997
"... Overfitting arises when model components are evaluated against the wrong reference distribution. Most modeling algorithms iteratively find the best of several components and then test whether this component is good enough to add to the model. We show that for independently distributed random variabl ..."
Abstract

Cited by 24 (2 self)
 Add to MetaCart
(Show Context)
Overfitting arises when model components are evaluated against the wrong reference distribution. Most modeling algorithms iteratively find the best of several components and then test whether this component is good enough to add to the model. We show that for independently distributed random variables, the reference distribution for any one variable underestimates the reference distribution for the highest-valued variable; thus variate values will appear significant when they are not, and model components will be added when they should not be added. We relate this problem to the well-known statistical theory of multiple comparisons or simultaneous inference.
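The gap between the two reference distributions mentioned above has a simple closed form for independent variates: if a single variate exceeds a cutoff with probability p, the maximum of n independent variates exceeds it with probability 1 - (1 - p)^n. A few values make the underestimate concrete:

```python
# Probability that the MAXIMUM of n independent variates exceeds a cutoff
# that a single variate exceeds with probability p_single = 0.05.
p_single = 0.05
for n in (1, 5, 10, 50):
    p_max = 1 - (1 - p_single) ** n
    print(f"n = {n:2d}: P(max exceeds cutoff) = {p_max:.3f}")
# n =  1: 0.050
# n =  5: 0.226
# n = 10: 0.401
# n = 50: 0.923
```

Judging the winner of a 50-way comparison by the single-variate distribution therefore mistakes a 92% event for a 5% one, which is the inflation the abstract attributes to using the wrong reference distribution.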
Adjusting for multiple comparisons in decision tree pruning
 Proc. 3rd Int. Conf. on Knowledge Discovery & Data Mining (KDD-97)
, 1997
Cited by 17 (4 self)
Pruning is a common technique to avoid overfitting in decision trees. Most pruning techniques do not account for one important factor: multiple comparisons. Multiple comparisons occur when an induction algorithm examines several candidate models and selects the one that best accords with the data. Making multiple comparisons produces incorrect inferences about model accuracy. We examine a method that adjusts for multiple comparisons when pruning decision trees: Bonferroni pruning. In experiments with artificial and realistic datasets, Bonferroni pruning produces smaller trees that are at least as accurate as trees pruned using other common approaches.
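The core of a Bonferroni adjustment is dividing the significance level by the number of comparisons made. A minimal sketch of the decision rule (the names `keep_split`, `alpha`, and the numbers are illustrative assumptions, not the paper's implementation):

```python
def keep_split(p_value: float, n_candidates: int, alpha: float = 0.05) -> bool:
    """Accept a candidate split only if it remains significant after a
    Bonferroni adjustment for the n_candidates splits that were compared."""
    return p_value < alpha / n_candidates

# A p-value of 0.01 looks significant when only one split was examined...
print(keep_split(0.01, n_candidates=1))   # True:  0.01 <  0.05
# ...but not when it is the best of 20 candidates.
print(keep_split(0.01, n_candidates=20))  # False: 0.01 >= 0.05 / 20 = 0.0025
```

Because the adjusted threshold shrinks as more candidates are examined, the rule keeps fewer chance splits, which is consistent with the smaller trees the abstract reports.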
Data snooping, dredging and fishing: The dark side of data mining (a SIGKDD-99 panel report)
 SIGKDD Explorations
, 2000
Cited by 7 (0 self)
This article briefly describes a panel discussion at SIGKDD99.
Adjusting for multiple testing in decision tree pruning
 Proc. Sixth International Workshop on Artificial Intelligence and Statistics (pp. 295-302). Fort Lauderdale, FL: Society for Artificial Intelligence and Statistics
, 1997
Cited by 6 (3 self)
Overfitting is a widely observed pathology of induction algorithms. Overfitted models contain unnecessary structure that reflects nothing more than chance variations in the particular data sample used to construct the model. Portions of these models are literally wrong, and can mislead users. Overfitted models require more storage space and take longer to execute than their correctly-sized ...
Permutation Tests for Studying Classifier Performance
Cited by 5 (0 self)
We explore the framework of permutation-based p-values for assessing the behavior of the classification error. In this paper we study two simple permutation tests. The first test estimates the null distribution by permuting the labels in the data; this has been used extensively in classification problems in computational biology. The second test produces permutations of the features within classes, inspired by restricted randomization techniques traditionally used in statistics. We study the properties of these tests and present an extensive empirical evaluation on real and synthetic data. Our analysis shows that studying the classification error via permutation tests is effective; in particular, the restricted permutation test clearly reveals whether the classifier exploits the interdependency between the features in the data. Keywords: classification, labeled data, permutation tests, restricted randomization, significance testing.
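The first (label-permutation) test described above can be sketched in a few lines. Everything here is a hypothetical stand-in, not the paper's experimental setup: a 1-nearest-neighbour classifier on synthetic 1-D data, with leave-one-out error as the statistic.

```python
import random

random.seed(1)

def nearest_neighbor_error(xs, ys):
    """Leave-one-out error of a 1-nearest-neighbour classifier on 1-D data."""
    errors = 0
    for i, x in enumerate(xs):
        # Closest other point predicts the label of point i.
        j = min((k for k in range(len(xs)) if k != i),
                key=lambda k: abs(xs[k] - x))
        errors += ys[j] != ys[i]
    return errors / len(xs)

# Synthetic two-class data with a real signal: class 1 is shifted upward.
xs = [random.gauss(0, 1) for _ in range(30)] + \
     [random.gauss(2, 1) for _ in range(30)]
ys = [0] * 30 + [1] * 30

observed = nearest_neighbor_error(xs, ys)

# Null distribution: permute the labels, destroying any feature-label link
# while keeping the class proportions fixed.
perm_errors = []
for _ in range(200):
    shuffled = ys[:]
    random.shuffle(shuffled)
    perm_errors.append(nearest_neighbor_error(xs, shuffled))

# p-value: fraction of permutations scoring at least as well as the real labels
# (with the customary +1 so the p-value is never exactly zero).
p_value = (1 + sum(e <= observed for e in perm_errors)) / (1 + len(perm_errors))
print(f"observed error {observed:.2f}, permutation p-value {p_value:.3f}")
```

A small p-value here says the classifier's error is unlikely under label-permuted data, i.e., the classifier is exploiting genuine structure rather than noise; the paper's second (within-class feature permutation) test follows the same pattern with a different permutation scheme.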
A Stratified Methodology for Classifier and Recognizer Evaluation
Cited by 1 (0 self)
In this companion paper, we formally introduce STRAT, a stratification-centric methodology for the empirical evaluation of classification systems. The motivating criteria for STRAT's development are discussed, as well as the potential consequences of departing from some common statistical assumptions made when applying more traditional methods. STRAT uses an established replication-based statistical technique called balanced repeated replication, or BRR, that does not require the i.i.d. assumption needed for bootstrapping, jackknifing, or binomial techniques.