Results 1 -
8 of
8
Overfitting Avoidance as Bias
, 1992
"... Strategies for increasing predictive accuracy through selective pruning have been widely adopted by researchers in decision tree induction. It is easy to get the impression from research reports that there are statistical reasons for believing that these overfitting avoidance strategies do increase ..."
Abstract
-
Cited by 116 (2 self)
- Add to MetaCart
Strategies for increasing predictive accuracy through selective pruning have been widely adopted by researchers in decision tree induction. It is easy to get the impression from research reports that there are statistical reasons for believing that these overfitting avoidance strategies do increase accuracy and that, as a research community, we are making progress toward developing powerful, general methods for guarding against overfitting in inducing decision trees. In fact, any overfitting avoidance strategy amounts to a form of bias and, as such, may degrade performance instead of improving it. If pruning methods have often proven successful in empirical tests, this is due, not to the methods, but to the choice of test problems. As examples in this article illustrate, overfitting avoidance strategies are not better or worse, but only more or less appropriate to specific application domains. We are not---and cannot be---making progress toward methods both powerful and general. The ...
The role of Occam’s Razor in knowledge discovery
- Data Mining and Knowledge Discovery
, 1999
"... Abstract. Many KDD systems incorporate an implicit or explicit preference for simpler models, but this use of “Occam’s razor ” has been strongly criticized by several authors (e.g., Schaffer, 1993; Webb, 1996). This controversy arises partly because Occam’s razor has been interpreted in two quite di ..."
Abstract
-
Cited by 70 (1 self)
- Add to MetaCart
Abstract. Many KDD systems incorporate an implicit or explicit preference for simpler models, but this use of “Occam’s razor ” has been strongly criticized by several authors (e.g., Schaffer, 1993; Webb, 1996). This controversy arises partly because Occam’s razor has been interpreted in two quite different ways. The first interpretation (simplicity is a goal in itself) is essentially correct, but is at heart a preference for more comprehensible models. The second interpretation (simplicity leads to greater accuracy) is much more problematic. A critical review of the theoretical arguments for and against it shows that it is unfounded as a universal principle, and demonstrably false. A review of empirical evidence shows that it also fails as a practical heuristic. This article argues that its continued use in KDD risks causing significant opportunities to be missed, and should therefore be restricted to the comparatively few applications where it is appropriate. The article proposes and reviews the use of domain constraints as an alternative for avoiding overfitting, and examines possible methods for handling the accuracy–comprehensibility trade-off.
Multiple Comparisons in Induction Algorithms
- Machine Learning
, 1998
"... Keywords Running Head multiple comparison procedure Multiple Comparisons in Induction Algorithms David Jensen and Paul R. Cohen Experimental Knowledge Systems Laboratory Department of Computer Science Box 34610 LGRC University of Massachusetts Amherst, MA 01003-4610 413-545-3613 A single ..."
Abstract
-
Cited by 67 (9 self)
- Add to MetaCart
Keywords Running Head multiple comparison procedure Multiple Comparisons in Induction Algorithms David Jensen and Paul R. Cohen Experimental Knowledge Systems Laboratory Department of Computer Science Box 34610 LGRC University of Massachusetts Amherst, MA 01003-4610 413-545-3613 A single mechanism is responsible for three pathologies of induction algorithms: attribute selection errors, overfitting, and oversearching. In each pathology, induction algorithms compare multiple items based on scores from an evaluation function and select the item with the maximum score. We call this a ( ). We analyze the statistical properties of and show how failure to adjust for these properties leads to the pathologies. We also discuss approaches that can control pathological behavior, including Bonferroni adjustment, randomization testing, and cross-validation. Inductive learning, overfitting, oversearching, attribute selection, hypothesis testing, parameter estimation Multiple Com...
Large Datasets Lead to Overly Complex Models: An Explanation and a Solution
, 1998
"... This paper explores unexpected results that lie at the intersection of two common themes in the KDD community: large datasets and the goal of building compact models. Experiments with many different datasets and several model construction algorithms (including tree learning algorithms suchasc4. ..."
Abstract
-
Cited by 40 (4 self)
- Add to MetaCart
This paper explores unexpected results that lie at the intersection of two common themes in the KDD community: large datasets and the goal of building compact models. Experiments with many different datasets and several model construction algorithms (including tree learning algorithms suchasc4.5 with three different pruning methods, and rule learning algorithms such as c4.5rules and ripper) show that increasing the amount of data used to build a model often results in a linear increase in model size, even when that additional complexity results in no significantincrease in model accuracy. Despite the promise of better parameter estimation held out by large datasets, as a practical matter, models built with large amounts of data are often needlessly complex and cumbersome. In the case of decision trees, the cause of this pathology is identified as a bias inherentinseveral common pruning techniques. Pruning errors made low in the tree, where there is insufficient data to make accurate parameter estimates, are propagated and magnified higher in the tree, working against the accurate parameter estimates that are made possible there by abundant data. We propose a general solution to this problem based on a statistical technique known as randomization testing, and empirically evaluate its utility.
Overfitting Explained
, 1997
"... Overfitting arises when model components are evaluated against the wrong reference distribution. Most modeling algorithms iteratively find the best of several components and then test whether this component is good enough to add to the model. We show that for independently distributed random variabl ..."
Abstract
-
Cited by 16 (1 self)
- Add to MetaCart
Overfitting arises when model components are evaluated against the wrong reference distribution. Most modeling algorithms iteratively find the best of several components and then test whether this component is good enough to add to the model. We show that for independently distributed random variables, the reference distribution for any one variable underestimates the reference distribution for the the highest-valued variable # thus variate values will appear significant when they are not, and model components will be added when they should not be added. We relate this problem to the well-known statistical theory of multiple comparisons or simultaneous inference.
Data snooping, dredging and fishing: The dark side of data mining a SIGKDD99 panel report
- SIGKDD Explorations
, 2000
"... This article briefly describes a panel discussion at SIGKDD99. ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
This article briefly describes a panel discussion at SIGKDD99.
A Stratified Methodology for Classifier and Recognizer Evaluation
"... In this companion paper, we formally introduce STRAT, a stratification centric methodology for the empirical evaluation of classification systems. The motivating criteria for STRAT's development are discussed, as well as the potential consequences of departing from some common statistical assumption ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
In this companion paper, we formally introduce STRAT, a stratification centric methodology for the empirical evaluation of classification systems. The motivating criteria for STRAT's development are discussed, as well as the potential consequences of departing from some common statistical assumptions made when applying more traditional methods. STRAT uses an established replicate statistical technique called balanced repeated replication, or BRR, that does not require the i.i.d. assumption needed for bootstrapping, jackknifing, or binomial techniques.
Permutation Tests for Studying Classifier Performance
"... Abstract—We explore the framework of permutation-based p-values for assessing the behavior of the classification error. In this paper we study two simple permutation tests. The first test estimates the null distribution by permuting the labels in the data; this has been used extensively in classific ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
Abstract—We explore the framework of permutation-based p-values for assessing the behavior of the classification error. In this paper we study two simple permutation tests. The first test estimates the null distribution by permuting the labels in the data; this has been used extensively in classification problems in computational biology. The second test produces permutations of the features within classes, inspired by restricted randomization techniques traditionally used in statistics. We study the properties of these tests and present an extensive empirical evaluation on real and synthetic data. Our analysis shows that studying the classification error via permutation tests is effective; in particular, the restricted permutation test clearly reveals whether the classifier exploits the interdependency between the features in the data. Keywords-classification, labeled data, permutation tests, restricted randomization, significance testing I.

