## A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection (1995)

### Download Links

- [www.cs.colorado.edu]
- [robotics.stanford.edu]
- [ai.stanford.edu]
- [www.sgi.com]
- [www-europe.sgi.com]
- DBLP

### Other Repositories/Bibliography

Venue: International Joint Conference on Artificial Intelligence (IJCAI)

Citations: 752 (12 self-citations)

### BibTeX

```bibtex
@INPROCEEDINGS{Kohavi95astudy,
  author    = {Ron Kohavi},
  title     = {A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {1995},
  pages     = {1137--1143}
}
```


### Abstract

We review accuracy estimation methods and compare the two most common methods: cross-validation and bootstrap. Recent experimental results on artificial data and theoretical results in restricted settings have shown that for selecting a good classifier from a set of classifiers (model selection), ten-fold cross-validation may be better than the more expensive leave-one-out cross-validation. We report on a large-scale experiment -- over half a million runs of C4.5 and a Naive-Bayes algorithm -- to estimate the effects of different parameters on these algorithms on real-world datasets. For cross-validation, we vary the number of folds and whether the folds are stratified or not; for bootstrap, we vary the number of bootstrap samples. Our results indicate that for real-world datasets similar to ours, the best method to use for model selection is ten-fold stratified cross-validation, even if computation power allows using more folds.
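The procedure the abstract recommends, stratified ten-fold cross-validation, can be sketched in plain Python. This is an illustrative sketch, not code from the paper: the `train_fn`/`predict_fn` hooks are hypothetical placeholders standing in for any inducer (e.g. C4.5 or Naive-Bayes).

```python
import random
from collections import defaultdict

def stratified_k_folds(labels, k=10, seed=0):
    """Partition example indices into k folds so that each fold
    roughly preserves the class proportions of `labels`."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % k].append(i)  # deal each class round-robin across folds
    return folds

def cross_val_accuracy(train_fn, predict_fn, X, y, k=10, seed=0):
    """Average held-out accuracy over k stratified folds: train on
    k-1 folds, test on the remaining fold, repeat for every fold."""
    folds = stratified_k_folds(y, k, seed)
    accs = []
    for fold in folds:
        held = set(fold)
        X_tr = [x for i, x in enumerate(X) if i not in held]
        y_tr = [t for i, t in enumerate(y) if i not in held]
        model = train_fn(X_tr, y_tr)
        correct = sum(predict_fn(model, X[i]) == y[i] for i in fold)
        accs.append(correct / len(fold))
    return sum(accs) / len(accs)
```

Stratification matters most for small folds on imbalanced data: without it, a fold can end up with almost no examples of a minority class, inflating the variance of the estimate.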

### Citations

3909 | Classification and Regression Trees - Breiman, Friedman, et al. - 1984

2534 | An Introduction to the Bootstrap - Efron, Tibshirani - 1993

777 | C4.5: Programs for Machine Learning - Quinlan - 1993

Citation Context: ...unlabelled instance to a label using internal data structures. An inducer, or an induction algorithm, builds a classifier from a given dataset. CART and C4.5 (Breiman, Friedman, Olshen & Stone 1984, Quinlan 1993) are decision tree inducers that build decision tree classifiers. In this paper, we are not interested in the specific method for inducing classifiers, but assume access to a dataset and an inducer o...

741 | UCI Repository of Machine Learning Databases, http://www.ics.uci.edu/~mlearn/MLRepository.html - Murphy, Aha - 1992

Citation Context: ...lidation and bootstrap estimates. To choose a set of datasets, we looked at the learning curves for C4.5 and Naive-Bayes for most of the supervised classification datasets at the UC Irvine repository (Murphy & Aha 1995) that contained more than 500 instances (about 25 such datasets). We felt that a minimum of 500 instances were required for testing. While the true accuracies of a real dataset cannot be computed bec...

555 | Stacked Generalization - Wolpert - 1992

Citation Context: ...ced by supervised learning algorithms is important not only to predict its future prediction accuracy, but also for choosing a classifier from a given set (model selection), or combining classifiers (Wolpert 1992). For estimating the final accuracy of a classifier, we would like an estimation method with low bias and low variance. To choose a classifier or to combine classifiers, the absolute accuracies are l...

334 | An Analysis of Bayesian Classifiers - Langley, Iba, et al. - 1992

211 | Estimating the Error Rate of a Prediction Rule: Improvement on Cross-Validation - Efron - 1983

Citation Context: ...s beneficial, especially if the relative accuracies are more important than the exact values. For example, leave-one-out is almost unbiased, but it has high variance, leading to unreliable estimates (Efron 1983). For linear models, using leave-one-out cross-validation for model selection is asymptotically inconsistent in the sense that the probability of selecting the model with the best predictive power do...
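The leave-one-out estimator discussed in this context is the k = n special case of cross-validation: train on all but one example, test on the one held out, and average. A minimal sketch, again with hypothetical `train_fn`/`predict_fn` hooks standing in for any inducer:

```python
def loo_accuracy(train_fn, predict_fn, X, y):
    """Leave-one-out cross-validation: each model is tested on a single
    held-out example. Nearly unbiased, since each training set is only
    one example smaller than the full data, but every per-fold estimate
    rests on one test point, which drives the high variance noted above."""
    correct = 0
    for i in range(len(X)):
        X_tr = X[:i] + X[i + 1:]  # all examples except the i-th
        y_tr = y[:i] + y[i + 1:]
        model = train_fn(X_tr, y_tr)
        correct += predict_fn(model, X[i]) == y[i]
    return correct / len(X)
```

Note the cost: n training runs instead of k, which is why the abstract calls leave-one-out "more expensive" than ten-fold cross-validation.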

146 | A Conservation Law for Generalization Performance - Schaffer - 1994

Citation Context: ...assumptions made by the different estimation methods, and present concrete examples where each method fails. While it is known that no accuracy estimation can be correct all the time (Wolpert 1994b, Schaffer 1994), we are interested in identifying a method that is well suited for the biases and trends in typical real world datasets. Recent results, both theoretical and experimental, have shown that it is not...

98 | Machine Learning Library in C++ - Kohavi, Sommerfield - 1996

56 | Submodel Selection and Evaluation in Regression: The X-Random Case - Breiman, Spector - 1992

40 | The Relationship Between PAC, the Statistical Physics Framework, the Bayesian Framework, and the VC Framework (in The Mathematics of Generalization, edited by D. Wolpert) - Wolpert - 1995

Citation Context: ...ain some of the assumptions made by the different estimation methods, and present concrete examples where each method fails. While it is known that no accuracy estimation can be correct all the time (Wolpert 1994b, Schaffer 1994), we are interested in identifying a method that is well suited for the biases and trends in typical real world datasets. Recent results, both theoretical and experimental, have shown...

18 | Estimating the Accuracy of Learned Concepts - Bailey, Elkan - 1993

17 | Off-Training Set Error and A Priori Distinctions Between Learning Algorithms - Wolpert - 1994

Citation Context: ...ain some of the assumptions made by the different estimation methods, and present concrete examples where each method fails. While it is known that no accuracy estimation can be correct all the time (Wolpert 1994b, Schaffer 1994), we are interested in identifying a method that is well suited for the biases and trends in typical real world datasets. Recent results, both theoretical and experimental, have shown...

14 | Decision Tree Pruning: Biased or Optimal? - Weiss, Indurkhya - 1994

Citation Context: ...etic datasets involving Boolean concepts. They observed high variability and little bias in the leave-one-out estimates, and low variability but large bias in the .632 estimates. Weiss and Indurkhya (Weiss & Indurkhya 1994) conducted experiments on real-world data to determine the applicability of cross-validation to decision tree pruning. Their results were that for samples at least of size 200, using stratified ten-fo...
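The .632 estimate mentioned in this context blends the out-of-sample accuracy of bootstrap replicates with the optimistic resubstitution (training-set) accuracy. A rough sketch under the standard definition; the `train_fn`/`predict_fn` hooks are hypothetical placeholders, not code from the paper:

```python
import random

def bootstrap_632_accuracy(train_fn, predict_fn, X, y, n_samples=50, seed=0):
    """The .632 bootstrap accuracy estimate: each replicate trains on a
    sample drawn with replacement (so about 63.2% of distinct examples
    appear in it) and tests on the examples left out, then blends that
    out-of-sample accuracy with the resubstitution accuracy."""
    rng = random.Random(seed)
    n = len(X)
    # Resubstitution accuracy: train and test on the full data (optimistic).
    full = train_fn(X, y)
    acc_resub = sum(predict_fn(full, X[i]) == y[i] for i in range(n)) / n
    accs = []
    for _ in range(n_samples):
        sample = [rng.randrange(n) for _ in range(n)]  # draw n with replacement
        drawn = set(sample)
        out = [i for i in range(n) if i not in drawn]
        if not out:
            continue  # extremely rare: every example was drawn
        model = train_fn([X[i] for i in sample], [y[i] for i in sample])
        acc0 = sum(predict_fn(model, X[i]) == y[i] for i in out) / len(out)
        accs.append(0.632 * acc0 + 0.368 * acc_resub)
    return sum(accs) / len(accs)
```

The 0.632/0.368 weights come from the expected fraction of distinct examples in a bootstrap sample, 1 - 1/e ≈ 0.632; the resubstitution term is what introduces the large bias the context above reports.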

8 | Small Sample Error Rate Estimation for k-Nearest Neighbor Classifiers - Weiss - 1991

5 | Linear Model Selection via Cross-Validation - Shao - 1993

Citation Context: ...ally inconsistent in the sense that the probability of selecting the model with the best predictive power does not converge to one as the total number of observations approaches infinity (Zhang 1992, Shao 1993). This paper is organized as follows. Section 2 describes the common accuracy estimation methods and ways of computing confidence bounds that hold under some assumptions. Section 3 discusses related...

2 | Bootstrap Techniques for Error Estimation, IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-9(5) - Jain, Dubes, et al. - 1987