## A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection (1995)

Venue: International Joint Conference on Artificial Intelligence

Citations: 752 (12 self)

### BibTeX

@INPROCEEDINGS{Kohavi95astudy,

author = {Ron Kohavi},

title = {A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection},

booktitle = {INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE},

year = {1995},

pages = {1137--1143},

}

We review accuracy estimation methods and compare the two most common methods: cross-validation and bootstrap. Recent experimental results on artificial data and theoretical results in restricted settings have shown that for selecting a good classifier from a set of classifiers (model selection), ten-fold cross-validation may be better than the more expensive leave-one-out cross-validation. We report on a large-scale experiment (over half a million runs of C4.5 and a Naive-Bayes algorithm) to estimate the effects of different parameters on these algorithms on real-world datasets. For cross-validation, we vary the number of folds and whether the folds are stratified or not; for bootstrap, we vary the number of bootstrap samples. Our results indicate that for real-world datasets similar to ours, the best method to use for model selection is ten-fold stratified cross-validation, even if computation power allows using more folds.
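The two estimators compared in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the paper's experimental code. The function names, the toy dataset, and the majority-class learner (a stand-in for C4.5 or Naive-Bayes) are all assumptions made for illustration. The bootstrap variant shown scores each model on the instances left out of its bootstrap sample; the abstract does not say which bootstrap estimator the paper uses, so that choice is also an assumption.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k, rng):
    """Split indices into k folds with roughly equal class proportions."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    pos = 0  # running position, so fold sizes differ by at most one
    for y in sorted(by_class):
        idx = by_class[y]
        rng.shuffle(idx)
        for i in idx:
            folds[pos % k].append(i)
            pos += 1
    return folds


def majority_classifier(train_x, train_y):
    """Hypothetical stand-in learner: always predicts the most common label."""
    most_common = Counter(train_y).most_common(1)[0][0]
    return lambda x: most_common


def cv_accuracy(xs, ys, k, train_fn, rng):
    """Stratified k-fold estimate: each instance is tested exactly once."""
    correct = 0
    for test_idx in stratified_folds(ys, k, rng):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(ys)) if i not in held_out]
        model = train_fn([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        correct += sum(model(xs[i]) == ys[i] for i in test_idx)
    return correct / len(ys)


def bootstrap_accuracy(xs, ys, b, train_fn, rng):
    """Train on b bootstrap samples; score each on its left-out instances."""
    n = len(ys)
    scores = []
    for _ in range(b):
        sample = [rng.randrange(n) for _ in range(n)]  # draw n with replacement
        in_bag = set(sample)
        left_out = [i for i in range(n) if i not in in_bag]
        if not left_out:
            continue  # nothing to test on for this sample
        model = train_fn([xs[i] for i in sample], [ys[i] for i in sample])
        scores.append(sum(model(xs[i]) == ys[i] for i in left_out) / len(left_out))
    return sum(scores) / len(scores) if scores else float("nan")


# Toy data (an assumption): 70 instances of class 0 and 30 of class 1.
toy_x = list(range(100))
toy_y = [0] * 70 + [1] * 30
rng = random.Random(0)
print("10-fold stratified CV:", cv_accuracy(toy_x, toy_y, 10, majority_classifier, rng))
print("bootstrap, 50 samples:", bootstrap_accuracy(toy_x, toy_y, 50, majority_classifier, rng))
```

On this toy data every stratified fold holds seven instances of class 0 and three of class 1, so the cross-validation estimate for the majority-class learner equals the majority share, 0.7. Without stratification the class mix would vary from fold to fold, which is one of the parameters the study varies.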

