## Algorithmic Stability and Sanity-Check Bounds for Leave-One-Out Cross-Validation (1997)

Venue: Neural Computation

Citations: 106 (0 self)

### BibTeX

@ARTICLE{Kearns97algorithmicstability,

author = {Michael Kearns and Dana Ron},

title = {Algorithmic Stability and Sanity-Check Bounds for Leave-One-Out Cross-Validation},

journal = {Neural Computation},

year = {1997},

volume = {11},

pages = {152--162}

}

### Abstract

In this paper we prove sanity-check bounds for the error of the leave-one-out cross-validation estimate of the generalization error: that is, bounds showing that the worst-case error of this estimate is not much worse than that of the training error estimate. The name sanity-check refers to the fact that although we often expect the leave-one-out estimate to perform considerably better than the training error estimate, we are here only seeking assurance that its performance will not be considerably worse. Perhaps surprisingly, such assurance has been given only for limited cases in the prior literature on cross-validation. Any nontrivial bound on the error of leave-one-out must rely on some notion of algorithmic stability. Previous bounds relied on the rather strong notion of hypothesis stability, whose application was primarily limited to nearest-neighbor and other local algorithms. Here we introduce the new and weaker notion of error stability, and apply it to obtain sanity-check b...
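
The two estimates compared in the abstract can be illustrated with a minimal sketch. It assumes binary classification on the real line and uses a 1-nearest-neighbor learner purely for illustration; the names `training_error`, `leave_one_out_error`, and `nearest_neighbor`, and the toy sample, are hypothetical and do not come from the paper.

```python
from typing import Callable, Sequence, Tuple

Example = Tuple[float, int]            # (input x, binary label y)
Hypothesis = Callable[[float], int]
Learner = Callable[[Sequence[Example]], Hypothesis]


def training_error(learn: Learner, sample: Sequence[Example]) -> float:
    """Fraction of the sample misclassified by the hypothesis trained on all of it."""
    h = learn(sample)
    return sum(h(x) != y for x, y in sample) / len(sample)


def leave_one_out_error(learn: Learner, sample: Sequence[Example]) -> float:
    """Run the learner m times, each time removing one example and testing on it."""
    mistakes = 0
    for i, (x, y) in enumerate(sample):
        remaining = list(sample[:i]) + list(sample[i + 1:])
        h = learn(remaining)
        mistakes += int(h(x) != y)
    return mistakes / len(sample)


def nearest_neighbor(sample: Sequence[Example]) -> Hypothesis:
    """Illustrative learner (an assumption of this sketch): 1-nearest-neighbor."""
    points = list(sample)

    def h(x: float) -> int:
        # Predict the label of the closest stored point.
        return min(points, key=lambda p: abs(p[0] - x))[1]

    return h


# Toy sample, chosen so that no distance ties occur.
sample = [(0.0, 0), (1.0, 0), (2.2, 1), (3.0, 0), (4.1, 1), (5.0, 1)]
train_err = training_error(nearest_neighbor, sample)
loo_err = leave_one_out_error(nearest_neighbor, sample)
print(train_err, loo_err)
```

On this sample the training error is zero, because every point is its own nearest neighbor, while the leave-one-out estimate counts two mistakes out of six. The sketch only shows how the two quantities are computed; it says nothing about the bounds proved in the paper.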

### Citations

4119 | Pattern Classification and Scene Analysis - Duda, Hart - 1973 |
Citation Context: ...r functions and squared error make the sanity-check bound given in Theorem 4.7 of particular interest: • There exist polynomial-time algorithms for performing minimization of squared training error [DH73] by linear functions. These algorithms do not necessarily obey the constraint $\|w\| \le B$, but we suspect this is not an obstacle to the validity of Theorem 4.7 in most practical settings. • There is an...

3990 | Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images - Geman, Geman - 1984 |

1040 | A Probabilistic Theory of Pattern Recognition - Devroye, Gyorfi, et al. - 1996 |
Citation Context: ...risingly few previous results providing bounds on the accuracy of the various estimates [RW78, DW79a, DW79b, Vap82, Hol96b, Hol96a, KMN+95, Kea96] (see the recent book of Devroye, Gyorfi and Lugosi [DGL96] for an excellent introduction and survey of the topic). Perhaps the most general results are those given for the (classification) training error estimate by Vapnik [Vap82], who proved that for any ta...

830 | A study of the cross-validation and bootstrap for accuracy estimation and model selection, p. 1137–1143 - Kohavi - 1995 |

829 | Estimation of Dependences Based on Empirical Data - Vapnik - 1982 |
Citation Context: ... Devroye, Gyorfi and Lugosi [DGL96] for an excellent introduction and survey of the topic). Perhaps the most general results are those given for the (classification) training error estimate by Vapnik [Vap82], who proved that for any target function and input distribution, and for any learning algorithm that chooses its hypotheses from a class of VC dimension d, the training error estimate is at most Õ(...

383 | Decision theoretic generalizations of the PAC model for neural nets and other learning applications - Haussler - 1992 |
Citation Context: ...uted directly from the data much more quickly. More generally, many of the results given in this paper can be generalized to other loss functions via the proper generalizations of uniform convergence [Hau92]. 4.4 Other Algorithms We now comment briefly on the application of Theorem 4.1 to algorithms other than error minimization and Bayesian procedures. As we have already noted, the only barrier to apply...

258 | Subset Selection in Regression - Miller - 1990 |
Citation Context: ...dity of Theorem 4.7 in most practical settings. • There is an efficient procedure for computing the leave-one-out estimate for training error minimization of the squared error over linear functions [Mil90]. Thus, it is not necessary to run the error minimization procedure m times; there is a closed-form solution for the leave-one-out estimate that can be computed directly from the data much more quickl...

201 | Toward efficient agnostic learning - Kearns, Schapire, et al. - 1994 |
Citation Context: ...two specific algorithms in the realizable case (that is, when the target function is actually contained in the class of hypothesis functions). However, in the more realistic unrealizable (or agnostic [KSS94]) case, the notion of hypothesis stability may simply be too strong to be obeyed by many natural learning algorithms. For example, if there are many local minima of the true error, an algorithm that m...

110 | An experimental and theoretical comparison of model selection methods - Kearns, Mansour, et al. - 1997 |
Citation Context: ...sffl that is close to the true (generalization) error ε of the hypothesis function chosen by A. There are surprisingly few previous results providing bounds on the accuracy of the various estimates [15, 2, 3, 17, 9, 8, 12, 10] (see the recent book of Devroye, Gyorfi and Lugosi [1] for an excellent introduction and survey of the topic). Perhaps the most general results are those given for the (classification) training error...

60 | Distribution-free performance bounds for potential function rules - Devroye, Wagner - 1979 |
Citation Context: ...rger error than the full-sample hypothesis, it seems hard to expect the leave-one-out estimate to be accurate. The precise nature of the required form of stability is less obvious. Devroye and Wagner [DW79b] first identified a rather strong notion of algorithmic stability that we shall refer to as hypothesis stability, and showed that bounds on hypothesis stability directly lead to bounds on the error of...

53 | Rigorous learning curve bounds from statistical mechanics, preprint - Haussler, Kearns, et al. - 1994 |

46 | Statistical Mechanics of Learning from Examples - Seung, Sompolinsky, et al. - 1993 |

26 | A finite sample distribution-free performance bound for local discrimination rules - Rogers, Wagner - 1978 |
Citation Context: ...ize of the training sample. On the other hand, among the strongest bounds (in the sense of the quality of the estimate) are those given for the leave-one-out estimate by the work of Rogers and Wagner [RW78], Devroye and Wagner [DW79a, DW79b], and Vapnik [Vap82]. The (classification error) leave-one-out estimate is computed by running the learning algorithm m times, each time removing one of the m traini...

15 | Distribution-free inequalities for the deleted and holdout error estimates - Devroye, Wagner - 1979 |

5 | PAC-like upper bounds for the sample complexity of leave-one-out cross validation - Holden - 1996 |
Citation Context: ...ence with respect to the input distribution. For algorithms drawing hypotheses from a class of fixed VC dimension, the first sanity-check bounds for the leave-one-out estimate were provided by Holden [Hol96b] for two specific algorithms in the realizable case (that is, when the target function is actually contained in the class of hypothesis functions). However, in the more realistic unrealizable (or agno...

3 | Cross-validation and the PAC learning model. Research Note RN/96/64 - Holden - 1996 |
Citation Context: ... probability at least $1 - \delta'$, $\mathrm{dist}(A(S_m), A(S_{m-1})) = O\left(\frac{d \log(m/d) + \log(1/\delta')}{m}\right)$. (15) The theorem follows from Theorem 3.1, where $\delta'$ is set to $d/m$. (Theorem 3.2) [Footnote 4] Holden [Hol96a] has recently obtained sanity-check bounds, again for the realizable setting, for other cross-validation estimates. We should note immediately that the bound of Theorem 3.2 has a dependence on $\sqrt{1/\delta}$...

3 | A bound on the error of cross validation with consequences for the training-test split - Kearns - 1996 |
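
The citation context for [Mil90] above notes that, for squared-error minimization over linear functions, the leave-one-out estimate has a closed form and does not require m separate fits. The sketch below illustrates this with the standard leverage identity for ordinary least squares (held-out residual = full-sample residual divided by one minus the leverage). It is not necessarily the procedure described in [Mil90], and it assumes a single input feature plus an intercept, with no norm constraint on the weights. All function names and the toy data are hypothetical.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def loo_brute_force(xs: List[float], ys: List[float]) -> float:
    """Mean squared leave-one-out error, refitting the line m times."""
    total = 0.0
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - (a + b * xs[i])) ** 2
    return total / len(xs)


def loo_closed_form(xs: List[float], ys: List[float]) -> float:
    """Same quantity from a single fit, using the leverage of each point."""
    n = len(xs)
    a, b = fit_line(xs, ys)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    total = 0.0
    for x, y in zip(xs, ys):
        leverage = 1.0 / n + (x - x_bar) ** 2 / sxx   # diagonal of the hat matrix
        total += ((y - (a + b * x)) / (1.0 - leverage)) ** 2
    return total / n


# Toy data (an assumption of this sketch).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 0.9, 2.3, 2.8, 4.2]
brute = loo_brute_force(xs, ys)
closed = loo_closed_form(xs, ys)
print(brute, closed)
```

The two functions should agree up to floating-point rounding, with the closed form needing one fit instead of m. The identity requires every leverage to be strictly less than 1, which holds here because each held-out fit still has at least two distinct inputs.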