## Tutorial on Practical Prediction Theory for Classification (2005)

### Download Links

- [www-2.cs.cmu.edu]
- [jmlr.csail.mit.edu]
- [www.jmlr.org]
- [jmlr.org]
- DBLP

Citations: 83 (3 self)

### BibTeX

```bibtex
@MISC{Langford05tutorialon,
  author = {John Langford},
  title  = {Tutorial on Practical Prediction Theory for Classification},
  year   = {2005}
}
```

### Abstract

We discuss basic prediction theory and its impact on the evaluation of classification success, its implications for learning algorithm design, and its uses during learning algorithm execution. This tutorial is meant to be a comprehensive compilation of results that are both theoretically rigorous and practically useful. There are two important implications...

### Citations

1551 | Probability inequalities for sums of bounded random variables
- Hoeffding
- 1963

Citation Context: ...$\frac{k}{m}\ln\frac{p}{k/m} + \left(1-\frac{k}{m}\right)\ln\frac{1-p}{1-k/m} = -\mathrm{KL}\!\left(\frac{k}{m}\,\middle\|\,p\right)$. Using the Chernoff bound, we can loosen the test set bound to achieve a more analytic form. The closely related Hoeffding bound (Hoeffding, 1963) makes the same statement for sums of [0,1] random variables. Corollary 3.7 (Agnostic Test Set Bound): For all $D$, for all classifiers $c$, for all $\delta \in (0,1]$: $\Pr_{S \sim D^m}\big(\mathrm{KL}\big(\hat{c}_S \,\big\|\, c_D\big) \le \dots$ |
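The Hoeffding-style loosening quoted above is simple to evaluate numerically. The sketch below (the function name and the example numbers are illustrative, not from the tutorial) computes the test set upper bound $\hat{c}_S + \sqrt{\ln(1/\delta)/(2m)}$ that Hoeffding's inequality yields for the true error rate of a fixed classifier:

```python
import math

def hoeffding_test_set_bound(errors, m, delta):
    """Upper-bound the true error rate of a fixed classifier evaluated on
    m held-out examples; holds with probability at least 1 - delta by
    Hoeffding's inequality for sums of [0, 1] random variables."""
    empirical_error = errors / m
    return empirical_error + math.sqrt(math.log(1 / delta) / (2 * m))

# 23 mistakes on 1000 test examples, at 95% confidence:
bound = hoeffding_test_set_bound(errors=23, m=1000, delta=0.05)
print(round(bound, 4))  # 0.0617
```

Note how slowly the bound tightens: the slack term shrinks only as $O(1/\sqrt{m})$, which is why the KL-form bound in the paper is preferred when the empirical error is small.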

1040 | A Probabilistic Theory of Pattern Recognition
- Devroye, Györfi, et al.
- 1996

Citation Context: ...weak general results in this model are known for some variants of cross validation (see Blum et al., 1999). For specific learning algorithms (such as nearest neighbor), stronger results are known (see Devroye et al., 1996). There are a wide range of essentially unanalyzed methods and a successful analysis seems particularly tricky although very worthwhile if completed. 3.2 Test Set Bound Implications There are some co... |

971 | On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications - Vapnik, Chervonenkis - 1971 |

762 | A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations
- Chernoff
- 1952

Citation Context: ...Lemma 3.6 (Relative Entropy Chernoff Bound): For $\frac{k}{m} < p$: $\mathrm{Bin}(m,k,p) = \Pr_{X^m \sim p^m}\!\left(\sum_{i=1}^{m} X_i \le k\right) \le e^{-m\,\mathrm{KL}^{+}\left(\frac{k}{m}\,\|\,p\right)}$. Proof (originally from (Chernoff, 1952); the proof here is based on (Seung)): For all $\lambda > 0$, we have $\Pr_{X^m \sim p^m}\!\left(\sum_{i=1}^{m} X_i \le k\right) = \Pr_{X^m \sim p^m}\!\left(e^{-m\lambda \frac{1}{m}\sum_{i=1}^{m} X_i} \ge e^{-m\lambda \frac{k}{m}}\right)$. Using Markov's inequality ($X \ge 0$, $\mathbf{E}X = \mu \Rightarrow \Pr(X \ge \delta) \le \mu/\delta$), this... |
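The relative entropy Chernoff bound can be sanity-checked against the exact binomial tail. In the sketch below the helper names are mine; `binomial_tail` follows the paper's definition of $\mathrm{Bin}(m,k,p)$ as the probability of at most $k$ heads in $m$ flips of a coin with bias $p$:

```python
import math

def kl_bernoulli(q, p):
    """KL(q || p) between Bernoulli distributions, in nats."""
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def binomial_tail(m, k, p):
    """Bin(m, k, p): probability of at most k heads in m flips of bias p."""
    return sum(math.comb(m, i) * p**i * (1 - p)**(m - i) for i in range(k + 1))

# For k/m < p the lemma guarantees Bin(m, k, p) <= exp(-m * KL(k/m || p)):
m, k, p = 100, 20, 0.4
exact = binomial_tail(m, k, p)
bound = math.exp(-m * kl_bernoulli(k / m, p))
print(exact <= bound)  # True
```

The exponent is exactly the KL divergence between the observed frequency $k/m$ and the true bias $p$, which is what makes the resulting test set bound tight at the exponential scale.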

137 | Additive Versus Exponentiated Gradient Updates for Linear Prediction - Kivinen, Warmuth - 1997 |

110 | Local Rademacher complexities - Bartlett, Bousquet, et al. - 2005 |

106 | Algorithmic stability and sanity-check bounds for leave-one-out cross-validation - Kearns, Ron - 1999 |

82 | PAC-Bayesian model averaging - McAllester - 1999 |

55 | A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations - Chernoff - 1952 |

39 | Beating the hold-out: Bounds for k-fold and progressive cross-validation - Blum, Kalai, et al. - 1999 |

35 | PAC-Bayesian generalization error bounds for Gaussian process classification - Seeger |

6 | Large scale Bayes point machines
- Herbrich, Graepel

Citation Context: ...This is an implicit equation for $\bar{Q}$ which can be easily solved numerically. The bound is stated in terms of dot products here, so naturally it is possible to kernelize the result using methods from (Herbrich and Graepel, 2001). In kernelized form, the bound applies to classifiers (as output by SVM learning algorithms) of the form: $c(x) = \mathrm{sign}\!\left(\sum_{i=1}^{m} \alpha_i k(x_i, x)\right)$ (1). Since, by assumption, $k$ is a kernel, we know that k(... |
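A toy illustration of a classifier of form (1). The RBF kernel choice, the weights, and all names below are assumptions made for the sketch, not taken from the cited work:

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian RBF kernel, one common choice of positive-definite k."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def kernel_classifier(alphas, support_points, x, kernel=rbf_kernel):
    """Classifier of form (1): c(x) = sign(sum_i alpha_i * k(x_i, x))."""
    s = sum(a * kernel(xi, x) for a, xi in zip(alphas, support_points))
    return 1 if s >= 0 else -1

# Two support points with opposite weights; the prediction follows
# whichever point the query is closer to under the RBF kernel:
alphas = [1.0, -1.0]
points = [(0.0, 0.0), (3.0, 3.0)]
print(kernel_classifier(alphas, points, (0.1, 0.1)))  # 1
print(kernel_classifier(alphas, points, (2.9, 3.1)))  # -1
```

This is exactly the shape of decision function an SVM produces, which is why the kernelized PAC-Bayes bound discussed in the context applies to SVM outputs directly.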

6 | Combining train set and test set bounds
- Langford
- 2002
Citation Context ...ions of PAC convergence (Valiant, 1984) results. 3. Shell bounds (Langford and McAllester, 2000) which take advantage of the distribution of true error rates on classifiers. 4. Train and test bounds (=-=Langford, 2002-=-) which combine train set and test set bounds. 5. (Local) Rademacher complexity (Bartlett et al., 2004) results which take advantage of the error geometry of nearby classifiers. ... and many other res... |

4 | Microchoice bounds and self-bounding learning algorithms - Langford, Blum - 1999 |

3 | The use of confidence or fiducial limits illustrated in the case of the binomial
- Clopper, Pearson
- 1934

Citation Context: ...es for the classical technique of using m fresh examples to evaluate a classifier. In a statistical setting, this can be viewed as computing a confidence interval for the binomial distribution as in (Clopper and Pearson, 1934). This section is organized into two subsections: • Subsection 3.1 presents the basic upper bound on the true error rate, handy approximations, and a lower bound. • Subsection 3.2 discusses the impli... |
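The exact one-sided confidence interval in the Clopper-Pearson style can be computed by inverting the binomial CDF. A stdlib-only sketch under my own naming, using bisection (the 23/1000 example is illustrative):

```python
import math

def binom_cdf(k, m, p):
    """P(X <= k) for X ~ Binomial(m, p)."""
    return sum(math.comb(m, i) * p**i * (1 - p)**(m - i) for i in range(k + 1))

def clopper_pearson_upper(k, m, delta):
    """Smallest p for which observing <= k errors out of m has probability
    at most delta; found by bisection, since the CDF is decreasing in p.
    With confidence 1 - delta, the true error rate is below this value."""
    lo, hi = k / m, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if binom_cdf(k, m, mid) > delta:
            lo = mid
        else:
            hi = mid
    return hi

# 95%-confidence upper bound on the true error after 23 errors in 1000 tests:
print(round(clopper_pearson_upper(23, 1000, 0.05), 4))
```

Because it inverts the exact binomial tail rather than a Hoeffding-style relaxation, this bound is noticeably tighter than the additive $\sqrt{\ln(1/\delta)/(2m)}$ form when the empirical error rate is small.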

3 | Bounds for Averaging Classifiers - Langford, Seeger - 2001 |

1 | A Theory of the Learnable - Valiant - 1984 |