Bias Plus Variance Decomposition for Zero-One Loss Functions (1996) [131 citations — 3 self]

by Ron Kohavi, David H. Wolpert

Abstract:

We present a bias-variance decomposition of expected misclassification rate, the most commonly used loss function in supervised classification learning. The bias-variance decomposition for quadratic loss functions is well known and serves as an important tool for analyzing learning algorithms, yet no decomposition was offered for the more commonly used zero-one (misclassification) loss functions until the recent work of Kong & Dietterich (1995) and Breiman (1996). Their decomposition suffers from some major shortcomings though (e.g., potentially negative variance), which our decomposition avoids. We show that, in practice, the naive frequency-based estimation of the decomposition terms is by itself biased and show how to correct for this bias. We illustrate the decomposition on various algorithms and datasets from the UCI repository.

1 Introduction

The bias plus variance decomposition (Geman, Bienenstock & Doursat 1992) is a powerful tool from sampling theory statistics for analyzing ...
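
As a rough illustration of the kind of decomposition the abstract describes, the sketch below computes noise, squared bias, and variance at a single test point under zero-one loss. The formulas are the per-point terms as commonly stated for this decomposition; they are not given in the abstract itself, so treat them as an assumption here. The function name `kw_decomposition` and its arguments are hypothetical. The hypothesis distribution is estimated with naive frequency counts, which the abstract notes is itself biased; the correction the authors describe is not reproduced.

```python
from collections import Counter


def kw_decomposition(target_dist, predictions):
    """Naive zero-one-loss decomposition at one test point x.

    target_dist: dict mapping label -> P(Y_F = label | x), the target distribution.
    predictions: labels predicted at x by models trained on different training
                 sets; their frequencies estimate P(Y_H = label | x).
    Returns (noise, bias_squared, variance, expected_loss).
    """
    n = len(predictions)
    counts = Counter(predictions)
    labels = set(target_dist) | set(counts)

    f = {y: target_dist.get(y, 0.0) for y in labels}  # target probabilities
    h = {y: counts.get(y, 0) / n for y in labels}     # hypothesis frequencies

    noise = 0.5 * (1.0 - sum(p * p for p in f.values()))
    bias_squared = 0.5 * sum((f[y] - h[y]) ** 2 for y in labels)
    variance = 0.5 * (1.0 - sum(p * p for p in h.values()))

    # Expected zero-one loss when target and prediction are independent given x.
    expected_loss = 1.0 - sum(f[y] * h[y] for y in labels)
    return noise, bias_squared, variance, expected_loss


if __name__ == "__main__":
    noise, bias2, var, loss = kw_decomposition({"a": 1.0}, ["a", "a", "b", "a"])
    print(noise, bias2, var, loss)
```

Under these assumed definitions the three terms sum to the expected misclassification rate at that point, and each term is non-negative by construction, which is the property the abstract contrasts with earlier decompositions.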

Citations

4923 Elements of Information Theory – Cover, Thomas - 1991
2526 Induction of decision trees – Quinlan - 1986
1565 Bagging predictors – Breiman - 1996
1213 An Introduction to the Bootstrap – Efron, Tibshirani - 1993
638 UCI repository of machine learning databases. For information contact ml-repository@ics.uci.edu – Murphy, Aha - 1994
508 Neural networks and the bias/variance dilemma – Geman, Bienenstock, et al. - 1992
379 Stacked generalization – Wolpert - 1992
115 Error-correcting output coding corrects bias and variance – Kong, Dietterich - 1995
86 Bias, variance and arcing classifiers – Breiman - 1996
67 Improving regression estimation: averaging methods for variance reduction with extensions to general convex measure optimization – Perrone - 1993
40 Machine learning bias, statistical bias, and statistical variance of decision tree algorithms – Dietterich, Kong - 1995
38 Heuristics of instability in model selection – Breiman - 1994
21 Learning Probabilistic Relational Concept Descriptions – Ali - 1996
14 Almost sure consistent nonparametric regression from recursive partitioning schemes – Gordon, Olshen - 1984