
Any Two Learning Algorithms Are (Almost) Exactly Identical (2000)

by David Wolpert

BibTeX

@MISC{Wolpert00anytwo,
    author = {David Wolpert},
    title = {Any Two Learning Algorithms Are (Almost) Exactly Identical},
    year = {2000}
}

Abstract

This paper shows that if one is provided with a loss function, it can be used in a natural way to specify a distance measure quantifying the similarity of any two supervised learning algorithms, even non-parametric algorithms. Intuitively, this measure gives the fraction of targets and training sets for which the expected performance of the two algorithms differs significantly. Bounds on the value of this distance are calculated for the case of binary outputs and 0-1 loss, indicating that any two learning algorithms are almost exactly identical in such scenarios. As an example, for any two algorithms A and B, even for small input spaces and training sets, for less than 2e of all targets will the difference between A's and B's generalization performance exceed 1%. In particular, this is true if B is bagging applied to A, or boosting applied to A. These bounds can alternatively be viewed as telling us, for example, that the simple English phrase "I expect that algorithm will generalize from the training set with an accuracy of at least 75% on the rest of the target" conveys 20,000 bytes of information concerning the target. The paper ends by discussing some of the subtleties of extending the distance measure to give a full (non-parametric) differential geometry of the manifold of learning algorithms.
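
To make the idea of "the fraction of targets and training sets for which the performance differs" concrete, here is a minimal brute-force sketch. It is not the paper's definition. It assumes binary outputs, 0-1 loss measured off the training set, deterministic algorithms (so "expected performance" reduces to a single accuracy), noise-free training labels, and a uniform weighting over every target and every size-m set of training inputs. The names `majority`, `always_zero`, `ots_accuracy`, and `algorithm_distance` are hypothetical illustrations. Because enumeration is exponential in the number of inputs, this only works at toy sizes.

```python
from itertools import combinations, product
from typing import Callable, List, Sequence, Tuple

# Illustrative types (assumptions, not from the paper): a training set is a list of
# (input, label) pairs, and a learning algorithm maps a training set to a hypothesis.
TrainingSet = List[Tuple[int, int]]
Hypothesis = Callable[[int], int]
Algorithm = Callable[[TrainingSet], Hypothesis]


def majority(train: TrainingSet) -> Hypothesis:
    """Hypothetical algorithm: predict the majority training label everywhere (ties -> 0)."""
    ones = sum(y for _, y in train)
    label = 1 if 2 * ones > len(train) else 0
    return lambda x: label


def always_zero(train: TrainingSet) -> Hypothesis:
    """Hypothetical algorithm: ignore the data and always predict 0."""
    return lambda x: 0


def ots_accuracy(h: Hypothesis, target: Sequence[int], train_xs: Sequence[int]) -> float:
    """Off-training-set accuracy under 0-1 loss: agreement with the target on unseen inputs."""
    rest = [x for x in range(len(target)) if x not in train_xs]
    return sum(h(x) == target[x] for x in rest) / len(rest)


def algorithm_distance(a: Algorithm, b: Algorithm, n_inputs: int,
                       m: int, threshold: float) -> float:
    """Fraction of (target, training-input set) pairs, weighted uniformly, for which
    the off-training-set accuracies of a and b differ by more than `threshold`."""
    differing = total = 0
    for target in product((0, 1), repeat=n_inputs):       # every binary target function
        for train_xs in combinations(range(n_inputs), m):  # every set of m training inputs
            train = [(x, target[x]) for x in train_xs]     # noise-free labels from the target
            acc_a = ots_accuracy(a(train), target, train_xs)
            acc_b = ots_accuracy(b(train), target, train_xs)
            total += 1
            if abs(acc_a - acc_b) > threshold:
                differing += 1
    return differing / total


if __name__ == "__main__":
    # prints 0.125
    print(algorithm_distance(majority, always_zero, n_inputs=4, m=2, threshold=0.0))
```

On this 4-input space the two toy algorithms make different predictions only when both training labels are 1. In that case their accuracies differ for 2 of the 4 labelings of the unseen inputs. This gives 12 of the 96 (target, training set) pairs, or 0.125. Such a tiny space says nothing about how small the fraction becomes under the paper's bounds. The sketch only shows how the quantity is counted.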
