## Stacked generalization (1992)

Venue: Neural Networks

Citations: 549 (7 self)

### BibTeX

```bibtex
@ARTICLE{Wolpert92stackedgeneralization,
  author  = {David H. Wolpert},
  title   = {Stacked generalization},
  journal = {Neural Networks},
  year    = {1992},
  volume  = {5},
  pages   = {241--259}
}
```

### Abstract

This paper introduces stacked generalization, a scheme for minimizing the generalization error rate of one or more generalizers. Stacked generalization works by deducing the biases of the generalizer(s) with respect to a provided learning set. This deduction proceeds by generalizing in a second space whose inputs are (for example) the guesses of the original generalizers when taught with part of the learning set and trying to guess the rest of it, and whose output is (for example) the correct guess. When used with multiple generalizers, stacked generalization can be seen as a more sophisticated version of cross-validation, exploiting a strategy more sophisticated than cross-validation's crude winner-takes-all for combining the individual generalizers. When used with a single generalizer, stacked generalization is a scheme for estimating (and then correcting for) the error of a generalizer which has been trained on a particular learning set and then asked a particular question. After introducing stacked generalization and justifying its use, this paper presents two numerical experiments. The first demonstrates how stacked generalization improves upon a set of separate generalizers for the NETtalk task of translating text to phonemes. The second demonstrates how stacked generalization improves the performance of a single surface-fitter. With the other experimental evidence in the literature, the usual arguments supporting cross-validation, and the abstract justifications presented in this paper, the conclusion is that for almost any real-world generalization problem one should use some version of stacked generalization to minimize the generalization error rate. This paper ends by discussing some of the variations of stacked generalization, and how it touches on other fields like chaos theory.

Key Words: generalization and induction, combining generalizers, learning set pre-processing, cross-validation, error estimation and correction.
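The scheme described in the abstract can be sketched in code: level-0 generalizers are trained on part of the learning set and asked to guess the held-out part, and those guesses (paired with the true outputs) form the learning set for a level-1 combiner. This is a minimal illustrative sketch using scikit-learn, not the paper's own implementation; the choice of estimators, fold count, and dataset here are assumptions.

```python
# Sketch of stacked generalization (after Wolpert 1992).
# Estimator choices and data are illustrative, not from the paper.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

level0 = [KNeighborsRegressor(n_neighbors=5), Ridge(alpha=1.0)]

# Build the level-1 learning set: for each fold, teach the level-0
# generalizers on the rest of the data and record their guesses on the
# held-out part. The level-1 inputs are those guesses; the level-1
# outputs are the true targets.
meta_X = np.zeros((len(y), len(level0)))
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    for j, est in enumerate(level0):
        est.fit(X[train_idx], y[train_idx])
        meta_X[test_idx, j] = est.predict(X[test_idx])

# The level-1 generalizer learns how to combine the level-0 guesses,
# rather than picking a single winner as cross-validation would.
level1 = LinearRegression().fit(meta_X, y)

# At question time, refit the level-0 generalizers on the full learning
# set and feed their guesses through the level-1 combiner.
for est in level0:
    est.fit(X, y)

def stacked_predict(X_new):
    guesses = np.column_stack([est.predict(X_new) for est in level0])
    return level1.predict(guesses)

print(stacked_predict(X[:3]))
```

In practice the same idea is packaged as `sklearn.ensemble.StackingRegressor`, which performs the cross-validated construction of the level-1 learning set internally.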