## Ensemble Methods in Machine Learning (2000)


Venue: MULTIPLE CLASSIFIER SYSTEMS, LNCS-1857

Citations: 479 (3 self)

### BibTeX

@INPROCEEDINGS{Dietterich00ensemblemethods,
    author = {Thomas G. Dietterich},
    title = {Ensemble Methods in Machine Learning},
    booktitle = {MULTIPLE CLASSIFIER SYSTEMS, LNCS-1857},
    year = {2000},
    pages = {1--15},
    publisher = {Springer}
}

### Abstract

Ensemble methods are learning algorithms that construct a set of classifiers and then classify new data points by taking a (weighted) vote of their predictions. The original ensemble method is Bayesian averaging, but more recent algorithms include error-correcting output coding, Bagging, and boosting. This paper reviews these methods and explains why ensembles can often perform better than any single classifier. Some previous studies comparing ensemble methods are reviewed, and some new experiments are presented to uncover the reasons that Adaboost does not overfit rapidly.
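
The (weighted) vote described in the abstract can be illustrated with a minimal Python sketch. The function name `weighted_vote`, its parameters, and the tie-breaking behaviour are assumptions made for illustration only; they are not taken from the paper.

```python
from collections import defaultdict


def weighted_vote(predictions, weights=None):
    """Combine one predicted label per classifier into a single label.

    predictions: list of labels, one from each classifier in the ensemble.
    weights: optional list of non-negative floats, one per classifier.
             Equal weights (a plain majority vote) are assumed if omitted.
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    if len(weights) != len(predictions):
        raise ValueError("need exactly one weight per prediction")

    # Accumulate the total weight cast for each candidate label.
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight

    # Ties go to the label that received a vote first (an arbitrary choice).
    return max(totals, key=totals.get)
```

With equal weights, `weighted_vote(["a", "b", "b"])` picks `"b"`; giving the first classifier a weight of 3.0 and the others 1.0 flips the result to `"a"`.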

### Citations

2804 | Bagging predictors - Breiman - 1996
Citation Context: ...l training set of m items. Such a training set is called a bootstrap replicate of the original training set, and the technique is called bootstrap aggregation (from which the term Bagging is derived; Breiman, 1996). Each bootstrap replicate contains, on the average, 63.2% of the original training set, with several training examples appearing multiple times. Another training set sampling method is to construct ...
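
The 63.2% figure in the context above can be checked empirically. The sketch below assumes a bootstrap replicate is drawn by sampling m items uniformly with replacement; the helper names `bootstrap_replicate` and `unique_fraction` are hypothetical. The expected fraction of distinct items is 1 - (1 - 1/m)^m, which approaches 1 - 1/e ≈ 0.632 as m grows.

```python
import random


def bootstrap_replicate(training_set, rng):
    """Draw len(training_set) items uniformly at random, with replacement."""
    m = len(training_set)
    return [training_set[rng.randrange(m)] for _ in range(m)]


def unique_fraction(m, trials, seed=0):
    """Average fraction of distinct original items per bootstrap replicate."""
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    items = list(range(m))
    total = 0.0
    for _ in range(trials):
        replicate = bootstrap_replicate(items, rng)
        total += len(set(replicate)) / m
    return total / trials
```

For example, `unique_fraction(1000, 50)` should land close to 0.632, and the remaining draws in each replicate are repeats of items already chosen.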

2538 | A decision-theoretic generalization of on-line learning and an application to boosting - Freund, Schapire - 1995

1797 | Experiments with a New Boosting Algorithm - Freund, Schapire - 1996

777 | Boosting the margin: A new explanation for the effectiveness of voting methods. Annals of Statistics - Schapire, Freund, et al. - 1998

744 | Improved Boosting Algorithms Using Confidence-rated Predictions - Schapire, Singer - 1999

607 | Solving multiclass learning problems via error-correcting output codes - Dietterich, Bakiri - 1995

597 | Probabilistic inference using Markov chain Monte Carlo methods - Neal - 1993
Citation Context: ...nnot be enumerated, it is sometimes possible to approximate Bayesian voting by drawing a random sample of hypotheses distributed according to P(h|S). Recent work on Markov chain Monte Carlo methods (Neal, 1993) seeks to develop a set of tools for this task. The most idealized aspect of the Bayesian analysis is the prior belief P(h). If this prior completely captures all of the knowledge that we have about...

586 | An empirical comparison of voting classification algorithms: bagging, boosting and variants - Bauer, Kohavi - 1999

553 | Neural Networks ensembles - Hansen, Salamon - 1990
Citation Context: ...ual classifiers that make them up. A necessary and sufficient condition for an ensemble of classifiers to be more accurate than any of its individual members is if the classifiers are accurate and diverse (Hansen & Salamon, 1990). An accurate classifier is one that has an error rate of better than random guessing on new x values. Two classifiers are diverse if they make different errors on new data points. To see why accuracy a...
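
The role of accuracy and diversity in the context above can be made concrete with a small calculation. The sketch below assumes an idealized case that the excerpt does not spell out: every classifier has the same error rate and the errors are fully independent. Under that assumption the number of wrong votes is binomially distributed, and a majority vote is wrong only when more than half of the classifiers err. The function name and the example values (21 classifiers, error rates 0.3 and 0.6) are illustrative choices.

```python
from math import comb


def majority_vote_error(n_classifiers, error_rate):
    """Probability that more than half of n independent classifiers are wrong.

    Assumes identical error rates and statistically independent errors,
    which real ensembles only approximate.
    """
    n, p = n_classifiers, error_rate
    k_min = n // 2 + 1  # smallest number of wrong votes that forms a majority
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(k_min, n + 1)
    )
```

Under these assumptions, `majority_vote_error(21, 0.3)` is roughly 0.026, far below the individual error rate of 0.3. With an error rate worse than random guessing, such as 0.6, voting makes things worse instead, which is why accuracy is required alongside diversity.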

479 | An experimental comparison of three methods for constructing ensembles of decision trees - Dietterich - 2000
Citation Context: ...tiple random initial weights third best on one synthetic data set and two medical diagnosis data sets. For the C4.5 decision tree algorithm, it is also easy to inject randomness (Kwok & Carter, 1990; Dietterich, 2000). The key decision of C4.5 is to choose a feature to test at each internal node in the decision tree. At each internal node, C4.5 applies a criterion known as the information gain ratio to rank-order...

201 | Training a 3-node neural network is NP-complete - Blum, Rivest - 1992
Citation Context: ...e very difficult computationally for the learning algorithm to find the best hypothesis. Indeed, optimal training of both neural networks and decision trees is NP-hard (Hyafil & Rivest, 1976; Blum & Rivest, 1988). An ensemble constructed by running the local search from many different starting points may provide a better approximation to the true unknown function than any of the individual classifiers, as sh...

197 | Constructing optimal binary decision trees is NP-complete - Hyafil, Rivest - 1976
Citation Context: ...sent), it may still be very difficult computationally for the learning algorithm to find the best hypothesis. Indeed, optimal training of both neural networks and decision trees is NP-hard (Hyafil & Rivest, 1976; Blum & Rivest, 1988). An ensemble constructed by running the local search from many different starting points may provide a better approximation to the true unknown function than any of the individu...

171 | Universal Approximation of an Unknown Mapping and Its Derivatives using Multilayered Feedforward Networks - Hornik, Stinchcombe, et al. - 1990
Citation Context: ... trees are both very flexible algorithms. Given enough training data, they will explore the space of all possible classifiers, and several people have proved asymptotic representation theorems for them (Hornik, Stinchcombe, & White, 1990). Nonetheless, with a finite training sample, these algorithms will explore only a finite set of hypotheses and they will stop searching when they find an hypothesis that fits the training data. Hence, in Fi...

134 | Error Reduction through Learning Multiple Descriptions - Ali, Pazzani

98 | Using output codes to boost multiclass learning problems - Schapire - 1997

94 | Back propagation is sensitive to initial conditions - Kolen, Pollack - 1991

62 | Bootstrapping with Noise: An Effective Regularization Technique - Raviv, Intrator - 1996

61 | Human expert-level performance on a scientific image analysis by a system using combined artificial neural networks - Cherkauer - 1996

61 | Multiple decision trees - Kwok, Carter - 1990
Citation Context: ... second best, and multiple random initial weights third best on one synthetic data set and two medical diagnosis data sets. For the C4.5 decision tree algorithm, it is also easy to inject randomness (Kwok & Carter, 1990; Dietterich, 2000). The key decision of C4.5 is to choose a feature to test at each internal node in the decision tree. At each internal node, C4.5 applies a criterion known as the information gain r...

27 | Improving Committee Diagnosis with Resampling Techniques - Parmanto, Munro, et al. - 1996
Citation Context: ...t one of these 10 subsets. This same procedure is employed to construct training sets for 10-fold cross-validation, so ensembles constructed in this way are sometimes called cross-validated committees (Parmanto, Munro, & Doyle, 1996). The third method for manipulating the training set is illustrated by the AdaBoost algorithm, developed by Freund and Schapire (1995, 1996, 1997, 1998). Like Bagging, AdaBoost manipulates the traini...
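
The cross-validated committee construction mentioned in the context above can be sketched as follows. The excerpt is truncated, so this assumes the usual 10-fold procedure: split the training set into 10 disjoint subsets and build 10 overlapping training sets, each omitting a different subset. The function name is hypothetical, and the strided split used here is a deterministic stand-in for whatever partitioning the original work used.

```python
def cross_validated_training_sets(training_set, n_subsets=10):
    """Build n_subsets overlapping training sets, each leaving out one fold.

    The folds are formed by striding through the data (item i goes to
    fold i % n_subsets); shuffle the data first if a random split is wanted.
    """
    folds = [training_set[i::n_subsets] for i in range(n_subsets)]

    training_sets = []
    for left_out in range(n_subsets):
        # Keep every fold except the one being left out for this member.
        kept = [
            item
            for index, fold in enumerate(folds)
            if index != left_out
            for item in fold
        ]
        training_sets.append(kept)
    return training_sets
```

Each resulting training set would then be given to the same learning algorithm, and the trained classifiers combined by voting.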

21 | Improved boosting algorithms using confidence-rated predictions - Schapire, Singer - 1999
Citation Context: ...ℓ(x), is constructed by a weighted vote of the individual classifiers. Each classifier is weighted (by w_ℓ) according to its accuracy on the weighted training set that it was trained on. Recent research (Schapire & Singer, 1998) has shown that AdaBoost can be viewed as a stage-wise algorithm for minimizing a particular error function. To define this error function, suppose that each training example is labeled as +1 or -1, c...

19 | An empirical comparison of voting classification algorithms: Bagging, boosting, and variants - Bauer, Kohavi

10 | Extending local learners with error-correcting output codes - Ricci, Aha - 1997

4 | Bootstrapping with noise: an effective regularization technique - Raviv, Intrator - 1996

2 | Boosting the margin: A new explanation for the effectiveness of voting methods - Schapire, Freund, et al. - 1997