Methods for voting classification algorithms, such as Bagging and AdaBoost, have been shown to be very successful in improving the accuracy of certain classifiers for artificial and real-world datasets. We review these algorithms and describe a large empirical study comparing several variants in conjunction with a decision tree inducer (three variants) and a Naive-Bayes inducer. The purpose of the study is to improve our understanding of why and when these algorithms, which use perturbation, reweighting, and combination techniques, affect classification error. We provide a bias and variance decomposition of the error to show how different methods and variants influence these two terms. This allowed us to determine that Bagging reduced variance of unstable methods, while boosting methods (AdaBoost and Arc-x4) reduced both the bias and variance of unstable methods but increased the variance for Naive-Bayes, which was very stable. We observed that Arc-x4 behaves differently than AdaBoost if r...
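As a rough illustration of the perturbation and combination steps the abstract attributes to Bagging, the following minimal sketch trains a base classifier on bootstrap replicates of the training set and combines them by unweighted majority vote. It is not the study's implementation: the base inducer `train_stump` (a one-feature threshold classifier), the toy dataset, and the parameter `n_replicates` are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import random
from collections import Counter

def train_stump(sample):
    """Hypothetical base inducer: a one-feature threshold classifier."""
    best = None
    for threshold in sorted({x for x, _ in sample}):
        # Try both label orientations around each candidate threshold.
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= threshold else right) != y
                         for x, y in sample)
            if best is None or errors < best[0]:
                best = (errors, threshold, left, right)
    _, threshold, left, right = best
    return lambda x: left if x <= threshold else right

def bagging(data, n_replicates=25, seed=0):
    """Train one model per bootstrap replicate; combine by majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_replicates):
        # Perturbation: sample len(data) instances with replacement.
        replicate = [rng.choice(data) for _ in data]
        models.append(train_stump(replicate))

    def vote(x):
        # Combination: each model casts one unweighted vote.
        return Counter(model(x) for model in models).most_common(1)[0][0]

    return vote

# Toy data (an assumption for illustration): label is 1 when x >= 0.5.
data = [(x / 10, int(x >= 5)) for x in range(10)]
classify = bagging(data)
```

Calling `classify(x)` returns the majority label over the replicate-trained models. Boosting methods such as AdaBoost differ in that they reweight instances between rounds and weight the votes, which this sketch does not attempt.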
|
3011
|
Pattern Classification and Scene Analysis
– Duda, Hart
- 1973
|
|
2227
|
UCI repository of machine learning databases
– Blake, Merz
|
|
1565
|
Bagging predictors
– Breiman
- 1996
|
|
1213
|
An Introduction to the Bootstrap
– Efron, Tibshirani
- 1993
|
|
1205
|
A decision-theoretic generalization of on-line learning and an application to boosting
– Freund, Schapire
- 1997
|
|
1045
|
Experiments with a new boosting algorithm
– Freund, Schapire
- 1996
|
|
600
|
Bayesian Theory
– Bernardo, Smith
- 1994
|
|
538
|
C4.5: Programs for
– Quinlan
- 1993
|
|
508
|
Neural networks and the bias/variance dilemma
– Geman, Bienenstock, et al.
- 1992
|
|
500
|
Boosting the margin: A new explanation for the effectiveness of voting methods
– Schapire, Freund, et al.
- 1998
|
|
457
|
The strength of weak learnability
– Schapire
- 1990
|
|
444
|
Multi-interval discretization of continuous-valued attributes for classification learning
– Fayyad, Irani
- 1993
|
|
366
|
A study of cross-validation and bootstrap for accuracy estimation and model selection
– Kohavi
- 1995
|
|
330
|
Very simple classification rules perform well on most commonly used datasets
– Holte
- 1993
|
|
324
|
Approximate statistical test for comparing supervised classification learning algorithms
– Dietterich
- 1998
|
|
294
|
Boosting a Weak Learning Algorithm by Majority
– Freund
- 1995
|
|
242
|
An analysis of Bayesian classifiers
– Langley, Iba, et al.
- 1992
|
|
234
|
Beyond independence: Conditions for the optimality of the simple Bayesian classifier
– Domingos, Pazzani
- 1996
|
|
222
|
Bagging, boosting, and C4.5
– Quinlan
- 1996
|
|
196
|
Bias, variance, and arcing classifiers
– Breiman
- 1996
|
|
131
|
Bias plus variance decomposition for zero-one loss functions
– Kohavi, Wolpert
- 1996
|
|
128
|
Data mining using MLC++: A machine learning library
– Kohavi, Sommerfield, et al.
- 1996
|
|
125
|
Conservation Law for Generalization Performance
– Schaffer
- 1994
|
|
115
|
The Estimation of Probabilities: An Essay on Modern Bayesian Methods
– Good
- 1965
|
|
115
|
Error-correcting output coding corrects bias and variance
– Kong, Dietterich
- 1995
|
|
96
|
Learning Classification Trees
– Buntine
- 1992
|
|
87
|
Wrappers for performance enhancement and oblivious decision graphs
– Kohavi
- 1995
|
|
82
|
Error-Based and Entropy-Based Discretization of Continuous Features
– Kohavi, Sahami
- 1996
|
|
78
|
Boosting decision trees
– Drucker, Cortes
- 1996
|
|
77
|
Reducing misclassification costs
– Pazzani, Merz, et al.
- 1994
|
|
77
|
Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods. The Annals of Statistics 26(5)
– Schapire, Freund, et al.
- 1998
|
|
74
|
A Theory of Learning Classification Rules
– Buntine
- 1990
|
|
68
|
Error-correcting output codes: a general method for improving multiclass inductive learning programs
– Dietterich, Bakiri
- 1991
|
|
57
|
Multiple decision trees
– Kwok
- 1990
|
|
55
|
The effects of training set size on decision tree complexity
– Oates
- 1997
|
|
52
|
Arcing the edge
– Breiman
- 1997
|
|
52
|
Boosting and naive Bayesian learning
– Elkan
- 1997
|
|
45
|
On bias, variance, 0/1-loss, and the curse of dimensionality
– Friedman
- 1997
|
|
42
|
Comparing connectionist and symbolic learning methods
– Quinlan
- 1994
|
|
38
|
Heuristics of instability in model selection
– Breiman
- 1994
|
|
37
|
Learning symbolic rules using artificial neural networks
– Craven, Shavlik
- 1993
|
|
35
|
Stacked generalization, Neural Networks 5
– Wolpert
- 1992
|
|
34
|
Induction of one-level decision trees
– Iba, Langley
- 1992
|
|
33
|
On pruning and averaging decision trees
– Oliver
- 1995
|
|
28
|
Visualizing the simple bayesian classifier
– Becker, Kohavi
- 1997
|
|
28
|
Option Decision Trees with Majority Votes
– Kohavi, Kunz
- 1997
|
|
28
|
Feature subset selection using the wrapper model: Overfitting and dynamic search space topology
– Kohavi
- 1995
|
|
22
|
Why Does Bagging Work? A Bayesian Account and its Implications
– Domingos
- 1997
|
|
21
|
Learning Probabilistic Relational Concept Descriptions
– Ali
- 1996
|
|
18
|
Interpretable Boosted Naive Bayes Classification
– Ridgeway, Madigan, et al.
- 1998
|