Soft Margins for AdaBoost (2000) [159 citations — 22 self]

by Gunnar Rätsch, Klaus-Robert Müller, Takashi Onoda
Machine Learning

Abstract:

Recently, ensemble methods such as AdaBoost have been applied successfully to many problems while seemingly defying the problem of overfitting. AdaBoost rarely overfits in the low-noise regime; however, we show that it clearly does so at higher noise levels. Central to understanding this fact is the margin distribution. AdaBoost can be viewed as a constrained gradient descent on an error function with respect to the margin. We find that AdaBoost asymptotically achieves a hard margin distribution, i.e. the algorithm concentrates its resources on a few hard-to-learn patterns that, interestingly, are very similar to support vectors. A hard margin is clearly a suboptimal strategy in the noisy case, and regularization, in our case a "mistrust" in the data, must be introduced into the algorithm to alleviate the distortions that single difficult patterns (e.g. outliers) can cause to the margin distribution. We propose several regularization methods and generalizations of the original AdaBoost algorithm to achieve a soft margin. In particular, we suggest (1) regularized AdaBoost (AdaBoost-Reg), where the gradient descent is performed directly with respect to the soft margin, and (2) regularized linear and quadratic programming (LP/QP-) AdaBoost, where the soft margin is attained by introducing slack variables. Extensive simulations demonstrate that the proposed regularized AdaBoost-type algorithms are useful and yield competitive results for noisy data.
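The abstract's central quantity is the margin of each training pattern under the combined hypothesis. The sketch below shows how AdaBoost's exponential reweighting moves weight toward hard patterns. It assumes discrete AdaBoost with one-dimensional decision stumps as the base learner. It also assumes that margins are normalized by the sum of the hypothesis weights, so each lies in [-1, 1]. The toy data, the stump learner and the function names (`train_stump`, `adaboost`, `margins`) are hypothetical illustrations, not the paper's experimental setup.

```python
import math


def stump_predict(thr, polarity, x):
    """Decision stump: +polarity to the right of the threshold, -polarity otherwise."""
    return polarity if x > thr else -polarity


def train_stump(xs, ys, w):
    """Return (weighted_error, threshold, polarity) of the best stump under weights w."""
    pts = sorted(set(xs))
    # One threshold below all points, then midpoints between neighbouring points.
    thresholds = [pts[0] - 1.0] + [(a + b) / 2.0 for a, b in zip(pts, pts[1:])]
    best = None
    for thr in thresholds:
        for polarity in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                      if stump_predict(thr, polarity, xi) != yi)
            if best is None or err < best[0]:
                best = (err, thr, polarity)
    return best


def adaboost(xs, ys, rounds):
    """Discrete AdaBoost; returns the ensemble [(alpha, thr, pol)] and final weights."""
    n = len(xs)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, thr, pol = train_stump(xs, ys, w)
        if err >= 0.5:  # base learner no better than chance: stop
            break
        err = max(err, 1e-12)  # guard against log(1/0) for a perfect stump
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, thr, pol))
        # Exponential reweighting: patterns with y*h(x) = -1 gain weight.
        w = [wi * math.exp(-alpha * yi * stump_predict(thr, pol, xi))
             for xi, yi, wi in zip(xs, ys, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble, w


def margins(ensemble, xs, ys):
    """Normalized margins y_i * f(x_i) / sum(alpha), each in [-1, 1]."""
    total = sum(alpha for alpha, _, _ in ensemble)
    return [yi * sum(a * stump_predict(t, p, xi) for a, t, p in ensemble) / total
            for xi, yi in zip(xs, ys)]


if __name__ == "__main__":
    # Hypothetical noisy toy data: labels follow sign(x) except at x = 2.0.
    XS = [-3.0, -2.0, -1.0, -0.5, 0.5, 1.0, 2.0, 3.0]
    YS = [-1, -1, -1, -1, 1, 1, -1, 1]
    ens, weights = adaboost(XS, YS, rounds=30)
    for x, m, wt in zip(XS, margins(ens, XS, YS), weights):
        print(f"x={x:+.1f}  margin={m:+.3f}  weight={wt:.3f}")
```

On this toy set, the label at x = 2.0 is flipped. The first stump (threshold 0) misclassifies only that pattern, so after one round it carries half of the total weight. Running more rounds lets one check whether the weight keeps concentrating on a few low-margin patterns, as the abstract describes. The sketch prints the weights rather than asserting that outcome.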

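The slack-variable idea behind LP-AdaBoost can be illustrated for a fixed ensemble. This sketch assumes the hypothesis weights are already fixed, so only the target margin ρ and the slacks ξ_i remain free. It maximizes ρ - C Σ ξ_i subject to m_i ≥ ρ - ξ_i and ξ_i ≥ 0, where m_i are the pattern margins. The full LP in the paper also optimizes the hypothesis weights, and its exact constants may differ. The function name `soft_margin`, the constant C and the example margins are hypothetical.

```python
def soft_margin(margins, C):
    """Return (rho, slacks) maximizing rho - C * sum(slacks) for fixed margins.

    Constraints: margins[i] >= rho - slacks[i] and slacks[i] >= 0.
    For a given rho the best slack is max(0, rho - m_i), so the objective
    is concave and piecewise linear in rho with kinks at the margins; its
    maximum is attained at one of the margins whenever C * N >= 1.
    """
    n = len(margins)
    if n == 0 or C * n < 1:
        raise ValueError("need a non-empty sample and C * N >= 1 (otherwise rho grows without bound)")

    def objective(rho):
        return rho - C * sum(max(0.0, rho - m) for m in margins)

    # Evaluate the objective at every kink; ties resolve to the smallest rho.
    best_rho = max(sorted(set(margins)), key=objective)
    slacks = [max(0.0, best_rho - m) for m in margins]
    return best_rho, slacks


if __name__ == "__main__":
    # Hypothetical margins of six patterns; the last one behaves like an outlier.
    ms = [0.9, 0.8, 0.7, 0.6, 0.5, -0.4]
    print(soft_margin(ms, C=0.4)[0])  # soft margin: outlier absorbed by its slack
    print(soft_margin(ms, C=2.0)[0])  # hard margin: the minimum margin
```

With C = 0.4 the optimum is ρ = 0.6. The outlier (margin -0.4) and the pattern at 0.5 take up the shortfall through slacks of 1.0 and about 0.1. With C = 2, paying slack never pays off, so ρ falls back to the hard (minimum) margin of -0.4. A smaller C therefore lets a few difficult patterns be "mistrusted" instead of dictating the margin.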