## On the Rate of Convergence of Regularized Boosting Classifiers (2003)

Venue: Journal of Machine Learning Research

Citations: 50 (10 self)

### BibTeX

```bibtex
@MISC{Blanchard03onthe,
  author = {Gilles Blanchard and Gabor Lugosi and Nicolas Vayatis},
  title  = {On the Rate of Convergence of Regularized Boosting Classifiers},
  year   = {2003}
}
```

### Abstract

A regularized boosting method is introduced, for which regularization is obtained through a penalization function. It is shown through oracle inequalities that this method is model adaptive. The rate of convergence of the probability of misclassification is investigated. It is shown that for quite a large class of distributions, the probability of error converges to the Bayes risk at a rate faster than n^{-(V+2)/(4(V+1))}, where V is the VC dimension of the "base" class whose elements are combined by boosting methods to obtain an aggregated classifier. The dimension-independent nature of the rates may partially explain the good behavior of these methods in practical problems. Under Tsybakov's noise condition the rate of convergence is even faster. We investigate the conditions necessary to obtain such rates for different base classes. The special case of boosting using decision stumps is studied in detail. We characterize the class of classifiers realizable by aggregating decision stumps.
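The rate stated in the abstract can be made concrete with a small numeric sketch (an illustration of the quoted formula, not code from the paper): the exponent -(V+2)/(4(V+1)) moves from -1/2 toward -1/4 as the VC dimension V of the base class grows, matching the observation that lower-dimensional base classes yield faster rates.

```python
# Illustrative sketch (not from the paper): evaluate the exponent in the
# convergence rate n^{-(V+2)/(4(V+1))} quoted in the abstract, where V is
# the VC dimension of the base class.
def rate_exponent(V: int) -> float:
    return -(V + 2) / (4 * (V + 1))

for V in (1, 2, 5, 10, 100):
    print(f"V = {V:3d}: rate exponent = {rate_exponent(V):.4f}")
```

A smaller V gives a more negative exponent, i.e. a faster rate, consistent with the remark in the paper that estimation is easier for base classes of low VC dimension.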

### Citations

2626 | A decision-theoretic generalization of on-line learning and an application to boosting - Freund, Schapire - 1997 |

1804 | Generalized Additive Models - Hastie, Tibshirani - 1990 |

1467 | Multilayer feedforward networks are universal approximators - Hornik - 1988 |

1363 | Additive Logistic Regression: a Statistical View of Boosting - Friedman, Hastie, et al. |

1106 | A Probabilistic Theory of Pattern Recognition - Devroye, Gyorfi, et al. - 1996 |
Citation Context: ...the base class. The VC dimension equals V = d + 1 in the case of C_lin, V = 2d + 1 for C_rect, V = d + 2 for C_ball, and is bounded by V = d(d + 1)/2 + 2 for C_ell and by V = d log2(2d) for C_tree (see, e.g., Devroye, Györfi, and Lugosi, 1996). Clearly, the lower the VC dimension is, the faster the rate (estimation is easier). The following question arises naturally: find a class with VC dimension as small as possible whose convex hull is... |
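The VC-dimension values quoted in this excerpt can be tabulated directly; the following sketch uses the class names as shorthand introduced here, and the last two entries are the quoted upper bounds rather than exact values.

```python
import math

# Sketch of the VC-dimension values quoted in the citation context above,
# for base classes over R^d (C_ell and C_tree are upper bounds, not exact).
def vc_dimensions(d: int) -> dict:
    return {
        "C_lin":  d + 1,                 # half-spaces
        "C_rect": 2 * d + 1,             # axis-aligned rectangles
        "C_ball": d + 2,                 # balls
        "C_ell":  d * (d + 1) // 2 + 2,  # ellipsoids (upper bound)
        "C_tree": d * math.log2(2 * d),  # trees (upper bound)
    }

print(vc_dimensions(3))
```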

970 | Approximations by superpositions of sigmoidal functions - Cybenko - 1989 |

790 | Boosting the margin: A new explanation for the effectiveness of voting methods - Schapire, Freund, et al. - 1998 |

726 | The strength of weak learnability - Schapire - 1990 |

686 | Convergence of stochastic processes - Pollard - 1984 |

453 | Boosting a weak learning algorithm by majority - Freund - 1995 |

417 | Universal Approximation Bounds for Superposition of a Sigmoidal Function - Barron - 1993 |

415 | Weak convergence and empirical processes - van der Vaart, Wellner - 1996 |

325 | Probability in Banach Spaces - Ledoux, Talagrand - 1991 |
Citation Context: ...Putting R_{n,P} = (1/√n) E_P E_ε sup_{f∈F} |∑_{i=1}^n ε_i f(X_i)|, we have, following the proof of Lemma 2.5 in Mendelson (2002), and after applying standard chaining techniques (see Dudley, 1978) and contraction inequalities (see Ledoux and Talagrand, 1991), an inequality of the form R_{n,P} ≤ c (τ² + T R_{n,P})^{(1/2)(1−p/2)}. (Note the slight difference as compared to Mendelson (2002); here as in this reference... |
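The quantity R_{n,P} defined in this excerpt is a normalized Rademacher average. A minimal Monte Carlo sketch for a finite function class, assuming NumPy (the function and variable names here are illustrative, not from the paper):

```python
import numpy as np

def rademacher_average(F_values: np.ndarray, n_draws: int = 5000,
                       seed: int = 0) -> float:
    """Monte Carlo estimate of (1/sqrt(n)) * E_eps sup_f |sum_i eps_i f(X_i)|
    for a finite class: row j of F_values holds (f_j(X_1), ..., f_j(X_n))."""
    rng = np.random.default_rng(seed)
    _, n = F_values.shape
    total = 0.0
    for _ in range(n_draws):
        eps = rng.choice([-1.0, 1.0], size=n)  # Rademacher signs
        total += np.abs(F_values @ eps).max()  # sup over the class
    return total / (n_draws * np.sqrt(n))
```

As a sanity check, for the single constant function f ≡ 1 on n = 4 points the exact value is E|∑ ε_i| / 2 = 0.75, which the estimate approaches.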

312 | Arcing classifiers - Breiman - 1998 |
Citation Context: ...s tend to produce large-margin classifiers in a certain sense (see Schapire, Freund, Bartlett, and Lee (1998), Koltchinskii and Panchenko (2002)). This view was complemented by Breiman's observation (Breiman, 1998) that boosting performs gradient descent optimization of an empirical cost function different from the number of misclassified samples; see also Mason, Baxter, Bartlett, and Frean (1999), Collins, Sc... |

282 | Rademacher and Gaussian complexities: Risk bounds and structural results - Bartlett, Mendelson |

218 | Logistic regression, adaboost and bregman distances - Collins, Schapire, et al. - 2002 |

170 | Optimal Aggregation of Classifiers in Statistical Learning - Tsybakov - 2004 |

151 | Central limit theorems for empirical measures - Dudley - 1978 |
Citation Context: ...proof of Lemma 20. Putting R_{n,P} = (1/√n) E_P E_ε sup_{f∈F} |∑_{i=1}^n ε_i f(X_i)|, we have, following the proof of Lemma 2.5 in Mendelson (2002), and after applying standard chaining techniques (see Dudley, 1978) and contraction inequalities (see Ledoux and Talagrand, 1991), an inequality of the form R_{n,P} ≤ c (τ² + T R_{n,P})^{(1/2)(1−p/2)}. (Note the slight differ... |

151 | Functional gradient techniques for combining hypotheses - Mason, Baxter, et al. |

146 | Boosting with the L2 loss: Regression and classification - Bühlmann, Yu - 2003 |

126 | Empirical margin distributions and bounding the generalization error of combined classifiers - Koltchinskii, Panchenko |

122 | Statistical behavior and consistency of classification methods based on convex risk minimization - Zhang |

121 | An introduction to Boosting and Leveraging - Meir, Rätsch - 2003 |

115 | Local rademacher complexities - Bartlett, Bousquet, et al. |

105 | Minimum contrast estimators on sieves: exponential bounds and rates of convergence - Birgé, Massart - 1998 |

94 | Some applications of concentration inequalities to statistics - Massart |

66 | On the Bayes-risk consistency of regularized boosting methods - Lugosi, Vayatis |

51 | Approximation theory of the MLP model in neural networks - Pinkus - 1999 |

49 | Statistical performance of support vector machines - Blanchard, Bousquet, et al. |

48 | Risk bounds for statistical learning - Massart, Nédélec - 2006 |

42 | Feedback stabilization using two-hidden-layer nets - Sontag - 1990 |

40 | Neural net approximation - Barron - 1992 |

38 | Improving the sample complexity using global data - Mendelson |

36 | Process consistency for adaboost - Jiang |

28 | Rates of convergence for radial basis functions and neural networks - Girosi, Anzellotti - 1993 |

21 | The consistency of greedy algorithms for classification - Mannor, Meir, et al. - 2002 |

20 | Sparse regression ensembles in infinite and finite hypothesis space - Rätsch, Demiriz, et al. - 2000 |

20 | Rate of convex approximation in non-Hilbert spaces - Donahue, Gurvits, et al. - 1997 |

20 | Minimax nonparametric classification, Part I: Rates of convergence - Yang - 1999 |

15 | New approaches to statistical learning theory - Bousquet |

11 | Some infinity theory for predictor ensembles - Breiman - 2000 |

8 | On the Approximation of Functional Classes Equipped with a Uniform Measure using Ridge Functions. Jour. of Approximation Theory - Maiorov, Meir, et al. - 1999 |

7 | On the optimality of neural-network approximation using incremental algorithms - Meir, Maiorov - 2000 |

6 | Weak learners and improved convergence rate in boosting - Mannor, Meir - 2001 |

3 | A note on the richness of convex hulls of VC classes - Koltchinskii, Lugosi, et al. |