## BOOSTING ALGORITHMS: REGULARIZATION, PREDICTION AND MODEL FITTING

Venue: Submitted to Statistical Science

Citations: 38 (5 self)

### BibTeX

@MISC{Bühlmann_boostingalgorithms,
  author = {Peter Bühlmann and Torsten Hothorn},
  title = {BOOSTING ALGORITHMS: REGULARIZATION, PREDICTION AND MODEL FITTING},
  year = {}
}


### Abstract

We present a statistical perspective on boosting. Special emphasis is given to estimating potentially complex parametric or nonparametric models, including generalized linear and additive models as well as regression models for survival analysis. Concepts of degrees of freedom and corresponding Akaike or Bayesian information criteria, particularly useful for regularization and variable selection in high-dimensional covariate spaces, are discussed as well. The practical aspects of boosting procedures for fitting statistical models are illustrated by means of the dedicated open-source software package mboost. This package implements functions which can be used for model fitting, prediction and variable selection. It is flexible, allowing for the implementation of new boosting algorithms optimizing user-specified loss functions.
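The workhorse discussed throughout the paper and in many of the excerpts below is componentwise L2Boosting: functional gradient descent with the squared-error loss, using simple linear least squares on one covariate at a time as the base procedure. The following is an illustrative stand-alone Python sketch of that algorithm, not the mboost implementation; the function name and defaults are our own:

```python
import numpy as np

def l2boost(X, y, mstop=100, nu=0.1):
    """Componentwise L2Boosting sketch: gradient boosting with the
    squared-error loss and single-covariate least squares as base learner.
    Returns the intercept and coefficient vector after mstop iterations."""
    n, p = X.shape
    # offset: the L2 loss is minimized by the mean; residuals are the
    # negative gradient of the squared-error loss
    intercept = y.mean()
    resid = y - intercept
    coef = np.zeros(p)
    for _ in range(mstop):
        # fit each covariate separately; keep the one reducing RSS most
        best_j, best_b, best_rss = 0, 0.0, np.inf
        for j in range(p):
            xj = X[:, j]
            b = xj @ resid / (xj @ xj)
            rss = ((resid - b * xj) ** 2).sum()
            if rss < best_rss:
                best_j, best_b, best_rss = j, b, rss
        # shrink the update by the step-length factor nu and refit residuals
        coef[best_j] += nu * best_b
        resid -= nu * best_b * X[:, best_j]
    return intercept, coef
```

Because only one coordinate is updated per iteration and each update is shrunken by nu, stopping early yields sparse, regularized coefficient vectors, which is the connection to the Lasso discussed in several of the cited works.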

### Citations

2862 | UCI repository of machine learning databases
- BLAKE, C
- 1998
Citation Context: ...ibe characteristics of the cell nuclei present in the image) have been studied by Street, Mangasarian and Wolberg [80] (the data are part of the UCI repository [11]). We first analyze these data as a binary prediction problem (recurrence vs. nonrecurrence) and later in Section 8 by means of survival models. We are faced with many covariates (p = 32) for a limite...

2488 | Bagging predictors
- Breiman
- 1996
Citation Context: ...the results from the previous iteration m − 1 only (memoryless with respect to iterations m − 2, m − 3, ...). Examples of other ensemble schemes include bagging [14] or random forests [1, 17]. 1.2 AdaBoost The AdaBoost algorithm for binary classification [31] is the most well-known boosting algorithm. The base procedure is a classifier with values in {0,1} (sligh...

2306 | A decision-theoretic generalization of on-line learning and an application to boosting
- Freund, Schapire
- 1997
Citation Context: ...d linear models, generalized additive models, gradient boosting, survival analysis, variable selection, software. 1. INTRODUCTION Freund and Schapire's AdaBoost algorithm for classification [29, 30, 31] has attracted much attention in the machine learning community (cf. [76], and the references therein) as well as in related areas in statistics [15, 16, 33]. Various versions of the AdaBoos...

1629 | Experiments with a new boosting algorithm
- Freund, Schapire
- 1996
Citation Context: ...d linear models, generalized additive models, gradient boosting, survival analysis, variable selection, software. 1. INTRODUCTION Freund and Schapire's AdaBoost algorithm for classification [29, 30, 31] has attracted much attention in the machine learning community (cf. [76], and the references therein) as well as in related areas in statistics [15, 16, 33]. Various versions of the AdaBoos...

1391 | Random forests
- Breiman
Citation Context: ...from the previous iteration m − 1 only (memoryless with respect to iterations m − 2, m − 3, ...). Examples of other ensemble schemes include bagging [14] or random forests [1, 17]. 1.2 AdaBoost The AdaBoost algorithm for binary classification [31] is the most well-known boosting algorithm. The base procedure is a classifier with values in {0,1} (slightly different from a real-...

1317 | Generalized Additive Models
- Hastie, Tibshirani
- 1990
Citation Context: ...selected by the boosting algorithm as well. 4.2 Componentwise Smoothing Spline for Additive Models Additive and generalized additive models, introduced by Hastie and Tibshirani [40] (see also [41]), have become very popular for adding more flexibility to the linear structure in generalized linear models. Such flexibility can also be added in boosting (whose framework is especially useful for h...

1217 | Rejoinder: Additive logistic regression: a statistical view of boosting
- Friedman, Hastie, et al.
- 2000
Citation Context: ...already gave some reasons at the end of Section 3.2.1 why the negative log-likelihood loss function in (3.1) is very useful for binary classification problems. Friedman, Hastie and Tibshirani [33] were first in advocating this, and they proposed LogitBoost, which is very similar to the generic FGD algorithm when using the loss from (3.1): the deviation from FGD is the use of Newton's method i...

1046 | Matching pursuits with time-frequency dictionaries
- Mallat, Zhang
- 1993
Citation Context: ...t. As m tends to infinity, f̂^[m](·) converges to a least squares solution which is unique if the design matrix has full rank p ≤ n. The method is also known as matching pursuit in signal processing [60], weak greedy algorithm in computational mathematics [81], and it is a Gauss–Southwell algorithm [79] for solving a linear system of equations. We will discuss more properties of L2Boosting with compo...

876 | Exploratory Data Analysis
- Tukey
- 1977
Citation Context: ...older roots of boosting. In the context of regression, there is an immediate connection to the Gauss–Southwell algorithm [79] for solving a linear system of equations (see Section 4.1) and to Tukey's [83] method of "twicing" (see Section 5.1). 2. FUNCTIONAL GRADIENT DESCENT Breiman [15, 16] showed that the AdaBoost algorithm can be represented as a steepest descent algorithm in function spac...

862 | The Elements of Statistical Learning
- Hastie, Tibshirani, et al.
- 2001
Citation Context: ...et al. [62] and Rätsch, Onoda and Müller [70] developed related ideas which were mainly acknowledged in the machine learning community. In Hastie, Tibshirani and Friedman [42], additional views on boosting are given; in particular, the authors first pointed out the relation between boosting and ℓ1-penalized estimation. The insights of Fried...

721 | Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics
- Schapire, Freund, et al.
- 1998
Citation Context: ...asets [64]. A stream of work has been devoted to develop VC-type bounds for the generalization (out-of-sample) error to explain why boosting is overfitting very slowly only. Schapire et al. [77] proved a remarkable bound for the generalization misclassification error for classifiers in the convex hull of a base procedure. This bound for the misclassification error has been improved by Koltch...

664 | The strength of weak learnability
- Schapire
- 1989
Citation Context: ...ed that if individual classifiers perform at least slightly better than guessing at random, their predictions can be combined and averaged, yielding much better predictions. Later, Schapire [75] proposed a boosting algorithm with provable polynomial runtime to construct such a better ensemble of classifiers. The AdaBoost algorithm [29, 30, 31] is considered as a first path-breaking step towa...

562 | Greedy function approximation: A gradient boosting machine
- Friedman
- 2001
Citation Context: ...i [33] opened new perspectives, namely to use boosting methods in many other contexts than classification. We mention here boosting methods for regression (including generalized regression) [22, 32, 71], for density estimation [73], for survival analysis [45, 71] or for multivariate analysis [33, 59]. In quite a few of these proposals, boosting is not only a black-box prediction tool but also an est...

486 | BOOSTEXTER: A boosting-based system for text categorization
- Schapire, Singer
- 2000
Citation Context: ...applications of boosting methods to real data problems. We mention here classification of tumor types from gene expressions [25, 26], multivariate financial time series [2, 3, 4], text classification [78], document routing [50] or survival analysis [8] (different from the approach in Section 8). 9.2 Asymptotic Theory The asymptotic analysis of boosting algorithms includes consistency and minimax rate ...

454 | Nonparametric Regression and Generalized Linear Models: a Roughness Penalty Approach
- Green, Silverman
- 1994
Citation Context: ...∑_{i=1}^n (Y_i − f(X_i^(j)))² + λ ∫ (f″(x))² dx, where λ > 0 is a tuning parameter such that the trace of the corresponding hat matrix equals df. For further details, we refer to Green and Silverman [36]. As a note of caution, we use in the sequel the terminology of "hat matrix" in a broad sense: it is a linear operator but not a projection in general. The base procedure is then ĝ(x) = f̂^(Ŝ)(x...

381 | High-dimensional graphs and variable selection with the lasso
- Meinshausen, Bühlmann
Citation Context: ...ble penalty parameter λ in (5.5), the Lasso finds the true underlying submodel (the predictor variables with corresponding regression coefficients ≠ 0) with probability tending quickly to 1 as n → ∞ [65]. It is important to note the role of the sufficient and "almost" necessary condition of the Lasso for model selection: Zhao and Yu [94] call it the "irrepresentable condition" which has (ma...

377 | Bioconductor: open software development for computational biology and bioinformatics
- Gentleman
Citation Context: ...samples (data taken from [90]). For each sample, a binary response variable describes the lymph node status (25 negative and 24 positive). The data are stored in form of an exprSet object westbc (see [35]) and we first extract the matrix of expression levels and the response variable: R> x <- t(exprs(westbc)) R> y <- pData(westbc)$nodal.y We aim at using L2Boosting for classification (see Section 3.2...

323 | Regression shrinkage and selection via the Lasso
- Tibshirani
- 1996
Citation Context: ...nds. 5.2.1 Connections to the Lasso. Hastie, Tibshirani and Friedman [42] pointed out first an intriguing connection between L2Boosting with componentwise linear least squares and the Lasso [82] which is the following ℓ1-penalty method: β̂(λ) = argmin_β n⁻¹ ∑_{i=1}^n (Y_i − β_0 − ∑_{j=1}^p β^(j) X_i^(j))² + λ ∑_{j=1}^p |β^(j)| (5.5). Efron et al. [28] made the connection rigorous and e...

319 | The boosting approach to machine learning: An overview
- Schapire
- 2002
Citation Context: ...s, variable selection, software. 1. INTRODUCTION Freund and Schapire's AdaBoost algorithm for classification [29, 30, 31] has attracted much attention in the machine learning community (cf. [76], and the references therein) as well as in related areas in statistics [15, 16, 33]. Various versions of the AdaBoost algorithm have proven to be very competitive in terms of prediction acc...

304 | Cryptographic limitations on learning Boolean formulae and finite automata
- Kearns, Valiant
- 1994
Citation Context: ...4 Historical Remarks The idea of boosting as an ensemble method for improving the predictive performance of a base procedure seems to have its roots in machine learning. Kearns and Valiant [52] proved that if individual classifiers perform at least slightly better than guessing at random, their predictions can be combined and averaged, yielding much better predictions. Later, Schapire ...

254 | Soft margins for AdaBoost
- Rätsch, Onoda, et al.
- 2001
Citation Context: ...tion 4), but refers to the fact that boosting is an additive (in fact, a linear) combination of "simple" (function) estimators. Also Mason et al. [62] and Rätsch, Onoda and Müller [70] developed related ideas which were mainly acknowledged in the machine learning community. In Hastie, Tibshirani and Friedman [42], additional views on boosting are given; ...

250 | The adaptive Lasso and its oracle properties
- Zou
- 2006
Citation Context: ...on optimal tuned Lasso selects a submodel which contains the true model with high probability. A nice proposal to correct Lasso's overestimation behavior is the adaptive Lasso, given by Zou [96]. It is based on ... where β̂_init is an initial estimator, for example, the Lasso (from a first stage of Lasso estimation). Consistency of the adaptive Lasso for variable selection has been proved for th...

224 | R: A Language and Environment for Statistical Computing
- R Development Core Team
- 2006
Citation Context: ...ing statistical models, we look at the methodology from a practical point of view as well. The dedicated add-on package mboost ("model-based boosting," [43]) to the R system for statistical computing [69] implements computational tools which enable the data analyst to compute on the theoretical concepts explained in this paper as closely as possible. The illustrations presented throughout the paper fo...

222 | On Model Selection Consistency of Lasso
- Zhao, Yu
- 2006
Citation Context: ...≠ 0) with probability tending quickly to 1 as n → ∞ [65]. It is important to note the role of the sufficient and "almost" necessary condition of the Lasso for model selection: Zhao and Yu [94] call it the "irrepresentable condition" which has (mainly) implications on the "degree of collinearity" of the design (predictor variables), and they give examples where it holds and where it fails t...

192 | Predicting the Clinical Status of Human Breast Cancer by Using Gene Expression Profiles
- West, Blanchette, et al.
- 2001
Citation Context: ...ally important in high-dimensional situations. As an example, we study a binary classification problem involving p = 7129 gene expression levels in n = 49 breast cancer tumor samples (data taken from [90]). For each sample, a binary response variable describes the lymph node status (25 negative and 24 positive). The data are stored in form of an exprSet object westbc (see [35]) and we first extract th...

178 | Shape quantization and recognition with randomized trees
- Amit, Geman
- 1997
Citation Context: ...a regression tree. Then, generating an ensemble from the base procedures, that is, an ensemble of function estimates or predictions, works generally as follows: reweighted data 1 → base procedure → ĝ^[1](·); reweighted data 2 → base procedure → ĝ^[2](·); ...; reweighted data M → base procedure → ĝ^[M](·); aggregation: f̂_A(·) = ∑_{m=1}^M α_m ĝ^[m](·). What is termed here as "reweighted da...

162 | A new approach to variable selection in least squares problems
- Osborne, Presnell, et al.
- 2000
Citation Context: ...LARS) algorithm from Efron et al. [28] (see also [68] for generalized linear models). The latter is very similar to the algorithm proposed earlier by Osborne, Presnell and Turlach [67]. In special cases where the design matrix satisfies a "positive cone condition," FSLR, Lasso and LARS all coincide ([28], page 425). For more general situations, when adding some backward steps to bo...

142 | Functional gradient techniques for combining hypotheses
- Mason, Baxter, et al.
- 1999
Citation Context: ...h is additive in the covariates (see our Section 4), but refers to the fact that boosting is an additive (in fact, a linear) combination of "simple" (function) estimators. Also Mason et al. [62] and Rätsch, Onoda and Müller [70] developed related ideas which were mainly acknowledged in the machine learning community. In Hastie, Tibshirani and Friedman [42], additional vie...

137 | Prediction games and arcing algorithms
- Breiman
- 1999
Citation Context: ...thm for classification [29, 30, 31] has attracted much attention in the machine learning community (cf. [76], and the references therein) as well as in related areas in statistics [15, 16, 33]. Various versions of the AdaBoost algorithm have proven to be very competitive in terms of prediction accuracy in a ...

127 | Boosting for tumor classification with gene expression data
- Dettling, Bühlmann
- 2003
Citation Context: ...e for modeling observational data, are proposed in [63]. There are numerous applications of boosting methods to real data problems. We mention here classification of tumor types from gene expressions [25, 26], multivariate financial time series [2, 3, 4], text classification [78], document routing [50] or survival analysis [8] (different from the approach in Section 8). 9.2 Asymptotic Theory The asymptoti...

121 | Convexity, classification, and risk bounds
- Bartlett, Jordan, et al.
- 2006

121 | Boosting with the l2-loss: Regression and classification
- Bühlmann, Yu
Citation Context: ...i [33] opened new perspectives, namely to use boosting methods in many other contexts than classification. We mention here boosting methods for regression (including generalized regression) [22, 32, 71], for density estimation [73], for survival analysis [45, 71] or for multivariate analysis [33, 59]. In quite a few of these proposals, boosting is not only a black-box prediction tool but also an est...

113 | Empirical margin distributions and bounding the generalization error of combined classifiers
- Koltchinskii, Panchenko
- 2002
Citation Context: ...the generalization misclassification error for classifiers in the convex hull of a base procedure. This bound for the misclassification error has been improved by Koltchinskii and Panchenko [53], deriving also a generalization bound for AdaBoost which depends on the number of boosting iterations. It has been argued in [33], rejoinder, and [21] that the overfitting resistance (slow overfittin...

111 | An introduction to boosting and leveraging
- Meir, Rätsch
- 2003
Citation Context: ...WORKS We briefly summarize here some other works which have not been mentioned in the earlier sections. A very different exposition than ours is the overview of boosting by Meir and Rätsch [66]. 9.1 Methodology and Applications Boosting methodology has been used for various other statistical models than what we have discussed in the previous sections. Models for multivariate responses are s...

102 | Better subset regression using the nonnegative garrote
- Breiman
- 1995
Citation Context: ...old (dotted-dashed), soft-threshold (dotted) and adaptive Lasso (solid) estimator in a linear model with orthonormal design. For this design, the adaptive Lasso coincides with the nonnegative garrote [13]. The value on the abscissa, denoted by z, is a single component of X⊤Y. ... penalty parameter λ in (5.5) such that the mean squared error is minimal, the probability for estimating the true submodel ...

90 | Least Angle Regression (with discussion)
- Efron, Hastie, et al.
- 2004
Citation Context: ...desktop computer (Intel Pentium 4, 2.8 GHz). Thus, this form of estimation and variable selection is computationally very efficient. As a comparison, computing all Lasso solutions, using package lars [28, 39] in R (with use.Gram = FALSE), takes about 6.7 seconds. The question how to choose mstop can be addressed by the classical AIC criterion as follows: R> aic <- AIC(west_glm, method = "classical") R> msto...

87 | Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion
- Hurvich, Simonoff, et al.
- 1998
Citation Context: ...ting Having some degrees of freedom at hand, we can now use information criteria for estimating a good stopping iteration, without pursuing some sort of cross-validation. We can use the corrected AIC [49]: AICc(m) = log(σ̂²) + (1 + df(m)/n) / (1 − (df(m) + 2)/n), with σ̂² = n⁻¹ ∑_{i=1}^n (Y_i − (B_m Y)_i)². In mboost, the corrected AIC criterion can be computed via AIC(x, method = "corrected") (with x being an o...
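The corrected AIC quoted in the excerpt above, AICc(m) = log(σ̂²) + (1 + df(m)/n) / (1 − (df(m) + 2)/n), is straightforward to compute from the residuals and degrees of freedom of a fit. A minimal Python sketch (the helper name is ours, illustrative only, not mboost's AIC() method):

```python
import numpy as np

def aicc(y, fitted, df):
    """Corrected AIC of Hurvich, Simonoff and Tsai:
    log(sigma^2) + (1 + df/n) / (1 - (df + 2)/n),
    where sigma^2 is the mean squared residual of the fit after some
    number of boosting iterations and df its degrees of freedom."""
    y = np.asarray(y, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    n = y.size
    sigma2 = np.mean((y - fitted) ** 2)
    return np.log(sigma2) + (1.0 + df / n) / (1.0 - (df + 2.0) / n)
```

Evaluating this over a grid of boosting iterations m (with the corresponding fitted values and df(m)) and taking the minimizer gives a data-driven stopping iteration without cross-validation.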

65 | On the bayes-risk consistency of regularized boosting methods
- Lugosi, Vayatis
- 2004
Citation Context: ...sasco and Caponnetto [91] and Bissantz et al. [10]. In the machine learning community, there has been a substantial focus on estimation in the convex hull of function classes (cf. [5, 6, 58]). For example, one may want to estimate a regression or probability function by using ∑_{k=1}^∞ ŵ_k ĝ^[k](·), with ŵ_k ≥ 0 and ∑_{k=1}^∞ ŵ_k = 1, where the ĝ^[k](·)'s belong to a function class such as stumps or...

50 | Vector greedy algorithms
- Lutoborski, Temlyakov
- 2003
Citation Context: ...t squares solution which is unique if the design matrix has full rank p ≤ n. The method is also known as matching pursuit in signal processing [60], weak greedy algorithm in computational mathematics [81], and it is a Gauss–Southwell algorithm [79] for solving a linear system of equations. We will discuss more properties of L2Boosting with componentwise linear least squares in Section 5.2. When using ...

46 | On the rate of convergence of regularized boosting classifiers
- Blanchard, Lugosi, et al.
Citation Context: ...of such convex combination or ℓ1-regularized "boosting" methods has been given by Lugosi and Vayatis [58]. Mannor, Meir and Zhang [61] and Blanchard, Lugosi and Vayatis [12] derived results for rates of convergence of (versions of) convex combination schemes. APPENDIX A.1: SOFTWARE The data analyses presented in this paper have been performed using the mboost add-on pack...

46 | Methods for Censored Longitudinal Data and Causality
- van der Laan
Citation Context: ...we make a restrictive assumption that C_i is conditionally independent of T_i given X_i (and we assume independence among different indices i); this implies that the coarsening at random assumption holds [89]. We consider the squared error loss for the complete data, ρ(y, f) = |y − f|² (without the irrelevant factor 1/2). For the observed data, the following weighted version turns out to be useful: ρ_obs(o...

45 | Unbiased recursive partitioning: A conditional inference framework
- HOTHORN, HORNIK, et al.
- 2006
Citation Context: ...trees handle covariates measured at different scales (continuous, ordinal or nominal variables) in a unified way; unbiased split or variable selection in the context of different scales is proposed in [47]. (Fig. 3. bodyfat data: Partial contributions of four covariates in an additive model, without centering of estimated functions to mean zero.) When using stumps, that is, ...

42 | Boosting with early stopping: convergence and consistency, Annals of Statistics 33
- Zhang, Yu
- 2005
Citation Context: ...tency result for AdaBoost has been given by Jiang [51], and a different constructive proof with a range for the stopping value mstop = mstop,n is given in [7]. Later, Zhang and Yu [92] generalized the results for a functional gradient descent with an additional relaxation scheme, and their theory covers also more general loss functions than the exponential loss in AdaBoost. For L2B...

39 | Boosting for high-dimensional linear models
- Bühlmann
Citation Context: ...analysis [33, 59]. In quite a few of these proposals, boosting is not only a black-box prediction tool but also an estimation method for models with a specific structure such as linearity or additivity [18, 22, 45]. Boosting can then be seen as an interesting regularization scheme for estimating a model. This statistical perspective will drive the focus of our exposition of boosting. We present here some cohere...

38 | Adaptive Lasso for sparse high-dimensional regression models
- Huang, Ma, et al.
- 2008
Citation Context: ...of Lasso estimation). Consistency of the adaptive Lasso for variable selection has been proved for the case with fixed predictor dimension p [96] and also for the high-dimensional case with p = pn ≫ n [48]. We do not expect that boosting is free from the difficulties which occur when using the Lasso for variable selection. The hope is, though, that also boosting would produce an interesting set of subm...

37 | Generalized additive models (with discussion)
- Hastie, Tibshirani
- 1986
Citation Context: ...h, have been selected by the boosting algorithm as well. 4.2 Componentwise Smoothing Spline for Additive Models Additive and generalized additive models, introduced by Hastie and Tibshirani [40] (see also [41]), have become very popular for adding more flexibility to the linear structure in generalized linear models. Such flexibility can also be added in boosting (whose framework is especial...

34 | An ℓ1 regularization-path algorithm for generalized linear models
- Park, Hastie
Citation Context: ...quivalence is derived by representing FSLR and Lasso as two different modifications of the computationally efficient least angle regression (LARS) algorithm from Efron et al. [28] (see also [68] for generalized linear models). The latter is very similar to the algorithm proposed earlier by Osborne, Presnell and Turlach [67]. In special cases where the design matrix satisfies a "pos...

33 | Loss functions for binary class probability estimation and classification: Structure and applications
- Buja, Stuetzle, et al.
- 2005
Citation Context: ...class(). All loss functions mentioned for binary classification (displayed in Figure 1) can be viewed and interpreted from the perspective of proper scoring rules; cf. Buja, Stuetzle and Shen [24]. We usually prefer the negative log-likelihood loss in (3.1) because: (i) it yields probability estimates; (ii) it is a monotone loss function of the margin value ỹf; (iii) it grows linearly as the ...

32 | Propensity Score Estimation with Boosted Regression for Evaluating Causal Effects in Observational Studies
- McCaffrey, Ridgeway, et al.
- 2004

29 | Arcing classifiers (with discussion)
- Breiman
- 1998
Citation Context: ...thm for classification [29, 30, 31] has attracted much attention in the machine learning community (cf. [76], and the references therein) as well as in related areas in statistics [15, 16, 33]. Various versions of the AdaBoost algorithm have proven to be very competitive in terms of prediction accuracy in a ...