Additive Logistic Regression: a Statistical View of Boosting (1998) [596 citations — 14 self]

by Jerome Friedman, Trevor Hastie, Robert Tibshirani
Annals of Statistics

Abstract:

Boosting (Freund & Schapire 1995) is one of the most important recent developments in classification methodology. Boosting works by sequentially applying a classification algorithm to reweighted versions of the training data, and then taking a weighted majority vote of the sequence of classifiers thus produced. For many classification algorithms, this simple strategy results in dramatic improvements in performance. We show that this seemingly mysterious phenomenon can be understood in terms of well known statistical principles, namely additive modeling and maximum likelihood. For the two-class problem, boosting can be viewed as an approximation to additive modeling on the logistic scale using maximum Bernoulli likelihood as a criterion. We develop more direct approximations and show that they exhibit nearly identical results to boosting. Direct multi-class generalizations based on multinomial likelihood are derived that exhibit performance comparable to other recently proposed multi-cl...
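
The reweight-and-vote procedure described in the abstract can be sketched as follows. This is a minimal illustration of discrete AdaBoost, not the authors' code: it assumes labels coded as -1/+1 and uses one-feature threshold rules ("stumps") as the base classifier, and the function names (`fit_stump`, `adaboost_fit`, `adaboost_predict`) are hypothetical names chosen for this example.

```python
import numpy as np

def fit_stump(X, y, w):
    """Exhaustively pick the (feature, threshold, sign) stump with the
    lowest weighted training error. Labels y are assumed to be -1/+1."""
    best = (0, 0.0, 1, np.inf)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            base = np.where(X[:, j] <= t, -1, 1)
            for s in (1, -1):
                err = np.sum(w[(s * base) != y])
                if err < best[3]:
                    best = (j, t, s, err)
    return best

def stump_predict(X, stump):
    j, t, s = stump
    return s * np.where(X[:, j] <= t, -1, 1)

def adaboost_fit(X, y, n_rounds=10):
    """Fit stumps sequentially to reweighted versions of the training data."""
    w = np.full(len(y), 1.0 / len(y))       # start from uniform weights
    ensemble = []
    for _ in range(n_rounds):
        j, t, s, err = fit_stump(X, y, w)
        err = min(max(err, 1e-12), 1 - 1e-12)   # guard against log(0)
        alpha = 0.5 * np.log((1 - err) / err)   # vote weight of this stump
        pred = stump_predict(X, (j, t, s))
        w = w * np.exp(-alpha * y * pred)       # upweight misclassified points
        w = w / w.sum()
        ensemble.append((alpha, (j, t, s)))
    return ensemble

def adaboost_predict(X, ensemble):
    """Weighted majority vote over the sequence of stumps."""
    score = sum(alpha * stump_predict(X, stump) for alpha, stump in ensemble)
    return np.where(score >= 0, 1, -1)

# Toy data: the positive class is an interval, which no single stump can fit.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([-1, -1, 1, 1, -1, -1])
model = adaboost_fit(X, y, n_rounds=3)
print(adaboost_predict(X, model))
```

The weighted vote is the sign of an additive function of the inputs, the sum of `alpha * stump(x)` over rounds, which is the structure the abstract relates to additive modeling.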

Citations

2573 Classification and Regression Trees – Breiman, Friedman, et al. - 1984
1565 Bagging predictors – Breiman - 1996
1205 A decision-theoretic generalization of on-line learning and an application to boosting – Freund, Schapire - 1997
1045 Experiments with a new boosting algorithm – Freund, Schapire - 1996
691 Generalized Additive Models – Hastie, Tibshirani - 1990
615 Generalized linear models – Nelder, Wedderburn - 1972
500 Boosting the margin: A new explanation for the effectiveness of voting methods – Schapire, Freund, et al. - 1998
457 The strength of weak learnability – Schapire - 1990
431 Matching pursuits with time-frequency dictionaries – Mallat, Zhang - 1993
422 An Introduction to Computational Learning Theory – Kearns, Vazirani - 1994
330 Very simple classification rules perform well on most commonly used datasets – Holte - 1993
300 An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization – Dietterich
294 Boosting a Weak Learning Algorithm by Majority – Freund - 1995
268 Projection Pursuit Regression – Friedman, Stuetzle - 1981
168 Multivariate adaptive regression splines (with discussion) – Friedman - 1991
96 Prediction games and arcing algorithms – Breiman - 1999
86 Bias, variance and arcing classifiers – Breiman - 1996
80 Another Approach to Polychotomous Classification – Friedman - 1996
71 Flexible discriminant analysis by optimal scoring – Hastie, Tibshirani, et al. - 1994
39 Linear smoothers and additive models (with discussion) – Buja, Hastie, et al. - 1989
11 Classification by pairwise coupling – Hastie, Tibshirani - 1998
5 Nearest neighbor pattern classification – Cover - 1967