Abstract:
Function approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest-descent minimization. A general gradient-descent "boosting" paradigm is developed for additive expansions based on any fitting criterion. Specific algorithms are presented for least-squares, least-absolute-deviation, and Huber-M loss functions for regression, and multi-class logistic likelihood for classification. Special enhancements are derived for the particular case where the individual additive components are decision trees, and tools for interpreting such "TreeBoost" models are presented. Gradient boosting of decision trees produces competitive, highly robust, interpretable procedures for regression and classification, especially appropriate for mining less-than-clean data. Connections between this approach and the boosting methods of Freund and Schapire 1996, and Friedman, Has...
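The stagewise idea summarized in the abstract can be illustrated with a minimal sketch for the least-squares case, where the negative gradient of the loss is simply the current residual. This is not the paper's algorithm listing: the one-split "stump" base learner, the function names, the `learning_rate` shrinkage parameter, and the toy data are all assumptions chosen to keep the example short and dependency-free.

```python
# Minimal sketch of least-squares gradient boosting on 1-D inputs.
# All names and defaults here are illustrative assumptions.

def fit_stump(xs, targets):
    """Least-squares fit of a one-split stump; assumes >= 2 distinct x values."""
    best = None
    distinct = sorted(set(xs))
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2.0
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_stages=50, learning_rate=0.1):
    """Build an additive model F(x) = f0 + learning_rate * sum of stumps."""
    f0 = sum(ys) / len(ys)  # constant that minimizes squared error
    preds = [f0] * len(xs)
    stumps = []
    for _ in range(n_stages):
        # Negative gradient of 0.5 * (y - F)^2 with respect to F is y - F.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return f0, stumps, learning_rate


def predict(model, x):
    f0, stumps, learning_rate = model
    return f0 + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


if __name__ == "__main__":
    xs = [float(i) for i in range(10)]
    ys = [x * x for x in xs]
    for stages in (0, 10, 50):
        print(stages, mse(gradient_boost(xs, ys, n_stages=stages), xs, ys))
```

Other losses would change only the line that computes `residuals` (for example, the sign of `y - p` for least absolute deviation), although a robust variant would usually also adjust how the leaf values are chosen.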
Citations
5044 | Statistical Learning Theory – Vapnik - 1998
3356 | C4.5: Programs for Machine Learning – Quinlan - 1993
2573 | Classification and Regression Trees – Breiman, Friedman, et al. - 1984
1045 | Experiments with a new boosting algorithm – Freund, Schapire - 1996
767 | Pattern Recognition and Neural Networks – Ripley - 1996
691 | Generalized Additive Models – Hastie, Tibshirani - 1990
596 | Additive logistic regression: a statistical view of boosting – Friedman, Hastie, et al. - 1998
485 | Learning representations by back-propagating errors – Rumelhart, Hinton, et al. - 1986
400 | Improved boosting algorithms using confidence-rated predictions – Schapire, Singer - 1999
211 | Radial basis functions for multivariable interpolation: a review. In Algorithms for approximation – Powell - 1987
168 | Multivariate adaptive regression splines (with discussion) – Friedman - 1991
159 | Soft margins for AdaBoost – Rätsch, Onoda, et al.
149 | Robust estimation of a location parameter – Huber - 1964
96 | Prediction games and arcing algorithms – Breiman - 1999
72 | Nonlinear wavelet methods for recovery of signals, densities, and spectra from indirect and noisy data – Donoho - 1993
48 | Improving regressors using boosting techniques – Drucker - 1997
45 | The Visual Design and Control of Trellis Displays – Becker, Cleveland, et al. - 1996
32 | Improved boosting algorithms using confidence-rated predictions – Schapire - 1999
29 | Pasting bites together for prediction in large data sets and on-line (Tech. Rep.) – Breiman - 1996
25 | Matching pursuit with time frequency dictionaries – Mallat, Zhang - 1993
16 | A geometric approach to leveraging weak learners – Duffy, Helmbold - 1999
10 | Regression, prediction and shrinkage (with discussion) – Copas - 1983
1 | Cr-pyrope garnets in lithospheric mantle – Griffin, Fisher, et al. - 1997
1 | A mathematical model for medical diagnosis: application to congenital heart disease – Warner, Toronto, et al. - 1961