## Boosting Algorithms as Gradient Descent (2000)

### Download Links

- [www.csail.mit.edu]
- [www.mcs.vuw.ac.nz]
- [www.lsmason.com]
- [www.cs.cmu.edu]
- [wwwcrasys.anu.edu.au]
- DBLP

### Other Repositories/Bibliography

Citations: 118 (1 self)

### BibTeX

@MISC{Mason00boostingalgorithms,
  author = {Llew Mason and Jonathan Baxter and Peter Bartlett and Marcus Frean},
  title = {Boosting Algorithms as Gradient Descent},
  year = {2000}
}

### Abstract

Much recent attention, both experimental and theoretical, has been focused on classification algorithms which produce voted combinations of classifiers. Recent theoretical work has shown that the impressive generalization performance of algorithms like AdaBoost can be attributed to the classifier having large margins on the training data. We present an abstract algorithm for finding linear combinations of functions that minimize arbitrary cost functionals (i.e., functionals that do not necessarily depend on the margin). Many existing voting methods can be shown to be special cases of this abstract algorithm. Then, following previous theoretical results bounding the generalization performance of convex combinations of classifiers in terms of general cost functions of the margin, we present a new algorithm (DOOM II) for performing a gradient descent optimization of such cost functions. Experiments on …
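The abstract's gradient-descent view of boosting can be sketched in a few lines. The following is an illustrative reconstruction, not the paper's exact AnyBoost/DOOM II implementation: it uses the exponential margin cost (which recovers AdaBoost-like weighting), axis-orthogonal decision stumps as base classifiers (as in the paper's experiments), and a fixed step size in place of the paper's line search. The names `stump_predictions` and `anyboost` are made up for this example.

```python
import numpy as np

def stump_predictions(X):
    """Enumerate axis-orthogonal decision stumps as +/-1 prediction vectors."""
    n, d = X.shape
    preds = []
    for j in range(d):
        for thresh in np.unique(X[:, j]):
            p = np.where(X[:, j] <= thresh, 1.0, -1.0)
            preds.append(p)    # stump and its negation
            preds.append(-p)
    return np.array(preds)     # shape (num_stumps, n)

def anyboost(X, y, rounds=20, step=0.5):
    """Gradient descent in function space on sum_i exp(-y_i F(x_i)).

    y has entries in {-1, +1}. Returns the margins y_i * F(x_i)
    of the resulting voted combination F.
    """
    H = stump_predictions(X)
    F = np.zeros(len(y))       # combined function values F(x_i)
    for _ in range(rounds):
        # -dC/dF(x_i) for C = sum_i exp(-y_i F(x_i)) is y_i * exp(-y_i F(x_i))
        w = np.exp(-y * F)
        # pick the base classifier h maximizing the inner product <h, -grad C>
        scores = H @ (y * w)
        h = H[np.argmax(scores)]
        F = F + step * h       # fixed step; the paper uses a line search
    return y * F
```

On a linearly separable toy set, the margins become positive within a few rounds, illustrating the "large margin" behavior the abstract refers to.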

### Citations

3085 |
UCI repository of machine learning databases
- Blake, Merz
- 1998
Citation Context ...ithm the reader is referred to the full version of this paper [15]. We compared the performance of DOOM II and AdaBoost on a selection of nine data sets taken from the UCI machine learning repository [4] to which various levels of label noise had been applied. To simplify matters, only binary classification problems were considered. For all of the experiments axis orthogonal hyperplanes (also known as...

2765 | Bagging predictors
- Breiman
- 1996
Citation Context ...oduction There has been considerable interest recently in voting methods for pattern classification, which predict the label of a particular example using a weighted vote over a set of base classifiers [10, 2, 6, 9, 16, 5, 3, 19, 12, 17, 7, 11, 8]. Recent theoretical results suggest that the effectiveness of these algorithms is due to their tendency to produce large margin classifiers [1, 18]. Loosely speaking, if a combination of classifiers cor...

2492 | A decision-theoretic generalization of on-line learning and an application to boosting
- Freund, Schapire
- 1995
Citation Context ... voting methods for pattern classification, which predict the label of a particular example using a weighted vote over a set of base classifiers. For example, Freund and Schapire's AdaBoost algorithm [10] and Breiman's Bagging algorithm [2] have been found to give significant performance improvements over algorithms for the corresponding base classifiers [6, 9, 16, 5], and have led to the study of man...

1774 | Experiments with a new boosting algorithm
- Freund, Schapire
- 1996
Citation Context ..., Freund and Schapire's AdaBoost algorithm [10] and Breiman's Bagging algorithm [2] have been found to give significant performance improvements over algorithms for the corresponding base classifiers [6, 9, 16, 5], and have led to the study of many related algorithms [3, 19, 12, 17, 7, 11, 8]. Recent theoretical results suggest that the effectiveness of these algorithms is due to their tendency to produce larg...

1320 | Additive logistic regression: a statistical view of boosting
- Friedman, Hastie, et al.
Citation Context ...g algorithm [2] have been found to give significant performance improvements over algorithms for the corresponding base classifiers [6, 9, 16, 5], and have led to the study of many related algorithms [3, 19, 12, 17, 7, 11, 8]. Recent theoretical results suggest that the effectiveness of these algorithms is due to their tendency to produce large margin classifiers. The margin of an example is defined as the difference bet...

771 | Boosting the margin: a new explanation of effectiveness of voting methods
- Schapire, Freund, et al.
- 1998
Citation Context ...ence of correct classification: an example is classified correctly if and only if it has a positive margin, and a larger margin can be viewed as a confident correct classification. Results in [1] and [18] show that, loosely speaking, if a combination of classifiers correctly classifies most of the training data with a large margin, then its error probability is small. In [14], Mason, Bartlett and Baxt...
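For binary problems, the margin notion these excerpts rely on has a standard explicit form; the following is the usual definition, stated here for clarity rather than quoted from the paper:

```latex
% Voted combination of base classifiers f_t(x) \in \{-1,+1\}
% with convex weights w_t:
F(x) = \sum_t w_t f_t(x), \qquad w_t \ge 0, \quad \sum_t w_t = 1.
% Margin of a labeled example (x, y), with y \in \{-1,+1\}:
\mathrm{margin}(x, y) = y\,F(x) \in [-1, 1].
```

The example is classified correctly if and only if its margin is positive, and the margin equals the total weight voting for the correct label minus the total weight voting for the incorrect one, which is why a large margin can be read as a confident correct classification.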

735 | Improved boosting algorithm using confidence-rated predictions
- Schapire, Singer
- 1998
Citation Context ...g algorithm [2] have been found to give significant performance improvements over algorithms for the corresponding base classifiers [6, 9, 16, 5], and have led to the study of many related algorithms [3, 19, 12, 17, 7, 11, 8]. Recent theoretical results suggest that the effectiveness of these algorithms is due to their tendency to produce large margin classifiers. The margin of an example is defined as the difference bet...

623 | Greedy function approximation: a gradient boosting machine
- Friedman
- 2001
Citation Context ...g algorithm [2] have been found to give significant performance improvements over algorithms for the corresponding base classifiers [6, 9, 16, 5], and have led to the study of many related algorithms [3, 19, 12, 17, 7, 11, 8]. Recent theoretical results suggest that the effectiveness of these algorithms is due to their tendency to produce large margin classifiers. The margin of an example is defined as the difference bet...

582 | An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting and Variants - Bauer, Kohavi - 1999 |

472 | An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees
- Dietterich
- 1999
Citation Context ...oduction There has been considerable interest recently in voting methods for pattern classification, which predict the label of a particular example using a weighted vote over a set of base classifiers [10, 2, 6, 9, 16, 5, 3, 19, 12, 17, 7, 11, 8]. Recent theoretical results suggest that the effectiveness of these algorithms is due to their tendency to produce large margin classifiers [1, 18]. Loosely speaking, if a combination of classifiers cor...

287 | Bagging, boosting, and c4.5
- Quinlan
- 1996
Citation Context ..., Freund and Schapire's AdaBoost algorithm [10] and Breiman's Bagging algorithm [2] have been found to give significant performance improvements over algorithms for the corresponding base classifiers [6, 9, 16, 5], and have led to the study of many related algorithms [3, 19, 12, 17, 7, 11, 8]. Recent theoretical results suggest that the effectiveness of these algorithms is due to their tendency to produce larg...

274 | Soft margins for AdaBoost
- Rätsch, Onoda, et al.
- 2001

182 | The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network
- Bartlett
- 1998
Citation Context ...e confidence of correct classification: an example is classified correctly if and only if it has a positive margin, and a larger margin can be viewed as a confident correct classification. Results in [1] and [18] show that, loosely speaking, if a combination of classifiers correctly classifies most of the training data with a large margin, then its error probability is small. In [14], Mason, Bartlett...

146 | Prediction games and arcing algorithms
- Breiman
- 1999
Citation Context ...oduction There has been considerable interest recently in voting methods for pattern classification, which predict the label of a particular example using a weighted vote over a set of base classifiers [10, 2, 6, 9, 16, 5, 3, 19, 12, 17, 7, 11, 8]. Recent theoretical results suggest that the effectiveness of these algorithms is due to their tendency to produce large margin classifiers [1, 18]. Loosely speaking, if a combination of classifiers cor...

106 | Boosting in the limit: Maximizing the margin of learned ensembles
- Grove, Schuurmans
- 1998
Citation Context ...ce combinations involving very large numbers of classifiers. However, recent studies have shown that this is not the case, even for base classifiers as simple as decision stumps. Grove and Schuurmans [13] demonstrated that running AdaBoost for hundreds of thousands of rounds can lead to significant overfitting, while a number of authors (e.g., [5, 17]) showed that, by adding label noise, overfitting c...

92 |
Boosting decision trees
- Drucker, Cortes
- 1996
Citation Context ..., Freund and Schapire's AdaBoost algorithm [10] and Breiman's Bagging algorithm [2] have been found to give significant performance improvements over algorithms for the corresponding base classifiers [6, 9, 16, 5], and have led to the study of many related algorithms [3, 19, 12, 17, 7, 11, 8]. Recent theoretical results suggest that the effectiveness of these algorithms is due to their tendency to produce larg...

87 | An adaptive version of the boost by majority algorithm
- Freund
- 1999

87 | An empirical evaluation of bagging and boosting - Maclin, Opitz - 1997 |

69 | Improved generalization through explicit optimization of margins
- Mason, Bartlett, et al.
- 1998
Citation Context ...ation. Results in [1] and [18] show that, loosely speaking, if a combination of classifiers correctly classifies most of the training data with a large margin, then its error probability is small. In [14], Mason, Bartlett and Baxter have presented improved upper bounds on the misclassification probability of a combined classifier in terms of the average over the training data of a certain cost functio...

49 | Boosting algorithms as gradient descent in function space - Mason, Baxter, et al. - 1999 |

25 | Boosting the margin: a new explanation for the effectiveness of voting methods - Schapire, Freund, et al. - 1998 |

23 | A geometric approach to leveraging weak learners
- Duffy, Helmbold
- 1999

23 | Training methods for adaptive boosting of neural networks - Schwenk, Bengio - 1998 |

21 | Improved boosting algorithms using confidence-rated predictions - Schapire, Singer - 1999 |

14 |
The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network
- Bartlett
- 1998
Citation Context ...base classifiers [10, 2, 6, 9, 16, 5, 3, 19, 12, 17, 7, 11, 8]. Recent theoretical results suggest that the effectiveness of these algorithms is due to their tendency to produce large margin classifiers [1, 18]. Loosely speaking, if a combination of classifiers correctly classifies most of the training data with a large margin, then its error probability is small. In [14] we gave improved upper bounds on the ...

12 | The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network - Bartlett - 1998 |

1 | A geometric approach to leveraging weak learners - Duffy, Helmbold - 1999 |