Results 1 - 8 of 8
Soft Margins for AdaBoost
, 1998
Cited by 199 (22 self)
Recently, ensemble methods like AdaBoost were successfully applied to character recognition tasks, seemingly defying the problems of overfitting. This paper shows that although AdaBoost rarely overfits in the low-noise regime, it clearly does so for higher noise levels. Central to understanding this fact is the margin distribution, and we find that AdaBoost achieves -- doing gradient descent in an error function with respect to the margin -- asymptotically a hard margin distribution, i.e. the algorithm concentrates its resources on a few hard-to-learn patterns (here an interesting overlap with Support Vectors emerges). This is clearly a sub-optimal strategy in the noisy case, and regularization, i.e. a mistrust in the data, must be introduced in the algorithm to alleviate the distortions that difficult patterns (e.g. outliers) can cause to the margin distribution. We propose several regularization methods and generalizations of the original AdaBoost algorithm to achieve a soft margin -- a ...
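The margin behaviour described in this abstract can be illustrated with a small sketch. The code below is not the paper's implementation: it is a plain textbook AdaBoost with one-dimensional decision stumps, and the toy data set, the helper names (`stump_predict`, `best_stump`, `adaboost`, `margins`) and the clamping constant are assumptions made for illustration. It computes the normalized margin y_i * sum_t(alpha_t * h_t(x_i)) / sum_t(alpha_t) of each training pattern and returns the final pattern weights, so one can inspect how much weight ends up on the hard-to-learn points.

```python
import math


def stump_predict(x, threshold, sign):
    """Decision stump: predict `sign` above the threshold, `-sign` otherwise."""
    return sign if x > threshold else -sign


def best_stump(xs, ys, weights):
    """Exhaustively pick the (threshold, sign) pair with the lowest weighted error."""
    sorted_x = sorted(set(xs))
    thresholds = [sorted_x[0] - 1.0]
    thresholds += [(a + b) / 2 for a, b in zip(sorted_x, sorted_x[1:])]
    best = None
    for threshold in thresholds:
        for sign in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, threshold, sign) != y)
            if best is None or err < best[0]:
                best = (err, threshold, sign)
    return best


def adaboost(xs, ys, rounds):
    """Textbook AdaBoost; returns the ensemble and the final pattern weights."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, threshold, sign = best_stump(xs, ys, weights)
        # Clamp the error so the logarithm stays finite (illustrative choice).
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, threshold, sign))
        # Misclassified patterns are up-weighted, correct ones down-weighted.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, sign))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble, weights


def margins(xs, ys, ensemble):
    """Normalized margin of each pattern; values lie in [-1, 1]."""
    norm = sum(alpha for alpha, _, _ in ensemble)
    return [y * sum(a * stump_predict(x, t, s) for a, t, s in ensemble) / norm
            for x, y in zip(xs, ys)]


# Hypothetical toy data: the labels at x=3 and x=4 look swapped (label noise).
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [-1, -1, -1, 1, -1, 1, 1, 1]

ensemble, final_weights = adaboost(xs, ys, rounds=20)
print("margins:", [round(m, 3) for m in margins(xs, ys, ensemble)])
print("weights:", [round(w, 3) for w in final_weights])
```

After a single round on this toy data, the one misclassified pattern already carries half of the total weight, which is the standard AdaBoost reweighting property and a small-scale view of the concentration on difficult patterns that the abstract discusses.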
An introduction to boosting and leveraging
- Advanced Lectures on Machine Learning, LNCS
, 2003
Barrier Boosting
Cited by 17 (7 self)
Boosting algorithms like AdaBoost and Arc-GV are iterative strategies to minimize a constrained objective function, equivalent to Barrier algorithms.
Sparse Regression Ensembles in Infinite and Finite Hypothesis Spaces
, 2000
Cited by 11 (7 self)
We examine methods for constructing regression ensembles based on a linear program (LP). The ensemble regression function consists of linear combinations of base hypotheses generated by some boosting-type base learning algorithm. Unlike the classification case, for regression the set of possible hypotheses producible by the base learning algorithm may be infinite. We explicitly tackle the issue of how to define and solve ensemble regression when the hypothesis space is infinite. Our approach is based on a semi-infinite linear program that has an infinite number of constraints and a finite number of variables. We show that the regression problem is well posed for infinite hypothesis spaces in both the primal and dual spaces. Most importantly, we prove there exists an optimal solution to the infinite hypothesis space problem consisting of a finite number of hypotheses. We propose two algorithms for solving the infinite and finite hypothesis problems. One uses a column generation simplex-type algorithm and the other adopts an exponential barrier approach. Furthermore, we give sufficient conditions for the base learning algorithm and the hypothesis set to be used for infinite regression ensembles. Computational results show that these methods are extremely promising.
Boosting Methods for Regression
- Machine Learning
, 200
Cited by 10 (0 self)
In this paper we examine ensemble methods for regression that leverage or “boost” base regressors by iteratively calling them on modified samples. The most successful leveraging algorithm for classification is AdaBoost, an algorithm that requires only modest assumptions on the base learning method for its strong theoretical guarantees. We present several gradient descent leveraging algorithms for regression and prove AdaBoost-style bounds on their sample errors using intuitive assumptions on the base learners. We bound the complexity of the regression functions produced in order to derive PAC-style bounds on their generalization errors. Experiments validate our theoretical results.
Robust Regression by Boosting the Median
Cited by 4 (4 self)
Most boosting regression algorithms use the weighted average of base regressors as their final regressor. In this paper we analyze the choice of the weighted median. We propose a general boosting algorithm based on this approach.
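The difference between the two ways of combining base regressors can be sketched in a few lines. This is only an illustration of the weighted median as a combination rule, not the boosting algorithm proposed in the paper; the function names and the sample predictions and weights are hypothetical. The weighted median here is taken as the smallest prediction whose cumulative weight reaches half of the total weight.

```python
def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half of the total weight."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and equally long")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    cumulative = 0.0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= total / 2:
            return value
    return max(values)  # only reached through floating-point round-off


def weighted_mean(values, weights):
    """Weighted average, shown for comparison."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical predictions of four base regressors for a single input;
# the last regressor is far off but carries a small weight.
predictions = [1.0, 1.2, 0.9, 50.0]
alphas = [0.3, 0.3, 0.3, 0.1]

print("weighted median:", weighted_median(predictions, alphas))
print("weighted mean:  ", weighted_mean(predictions, alphas))
```

In this example the single outlying prediction pulls the weighted mean far from the other three values, while the weighted median stays among them, which is the robustness property suggested by the title.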
Boosting Regression via Classification
, 1998
Boosting strategies are methods of improving the accuracy of a prediction (a classification rule) by combining many "weaker" predictions, each of which is only moderately accurate. In this paper we present a concise analysis of Freund and Schapire's AdaBoost algorithm [FS97], from which we derive a new boosting strategy for the regression case which is an extension of the algorithm discussed in [BCP97]. 1 Boosting classification. Classification refers in general to the problem of predicting a label in a finite set L for each element of a set I of instances according to some relationship between instances and labels that can be thought of as an (unknown) deterministic mapping I → L or as a joint probability distribution over I × L; a prediction is then a mapping I → L whose error is some measure of the discrepancy between predicted and intended label (according to the unknown mapping or the joint distribution). When the cardinality of L is 2, the classification...
Designing a Context Dependent Movie Recommender: A Hierarchical Bayesian Approach
© Daniel Pomerantz, 2009. ACKNOWLEDGEMENTS: I would like to thank everyone who helped during my graduate studies. I want to thank my supervisor Gregory Dudek for all his support, encouragement, and general kindness. I’d also like to thank all the members of the Mobile Robotics Lab who were very helpful whenever I had a problem, be it math, computers, or otherwise. Finally, I’d like to thank my family and friends for all their support throughout my graduate studies and life. In this thesis, we analyze a context-dependent movie recommendation system using a Hierarchical Bayesian Network. Unlike most other recommender systems, which either do not consider context or do so using collaborative filtering, our approach is content-based. This allows users to individually interpret contexts or invent their own contexts and continue to get good recommendations. By using a Hierarchical Bayesian Network, we can provide context recommendations when users have only provided a small amount of information about their preferences per context. At the

