Improved Boosting Algorithms Using Confidence-rated Predictions
- MACHINE LEARNING
, 1999
Cited by 561 (23 self)
We describe several improvements to Freund and Schapire’s AdaBoost boosting algorithm, particularly in a setting in which hypotheses may assign confidences to each of their predictions. We give a simplified analysis of AdaBoost in this setting, and we show how this analysis can be used to find improved parameter settings as well as a refined criterion for training weak hypotheses. We give a specific method for assigning confidences to the predictions of decision trees, a method closely related to one used by Quinlan. This method also suggests a technique for growing decision trees which turns out to be identical to one proposed by Kearns and Mansour. We focus next on how to apply the new boosting algorithms to multiclass classification problems, particularly to the multi-label case in which each example may belong to more than one class. We give two boosting methods for this problem, plus a third method based on output coding. One of these leads to a new method for handling the single-label case which is simpler but as effective as techniques suggested by Freund and Schapire. Finally, we give some experimental results comparing a few of the algorithms discussed in this paper.
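As a rough illustration of the confidence-rated setting described in this abstract, the sketch below performs one round of example reweighting. It assumes labels in {-1, +1} and real-valued predictions whose sign is the predicted label and whose magnitude is the confidence. The function names `optimal_alpha` and `reweight` are hypothetical, and the closed-form alpha applies only to the special case where predictions are restricted to {-1, +1}; it is not taken from the abstract itself.

```python
import math


def optimal_alpha(weights, labels, predictions):
    """Closed-form alpha when every prediction is exactly -1 or +1.

    Assumes the weighted correlation r lies strictly between -1 and 1;
    a perfect (or perfectly wrong) hypothesis would divide by zero here.
    """
    # r is the weighted correlation between labels and predictions.
    r = sum(w * y * h for w, y, h in zip(weights, labels, predictions))
    return 0.5 * math.log((1 + r) / (1 - r))


def reweight(weights, labels, predictions, alpha):
    """Return the next example distribution and its normalizer Z."""
    # Examples with y * h(x) < 0 (mistakes) gain weight; the rest lose it.
    unnormalized = [
        w * math.exp(-alpha * y * h)
        for w, y, h in zip(weights, labels, predictions)
    ]
    z = sum(unnormalized)
    return [u / z for u in unnormalized], z


weights = [0.25, 0.25, 0.25, 0.25]
labels = [1, 1, -1, -1]
predictions = [1, 1, -1, 1]  # the last example is misclassified
alpha = optimal_alpha(weights, labels, predictions)
new_weights, z = reweight(weights, labels, predictions, alpha)
```

In this toy run the single misclassified example ends up holding half of the total weight, which is the usual effect of choosing alpha to minimize the normalizer Z.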
FloatBoost Learning and Statistical Face Detection
- IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2004
Cited by 93 (3 self)
A novel learning procedure, called FloatBoost, is proposed for learning a boosted classifier for achieving the minimum error rate. FloatBoost learning uses a backtrack mechanism after each iteration of AdaBoost learning to minimize the error rate directly, rather than minimizing an exponential function of the margin as in the traditional AdaBoost algorithms. A second contribution of the paper is a novel statistical model for learning best weak classifiers using a stagewise approximation of the posterior probability. These novel techniques lead to a classifier which requires fewer weak classifiers than AdaBoost yet achieves lower error rates in both training and testing, as demonstrated by extensive experiments. Applied to face detection, the FloatBoost learning method, together with a proposed detector pyramid architecture, leads to the first real-time multiview face detection system reported.
CBSA: Content-based Soft Annotation for Multimodal Image Retrieval Using Bayes Point Machines
- IEEE Transactions on Circuits and Systems for Video Technology
, 2003
Cited by 71 (6 self)
We propose a content-based soft annotation (CBSA) procedure for providing images with semantical labels. The annotation procedure starts with labeling a small set of training images, each with one single semantical label (e.g., forest, animal, or sky). An ensemble of binary classifiers is then trained for predicting label membership for images. The trained ensemble is applied to each individual image to give the image multiple soft labels, and each label is associated with a label membership factor. To select a base binary-classifier for CBSA, we experiment with two learning methods, Support Vector Machines (SVMs) and Bayes Point Machines (BPMs), and compare their class-prediction accuracy. Our empirical study on a 116-category 25K-image set shows that the BPM-based ensemble provides better annotation quality than the SVM-based ensemble for supporting multimodal image retrievals. Keywords: Bayes Point Machines, Support Vector Machines, image annotation, multimodal image retrieval.
The application of AdaBoost for distributed, scalable and online learning
- Pages 362–366 of: SIGKDD Conference on Knowledge and Data Mining (KDD
, 1999
Cited by 25 (1 self)
We propose to use AdaBoost to efficiently learn classifiers over very large and possibly distributed data sets that cannot fit into main memory, as well as on-line learning where new data become available periodically. We propose two new ways to apply AdaBoost. The first allows the use of a small sample of the weighted training set to compute a weak hypothesis. The second approach involves using AdaBoost as a means to re-weight classifiers in an ensemble, and thus to reuse previously computed classifiers along with a new classifier computed on a new increment of data. These two techniques of using AdaBoost provide scalable, distributed and on-line learning. We discuss these methods and their implementation in JAM, an agent-based learning system. Empirical studies on four real-world and artificial data sets have shown results that are either comparable to or better than learning classifiers over the complete training set and, in some cases, are comparable to boosting on the complete data set. However, our algorithms use much smaller samples of the training set and require much less memory.
The Consistency of Greedy Algorithms for Classification
- In Proceedings of the 15th Annual Conference on Computational Learning Theory
, 2002
Cited by 16 (5 self)
We consider a class of algorithms for classification, which are based on sequential greedy minimization of a convex upper bound on the 0-1 loss function. A large class of recently popular algorithms falls within the scope of this approach, including many variants of Boosting algorithms. The basic question addressed in this paper relates to the statistical consistency of such approaches. We provide precise conditions which guarantee that sequential greedy procedures are consistent, and establish rates of convergence under the assumption that the Bayes decision boundary belongs to a certain class of smooth functions. The results are established using a form of regularization which constrains the search space at each iteration of the algorithm. In addition to providing general consistency results, we provide rates of convergence for smooth decision boundaries. A particularly interesting conclusion of our work is that Logistic function based Boosting provides faster rates of convergence than Boosting based on the exponential function used in AdaBoost.
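To make the phrase "convex upper bound on the 0-1 loss" concrete, the following sketch compares the 0-1 loss with an exponential and a logistic surrogate as functions of the margin y·f(x). The exact loss forms are assumptions chosen for illustration (in particular, the logistic loss is scaled with a base-2 logarithm so that it equals 1 at margin 0); the abstract does not specify them.

```python
import math


def zero_one_loss(margin):
    """1 for a mistake (margin <= 0), 0 for a correct prediction."""
    return 1.0 if margin <= 0 else 0.0


def exponential_loss(margin):
    """Exponential surrogate, the loss associated with AdaBoost."""
    return math.exp(-margin)


def logistic_loss(margin):
    """Logistic surrogate, scaled (by assumption) to equal 1 at margin 0."""
    return math.log2(1.0 + math.exp(-margin))


margins = [-2.0, -0.5, 0.0, 0.5, 2.0]
# Each row: (margin, 0-1 loss, exponential loss, logistic loss).
table = [
    (m, zero_one_loss(m), exponential_loss(m), logistic_loss(m))
    for m in margins
]
for row in table:
    print("margin=%5.1f  0-1=%.0f  exp=%.3f  logistic=%.3f" % row)
```

Both surrogates stay at or above the 0-1 loss at every margin, but the logistic loss grows only linearly for large negative margins while the exponential loss grows exponentially.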
Consistency of Random Forests and Other Averaging Classifiers
Cited by 9 (1 self)
In the last years of his life, Leo Breiman promoted random forests for use in classification. He suggested using averaging as a means of obtaining good discrimination rules. The base classifiers used for averaging are simple and randomized, often based on random samples from the data. He left a few questions unanswered regarding the consistency of such rules. In this paper, we give a number of theorems that establish the universal consistency of averaging rules. We also show that some popular classifiers, including one suggested by Breiman, are not universally consistent.
Ensemble-based discriminant learning with boosting for face recognition
- IEEE Transactions on Neural Networks
, 2006
Cited by 6 (0 self)
In this paper, we propose a novel ensemble-based approach to boost performance of traditional Linear Discriminant Analysis (LDA)-based methods used in face recognition. The ensemble-based approach is based on the recently emerged technique known as “boosting”. However, it is generally believed that boosting-like learning rules are not suited to a strong and stable learner such as LDA. To break the limitation, a novel weakness analysis theory is developed here. The theory attempts to boost a strong learner by increasing the diversity between the classifiers created by the learner, at the expense of decreasing their margins, so as to achieve a trade-off suggested by recent boosting studies for a low generalization error. In addition, a novel distribution accounting for the pairwise class discriminant information is introduced for effective interaction between the booster and the LDA-based learner. The integration of all these methodologies proposed here leads to the novel ensemble-based discriminant learning approach, capable of taking advantage of both the boosting and LDA techniques. Promising experimental results obtained on various difficult face recognition scenarios demonstrate the effectiveness of the proposed approach. We believe that this work is especially beneficial in extending the boosting framework to accommodate general (strong/weak) learners.
Cost-Sensitive Boosting
, 2007
Cited by 5 (1 self)
A novel framework, based on the statistical interpretation of boosting, is proposed for the design of cost-sensitive boosting algorithms. It is argued that, although predictors produced with boosting converge to the ratio of posterior class probabilities that also appears in Bayes decision rule, this convergence only occurs in a small neighborhood of the optimal cost-insensitive classification boundary. This is due to a combination of the cost-insensitive nature of current boosting losses, and boosting’s sample reweighing mechanism. It is then shown that convergence in the neighborhood of a target cost-sensitive boundary can be achieved through boosting-style minimization of extended, cost-sensitive, losses. The framework is applied to the design of specific algorithms, by introduction of cost-sensitive extensions of the exponential and binomial losses. Minimization of these losses leads to cost-sensitive extensions of the popular AdaBoost, RealBoost, and LogitBoost algorithms. Experimental validation, on various UCI datasets and the computer vision problem of face detection, shows that the new algorithms substantially improve performance over what was achievable with previous cost-sensitive boosting approaches.
Minimum Majority Classification and Boosting
- American Association for Artificial Intelligence
, 2002
Cited by 3 (1 self)
Motivated by a theoretical analysis of the generalization of boosting, we examine learning algorithms that work by trying to fit data using a simple majority vote over a small number of a collection of hypotheses. We provide experimental evidence that an algorithm based on this principle outputs hypotheses that often generalize nearly as well as those output by boosting, and sometimes better. We also provide experimental evidence for an additional reason that boosting algorithms generalize well, that they take advantage of cases in which there are many simple hypotheses with independent errors.
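The "simple majority vote over a small number of hypotheses" mentioned in this abstract can be sketched as an unweighted vote. The threshold hypotheses and the tie-breaking rule below are hypothetical choices made for illustration only; the abstract does not describe how the hypotheses are selected.

```python
def majority_vote(hypotheses, x):
    """Unweighted vote over hypotheses that each return -1 or +1.

    Ties are broken toward -1; this is an arbitrary assumption.
    """
    votes = sum(h(x) for h in hypotheses)
    return 1 if votes > 0 else -1


# Three hypothetical threshold rules on a single real-valued feature.
hypotheses = [
    lambda x: 1 if x > 0 else -1,
    lambda x: 1 if x > 2 else -1,
    lambda x: 1 if x > 4 else -1,
]

label_at_3 = majority_vote(hypotheses, 3)  # two of three rules vote +1
label_at_1 = majority_vote(hypotheses, 1)  # only one rule votes +1
```

Unlike boosting, every hypothesis here carries the same weight, so the combined rule is determined entirely by which hypotheses are included in the vote.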

