Ensemble Methods in Machine Learning
 MULTIPLE CLASSIFIER SYSTEMS, LNCS 1857
, 2000
Cited by 430 (3 self)
Ensemble methods are learning algorithms that construct a set of classifiers and then classify new data points by taking a (weighted) vote of their predictions. The original ensemble method is Bayesian averaging, but more recent algorithms include error-correcting output coding, Bagging, and boosting. This paper reviews these methods and explains why ensembles can often perform better than any single classifier. Some previous studies comparing ensemble methods are reviewed, and some new experiments are presented to uncover the reasons that AdaBoost does not overfit rapidly.
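The (weighted) vote described in this abstract can be sketched in a few lines of Python. This is only an illustration: the function name `weighted_vote`, the labels, and the weights are hypothetical and do not come from the paper.

```python
from collections import defaultdict

def weighted_vote(predictions, weights=None):
    """Return the label with the largest total weight among the predictions."""
    if weights is None:
        weights = [1.0] * len(predictions)  # unweighted, i.e. plain majority vote
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # Ties go to the label that was seen first.
    return max(totals, key=totals.get)

# Three hypothetical classifiers label the same data point.
votes = ["spam", "ham", "ham"]
print(weighted_vote(votes))                   # plain majority vote
print(weighted_vote(votes, [0.9, 0.3, 0.4]))  # first classifier outweighs the other two
```

With equal weights the majority label wins; with the assumed weights, the single heavily weighted classifier overrides the other two.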
Boosting Algorithms as Gradient Descent
, 2000
Cited by 114 (2 self)
Much recent attention, both experimental and theoretical, has been focused on classification algorithms that produce voted combinations of classifiers. Recent theoretical work has shown that the impressive generalization performance of algorithms like AdaBoost can be attributed to the classifier having large margins on the training data. We present an abstract algorithm for finding linear combinations of functions that minimize arbitrary cost functionals (i.e., functionals that do not necessarily depend on the margin). Many existing voting methods can be shown to be special cases of this abstract algorithm. Then, following previous theoretical results bounding the generalization performance of convex combinations of classifiers in terms of general cost functions of the margin, we present a new algorithm (DOOM II) for performing a gradient descent optimization of such cost functions. Experiments on ...
A simple, fast, and effective rule learner
 IN PROCEEDINGS OF ANNUAL CONFERENCE OF AMERICAN ASSOCIATION FOR ARTIFICIAL INTELLIGENCE
, 1999
Cited by 93 (3 self)
We describe SLIPPER, a new rule learner that generates rulesets by repeatedly boosting a simple, greedy rule-builder. Like the rulesets built by other rule learners, the ensemble of rules created by SLIPPER is compact and comprehensible. This is made possible by imposing appropriate constraints on the rule-builder, and by use of a recently proposed generalization of AdaBoost called confidence-rated boosting. In spite of its relative simplicity, SLIPPER is highly scalable and an effective learner. Experimentally, SLIPPER scales no worse than O(n log n), where n is the number of examples, and on a set of 32 benchmark problems, SLIPPER achieves lower error rates than RIPPER 20 times, and lower error rates than C4.5rules 22 times.
Boosting as Entropy Projection
, 1999
Cited by 58 (8 self)
We consider the AdaBoost procedure for boosting weak learners. In AdaBoost, a key step is choosing a new distribution on the training examples based on the old distribution and the mistakes made by the present weak hypothesis. We show how AdaBoost's choice of the new distribution can be seen as an approximate solution to the following problem: Find a new distribution that is closest to the old distribution subject to the constraint that the new distribution is orthogonal to the vector of mistakes of the current weak hypothesis. The distance (or divergence) between distributions is measured by the relative entropy. Alternatively, we could say that AdaBoost approximately projects the distribution vector onto a hyperplane defined by the mistake vector. We show that this new view of AdaBoost as an entropy projection is dual to the usual view of AdaBoost as minimizing the normalization factors of the updated distributions.
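The orthogonality constraint can be checked numerically on a toy case. The sketch below is not taken from the paper: it assumes a weak hypothesis with outputs in {-1, +1}, encodes the mistake vector as u_i = y_i h(x_i) (+1 if correct, -1 if wrong), and uses the standard AdaBoost step size alpha = 0.5 ln((1 - eps) / eps). The function name `adaboost_reweight` and the five-example data are hypothetical.

```python
import math

def adaboost_reweight(dist, mistakes):
    """One AdaBoost distribution update (assumes 0 < weighted error < 1).

    dist:     current weights over the training examples, summing to 1
    mistakes: u_i = y_i * h(x_i), i.e. +1 where the weak hypothesis is
              correct and -1 where it is wrong (assumed encoding)
    """
    eps = sum(d for d, u in zip(dist, mistakes) if u < 0)  # weighted error
    alpha = 0.5 * math.log((1 - eps) / eps)
    unnormalized = [d * math.exp(-alpha * u) for d, u in zip(dist, mistakes)]
    z = sum(unnormalized)  # normalization factor of the updated distribution
    return [w / z for w in unnormalized], alpha, z

# Uniform distribution over five examples; the weak hypothesis errs on the fourth.
old = [0.2] * 5
u = [1, 1, 1, -1, 1]
new, alpha, z = adaboost_reweight(old, u)

inner_product = sum(d * m for d, m in zip(new, u))
print(new)            # the misclassified example now carries half the total weight
print(inner_product)  # numerically zero: new distribution is orthogonal to u
```

In this binary case the updated distribution puts equal total weight on the correctly and incorrectly classified examples, which is the same statement as a zero inner product with the mistake vector.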
Boosting Applied to Word Sense Disambiguation
 IN PROCEEDINGS OF THE 12TH EUROPEAN CONFERENCE ON MACHINE LEARNING
, 2000
Cited by 54 (8 self)
In this paper Schapire and Singer's AdaBoost.MH boosting algorithm is applied to the Word Sense Disambiguation (WSD) problem. Initial experiments on a set of 15 selected polysemous words show that the boosting approach surpasses Naive Bayes and exemplar-based approaches, which represent state-of-the-art accuracy on supervised WSD. In order to make boosting practical for a real learning domain of thousands of words, several ways of accelerating the algorithm by reducing the feature space are studied. The best variant, which we call LazyBoosting, is tested on the largest sense-tagged corpus available, containing 192,800 examples of the 191 most frequent and ambiguous English words. Again, boosting compares favourably to the other benchmark algorithms.
Bagging and Boosting a Treebank Parser
, 2000
Cited by 17 (0 self)
Bagging and boosting, two effective machine learning techniques, are applied to natural language parsing. Experiments using these techniques with a trainable statistical parser are described. The best resulting system provides roughly as large a gain in F-measure as doubling the corpus size. Error analysis of the result of the boosting technique reveals some inconsistent annotations in the Penn Treebank, suggesting a semiautomatic method for finding inconsistent treebank annotations.
Boosting Gaussian Mixtures In An LVCSR System
 Proceedings of ICASSP 2000
, 2000
Cited by 16 (3 self)
In this paper, we apply boosting to the problem of frame-level phone classification, and use the resulting system to perform voicemail transcription. We develop parallel, hierarchical, and restricted versions of the classic AdaBoost algorithm, which enable the technique to be used in large-scale speech recognition tasks with hundreds of thousands of Gaussians and tens of millions of training frames. We report small but consistent improvements in both frame recognition accuracy and word error rate.
1. INTRODUCTION Boosting is a technique for sequentially training and combining a collection of classifiers in such a way that the later classifiers make up for the deficiencies of the earlier ones. Many variants exist [1, 7, 2, 3], but all follow the same basic strategy. There is a sequence of iterations, and at each iteration a new classifier is trained on a weighted set of the training examples. Initially, every example gets the same weight, but in subsequent iterations, the weights of h...
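The basic strategy outlined in the introduction (train on weighted examples, reweight, repeat) can be sketched as follows. This is a minimal illustration, not the parallel, hierarchical, or restricted variants developed in the paper: it assumes the standard AdaBoost weighting rule, uses one-dimensional threshold stumps instead of Gaussian mixtures, and all names (`train_stump`, `adaboost`, `predict`) and data are hypothetical.

```python
import math

def stump_output(x, threshold, sign):
    """A threshold classifier on a single feature, with outputs in {-1, +1}."""
    return sign if x >= threshold else -sign

def train_stump(xs, ys, weights):
    """Pick the (threshold, sign) pair with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for sign in (1, -1):
            err = sum(w for w, x, y in zip(weights, xs, ys)
                      if stump_output(x, threshold, sign) != y)
            if best is None or err < best[0]:
                best = (err, threshold, sign)
    return best

def adaboost(xs, ys, rounds):
    n = len(xs)
    weights = [1.0 / n] * n  # initially, every example gets the same weight
    ensemble = []
    for _ in range(rounds):
        err, threshold, sign = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, threshold, sign))
        # Misclassified examples gain weight, correct ones lose weight.
        weights = [w * math.exp(-alpha * y * stump_output(x, threshold, sign))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all stumps in the ensemble."""
    score = sum(alpha * stump_output(x, threshold, sign)
                for alpha, threshold, sign in ensemble)
    return 1 if score >= 0 else -1

# Toy data that no single threshold stump can label correctly.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(xs, ys, rounds=3)
print([predict(ensemble, x) for x in xs])
```

On this toy data a single stump always misclassifies at least two points, while the weighted vote of three stumps reproduces all six labels, which illustrates how later classifiers compensate for earlier ones.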
Margin maximization with feedforward neural networks: a comparative study with SVM and AdaBoost
, 2004
Maximizing the Margin with Feedforward Neural Networks
, 2002
Cited by 2 (1 self)
Feedforward Neural Networks (FNNs) and Support Vector Machines (SVMs) are two machine learning frameworks developed from very different starting points. In this work a new learning model for FNNs is proposed that, in the linearly separable case, tends to obtain the same solution as SVMs. The key idea of the model is a weighting of the sum-of-squares error function, which is inspired by the AdaBoost algorithm. The model depends on a parameter that controls the hardness of the margin, as in SVMs, so that it can be used for the nonlinearly separable case as well. In addition, it can handle multiclass and multilabel problems in a natural way (as FNNs usually do), and it is not restricted to the use of kernel functions. Finally, it is independent of the particular algorithm used to minimize the error function. Both theoretical and experimental results are presented to support these ideas.