Ensemble Methods in Machine Learning
- MULTIPLE CLASSIFIER SYSTEMS, LNCS 1857
, 2000
Cited by 339 (2 self)
Ensemble methods are learning algorithms that construct a set of classifiers and then classify new data points by taking a (weighted) vote of their predictions. The original ensemble method is Bayesian averaging, but more recent algorithms include error-correcting output coding, Bagging, and boosting. This paper reviews these methods and explains why ensembles can often perform better than any single classifier. Some previous studies comparing ensemble methods are reviewed, and some new experiments are presented to uncover the reasons that AdaBoost does not overfit rapidly.
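As a minimal sketch of the (weighted) vote described above, the Python function below combines arbitrary classifiers by summing each one's weight onto the label it predicts. The name `weighted_vote`, the toy classifiers, and the weights are illustrative assumptions, not code from the paper.

```python
from collections import defaultdict
from typing import Callable, Hashable, Sequence


def weighted_vote(classifiers: Sequence[Callable[[object], Hashable]],
                  weights: Sequence[float], x: object) -> Hashable:
    """Return the label whose supporting classifiers have the largest total weight."""
    scores: dict = defaultdict(float)
    for clf, w in zip(classifiers, weights):
        scores[clf(x)] += w  # each classifier adds its weight to its predicted label
    return max(scores, key=scores.get)


# Usage (toy classifiers that ignore their input): one confident member
# outvotes two weak ones, but loses when all weights are equal.
members = [lambda x: "spam", lambda x: "ham", lambda x: "ham"]
print(weighted_vote(members, [0.9, 0.3, 0.3], None))  # spam
print(weighted_vote(members, [1.0, 1.0, 1.0], None))  # ham
```

With equal weights this reduces to a plain majority vote, as used in Bagging.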
Boosting Algorithms as Gradient Descent
, 2000
Cited by 93 (2 self)
Much recent attention, both experimental and theoretical, has been focussed on classification algorithms which produce voted combinations of classifiers. Recent theoretical work has shown that the impressive generalization performance of algorithms like AdaBoost can be attributed to the classifier having large margins on the training data. We present an abstract algorithm for finding linear combinations of functions that minimize arbitrary cost functionals (i.e., functionals that do not necessarily depend on the margin). Many existing voting methods can be shown to be special cases of this abstract algorithm. Then, following previous theoretical results bounding the generalization performance of convex combinations of classifiers in terms of general cost functions of the margin, we present a new algorithm (DOOM II) for performing a gradient descent optimization of such cost functions. Experiments on ...
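A rough Python sketch of gradient descent in function space, assuming a fixed, finite pool of ±1 base classifiers and a cost that averages a margin cost c(y·F(x)) over the training set. At each round the base classifier with the largest inner product with the negative gradient is added with a small step. The exponential cost c(z) = exp(-z), the fixed step size, and all function names are assumptions for illustration; this is not an implementation of DOOM II.

```python
import math
from typing import Callable, List, Sequence, Tuple


def cost_derivative(z: float) -> float:
    """Derivative of the assumed margin cost c(z) = exp(-z)."""
    return -math.exp(-z)


def functional_gradient_descent(X: Sequence[float], y: Sequence[int],
                                pool: Sequence[Callable[[float], int]],
                                rounds: int = 10, step: float = 0.1
                                ) -> Tuple[List[float], List[float]]:
    """Build F = sum_j alpha_j * h_j by repeatedly stepping along the base
    classifier best aligned with the negative gradient of the cost."""
    alphas = [0.0] * len(pool)
    F = [0.0] * len(X)  # current scores F(x_i) on the training set
    for _ in range(rounds):
        # Example weights -c'(y_i F(x_i)) >= 0: low-margin examples weigh more.
        w = [-cost_derivative(yi * fi) for yi, fi in zip(y, F)]
        best, best_score = None, 0.0
        for j, h in enumerate(pool):
            # Inner product of h with the negative functional gradient.
            score = sum(wi * yi * h(xi) for wi, yi, xi in zip(w, y, X))
            if score > best_score:
                best, best_score = j, score
        if best is None:  # no base classifier decreases the cost: stop
            break
        alphas[best] += step
        F = [fi + step * pool[best](xi) for fi, xi in zip(F, X)]
    return alphas, F


X = [-2.0, -1.0, 1.0, 2.0]
y = [-1, -1, 1, 1]
pool = [lambda x: 1 if x > 0 else -1,  # threshold rule
        lambda x: 1]                    # constant rule
alphas, F = functional_gradient_descent(X, y, pool)
print(alphas, F)
```

Swapping `cost_derivative` for the derivative of another margin cost changes the example weights and hence which base classifier is chosen, which is the sense in which many voting methods fit one template.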
Boosting as Entropy Projection
, 1999
Cited by 51 (8 self)
We consider the AdaBoost procedure for boosting weak learners. In AdaBoost, a key step is choosing a new distribution on the training examples based on the old distribution and the mistakes made by the present weak hypothesis. We show how AdaBoost's choice of the new distribution can be seen as an approximate solution to the following problem: Find a new distribution that is closest to the old distribution subject to the constraint that the new distribution is orthogonal to the vector of mistakes of the current weak hypothesis. The distance (or divergence) between distributions is measured by the relative entropy. Alternatively, we could say that AdaBoost approximately projects the distribution vector onto a hyperplane defined by the mistake vector. We show that this new view of AdaBoost as an entropy projection is dual to the usual view of AdaBoost as minimizing the normalization factors of the updated distributions.
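The orthogonality constraint can be checked numerically. The sketch below assumes ±1 weak hypotheses and takes the mistake vector to be u_i = y_i·h(x_i) (our assumption about the notation, not necessarily the paper's). After the standard AdaBoost update with alpha = 0.5·ln((1 - eps)/eps), the new distribution satisfies sum_i D_new(i)·u_i = 0, and `Z` is the normalization factor the abstract refers to.

```python
import math
from typing import List, Sequence, Tuple


def adaboost_reweight(D: Sequence[float], u: Sequence[int]
                      ) -> Tuple[List[float], float, float]:
    """One AdaBoost update for a +/-1 weak hypothesis.

    u[i] = y_i * h(x_i): +1 if example i is classified correctly, -1 if not.
    Assumes the weighted error eps lies strictly between 0 and 1.
    """
    eps = sum(d for d, ui in zip(D, u) if ui < 0)  # weighted error under D
    alpha = 0.5 * math.log((1 - eps) / eps)
    unnormalized = [d * math.exp(-alpha * ui) for d, ui in zip(D, u)]
    Z = sum(unnormalized)  # the normalization factor
    return [v / Z for v in unnormalized], alpha, Z


D = [0.25, 0.25, 0.25, 0.25]
u = [1, 1, 1, -1]  # the last example is a mistake
D_new, alpha, Z = adaboost_reweight(D, u)
# The new distribution is orthogonal to u (up to floating-point rounding).
print(sum(d * ui for d, ui in zip(D_new, u)))
print(D_new, Z)
```

For four equally weighted examples with one mistake, eps = 0.25, the misclassified example receives weight 0.5, each correct one 1/6, and Z = 2·sqrt(eps·(1 - eps)) ≈ 0.866.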
Boosting Applied to Word Sense Disambiguation
- IN PROCEEDINGS OF THE 12TH EUROPEAN CONFERENCE ON MACHINE LEARNING
, 2000
Cited by 47 (8 self)
In this paper Schapire and Singer's AdaBoost.MH boosting algorithm is applied to the Word Sense Disambiguation (WSD) problem. Initial experiments on a set of 15 selected polysemous words show that the boosting approach surpasses Naive Bayes and Exemplar-based approaches, which represent state-of-the-art accuracy on supervised WSD. In order to make boosting practical for a real learning domain of thousands of words, several ways of accelerating the algorithm by reducing the feature space are studied. The best variant, which we call LazyBoosting, is tested on the largest sense-tagged corpus available, containing 192,800 examples of the 191 most frequent and ambiguous English words. Again, boosting compares favourably to the other benchmark algorithms.
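The abstract does not spell out how LazyBoosting reduces the feature space. One simple way, assumed here purely for illustration, is to examine only a random fraction of the features when choosing each weak rule. The sketch picks the best single-feature rule over binary features under a boosting distribution `D`; `best_rule_on_sample` and `fraction` are hypothetical names.

```python
import random
from typing import Optional, Sequence, Tuple


def best_rule_on_sample(X: Sequence[Sequence[int]], y: Sequence[int],
                        D: Sequence[float], fraction: float,
                        rng: random.Random) -> Tuple[float, int, int]:
    """Return (weighted_error, feature, sign) of the best rule
    'predict sign if feature present, else -sign', searching only a random
    fraction of the binary features (hypothetical speed-up)."""
    n_features = len(X[0])
    k = max(1, int(fraction * n_features))
    candidates = rng.sample(range(n_features), k)  # features examined this round
    best: Optional[Tuple[float, int, int]] = None
    for f in candidates:
        for sign in (1, -1):
            err = sum(d for xi, yi, d in zip(X, y, D)
                      if (sign if xi[f] else -sign) != yi)
            if best is None or err < best[0]:
                best = (err, f, sign)
    assert best is not None  # k >= 1, so at least one rule was scored
    return best


X = [[1, 0, 1, 0], [0, 1, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]]
y = [1, 1, -1, -1]
D = [0.25] * 4
print(best_rule_on_sample(X, y, D, 1.0, random.Random(0)))  # all 4 features searched
print(best_rule_on_sample(X, y, D, 0.5, random.Random(0)))  # only 2 of 4 features
```

The cost of one round scales with `fraction`, at the price of sometimes missing the globally best rule.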
Bagging and Boosting a Treebank Parser
, 2000
Cited by 16 (0 self)
Bagging and boosting, two effective machine learning techniques, are applied to natural language parsing. Experiments using these techniques with a trainable statistical parser are described. The best resulting system provides roughly as large a gain in F-measure as doubling the corpus size. Error analysis of the result of the boosting technique reveals some inconsistent annotations in the Penn Treebank, suggesting a semi-automatic method for finding inconsistent treebank annotations.
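A minimal bagging sketch: each base model is trained on a bootstrap replicate (drawn with replacement, same size as the training set) and predictions are combined by majority vote. The paper bags a statistical parser; here a hypothetical one-dimensional threshold learner, `train_stump`, stands in for it.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Example = Tuple[float, int]  # (feature value, label in {-1, +1})


def train_stump(data: Sequence[Example]) -> Callable[[float], int]:
    """Hypothetical base learner: the threshold rule with fewest training errors."""
    best = None
    for thr, _ in data:
        for sign in (1, -1):
            errs = sum(1 for x, y in data if (sign if x >= thr else -sign) != y)
            if best is None or errs < best[0]:
                best = (errs, thr, sign)
    _, thr, sign = best
    return lambda x: sign if x >= thr else -sign


def bag(data: Sequence[Example], n_models: int,
        rng: random.Random) -> Callable[[float], int]:
    """Train n_models stumps on bootstrap replicates; predict by majority vote."""
    models: List[Callable[[float], int]] = []
    for _ in range(n_models):
        # Bootstrap replicate: len(data) draws with replacement.
        sample = [rng.choice(data) for _ in range(len(data))]
        models.append(train_stump(sample))
    return lambda x: Counter(m(x) for m in models).most_common(1)[0][0]


data = [(0.0, -1), (1.0, -1), (2.0, -1), (3.0, 1), (4.0, 1), (5.0, 1)]
predict = bag(data, n_models=25, rng=random.Random(0))
print([predict(x) for x, _ in data])
```

An odd number of models avoids ties when votes are ±1.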
Margin maximization with feed-forward neural networks: a comparative study with SVM and AdaBoost
, 2004
Maximizing the Margin with Feed-forward Neural Networks
, 2002
Cited by 1 (1 self)
Feed-forward Neural Networks (FNNs) and Support Vector Machines (SVMs) are two machine learning frameworks developed from very different starting points. In this work a new learning model for FNNs is proposed that, in the linearly separable case, tends to obtain the same solution as SVMs. The key idea of the model is a weighting of the sum-of-squares error function, which is inspired by the AdaBoost algorithm. The model depends on a parameter that controls the hardness of the margin, as in SVMs, so that it can be used for the non-linearly separable case as well. In addition, it can deal with multiclass and multilabel problems in a natural way (as FNNs usually do), and it is not restricted to the use of kernel functions. Finally, it is independent of the concrete algorithm used to minimize the error function. Both theoretical and experimental results are presented to confirm these ideas.
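The abstract does not state the weighting formula. Assuming, hypothetically, an AdaBoost-style weight exp(-beta·y·f(x)) on each example's squared error (so examples with small or negative margins count more, with `beta` loosely playing the role of the hardness parameter), one descent step for a single linear unit could look like the following. The weights are held fixed within each step, a simplification of our own.

```python
import math
from typing import List, Sequence, Tuple


def weighted_sse_step(w: List[float], b: float,
                      X: Sequence[Sequence[float]], y: Sequence[int],
                      beta: float, lr: float) -> Tuple[List[float], float, float]:
    """One descent step on sum_i a_i * (f(x_i) - y_i)^2 for a linear unit f.

    Hypothetical weighting a_i = exp(-beta * y_i * f(x_i)), held fixed during
    the step, so examples with small or negative margins dominate the error.
    """
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    loss = 0.0
    for xi, yi in zip(X, y):
        f = sum(wj * xj for wj, xj in zip(w, xi)) + b
        a = math.exp(-beta * yi * f)  # margin-dependent example weight
        r = f - yi
        loss += a * r * r
        for j, xj in enumerate(xi):
            grad_w[j] += 2 * a * r * xj
        grad_b += 2 * a * r
    new_w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
    return new_w, b - lr * grad_b, loss


X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [-1, -1, 1, 1]
w, b = [0.0], 0.0
for _ in range(200):
    w, b, loss = weighted_sse_step(w, b, X, y, beta=1.0, lr=0.05)
print(w, b, loss)
```

Because only the error function changes, any minimizer (here plain gradient descent) could be used, in line with the abstract's remark that the model is independent of the optimization algorithm.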
Complexity in the Case against Accuracy Estimation
, 2002
Cited by 1 (1 self)
Some authors have repeatedly pointed out that the use of the accuracy, in particular for comparing classifiers, is not adequate. The main argument concerns some assumptions of [...] validity or correctness underlying the use of this criterion. In this paper, we study the computational burden of the accuracy's replacement for building and comparing classifiers, using the framework of Inductive Logic Programming. Replacement is investigated in three ways: completion of the accuracy with an additional requirement; replacement of the accuracy with a bi-criterion recently introduced from statistical decision theory, the Receiver Operating Characteristic analysis; and replacement of the accuracy by a single criterion. We prove very hard results for most of the possible replacements. A first result shows that [...] the arbitrary multiplication of classifiers appears to be totally useless; "arbitrary" is to be taken in its broadest meaning, in particular exponential. The second point is the sudden appearance of the negative results, which is not a function of the criteria's demands. The third point is the equivalent difficulty of all these different criteria. In contrast, optimizing accuracy alone appears to be tractable in this framework. © 2002 Published by Elsevier Science B.V. 1. Introduction. An essential task of Machine Learning (ML) and Data Mining (DM) systems is related to classification. This basically consists in giving the most accurate answer ...
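To make the contrast between a single criterion (accuracy) and the ROC bi-criterion concrete, the sketch below computes both for ±1 predictions; the function names and the two toy classifiers are illustrative assumptions. The two classifiers have the same accuracy but different ROC points (false positive rate, true positive rate), which accuracy alone cannot distinguish.

```python
from typing import Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions equal to the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_point(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """(false positive rate, true positive rate) for labels in {-1, +1}."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    return fp / neg, tp / pos


y_true = [1, 1, -1, -1]
a = [1, -1, -1, -1]  # conservative classifier
b = [1, 1, 1, -1]    # liberal classifier
print(accuracy(y_true, a), roc_point(y_true, a))  # 0.75 (0.0, 0.5)
print(accuracy(y_true, b), roc_point(y_true, b))  # 0.75 (0.5, 1.0)
```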

