• Documents
  • Authors
  • Tables
  • Log in
  • Sign up
  • MetaCart
  • DMCA
  • Donate

CiteSeerX logo

Advanced Search Include Citations
Advanced Search Include Citations | Disambiguate

An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. (1999)

by E Bauer, R Kohavi
Venue:Machine Learning,
Add To MetaCart

Tools

Sorted by:
Results 1 - 10 of 707
Next 10 →

Induction of Decision Trees

by J. R. Quinlan - MACH. LEARN , 1986
"... The technology for building knowledge-based systems by inductive inference from examples has been demonstrated successfully in several practical applications. This paper summarizes an approach to synthesizing decision trees that has been used in a variety of systems, and it describes one such syste ..."
Abstract - Cited by 4377 (4 self) - Add to MetaCart
The technology for building knowledge-based systems by inductive inference from examples has been demonstrated successfully in several practical applications. This paper summarizes an approach to synthesizing decision trees that has been used in a variety of systems, and it describes one such system, ID3, in detail. Results from recent studies show ways in which the methodology can be modified to deal with information that is noisy and/or incomplete. A reported shortcoming of the basic algorithm is discussed and two means of overcoming it are compared. The paper concludes with illustrations of current research directions.

Random forests

by Leo Breiman, E. Schapire - Machine Learning , 2001
"... Abstract. Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the fo ..."
Abstract - Cited by 3613 (2 self) - Add to MetaCart
Abstract. Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International conference, ∗∗∗, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression.

Boosting the margin: A new explanation for the effectiveness of voting methods

by Robert E. Schapire, Yoav Freund, Peter Bartlett, Wee Sun Lee - IN PROCEEDINGS INTERNATIONAL CONFERENCE ON MACHINE LEARNING , 1997
"... One of the surprising recurring phenomena observed in experiments with boosting is that the test error of the generated classifier usually does not increase as its size becomes very large, and often is observed to decrease even after the training error reaches zero. In this paper, we show that this ..."
Abstract - Cited by 897 (52 self) - Add to MetaCart
One of the surprising recurring phenomena observed in experiments with boosting is that the test error of the generated classifier usually does not increase as its size becomes very large, and often is observed to decrease even after the training error reaches zero. In this paper, we show that this phenomenon is related to the distribution of margins of the training examples with respect to the generated voting classification rule, where the margin of an example is simply the difference between the number of correct votes and the maximum number of votes received by any incorrect label. We show that techniques used in the analysis of Vapnik’s support vector classifiers and of neural networks with small weights can be applied to voting methods to relate the margin distribution to the test error. We also show theoretically and experimentally that boosting is especially effective at increasing the margins of the training examples. Finally, we compare our explanation to those based on the bias-variance decomposition.

A Short Introduction to Boosting

by Yoav Freund, Robert E. Schapire , 1999
"... ..."
Abstract - Cited by 800 (5 self) - Add to MetaCart
Abstract not found
(Show Context)

Citation Context

...a, overly complex weak hypotheses or weak hypotheses which are too weak. Boosting seems to be especially susceptible to noise [10]. AdaBoost has been tested empirically by many researchers, including =-=[2, 10, 12, 21, 25, 28, 36]-=-. For instance, Freund and Schapire [16] tested AdaBoost on a set of UCI benchmark datasets [27] using C4.5 [29] as a weak learning algorithm, as well as an algorithm whichsFigure 5: A sample of the e...

Ensemble Methods in Machine Learning

by Thomas G. Dietterich - MULTIPLE CLASSIFIER SYSTEMS, LBCS-1857 , 2000
"... Ensemble methods are learning algorithms that construct a set of classifiers and then classify new data points by taking a (weighted) vote of their predictions. The original ensemble method is Bayesian averaging, but more recent algorithms include error-correcting output coding, Bagging, and boostin ..."
Abstract - Cited by 625 (3 self) - Add to MetaCart
Ensemble methods are learning algorithms that construct a set of classifiers and then classify new data points by taking a (weighted) vote of their predictions. The original ensemble method is Bayesian averaging, but more recent algorithms include error-correcting output coding, Bagging, and boosting. This paper reviews these methods and explains why ensembles can often perform better than any single classifier. Some previous studies comparing ensemble methods are reviewed, and some new experiments are presented to uncover the reasons that Adaboost does not overfit rapidly.

An experimental comparison of three methods for constructing ensembles of decision trees

by Thomas G. Dietterich, Doug Fisher - Bagging, boosting, and randomization. Machine Learning , 2000
"... Abstract. Bagging and boosting are methods that generate a diverse ensemble of classifiers by manipulating the training data given to a “base ” learning algorithm. Breiman has pointed out that they rely for their effectiveness on the instability of the base learning algorithm. An alternative approac ..."
Abstract - Cited by 610 (6 self) - Add to MetaCart
Abstract. Bagging and boosting are methods that generate a diverse ensemble of classifiers by manipulating the training data given to a “base ” learning algorithm. Breiman has pointed out that they rely for their effectiveness on the instability of the base learning algorithm. An alternative approach to generating an ensemble is to randomize the internal decisions made by the base algorithm. This general approach has been studied previously by Ali and Pazzani and by Dietterich and Kong. This paper compares the effectiveness of randomization, bagging, and boosting for improving the performance of the decision-tree algorithm C4.5. The experiments show that in situations with little or no classification noise, randomization is competitive with (and perhaps slightly superior to) bagging but not as accurate as boosting. In situations with substantial classification noise, bagging is much better than boosting, and sometimes better than randomization.
(Show Context)

Citation Context

... voting the decisions of the individual classifiers in the ensemble. Many authors have demonstrated significant performance improvements through ensemble methods (Breiman, 1996b; Kohavi & Kunz, 1997; =-=Bauer & Kohavi, 1999-=-; Maclin & Opitz, 1997). Two of the most popular techniques for constructing ensembles are boostrap aggregation (“bagging”; Breiman, 1996a) and the Adaboost family of algorithms (“boosting”; Freund & ...

The Boosting Approach to Machine Learning: An Overview

by Robert E. Schapire , 2002
"... Boosting is a general method for improving the accuracy of any given learning algorithm. Focusing primarily on the AdaBoost algorithm, this chapter overviews some of the recent work on boosting including analyses of AdaBoost's training error and generalization error; boosting's connecti ..."
Abstract - Cited by 440 (16 self) - Add to MetaCart
Boosting is a general method for improving the accuracy of any given learning algorithm. Focusing primarily on the AdaBoost algorithm, this chapter overviews some of the recent work on boosting including analyses of AdaBoost's training error and generalization error; boosting's connection to game theory and linear programming; the relationship between boosting and logistic regression; extensions of AdaBoost for multiclass classification problems; methods of incorporating human knowledge into boosting; and experimental and applied work using boosting.

The Foundations of Cost-Sensitive Learning

by Charles Elkan - In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence , 2001
"... This paper revisits the problem of optimal learning and decision-making when different misclassification errors incur different penalties. We characterize precisely but intuitively when a cost matrix is reasonable, and we show how to avoid the mistake of defining a cost matrix that is economically i ..."
Abstract - Cited by 402 (6 self) - Add to MetaCart
This paper revisits the problem of optimal learning and decision-making when different misclassification errors incur different penalties. We characterize precisely but intuitively when a cost matrix is reasonable, and we show how to avoid the mistake of defining a cost matrix that is economically incoherent. For the two-class case, we prove a theorem that shows how to change the proportion of negative examples in a training set in order to make optimal cost-sensitive classification decisions using a classifier learned by a standard non-costsensitive learning method. However, we then argue that changing the balance of negative and positive training examples has little effect on the classifiers produced by standard Bayesian and decision tree learning methods. Accordingly, the recommended way of applying one of these methods in a domain with differing misclassification costs is to learn a classifier from the training set as given, and then to compute optimal decisions ...

Object Detection in Images by Components

by Anuj Mohan , 1999
"... In this paper we present a component based person detection system that is capable of detecting frontal, rear and near side views of people, and partially occluded persons in cluttered scenes. The framework that is described here for people is easily applied to other objects as well. The motivatio ..."
Abstract - Cited by 319 (13 self) - Add to MetaCart
In this paper we present a component based person detection system that is capable of detecting frontal, rear and near side views of people, and partially occluded persons in cluttered scenes. The framework that is described here for people is easily applied to other objects as well. The motivation for developing a component based approach istwofold: rst, to enhance the performance of person detection systems on frontal and rear views of people and second, to develop a framework that directly addresses the problem of detecting people who are partially occluded or whose body parts blend in with the background. The data classi cation is handled by several support vector machine classi ers arranged in two layers. This architecture is known as Adaptive Combination of Classi ers (ACC). The system performs very well and is capable of detecting people even when all components of a person are not found. The performance of the system is signi cantly better than a full body

Popular ensemble methods: an empirical study

by David Opitz, Richard Maclin - Journal of Artificial Intelligence Research , 1999
"... An ensemble consists of a set of individually trained classifiers (such as neural networks or decision trees) whose predictions are combined when classifying novel instances. Previous research has shown that an ensemble is often more accurate than any of the single classifiers in the ensemble. Baggi ..."
Abstract - Cited by 296 (4 self) - Add to MetaCart
An ensemble consists of a set of individually trained classifiers (such as neural networks or decision trees) whose predictions are combined when classifying novel instances. Previous research has shown that an ensemble is often more accurate than any of the single classifiers in the ensemble. Bagging (Breiman, 1996c) and Boosting (Freund & Schapire, 1996; Schapire, 1990) are two relatively new but popular methods for producing ensembles. In this paper we evaluate these methods on 23 data sets using both neural networks and decision trees as our classification algorithm. Our results clearly indicate a number of conclusions. First, while Bagging is almost always more accurate than a single classifier, it is sometimes much less accurate than Boosting. On the other hand, Boosting can create ensembles that are less accurate than a single classifier – especially when using neural networks. Analysis indicates that the performance of the Boosting methods is dependent on the characteristics of the data set being examined. In fact, further results show that Boosting ensembles may overfit noisy data sets, thus decreasing its performance. Finally, consistent with previous studies, our work suggests that most of the gain in an ensemble’s performance comes in the first few classifiers combined; however, relatively large gains can be seen up to 25 classifiers when Boosting decision trees. 1.
(Show Context)

Citation Context

...tworks. c○1999 AI Access Foundation and Morgan Kaufmann Publishers. All rights reserved.sOpitz & Maclin Previous work has demonstrated that Bagging and Boosting are very effective for decision trees (=-=Bauer & Kohavi, 1999-=-; Drucker & Cortes, 1996; Breiman, 1996c, 1996b; Freund & Schapire, 1996; Quinlan, 1996); however, there has been little empirical testing with neural networks (especially with the new Boosting algori...

Powered by: Apache Solr
  • About CiteSeerX
  • Submit and Index Documents
  • Privacy Policy
  • Help
  • Data
  • Source
  • Contact Us

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University