Prediction Games and Arcing Algorithms, 1997
Cited by 113 (0 self)
The theory behind the success of adaptive reweighting and combining algorithms (arcing) such as Adaboost (Freund and Schapire [1995], [1996]) and others in reducing generalization error has not been well understood. By formulating prediction, both classification and regression, as a game where one player makes a selection from instances in the training set and the other a convex linear combination of predictors from a finite set, existing arcing algorithms are shown to be algorithms for finding good game strategies. An optimal game strategy finds a combined predictor that minimizes the maximum of the error over the training set. A bound on the generalization error for the combined predictors in terms of their maximum error is proven that is sharper than bounds to date. Arcing algorithms are described that converge to the optimal strategy. Schapire et al. [1997] offered an explanation of why Adaboost works in terms of its ability to reduce the margin. Comparing Adaboost to our optimal ar...
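The game view in this abstract can be made concrete with a toy calculation. The sketch below is only an illustration under stated assumptions, not the paper's arcing algorithm: the error matrix ERRORS and the helper names are hypothetical, there are just two predictors, and the best convex combination is found by brute-force grid search rather than by an iterative reweighting scheme.

```python
from typing import Sequence, Tuple


def max_training_error(errors: Sequence[Sequence[float]],
                       weights: Sequence[float]) -> float:
    """Worst-case error of a convex combination of predictors.

    errors[i][j] is the error of predictor j on training instance i
    (a made-up matrix in this example).
    """
    return max(sum(w * e for w, e in zip(weights, row)) for row in errors)


def best_two_predictor_mix(errors: Sequence[Sequence[float]],
                           steps: int = 100) -> Tuple[float, Tuple[float, float]]:
    """Grid-search the mixing weight c in [0, 1] for exactly two predictors."""
    best = None
    for k in range(steps + 1):
        c = k / steps
        weights = (c, 1.0 - c)
        value = max_training_error(errors, weights)
        if best is None or value < best[0]:
            best = (value, weights)
    return best


# Hypothetical errors: predictor 0 fails only on instance 0,
# predictor 1 fails only on instance 1, both are right on instance 2.
ERRORS = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]

value, weights = best_two_predictor_mix(ERRORS)
print(value, weights)
```

In this toy case each predictor alone has a maximum error of 1, while the equal-weight combination brings the maximum down to 0.5, which is the kind of minimax improvement the abstract refers to.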
Boosting Algorithms as Gradient Descent, 2000
Cited by 93 (2 self)
Much recent attention, both experimental and theoretical, has been focussed on classification algorithms which produce voted combinations of classifiers. Recent theoretical work has shown that the impressive generalization performance of algorithms like AdaBoost can be attributed to the classifier having large margins on the training data. We present an abstract algorithm for finding linear combinations of functions that minimize arbitrary cost functionals (i.e., functionals that do not necessarily depend on the margin). Many existing voting methods can be shown to be special cases of this abstract algorithm. Then, following previous theoretical results bounding the generalization performance of convex combinations of classifiers in terms of general cost functions of the margin, we present a new algorithm (DOOM II) for performing a gradient descent optimization of such cost functions. Experiments on
An empirical evaluation of bagging and boosting - In Proceedings of the Fourteenth National Conference on Artificial Intelligence, 1997
Cited by 80 (6 self)
An ensemble consists of a set of independently trained classifiers (such as neural networks or decision trees) whose predictions are combined when classifying novel instances. Previous research has shown that an ensemble as a whole is often more accurate than any of the single classifiers in the ensemble. Bagging (Breiman 1996a) and Boosting (Freund & Schapire 1996) are two relatively new but popular methods for producing ensembles. In this paper we evaluate these methods using both neural networks and decision trees as our classification algorithms. Our results clearly show two important facts. The first is that even though Bagging almost always produces a better classifier than any of its individual component classifiers and is relatively impervious to overfitting, it does not generalize any better than a baseline neural-network ensemble method. The second is that Boosting is a powerful technique that can usually produce better ensembles than Bagging; however, it is more susceptible to noise and can quickly overfit a data set.
Extracting Comprehensible Models from Trained Neural Networks, 1996
Cited by 65 (4 self)
To Mom, Dad, and Susan, for their support and encouragement.
Linear and Order Statistics Combiners for Pattern Classification - Combining Artificial Neural Nets, 1999
Cited by 56 (6 self)
Several researchers have experimentally shown that substantial improvements can be obtained in difficult pattern recognition problems by combining or integrating the outputs of multiple classifiers. This chapter provides an analytical framework to quantify the improvements in classification results due to combining. The results apply to both linear combiners and order statistics combiners. We first show that, to a first-order approximation, the error rate obtained over and above the Bayes error rate is directly proportional to the variance of the actual decision boundaries around the Bayes optimum boundary. Combining classifiers in output space reduces this variance, and hence reduces the "added" error. If N unbiased classifiers are combined by simple averaging, the added error rate can be reduced by a factor of N if the individual errors in approximating the decision boundaries are uncorrelated. Expressions are then derived for linear combiners which are biased or correlated, and the effect of output correlations on ensemble performance is quantified. For order statistics based non-linear combiners, we derive expressions that indicate how much the median, the maximum and in general the ith order statistic can improve classifier performance. The analysis presented here facilitates the understanding of the relationships among error rates, classifier boundary distributions, and combining in output space. Experimental results on several public domain data sets are provided to illustrate the benefits of combining and to support the analytical results.
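The factor-of-N claim for simple averaging rests on the variance of a mean of N uncorrelated, unbiased quantities being 1/N of the individual variance. The simulation below is a rough sketch of that step only, under assumptions that are not from the chapter: each classifier's boundary offset is modelled as an independent zero-mean, unit-variance Gaussian, and the function name, trial count, and seed are arbitrary.

```python
import random
import statistics


def variance_of_average(n_classifiers, trials=20000, seed=0):
    """Estimate the variance of the mean of n independent, zero-mean,
    unit-variance boundary offsets (an assumed Gaussian model)."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        offsets = [rng.gauss(0.0, 1.0) for _ in range(n_classifiers)]
        means.append(sum(offsets) / n_classifiers)
    return statistics.pvariance(means)


single = variance_of_average(1)      # one classifier on its own
averaged = variance_of_average(10)   # simple average of N = 10 classifiers
ratio = single / averaged            # expected to land close to N
print(single, averaged, ratio)
```

Since the abstract states that the added error is proportional to this variance, a ratio near 10 corresponds to the stated reduction by a factor of N. The simulation does not cover the correlated or biased cases the chapter goes on to treat.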
Actively Searching for an Effective Neural-Network Ensemble - CONNECTION SCIENCE, 1996
Cited by 54 (6 self)
A neural-network ensemble is a very successful technique where the outputs of a set of separately trained neural networks are combined to form one unified prediction. An effective ensemble should consist of a set of networks that are not only highly correct, but ones that make their errors on different parts of the input space as well; however, most existing techniques only indirectly address the problem of creating such a set. We present an algorithm called Addemup that uses genetic algorithms to explicitly search for a highly diverse set of accurate trained networks. Addemup works by first creating an initial population, then uses genetic operators to continually create new networks, keeping the set of networks that are highly accurate while disagreeing with each other as much as possible. Experiments on four real-world domains show that Addemup is able to generate a set of trained networks that is more accurate than several existing ensemble approaches. Experiments also show that Ad...
Theoretical Views of Boosting and Applications, 1999
Cited by 39 (1 self)
Boosting is a general method for improving the accuracy of any given learning algorithm. Focusing primarily on the AdaBoost algorithm, we briefly survey theoretical work on boosting including analyses of AdaBoost's training error and generalization error, connections between boosting and game theory, methods of estimating probabilities using boosting, and extensions of AdaBoost for multiclass classification problems. Some empirical work and applications are also described. Background: Boosting is a general method which attempts to "boost" the accuracy of any given learning algorithm. Kearns and Valiant [29, 30] were the first to pose the question of whether a "weak" learning algorithm which performs just slightly better than random guessing in Valiant's PAC model [44] can be "boosted" into an arbitrarily accurate "strong" learning algorithm. Schapire [36] came up with the first provable polynomial-time boosting algorithm in 1989. A year later, Freund [16] developed a much more effici...
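For readers who want to see the reweighting idea behind AdaBoost in running form, here is a minimal sketch of the standard binary version with one-dimensional threshold stumps as the weak learner. It is an illustration, not code from the survey: the stump representation, the helper names, and the ten-point data set XS, YS are all assumptions chosen so that no single stump fits the labels.

```python
import math


def stump_predict(stump, x):
    """A stump (threshold, polarity) predicts polarity if x > threshold, else -polarity."""
    threshold, polarity = stump
    return polarity if x > threshold else -polarity


def weighted_error(stump, xs, ys, weights):
    """Total weight of the training points the stump misclassifies."""
    return sum(w for w, x, y in zip(weights, xs, ys) if stump_predict(stump, x) != y)


def adaboost(xs, ys, rounds):
    """Return a list of (alpha, stump) pairs; labels must be +1 or -1."""
    n = len(xs)
    weights = [1.0 / n] * n
    points = sorted(set(xs))
    thresholds = [points[0] - 1.0] + [(a + b) / 2.0 for a, b in zip(points, points[1:])]
    candidates = [(t, p) for t in thresholds for p in (1, -1)]
    ensemble = []
    for _ in range(rounds):
        # Weak learner: the stump with the smallest weighted training error.
        best = min(candidates, key=lambda s: weighted_error(s, xs, ys, weights))
        err = weighted_error(best, xs, ys, weights)
        if err >= 0.5:
            break  # no stump beats random guessing on the current weights
        err = max(err, 1e-12)  # guard against division by zero
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, best))
        # Increase the weight of misclassified points, decrease the rest.
        weights = [w * math.exp(-alpha * y * stump_predict(best, x))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of the stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical data: no single threshold separates these labels.
XS = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
YS = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

ensemble = adaboost(XS, YS, rounds=3)
training_errors = sum(predict(ensemble, x) != y for x, y in zip(XS, YS))
print(len(ensemble), training_errors)
```

On this data the best single stump misclassifies three of the ten points, while the weighted vote of three stumps classifies all ten correctly, a small instance of weak learners being combined into a more accurate one.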
Boosting and hard-core sets - In Proceedings of the Fortieth Annual Symposium on Foundations of Computer Science, 1999
Cited by 36 (8 self)
This paper connects two fundamental ideas from theoretical computer science: hard-core set construction, a type of hardness amplification from computational complexity, and boosting, a technique from computational learning theory. Using this connection we give fruitful applications of complexity-theoretic techniques to learning theory and vice versa. We show that the hard-core set construction of Impagliazzo [15], which establishes the existence of distributions under which boolean functions are highly inapproximable, may be viewed as a boosting algorithm. Using alternate boosting methods we give an improved bound for hard-core set construction which matches known lower bounds from boosting and thus is optimal within this class of techniques. We then show how to apply techniques from [15] to give a new version of Jackson’s celebrated Harmonic Sieve algorithm for learning DNF formulae under the uniform distribution using membership queries. Our new version has a significant asymptotic improvement in running time. Critical to our arguments is a careful analysis of the distributions which are employed in both boosting and hard-core set constructions.
Boosting Algorithms as Gradient Descent in Function Space, 1999
Cited by 36 (2 self)
Much recent attention, both experimental and theoretical, has been focussed on classification algorithms which produce voted combinations of classifiers. Recent theoretical work has shown that the impressive generalization performance of algorithms like AdaBoost can be attributed to the classifier having large margins on the training data. We present abstract algorithms for finding linear and convex combinations of functions that minimize arbitrary cost functionals (i.e., functionals that do not necessarily depend on the margin). Many existing voting methods can be shown to be special cases of these abstract algorithms. Then, following previous theoretical results bounding the generalization performance of convex combinations of classifiers in terms of general cost functions of the margin, we present a new algorithm (DOOM II) for performing a gradient descent optimization of such cost functions. Experiments on several data sets from the UC Irvine repository demonstrate that DOOM II gener...
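The idea of treating a voted combination as a point in function space and descending a cost functional can be sketched in a few lines. What follows is a generic illustration and not DOOM II: the logistic cost, the threshold-stump base class, the halving line search, and the ten-point data set are all assumptions made for the example. At each round it picks the base function best aligned with the negative gradient of the cost at the training points, then takes a step that lowers the cost.

```python
import math


def logistic_cost(margins):
    """Average of log(1 + exp(-margin)); one possible cost function of the margin."""
    return sum(math.log1p(math.exp(-m)) for m in margins) / len(margins)


def make_stump(threshold, polarity):
    """Base function: polarity if x > threshold, else -polarity."""
    return lambda x: polarity if x > threshold else -polarity


def functional_gradient_descent(xs, ys, candidates, rounds):
    """Greedy descent on the cost of the combined function F at the training points."""
    scores = [0.0] * len(xs)  # F(x_i), starting from the zero function
    costs = [logistic_cost([y * s for y, s in zip(ys, scores)])]
    for _round in range(rounds):
        # Negative gradient of the cost with respect to F(x_i), up to a 1/n factor.
        pull = [y / (1.0 + math.exp(y * s)) for y, s in zip(ys, scores)]

        def alignment(f):
            return sum(p * f(x) for p, x in zip(pull, xs))

        best = max(candidates, key=alignment)
        if alignment(best) <= 0.0:
            break  # no base function is a descent direction
        # Assumed step rule: halve the step until the cost goes down.
        step, improved = 1.0, False
        for _halving in range(30):
            trial = [s + step * best(x) for s, x in zip(scores, xs)]
            cost = logistic_cost([y * t for y, t in zip(ys, trial)])
            if cost < costs[-1]:
                improved = True
                break
            step /= 2.0
        if not improved:
            break
        scores = trial
        costs.append(cost)
    return scores, costs


# Hypothetical one-dimensional data and stump candidates.
XS = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
YS = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
CANDIDATES = [make_stump(t - 0.5, p) for t in range(0, 10) for p in (1, -1)]

scores, costs = functional_gradient_descent(XS, YS, CANDIDATES, rounds=5)
print(costs)
```

Swapping logistic_cost (and the matching gradient in pull) for an exponential cost gives an AdaBoost-like procedure, which is the sense in which the abstract describes existing voting methods as special cases.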
Combinations of Weak Classifiers, 1997
Cited by 32 (1 self)
To obtain classification systems with both good generalization performance and efficiency in space and time, we propose a learning method based on combinations of weak classifiers, where weak classifiers are linear classifiers (perceptrons) which can do a little better than making random guesses. A randomized algorithm is proposed to find the weak classifiers. They are then combined through a majority vote. As demonstrated through systematic experiments, the method developed is able to obtain combinations of weak classifiers with good generalization performance and a fast training time on a variety of test problems and real applications. Theoretical analysis on one of the test problems investigated in our experiments provides insights on when and why the proposed method works. In particular, when the strength of weak classifiers is properly chosen, combinations of weak classifiers can achieve a good generalization performance with polynomial space- and time-complexity.

