New Support Vector Algorithms
, 2000
Abstract
Cited by 461 (42 self)
"... this article with the regression case. To explain this, we will introduce a suitable definition of a margin that is maximized in both cases ..."
Boosting Algorithms as Gradient Descent
, 2000
Abstract
Cited by 152 (1 self)
Much recent attention, both experimental and theoretical, has been focused on classification algorithms which produce voted combinations of classifiers. Recent theoretical work has shown that the impressive generalization performance of algorithms like AdaBoost can be attributed to the classifier having large margins on the training data. We present an abstract algorithm for finding linear combinations of functions that minimize arbitrary cost functionals (i.e., functionals that do not necessarily depend on the margin). Many existing voting methods can be shown to be special cases of this abstract algorithm. Then, following previous theoretical results bounding the generalization performance of convex combinations of classifiers in terms of general cost functions of the margin, we present a new algorithm (DOOM II) for performing a gradient descent optimization of such cost functions. Experiments on ...
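The abstract above casts voting methods as gradient descent on a cost functional of the margin. A minimal sketch of that idea, using plain gradient descent on an exponential margin cost over fixed base-classifier outputs (the synthetic data, the choice of exponential cost, and the step size are illustrative assumptions, not the paper's AnyBoost/DOOM II algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 200, 10
H = rng.choice([-1.0, 1.0], size=(n, T))   # h_t(x_i): outputs of T fixed base classifiers
y = np.sign(H @ rng.normal(size=T))        # labels loosely tied to the base classifiers
alpha = np.zeros(T)                        # coefficients of the combination F = H @ alpha

def cost(a):
    """Cost functional C(F) = mean exp(-margin), with margin_i = y_i * F(x_i)."""
    return np.exp(-y * (H @ a)).mean()

for _ in range(100):                       # gradient descent in coefficient space
    m = y * (H @ alpha)                    # margins under the current combination
    grad = -(H * (y * np.exp(-m))[:, None]).mean(axis=0)
    alpha -= 0.2 * grad
```

Because the cost is convex in the coefficients, the descent steadily reduces it below its starting value of 1 at alpha = 0; replacing the exponential with another margin cost changes only the `cost` and `grad` lines, which is the paper's point about arbitrary cost functionals.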
Improved Generalization through Explicit Optimization of Margins
 Machine Learning
, 1999
Abstract
Cited by 71 (4 self)
Recent theoretical results have shown that the generalization performance of thresholded convex combinations of base classifiers is greatly improved if the underlying convex combination has large margins on the training data (correct examples are classified well away from the decision boundary). Neural network algorithms and AdaBoost have been shown to implicitly maximize margins, thus providing some theoretical justification for their remarkably good generalization performance. In this paper we are concerned with maximizing the margin explicitly. In particular, we prove a theorem bounding the generalization performance of convex combinations in terms of general cost functions of the margin (previous results were stated in terms of the particular cost function sgn(θ - margin)). We then present an algorithm (DOOM) for directly optimizing a piecewise-linear family of cost functions satisfying the conditions of the theorem. Experiments on several of the datasets in the UC Irvine database are presented in which AdaBoost was used to generate a set of base classifiers and then DOOM was used to find the optimal convex combination of those classifiers. In all but one case the convex combination generated by DOOM had lower test error than AdaBoost's combination. In many cases DOOM achieves these lower test errors by sacrificing training error, in the interests of reducing the new cost function. The margin plots also show that the size of the minimum margin is not relevant to generalization performance.
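One way to read the explicit margin optimization described above is subgradient descent on a piecewise-linear margin cost over the simplex of convex-combination weights. The sketch below is illustrative only: the ramp cost, the exponentiated-gradient update, and the synthetic data are assumptions, not the paper's DOOM algorithm or its cost family.

```python
import numpy as np

def ramp_cost(m, a=1.0):
    """Piecewise-linear margin cost: 1 for m <= -a, 0 for m >= a, linear between."""
    return np.clip((a - m) / (2.0 * a), 0.0, 1.0)

rng = np.random.default_rng(1)
n, T = 300, 8
H = rng.choice([-1.0, 1.0], size=(n, T))             # fixed base-classifier outputs
y = np.sign(H[:, 0] + 0.5 * H[:, 1] + rng.normal(0.0, 0.5, n))
alpha = np.full(T, 1.0 / T)                          # start from the uniform vote

initial = ramp_cost(y * (H @ alpha)).mean()
for _ in range(300):
    m = y * (H @ alpha)                              # margins of the convex combination
    g = np.where(np.abs(m) < 1.0, -0.5, 0.0)         # subgradient of ramp_cost in m
    grad = (H * (y * g)[:, None]).mean(axis=0)       # chain rule through m = y * (H @ alpha)
    alpha *= np.exp(-0.5 * grad)                     # exponentiated-gradient step ...
    alpha /= alpha.sum()                             # ... keeps alpha a convex combination
final = ramp_cost(y * (H @ alpha)).mean()
```

The multiplicative update keeps the weights positive and summing to one, so the combination stays convex throughout, and the training cost drops as weight concentrates on the base classifiers most correlated with the labels, sometimes at the expense of zero-one training error, as the abstract notes.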
Direct Optimization of Margins Improves Generalization in Combined Classifiers
 Advances in Neural Information Processing Systems
, 1998
Abstract
Cited by 29 (1 self)
[Figure caption] Sonar: cumulative training margin distributions for AdaBoost versus our "Direct Optimization Of Margins" (DOOM) algorithm.
Combining protein secondary structure prediction models with ensemble methods of optimal complexity
, 2004
Probabilistic Analysis of Learning in Artificial Neural Networks: The PAC Model and its Variants
, 1997
Abstract
Cited by 20 (4 self)
There are a number of mathematical approaches to the study of learning and generalization in artificial neural networks. Here we survey the 'probably approximately correct' (PAC) model of learning and some of its variants. These models provide a probabilistic framework for the discussion of generalization and learning. This survey concentrates on the sample complexity questions in these models; that is, the emphasis is on how many examples should be used for training. Computational complexity considerations are briefly discussed for the basic PAC model. Throughout, the importance of the Vapnik-Chervonenkis dimension is highlighted. Particular attention is devoted to describing how the probabilistic models apply in the context of neural network learning, both for networks with binary-valued output and for networks with real-valued output.
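The sample-complexity question this survey emphasizes has a simple closed form in the most basic setting: for a finite, realizable hypothesis class, m >= (1/eps)(ln|H| + ln(1/delta)) examples suffice for error at most eps with probability at least 1 - delta. A small calculator for that classic finite-class bound (not the VC-dimension versions the survey covers):

```python
import math

def pac_sample_bound(h_size, eps, delta):
    """Examples sufficient for error <= eps with probability >= 1 - delta,
    for a finite, realizable hypothesis class of size h_size."""
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / eps)

m = pac_sample_bound(h_size=2**10, eps=0.05, delta=0.01)   # -> 231
```

Halving eps roughly doubles the required sample size, while shrinking delta costs only logarithmically, which is the usual PAC trade-off between accuracy and confidence.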
Sample Complexity of Classifiers Taking Values in R^Q, Application to Multi-Class SVMs
Abstract
Cited by 3 (1 self)
Bounds on the risk play a crucial role in statistical learning theory. They usually involve, as the capacity measure of the model studied, the VC dimension or one of its extensions. In classification, such VC dimensions exist for models taking values in {0, 1}, [1, Q], and R. We introduce the generalizations appropriate for the missing case, that of models with values in R^Q. This provides us with a new guaranteed risk for M-SVMs. For those models, a sharper bound is obtained by using the Rademacher complexity.
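The Rademacher complexity invoked in the abstract above can be estimated empirically by Monte Carlo over random sign vectors. A sketch for a finite class represented by its matrix of outputs on the sample (the function name and the finite-class restriction are assumptions for illustration; the paper's bounds concern richer vector-valued classes):

```python
import numpy as np

def empirical_rademacher(outputs, n_trials=2000, seed=0):
    """Estimate E_sigma[ max_h (1/n) sum_i sigma_i h(x_i) ] for a finite
    class whose rows are the output vectors (h(x_1), ..., h(x_n))."""
    rng = np.random.default_rng(seed)
    n = outputs.shape[1]
    sigma = rng.choice([-1.0, 1.0], size=(n_trials, n))   # random Rademacher signs
    return (sigma @ outputs.T / n).max(axis=1).mean()

# Sanity check: the richest {-1,+1}-valued class on n points contains every
# sign pattern, so some h always matches sigma exactly and the complexity is 1.
n = 3
all_patterns = np.array([[int(b) * 2 - 1 for b in f"{k:03b}"] for k in range(8)], float)
r_full = empirical_rademacher(all_patterns)               # -> 1.0
```

A class with only one function scores near zero on the same estimate, which matches the intuition that Rademacher complexity measures how well the class can correlate with random noise.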
Bayesian Classifiers are Large Margin Hyperplanes in a Hilbert Space
Machine Learning: Proceedings of the Fifteenth International Conference
, 1998
Abstract
Cited by 1 (1 self)
Bayesian algorithms for Neural Networks are known to produce classifiers which are very resistant to overfitting. It is often claimed that one of the main distinctive features of Bayesian learning algorithms is that they don't simply output one hypothesis, but rather an entire probability distribution over a hypothesis set: the Bayes posterior. An alternative perspective is that they output a linear combination of classifiers, whose coefficients are given by Bayes' theorem. One of the concepts used to deal with thresholded convex combinations is the 'margin' of the hyperplane with respect to the training sample, which is correlated with the predictive power of the hypothesis itself. We provide a novel theoretical analysis of such classifiers, based on data-dependent VC theory, proving that they can be expected to be large margin hyperplanes in a Hilbert space. We then present experimental evidence that the predictions of our model are correct, i.e. that Bayesian classifiers really find hypotheses which have large margins on the training examples. This not only explains the remarkable resistance to overfitting exhibited by such classifiers, but also places them in the same class as other systems, like Support Vector Machines and AdaBoost, which have similar performance.
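The abstract's alternative reading of a Bayes classifier, a convex combination of classifiers with coefficients given by the posterior, makes its training margins easy to write down. A toy sketch (the function name, array shapes, and the two-classifier example are illustrative assumptions, not the paper's analysis):

```python
import numpy as np

def posterior_vote_margins(y, H, log_post):
    """Margins y_i * F(x_i) of the posterior-weighted vote
    F(x) = sum_j p(h_j | data) * h_j(x), with classifier outputs in {-1,+1}."""
    p = np.exp(log_post - log_post.max())
    p /= p.sum()                 # posterior weights form a convex combination
    return y * (H @ p)           # one margin per training example

# Two classifiers on two examples; the first has much higher posterior and
# agrees with the labels, so the vote has margin close to 1 everywhere.
y = np.array([1.0, 1.0])
H = np.array([[1.0, -1.0],      # h_1(x_1), h_2(x_1)
              [1.0,  1.0]])     # h_1(x_2), h_2(x_2)
margins = posterior_vote_margins(y, H, log_post=np.array([10.0, 0.0]))
```

When the posterior concentrates on classifiers that fit the training set, every margin approaches 1, which is the large-margin behavior the abstract argues these classifiers should exhibit.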