Results 1  10
of
144
A tutorial on support vector machines for pattern recognition
 Data Mining and Knowledge Discovery
, 1998
"... The tutorial starts with an overview of the concepts of VC dimension and structural risk minimization. We then describe linear Support Vector Machines (SVMs) for separable and nonseparable data, working through a nontrivial example in detail. We describe a mechanical analogy, and discuss when SV ..."
Abstract

Cited by 2497 (11 self)
 Add to MetaCart
(Show Context)
The tutorial starts with an overview of the concepts of VC dimension and structural risk minimization. We then describe linear Support Vector Machines (SVMs) for separable and nonseparable data, working through a nontrivial example in detail. We describe a mechanical analogy, and discuss when SVM solutions are unique and when they are global. We describe how support vector training can be practically implemented, and discuss in detail the kernel mapping technique which is used to construct SVM solutions which are nonlinear in the data. We show how Support Vector machines can have very large (even infinite) VC dimension by computing the VC dimension for homogeneous polynomial and Gaussian radial basis function kernels. While very high VC dimension would normally bode ill for generalization performance, and while at present there exists no theory which shows that good generalization performance is guaranteed for SVMs, there are several arguments which support the observed high accuracy of SVMs, which we review. Results of some experiments which were inspired by these arguments are also presented. We give numerous examples and proofs of most of the key theorems. There is new material, and I hope that the reader will find that even old material is cast in a fresh light.
A tutorial on support vector regression
, 2004
"... In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing ..."
Abstract

Cited by 540 (2 self)
 Add to MetaCart
In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing with large datasets. Finally, we mention some modifications and extensions that have been applied to the standard SV algorithm, and discuss the aspect of regularization from a SV perspective.
An introduction to kernelbased learning algorithms
 IEEE TRANSACTIONS ON NEURAL NETWORKS
, 2001
"... This paper provides an introduction to support vector machines (SVMs), kernel Fisher discriminant analysis, and ..."
Abstract

Cited by 422 (49 self)
 Add to MetaCart
(Show Context)
This paper provides an introduction to support vector machines (SVMs), kernel Fisher discriminant analysis, and
On the algorithmic implementation of multiclass kernelbased vector machines
 Journal of Machine Learning Research
"... In this paper we describe the algorithmic implementation of multiclass kernelbased vector machines. Our starting point is a generalized notion of the margin to multiclass problems. Using this notion we cast multiclass categorization problems as a constrained optimization problem with a quadratic ob ..."
Abstract

Cited by 382 (12 self)
 Add to MetaCart
In this paper we describe the algorithmic implementation of multiclass kernelbased vector machines. Our starting point is a generalized notion of the margin to multiclass problems. Using this notion we cast multiclass categorization problems as a constrained optimization problem with a quadratic objective function. Unlike most of previous approaches which typically decompose a multiclass problem into multiple independent binary classification tasks, our notion of margin yields a direct method for training multiclass predictors. By using the dual of the optimization problem we are able to incorporate kernels with a compact set of constraints and decompose the dual problem into multiple optimization problems of reduced size. We describe an efficient fixedpoint algorithm for solving the reduced optimization problems and prove its convergence. We then discuss technical details that yield significant running time improvements for large datasets. Finally, we describe various experiments with our approach comparing it to previously studied kernelbased methods. Our experiments indicate that for multiclass problems we attain stateoftheart accuracy.
New Support Vector Algorithms
, 2000
"... this article with the regression case. To explain this, we will introduce a suitable definition of a margin that is maximized in both cases ..."
Abstract

Cited by 351 (42 self)
 Add to MetaCart
this article with the regression case. To explain this, we will introduce a suitable definition of a margin that is maximized in both cases
Soft Margins for AdaBoost
, 1998
"... Recently ensemble methods like AdaBoost were successfully applied to character recognition tasks, seemingly defying the problems of overfitting. This paper shows that although AdaBoost rarely overfits in the low noise regime it clearly does so for higher noise levels. Central for understanding this ..."
Abstract

Cited by 274 (22 self)
 Add to MetaCart
Recently ensemble methods like AdaBoost were successfully applied to character recognition tasks, seemingly defying the problems of overfitting. This paper shows that although AdaBoost rarely overfits in the low noise regime it clearly does so for higher noise levels. Central for understanding this fact is the margin distribution and we find that AdaBoost achieves  doing gradient descent in an error function with respect to the margin  asymptotically a hard margin distribution, i.e. the algorithm concentrates its resources on a few hardtolearn patterns (here an interesting overlap emerge to Support Vectors). This is clearly a suboptimal strategy in the noisy case, and regularization, i.e. a mistrust in the data, must be introduced in the algorithm to alleviate the distortions that a difficult pattern (e.g. outliers) can cause to the margin distribution. We propose several regularization methods and generalizations of the original AdaBoost algorithm to achieve a soft margin  a ...
Generalized Discriminant Analysis Using a Kernel Approach
, 2000
"... We present a new method that we call Generalized Discriminant Analysis (GDA) to deal with nonlinear discriminant analysis using kernel function operator. The underlying theory is close to the Support Vector Machines (SVM) insofar as the GDA method provides a mapping of the input vectors into high di ..."
Abstract

Cited by 237 (2 self)
 Add to MetaCart
We present a new method that we call Generalized Discriminant Analysis (GDA) to deal with nonlinear discriminant analysis using kernel function operator. The underlying theory is close to the Support Vector Machines (SVM) insofar as the GDA method provides a mapping of the input vectors into high dimensional feature space. In the transformed space, linear properties make it easy to extend and generalize the classical Linear Discriminant Analysis (LDA) to non linear discriminant analysis. The formulation is expressed as an eigenvalue problem resolution. Using a different kernel, one can cover a wide class of nonlinearities. For both simulated data and alternate kernels, we give classification results as well as the shape of the separating function. The results are confirmed using a real data to perform seed classification. 1. Introduction Linear discriminant analysis (LDA) is a traditional statistical method which has proven successful on classification problems [Fukunaga, 1990]. The p...
LeaveOneOut Support Vector Machines
, 1999
"... We present a new learning algorithm for pattern recognition inspired by a recent upper bound on leaveoneout error [ Jaakkola and Haussler, 1999 ] proved for Support Vector Machines (SVMs) [ Vapnik, 1995; 1998 ] . The new approach directly minimizes the expression given by the bound in an attempt ..."
Abstract

Cited by 235 (4 self)
 Add to MetaCart
We present a new learning algorithm for pattern recognition inspired by a recent upper bound on leaveoneout error [ Jaakkola and Haussler, 1999 ] proved for Support Vector Machines (SVMs) [ Vapnik, 1995; 1998 ] . The new approach directly minimizes the expression given by the bound in an attempt to minimize leaveoneout error. This gives a convex optimization problem which constructs a sparse linear classifier in feature space using the kernel technique. As such the algorithm possesses many of the same properties as SVMs. The main novelty of the algorithm is that apart from the choice of kernel, it is parameterless  the selection of the number of training errors is inherent in the algorithm and not chosen by an extra free parameter as in SVMs. First experiments using the method on benchmark datasets from the UCI repository show results similar to SVMs which have been tuned to have the best choice of parameter. 1 Introduction Support Vector Machines (SVMs), motivated by minim...
The connection between regularization operators and support vector kernels
, 1998
"... In this paper a correspondence is derived between regularization operators used in regularization networks and support vector kernels. We prove that the Green’s Functions associated with regularization operators are suitable support vector kernels with equivalent regularization properties. Moreover, ..."
Abstract

Cited by 154 (42 self)
 Add to MetaCart
In this paper a correspondence is derived between regularization operators used in regularization networks and support vector kernels. We prove that the Green’s Functions associated with regularization operators are suitable support vector kernels with equivalent regularization properties. Moreover, the paper provides an analysis of currently used support vector kernels in the view of regularization theory and corresponding operators associated with the classes of both polynomial kernels and translation invariant kernels. The latter are also analyzed on periodical domains. As a byproduct we show that a large number of radial basis functions, namely conditionally positive definite
Training Invariant Support Vector Machines
, 2002
"... Practical experience has shown that in order to obtain the best possible performance, prior knowledge about invariances of a classification problem at hand ought to be incorporated into the training procedure. We describe and review all known methods for doing so in support vector machines, provide ..."
Abstract

Cited by 149 (16 self)
 Add to MetaCart
Practical experience has shown that in order to obtain the best possible performance, prior knowledge about invariances of a classification problem at hand ought to be incorporated into the training procedure. We describe and review all known methods for doing so in support vector machines, provide experimental results, and discuss their respective merits. One of the significant new results reported in this work is our recent achievement of the lowest reported test error on the wellknown MNIST digit recognition benchmark task, with SVM training times that are also significantly faster than previous SVM methods.