Results 1  10
of
183
A tutorial on support vector machines for pattern recognition
 Data Mining and Knowledge Discovery
, 1998
"... The tutorial starts with an overview of the concepts of VC dimension and structural risk minimization. We then describe linear Support Vector Machines (SVMs) for separable and nonseparable data, working through a nontrivial example in detail. We describe a mechanical analogy, and discuss when SV ..."
Abstract

Cited by 2356 (11 self)
 Add to MetaCart
The tutorial starts with an overview of the concepts of VC dimension and structural risk minimization. We then describe linear Support Vector Machines (SVMs) for separable and nonseparable data, working through a nontrivial example in detail. We describe a mechanical analogy, and discuss when SVM solutions are unique and when they are global. We describe how support vector training can be practically implemented, and discuss in detail the kernel mapping technique which is used to construct SVM solutions which are nonlinear in the data. We show how Support Vector machines can have very large (even infinite) VC dimension by computing the VC dimension for homogeneous polynomial and Gaussian radial basis function kernels. While very high VC dimension would normally bode ill for generalization performance, and while at present there exists no theory which shows that good generalization performance is guaranteed for SVMs, there are several arguments which support the observed high accuracy of SVMs, which we review. Results of some experiments which were inspired by these arguments are also presented. We give numerous examples and proofs of most of the key theorems. There is new material, and I hope that the reader will find that even old material is cast in a fresh light.
Nonlinear component analysis as a kernel eigenvalue problem

, 1996
"... We describe a new method for performing a nonlinear form of Principal Component Analysis. By the use of integral operator kernel functions, we can efficiently compute principal components in highdimensional feature spaces, related to input space by some nonlinear map; for instance the space of all ..."
Abstract

Cited by 1073 (71 self)
 Add to MetaCart
We describe a new method for performing a nonlinear form of Principal Component Analysis. By the use of integral operator kernel functions, we can efficiently compute principal components in highdimensional feature spaces, related to input space by some nonlinear map; for instance the space of all possible 5pixel products in 16x16 images. We give the derivation of the method, along with a discussion of other techniques which can be made nonlinear with the kernel approach; and present first experimental results on nonlinear feature extraction for pattern recognition.
Estimating the Support of a HighDimensional Distribution
, 1999
"... Suppose you are given some dataset drawn from an underlying probability distribution P and you want to estimate a "simple" subset S of input space such that the probability that a test point drawn from P lies outside of S is bounded by some a priori specified between 0 and 1. We propo ..."
Abstract

Cited by 521 (31 self)
 Add to MetaCart
Suppose you are given some dataset drawn from an underlying probability distribution P and you want to estimate a "simple" subset S of input space such that the probability that a test point drawn from P lies outside of S is bounded by some a priori specified between 0 and 1. We propose a method to approach this problem by trying to estimate a function f which is positive on S and negative on the complement. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space. The expansion coefficients are found by solving a quadratic programming problem, which we do by carrying out sequential optimization over pairs of input patterns. We also provide a preliminary theoretical analysis of the statistical performance of our algorithm. The algorithm is a natural extension of the support vector algorithm to the case of unlabelled d...
A tutorial on support vector regression
, 2004
"... In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing ..."
Abstract

Cited by 493 (2 self)
 Add to MetaCart
In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing with large datasets. Finally, we mention some modifications and extensions that have been applied to the standard SV algorithm, and discuss the aspect of regularization from a SV perspective.
An introduction to kernelbased learning algorithms
 IEEE TRANSACTIONS ON NEURAL NETWORKS
, 2001
"... This paper provides an introduction to support vector machines (SVMs), kernel Fisher discriminant analysis, and ..."
Abstract

Cited by 391 (48 self)
 Add to MetaCart
This paper provides an introduction to support vector machines (SVMs), kernel Fisher discriminant analysis, and
Manifold regularization: A geometric framework for learning from labeled and unlabeled examples
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2006
"... We propose a family of learning algorithms based on a new form of regularization that allows us to exploit the geometry of the marginal distribution. We focus on a semisupervised framework that incorporates labeled and unlabeled data in a generalpurpose learner. Some transductive graph learning al ..."
Abstract

Cited by 343 (13 self)
 Add to MetaCart
We propose a family of learning algorithms based on a new form of regularization that allows us to exploit the geometry of the marginal distribution. We focus on a semisupervised framework that incorporates labeled and unlabeled data in a generalpurpose learner. Some transductive graph learning algorithms and standard methods including Support Vector Machines and Regularized Least Squares can be obtained as special cases. We utilize properties of Reproducing Kernel Hilbert spaces to prove new Representer theorems that provide theoretical basis for the algorithms. As a result (in contrast to purely graphbased approaches) we obtain a natural outofsample extension to novel examples and so are able to handle both transductive and truly semisupervised settings. We present experimental evidence suggesting that our semisupervised algorithms are able to use unlabeled data effectively. Finally we have a brief discussion of unsupervised and fully supervised learning within our general framework.
New Support Vector Algorithms
, 2000
"... this article with the regression case. To explain this, we will introduce a suitable definition of a margin that is maximized in both cases ..."
Abstract

Cited by 335 (43 self)
 Add to MetaCart
this article with the regression case. To explain this, we will introduce a suitable definition of a margin that is maximized in both cases
An Improved Training Algorithm for Support Vector Machines
, 1997
"... We investigate the problem of training a Support Vector Machine (SVM) [1, 2, 7] on a very large date base (e.g. 50,000 data points) in the case in which the number of support vectors is also very large (e.g. 40,000). Training a SVM is equivalent to solving a linearly constrained quadratic programmin ..."
Abstract

Cited by 259 (0 self)
 Add to MetaCart
We investigate the problem of training a Support Vector Machine (SVM) [1, 2, 7] on a very large date base (e.g. 50,000 data points) in the case in which the number of support vectors is also very large (e.g. 40,000). Training a SVM is equivalent to solving a linearly constrained quadratic programming (QP) problem in a number of variables equal to the number of data points. This optimization problem is known to be challenging when the number of data points exceeds few thousands. In previous work, done by us as well as by other researchers, the strategy used to solve the large scale QP problem takes advantage of the fact that the expected number of support vectors is small (! 3; 000). Therefore, the existing algorithms cannot deal with more than a few thousand support vectors. In this paper we present a decomposition algorithm that is guaranteed to solve the QP problem and that does not make assumptions on the expected number of support vectors. In order to present the feasibility of our...
Sparse Greedy Matrix Approximation for Machine Learning
, 2000
"... In kernel based methods such as Regularization Networks large datasets pose signi cant problems since the number of basis functions required for an optimal solution equals the number of samples. We present a sparse greedy approximation technique to construct a compressed representation of the ..."
Abstract

Cited by 183 (11 self)
 Add to MetaCart
In kernel based methods such as Regularization Networks large datasets pose signi cant problems since the number of basis functions required for an optimal solution equals the number of samples. We present a sparse greedy approximation technique to construct a compressed representation of the design matrix. Experimental results are given and connections to KernelPCA, Sparse Kernel Feature Analysis, and Matching Pursuit are pointed out. 1. Introduction Many recent advances in machine learning such as Support Vector Machines [Vapnik, 1995], Regularization Networks [Girosi et al., 1995], or Gaussian Processes [Williams, 1998] are based on kernel methods. Given an msample f(x 1 ; y 1 ); : : : ; (x m ; y m )g of patterns x i 2 X and target values y i 2 Y these algorithms minimize the regularized risk functional min f2H R reg [f ] = 1 m m X i=1 c(x i ; y i ; f(x i )) + 2 kfk 2 H : (1) Here H denotes a reproducing kernel Hilbert space (RKHS) [Aronszajn, 1950],...
Kernel principal component analysis
 ADVANCES IN KERNEL METHODS  SUPPORT VECTOR LEARNING
, 1999
"... A new method for performing a nonlinear form of Principal Component Analysis is proposed. By the use of integral operator kernel functions, one can efficiently compute principal components in highdimensional feature spaces, related to input space by some nonlinear map; for instance the space of all ..."
Abstract

Cited by 179 (7 self)
 Add to MetaCart
A new method for performing a nonlinear form of Principal Component Analysis is proposed. By the use of integral operator kernel functions, one can efficiently compute principal components in highdimensional feature spaces, related to input space by some nonlinear map; for instance the space of all possible dpixel products in images. We give the derivation of the method and present experimental results on polynomial feature extraction for pattern recognition.