Results 1  10
of
163
The Nature of Statistical Learning Theory
, 1995
"... Abstract—Statistical learning theory was introduced in the late 1960’s. Until the 1990’s it was a purely theoretical analysis of the problem of function estimation from a given collection of data. In the middle of the 1990’s new types of learning algorithms (called support vector machines) based on ..."
Abstract

Cited by 9946 (28 self)
 Add to MetaCart
Abstract—Statistical learning theory was introduced in the late 1960’s. Until the 1990’s it was a purely theoretical analysis of the problem of function estimation from a given collection of data. In the middle of the 1990’s new types of learning algorithms (called support vector machines) based on the developed theory were proposed. This made statistical learning theory not only a tool for the theoretical analysis but also a tool for creating practical algorithms for estimating multidimensional functions. This article presents a very general overview of statistical learning theory including both theoretical and algorithmic aspects of the theory. The goal of this overview is to demonstrate how the abstract learning theory established conditions for generalization which are more general than those discussed in classical statistical paradigms and how the understanding of these conditions inspired new algorithmic approaches to function estimation problems.
A tutorial on support vector machines for pattern recognition
 Data Mining and Knowledge Discovery
, 1998
"... The tutorial starts with an overview of the concepts of VC dimension and structural risk minimization. We then describe linear Support Vector Machines (SVMs) for separable and nonseparable data, working through a nontrivial example in detail. We describe a mechanical analogy, and discuss when SV ..."
Abstract

Cited by 2497 (11 self)
 Add to MetaCart
(Show Context)
The tutorial starts with an overview of the concepts of VC dimension and structural risk minimization. We then describe linear Support Vector Machines (SVMs) for separable and nonseparable data, working through a nontrivial example in detail. We describe a mechanical analogy, and discuss when SVM solutions are unique and when they are global. We describe how support vector training can be practically implemented, and discuss in detail the kernel mapping technique which is used to construct SVM solutions which are nonlinear in the data. We show how Support Vector machines can have very large (even infinite) VC dimension by computing the VC dimension for homogeneous polynomial and Gaussian radial basis function kernels. While very high VC dimension would normally bode ill for generalization performance, and while at present there exists no theory which shows that good generalization performance is guaranteed for SVMs, there are several arguments which support the observed high accuracy of SVMs, which we review. Results of some experiments which were inspired by these arguments are also presented. We give numerous examples and proofs of most of the key theorems. There is new material, and I hope that the reader will find that even old material is cast in a fresh light.
Nonlinear component analysis as a kernel eigenvalue problem

, 1996
"... We describe a new method for performing a nonlinear form of Principal Component Analysis. By the use of integral operator kernel functions, we can efficiently compute principal components in highdimensional feature spaces, related to input space by some nonlinear map; for instance the space of all ..."
Abstract

Cited by 1145 (73 self)
 Add to MetaCart
We describe a new method for performing a nonlinear form of Principal Component Analysis. By the use of integral operator kernel functions, we can efficiently compute principal components in highdimensional feature spaces, related to input space by some nonlinear map; for instance the space of all possible 5pixel products in 16x16 images. We give the derivation of the method, along with a discussion of other techniques which can be made nonlinear with the kernel approach; and present first experimental results on nonlinear feature extraction for pattern recognition.
Sparse Bayesian Learning and the Relevance Vector Machine
, 2001
"... This paper introduces a general Bayesian framework for obtaining sparse solutions to regression and classication tasks utilising models linear in the parameters. Although this framework is fully general, we illustrate our approach with a particular specialisation that we denote the `relevance vec ..."
Abstract

Cited by 623 (5 self)
 Add to MetaCart
This paper introduces a general Bayesian framework for obtaining sparse solutions to regression and classication tasks utilising models linear in the parameters. Although this framework is fully general, we illustrate our approach with a particular specialisation that we denote the `relevance vector machine' (RVM), a model of identical functional form to the popular and stateoftheart `support vector machine' (SVM). We demonstrate that by exploiting a probabilistic Bayesian learning framework, we can derive accurate prediction models which typically utilise dramatically fewer basis functions than a comparable SVM while oering a number of additional advantages. These include the benets of probabilistic predictions, automatic estimation of `nuisance' parameters, and the facility to utilise arbitrary basis functions (e.g. non`Mercer' kernels).
Training Support Vector Machines: an Application to Face Detection
, 1997
"... We investigate the application of Support Vector Machines (SVMs) in computer vision. SVM is a learning technique developed by V. Vapnik and his team (AT&T Bell Labs.) that can be seen as a new method for training polynomial, neural network, or Radial Basis Functions classifiers. The decision sur ..."
Abstract

Cited by 607 (1 self)
 Add to MetaCart
(Show Context)
We investigate the application of Support Vector Machines (SVMs) in computer vision. SVM is a learning technique developed by V. Vapnik and his team (AT&T Bell Labs.) that can be seen as a new method for training polynomial, neural network, or Radial Basis Functions classifiers. The decision surfaces are found by solving a linearly constrained quadratic programming problem. This optimization problem is challenging because the quadratic form is completely dense and the memory requirements grow with the square of the number of data points. We present a decomposition algorithm that guarantees global optimality, and can be used to train SVM's over very large data sets. The main idea behind the decomposition is the iterative solution of subproblems and the evaluation of optimality conditions which are used both to generate improved iterative values, and also establish the stopping criteria for the algorithm. We present experimental results of our implementation of SVM, and demonstrate the ...
A tutorial on support vector regression
, 2004
"... In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing ..."
Abstract

Cited by 540 (2 self)
 Add to MetaCart
In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing with large datasets. Finally, we mention some modifications and extensions that have been applied to the standard SV algorithm, and discuss the aspect of regularization from a SV perspective.
An introduction to kernelbased learning algorithms
 IEEE TRANSACTIONS ON NEURAL NETWORKS
, 2001
"... This paper provides an introduction to support vector machines (SVMs), kernel Fisher discriminant analysis, and ..."
Abstract

Cited by 422 (49 self)
 Add to MetaCart
This paper provides an introduction to support vector machines (SVMs), kernel Fisher discriminant analysis, and
Fisher Discriminant Analysis With Kernels
, 1999
"... A nonlinear classification technique based on Fisher's discriminant is proposed. The main ingredient is the kernel trick which allows the efficient computation of Fisher discriminant in feature space. The linear classification in feature space corresponds to a (powerful) nonlinear decision f ..."
Abstract

Cited by 358 (15 self)
 Add to MetaCart
A nonlinear classification technique based on Fisher's discriminant is proposed. The main ingredient is the kernel trick which allows the efficient computation of Fisher discriminant in feature space. The linear classification in feature space corresponds to a (powerful) nonlinear decision function in input space. Large scale simulations demonstrate the competitiveness of our approach.
Generalized Discriminant Analysis Using a Kernel Approach
, 2000
"... We present a new method that we call Generalized Discriminant Analysis (GDA) to deal with nonlinear discriminant analysis using kernel function operator. The underlying theory is close to the Support Vector Machines (SVM) insofar as the GDA method provides a mapping of the input vectors into high di ..."
Abstract

Cited by 237 (2 self)
 Add to MetaCart
We present a new method that we call Generalized Discriminant Analysis (GDA) to deal with nonlinear discriminant analysis using kernel function operator. The underlying theory is close to the Support Vector Machines (SVM) insofar as the GDA method provides a mapping of the input vectors into high dimensional feature space. In the transformed space, linear properties make it easy to extend and generalize the classical Linear Discriminant Analysis (LDA) to non linear discriminant analysis. The formulation is expressed as an eigenvalue problem resolution. Using a different kernel, one can cover a wide class of nonlinearities. For both simulated data and alternate kernels, we give classification results as well as the shape of the separating function. The results are confirmed using a real data to perform seed classification. 1. Introduction Linear discriminant analysis (LDA) is a traditional statistical method which has proven successful on classification problems [Fukunaga, 1990]. The p...
Support vector machines: Training and applications
 A.I. MEMO 1602, MIT A. I. LAB
, 1997
"... The Support Vector Machine (SVM) is a new and very promising classification technique developed by Vapnik and his group at AT&T Bell Laboratories [3, 6, 8, 24]. This new learning algorithm can be seen as an alternative training technique for Polynomial, Radial Basis Function and MultiLayer Perc ..."
Abstract

Cited by 192 (3 self)
 Add to MetaCart
(Show Context)
The Support Vector Machine (SVM) is a new and very promising classification technique developed by Vapnik and his group at AT&T Bell Laboratories [3, 6, 8, 24]. This new learning algorithm can be seen as an alternative training technique for Polynomial, Radial Basis Function and MultiLayer Perceptron classifiers. The main idea behind the technique is to separate the classes with a surface that maximizes the margin between them. An interesting property of this approach is that it is an approximate implementation of the Structural Risk Minimization (SRM) induction principle [23]. The derivation of Support Vector Machines, its relationship with SRM, and its geometrical insight, are discussed in this paper. Since Structural Risk Minimization is an inductive principle that aims at minimizing a bound on the generalization error of a model, rather than minimizing the Mean Square Error over the data set (as Empirical Risk Minimization methods do), training a SVM to obtain the maximum margin classi er requires a different objective function. This objective function is then optimized by solving a largescale quadratic programming problem with linear and box constraints. The problem is considered challenging, because the quadratic form is completely dense, so the memory