Results 1 - 10 of 256
A tutorial on support vector machines for pattern recognition
Data Mining and Knowledge Discovery, 1998
"... The tutorial starts with an overview of the concepts of VC dimension and structural risk minimization. We then describe linear Support Vector Machines (SVMs) for separable and non-separable data, working through a non-trivial example in detail. We describe a mechanical analogy, and discuss when SV ..."
Cited by 3393 (12 self)
The tutorial starts with an overview of the concepts of VC dimension and structural risk minimization. We then describe linear Support Vector Machines (SVMs) for separable and non-separable data, working through a non-trivial example in detail. We describe a mechanical analogy, and discuss when SVM solutions are unique and when they are global. We describe how support vector training can be practically implemented, and discuss in detail the kernel mapping technique which is used to construct SVM solutions which are nonlinear in the data. We show how Support Vector machines can have very large (even infinite) VC dimension by computing the VC dimension for homogeneous polynomial and Gaussian radial basis function kernels. While very high VC dimension would normally bode ill for generalization performance, and while at present there exists no theory which shows that good generalization performance is guaranteed for SVMs, there are several arguments which support the observed high accuracy of SVMs, which we review. Results of some experiments which were inspired by these arguments are also presented. We give numerous examples and proofs of most of the key theorems. There is new material, and I hope that the reader will find that even old material is cast in a fresh light.
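As an illustration of the kernel mapping technique discussed in this abstract, here is a minimal sketch (not from the tutorial; it assumes scikit-learn and a toy XOR-style dataset) showing the two kernel families it analyzes, the homogeneous polynomial and the Gaussian RBF kernel, plugged into the same SVM solver:

```python
# Minimal sketch (not from the paper): nonlinear SVM decision functions built
# with the two kernel families discussed in the tutorial. scikit-learn and the
# toy data are assumptions; any SVM solver would do.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.sign(X[:, 0] * X[:, 1])          # XOR-like labels: not linearly separable

# Homogeneous polynomial kernel k(x, z) = (x . z)^d  (degree 2, coef0 = 0)
poly = SVC(kernel="poly", degree=2, coef0=0.0, C=10.0).fit(X, y)

# Gaussian RBF kernel k(x, z) = exp(-gamma * ||x - z||^2)
rbf = SVC(kernel="rbf", gamma=1.0, C=10.0).fit(X, y)

print("poly training accuracy:", poly.score(X, y))
print("rbf  training accuracy:", rbf.score(X, y))
print("number of support vectors (rbf):", rbf.n_support_)
```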
A tutorial on support vector regression
2004
"... In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing ..."
Cited by 865 (3 self)
In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing with large datasets. Finally, we mention some modifications and extensions that have been applied to the standard SV algorithm, and discuss the aspect of regularization from an SV perspective.
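The central ingredient of SV regression is the ε-insensitive loss, which the solver minimizes via the quadratic programming problem mentioned above. A minimal sketch (assuming scikit-learn and toy data; the `eps_insensitive_loss` helper is illustrative, not the tutorial's notation):

```python
# Sketch: evaluate the epsilon-insensitive loss and fit a kernel SVR.
# scikit-learn's SVR solves the quadratic-programming dual; the helper below
# only evaluates the loss itself.
import numpy as np
from sklearn.svm import SVR

def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """|y - f(x)|_eps = max(0, |y - f(x)| - eps): errors inside the tube cost nothing."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - eps)

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 2 * np.pi, size=(100, 1)), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.normal(size=100)

model = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)
print("mean eps-insensitive loss:", eps_insensitive_loss(y, model.predict(X)).mean())
```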
Support vector machines for spam categorization
IEEE Transactions on Neural Networks, 1999
"... We study the use of support vector machines (SVM’s) in classifying e-mail as spam or nonspam by comparing it to three other classification algorithms: Ripper, Rocchio, and boosting decision trees. These four algorithms were tested on two different data sets: one data set where the number of features ..."
Cited by 342 (2 self)
We study the use of support vector machines (SVM's) in classifying e-mail as spam or nonspam by comparing them to three other classification algorithms: Ripper, Rocchio, and boosting decision trees. These four algorithms were tested on two different data sets: one data set where the number of features was constrained to the 1000 best features and another data set where the dimensionality was over 7000. SVM's performed best when using binary features. For both data sets, boosting trees and SVM's had acceptable test performance in terms of accuracy and speed. However, SVM's had significantly less training time.
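The "binary features" setting from the abstract amounts to term-presence indicators fed to a linear SVM. An illustrative toy sketch (not the paper's corpus or code; scikit-learn and the four example messages are assumptions):

```python
# Toy sketch: binary bag-of-words features with a linear SVM, the setting in
# which the abstract reports SVMs performing best.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

emails = ["win money now", "cheap meds win big", "meeting at noon", "project status update"]
labels = [1, 1, 0, 0]                       # 1 = spam, 0 = non-spam

vec = CountVectorizer(binary=True)          # binary term presence, not counts
X = vec.fit_transform(emails)
clf = LinearSVC(C=1.0).fit(X, labels)
print(clf.predict(vec.transform(["cheap money meeting"])))
```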
Generalized Discriminant Analysis Using a Kernel Approach
2000
"... We present a new method that we call Generalized Discriminant Analysis (GDA) to deal with nonlinear discriminant analysis using kernel function operator. The underlying theory is close to the Support Vector Machines (SVM) insofar as the GDA method provides a mapping of the input vectors into high di ..."
Cited by 336 (2 self)
We present a new method that we call Generalized Discriminant Analysis (GDA) to deal with nonlinear discriminant analysis using a kernel function operator. The underlying theory is close to that of Support Vector Machines (SVM) insofar as the GDA method provides a mapping of the input vectors into a high-dimensional feature space. In the transformed space, linear properties make it easy to extend and generalize the classical Linear Discriminant Analysis (LDA) to nonlinear discriminant analysis. The formulation is expressed as the resolution of an eigenvalue problem. Using a different kernel, one can cover a wide class of nonlinearities. For both simulated data and alternate kernels, we give classification results as well as the shape of the separating function. The results are confirmed using real data to perform seed classification. 1. Introduction: Linear discriminant analysis (LDA) is a traditional statistical method which has proven successful on classification problems [Fukunaga, 1990]. The p...
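The eigenvalue formulation can be sketched as follows, using standard kernel-LDA notation that is assumed here rather than copied from the paper: expanding the discriminant direction over the mapped training points turns the feature-space Rayleigh quotient into a dual eigenproblem in the expansion coefficients.

```latex
% Sketch (assumed notation): v is restricted to the span of the mapped data,
%   v = \sum_{i=1}^{n} \alpha_i \, \Phi(x_i),
% so the feature-space LDA criterion becomes a problem in \alpha alone.
\[
  \lambda \;=\; \max_{v}\;
  \frac{v^{\top} S_B^{\Phi}\, v}{\,v^{\top} S_T^{\Phi}\, v}
  \quad\Longrightarrow\quad
  \lambda\,(K K)\,\alpha \;=\; (K W K)\,\alpha ,
\]
% where K_{ij} = k(x_i, x_j) is the Gram matrix and W is block diagonal with
% one (1/n_c) * ones(n_c, n_c) block per class c.
```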
Predicting Time Series with Support Vector Machines
1997
"... . Support Vector Machines are used for time series prediction and compared to radial basis function networks. We make use of two different cost functions for Support Vectors: training with (i) an ffl insensitive loss and (ii) Huber's robust loss function and discuss how to choose the regulariza ..."
Cited by 189 (13 self)
Support Vector Machines are used for time series prediction and compared to radial basis function networks. We make use of two different cost functions for Support Vectors: training with (i) an ε-insensitive loss and (ii) Huber's robust loss function, and discuss how to choose the regularization parameters in these models. Two applications are considered: data from (a) a noisy (normal and uniform noise) Mackey-Glass equation and (b) the Santa Fe competition (set D). In both cases Support Vector Machines show excellent performance. In case (b) the Support Vector approach improves the best known result on the benchmark by a factor of 29%. 1 Introduction: Support Vector Machines have become a subject of intensive study (see e.g. [3, 14]). They have been applied successfully to classification tasks such as OCR [14, 11] and more recently also to regression [5, 15]. In this contribution we use Support Vector Machines in the field of time series prediction and we find that they show an excel...
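The basic recipe behind SVM time-series prediction is to embed the series into lag-vector/target pairs and run SV regression on them. A minimal sketch of that setup (my own toy data, not the paper's Mackey-Glass or Santa Fe experiments; the `embed` helper and scikit-learn are assumptions):

```python
# Sketch: lag-embedding of a scalar series followed by an RBF-kernel SVR fit.
import numpy as np
from sklearn.svm import SVR

def embed(series, order):
    """Build (x_t, y_t) pairs with x_t = (s_{t-order}, ..., s_{t-1}) and y_t = s_t."""
    X = np.array([series[t - order:t] for t in range(order, len(series))])
    y = series[order:]
    return X, y

t = np.arange(1000)
series = np.sin(0.05 * t) + 0.05 * np.random.default_rng(0).normal(size=t.size)

X, y = embed(series, order=6)
model = SVR(kernel="rbf", C=10.0, epsilon=0.01).fit(X[:800], y[:800])
print("one-step-ahead MSE:", np.mean((model.predict(X[800:]) - y[800:]) ** 2))
```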
Ridge Regression Learning Algorithm in Dual Variables
In Proceedings of the 15th International Conference on Machine Learning, 1998
"... In this paper we study a dual version of the Ridge Regression procedure. It allows us to perform non-linear regression by constructing a linear regression function in a high dimensional feature space. The feature space representation can result in a large increase in the number of parameters used by ..."
Cited by 164 (8 self)
In this paper we study a dual version of the Ridge Regression procedure. It allows us to perform non-linear regression by constructing a linear regression function in a high-dimensional feature space. The feature space representation can result in a large increase in the number of parameters used by the algorithm. In order to combat this "curse of dimensionality", the algorithm allows the use of kernel functions, as used in Support Vector methods. We also discuss a powerful family of kernel functions which is constructed using the ANOVA decomposition method from the kernel corresponding to splines with an infinite number of nodes. This paper introduces a regression estimation algorithm which is a combination of these two elements: the dual version of Ridge Regression is applied to the ANOVA enhancement of the infinite-node splines. Experimental results are then presented (based on the Boston Housing data set) which indicate the performance of this algorithm relative to other algorithms.
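The dual form is compact: with Gram matrix K and ridge parameter λ, the dual coefficients are α = (K + λI)⁻¹ y and predictions are k(x)ᵀα. A self-contained numpy sketch (my own toy data and RBF kernel, not the paper's ANOVA spline kernel):

```python
# Sketch: ridge regression in dual variables with an arbitrary kernel.
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sinc(X).ravel() + 0.05 * rng.normal(size=100)

lam = 0.1
K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)   # dual ridge solution

X_test = np.linspace(-3, 3, 5).reshape(-1, 1)
y_pred = rbf_kernel(X_test, X) @ alpha                 # k(x)^T alpha
print(y_pred)
```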
RSVM: Reduced support vector machines
Data Mining Institute, Computer Sciences Department, University of Wisconsin, 2001
"... Abstract An algorithm is proposed which generates a nonlinear kernel-based separating surface that requires as little as 1 % of a large dataset for its explicit evaluation. To generate this nonlinear surface, the entire dataset is used as a constraint in an optimization problem with very few variabl ..."
Cited by 162 (19 self)
An algorithm is proposed which generates a nonlinear kernel-based separating surface that requires as little as 1% of a large dataset for its explicit evaluation. To generate this nonlinear surface, the entire dataset is used as a constraint in an optimization problem with very few variables corresponding to the 1% of the data that is retained.
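The reduced-kernel idea is that only a small random subset of the data indexes the kernel columns, so the surface can be evaluated from those points alone. A rough sketch of that idea (my own simplified version using a regularized least-squares fit over the rectangular kernel, not the paper's smooth SVM optimization problem):

```python
# Sketch: rectangular kernel K(A, A_bar) over a ~1% random subset A_bar.
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 2))
y = np.sign(X[:, 0] ** 2 + X[:, 1] ** 2 - 1.0)          # nonlinear labels

subset = rng.choice(len(X), size=max(1, len(X) // 100), replace=False)
K_rect = rbf_kernel(X, X[subset])                        # rectangular kernel K(A, A_bar)

# Regularized least-squares surrogate for the weights over the retained subset
lam = 1e-2
w = np.linalg.solve(K_rect.T @ K_rect + lam * np.eye(len(subset)), K_rect.T @ y)
pred = np.sign(K_rect @ w)
print("training accuracy:", (pred == y).mean())
```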
Practical selection of SVM parameters and noise estimation for SVM regression
Neural Networks, 2004
"... Abstract We investigate practical selection of hyper-parameters for support vector machines (SVM) regression (that is, 1-insensitive zone and regularization parameter C). The proposed methodology advocates analytic parameter selection directly from the training data, rather than re-sampling approac ..."
Cited by 112 (1 self)
We investigate practical selection of hyper-parameters for support vector machine (SVM) regression (that is, the ε-insensitive zone and the regularization parameter C). The proposed methodology advocates analytic parameter selection directly from the training data, rather than the re-sampling approaches commonly used in SVM applications. In particular, we describe a new analytical prescription for setting the value of the insensitive zone ε as a function of training sample size. Good generalization performance of the proposed parameter selection is demonstrated empirically using several low- and high-dimensional regression problems. Further, we point out the importance of Vapnik's ε-insensitive loss for regression problems with finite samples. To this end, we compare the generalization performance of SVM regression (using the proposed selection of ε values) with regression using 'least-modulus' loss (ε = 0) and standard squared loss. These comparisons indicate superior generalization performance of SVM regression under sparse sample settings, for various types of additive noise.
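A hedged sketch of analytic hyper-parameter selection in the spirit of this paper; the exact constants below follow the commonly cited Cherkassky-Ma prescription as I recall it, not text reproduced from the paper, and the function name is illustrative:

```python
# Sketch (recalled prescription, treat the constants as an assumption):
#   C   = max(|y_mean + 3*y_std|, |y_mean - 3*y_std|)
#   eps = 3 * sigma_noise * sqrt(ln(n) / n)
import numpy as np

def select_svr_parameters(y, noise_std):
    n = len(y)
    C = max(abs(y.mean() + 3 * y.std()), abs(y.mean() - 3 * y.std()))
    eps = 3.0 * noise_std * np.sqrt(np.log(n) / n)
    return C, eps

rng = np.random.default_rng(0)
y = np.sin(np.linspace(0, 6, 200)) + 0.2 * rng.normal(size=200)
print(select_svr_parameters(y, noise_std=0.2))
```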
On a Kernel-based Method for Pattern Recognition, Regression, Approximation, and Operator Inversion
1997
"... We present a Kernel--based framework for Pattern Recognition, Regression Estimation, Function Approximation and multiple Operator Inversion. Previous approaches such as ridge-regression, Support Vector methods and regression by Smoothing Kernels are included as special cases. We will show connection ..."
Cited by 93 (23 self)
We present a Kernel-based framework for Pattern Recognition, Regression Estimation, Function Approximation and multiple Operator Inversion. Previous approaches such as ridge regression, Support Vector methods and regression by Smoothing Kernels are included as special cases. We will show connections between the cost function and some properties up to now believed to apply to Support Vector Machines only. The optimal solution of all the problems described above can be found by solving a simple quadratic programming problem. The paper closes with a proof of the equivalence between Support Vector kernels and Green's functions of regularization operators.
Support Vector Regression and Classification Based Multi-view Face Detection and Recognition
In IEEE International Conference on Automatic Face & Gesture Recognition, 2000
"... A Support Vector Machine based multi-view face detection and recognition framework is described in this paper. Face detection is carried out by constructing several detectors, each of them in charge of one specific view. The symmetrical property of face images is employed to simplify the complexity ..."
Cited by 81 (8 self)
A Support Vector Machine based multi-view face detection and recognition framework is described in this paper. Face detection is carried out by constructing several detectors, each of them in charge of one specific view. The symmetrical property of face images is employed to simplify the complexity of the modelling. The estimation of head pose, which is achieved by using the Support Vector Regression technique, provides crucial information for choosing the appropriate face detector. This helps to improve the accuracy and reduce the computation in multi-view face detection compared to other methods. For video sequences, further computational reduction can be achieved by using a Pose Change Smoothing strategy. When the face detectors find a face in frontal view, a Support Vector Machine based multi-class classifier is activated for face recognition. All the above issues are integrated under a Support Vector Machine framework. Test results on four video sequences are presented: the detection rate is above 95%, recognition accuracy is above 90%, the average pose estimation error is around 10°, and the full detection and recognition speed is up to 4 frames/second on a Pentium II 300 PC.