Results 1–10 of 39
Multicategory Support Vector Machines: Theory and Application to the Classification of Microarray Data and Satellite Radiance Data
 Journal of the American Statistical Association
, 2004
Cited by 175 (17 self)
Two-category support vector machines (SVM) have been very popular in the machine learning community for classification problems. Solving multicategory problems by a series of binary classifiers is quite common in the SVM paradigm; however, this approach may fail under various circumstances. We propose the multicategory support vector machine (MSVM), which extends the binary SVM to the multicategory case and has good theoretical properties. The proposed method provides a unifying framework when there are either equal or unequal misclassification costs. As a tuning criterion for the MSVM, an approximate leave-one-out cross-validation function, called Generalized Approximate Cross Validation, is derived, analogous to the binary case. The effectiveness of the MSVM is demonstrated through applications to cancer classification using microarray data and cloud classification with satellite radiance profiles.
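The failure mode this abstract alludes to can be made concrete. A minimal sketch (hypothetical scores, not the paper's method) of why reducing a multicategory problem to independent binary sign decisions can break down:

```python
# Toy sketch of the "no winner" ambiguity in one-vs-rest classification:
# if every binary classifier outputs a negative score for a point, the
# sign rule assigns it to no class at all.

def claimed_classes(scores):
    """Classes whose one-vs-rest classifier accepts the point (score > 0)."""
    return [k for k, s in enumerate(scores) if s > 0]

scores = [-0.2, -0.5, -0.1]            # a point all three binary SVMs reject
print(claimed_classes(scores))          # [] -- the sign rule leaves it unlabeled

# Falling back to the largest score picks class 2 here, but that tie-break
# is a heuristic the separate binary formulations do not themselves justify.
print(max(range(len(scores)), key=scores.__getitem__))  # 2
```

A single joint formulation such as the MSVM avoids this ambiguity by optimizing all class functions together.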
Stability and Generalization
, 2001
Cited by 167 (6 self)
We define notions of stability for learning algorithms and show how to use these notions to derive generalization error bounds based on the empirical error and the leave-one-out error. The methods we use can be applied in the regression framework as well as in the classification one when the classifier is obtained by thresholding a real-valued function. We study the stability properties of large classes of learning algorithms, such as regularization-based algorithms. In particular, we focus on Hilbert space regularization and Kullback-Leibler regularization. We demonstrate how to apply the results to SVM for regression and classification.
Classification of Multiple Cancer Types by Multicategory Support Vector Machines Using Gene Expression Data
 Journal of the American Statistical Association
, 2002
Cited by 88 (4 self)
Monitoring gene expression profiles is a novel approach in cancer diagnosis. Several studies have shown that prediction of cancer types using gene expression data is promising and very informative. The Support Vector Machine (SVM) is one of the classification methods successfully applied to cancer diagnosis problems using gene expression data. However, its optimal extension to more than two classes was not obvious, which might impose limitations on its application to multiple tumor types. In this paper, we analyze two published multiple-cancer-type data sets using the multicategory SVM, a recently proposed extension of the binary SVM.
Support vector machines and the Bayes rule in classification
 Data Mining and Knowledge Discovery
, 2002
Cited by 84 (13 self)
The Bayes rule is the optimal classification rule if the underlying distribution of the data is known. In practice we do not know the underlying distribution and need to “learn” classification rules from the data. One way to derive classification rules in practice is to implement the Bayes rule approximately by estimating an appropriate classification function. Traditional statistical methods use the estimated log odds ratio as the classification function. Support vector machines (SVMs) are one type of large-margin classifier, and the relationship between SVMs and the Bayes rule was not clear. In this paper, it is shown that the asymptotic targets of SVMs are classification functions directly related to the Bayes rule. The rate of convergence of the solutions of SVMs to their corresponding target functions is explicitly established in the case of SVMs with quadratic or higher-order loss functions and spline kernels. Simulations are given to illustrate the relation between SVMs and the Bayes rule in other cases. This helps to explain the success of SVMs in many classification studies, and makes it easier to compare SVMs with traditional statistical methods.
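The flavor of such asymptotic targets can be seen in the quadratic-loss case via a standard textbook calculation (sketched here, not quoted from the paper). For labels $Y \in \{-1, +1\}$ with $p(x) = P(Y = +1 \mid X = x)$, the conditional risk of the quadratic margin loss is

```latex
\mathbb{E}\bigl[(1 - Y f(X))^2 \mid X = x\bigr]
  = p(x)\bigl(1 - f(x)\bigr)^2 + \bigl(1 - p(x)\bigr)\bigl(1 + f(x)\bigr)^2 .
```

Setting the derivative with respect to $f(x)$ to zero, $-2p(x)\bigl(1 - f(x)\bigr) + 2\bigl(1 - p(x)\bigr)\bigl(1 + f(x)\bigr) = 0$, gives the population minimizer $f^*(x) = 2p(x) - 1$, so $\operatorname{sign}\bigl(f^*(x)\bigr) = \operatorname{sign}\bigl(2p(x) - 1\bigr)$: thresholding the target function recovers exactly the Bayes rule.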
Support Vector Machines for Classification in Nonstandard Situations
 Machine Learning
, 2000
Cited by 74 (15 self)
The majority of classification algorithms are developed for the standard situation, in which it is assumed that the examples in the training set come from the same distribution as that of the target population, and that the costs of misclassification into different classes are the same. However, these assumptions are often violated in real-world settings. For some classification methods, this can often be taken care of simply with a change of threshold; for others, additional effort is required. In this paper, we explain why the standard support vector machine is not suitable for the nonstandard situation, and introduce a simple procedure for adapting the support vector machine methodology to it. Theoretical justification for the procedure is provided. A simulation study illustrates that the modified support vector machine significantly improves upon the standard support vector machine in the nonstandard situation. The computational load of the proposed procedure is th...
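The "change of threshold" remark above can be illustrated with a small sketch (hypothetical costs, not the paper's procedure): under unequal misclassification costs, the Bayes-optimal cutoff on the posterior probability moves away from 1/2.

```python
# With false-positive cost cost_fp and false-negative cost cost_fn, predict +1
# exactly when the expected cost of doing so, (1 - p) * cost_fp, is below the
# expected cost of predicting -1, p * cost_fn. Solving for p gives the cutoff
# cost_fp / (cost_fp + cost_fn) instead of 1/2.

def bayes_decision(p_pos, cost_fp=1.0, cost_fn=5.0):
    """Cost-sensitive Bayes decision from the posterior P(Y = +1 | x)."""
    threshold = cost_fp / (cost_fp + cost_fn)
    return +1 if p_pos > threshold else -1

print(bayes_decision(0.30))                             # +1: cutoff drops to 1/6
print(bayes_decision(0.30, cost_fp=5.0, cost_fn=1.0))   # -1: cutoff rises to 5/6
```

Methods that output posterior estimates admit exactly this threshold fix; the paper's point is that the standard SVM does not, which motivates its modified formulation.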
Gender classification with support vector machines
, 2000
Cited by 38 (2 self)
Support Vector Machines (SVMs) are investigated for visual gender classification with low-resolution “thumbnail” faces (21-by-12 pixels) processed from 1,755 images from the FERET face database. The performance of SVMs (3.4% error) is shown to be superior to traditional pattern classifiers (Linear, Quadratic, Fisher Linear Discriminant, Nearest-Neighbor) as well as more modern techniques such as Radial Basis Function (RBF) classifiers and large ensemble-RBF networks. SVMs also outperformed human test subjects at the same task: in a perception study with 30 human test subjects, ranging in age from mid-20s to mid-40s, the average error rate was found to be 32% for the “thumbnails” and 6.7% with higher-resolution images. The difference in performance between low- and high-resolution tests with SVMs was only 1%, demonstrating robustness and relative scale invariance for visual classification.
Algorithmic Stability and Generalization Performance
, 2001
Cited by 37 (2 self)
We present a novel way of obtaining PAC-style bounds on the generalization error of learning algorithms, explicitly using their stability properties. A stable learner is one for which the learned solution does not change much with small changes in the training set. The bounds we obtain do not depend on any measure of the complexity of the hypothesis space (e.g., VC dimension) but rather on how the learning algorithm searches this space, and can thus be applied even when the VC dimension is infinite. We demonstrate that regularization networks possess the required stability property and apply our method to obtain new bounds on their generalization performance.
A Note on Margin-based Loss Functions in Classification
 Statistics and Probability Letters
, 2002
Cited by 30 (1 self)
In many classification procedures, the classification function is obtained (or trained) by minimizing a certain empirical risk on the training sample. The classification is then based on the sign of the classification function. In recent years, a host of classification methods have been proposed in machine learning that use different margin-based loss functions in the training. Examples include the AdaBoost procedure, the support vector machine, and many variants of them. The margin-based loss functions used in these procedures are usually motivated as upper bounds of the misclassification loss, but this cannot explain the statistical properties of the classification procedures. We consider margin-based loss functions from a statistical point of view. We first show that under general conditions, margin-based loss functions are Fisher consistent for classification. That is, the population minimizer of the loss function leads to the Bayes optimal rule of classification. In particular, almost all margin-based loss functions that have appeared in the literature are Fisher consistent. We then study margin-based loss functions in the method of sieves and the method of regularization. We show that the Fisher consistency of margin-based loss functions often leads to consistency and rate-of-convergence (to the Bayes optimal risk) results under general conditions. The common notion of margin-based loss functions as upper bounds of the misclassification loss is formalized and investigated. It is shown that the hinge loss is the tightest convex upper bound of the misclassification loss. Simulations are carried out to compare some commonly used margin-based loss functions.
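The upper-bound property discussed above is concrete enough to check numerically. A small sketch using the standard textbook forms of these losses (the logistic loss is rescaled by 1/ln 2, an assumption made here so that every surrogate equals 1 at zero margin):

```python
import math

# Margin-based losses as functions of the margin m = y * f(x).
def zero_one(m):    return 1.0 if m <= 0 else 0.0        # misclassification loss
def hinge(m):       return max(0.0, 1.0 - m)             # SVM
def logistic(m):    return math.log2(1.0 + math.exp(-m)) # scaled logistic loss
def exponential(m): return math.exp(-m)                  # AdaBoost

# Each convex surrogate upper-bounds the 0-1 loss at every margin value.
for i in range(-30, 31):
    m = i / 10.0
    assert hinge(m) >= zero_one(m)
    assert logistic(m) >= zero_one(m)
    assert exponential(m) >= zero_one(m)
```

The precise sense in which the hinge loss is the *tightest* such bound is the paper's formalized result; the sketch only verifies that all three surrogates majorize the misclassification loss.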
Image Representations for Object Detection Using Kernel Classifiers
 In Asian Conference on Computer Vision
, 2000
Cited by 23 (4 self)
This paper presents experimental comparisons of various image representations for object detection using kernel classifiers. In particular, it discusses the use of support vector machines (SVM) for object detection using as image representations raw pixel values, projections onto principal components, and Haar wavelets. General linear transformations of the images through the choice of the kernel of the SVM are considered. Experiments showing the effects of histogram equalization, a nonlinear transformation, are presented. Image representations derived from probabilistic models of the class of images considered, through the choice of the kernel of the SVM, are also evaluated. Finally, we present a feature selection method using SVMs, and show experimental results.
Keywords: Support Vector Machines, Kernel, Wavelets, PCA, histogram equalization.
Bayesian support vector regression using a unified loss function
 IEEE Transactions on Neural Networks
, 2004
Cited by 20 (2 self)
In this paper, we use a unified loss function, called the soft insensitive loss function, for Bayesian support vector regression. We follow standard Gaussian processes for regression to set up the Bayesian framework, in which the unified loss function is used in the likelihood evaluation. Under this framework, the maximum a posteriori estimate of the function values corresponds to the solution of an extended support vector regression problem. The overall approach has the merits of support vector regression, such as convex quadratic programming and sparsity in the solution representation. It also has the advantages of Bayesian methods for model adaptation and error bars on its predictions. Experimental results on simulated and real-world data sets indicate that the approach works well even on large data sets.
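For context on what is being unified, a sketch of the classical ε-insensitive loss of standard support vector regression (the paper's soft insensitive loss is a smoothed variant; its exact form is not reproduced here):

```python
def eps_insensitive(residual, eps=0.1):
    """Vapnik's epsilon-insensitive loss: zero inside the tube |r| <= eps,
    linear outside. Residuals inside the tube cost nothing, which is what
    produces the sparsity in the solution representation mentioned above."""
    return max(0.0, abs(residual) - eps)

print(eps_insensitive(0.05))   # 0.0 -- inside the tube, no penalty
print(eps_insensitive(-0.5))   # linear penalty outside the tube
```

Its non-differentiable corners at ±ε are the obstacle to a Gaussian-process likelihood treatment, which is what a smoothed version addresses.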