Results 1–10 of 438
In defense of one-vs-all classification
Journal of Machine Learning Research, 2004
Cited by 312 (0 self)
Abstract: Editor: John Shawe-Taylor. We consider the problem of multiclass classification. Our main thesis is that a simple “one-vs-all” scheme is as accurate as any other approach, assuming that the underlying binary classifiers are well-tuned regularized classifiers such as support vector machines. This thesis is interesting in that it disagrees with a large body of recent published work on multiclass classification. We support our position by means of a critical review of the existing literature, a substantial collection of carefully controlled experimental work, and theoretical arguments.
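As a concrete illustration of the one-vs-all scheme this abstract argues for, here is a minimal sketch that trains one binary classifier per class and predicts with the most confident one. The paper's argument assumes well-tuned SVMs as the binary learners; regularized least-squares classifiers are substituted here only to keep the example self-contained, and all function names are hypothetical.

```python
import numpy as np

def fit_one_vs_all(X, y, lam=1e-3):
    """Train one regularized least-squares classifier per class.

    For each class k, relabel k as +1 and everything else as -1,
    then solve the ridge system (X^T X + lam I) w_k = X^T t.
    """
    n, d = X.shape
    classes = np.unique(y)
    W = np.zeros((len(classes), d))
    A = X.T @ X + lam * np.eye(d)
    for i, k in enumerate(classes):
        t = np.where(y == k, 1.0, -1.0)
        W[i] = np.linalg.solve(A, X.T @ t)
    return classes, W

def predict_one_vs_all(X, classes, W):
    # pick the class whose binary classifier scores highest
    return classes[np.argmax(X @ W.T, axis=1)]
```

A bias term can be absorbed by appending a constant-one column to `X`; swapping any well-tuned binary SVM in place of the least-squares solve recovers the scheme the paper actually studies.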
Proximal support vector machine classifiers
Proceedings KDD-2001: Knowledge Discovery and Data Mining, 2001
Cited by 152 (16 self)
Abstract: A new approach to support vector machine (SVM) classification is proposed wherein each of two data sets is proximal to one of two distinct planes that are not parallel to each other. Each plane is generated such that it is closest to one of the two data sets and as far as possible from the other data set. Each of the two nonparallel proximal planes is obtained by a single MATLAB command as the eigenvector corresponding to the smallest eigenvalue of a generalized eigenvalue problem. Classification by proximity to two distinct nonlinear surfaces generated by a nonlinear kernel also leads to two simple generalized eigenvalue problems. The effectiveness of the proposed method is demonstrated by tests on simple examples as well as on a number of public data sets. These examples show the advantages of the proposed approach in both computation time and test set correctness.
Index Terms: Support vector machines, proximal classification, generalized eigenvalues.
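The eigenvalue construction in this abstract can be sketched briefly. The following is a hedged illustration of the idea, written in Python rather than the single MATLAB command the paper mentions and not the authors' code; a small Tikhonov term `delta` keeps both matrices positive definite, and the function names are invented.

```python
import numpy as np
from scipy.linalg import eigh

def gepsvm_plane(A, B, delta=1e-4):
    """Fit a proximal plane w.x = gamma that is close to the rows of A
    and far from the rows of B: the eigenvector of the smallest
    generalized eigenvalue of the pair (Ha^T Ha, Hb^T Hb)."""
    Ha = np.hstack([A, -np.ones((A.shape[0], 1))])   # [A  -e]
    Hb = np.hstack([B, -np.ones((B.shape[0], 1))])   # [B  -e]
    G = Ha.T @ Ha + delta * np.eye(Ha.shape[1])
    M = Hb.T @ Hb + delta * np.eye(Hb.shape[1])
    vals, vecs = eigh(G, M)        # symmetric generalized eigenproblem
    z = vecs[:, 0]                 # eigenvector of the smallest eigenvalue
    return z[:-1], z[-1]           # w, gamma

def gepsvm_predict(x, planes):
    # assign x to the class whose proximal plane is nearest
    dists = [abs(x @ w - g) / np.linalg.norm(w) for (w, g) in planes]
    return int(np.argmin(dists))
```

Fitting one plane per class, `planes = [gepsvm_plane(A, B), gepsvm_plane(B, A)]`, yields the nonparallel pair described above.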
A modified finite Newton method for fast solution of large-scale linear SVMs
Journal of Machine Learning Research, 2005
Cited by 109 (8 self)
Abstract: This paper develops a fast method for solving linear SVMs with the L2 loss function that is suited for large-scale data mining tasks such as text classification. This is done by modifying the finite Newton method of Mangasarian in several ways. Experiments indicate that the method is much faster than decomposition methods such as SVMlight, SMO and BSVM (e.g., 4–100 fold), especially when the number of examples is large. The paper also suggests ways of extending the method to other loss functions such as the modified Huber's loss function and the L1 loss function, and also to solving ordinal regression.
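To make the finite-Newton idea concrete, here is a hedged sketch of a generalized Newton iteration for the L2-loss linear SVM objective, min_w (lam/2)||w||^2 + (1/2) sum_i max(0, 1 - y_i w.x_i)^2. It takes full Newton steps and omits the line search and the other modifications the paper develops; the function name is made up.

```python
import numpy as np

def l2svm_newton(X, y, lam=1.0, iters=20):
    """Generalized Newton iteration for the linear L2-loss SVM.

    On the active set sv = {i : y_i w.x_i < 1} the objective is
    quadratic, so each step solves (lam I + X_sv^T X_sv) s = grad.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        margin = y * (X @ w)
        sv = margin < 1                                   # active examples
        g = lam * w - X[sv].T @ (y[sv] * (1 - margin[sv]))
        H = lam * np.eye(d) + X[sv].T @ X[sv]
        step = np.linalg.solve(H, g)
        if np.linalg.norm(step) < 1e-10:                  # converged
            break
        w -= step
    return w
```

Because the objective is piecewise quadratic, each step solves the current active-set quadratic exactly, which is why the iteration typically terminates after a handful of steps.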
Everything Old Is New Again: A Fresh Look at Historical Approaches in Machine Learning
PhD thesis, MIT, 2002
Cited by 106 (7 self)
A robust minimax approach to classification
Journal of Machine Learning Research, 2002
Cited by 102 (7 self)
Abstract: When constructing a classifier, the probability of correct classification of future data points should be maximized. We consider a binary classification problem where the mean and covariance matrix of each class are assumed to be known. No further assumptions are made with respect to the class-conditional distributions. Misclassification probabilities are then controlled in a worst-case setting: that is, under all possible choices of class-conditional densities with given mean and covariance matrix, we minimize the worst-case (maximum) probability of misclassification of future data points. For a linear decision boundary, this desideratum is translated in a very direct way into a (convex) second-order cone optimization problem, with complexity similar to a support vector machine problem. The minimax problem can be interpreted geometrically as minimizing the maximum of the Mahalanobis distances to the two classes. We address the issue of robustness with respect to estimation errors (in the means and covariances of the …
Regularized Least-Squares Classification
Cited by 100 (1 self)
Abstract: We consider the solution of binary classification problems via Tikhonov regularization in a Reproducing Kernel Hilbert Space using the square loss, and denote the resulting algorithm Regularized Least-Squares Classification (RLSC). We sketch …
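The RLSC algorithm named in this entry reduces to a single linear solve for the kernel expansion coefficients, c = (K + lam n I)^{-1} y. A minimal sketch, assuming a Gaussian kernel; the helper names are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    # K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2))
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

def rlsc_fit(X, y, lam=1e-2, sigma=1.0):
    """Tikhonov regularization with the square loss: one linear solve
    c = (K + lam * n * I)^{-1} y over the +/-1 training labels."""
    n = len(y)
    K = rbf_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * n * np.eye(n), y)

def rlsc_predict(Xtrain, c, Xtest, sigma=1.0):
    # classify by the sign of f(x) = sum_i c_i k(x_i, x)
    return np.sign(rbf_kernel(Xtest, Xtrain, sigma) @ c)
```

The only difference from kernel ridge regression is the interpretation: the regression outputs are thresholded at zero to produce class labels.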
Weighted Least Squares Support Vector Machines: robustness and sparse approximation
Neurocomputing
Cited by 90 (19 self)
Abstract: Least Squares Support Vector Machines (LS-SVM) is an SVM version which involves equality instead of inequality constraints and works with a least squares cost function. In this way the solution follows from a linear Karush-Kuhn-Tucker system instead of a quadratic programming problem. However, sparseness is lost in the LS-SVM case and the estimation of the support values is only optimal in the case of a Gaussian distribution of the error variables. In this paper we discuss a method which can overcome these two drawbacks. We show how to obtain robust estimates for regression by applying a weighted version of LS-SVM. We also discuss a sparse approximation procedure for weighted and unweighted LS-SVM. It is basically a pruning method which is able to do pruning based upon the physical meaning of the sorted support values, whereas pruning procedures for classical multilayer perceptrons require the computation of a Hessian matrix or its inverse. The methods of this paper are illustrated for RBF kernels and demonstrate how to obtain robust estimates, with selection of an appropriate number of hidden units, in the case of outliers or non-Gaussian error distributions with heavy tails.
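The linear Karush-Kuhn-Tucker system mentioned above, including per-example weights v_i for the weighted variant, can be sketched as follows. This follows the standard LS-SVM classifier formulation; the weights are plain inputs here, not the robust weighting function the paper derives, and the function names are invented.

```python
import numpy as np

def lssvm_fit(K, y, gamma=10.0, v=None):
    """Solve the LS-SVM KKT system
        [ 0    y^T             ] [b    ]   [0]
        [ y    Omega + V/gamma ] [alpha] = [1]
    with Omega_ij = y_i y_j K_ij and V = diag(1/v_i);
    v_i = 1 recovers the unweighted LS-SVM."""
    n = len(y)
    if v is None:
        v = np.ones(n)
    Omega = (y[:, None] * y[None, :]) * K
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.diag(1.0 / (gamma * v))
    sol = np.linalg.solve(A, np.concatenate([[0.0], np.ones(n)]))
    return sol[0], sol[1:]                 # bias b, support values alpha

def lssvm_predict(K_test_train, y, b, alpha):
    # f(x) = sign(sum_i alpha_i y_i K(x, x_i) + b)
    return np.sign(K_test_train @ (alpha * y) + b)
```

One linear solve replaces the quadratic program of a standard SVM, which is the trade-off the abstract describes: simplicity at the cost of sparseness.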
Predictive low-rank decomposition for kernel methods
ICML, 2005
Cited by 89 (7 self)
Abstract: Low-rank matrix decompositions are essential tools in the application of kernel methods to large-scale learning problems. These decompositions have generally been treated as black boxes: the decomposition of the kernel matrix that they deliver is independent of the specific learning task at hand, and this is a potentially significant source of inefficiency. In this paper, we present an algorithm that can exploit side information (e.g., classification labels, regression responses) in the computation of low-rank decompositions for kernel matrices. Our algorithm has the same favorable scaling as state-of-the-art methods such as incomplete Cholesky decomposition: it is linear in the number of data points and quadratic in the rank of the approximation. We present simulation results showing that our algorithm yields decompositions of significantly smaller rank than those found by incomplete Cholesky decomposition.
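For contrast with the predictive method described above, the task-independent baseline it is compared against, pivoted incomplete Cholesky, fits in a few lines. This is a generic sketch, not the paper's side-information-aware algorithm, and the function name is made up.

```python
import numpy as np

def incomplete_cholesky(K, rank, tol=1e-8):
    """Greedy pivoted Cholesky: build a factor G (n x rank) with
    G G^T ~= K, adding one column per step chosen from the largest
    remaining diagonal entry. Cost per step is linear in n, so the
    total cost is linear in n and quadratic in the rank."""
    n = K.shape[0]
    G = np.zeros((n, rank))
    d = np.diag(K).astype(float)           # residual diagonal
    for j in range(rank):
        i = int(np.argmax(d))
        if d[i] < tol:                     # numerically rank-deficient
            return G[:, :j]
        G[:, j] = (K[:, i] - G @ G[i]) / np.sqrt(d[i])
        d -= G[:, j] ** 2
    return G
```

Because the pivot rule looks only at the kernel matrix, the decomposition is the same for every downstream task; exploiting labels in that choice is exactly the gap the paper's method addresses.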
Multiclass Least Squares Support Vector Machines
In Proceedings of the International Joint Conference on Neural Networks (IJCNN'99), 1999
Cited by 80 (11 self)
Abstract: In this paper we present an extension of least squares support vector machines (LS-SVMs) to the multiclass case. While standard SVM solutions involve solving quadratic or linear programming problems, the least squares version of SVMs corresponds to solving a set of linear equations, due to equality instead of inequality constraints in the problem formulation. In LS-SVMs Mercer's condition is still applicable, hence several types of kernels, such as polynomial, RBF and MLP kernels, can be used. The multiclass case that we discuss here is related to classical neural network approaches to classification, where multiple classes are encoded by considering multiple outputs for the network. Efficient methods for solving large-scale LS-SVMs are available.
1. Introduction. Support vector machines were introduced in [16] for solving pattern recognition and nonlinear function estimation problems. In this method one maps the data into a high-dimensional input space in which one constructs an optimal sepa…
Component selection and smoothing in multivariate nonparametric regression
Cited by 78 (1 self)
Abstract: We propose a new method for model selection and model fitting in multivariate nonparametric regression models, in the framework of smoothing spline ANOVA. The "COSSO" is a method of regularization with the penalty functional being the sum of component norms, instead of the squared norm employed in the traditional smoothing spline method. The COSSO provides a unified framework for several recent proposals for model selection in linear models and smoothing spline ANOVA models. Theoretical properties, such as the existence and the rate of convergence of the COSSO estimator, are studied. In the special case of a tensor product design with periodic functions, a detailed analysis reveals that the COSSO does model selection by applying a novel soft-thresholding-type operation to the function components. We give an equivalent formulation of the COSSO estimator which leads naturally to an iterative algorithm. We compare the COSSO with MARS, a popular method that builds functional ANOVA models, in simulations and real examples. The COSSO method can be extended to classification problems, and we compare its performance with those of a number of machine learning algorithms on real data sets. The COSSO gives very competitive performance in these studies.