Results 1–10 of 19
A tutorial on support vector machines for pattern recognition
Data Mining and Knowledge Discovery, 1998
Abstract

Cited by 2497 (11 self)
The tutorial starts with an overview of the concepts of VC dimension and structural risk minimization. We then describe linear Support Vector Machines (SVMs) for separable and nonseparable data, working through a nontrivial example in detail. We describe a mechanical analogy, and discuss when SVM solutions are unique and when they are global. We describe how support vector training can be practically implemented, and discuss in detail the kernel mapping technique which is used to construct SVM solutions which are nonlinear in the data. We show how Support Vector machines can have very large (even infinite) VC dimension by computing the VC dimension for homogeneous polynomial and Gaussian radial basis function kernels. While very high VC dimension would normally bode ill for generalization performance, and while at present there exists no theory which shows that good generalization performance is guaranteed for SVMs, there are several arguments which support the observed high accuracy of SVMs, which we review. Results of some experiments which were inspired by these arguments are also presented. We give numerous examples and proofs of most of the key theorems. There is new material, and I hope that the reader will find that even old material is cast in a fresh light.
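The homogeneous polynomial and Gaussian radial basis function kernels named in this abstract can be sketched in a few lines of Python (a minimal illustration with made-up inputs and parameter values, not code from the tutorial):

```python
import math

def poly_kernel(x, y, degree=2):
    """Homogeneous polynomial kernel: k(x, y) = (x . y)^d."""
    return sum(a * b for a, b in zip(x, y)) ** degree

def rbf_kernel(x, y, sigma=1.0):
    """Gaussian RBF kernel: k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

print(poly_kernel([1.0, 2.0], [3.0, 1.0]))  # (1*3 + 2*1)^2 = 25.0
print(rbf_kernel([0.0, 0.0], [3.0, 4.0]))   # exp(-25 / 2)
```

Replacing the dot product with either kernel is what makes the resulting SVM nonlinear in the data, which is why the RBF kernel in particular can yield an infinite VC dimension.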
Semi-Supervised Support Vector Machines
Advances in Neural Information Processing Systems, 1998
Abstract

Cited by 191 (7 self)
We introduce a semi-supervised support vector machine (S³VM) method. Given a training set of labeled data and a working set of unlabeled data, S³VM constructs a support vector machine using both the training and working sets. We use S³VM to solve the transduction problem using overall risk minimization (ORM) posed by Vapnik. The transduction problem is to estimate the value of a classification function at the given points in the working set. This contrasts with the standard inductive learning problem of estimating the classification function at all possible values and then using the fixed function to deduce the classes of the working set data. We propose a general S³VM model that minimizes both the misclassification error and the function capacity based on all the available data. We show how the S³VM model for 1-norm linear support vector machines can be converted to a mixed-integer program and then solved exactly using integer programming. Results of S³VM and the standard 1-norm support vector machine approach are compared on eleven data sets. Our computational results support the statistical learning theory results showing that incorporating working data improves generalization when insufficient training information is available. In every case, S³VM either improved or showed no significant difference in generalization compared to the traditional approach.
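The mixed-integer program mentioned here can be sketched as follows (a reconstruction from the abstract's description, not the paper's exact formulation; ℓ labeled points, k unlabeled points, penalty C, and a large constant M are assumed):

```latex
\min_{w,b,\eta,\xi,z,d}\;
  C\Big[\sum_{i=1}^{\ell}\eta_i + \sum_{j=\ell+1}^{\ell+k}(\xi_j + z_j)\Big] + \|w\|_1
\quad\text{s.t.}\quad
\begin{aligned}
 & y_i(w\cdot x_i - b) + \eta_i \ge 1,\; \eta_i \ge 0, && i = 1,\dots,\ell,\\
 & w\cdot x_j - b + \xi_j + M(1-d_j) \ge 1,\; \xi_j \ge 0, && j = \ell+1,\dots,\ell+k,\\
 & -(w\cdot x_j - b) + z_j + M d_j \ge 1,\; z_j \ge 0,\; d_j \in \{0,1\}, && j = \ell+1,\dots,\ell+k.
\end{aligned}
```

The binary variable d_j decides which side of the decision boundary each unlabeled point is assigned to, so only the error for the chosen side is charged; this is what makes the problem a mixed-integer program rather than a plain linear program.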
Dimensionality Reduction via Sparse Support Vector Machines
Journal of Machine Learning Research, 2003
Abstract

Cited by 79 (13 self)
We describe a methodology for performing variable ranking and selection using support vector machines (SVMs). The method constructs a series of sparse linear SVMs to generate linear models that can generalize well, and uses a subset of nonzero weighted variables found by the linear models to produce a final nonlinear model. The method exploits the fact that a linear SVM (no kernels) with ℓ1-norm regularization inherently performs variable selection as a side-effect of minimizing capacity of the SVM model. The distribution of the linear model weights provides a mechanism for ranking and interpreting the effects of variables.
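The ℓ1-regularized linear SVM underlying this method is commonly posed as a linear program by splitting the weight vector into nonnegative parts (a standard way to write such formulations, not necessarily the paper's exact one):

```latex
\min_{u,v,b,\xi}\; \sum_{j=1}^{n}(u_j + v_j) + C\sum_{i=1}^{m}\xi_i
\quad\text{s.t.}\quad
y_i\big((u - v)\cdot x_i - b\big) + \xi_i \ge 1,\qquad u, v, \xi \ge 0,
```

with w = u − v, so that the objective equals ‖w‖₁ + C·(error) at optimality. Because the ℓ1 norm drives many components of w exactly to zero, the variables with u_j = v_j = 0 are the ones discarded, which is the variable-selection side-effect the abstract describes.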
A fast iterative nearest point algorithm for support vector machine classifier design
IEEE Transactions on Neural Networks, 2000
Abstract

Cited by 75 (3 self)
In this paper we give a new fast iterative algorithm for support vector machine (SVM) classifier design. The basic problem treated is one that does not allow classification violations. The problem is converted to a problem of computing the nearest point between two convex polytopes. The suitability of two classical nearest point algorithms, due to Gilbert, and Mitchell et al., is studied. Ideas from both these algorithms are combined and modified to derive our fast algorithm. For problems which require classification violations to be allowed, the violations are quadratically penalized and an idea due to Cortes and Vapnik and Frieß is used to convert it to a problem in which there are no classification violations. Comparative computational evaluation of our algorithm against powerful SVM methods such as Platt's sequential minimal optimization shows that our algorithm is very competitive. Index Terms—Classification, nearest point algorithm, quadratic programming, support vector machine.
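The Gilbert algorithm referenced here finds the minimum-norm point in the convex hull of a finite point set; applied to the difference set of the two classes, that point gives the maximum-margin direction. A minimal sketch of the iteration (the data and function name are illustrative, and this is the textbook Gilbert step, not the paper's combined algorithm):

```python
def gilbert_min_norm(points, tol=1e-9, max_iter=1000):
    """Gilbert iteration: minimum-norm point in conv(points)."""
    w = list(points[0])
    for _ in range(max_iter):
        # Support point: the vertex minimizing <w, z> over the set.
        s = min(points, key=lambda z: sum(wi * zi for wi, zi in zip(w, z)))
        d = [wi - si for wi, si in zip(w, s)]
        gap = sum(wi * di for wi, di in zip(w, d))
        if gap <= tol:  # optimality gap closed: w is (near-)optimal
            break
        # Move to the closest point to the origin on the segment [w, s].
        t = min(1.0, gap / sum(di * di for di in d))
        w = [wi - t * di for wi, di in zip(w, d)]
    return w

# Difference set of two linearly separable classes (hypothetical data):
diff = [(3.0, 1.0), (3.0, -1.0), (5.0, 1.0), (5.0, -1.0)]
print(gilbert_min_norm(diff))  # converges to [3.0, 0.0]
```

On this toy set the minimum-norm point of the hull is (3, 0), and the iteration reaches it in two steps; the paper's contribution is combining such steps with Mitchell-style updates for speed.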
Multicategory Classification by Support Vector Machines
Computational Optimization and Applications, 1999
Abstract

Cited by 65 (0 self)
We examine the problem of how to discriminate between objects of three or more classes. Specifically, we investigate how two-class discrimination methods can be extended to the multi-class case. We show how the linear programming (LP) approaches based on the work of Mangasarian and quadratic programming (QP) approaches based on Vapnik's Support Vector Machines (SVM) can be combined to yield two new approaches to the multi-class problem. In LP multi-class discrimination, a single linear program is used to construct a piecewise-linear classification function. In our proposed multi-class SVM method, a single quadratic program is used to construct a piecewise-nonlinear classification function. Each piece of this function can take the form of a polynomial, radial basis function, or even a neural network. For k > 2 class problems, the SVM method as originally proposed required the construction of a two-class SVM to separate each class from the remaining classes. Similarly, k two-class linear programs can be used for the multi-class problem. We performed an empirical study of the original LP method, the proposed k-LP method, the proposed single-QP method and the original k-QP methods. We discuss the advantages and disadvantages of each approach.
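The one-against-rest construction described above (one two-class discriminant per class) yields a simple decision rule: assign x to the class whose discriminant scores highest. A sketch with hypothetical, already-trained linear discriminants:

```python
def one_vs_rest_predict(x, classifiers):
    """k-class decision from k two-class linear discriminants:
    pick the class c maximizing w_c . x - b_c."""
    scores = {c: sum(wi * xi for wi, xi in zip(w, x)) - b
              for c, (w, b) in classifiers.items()}
    return max(scores, key=scores.get)

# Hypothetical (w, b) pairs for three classes:
clf = {
    "A": ([1.0, 0.0], 0.0),   # favors large x1
    "B": ([-1.0, 0.0], 0.0),  # favors small x1
    "C": ([0.0, 1.0], 0.0),   # favors large x2
}
print(one_vs_rest_predict([2.0, 0.5], clf))  # "A"
```

The paper's single-QP alternative instead trains all k pieces jointly, so the argmax rule stays the same but the discriminants are coupled during training.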
Mathematical Programming in Data Mining
Data Mining and Knowledge Discovery, 1996
Abstract

Cited by 28 (3 self)
Mathematical programming approaches to three fundamental problems will be described: feature selection, clustering and robust representation. The feature selection problem considered is that of discriminating between two sets while recognizing irrelevant and redundant features and suppressing them. This creates a lean model that often generalizes better to new unseen data. Computational results on real data confirm improved generalization of leaner models. Clustering is exemplified by the unsupervised learning of patterns and clusters that may exist in a given database and is a useful tool for knowledge discovery in databases (KDD). A mathematical programming formulation of this problem is proposed that is theoretically justifiable and computationally implementable in a finite number of steps. A resulting k-Median Algorithm is utilized to discover very useful survival curves for breast cancer patients from a medical database. Robust representation is concerned with minimizing trained m...
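The k-Median step alluded to here alternates two moves: assign each point to its nearest center in the 1-norm, then replace each center by the coordinate-wise median of its cluster. A minimal sketch (illustrative data; not the paper's implementation):

```python
def k_median(points, centers, iters=20):
    """Plain k-Median: 1-norm assignment + coordinate-wise medians."""
    def l1(p, c):
        return sum(abs(a - b) for a, b in zip(p, c))

    def median(vals):
        s = sorted(vals)
        n = len(s)
        return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2.0

    dim = len(points[0])
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            best = min(range(len(centers)), key=lambda i: l1(p, centers[i]))
            clusters[best].append(p)
        # Empty clusters keep their old center.
        centers = [tuple(median([p[d] for p in cl]) for d in range(dim)) if cl else c
                   for cl, c in zip(clusters, centers)]
    return centers

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(k_median(pts, [(0.0, 0.0), (9.0, 9.0)]))  # [(0, 0), (10, 10)]
```

Each move decreases the summed 1-norm distance, and since there are finitely many assignments the alternation terminates in a finite number of steps, matching the finiteness claim in the abstract.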
Optimization Approaches to Semi-Supervised Learning
2000
Abstract

Cited by 17 (1 self)
We examine mathematical models for semi-supervised support vector machines (S³VM). Given a training set of labeled data and a working set of unlabeled data, S³VM constructs a support vector machine using both the training and working sets. We use S³VM to solve the transductive inference problem posed by Vapnik. In transduction, the task is to estimate the value of a classification function at the given points in the working set. This contrasts with inductive inference, which estimates the classification function at all possible values. We propose a general S³VM model that minimizes both the misclassification error and the function capacity based on all the available data. Depending on how poorly-estimated unlabeled data are penalized, different mathematical models result. We examine several practical algorithms for solving these models. The first approach utilizes the S³VM model for 1-norm linear support vector machines converted to a mixed-integer program (MIP). A global solution of the ...
Feature Minimization within Decision Trees
Computational Optimization and Applications, 1996
Abstract

Cited by 15 (2 self)
Decision trees for classification can be constructed using mathematical programming. Within decision tree algorithms, the feature minimization problem is to construct accurate decisions using as few features or attributes within each decision as possible. Feature minimization is an important aspect of data mining since it helps identify what attributes are important and helps produce accurate and interpretable decision trees. In feature minimization with bounded accuracy, we minimize the number of features using a given misclassification error tolerance. This problem can be formulated as a parametric bilinear program and is shown to be NP-complete. A parametric Frank-Wolfe method is used to solve the bilinear subproblems. The resulting minimization algorithm produces more compact, accurate, and interpretable trees. This procedure can be applied to many different error functions. Formulations and results for two error functions are given. One method, FM-RLPP, dramatically reduced the number of features of one dataset from 147 to 2 while maintaining an 83.6% testing accuracy. Computational results compare favorably with the standard univariate decision tree method, C4.5, as well as with linear programming methods of tree construction. Key Words: Data mining, machine learning, feature minimization, decision trees, bilinear programming.
On support vector decision trees for database marketing
Department of Mathematical Sciences Math Report No. 98-100, Rensselaer Polytechnic Institute, 1998
Abstract

Cited by 15 (3 self)
We introduce a support vector decision tree method for customer targeting in the framework of large databases (database marketing). The goal is to provide a tool to identify the best customers based on historical data. Then this tool is used to forecast the best potential customers among a pool of prospects. We begin by recursively constructing a decision tree. Each decision consists of a linear combination of independent attributes. A linear program motivated by the support vector machine method from Vapnik's Statistical Learning Theory is used to construct each decision. This linear program automatically selects the relevant subset of attributes for each decision. Each customer is scored based on the decision tree. A gains-chart table is used to verify the goodness of fit of the targeting, to determine the likely prospects and the expected utility or profit. Successful results are given for three industrial problems. The method consistently pro ...
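The gains-chart table mentioned in this abstract is a standard construction: sort customers by model score, best first, and report what fraction of all responders is captured in each top slice. A minimal sketch (the function name and data are illustrative, not from the paper):

```python
def gains_table(scores, responses, n_bins=5):
    """Cumulative gains: (customers contacted, fraction of responders captured)."""
    # Rank responses by descending model score.
    ranked = [r for _, r in sorted(zip(scores, responses), key=lambda sr: -sr[0])]
    total = sum(responses)
    size = len(ranked) // n_bins
    return [(b * size, sum(ranked[:b * size]) / total)
            for b in range(1, n_bins + 1)]

# Hypothetical scores for 10 customers; responders cluster at high scores.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
responses = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(gains_table(scores, responses))
```

Here the top 20% of scored customers capture 2 of the 3 responders, which is exactly the kind of lift a gains chart makes visible when choosing how deep into the prospect pool to target.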
Support Vector Machines for Linear Programming: Motivation and Formulations
1999
Abstract

Cited by 3 (1 self)
In this paper we introduce two formulations for training support vector machines using linear programming. The formulations are based on considering the L1 and L∞ norms instead of the currently used L2 norm, and maximising the margin between the separating hyperplane and the two data sets using L1 and L∞ distances. The fact that the separation problem becomes solvable by standard linear programming software opens a wide range of applications, for example cases where the training speed is crucial, or the data sets are too large for quadratic programming software to be used. We report results obtained for some standard benchmark problems, and compare them with those obtained by other methods, confirming that the performance of all the formulations is similar. 1 Introduction Linear programming approaches to support vector machines have recently been addressed [1, 3, 10]. Their main advantage concerns the possibility of using solvers for linear problems, with improved reliability and...
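The two formulations can be written down from the norm-duality fact that the ℓ∞-distance from a point to the hyperplane w·x + b = 0 is |w·x + b| / ‖w‖₁, while the ℓ1-distance is |w·x + b| / ‖w‖∞. Maximising each margin on separable data then gives (a common way to state such LP-SVM programs, not necessarily the paper's exact ones):

```latex
\min_{w,b}\; \|w\|_1 \;\;\text{s.t.}\;\; y_i(w\cdot x_i + b) \ge 1
\qquad\text{and}\qquad
\min_{w,b}\; \|w\|_\infty \;\;\text{s.t.}\;\; y_i(w\cdot x_i + b) \ge 1.
```

Both become linear programs: the first by splitting w into nonnegative parts, the second by bounding every |w_j| with a single scalar variable that becomes the objective.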