Results 1–10 of 12
Lagrangian Support Vector Machines
, 2000
Abstract

Cited by 86 (11 self)
An implicit Lagrangian for the dual of a simple reformulation of the standard quadratic program of a linear support vector machine is proposed. This leads to the minimization of an unconstrained differentiable convex function in a space of dimensionality equal to the number of classified points. This problem is solvable by an extremely simple linearly convergent Lagrangian support vector machine (LSVM) algorithm. LSVM requires the inversion at the outset of a single matrix of the order of the much smaller dimensionality of the original input space plus one. The full algorithm is given in this paper in 11 lines of MATLAB code without any special optimization tools such as linear or quadratic programming solvers. This LSVM code can be used "as is" to solve classification problems with millions of points. For example, 2 million points in 10-dimensional input space were classified by a linear surface in 82 minutes on a Pentium III 500 MHz notebook with 384 megabytes of memory (and additional swap space), and in 7 minutes on a 250 MHz UltraSPARC II processor with 2 gigabytes of memory. Other standard classification test problems were also solved. Nonlinear kernel classification can also be solved by LSVM. Although it does not scale up to very large problems, it can handle any positive semidefinite kernel and is guaranteed to converge.
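The LSVM update described above is simple enough to sketch directly. Below is a hedged Python transcription of a linearly convergent iteration of the form u ← Q⁻¹(e + ((Qu − e) − αu)₊); for clarity it inverts the m × m matrix Q directly, whereas the paper uses the Sherman-Morrison-Woodbury identity so that only a matrix of input-space dimensionality plus one is ever inverted. The function name and parameter defaults are ours, not the paper's.

```python
import numpy as np

def lsvm(A, d, nu=1.0, alpha=None, iters=200):
    """Hedged sketch of a Lagrangian-SVM-style iteration (our naming).

    A : (m, n) data matrix, d : (m,) labels in {-1, +1}.
    Inverts the m x m matrix Q directly for clarity; the paper instead
    uses Sherman-Morrison-Woodbury so that only an (n+1) x (n+1)
    matrix is inverted at the outset.
    """
    m, _ = A.shape
    D = np.diag(d)
    H = D @ np.hstack([A, -np.ones((m, 1))])   # H = D[A, -e]
    Q = np.eye(m) / nu + H @ H.T
    Qinv = np.linalg.inv(Q)
    if alpha is None:
        alpha = 1.9 / nu                       # convergence needs 0 < alpha < 2/nu
    e = np.ones(m)
    u = Qinv @ e                               # initial iterate
    for _ in range(iters):
        # plus-function update: u <- Qinv (e + ((Qu - e) - alpha*u)_+)
        u = Qinv @ (e + np.maximum((Q @ u - e) - alpha * u, 0.0))
    w = A.T @ (D @ u)                          # separating plane: x.w = gamma
    gamma = -np.ones(m) @ (D @ u)
    return w, gamma
```

On well-separated data, sign(A @ w − gamma) recovers the labels; the direct m × m inversion limits this sketch to small datasets, which is exactly the limitation the paper's matrix identity removes.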
Data Selection for Support Vector Machine Classifiers
 In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
, 2000
Abstract

Cited by 28 (5 self)
The problem of extracting a minimal number of data points from a large dataset, in order to generate a support vector machine (SVM) classifier, is formulated as a concave minimization problem and solved by a finite number of linear programs. This minimal set of data points, which is the smallest number of support vectors that completely characterize a separating plane classifier, is considerably smaller than that required by a standard 1-norm support vector machine with or without feature selection. The proposed approach also incorporates a feature selection procedure that results in a minimal number of input features used by the classifier. Tenfold cross validation gives as good or better test results using the proposed minimal support vector machine (MSVM) classifier based on the smaller set of data points compared to a standard 1-norm support vector machine classifier. The reduction in data points used by an MSVM classifier over those used by a 1-norm SVM classifier averaged 66% on seven public datasets and was as high as 81%. This makes MSVM a useful incremental classification tool which maintains only a small fraction of a large dataset before merging and processing it with new incoming data.
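The building block of this approach, solving an SVM through linear programming, can be illustrated with the plain 1-norm SVM that the abstract uses as its baseline. This is a hedged sketch using scipy.optimize.linprog, not the paper's MSVM concave-minimization procedure; the helper name and the variable split are ours.

```python
import numpy as np
from scipy.optimize import linprog

def one_norm_svm(A, d, C=1.0):
    """Hedged sketch (our naming): a 1-norm SVM posed as a single LP,
    min ||w||_1 + C * sum(xi)  s.t.  D(A w - gamma e) + xi >= e, xi >= 0,
    with w = p - q and gamma = gp - gm so all variables are nonnegative."""
    m, n = A.shape
    D = np.diag(d)
    De = D @ np.ones((m, 1))                    # column of labels
    # variable order: [p (n), q (n), gp, gm, xi (m)]
    c = np.concatenate([np.ones(2 * n), [0.0, 0.0], C * np.ones(m)])
    # constraint rewritten for linprog's A_ub x <= b_ub form:
    # -DA p + DA q + gp*De - gm*De - xi <= -e
    A_ub = np.hstack([-D @ A, D @ A, De, -De, -np.eye(m)])
    b_ub = -np.ones(m)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * (2 * n + 2 + m))
    x = res.x
    w = x[:n] - x[n:2 * n]
    gamma = x[2 * n] - x[2 * n + 1]
    return w, gamma
```

Because the objective is the 1-norm of w, the LP tends to drive many components of w to zero, which is the mechanism behind the feature selection mentioned in the abstract.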
Robust Linear and Support Vector Regression
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2000
Optimization Methods In Massive Datasets
Abstract

Cited by 9 (1 self)
We describe the role of generalized support vector machines in separating massive and complex data using arbitrary nonlinear kernels. Feature selection that improves generalization is implemented via an effective procedure that utilizes a polyhedral norm or a concave function minimization. Massive data is separated using a linear programming chunking algorithm as well as a successive overrelaxation algorithm, each of which is capable of processing data with millions of points. 1. INTRODUCTION We address here the problem of classifying data in n-dimensional real (Euclidean) space R^n into one of two disjoint finite point sets (i.e. classes). The support vector machine (SVM) approach to classification [57, 2, 25, 58, 13, 54, 55] attempts to separate points belonging to two given sets in R^n by a nonlinear surface, often only implicitly defined by a kernel function. Since the nonlinear surface in R^n is typically linear in its parameters, it can be represented as a linear func...
Breast Cancer Survival and Chemotherapy: A Support Vector Machine Analysis
 Data Mining Institute, Computer Sciences Department, University of Wisconsin
, 2000
Abstract

Cited by 8 (3 self)
A linear support vector machine (SVM) is used to extract 6 features from a total of 31 features in a dataset of 253 breast cancer patients. Five features are nuclear features obtained during a noninvasive diagnostic procedure while one feature, tumor size, is obtained during surgery. The linear SVM selected the 6 features in the process of classifying the patients into node-positive (patients with some metastasized lymph nodes) and node-negative (patients with no metastasized lymph nodes). Node-positive patients are typically those who undergo chemotherapy. The 6 features were then used in a Gaussian kernel nonlinear SVM to classify the patients into three prognostic groups: good (node-negative), intermediate (1 to 4 metastasized nodes) and poor (more than 4 metastasized nodes). Very well separated Kaplan-Meier survival curves were constructed for the three groups, with pairwise p-values of less than 0.009 based on the log-rank statistic. Patients in the good prognostic group had the highest survival, while patients in the poor prognostic group had the lowest. The majority (72.8%) of the good group did not receive chemotherapy, while the majority (87.5%) of the poor group received chemotherapy. Just over half (56.7%) of the intermediate group received chemotherapy. New patients can be assigned to one of these three prognostic groups, with the associated survival curve, based only on 6 features obtained before and during surgery, but without the potentially risky procedure of removing lymph nodes to determine how many of them have metastasized.
Data Mining via Support Vector Machines
 IFIP Conference on System Modelling and Optimization
, 2001
Abstract

Cited by 6 (1 self)
Support vector machines (SVMs) have played a key role in broad classes of problems arising in various fields. Much more recently, SVMs have become the tool of choice for problems arising in data classification and mining. This paper emphasizes some recent developments that the author and his colleagues have contributed to, such as: generalized SVMs (a very general mathematical programming framework for SVMs), smooth SVMs (a smooth nonlinear equation representation of SVMs solvable by a fast Newton method), Lagrangian SVMs (an unconstrained Lagrangian representation of SVMs leading to an extremely simple iterative scheme capable of solving classification problems with millions of points) and reduced SVMs (a rectangular kernel classifier that utilizes as little as 1% of the data).
Kernel rewards regression: An information efficient batch policy iteration approach
 In Proc. of the IASTED Conference on Artificial Intelligence and Applications
, 2006
Abstract

Cited by 4 (2 self)
We present the novel Kernel Rewards Regression (KRR) method for Policy Iteration in Reinforcement Learning on continuous state domains. Our method is able to obtain very useful policies observing just a few state-action transitions. It considers the Reinforcement Learning problem as a regression task for which any appropriate technique may be applied. The use of kernel methods, e.g. the Support Vector Machine, enables the user to incorporate different types of structural prior knowledge about the state space by redefining the inner product. Furthermore, KRR is a completely off-policy method. The observations may be constructed by any sufficiently exploring policy, even the fully random one. We tested the algorithm on three typical Reinforcement Learning benchmarks. Moreover, we give a proof of the correctness of our model and an error bound for estimating the Q-functions.
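To make the "RL as regression" idea concrete, here is a hedged sketch of rewards regression for pure policy evaluation: model Q as a kernel expansion over observed states and regress the observed rewards on Q(s_i) − γ·Q(s'_i), since the Bellman equation says the reward is exactly that difference. The RBF kernel, ridge term, and names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def krr_policy_eval(S, R, S_next, gamma=0.5, lam=1e-6, width=1.0):
    """Hedged sketch of the rewards-regression idea for policy
    evaluation on 1-D states (kernel choice, ridge term, and naming
    are ours, not the paper's exact formulation).

    Model Q(s) = sum_j alpha_j k(s, s_j) and fit alpha by ridge
    regression of the rewards R on Q(s_i) - gamma * Q(s'_i)."""
    def rbf(X, Y):
        return np.exp(-(X[:, None] - Y[None, :]) ** 2 / (2 * width ** 2))
    K = rbf(S, S)                    # k(s_i, s_j)
    Kn = rbf(S_next, S)              # k(s'_i, s_j)
    M = K - gamma * Kn               # rewards model: R ~ M @ alpha
    alpha = np.linalg.solve(M.T @ M + lam * np.eye(len(S)), M.T @ R)
    return lambda s: rbf(np.atleast_1d(np.asarray(s, float)), S) @ alpha
```

As a sanity check, on a self-loop with constant reward 1 and γ = 0.5, the fitted Q approaches the Bellman fixed point 1/(1 − 0.5) = 2.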
Data Mining Via Mathematical Programming And Machine Learning
, 2000
Abstract

Cited by 3 (0 self)
This work explores solving large-scale data mining problems through the use of mathematical programming methods. In particular, algorithms are proposed for the support vector machine (SVM) classification problem, which consists of constructing a separating surface that can discriminate between points from one of two classes. An algorithm based on successive overrelaxation (SOR) is presented which can process very large datasets that need not reside in memory. Concepts from generalized SVMs are combined with SOR and with linear programming to find nonlinear separating surfaces. An "active set" strategy is used to generate a fast algorithm that consists of solving a finite number of linear equations of the order of the dimensionality of the original input space at each step. This ASVM active set algorithm requires no specialized quadratic or linear programming code, but merely a linear equation solver which is publicly available. An implicit Lagrangian for the dual of an SVM is used to lead to the simple linearly convergent Lagrangian SVM (LSVM) algorithm. LSVM requires the inversion at the outset of a single (typically small) matrix, and the full algorithm is given in 11 lines of MATLAB code.
Regularization Approaches in Learning Theory
, 2006
Abstract

Cited by 2 (1 self)
Learning from examples can be seen as a very general framework for modeling a variety of different statistical inference problems. Such statistical problems are at the basis of the design of programs which are trained, instead of programmed, to perform a task. In particular, supervised learning aims at finding an unknown input-output relation given a (possibly small) number of input-output instances (the examples). The main goal in this setting is not to describe the available data but to predict the output when a new input is given, that is, to be able to generalize. A learning algorithm should be able to avoid overfitting the data, that is, overestimating the importance of the available information and thereby losing generalization properties. Regularization Theory was originally developed and formalized as a way to find stable solutions to ill-posed problems. Eventually some regularization techniques became popular in the context of machine learning as an effective way to avoid
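The stabilizing effect of regularization that this abstract alludes to can be seen in the simplest Tikhonov setting, ridge regression. Below is a hypothetical toy fit of noisy samples of a smooth function with a high-degree polynomial, with and without an L2 penalty; all names and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, x.size)  # noisy samples
X = np.vander(x, 15)       # degree-14 polynomial features: an ill-posed fit

def ridge_fit(X, y, lam):
    """Tikhonov-regularized least squares: argmin ||Xw - y||^2 + lam ||w||^2."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

w_ls = ridge_fit(X, y, 1e-12)   # (essentially) unregularized: unstable fit
w_reg = ridge_fit(X, y, 1e-3)   # regularized: stable, much smaller-norm solution
```

The regularized coefficient vector has a far smaller norm than the unregularized one; in the ill-posed-problems language above, the penalty restores the stability of the solution and, on new inputs, the generalization the abstract describes.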