Results 1  10
of
84
A training algorithm for optimal margin classifiers
 PROCEEDINGS OF THE 5TH ANNUAL ACM WORKSHOP ON COMPUTATIONAL LEARNING THEORY
, 1992
"... A training algorithm that maximizes the margin between the training patterns and the decision boundary is presented. The technique is applicable to a wide variety of classifiaction functions, including Perceptrons, polynomials, and Radial Basis Functions. The effective number of parameters is adjust ..."
Abstract

Cited by 1279 (44 self)
 Add to MetaCart
A training algorithm that maximizes the margin between the training patterns and the decision boundary is presented. The technique is applicable to a wide variety of classifiaction functions, including Perceptrons, polynomials, and Radial Basis Functions. The effective number of parameters is adjusted automatically to match the complexity of the problem. The solution is expressed as a linear combination of supporting patterns. These are the subset of training patterns that are closest to the decision boundary. Bounds on the generalization performance based on the leaveoneout method and the VCdimension are given. Experimental results on optical character recognition problems demonstrate the good generalization obtained when compared with other learning algorithms.
Estimating the Support of a HighDimensional Distribution
, 1999
"... Suppose you are given some dataset drawn from an underlying probability distribution P and you want to estimate a "simple" subset S of input space such that the probability that a test point drawn from P lies outside of S is bounded by some a priori specified between 0 and 1. We propose a metho ..."
Abstract

Cited by 501 (32 self)
 Add to MetaCart
Suppose you are given some dataset drawn from an underlying probability distribution P and you want to estimate a "simple" subset S of input space such that the probability that a test point drawn from P lies outside of S is bounded by some a priori specified between 0 and 1. We propose a method to approach this problem by trying to estimate a function f which is positive on S and negative on the complement. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space. The expansion coefficients are found by solving a quadratic programming problem, which we do by carrying out sequential optimization over pairs of input patterns. We also provide a preliminary theoretical analysis of the statistical performance of our algorithm. The algorithm is a natural extension of the support vector algorithm to the case of unlabelled d...
Large Margin Classification Using the Perceptron Algorithm
 Machine Learning
, 1998
"... We introduce and analyze a new algorithm for linear classification which combines Rosenblatt 's perceptron algorithm with Helmbold and Warmuth's leaveoneout method. Like Vapnik 's maximalmargin classifier, our algorithm takes advantage of data that are linearly separable with large margins. Compa ..."
Abstract

Cited by 415 (1 self)
 Add to MetaCart
We introduce and analyze a new algorithm for linear classification which combines Rosenblatt 's perceptron algorithm with Helmbold and Warmuth's leaveoneout method. Like Vapnik 's maximalmargin classifier, our algorithm takes advantage of data that are linearly separable with large margins. Compared to Vapnik's algorithm, however, ours is much simpler to implement, and much more efficient in terms of computation time. We also show that our algorithm can be efficiently used in very high dimensional spaces using kernel functions. We performed some experiments using our algorithm, and some variants of it, for classifying images of handwritten digits. The performance of our algorithm is close to, but not as good as, the performance of maximalmargin classifiers on the same problem, while saving significantly on computation time and programming effort. 1 Introduction One of the most influential developments in the theory of machine learning in the last few years is Vapnik's work on supp...
Semisupervised support vector machines
 Advances in Neural Information Processing Systems
, 1998
"... We introduce a semisupervised support vector machine (S 3 VM) method. Given a training set of labeled data and a working set of unlabeled data, S 3 VM constructs a support vector machine using both the training and working sets. We use S 3 VM to solve the transduction problem using overall risk min ..."
Abstract

Cited by 173 (7 self)
 Add to MetaCart
We introduce a semisupervised support vector machine (S 3 VM) method. Given a training set of labeled data and a working set of unlabeled data, S 3 VM constructs a support vector machine using both the training and working sets. We use S 3 VM to solve the transduction problem using overall risk minimization (ORM) posed by Vapnik. The transduction problem is to estimate the value of a classification function at the given points in the working set. This contrasts with the standard inductive learning problem of estimating the classification function at all possible values and then using the fixed function to deduce the classes of the working set data. We propose a general S 3 VM model that minimizes both the misclassification error and the function capacity based on all the available data. We show how the S 3 VM model for 1norm linear support vector machines can be converted to a mixedinteger program and then solved exactly using integer programming. Results of S 3 VM and the standard 1norm support vector machine approach are compared on eleven data sets. Our computational results support the statistical learning theory results showing that incorporating working data improves generalization when insufficient training information is available. In every case, S 3 VM either improved or showed no significant difference in generalization compared to the traditional approach.
Optimal aggregation of classifiers in statistical learning
 Ann. Statist
, 2004
"... Classification can be considered as nonparametric estimation of sets, where the risk is defined by means of a specific distance between sets associated with misclassification error. It is shown that the rates of convergence of classifiers depend on two parameters: the complexity of the class of cand ..."
Abstract

Cited by 153 (5 self)
 Add to MetaCart
Classification can be considered as nonparametric estimation of sets, where the risk is defined by means of a specific distance between sets associated with misclassification error. It is shown that the rates of convergence of classifiers depend on two parameters: the complexity of the class of candidate sets and the margin parameter. The dependence is explicitly given, indicating that optimal fast rates approaching O(n−1) can be attained, where n is the sample size, and that the proposed classifiers have the property of robustness to the margin. The main result of the paper concerns optimal aggregation of classifiers: we suggest a classifier that automatically adapts both to the complexity and to the margin, and attains the optimal fast rates, up to a logarithmic factor. 1. Introduction. Let (Xi,Yi)
Multicategory Classification by Support Vector Machines
 Computational Optimizations and Applications
, 1999
"... We examine the problem of how to discriminate between objects of three or more classes. Specifically, we investigate how twoclass discrimination methods can be extended to the multiclass case. We show how the linear programming (LP) approaches based on the work of Mangasarian and quadratic programm ..."
Abstract

Cited by 56 (0 self)
 Add to MetaCart
We examine the problem of how to discriminate between objects of three or more classes. Specifically, we investigate how twoclass discrimination methods can be extended to the multiclass case. We show how the linear programming (LP) approaches based on the work of Mangasarian and quadratic programming (QP) approaches based on Vapnik's Support Vector Machines (SVM) can be combined to yield two new approaches to the multiclass problem. In LP multiclass discrimination, a single linear program is used to construct a piecewise linear classification function. In our proposed multiclass SVM method, a single quadratic program is used to construct a piecewise nonlinear classification function. Each piece of this function can take the form of a polynomial, radial basis function, or even a neural network. For the k > 2 class problems, the SVM method as originally proposed required the construction of a twoclass SVM to separate each class from the remaining classes. Similarily, k twoclass linear programs can be used for the multiclass problem. We performed an empirical study of the original LP method, the proposed k LP method, the proposed single QP method and the original k QP methods. We discuss the advantages and disadvantages of each approach. 1 1
Introduction to Statistical Learning Theory
 In , O. Bousquet, U.v. Luxburg, and G. Rsch (Editors
, 2004
"... ..."
Risk bounds for Statistical Learning
"... We propose a general theorem providing upper bounds for the risk of an empirical risk minimizer (ERM).We essentially focus on the binary classi…cation framework. We extend Tsybakov’s analysis of the risk of an ERM under margin type conditions by using concentration inequalities for conveniently weig ..."
Abstract

Cited by 40 (1 self)
 Add to MetaCart
We propose a general theorem providing upper bounds for the risk of an empirical risk minimizer (ERM).We essentially focus on the binary classi…cation framework. We extend Tsybakov’s analysis of the risk of an ERM under margin type conditions by using concentration inequalities for conveniently weighted empirical processes. This allows us to deal with other ways of measuring the ”size”of a class of classi…ers than entropy with bracketing as in Tsybakov’s work. In particular we derive new risk bounds for the ERM when the classi…cation rules belong to some VCclass under margin conditions and discuss the optimality of those bounds in a minimax sense.
Using sample size to limit exposure to data mining
 Journal of Computer Security
"... Data mining introduces new problems in database security. The basic problem of using nonsensitive data to infer sensitive data is made more difficult by the “probabilistic” inferences possible with data mining. This paper shows how lower bounds from pattern recognition theory can be used to determi ..."
Abstract

Cited by 38 (8 self)
 Add to MetaCart
Data mining introduces new problems in database security. The basic problem of using nonsensitive data to infer sensitive data is made more difficult by the “probabilistic” inferences possible with data mining. This paper shows how lower bounds from pattern recognition theory can be used to determine sample sizes where data mining tools cannot obtain reliable results. 1