Results 1–10 of 61
Large Margin Classification Using the Perceptron Algorithm
Machine Learning, 1998
Cited by 428 (1 self)
Abstract:
We introduce and analyze a new algorithm for linear classification which combines Rosenblatt's perceptron algorithm with Helmbold and Warmuth's leave-one-out method. Like Vapnik's maximal-margin classifier, our algorithm takes advantage of data that are linearly separable with large margins. Compared to Vapnik's algorithm, however, ours is much simpler to implement and much more efficient in terms of computation time. We also show that our algorithm can be used efficiently in very high dimensional spaces using kernel functions. We performed some experiments using our algorithm, and some variants of it, for classifying images of handwritten digits. The performance of our algorithm is close to, but not as good as, the performance of maximal-margin classifiers on the same problem, while saving significantly on computation time and programming effort. 1 Introduction One of the most influential developments in the theory of machine learning in the last few years is Vapnik's work on supp...
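The additive mistake-driven update the abstract builds on can be sketched in a few lines. This is a minimal illustration of the classic Rosenblatt perceptron with a running weight average, in the spirit of (but not identical to) the voted/averaged variant the paper analyzes; the toy data and hyperparameters are illustrative assumptions.

```python
import numpy as np

def perceptron_train(X, y, epochs=10):
    """Perceptron with an averaged weight vector (illustrative sketch).
    X: (n, d) inputs, y: labels in {-1, +1}."""
    n, d = X.shape
    w = np.zeros(d)
    w_avg = np.zeros(d)
    for _ in range(epochs):
        for i in range(n):
            if y[i] * (w @ X[i]) <= 0:   # mistake: additive update
                w = w + y[i] * X[i]
            w_avg += w                    # averaging smooths the final w
    return w_avg / (epochs * n)

# Toy linearly separable data: the label is the sign of the first coordinate.
X = np.array([[2.0, 1.0], [1.5, -1.0], [-2.0, 0.5], [-1.0, -1.5]])
y = np.array([1, 1, -1, -1])
w = perceptron_train(X, y)
preds = np.sign(X @ w)
```

On separable data like this, the mistake-driven loop stops updating once a separator is found, so the averaged weights classify the training set correctly.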
An introduction to kernel-based learning algorithms
IEEE Transactions on Neural Networks, 2001
Cited by 422 (49 self)
Abstract:
This paper provides an introduction to support vector machines (SVMs), kernel Fisher discriminant analysis, and
Ultraconservative Online Algorithms for Multiclass Problems
Journal of Machine Learning Research, 2001
Cited by 262 (21 self)
Abstract:
In this paper we study online classification algorithms for multiclass problems in the mistake bound model. The hypotheses we use maintain one prototype vector per class. Given an input instance, a multiclass hypothesis computes a similarity-score between each prototype and the input instance and then sets the predicted label to be the index of the prototype achieving the highest similarity. To design and analyze the learning algorithms in this paper we introduce the notion of ultraconservativeness. Ultraconservative algorithms are algorithms that update only the prototypes attaining similarity-scores which are higher than the score of the correct label's prototype. We start by describing a family of additive ultraconservative algorithms where each algorithm in the family updates its prototypes by finding a feasible solution for a set of linear constraints that depend on the instantaneous similarity-scores. We then discuss a specific online algorithm that seeks a set of prototypes which have a small norm. The resulting algorithm, which we term MIRA (for Margin Infused Relaxed Algorithm), is ultraconservative as well. We derive mistake bounds for all the algorithms and provide further analysis of MIRA using a generalized notion of the margin for multiclass problems.
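The ultraconservative idea can be made concrete with one simple feasible member of the additive family: update only the prototypes whose score reaches the correct label's score, splitting the correction uniformly among them. This is a sketch of one such feasible choice, not MIRA itself; the uniform split is an illustrative assumption.

```python
import numpy as np

def ultraconservative_step(W, x, y):
    """One additive ultraconservative update.
    W: (k, d) matrix with one prototype row per class; x: input; y: true label.
    Only prototypes scoring at least the correct prototype's score move."""
    scores = W @ x
    # Error set: wrong classes whose similarity-score reaches the correct one's.
    E = [r for r in range(W.shape[0]) if r != y and scores[r] >= scores[y]]
    if E:
        W[y] += x                 # promote the correct prototype
        for r in E:
            W[r] -= x / len(E)    # demote offenders; uniform split of the mass
    return W

W = np.zeros((3, 2))
x = np.array([1.0, 0.0])
W = ultraconservative_step(W, x, 0)
```

Prototypes not in the error set are left untouched, which is exactly the ultraconservative property the abstract describes.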
A fast iterative nearest point algorithm for support vector machine classifier design
IEEE Transactions on Neural Networks, 2000
Cited by 75 (3 self)
Abstract:
In this paper we give a new fast iterative algorithm for support vector machine (SVM) classifier design. The basic problem treated is one that does not allow classification violations. The problem is converted to a problem of computing the nearest point between two convex polytopes. The suitability of two classical nearest point algorithms, due to Gilbert, and Mitchell et al., is studied. Ideas from both of these algorithms are combined and modified to derive our fast algorithm. For problems which require classification violations to be allowed, the violations are quadratically penalized and an idea due to Cortes and Vapnik and Frieß is used to convert it to a problem in which there are no classification violations. Comparative computational evaluation of our algorithm against powerful SVM methods such as Platt's sequential minimal optimization shows that our algorithm is very competitive. Index Terms—Classification, nearest point algorithm, quadratic programming, support vector machine.
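The nearest-point reduction can be illustrated with a minimal Gilbert-style iteration: to separate classes U and V, find the minimum-norm point of the convex hull of the difference set {u − v}; its norm is twice the margin and its direction gives the separating normal. This sketches one of the two classical algorithms the abstract mentions (Gilbert's), not the paper's combined method; the tolerance and toy data are illustrative assumptions.

```python
import numpy as np

def gilbert_nearest_to_origin(P, iters=200, tol=1e-12):
    """Gilbert's algorithm: minimum-norm point of conv(P) for a finite
    point set P, by repeated support-point queries and line searches."""
    z = P[0].astype(float)
    for _ in range(iters):
        s = P[np.argmin(P @ z)]          # support point minimizing <p, z>
        d = z - s
        if d @ z <= tol:                 # optimality: <z, z - s> <= 0
            break
        t = np.clip((z @ d) / (d @ d), 0.0, 1.0)
        z = z - t * d                    # nearest point to 0 on segment [s, z]
    return z

# Two separable classes; the difference set encodes the nearest-point problem.
U = np.array([[3.0, 1.0], [2.0, 0.0]])
V = np.array([[0.0, 0.0], [0.0, 1.0]])
P = np.array([u - v for u in U for v in V])
z = gilbert_nearest_to_origin(P)
```

Here the minimum-norm point of the difference hull is (2, 0), so the two hulls are distance 2 apart along the x-axis.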
Gaussian Processes for Classification: Mean Field Algorithms
Neural Computation, 1999
Cited by 75 (13 self)
Abstract:
We derive a mean field algorithm for binary classification with Gaussian processes which is based on the TAP approach originally proposed in the statistical physics of disordered systems. The theory also yields an approximate leave-one-out estimator for the generalization error which is computed with no extra computational cost. We show that from the TAP approach it is possible to derive both a simpler 'naive' mean field theory and support vector machines (SVM) as limiting cases. For both mean field algorithms and support vector machines, simulation results for three small benchmark data sets are presented. They show (1) that one may get state-of-the-art performance by using the leave-one-out estimator for model selection, and (2) that the built-in leave-one-out estimators are extremely precise when compared to the exact leave-one-out estimate. The latter result is taken as strong support for the internal consistency of the mean field approach.
Improving Support Vector Machine Classifiers by Modifying Kernel Functions
Neural Networks, 1999
Cited by 71 (3 self)
Abstract:
We propose a method of modifying a kernel function to improve the performance of a support vector machine classifier. This is based on the Riemannian geometrical structure induced by the kernel function. The idea is to enlarge the spatial resolution around the separating boundary surface by a conformal mapping such that the separability between classes is increased. Examples are given specifically for modifying Gaussian Radial Basis Function kernels. Simulation results for both artificial and real data show remarkable improvement of generalization errors, supporting our idea.
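The conformal-mapping idea amounts to rescaling a base kernel as K̃(x, z) = c(x) c(z) K(x, z) with a factor c(·) that is large near the separating boundary. Below is a minimal sketch using one concrete, commonly used choice: c(·) as a sum of Gaussians centred on the support vectors. The factor `tau` and this particular form of c are illustrative assumptions, not necessarily the paper's exact construction.

```python
import numpy as np

def rbf(x, z, gamma=1.0):
    """Base Gaussian RBF kernel K(x, z)."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

def conformal_kernel(x, z, svs, gamma=1.0, tau=1.0):
    """Conformally modified kernel K~(x, z) = c(x) c(z) K(x, z).
    c is a sum of Gaussians on the support vectors `svs`, so the metric
    induced by K~ is magnified near the boundary region they mark."""
    def c(p):
        return sum(np.exp(-tau * np.sum((p - s) ** 2)) for s in svs)
    return c(x) * c(z) * rbf(x, z, gamma)

# Support vectors (assumed already found by a first SVM pass).
svs = [np.array([0.0, 1.0]), np.array([1.0, 0.0])]
x, z = np.array([0.5, 0.5]), np.array([0.2, 0.8])
k = conformal_kernel(x, z, svs)
```

Because K̃ multiplies a positive definite kernel by a positive separable factor, it remains a valid (positive definite) kernel, which is what licenses a second SVM training pass with the modified kernel.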
Playing Billiard in Version Space
1997
Cited by 23 (0 self)
Abstract:
A ray-tracing method inspired by ergodic billiards is used to estimate the theoretically best decision rule for a given set of linearly separable examples. For randomly distributed examples the billiard estimate of the single Perceptron with best average generalization probability agrees with known analytic results, while for real-life classification problems the generalization probability is consistently enhanced when compared to the maximal stability Perceptron. 1 Introduction Neural networks can be used for both concept learning (classification) and for function interpolation and/or extrapolation. Two basic mathematical methods seem to be particularly adequate for studying neural networks: geometry (especially combinatorial geometry) and probability theory (statistical physics). Geometry is illuminating and probability theory is powerful. In this paper I consider perhaps the simplest neural network, the venerable Perceptron [1]: given a set of examples falling in two classes,...
Regularized Winnow methods
In Advances in Neural Information Processing Systems 13, 2001
Cited by 11 (1 self)
Abstract:
In theory, the Winnow multiplicative update has certain advantages over the Perceptron additive update when there are many irrelevant attributes. Recently, there has been much effort on enhancing the Perceptron algorithm by using regularization, leading to a class of linear classification methods called support vector machines. Similarly, it is also possible to apply the regularization idea to the Winnow algorithm, which gives methods we call regularized Winnows. We show that the resulting methods compare with the basic Winnows much as a support vector machine compares with the Perceptron. We investigate algorithmic issues and learning properties of the derived methods. Some experimental results are also provided to illustrate the different methods.
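The multiplicative-versus-additive contrast the abstract draws can be seen in the basic (unregularized) Winnow update, where mistakes rescale the positive weights exponentially instead of adding the input. This is a sketch of one standard exponentiated form of Winnow; the learning rate, threshold choice, and toy data are illustrative assumptions.

```python
import numpy as np

def winnow_train(X, y, eta=0.5, epochs=10):
    """Basic Winnow: multiplicative update of positive weights on mistakes.
    X: (n, d) nonnegative features; y: labels in {-1, +1}."""
    n, d = X.shape
    w = np.ones(d)
    theta = d                                    # standard threshold choice
    for _ in range(epochs):
        for i in range(n):
            pred = 1 if w @ X[i] >= theta else -1
            if pred != y[i]:                     # mistake-driven, like Perceptron
                w *= np.exp(eta * y[i] * X[i])   # but multiplicative, not additive
    return w, theta

# The label depends only on feature 0; features 1-3 are irrelevant attributes.
X = np.array([[1, 0, 0, 1], [1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1]], dtype=float)
y = np.array([1, 1, -1, -1])
w, theta = winnow_train(X, y)
preds = np.where(X @ w >= theta, 1, -1)
```

Because updates are multiplicative, the weight on the single relevant attribute grows geometrically while irrelevant weights stay small, which is the source of Winnow's advantage with many irrelevant attributes.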
Gaussian Process Classification and SVM: Mean Field Results and Leave-One-Out Estimator
Cited by 11 (0 self)
Abstract:
In this chapter, we first elaborate on the well-known relationship between Gaussian Processes (GP) and Support Vector Machines (SVM). Second, we present approximate solutions for two computational problems arising in GP and SVM. The first one is the calculation of the posterior mean for GP classifiers using a 'naive' mean field approach. The second one is a leave-one-out estimator for the generalization error of SVM based on a linear response method. Simulation results on a benchmark dataset show similar performances for the GP mean field algorithm and the SVM algorithm. The approximate leave-one-out estimator is found to be in very good agreement with the exact leave-one-out error. 1 Introduction It is well-known that Gaussian Processes (GP) and Support Vector Machines (SVM) are closely related, see e.g. [7]. Both approaches are nonparametric. This means that they allow for infinitely many parameters to be tuned but, increasing with the amount of data, only a finite number of them are a...
Iterative Single Data Algorithm for Training Kernel Machines from Huge Data Sets: Theory and Performance
In Support Vector Machines: Theory and Applications, Springer-Verlag, Studies in Fuzziness and Soft Computing, 2005
Cited by 8 (0 self)
Abstract:
The chapter introduces the latest developments and results of the Iterative Single Data Algorithm (ISDA) for solving large-scale support vector machine (SVM) problems. First, the equality of a Kernel AdaTron (KA) method (originating from a gradient ascent learning approach) and the Sequential Minimal Optimization (SMO) learning algorithm (based on an analytic quadratic programming step for a model without bias term b) in designing SVMs with positive definite kernels is shown for both the nonlinear classification and the nonlinear regression tasks. The chapter also introduces the classic Gauss-Seidel (GS) procedure and its derivative known as the successive over-relaxation (SOR) algorithm as viable (and usually faster) training algorithms. The convergence theorem for these related iterative algorithms is proven. The second part of the chapter presents the effects and methods of incorporating an explicit bias term b into the ISDA. The algorithms shown here implement a single-training-data-based iteration routine (a.k.a. per-pattern learning), which makes the proposed ISDAs remarkably quick. The final solution in the dual domain is not an approximate one but the optimal set of dual variables that would have been obtained by any of the existing, proven QP solvers if only they could deal with huge data sets.
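The per-pattern iteration underlying the Kernel AdaTron can be sketched directly: each step looks at a single training datum and takes a clipped gradient-ascent step on the dual SVM objective (no bias term, as in the model the chapter analyzes). This is a minimal illustration of the KA iteration, not the chapter's ISDA; the learning rate, epoch count, and toy data are illustrative assumptions.

```python
import numpy as np

def kernel_adatron(K, y, eta=0.1, epochs=200):
    """Kernel AdaTron without bias: single-data (per-pattern) gradient
    ascent on the dual objective, clipping each alpha at zero.
    K: (n, n) kernel Gram matrix; y: labels in {-1, +1}."""
    n = len(y)
    alpha = np.zeros(n)
    for _ in range(epochs):
        for i in range(n):                               # one datum per update
            grad = 1.0 - y[i] * np.sum(alpha * y * K[:, i])
            alpha[i] = max(0.0, alpha[i] + eta * grad)   # keep alpha_i >= 0
    return alpha

# Toy 1-D problem with a linear kernel.
X = np.array([[2.0], [1.0], [-1.0], [-2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
K = X @ X.T
alpha = kernel_adatron(K, y)
f = (alpha * y) @ K    # decision values on the training points
```

For positive definite kernels this clipped coordinate-wise ascent converges to the optimal dual variables, which is the sense in which KA and SMO (without bias) coincide.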