Results 1–10 of 10
Online Learning with Kernels
, 2003
"... Kernel based algorithms such as support vector machines have achieved considerable success in various problems in the batch setting where all of the training data is available in advance. Support vector machines combine the socalled kernel trick with the large margin idea. There has been little u ..."
Abstract

Cited by 2029 (128 self)
 Add to MetaCart
Kernel based algorithms such as support vector machines have achieved considerable success in various problems in the batch setting, where all of the training data is available in advance. Support vector machines combine the so-called kernel trick with the large margin idea. There has been little use of these methods in an online setting suitable for real-time applications. In this paper we consider online learning in a Reproducing Kernel Hilbert Space. By considering classical stochastic gradient descent within a feature space, and the use of some straightforward tricks, we develop simple and computationally efficient algorithms for a wide range of problems such as classification, regression, and novelty detection. In addition to allowing the exploitation of the kernel trick in an online setting, we examine the value of large margins for classification in the online setting with a drifting target. We derive worst-case loss bounds and moreover we show the convergence of the hypothesis to the minimiser of the regularised risk functional. We present some experimental results that support the theory as well as illustrating the power of the new algorithms for online novelty detection. In addition ...
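The abstract above describes running stochastic gradient descent directly in a Reproducing Kernel Hilbert Space, so the hypothesis is a kernel expansion over the points seen so far. As a rough, hypothetical sketch of that idea (not the paper's actual algorithm — the class name, the Gaussian kernel choice, and the hinge-loss update rule are all assumptions for illustration), an online kernel classifier might look like:

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two vectors."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

class OnlineKernelClassifier:
    """Kernelized stochastic gradient descent with hinge loss.

    The hypothesis f(x) = sum_i alpha_i k(x_i, x) lives in the RKHS;
    each update shrinks all coefficients (the regularization term)
    and adds the new point only if it incurs positive hinge loss.
    """

    def __init__(self, eta=0.5, lam=0.01, sigma=1.0):
        self.eta, self.lam, self.sigma = eta, lam, sigma
        self.points, self.alphas = [], []

    def predict(self, x):
        return sum(a * gaussian_kernel(p, x, self.sigma)
                   for p, a in zip(self.points, self.alphas))

    def update(self, x, y):
        # Gradient of the regularizer: shrink every coefficient.
        self.alphas = [(1.0 - self.eta * self.lam) * a for a in self.alphas]
        # Gradient of the hinge loss: add the point if the margin is violated.
        if y * self.predict(x) < 1.0:
            self.points.append(np.asarray(x, dtype=float))
            self.alphas.append(self.eta * y)
```

A practical concern the expansion makes obvious: the number of stored points grows with the stream, so real implementations typically truncate or discard coefficients once shrinkage has made them negligible.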
Generalization Bounds via Eigenvalues of the Gram Matrix
, 1999
"... Model selection in Support Vector machines is usually carried out by minimizing the quotient of the radius of the smallest enclosing sphere of the data and the observed margin on the training set. We provide a new criterion taking the distribution within that sphere into account by considering the G ..."
Abstract

Cited by 23 (5 self)
 Add to MetaCart
Model selection in Support Vector machines is usually carried out by minimizing the quotient of the radius of the smallest enclosing sphere of the data and the observed margin on the training set. We provide a new criterion taking the distribution within that sphere into account by considering the Gram matrix of the data. In particular, this makes use of the eigenvalue distribution of the matrix. Experimental results on real world data show that this new criterion provides a good prediction of the shape of the curve relating generalization error to kernel width.
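Since the criterion described above is built from the eigenvalue distribution of the Gram matrix, the basic computational ingredient is cheap to state. As a minimal sketch (the function name and the Gaussian kernel are assumptions for illustration, not the paper's criterion itself):

```python
import numpy as np

def gaussian_gram_eigenvalues(X, sigma):
    """Eigenvalues, in descending order, of the Gaussian Gram matrix
    K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)) for an (m, d) sample X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    return np.linalg.eigvalsh(K)[::-1]
```

The spectrum behaves as one would expect from the abstract's motivation: for a very narrow kernel the Gram matrix is close to the identity (a flat spectrum), while for a very wide kernel the mass concentrates in the leading eigenvalue — and it is this decay profile, rather than the radius/margin quotient alone, that a Gram-matrix criterion can exploit.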
Entropy Numbers, Operators and Support Vector Kernels
 IEEE TRANSACTIONS ON INFORMATION THEORY
, 1998
"... We derive new bounds for the generalization error of feature space machines, such as support vector machines and related regularization networks by obtaining new bounds on their covering numbers. The proofs are based on a viewpoint that is apparently novel in the field of statistical learning theory ..."
Abstract

Cited by 11 (3 self)
 Add to MetaCart
We derive new bounds for the generalization error of feature space machines, such as support vector machines and related regularization networks, by obtaining new bounds on their covering numbers. The proofs are based on a viewpoint that is apparently novel in the field of statistical learning theory. The hypothesis class is described in terms of a linear operator mapping from a possibly infinite-dimensional unit ball in feature space into a finite-dimensional space. The covering numbers of the class are then determined via the entropy numbers of the operator. These numbers, which characterize the degree of compactness of the operator, can be bounded in terms of the eigenvalues of an integral operator induced by the kernel function used by the machine. As a consequence we are able to theoretically explain the effect of the choice of kernel functions on the generalization performance of support vector machines.
Generalization Bounds for Convex Combinations of Kernel Functions
, 1998
"... We derive new bounds on covering numbers for hypothesis classes generated by convex combinations of basis functions. These are useful in bounding the generalization performance of algorithms such as RBFnetworks, boosting and a new class of linear programming machines similar to SV machines. We show ..."
Abstract

Cited by 2 (2 self)
 Add to MetaCart
We derive new bounds on covering numbers for hypothesis classes generated by convex combinations of basis functions. These are useful in bounding the generalization performance of algorithms such as RBF networks, boosting, and a new class of linear programming machines similar to SV machines. We show that p-convex combinations with p > 1 lead to diverging bounds, whereas for p = 1 good bounds in terms of entropy numbers can be obtained. In the case of kernel expansions, significantly better bounds can be obtained depending on the eigenvalues of the corresponding integral operators.

1 Introduction

It has been shown [13] that good bounds on the generalization error can be obtained in the case of Support Vector (SV) Machines. These carry out regularization in feature space by restricting the weight vector w to lie inside some ball of radius Rw in feature space. Recently new methods have been proposed [3, 11] to compute SV-like expansions using linear programming algorithms. The method is ...
Sample Based Generalization Bounds
, 1999
"... It is known that the covering numbers of a function class on a double sample (length 2m, where m is the number of points in the sample) can be used to bound the generalization performance of a classifier by using a margin based analysis. Traditionally this has been done using a "Sauerlike" relation ..."
Abstract

Cited by 2 (1 self)
 Add to MetaCart
It is known that the covering numbers of a function class on a double sample (length 2m, where m is the number of points in the sample) can be used to bound the generalization performance of a classifier by using a margin-based analysis. Traditionally this has been done using a "Sauer-like" relationship involving a combinatorial dimension such as the fat-shattering dimension. In this paper we show that one can utilize an analogous argument in terms of the observed covering numbers on a single m-sample (being the actual observed data points). The significance of this is that for certain interesting classes of functions, such as support vector machines, one can readily estimate the empirical covering numbers quite well. We show how to do so in terms of the eigenvalues of the Gram matrix created from the data. These covering numbers can be much less than a priori bounds indicate in situations where the particular data received is "easy". The work can be considered an extension of previous results which provided generalization performance bounds in terms of the VC-dimension of the class of hypotheses restricted to the sample, with the considerable advantage that the covering numbers can be readily computed, and they often are small.
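The point that sample-based quantities "can be much less than a priori bounds indicate" when the data is easy can be made concrete with a toy experiment. As a loose illustration only (the thresholding rule and the name `effective_rank` are my own, not the paper's covering-number estimator), data with low-dimensional structure yields a Gram matrix whose spectrum collapses onto a few eigenvalues:

```python
import numpy as np

def effective_rank(K, eps=1e-3):
    """Count Gram-matrix eigenvalues exceeding eps * trace(K):
    a crude proxy for how 'easy' the observed sample is."""
    ev = np.linalg.eigvalsh(K)
    return int(np.sum(ev > eps * np.trace(K)))
```

For a linear kernel, 30 points lying on a single line produce a rank-one Gram matrix, whereas 30 generic points in 10 dimensions fill out the spectrum; an a priori bound that ignores the sample would have to cover the hard case in both situations.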
The Entropy Regularization Information Criterion
"... Effective methods of capacity control via uniform convergence bounds for function expansions have been largely limited to Support Vector machines, where good bounds are obtainable by the entropy number approach. We extend these methods to systems with expansions in terms of arbitrary (parametriz ..."
Abstract

Cited by 2 (0 self)
 Add to MetaCart
Effective methods of capacity control via uniform convergence bounds for function expansions have been largely limited to Support Vector machines, where good bounds are obtainable by the entropy number approach. We extend these methods to systems with expansions in terms of arbitrary (parametrized) basis functions and a wide range of regularization methods covering the whole range of general linear additive models. This is achieved by a data dependent analysis of the eigenvalues of the corresponding design matrix.
Abstract
the fact that a collection of chapters can never be as homogeneous as a book conceived by a single person. We have tried to compensate for this by the selection and refereeing process of the submissions. In addition, we have written an introductory chapter describing the SV algorithm in some detail (chapter 1), and added a roadmap (chapter 2) which describes the actual contributions which are to follow in chapters 3 through 20. Bernhard Schölkopf, Christopher J.C. Burges, Alexander J. Smola. Berlin, Holmdel, July 1998.

1 Introduction to Support Vector Learning

The goal of this chapter, which describes the central ideas of SV learning, is twofold. First, we want to provide an introduction for readers unfamiliar with this field. Second, this introduction serves as a source of the basic equations for the chapters of this book. For more exhaustive treatments, we refer the interested reader to Vapnik (1995); Schölkopf (1997); Burges (1998).
Combining Support Vector and Mathematical . . .
 ADVANCES IN KERNEL METHODS  SUPPORT VECTOR LEARNING
, 1998
"... ..."
Produced as part of the ESPRIT Working Group in Neural and Computational Learning II
, 1998
"... We derive new bounds for the generalization error of kernel machines, such as support vector machines and related regularization networks by obtaining new bounds on their covering numbers. The proofs make use ..."
Abstract
 Add to MetaCart
We derive new bounds for the generalization error of kernel machines, such as support vector machines and related regularization networks, by obtaining new bounds on their covering numbers. The proofs make use ...