## The Effect of the Input Density Distribution on Kernel-based Classifiers (2000)

Venue: Proceedings of the 17th International Conference on Machine Learning

Citations: 49 (6 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Williams00theeffect,
  author    = {Christopher Williams and Matthias Seeger},
  title     = {The Effect of the Input Density Distribution on Kernel-based Classifiers},
  booktitle = {Proceedings of the 17th International Conference on Machine Learning},
  year      = {2000},
  pages     = {1159--1166},
  publisher = {Morgan Kaufmann}
}
```

### Abstract

The eigenfunction expansion of a kernel function K(x, y), as used in support vector machines or Gaussian process predictors, is studied when the input data is drawn from a distribution p(x). In this case it is shown that the eigenfunctions {φ_i} obey the equation ∫ K(x, y) p(x) φ_i(x) dx = λ_i φ_i(y). This has a number of consequences, including (i) the eigenvalues/eigenvectors of the n × n Gram matrix K, obtained by evaluating the kernel at all pairs of training points K(x_i, x_j), converge to the eigenvalues and eigenfunctions of the integral equation above as n → ∞, and (ii) the dependence of the eigenfunctions on p(x) may be useful for the class-discrimination task. We show that on a number of datasets using the RBF kernel the eigenvalue spectrum of the Gram matrix decays rapidly, and discuss how this property might be used to speed up kernel-based predictors.
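The rapid spectral decay the abstract reports can be illustrated numerically. The sketch below builds an RBF Gram matrix on synthetic Gaussian data (not the paper's datasets) and computes the fraction of variance captured by the leading eigenvalues; the length scale and sample size are illustrative choices.

```python
import numpy as np

def rbf_gram(X, length_scale=1.0):
    """Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 * length_scale^2))."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * length_scale**2))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))          # stand-in sample from p(x); not the paper's data
K = rbf_gram(X)
eigvals = np.linalg.eigvalsh(K)[::-1]  # eigenvalues in descending order

# Fraction of total variance captured by the top-m eigenvalues; a rapidly
# decaying spectrum means this rises toward 1 after only a few terms.
frac = np.cumsum(eigvals) / eigvals.sum()
print(np.round(frac[:10], 3))
```

If only a small number of eigenvalues dominate, a kernel predictor can be approximated in the span of the corresponding eigenvectors, which is the speed-up the abstract alludes to.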

### Citations

8980 | Statistical Learning Theory - Vapnik - 1998

Citation Context: "...matrix decays rapidly, and discuss how this property might be used to speed up kernel-based predictors. 1. Introduction: In recent years kernel-based classifiers such as support vector machines (SVMs) (Vapnik, 1995), Gaussian process classifiers (e.g. see Williams & Barber, 1998) and spline methods (Wahba, 1990) have become popular. This paper studies kernel methods from the perspective of an eigenfunction expan..."

1273 | Spline models for observational data - Wahba - 1990

Citation Context: "...tors. 1. Introduction: In recent years kernel-based classifiers such as support vector machines (SVMs) (Vapnik, 1995), Gaussian process classifiers (e.g. see Williams & Barber, 1998) and spline methods (Wahba, 1990) have become popular. This paper studies kernel methods from the perspective of an eigenfunction expansion which is dependent on the input data density p(x) as well as the kernel function K(x, y). In..."

1048 | Nonlinear component analysis as a kernel eigenvalue problem - Schölkopf, Smola, et al. - 1998

1040 | Linear and Nonlinear Programming - Luenberger - 1984

185 | Extracting support data for a given task - Schölkopf, Burges, et al. - 1995

113 | Input space vs. feature space in kernel-based methods - Schölkopf, Mika, et al. - 1999

95 | The numerical treatment of integral equations - Baker - 1977

Citation Context: "...data points x_i, i = 1, ..., n. First we observe that ∫ K(x, y) p(x) φ_i(x) dx ≈ (1/n) Σ_{k=1}^{n} K(x_k, y) φ_i(x_k)   (5) when the x_k's are sampled from p(x). The standard numerical method (see, e.g., Baker, 1977, chapter 3) for approximating the eigenfunctions and eigenvalues of equation 3 is to use a numerical routine such as equation 5 to approximate the integral, and then plug in y = x_k for k = 1, ..."
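The procedure this context describes, approximating the integral by a sample average and then plugging in y = x_k, is the standard Nyström method: the Gram-matrix eigenvalues divided by n approximate the integral-operator eigenvalues λ_i, and eigenvectors extend to approximate eigenfunctions. A minimal sketch, with an illustrative RBF kernel and synthetic data rather than anything from the paper:

```python
import numpy as np

def rbf(x, y, length_scale=1.0):
    return np.exp(-np.sum((x - y)**2) / (2.0 * length_scale**2))

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 1))   # samples x_k drawn from p(x) (illustrative)
n = len(X)
K = np.array([[rbf(xi, xj) for xj in X] for xi in X])

# Eigendecompose the Gram matrix; dividing the matrix eigenvalues by n
# approximates the eigenvalues of the integral equation.
mat_eigvals, U = np.linalg.eigh(K)
mat_eigvals, U = mat_eigvals[::-1], U[:, ::-1]   # descending order
lam = mat_eigvals / n

def eigfun(i, y):
    """Nystrom extension of the i-th approximate eigenfunction to a new point y."""
    k_y = np.array([rbf(xk, y) for xk in X])
    return (np.sqrt(n) / mat_eigvals[i]) * (k_y @ U[:, i])
```

At a training point the extension reproduces the scaled eigenvector entry, eigfun(i, x_k) = √n · U[k, i], which is the consistency check behind the plug-in step described above.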

46 | Bayesian model selection for support vector machines, Gaussian processes and other kernel classifiers - Seeger - 1999

Citation Context: "... 3.3 The Eigenspectrum of K for Some Test Problems: The eigenspectrum of the K matrix was calculated for the six datasets that were studied in (Seeger, 2000) and for the USPS handwritten digit database. This calculation is similar to those carried out in the kernel PCA paper of Schölkopf et al. (1998) but here we focus on the fraction of variance explai..."

23 | Generalization bounds via eigenvalues of the Gram matrix - Schölkopf, Shawe-Taylor, et al. - 1999

9 | Finite-dimensional approximation of Gaussian processes - Ferrari-Trecate, Williams, et al. - 1999

Citation Context: "...and that we observe n noisy samples from this function (with noise variance σ²). Then the posterior mean ĉ_i estimated for each c_i is roughly ĉ_i ≈ λ_i / (λ_i + σ²/n) · c_i   (10) (e.g., see Ferrari-Trecate et al., 1999). Notice that this is a shrinkage of the true c_i's. Hence eigenfunctions with λ_i ≪ σ²/n are effectively "zeroed out" of the calculation. Suppose there is only one non-zero coefficient (say the j-th). Due to..."
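The shrinkage factor λ_i / (λ_i + σ²/n) in the context above can be made concrete with a small worked example; the noise variance, sample size, and eigenvalues below are arbitrary illustrative numbers.

```python
import numpy as np

# Shrinkage factor lambda_i / (lambda_i + sigma^2 / n): coefficients of
# eigenfunctions with lambda_i << sigma^2 / n are driven toward zero.
sigma2, n = 0.1, 1000          # noise variance and sample count (illustrative)
lam = np.array([1.0, 1e-2, 1e-4, 1e-6])
shrink = lam / (lam + sigma2 / n)
print(shrink)                  # ~ [0.9999, 0.9901, 0.5, 0.0099]
```

With σ²/n = 1e-4, eigenfunctions whose eigenvalue sits well above that threshold pass through almost unchanged, the one exactly at the threshold is halved, and those well below it are effectively zeroed out, matching the statement above.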

7 | Regression and classification using Gaussian process priors (with discussion) - Neal - 1998

Citation Context: "... (12) where x* is the minimum of E(x) and x_0 is the starting value of x. In a Gaussian process classifier one usually adds a "jitter" term σ²_J I to K to give better conditioning of the covariance matrix (Neal, 1998). If there are many small eigenvalues of the unjittered matrix then b is close to a and the factor ((b − a)/(b + a)) will be small. 5. Discussion: This paper makes a number of contributions to the study of ke..."
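The jitter trick mentioned in this context is easy to demonstrate: since the Gram matrix K is positive semi-definite, adding σ²_J I lifts its smallest eigenvalue to at least σ²_J, so a Cholesky factorization succeeds even when K itself is numerically near-singular. A minimal sketch with illustrative values:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 1))
sq = np.sum(X**2, axis=1)
K = np.exp(-(sq[:, None] + sq[None, :] - 2.0 * X @ X.T) / 2.0)  # RBF Gram matrix

# Dense 1-D inputs make K nearly singular; the jitter term sigma_J^2 * I
# bounds the smallest eigenvalue below by sigma_J^2, stabilizing Cholesky.
sigma_J2 = 1e-6
K_jittered = K + sigma_J2 * np.eye(len(X))
L = np.linalg.cholesky(K_jittered)   # succeeds on the jittered matrix
```

The jitter also caps the condition number: the eigenvalues of K + σ²_J I lie in [σ²_J, λ_max + σ²_J], which is the bracket [a, b] the convergence factor in the context refers to.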

2 | Efficient Implementation of Gaussian Processes. Unpublished manuscript - MacKay - 1997

2 | Bayesian classification with Gaussian processes - Williams - 1998

1 | Multivariate integration and approximation of random fields satisfying Sacks-Ylvisaker conditions - Ritter, Wasilkowski - 1995