## Classes of kernels for machine learning: a statistics perspective (2001)

Venue: Journal of Machine Learning Research

Citations: 57 (2 self)

### BibTeX

@ARTICLE{Genton01classesof,
  author  = {Marc G. Genton and Nello Cristianini and John Shawe-Taylor and Robert Williamson},
  title   = {Classes of kernels for machine learning: a statistics perspective},
  journal = {Journal of Machine Learning Research},
  year    = {2001},
  volume  = {2},
  pages   = {299--312}
}

### Abstract

In this paper, we present classes of kernels for machine learning from a statistics perspective. Indeed, kernels are positive definite functions and thus also covariances. After discussing key properties of kernels, as well as a new formula to construct kernels, we present several important classes of kernels: anisotropic stationary kernels, isotropic stationary kernels, compactly supported kernels, locally stationary kernels, nonstationary kernels, and separable nonstationary kernels. Compactly supported kernels and separable nonstationary kernels are of prime interest because they provide a computational reduction for kernel-based methods. We describe the spectral representation of the various classes of kernels and conclude with a discussion on the characterization of nonlinear maps that reduce nonstationary kernels to either stationarity or local stationarity.

### Citations

8980 | Statistical Learning Theory
- Vapnik
- 1998
Citation Context: ...ved considerable attention. The main reason is that kernels allow one to map the data into a high-dimensional feature space in order to increase the computational power of linear machines (see for example Vapnik, 1995, 1998, Cristianini and Shawe-Taylor, 2000). Thus, it is a way of extending linear hypotheses to nonlinear ones, and this step can be performed implicitly. Support vector machines, kernel principal compo...

1552 |
An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge
- Cristianini, Shawe-Taylor
- 2000
Citation Context: ...reason is that kernels allow one to map the data into a high-dimensional feature space in order to increase the computational power of linear machines (see for example Vapnik, 1995, 1998, Cristianini and Shawe-Taylor, 2000). Thus, it is a way of extending linear hypotheses to nonlinear ones, and this step can be performed implicitly. Support vector machines, kernel principal component analysis, kernel Gram-Schmidt, Bayes...
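The "implicit" step described in the snippet above can be sketched in a few lines: a degree-2 polynomial kernel evaluates an inner product in a feature space that is never constructed explicitly. The kernel and feature map below are standard textbook examples, not taken from this paper.

```python
import numpy as np

def poly_kernel(x, z):
    """Implicit degree-2 polynomial kernel: k(x, z) = (x . z)^2."""
    return np.dot(x, z) ** 2

def phi(x):
    """Explicit feature map for the same kernel in R^2:
    phi(x) = (x1^2, sqrt(2) x1 x2, x2^2)."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

# The kernel computes the feature-space inner product without ever
# building phi -- a linear machine on phi becomes nonlinear in x.
assert np.isclose(poly_kernel(x, z), np.dot(phi(x), phi(z)))
```

For higher degrees or higher input dimensions the explicit feature space grows combinatorially, which is exactly why the implicit evaluation matters.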

1110 |
Statistics for Spatial Data
- Cressie
- 1993
Citation Context: ...he direction and the length of the lag vector. The assumption of stationarity has been extensively used in time series (see for example Brockwell and Davis, 1991) and spatial statistics (see for example Cressie, 1993) because it allows for inference on K based on all pairs of examples separated by the same lag vector. Many stationary kernels can be constructed from their spectral representation derived by Bochner (1...

777 | Theory of reproducing kernels - Aronszajn - 1950 |

647 |
Time Series: Theory and Methods
- Brockwell, Davis
- 1991
Citation Context: ...rnel, in order to emphasize the dependence on both the direction and the length of the lag vector. The assumption of stationarity has been extensively used in time series (see for example Brockwell and Davis, 1991) and spatial statistics (see for example Cressie, 1993) because it allows for inference on K based on all pairs of examples separated by the same lag vector. Many stationary kernels can be constructed fr...

494 | Fractional Brownian motions, fractional noises and applications - Mandelbrot, Ness - 1968 |

255 |
Interpolation of spatial data: some theory for kriging
- Stein
- 1999
Citation Context: ...(1 − |x − z|)_+ / 2 is positive definite in R^1 but not in R^2; see Cressie (1993, p. 84) for a counterexample. It is interesting to remark from (11) that an isotropic stationary kernel has a lower bound (Stein, 1999):

K_I(‖x − z‖)/K_I(0) ≥ inf_{x ≥ 0} Ω_d(x),

thus yielding:

K_I(‖x − z‖)/K_I(0) ≥ −1 in R^1
K_I(‖x − z‖)/K_I(0) ≥ −0.403 in R^2
K_I(‖x − z‖)/K_I(0) ≥ −0.218 in R^3
K_I(‖x − z‖)/K_I(0) ≥ 0 in R^∞.

The isotropic stat...
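The numerical lower bounds quoted above can be checked directly by minimizing the basis functions Ω_d on a grid: Ω_1(x) = cos x, Ω_2(x) = J_0(x), and Ω_3(x) = sin x / x. A minimal sketch, assuming SciPy for the Bessel function:

```python
import numpy as np
from scipy.special import j0  # Bessel function of the first kind, order 0

# Basis functions Omega_d from the spectral representation:
# Omega_1(x) = cos(x), Omega_2(x) = J_0(x), Omega_3(x) = sin(x)/x.
x = np.linspace(1e-9, 50.0, 200_000)

bounds = {
    1: np.cos(x).min(),        # -> -1
    2: j0(x).min(),            # -> about -0.403
    3: (np.sin(x) / x).min(),  # -> about -0.218
}
print(bounds)
```

The grid minima match the bounds −1, −0.403, and −0.218 to three decimal places, and as d grows the infimum of Ω_d increases toward 0, consistent with the R^∞ bound.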

237 |
Kronecker Products and Matrix Calculus: with Applications
- Graham
- 1981
Citation Context: ...kernels possess the property that their Gram matrix G, whose ij-th element is Gij = K(xi, xj), can be written as a tensor product (also called Kronecker product, see Graham, 1981) of two vectors defined by K1 and K2 respectively. This is especially useful to reduce computational burden when dealing with massive data sets. For instance, consider a set of l examples x1, ..., xl. ...
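The tensor-product structure described above can be sketched concretely: for a separable kernel K(x, z) = K1(x) K2(z), the Gram matrix is the outer product of two vectors, so it needs only 2l kernel-function evaluations instead of l². The particular K1 and K2 below are hypothetical choices for illustration.

```python
import numpy as np

# Separable nonstationary kernel: K(x, z) = K1(x) * K2(z).
# Illustrative (assumed) component functions:
K1 = lambda x: np.exp(-x ** 2)        # function of x alone
K2 = lambda z: 1.0 / (1.0 + z ** 2)   # function of z alone

xs = np.linspace(-2, 2, 5)  # l = 5 examples

# Naive Gram matrix: l^2 kernel evaluations.
G_naive = np.array([[K1(a) * K2(b) for b in xs] for a in xs])

# Separable structure: 2l evaluations, then one outer (tensor) product.
G_outer = np.outer(K1(xs), K2(xs))

assert np.allclose(G_naive, G_outer)
```

For massive data sets the saving is substantial: the l × l Gram matrix is represented implicitly by two length-l vectors.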

108 |
Nonparametric estimation of nonstationary spatial covariance structure
- Sampson, Guttorp
- 1992
Citation Context: ...the characterization of nonlinear maps that reduce nonstationary kernels to either stationarity or local stationarity. The main idea is to find a new feature space where stationarity (see Sampson and Guttorp, 1992) or local stationarity (see Genton and Perrin, 2001) can be achieved. We say that a nonstationary kernel K(x, z) is stationary reducible if there exists a bijective deformation Φ such that: K(x, z) = K...
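The deformation idea can be made concrete with a small sketch. The kernel, the deformation Φ(x) = x³, and the stationary kernel K_S below are illustrative assumptions, not examples taken from the paper:

```python
import numpy as np

# A hypothetical nonstationary kernel that is stationary reducible via the
# bijective deformation Phi(x) = x**3, with a Gaussian stationary kernel
# K_S(h) = exp(-h**2) in the deformed coordinates.
Phi = lambda x: x ** 3
K_S = lambda h: np.exp(-h ** 2)
K = lambda x, z: K_S(Phi(x) - Phi(z))

xs = np.linspace(-1.5, 1.5, 30)
G = np.array([[K(a, b) for b in xs] for a in xs])

# K is a valid kernel because it is a stationary kernel evaluated at
# deformed inputs, so its Gram matrix is positive semi-definite.
eigmin = np.linalg.eigvalsh(G).min()
print(eigmin >= -1e-10)
```

The practical problem studied in the cited works is the inverse one: given data from a nonstationary process, estimate a deformation Φ under which the covariance becomes stationary.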

65 |
Harmonic Analysis and the theory of probability
- Bochner
- 1955
Citation Context: ...said to be isotropic (or homogeneous), and is thus only a function of distance: K(x, z) = K_I(‖x − z‖). The spectral representation of isotropic stationary kernels has been derived from Bochner's theorem (Bochner, 1955) by Yaglom (1957):

K_I(‖x − z‖) = ∫_0^∞ Ω_d(ω‖x − z‖) F(dω),   (11)

where

Ω_d(x) = Γ(d/2) (2/x)^{(d−2)/2} J_{(d−2)/2}(x)

form a basis for functions in R^d. Here F is any nondecreasing bounded funct...
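The basis functions Ω_d above reduce to familiar closed forms in low dimensions (Ω_1(x) = cos x and Ω_3(x) = sin x / x, standard consequences of the half-integer Bessel identities), which can be checked numerically, assuming SciPy:

```python
import numpy as np
from scipy.special import jv, gamma  # Bessel J of real order; Gamma function

def omega(d, x):
    """Omega_d(x) = Gamma(d/2) * (2/x)^((d-2)/2) * J_{(d-2)/2}(x)."""
    return gamma(d / 2) * (2.0 / x) ** ((d - 2) / 2) * jv((d - 2) / 2, x)

x = np.linspace(0.1, 10.0, 200)

# Closed forms in low dimensions: Omega_1 = cos(x), Omega_3 = sin(x)/x.
assert np.allclose(omega(1, x), np.cos(x))
assert np.allclose(omega(3, x), np.sin(x) / x)
```

Any nondecreasing bounded F then yields a valid isotropic kernel in R^d through the integral (11), which is how many stationary kernels in the paper are constructed.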

60 |
The intrinsic random functions and their applications
- Matheron
- 1973
Citation Context: ...rest are nonstationary kernels obtained from (19) with ω1 = ω2 but with a spectral density that is not integrable in a neighborhood around the origin. Such kernels are referred to as generalized kernels (Matheron, 1973). For instance, the Brownian motion generalized kernel corresponds to a spectral density f(ω) = 1/‖ω‖^2 (Mandelbrot and Van Ness, 1968). A particular family of nonstationary kernels is the one of sepa...

46 |
Spatial Variation
- Matérn
- 1986
Citation Context: ...ue for the exponential kernel. The rational quadratic, Gaussian, and wave kernels have a parabolic behavior at the origin. This indicates a different degree of smoothness. Finally, the Matérn kernel (Matérn, 1960) has recently received considerable attention, because it allows one to control the smoothness with a parameter ν. The Matérn kernel is defined by:

K_I(‖x − z‖)/K_I(0) = (1 / (2^{ν−1} Γ(ν))) (2√ν ‖x − z‖ / θ)^ν H_ν(2√ν ‖x − z‖ / θ)...
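A minimal sketch of the Matérn kernel in the parameterization quoted above, taking H_ν to be the modified Bessel function of the third kind (SciPy's `kv`); the check that ν = 1/2 recovers the exponential kernel is a well-known special case:

```python
import numpy as np
from scipy.special import kv, gamma  # modified Bessel of the third kind

def matern(h, nu, theta=1.0):
    """Matern kernel K_I(h)/K_I(0), assuming the parameterization above."""
    h = np.asarray(h, dtype=float)
    u = 2.0 * np.sqrt(nu) * h / theta
    out = (u ** nu) * kv(nu, u) / (2.0 ** (nu - 1) * gamma(nu))
    return np.where(h == 0.0, 1.0, out)  # K_I(0)/K_I(0) = 1 in the limit

h = np.linspace(1e-6, 3.0, 100)

# nu = 1/2 recovers the exponential kernel exp(-sqrt(2) h / theta),
# using kv(1/2, u) = sqrt(pi/(2u)) * exp(-u).
assert np.allclose(matern(h, nu=0.5), np.exp(-np.sqrt(2) * h))
```

Larger ν gives smoother sample paths; as ν → ∞ the kernel approaches the Gaussian, which is why ν is described as a smoothness parameter.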

42 | Classes of nonseparable, spatio-temporal stationary covariance functions - Cressie, Huang - 1999 |

32 | Modern spatiotemporal geostatistics - Christakos - 2000 |

21 |
Locally stationary random processes
- Silverman
- 1957
Citation Context: ...ult will not be positive definite in general. 3. Locally Stationary Kernels A simple departure from the stationary kernels discussed in the previous section is provided by locally stationary kernels (Silverman, 1957, 1959):

K(x, z) = K1((x + z)/2) K2(x − z),   (13)

where K1 is a nonnegative function and K2 is a stationary kernel. Note that if K1 is a positive constant, then (13) reduces to a stationary kernel. Thu...
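A small numerical sketch of form (13). The particular K1 and K2 below are hypothetical Gaussian-shaped choices whose product happens to be a valid kernel, so the Gram matrix comes out positive semi-definite:

```python
import numpy as np

# Locally stationary kernel K(x, z) = K1((x+z)/2) * K2(x - z):
#   K1 (nonnegative) modulates the local power,
#   K2 (stationary)  sets the local shape of the covariance.
K1 = lambda m: np.exp(-m ** 2 / 2)
K2 = lambda h: np.exp(-h ** 2 / 2)
K = lambda x, z: K1((x + z) / 2) * K2(x - z)

xs = np.linspace(-3, 3, 40)
G = np.array([[K(a, b) for b in xs] for a in xs])

# For this choice the product is positive semi-definite.
print(np.linalg.eigvalsh(G).min() >= -1e-10)
```

Note the hedging in the lead-in is essential: an arbitrary nonnegative K1 paired with an arbitrary stationary K2 is not automatically a kernel; the class consists of those products that are.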

12 | Some classes of random fields in n-dimensional space, related to stationary random processes, Theory Probab - Yaglom - 1957 |

8 | Reducing non-stationary stochastic processes to stationarity by a time deformation - Perrin, Senoussi - 1999

5 | Reducing non-stationary random fields to stationarity and isotropy using a space deformation - Perrin, Senoussi - 2000 |

4 | Fonctions aléatoires du second ordre - Loève - 1946 |

3 | Norm-dependent covariance permissibility of weakly homogeneous spatial random fields - Christakos, Papanicolaou - 2000 |

2 | Nonseparable stationary covariance functions for space-time data |

1 | On the problem of permissible covariance and variogram models - Christakos - 1984

1 | On a time deformation reducing nonstationary stochastic processes to local stationarity - Genton, Perrin - 2001 |

1 | Compactly supported correlation functions - Gneiting

1 |
Fonctions aléatoires à décomposition orthogonale exponentielle
- Loève
- 1946
Citation Context: ...t stationary kernels are locally stationary. Another special class of locally stationary kernels is defined by kernels of the form:

K(x, z) = K1(x + z),   (17)

the so-called exponentially convex kernels (Loève, 1946, 1948). From (16), we see immediately that K1(x + z) ≥ 0. Actually, as noted by Loève, any two-sided Laplace transform of a nonnegative function is an exponentially convex kernel. A large class of loc...
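A concrete instance of form (17), chosen for illustration: cosh(x + z) is exponentially convex, since cosh is the two-sided Laplace transform of the nonnegative discrete measure (δ₁ + δ₋₁)/2.

```python
import numpy as np

# Exponentially convex kernel K(x, z) = K1(x + z) with K1 = cosh:
# cosh(t) = (e^t + e^-t)/2, a two-sided Laplace transform of a
# nonnegative measure, so it defines a valid kernel.
xs = np.linspace(-1, 1, 25)
G = np.cosh(np.add.outer(xs, xs))

# K1(x + z) >= 0 everywhere, and the Gram matrix is positive
# semi-definite: cosh(x+z) = (e^x e^z + e^-x e^-z)/2, a sum of
# two rank-one PSD terms.
print(G.min() > 0, np.linalg.eigvalsh(G).min() >= -1e-10)
```

The rank-two decomposition in the comment is the whole argument: each term e^x e^z is an outer product g(x)g(z), and nonnegative combinations of such terms are positive semi-definite.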

1 | Functions of positive and negative type and their connection with the theory of integral equations - Mercer - 1909

1 | Metric spaces and completely monotone functions - Schoenberg - 1938

1 | A matching theorem for locally stationary random processes - Silverman - 1959 |