## Diffusion kernels on graphs and other discrete structures (2002)

Venue: | In Proceedings of the ICML |

Citations: | 128 - 4 self |

### BibTeX

@INPROCEEDINGS{Kondor02diffusionkernels,

author = {Risi Imre Kondor and John Lafferty},

title = {Diffusion kernels on graphs and other discrete structures},

booktitle = {In Proceedings of the ICML},

year = {2002},

pages = {315--322}

}

### Abstract

The application of kernel-based learning algorithms has, so far, largely been confined to real-valued data and a few special data types, such as strings. In this paper we propose a general method of constructing natural families of kernels over discrete structures, based on the matrix exponentiation idea. In particular, we focus on generating kernels on graphs, for which we propose a special class of exponential kernels, based on the heat equation, called diffusion kernels, and show that these can be regarded as the discretisation of the familiar Gaussian kernel of Euclidean space.
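As a rough illustration of the matrix exponentiation idea in the abstract, the sketch below builds a kernel $K = e^{\beta H}$ on a small graph. It assumes $H$ is the unnormalized negative graph Laplacian (1 for each edge, minus the vertex degree on the diagonal) and that $\beta$ is a bandwidth parameter; the helper names, the example path graph, and the scaling-and-squaring Taylor routine are illustrative choices, not the authors' implementation.

```python
# Sketch: diffusion kernel K = exp(beta * H) on a small undirected graph.
# Assumption: H is the unnormalized negative graph Laplacian.
# All function names and the example graph are hypothetical.

def negative_laplacian(n, edges):
    """H[i][j] = 1 for an edge {i, j}, H[i][i] = -degree(i), 0 elsewhere."""
    H = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        H[i][j] = H[j][i] = 1.0
        H[i][i] -= 1.0
        H[j][j] -= 1.0
    return H

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm(M, squarings=10, terms=20):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    n = len(M)
    scale = 2.0 ** squarings
    S = [[x / scale for x in row] for row in M]          # S = M / 2**squarings
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]                    # current term S**k / k!
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, S)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):                           # undo the scaling
        result = matmul(result, result)
    return result

def diffusion_kernel(n, edges, beta):
    H = negative_laplacian(n, edges)
    return expm([[beta * x for x in row] for row in H])

# Path graph 0 - 1 - 2 - 3
K = diffusion_kernel(4, [(0, 1), (1, 2), (2, 3)], beta=0.5)
for row in K:
    print(["%.4f" % x for x in row])
```

On this path graph the kernel value between vertex 0 and another vertex shrinks as the graph distance grows, and each row sums to one, which matches the heat-diffusion reading of the kernel. For graphs of realistic size a numerical library routine for the matrix exponential would be a more sensible choice than this hand-rolled series.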

### Citations

2294 | A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery
- Burges
- 1998
Citation Context ... can be regarded as the discretization of the familiar Gaussian kernel of Euclidean space. 1. Introduction Kernel-based algorithms, such as Gaussian processes (Mackay, 1997), support vector machines (Burges, 1998), and kernel PCA (Mika et al., 1998), are enjoying great popularity in the statistical learning community. The common idea behind these methods is to express our prior beliefs about the correlations, or more generally, the similarities,...

2044 | Online learning with kernels
- Kivinen, Smola, et al.
- 2004
Citation Context ... function $K : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$, and thereby to implicitly construct a mapping $\phi : \mathcal{X} \to \mathcal{H}_K$ to a Hilbert space $\mathcal{H}_K$, in which the kernel appears as the inner product, $K(x, x') = \langle \phi(x), \phi(x') \rangle$ (1) (Schölkopf & Smola, 2001). With respect to a basis of $\mathcal{H}_K$, each datapoint then splits into (a possibly infinite number of) independent features, a property which can be exploited to great effect. Graph-like structures occur...

1706 | Text categorization with support vector machines: Learning with many relevant features
- Joachims
- 1998
Citation Context ...rinsically discrete data spaces, the usual approach has been either to map the data to Euclidean space first (as is commonly done in text classification, treating integer word counts as real numbers (Joachims, 1998)) or, when no such simple mapping is forthcoming, to forgo using kernel methods altogether. A notable exception to this is the line of work stemming from the convolution kernel idea introduced in (Ha...

1461 | An Introduction to Probability Theory
- Feller
- 1971
Citation Context ... form continuous families $\{K(\beta) = K(1)^{\beta}\}$, indexed by a real parameter $\beta$, and are related to infinitely divisible probability distributions, which are the limits of sums of independent random variables (Feller, 1971). The tautology $K(\beta) = [K(1)^{\beta/n}]^n$ becomes, as $n$ goes to infinity, $K_\beta = \lim_{n \to \infty} \left(1 + \frac{\beta}{n} \left.\frac{dK}{d\beta}\right|_{\beta=0}\right)^n$, which is equivalent to (3) for $H = \left.\frac{dK}{d\beta}\right|_{\beta=0}$. The above already suggests looking for kernels ove...

831 | Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids
- Durbin, Eddy, et al.
- 1998
Citation Context ...k i+1;j ) + i+1;j+1 k i;j where i;j = ( 1 if x i+1 = x 0 j+1 B otherwise. For the derivation of recursive formulae such as this, and comparison to other measures of similarity between strings, see (Durbin et al., 1998). 6. Experiments on UCI datasets In this section we describe preliminary experiments with diffusion kernels, focusing on the use of kernel-based methods for classifying categorical data. For such pro...

740 | Laplacian eigenmaps for dimensionality reduction and data representation
- Belkin, Niyogi
- 2003
Citation Context ...g to one of our original objects. Finally, adjacency graphs are sometimes used when data is expected to be confined to a manifold of lower dimensionality than the original space (Saul & Roweis, 2001; Belkin & Niyogi, 2001) and (Szummer & Jaakkola, 2002). In all of these cases, the challenge is to capture in the kernel $K$ the local and global structure of the graph. In addition to adequately expressing the known or hypo...

413 | Large margin classification using the perceptron algorithm
- Freund, Schapire
- 1999
Citation Context ...rcube, as described in Section 4.4, can result in good performance for such data. For ease of experimentation we use a large margin classifier based on the voted perceptron, as described in (Freund & Schapire, 1999). In each set of experiments we compare models trained using a diffusion kernel, the Euclidean distance, and a kernel based on the Hamming distance. Data sets having a majority of categorical varia...

369 | Convolution kernels on discrete structures
- Haussler
- 1999
Citation Context ...98)) or, when no such simple mapping is forthcoming, to forgo using kernel methods altogether. A notable exception to this is the line of work stemming from the convolution kernel idea introduced in (Haussler, 1999) and related but independently conceived ideas on string kernels first presented in (Watkins, 1999). Despite the promise of these ideas, relatively little work has been done on discrete kernels since...

206 | Partially labeled classification with markov random walks
- Szummer, Jaakkola
- 2006
Citation Context ...ects. Finally, adjacency graphs are sometimes used when data is expected to be confined to a manifold of lower dimensionality than the original space (Saul & Roweis, 2001; Belkin & Niyogi, 2001) and (Szummer & Jaakkola, 2002). In all of these cases, the challenge is to capture in the kernel $K$ the local and global structure of the graph. In addition to adequately expressing the known or hypothesized structure of the data ...

125 | Kernel PCA and de-noising in feature spaces
- Mika, Schölkopf, et al.
- 1999
Citation Context ...ization of the familiar Gaussian kernel of Euclidean space. 1. Introduction Kernel-based algorithms, such as Gaussian processes (Mackay, 1997), support vector machines (Burges, 1998), and kernel PCA (Mika et al., 1998), are enjoying great popularity in the statistical learning community. The common idea behind these methods is to express our prior beliefs about the correlations, or more generally, the similarities...

125 | Metric spaces and completely monotone functions
- Schoenberg
- 1938
Citation Context ...ed to the result described in (Berg et al., 1984; Haussler, 1999) and (Schölkopf & Smola, 2001), based on Schoenberg's pioneering work in the late 1930's in the theory of positive definite functions (Schoenberg, 1938). This work shows that any positive semi-definite $K$ can be written as $K(x, x') = e^{M(x, x')}$ (7) where $M$ is a conditionally positive semi-definite kernel; that is, it satisfies (2) under the additi...

102 | Harmonic analysis on semigroups. Theory of positive definite and related functions
- Berg, Christensen, et al.
- 1984
Citation Context ...y emerging in $K$. We call $K_\beta = e^{\beta H}$ an exponential family of kernels, with generator $H$ and bandwidth parameter $\beta$. Note that the exponential kernel construction is not related to the result described in (Berg et al., 1984; Haussler, 1999) and (Schölkopf & Smola, 2001), based on Schoenberg's pioneering work in the late 1930's in the theory of positive definite functions (Schoenberg, 1938). This work shows that any posi...

25 | An introduction to locally linear embedding
- Saul, Roweis
- 2000
Citation Context ...h vertex corresponding to one of our original objects. Finally, adjacency graphs are sometimes used when data is expected to be confined to a manifold of lower dimensionality than the original space (Saul & Roweis, 2001; Belkin & Niyogi, 2001) and (Szummer & Jaakkola, 2002). In all of these cases, the challenge is to capture in the kernel $K$ the local and global structure of the graph. In addition to adequately express...

24 | heat kernels and spanning trees
- Chung, Yau, et al.
- 1999

17 | Spectral graph theory. No
- Chung
- 1997
Citation Context ...e of vertex $i$ (number of edges emanating from vertex $i$). The negative of this matrix (sometimes up to normalization) is called the Laplacian of $\Gamma$, and it plays a central role in spectral graph theory (Chung, 1997). It is instructive to note that for any vector $w \in \mathbb{R}^{|V|}$, $w^\top H w = -\sum_{\{i,j\} \in E} (w_i - w_j)^2$, showing that $H$ is, in fact, negative semi-definite. Acting on functions $\{f : V \to \mathbb{R}\}$ by $(Hf)(x) = \sum$ x...

11 | Gaussian processes: A replacement for neural networks
- Mackay
- 1997
Citation Context ...d diffusion kernels, and show that these can be regarded as the discretization of the familiar Gaussian kernel of Euclidean space. 1. Introduction Kernel-based algorithms, such as Gaussian processes (Mackay, 1997), support vector machines (Burges, 1998), and kernel PCA (Mika et al., 1998), are enjoying great popularity in the statistical learning community. The common idea behind these methods is to express o...

2 | An introduction to locally linear embedding. Available from
- Saul, Roweis
- 2001
Citation Context ...h vertex corresponding to one of our original objects. Finally, adjacency graphs are sometimes used when data is expected to be confined to a manifold of lower dimensionality than the original space (Saul & Roweis, 2001; Belkin & Niyogi, 2001) and (Szummer & Jaakkola, 2002). In all of these cases, the challenge is to capture in the kernel $K$ the local and global structure of the graph. In addition to adequately expre...

1 | Statistical mechanics of complex networks. Available from
- Albert, Barabási
- 2001
Citation Context ...nts related to one another by links, such as the hyperlink structure of the World Wide Web. Other examples include social networks, citations between scientific articles, and networks in linguistics (Albert & Barabási, 2001). Graphs are also sometimes used to model complicated or only partially understood structures in a first approximation. In chemistry or molecular biology, for example, it might be anticipated that mo...
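
Two statements that appear in the citation contexts can be checked numerically on a toy graph: that the quadratic form of the negative Laplacian $H$ equals $-\sum_{\{i,j\} \in E} (w_i - w_j)^2$ (so $H$ is negative semi-definite), and that $(1 + \beta H / n)^n$ approaches $e^{\beta H}$ as $n$ grows. The sketch below is only a sanity check under stated assumptions: the unnormalized Laplacian, the example edge list, the value of $\beta$, and every function name are chosen here for illustration.

```python
# Sketch: sanity checks for the negative Laplacian H of a small graph.
# Assumptions: unnormalized Laplacian; example graph and beta are arbitrary.
import random

def negative_laplacian(n, edges):
    """H[i][j] = 1 for an edge {i, j}, H[i][i] = -degree(i), 0 elsewhere."""
    H = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        H[i][j] = H[j][i] = 1.0
        H[i][i] -= 1.0
        H[j][j] -= 1.0
    return H

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def taylor_exp(M, terms=40):
    """exp(M) from a truncated power series; adequate for small, modest-norm matrices."""
    n = len(M)
    result, term = identity(n), identity(n)
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, M)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def limit_form(M, doublings=16):
    """(I + M/n)^n with n = 2**doublings, computed by repeated squaring."""
    size = len(M)
    n = 2 ** doublings
    P = [[(1.0 if i == j else 0.0) + M[i][j] / n for j in range(size)]
         for i in range(size)]
    for _ in range(doublings):
        P = matmul(P, P)
    return P

# Triangle 0-1-2 with a pendant vertex 3 attached to vertex 2
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
H = negative_laplacian(4, edges)

# Check 1: w^T H w equals minus the sum of squared differences over edges
rng = random.Random(0)
w = [rng.uniform(-1.0, 1.0) for _ in range(4)]
quad = sum(w[i] * H[i][j] * w[j] for i in range(4) for j in range(4))
edge_sum = -sum((w[i] - w[j]) ** 2 for i, j in edges)

# Check 2: (I + beta*H/n)^n is close to exp(beta*H) for large n
beta = 0.5
bH = [[beta * x for x in row] for row in H]
gap = max(abs(a - b)
          for ra, rb in zip(limit_form(bH), taylor_exp(bH))
          for a, b in zip(ra, rb))

print(quad, edge_sum, gap)
```

The two quadratic-form values should agree up to rounding and be non-positive, and the gap between the limit form and the series should shrink as `doublings` is increased.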