## Diffusion Kernels on Graphs and Other Discrete Input Spaces (2002)

### Download Links

- [cbio.ensmp.fr]
- [www-2.cs.cmu.edu]
- [www.ml.cmu.edu]
- [pdf.aminer.org]
- DBLP

### Other Repositories/Bibliography

Citations: 191 (7 self)

### BibTeX

@INPROCEEDINGS{Kondor02diffusionkernels,
  author = {Risi Imre Kondor and John Lafferty},
  title = {Diffusion Kernels on Graphs and Other Discrete Input Spaces},
  booktitle = {},
  year = {2002},
  pages = {315--322},
  publisher = {Morgan Kaufmann}
}

### Abstract

The application of kernel-based learning algorithms has, so far, largely been confined to real-valued data and a few special data types, such as strings. In this paper we propose a general method of constructing natural families of kernels over discrete structures, based on the matrix exponentiation idea. In particular, we focus on generating kernels on graphs, for which we propose a special class of exponential kernels called diffusion kernels, which are based on the heat equation and can be regarded as the discretization of the familiar Gaussian kernel of Euclidean space.
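As a rough illustration of the matrix exponentiation idea, the following sketch computes a kernel of the form exp(beta * H) on a small graph. It assumes H = A - D (adjacency matrix minus the diagonal degree matrix, i.e. the negative of the graph Laplacian) as the generator. The function name `diffusion_kernel`, the 4-vertex path graph, and the value `beta = 0.5` are illustrative choices, not taken from the paper, and the eigendecomposition is just one convenient way to exponentiate a symmetric matrix.

```python
import numpy as np


def diffusion_kernel(adjacency, beta):
    """Return exp(beta * H) for H = A - D (negative graph Laplacian).

    Hypothetical helper for illustration; assumes an undirected,
    unweighted graph given by a symmetric 0/1 adjacency matrix.
    """
    A = np.asarray(adjacency, dtype=float)
    H = A - np.diag(A.sum(axis=1))
    # H is symmetric, so exp(beta * H) = V diag(exp(beta * lam)) V^T,
    # where (lam, V) is the eigendecomposition of H.
    eigvals, eigvecs = np.linalg.eigh(H)
    return (eigvecs * np.exp(beta * eigvals)) @ eigvecs.T


# Illustrative graph (an assumption): the path 0 - 1 - 2 - 3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])

K = diffusion_kernel(A, beta=0.5)
print(np.round(K, 3))
```

Because every eigenvalue of exp(beta * H) is the exponential of a real number, the resulting matrix is symmetric positive definite for any real beta, which is what makes it usable as a kernel. In this example the similarity to vertex 0 decays with graph distance, loosely mirroring how a Gaussian kernel decays with Euclidean distance.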

### Citations

2473 | A Tutorial on Support Vector Machines for Pattern Recognition
- Burges
- 1998
Citation Context: ... can be regarded as the discretization of the familiar Gaussian kernel of Euclidean space. 1. Introduction Kernel-based algorithms, such as Gaussian processes (Mackay, 1997), support vector machines (Burges, 1998), and kernel PCA (Mika et al., 1998), are enjoying great popularity in the statistical learning community. The common idea behind these methods is to express our prior beliefs about the correlations, ...

2208 | Online Learning with Kernels
- Kivinen, Smola, et al.
- 2004
Citation Context: ... kernel function $K : \mathcal{X} \times \mathcal{X} \mapsto \mathbb{R}$, and thereby to implicitly construct a mapping $\phi : \mathcal{X} \mapsto \mathcal{H}_K$ to a Hilbert space $\mathcal{H}_K$, in which the kernel appears as the inner product, $K(x, x') = \langle \phi(x), \phi(x') \rangle$ (1) (Schölkopf & Smola, 2001). With respect to a basis of $\mathcal{H}_K$, each datapoint then splits into (a possibly infinite number of) independent features, a property which can be exploited to great effect. Graph-like structures occur ...

1865 | Text categorization with support vector machines: Learning with many relevant features
- Joachims
- 1998
Citation Context: ...rinsically discrete data spaces, the usual approach has been either to map the data to Euclidean space first (as is commonly done in text classification, treating integer word counts as real numbers (Joachims, 1998)) or, when no such simple mapping is forthcoming, to forgo using kernel methods altogether. A notable exception to this is the line of work stemming from the convolution kernel idea introduced in (Ha...

1653 | An Introduction to Probability Theory and its
- Feller
- 1966
Citation Context: ... form continuous families $\{K(\beta) = K(1)^{\beta}\}$, indexed by a real parameter $\beta$, and are related to infinitely divisible probability distributions, which are the limits of sums of independent random variables (Feller, 1971). The tautology $K(\beta) = [K(1)^{\beta/n}]^n$ becomes, as $n$ goes to infinity, $K_\beta = \lim_{n \to \infty} \left(1 + \frac{\beta}{n} \left.\frac{dK}{d\beta}\right|_{\beta=0}\right)^n$, which is equivalent to (3) for $H = \left.\frac{dK}{d\beta}\right|_{\beta=0}$. The above already suggests looking for kernels ove...

1387 | Statistical mechanics of complex networks
- Albert, Barabasi
Citation Context: ...nts related to one another by links, such as the hyperlink structure of the World Wide Web. Other examples include social networks, citations between scientific articles, and networks in linguistics (Albert & Barabasi, 2002). Graphs are also sometimes used to model complicated or only partially understood structures in a first approximation. In chemistry or molecular biology, for example, it might be anticipated that mo...

908 | Biological Sequence Analysis : Probabilistic Models of Proteins and Nucleic Acids
- Durbin
- 1999
Citation Context: ...j ) + i+1;j+1 k i;j , where i;j = 1 when x i+1 = x 0 j+1 and B otherwise. For the derivation of recursive formulae such as this, and comparison to other measures of similarity between strings, see (Durbin et al., 1998). 5. Experiments on UCI Datasets In this section we describe preliminary experiments with diffusion kernels, focusing on the use of kernel-based methods for classifying categorical data. For such pro...

783 | Laplacian eigenmaps for dimensionality reduction and data representation
- Belkin, Niyogi
- 2003
Citation Context: ...g to one of our original objects. Finally, adjacency graphs are sometimes used when data is expected to be confined to a manifold of lower dimensionality than the original space (Saul & Roweis, 2001; Belkin & Niyogi, 2001) and (Szummer & Jaakkola, 2002). In all of these cases, the challenge is to capture in the kernel K the local and global structure of the graph. In addition to adequately expressing the known or hypo...

422 | Large margin classification using the perceptron algorithm
- Freund, Schapire
- 1999
Citation Context: ... the hypercube, as described in Section 4.4, can result in good performance for such data. For ease of experimentation we use a large margin classifier based on the voted perceptron, as described in (Freund & Schapire, 1999). In each set of experiments we compare models trained using a diffusion kernel and a kernel based on the Hamming distance, KH (x; x 0 ) = n P n i=1 (x i ; x 0 i ). Data sets having a majority of cat...

396 | Convolution kernels on discrete structures
- Haussler
- 1999
Citation Context: ...98)) or, when no such simple mapping is forthcoming, to forgo using kernel methods altogether. A notable exception to this is the line of work stemming from the convolution kernel idea introduced in (Haussler, 1999) and related but independently conceived ideas on string kernels first presented in (Watkins, 1999). Despite the promise of these ideas, relatively little work has been done on discrete kernels since...

218 | Partially labeled classification with markov random walks
- Szummer, Jaakkola
- 2002
Citation Context: ...ects. Finally, adjacency graphs are sometimes used when data is expected to be confined to a manifold of lower dimensionality than the original space (Saul & Roweis, 2001; Belkin & Niyogi, 2001) and (Szummer & Jaakkola, 2002). In all of these cases, the challenge is to capture in the kernel K the local and global structure of the graph. In addition to adequately expressing the known or hypothesized structure of the data ...

139 | Metric spaces and completely monotone functions
- Schoenberg
- 1938
Citation Context: ...ed to the result described in (Berg et al., 1984; Haussler, 1999) and (Schölkopf & Smola, 2001), based on Schoenberg's pioneering work in the late 1930's in the theory of positive definite functions (Schoenberg, 1938). This work shows that any positive semi-definite $K$ can be written as $K(x, x') = e^{M(x, x')}$ (7) where $M$ is a conditionally positive semi-definite kernel; that is, it satisfies (2) under the additi...

133 | Kernel PCA and de–noising in feature spaces
- Mika, Schölkopf, et al.
- 1999
Citation Context: ...ization of the familiar Gaussian kernel of Euclidean space. 1. Introduction Kernel-based algorithms, such as Gaussian processes (Mackay, 1997), support vector machines (Burges, 1998), and kernel PCA (Mika et al., 1998), are enjoying great popularity in the statistical learning community. The common idea behind these methods is to express our prior beliefs about the correlations, or more generally, the similarities...

109 | Harmonic Analysis on Semigroups. Theory of Positive Definite and Related Functions
- Berg, Christensen, et al.
- 1984
Citation Context: ...urally emerging in K. We call (5) an exponential family of kernels, with generator $H$ and bandwidth parameter $\beta$. Note that the exponential kernel construction is not related to the result described in (Berg et al., 1984; Haussler, 1999) and (Schölkopf & Smola, 2001), based on Schoenberg's pioneering work in the late 1930's in the theory of positive definite functions (Schoenberg, 1938). This work shows that any posi...

27 | An Introduction to Locally Linear Embedding
- Saul, Roweis
- 2001
Citation Context: ...h vertex corresponding to one of our original objects. Finally, adjacency graphs are sometimes used when data is expected to be confined to a manifold of lower dimensionality than the original space (Saul & Roweis, 2001; Belkin & Niyogi, 2001) and (Szummer & Jaakkola, 2002). In all of these cases, the challenge is to capture in the kernel K the local and global structure of the graph. In addition to adequately expre...

25 | Coverings, heat kernels and spanning trees
- Chung, Yau
- 1999

18 | Spectral graph theory, no
- Chung
- 1997
Citation Context: ...e of vertex i (number of edges emanating from vertex i). The negative of this matrix (sometimes up to normalization) is called the Laplacian of $\Gamma$, and it plays a central role in spectral graph theory (Chung, 1997). It is instructive to note that for any vector $w \in \mathbb{R}^{|V|}$, $w^\top H w = -\sum_{\{i,j\} \in E} (w_i - w_j)^2$, showing that $H$ is, in fact, negative semi-definite. Acting on functions $\{f : V \mapsto \mathbb{R}\}$ by $(Hf)(x) = \sum$ x...

11 | Gaussian processes: A replacement for neural networks
- Mackay
- 1997
Citation Context: ...which are based on the heat equation and can be regarded as the discretization of the familiar Gaussian kernel of Euclidean space. 1. Introduction Kernel-based algorithms, such as Gaussian processes (Mackay, 1997), support vector machines (Burges, 1998), and kernel PCA (Mika et al., 1998), are enjoying great popularity in the statistical learning community. The common idea behind these methods is to express o...

2 | An introduction to locally linear embedding. Available from
- Saul, Roweis
- 2001
Citation Context: ...h vertex corresponding to one of our original objects. Finally, adjacency graphs are sometimes used when data is expected to be confined to a manifold of lower dimensionality than the original space (Saul & Roweis, 2001; Belkin & Niyogi, 2001) and (Szummer & Jaakkola, 2002). In all of these cases, the challenge is to capture in the kernel K the local and global structure of the graph. In addition to adequately expre...