## Graph Kernels (2007)

Citations: | 44 - 4 self |

### BibTeX

@TECHREPORT{Vishwanathan07graphkernels,
author = {S. V. N. Vishwanathan and Nicol N. Schraudolph and Risi Kondor and Karsten M. Borgwardt},
title = {Graph Kernels},
institution = {},
year = {2007}
}

### Abstract

We present a unified framework to study graph kernels, special cases of which include the random walk (Gärtner et al., 2003; Borgwardt et al., 2005) and marginalized (Kashima et al., 2003, 2004; Mahé et al., 2004) graph kernels. Through reduction to a Sylvester equation we improve the time complexity of kernel computation between unlabeled graphs with $n$ vertices from $O(n^6)$ to $O(n^3)$. We find a spectral decomposition approach even more efficient when computing entire kernel matrices. For labeled graphs we develop conjugate gradient and fixed-point methods that take $O(dn^3)$ time per iteration, where $d$ is the size of the label set. By extending the necessary linear algebra to Reproducing Kernel Hilbert Spaces (RKHS) we obtain the same result for $d$-dimensional edge kernels, and $O(n^4)$ in the infinite-dimensional case; on sparse graphs these algorithms only take $O(n^2)$ time per iteration in all cases. Experiments on graphs from bioinformatics and other application domains show that these techniques can speed up computation of the kernel by an order of magnitude or more. We also show that certain rational kernels (Cortes et al., 2002, 2003, 2004) when specialized to graphs reduce to our random walk graph kernel. Finally, we relate our framework to R-convolution kernels (Haussler, 1999) and provide a kernel that is close to the optimal assignment kernel of Fröhlich et al. (2006) yet provably positive semi-definite.
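The complexity gap described in the abstract can be illustrated with a small sketch for unlabeled graphs. It is not the authors' code: it assumes a geometric random walk kernel of the form $q^\top (I - \lambda W_\times)^{-1} p$ with $W_\times = A_1 \otimes A_2$, unnormalized adjacency matrices, and uniform start and stop distributions, and it uses a plain fixed-point iteration rather than a Sylvester solver. The function names `rw_kernel_direct` and `rw_kernel_fixed_point`, the toy graphs, and the value of `lam` are all hypothetical choices for illustration.

```python
import numpy as np


def rw_kernel_direct(A1, A2, lam):
    """Dense solve on the explicit product graph.

    The system has n1*n2 unknowns, so for n1 = n2 = n the solve costs O(n^6).
    """
    n = A1.shape[0] * A2.shape[0]
    W = np.kron(A1, A2)          # adjacency matrix of the direct product graph
    p = np.full(n, 1.0 / n)      # uniform start distribution (assumption)
    q = np.full(n, 1.0 / n)      # uniform stop distribution (assumption)
    x = np.linalg.solve(np.eye(n) - lam * W, p)
    return float(q @ x)


def rw_kernel_fixed_point(A1, A2, lam, tol=1e-12, max_iter=1000):
    """Fixed-point iteration x <- p + lam * W x without ever forming W.

    For row-major flattening, kron(A1, A2) @ X.ravel() equals
    (A1 @ X @ A2.T).ravel(), so each iteration is two dense matrix
    products, i.e. O(n^3) for n-vertex graphs.
    """
    n1, n2 = A1.shape[0], A2.shape[0]
    P = np.full((n1, n2), 1.0 / (n1 * n2))   # p reshaped to an n1 x n2 matrix
    X = P.copy()
    for _ in range(max_iter):
        X_next = P + lam * (A1 @ X @ A2.T)
        converged = np.max(np.abs(X_next - X)) < tol
        X = X_next
        if converged:
            break
    return float(np.sum(P * X))              # q^T x, with q equal to p here


# Toy inputs (hypothetical): a triangle and a 3-vertex path.
A_triangle = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
A_path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)

# lam must satisfy lam * rho(A1) * rho(A2) < 1 for the iteration to converge;
# here rho(triangle) = 2 and rho(path) = sqrt(2), so 0.1 is small enough.
lam = 0.1

k_direct = rw_kernel_direct(A_triangle, A_path, lam)
k_fixed = rw_kernel_fixed_point(A_triangle, A_path, lam)
print(k_direct, k_fixed)
```

Both functions should agree up to the iteration tolerance; the second avoids materializing the $n^2 \times n^2$ product matrix, which is where the per-iteration $O(n^3)$ cost comes from under these assumptions.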

### Citations

11502 | Computers and Intractability, A Guide to the Theory of NP-Completeness - Garey, Johnson - 1979 |
Citation Context: ...ifference between the graphs would yield a non-zero $d(G, G')$, they are isomorphic iff $d(G, G') = 0$. The graph isomorphism problem, however, is widely believed to be not solvable in polynomial time (Garey and Johnson, 1979). Gärtner et al. (2003) also showed that computing inner products in a feature space constructed over all subgraphs of a graph is NP-hard. One must therefore choose which substructures to distinguish...

2238 | Learning with Kernels - Schölkopf, Smola - 2002 |
Citation Context: ...g similarities between the nodes of a graph. Taking this one step further, detecting mirrored sets of web pages requires computing the similarity between the graphs representing them. Kernel methods (Schölkopf and Smola, 2002) offer a natural framework to study these questions. Roughly speaking, a kernel $k(x, x')$ is a measure of similarity between objects $x$ and $x'$. It must satisfy two mathematical requirements: it must...

2196 | Numerical Optimization - Nocedal, Wright - 2006 |
Citation Context: ...earest Kronecker product approximation (Section 4.5). 4.2 Conjugate Gradient Methods. Given a matrix $M$ and a vector $b$, conjugate gradient (CG) methods solve the system of equations $Mx = b$ efficiently (Nocedal and Wright, 1999). While they are designed for symmetric p.s.d. matrices, CG solvers can also be used to solve other linear systems efficiently. They are particularly efficient if the matrix is rank deficient, or has...

2100 | Matrix computations - Golub, Van Loan - 1983 |

844 | The Protein Data Bank - Berman - 2000 |

486 | gSpan: Graph-based substructure pattern mining - Yan, Han - 2002 |

354 | Gene expression profiling predicts clinical outcome of breast cancer - van 't Veer, Dai, et al. |

248 | Transductions and Context-Free Languages - Berstel - 1979 |
Citation Context: ...ribing algebraic path problems, wherein disparate problems related to graphs, automata, and transducers are described in a common framework using matrices and tensors (Eilenberg, 1974; Lehmann, 1977; Berstel, 1979; Kuich and Salomaa, 1986). 6.1 Semirings. At the most general level, weighted transducers are defined over semirings. In a semiring addition and multiplication are generalized to ab...

182 | Kernels and regularization on graphs - Smola, Kondor - 2003 |

169 | Finding the K Shortest Loopless Paths in a Network - Yen - 1971 |

157 | Product Graphs: Structure and Recognition - Imrich, Klavžar - 2000 |

141 | On graph kernels: Hardness results and efficient alternatives - Gärtner, Flach, et al. - 2003 |
Citation Context: ...HRAUDOLPH.ORG RISI@CALTECH.EDU KARSTEN.BORGWARDT@TUEBINGEN.MPG.DE. Editor: John Lafferty. Abstract: We present a unified framework to study graph kernels, special cases of which include the random walk (Gärtner et al., 2003; Borgwardt et al., 2005) and marginalized (Kashima et al., 2003, 2004; Mahé et al., 2004) graph kernels. Through reduction to a Sylvester equation we improve the time complexity of kernel computation...

138 | Diffusion kernels on graphs and other discrete structures - Kondor, Lafferty - 2002 |

138 | Correcting sample selection bias by unlabeled data - Huang, Smola, et al. - 2007 |

129 | Structural determination of paraffin boiling points - Wiener |

127 | Speech recognition by composition of weighted finite automata - Pereira, Riley - 1997 |
Citation Context: ...ere $H^{*}_{ab} = \bar{\bigoplus}_{c \in \Sigma} H_{ac} \,\bar{\otimes}\, H_{bc}$. There exists a general and efficient algorithm for composing transducers as in (43) which takes advantage of the sparseness of the input transducers (Mohri et al., 1996; Pereira and Riley, 1997). 6.3 Weighted Automata. A weighted automaton is a transducer with identical input and output symbols. The transition matrix of a weighted automaton is therefore a three-dimensional tensor in $\mathbb{K}^{n \times |\Sigma| \times \ldots}$

80 | Marginalized kernels for biological sequences - Tsuda, Kin, et al. |

78 | Protein function prediction via graph kernels. Bioinformatics, 21(Suppl 1):i47–i56 - Borgwardt, Ong, et al. - 2005 |
Citation Context: ...TECH.EDU KARSTEN.BORGWARDT@TUEBINGEN.MPG.DE. Editor: John Lafferty. Abstract: We present a unified framework to study graph kernels, special cases of which include the random walk (Gärtner et al., 2003; Borgwardt et al., 2005) and marginalized (Kashima et al., 2003, 2004; Mahé et al., 2004) graph kernels. Through reduction to a Sylvester equation we improve the time complexity of kernel computation between unlabeled graph...

76 | Semiring frameworks and algorithms for shortest-distance problems - Mohri |

71 | The ubiquitous Kronecker product - Van Loan |

71 | A kernel method for the two-sample-problem - Gretton, Borgwardt, et al. |

68 | Classes of Kernels for Machine Learning: A Statistics Perspective - Genton - 2001 |

61 | Matrix Mathematics - Bernstein - 2005 |
Citation Context: ...$A \otimes B := \begin{bmatrix} A_{11}B & A_{12}B & \cdots & A_{1m}B \\ \vdots & \vdots & & \vdots \\ A_{n1}B & A_{n2}B & \cdots & A_{nm}B \end{bmatrix}$, $\operatorname{vec}(A) := \begin{bmatrix} A_{*1} \\ \vdots \\ A_{*m} \end{bmatrix}$, where $A_{*j}$ denotes the $j$th column of $A$. The Kronecker product and vec operator are linked by the well-known property (e.g., Bernstein, 2005, Proposition 7.1.9): $\operatorname{vec}(ABC) = (C^\top \otimes A)\operatorname{vec}(B)$. (1) Another well-known property of the Kronecker product which we make use of is (Bernstein, 2005, Proposition 7.1.6): $(A \otimes B)(C \otimes D) =$...

56 | Cyclic pattern kernels for predictive graph mining - Horváth, Gärtner, et al. - 2004 |

54 | Integrating structured biological data by kernel maximum mean discrepancy - Borgwardt, Gretton, et al. |

51 | Solution of the Sylvester matrix equation - Gardiner, Laub, et al. - 1992 |
Citation Context: ...ylvester or Lyapunov equation: $M = SMT + M_0$. (20) Here, $S, T, M_0 \in \mathbb{R}^{n \times n}$ are given and we need to solve for $M \in \mathbb{R}^{n \times n}$. These equations can be readily solved in $O(n^3)$ time with freely available code (Gardiner et al., 1992), such as Matlab's dlyap method. Solving the generalized Sylvester equation $M = \sum_{i=1}^{d} S_i M T_i + M_0$ (21) involves computing generalized simultaneous Schur factorizations of $d$ symmetric matrices (Latha...

44 | Computation of the canonical decomposition by means of a simultaneous generalized Schur decomposition - De Lathauwer, De Moor, et al. |

42 | Graph kernels for chemical informatics - Ralaivola, Swamidass, et al. - 2005 |
Citation Context: ...t product between these vectors. Other decompositions of graphs which are well suited for particular application domains include molecular fingerprints based on various types of depth-first searches (Ralaivola et al., 2005) and structural elements such as rings or functional groups (Fröhlich et al., 2006). 7.2 R-Convolutions in Abstract Semirings. There have been a few attempts to extend the R-convolution kernel (59) to...

40 | Shortest-path kernels on graphs - Borgwardt, Kriegel - 2005 |

36 | Supervised feature selection via dependence estimation - Song, Smola, et al. - 2007 |

33 | Cross-platform analysis of cancer microarray data improves gene expression based classification of phenotypes - Warnat - 2005 |
Citation Context: ...nly considers 10% of the genes examined in both studies, is able to capture some relevant biological information, which in turn leads to better classification accuracy on these challenging data sets (Warnat et al., 2005). 6. Rational Kernels. Rational kernels (Cortes et al., 2004) were conceived to compute similarity between variable-length sequences and, more generally, weighted automata. For instance, the output of...

30 | A dependence maximization view of clustering - Song, Smola, et al. - 2007 |

29 | Fast computation of graph kernels - Vishwanathan, Borgwardt, et al. |

28 | Correlation between gene expression profiles and protein–protein interactions within and across genomes - Bhardwaj, Lu - 2005 |
Citation Context: ...ted into proteins that are more likely to interact. Recent studies confirm that this assumption holds significantly more often for co-expressed than for random pairs of proteins (Fraser et al., 2004; Bhardwaj and Lu, 2005). To measure similarity between these networks in a biologically meaningful manner, we compare which groups of proteins interact and are co-regulated in each patient. For this purpose, a random walk ...

24 | Algebraic structures for transitive closure - Lehmann - 1977 |
Citation Context: ...y used for describing algebraic path problems, wherein disparate problems related to graphs, automata, and transducers are described in a common framework using matrices and tensors (Eilenberg, 1974; Lehmann, 1977; Berstel, 1979; Kuich and Salomaa, 1986). 6.1 Semirings. At the most general level, weighted transducers are defined over semirings. In a semiring addition and multiplication are ge...

24 | Efficient graphlet kernels for large graph comparison - Shervashidze, Vishwanathan, et al. - 2009 |

19 | Use of gene-expression profiling to identify prognostic subclasses in adult acute myeloid leukemia - Bullinger - 2004 |

19 | The Kronecker Product in Approximation and Fast Transform Generation (thesis) - Pitsianis - 1997 |
Citation Context: ...ing the nearest Kronecker product approximating a matrix such as $W_\times$ is a well-studied problem in numerical linear algebra, and efficient algorithms which can exploit the sparsity of $W_\times$ are available (Pitsianis, 1992; Van Loan, 2000). Formally, these methods minimize the Frobenius norm $\|W_\times - S \otimes T\|_F$ by computing the largest singular value of $\hat{W}_\times$, a permuted version of $W_\times$. We employ the power method for this purpos...

18 | Coevolution of gene expression among interacting proteins - Fraser - 2004 |
Citation Context: ...on levels are translated into proteins that are more likely to interact. Recent studies confirm that this assumption holds significantly more often for co-expressed than for random pairs of proteins (Fraser et al., 2004; Bhardwaj and Lu, 2005). To measure similarity between these networks in a biologically meaningful manner, we compare which groups of proteins interact and are co-regulated in each patient. For this ...

18 | Pattern mining in frequent dynamic subgraphs - Borgwardt, Kriegel, et al. |

15 | Kernel functions for attributed molecular graphs – a new similarity based approach to ADME prediction in classification and regression - Fröhlich, Wegner, et al. - 2006 |

15 | The skew spectrum of graphs - Kondor, Borgwardt - 2008 |
Citation Context: ...ized kernels (Tsuda et al., 2002) was extended to graphs by Kashima et al. (2003, 2004), then further refined by Mahé et al. (2004). Another algebraic approach to graph kernels has appeared recently (Kondor and Borgwardt, 2008). A seemingly independent line of research investigates the so-called rational kernels, which are kernels between finite state automata based on the algebra of abstract semirings (Cortes et al., 2002...

14 | Graph kernels for disease outcome prediction from protein-protein interaction networks - Borgwardt, Kriegel, et al. - 2007 |

13 | Statistical evaluation of the predictive toxicology challenge 2000-2001 - Toivonen, Srinivasan, et al. - 2003 |

10 | A kernel approach to comparing distributions - Gretton, Borgwardt, et al. |

9 | Kernels for graphs - Kashima, Tsuda, Inokuchi - 2004 |

9 | Drug research: myths, hype and reality. Nat Rev Drug Discov 2:665–668 - Kubinyi - 2003 |
Citation Context: ...rnels, transducers, semirings, random walks. 1. Introduction. Machine learning in domains such as bioinformatics (Sharan and Ideker, 2006), chemoinformatics (Bonchev and Rouvray, 1991), drug discovery (Kubinyi, 2003), web data mining [Footnote: ∗ Also at the Max Planck Institute for Biological Cybernetics. © 2010 S. V. N. Vishwanathan, Nicol N. Schraudolph, Risi Kondor, and Karsten M. Borgwardt.] ...

9 | The optimal assignment kernel is not positive definite - Vert - 2008 |
Citation Context: ...ng these lines, using the tropical semiring. It can be defined as $k(x, x') = \max_{\vec{x} \in R^{-1}(x),\ \vec{x}' \in R^{-1}(x')} \left( \mu(\vec{x}, \vec{x}') + \sum_{i=1}^{D} \kappa_i(x_i, x_i') \right)$. (64) Unfortunately (64) is not always p.s.d. (Vert, 2008). The problem is that the class of p.s.d. kernels is not closed under the max operation (Berg et al., 1984). For semirings that have a morphism $\psi$ to the reals, however, we can rewrite (63) as $\psi(k(x, x$...

8 | Class prediction from time series gene expression profiles using dynamical systems kernel - Borgwardt - 2006 |

7 | Rational kernels - Cortes, Haffner, Mohri - 2002 |