## Kernels and Distances for Structured Data (2004)

Venue: Machine Learning

Citations: 50 (3 self)

### BibTeX

@ARTICLE{Gärtner04kernelsand,
  author  = {Thomas Gärtner and John W. Lloyd and Peter A. Flach},
  title   = {Kernels and Distances for Structured Data},
  journal = {Machine Learning},
  year    = {2004}
}

### Abstract

This paper brings together two strands of machine learning of increasing importance: kernel methods and highly structured data. We propose a general method for constructing a kernel following the syntactic structure of the data, as defined by its type signature in a higher-order logic. Our main theoretical result is the positive definiteness of any kernel thus defined. We report encouraging experimental results on a range of real-world datasets. By converting our kernel to a distance pseudo-metric for 1-nearest neighbour, we were able to improve the best accuracy from the literature on the Diterpene dataset by more than 10%.
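
The kernel-to-distance conversion mentioned in the abstract follows the standard construction d(x, y) = √(k(x,x) − 2 k(x,y) + k(y,y)), which is a pseudo-metric whenever k is positive definite. A minimal sketch in Python; the RBF kernel and toy data here are illustrative stand-ins, not the paper's structured kernel:

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """Illustrative positive definite kernel on real vectors
    (stands in for the paper's type-structured kernel)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)

def kernel_distance(k, x, y):
    """Pseudo-metric induced by a positive definite kernel k:
    d(x, y) = sqrt(k(x,x) - 2 k(x,y) + k(y,y))."""
    return math.sqrt(max(0.0, k(x, x) - 2 * k(x, y) + k(y, y)))

def one_nearest_neighbour(k, train, query):
    """1-NN classification using the kernel-induced distance.
    `train` is a list of (point, label) pairs."""
    _, label = min(train, key=lambda p: kernel_distance(k, p[0], query))
    return label

train = [((0.0, 0.0), "neg"), ((1.0, 1.0), "pos")]
# The query is closest to (1.0, 1.0), so this prints "pos".
print(one_nearest_neighbour(rbf_kernel, train, (0.9, 0.8)))
```

This is the generic recipe the paper applies on the Diterpene dataset: any valid kernel yields a distance usable by instance-based learners such as 1-NN.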

### Citations

8984 | Statistical Learning Theory - Vapnik - 1998 |

Citation context: "…multi-relational data mining (Džeroski and Lavrač, 2001) aims to reduce these pre-processing efforts by considering learning from multi-relational data representations directly. Support vector machines (Boser et al., 1992; Vapnik, 1995) are a popular recent development within the machine learning and data mining communities. Along with some other learning algorithms like Gaussian processes and kernel principal component analysis, t…"

2967 | Data Mining: Practical Machine Learning Tools and Techniques - Witten, Frank, et al. - 2011 |

1553 | An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods - Cristianini, Shawe-Taylor - 2000 |

1292 | A training algorithm for optimal margin classifiers - Boser, Guyon, et al. - 1992 |

Citation context: "…multi-relational data mining (Džeroski and Lavrač, 2001) aims to reduce these pre-processing efforts by considering learning from multi-relational data representations directly. Support vector machines (Boser et al., 1992; Vapnik, 1995) are a popular recent development within the machine learning and data mining communities. Along with some other learning algorithms like Gaussian processes and kernel principal compone…"

1273 | Spline models for observational data - Wahba - 1990 |

Citation context: "…different loss functions, see also (Evgeniou et al., 2000). Support vector machines, for example, arise from using the so-called hinge loss V(y, f(x)) = max{0, 1 − y f(x)}. The Representer Theorem (Wahba, 1990; Schölkopf et al., 2001) shows that, under quite general conditions, the solution found by minimising the regularised risk has the form f*(·) = ∑_{i=1}^{n} c_i k(x_i, ·), where k(·, ·) is the kernel correspo…"
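
The regularised risk referred to in this context has a standard form that the truncated snippet does not show; spelled out (standard notation, λ > 0 a regularisation parameter chosen by the practitioner):

```latex
% Regularised risk minimisation over a reproducing-kernel Hilbert space H
f^{*} = \arg\min_{f \in \mathcal{H}}
        \frac{1}{n} \sum_{i=1}^{n} V\bigl(y_i, f(x_i)\bigr)
        + \lambda \, \lVert f \rVert_{\mathcal{H}}^{2}

% Support vector machines use the hinge loss
V\bigl(y, f(x)\bigr) = \max\{0,\; 1 - y\,f(x)\}

% Representer Theorem: the minimiser is a finite kernel expansion
f^{*}(\cdot) = \sum_{i=1}^{n} c_i \, k(x_i, \cdot)
```

The last equation is why kernel methods never need the feature space explicitly: the solution is determined by the kernel values on the training points alone.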

847 | A formulation of the simple theory of types - Church - 1940 |

Citation context: "…a typed, higher-order logic that provides a variety of important data types, including sets, multisets, and graphs for representing individuals. The logic is based on Church’s simple theory of types (Church, 1940) with several extensions. First, we assume there is given a set of type constructors T of various arities. Included in T is the constructor Ω of arity 0. The domain corresponding to Ω is the set cont…"

777 | Theory of reproducing kernels - Aronszajn - 1950 |

Citation context: "…on k : X × X → R, a feature transformation φ : X → H into the Hilbert space H exists, such that k(x, x′) = ⟨φ(x), φ(x′)⟩ for all x, x′ ∈ X, can be checked by verifying that k is positive definite (Aronszajn, 1950). This means that any set, whether a linear space or not, that admits a positive definite kernel can be embedded into a linear space. Thus, throughout the paper, we take ‘valid’ to mean ‘positive def…"
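
Aronszajn's criterion can be spot-checked empirically: a positive definite kernel must make every quadratic form over its Gram matrix non-negative. A small sketch; the RBF kernel and sample points are illustrative assumptions, and random probing is only a necessary-condition check, not a proof:

```python
import math
import random

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian RBF kernel, a classic positive definite kernel
    (illustrative stand-in for the kernels discussed in the paper)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def looks_positive_definite(k, points, trials=200, seed=0):
    """Empirical spot-check: for a positive definite kernel, every
    quadratic form sum_ij c_i c_j k(x_i, x_j) over the Gram matrix
    is non-negative.  Random c vectors probe this condition."""
    gram = [[k(x, y) for y in points] for x in points]
    rng = random.Random(seed)
    n = len(points)
    for _ in range(trials):
        c = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        quad = sum(c[i] * c[j] * gram[i][j]
                   for i in range(n) for j in range(n))
        if quad < -1e-9:  # small tolerance for floating-point error
            return False
    return True

pts = [(0.0,), (1.0,), (2.5,), (4.0,)]
print(looks_positive_definite(rbf_kernel, pts))  # True for the RBF kernel
```

A full verification would check the eigenvalues of the Gram matrix instead; the random-probe version keeps the sketch dependency-free.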

385 | Text classification using string kernels - Lodhi, Cristianini, Watkins, et al. - 2001 |

375 | An Introduction to Kernel-Based Learning Algorithms - Müller, Mika, et al. - 2001 |

368 | Convolution Kernels on Discrete Structures - Haussler - 1999 |

Citation context: "…s. In particular, they are closed under sum, direct sum, multiplication by a scalar, product, tensor product, zero extension, pointwise limits, and exponentiation (Cristianini and Shawe-Taylor, 2000; Haussler, 1999). It should be noted that, for a kernel method to perform well on a domain, positive definiteness of the kernel is not the only issue. While there is always a valid kernel that performs poorly (e.g.,…"
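
The closure properties cited here are what make it possible to assemble kernels for structured data from kernels on the parts: if k1 and k2 are positive definite, so are k1 + k2 and k1 · k2. A minimal sketch, with illustrative component kernels:

```python
import math

def linear_kernel(x, y):
    """Linear kernel <x, y>, positive definite on real vectors."""
    return sum(a * b for a, b in zip(x, y))

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian RBF kernel, also positive definite."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def kernel_sum(k1, k2):
    """Closure under sum: k1 + k2 is again positive definite."""
    return lambda x, y: k1(x, y) + k2(x, y)

def kernel_product(k1, k2):
    """Closure under product: k1 * k2 is again positive definite."""
    return lambda x, y: k1(x, y) * k2(x, y)

# Composite kernels stay valid, so they can be fed to any kernel method.
k = kernel_product(kernel_sum(linear_kernel, rbf_kernel), rbf_kernel)
print(round(k((1.0, 0.0), (1.0, 0.0)), 4))  # (1 + 1) * 1 = 2.0
```

Structured-data kernels in the style of Haussler's convolution kernels are built by applying exactly such combinators along the decomposition of the data.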

254 | Convolution kernels for natural language - Collins, Duffy - 2002 |

186 | Support vector machines for multiple-instance learning - Andrews, Tsochantaridis, et al. - 2003 |

178 | A framework for multiple-instance learning - Maron, Lozano-Pérez - 1998 |

Citation context: "…ed KeS) to the results reported in (Andrews et al., 2003). All results have been achieved by multiple ten-fold cross-validations. The algorithms compared are EMDD (Zhang and Goldman, 2002), maxDD (Maron and Lozano-Pérez, 1998), MI-NN (Ramon and De Raedt, 2000), IAPR (Dietterich et al., 1997), mi-SVM and MI-SVM (Andrews et al., 2003). The parameter of the Gaussian modifier is chosen by leave-one-out cross-validation within…"

177 | Kernel principal component analysis - Schölkopf, Müller - 1999 |

Citation context: "…e different learning tasks, e.g., support vector machines for supervised learning, support vector clustering (Ben-Hur et al., 2001) for unsupervised learning, and kernel principal component analysis (Schölkopf et al., 1999; Schölkopf and Smola, 2002) for feature extraction. While the kernel machine encapsulates the learning task and the way in which a solution is sought, the kernel function encapsulates the hypothesis…"

164 | Support vector clustering - Ben-Hur, Horn, et al. - 2001 |

126 | On graph kernels: Hardness results and efficient alternatives - Gärtner, Flach, et al. - 2003 |

112 | EM-DD: An improved multiple-instance learning technique - Zhang, Goldman - 2002 |

85 | Inductive constraint logic - De Raedt, Van Laer - 1995 |

66 | Relational instance-based learning - Emde, Wettschereck - 1996 |

Citation context: "…8), Clerodan (356), Portulan (5), 5,10-seco-Clerodan (4), 8,9-seco-Labdan (6), and seven classes with only one example each. The accuracies reported in literature range up to 86.5%, achieved by RIBL (Emde and Wettschereck, 1996). Other results were reported for FOIL (Quinlan, 1990), TILDE (Blockeel and De Raedt, 1998), and ICL (De Raedt and Van Laer, 1995). See Table II for details. After including some manually constructed…"

46 | Integrated segmentation and recognition of hand-printed numerals - Keeler, Rumelhart, et al. - 1990 |

Citation context: "…Multi-instance problems have been introduced under this name in (Dietterich et al., 1997). However, similar problems and algorithms have been considered earlier, for example in pattern recognition (Keeler et al., 1991). Within the last couple of years, several approaches have been proposed to upgrade attribute-value learning algorithms to tackle multi-instance problems. Other approaches focused on new algorithms s…"

31 | Learning with Kernels - Schölkopf, Smola - 2002 |

30 | To the international computing community: A new East-West challenge (Technical Report) - Michie, Muggleton, et al. - 1994 |

Citation context: "…directly on the type structure (specifying the structure of the domain and the declarative bias). We introduce our suggested kernel definition syntax by means of an example: the East/West challenge (Michie et al., 1994).

    eastbound :: Train -> Bool
    type Train = Car -> Bool with modifier gaussian 0.1
    type Car = (Shape,Length,Roof,Wheels,Lo…"
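
The idea of a kernel "following the syntactic structure of the data" can be caricatured in Python: a kernel on a product type combines per-component kernels, and a kernel on a set type sums an element kernel over all pairs. This is a rough sketch of the general approach under simplified assumptions (the atom/tuple/set kernels and the toy train encoding are illustrative, not the paper's exact higher-order-logic definition):

```python
def atom_kernel(a, b):
    """Matching kernel on atomic values (e.g. constructors such as
    shapes or roof types): 1 if equal, 0 otherwise."""
    return 1.0 if a == b else 0.0

def tuple_kernel(xs, ys, component_kernels):
    """Kernel on a product type: combine per-component kernels
    (summation is one common choice)."""
    return sum(k(x, y) for k, x, y in zip(component_kernels, xs, ys))

def set_kernel(xs, ys, element_kernel):
    """Kernel on a set type: sum the element kernel over the cross
    product of the two sets (the cross-product kernel)."""
    return sum(element_kernel(x, y) for x in xs for y in ys)

# A toy 'Car' as (shape, length, roof); a 'Train' as a set of cars.
car_kernel = lambda c1, c2: tuple_kernel(c1, c2, [atom_kernel] * 3)
train_kernel = lambda t1, t2: set_kernel(t1, t2, car_kernel)

t1 = [("rect", "long", "flat"), ("oval", "short", "none")]
t2 = [("rect", "short", "flat")]
print(train_kernel(t1, t2))  # 2 matches + 1 match = 3.0
```

Because each combinator preserves positive definiteness, the composite kernel is valid by construction, which mirrors the paper's main theoretical result for kernels defined over type signatures.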

24 | Exponential and Geometric Kernels for Graphs - Gärtner - 2002 |

14 | Logic for Learning - Lloyd - 2003 |

10 | Robust classification for imprecise environments (Machine Learning 42(3)) - Provost, Fawcett - 2001 |

9 | Attribute-value learning versus inductive logic programming: The missing links (extended abstract) - De Raedt - 1998 |

9 | Diterpene structure elucidation from 13C NMR spectra with inductive logic programming - Džeroski, Schulze-Kremer, et al. - 1998 |

7 | A survey of kernels for structured data (SIGKDD Explorations 5, S268–S275) - Gärtner - 2003 |

7 | Learning logical definitions from relations (Machine Learning 5) - Quinlan - 1990 |

Citation context: "…abdan (6), and seven classes with only one example each. The accuracies reported in literature range up to 86.5%, achieved by RIBL (Emde and Wettschereck, 1996). Other results were reported for FOIL (Quinlan, 1990), TILDE (Blockeel and De Raedt, 1998), and ICL (De Raedt and Van Laer, 1995). See Table II for details. After including some manually constructed features, 91.2% accuracy has been achieved by the bes…"

6 | Solutions of Ill-Posed Problems - Tikhonov, Arsenin - 1977 |

5 | Improved distances for structured data - Mavroeidis, Flach - 2003 |

2 | Top-down Induction of First Order Logical Decision Trees - Blockeel, De Raedt - 1998 |

2 | Multi Instance Neural Networks - Ramon, De Raedt - 2000 |

2 | How to upgrade propositional learners to first order logic: A case study - Van Laer, De Raedt - 2001 |

1 | Haskell98: A Non-Strict Purely Functional Language - Jones, Hughes (eds.) - 1998. Available at http://haskell.org |

1 | Kernels for Graph Classification - Kashima, Inokuchi - 2002 |

1 | A Generalized Representer Theorem - Schölkopf, Herbrich, Smola - 2001 |